Edgewall Software

Opened 15 years ago

Closed 14 years ago

Last modified 12 years ago

#380 closed defect (fixed)

When the build for *one* revision hangs, the slave hangs forever, and builds aren't triggered anymore

Reported by: edgewall.org@… Owned by: cmlenz
Priority: critical Milestone: 0.6
Component: General Version: dev
Keywords: Cc: felix.schwarz@…
Operating System: BSD

Description (last modified by osimons)

Someone on our team scr*wed up and committed some tests that hang the build. We corrected this after a few commits, but now, when the slaves reach the revisions that hang, they hang too, and we have to restart them by hand whenever there's a new revision to build...

Unchecking "Build all revisions" doesn't seem to change any of this behavior. Actually, this option isn't really doing what I would expect: it should be called "Trigger a build for every commit, even if it is not on the path for the configuration" or something like that. It would be good to have an "Only build latest revision" option.

The problem is worse for me because I build two different configurations, and Bitten tries to build *all the revisions* for one of them before building the other. Since the slaves hang at some of the revisions for the first configuration, the second configuration is never built, for any revision. I cannot get this configuration built at all.

I'm currently trying to patch 'queue.py', to simply skip the builds that are causing trouble:

  • queue.

    diff -u queue.py queue.py-original
    --- queue.py
    +++ queue.py-original
    @@ -221,9 +221,6 @@
                 platforms = []
                 for platform, rev, build in collect_changes(repos, config, db):
     
    -                if rev > 1710 and rev < 1726:
    -                    continue
    -
                     if not self.build_all and platform.id in platforms:
                         # We've seen this platform already, so these are older
                         # builds that should only be built if built_all=True

I think it will work, but there may be a cleaner way of doing it?
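One possibly cleaner variant of the hard-coded check, as a sketch only: keep the revision ranges to skip in one place and test membership with a small helper. `SKIP_RANGES` and `should_skip` are hypothetical names, not part of Bitten. The bounds mirror the exclusive comparison `rev > 1710 and rev < 1726`, and revisions are assumed to be integers.

```python
# Hypothetical helper, not part of Bitten: revision ranges known to hang.
# Each pair is (low, high) with both bounds exclusive, matching the
# "rev > 1710 and rev < 1726" check in the patch.
SKIP_RANGES = [(1710, 1726)]


def should_skip(rev, skip_ranges=SKIP_RANGES):
    """Return True if the integer revision lies strictly inside any range."""
    return any(low < rev < high for low, high in skip_ranges)
```

Inside the loop, the check would then read `if should_skip(rev): continue`, and further bad ranges could be added without touching the loop itself.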

The *slave* code should have a way to stop a build if it doesn't finish before the timeout defined in the admin (currently this timeout is only used on the master). I've looked at the code in the slave, and it doesn't seem too difficult to implement a control thread that would stop the build. However, I'm not familiar enough with threading in Python to do it...
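A minimal sketch of such a control thread, assuming the build step runs as a child process: a `threading.Timer` kills the process once the deadline passes. `run_with_timeout` is a hypothetical name and this is not how the Bitten slave is actually structured; it only illustrates the general technique with the standard library.

```python
import subprocess
import threading


def run_with_timeout(cmd, timeout):
    """Run cmd and kill it if it runs longer than `timeout` seconds.

    Returns a (returncode, timed_out) tuple.
    """
    proc = subprocess.Popen(cmd)
    timed_out = threading.Event()

    def kill_build():
        # Runs in the timer thread once the deadline has passed.
        timed_out.set()
        proc.kill()

    watchdog = threading.Timer(timeout, kill_build)
    watchdog.daemon = True
    watchdog.start()
    try:
        returncode = proc.wait()
    finally:
        # Harmless if the timer has already fired.
        watchdog.cancel()
    return returncode, timed_out.is_set()
```

Note that `proc.kill()` only stops the direct child; a build that spawns its own subprocesses would probably need extra handling (for example a process group), which this sketch leaves out.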

Attachments (0)

Change History (7)

comment:1 Changed 15 years ago by dfraser

  • Description modified (diff)

Agreed, I've had lots of trouble with this before. Useful things would be:

  • Timeout limit for the slave. Even nicer: remember the previous execution times and adjust the timeout limit to be N*previous-longest-successful-execution
  • The ability to mark certain revisions as not-for-testing (perhaps only on certain platforms)
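The adaptive limit from the first bullet could be sketched as follows. `adaptive_timeout`, `factor` (the N above), and `default` are hypothetical names and values, and where the duration history would be stored is left open.

```python
def adaptive_timeout(successful_durations, factor=3.0, default=3600.0):
    """Return factor * the longest successful duration, in seconds.

    Falls back to `default` when there is no history yet.
    """
    if not successful_durations:
        return default
    return factor * max(successful_durations)
```

For example, with past successful builds of 100, 250 and 180 seconds and `factor=2`, the limit would be 500 seconds.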

comment:2 Changed 15 years ago by wbell

I don't like the idea of using a heuristic for how long a build should take: we have many build slaves of differing speeds, and some slaves take 12 hours for builds that take others only 9.

Anytime a slave stops running a build (as far as the master is concerned), it should make an effort to stop building it, so that it doesn't stay stuck on an orphaned build. One consequence of the current timeout behavior is that if your build does hang and exceeds the timeout, the master happily invalidates it and assigns it to another slave. The slave processing it continues to run the hanging build, while the new slave starts it and eventually hangs as well. The process repeats until all slaves are running the same build, yet none are shown as running it as far as the master is concerned.
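One way a slave could avoid staying stuck on an orphaned build, sketched under assumptions: while the build runs, the slave periodically checks whether the master still considers the build assigned to it, and kills the build if not. `still_assigned` is a hypothetical callable standing in for whatever request the slave would make to the master; it is not an existing Bitten API.

```python
import subprocess


def run_while_assigned(cmd, still_assigned, poll_interval=30.0):
    """Run cmd, killing it once still_assigned() returns False.

    Returns a (returncode, aborted) tuple.
    """
    proc = subprocess.Popen(cmd)
    while True:
        try:
            # Wait up to poll_interval seconds for the build to finish.
            return proc.wait(timeout=poll_interval), False
        except subprocess.TimeoutExpired:
            pass
        # Build still running: check whether the master has given up on it.
        if not still_assigned():
            proc.kill()
            return proc.wait(), True
```

The poll interval trades responsiveness against extra requests to the master; 30 seconds is an arbitrary placeholder.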

comment:3 Changed 15 years ago by osimons

  • Milestone changed from 0.6 to 0.6.1

comment:4 Changed 15 years ago by osimons

  • Description modified (diff)

(Fixed diff formatting in description)

comment:5 Changed 14 years ago by Felix Schwarz <felix.schwarz@…>

  • Cc felix.schwarz@… added

comment:6 Changed 14 years ago by wbell

  • Resolution set to fixed
  • Status changed from new to closed

Closed with [830]

comment:7 Changed 14 years ago by osimons

  • Milestone changed from 0.6.1 to 0.6
