Ticket #380 (closed defect: fixed)
When the build for *one* revision hangs, the slave hangs forever, and builds aren't triggered anymore
| Reported by: | edgewall.org@… | Owned by: | cmlenz |
|---|---|---|---|
| Priority: | critical | Milestone: | 0.6 |
| Component: | General | Version: | dev |
| Keywords: | | Cc: | felix.schwarz@… |
| Operating System: | BSD |
Description (last modified by osimons)
Someone on our team scr*wed up and committed some tests that hang the build. We corrected this after a few commits, but now, when the slaves do the build, they hang when they reach the revisions that hang, and we have to restart them by hand whenever there's a new revision to build...
Unchecking "Build all revisions" doesn't seem to change any of this behavior. Actually, this option isn't really doing what I would expect: it should be called "Trigger a build for every commit, even if it is not on the path for the configuration" or something like that. It would be good to have an "Only build latest revision" option.
The problem is worse for me, as I build two different configurations, and Bitten tries to build *all the revisions* for one of the configurations before building the other. Since the slaves hang at some of the revisions for the first configuration, the second configuration is never built, for any revision. I cannot get this configuration to be built at all.
I'm currently trying to patch 'queue.py', to simply skip the builds that are causing trouble:
    diff -u queue.py queue.py-original
    --- queue.py
    +++ queue.py-original
    @@ -221,9 +221,6 @@
     platforms = []
     for platform, rev, build in collect_changes(repos, config, db):
     
    -    if rev > 1710 and rev < 1726:
    -        continue
    -
         if not self.build_all and platform.id in platforms:
             # We've seen this platform already, so these are older
             # builds that should only be built if built_all=True
I think it will work, but there may be a cleaner way of doing it?
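One possibly cleaner variant, as a rough sketch only: keep the bad revision ranges in a list and filter the tuples coming out of `collect_changes` before the loop body sees them. `SKIP_RANGES` and `skip_revisions` are made-up names, not part of Bitten, and the sketch assumes integer revision numbers and the same exclusive bounds as the hard-coded check in my patch.

```python
# Hypothetical helper, not part of Bitten: drop (platform, rev, build)
# tuples whose revision falls strictly inside a known-bad range.
SKIP_RANGES = [(1710, 1726)]  # exclusive bounds: skips 1711..1725


def skip_revisions(changes, skip_ranges=SKIP_RANGES):
    """Yield only the changes whose revision is outside every skip range."""
    for platform, rev, build in changes:
        if any(low < rev < high for low, high in skip_ranges):
            continue  # revision is known to hang the build
        yield platform, rev, build
```

The loop in 'queue.py' would then presumably iterate over `skip_revisions(collect_changes(repos, config, db))` instead of carrying the revision check inline, so the ranges could later come from configuration rather than from the source.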
The *slave* code should have a way to stop a build if it doesn't finish before the timeout defined in the admin (currently this timeout is only used on the master). I've looked at the code in the slave, and it doesn't seem too difficult to implement a control thread that would stop the build. However, I'm not familiar enough with threading in Python to do it...
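For what it's worth, here is a minimal sketch of what such a control thread might look like, using only the standard library. It is not the slave's actual code: `run_with_timeout` is a hypothetical name, and it assumes the build step runs as a single child process that can simply be killed. A `threading.Timer` plays the role of the control thread and kills the child if it is still running when the timeout expires.

```python
import subprocess
import sys
import threading


def run_with_timeout(argv, timeout):
    """Run argv; kill it if it is still running after `timeout` seconds.

    Hypothetical helper (not Bitten API). Returns (returncode, timed_out).
    """
    proc = subprocess.Popen(argv)
    timed_out = threading.Event()

    def kill():
        # Runs in the timer (control) thread once the timeout expires.
        if proc.poll() is None:  # child still running
            timed_out.set()
            proc.kill()

    timer = threading.Timer(timeout, kill)
    timer.daemon = True
    timer.start()
    try:
        returncode = proc.wait()
    finally:
        timer.cancel()  # harmless if the timer has already fired
    return returncode, timed_out.is_set()


if __name__ == "__main__":
    # Stand-in for a hanging build step: sleeps far longer than the timeout.
    hanging = [sys.executable, "-c", "import time; time.sleep(60)"]
    code, timed_out = run_with_timeout(hanging, timeout=1.0)
    print("return code:", code, "timed out:", timed_out)
```

This only kills the direct child, so anything the build step spawned itself may survive. A real fix would probably need to handle that, and would report the build as failed to the master instead of just returning.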
