#214 closed defect (fixed)
duplicate entries in bitten_build table
Reported by: | joel@… | Owned by: | osimons |
---|---|---|---|
Priority: | major | Milestone: | 0.6 |
Component: | General | Version: | dev |
Keywords: | Cc: | johannes.spohr@… | |
Operating System: | Linux |
Description
I don't know yet what causes this, but I have noticed my bitten_build table becoming corrupt with duplicated entries (same config, rev, and platform, but different ids.) When a slave starts building one, its name will be assigned to one of them, but the duplicates remain in 'P' state. The build status overview page then incorrectly displays no builds in progress.
Running bitten trunk (r497) on the master, and multiple slaves (about a dozen of them, with varying versions of bitten trunk, r497 or newer.)
The multiple build entries show up after committing a new revision in SVN.
I have added a unique key to the table in an attempt to eliminate these duplicates:
CREATE UNIQUE INDEX bitten_build_rev_platform ON bitten_build USING btree (rev, platform)
However, I don't know yet if this will cause a problem with Bitten.
Attachments (7)
Change History (29)
comment:1 Changed 17 years ago by anonymous
comment:2 Changed 15 years ago by osimons
- Milestone changed from 0.6 to 0.6.1
Does this still happen with recent trunk (0.6dev)? What database backend do you use?
comment:3 follow-up: ↓ 9 Changed 15 years ago by hodgestar+bitten@…
I can confirm that this is still happening on trunk.
The underlying issue is that there is a race condition in the Build Queue?.populate() method of build.py. If a second slave checks for builds while a first is still calculating which new builds to add, both slaves can end up adding the same set of builds to the database.
There probably isn't a simple locking solution that can prevent this when running multiple Trac processes (e.g. under Apache) so I think the best option is to hand the responsibility for checking for duplicate builds off to the database (by adding a unique index to the model as suggested by Joel). After doing this, Bitten will need to catch the resulting error from db.commit() at the end of Build Queue?.populate(). It may also be necessary to commit each build separately (although this is unlikely to be serious issue since one of the slaves should succeed in adding the new builds and any new builds left out will be added the next time a slave connects).
comment:4 Changed 15 years ago by osimons
- Milestone changed from 0.6.1 to 0.6
Seems this needs fixing before 0.6 release.
Changed 15 years ago by hodgestar+bitten@…
Patch that adds a unique index on (config, platform, rev) to bitten_build table.
comment:5 Changed 15 years ago by hodgestar+bitten@…
I've attached a patch that includes an upgrade script. So far it hasn't broken anything as far as I can see and the upgrade worked for me (tm). I'll report back as soon as I see the race condition triggered in my Trac logs.
comment:6 Changed 15 years ago by osimons
Two things I see before looking into this further:
- We need a test case for this that makes sure the exception gets raised - this should hopefully fail now and not fail after upgrade. Populate a build and then try to create an identical build.
- Seeing all the upgrade script does is add a new index, can't we just create the new index instead of recreating the whole table?
comment:7 follow-up: ↓ 8 Changed 15 years ago by anonymous
Regarding point 2, currently all three of Trac's database backends use the same CREATE INDEX syntax so I guess it's fine.
I was a little concerned that I would be tripped up by some odd SQL syntax on one of the underlying databases so I initially took the approach of only using constructs already employed by other database upgrade functions.
comment:8 in reply to: ↑ 7 Changed 15 years ago by osimons
I see that Trac does it in its upgrade scripts number 4, 16 and 18. I guess it is OK?
Changed 15 years ago by hodgestar+bitten@…
Patch that adds a unique index on (config, platform, rev) to bitten_build table.
Changed 15 years ago by hodgestar+bitten@…
Replaces upgrade script with simple "CREATE UNIQUE INDEX" execution rather than table copying.
comment:9 in reply to: ↑ 3 Changed 15 years ago by jhampton
Replying to hodgestar+bitten@…:
It may also be necessary to commit each build separately (although this is unlikely to be serious issue since one of the slaves should succeed in adding the new builds and any new builds left out will be added the next time a slave connects).
With the patch as it stands now, this is the case. Currently, in populate() we are passing in the database to the build.insert() method. As we create the db in populate(), the conditional commit in build.insert() only fires if we do not pass in the db.
The effect of this, on PostgreSQL at least, is that after one of the inserts fails, the cursor will be left in a state where a db.rollback() is required.
The rollback can be added to the except: in the patch. However, as is, this will rollback the successful commit also.
If we simply remove the db=db from the call to build.insert(), we can safely put the db.rollback() in the except:. The worse case of this would be a bunch of log messages re: Failed build insert.
Changed 15 years ago by anonymous
Now including a test for re-inserting builds and proper db commit and rollback.
comment:10 Changed 15 years ago by hodgestar+bitten@…
Updated patch with test and what I think is now the correct db.commit() and db.rollback() handling.
Thanks jhampton for pointing out the build.insert() behaviour. I opted to continue passing in the db argument and explicitly do the db.commit() after each build.insert(db=db) to make it clear that a "try: (do stuff with db); db.commit(); except: db.rollback()" pattern was being used.
comment:11 Changed 15 years ago by osimons
- Owner changed from cmlenz to osimons
Seeing all the code was there in the test, I just switched from monkey-patching to threading. It fails in same place as the multicall test, but I haven't yet looked at the upgrade code so i can't make it pass just yet. Does it pass after upgrade?
comment:12 Changed 15 years ago by anonymous
The threaded version of the test passes fine for me when running "python setup.py unittest" on trunk. I've also tested the new upgrade script on my sqlite database and it works there.
comment:13 Changed 15 years ago by osimons
Thanks for patch and testing!
I've been staring at the schema again, and searching source to try to make sense of the current index: Index(['config', 'rev', 'slave']). I just don't get it, and don't see what value it actually brings.
Especially as the patch adds a new index which is really much more correct in terms of contraints. Also whenenver slave is queried, platform is also included. The only SELECT occurence of slave=%s" I can find is in Build.select() method as criteria - in the line after platform=%s.
I'm very tempted to drop this index when we are creating the new one - unless someone can help me comprehend its value...
comment:14 Changed 15 years ago by osimons
- Resolution set to fixed
- Status changed from new to closed
Committed in [724] including threaded test + dropping the old index that serves no real purpose anymore.
Thanks!
comment:15 Changed 15 years ago by osimons
- Resolution fixed deleted
- Status changed from closed to reopened
#434 just arrived - it seems our upgrade script does not account for the situation where duplicate builds may already exist.
How do we deal with that? Run some SQL and check and inform user that manual cleaning is needed? Or just delete some of the builds?
I'm not sure yet.
comment:16 Changed 15 years ago by Johannes Spohr <johannes.spohr@…>
- Cc johannes.spohr@… added
comment:17 Changed 15 years ago by osimons
We'll prepare a fix shortly that will print out duplicates and offer a simple way of deleting the unwanted, duplicte builds.
Changed 15 years ago by hodgestar+bitten@…
Patch that checks for bitten_build duplicates before adding the new index.
comment:18 Changed 15 years ago by hodgestar+bitten@…
I've added a patch that prints out the duplicates and doesn't attempt to add the new index if duplicates are found. It still needs to have the suggested one-liner for removing builds tested and it needs to notify Trac that the upgrade failed.
I also uploaded test_upgrade.py that can be used to try out the upgrade function on a dummy in-memory sqlite db.
comment:19 Changed 15 years ago by osimons
Thanks! We need to raise an exception or else the upgrade will be recorded as completed even when nothing is done. Other than that it looks good, and I'll test and integrate it today.
comment:20 Changed 15 years ago by osimons
- Resolution set to fixed
- Status changed from reopened to closed
I've committed the patch.
Johannes S: Can you please verify that this is the information you need and that you are able to resolve and complete the upgrade? Thanks.
comment:21 follow-up: ↓ 22 Changed 15 years ago by Johannes Spohr <johannes.spohr@…>
Yep, I was able to fix it using that info. I wrote a script which deleted the builds, in which I copied the output of the upgrade script and reformatted it as a python tuple by hand. So printing the builds in python syntax would have been easier for me. We have more than 11000 builds and 192 were duplicates, so removing them manually using the given command line was not possible.
Otherwise, thanks for the fix, it works fine now!
comment:22 in reply to: ↑ 21 Changed 15 years ago by osimons
Replying to Johannes Spohr <johannes.spohr@…>:
Yep, I was able to fix it using that info.
Superb! Hopefully that should be the last of the duplicates :-)
For future reference, the commit fixing the upgrade was [730] (forgot in my previous comment).
Now that I've added the unique key constraint to the database, here is a traceback showing the code that attempts to insert a duplicate build entry:
Also note I use the standalone tracd to serve the build slaves.