Opened 19 years ago

Closed 15 years ago

#95 closed defect (fixed)

Multiple slaves claim the same build

Reported by: Walter Bell <wwb2@…>
Owned by: osimons
Priority: major
Milestone: 0.6
Component: Build master
Version: 0.5
Keywords:
Cc:
Operating System:

Description

Start up multiple slaves at around the same time. If transferring the tarball takes long enough, several of them will grab the same build. You'll get lots of errors like this in the log:

2006-01-13 09:34:07,645 [bitten.master] INFO: Slave vs2002-jwdesk01 started build 171 ("Countrywide" as of [3533])
2006-01-13 09:34:13,834 [bitten.beep] ERROR: columns build, name are not unique
Traceback (most recent call last):
  File "d:\Python23\lib\asyncore.py", line 69, in read
    obj.handle_read_event()
  File "d:\Python23\lib\asyncore.py", line 390, in handle_read_event
    self.handle_read()
  File "d:\Python23\lib\asynchat.py", line 136, in handle_read
    self.found_terminator()
  File "build\bdist.win32\egg\bitten\util\beep.py", line 278, in found_terminator
  File "build\bdist.win32\egg\bitten\util\beep.py", line 311, in _handle_frame
  File "build\bdist.win32\egg\bitten\util\beep.py", line 469, in handle_data_frame
  File "build\bdist.win32\egg\bitten\master.py", line 221, in handle_reply
  File "build\bdist.win32\egg\bitten\master.py", line 277, in _build_step_completed
  File "build\bdist.win32\egg\bitten\model.py", line 574, in insert
  File "d:\Python23\lib\site-packages\sqlite\main.py", line 255, in execute
    self.rs = self.con.db.execute(SQL % parms)
IntegrityError: columns build, name are not unique

It's not fatal and works itself out, but it's a waste of resources.
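
The IntegrityError above is what a composite uniqueness rule on (build, name) produces when a second slave reports a step for a build that already has one. A minimal sketch of that failure mode, assuming a hypothetical `step` table in an in-memory SQLite database (the table layout, the `record_step` helper, the step name and the second slave name are illustrative, not Bitten's actual schema or API):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the build-step table; the composite key
# mirrors the "columns build, name" named in the error above.
conn.execute(
    "CREATE TABLE step ("
    " build INTEGER, name TEXT, slave TEXT,"
    " PRIMARY KEY (build, name))"
)


def record_step(build_id, name, slave):
    """Return True if the step was stored, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO step (build, name, slave) VALUES (?, ?, ?)",
                (build_id, name, slave),
            )
        return True
    except sqlite3.IntegrityError:
        # A second slave reported the same (build, name) pair.
        return False


first = record_step(171, "checkout", "vs2002-jwdesk01")
second = record_step(171, "checkout", "another-slave")
```

Here the first report is stored and the second is rejected, so the duplicate work is detected only after the second slave has already done it.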

Attachments (2)

master-reserve.patch (6.5 KB) - added by Walter Bell <wwb2@…> 19 years ago.
Simple patch for #95 which introduces a new RESERVED state so that multiple slaves can't claim the same build. Not the cleanest, but it seems to work.
t95-slaves_claim_same_build-r712.diff (1.5 KB) - added by osimons 15 years ago.
Problem of multiple claims to same build in current trunk.

Change History (9)

Changed 19 years ago by Walter Bell <wwb2@…>

Simple patch for #95 which introduces a new RESERVED state so that multiple slaves can't claim the same build. Not the cleanest, but it seems to work.
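
The attached patch is not reproduced on this page. As a rough sketch of the RESERVED idea only, assuming a hypothetical `build` table and made-up status codes, a claim can be written as a conditional UPDATE so that only one slave moves a build out of the pending state:

```python
import sqlite3

PENDING, RESERVED = "P", "R"  # hypothetical status codes

conn = sqlite3.connect(":memory:")
# Hypothetical table; not Bitten's actual schema.
conn.execute(
    "CREATE TABLE build (id INTEGER PRIMARY KEY, status TEXT, slave TEXT)"
)
conn.execute("INSERT INTO build (id, status) VALUES (171, ?)", (PENDING,))
conn.commit()


def claim_build(build_id, slave):
    """Reserve a pending build; return True only for the slave that wins."""
    with conn:
        cur = conn.execute(
            "UPDATE build SET status = ?, slave = ? "
            "WHERE id = ? AND status = ?",
            (RESERVED, slave, build_id, PENDING),
        )
    # rowcount is 1 only if the row was still pending when we updated it.
    return cur.rowcount == 1


winner = claim_build(171, "slave-a")
loser = claim_build(171, "slave-b")
```

The second caller sees a row count of zero and can move on to another build before any tarball is transferred.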

comment:1 Changed 19 years ago by cmlenz

  • Milestone set to 0.6
  • Status changed from new to assigned

Looks good, thanks for the patch!

comment:2 Changed 17 years ago by cmlenz

Need to port this to the HTTP branch.

comment:3 Changed 15 years ago by wbell

The simplest fix I've found for this is to add a constraint into the database, but it's not ideal. Discard this original patch.

comment:4 Changed 15 years ago by osimons

  • Milestone changed from 0.6 to 0.7

Changed 15 years ago by osimons

Problem of multiple claims to same build in current trunk.

comment:5 Changed 15 years ago by osimons

  • Milestone changed from 0.7 to 0.6
  • Owner changed from cmlenz to osimons
  • Status changed from assigned to new

I think I've found a problem in current trunk related to this. The code that loops over the pending builds breaks out of the loop when it finds a matching build, leaving the build variable set to that build. However, if no matching build is found by the end of the loop, the build variable is still populated, now with the last build of the loop. That build will then be updated and given to the new slave.

The patch in attachment:t95-slaves_claim_same_build-r712.diff should hopefully fix this. Could anyone review my understanding of this?
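
For reviewers, the general Python behaviour in question can be shown in isolation. This is a minimal sketch with made-up platform names, not the trunk code: a `for` loop variable stays bound to the last item when the loop finishes without a `break`.

```python
# Hypothetical data: platforms of the pending builds, newest first.
pending = ["linux", "macos", "win32"]
# Platforms the connecting slave can actually build.
supported = ["solaris"]

for build in pending:
    if build in supported:
        break

# No break happened, yet `build` is still bound to the last item.
leaked = build
```

Unless something resets the variable after the loop, code that follows would treat `leaked` as a valid match.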

comment:6 Changed 15 years ago by osimons

Actually, seeing that build = None when not explicitly found, all the changes at the end are not needed. The new simplified patch:

  • bitten/queue.py

    --- a/bitten/queue.py
    +++ b/bitten/queue.py
    @@ -134,14 +134,15 @@
             # Iterate through pending builds by descending revision timestamp, to
             # avoid the first configuration/platform getting all the builds
             platforms = [p.id for p in self.match_slave(name, properties)]
    -        build = None
             builds_to_delete = []
    +        build_found = False
             for build in Build.select(self.env, status=Build.PENDING, db=db):
                 if self.should_delete_build(build, repos):
                     self.log.info('Scheduling build %d for deletion', build.id)
                     builds_to_delete.append(build)
                 elif build.platform in platforms:
    +                build_found = True
                     break
    -        else:
    +        if not build_found:
                 self.log.debug('No pending builds.')
                 build = None
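
The flag-based selection in the patch can be exercised on its own. The following is a simplified sketch, assuming a hypothetical `FakeBuild` tuple and a `stale` field in place of `Build.select` and `should_delete_build`; it is not the code committed to trunk.

```python
from collections import namedtuple

# Hypothetical stand-in for a pending build record.
FakeBuild = namedtuple("FakeBuild", "id platform stale")


def pick_build(pending, platforms):
    """Return (matching build or None, builds scheduled for deletion)."""
    builds_to_delete = []
    build_found = False
    build = None  # covers the case where `pending` is empty
    for build in pending:
        if build.stale:
            builds_to_delete.append(build)
        elif build.platform in platforms:
            build_found = True
            break
    if not build_found:
        # Without this reset, `build` would hold the last item looped over.
        build = None
    return build, builds_to_delete


pending = [FakeBuild(1, "linux", False), FakeBuild(2, "win32", True)]
no_match = pick_build(pending, ["solaris"])
match = pick_build(pending, ["linux"])
```

With no supported platform among the pending builds, the function returns None instead of the last build it happened to visit.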

comment:7 Changed 15 years ago by osimons

  • Resolution set to fixed
  • Status changed from new to closed

Patch applied in [713]. Note, however, that #214 describes a related problem: a possible race condition where two requests populate new builds at the same time for the same rev+config+platform combination.
