#403 new enhancement

[PATCH] Add a network retry count for unreliable networks

Reported by:	mike@…	Owned by:	osimons
Priority:	minor	Milestone:	0.6.1
Component:	Build slave	Version:	dev
Keywords:		Cc:
Operating System:	Linux

Description

I'm doing long builds on slaves with occasional network interruptions. This adds an exponential die-off retry count so the build doesn't abort due to a transient interruption.
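For readers without the patch at hand, the idea of a bounded retry count with exponentially growing waits can be sketched roughly as follows. This is not the attached patch: `retry_with_backoff`, `max_retries`, `base_delay` and the injectable `sleep` are hypothetical names chosen for illustration, and the sketch assumes Python 3, where network failures surface as `OSError` subclasses.

```python
import time


def retry_with_backoff(operation, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call operation() and retry on OSError with exponentially growing waits.

    Hypothetical helper, not the code from the attached patch.
    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once max_retries retries have been used up.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except OSError:
            if attempt >= max_retries:
                raise  # retry budget exhausted: give up with the original error
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

A caller would wrap the network call that posts a build step result, for example `retry_with_backoff(lambda: post_step(result))`, where `post_step` stands in for whatever function performs the request.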

Attachments (2)

0002-Add-a-network-retry-count-for-unreliable-networks.patch (4.4 KB) - added by mike@… 16 years ago.: patch
0002-Add-a-network-retry-count-for-unreliable-networks.2.patch (4.3 KB) - added by mike@… 16 years ago.: patch (rebased against trunk this time) (this supersedes the previous patch, but I can't replace it)

Change History (8)

Changed 16 years ago by mike@…

Attachment 0002-Add-a-network-retry-count-for-unreliable-networks.patch added

patch

Changed 16 years ago by mike@…

Attachment 0002-Add-a-network-retry-count-for-unreliable-networks.2.patch added

patch (rebased against trunk this time) (this supersedes the previous patch, but I can't replace it)

comment:1 Changed 16 years ago by osimons

#395 closed as duplicate.

comment:2 Changed 16 years ago by osimons

Owner changed from cmlenz to osimons
Summary changed from Add a network retry count for unreliable networks to [PATCH] Add a network retry count for unreliable networks

Patch looks good and useful. I'll put it on my todo list.

comment:3 Changed 16 years ago by osimons

I'm not so sure this is the right location to patch. If the server has received the request and returned a response with a status code in the error range, can't we presume that it is an actual error? The only error codes I can imagine being valid for this use are something like "503 Service Unavailable" and "502 Bad Gateway", and if so we should check specifically for such temporary states from the server/proxy side. What are the error codes you see?

However, if I take down my webserver to simulate typical connection errors I get <urlopen error (61, 'Connection refused')> in the logs and the slave keeps retrying. If I break the connection in the middle of a build, the slave just loops and restarts the build when the server is available again. Wouldn't the proper thing be to instead loop the step-posting attempt until the network is available again, so that the slave in effect continues what it is doing? In that case it really should just loop forever by default until halted, and this should not be a separate setting.

Could you please elaborate on the actual messages and status codes you see in your slave logs when the network is unreliable?
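The distinction drawn in this comment, between a server that answered with a genuine error and a connection that never got through, could be expressed as a small classifier. The sketch below is an illustration only: `is_transient` and `TRANSIENT_HTTP_STATUSES` are hypothetical names, it assumes Python 3's `urllib.error` rather than whatever the slave used at the time, and treating every non-HTTP `URLError` as transient is itself an assumption (a certificate failure, for instance, would also land there).

```python
import urllib.error

# Status codes the comment singles out as plausibly temporary.
TRANSIENT_HTTP_STATUSES = frozenset({502, 503})


def is_transient(exc):
    """Return True if exc looks like a temporary network problem.

    Hypothetical helper for illustration, not part of the patch.
    """
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        # The server (or a proxy) answered: only 502/503 count as temporary.
        return exc.code in TRANSIENT_HTTP_STATUSES
    if isinstance(exc, (urllib.error.URLError, ConnectionError)):
        # No HTTP response at all, e.g. "Connection refused".
        # Assumption: every such failure is treated as transient here.
        return True
    return False
```

With a check like this, a retry loop would re-raise immediately on a 401 or 404 response and keep retrying only on 502, 503, or a failed connection.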

comment:4 Changed 16 years ago by anonymous

There are a variety of messages/codes that can be generated here. My use case: I'm in a coffee shop, using my laptop as a build machine. A 45-minute build/test cycle completes, and the coffee shop's wireless flakes out, as coffee shop wireless is wont to do. Without this patch, I have to start the 45-minute cycle completely over. With this patch, I just add an exponential die-off retry count, and the build status will make it to the master.

comment:5 Changed 16 years ago by mike@…

Sorry for the dual comment, I submitted early, and I forgot to put in my name.

I'm loath to add any sort of mandatory infinite retry, because it's possible that the network error isn't transient. In this case, I just happen to know more about my network situation than the HTTP spec does. Ideally, on a network error the slave would just do the next build step, and buffer the build status until the network comes back up again (or give up and forget about it eventually if it doesn't come back) but that would have required more code, and this does exactly what I needed it to :)
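The "ideal" behaviour described here, carrying on with the build while holding status reports until the network returns, might look something like the sketch below. Nothing in it comes from the patch: `StatusBuffer`, `send` and `max_pending` are hypothetical, and "give up eventually" is approximated by a bounded queue that silently drops the oldest report when it is full.

```python
from collections import deque


class StatusBuffer:
    """Queue status reports and deliver them in order when sending works.

    Hypothetical sketch; send is any callable that raises OSError on a
    network failure.
    """

    def __init__(self, send, max_pending=100):
        self._send = send
        # A full deque with maxlen discards its oldest entry on append,
        # which stands in for "forget about it eventually".
        self._pending = deque(maxlen=max_pending)

    def report(self, status):
        """Queue a status and try to deliver everything queued so far."""
        self._pending.append(status)
        return self.flush()

    def flush(self):
        """Send queued statuses oldest first; stop at the first failure."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except OSError:
                return False  # still offline: keep the rest for later
            self._pending.popleft()
        return True
```

The build loop would call `report()` after each step and move straight on to the next one regardless of the return value, so a dropped connection delays the reports rather than the build.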

comment:6 Changed 16 years ago by osimons

Milestone changed from 0.6 to 0.6.1

I can see why it works, of course, provided everything is correct with the request. However, if someone has invalidated your build at some point during those 45 minutes, your slave will try making the invalid post over and over. The same happens if authentication fails, if there are problems with the included XML, or really if anything is out of the ordinary.

I like the retry idea, but it needs to be tuned for the class of errors it is intended to catch. The problem is not critical for a 0.6 release, so I'm rescheduling it and will look at it again before too long.
