#116 closed defect (wontfix)
on windows (cygwin), bitten-slave sometimes gets into tight loop
Reported by: | joel@… | Owned by: | cmlenz |
---|---|---|---|
Priority: | critical | Milestone: | 0.6 |
Component: | Build slave | Version: | 0.5.3 |
Keywords: | Cc: | ||
Operating System: | Windows |
Description
I do not yet know what causes this, but sometimes bitten-slave wedges itself on Windows, running in a tight loop and printing this output over and over again:
    [ERROR ] (113, 'Software caused connection abort')
    Traceback (most recent call last):
      File "/tmp/python.572/usr/lib/python2.4/asynchat.py", line 89, in handle_read
        data = self.recv (self.ac_in_buffer_size)
      File "/tmp/python.572/usr/lib/python2.4/asyncore.py", line 343, in recv
        data = self.socket.recv(buffer_size)
    error: (113, 'Software caused connection abort')
    [ERROR ] (113, 'Software caused connection abort')
    Traceback (most recent call last):
      File "/tmp/python.572/usr/lib/python2.4/asynchat.py", line 219, in initiate_send
        num_sent = self.send (self.ac_out_buffer[:obs])
      File "/tmp/python.572/usr/lib/python2.4/asyncore.py", line 332, in send
        result = self.socket.send(data)
    error: (113, 'Software caused connection abort')
The fact that this exception occurs is not necessarily a problem, but bitten-slave getting itself into a tight loop definitely is. Maybe it should just exit when this kind of error happens.
Attachments (2)
Change History (7)
comment:1 Changed 19 years ago by joel@…
comment:2 Changed 19 years ago by cmlenz
- Milestone set to 0.6
- Status changed from new to assigned
Thanks for the patch, much appreciated!
I think this is the same issue reported in #74, so I'll close that one as duplicate (since this one has the patch ;-) ).
comment:3 Changed 18 years ago by jabs@…
I found the same problem and investigated a bit, since the patch does not fix it for me. The difference between #116 and #74 is the message displayed, which stems from different execution paths. The problem seems to be in asyncore and Python 2.4.
asyncore in 2.4 uses nonblocking sockets, which on Windows (no cygwin here!) results in the following situation: a connect to a nonexistent server returns immediately, and a subsequent select (as done by poll) also returns immediately, with our socket in the except_fd list. asyncore then tries to call handle_expt on our object. Since Beep Session does not implement that, a warning is written: "warning: unhandled exception". If handle_expt does exist and just raises again, handle_error is called, which in its default implementation (in asyncore) prints the stack trace above. After that, poll is immediately called again, which leads to the tight loop.
The patch above amends handle_error to raise, which would be OK, but handle_error does not get called at all in my tests. In fact, if I try to override handle_expt in Beep.Session, that method does not get called either. (This may be because I know little Python and messed up somewhere ;-)
Btw, Python 2.3 seems to work OK: it seems to use blocking sockets in asyncore (correct me if I'm wrong) and silently ignores the except_fd set after select (yet it still fills it before calling???). Most importantly, no loop is entered; the program just seems to block (on connect???).
Summary: we should somehow implement handle_expt(self) in Eventloop (raise/exit). Someone should kick the asyncore devs for assuming nonblocking sockets are good for everyone. (I wonder why one can give a timeout param if it's overridden/unused by nonblocking sockets; at least that's the way BSD select works, iirc.)
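A minimal sketch of the idea in the summary above, written for current Python rather than the 2.4 asyncore discussed in this ticket. The names `Channel`, `ConnectionFailed`, `run_loop`, and the injectable `select_fn` are hypothetical and not part of Bitten or asyncore. It only illustrates that a handler which raises on an exceptional condition lets the loop stop instead of polling again immediately.

```python
import select


class ConnectionFailed(Exception):
    """Raised when select() reports an exceptional condition on a channel."""


class Channel:
    """Hypothetical stand-in for a session object wrapping a socket."""

    def __init__(self, sock):
        self.sock = sock

    def fileno(self):
        # select() accepts any object that exposes fileno().
        return self.sock.fileno()

    def handle_expt(self):
        # Raise instead of logging, so the caller can stop polling.
        raise ConnectionFailed("exceptional condition on fd %d" % self.fileno())


def run_loop(channels, select_fn=select.select, timeout=1.0, max_polls=None):
    """Poll until a channel fails; return the number of polls performed."""
    polls = 0
    while channels and (max_polls is None or polls < max_polls):
        polls += 1
        _readable, _writable, exceptional = select_fn(
            channels, channels, channels, timeout)
        for channel in exceptional:
            # Let ConnectionFailed propagate: swallowing it here and
            # polling again is what would produce the tight loop.
            channel.handle_expt()
    return polls
```

A caller would presumably wrap `run_loop` in a `try`/`except ConnectionFailed` and then exit, or reconnect after a delay, rather than re-entering the loop straight away.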
comment:4 Changed 18 years ago by jabs@…
The above patch works for me. I have specifically tested for win32, to protect other uses of handle_expt. I did not query the exception type, since it is not set to any meaningful value.
comment:5 Changed 17 years ago by cmlenz
- Resolution set to wontfix
- Status changed from assigned to closed
BEEP is going away in the next release, being replaced by Master Slave Protocol Http.
Update-- I found a way to easily reproduce this problem. Simply running bitten-slave when the master is not running will cause it.
Comparing this behavior to bitten-slave on Unix systems, it seems that Windows gives a different socket error than Unix does when the connection is broken. On Unix, when I run bitten-slave without the master, I get "connection refused" in the recv function and "broken pipe" in the send function, and the slave properly aborts.
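One way to get the same abort behavior on both platforms might be to treat all of these errors as a single "connection lost" condition. The following is a minimal sketch in current Python, assuming a hypothetical helper `is_connection_lost` (not part of Bitten). It compares symbolic `errno` names rather than raw numbers, since the numeric values differ between platforms, and the choice of which errors to include is an assumption.

```python
import errno

# Errors taken to mean the peer is gone. The membership of this set is an
# assumption for illustration; ECONNRESET is not mentioned in this ticket.
FATAL_ERRNOS = frozenset([
    errno.ECONNREFUSED,   # "connection refused" (recv on Unix)
    errno.EPIPE,          # "broken pipe" (send on Unix)
    errno.ECONNABORTED,   # "Software caused connection abort" (Windows)
    errno.ECONNRESET,
])


def is_connection_lost(exc):
    """Return True if exc is a socket error that should abort the slave."""
    return isinstance(exc, OSError) and exc.errno in FATAL_ERRNOS
```

An error handler could call this helper and exit when it returns True, instead of logging the traceback and returning to the poll loop.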