#421 closed defect (fixed)
403 forbidden error if bitten-slave IP address changes mid-build
Reported by: | cto@… | Owned by: | osimons |
---|---|---|---|
Priority: | blocker | Milestone: | 0.6 |
Component: | Build master | Version: | dev |
Keywords: | Cc: | ||
Operating System: | Linux |
Description (last modified by dfraser)
Currently bitten-slave receives a 403 forbidden error most of the time. Originally 403 forbidden error was occasional and seemingly random, but due to retries i was able to ignore it as builds would go through eventually. Recently however as my trac page has gotten more popular the error is so common no builds go all the way through (at most they are successful at one or two stages before the error occurs).
I have tried various apache configurations to allow for authentication and most recently gave anonymous BUILD_EXEC permission and doing it without any authentication yet 403 error continues.
The following is the error reported by the server:
Trac[main] WARNING: [HTTPForbidden] 403 Forbidden (Build 1023 has been invalidated for host 66.166.153.194.)
The following is the error output from bitten-slave right after svn checkout of the source (the first step for the bitten-slave):
[INFO ] Checked out revision 321. [INFO ] Build step Svn Checkout completed successfully [WARNING ] Server returned error 403: Forbidden [ERROR ] HTTP Error 403: Forbidden
Attachments (4)
Change History (22)
comment:1 Changed 15 years ago by dfraser
- Description modified (diff)
comment:2 follow-up: ↓ 3 Changed 15 years ago by dfraser
For clarity:
- Did you restart your Apache server after adjusting the permissions?
- If you go to http://yourservername/tracpath/builds (i.e. the same as the Builds status page but with an extra s, from a clean browser instance that has never logged on to trac, what happens? (Do you get an HTTP authentication dialog, etc)
comment:3 in reply to: ↑ 2 ; follow-up: ↓ 4 Changed 15 years ago by anonymous
Replying to dfraser:
For clarity:
- Did you restart your Apache server after adjusting the permissions?
- If you go to http://yourservername/tracpath/builds (i.e. the same as the Builds status page but with an extra s, from a clean browser instance that has never logged on to trac, what happens? (Do you get an HTTP authentication dialog, etc)
While i did not specifically restart the apache server i did just now with no change (however im not sure why thats important because the permissions are set in trac not apache, apache just has the usernames and passwords, not the permissions).
When i go to http://bugs.syncleus.com/trac.cgi/dANN/builds (feel free to try it yourself) on a fresh build i get a "Error: Method Not Allowed". Hope that helps.
comment:4 in reply to: ↑ 3 ; follow-up: ↓ 5 Changed 15 years ago by dfraser
Replying to anonymous:
While i did not specifically restart the apache server i did just now with no change (however im not sure why thats important because the permissions are set in trac not apache, apache just has the usernames and passwords, not the permissions).
It's important because it could be that the separate Apache processes (it runs in multiple processes usually) each are running the Trac code independently. It could be that trac was caching the permissions incorrectly, and this would explain your inconsistent permissions you're seeing (some processes had read the new permissions, others hadn't, and the greater the load on your server the more likely the cache is stale)
When i go to http://bugs.syncleus.com/trac.cgi/dANN/builds (feel free to try it yourself) on a fresh build i get a "Error: Method Not Allowed". Hope that helps.
That's actually a good sign. This means that you are allowed to access that page (but it expects the slave, not a web browser doing a GET).
When I run: bitten-slave -d /tmp/bitten-test/ --no-loop -v http://bugs.syncleus.com/trac.cgi/dANN/builds I get the following output:
[DEBUG ] Configured packages: {} [DEBUG ] Sending slave configuration: <slave name="megye"><platform processor="x86_64">x86_64</platform><os version="2.6.27.9-159.fc10.x86_64" family="posix">Linux</os></slave> [DEBUG ] Sending POST request to 'http://bugs.syncleus.com/trac.cgi/dANN/builds' [INFO ] No pending builds
This is with an old bitten slave (r634)... can you try exactly the same command and post your output?
comment:5 in reply to: ↑ 4 Changed 15 years ago by anonymous
Replying to dfraser:
This is with an old bitten slave (r634)... can you try exactly the same command and post your output?
That command is fairly similar to how i run it, but i tried it that exact same way and got 403 forbidden again (ill admit it got a few more steps before failing this time, but that seems to happen from time to time randomly). Anyway here is the output:
[INFO ] BUILD SUCCESSFUL [INFO ] Total time: 1 second [DEBUG ] ant exited with code 0 [INFO ] Build step Prepare Core completed successfully [DEBUG ] Sending POST request to 'http://bugs.syncleus.com/trac.cgi/dANN/builds/1011/steps/' [WARNING ] Server returned error 403: Forbidden [DEBUG ] Removing build directory /tmp/bitten-test/build_Java_Examples_Trunk_1011 [ERROR ] HTTP Error 403: Forbidden
Also for the record im using the following version of the bitten slave (although ive also tried witht he latest build just yesterday without any different results): bitten-slave 0.6dev-r638
comment:6 Changed 15 years ago by dfraser
- Owner set to dfraser
- Status changed from new to assigned
OK, I have run on r638 multiple times with no 403 (but again not actually running builds, just checking for them). Going to test with a build next.
comment:7 Changed 15 years ago by dfraser
Running like that I have received an error as you did, on the POST to builds/$NNNN/steps/. Is the error consistently on this URL (rather than the main builds or builds/$NNNN URL?
comment:8 Changed 15 years ago by anonymous
I figured this out...
Basically i was behind a load balancer that would cause my clients ip address to randomly change every time an outgoing connection was made. In addition this might effect servers who's dns name resolve to more than one IP.
The solution was to change the following line http://bitten.edgewall.org/browser/trunk/bitten/master.py#L216
It was:
if build.status != Build.IN_PROGRESS or \ build.slave_info.get(Build.IP_ADDRESS) != req.remote_addr:
I changed it to:
if build.status != Build.IN_PROGRESS:
It now works fine.
comment:9 Changed 15 years ago by anonymous
- Component changed from Build slave to Build master
The solution/problem was in the master.
comment:10 follow-up: ↓ 12 Changed 15 years ago by dfraser
- Summary changed from 403 forbidden error indicated by bitten-slave (build invalidated) to 403 forbidden error if bitten-slave IP address changes mid-build
(01:02:41 PM) cmlenz: davidfraser: the proper fix for this IMHO would be to remove the IP check, and instead add a "token" column to the builds table, which would be set to a random hexdigest() thing when a build is started, would get communicated back to the slave as a cookie, and would be checked on every subsequent request
comment:11 Changed 15 years ago by osimons
This ungraceful stop of the client also happens just based on the build.status != Build.IN_PROGRESS if you do 'Invalidate build' during build run (in progress).
This really shouldn't raise HTTPForbidden, but instead some other return code that allows the slave to gracefully halt the build and move on to poll for next job.
comment:12 in reply to: ↑ 10 Changed 15 years ago by osimons
Replying to dfraser:
(01:02:41 PM) cmlenz: davidfraser: the proper fix for this IMHO would be to remove the IP check, and instead add a "token" column to the builds table, which would be set to a random hexdigest() thing when a build is started, would get communicated back to the slave as a cookie, and would be checked on every subsequent request
Wouldn't a simpler solution be to have each slave create a 'token' on startup, and communicate that token in its requests - that way we can store the slave token just like any other slave property associated with the build?
comment:13 Changed 15 years ago by osimons
See #266. If we move from IP => Tokens, we should also review the actual need for storing/displaying IP address which in most cases is not useful.
Changed 15 years ago by osimons
Using tokens to coordinate build ownership instead of IP address. Lots of tests not yet updated.
comment:14 follow-up: ↓ 15 Changed 15 years ago by osimons
- Owner changed from dfraser to osimons
- Status changed from assigned to new
In attachment:t421-tokens-r711.diff I've implemented a fix using tokens. Each slave makes a simple token on startup, and then passes that token as a request header on each request. The token gets stored on the build, and any build-related access with token mismatch will raise a 409 Conflict error instead of forbidden.
I'll like some input on this strategy before fixing the 16 tests that now fail as the request mock object don't know about headers...
comment:15 in reply to: ↑ 14 ; follow-up: ↓ 16 Changed 15 years ago by dfraser
Replying to osimons:
In attachment:t421-tokens-r711.diff I've implemented a fix using tokens. Each slave makes a simple token on startup, and then passes that token as a request header on each request. The token gets stored on the build, and any build-related access with token mismatch will raise a 409 Conflict error instead of forbidden.
That looks good, and I prefer this idea to the idea of a cookie. Do we still need the build.status != Build.IN_PROGRESS check though?
I'll like some input on this strategy before fixing the 16 tests that now fail as the request mock object don't know about headers...
I think you should go ahead with this - it's basically what we proposed above except using a header instead of a cookie, which I think is clearly better...
Changed 15 years ago by osimons
Updated patch. Adding back IN_PROGRESS check + fixed/extended tests.
comment:16 in reply to: ↑ 15 Changed 15 years ago by osimons
Replying to dfraser:
Do we still need the build.status != Build.IN_PROGRESS check though?
Yeah, I suppose we do. I've added it back in latest patch. Other than that, latest patch is just fixing and extending tests. Patch should be complete as far as I can see.
Changed 15 years ago by osimons
Updated patch that is reworked to use trac_auth/session as token. Depends on cookie-support for slave, see #208.
comment:17 Changed 15 years ago by osimons
Updated patch. Essentially identical, only this time it existing cookie or session instead of header. This patch works in conjunction with the patch on #208 that introduces cookie-support for the slave as part of authentication.
If slave doesn't support cookies (ie. older slaves), all build-related requests will essentially raise a HTTPConflict with error logged.
Apply t208_osimons_auth_and_cookies-r716.diff and then the updated patch for this ticket (t421-session_token-r716.diff).
Changed 15 years ago by osimons
Updated patch that applies to latest trunk, and that adds some tests and incorporates #223 protocol version testing.
comment:18 Changed 15 years ago by osimons
- Resolution set to fixed
- Status changed from new to closed
Last patch committed in [723].
Adjusted formatting for clarity