= Bitten !Master/Slave Protocol, using HTTP =

[[PageOutline(2)]]

{{{
#!div class=important
'''Note:''' This is a proposal, and the final / current implementation may differ. Please refer to [wiki:Documentation/index.html the documentation] for the current state of features and commands.
}}}

This is a proposal for an HTTP-based protocol enabling communication between the build master and various build slaves. The protocol presented here is not final yet. Implementation was done on the [source:sandbox/http@440] branch, and has been merged to trunk as of r438.

== Comparison to the Previous BEEP Protocol ==

The BEEP-based protocol currently used by Bitten is described on MasterSlaveProtocol. The differences can be summarized as follows:
 * The build master would be a simple HTTP server, implemented as part of the Trac plugin. That means a separate daemon process would no longer be needed for the master.
 * The build slaves are simply HTTP clients, probably using [http://bitworking.org/projects/httplib2/ httplib2] and falling back to the httplib or urllib modules in the standard library.
 * Both SSL and the various authentication methods of HTTP can be used to secure the communication.
 * Directionality of communication is always from the slave to the master. The master no longer initiates actions on the slave; rather, the slave polls the master for pending actions when it is idle.
 * The build master would no longer be responsible for packaging tarballs and sending them to the slaves; instead, the slaves receive connection details for the repository, and perform a normal checkout. This is a [ticket:79 long-standing ticket].

== Build Creation ==

A new slave connects to the build master and “asks” the master whether there are any pending builds it could perform.
The slave does this by `POST`ing its profile to the master. The profile contains information such as:
 * the platform/architecture of the slave machine,
 * the operating system,
 * the product name and version number of each of the dependencies of the project to build (for example, the C compiler or the Python runtime), and
 * the name and email address of the maintainer of the machine.

{{{
#!xml
POST /builds/ HTTP/1.1
Host: example.org
Content-Type: application/x-bitten+xml
Content-Length: 666

Christopher Lenz <cmlenz@gmx.de>
Power Macintosh
Darwin
}}}

If the build master finds any pending builds for a target platform that matches the slave, it would send back a response similar to the following:

{{{
#!xml
HTTP/1.1 201 Created
Location: http://example.org/builds/trunk/123/
Set-Cookie: slave=lamech; Path=/builds/trunk/123/
}}}

The response contains the URL to a build recipe as the value of the `Location` header. At this point, the master has allocated a pending build entity in its database. The progress of this build can be viewed as HTML at the specified URL using any HTTP user agent.

The master also sets a cookie on the slave so that it can be identified on subsequent requests. In the example above, the cookie contains only the slave name; we'll probably need to include more information, such as when the build was started.
On the other hand, if the master has no work for the slave, it would return a `204 No Content` response:

{{{
#!xml
HTTP/1.1 204 No Content
}}}

''Open issue: we'd need to either repost the slave name/info with every request, or set a cookie that identifies the slave on subsequent requests.''

== Build Initiation ==

When the slave has received the URL to a build recipe, it can request the build recipe using a simple `GET` request:

{{{
#!xml
GET /builds/trunk/123/ HTTP/1.1
Host: example.org
Cookie: slave=lamech
Accept: application/x-bitten+xml
}}}

If the master still has that build in pending state in the database, it will respond with the recipe:

{{{
#!xml
HTTP/1.1 200 OK
Content-Type: application/x-bitten+xml
Content-Length: 666
}}}

The first element would pretty much always be a “checkout” step that retrieves the source from the version control repository.

== Build Status Reporting ==

As soon as the slave has received the recipe, it should perform the checkout and execute the steps outlined in the recipe. After every completed step, the slave should make a `POST` request to the `steps` member of the build collection:

{{{
#!xml
POST /builds/trunk/123/steps/ HTTP/1.1
Host: example.org
Cookie: slave=lamech
Content-Type: application/x-bitten+xml
Content-Length: 666

...
}}}

The `started` attribute specifies the date and time at which processing of this step was started. The `duration` attribute contains the number of seconds that it took to complete the step (this may include fractions).

The `` element may contain one or more of the following child elements:
 * `` elements indicate errors in the execution of the step,
 * `` elements contain the build log output, and
 * `` elements contain generated [wiki:DataStorage report data].

The build is assumed to be complete after the master has received a request for every step in the recipe.

The server responds to each such request with a `201 Created` response.
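
The exchange described so far (profile `POST`, recipe `GET`, per-step `POST`) could be driven on the slave side roughly as follows. This is only a sketch under stated assumptions: the function names `next_action`, `recipe_request` and `step_report_request` are hypothetical and not part of Bitten, the response headers are assumed to be available as a plain dict, and the body of a step report is passed in as opaque bytes because its exact XML format is not fixed here. The requests are only constructed, not sent.

{{{
#!python
from urllib.parse import urljoin
from urllib.request import Request

# Media type taken from the examples in this proposal.
BITTEN_XML = "application/x-bitten+xml"


def next_action(status, headers):
    """Decide what the slave does after POSTing its profile to /builds/.

    `headers` is assumed to be a dict of response headers.
    """
    if status == 201:
        # The master allocated a build; the recipe lives at the Location URL.
        return ("fetch_recipe", headers["Location"])
    if status == 204:
        # No pending build matches this slave; poll again later.
        return ("idle", None)
    raise ValueError("unexpected status from master: %d" % status)


def recipe_request(build_url, cookie):
    """Build (but do not send) the GET request for the build recipe."""
    return Request(build_url, method="GET",
                   headers={"Cookie": cookie, "Accept": BITTEN_XML})


def step_report_request(build_url, cookie, body):
    """Build (but do not send) the POST request reporting one completed step.

    `body` is the already serialized step result as bytes.
    """
    return Request(urljoin(build_url, "steps/"), data=body, method="POST",
                   headers={"Cookie": cookie, "Content-Type": BITTEN_XML})
}}}

A slave would call `next_action` with the status and headers of the master's reply, then pass the returned URL to the two request builders and send the results with `urllib.request.urlopen`. Keeping request construction separate from sending makes the polling logic easy to test without a running master.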
== Uploading of Build Artifacts ==

If the recipe contains an `` element at the end (after all `` elements), the slave is expected to upload each of the files specified. This is done using `POST` requests to the `files` member of the build collection:

{{{
#!xml
POST /builds/trunk/123/files/ HTTP/1.1
Host: example.org
Cookie: slave=lamech
Content-Type: multipart/form-data
Content-Length: 666

...
}}}

The server responds with a `201 Created` response.

== Cancelling Builds ==

Using the BEEP protocol, the build master would mark builds as aborted if the connection to the slave was closed unexpectedly. This is no longer possible when using HTTP.

To handle the case of build slaves going away at some point between having created a build and completing the build, the build master should have a configurable timeout. All in-progress builds would be checked against this timeout; if there has been no activity on a build for longer than the timeout, the master should cancel the build, resetting it to the `PENDING` state.

If a slave later '''does''' decide to come back to life and post results, it would get 404 (Not Found) or 409 (Conflict) errors, and should cancel the build on its side, too. There should probably be a background thread in the slave posting heartbeat requests to the master while lengthy build steps are executed.
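
The timeout rule above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: builds are modelled as plain dicts with hypothetical `status`, `slave` and `last_activity` keys, the `IN_PROGRESS` state name is invented (only `PENDING` appears in this proposal), and the function names are not part of Bitten.

{{{
#!python
PENDING = "PENDING"
IN_PROGRESS = "IN_PROGRESS"  # assumed name for the in-progress state


def reset_stale_builds(builds, now, timeout):
    """Reset in-progress builds with no activity for more than `timeout` seconds.

    `now` and each build's 'last_activity' are timestamps in seconds.
    Returns the list of builds that were reset to PENDING.
    """
    reset = []
    for build in builds:
        idle_for = now - build["last_activity"]
        if build["status"] == IN_PROGRESS and idle_for > timeout:
            build["status"] = PENDING
            build["slave"] = None  # the build can be given to another slave
            reset.append(build)
    return reset


def slave_should_abort(status):
    """A returning slave abandons its build on 404 or 409 from the master."""
    return status in (404, 409)
}}}

In this model, any request from the slave (including a heartbeat) would update `last_activity`, so a slave that is busy with a long step is not mistaken for one that has gone away.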