
Changes between Version 12 and Version 13 of Master Slave Protocol


Timestamp:
Sep 23, 2005, 7:30:19 PM (19 years ago)
Author:
cmlenz
Comment:

Updated and extended to reflect protocol changes

  • Master Slave Protocol

    v12 v13  
    11= Bitten Master/Slave Protocol =
    2 
    32[[PageOutline(2)]]
    43
    5 To decouple the master and slave, an application protocol will be defined on top of the meta-protocol BEEP (Blocks Extensible Exchange Protocol, [http://www.faqs.org/rfcs/rfc3080.html RFC 3080]). BEEP was chosen because it provides peer-to-peer communication (so that both the client and the server can send requests to the other) and because of its relative simplicity compared to protocols such as XMPP.
     4To decouple the build master and build slave, Bitten defines an application-level protocol on top of the meta-protocol BEEP (Blocks Extensible Exchange Protocol, [http://www.faqs.org/rfcs/rfc3080.html RFC 3080]). BEEP was chosen because it provides peer-to-peer communication (so that both the client and the server can initiate exchanges), and because of its relative simplicity compared to protocols such as XMPP.
    65
    7  '''Why BEEP?'''
     6BEEP is simple and flexible, and explicitly designed as a foundation for custom application protocols. Bitten includes a simple [wiki:BeepImplementation implementation] of BEEP. This implementation does not yet support some of the advanced protocol features, such as authentication (SASL) and privacy/encryption (TLS).
    87
    9  ''I first looked [http://www.jabber.org/ Jabber]/[http://www.xmpp.org/ XMPP], but it seemed to be very complex (with dozens of related specifications), and there are no sufficiently mature implementations for Python. I could live with the complexity, but not if I have to implement the whole stack myself. I didn't look into other IM protocols because I wanted to build on something open/standardized. Note that even if I'd chosen XMPP/etc I would have to design a protocol on top of the provided infrastructure.''
     8== Protocol Overview ==
    109
    11  ''BEEP is simple and flexible, and explicitly designed as a foundation for custom application protocols. While the only Python implementation I found ([http://beepy.sourceforge.net/ BEEPy]) uses [http://twistedmatrix.com/ Twisted] and looks dead, BEEP is really simple enough to be implemented in a basic way in the scope of this project (i.e. minus
    12 the authentication and security features, which ''could'' of course be added later).''
     10Any build slave will connect to exactly one build master, but the master can be connected to a theoretically unlimited number of slaves simultaneously. The connections between master and slave are kept alive across many exchanges.
     11
     12The following diagram shows an example of the exchanges between a single build slave and a build master.
    1313
    1414[[Image(protocol.png)]]
    1515
     16This includes the [wiki:MasterSlaveProtocol#SlaveRegistration registration of the slave] with the master, the [wiki:MasterSlaveProtocol#BuildInitiation initiation of a build] by the build master, and finally the actual [wiki:MasterSlaveProtocol#BuildExecution execution of the build] by the slave. These phases are explained in detail in the following sections.
     17
    1618== Slave Registration ==
    1719
    18 A new client connects to the build master and signals its' availability for executing builds by starting a channel for the Bitten [source:trunk/doc/orchestration.dtd#latest orchestration profile].
     20A new client connects to the build master and signals its availability for executing builds by starting a channel for the Bitten [source:trunk/doc/orchestration.dtd orchestration profile].
    1921
    20 First, the server needs to query some information about the client for orchestration:
    21  * Platform/architecture
    22  * Operating system
    23  * The product name and version number of each of the dependencies of the project to build (for example, the C compiler or the Python runtime).
    24  * Name and email address of the maintainer
     22First, the master needs some information about the slave for orchestration:
     23 * The platform/architecture of the slave machine,
     24 * the operating system,
     25 * the product name and version number of each of the dependencies of the project to build (for example, the C compiler or the Python runtime), and
     26 * the name and email address of the maintainer of the machine.
    2527
    26 After the Bitten channel has been started, the client would send a message like this to the server:
     28After the build orchestration channel has been started, the client would send a message like this to the server:
    2729{{{
    2830  MSG 1 0 . 0 78
     
    3638}}}
    3739
    38 The server acknowledges that it received the registration with a positive or negative reply.
     40The server acknowledges that it received the registration with a positive or negative reply, using the `<ok/>` or `<error/>` elements in the payload, respectively.
    3941
    40 Next, the server checks whether there are any pending builds for that client (see BuildConfigurations). For example, if it is the only client that supports GCC 4.0, and there has been no build of some revision with GCC 4 yet, it will initiate a build on that client. Anyway, the server remembers the client configuration for as long as the connection is open, and may choose to route build requests to that client when repository changes are detected, or a build is triggered otherwise.
     42The master may reject the registration of a slave if no [wiki:BuildConfigurations build configuration] has a target platform that matches the properties of the slave. Effectively this means that the build master doesn't have any build that the slave could perform. Registration of a slave may also be rejected if there are already too many slaves connected to the build master.
     43
     44If registration of the slave is accepted, the server checks whether there are any pending builds for the target platform matching the slave. For example, if it is the only slave that supports GCC 4.0, and there has been no build of some revision with GCC 4 yet, the build master will initiate a build on that slave. In any case, the master remembers the slave configuration for as long as the connection is open, and may choose to route build requests to that machine when repository changes are detected.
    4145
    4246== Build Initiation ==
     
    4448When the build server detects that builds are necessary for some revision of the project, it queries its database of available slaves and chooses a set of slaves with non-overlapping configurations. For example, if there are 10 clients available that could execute the build of a Java project on Windows 2000 with JDK 1.4, it will only select one of those to actually perform the build.
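
The selection of one slave per distinct configuration could be sketched as follows. This is only an illustration and not Bitten's actual code: the `Slave` type, its fields, and `select_slaves` are hypothetical names, and a "configuration" is assumed here to be just the operating system plus a toolchain string.

```python
from typing import Iterable, List, NamedTuple


class Slave(NamedTuple):
    """Hypothetical record of what a registered slave reported."""
    name: str
    os: str
    toolchain: str


def select_slaves(available: Iterable[Slave]) -> List[Slave]:
    """Pick the first available slave for each distinct configuration."""
    chosen = {}
    for slave in available:
        key = (slave.os, slave.toolchain)
        # Keep only the first slave seen for a configuration, so that
        # equivalent machines are not asked to run the same build.
        chosen.setdefault(key, slave)
    return list(chosen.values())


pool = [
    Slave("win-a", "Windows 2000", "JDK 1.4"),
    Slave("win-b", "Windows 2000", "JDK 1.4"),
    Slave("linux-a", "Linux", "JDK 1.4"),
]
selected = select_slaves(pool)
```

With the pool above, only `win-a` and `linux-a` are selected, because `win-b` duplicates the configuration of `win-a`.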
    4549
    46 A build request might look like this (the text is optional and only provided for diagnostic purposes):
     50A build request consists of the [wiki:BuildRecipes build recipe], and contains the instructions that the slave must follow to execute the build:
    4751{{{
    4852  MSG 1 1 . 0 78
    4953  Content-Type: application/beep+xml
    5054 
    51   <build recipe="path/to/recipe.xml">trunk as of revision 492</build>
     55  <build xmlns:python="http://bitten.cmlenz.net/tools/python">
     56    <step id="compile">
     57      <python:distutils command="build"/>
     58    </step>
     59    <step id="dist">
     60      <python:distutils command="sdist"/>
     61    </step>
     62  </build>
    5263  END
    5364}}}
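
For reference, a frame like the one above could be assembled along these lines. This is a minimal sketch and not the Bitten BEEP implementation: `beep_frame` is a hypothetical helper, it only covers the `MSG`, `RPY` and `ERR` frame types, and it assumes the whole message fits in a single frame. Per RFC 3080, the size field in the header counts the octets of the payload, which includes the MIME headers.

```python
def beep_frame(frame_type: str, channel: int, msgno: int, seqno: int,
               body: bytes, content_type: str = "application/beep+xml",
               more: bool = False) -> bytes:
    """Build a single BEEP frame (hypothetical helper, RFC 3080 layout)."""
    if frame_type not in ("MSG", "RPY", "ERR"):
        # ANS frames carry an extra answer number and NUL frames have no
        # payload, so they are deliberately left out of this sketch.
        raise ValueError("unsupported frame type: %r" % frame_type)
    # The payload is the MIME entity headers, a blank line, and the body.
    payload = ("Content-Type: %s\r\n\r\n" % content_type).encode("ascii") + body
    # "*" marks an intermediate frame, "." the last frame of a message.
    continuation = "*" if more else "."
    header = "%s %d %d %s %d %d\r\n" % (
        frame_type, channel, msgno, continuation, seqno, len(payload))
    return header.encode("ascii") + payload + b"END\r\n"


frame = beep_frame("MSG", 1, 1, 0, b"<build/>")
```

Computing the size from the encoded payload, rather than writing it by hand, avoids mismatches between the header and the actual content.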
    5465
    55 The build request must include the path to the recipe file relative to the root of the code base.
    56 
    57 A client can decline a build request, in which case the build master selects the next available client with the same (or sufficiently similar) configuration. A build request is declined using a negative reply containing an {{{<error></error>}}} element in the payload:
     66The slave should validate the build recipe and check whether all of the referenced recipe commands are available before starting the build. In case of a problem, the slave must decline the build request using a negative reply containing an `<error></error>` element in the payload.
    5867
    5968{{{
     
    6170  Content-Type: application/beep+xml
    6271 
    63   <error code="550">Too busy</error>
     72  <error code="550">
     73    Unsupported recipe command http://bitten.cmlenz.net/tools/python#distutils
     74  </error>
    6475  END
    6576}}}
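
On the receiving side, the payload of such a negative reply could be read roughly like this. The sketch uses Python's standard `xml.etree.ElementTree` module; `parse_error` is a hypothetical helper name, and the sketch assumes the MIME headers have already been stripped from the payload.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def parse_error(xml_payload: str) -> Tuple[int, str]:
    """Return (code, message) from an <error> payload (hypothetical helper)."""
    elem = ET.fromstring(xml_payload)
    if elem.tag != "error":
        raise ValueError("expected an <error> element, got <%s>" % elem.tag)
    # Collapse the indentation and line breaks around the message text.
    message = " ".join((elem.text or "").split())
    return int(elem.get("code")), message


code, message = parse_error("""<error code="550">
    Unsupported recipe command http://bitten.cmlenz.net/tools/python#distutils
  </error>""")
```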
    6677
    67 In this case the slave remains in the pool maintained by the master, but the master should attempt to prioritize slaves that accept build requests over those that regularly reject requests, as to avoid constantly polling the latter with requests that will probably be rejected again anyway.
     78A build initiation can also be declined because the machine on which the slave process is running is under too high a load.
    6879
    69  '''TODO''': ''Specify error scenarios and error codes.''
     80When a build request is declined, the build master must select the next available client with the same (or sufficiently similar) configuration. The slave remains in the pool maintained by the master, but the master should attempt to prioritize slaves that accept build requests over those that regularly reject requests, so as to avoid constantly polling the latter with requests that will probably be rejected again anyway.
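
One way the master might prioritize slaves is to order candidates by how often they have declined requests in the past. The protocol does not prescribe a strategy, so the following is purely an assumption for illustration; `SlaveStats`, its counters, and `order_candidates` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SlaveStats:
    """Hypothetical per-slave bookkeeping kept by the master."""
    name: str
    accepted: int = 0
    declined: int = 0

    @property
    def decline_ratio(self) -> float:
        total = self.accepted + self.declined
        # A slave without any history is treated as fully reliable.
        return self.declined / total if total else 0.0


def order_candidates(candidates: Iterable[SlaveStats]) -> List[SlaveStats]:
    """Ask slaves that rarely decline first; declining slaves stay in the pool."""
    return sorted(candidates, key=lambda stats: stats.decline_ratio)


ordered = order_candidates([
    SlaveStats("busy", accepted=1, declined=9),
    SlaveStats("new"),
    SlaveStats("steady", accepted=8, declined=2),
])
```

Because `sorted` is stable, slaves with equal ratios keep their original relative order.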
    7081
    7182== Build Execution ==
     
    7384If the client accepts a build request by sending a positive reply, the server will transmit a tarball of the code base that is to be built. The client does not need to know which exact revision (or branch) of the project it is building, nor does it need to perform a checkout itself.
    7485
    75 A client accepts a build request by responding with a '''{{{RPY}}}''' message containing a {{{<proceed></proceed>}}} element in the payload. The reply must contain a list of archive formats that the slave supports for transmission of the code. For example:
     86A client accepts a build request by responding with a '''`RPY`''' message containing a `<proceed></proceed>` element in the payload. The reply must contain a list of archive formats that the slave supports for transmission of the code. For example:
    7687
    7788{{{
     
    8697}}}
    8798
    88 In this message, the client indicates that it will accept {{{tar}}} archives with {{{bzip2}}} or {{{gzip}}} compression (preferring the former). Another client might specify that it supported only ZIP archives, for example.
     99In this message, the client indicates that it will accept `tar` archives with `bzip2` or `gzip` compression (preferring the former). Another client might specify that it supports only ZIP archives.
    89100
    90101After having received such a reply, the master can proceed by transmitting a snapshot of the code base to the slave:
    91 
    92102{{{
    93   MSG 1 2 * 0 78
     103  MSG 1 2 * 0 3421
    94104  Content-Type: application/tar
    95105  Content-Disposition: myproject-r456.tar
     
    99109}}}
    100110
    101 The client may respond to this transmission either with a negative reply ('''{{{ERR}}}''' containing an {{{<error></error>}}} element with a description of the error), or by starting a sequence of '''{{{ANS}}}''' replies, terminated by a final '''{{{NUL}}}''' message (see next section).
     111If the slave is not able to handle the received archive, it should respond to this transmission with a negative reply:
    102112
    103  '''TODO''': ''Specify error scenarios and error codes.''
     113{{{
     114  ERR 1 1 . 0 60
     115  Content-Type: application/beep+xml
     116 
     117  <error code="550">
     118    Invalid tar.gz archive
     119  </error>
     120  END
     121}}}
     122
     123Otherwise, the slave should proceed immediately with the execution of the build, and respond with a sequence of '''`ANS`''' replies, terminated by a final '''`NUL`''' message (see [wiki:MasterSlaveProtocol#BuildStatusReporting next section]).
    104124
    105125== Build Status Reporting ==
    106126
    107 After having received and upacked the snapshot archive, and having successfully parsed the build recipe, the slave responds with '''{{{ANS}}}''' message containing a {{{<started/>}}} element in the payload:
     127After having received and unpacked the snapshot archive, the slave responds with an '''`ANS`''' message containing a `<started/>` element in the payload:
    108128
    109129{{{
     
    115135}}}
    116136
    117 The {{{time}}} attribute contains the date and time (in ISO 8601 format) at which the build was started. These timestamps must be UTC, and consequently must not contain a timezone offset.
     137The `time` attribute contains the date and time (in ISO 8601 format) at which the build was started. These timestamps must be UTC, and must ''not'' contain a timezone offset.
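
A timestamp in this form could be produced as sketched below, using only the standard `datetime` module. `utc_timestamp` is a hypothetical helper name; the sketch assumes that second precision is sufficient, which matches the examples on this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def utc_timestamp(moment: Optional[datetime] = None) -> str:
    """Format a moment as ISO 8601 in UTC, without a timezone offset."""
    if moment is None:
        moment = datetime.now(timezone.utc)
    if moment.tzinfo is None:
        # A naive datetime cannot be converted reliably, so refuse it.
        raise ValueError("expected a timezone-aware datetime")
    # Convert to UTC first, then drop the tzinfo so no offset is emitted.
    as_utc = moment.astimezone(timezone.utc).replace(tzinfo=None)
    return as_utc.isoformat(timespec="seconds")


# A local time two hours ahead of UTC is converted before formatting.
local = datetime(2005, 6, 29, 18, 41, 53, tzinfo=timezone(timedelta(hours=2)))
example = utc_timestamp(local)
```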
    118138
    119 The slave then begins executing the steps in the recipe one-by-one (in the order they appear in the file). After each step of the [wiki:BuildRecipes build recipe], the client informs the server, with '''{{{ANS}}}''' messages containing a {{{<step/>}}} element in the payload, about the step it has processed, and what the outcome was (success or failure):
     139The slave then begins executing the build steps in the recipe one-by-one, in the order they appear in the recipe. After each step is completed, the client informs the server about the step it has processed, and what the outcome was (success or failure), using an '''`ANS`''' message containing a `<step/>` element in the payload:
    120140
    121141{{{
     
    124144
    125145  <step id="test" description="Run all unit tests" result="success"
    126         time="2005-06-29T16:41:53" duration="7.61"/>
    127   END
    128 }}}
    129 
    130 The {{{time}}} attribute specifies the date and time at which processing of this step was started. The {{{duration}}} attribute contains the number of seconds that it took to complete the step (this may include fractions).
    131 
    132 In case of an error, the message should include the primary error message in the body of the {{{<step></step>}}} element:
    133 
    134 {{{
    135   ANS 1 2 . 0 135 1
    136   Content-Type: application/beep+xml
    137 
    138   <step id="test" description="Run all unit tests" result="failure"
    139146        time="2005-06-29T16:41:53" duration="7.61">
    140     Could not load command "unittest".
     147    ...
    141148  </step>
    142149  END
    143150}}}
    144151
    145  '''TODO''': ''Transmission of build log and generated reports to the master''
     152  '''TODO''': report and log elements
    146153
    147 After the slave has processed all of the build steps, it sends an '''{{{ANS}}}''' message containing the element {{{<completed/>}}} in the payload:
     154The `time` attribute specifies the date and time at which processing of this step was started. The `duration` attribute contains the number of seconds that it took to complete the step (this may include fractions).
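
On the master side, a `<step/>` payload could be interpreted roughly as follows. This is a sketch under the assumption that the attributes are exactly those shown in the example above; `parse_step` and the keys of the returned dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta


def parse_step(xml_payload: str) -> dict:
    """Extract the outcome and timing of a build step (hypothetical helper)."""
    elem = ET.fromstring(xml_payload)
    if elem.tag != "step":
        raise ValueError("expected a <step> element, got <%s>" % elem.tag)
    # The time attribute is UTC without an offset, so it parses as naive.
    started = datetime.fromisoformat(elem.get("time"))
    # The duration is given in seconds and may include fractions.
    duration = timedelta(seconds=float(elem.get("duration")))
    return {
        "id": elem.get("id"),
        "succeeded": elem.get("result") == "success",
        "started": started,
        "finished": started + duration,
    }


step = parse_step(
    '<step id="test" description="Run all unit tests" result="success" '
    'time="2005-06-29T16:41:53" duration="7.61"/>'
)
```

The start time plus the duration gives the time at which the step finished, which the master may want to store alongside the result.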
     155
     156After the slave has processed all of the build steps, it sends a final '''`ANS`''' message containing the element `<completed/>` in the payload:
    148157
    149158{{{
     
    155164}}}
    156165
    157 Furthermore, in case the slave is unexpectedly interrupted while executing a build, it should send an '''{{{ANS}}}''' message containing the element {{{<abort></abort>}}} in the payload:
     166Furthermore, in case the slave is unexpectedly interrupted while executing a build, it should send an '''`ANS`''' message containing the element `<abort></abort>` in the payload:
    158167
    159168{{{