Bitten Data Storage
The data generated by automated builds can almost always be mapped to the physical and/or the logical view of the code base, where the physical view corresponds to files and line numbers, while the logical view is composed of units such as packages, classes and functions. Specific metrics basically annotate either view with the extracted information. The structure of the annotation, however, depends entirely on the type of metric.
For example, test results might have the following attributes:
- Name of the test
- Result: success, failure or error
- Error message and stack trace (in case of an error or failure)
- Output of the test (standard error and output streams)
- Time it took to execute the test
Code coverage, on the other hand, might look like this:
- Name of the unit (module, class, etc)
- Total lines of code in the unit
- Percentage of lines executed
- Number of times every line was executed
Other data such as style check results would in turn consist of other properties.
See Report Formats for more details.
As Bitten should support the storage of arbritray metrics, the tables for storing this data need to be fairly generic. Versions of Bitten prior to 0.5 used a Berkeley DB XML backend to store reports, but for various reasons the report data is now mapped to tables in the relational database already provided by the Trac environment.
As depicated in the following diagram, report data is stored in two tables in the database: bitten_report and bitten_report_item.
There is one bitten_report record per report. This record associates the report with a step of a specific build (see Object Model), and stores the category and the generator of the report.
The category can be seen as indicator of the schema with which the report data complies. For example, there's a category “test” for test results, and a category “coverage” for code coverage. Test results data may be generated by different recipe commands and tools, but as long as they comply with the schema of a known category, the data can be analyzed and visualized by the same generic components.
The report data itself is stored in the bitten_report_item table. The basic assumption is that a report consists of a flat, unordered list of data points, each having a variable set of named properties. The names and values of these properties are storeds in the name and value columns of the bitten_report_item table, respectively. The item column simply contains an artificial key that groups the individual properties into "report items".
Mapping XML Reports to Database Tables
A report is received in XML format from the build slave. For example, a report generated by the <python:unittest> recipe command might look like this:
<report category="test" generator="http://bitten.cmlenz.net/tools/python#unittest"> ... <test duration="0.073" status="success" fixture="bitten.tests.model.BuildConfigTestCase" name="test_config_update_name" file="bitten/tests/model.py"> <stdout>Renaming build configuration</stdout> </test> ... </report>
For such a report, Bitten first inserts a record into the bitten_report table, taking the category and generator from the corresponding attributes of the <report> element. Next, a couple of records are inserted into the bitten_report_item table for every child element of <report>; one for every attribute of the child element. In addition, the tag name is stored as a property named type.
To allow for the transmission of data that cannot be easily encoded into XML attribute values, children of the child element itself are also mapped to properties: the tag name of the “grandchild” is used as the property name, and its´ body text is used as the property value.
For example, the following snippets would be stored in exactly the same way:
<test duration="0.073" status="success" fixture="bitten.tests.model.BuildConfigTestCase" name="test_config_update_name" file="bitten/tests/model.py" stdout="Renaming build configuration"> </test>
<test> <duration>0.073</duration> <status>success</status> <fixture>bitten.tests.model.BuildConfigTestCase</fixture> <name>test_config_update_name</name> <file>bitten/tests/model.py</file> <stdout>Renaming build configuration</stdout> </test>
These two methods can be mixed. When multiple properties with the same name are encountered, only the last one is stored. Child elements take precedence over attributes.
For the example above, the database tables should contain the following data:
|123||1||stdout||Renaming build configuration|
Querying the Report Store
Querying the report data is not straight-forward due to the generic nature of the tables. This usually results in a number of “self-joins” in the SQL SELECT statement.
For example, the following query aggregates the number of failed/succeeded unit tests by fixture:
SELECT item_fixture.value AS fixture, item_file.value AS file, COUNT(item_success.value) AS num_success, COUNT(item_failure.value) AS num_failure, COUNT(item_error.value) AS num_error FROM bitten_report AS report LEFT OUTER JOIN bitten_report_item AS item_fixture ON (item_fixture.report=report.id AND item_fixture.name='fixture') LEFT OUTER JOIN bitten_report_item AS item_file ON (item_file.report=report.id AND item_file.item=item_fixture.item AND item_file.name='file') LEFT OUTER JOIN bitten_report_item AS item_success ON (item_success.report=report.id AND item_success.item=item_fixture.item AND item_success.name='status' AND item_success.value='success') LEFT OUTER JOIN bitten_report_item AS item_failure ON (item_failure.report=report.id AND item_failure.item=item_fixture.item AND item_failure.name='status' AND item_failure.value='failure') LEFT OUTER JOIN bitten_report_item AS item_error ON (item_error.report=report.id AND item_error.item=item_fixture.item AND item_error.name='status' AND item_error.value='error') WHERE category='test' AND build=%s GROUP BY file, fixture ORDER BY fixture