A framework for collecting software metrics via continuous integration
Today's automated builds generate a tremendous volume of information about the state of software development projects. This starts with basic status indicators such as compilation errors and test failures, but is increasingly extended to include advanced software metrics such as dependency analysis, code coverage analysis or style checking.
Traditionally, continuous integration systems such as Gump, Tinderbox and BuildBot only record and display the data that the build system prints to the standard output and error streams. Thus much of the information about a code base generated by the build cannot be used to its full extent.
To effectively provide value for the ongoing development and management of a project, data generated by builds needs to be collected in a central repository, and in a machine-readable format, to allow for analysis and presentation of the data even long after the actual build has been run. In addition to being able to adjust how data is analyzed and presented in retrospect, this approach is essential for historical reports that show how specific metrics are evolving over time – which is often more valuable than the absolute values of these metrics at one specific point in time.
Motivation and Introduction
A build system in this context is a system such as Make, Ant or SCons that allows the automated construction of software products from source code. For traditional “static” languages such as C, C++ or Java, this primarily means compiling and linking source code so that it becomes executable or otherwise usable in the form of machine code or byte code, and packaging the end results for distribution.
A continuous integration (CI) system automatically executes builds at certain points, for example after a change has been checked in to the version control system, or unconditionally at specific intervals (“nightly builds”). The purpose is to perform the build in a neutral environment with a fresh copy of the latest code from the repository, containing the changes checked in by all team members. In addition, some continuous integration systems can coordinate the build on multiple different machines to validate that the code works in different target environments (for example, different platforms, operating systems, library versions, etc.).
CI systems today are mostly concerned with getting a boolean result from the build: whether the build was successful or not. The build log itself is recorded to make it easier for the developers to track down the cause of a failure. In many cases, the product of the build is also made available, for example to allow further testing.
But increasingly, many projects include build scripts that do a lot more than just compile, link and package the end product: unit tests are run, code coverage by the tests is recorded, conformance to a defined coding style is checked, XML files are validated, and so on. Some of these extra steps can be made to “break” the build (such as test failures or validation errors), while others exist solely to produce information about the code base (such as code coverage analysis). Either way, a lot of interesting data is generated during the build; information that can be important for understanding and managing a software project.
For example, Maven is a Java build system built on top of Ant that allows the generation of a number of project reports, typically as HTML files with accompanying graphics. Basically, after a developer performs a full build, she'll have a static web site stored in the build directory, which will contain reports automatically generated from the project source code. While these reports might sometimes benefit that developer, and can be published on the web so they can be viewed by her peers, the use of the information provided this way is inherently limited due to its static nature.
Continuous integration systems are the ideal candidate for collecting all the data generated by automated builds in a central location and making it available for reporting.
The goal of this work is to design and implement a distributed system for automated builds and continuous integration that allows the central collection and storage of software metrics generated during the build. The information collected this way needs to be structured and available in a machine-readable format, so that it can be analyzed, aggregated/correlated and presented after the build itself has completed. The system is required to meet the constraint of neutrality towards programming languages and tool chains: at its core, it must not assume that any particular language or build tool is in use by a project. Rather, it should provide a generic framework for executing builds, collecting data from them, and persisting this information in a central location to make it available for various kinds of reports. The system needs to be extensible to support various specific languages and tool chains in a meaningful manner.
This system will be built on top of Trac, a simple web-based application for managing software development projects, written in Python. Trac provides a view of the project's version control repository, a wiki for collaborative documentation and an issue tracker for managing defects and tasks. All of this is held together by a simple wiki syntax that can be used everywhere for linking to any kind of object (for example wiki pages, changesets and tickets), a “timeline” view that shows recent activity in all of those areas, and a generic search facility.
Scope of Work
The design of the system will be based on distributed CI systems such as Tinderbox and BuildBot: a central build orchestrator (or build master) is responsible for the coordination of several build slaves that do the actual work of executing builds. The orchestrator is a daemon that knows what to build and how to build it; it provides this knowledge as a build recipe to the build slaves, which report their status and results back to the orchestrator as the build, or individual parts of it, completes.
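The orchestrator/slave hand-off described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the class names (`BuildMaster`, `BuildSlave`), the recipe structure and the reporting callback are all hypothetical, and the real slaves would execute commands and communicate over the network rather than in-process.

```python
class BuildSlave:
    """Executes a recipe's steps and reports each result back."""

    def __init__(self, name):
        self.name = name

    def execute(self, recipe, report):
        for step in recipe["steps"]:
            # A real slave would run the step's command here and report
            # the actual outcome; this sketch pretends every step succeeds.
            report(self.name, step["id"], "success")


class BuildMaster:
    """Knows what to build; hands recipes to slaves and collects status."""

    def __init__(self):
        self.results = []

    def record(self, slave_name, step_id, status):
        # Each report is a (slave, step, status) triple.
        self.results.append((slave_name, step_id, status))

    def run(self, recipe, slaves):
        for slave in slaves:
            slave.execute(recipe, self.record)


recipe = {"steps": [{"id": "compile"}, {"id": "test"}]}
master = BuildMaster()
master.run(recipe, [BuildSlave("linux-x86"), BuildSlave("osx-ppc")])
```

After the run, `master.results` holds one status triple per slave and step, which is the kind of structured feedback the orchestrator needs in order to do more than record a console log.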
A build recipe is declarative; a configuration file determines what commands to execute, and where certain artifacts and reports can be found after a command has completed.
The diagram above shows the three layers that the system will be composed of, along with the main responsibilities of each layer. There are three core aspects that all three layers deal with: the build itself, the generated data and the status of the build. The main emphasis of this work will be on the second of these aspects: the conversion, collection and presentation of the data generated by builds.
- Data conversion
- There is a large variety of different tools that generate data in different formats, including the build system itself, as well as any additional tools integrated with the build, such as unit testing frameworks, code coverage analyzers or style checkers. The data produced by these tools needs to be parsed and converted so that it can be handled appropriately. This conversion is done by both the build slave (mainly to prepare the data for transmission to the master) and by the build master (to convert the data into a format suitable for storage and analysis.)
- Data collection
- The build master collects all the data reported back by the individual build slaves and writes that data to some kind of persistent store, for example a relational database. The way it is stored needs to be oriented towards the requirements of providing flexible reporting capabilities. All collected data is always tagged with the revision against which the build was made so that it's possible to correlate the information with other data such as repository activity.
- Data presentation
- Presentation of the collected data is handled by a Trac plug-in. This plug-in has access to the database maintained by the build master and provides means to visualize the data or make it otherwise accessible through the web interface. Trac itself will be extended to expose additional extension points where necessary, for example to integrate software metrics and statistics in various places, such as the timeline and the repository browser.
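The collection step above can be sketched with an in-memory SQLite database. The schema, function names and metric names are hypothetical, but the sketch shows the essential design decision from the list above: every stored value is tagged with the revision it was built against, so that historical reports reduce to simple queries over revisions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE metric (
        revision TEXT,   -- repository revision the build was made against
        slave    TEXT,   -- which build slave produced the value
        name     TEXT,   -- metric identifier, e.g. 'coverage.percent'
        value    REAL)"""
)


def collect(revision, slave, metrics):
    """Store one slave's metrics, tagged with the build revision."""
    conn.executemany(
        "INSERT INTO metric VALUES (?, ?, ?, ?)",
        [(revision, slave, name, value) for name, value in metrics.items()],
    )


# Two slaves reporting results for the same revision (sample values):
collect("r1204", "linux-x86", {"tests.passed": 42, "coverage.percent": 87.5})
collect("r1204", "osx-ppc", {"tests.passed": 42, "coverage.percent": 86.9})

# A historical coverage report is then just an aggregate query per revision.
rows = conn.execute(
    "SELECT revision, AVG(value) FROM metric "
    "WHERE name = 'coverage.percent' GROUP BY revision"
).fetchall()
```

Because the data is stored structured rather than as console output, the same store can later serve entirely different reports, long after the build itself has finished.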
The focus of tool support will be on Java and Python projects and the predominant build systems used with those languages. For Java projects, the integration of tools such as JUnit, Clover/jcoverage and JMetric will be examined. For Python projects, the standard modules unittest and trace.py can be used for unit tests and code coverage, and third-party scripts such as pychecker and PyLint may be used for style checking and other metrics.