Since version 0.8.0, Buildbot has used a database as part of its storage
backend. This section describes the database connector classes, which allow
other parts of Buildbot to access the database. It also describes how to
modify the database schema and the connector classes themselves.

Note

Buildbot is only half-migrated to a database backend. Build and builder
status information is still stored on disk in pickle files. This is
difficult to fix, although work is underway.

All access to the Buildbot database is mediated by database connector classes.
These classes provide a functional, asynchronous interface to other parts of
Buildbot, and encapsulate the database-specific details in a single location in
the codebase.

The connector API, defined below, is a stable API in Buildbot, and can be
called from any other component. Given a master instance master, the root of
the database connectors is available at master.db, so, for example, the state
connector's getState method is master.db.state.getState.

The connectors all use SQLAlchemy Core to achieve (almost)
database-independent operation. Note that the SQLAlchemy ORM is not used in
Buildbot. Database queries are carried out in threads, and report their
results back to the main thread via Twisted Deferreds.
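For example, fetching a stored value through the state connector might look
like this (a sketch; objectid is assumed to be already known):

d = master.db.state.getState(objectid, 'last_processed_change', 0)

def use_value(value):
    ...  # runs back on the main thread once the query completes
d.addCallback(use_value)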

The database schema is maintained with SQLAlchemy-Migrate. This package handles the
details of upgrading users between different schema versions.

The schema itself is considered an implementation detail, and may change
significantly from version to version. Users should rely on the API (below),
rather than performing queries against the database itself.

This class handles the complex process of claiming and unclaiming build
requests, based on a polling model: callers poll for unclaimed requests with
getBuildRequests, then attempt to claim the requests with
claimBuildRequests. The claim can fail if another master has claimed
the request in the interim.

An instance of this class is available at master.db.buildrequests.

Build requests are indexed by an ID referred to as a brid. The contents
of a request are represented as build request dictionaries (brdicts), each
with a fixed set of keys.
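A minimal sketch of the poll-then-claim cycle described above, assuming a
running master and ignoring retry logic:

from twisted.internet import defer
from buildbot.db.buildrequests import AlreadyClaimedError

@defer.inlineCallbacks
def claimPendingRequests(master, buildername):
    # poll for unclaimed, incomplete requests for this builder
    brdicts = yield master.db.buildrequests.getBuildRequests(
            buildername=buildername, claimed=False, complete=False)
    try:
        yield master.db.buildrequests.claimBuildRequests(
                [br['brid'] for br in brdicts])
    except AlreadyClaimedError:
        # another master got there first; poll again later
        pass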

Get a single BuildRequest, in the format described above. This method
returns None if there is no such buildrequest. Note that build
requests are not cached, as the values in the database are not fixed.

buildername (string) -- limit results to buildrequests for this builder

complete -- if true, limit to completed buildrequests; if false,
limit to incomplete buildrequests; if None, do not limit based on
completion.

claimed -- see below

bsid -- see below

repository -- the repository associated with the sourcestamps originating the requests

branch -- the branch associated with the sourcestamps originating the requests

Returns:

list of brdicts, via Deferred

Get a list of build requests matching the given characteristics.

Pass all parameters as keyword parameters to allow future expansion.

The claimed parameter can be None (the default) to ignore the
claimed status of requests; True to return only claimed builds,
False to return only unclaimed builds, or "mine" to return only
builds claimed by this master instance. A request is considered
unclaimed if its claimed_at column is either NULL or 0, and it is
not complete. If bsid is specified, then only build requests for
that buildset will be returned.

A build is considered completed if its complete column is 1; the
complete_at column is not consulted.

Try to "claim" the indicated build requests for this buildmaster
instance. The resulting deferred will fire normally on success, or
fail with AlreadyClaimedError if any of the build
requests are already claimed by another master instance. In this case,
none of the claims will take effect.

If claimed_at is not given, then the current time will be used.

As of 0.8.5, this method can no longer be used to re-claim build
requests. All given IDs must be unclaimed. Use
reclaimBuildRequests to reclaim.

Note

On database backends that do not enforce referential integrity
(e.g., SQLite), this method will not prevent claims for nonexistent
build requests. On database backends that do not support
transactions (MySQL), this method will not properly roll back any
partial claims made before an AlreadyClaimedError is
generated.

Re-claim the given build requests, updating the timestamp, but checking
that the requests are owned by this master. The resulting deferred will
fire normally on success, or fail with AlreadyClaimedError if
any of the build requests are already claimed by another master
instance, or don't exist. In this case, none of the reclaims will take
effect.

Release this master's claim on all of the given build requests. This
will not unclaim requests that are claimed by another master, but it
will not fail in that case either. The method does not check whether a
request is completed.

Complete a set of build requests, all of which are owned by this master
instance. This will fail with NotClaimedError if the build
request is already completed or does not exist. If complete_at is
not given, the current time will be used.

Get a list of the count most recent changes, represented as
dictionaries; returns fewer if that many do not exist.

Note

For this function, "recent" is determined by the order of the
changeids, not by when_timestamp. This is most apparent in
DVCSes, where the timestamp of a change may be significantly
earlier than the time at which it is merged into a repository
monitored by Buildbot.

classifications (dictionary) -- mapping of changeid to boolean, where the boolean
is true if the change is important, and false if it is unimportant

Returns:

Deferred

Record the given classifications. This method allows a scheduler to
immediately record which changes were important and which were not,
even if the build based on those changes will not occur for some time
(e.g., due to a tree stable timer). Schedulers should be careful to flush
classifications once they are no longer needed, using
flushChangeClassifications.

Return the classifications made by this scheduler, in the form of a
dictionary mapping changeid to a boolean, just as supplied to
classifyChanges.

If branch is specified, then only changes on that branch will be
given. Note that specifying branch=None requests changes for the
default branch, and is not the same as omitting the branch argument
altogether.
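As a sketch, a scheduler holding a tree stable timer might use these methods
as follows (assuming the methods are reached via the schedulers connector,
with objectid identifying the scheduler, and invented changeids):

# record importance as soon as the changes arrive
d = master.db.schedulers.classifyChanges(objectid, {1203: True, 1204: False})

# later, when the tree stable timer fires:
def build_then_flush(classifications):
    important = [cid for cid, imp in classifications.items() if imp]
    ...  # create a buildset for the important changes
    return master.db.schedulers.flushChangeClassifications(objectid)

d = master.db.schedulers.getChangeClassifications(objectid)
d.addCallback(build_then_flush)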

This class manages source stamps, as stored in the database. Source stamps
are linked to changes. Source stamps with the same sourcestampsetid belong
to the same sourcestampset. Buildsets link to one or more source stamps via
a sourcestampset id.

An instance of this class is available at master.db.sourcestamps.

Source stamps are identified by an ssid, and represented internally as an ssdict, with keys

ssid

sourcestampsetid (set to which the sourcestamp belongs)

branch (branch, or None for default branch)

revision (revision, or None to indicate the latest revision, in
which case this is a relative source stamp)

patch_body (body of the patch, or None)

patch_level (directory stripping level of the patch, or None)

patch_subdir (subdirectory in which to apply the patch, or None)

patch_author (author of the patch, or None)

patch_comment (comment for the patch, or None)

repository (repository containing the source; never None)

project (project this source is for; never None)

changeids (list of changes, by id, that generated this sourcestamp)

Note

Presently, no attempt is made to ensure uniqueness of source stamps, so
multiple ssids may correspond to the same source stamp. This may be fixed
in a future version.

sourcestampsetid (integer) -- identification of the set; all returned sourcestamps belong to this set

Returns:

sslist of ssdict

Get a set of sourcestamps identified by a set id. The set is returned as
an sslist containing one or more sourcestamps (represented as ssdicts).
The list is empty if the set does not exist or no sourcestamps belong to it.

Add a new (empty) sourcestampset to the database. The unique identification
of the set is returned as an integer. The new id can be used to add
new sourcestamps to the database and as a reference in a buildset.
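A sketch of creating a set and populating it; the sourcestampsets attribute
and the addSourceStamp keyword parameters are assumptions, following the
ssdict keys above:

from twisted.internet import defer

@defer.inlineCallbacks
def addExampleSourceStamp(master):
    # create an empty set, then attach a single sourcestamp to it
    setid = yield master.db.sourcestampsets.addSourceStampSet()
    yield master.db.sourcestamps.addSourceStamp(
            branch='master', revision='deadbeef',
            repository='git://example.com/repo.git', project='example',
            sourcestampsetid=setid)
    defer.returnValue(setid)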

This class handles maintaining arbitrary key/value state for Buildbot
objects. Each object can store arbitrary key/value pairs, where the values
are any JSON-encodable value. Each pair can be set and retrieved
atomically.

Objects are identified by their (user-visible) name and their
class. This allows, for example, a nightly_smoketest object of class
NightlyScheduler to maintain its state even if it moves between
masters, but avoids cross-contaminating state between different classes
of objects with the same name.

Note that "class" is not interpreted literally, and can be any string that
will uniquely identify the class for the object; if classes are renamed,
they can continue to use the old names.
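Putting this together, a sketch of a scheduler maintaining a run counter,
assuming the state connector's getObjectId method to map the name/class
pair to an object id:

from twisted.internet import defer

@defer.inlineCallbacks
def bumpRunCount(master):
    objectid = yield master.db.state.getObjectId(
            'nightly_smoketest', 'NightlyScheduler')
    count = yield master.db.state.getState(objectid, 'run_count', 0)
    yield master.db.state.setState(objectid, 'run_count', count + 1)
    defer.returnValue(count + 1)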

The more complicated task is to recognize each user across multiple
interfaces with Buildbot. For example, a user may be identified as
'djmitche' in Subversion, 'dustin@v.igoro.us' in Git, and 'dustin' on IRC.
To support this functionality, each user has a set of attributes, keyed by
type. The findUserByAttr method uses these attributes to match users,
adding a new user if no matching user is found.

Users are identified canonically by uid, and are represented by usdicts (user
dictionaries) with keys

uid

identifier (display name for the user)

bb_username (buildbot login username)

bb_password (hashed login password)

All attributes are also included in the dictionary, keyed by type. Types
colliding with the keys above are ignored.

This method is intended for use by other components of Buildbot to
search for a user with the given attributes.

Note that identifier is not used in the search for an existing
user. It is only used when creating a new user. The identifier should
be based deterministically on the attributes supplied, in some fashion
that will seem natural to users.

For future compatibility, always use keyword parameters to call this
method.
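For example, a change source resolving a Git author might call the method
like this (a sketch; the attribute type and data are illustrative):

d = master.db.users.findUserByAttr(
        identifier='dustin',
        attr_type='git',
        attr_data='Dustin Mitchell <dustin@v.igoro.us>')

def found(uid):
    ...  # uid identifies the matched or newly added user
d.addCallback(found)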

The information above is intended for developers working on the rest of
Buildbot, and treating the database layer as an abstraction. The remainder of
this section describes the internals of the database implementation, and is
intended for developers modifying the schema or adding new methods to the
database layer.

Warning

It's difficult to change the database schema significantly after it has
been released, and very disruptive to users to change the database API.
Consider very carefully the future-proofing of any changes here!

A reference to the DBConnector, so that
connector components can use, e.g., self.db.pool or
self.db.model. In the unusual case that a connector component
needs access to the master, the easiest path is self.db.master.

The connectors all use SQLAlchemy Core as a wrapper around database
client drivers. Unfortunately, SQLAlchemy is a synchronous library, so some
extra work is required to use it in an asynchronous context like Buildbot.
This is accomplished by deferring all database operations to threads, and
returning a Deferred. The Pool class takes care of
the details.

A connector method should look like this:

def myMethod(self, arg1, arg2):
    def thd(conn):
        q = ...  # construct a query
        for row in conn.execute(q):
            ...  # do something with the results
        return ...  # return an interesting value
    return self.db.pool.do(thd)

Picking that apart, the body of the method defines a function named thd
taking one argument, a Connection object. It then calls
self.db.pool.do, passing the thd function. This function is called in
a thread, and can make blocking calls to SQLAlchemy as desired. The do
method will return a Deferred that will fire with the return value of thd,
or with a failure representing any exceptions raised by thd.

The return value of thd must not be an SQLAlchemy object - in particular,
any ResultProxy
objects must be parsed into lists or other data structures before they are
returned.

Warning

As the name thd indicates, the function runs in a thread. It should
not interact with any other part of Buildbot, nor with any of the Twisted
components that expect to be accessed from the main thread -- the reactor,
Deferreds, etc.

Queries can be constructed using any of the SQLAlchemy core methods, using
tables from Model, and executed with the connection
object, conn.
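As a (hypothetical) illustration, a connector method counting incomplete
build requests might look like this; the method name is invented, but the
buildrequests table and its id and complete columns follow the descriptions
above:

import sqlalchemy as sa

def countIncompleteRequests(self):
    def thd(conn):
        tbl = self.db.model.buildrequests
        q = sa.select([sa.func.count(tbl.c.id)]).where(tbl.c.complete == 0)
        # extract the scalar value inside the thread; no SQLAlchemy
        # objects may be returned from thd
        return conn.execute(q).scalar()
    return self.db.pool.do(thd)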

Call callable in a thread, with a Connection object as first
argument. Returns a deferred that will fire with the results of the
callable, or with a failure representing any exception raised during
its execution.

Any additional positional or keyword arguments are passed to
callable.

This class contains the canonical description of the buildbot schema. It is
presented in the form of SQLAlchemy Table instances, as class variables. At
runtime, the model is available at master.db.model, so for example the
buildrequests table can be referred to as
master.db.model.buildrequests, and columns are available in its c
attribute.

The source file, master/buildbot/db/model.py, contains comments
describing each table; that information is not replicated in this
documentation.

Note that the model is not used for new installations or upgrades of the
Buildbot database. See Modifying the Database Schema for more
information.

A decorator for "getter" functions that fetch an object from the database
based on a single key. The wrapped method will only be called if the named
cache does not contain the key.

The wrapped function must take one argument (the key); the wrapper will
take a key plus an optional no_cache argument which, if true, will
cause it to invoke the underlying method even if the key is in the cache.

The resulting method will have a cache attribute which can be used to
access the underlying cache.

In most cases, getter methods return a well-defined dictionary. Unfortunately,
Python does not handle weak references to bare dictionaries, so components must
instantiate a subclass of dict. The whole assembly looks something like
this:
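class ChDict(dict):
    # a weak-referenceable stand-in for a bare dict, so cached
    # values can live in the weak-valued cache
    pass

class ChangesConnectorComponent(base.DBConnectorComponent):

    @base.cached('chdicts')
    def getChange(self, changeid):
        ...  # thd function fetching the row and returning a ChDict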

It goes without saying that any new connector methods must be fully tested!

You will also want to add an in-memory implementation of the methods to the
fake classes in master/buildbot/test/fake/fakedb.py. Non-DB Buildbot code
is tested using these fake implementations in order to isolate that code from
the database code.

Changes to the schema are accomplished through migration scripts, supported by
SQLAlchemy-Migrate. In fact,
even new databases are created with the migration scripts -- a new database is
a migrated version of an empty database.

The schema is tracked by a version number, stored in the migrate_version
table. This number is incremented for each change to the schema, and used to
determine whether the database must be upgraded. The master will refuse to run
with an out-of-date database.

To make a change to the schema, first consider how to handle any existing data.
When adding new columns, this may not be necessary, but table refactorings can
be complex and require caution so as not to lose information.

Create a new script in master/buildbot/db/migrate/versions, following
the numbering scheme already present. The script should have an upgrade
method, which takes an engine as a parameter and upgrades the database, both
changing the schema and performing any required data migrations. The engine
passed to this method is "enhanced" by SQLAlchemy-Migrate, with methods to
handle adding, altering, and dropping columns. See the SQLAlchemy-Migrate
documentation for details.
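A sketch of such a script, adding a hypothetical nullable column to an
existing table:

# master/buildbot/db/migrate/versions/0NN_add_example_column.py
# ('0NN' stands for the next number in the series; the column is invented)
import sqlalchemy as sa
from migrate import changeset  # installs create()/drop() on columns

def upgrade(migrate_engine):
    metadata = sa.MetaData()
    metadata.bind = migrate_engine

    # load the table definition as it currently exists in the database
    builds = sa.Table('builds', metadata, autoload=True)

    # add a new, nullable column, so that existing rows remain valid
    example = sa.Column('example_info', sa.Text, nullable=True)
    example.create(builds)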

Next, modify master/buildbot/db/model.py to represent the updated
schema. Buildbot's automated tests perform a rudimentary comparison of an
upgraded database with the model, but it is important to check the details -
key length, nullability, and so on can sometimes be missed by the checks. If
the schema and the upgrade scripts get out of sync, bizarre behavior can
result.

Your upgrade script should have unit tests. The classes in
master/buildbot/test/util/migration.py make this straightforward.
Unit test scripts should be named e.g.,
test_db_migrate_versions_015_remove_bad_master_objectid.py.

The master/buildbot/test/integration/test_upgrade.py script also tests
upgrades, and will confirm that the resulting database matches the model. If
you encounter implicit indexes on MySQL that do not appear on SQLite or
Postgres, add them to implied_indexes in
master/buildbot/db/model.py.

Because Buildbot works over a wide range of databases, it is generally limited
to database features present in all supported backends. This section
highlights a few things to watch out for.

In general, Buildbot should be functional on all supported database backends.
If use of a backend adds minor usage restrictions, or cannot implement some
kinds of error checking, that is acceptable if the restrictions are
well-documented in the manual.

The metabuildbot tests Buildbot against all supported databases, so most
compatibility errors will be caught before a release.

MySQL only supports about 330-character indexes. The actual index length is
1000 bytes, but MySQL uses 3-byte encoding for UTF8 strings. This is a
longstanding bug in MySQL - see "Specified key was too long; max key
length is 1000 bytes" with utf8.
While this makes sense for indexes used for record lookup, it limits the
ability to use unique indexes to prevent duplicate rows.

InnoDB has even more severe restrictions on key lengths, which is why the MySQL
implementation requires the MyISAM storage engine.

Unfortunately, use of the MyISAM storage engine precludes real transactions in
MySQL. transaction.commit() and transaction.rollback() are essentially
no-ops: modifications to data in the database are visible to other users
immediately, and are not reverted in a rollback.

Neither MySQL nor SQLite enforces referential integrity based on foreign keys;
Postgres does. If possible, test your changes on Postgres
before committing, to check that tables are added and removed in the proper
order.

MySQL's query planner is easily confused by subqueries. For example, a DELETE
query specifying ids that are IN a subquery will not work. The workaround is
to run the subquery directly, and then execute a DELETE query for each returned
id.
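A sketch of the workaround, as a (hypothetical) connector method; the method
name and the cutoff value are invented, while the changes table and its
changeid and when_timestamp columns come from the model:

import sqlalchemy as sa

def expireOldChanges(self):
    def thd(conn):
        tbl = self.db.model.changes
        cutoff = 1262304000  # hypothetical epoch-time horizon
        # run the subquery directly...
        res = conn.execute(
            sa.select([tbl.c.changeid]).where(tbl.c.when_timestamp < cutoff))
        ids = [row.changeid for row in res]
        # ...then DELETE each returned id, avoiding IN (SELECT ...)
        for changeid in ids:
            conn.execute(tbl.delete(tbl.c.changeid == changeid))
    return self.db.pool.do(thd)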

If this weakness has a significant performance impact, it would be acceptable to
conditionalize use of the subquery on the database dialect.