Repeatable builds with Docker

Dockerized build environments to ensure uniform and repeatable builds

Posted by Francesco Degrassi on 05 Feb 2015

Consistency and repeatability in builds

A significant challenge in software development is ensuring consistency and repeatability; from building binaries, to running tests, to deploying in production, we strive to make sure that different people at different times will build the same binary, get the same test results, and observe the same application behaviour we do now.

This is very important because, without repeatability, bug reports become extremely difficult to address, we cannot trust a rollback to restore functionality on a broken application, and we waste time debugging phantom problems (“works for me”).

Lately, new concerns have started to surface; for security-sensitive software (such as Tor or Bitcoin Core) there is interest in so-called “deterministic” builds (100% reproducible builds, down to the single bits in the binary) so that anyone can independently verify the authenticity of binary distributions.

Easier said than done

Ensuring repeatability throughout the development process requires tackling different issues in each phase.

Coding

source code tracking must be established with a version control system such
as Git, Mercurial or Subversion

library dependencies need to be tracked with a tool such as Ivy, Maven, RubyGems or similar
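As a minimal illustration of dependency tracking, a Gemfile (read by Bundler on top of RubyGems) can pin exact versions; the gem names and versions below are purely illustrative:

```
# Illustrative Gemfile: pin exact versions so every build resolves
# the same libraries (names and versions are assumptions)
source 'https://rubygems.org'

gem 'rails', '4.2.0'
gem 'pg',    '0.18.1'
```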

Build

build scripts and configuration need to be versioned together with code

machine configuration must be managed with a tool such as Chef or Puppet

binary dependencies, such as system-wide libraries and binaries, must be
tracked
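For instance, a configuration-management tool can pin system-wide packages to exact versions; a minimal Puppet sketch, where the package name and version string are assumptions:

```
# Illustrative Puppet manifest: pin a system package to an exact
# version so every build machine gets identical binaries
package { 'gcc':
  ensure => '4:4.9.2-2',
}
```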

Moreover, several cross-cutting concerns must also be addressed, among them:

a clean environment must be guaranteed at each stage: no untracked source files, no old, stray libraries on the build machine, no stale data in the test or production environment influencing the execution.

adequate performance in each step must be maintained: build and test times
must be kept reasonably low

Issues in the build and test phases

If we focus on the build and test phases alone, we find a pattern of common challenges and issues across most projects we have encountered. Some of these are:

Complex environment setup: a lot of time is wasted in manual setup of both
developer workstations and the CI system. Mistakes are also common, such as
wrong versions of compilers, interpreters or build tools, incompatible
libraries, missing packages.

Version conflicts: keeping multiple versions of the same tools on one machine can be cumbersome, more so when we factor in the need for different runtimes and libraries for different projects (Ruby and Python are common examples, and lately Java as well).

Introducing Docker

Docker is an open platform for developers and sysadmins to build, ship, and
run distributed applications. […] The Docker Engine container comprises
just the application and its dependencies. It runs as an isolated process in
userspace on the host operating system, sharing the kernel with other
containers.
Thus, it enjoys the resource isolation and allocation benefits of VMs.

Docker simplifies application distribution and deployment significantly by
guaranteeing portability and standardization of environments; “Dockerized”
apps, by bundling together the executable and any other dependency such as
libraries, other executables and packages, are completely portable and can run
anywhere (on physical or virtualized hardware, both local and remote).

Docker for building

Docker has been used (and praised) extensively for simplifying deployments, but
it has a far wider reach than that, and has been used lately to simplify the
build and test phases.

The idea is pretty simple:

create a Docker image for each different build environment and for any required test dependency, providing all the system libraries, build tools and compilers required. Or, better yet, use the ones already available on Docker Hub and customize them when needed.

create a Dockerfile based on the build image and configure it to run your build script, or invoke the build image directly with docker run, as in the reconstructed example below
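For the first step, a build image can often be derived from a stock image with a short Dockerfile; a minimal sketch, assuming the Debian-based golang image and illustrative extra packages:

```
# Hypothetical build-image Dockerfile: extend the stock golang image
# with any extra system packages the build needs (package list is
# an illustrative assumption)
FROM golang:1.4
RUN apt-get update && \
    apt-get install -y --no-install-recommends make git && \
    rm -rf /var/lib/apt/lists/*
```

The code listing for the second step appears to be missing here, but the notes below describe it in detail. The following Makefile is a reconstruction based on those notes: the golang:1.4 image, the --rm flag, the three volumes, the working directory and the two targets come straight from the notes, while the /opt/project paths, the bin/app output name and the src/main.go entry point are assumptions. Line numbers are shown to match the notes (recipe lines are tab-indented in a real Makefile):

```
 1: # Reconstructed Makefile: "make dockerbuild" on the host runs
 2: # "make build" inside a golang:1.4 container
 3:
 4: .PHONY: dockerbuild build
 5: dockerbuild:
 6: 	@echo "running the build inside a golang:1.4 container"
 7: 	docker run \
 8: 		--rm=true \
 9: 		-v $(CURDIR)/src:/opt/project/src \
10: 		-v $(CURDIR)/Makefile:/opt/project/Makefile \
11: 		-v $(CURDIR)/bin:/opt/project/bin \
12: 		-w /opt/project \
13: 		golang:1.4 \
14: 		make build
15:
16: build:
17: 	go build -o bin/app src/main.go
```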

Notes

5: the "dockerbuild" target gets executed by the docker host
8: --rm=true discards the container after the build
9-11: add data volumes for the source tree, Makefile and output directory
12: specify the working directory inside the container
13: golang build image, version 1.4
14: the actual build command to run in the container, calling make
16: the "build" target gets executed inside the docker container, and performs the actual build

This approach has lots of advantages:

drastically simpler setup for newcomers: the complete build and test environment is specified formally; to set up a new workstation (or a new CI build system) all you need is to have Docker and Git/Mercurial installed, clone a repo and start the build (see the example after this list).

build environment standardization: your local build environment is identical to
anyone else’s and to the CI system’s.

no version conflict: you don’t need to install several different compiler or
interpreter versions on your workstation or the CI machine, you’ll have a
different image for each one and will run it inside its own Docker container.

test dependencies are tracked and brought up, clean, at every build

being recreated from scratch every time, the build and test environments are always clean
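Concretely, with a Makefile like the reconstructed one above, bootstrapping a new workstation reduces to a few commands (the repository URL is, of course, illustrative):

```
git clone https://example.com/project.git   # hypothetical repository
cd project
make dockerbuild   # Docker pulls the build image and runs the build
```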

Too much effort? Try Drone

Drone is an open-source CI system built with this approach in mind; complete with a web UI and also available as a service, it provides a collection of pre-configured containers for different build environments (Scala, Go, Java, Ruby, you name it) and another for test dependencies (databases, brokers, etc.).

It provides most of the plumbing of a complete CI system and requires only minimal configuration; a typical build config file looks like the sketch below.
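The original listing appears to be missing here; this is a minimal, hypothetical .drone.yml for a Go project, following Drone's 2015-era image/script/services layout (the image, commands and postgres service are assumptions):

```
# Hypothetical .drone.yml: build image, build commands and a database
# as a test dependency (all values are illustrative)
image: golang:1.4
script:
  - make build
  - make test
services:
  - postgres:9.3
```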

Many areas are still work in progress, but there is already enough to get you
started and lots of activity around it.

Assumptions, open issues and next steps

Some assorted notes and thoughts:

to be able to reproduce a past build, you need to have a copy of all the parts of the build process: sources, build-time libraries (jar packages, gems) and the build environment. For dockerized builds, this means storing locally (in some sort of local Docker registry) the Docker images you use for your builds, to ensure they are available when you need them.
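A minimal sketch of mirroring a build image into a private registry, assuming the stock registry:2 image running on port 5000 (names and ports are illustrative):

```
# Run a private registry (registry:2 image and port are assumptions)
docker run -d -p 5000:5000 --name registry registry:2

# Mirror the build image locally so past builds remain reproducible
docker pull golang:1.4
docker tag golang:1.4 localhost:5000/golang:1.4
docker push localhost:5000/golang:1.4
```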

we assume that every artifact used in the build process, including dockerized build environments, is versioned and immutable; that is, nobody will push a different artifact with the same version as a previous one.

rebuilding from scratch every time can be slow; you can work around that by, e.g., using a local artifact caching repository for gems, jars and such (again, assuming artifacts are immutable).

Drone.io has support for folder caching inside build environments (e.g. to cache ~/.m2/repository), but it breaks when deploying remotely; it can also be problematic if artifacts from the build end up in the cached folder, since previous broken builds might taint your environment.

security-sensitive builds for projects like Tor or Bitcoin Core also require some kind of digital signature verification on all the artifacts and configuration involved; this is considerably more complex than what we are trying to introduce here.

finally, other app container technologies are being developed, one of them
being Rocket. It would be interesting to
compare these new alternatives from the point of view of build automation.