Flavio Poletti

So… here’s dibs, which stands for Docker Image Build System. Put
very, very bluntly… it is what I have been using instead of docker build lately.
It already comes with documentation and many frills, but we will start
very slow and explore features in a series of posts. Buckle up!

Quick “why dibs”

Why did I start working on dibs? Part of it is due to my ignorance: I
honestly didn’t know about multistage Dockerfiles, and I quickly got
bored of managing the build stages myself.

This is not the whole story though. Another aspect of the Dockerfile
system that is a bit too… raw in my opinion is how to execute things.
As far as I know, you either use the RUN directive providing a usually
growing list of commands all stitched together with &&s, or you put
those commands in a script, COPY it into the container, then RUN it.
I’d say this is basically what’s needed, but not the best from the
usability point of view.

Last, I also grew a bit tired of the caching mechanism provided by Docker
when building images via Dockerfile. Don’t get me wrong, it’s really neat;
my only concern is that sometimes it led to situations that required some
debugging before figuring out what was going wrong, especially when
building images from a remote repository dynamically cloned via git. I
felt that having direct and explicit control over the caching mechanism
would help in a lot of my use cases.

Example?

In this series of posts I’ll be showing examples of growing complexity. In
this starting one there will be actually nothing (I guess) that cannot be
done more or less directly with a Dockerfile, but it’s helpful to set the
stage.

The main goal is to generate a Docker image for sample-mojo. This
project is pretty much purely focused on providing a simple web endpoint,
without wasting too much time on how it will be deployed, apart from
providing a Heroku-compatible Procfile:
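A Procfile maps process names to the commands that run them; for a Mojolicious application like sample-mojo it might be as simple as the following line (the exact command here is my guess, not necessarily what the project ships):

    web: perl app.pl prefork -l "http://*:$PORT"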

There is one additional requirement: compile and run under an
unprivileged user ada, to avoid root as much as possible.

Hence, the sequence of operations will be like this:

build the modules in a container where we install all the needed building
tools

copy the final complete application in a temporary cache

create another container with only the tools needed at runtime, namely
Perl

copy the application artifacts from the temporary cache into their final
position in this container

put a wrapper script to interpret and execute the Procfile in the
container

set the proper ENTRYPOINT, CMD and USER on the container

save the container as an image.
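As a side note, the Procfile-interpreting wrapper from the list above can be sketched in a few lines of shell. This is my own minimal take (handling only the web entry), not necessarily what ends up in the actual image:

```sh
#!/bin/sh
set -e

# Throwaway demo directory with a sample Procfile containing one "web" entry
mkdir -p /tmp/procdemo && cd /tmp/procdemo
printf 'web: echo listening on 3000\n' > Procfile

# The wrapper itself: extract the command for "web" and execute it
cat > procfile-run.sh <<'EOF'
#!/bin/sh
CMD="$(sed -n 's/^web:[[:space:]]*//p' Procfile)"
exec $CMD
EOF
chmod +x procfile-run.sh

./procfile-run.sh   # prints "listening on 3000"
```

A real wrapper would probably iterate over all process types and do some error checking, but the core idea is just this: read the command after the process name, then exec it.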

Installing dibs

dibs needs Docker to do anything interesting, so chances are that you
already have Docker installed if you plan to use it. In this case,
installing dibs is as simple as creating the following dibs script
somewhere in your PATH:
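The script itself boils down to running dibs’s own Docker image. A minimal sketch, assuming the polettix/dibs image name and the /mnt mount point (double-check both against the dibs documentation):

    #!/bin/sh
    # Hypothetical wrapper: run dibs from its Docker image, mounting the
    # current directory so that dibs can see the project.
    exec docker run --rm -it -v "$PWD:/mnt" polettix/dibs "$@"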

Actions

dibs executes actions, which can be of different types. You can specify
the type explicitly, but it’s normally not needed, as it’s usually clear
what an action is about.

At the top level, it always executes a sketch, that is a sequence of
other actions (including other sketches). In our case, we didn’t specify
any explicit action to be run on the command line, so the default sketch
has been selected:

    actions:
      default: [build, bundle]

i.e. the actions build and bundle (which are other sketches
themselves, being lists) have to be executed. This accounts for the
“external” structure of the output:

Actions that contain a from should sound familiar to anyone used to
Dockerfiles: its goal is exactly the same, i.e. to define the starting
point of a sequence of container layers. This action has type preparation.

After it, both sketches have an action of type stroke, i.e. something
that is executed inside the container. This kind of action is
characterized by having a pack field inside, which in our case specifies
an “immediate program” that will be executed in the container. There are
many other ways of providing what has to be executed in the container, as
we will see in the future.
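For reference, a preparation followed by a stroke might look like this in the configuration file. The from and pack fields are the ones described above, while the exact nesting (e.g. a run key under pack) is my assumption about dibs’s configuration layout:

    bundle:
      - from: 'alpine:3.6'   # preparation: the starting image, like FROM
      - pack:                # stroke: run this "immediate program" inside
          run: |
            #!/bin/sh
            apk --no-cache add perl   # the runtime only needs Perl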

Last, the bundle sketch also contains a closing action setting the tag
name for the image. This action is of type frame. Notice that build
does not have a corresponding one: we’re simply not interested in the
byproduct of that chain of actions, or rather, we are not interested in
saving the resulting container as an image.

These action types should ring a bell about the metaphor that dibs
adopts: as our goal is to generate an image, we assemble one or more
sketches, in each of which we first prepare, then draw some strokes,
then frame the result if we are happy with it.

What Happened?

TL;DR: each of the sketches build and bundle started from the same
image alpine:3.6 and executed a sequence of commands inside a container.
The program for build eventually saved the application with the compiled
modules in a cache staging area; the bundle sketch eventually resulted
in saving a container image that contains only what is strictly necessary
for running the program, without building tools.

If you look carefully at all the operations, you will see that we indeed
stick to all the requirements: the final image is not bloated with
unnecessary tools; compilation is done under user ada, as well as packing
and execution; the ENTRYPOINT and CMD are set right; …

As it is now, though, there’s little advantage over using a Dockerfile to
give to docker build:

it’s easier to provide the sequence of commands to be executed, because
you pass the text of a proper shell script instead of a single, long
escaped line to be fed to /bin/sh -c

the process is much heavier though, each run of the whole sequence
starts from scratch and does not reuse anything already done.

Make build More Efficient

dibs gives you direct control over caching, so there’s some more work to
do, but it lets us always keep control of things.

The build process can be divided into a few phases:

creation of user ada

installation of build tools

installation of pre-requisites specific to the program (which amounts to
nothing in our case)

compilation of modules

Our strategy will be to divide the build sketch into phases, and save
the places that we find interesting for reuse.
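A sketch of the split might look like this. Only from, pack and user are field names taken from the discussion in this post; the rest (the run key under pack, the image names) is my reconstruction of how a dibs configuration could be laid out:

    build-base:
      - from: 'alpine:3.6'
      - pack:
          run: |
            #!/bin/sh
            apk --no-cache add perl perl-dev build-base   # build tools
            adduser -D -h /app ada                        # unprivileged user
      # ... frame here, saving e.g. a hypothetical sample-mojo-buildbase image ...

    build:
      - from: 'sample-mojo-buildbase'   # start from the cached base
      - user: ada                       # run the stroke as ada, no su needed
        pack:
          run: |
            #!/bin/sh
            cd /app
            cpanm --notest --local-lib-contained local --installdeps .
            cp -a /app /tmp/cache/app   # stash artifacts for bundle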

It’s easy to see what we’re doing here: the old build has been split
into two parts:

build-base, which does most of the heavy lifting, preparing everything
for the later compilation phase without actually performing it

build, which now strictly executes the compilation of modules and saves
the result in the cache. Note that we don’t have to use the su - ada
trick to execute this part as user ada, because dibs can run it directly
thanks to user: ada.

Make build Even More Efficient

As it turns out, our program is not installing modules like crazy, so we
can use a bit of caching for it too. We already leveraged /tmp/cache as
a mechanism to let different actions communicate with each other (e.g.
to pass the compiled application from build to bundle), but nothing
prevents us from using it also across different builds:

In this way, we’re also going to reuse compilations by cpanm every time
the /tmp/as-ada.sh script is executed during the build phase, saving
even more time.
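One way to get there, sketched under the assumption that the modules land in a local::lib directory. The cpanm flags are standard cpanminus ones; the /tmp/cache path comes from this post, while the layout underneath it is my own choice:

    #!/bin/sh
    # Install dependencies into a local::lib kept inside the persistent
    # cache: modules already compiled in a previous run are found there
    # and not rebuilt.
    cpanm --notest --local-lib-contained /tmp/cache/local --installdeps .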

Enhancing bundle

The bundle sketch can use some enhancements too, because it re-creates
the whole thing from scratch over and over, whereas user creation and
runtime installation might be factored out and cached, much like what
happened with build.

At this point, it’s also meaningful to think that the user creation
process might be factored out between build and bundle, as they have
the same goal. To do this, there are a few strategies:

build a base image for the runtime, then use it as the base for the
build base image

build a pre-base image with user creation only, then use that as base
image for the bundle base image and the build base image

factor the user creation process out and reuse it in different sketches.

We will look into the three alternatives in the following sub-sections,
but in all cases we end up with a new
sample-mojo-alien01-bundlebase:latest image that we will use as the
starting point for bundling, like this:
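A hedged sketch of how this could look in the configuration, with the same caveats as before (only from, pack and user are confirmed field names; the rest is my assumption):

    bundle:
      - from: 'sample-mojo-alien01-bundlebase:latest'   # the cached base
      - user: ada
        pack:
          run: |
            #!/bin/sh
            cp -a /tmp/cache/app /app   # artifacts prepared by build
      # ... frame here: set ENTRYPOINT, CMD and USER, then tag the image ...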

The advantage of this approach is that it is quite simple; the drawback is
that the bundle image might evolve in a direction that includes tools that
are not needed for building. This might or might not be a problem
depending on the circumstances; the build image is usually more bloated
anyway, so it should not be an issue in the average case.

Common Pre-Base Image

A more “normalized” way of doing things would be to factor common
operations like user creation into a single, simpler base image, then use
it as a pre-base for generating two separate images, a build image and
a bundle image, which can then evolve independently:
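A rough sketch of this layout, with all the usual caveats about field names (only from, pack and user come straight from this post; the image names are guesswork on my part):

    prebase:
      - from: 'alpine:3.6'
      - pack:
          run: |
            #!/bin/sh
            adduser -D -h /app ada   # common operation: create user ada
      # ... frame here, saving e.g. a hypothetical sample-mojo-prebase image ...

    build-base:
      - from: 'sample-mojo-prebase'   # hypothetical tag from the step above
      # ... install build tools, then frame as the build base image ...

    bundle-base:
      - from: 'sample-mojo-prebase'
      # ... install runtime tools (Perl), then frame as the bundle base ...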

The advantage of this approach is that it’s more visually clear what’s
going on in the preparation of each base image, because there’s less
referencing around to other base images.

Where Are We Now?

At this point, I guess, we’re somewhere very near to what a multistage
Dockerfile can do: we have a chain for building, another one
for packaging a lean final image with our application, and there’s some
caching to help us speed up things.

Now for the second reason why I wanted a tool like dibs: ease of
reuse. I already find dibs’s way of expressing actions inside
a container better than the lower-level RUN provided by a Dockerfile,
but this is just the tip of the iceberg… as we will discover in our next
post!

The complete configuration file for this stage can be found
here, considering the third alternative in the previous
section.