memoize.py: a build tool framework

I’ve recently started using memoize.py
as the core of my build system for a new project I’m working on. The
simplicity involved is pretty neat. Rather than having to work out the
dependencies manually (or relying on specialised tools to determine
them), with memoize.py you simply write the commands you need to build
your project, and memoize.py works out all the dependencies for you.

So, what’s the catch? Well, memoize.py works by using strace to
record all the system calls that a program makes during its
execution. By analyzing this list, memoize.py can work out
all the files that are touched when a command is run, and it then stores
them as the list of dependencies for that command. The next time
you run the same command, memoize.py first checks whether
any of the dependencies have changed (using either an md5sum or a
timestamp), and only runs the command if at least one of them has
changed. The catch, of course, is that this only runs on Linux (as
far as I know, you can’t get strace anywhere else, although that
doesn’t mean the same technique couldn’t be used with a different
underlying system call tracing tool).
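To make the mechanism concrete, here is a minimal Python sketch of the two halves described above: pulling file names out of strace output, and later deciding whether any of those files have changed. It is not memoize.py's actual code. The function names (`extract_paths`, `fingerprint`, `record_deps`, `needs_rerun`) are hypothetical, and it assumes the trace was produced with something like `strace -f -e trace=file -o trace.txt cmd ...`. For simplicity it only keeps calls that succeeded, treats every touched regular file (outputs included) as a dependency, and ignores the split `<unfinished ...>`/`resumed` lines that strace emits for interleaved processes, which a real parser would have to stitch back together.

```python
import hashlib
import os
import re

# File-related syscalls treated as "touching" a path (an assumption; a real
# tool would likely track more, e.g. stat() and directory creation).
FILE_CALLS = {"open", "openat", "creat", "execve"}

# Matches lines such as:
#   openat(AT_FDCWD, "foo.h", O_RDONLY) = 3
#   1234  execve("/usr/bin/cc", ["cc", "-c", "foo.c"], ...) = 0
# The optional prefix is the PID that strace -f adds to each line.
SYSCALL_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'   # optional PID prefix
    r'(?P<call>\w+)\('                  # syscall name
    r'(?:AT_FDCWD,\s*)?'                # openat's directory argument
    r'"(?P<path>[^"]*)"'                # first quoted path argument
    r'.*\)\s*=\s*(?P<ret>-?\d+)'        # return value
)


def extract_paths(strace_lines):
    """Return the set of paths passed to successful file syscalls."""
    paths = set()
    for line in strace_lines:
        m = SYSCALL_RE.match(line)
        if m and m.group("call") in FILE_CALLS and int(m.group("ret")) >= 0:
            paths.add(m.group("path"))
    return paths


def fingerprint(path, use_md5=True):
    """md5 digest (or mtime) of a file, or None if it can no longer be read."""
    try:
        if use_md5:
            with open(path, "rb") as f:
                return hashlib.md5(f.read()).hexdigest()
        return os.path.getmtime(path)
    except OSError:
        return None


def record_deps(paths, use_md5=True):
    """Snapshot the regular files among paths, to be stored with the command."""
    return {p: fingerprint(p, use_md5) for p in sorted(paths) if os.path.isfile(p)}


def needs_rerun(stored, use_md5=True):
    """True if the command was never run or any recorded file has changed."""
    if stored is None:
        return True
    return any(fingerprint(p, use_md5) != old for p, old in stored.items())
```

A deleted dependency fingerprints as `None`, which differs from its stored value, so removing an output file also triggers a rebuild in this sketch.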

This technique is quite a radical departure from other tools, which
build a large dependency graph of the entire build and then
recursively work through this graph to fulfil unmet dependencies. As
a result, this style is a lot more imperative than declarative.
Traditional tools (SCons, make, etc.) provide a language that
essentially lets you describe a dependency graph, and the
order in which things are executed is hidden inside the tool.
Using memoize.py is quite different: you define the commands you
want to run (in order!), and that is basically it.

Some of the advantages of this approach are:

Easy to debug builds. You can very easily see the order in which things run.

More obvious what is happening.

Single pass: no need to parse files first and then run the commands later.

Gets the dependencies right! You don’t end up missing a dependency because
your scanner failed to pick up a header, or because you forgot to declare it. This
makes builds much more reliable.

There are however some disadvantages:

Running commands through strace is slow, and parsing the output
of strace is even slower. This is to a certain
extent mitigated by not needing special commands for scanning files for
dependencies, and because it is one pass. This could probably be improved
by directly using ptrace to perform the system call tracing.

No parallel builds. Because there is no job executor running through a
dependency graph, it is harder to take advantage of parallel builds. This could
definitely be a show-stopper for large projects. Of course it might be possible
to explicitly define some jobs as parallel, which might mitigate this problem.

No simple way to build just one target. If you have a very large build, and
you just want one target, which only needs a small subset of the full build,
then you are in trouble. Of course, it is possible to set up your build system
so that there are explicit targets, and you specify a subset of commands to run,
but this must now be explicit, whereas the traditional approach gives you this
for free.

Linux only. It would be possible to handle this on other OSes that have
a system call tracing mechanism, but for my current project, the compilers are
Linux only anyway, so I’m not too fussed. I did extend memoize.py
a little so that you could simply choose not to run strace. Obviously
you can’t determine dependencies in this case, but you can at least build the
thing.

As with many good tools in your programming kit, memoize.py is
available under a very liberal BSD-style license, which is nice, because I’ve
been able to fix up some problems and add some extra functionality. In particular,
I’ve added options to:

Select the verbosity of output.

Provide a different string to print when running a command.

Skip using strace.

Track directory creation, as well as file creation.

Force building (i.e. ignore the dependencies).
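Here is one way those options might fit together in a single decision step, as a minimal Python sketch. The `Options` fields, their verbosity levels and `plan_command` are hypothetical names for illustration, not memoize.py's actual flags or functions. `deps_changed` is assumed to come from a dependency check like the one memoize.py performs: `True` or `False`, or `None` when nothing has been recorded yet. Only the strace arguments are real: `-f` follows child processes, `-e trace=file` limits tracing to system calls that take a file name, and `-o` writes the trace to a file.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class Options:
    verbose: int = 1               # assumed levels: 0 silent, 1 commands run, 2 also skipped ones
    message: Optional[str] = None  # printed instead of the command line
    use_strace: bool = True
    force: bool = False


def plan_command(cmd, trace_file, deps_changed, opts):
    """Decide whether to run cmd, returning (argv or None, log line or None)."""
    label = opts.message or shlex.join(cmd)
    # Forced builds ignore the dependencies; without strace nothing can be
    # recorded, so in this sketch the command always runs.
    must_run = opts.force or not opts.use_strace or deps_changed is not False
    if not must_run:
        return None, (f"up to date: {label}" if opts.verbose >= 2 else None)
    argv = list(cmd)
    if opts.use_strace:
        argv = ["strace", "-f", "-e", "trace=file", "-o", trace_file] + argv
    return argv, (label if opts.verbose >= 1 else None)
```

Keeping the decision separate from actually running the command makes it easy to see why a step did or did not run, which fits the "easy to debug builds" point above.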

The patch and full file are
available. I have of course sent these upstream, so with any
luck, some or most of the changes will be merged.

So, if you have a primarily Linux project and want to try something
different from SCons or make, I’d recommend considering memoize.py.