Tuesday, February 28, 2012

While developing Shake I investigated many existing build tools. Build tools can be divided into two categories -- those which target single-language projects with fixed rules (e.g. ocamlbuild, ghc --make, Visual Studio projects), and those which allow user specified rules (e.g. make and Shake). Focusing on the second category, the defacto standard is make, but there are many make competitors (notably Ant, CMake, Jam, Scons and Waf). Most of these tools read a list of rules, generate a dependency graph, then execute commands while traversing that graph.

Since the number of build tools is vast, I will restrict my discussion to four build tools which take different approaches (Redo, Ninja, Tup and Fabricate). Interestingly, one thing all four systems have in common is that they require a database of build data, in addition to the rules and the file system. Unlike Shake, all these build systems are limited to files.

The Redo build system has a similar dependency theory to Shake. Rules are run starting at the target. A rule may call redo-ifchange (similar to need) to ensure that this rule is repeated if any of the file arguments change. A rule can build either a specific named file, or a set of files ending with a particular extension.

While Redo has similarities to Shake, the practical implementation is significantly different. Instead of a single rule store, Redo stores each rule in a separate file, and the script language is simply shell script. The advantage of separate files is that Redo is able to depend on the actual rule used to build a result, meaning that build system changes are properly tracked. However, separating build rules makes it harder to reason about the build system, and eliminates many potential uses of abstraction. Redo does not work on Windows and has no support for multiple outputs.

The Ninja build system is designed as a two-stage build system -- users specify their build rules in a high-level manner, which is then translated to a set of Ninja build rules. As a result, the Ninja build system is not designed to be general purpose and configuration choices are expected to be resolved by the first level. The Ninja target language supports three dependency features beyond make. Firstly, a rule can depend on the list of files contained in another file, allowing additional dependencies at build time. Secondly, the command line for each rule is tracked, resulting in a rebuild if the rule itself changes. Thirdly, a rule can generate multiple outputs, which are properly tracked.

The Tup build system is designed as an incremental build system. Tup has a similar dependency structure to make, but a significantly different implementation. Instead of scanning all dependencies, it expects the operating system to supply a list of changed files, avoiding the overhead of checking which files have changed. For large build systems the result can be a significant speed improvement if there are only a few files to rebuild. We believe a similar implementation strategy could be applied to Shake.

Another difference from make is the treatment of dead build results. If a rule to build foo is deleted from the rule list, then Tup automatically deletes the file foo. The problem of dead build results is serious, resulting in builds succeeding that should have failed, and that will fail as soon as a clean build is performed (to reduce this risk, we suggest an overnight build which starts from scratch). However, it is often useful to have build modes which generate skeleton files which are then modified by the user -- deleting these files would be most unwelcome. It would be easy to add support for deleting dead build results to Shake, but we choose not to.

The key innovation in the Fabricate build system is that dependencies do not need to be stated explicitly. A build system is a Python program, which primarily executes system commands in order. While executing the commands, Fabricate uses system tracing (strace on Linux) to record which files are accessed. In future runs, if the same system command is reached but none of the files it used have changed, the command is skipped. The resulting build systems are simple, and avoid the difficulties of correctly specifying dependencies.

There are two inherent difficulties for build systems without explicit dependencies. Firstly, the system tracing mechanisms on different platforms are varied, and on Windows are somewhat fragile. Secondly, parallelism cannot be inferred automatically -- Fabricate requires explicit grouping annotations to use parallelism.

Saturday, February 11, 2012

Summary: I have just released Shake, a library that allows you to write build systems in Haskell (think Make, but much better).

At the Haskell Implementors Workshop 2010 I described a Haskell build system named Shake (video here), hoping an implementation would be available "soon". Better late than never, I am delighted to announce that Shake is now available on Hackage. This version is a from scratch reimplementation, based on an improved underlying theory, all completed in my spare time. Several users have already experimented with this version, so it is reasonably robust.

A Simple Example

As a simple example of a Shake build system, let us build the file result.tar from the files listed by result.txt:

We start by importing the modules defining both Shake and routines for manipulating FilePath values. We define main to call shake with the default shakeOptions. As the second argument to shake, we provide a set of rules. There are two common forms of rules, want to specify target files, and *> to define a rule which builds a file pattern. We use want to require that after the build completes the file result.tar should be ready.

The *.tar rule describes how to build files with the extension .tar, including result.tar. We readFileLines on result.txt, after changing the .tar extension to .txt. We read each line into the variable contents -- being a list of the files that should go into result.tar. Next, we depend (need) all the files in contents. If any of these files change, the rule will be repeated. Finally we call the tar program. If either result.txt changes, or any of the files listed by result.txt change, then result.tar will be rebuilt.

If you are interested in writing a build system using Shake, I suggest you read the documentation. One thing noted in the documentation is that if ghc --make or cabal is capable of building your project, use that instead. Custom build systems are necessary for many complex projects, but many projects are not complex.

As an example of how to do this task in Make, see this StackOverflow question. I'll cover the differences and similarities with other build tools in a forthcoming blog post.