As rustpkg is still in its infancy, most Rust code tends to be built with
make, other tools, or by hand. I've been working on updating Servo's build
system to something a bit faster and more reliable, and so I've been giving a
lot of thought to build tooling for Rust.

In this post, I want to cover what the current issues are with building Rust
code, especially with regard to external tooling. I'll also describe some
recent work I did to address these issues. In the future, I want to cover
specific ways to integrate Rust with a few different build tools.

Current Issues

Building Rust with existing build tools is a little difficult at the
moment. The main issues are related to Rust's attempt to be a better systems
language than the existing options.

For example, Rust uses a larger compilation unit than C and C++ compilers, and
existing build tools are designed around single-file compilation. Rust
libraries are output with unpredictable names. And dependency information must
be maintained by hand.

Compilation Unit

Many programming languages compile one source file to one output file and then
collect the results into some final product. In C, you compile .c files to
.o files, then archive or link them into .lib, .a, .dylib, and so on
depending on the platform and whether you are building an executable, static
library, or shared library. Even Java compiles .java inputs to one or more
.class outputs, which are then normally packaged into a .jar.

In Rust, the unit of compilation is the crate, which is a collection of
modules and items. A crate may consist of a single source file or an arbitrary
number of them in some directory hierarchy, but its output is a single
executable or library.

Using crates as the compilation unit makes sense from a compiler point of
view, as the compiler has more knowledge to work from during compilation. It
also makes sense from a versioning point of view, as all of the crate's
contents go together. Using crates as the compilation unit allows for cyclic
dependencies between modules in the same crate, which is useful for expressing
some things. It also means that separate declaration and implementation pieces,
such as the header files in C and C++, are not needed.

Most build tools assume a model similar to that of a typical C compiler. For
example, make has pattern rules that map an input to an output based on
filename transformations. These work great if one input produces one
output, but they don't work well in other cases.

Rust still has a main input file, the one you pass to the compiler, so this
difference doesn't have a lot of ramifications when using existing build
tools.

Output Names

Compilers generally have an option for what to name their output files, or
else they derive the output name with some simple formula. C compilers use the
-o option to name the output; Java just names the files after the classes
they contain. Rust also has a -o option, which works like you expect, except
in the case of libraries where it is ignored.

Libraries in Rust are special in order to avoid naming collisions. Since
libraries often end up stored centrally, only one library can have a given
name. If I create a library called libgeom it will conflict with someone
else's libgeom. Operating systems and distributions end up resolving these
conflicts by changing the names slightly, but it's a huge annoyance. To avoid
collisions, Rust includes a unique identifier called the crate hash in the
name. Now my Rust library libgeom-f32ab99 doesn't conflict with
libgeom-00a9edc.

Unfortunately, the current Rust compiler computes the crate hash by hashing
the link metadata, such as name and version, along with the link metadata of
its dependencies. This results in a crate hash that only the Rust compiler is
realistically able to compute, making it seem pseudo-random. This causes a
huge problem for build tooling, as the output filename for libraries is
unknown.

To work around this problem when using make, the Rust and Servo build systems
use a dummy target called libfoo.dummy for a library called foo. After
running rustc to build the library, the rule creates the libfoo.dummy file so
that make has some well-known output to reason about. This workaround is a bit
messy and pollutes the build files.

Here's an example of what a Makefile looks like with this .dummy workaround:

While this works, it also has some drawbacks. For example, if you edit a file
during a long compile, the libfoo.dummy will get updated after the compile
is finished, and rerunning the build won't detect any changes. The timestamp
of the input file will be older than the final output file that the build tool
is checking. If the build system knew the real output file name, it could
compare the correct timestamps, but that information has been locked inside
the Rust compiler.
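
To make this failure mode concrete, here is a minimal sketch in Python of the
timestamp comparison a make-like tool performs, replayed with fixed
timestamps. The function names and the specific times are made up for
illustration; real build tools differ in the details.

```python
import os
import tempfile


def needs_rebuild(target, inputs):
    """Make-style check: rebuild if target is missing or older than any input."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(path) > target_mtime for path in inputs)


def simulate_edit_during_compile():
    """Replay the race with explicit timestamps (the values are arbitrary)."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "lib.rs")
        stamp = os.path.join(tmp, "libfoo.dummy")
        for path in (source, stamp):
            open(path, "w").close()
        # The compile starts at t=100 and reads lib.rs; lib.rs is edited at t=105.
        os.utime(source, (105, 105))
        # The compile finishes and the stamp file is touched at t=110.
        os.utime(stamp, (110, 110))
        # The stamp is newer than the edit, so the check reports "up to date".
        return needs_rebuild(stamp, [source])
```

In this replay the check reports that nothing needs rebuilding, even though
the library on disk was built from the old contents of lib.rs.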

Dependency Information

Build systems need to be reliable. When you edit a file, it should trigger the
correct things to get rebuilt. If nothing changes, nothing should get
rebuilt. It's extremely frustrating if you edit a file, rebuild the library,
and find that your code changes aren't reflected in the new output for some
reason or that the library is not rebuilt at all. Reliable builds need
accurate dependency information in order to accomplish this.

There's currently no way for external build tools to get dependency
information about Rust crates. This means that developers tend to list
dependencies by hand, which is pretty fragile.

One quick way to approximate dependency info is just to recursively find every
*.rs in the crate's source directory. This can be wrong for multiple reasons:
the include! or include_str! macros may be used to pull in files that
aren't named *.rs, or conditional compilation may omit several files.
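
As a rough illustration, the approximation might look like the following
Python sketch. The function name is hypothetical, and the result is only a
guess at the crate's real inputs for the reasons given above.

```python
from pathlib import Path


def approximate_crate_inputs(crate_dir):
    """Return every *.rs file under crate_dir, sorted for stable output."""
    return sorted(str(path) for path in Path(crate_dir).rglob("*.rs"))
```

A file pulled in with include_str! that has some other extension never shows
up in this list, so edits to it would not trigger a rebuild.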

This is similar to dealing with header dependencies by hand when working with
C and C++ code. C compilers have options to generate dependency info to deal
with this, which are used by tools like CMake.

The price of inaccurate or missing dependency info is an unreliable build and
a frustrated developer. If you find yourself reaching for make clean, you're
probably suffering from this.

Making It Better

It's possible to solve these problems without sacrificing the things we want
and falling back to doing exactly what C compilers do. By making the output
file knowable and handling dependencies automatically, we make build tool
integration easy and the resulting builds reliable. This is exactly what I've
been working on for the last few weeks.

Stable and Computable Hashes

The first thing we need is to make the crate hash stable and easily computable
by external tools. Internally, the Rust compiler uses
SipHash to compute the crate hash, and takes
into account arbitrary link metadata as well as the link metadata of its
dependencies. SipHash is not something easily computed from a Makefile and
the link metadata is not so easy to slurp and normalize from some dependency
graph.

I've just landed a pull request
that replaces the link metadata with a package identifier, which is a crate
level attribute called pkgid. You declare it like
#[pkgid="github.com/mozilla-servo/rust-geom#0.1"]; at the top of your
lib.rs. The first part, github.com/mozilla-servo, is a path, which serves
as both a namespace for your crate and a location hint as to where it can be
obtained (for use by rustpkg for example). Then comes the crate's name,
rust-geom. Following that is the version identifier 0.1. If no pkgid
attribute is provided, one is inferred with an empty path, a 0.0 version, and
a name based on the name of the input file.
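
The three parts are easy to pull apart outside the compiler. Here is a minimal
Python sketch; the PkgId tuple and parse_pkgid are illustrative names, not
anything rustc provides, and defaulting a missing version to 0.0 is my
assumption based on the inferred-pkgid case.

```python
from collections import namedtuple

# Hypothetical container for the three parts described above.
PkgId = namedtuple("PkgId", ["path", "name", "version"])


def parse_pkgid(pkgid):
    """Split 'path/name#version' into path, name, and version."""
    rest, sep, version = pkgid.partition("#")
    if not sep:
        # Assumption: no '#version' suffix means the 0.0 default applies.
        version = "0.0"
    # Everything before the last '/' is the path; the remainder is the name.
    path, _, name = rest.rpartition("/")
    return PkgId(path, name, version)
```

For the example above, this yields the path github.com/mozilla-servo, the
name rust-geom, and the version 0.1.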

To generate a crate hash, we take the SHA256 digest of the pkgid
attribute. SHA256 is readily available in most languages or on the command
line, and the pkgid attribute is very easy to find by running a regular
expression over the main input file. The first eight digits of this hash are
used for the filename, but the full hash is stored in the crate metadata and
used as part of the symbol hashes.
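
An external tool could follow the same recipe along these lines. This is a
sketch under assumptions: the regular expression is deliberately simple, the
function names are mine, and I'm assuming the digest is taken over the UTF-8
bytes of the pkgid string exactly as written, which may not match rustc
byte for byte.

```python
import hashlib
import re

# Matches a crate-level attribute such as #[pkgid="..."];
PKGID_RE = re.compile(r'#\[\s*pkgid\s*=\s*"([^"]*)"\s*\]')


def find_pkgid(source_text):
    """Return the pkgid string from the main input file, or None if absent."""
    match = PKGID_RE.search(source_text)
    return match.group(1) if match else None


def short_crate_hash(pkgid):
    """First eight hex digits of the SHA256 digest of the pkgid string."""
    # Assumption: the hashed input is the pkgid text encoded as UTF-8.
    digest = hashlib.sha256(pkgid.encode("utf-8")).hexdigest()
    return digest[:8]
```

A build script would read lib.rs, call find_pkgid on its contents, and pass
the result to short_crate_hash to predict the hash part of the library name.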

Since the crate hash no longer depends on the crate's dependencies, it is
stable so long as the pkgid attribute doesn't change. Such changes should
happen very infrequently, for instance when the library changes versions.

This makes the crate hash computable by pretty much any build tool you can
find, and means rustc generates predictable output filenames for libraries.

Dependency Management

I've also got a pull request,
which should land soon, to enable rustc to output make-compatible dependency
information similar to the -MMD flag of gcc. To use it, you give rustc the
--dep-info option and for an input file of lib.rs it will create a lib.d
which can be used by make or other tools to learn the true dependencies.

Now make will notice when you change any of the .rs files without you needing
to list them explicitly, and the dependency list is updated automatically as
your code changes. A little Makefile abstraction on top of this can make it
quite nice and portable.
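
Tools other than make can read the same file. Below is a minimal Python
sketch of a parser for simple make-style rules of the form
"target: prerequisites". The function name is hypothetical, and I'm assuming
the generated file contains only plain rules with optional backslash line
continuations; it does not handle escaped spaces or colons inside paths.

```python
def parse_dep_info(text):
    """Map each target in a make-style rule to its list of prerequisites."""
    # Fold backslash line continuations into a single logical line.
    joined = text.replace("\\\n", " ")
    deps = {}
    for line in joined.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not a rule.
        if not line or line.startswith("#") or ":" not in line:
            continue
        target, _, prereqs = line.partition(":")
        deps.setdefault(target.strip(), []).extend(prereqs.split())
    return deps
```

The returned dictionary can then feed whatever staleness check the build tool
already uses.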

Next Up

In the next few posts, I'll show examples of integrating the improved Rust
compiler with some existing build systems like make,
CMake, and tup.

About the author

I'm a hacker and entrepreneur based in Albuquerque, New
Mexico. I have founded several startups built on XMPP
technology including Collecta, a real-time search engine for
the Web, and Chesspark, a real-time, multi-user gaming
platform. You can learn more about me on the about page.