Wednesday, December 10, 2014

There are two stories I particularly like about the problem of
bootstrapping self-referential
tools. This one:

Another way to show that LISP was neater than Turing machines was
to write a universal LISP function and show that it is briefer and
more comprehensible than the description of a universal Turing
machine. This was the LISP function eval[e,a], which computes the
value of a LISP expression e - the second argument a being a list
of assignments of values to variables. (a is needed to make the
recursion work). Writing eval required inventing a notation
representing LISP functions as LISP data, and such a notation was
devised for the purposes of the paper with no thought that it
would be used to express LISP programs in practice. Logical
completeness required that the notation used to express functions
used as functional arguments be extended to provide for recursive
functions, and the LABEL notation was invented by Nathaniel
Rochester for that purpose. D.M.R. Park pointed out that LABEL was
logically unnecessary since the result could be achieved using
only LAMBDA - by a construction analogous to Church's Y-operator,
albeit in a more complicated way.

S.R. Russell noticed that eval could serve as an interpreter for
LISP, promptly hand coded it, and we now had a programming
language with an interpreter.

Then, one day, a student who had been left to sweep up after a
particularly unsuccessful party found himself reasoning in this way:
If, he thought to himself, such a machine is a virtual
impossibility, it must have finite improbability. So all I have to
do in order to make one is to work out how exactly improbable it is,
feed that figure into the finite improbability generator, give it a
fresh cup of really hot tea... and turn it on!

Rust is, I think, the
programming language I've been waiting for: it takes a large
proportion of the ideas I like from functional languages, applies
them in a systems programming language context with 0-overhead
abstractions, can run without a supporting runtime, and makes
"ownership" and "lifetime" into first-class concepts. In my work
trying to build highly complex and robust embedded systems, these
are exactly the features I've been waiting for, and
I'm thrilled to see a modern programming language take this
application space seriously.

Building the language on a second-class platform, though, was
intensely frustrating. The frustration came from several places,
including that my Unix systems-administration skills have gotten
(*ahem*) rusty, but I'd say the bulk of my difficulty came
from a single factor: circular dependencies in the tooling.

Building the Rust compiler requires a working Rust compiler.

Building the Cargo package-manager requires a working Cargo
package-manager.

These problems turned out to be ultimately surmountable:
The rustc compiler had been previously built for a
different version of FreeBSD, but once I re-compiled the required
libraries on my version, I managed to get the pre-built compiler
running. For the Cargo package manager, I ended up
writing a hacky version of Cargo
in Ruby, and using that to
bootstrap the "true" build. I'm glossing over it here, but this
turned out to be a lot of effort to create: Rust is evolving
rapidly, and it was difficult to track new versions
of Cargo (and its dependencies) that depended on new
versions of the Rust compiler, while
the Rust compiler
itself sometimes
did not work on my platform.

This was, necessarily, my first exposure to the platform in general,
and it was, unfortunately, not a positive experience for me.
Circular dependencies in developer tools are completely
understandable, but there should be a way to break the cycle. Making
bootstrap a first-order concern does not seem that difficult to me,
in these cases, and would greatly enhance the portability of the
tools (maybe even getting Rust to work on NetBSD, on GNU/Hurd, or on
other neglected platforms):

The dependency of the compiler on itself can be broken by
distributing the bootstrap compiler as LLVM IR code. Then use
LLVM's IR assembler, and other tools, to re-link the stage0 Rust
compiler on the target platform. The stage0 compiler could then be
used to build the stage1 compiler, and the rest of the build could
proceed as it does today. (Rust issue #19706)

The dependency of the package manager on itself can be broken by
adding a package manager target to, say, tar up the source code of
the package manager and its dependencies, along with a shell
script of build commands necessary to link all that source code
together to create the package manager executable. (Cargo issue #1026)

I am not suggesting that we should avoid circular dependencies in
developer tools: Eating our own dogfood is an extremely important
way in which our tools improve over time, and tools like
this should be capable of self-hosting. But the more
difficult it is to disentangle the circular dependency that results,
the more difficult it becomes to bootstrap our tools, which will
ultimately mean less people adopting our tools in new environments.
Most of us aren't
building Infinite
Improbability Drives. Bootstrapping should be a first-order
concern.