Better namespaces through module aliases

(OCaml 4.02 is entering a feature freeze, which makes it a good time to stop
and take a look at what to expect for this release. This is part of a series of
posts where I’ll describe the features that strike me as notable. This is part
2.)

OCaml has a bit of a namespace problem.

In particular, OCaml has no good way of organizing modules into packages. One
sign of the problem is that you can’t build an executable that has two modules
with the same module name. This is a pretty awkward restriction, and it gets
unworkable pretty fast as your codebase gets bigger.

Other than just prefixing all of your module names with a package name (e.g.,
Core_kernel_list, Core_kernel_int, Core_kernel_array, and so on, which gets old
fast), the only solution right now is something called packed modules. OCaml
can pack a collection of individual modules into a single synthetic “packed”
module. Importantly, different packs included in the same executable are allowed
to contain modules of the same name.

In practice, a packed module is a lot like what you’d get if you named all of
your modules distinctly and then used a single module to pack together all
your other modules, giving them shorter and more usable names in the process.
Thus, for Core_kernel, we could name all our modules uniquely and then
provide a single renaming module that lets people use those modules
conveniently.
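Below is a minimal sketch of such a renaming module. The contents of Core_kernel_list and Core_kernel_array are hypothetical. Both are written as nested modules only so the example fits in a single file. In a real package each would be its own compilation unit, and the renaming module would live in core_kernel.ml.

```ocaml
(* Hypothetical stand-ins for the separately compiled files
   core_kernel_list.ml and core_kernel_array.ml, written as nested
   modules only so that the sketch fits in one file. *)
module Core_kernel_list = struct
  let length = List.length
  let sum l = List.fold_left ( + ) 0 l
end

module Core_kernel_array = struct
  let length = Array.length
  let sum a = Array.fold_left ( + ) 0 a
end

(* The renaming module (core_kernel.ml in a real package): it gives each
   uniquely named module a short, convenient name. *)
module Core_kernel = struct
  module List = Core_kernel_list
  module Array = Core_kernel_array
end

(* Client code: after opening Core_kernel, List and Array refer to
   Core_kernel's modules, not the ones that ship with the compiler. *)
open Core_kernel

let total = List.sum [ 1; 2; 3 ] + Array.sum [| 4; 5 |]
```

Here List.sum resolves to Core_kernel_list.sum, a function the compiler's own List module does not provide.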

With a module like this, List (once Core_kernel is opened) refers to
Core_kernel’s list, not the List module that ships with the compiler. The
longer names would show up only within the Core_kernel package.

Packed modules basically automate this process for you, with the one improvement
that you get to use the short names within the package you’re building as well as
outside of it.
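Conceptually, a pack acts like one compilation unit whose submodules keep the short names of the files they came from. The sketch below models that with nested modules. The Core_kernel contents are hypothetical. In a real build the compiler produces the pack from separate files, and nobody writes it by hand.

```ocaml
(* A conceptual model of a pack: a single unit, Core_kernel, whose
   submodules carry short names (as if from list.ml and array.ml).
   The function bodies are hypothetical placeholders. *)
module Core_kernel = struct
  module List = struct
    (* At this point List still means the compiler's List, because the
       module being defined is not yet in scope inside its own body. *)
    let sum l = List.fold_left ( + ) 0 l
  end

  module Array = struct
    let sum a = Array.fold_left ( + ) 0 a

    (* Inside the package, the sibling module is reached by its short name. *)
    let sum_of_list l = List.sum l
  end
end

(* Outside the package, the same short names are reached through the pack. *)
let total = Core_kernel.List.sum [ 1; 2 ] + Core_kernel.Array.sum_of_list [ 3 ]
```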

We use packed modules extensively at Jane Street, and they’ve been a real help
in organizing our large and complex codebase. But packs turn out to be highly
problematic. In particular, they lead to three distinct problems.

- slow compilation of individual files
- large executable sizes
- coarse dependency tracking, leading to slow incremental rebuilds

The slow compilation of individual files comes from the cost of interacting with
a large module like Core_kernel. Core_kernel is large because it effectively
contains a full copy of every module in the Core_kernel package. That’s
because a line like this:

module List = Core_kernel_list

doesn’t simply make Core_kernel.List an alias to Core_kernel_list; it makes
a full copy of the module. Indeed, the above line is equivalent to the
following.

module List = struct include Core_kernel_list end

Packed modules also increase your executable size, since OCaml links code at
the granularity of whole compilation units. Because a packed module is a single
compilation unit, referring to even one module of Core_kernel requires you to
link all of Core_kernel into your executable.

The coarse dependency problem has to do with the fact that a packed module
depends on all the modules that are included in it, and so once you depend on
anything in the pack, you depend on everything there. For us, that means that
changing a single line of the most obscure module in Core_kernel will cause us
to have to rebuild essentially our entire tree.

Module aliases, along with a few related improvements to the compiler, let us
work around all of these problems. In particular, in 4.02, the following
statement

module List = Core_kernel_list

is in fact an alias rather than a copy. This means that opening Core_kernel
would only introduce a bunch of aliases, which does not require a lot of work
from the compiler.
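Here is a small sketch placing the two forms side by side. Core_kernel_list is a hypothetical stand-in, and List_copy and List_alias are illustrative names. Callers see the same behavior from both. The difference described above is in what the compiler has to carry around. The copy gets a full signature of its own, while the alias just records which module it stands for.

```ocaml
(* Hypothetical stand-in for the compilation unit core_kernel_list.ml. *)
module Core_kernel_list = struct
  let sum l = List.fold_left ( + ) 0 l
end

(* The copying form: a new structure that includes every component of
   Core_kernel_list, with its own full signature. *)
module List_copy = struct
  include Core_kernel_list
end

(* A binding to a bare module path, which 4.02 treats as an alias:
   List_alias simply stands for Core_kernel_list. *)
module List_alias = Core_kernel_list

(* Either way, client code calls the same function. *)
let via_copy = List_copy.sum [ 1; 2; 3 ]
let via_alias = List_alias.sum [ 1; 2; 3 ]
```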

Executable size will be improved because we’ll be able to move to having a
package be structured as a module containing a set of aliases, rather than as a
pack. That means we no longer have a single large compilation unit for the
entire package, and so, using some improved dependency handling in the compiler,
we can link in only the modules that we actually use.

Finally, the dependency-choke-point problem will be fixed by having a tighter
understanding of dependencies. In particular, the fact that I depend on
Core_kernel, which contains a collection of aliases to many other modules like
Core_kernel_list or Core_kernel_array, doesn’t mean I truly depend on all
those modules. So if I don’t use (and so don’t link in) Core_kernel_array,
then I don’t need to recompile when Core_kernel_array changes.

Module aliases have other uses, in particular having to do with changes to the
semantics of functors. But for us, the changes to compilation speed and
executable size are the big story.