Wednesday, December 25, 2013

Frustrations with namespaces in Clojure

I have a long laundry list of things about Clojure I find irritating. That shouldn’t be too surprising; I have a similar list for every language that I have used to a serious degree. Most of the things on my list are relatively minor gripes and, overall, Clojure remains one of the most enjoyable programming languages I’ve worked with. Namespaces, however, are one of the aspects of Clojure that cause me the greatest pain.

Namespaces are Clojure’s tool for preventing name collisions between files. The most common way to organize your files is with a one-namespace-per-file policy, where the namespace matches the folder-file hierarchy. For example puzzler.sudoku.grid would be the namespace for the grid.clj file in a puzzler/sudoku directory in the classpath. A namespace can require other namespaces, which ensures that the code in those namespaces is loaded and accessible.

It’s a simple enough scheme, but in practice, I find there are several problems with namespaces, as implemented in Clojure.

No circular dependencies

Most programs start off small. At the beginning, it’s pretty easy to break the program cleanly into a few files with a clear, acyclic dependency hierarchy. But then your program grows. And when that happens, Clojure’s inability to handle namespaces with circular dependencies will cause you pain, as you struggle to refactor your entire codebase.

Here are the most common scenarios where I’ve got bitten by this:

Namespace b depends on namespace a. I’m adding a function foo to a and realize that I would really benefit from using a function bar that already exists in b. What to do? One option is to move the function from b into a. If I do this, I have to make sure to also move over everything in b that bar uses in its implementation. Next, I need to check every one of my namespaces that are downstream from b and use any of the moved functions, and add a dependency on a, and update any references to b/bar (or other moved functions) to a/bar.

Another option is to try to figure out if there’s a way to move portions of code from b and a into a new namespace c, where both b and a will now depend on c. If you do this, you might be able to get away with moving only the underlying private helper functions. If you can avoid moving your public functions, that at least saves you the hassle of changing all the downstream consumers of the API to point to the new locations.

Keep in mind that none of the Clojure IDEs are sophisticated enough to help with intelligently handling these sorts of refactorings. The really sad part is that sometimes the hassle factor is so huge, I feel tempted to just copy and paste function bar over to a so it exists in both locations. It makes me want to curse at Clojure just for making me consider such a horrible thing.

Records and protocols cause tremendous problems with the circular dependency restriction. Naturally, your records and protocols are something you want to use throughout your project. So it makes sense to have a namespace such as myproject.types to put these common records and protocols in one place. For efficiency, it is common to implement some of the protocols inline in the record definitions. However, implementing the protocols can at times be very complex, so in those inline implementations, we want to be able to call helper functions that handle the complex implementation details. Namespaces are all about organization, so it’s reasonable to want to implement those helper functions in another namespace. But therein lies the problem. myproject.types depends on myproject.protocol-implementation-details which in turn depends on myproject.types (for example, if you want to implement push on a Stack you need to be able to return a new Stack, so you need that Stack record in scope).

A number of Clojure programmers have responded to this constraint by simply keeping the bulk of their code in one monolithic namespace. It is possible to split one namespace across multiple files, but then you’ve lost out on one of the desirable properties of namespaces — to know where to look for the definition of a function from the namespace/function reference.

No standard mechanism for re-providing required dependencies

The need for this manifests in several contexts:

Creating a public API relying on functions spread across other namespaces. The simplest technique is to just create new vars that refer to the ones in the other namespaces, for example, (def public-foo private/foo). The main problem is that public-foo metadata won’t match the metadata of private/foo.

The best library for dealing with this is potemkin, which handles the details of migrating the metdata over to the new var. This is an important enough issue, there should be a standard solution built into Clojure.

When dealing with a complex project built out of multiple libraries and namespaces, it’s easy to end up with massively long headers, listing out dozens of dependencies. We need some way to group related dependencies into something that can be conveniently required as a single entity.

In the early days of Clojure, Konrad Hinsen did some interesting work building some ns libraries supporting the notion of “namespace cloning” and other related ideas. I don’t know whether these tools are maintained these days, but again, it would be great to see a solution incorporated into Clojure.

In the discussion above about breaking up circular dependencies, I talked about how much of the pain revolves around needing to update all the references to a function that has been moved from one namespace to another. If there were a convenient way to leave behind a link to the new location, it wouldn’t be quite so painful. Again, potemkin is currently the way to deal with this.

No parameterized namespaces

My first two complaints shouldn’t come as any surprise to anyone who has built a complex project in Clojure. But this last point is more subtle.

To illustrate the point, let’s imagine your boss tasks you with writing a program to generate sudoku puzzles. As you code your sudoku generator, it’s obvious that certain numbers show up again and again. Remembering the rule-of-thumb “no magic constants”, you decide to create some definitions at the top of your namespace:

The problem here is that binding almost always leads towards more pain. Inevitably, you forget that, for example, generate-puzzles returns a lazy sequence, so when the sequence is realized, the binding no longer holds, and the vars revert back to their defaults, so you just get ten 9x9 puzzles.

binding simply isn’t a reliable way to parameterize your namespace, i.e., to customize your namespace’s “global variables” that all the functions rely upon.

How do other languages deal with this?

Classes can be thought of as a very richly-featured namespace. Even if you don’t use inheritance and mutable state, classes make great namespaces. In an OO language, the sudoku namespace could be a class, with section-size and symbols as properties/fields set by the class constructor. The functions in the class all “see” those fields.

This means that you can think of objects as special instances of this parameterized namespace. For example, (ab)using a mixture of java and clojure syntax, imagine:

There’s no good way to do this in Clojure. Really, your only option is to explicitly pass these parameters in and out of every function in your namespace. To do this, you’d need to change every single one of your functions to take an additional input of the form:

{section-size :section-size symbols :symbols:as grid-options}

and passing grid-options to every function call.

This also changes the calling convention for all downstream consumers of this API.

This is a disaster. The problem is that projects usually start off just like this, with a few “global definitions” that later turn into actual parameters as the project evolves. binding is not an adequate solution because it interacts poorly with laziness. Threading the values through the functions can be a major ordeal. We need some way to produce a namespace that is derived from another namespace with certain parameters set in a given way.

I’m not an ML expert, but I believe that ML functors are an example of parameterized namespaces in the functional language world. Perhaps Clojure could draw inspiration from this.

Summary

All of these issues have one thing in common: part of Clojure’s appeal is that it creates an environment where you can dive in and start creating without having every detail of your final design planned out. Clojure lets you start simply with small projects and evolve a more complex structure as necessary. However, the details of how namespaces work means that as your project evolves, I inevitably hit a point where new additions become painful to make due to extensive refactoring. I find that when this happens, I’m psychologically steered towards avoiding those new additions, and that’s no good.

In order to create an environment where projects scale more seamlessly from small to large, Clojure’s namespaces are desperately in need of richer features.