Batteries contributors recently started a new discussion on the Enum module, whose purpose is to provide an good intermediate representation for conversion between data-structures, but whose implementation is perceived as too complicated and not well-understood enough. Going back to fundamentals (forgetting about half the features of Enum), what is the ultimate intermediate traversal interface?

There are two obvious choices in an effectful setting: a generator, that is a "next-element" function of type (unit -> 'a) that generates the next element each time it is called, or an exception to signal end of input; or an iterator, a fold-function of type ('a -> unit) -> unit, that iterates a consumer function through all elements of the list.

Both are suitable as an abstract representation of any sequence (the second is precisely the basis for Simon Cruanes' Sequence library), and their are at opposite ends of the control spectrum: generators give control to the consumer of the sequence (she decides when to call the generator function), while iterators give control to the producer of the sequence (she decides when to call the iterated function). It's easy to transform a structure into a iter (to_iter), or to build a structure from a next (of_next), because you have the control. But as a library writer, you should also write the without-control conversions (of_iter and to_next), despite them being slower and harder to implement, so that your users can live the easy in-control life!

Finally, continuation capture is a well-known technique to invert control, letting you write code in "direct style", as if you were in control, when you're not. In this post, I will demonstrate how you can start from a conversion function that has control and, by systematic source-to-source transformations of conversion to continuation-passing-style (CPS) and defunctionalization, obtain a conversion function that works without the control.

Note that there is no support for predicting the size of the data structure: those will not be good interfaces for fixed-size structures such as array. We will use lists as a warm-ups, and then concentrate on binary trees as a structure with more complicated traversal and construction patterns.

Lists are no surprise. Both consumers, with or without control, are extremely simple to write. For producers, we would argue that to_iter is more convenient to express, and more elegant, than to_gen, but the difference is again small.

I did a bit of shallow performance testing, and without surprise the version with control was always faster than the version without control. I also tested the compositions of_gen (to_gen li) and of_seq (to_seq li) and the gen versions, with the simpler types, seems slightly faster -- but it's hard to get anything meaningful with code so simple.

A generator for binary trees

The real meat of this post is the work on binary trees. Iterating on binary trees is easy, and will serve a good example of continuation-passing-style (CPS) transform. Building trees from sequences is more subtle, and will make for a more sophisticated application of the same principles.

We use binary trees with data on the nodes, and empty leaves.

type 'a tree = Leaf | Nodeof 'a tree * 'a * 'a tree

Do you know the trick to create a tree of O(2^n) nodes in O(n) time? It's useless for this post, but I've been using it for performance evaluation, so I'll leave it here.

We wish to get a generator from a tree (to_gen), starting from the code of the to_iter function. iter has the control, it decides when to call the function f. A single (sub-)call to the iter function may call f one time, several times (recursively), or not at all. On the contrary, the difficulty in writing a generator is to understand how to advance precisely to the next element of the data-structure, one step at the time, when the user calls the generator.

Our first transformation will be to turn iter into continuation-passing-style. We have already covered this transformation in a previous blog series, as a tool to get tail-recursive implementations of any recursive functions, trading stack space for heap-allocated closures. The idea is to capture the "context" (what will be done to its return value) of each recursive call as an additional parameter of the recursive function.

Another way to see this transformation is to consider that we abstract on the "return address" of the function; we may call the continuation parameter return, and the function then reads | Leaf -> return (), etc.

The next transformation step is to remove the higher-order aspect associated to our representation of continuation as functions. We could in fact stop here, and derive a generator from this iterator in CPS, but using a first-order data-centric representation will give us an algorithm that is closer to the algorithm that people tend to write naturally -- and will also be a bit more efficient, but I won't insist on performance considerations.

The idea of defunctionalization is to collect all the different lambda-terms that are used in the function, and give each of them a name (a symbol, a piece of data). Instead of the lambda-term, we will only pass the name as the k parameter, and have an additional run function that, from the name, execute the relevant code (that was in the lambda-term).

In our code examples there are two lambda-terms: (fun () -> f x; iter_cps f right k) and (fun result -> result). We will call the first Left, as it marks the fact that we have already iterated on the left of the current tree, and the second one Stop as it marks the end of the computation, the last continuation ever passed, to get the final result. But the Left lambda-term does not only takes a parameter, it also captures the variable x, right and k that are in its scope, and need to be saved for when the continuation will be executed. So Left will be a constructor with three parameters, one for each of those variables:

The new function run takes a data representation of continuation, and runs the corresponding code. I want to emphasize that this transformation again requires no deep thinking, it is only a systematic source-to-source transformation.

You may have noticed that the type of our reified continuations, 'a kiter, looks very much like our 'a tree datatype. In fact, it is isomorphic to it (if we assume 'a kiter ≃ 'a tree, then Left looks very much like Node), so we can rewrite iter_cps_defun with no specific continuation type, using tree to store both data and continuations:

If you squint your eyes a bit, what you have in front of you is a purely functional realization of a well-known traversal technique called pointer reversal. We call iter with a pointer k to "the rest of the work to be done", and when we descend in the left subtree, we replace k with a version of the parent tree t where the left child points not to left anymore, but to k (the parent of t). Of course, pointer reversal are useful for in-place traversals, so this realization is of little practical interest for an persistent immutable tree type where the nodes are copied anyway. But it's still nice to see how, by systematic application of general applications, we reach a place that is already well-known.

Finally, this implementation of the traversal function is low-level enough to be turned into a generator function. The idea is that instead of passing the k argument recursively in an iterator, we can store it in a reference cell between calls to a generator.

This is the same code as before, except that instead of calling f x (that's what you do when you have the control), we store the current state of the traversal, and return x. The CPS transform is precisely what exposed this notion of "state of the traversal" in a way that is convenient to capture through calls.

So there is a cost to pay when inverting control, but in this case it is quite low. Remember than in general your running time will be dominated by what you want to do on each element of a structure, not its traversal itself, so measuring traversals only distort performance results. A 15% difference in a micro-benchmark means that users would most probably not notice the change.

Finally, I will remark that it is possible to revert control by using an off-the-shelf delimited continuation library. This is more general and modular, but may also be noticeably slower. I have tried doing that with Oleg's delightful Delimcc library, that is really not invasive (not requiring any change to the OCaml compiler or runtime); the code is funny (and possibly wrong, I'm not a Delimcc expert), but not competitive performance-wise because the continuation-capture operations are heavier, and not meant to be used at this granularity level.

This implements to_gen (that doesn't have control) in terms of to_seq (that requires control). The idea is that the iteration function that we feed to to_seq, when called, captures the current continuation, that is the "rest of the traversal", and returns it along with the element that was passed. Each time our generator is called, it reinstates the continuation to keep computing until the next element, where it gets suspended again.

Note that this definition is not at all specific to trees, it could be defined generically. Furthermore, we can use the exact same technique to implement of_seq in terms of of_gen.

This code snippet must be compiled with -rectypes because the type of the captured continuation next is equi-recursive: it represents the rest of a computation that, when called, returns a reified value representing the rest of a computation that... It would be possible to look at this recursive type in the eye and define an explicit recursive algebraic datatype for it, to get rid of -rectypes, but I was too lazy to do that.

Filling a tree

That was the two conversion functions from a tree, to_iter and to_gen. What about building a tree from iterator or generators?

First, there is a non-trivial question: how do you build a tree from a sequence? If you had a balanced search tree, you would have an add function and would most probably build the tree by folding over the sequence elements, add-ing one element at a time. But I want something that looks simpler (than maintaining search invariant), but that I actually found harder: fill the tree from left to right.

More precisely, I expect that giving a number of elements that is a power of two would return me a complete tree, with all leaves at the same heights, and that giving less elements would return me a tree with a complete left sub-tree, and a partially-filled right subtree.

Here is the implementation I got after a bit of tinkering. There may be other ways to implement this, and I'm open to suggestions of simpler approaches.

To get a version without control (of_iter), we should again turn these function into CPS form, and then defunctionalize the result. The CPS version is unsurprising; you don't need to read all the code, only look at the continuations following recursive calls.

Something interesting happens when doing defunctionalization. Currently, fill and fill_right are mutually recursive, and loop is defined separately afterwards. But when doing defunctionalization, I will represent all the lambda-abstractions of the module with a single type, and interpret them through a single function run which will be called whenever a continuation is invoked (you can think of run as an explicit marker for the application of a function, represented as first-order data, to its argument(s)).

This means that loop, where a continuation is applied in the if finished then ... branch, will need to call run, which will also be called from still and still_right. All four functions have to be defined as a single mutually recursive set of definitions. In other words, defunctionalization is not a modular transform, it must be defined at once as a "whole-program" transformation, operating on all functions sharing a common calling protocol.

From there, we will be able to derive our control-less conversion of_iter. There is a last sophistication: in the to_gen case, we only needed to call one function to get the next element, namely iter. Here, we must start by calling loop (to begin the construction of the tree), but the computation will need to be suspended after the next element is accessed (try Some (f ()) ...), and restart from there, where fill is called. So our mutable state will store not only the continuation argument to be passed to the next call, but also which function needs to be called next, loop or fill.

The amount of code may be a bit intimidating, but it is really only the previous version of the code, transformed to be called with a single element at a time (we still call it f to avoid naming conflict with already-used x), or None when the input ends. By careful transformations we have successfully turned a generator-taking function, that has control, into an iterator-taking function that doesn't have control.

There is a rather important amount of book-keeping involved in this turned-around definition, so it is no surprise that of_iter is substantially slower than of_gen. It seems clear to me, however, that both should be included in a library. Only providing the controlling of_gen means that the library user will have to code without control herself, leading to complicated code on her side (which is what your job, as a library writer, is meant to avoid), and less efficient code.

One way to improve performances for the without-control version would be to introduce buffering: instead of suspending computation after each element is obtained, we could run fill to consume the next N elements before the costly suspension happens.

This blog post is getting rather long so I'll keep this as a possible idea for the next, more advanced one. I'm not sure I will implement buffering, because I already have two more advanced remarks to make on the fill code and its CPS versions. One is how the intermediate CPS version suggests a different representation with multiple return points, and the other is how we can use GADTs to avoid the rather inelegant None -> assert false at the end of of_iter. Stay tuned!

All the code I've written so far is available on gitorious. It uses DelimCC, -rectypes and OCaml 4.00 (for the GADTs I just mentioned), but all these can be commented out, and the main body of the code is rather straightforward OCaml that I expect to compile and run anywhere.

Finally, if you are looking for research literature relevant to the techniques mentioned here, Olivier Danvy is the established master of systematic source-to-source transformations. He famously remarked, for example, how CPS-transforming then defunctionalizing a natural interpreter for a functional language gets you a corresponding abstract machine implementation of the language.

Defunctionalization is a program transformation that eliminates functions as first-class values. We show that defunctionalization can be viewed as a type-preserving transformation of an extension of with guarded algebraic data types into itself. We also suggest that defunctionalization is an instance of concretization, a more general technique that allows eliminating constructs other than functions. We illustrate this point by presenting two new type-preserving transformations that can be viewed as instances of concretization. One eliminates Rémy-style polymorphic records; the other eliminates the dictionary records introduced by the standard compilation scheme for Haskell's type classes.