ASM.OCaml

As you may know, there is a subset of JavaScript that compiles efficiently to assembly and is used as a backend by various compilers, including C compilers such as Emscripten. In the same spirit, we would like to show you how never to allocate in OCaml.

Before writing anything, we must know how to tell whether a piece of code allocates. Currently, the best way is to look at the Cmm intermediate representation, which we can see by calling `ocamlopt` with the `-dcmm` option:
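As a minimal sketch, assume a file `test.ml` containing the one-line function below. This is a reconstruction that matches the Cmm discussed in this post, not necessarily the exact original file. Compiling it with `ocamlopt -c -dcmm test.ml` prints the Cmm for `f`.

```ocaml
(* test.ml: hypothetical example file.
   Building the pair (x, x) needs a heap block of two fields,
   so the Cmm printed for f is expected to contain an `alloc`.
   Inspect it with: ocamlopt -c -dcmm test.ml *)
let f x = (x, x)
```

The raw output uses generated names such as `camlTest__f_4` and `x_6/1204`; the exact suffixes depend on the compiler version.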

To improve readability, in this post we will clean up the variable names a bit:

(function f (x: val) (alloc 2048 x x))

We see that the function f (named `camlTest__f_4` in the raw output) calls the `alloc` primitive, which obviously is an allocation. Here, it creates a block of size 2 with tag 0 (2048 = (2 << 10) + 0) that contains the value `x_6/1204` twice, which was `x` in the source. So we can detect whether some code allocates by running `ocamlopt -c -dcmm test.ml 2>&1 | grep alloc` (obviously, any function or variable with alloc in its name will also show up).

It is possible to write code that doesn’t allocate (in the heap) at all, but what are the limitations? For instance, the omnipresent Fibonacci function does not allocate:
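A minimal sketch of such a function is given here, written in the usual naive recursive way. It only manipulates immediate integers, so no heap block is needed. The claim can be checked with `ocamlopt -c -dcmm` and `grep alloc`.

```ocaml
(* Naive Fibonacci: only immediate ints are involved,
   so nothing has to be allocated on the heap. *)
let rec fib = function
  | 0 -> 0
  | 1 -> 1
  | n -> fib (n - 1) + fib (n - 2)
```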

Local references do not necessarily allocate either. In the Cmm of a loop that accumulates its result in a local reference, you can notice that the allocation of the reference has disappeared: the modifications are replaced by assignments (the `assign` operator) to the result variable. This transformation can happen when a reference is never used anywhere other than as an argument of the `!` and `:=` operators and does not appear in the closure of any local function.
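The two cases can be sketched as follows. The function names are ours, chosen for illustration. In the first function the reference is only read with `!` and written with `:=`, so it can become a plain mutable variable. In the second it is captured by a closure passed to `List.iter`, so without further inlining it typically stays a real heap block.

```ocaml
(* The reference never escapes: it is only used through ! and :=,
   so the compiler can replace it with an assignable local variable. *)
let fact n =
  let result = ref 1 in
  for i = 1 to n do
    result := i * !result
  done;
  !result

(* Counter-example: the reference is captured by the anonymous function
   given to List.iter, so it appears in a closure and is expected to be
   allocated (along with the closure itself). *)
let product l =
  let result = ref 1 in
  List.iter (fun v -> result := v * !result) l;
  !result
```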

Unboxing

The float, int32, int64 and nativeint types do not fit in the generic representation of values that can be stored in the OCaml heap, so they are boxed. This means that they are allocated, with an annotation telling the garbage collector to skip their content. So using them will, in general, allocate. But an important optimization is that local uses (cases where the value obviously won’t go to the heap) are ‘unboxed’, i.e. not allocated.
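As a small illustration (the function name is hypothetical), the intermediate floats below are purely local, so they are candidates for unboxing. Only the returned float is expected to be boxed, unless the call gets inlined into a context that consumes it directly.

```ocaml
(* xx and yy are local float values: they can live in registers
   instead of being allocated as boxed floats. *)
let squared_norm (x : float) (y : float) =
  let xx = x *. x in
  let yy = y *. y in
  xx +. yy
```

Checking the Cmm with `-dcmm` is the way to confirm what a given compiler version actually does here.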

If/match couple

Some changes in 4.03 also improve some cases of branches returning tuples.
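The pattern in question looks like the sketch below, where both branches of an `if` build a tuple that is immediately destructured. The function name is ours, and whether the tuple is really eliminated depends on the compiler version, so treat this as something to verify with `-dcmm` rather than a guarantee.

```ocaml
(* Both branches return a pair that is destructured right away.
   The pair never needs to exist as a heap block if the compiler
   binds lo and hi directly in each branch. *)
let spread a b =
  let (lo, hi) = if a <= b then (a, b) else (b, a) in
  hi - lo
```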

Control flow

You can do almost any control flow like that, but this is quite
impractical and is still limited in many ways.

If you don’t want to write everything as for and while loops, you can
write functions for your control flow, but to prevent allocation you
will have to refrain from doing a few things. For instance, you should
not pass a record or a tuple as an argument to a function; instead,
pass each field separately as a different argument.
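A minimal sketch of the difference, using a hypothetical two-field state: the first loop rebuilds a record at every iteration, which typically allocates, while the second passes the two fields as separate arguments.

```ocaml
(* Hypothetical state type, for illustration only. *)
type state = { x : int; y : int }

(* A fresh record is built for each recursive call: one block per step. *)
let rec advance_record n s =
  if n = 0 then s.x + s.y
  else advance_record (n - 1) { x = s.x + 1; y = s.y + 2 }

(* Same loop with each field passed as its own argument:
   only immediate ints are passed around. *)
let rec advance n x y =
  if n = 0 then x + y
  else advance (n - 1) (x + 1) (y + 2)
```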

But what happens when you want to return multiple values? There is
some ongoing work to optimise the allocations away in some of
those cases, but currently you can’t. Really? No!

Returning multiple values

If you bend your mind a bit, you may see that returning from a
function is almost the same thing as calling one… or you can make it
that way. So let’s transform our code into ‘Continuation Passing Style’.

For instance, let’s write a function that finds the minimum and the maximum of a list. That could be written like this:
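A plausible direct-style version is sketched here, reusing the names `fold_left`, `keep_min_max` and `find_min_max` that appear in the rest of this post. The bodies are our reconstruction. Note that `keep_min_max` takes and returns a pair, which is where the allocation comes from.

```ocaml
(* A local fold_left: f is called first (not a tail call),
   then fold_left recurses on the rest of the list. *)
let rec fold_left f init l =
  match l with
  | [] -> init
  | h :: t ->
    let acc = f init h in
    fold_left f acc t

(* Returns an updated (min, max) pair: a new tuple for each element. *)
let keep_min_max (min, max) v =
  let min = if v < min then v else min in
  let max = if v > max then v else max in
  (min, max)

let find_min_max l =
  match l with
  | [] -> invalid_arg "find_min_max"
  | h :: t -> fold_left keep_min_max (h, h) t
```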

Continuation Passing Style

Transforming code to continuation passing style (CPS) replaces every function return by a tail call to a function representing ‘what happens after’. This function is usually called a continuation, and by convention it is bound to the variable name ‘k’.

Let’s start simply by turning only the keep_min_max function into continuation passing style.
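A sketch of this first step, under the assumption that the direct-style code is a simple `fold_left` over pairs: `keep_min_max` takes an extra continuation `k` and calls it instead of returning, and `fold_left` builds that continuation.

```ocaml
(* fold_left no longer uses the result of f: it hands f a continuation k
   that resumes the fold on the rest of the list. *)
let rec fold_left f init l =
  match l with
  | [] -> init
  | h :: t ->
    let k acc = fold_left f acc t in
    f init h k

(* Instead of returning the pair, pass it to the continuation. *)
let keep_min_max (min, max) v k =
  let min = if v < min then v else min in
  let max = if v > max then v else max in
  k (min, max)

let find_min_max l =
  match l with
  | [] -> invalid_arg "find_min_max"
  | h :: t -> fold_left keep_min_max (h, h) t
```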

Here, instead of calling f and then recursively calling fold_left, we prepare what we will do after calling f (that is, calling fold_left) and then we call f with that continuation. find_min_max is unchanged and still has the same type.

But we can continue turning things into CPS, and a full conversion would result in:
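A sketch of what the full conversion could look like, with every function taking a final continuation `k`. The names `fold_left_k` and `keep_min_max_k` are the ones used in this post; `find_min_max_k` and the bodies are our reconstruction.

```ocaml
(* Every function now ends with a tail call, either to another
   function or to its continuation. *)
let rec fold_left_k f init l k =
  match l with
  | [] -> k init
  | h :: t ->
    (* This local closure captures f, t and k: it is allocated. *)
    let k' acc = fold_left_k f acc t k in
    f init h k'

(* Still builds a pair for the continuation. *)
let keep_min_max_k (min, max) v k =
  let min = if v < min then v else min in
  let max = if v > max then v else max in
  k (min, max)

let find_min_max_k l k =
  match l with
  | [] -> invalid_arg "find_min_max"
  | h :: t -> fold_left_k keep_min_max_k (h, h) t k
```

The caller provides the final continuation, for example `find_min_max_k [3; 1; 4] (fun (mn, mx) -> mx - mn)`.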

Where rectypes matter for performance reasons

That’s nice, we only have tail calls now, but of course we are not done removing allocations yet. We now need to get rid of the allocation of the closure in fold_left_k and of the pairs in keep_min_max_k. For that, we need to pass everything that would otherwise be allocated as arguments:
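The section title hints that a fully generic version, in which the fold function is itself passed to the step function as an argument, has a recursive type and would need the `-rectypes` compiler flag. The sketch below sidesteps that by making the two functions mutually recursive instead. This is our simplification with hypothetical names, not the original code, but it shows the same idea: the pair becomes two arguments, and what the closure used to capture (the rest of the list `t` and the final continuation `k`) is passed explicitly.

```ocaml
(* min and max travel as two separate arguments, and the final
   continuation k receives them as two arguments as well. *)
let rec fold_min_max_k min max l k =
  match l with
  | [] -> k min max
  | v :: t -> keep_min_max_k min max v t k

(* No local closure: the rest of the list and the continuation are
   ordinary arguments, handed back to the fold by a tail call. *)
and keep_min_max_k min max v t k =
  let min = if v < min then v else min in
  let max = if v > max then v else max in
  fold_min_max_k min max t k

let find_min_max_k l k =
  match l with
  | [] -> invalid_arg "find_min_max"
  | h :: t -> fold_min_max_k h h t k
```

A caller then writes something like `find_min_max_k [3; 1; 4] (fun mn mx -> mx - mn)`, receiving both results without an intermediate tuple.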

So we can turn return points into call points and get rid of a lot of potential allocations that way. But of course there is no way to handle functions that take or return sum types like that! Well, I’m not so sure.
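One way to read the ‘unboxed GADTs’ mentioned in the conclusion is sketched here, with hypothetical type and function names: the constructor becomes a constant GADT tag (an immediate value), and the payload is passed next to it as a separate argument whose type is determined by the tag.

```ocaml
(* Constant constructors only: a tag is an immediate value, not a block.
   The type index records the type of the accompanying payload. *)
type _ tag =
  | Absent : unit tag
  | Present : int tag

(* An option-like value is passed as two arguments, tag and payload,
   instead of one boxed `Some v`. Matching on the tag refines the
   type of the payload in each branch. *)
let get_or_default (type a) (tag : a tag) (payload : a) (default : int) : int =
  match tag with
  | Absent -> default
  | Present -> payload
```

For example, `get_or_default Present 42 0` plays the role of matching on `Some 42`, and `get_or_default Absent () 0` the role of matching on `None`.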

Combining that with the CPS transformation can get you quite far without allocating!

Manipulating Memory

Now that we can manage almost any control flow without allocating, we also need to manipulate some values. At this point, we simply suggest using the same approach as asm.js: allocate a single large bigarray (this is some kind of malloc), treat integers as pointers, and you can do anything. We won’t go into too much detail here, as that topic would require a post of its own.
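A very rough sketch of the idea follows, assuming a fixed heap of 1024 integer cells and a bump allocator that never frees. All names are hypothetical. The `Bigarray` module ships with the standard library in recent OCaml versions; older ones need the separate bigarray library to be linked.

```ocaml
(* The whole "memory": one bigarray of ints, created once at startup.
   The full type annotation lets the compiler specialise accesses. *)
let heap : (int, Bigarray.int_elt, Bigarray.c_layout) Bigarray.Array1.t =
  Bigarray.Array1.create Bigarray.int Bigarray.c_layout 1024

(* Index of the first free cell (a single global, allocated once). *)
let next_free = ref 0

(* Bump allocator: a "pointer" is just an index into heap. *)
let malloc size =
  let p = !next_free in
  if p + size > Bigarray.Array1.dim heap then failwith "out of memory";
  next_free := p + size;
  p

(* A pair stored in two consecutive cells at address p. *)
let set_pair p a b =
  heap.{p} <- a;
  heap.{p + 1} <- b

let pair_sum p = heap.{p} + heap.{p + 1}
```

A pair is then created with `let p = malloc 2 in set_pair p 3 4` and passed around as the plain integer `p`.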

For some low-level packed bitfield manipulation, you can have a look at some more tricks.

Conclusion

So if you want to write non-allocating code in OCaml, turn everything into CPS, add additional arguments everywhere, turn your sum types into unboxed GADTs, and manipulate a single large bigarray. And enjoy!

3 thoughts on “ASM.OCaml”

Thank you for this attractive and informative post.
Just to be sure, is it not ‘t’ rather than ‘l’ that must be passed to the fold_left function?
You said “we only have tail calls now” but I don’t see any non-tail calls in the first place, am I wrong?

Interesting article, but I have one question. Can we say, from the proof theory point of view, that turning the code into CPS so as not to allocate is just an application of Gentzen’s cut-elimination theorem?

Let me explain this interpretation in more detail: if we have a proof P1 of the proposition A and a proof P2 of the proposition A ⇒ B, we can produce a proof P3 of the proposition B by applying the cut rule, or modus ponens, but the theorem says that we can eliminate the use of the cut rule and produce a direct proof P4 of the proposition B. But modus ponens (or the cut rule) is just the rule for typing function application: if f has type ‘a -> ‘b and x has type ‘a, then f x has type ‘b. And so the cut-elimination theorem says that we can produce an object of type ‘b without allocating an object of type ‘a (it is not necessary to produce the P1 proof, or more exactly it is not necessary to put P1’s conclusion in the environment in order to use it as a premise of the P2 proof). Am I right?
