Of Monadic Fixpoints and Heap Offsets

Here at ICFP, the so-called “hallway track” is sometimes just as important as the ordinary track. Johan Tibell wanted to avoid an out-of-line call to the allocate function in GHC when allocating a small array of statically known size. But he found the way GHC's new code generator handles heap allocation a bit confusing, so we skipped out of one session today to work it out. In this post, I would like to explain how the code generation monad figures out what the heap offsets in the code are, by way of a cute (and also slightly annoying) trick involving a “monadic” fixpoint.

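First, some background about the code generator. The overall pattern of a function that GHC has to generate code for is roughly this: check that there is enough stack, bump the heap pointer Hp once by the total amount of heap the function will allocate, check Hp against the heap limit (calling the garbage collector if it overflowed), and only then run the body, which writes its closures at fixed negative offsets from the already-bumped Hp.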

This seems reasonable, but how does one actually go about generating this code? The code is generated in order, but the amount of heap that needs to be checked is not known until we've finished laying out the rest of the code. If we put on our mutation hats, we might say, “Well, leave it out for now, and mutate it in when you know the actual value”, but there is still the knotty question of what the offsets should be when we are writing values to the heap. Notice that in this scheme we only bump the heap pointer once; if we bumped it before every allocation, the offsets would be easy to calculate, but we would be wasting instructions, since x86 addressing modes can write to a register plus some offset directly.
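To make the trade-off concrete, here is a small Haskell sketch that prints pseudo-Cmm for both strategies. It is not GHC code. The names rebumpCode and singleBumpCode, and the <info pointer> placeholder, are made up for illustration. The sketch assumes 8-byte words, closure sizes given in words, and that Hp points at the last allocated word. It also cheats by knowing every size up front, which is exactly what an in-order code generator cannot do. Stack and heap-limit checks are left out.

```haskell
-- Pseudo-Cmm for two ways of allocating closures (sizes in words).
-- Illustrative sketch only: assumes 8-byte words and that Hp points at
-- the last allocated word.

wordSize :: Int
wordSize = 8

-- Strategy 1: bump Hp before every closure. Each offset is trivial
-- (relative to the bump just made), but we pay one bump per closure.
rebumpCode :: [Int] -> [String]
rebumpCode = concatMap alloc
  where
    alloc n =
      [ "Hp = Hp + " ++ show (n * wordSize) ++ ";"
      , "I64[Hp - " ++ show ((n - 1) * wordSize) ++ "] = <info pointer>;"
      ]

-- Strategy 2: bump Hp once by the total, then address every closure
-- relative to that single bump. Every offset depends on the total,
-- which we only know because this sketch sees all the sizes up front.
singleBumpCode :: [Int] -> [String]
singleBumpCode sizes =
  ("Hp = Hp + " ++ show (total * wordSize) ++ ";")
    : [ "I64[Hp - " ++ show ((total - start) * wordSize) ++ "] = <info pointer>;"
      | start <- starts ]
  where
    total = sum sizes
    -- Word position (counting from the old Hp) of each closure's first word.
    starts = take (length sizes) (scanl (+) 1 sizes)
```

Take closures of 2 and 3 words. Both versions place them 8 and 24 bytes past the old Hp. The rebumping version, however, spends one heap-pointer bump per closure instead of one in total.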

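Let’s take a look at what GHC does when it allocates a dynamic closure to the heap. Simplified, it reads the current virtual heap offset, virtHp, writes the new closure's words at offsets computed from it, and then advances virtHp by the size of the closure.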

So virtHp just marches upwards as we allocate things; it is, in effect, the contents of the Hp register in our inefficient, rebumping implementation.
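Here is a minimal sketch of that bookkeeping, with the actual field writes left out. It assumes offsets in words and that virtHp names the last word allocated so far, which means a new closure starts at virtHp + 1. The helpers allocClosure and allocAll are hypothetical, not GHC functions.

```haskell
-- Toy bookkeeping for virtHp (offsets in words). Hypothetical helpers,
-- not GHC's. virtHp names the last word allocated so far, so a new
-- closure starts one word past it.

-- Allocate a closure of the given size: return its first word's virtual
-- offset together with the advanced virtHp.
allocClosure :: Int -> Int -> (Int, Int)
allocClosure size virtHp = (virtHp + 1, virtHp + size)

-- Allocate closures in order, starting from virtHp = 0. Returns each
-- closure's start offset and the value of virtHp after each allocation.
allocAll :: [Int] -> ([Int], [Int])
allocAll = unzip . go 0
  where
    go _ [] = []
    go virt (n : ns) =
      let (start, virt') = allocClosure n virt
      in (start, virt') : go virt' ns
```

For allocAll [2, 3], the start offsets are [1, 3] and the virtHp values are [2, 5]. The virtHp values are a running sum of the sizes, which is exactly where Hp would sit after each bump in the rebumping implementation.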

Which leaves us with the pressing question: what is realHp? Well, it starts off as zero (before any bump, the real heap pointer sits at offset zero), but once we bump the heap pointer to do the heap check, it becomes precisely the amount of heap we did the heap check for. Returning to our example, the heap check bumps Hp by 40 bytes, so from that point on realHp is 40.

(Actually, internally the offsets are recorded as words, so, this being 64-bit code, divide everything by eight. BTW, virtHp + 8 == realHp, and that's where the off-by-one comes from.) The math is a little fiddly, but getHpRelOffset will calculate the offsets for you; you just have to make sure the virtual offset is right!
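The conversion itself is a subtraction followed by scaling to bytes. Here is a sketch that assumes, for illustration only, that the 40-byte example consists of a 2-word closure followed by a 3-word closure. Under that assumption realHp is 5 words, and the closures start at virtual offsets 1 and 3. The offsets start at 1 because Hp points at the last allocated word, not the first free one. The names hpRel and hpRelBytes are illustrative and are not a claim about GHC's exact definitions.

```haskell
-- Converting virtual heap offsets (in words) into byte offsets from the
-- real Hp register. Illustrative names; assumes 8-byte words.

wordSize :: Int
wordSize = 8

-- Word offset of a virtual position, relative to where Hp really points.
hpRel :: Int -> Int -> Int
hpRel realHp virtOff = virtOff - realHp

-- The same offset in bytes, as it would appear in an I64[Hp + k] store.
hpRelBytes :: Int -> Int -> Int
hpRelBytes realHp virtOff = hpRel realHp virtOff * wordSize
```

With realHp = 5, the two closures land at -32 and -16 bytes from the bumped Hp, and the last allocated word sits at offset 0. Before the heap check, when realHp is still 0, the same virtual offset 1 would come out as +8 instead.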

OK, but we still haven’t figured out where this magic number 40 comes from in the first place! The key is to look at the code generator responsible for doing the heap check, heapCheck, which wraps the call to code (the action actually responsible for the code generation) in getHeapUsage.

And here, we see the monadic fixpoint. In order to provide the heap usage to fcode (the code-generating action that getHeapUsage is handed), GHC writes itself a check: hp_hw. The check is drawn against the result of generating fcode, and the string attached is this: “As long as you don’t cash this check before you finish generating the code, everything will be OK!” (It’s a bit like a big bank in that respect.) Cute, and we only need to do the code generation once!
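Here is a self-contained toy model of the trick, built on a drastically simplified code generation monad. FCode is a hand-rolled lazy state monad over a record of heap usage and emitted instructions. The functions heapCheck, getHeapUsage and getHpRelOffset borrow GHC's names but are simplified re-implementations, not GHC's actual definitions. allocWords and the <info pointer> placeholder are hypothetical. The thing to watch is getHeapUsage: it runs fcode with hpHw, and hpHw is computed from the very state that running fcode produces.

```haskell
-- Toy code generation monad (hypothetical, far simpler than GHC's FCode):
-- a lazy state monad over heap usage and emitted instructions.
-- Offsets are in words; 8-byte words assumed.

data HeapUsage = HeapUsage { virtHp :: Int, realHp :: Int }

data CgState = CgState { hpUsage :: HeapUsage, code :: [String] }

newtype FCode a = FCode { runF :: CgState -> (a, CgState) }

instance Functor FCode where
  fmap f (FCode m) = FCode $ \s -> let (a, s') = m s in (f a, s')

instance Applicative FCode where
  pure a = FCode $ \s -> (a, s)
  FCode mf <*> FCode ma = FCode $ \s ->
    let (f, s1) = mf s
        (a, s2) = ma s1
    in (f a, s2)

-- let-bound tuple patterns are lazy, so binding does not force results early.
instance Monad FCode where
  FCode m >>= k = FCode $ \s -> let (a, s1) = m s in runF (k a) s1

wordSize :: Int
wordSize = 8

emit :: String -> FCode ()
emit instr = FCode $ \s -> ((), s { code = code s ++ [instr] })

getHpUsage :: FCode HeapUsage
getHpUsage = FCode $ \s -> (hpUsage s, s)

setHpUsage :: HeapUsage -> FCode ()
setHpUsage hu = FCode $ \s -> ((), s { hpUsage = hu })

-- Offset of a virtual position relative to the real Hp (left unevaluated).
getHpRelOffset :: Int -> FCode Int
getHpRelOffset virt = do
  hu <- getHpUsage
  return (virt - realHp hu)

-- Run fcode on a fresh heap usage, passing it the amount of heap that
-- fcode itself ends up allocating. hpHw is defined in terms of fcode's
-- own final state: the check drawn against the future.
getHeapUsage :: (Int -> FCode a) -> FCode a
getHeapUsage fcode = FCode $ \s ->
  let (r, sOut) = runF (fcode hpHw) (s { hpUsage = HeapUsage 0 0 })
      hpHw = virtHp (hpUsage sOut)
  in (r, sOut { hpUsage = hpUsage s })

-- Emit the bump and the check first, using the not-yet-known hpHw.
heapCheck :: FCode a -> FCode a
heapCheck body = getHeapUsage $ \hpHw -> do
  emit ("Hp = Hp + " ++ show (hpHw * wordSize) ++ ";")
  emit "if (Hp > HpLim) goto gc;"
  hu <- getHpUsage
  setHpUsage (hu { realHp = hpHw })
  body

-- Hypothetical allocation of an n-word closure: store its first word,
-- then advance virtHp. The offset depends on hpHw (through realHp), but
-- only inside the emitted string, which is not forced during generation.
allocWords :: Int -> FCode ()
allocWords n = do
  hu <- getHpUsage
  off <- getHpRelOffset (virtHp hu + 1)
  emit ("I64[" ++ hpPlus (off * wordSize) ++ "] = <info pointer>;")
  setHpUsage (hu { virtHp = virtHp hu + n })
  where
    hpPlus b
      | b < 0 = "Hp - " ++ show (negate b)
      | otherwise = "Hp + " ++ show b

-- Run a computation and return the instructions it emitted.
generate :: FCode a -> [String]
generate m = code (snd (runF m (CgState (HeapUsage 0 0) [])))
```

Running generate (heapCheck (allocWords 2 >> allocWords 3)) produces, in a single pass, the bump Hp = Hp + 40; followed by stores at Hp - 32 and Hp - 16. In this sketch the knot is tied by the recursive let in getHeapUsage, not by mfix from Control.Monad.Fix. The sketch also always emits the heap check, even when hpHw turns out to be 0. Deciding whether to emit it with something like when (hpHw > 0) would make the state itself depend on hpHw. Forcing hpHw that early sends the program into a loop.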

This technique is not without its dark side. hp_hw is dangerous; if you force it in the wrong place, you will chunder into an infinite loop. There are two uses of this variable, both in compiler/codeGen/StgCmmLayout.hs, which are careful not to force it. What would be nice is if one could explicitly mark hp_hw as blackholed, and attach a custom error message, to be emitted in the event of an infinite loop. How this might be accomplished is left as an exercise for the reader.

BTW, in case you aren't aware, I've been live-tumblr'ing coverage of ICFP at http://ezyang.tumblr.com — the coverage is not 100%, and the editing is rough, but check it out!

3 Responses to “Of Monadic Fixpoints and Heap Offsets”

Nice post – one more piece of the puzzle of how GHC generates heap checks and stack checks. I still have a few loose ends from my MSR internship, and most of them are about investigating stack checks and heap checks.

After a moment’s thought, I don’t understand why you call that construction a “monadic fixpoint”. Could you explain that? To me this is a mutual recursion, a bit similar to the “backwards state monad” presented by Wadler in “The essence of functional programming”. A true monadic fixpoint (the “mfix” function) can be found, for example, in the cmmLayoutStack function in the CmmLayoutStack module.