brain food for hackers

Tail Calls, Optimization, and ES6

May 23rd, 2014

In this penultimate post about the stack, we take a quick look at tail
calls, compiler optimizations, and the proper tail calls landing in the
newest version of JavaScript.

A tail call happens when a function F makes a function call as its final
action. At that point F will do absolutely no more work: it passes the ball to
whatever function is being called and vanishes from the game. This is notable
because it opens up the possibility of tail call optimization: instead of
creating a new stack frame for the function call, we can simply reuseF’s stack frame, thereby saving stack space and avoiding the work involved in
setting up a new frame. Here are some examples in C and their results compiled
with mild optimization:

intadd5(inta){returna+5;}intadd10(inta){intb=add5(a);// not tailreturnadd5(b);// tail}intadd5AndTriple(inta){intb=add5(a);// not tailreturn3*add5(a);// not tail, doing work after the call}intfinicky(inta){if(a>10){returnadd5AndTriple(a);// tail}if(a>5){intb=add5(a);// not tailreturnfinicky(b);// tail}returnadd10(a);// tail}

You can normally spot tail call optimization (hereafter, TCO) in compiler output
by seeing a jump instruction where a call would have been
expected. At runtime TCO leads to a reduced call stack.

A common misconception is that tail calls are necessarily
recursive. That’s not the case: a tail call may be recursive,
such as in finicky() above, but it need not be. As long as caller F is
completely done at the call site, we’ve got ourselves a tail call. Whether it
can be optimized is a different question whose answer depends on your
programming environment.

“Yes, it can, always!” is the best answer we can hope for, which is famously the
case for Scheme, as discussed in SICP (by the way, if when you program
you don’t feel like “a Sorcerer conjuring the spirits of the computer with your
spells,” I urge you to read that book). It’s also the case for Lua. And
most importantly, it is the case for the next version of JavaScript, ES6, whose
spec does a good job defining tail position and clarifying the few
conditions required for optimization, such as strict mode. When
a language guarantees TCO, it supports proper tail calls.

Now some of us can’t kick that C habit, heart bleed and all, and the answer
there is a more complicated “sometimes” that takes us into compiler optimization
territory. We’ve seen the simple examples above; now let’s resurrect
our factorial from last post:

So, is line 11 a tail call? It’s not, because of the multiplication by n
afterwards. But if you’re not used to optimizations, gcc’s
result with O2 optimization might shock you: not only it
transforms factorial into a recursion-free loop, but the
factorial(5) call is eliminated entirely and replaced by a
compile-time constant of 120 (5! == 120). This is why debugging optimized
code can be hard sometimes. On the plus side, if you call this function it will
use a single stack frame regardless of n’s initial value. Compiler algorithms
are pretty fun, and if you’re interested I suggest you check out
Building an Optimizing Compiler and ACDI.

However, what happened here was not tail call optimization, since there was
no tail call to begin with. gcc outsmarted us by analyzing what the function
does and optimizing away the needless recursion. The task was made easier by
the simple, deterministic nature of the operations being done. By adding a dash
of chaos (e.g., getpid()) we can throw gcc off:

Optimize that, unix fairies! So now we have a regular
recursive call and this function allocates O(n) stack
frames to do its work. Heroically, gcc still does TCO for getpid
in the recursion base case. If we now wished to make this function tail recursive,
we’d need a slight change:

The accumulation of the result is now a loop and we’ve
achieved true TCO. But before you go out partying, what can we say about the
general case in C? Sadly, while good C compilers do TCO in a number of cases,
there are many situations where they cannot do it. For example, as we saw in our
function epilogues, the caller is responsible for cleaning up the
stack after a function call using the standard C calling convention. So if
function F takes two arguments, it can only make TCO calls to functions taking
two or fewer arguments. This is one among many restrictions. Mark Probst wrote
an excellent thesis discussing Proper Tail Recursion in C where he
discusses these issues along with C stack behavior. He also does
insanely cool juggling.

“Sometimes” is a rocky foundation for any relationship, so you can’t rely on TCO
in C. It’s a discrete optimization that may or may not take place, rather than
a language feature like proper tail calls, though in practice the compiler
will optimize the vast majority of cases. But if you must have it, say for
transpiling Scheme into C, you will suffer.

Since JavaScript is now the most popular transpilation target, proper tail calls
become even more important there. So kudos to ES6 for delivering it along with
many other significant improvements. It’s like Christmas for JS programmers.

This concludes our brief tour of tail calls and compiler optimization. Thanks
for reading and see you next time.