In this code, the call to last is considered a tail call because there are no further instructions to execute once the call is made.
The reason tail calls are interesting has to do with the way functions are typically called. When you make a function call, the compiler generates code to set up the environment in which the function executes. This usually involves pushing arguments onto the call stack, passing information about where to return to, and setting up a stack frame for local variables. When the function finishes executing, the stack frame is cleaned up and control is returned to the caller. But when the function ends with a tail call, the compiler can skip some of the work involved. For instance, the calling function’s stack frame is no longer needed, so its space can be reused by the called function. And when making the tail call, the place for the subroutine to return to is the caller of the function, not the function itself (in our example, we don’t need to return to first because we can return to whatever called first).

Let’s look at this in more concrete terms. Unoptimized assembly for the first function might look like this:

// Set up the stack frame for first
push ebp
mov ebp, esp
sub esp, 4 // Allocates four bytes of local variable space for j
// Save the callee-saved registers on the stack (we will restore them when done)
push ebx
push esi
push edi
// Get the value of the parameter i into eax
mov eax, [ebp+8]
// Call the function second
push eax
call second
add esp, 4 // Remove the argument to second (cdecl caller cleanup)
// The return value lives in eax, so assign that to our local variable j
mov [ebp-4], eax
// Get the value of our local variable j into eax
mov eax, [ebp-4]
// Call the function last
push eax
call last
add esp, 4 // Remove the argument to last (cdecl caller cleanup)
// Clean up the function and return
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret 0

If we were to optimize just the tail call part of the code, our assembly would look like this:

// Set up the stack frame for first
push ebp
mov ebp, esp
sub esp, 4 // Allocates four bytes of local variable space for j
// Save the callee-saved registers on the stack (we will restore them when done)
push ebx
push esi
push edi
// Get the value of the parameter i into eax
mov eax, [ebp+8]
// Call the function second
push eax
call second
// The return value lives in eax, so assign that to our local variable j
mov [ebp-4], eax
// Get the value of our local variable j into eax
mov eax, [ebp-4]
// Call the function last with tail call optimization
add esp, 4 // Remove the argument to second (cdecl caller cleanup)
// Restore the callee-saved registers and tear down first's stack frame,
// since first's work is done
pop edi
pop esi
pop ebx
mov [ebp+8], eax // Put eax in the parameter space of the stack frame
mov esp, ebp
pop ebp
jmp last // Unconditional jump to last; last returns directly to first's caller

There are a few things to notice about the optimized code. For starters, when calling last, no new data is pushed onto the stack; we simply reuse the stack space for first, since it is no longer needed. (This isn't always possible, and it's an optimization that's unlikely to appear in anything other than hand-tuned code.) Also, we use an unconditional jump to reach last instead of the call instruction. The call instruction does two things: it saves the current location to the stack so that a later ret instruction returns to the caller, and then it performs an unconditional jump to the address given. With a tail call, we don't need to save a location to jump back to; last will use the ret instruction as normal and return to the caller of first instead of to first itself. This is an optimization because it removes a push to the stack (saving space and time), and we don't need to return to first just to immediately return to whoever called first, which also saves time.

At this point, you might be thinking "whoa, tail calls are amazing, I should be using them everywhere!" Alas, they don't provide a considerable benefit to most code, because so few of them crop up organically in practice. However, tail calls are useful enough that you may want to structure your code to take advantage of them in certain circumstances. When you use tail calls as part of recursion, the benefit is considerable: they can change the space requirements from linear (O(n)) to constant (O(1))! When coupled with recursion, tail calls can be truly powerful.

Both of these functions do exactly the same thing, but in slightly different ways: they take the initial value, increment it, and return the new value. However, only one of them crashes with optimizations turned on! recur2 will crash because a new stack frame must be allocated each time the function recurses, since it modifies the returned value after the recursive call; the calls eventually exhaust the stack space. recur, however, will not crash, because it is tail recursive, meaning no new stack frames are allocated for the recursive calls. It is effectively just an infinite loop, like while(1) or for(;;)!

Before we go into applications of this optimization, let's recap its requirements and see some more examples. For a call to be eligible for tail call optimization, the last instruction executed before returning to the caller must be a function call. That can sometimes take surprising shapes:

// Eligible for TCO because there is no code following
// the calls to bar or baz (it's an implied return)
void foo( int i ) {
    if (i % 2)
        bar();
    else
        baz();
}

// Not eligible for TCO because code must execute
// after the call to baz
int foo( int i ) {
    if (i % 2)
        bar();
    int j = baz();
    return j + i;
}

// Eligible for TCO because there is no code
// following the call to printf, though it is
// possible that TCO will be unavailable due to
// the varargs nature of printf
void foo( int i ) {
    ::printf( "%d: %s\n", i, bar( i ) );
}

Now that you have a better grasp of what tail calls look like, let's talk about one particular problem that benefits considerably from them. When writing a compiler, there are two major ways to write the parser for the language. One is a table-driven approach, usually produced by a parser generator such as bison or yacc. The other is a recursive descent parser written by hand. Both approaches have their pros and cons, but one specific con of recursive descent parsers is that they can be very expensive in terms of space: it's not uncommon to see call stacks several dozen frames deep, depending on the language being parsed. This is a prime case where tail call optimization can be a considerable win. To this end, if you are writing a recursive descent parser, you would do well to have your functions exit by returning a function call, such as a call to create an AST node.

All of these functions are theoretically eligible for tail call optimization because no code needs to execute after the function calls return. But theory and practice can be two different things, so how do the major compilers stack up?

I tried this sample code in both Visual Studio 2010 and gcc (4.6.2) with full optimizations turned on, and they both did reasonably well.

So MSVC, gcc, and clang all do a reasonable job of performing tail call optimizations, which would yield great benefits for a recursive descent parser. Oddly enough, it turns out that ICC does not perform tail call optimizations on this code, and I'm not the only one to have had problems. However, Intel does claim to support tail call optimizations as of ICC 7.07 (March 2008).

<aside>
I noticed something very strange with the call to operator new generated by all of the compilers. You’ll notice that there’s a test for returning null, and if it’s non-null, it dereferences the pointer and assigns zero. This is baffling to me as there’s nothing in the C++ specification that I could find which suggests this should happen. If operator new is of the throwing variety (which it is by default for all four compilers), then the call to operator new will throw instead of returning null, so there’s no need to check for null (which is why clang elides the check). But once the call to operator new has returned, why set the structure member to zero with optimizations turned all the way up?</aside>

So now you know a bit more about what tail calls are, how they can be optimized, and why you may want to use them. Hopefully this was informative! It was certainly an interesting topic for me to write about.


Aaron Ballman is a software engineer for GrammaTech. He has almost two decades of experience writing cross-platform frameworks in C/C++, compiler & language design, and software engineering best practices and is currently a voting member of the C (WG14) and C++ (WG21) standards committees.

In case you can't figure it out easily enough, the views expressed here are my personal views and not the views of my employer, my past employers, my future employers, or some random person on the street. Please yell only at me if you disagree with what you read.