A few days ago I talked a little about tail recursion, and I mentioned a pattern I called iterator/builder to transform some simple recursive functions into tail-recursive functions. The transformation looks like this (in Erlang).
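The code block isn't shown here, but a sketch consistent with the discussion that follows (the names triangle_a and triangle_b come from the text; the exact clauses are my reconstruction) might be:

```erlang
% Plain recursion: the addition happens after the recursive call returns,
% so this is not tail recursive.
triangle_a(0) -> 0;
triangle_a(N) -> N + triangle_a(N-1).

% Iterator/builder: N is the iterator counting down, Sum is the builder.
% The recursive call is the last thing done, so this is tail recursive.
triangle_b(N) -> triangle_b(N, 0).

triangle_b(0, Sum) -> Sum;
triangle_b(N, Sum) -> triangle_b(N-1, N+Sum).
```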

This is not perfect, however. Consider the triangle function above. triangle_a(3) calculates (3+(2+(1+0))) while triangle_b(3) calculates (1+(2+(3+0))). If you change N+Sum to Sum+N in triangle_b you end up calculating (((0+3)+2)+1) instead. This is the left-fold vs. right-fold problem that you sometimes run into when you flatten recursion, and it also shows how the initial value for the fold operation (zero in this case) can move around. For an operation like + (plus) it doesn’t matter, since + is associative and commutative, but it matters in other cases. Consider the stutter/1 function.
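The stutter variants aren't shown here; a sketch matching the evaluation orders described below (glom/2 and the exact clauses are my reconstruction; stutter doubles each list element) might be:

```erlang
% glom/2 puts two copies of an element on the front of a list.
glom(X, List) -> [X, X | List].

% Plain recursion: combine after the recursive call returns.
stutter_a([]) -> [];
stutter_a([H|T]) -> glom(H, stutter_a(T)).

% Tail recursive: the builder accumulates in reverse, fixed afterwards.
stutter_b(List) -> lists:reverse(stutter_b(List, [])).

stutter_b([], Built) -> Built;
stutter_b([H|T], Built) -> stutter_b(T, glom(H, Built)).

% Tail recursive: the iterator is reversed up front instead.
stutter_c(List) -> stutter_c(lists:reverse(List), []).

stutter_c([], Built) -> Built;
stutter_c([H|T], Built) -> stutter_c(T, glom(H, Built)).
```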

stutter_a([a,b,c]) calculates glom(a,glom(b,glom(c,[]))) while stutter_b([a,b,c]) calculates glom(c,glom(b,glom(a,[]))) and then reverses the result. stutter_c([a,b,c]) starts by reversing the iterator and so calculates glom(a,glom(b,glom(c,[]))) just like stutter_a.

So the simple transformative pattern described above sometimes has to be modified if the combine expression is not associative and commutative. Sometimes you can fix the result, sometimes you can reverse the initial iterator, sometimes you need an extra end-of-iterator parameter when you reverse the iterator, and sometimes you have to change the combine expression.

And of course sometimes you just want to leave the function alone and forget about tail recursing. This kind of code transformation, or refactoring, requires some understanding of the problem before it is applied.

C++ templates are only expanded as necessary. The following is error-free even though does_not_exist() does indeed not exist. The method never_called() is never called and thus never expanded, and so does_not_exist() is never needed.

Invoking the method call_hidden_fn() fails because although hidden_fn(double) is defined, it is hidden in the hiding_ns namespace.

You can solve this problem with a simple using hiding_ns::hidden_fn; declaration. And hidden_fn(double) doesn’t have to be exposed before it is needed, only before the end of the compilation unit. The following compiles without error or warning.

I was surprised. I thought that we’d still need the using hiding_ns::hidden_fn; declaration at the bottom to make this work, but the (MS) compiler doesn’t complain. Calling hidden_fn(..) with an argument whose type is in the hiding_ns namespace is enough of a hint for the compiler to dig hiding_ns::hidden_fn(hidden_type&) out of that namespace. (This is argument-dependent lookup, also called Koenig lookup, and it is in fact standard behavior.)

If you declare another hidden_fn(hiding_ns::hidden_type&) at top namespace scope the compiler complains that there are two hidden_fn(..)s to choose from. So it does not prefer the exposed declaration over the hidden one.

Although this sort of behavior is interesting, I would strongly recommend against relying on it. Even if it is standard, it feels like something on the edge. It is clearer to declare hiding_ns::hidden_fn(..) near the top of the file, followed by a using declaration. That way hidden_fn(..) is exposed before uses_hidden_fn<..> is expanded.

Calling A(..) sets up a stack frame with x on it. Soon y and z are pushed onto the stack frame, and finally a new stack frame for B(..) is created. When B(..) returns, A(..)’s stack frame is popped and the value is returned.

But once B(..) is called, A(..)’s stack frame isn’t needed for anything. The compiler could arrange to have A(..)’s stack frame destroyed before B(..) is called, so that B(..)’s stack frame could be built in exactly the same place. The return value from B(..) would be returned directly to A(..)’s caller. In other words, B(..) steps on A(..)’s tail, which is known as tail-call optimization.

The benefit is that the runtime stack is smaller since it doesn’t have to hold both frames at the same time. In this case it’s a small optimization, perhaps useful in tight embedded situations. But consider the following:

In this example the last thing A(x) does is call A(x+1), so A(x+1) can step on A(x)’s tail. But it’s no longer just a small optimization, since this repeats about a billion times. You’ll need a very big runtime stack if the compiler doesn’t arrange for A(..) to step on its own tail.

This is what they call tail recursion. I first read about it in Guy Steele’s famous paper Lambda: The Ultimate GOTO. Optimizing tail recursion is essential for languages like Scheme and Erlang because they don’t provide loop constructs; a loop, after all, is a disguised goto. In these languages you recurse instead of looping.

But the programmer has to be aware of when he is and isn’t tail recursing. If the recursion is not tail recursion the compiler cannot tail-optimize. Consider this Erlang function.
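The function isn't shown here; based on the description below, it was presumably something like:

```erlang
% stutter/1 doubles each element of a list.
stutter([]) -> [];
stutter([H|Rest]) -> [H, H | stutter(Rest)].
```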

It looks like the last thing stutter/1 does is call stutter(Rest). But really the last thing it does is make a list incorporating the result of stutter(Rest). So despite appearances, this is NOT tail recursive.
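A tail-recursive rewrite consistent with the description that follows (the clause details are my sketch) might be:

```erlang
stutter(List) -> stutter_tail(List, []).

% First argument: the iterator we peel elements off.
% Second argument: the builder we grow, reversed at the end.
stutter_tail([], Built) -> lists:reverse(Built);
stutter_tail([H|Iter], Built) -> stutter_tail(Iter, [H, H | Built]).
```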

In this, stutter/1 calls stutter_tail/2 which is tail recursive. stutter_tail/2 takes two arguments, the iterator and the builder. The iterator is the list that we take apart, peeling the head off at each iteration. And while we use up the iterator, we add to the builder and construct the stuttering list.

Or consider factorial/1. First the intuitive version, which at first glance looks tail recursive even though it’s not.
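The code isn't shown; a sketch of both versions (the name factorial_t for the tail-recursive helper is my addition):

```erlang
% Intuitive version: looks tail recursive, but the multiplication
% happens after the recursive call returns, so it is not.
factorial(0) -> 1;
factorial(N) -> N * factorial(N-1).

% Genuinely tail recursive, using an accumulator as the builder.
factorial_t(N) -> factorial_t(N, 1).

factorial_t(0, Acc) -> Acc;
factorial_t(N, Acc) -> factorial_t(N-1, N*Acc).
```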

When you’re programming in Scheme or Erlang you get used to reaching for a recursive solution whenever you get into a looping situation. And you’re always conscious of whether your implementation is tail recursive or not. And you soon find yourself thinking in recursive patterns like iterator/builder instead of iterative patterns like while(test_this()){do_that();}.

Any recursive algorithm can be expressed as an iterative loop with a stack. If it’s tail recursive, you don’t need the stack to make it into a loop. Some languages, like Scheme and Erlang, will automatically translate tail recursion into in-place looping whenever possible. This allows you to express many algorithms more naturally than you would with a loop without having to worry about stack overflow.

It would be nice if C compilers optimized tail recursion as a loop. It would be even better if C compilers could arrange for a trailing function to step on its caller’s tail whenever possible, even in non-recursive situations. This would allow a more functional coding style in C, and would make it easier for Scheme/Erlang “compilers” to use C as a target language. (I think one of the design goals for C should be to make it a universal target language.)

Tail recursion is more problematic in C++. Usually the last thing a C++ function does is run destructors for local variables. Sometimes this is absolutely essential, such as when you are using a wrapper class to lock/unlock (see Resource Acquisition is Initialization, or RAII). If the C++ compiler optimized tail recursion or tail stepping, the compiler would have to run the destructors before overwriting the caller’s stack frame. In the end the programmer would have to be given a way to control this, thus making C++ even more complex than it already is.