Tail recursion in C#

Regardless of the programming language you’re using, there are tasks for which the most natural implementation uses a recursive algorithm (even if it’s not always the optimal solution). The trouble with the recursive approach is that it can use a lot of space on the stack: when you reach a certain recursion depth, the memory allocated for the thread stack runs out, and you get a stack overflow error that usually terminates the process (StackOverflowException in .NET).

Tail recursion? What’s that?

Some languages, functional languages in particular, have native support for an optimization technique called tail recursion. The idea is that if the recursive call is the last instruction in a recursive function, there is no need to keep the current call context on the stack, since we won’t have to go back there: we only need to replace the parameters with their new values and jump back to the beginning of the function. The recursion is thus transformed into an iteration, which can’t cause a stack overflow. This notion being quite new to me, I won’t try to give a full course about tail recursion… much smarter people already took care of it! I suggest you follow the Wikipedia link above, which is a good starting point to understand tail recursion.

Unfortunately, the C# compiler doesn’t support tail recursion, which is a pity, since the CLR supports it. However, all is not lost! Some people had a very clever idea to work around this issue: a technique called “trampoline” (because it makes the function “bounce”) that makes it easy to transform a recursive algorithm into an iterative one. Samuel Jack has a good explanation of this concept on his blog. In the rest of this article, we will see how to apply this technique to a simple algorithm, using the class from Samuel Jack’s article; then I’ll present another implementation of the trampoline, which I find more flexible.

A simple use case in C#

Let’s see how we can transform a simple recursive algorithm, like the computation of the factorial of a number, into an algorithm that uses tail recursion (incidentally, the factorial can be computed much more efficiently with a non-recursive algorithm, but let’s assume we don’t know that…). Here’s a basic implementation that results directly from the definition:
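A minimal sketch of such an implementation could look like this; the class name NaiveFactorial is only an assumption made for illustration:

```csharp
using System.Numerics;

public static class NaiveFactorial
{
    // Direct translation of the definition: n! = n * (n - 1)!, with 0! = 1! = 1.
    public static BigInteger Factorial(int n)
    {
        if (n < 2)
            return BigInteger.One;

        // The multiplication happens AFTER the recursive call returns,
        // so every nested call keeps its frame on the stack.
        return n * Factorial(n - 1);
    }
}
```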

(Note the use of BigInteger: if we are to make the recursion deep enough to observe the effects of tail recursion, the result will be far beyond the capacity of an int or even a long…)

If we call this method with a large value (around 20000 on my machine), we get an error which was quite predictable: StackOverflowException. We made so many nested calls to the Factorial method that we exhausted the capacity of the stack. So we’re going to modify this code so that it can benefit from tail recursion…

As mentioned above, the key requirement for tail recursion is that the method calls itself as the last instruction. It seems to be the case here… but it’s not: the last operation is actually the multiplication, which can’t be executed until we know the result of Factorial(n-1). So we need to redesign this method so that it ends with a call to itself, with different arguments. To do that, we can add a new parameter named product, which will act as an accumulator:
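A sketch of what the accumulator version might look like (the class name AccumulatorFactorial is hypothetical; the parameter name product comes from the text):

```csharp
using System.Numerics;

public static class AccumulatorFactorial
{
    // 'product' accumulates the partial result, so nothing is left to do
    // after the recursive call: it is the last operation of the method.
    public static BigInteger Factorial(int n, BigInteger product)
    {
        if (n < 2)
            return product;

        return Factorial(n - 1, n * product);
    }
}
```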

For the first call, we’ll just have to pass 1 for the initial value of the accumulator.

We now have a method that meets the requirements for tail recursion: the recursive call to Factorial really is the last instruction. Now that we have put the algorithm in this form, the final transformation to enable tail recursion using Samuel Jack’s trampoline is trivial:

Instead of returning the final result directly, we call Trampoline.ReturnResult to tell the trampoline that we now have a result

The recursive call to Factorial is replaced with a call to Trampoline.Recurse, which tells the trampoline that the method needs to be called again with different parameters

This method can’t be used directly: it returns a Bounce object, and we don’t really know what to do with this… To execute it, we use the Trampoline.MakeTrampoline method, which returns a new function on which tail recursion is applied. We can then use this new function directly:
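The sketch below shows how the pieces could fit together. The Trampoline and Bounce types here are a simplified stand-in written for this example, not Samuel Jack’s actual code: only the names MakeTrampoline, Recurse, ReturnResult and Bounce come from the text, while the members of Bounce and the TrampolinedFactorial class are assumptions.

```csharp
using System;
using System.Numerics;

// Assumed shape: a Bounce carries either a final result
// or the arguments for the next call.
public sealed class Bounce<T1, T2, TResult>
{
    public T1 Arg1 { get; }
    public T2 Arg2 { get; }
    public TResult Result { get; }
    public bool HasResult { get; }

    public Bounce(T1 arg1, T2 arg2, TResult result, bool hasResult)
    {
        Arg1 = arg1;
        Arg2 = arg2;
        Result = result;
        HasResult = hasResult;
    }
}

public static class Trampoline
{
    // Wraps 'function' in a loop: each Bounce either ends the loop
    // or supplies the arguments for the next iteration.
    public static Func<T1, T2, TResult> MakeTrampoline<T1, T2, TResult>(
        Func<T1, T2, Bounce<T1, T2, TResult>> function)
    {
        return (arg1, arg2) =>
        {
            while (true)
            {
                Bounce<T1, T2, TResult> bounce = function(arg1, arg2);
                if (bounce.HasResult)
                    return bounce.Result;
                arg1 = bounce.Arg1;
                arg2 = bounce.Arg2;
            }
        };
    }

    public static Bounce<T1, T2, TResult> Recurse<T1, T2, TResult>(T1 arg1, T2 arg2)
        => new Bounce<T1, T2, TResult>(arg1, arg2, default(TResult), false);

    public static Bounce<T1, T2, TResult> ReturnResult<T1, T2, TResult>(TResult result)
        => new Bounce<T1, T2, TResult>(default(T1), default(T2), result, true);
}

public static class TrampolinedFactorial
{
    // Same shape as the accumulator version, but it returns a Bounce
    // instead of calling itself.
    public static Bounce<int, BigInteger, BigInteger> Factorial(int n, BigInteger product)
    {
        if (n < 2)
            return Trampoline.ReturnResult<int, BigInteger, BigInteger>(product);

        return Trampoline.Recurse<int, BigInteger, BigInteger>(n - 1, n * product);
    }

    // Builds the trampolined function and calls it with the initial accumulator.
    public static BigInteger Compute(int n)
    {
        Func<int, BigInteger, BigInteger> factorial =
            Trampoline.MakeTrampoline<int, BigInteger, BigInteger>(Factorial);
        return factorial(n, BigInteger.One);
    }
}
```

Since the loop in MakeTrampoline replaces the nested calls, the stack depth stays constant whatever the value of n.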

We can now compute the factorial of large numbers, with no risk of causing a stack overflow… Admittedly, it’s not very efficient: as mentioned before, there are better ways of computing a factorial, and furthermore, computations involving BigIntegers are much slower than with ints or longs.

Can we make it better?

Well, you can guess that I wouldn’t be asking the question unless the answer was yes… The trampoline implementation demonstrated above does its job well enough, but I think it could be made more flexible and easier to use:

It only works if you have 2 parameters (of course we can adapt it for a different number of parameters, but then we need to create new methods with adequate signatures for each different arity)

The syntax is quite unwieldy: there are 3 type arguments, and we need to specify them every time because the compiler doesn’t have enough information to infer them automatically

Having to use MakeTrampoline just to create a new function that we can then call isn’t very convenient; it would be more intuitive to have an Execute method that returns the result directly

And finally, I think the terminology isn’t very explicit… Names like Trampoline and Bounce sound like fun, but they don’t really reveal the intent.

So I tried to improve the system to make it more convenient. My solution is based on lambda expressions. There is only one type argument (the return type), and the parameters are passed through a closure, so there is no need for multiple methods to handle different numbers of parameters. Here’s what the Factorial method looks like with my implementation:
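A minimal sketch of a lambda-based design along these lines; the names TailRecursion, RecursionResult, Execute, Return, Next and LambdaFactorial are assumptions chosen for this example:

```csharp
using System;
using System.Numerics;

// Either a final result, or a delegate that computes the next step.
public sealed class RecursionResult<T>
{
    public bool IsFinalResult { get; }
    public T Result { get; }
    public Func<RecursionResult<T>> NextStep { get; }

    internal RecursionResult(bool isFinalResult, T result, Func<RecursionResult<T>> nextStep)
    {
        IsFinalResult = isFinalResult;
        Result = result;
        NextStep = nextStep;
    }
}

public static class TailRecursion
{
    // Runs the steps in a loop until one of them yields a final result.
    public static T Execute<T>(Func<RecursionResult<T>> func)
    {
        RecursionResult<T> step = func();
        while (!step.IsFinalResult)
            step = step.NextStep();
        return step.Result;
    }

    public static RecursionResult<T> Return<T>(T result)
        => new RecursionResult<T>(true, result, null);

    public static RecursionResult<T> Next<T>(Func<RecursionResult<T>> nextStep)
        => new RecursionResult<T>(false, default(T), nextStep);
}

public static class LambdaFactorial
{
    public static BigInteger Factorial(int n)
        => TailRecursion.Execute(() => Factorial(n, BigInteger.One));

    // The arguments of the next call are captured by the closure,
    // so a single type argument (the return type) is enough.
    private static RecursionResult<BigInteger> Factorial(int n, BigInteger product)
    {
        if (n < 2)
            return TailRecursion.Return(product);

        return TailRecursion.Next(() => Factorial(n - 1, n * product));
    }
}
```

In this sketch, one delegate is allocated per step to carry the captured arguments.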

It’s more flexible, more concise, and more readable…in my opinion at least. The downside is that performance is slightly worse than before (it takes about 20% longer to compute the factorial of 50000), probably because of the delegate creation at each level of recursion.

Is there a better way to accomplish tail recursion in C#?

Sure! But it gets a little tricky, and it’s not pure C#. As I mentioned before, the CLR supports tail recursion, through the tail instruction. Ideally, the C# compiler would automatically generate this instruction for methods that are eligible for tail recursion, but unfortunately that’s not the case, and I don’t think this will ever be supported given the low demand for this feature.

Anyway, we can cheat a little by helping the compiler to do its job: the .NET Framework SDK provides tools named ildasm (IL disassembler) and ilasm (IL assembler), which can help to fill the gap between C# and the CLR… Let’s go back to the classical recursive implementation of Factorial, which doesn’t yet use tail recursion:

It’s a bit hard on the eye if you’re not used to reading IL code, but we can see roughly what’s going on… The recursive call is at offset IL_001f; this is where we’re going to fiddle with the generated code to introduce tail recursion. If we look at the documentation for the tail instruction, we see that it must immediately precede a call instruction, and that the instruction following the call must be ret (return). Right now, we have several instructions following the recursive call, because the compiler introduced a local variable to store the return value. We just need to modify the code so that it doesn’t use this variable, and add the tail instruction in the right place:

If we reassemble this code with ilasm, we get a new executable, which runs without issues even for the large values that made the old code crash. Performance is also pretty good: about 3 times as fast as the version using the Trampoline class. If we compare the performance for smaller values (so that the old code doesn’t crash), we can see that it’s also 3 times as fast as the recursive version with no tail recursion.

Of course, this is just a proof of concept… it doesn’t seem very realistic to perform this transformation manually in a “real” project. However, it might be possible to create a tool that rewrites assemblies automatically after the compilation to introduce tail recursion.

I’m not an expert on the subject, but from what I found by googling about it, it’s not possible to make a fully tail-recursive implementation of quicksort. That’s because for tail call optimization to work, the recursive call must be the last instruction in the method, and quicksort makes two recursive calls, so they can’t both be the last… But some Stack Overflow answers, and Wikipedia, suggest that it’s possible to perform tail call optimization on only one of the recursive calls.
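As a rough illustration of that idea, here is a sketch in which only one of the two recursive calls remains and the other is turned into a loop by hand. The class name QuickSortSketch, the Lomuto partition scheme and the choice to recurse into the smaller partition are assumptions made for this example:

```csharp
public static class QuickSortSketch
{
    public static void Sort(int[] items)
    {
        Sort(items, 0, items.Length - 1);
    }

    private static void Sort(int[] items, int low, int high)
    {
        // The loop stands in for the second recursive call.
        while (low < high)
        {
            int pivotIndex = Partition(items, low, high);

            // Recurse into the smaller side and keep looping on the larger one,
            // which keeps the recursion depth logarithmic in the range size.
            if (pivotIndex - low < high - pivotIndex)
            {
                Sort(items, low, pivotIndex - 1);
                low = pivotIndex + 1;
            }
            else
            {
                Sort(items, pivotIndex + 1, high);
                high = pivotIndex - 1;
            }
        }
    }

    // Lomuto partition: uses the last element as the pivot
    // and returns the pivot's final index.
    private static int Partition(int[] items, int low, int high)
    {
        int pivot = items[high];
        int i = low;
        for (int j = low; j < high; j++)
        {
            if (items[j] < pivot)
            {
                int tmp = items[i];
                items[i] = items[j];
                items[j] = tmp;
                i++;
            }
        }

        int last = items[i];
        items[i] = items[high];
        items[high] = last;
        return i;
    }
}
```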

If you have your running calculation variables at the top, and a label where you want to ‘go to’, then simply going to the label based on a logic statement would recurse without a method call. This approach would actually be quite performant. Normally, when calling a method, variables and values get added to the stack with each call. But with a goto, execution continues from the label, so nothing is added to the stack, and there is no mega popping of values off the stack as you unwind back to the original caller.
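The general idea could be sketched as follows for the factorial; the class name GotoFactorial and the label name start are hypothetical, and this is only one possible reading of the approach described:

```csharp
using System.Numerics;

public static class GotoFactorial
{
    public static BigInteger Factorial(int n)
    {
        // Running calculation variable declared at the top.
        BigInteger product = BigInteger.One;

    start:
        if (n < 2)
            return product;

        // Update the "arguments" in place, then jump back to the label
        // instead of making a new call.
        product = n * product;
        n = n - 1;
        goto start;
    }
}
```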

Of course, for factorials, you don’t really need recursion, so don’t crucify the code! I’m following the pattern of the recursion and what happens (n-1 per call). iterate1 uses the same logic to determine the next call as the example code provided in the blog. iterate2 simulates the unwinding process of the calls the original code would make. The difference is that you have to get your head around values that change and states that you need to manage (n, iterate, etc.). Eventually you could cook this one until you find a pattern with delegates injected for the code to execute and logic delegates for the recursion. That would be pretty confusing come the end, but it just might save your stack!

Yeah, writing code in WordPress comments isn’t very convenient… I suggest you either write it to Gist and post the link, or use the [sourcecode] tag.

What you’re doing would work, but what’s the point? It’s basically the same as doing a loop, so it’s no longer recursive. The point of tail recursion is that it effectively transforms a recursive method into an iteration, but the code still looks recursive.

The problem with deep recursion is the possibility of stack overflow. Your article points this out.
This is caused by the stack filling up with addresses to return to, as well as values stored in registers (volatile). Probably a few other points to add in there too.

To quote the Wiki article: “Most of the frame of the current procedure is no longer needed, and can be replaced by the frame of the tail call, modified as appropriate”

Which is partly what I’ve simulated. The goto based on logic means you get the call behavior of recursion. Reusing the variables matches the notion of the current frame being replaced by the frame of the tail call, which is akin to reusing the same frame where possible, i.e. staying in the same method, as I’ve achieved.

I’m not saying for one minute that this is a good solution – if one of my team members came to me with this, I’d be asking what the hell they were thinking!

You’re right that this would appear to be little more than an iteration, but when you consider the ‘goto’ to be the ‘lite’ version of calling a method (recursively), then you’ve got something that could be used.

The problem is, to achieve it, you need to re-think variables, their states, and what value they may have.

Architecturally, to get away from the recursion, this is NOT a good idea! I just wanted to share it, as it demonstrates that by thinking outside the box a little, you can get a solution. I must say though, going to IL to get the true behavior desired is really out there! Not many developers would go to that degree. We’re all too happy in our little C# world behind the CLR.

Performance is always a concern for high volume traffic and data processing. I’ve been fortunate enough not to hit a particularly nasty level of recursion in my whole career. Nevertheless, I thank you for sharing your code; it makes for an interesting read.