Hi, I'm Kenneth. I’m a Swiss transplant in San Francisco. I’m
an angel investor, and an engineer at AngelList & CSC. Previously, I was co-founder of
FreshPay and the architect behind Chartboost. When I'm not working, you
can find me either photographing some remote corner of the world, or hacking
away on an ever-changing side project.

Fibonacci & the Y-Combinator in C

Be warned, this post describes something you should never actually use in production code. However, we will get to play with some very cool concepts and techniques: functional programming in C, closures, implementing autorelease pools from scratch, data structures (linked lists and b-trees), bitwise operations, recursivity, memoization, and the Y-Combinator. If this sounds crazy, don’t be scared. It’s all fairly simple when broken down.

Let’s back up for a second, however. What we’re going to create here is a program that returns a number in the Fibonacci sequence. The Fibonacci sequence is a sequence of numbers in which each integer is equal to the previous two integers added: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, etc.

There are many ways of calculating Fibonacci numbers, but the naïve implementation which follows the mathematical definition is this:

As you can see, the function is called multiple times with the same argument, and for every time n is decremented (until you reach 1 or 2), the number of function calls increases by a power of two. Imagine how much worse it would be for fib(200). The number is so great that even given unlimited memory and energy, your computer would require billions of years to finish the computation.

Closures

A closure is an anonymous function which may use and capture variables from its parent scope. Imagine this code, in JavaScript:

print_element_plus returns an anonymous function (aka. a closure) which takes one argument and adds it to x. The variable x is captured by the closure, and even though it goes out of scope when print_element_plus returns, it is still retained by the closure until the closure itself goes out of scope and is freed.

C does not natively support closures. Although similar functionality can be implemented in pure C fairly easily using a struct containing the environment and a function pointer, we’re going to instead take advantage of clang’s built-in support for the blocks extension to the language:

$ clang -fblocks -o fib -O3 fib.c -lBlocksRuntime

In C, a block is simply another name for a closure, and its syntax is very similar to defining a function pointer. So with that in mind, let’s rewrite our fib function as a block.

Note: this should go in main() and the __block is necessary to enable explicit recursion, so that the block may reference itself.

The Y-Combinator

Recursion implies that a function has a name by which to refer to itself, so it may call itself. While the example above works, it relies on the block capturing the variable to which it is assigned from the parent scope. This is not super clean, and relies on a implementation detail of the C programming language. Identical code would not work in, for example, lisp.

Since we are giving ourself the restriction of not explicitly using the fib variable in order to recur, we will wrap the function in a function whose first and only argument is a function with which to recur. We’ll call this function the generator, because when called with fib as its argument, it will return the correct fib function.

This is where the Y-Combinator comes in handy. The Y-Combinator is a function which enables recursion, using only anonymous functions, and without using recursion in its implementation. It looks somewhat like this (in Scheme):

This article by Mike Vanier does a far better job of explaining the Y-Combinator than I ever could. Suffice to say that when called with a generator function, as defined above, the Y-Combinator will return the recursive fib function. With the Y-Combinator, we could say:

fib_f fib = Y(fib_generator);

Due to C’s explicit typing, declaring higher-order functions can quickly become cumbersome, even with typedefs. So in order to get around that, we’re going to declare a single type to hold every function in our program. We’re going to call it _l, short for lambda.

Wrapping the type in an union allows it to refer to itself. We’ll be adding a couple more block types to this union shortly, but for now our only two options are: the signature of the fib function, and the generator which both takes and returns a lambda (containing a fib function, though that is unspecified).

Since this type is declared as a pointer, it should live on the heap, and therefore we should write initializer functions for it:

Auto-release pools

As you have probably noticed, the code above is unfortunately leaking tons of memory. Many of the functions we’ve written allocate _l unions, and these are never released. While we could attempt to always correctly release these as soon as they are no longer needed, a more interesting solution is to use an auto-release pool.

Auto-release pools are a concept familiar to Objective-C programmers, since they are used extensively in all of Apple’s APIs. They work like this: You can create and flush pools, which nest like a stack. You can auto-release a pointer, which means that it is added to the topmost pool, and is freed when the pool is flushed.

The simplest way of implementing auto-release pools is with two linked lists: the first to contain the objects that have been auto-released (this is the pool itself), and the second to contain the stack of pools. Lastly we’re going to need a static variable to refer to the top of the pool stack.

Then we can write a function to add pointers to the pool. Since different types may have different free functions, we will also take a reference to the free function to use on the pointer as the second argument to the autorelease function:

Memoization: Wrapping

While our code is now fully functional (ha, ha, coder pun…), it still is horribly inefficient. That’s because we still have not solved the inherent problem of the function having a time complexity of O(2^n). This can be solved using memoization.

Memoization is an optimization technique which consists of storing computations in memory when completed, so that they can be retrieved later on, instead of having to be re-computed. For fib, using memoization reduces the time complexity down to O(n).

In order to use memoization, we need a way to inject a function that will be executed before executing the fib function. We’ll call this wrapping, and as a first example, let’s just use wrapping to demonstrate how bad O(2^n) really is.

Understanding this wrapping function still makes my brain hurt, but suffice to say that it takes a generator and a wrapper as arguments, and returns a generator. The resulting generator can be used in the Y-combinator, and every recursion of the original function will be replaced with a recursion of the wrapper function, which itself calls the original function.

A simple wrapper function that will log every iteration looks like this:

Your output should look somewhat like this. As you can see, calling fib(20) evaluates the function 13,529 times.

Memoization: Implementation

In order to write a memoization wrapper function, we need a data structure in which to store the results. Most examples of memoization use a hash table, but since the key in our case is an integer, I decided to go for something a little more exotic: a binary tree, based on the bits of the integer key.

In order to store a value in the tree, we iterate through every bit in the int, traversing either through the one or zero node of the tree, and finally setting the leaf to the value we’re trying to cache:

And finally, now that we have a tree in which to store cached results, we can write our memoization function. Here, we’re actually adding creating a function called new_memoizer which returns a new instance of the wrapper function with its own (auto-released) cache.

Generate up to fib # 20...
About to generate # 20
About to generate # 19
About to generate # 18
About to generate # 17
About to generate # 16
About to generate # 15
About to generate # 14
About to generate # 13
About to generate # 12
About to generate # 11
About to generate # 10
About to generate # 9
About to generate # 8
About to generate # 7
About to generate # 6
About to generate # 5
About to generate # 4
About to generate # 3
About to generate # 2
About to generate # 1
Fibonacci number 20 is 6765.