How I Figured out the Y Combinator

The Y Combinator is among the most famous functions in theoretical computer science, which is quite ironic given that it's anonymous (and only really useful because it's anonymous). There's even a massive venture capital firm that took its name. And - somewhat unusually for theoretical CS constructs - it's quite useful.

There are no shortage of tutorials describing how it works, but I couldn't find many which worked through an example. So here's my go at it.

Here's a definition of the combinator that you've probably seen before:

Let's roll it out to see what happens when we actually use it:

So basically, what the Y combinator does is apply a function with itself an infinite number of times. That's already remarkable, because it essentially allows for an infinite amount of recursion with only anonymous functions.

In order to really make it useful though, we need to create an analogy to convergent sequences like $1/2, 1/4, 1/8, \dots$.

A functional equivalent in haskell is something like:

f g x = 1/2 * (g x)

This just uses recursion to generate the next term in the sequence. We can try it out:

The sequence f, f(f), f(f(f)), ... is analogous to the sequence $1/2, 1/4,\dots$ and so it converges to $f^\infty(x) = 0$. To generate an infinite application of f with itself, we can use a fixed point operator:

If we try to use this on our specific $f$ though, Haskell will hang, since, well, it takes an infinite amount of time to converge.

The only kind of convergent sequences that we can use in the "real world" are those which converge in a finite amount of time. No article about functional programming would be complete without a factorial implementation, so here's an example which is readily convergent:

Since Haskell is lazy, the else branch is never evaluated if x = 1, so we don't need to worry about whether the else is well-defined. It's only when we try to call $fact(2)$ that we're going to get an error.

We can consider the sequence $f(\perp), f\left(f(\perp)\right), f\left(f\left(f(\perp)\right)\right), \dots$. ($\perp$ being the formal way of saying "undefined".) In code, this looks something like:

The $n$th term in this sequence has $n$ of these else clauses. As $n$ goes to infinity, the function approaches the real factorial function, as it has fewer and fewer undefined segments. So we can say that this sequence of functions "converges" to the factorial function.

And the key point is that we only need as many function applications as we need recursive calls, i.e. $fact(n)$ needs only $n$ applications of $fact$ as an input - the $n+1$th can all be undefined. So even though this sequence takes an infinite amount of time to converge, the approximation is "good enough" for finite $n$.

To come full circle: in order to make our approximation arbitrarily correct, we need an arbitrary number of function applications as an input. And this is of course what $fix$ does.

Here is another usage which might make more sense:
The idea is that we have an infinite list of functions to apply, and when we need a new one we just pop one off. Which just gives us the arbitrary number of function applications that we need.

To summarize:

The Y combinator is just a higher-order function which applies functions to themselves

By applying functions to themselves, we get a sequence which converges to the "normal" recursive function.

So the Y combinator (as well as any fixed-point operator) can create an "arbitrarily good" approximation of a recursive function.