In this post I'll derive the Y-combinator and explain all the steps taken. The key to understanding the Y-combinator is knowing precisely what it does. I am sure many people never understand the Y-combinator for this very reason, or because the introductions they read are too lengthy to follow. Therefore I'll give the shortest possible explanation of the Y-combinator before I derive it, so that you know what the end result should be.

The Y-combinator allows an anonymous function to call itself; that is, it allows anonymous function recursion. Since anonymous functions don't have names, they can't refer to themselves easily. The Y-combinator is a clever construct that helps them refer to themselves. That's it. That's all the Y-combinator does. Remember that.
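To make the end result concrete before we start, here is a sketch (assuming the applicative-order Y-combinator that the derivation below arrives at; add1 is the book's increment helper, and length* is just a name I give the result so we can call it):

```scheme
(define (add1 n) (+ n 1))   ; the book's increment helper

; The applicative-order Y combinator: the end result of this derivation.
(define Y
  (lambda (le)
    ((lambda (f) (f f))
     (lambda (f)
       (le (lambda (x) ((f f) x)))))))

; An anonymous length function: Y ties the recursive knot,
; so the function never refers to itself by name.
(define length*
  (Y (lambda (length)
       (lambda (list)
         (cond
           ((null? list) 0)
           (else (add1 (length (cdr list)))))))))
```

For example, (length* '(a b c)) evaluates to 3, even though the function passed to Y never mentions its own name.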

I'll use the Scheme programming language to derive the Y-combinator. My derivation is based on the one in the book "The Little Schemer". I took the examples from the book, filled in the missing steps, and explained each step in more detail.

Here it is.

Suppose we want to write the length function, which, given a list, returns the number of elements in it. It's really easy if we can give the function a name:
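Something like this (a sketch of the named version; add1 is the book's increment helper, defined here so the snippet is self-contained, and the define shadows the built-in length for the sake of the example):

```scheme
(define (add1 n) (+ n 1))   ; the book's increment helper

; With a name, the function can simply call itself.
(define length
  (lambda (list)
    (cond
      ((null? list) 0)
      (else (add1 (length (cdr list)))))))
```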

But now suppose that we can't give names - we can only use anonymous functions:

(lambda (list)
  (cond
    ((null? list) 0)
    (else (add1 (??? (cdr list))))))

Suddenly there is no way this anonymous function can refer to itself. What do we put in place of ???? We can't refer to the name length anymore because the function is anonymous and has no name. One thing we can try is to put the function itself in the place of ???:
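The listing here is missing, so this is a reconstruction: a copy of the function is inserted into its own ??? slot. (??? is a legal Scheme identifier; I define it below as the book's ever-looping eternity-style stand-in so the snippet loads, and I name the result length<=1 purely for testing.)

```scheme
(define (add1 n) (+ n 1))          ; the book's increment helper
(define ??? (lambda (x) (??? x)))  ; stand-in that loops if ever called

; A copy of the function sits where ??? used to be,
; so this handles lists with one element or none.
(define length<=1
  (lambda (list)
    (cond
      ((null? list) 0)
      (else (add1 ((lambda (list)
                     (cond
                       ((null? list) 0)
                       (else (add1 (??? (cdr list))))))
                   (cdr list)))))))
```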

This is not much better: we are still left with ???. But there is a bright side: this function can determine the lengths of lists with one element or no elements. Let's try inserting the same anonymous function in place of ??? again:
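Again the listing is missing; a reconstruction with one more nested copy (same conventions as before: ??? is a looping stand-in, and the name length<=2 is mine):

```scheme
(define (add1 n) (+ n 1))          ; the book's increment helper
(define ??? (lambda (x) (??? x)))  ; stand-in that loops if ever called

; Two nested copies: handles lists of zero, one or two elements.
(define length<=2
  (lambda (list)
    (cond
      ((null? list) 0)
      (else (add1 ((lambda (list)
                     (cond
                       ((null? list) 0)
                       (else (add1 ((lambda (list)
                                      (cond
                                        ((null? list) 0)
                                        (else (add1 (??? (cdr list))))))
                                    (cdr list))))))
                   (cdr list)))))))
```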

Now this function can determine the lengths of lists with 0, 1 and 2 elements. If we continue this way, we can construct a length function for lists with any fixed number of elements. But this is not what we want; we want the real length function that works for lists with any number of elements.

As we all know, repeating code is not a good thing. Let's try to factor out the repetitions and rewrite the original anonymous function slightly. Instead of leaving ??? in the code, let's pass it to an anonymous function via an argument called length.
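The refactored listing is missing; a reconstruction in which the repeated hole is supplied through the argument called length (as before, ??? is defined as a looping stand-in, and length0 is my name for the result):

```scheme
(define (add1 n) (+ n 1))          ; the book's increment helper
(define ??? (lambda (x) (??? x)))  ; stand-in that loops if ever called

; (lambda (length) ...) is handed to (lambda (mk-length) ...),
; which applies it to ??? to recover the original function.
(define length0
  ((lambda (mk-length)
     (mk-length ???))
   (lambda (length)
     (lambda (list)
       (cond
         ((null? list) 0)
         (else (add1 (length (cdr list)))))))))
```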

This is pretty tricky. Observe how (lambda (mk-length) ...) gets the (lambda (length) ...) function passed as the mk-length argument, which in turn accepts ??? as an argument and returns our original anonymous function:

(lambda (list)
  (cond
    ((null? list) 0)
    (else (add1 (??? (cdr list))))))

Now let's try constructing length functions for lists of length one and two. For the list of length one, we just need to apply mk-length to itself:
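One reconstruction of the missing listing, consistent with the explanation below (the argument is still called length here; the renaming comes in the next step, and length<=1 is my name for the result):

```scheme
(define (add1 n) (+ n 1))          ; the book's increment helper
(define ??? (lambda (x) (??? x)))  ; stand-in that loops if ever called

; mk-length is applied to itself, so length is bound to mk-length,
; and (length ???) rebuilds the original function on demand.
(define length<=1
  ((lambda (mk-length)
     (mk-length mk-length))
   (lambda (length)
     (lambda (list)
       (cond
         ((null? list) 0)
         (else (add1 ((length ???) (cdr list)))))))))
```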

Let's go through this code. First, (lambda (length) ...) gets passed to (lambda (mk-length) ...) as the mk-length argument, which is then applied to itself. Inside, length is therefore bound to mk-length, so (length ???) is really (mk-length ???), the original anonymous function, and it gets applied to (cdr list). This produces our well-known function that works on lists with one element or none.

Notice also that the argument names mk-length and length in (lambda (mk-length) ...) and (lambda (length) ...) are independent. Therefore we can rename length to mk-length, to remind us that the first argument to mk-length is mk-length itself:
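A reconstruction of the renamed listing, followed by the step the next paragraph describes: passing mk-length to itself in place of ???, which yields the fully working length (the names length<=1 and real-length are mine):

```scheme
(define (add1 n) (+ n 1))          ; the book's increment helper
(define ??? (lambda (x) (??? x)))  ; stand-in that loops if ever called

; Renamed: the first argument to mk-length is mk-length itself.
(define length<=1
  ((lambda (mk-length)
     (mk-length mk-length))
   (lambda (mk-length)
     (lambda (list)
       (cond
         ((null? list) 0)
         (else (add1 ((mk-length ???) (cdr list)))))))))

; Passing mk-length to itself in place of ??? creates one more
; recursive use just as the previous one is about to expire.
(define real-length
  ((lambda (mk-length)
     (mk-length mk-length))
   (lambda (mk-length)
     (lambda (list)
       (cond
         ((null? list) 0)
         (else (add1 ((mk-length mk-length) (cdr list)))))))))
```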

The function works because it keeps adding recursive uses by passing mk-length to itself, just as it is about to expire. This is not yet the Y-combinator, but we have successfully managed to recursively call an anonymous function. Now we need to massage the code a bit to separate the anonymous function from the self-applicative code.

The first step is to move the self-applicative code out as much as possible:
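The post breaks off here, so the following is a sketch of this step and of where it leads, following The Little Schemer: the inner self-application is wrapped in (lambda (x) ...) so that under applicative order it is only expanded when actually needed, and abstracting the length-making function out then yields the applicative-order Y combinator (the names real-length and le are mine):

```scheme
(define (add1 n) (+ n 1))   ; the book's increment helper

; The length-making code is moved out into its own lambda, and the
; self-application (mk-length mk-length) is eta-wrapped so it is
; only evaluated on demand.
(define real-length
  ((lambda (mk-length)
     (mk-length mk-length))
   (lambda (mk-length)
     ((lambda (length)
        (lambda (list)
          (cond
            ((null? list) 0)
            (else (add1 (length (cdr list)))))))
      (lambda (x) ((mk-length mk-length) x))))))

; Abstracting out the length-making function le gives the
; applicative-order Y combinator.
(define Y
  (lambda (le)
    ((lambda (mk-length)
       (mk-length mk-length))
     (lambda (mk-length)
       (le (lambda (x) ((mk-length mk-length) x)))))))
```

With this, (Y (lambda (length) (lambda (list) ...))) builds the recursive length without the function ever naming itself.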


tommy > There is a quirk in the Erlang syntax which means you can't have a local declaration of a recursive function (inside an expression): there is actually no syntax for named function declarations inside expressions, so you have to use anonymous functions. Hence, you can't declare a local recursive function without such a fixpoint operator. This is actually a bad idea, as you end up reimplementing a part of the runtime system in a probably slow and certainly cryptic way, but that was just for the example.

Peter > this combinator is just one among the different fixpoint combinators available. There is for example the Turing fixpoint combinator, which is also derivable. Do you know where in your derivation you made a decision that led you to this combinator instead of another? I have a gut feeling that the derivation is not "canonical" and that one could probably use a slightly different succession of software-engineering-and-common-sense steps that would naturally lead to another combinator.

Also, I'm not sure I agree with your presentation of the Y-combinator. You describe it as something about recursion and anonymous functions. I believe it is much more general than that: fixpoint combinators allow you to control recursion. It is all about parametrizing the function you call at your recursion sites, instead of hardcoding recursion using the language syntax (or lack thereof). From a software engineering point of view, you could say that it decouples two aspects of your function: the implementation (with domain-specific knowledge and all) and the "looping" / "tying the knot" process, which is deferred to the fixpoint combinator.

You can do quite interesting (but somewhat hacky) things once you've decoupled those two separate concerns. Matt mentioned memoization (automagic memoization of inner recursive calls, which have been exposed by the decoupling process); there are quite a few folk examples, see for example the article "That About Wraps it Up".

It's a funny hack, for example, to write in your favorite language a function that takes any derecursified/decoupled function and, assuming the original function was tail-recursive, produces a tail-recursive implementation of it using an explicit call stack, which doesn't depend on the implementation of tail recursion in the compiler/runtime.

Tommy, I can only think of the lambda calculus, which, if I understand correctly, uses the Y-combinator to recurse.

Bluestorm, I am also familiar with the Y! (Y-bang) combinator (from The Seasoned Schemer book). But I am not exactly sure how it differs from the Y-combinator; I am still thinking about it. Its derivation is similar to the Y-combinator's, but a bit different. The book says that it works the same as the Y-combinator, but then gives an example which the Y-combinator processes while the Y! combinator goes into an infinite loop. And thanks for the in-depth comment!

The problem with the Y-combinator is that it cannot be typed. We have Y = \s.(\x.s(x x))(\x.s(x x)), such that Ys reduces to (\x.s(x x))(\x.s(x x)), which reduces to s((\x.s(x x))(\x.s(x x))) = s(Ys). Thus Ys reduces to s(s(s(...(Ys)...))), which means Ys is not strongly normalizing; and since all typed lambda terms are strongly normalizing, Y cannot be typed (in fact, in the same way, no fixpoint combinator can be typed).

Anyway, even strongly typed programming languages like Haskell have recursion, though mostly through calling a function by its name. In the theory of typed lambda terms, you therefore mostly define recursion operators axiomatically, like R:(a->a)->(a->bool)->a, which applies the first argument to a until (a->bool) becomes true (there are a lot of other possibilities, though). These cannot be expressed as terms; they must be defined axiomatically.

Very good explanation, thanks. There is a recent talk worth watching about these esoteric exercises: http://www.infoq.com/presentations/Y-Combinator
I'm among the readers of 'The Little Schemer' too :-)
I just completed the functional programming course in Scala offered by Martin Odersky on Coursera and found these concepts really delicious.

Good read! You see all this clearly, and I dare ask for more :-)
One intriguing thing about chapter 9 is discovering that the first simplification of the value function that comes to mind (factoring out (make-length make-length)) results in a function which enters an infinite loop BEFORE even starting the actual computation.
Have you got an explanation for this (apart from just saying "hey, it happens!", as the book does)? Is this something general about factoring out f(f), or does it happen just here?
I suspect it is a general problem with f(f), but I've thought long about it without coming to a conclusion. Your view will be greatly appreciated.
Thank you and have a good day.

Peter - 5 years on and your article is still really useful. I had coped with The Little Schemer up until the Y combinator - you got me through it.
For entertainment value only you might want to see my attempt in Javascript at http://jsbin.com/loromo/edit.
Thanks!

There's also an account of deriving the Y combinator in something approximating the untyped lambda calculus in "An Introduction to Functional Programming Through Lambda Calculus" (Addison-Wesley, 1989), in Chapter 4.