Note: If you came here from Simon Cozens' article
looking for an application of closures, you may be disappointed,
because this isn't really an application at all, although it is very
interesting. If you want a practical application, you should
visit my article about
memoization.

I also teach a class called Stolen Secrets of the Wizards of the Ivory
Tower that is about practical applications for closures in
Perl. I've given this class at the Open Source Conference in 2000 and
2001 and at some other places also. You might like to look at the
samples.

When computer scientists want to study what is computable, they need a
model of computation that is simpler than real computers are. The usual
model they use involves a Turing Machine, which has the following parts:

One state register which can hold a single number, called the
state; the state register has a maximum size specified in advance.

An infinite tape of memory cells, each of which can hold a single
character, and a read-write head that examines a single square at any given
time.

A finite program, which is just a big table. For any possible number
N in the register, and any character in the currently-scanned memory cell,
the table says to do three things: It has a number to put into the
register, replacing what was there before,; it has a new character to write
into the current memory cell, replacing what was there before, and it has
an instruction to the read-write head to move to the next cell, the
previous cell, or to stay still,

This may not seem like a very reasonable model of computation, but computer
scientists have exhibited Turing machines that can do all the things you
usually want computers to be able to do, such as performing arithmetic
computations and running interpreter programs to simulate the behavior of
other computers.

They've also showed that a lot of obvious `improvements' to the Turing
machine model, such as adding more memory tapes, random-access memory, more
read-write heads, more registers, or whatever, don't actually add any power
at all; anything that could be computed by such an extended machine could
also have been computed by the original machine, although perhaps slowly.

Finally, a lot of other totally different models for computation turn out
to be equivalent in power to the Turing machine model. Each of these models
has some feature about it that suggests that it really does correspond well
to our intuitive idea of what is computable. One such model is the Lambda Calculus, invented by Alonzo Church. Lambda Calculus is intended to capture, in a
very simple way, the idea of creating a function and then calling it on an
argument.

Miraculously, the Lambda Calculus model is both simpler than the Turing
machine model and more practical for real computation. The programming
language Lisp, for example, is little more than Lambda Calculus in
disguise---and not a very disguising disguise, either.

The Lambda Calculus model of computation is very, very simple. There are
only two operations. You can create a function of one argument with a
specified body, and you can apply one of these functions to an argument.

The first operation is denoted like this:

%x.B

where B is the body and x is the formal parameter of the function. The % is
usually written with a lowercase letter lambda, but my typewriter has no
lambda, so I'll use % instead. B here is some arbitrary expression. When
the function is invoked on some argument A, you locate the occurrences of
the formal parameter x inside the body B, and you replace each X with a
copy of the argument A. This is just what any language does when it calls a
function: It looks in the body of the function, and replaces copies of the
formal parameter with the actual argument.

For example, here's a function:

%x.(x x y)

If we apply this function to the argument (p q), we get

((p q) (p q) y)

To denote function application, we just juxtapose the function and its
argument. So the function application we just discussed is denoted this
way:

(%x.(x x y) (p q))

We say that this expression reduces to the simpler expression

((p q) (p q) y)

Function application associates to the left, so that

(a b c)

is an abbreviation for

((a b) c)

here we apply the function a to the argument b, and then apply the result
to the argument c.

If an expression has no applications in it, it can't be reduced any
further, and is said to be in normal form. There might be several ways to reach a normal form. For example, consider
the expression:

(%x.(p x) ((%y.b) q))

we can reduce the %y part first, yielding

(%x.(p x) b)

and then

(p b)

or we can reduce the %x part first, yielding

(p (%y.b q))
(p b)

In this case, both paths led to the same place. In 1934, Church and Rosser
proved that no expression has more than one normal form, and so any two
sequences of reductions that end with a normal form give you the same
normal form. This means that we can consider the normal form to be the
`value' of an expression: It's what's left when we finish evaluating it,
and it doesn't matter how we carry out the steps in the evaluation.

If two expressions have the same normal form, we say they're
equivalent. We'll denote equivalence by -=-.

Some expressions have no normal form:

(%x.(x x) %x.(x x))

This expression reduces to itself; you never get to a normal form.

Now what does all this have to do with real computation? For computation we
need several things: We need boolean constants and an if-then test. We need
numbers, and arithmetic, and it would be nice to have some data structures
too.

At first it's not clear that we're going to be able to accomplish this. We
only have functions of one argument; how are we going to express addition,
which is a function of two arguments? It turns out that the restriction to
one-argument functions is no restriction at all. That's because we can play
a trick called currying.

First, imagine the function f3 that takes one argument and adds 3 to it.
Now imagine the similar function f4 that takes one argument and adds 4 to
it. Now imagine all the other functions in this family. Certainly they're
all functions of one argument.

Our `add' function is going to take one argument. If that argument is 3,
it'll return f3. If the argument is 4, it'll return f4. if it's some other
number, add will do the analogous thing.

Now let's see what happens when we try to evaluate

(add 3 4)

This is just short for

((add 3) 4)

We already said that (add 3) will produce f3:

(f3 4)

f3 is the function that adds 3 to its argument, so the result of this is

add gets one argument $x, and constructs a function, $f.
$f is the function that gets a number $y, adds $x to it, and returns the result. add itself returns $f. When you apply
$f to a number $y, it adds $x
to it; when you apply add to a number $x,
you get $f. In perl the notation is a little cumbersome;
add actually returns a code reference, and to call it you
have to use the &{...}(...) syntax, so it looks like
this:

$sum = &{add(3)}(4); # Sum is 7

In perl 5.004 and later, there's a tidier notation, which we'll use from
now on:

$sum = add(3)->(4); # Sum is 7

If $add is a reference to the add function itself, it looks
like this:

$sum = $add->(3)->(4); # Sum is 7

As promised, we have an addition function that is made of functions of only
one argument.

Now back to general programming. Our goals are to develop Lambda Calculus
versions of if-then tests, boolean and integer constants, arithmetic
operators, and some simple data structures.

The if-then test is most fundamental. We can imagine that `if' is a
function of three arguments. The first is a boolean value; the second is
the `then' clause, and the third is the `else' clause. The value of this
expression

IF b THEN x ELSE y

is x if b is true, and y if b is false. So what we'd really like is to find
three functions, IF, TRUE, and FALSE, such that

IF TRUE x y -=- x
IF FALSE x y -=- y

for all x and y. Then we can declare that an expression has a boolean value
if it is equivalent to TRUE or FALSE, and the Church-Rosser theorem tells
us that

if E x y

will evaluate to x whenever the expression E has a true value and to y
whenever E has a false value. This is just what we want.

Now let's try for a small data structure. The simplest data structure is
the ordered pair. We'll define a pair function that takes two values and combines them into one, from which the
original values can still be extracted. In Lisp this operation is called cons, although the object produced by cons is in fact called a `pair'.

We want to have PAIR, FIRST, and SECOND functions such that

(FIRST (PAIR A B)) -=- A
(SECOND (PAIR A B)) -=- B

The insight here is that we can use a partially-evaluated IF
construction. To make the pair <A,B>, we'll construct

%f.(IF f A B)

Then FIRST will just supply a true value for f to select A, and SECOND
will supply a false value to select B:

As Lisp programmers know, with pairs you can contruct almost any
interesting data structure. For example, a linked list is a sequence of
pairs, each of whose SECOND elements is the pair that represents the `rest' of the list.

Now we're in good shape to do numbers. Integers have a few fundamental
properties: There's a ZERO element, and an IS_ZERO
function to recognize it. Every number has a successor (the next number),
which is generated by the SUCC function. And finally, every number except zero has a predecessor, computed
by the PRED function. These four values must satisfy the following properties:

IS_ZERO ZERO -=- TRUE
IS_ZERO (SUCC x) -=- FALSE
PRED (SUCC x) -=- x

There are a few other properties they should satisfy: if x and y are different numbers, then (SUCC x) and (SUCC y) should be different also, and so on. But these three are enough for now.

Here's our technique: We'll represent each number n as a list of length n. This means that a number n will be a pair; the SECOND part of the pair will be the number n-1. The FIRST part of the pair can be arbitrary, so we might as well choose it usefully:
The FIRST part of
ZERO will be TRUE, and the FIRST part of every other number will be
FALSE. With that definition, it's easy to recognize ZERO.

All the tools we need here are available: We can construct functions that
bind arguments and return values; we have an if-then-else test; we have IS_ZERO and SUCC and PRED. Similarly, here's the function that computes whether or not two numbers
are equal:

Let's stick with add because it's simpler. Rendered into the compact Lambda Calculus notation,
it looks like this:

ADD = %m.(%n.IF (IS_ZERO m) n (ADD (PRED m) (SUCC n)))

There's a problem here, and it's big one: We used ADD in its own definition. We know what the definition is, but if we try to
insert it, the definition gets longer and longer and we never get rid of
the reference to ADD. We need some way to express recursion so that we don't have to write
infinitely long function definitions, which are impractical.

It turns out that we can express recursion in Lambda Calculus, and the method used to do it is
really brilliant.

First we need to discuss the idea of a fixed point. Consider the function x2. If you put in 3, you get 9. But if you put in 1, you get 1 back out
again. We say that 1 is a `fixed point' of the function x2. You can think of x2 as shuffling the numbers around, sending 2 to 4, 3 to 9, and soforth, but
it leaves 1 fixed in the same place.

Now let's consider a function where you put in one function and get another
one out. Specifically, let's consider this function, which we'll call R:

R = %g.(%m.(%n.IF (IS_ZERO m) n (g (PRED m) (SUCC n))))

You put in a function g, and you get out a new function. For example, if we put in the identity
function %q.q, we get out

just expand the left-hand side once and you get the right-hand side. So if ADD is the addition function we are looking for, then

ADD = R ADD

This means that if we were able to find the ADD we wanted, it would be a fixed point of R, and conversely, if we could find a fixed point of R, it would have to be
the addition function we were looking for.

How does this help us? We have no idea how to find fixed points, especially
fixed points of big hairy functions like R.

Here's the magic: There is a way to find a fixed point for any function in
Lambda Calculus. In fact, there's a function, Y, that computes one. Given any function f, Yf is a fixed point of
f. That is, for any f,

Y f = f (Y f)

because (Y f) is a fixed point of f.

It's not at all obvious that such a function Y could possibly exist, and yet one does exist. All that remains is to
exhibit this magical fixed point function Y that computes the fixed point for any function whatsoever. And here it is:

Y = %f.((%x.f (x x))(%x.f (x x)))

let's try applying Y to some function f and see what we get. We hope that (Y f) will be a fixed point for f, so that (Y f) =
f (Y f). Let's see:

Before we go on to the punch line of the article, there's one subtle matter
that we have to see in detail. The problem is that at any particular step,
there might be any number of ways to reduce a given Lambda Calculus
expression. and we have to pick the right one. The Church-Rosser theorem
says that no matter how we do it, the normal form that we get at the end
will be the same one---if we actually reach a normal form. But if we follow the wrong path, we might
never reach any normal form at all. For example, consider this:

(%x.Q) ((%y.y y)(%y.y y))

Here we have two choices: We can apply the leftmost function %x.Q to its argument, yielding the normal form, Q, immediately. But if instead we try to evaluate the argument first, we
don't get anywhere. A Perl analogue of the same thing would be something
like this:

does indeed loop forever, even though return_Q never uses its argument, because Perl has call by value semantics: This means that a function's arguments are always fully
evaluated before the function is called. The other option is call by name, in which the argument is passed to the function without being evaluated, and is only evaluated later if it is required.

You might think that since call-by-value is workable for Perl, it will work
in practice in Lambda Calculus also. But you'd be wrong: call by value doesn't always work in Perl; there are some essential places where Perl (and every
other language) uses call by name instead. To see why, suppose you were to
try to write your own function to replace Perl's if construction:

This function loops forever on any input. When n=0, it's important that we not make another recursive call---but this function does try to evaluate 0*factorial(-1) and pass this result to my_if; it doesn't matter that my_if would eventually have chosen the other branch, which was 1; with call by
value we have to compute both branches before we can ask which one we want. And that defeats the whole point of if which is to compute one or the other, but not both.

What we'd like to do is somehow express IF in a way that delays the evaluation of x and y until after the IF has selected one or the other of them. As we saw in my lazy streams article
back in TPJ #7, the simplest way to delay evaluation is to wrap up $x and $y
into functions:

%q.x

does not evaluate x, but rather creates a function that will evaluate it sometime in the
future, when we apply the function to an argument; the argument is ignored,
and out pops x. Instead of writing
(IF b x y) this way:

(%b.(%x.(%y.b x y))) b x y

as we have been doing, we'll write it this way:

(%b.(%x.(%y.b x y))) b (%q.x) (%q.y) q

The if part will yield either %q.x or %q.y, and then when we apply that to the dummy argument q, we'll have our result. To see how this works, let's suppose b is FALSE:

OK, now we're finally ready for the punch line of the entire article. By
now I bet you're expecting me to write a Perl program that evaluates Lambda
Calculus expressions. Nope, wrong. Here's the punch line: Perl already has
that feature built in.

If you wanted to investigate Turing machines, you'd have to start by
writing a simulator; then you'd have to write Turing machine programs to
perform functions like addition and multiplication, and that soon becomes
impossibly difficult. But to investigate Lambda Calculus, you don't need a
simulator. Lambda calculus isn't a machine; it's more like a programming
language. In fact, it is a programming language. It's programming language with a syntax for
defining functions and invoking them, and nothing else. The language is so
simple that you can view it as a subset of Perl.

%x.B means to create a function, which, when applied to an argument
A, substitutes it into B in place of x, and yields the result. And that's exactly what sub { my $x = shift; B } does do.

In Lambda Calculus, (p q) means to apply a function p to an argument q; Perl has that too. We'll just say:

$p->($q)

That's all we need; we can translate our Lambda Calculus expressions
directly into Perl and have the Perl interpreter evaluate them immediately.
Only the syntax is different, and not even very different at that.

Oops! This anonymous function that we got might be the TRUE
function %x(%y.x), or it might be something else, and there's no simple way to peek inside
to see. Let's define a utility function that analyzes a Lambda Calculus
boolean and tells us which it is:

print_number($SUCC->($TWO));
# prints: "If this is a number, its value is 2."

The peculiar ($IF->($IS_ZERO->($n))->(1)->(undef))
expression tests a Lambda Calculus number with $IS_ZERO, then converts the Lambda Calculus-style boolean result into a Perl-style
boolean value, either 1 or undef, for use in the until loop.

Now we're ready to try addition. Addition is recursive, and we need a
version of the Y operator that will work under a call-by-value semantics. The existing one
won't, because it wants to make the recursive call right away; we need to
delay this. We can solve this problem the same way we did for IF: Instead of

Y = %f.((%x.f (x x))(%x.f (x x)))

we'll use

YM = %f.(%x.(%y.f (x x)))(%x.(%y.f (x x)))

This YM operator is a little different from the Y operator. Instead of generating the fixed point immediately, as Y does:

Y f = f (Y f)

It delays the computation, and performs it on demand:

YM f = %y.f (YM f)

Then when you actually want to make the recursive call to F, you apply the result to some dummy argument y, which is thrown away.