Why Lambda Calculus?

But more importantly, working through the theory from its original viewpoint
exposes us to different ways of thinking. Aside from being a healthy mental
workout, lambda calculus sometimes turns out to be the superior formalism.

As the importance of software grows in our world, so do the advantages of
lambda calculus, and in particular, its connections with the foundations of
mathematics. Computer science without lambda calculus is like engineering
without physics.

Beta reduction

Unlike with Turing machines, everyone already knows the basics of lambda
calculus. In school, we’re accustomed to evaluating functions. In fact, one
might argue schools focus too much on making students memorize and apply
formulas such as \(\sqrt{a^2 + b^2}\) for \(a = 3\) and \(b = 4\).

In lambda calculus, this is called beta reduction, and we’d write this
example as:

\[ (\lambda a b . \sqrt{a^2 + b^2}) 3 \enspace 4 \]

This is almost all there is to lambda calculus! Only, instead of numbers,
we plug in other formulas. The details will become clear as we build our
interpreter.

I was surprised this substitution process learned in childhood is all we need
for computing anything. A Turing machine has states, a tape of cells, and a
movable head that reads and writes; how can putting formulas into formulas be
equivalent?

We use code to help answer the question, which requires a bit of boilerplate:
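For instance, we can represent terms as a tree, with one constructor each for
variables, applications, and lambda abstractions. A minimal sketch in Haskell
(the names here are illustrative, not necessarily the original listing):

data Term = Var String | App Term Term | Lam String Term deriving Eq

-- Print terms in the backslash-and-arrow notation described next.
instance Show Term where
  show (Var s)   = s
  show (Lam v t) = "\\" ++ v ++ " -> " ++ show t
  show (App m n) = showF m ++ " " ++ showA n where
    showF t@(Lam _ _) = "(" ++ show t ++ ")"
    showF t           = show t
    showA t@(Var _)   = show t
    showA t           = "(" ++ show t ++ ")"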

As for input, since typing Greek letters can be nontrivial, we follow Haskell
and interpret the backslash as lambda. We may as well follow Haskell a little
further and accept -> in lieu of periods, and support line comments.

Any alphanumeric string is a valid variable name.
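A sketch of a parser for this syntax, using ReadP from base (let definitions
and line comments are omitted for brevity, and a trailing lambda argument must
be parenthesized in this sketch):

import Data.Char (isAlphaNum)
import Data.Maybe (listToMaybe)
import Text.ParserCombinators.ReadP

-- Skip leading spaces before a token.
token :: ReadP a -> ReadP a
token p = skipSpaces *> p

-- A term is a lambda abstraction, or one or more atoms applied left to right.
term :: ReadP Term
term = lam +++ (foldl1 App <$> many1 atom) where
  lam = do
    _  <- token (char '\\')
    vs <- many1 (token (munch1 isAlphaNum))
    _  <- token (string "->" +++ string ".")  -- accept both separators
    t  <- term
    pure (foldr Lam t vs)
  atom = token (Var <$> munch1 isAlphaNum)
     +++ between (token (char '(')) (token (char ')')) term

parseTerm :: String -> Maybe Term
parseTerm s = listToMaybe [t | (t, "") <- readP_to_S (term <* skipSpaces) s]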

Typing a long term is tedious, so we support a sort of let statement. The line

true = \x y -> x

means that for all following terms, the variable true is no longer a
variable, but shorthand for the term on the right side, namely \x y -> x.
There is one exception: if the variable true is the left child of a lambda
abstraction, then within that lambda’s body it shadows the original definition.
It is good practice to pick a different name to avoid confusion.

Evaluation

If the root node is a free variable or a lambda, then there is nothing to do.
Otherwise, the root node is an App node, and we recursively evaluate the left
child.

If the left child evaluates to anything but a lambda, then we stop, as a free
variable got in the way somewhere.

Otherwise, we perform beta reduction as follows. Let the left child be \(\lambda
v . M\). We traverse \(M\), and replace every free occurrence of \(v\) with the
right subtree of the root node, that is, with the argument of the application.

While doing so, we must handle a potential complication. A reduction such as
(\y -> \x -> y)x to \x -> x is incorrect: the free variable x has been captured
by the inner lambda. To prevent this, we rename the bound x and find that
(\y -> \x1 -> y)x reduces to \x1 -> x.

More precisely, a variable v is bound if it appears in the right subtree of
a lambda abstraction node whose left child is v. Otherwise v is free. If a
substitution would cause a free variable to become bound, then we rename the
offending binder, along with every occurrence it binds, before proceeding. The
new name must differ from all variables free in either term.
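Concretely, here is a sketch of this capture-avoiding substitution over the
Term type above (fv, fresh, and sub are names of our choosing):

import Data.List (union)

-- Free variables of a term.
fv :: Term -> [String]
fv (Var v)   = [v]
fv (App m n) = fv m `union` fv n
fv (Lam v m) = filter (/= v) (fv m)

-- A name derived from x that avoids everything in the given list.
fresh :: String -> [String] -> String
fresh x avoid = head [x' | i <- [1 :: Int ..], let x' = x ++ show i, x' `notElem` avoid]

-- sub m v s: replace every free occurrence of v in m with s.
sub :: Term -> String -> Term -> Term
sub t v s = case t of
  Var x | x == v    -> s
        | otherwise -> t
  App m n -> App (sub m v s) (sub n v s)
  Lam x m
    | x == v        -> t                          -- v is shadowed below this binder
    | x `elem` fv s -> let x' = fresh x (v : fv s ++ fv m)
                       in Lam x' (sub (sub m x (Var x')) v s)  -- rename, then substitute
    | otherwise     -> Lam x (sub m v s)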

We store the let definitions in an associative list named env, and perform
lookups on demand to see if a given string is a variable or shorthand for
another term.

Our eval function terminates once no more top-level function applications
(beta reductions) are possible. We recursively call eval on child nodes to
reduce other function applications throughout the tree, resulting in the
normal form of the lambda term. The normal form is unique up to variable
renaming (which is called alpha-conversion).
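In code, one way to arrange this is to split head reduction from full
normalization. A sketch building on sub above, under the assumption that let
definitions bind closed terms:

type Env = [(String, Term)]

-- Reduce the head only, stopping at a lambda: enough to decide
-- whether a top-level beta reduction is possible.
whnf :: Env -> Term -> Term
whnf env t = case t of
  App m n -> case whnf env m of
    Lam v body -> whnf env (sub body v n)  -- beta reduction
    m'         -> App m' n                 -- a free variable got in the way
  Var v -> maybe t (whnf env) (lookup v env)  -- expand shorthand on demand
  _     -> t

-- Full normalization: reduce the head, then recurse into children.
eval :: Env -> Term -> Term
eval env t = case whnf env t of
  Lam v m -> Lam v (eval [d | d@(x, _) <- env, x /= v] m)  -- a binder shadows its let definition
  App m n -> App (eval env m) (eval env n)
  t'      -> t'

Applied to the earlier example, eval [] reduces (\y -> \x -> y) x to \x1 -> x.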

A term with no free variables is called a closed lambda expression or
combinator. When given such a term, our function’s output contains no App
nodes.

That is, if it ever outputs something. There’s no guarantee that our recursion
terminates. For example, it is impossible to reduce all the App nodes of:

omega = (\x -> x x)(\x -> x x)

In such cases, we say the lambda term has no normal form. We could limit the
number of reductions to prevent our code from looping forever; we leave this as
an exercise for the reader.

In an application App m n, the function eval tries to reduce m first.
This is called a normal-order evaluation strategy.
What if we reduced n first, a strategy known as applicative order?
More generally, instead of starting at the top
level, what if we picked some sub-expression to reduce first? Does it matter?

Yes and no. On the one hand, the
Church-Rosser
theorem states that the order of evaluation is unimportant in that if terms
\(b\) and \(c\) are both derived from term \(a\), then there exists a term \(d\) to
which both \(b\) and \(c\) can be reduced. In particular, if we reach a term where
no further reductions are possible, then it must be the normal form we defined
above.

On the other hand, some strategies may loop forever instead of normalizing a
term that does in fact possess a normal form. It turns out this never happens
with normal-order evaluation: it always reduces a term to its normal form if it
exists, hence its name. This is intuitively evident, as at each step we’re
doing the bare minimum. Reducing m before n means we ignore arguments to a
function until they are needed, which explains another name for this strategy:
lazy evaluation.
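For example, apply a constant function to omega from above:

(\x -> y) omega

Normal order reduces the whole application to y in one step, never touching
the argument, while applicative order insists on normalizing omega first and
loops forever.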

A Lesson Learned

Until I wrote an interpreter, my understanding of renaming was flawed. I knew
that we compute with closed lambda expressions, that is, terms with no free
variables, so I had thought this meant I could ignore renaming. No free
variables can become bound because they’re all bound to begin with, right?

In an early version of this interpreter, I tried to normalize:

(\f x -> f x)(\f x -> f x)

My old program mistakenly returned:

\x x -> x x

It’s probably obvious to others, but it was only at this point I realized that
the recursive nature of beta reductions implies that in the right subtree of a
lambda abstraction, a variable may be free, even though it is bound when the
entire tree is considered. With renaming, my program gave the correct answer:

\x x1 -> x x1

Booleans, Numbers, Pairs

When starting out with lambda calculus, we soon miss the symbols of Turing
machines. We endlessly substitute functions in other functions. They never
“bottom out”. Apart from punctuation, we only see a soup of variable names
and lambdas. No numbers nor arithmetic operations. Even computing 1 + 1 seems
impossible!
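Yet it can be done: a number n can be encoded as the act of applying a function
n times. Below is a sketch of the standard Church encodings in our notation
(the names are chosen to match their later use; the original listing may
differ):

false = \x y -> y
if = \b x y -> b x y
0 = \f x -> x
1 = \f x -> f x
2 = \f x -> f (f x)
succ = \n f x -> f (n f x)
add = \m n f x -> m f (n f x)
mul = \m n f -> m (n f)
is0 = \n -> n (\x -> false) true

Now add 1 1 normalizes to \f x -> f (f x), which is exactly 2: one plus one
after all.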

The predecessor function is far slower than the successor function, as it
constructs the answer by starting from 0 and repeatedly computing the successor.
There is no quick way to “strip off” one layer of a function application.

We can pair up any two terms as follows:

pair = \x y z -> z x y
fst = \p -> p true
snd = \p -> p false

From such tuples, we can construct lists, trees, and so on.
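In particular, pairs give us the classic predecessor: starting from (0, 0),
step n times from (a, b) to (b, b + 1), then take the first component. A
sketch built on the definitions above:

pred = \n -> fst (n (\p -> pair (snd p) (succ (snd p))) (pair 0 0))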

Admittedly, the predecessor function is complicated, probably more so than a
typical Turing machine implementation. However, this is an artifact of the
Church encoding. With the Scott encoding, we have a fast and simple predecessor
function:
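A sketch of the idea (the original listing is elided; we suffix the names here
to keep the Church versions above intact):

0s = \z s -> z
succs = \n z s -> s n
preds = \n -> n 0s (\m -> m)

A Scott numeral is its own case statement: given a value for zero and a
function for a successor, it selects the right one, so preds unwraps one layer
in constant time.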

Instead of unary, we could encode numbers in binary by using lists of booleans.
Though more efficient, this encoding loses the elegant spartan equations for
arithmetic that remind us of the Peano axioms.

Recursion

Because our interpreter cheats and only looks up a let definition at the last
minute, we can recursively compute factorials with:

factrec = \n -> if (is0 n) 1 (mul n (factrec (pred n)))

But we stress this is not a lambda calculus term. If we tried to expand the let
definitions, we’d be forever replacing factrec with an expression containing
a factrec. We’d never eliminate all the function names and reach a valid
lambda calculus term.

Instead, we need something like the
Y combinator. The inner
workings are described in many other places, so we’ll content ourselves
with listing their definitions, and observing they are indeed lambda calculus
terms.
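For the record, one standard choice is the following (we assume the original
lists something similar):

Y = \f -> (\x -> f (x x)) (\x -> f (x x))
fact = Y (\f n -> if (is0 n) 1 (mul n (f (pred n))))

Unlike factrec, every name on the right abbreviates a fixed term, so these
definitions expand to genuine lambda calculus terms.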

Thus we can simulate any Turing machine with a lambda calculus term: we could
concoct a data structure to represent a tape, which we’d feed into a recursive
function that carries out the state transitions.

Unlike the self-interpreter, the self-reducer requires the input to be the
encoding of a closed term. See Mogensen’s paper for details.

Scary quotes

Why did we step outside lambda calculus and hard-code the implementation
of quote? Maybe we can define it within lambda calculus.
Let’s suppose so. Then consider the expression:

(\f.f ((\y.y) x)) quote

This reduces to quote ((\y.y) x), which is:

λa b c.b(λa b c.c(λy a b c.a y))(λa b c.a x)

On the other hand, if we first evaluate the sub-expression (\y.y) x, then the
whole term reduces to (\f.f x) quote, which reduces to quote x, which is:

λa b c.a x

This violates the Church-Rosser theorem. In short, id x = x, but
quote (id x) /= quote x. Thus quote is not a function, and should be seen as a
sort of macro; a labour-saving abbreviation lying outside of lambda calculus.
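For reference, the hard-coded quote might look like the following sketch over
our Term type, assuming the quoted term avoids the variable names a, b, and c:

-- Mogensen encoding as a meta-level function: it inspects the syntax
-- of its argument, which no lambda term can do.
quote :: Term -> Term
quote t = Lam "a" (Lam "b" (Lam "c" (body t))) where
  body (Var x)   = App (Var "a") (Var x)
  body (App m n) = App (App (Var "b") (quote m)) (quote n)
  body (Lam v m) = App (Var "c") (Lam v (quote m))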

We named quote after a similar primitive in the Lisp
language, which suffers from the same affliction.
The Right Way to reify is to sacrifice brevity:

Var = \m.\a b c.a m
App = \m n.\a b c.b m n
Lam = \f.\a b c.c f

Then quote ((\y.y) x) can be expressed in pure lambda calculus as:

App (Lam (\y.Var y)) (Var x)

This is less convenient, but it’s more comprehensible than the raw encoding;
indeed, evaluating this term yields exactly the encoding we computed above.
Most importantly, we’re back on firm theoretical ground.