
Most programmers are familiar with λ-calculus. It’s one of the most widely used tools in theoretical computer science and logic. It’s the basis of many programming languages, including Lisp, Haskell, ML, and Scala. It’s had a strong influence on many other programming languages that aren’t strictly based on it, including Java, C++, and JavaScript.

Motivation: the Problem

Modern programmers love the λ-calculus. A lot of the mathematicians who worked in what became computer science also loved it. But if you go back to its roots, there was a problem.

The λ-calculus in its non-typed form, as it was originally defined, was inconsistent.

Haskell Curry eventually reduced the paradox to something quite simple:

r = (λ x . ((x x) ⇒ y))

That is a function which returns y if (x x) is true.

To see the paradox, we need to change how we think of λ-calculus. Most people today who know λ-calculus are programmers, and we think of it primarily as something like a programming language. But the roots of λ-calculus were laid before computers as we know them existed. At the time, λ-calculus was a tool used by logicians. To them it wasn’t a computing machine; it was a logical model of computation.

When we look at the expression (r r), what we see is executable logic that reads as “If applying r to r returns true, then return y”. And by using computational reasoning, we’d conclude that (r r) is a non-terminating computation, because to get the value of (r r), we need to evaluate (r r), and then apply r to that result. But to logicians like Haskell Curry, it read as “The statement (r r) implies y”. Rendered into simple English, it’s a statement like: “If this statement is true, then my hair is purple”. It’s purely a logical implication, and so even though we can’t actually evaluate (r r), in logical terms, we can still say “This is a well-formed logical statement, so is it true?”.

Is that true? Suppose it is. If it’s true, then it says that whatever y is must be true. Without knowing what y is, we can’t be sure if this is a true statement.

Suppose that it’s false. If (r r) is false, that means that the implication (r r) ⇒ y must be true, because first-order predicate logic (FOPL) says that if the antecedent of an implication is false, the entire implication is true.

It’s clearer when you look at the English: “If this sentence is true, then my hair is purple”. My hair isn’t purple, which means that the statement can’t be true. But if the statement isn’t true, then the implication is true, so the statement is true. We’re caught in a self-reference loop, just like what we saw in Russell’s paradox.

This was considered a very bad thing. It’s a bit subtler than the problem with naive set theory. In set theory, we had an unambiguous inconsistency: we had a set that was well-defined under the axioms of set theory, and without any question, that set was inconsistent. Here, we’ve got an expression which might be consistent, and then again, it might not. It depends on what the consequent – the “my hair is purple” part – of the implication is. If it’s something true, then we’re fine. If it’s not, then we’re stuck.

The problem is, no matter what you put into that “my hair is purple” part, you can use this to produce a proof that it’s true, even if it isn’t. And that’s a fundamental inconsistency, which can’t be tolerated.
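The loop can be turned into an explicit derivation. As a sketch, write X for the sentence “if X is true, then Y”, where Y stands for the “my hair is purple” part. The names X and Y and the step numbering are labels chosen here, and the steps use only ordinary rules for implication:

```latex
\begin{align*}
&1.\; X \leftrightarrow (X \Rightarrow Y) && \text{definition of the self-referential sentence } X \\
&2.\; X \Rightarrow X && \text{identity} \\
&3.\; X \Rightarrow (X \Rightarrow Y) && \text{rewrite the consequent of 2 using 1} \\
&4.\; X \Rightarrow Y && \text{contraction on 3: the repeated hypothesis } X \text{ is merged} \\
&5.\; X && \text{from 4, using 1 right to left} \\
&6.\; Y && \text{modus ponens on 5 and 4}
\end{align*}
```

Nothing in the derivation depends on what Y says, which is why any consequent at all can be “proved” this way.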

Curry’s paradox meant that the logicians working with λ-calculus had a problem that they needed to solve. And following the fashion of the time, they solved it using types. Much like ST type theory attempted to preserve as much of set theory as possible while fixing the inconsistency, typed λ-calculus tried to fix the inconsistency of λ-calculus. The basic approach was also similar: the typed λ-calculus introduced a stratification which made it impossible to build the structures that led to inconsistency.

In ST, the solution was to build a logical system in which it was impossible to express self-reference. In λ-calculus, the basic goal was the same: to eliminate self-reference. But in λ-calculus, that restriction is stated differently. What the type system in λ-calculus does is make it impossible to construct a statement that is a fixed-point combinator.

If we look carefully at the paradoxical statement up above, it’s not really pure λ-calculus. It relies on the fact that we’re defining a function named r, and then applying r to itself using the name r. But that’s just syntactic sugar: in fact, r can’t reference r. You need some tool to let you apply an expression to itself: that tool is called a fixed-point combinator. The most common fixed-point combinator in λ-calculus is the Y combinator, which underlies how recursive computations usually work in λ-calculus.

The way that the simply typed λ-calculus gets around the Curry paradox is by making it impossible to build a well-typed fixed-point combinator. Without that, you can’t build the self-referential constructs that cause the inconsistency. The downside is that the simply typed λ-calculus, without a fixed-point combinator, is not Turing complete. The evaluation of every simply typed λ-calculus expression will eventually terminate.
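To see what the type system is ruling out, it can help to look at a language whose types are richer than the simply typed λ-calculus. The sketch below, in Scala, builds a fixed-point combinator, but only by introducing a recursive wrapper type. The names Wrap, fix, and factorial are made up for this illustration. The simply typed λ-calculus has no way to declare a type like Wrap, which mentions itself in its own parameter position, and that is the gap that keeps self-application out.

```scala
object SelfApplication {
  // A recursive wrapper type: a Wrap[A] holds a function that itself
  // accepts a Wrap[A]. Simple types cannot express this.
  final case class Wrap[A](unwrap: Wrap[A] => A)

  // A fixed-point combinator for functions of type A => B.
  // The inner lambda (a => ...) delays the self-application so that
  // strict evaluation does not loop while building the function.
  def fix[A, B](f: (A => B) => (A => B)): A => B = {
    val w = Wrap[A => B](x => f(a => x.unwrap(x)(a)))
    w.unwrap(w) // the self-application: w applied to w
  }

  // Recursion without a recursive definition: `self` is supplied by fix.
  val factorial: Int => Int =
    fix[Int, Int](self => n => if (n <= 1) 1 else n * self(n - 1))
}
```

Here SelfApplication.factorial(5) evaluates to 120, with all of the recursion supplied by fix rather than by a named recursive definition.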

(As an aside: this isn’t really a problem in practice. The self-referential expressions that cause the Curry paradox turn into non-terminating computations. So they don’t produce a paradox; they just don’t produce anything. Logical inconsistencies don’t produce results: they’re still an error, but instead of terminating with an inconsistent result, they just never terminate. Again, to the logicians at the time, the idea of non-termination was, itself, a deep problem that needed a solution.)

The Solution: Stratification by Types

The way that the simply typed λ-calculus fixed things was by creating a stratification using types. The type system created a restriction on the set of statements that were valid, well-formed logical statements. That restriction made it impossible to express a fixed point combinator or a general recursive computation of any kind.

It’s helpful to once again refer back to ST. In ST, we started the type of atoms at level 0. To get to level 1, we defined predicates over level-0 objects, and the set of objects that matched the predicate was a level-1 type. Then we could define predicates over level-1 objects, and the set of level-1 types that satisfied the predicate was a level-2 type, and so on. In the simply typed λ-calculus, we do the same thing, but with functions: we can build functions that operate on primitive atoms (also called base values), or on other functions. When we define a function, it must be assigned a type, and that type must be something at a lower level than the function being defined. You can’t ever pass a function to itself, because the function is an object at a higher level than the type of its parameter.

We start with base types. Every simply typed lambda calculus starts with a collection of primitive atomic values. The set of atomic values is partitioned into a collection of types, which are called the base types. Base types are usually named by single lower-case Greek letters. So, for example, we could have a type σ which consists of the set of natural numbers; a type τ which corresponds to boolean true/false values; and a type γ which corresponds to strings.

Once we have basic types, then we can talk about the type of a function. A function maps from a value of one type (the type of parameter) to a value of a second type (the type of the return value). For a function that takes a parameter of type γ, and returns a value of type δ, we write its type as “γ → δ”. The “→” is called the function type constructor; it associates to the right, so γ → δ → ε is equivalent to γ → (δ → ε).

In every function declaration, we need to specify the type of the parameters and the type of the result. So:

σ → σ is a function which takes a natural number as a parameter, and returns a natural number as a result.

(σ → σ) → σ is a function which takes a σ → σ function as a parameter, and produces a natural number as a result.

As usual in λ-calculus, we don’t have multi-parameter functions – all functions are curried, so a function like addNatural would be a function that takes a natural number as a parameter, and returns a function that takes a natural number and returns a natural number. So the type of addNatural is σ → σ → σ.
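Function types and currying carry over almost directly to Scala, whose => also associates to the right. The sketch below is only an illustration: it uses Int as a stand-in for the natural numbers, and the names Nat, addThree, and applyToSeven are invented here.

```scala
object CurriedTypes {
  // Stand-in for the base type of natural numbers (an assumption: Int
  // also contains negative values, which the base type would not).
  type Nat = Int

  // A curried function: Nat => Nat => Nat parses as Nat => (Nat => Nat).
  val addNatural: Nat => Nat => Nat = x => y => x + y

  // Supplying one argument yields a function of type Nat => Nat.
  val addThree: Nat => Nat = addNatural(3)

  // A function that takes a function as its parameter: (Nat => Nat) => Nat.
  val applyToSeven: (Nat => Nat) => Nat = f => f(7)
}
```

With these definitions, CurriedTypes.addNatural(2)(3) is 5, and CurriedTypes.applyToSeven(CurriedTypes.addThree) is 10.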

How does this get around the self-reference problem? A function like the one in the Curry paradox takes an object of its own type as a parameter. There’s no way to write that in this type system. It’s a significant restriction which makes it impossible to write general recursive expressions – it limits us to something close to primitive recursion, but it avoids the inconsistency. All valid expressions written with this system of types in place are guaranteed to terminate with a consistent result.

Extending λ-calculus with types

Now, it’s time to get at least a little bit formal, to see how we integrate a stratified type system into the lambda calculus. There are two facets to that: the syntactic, and the analytic. The syntactic part shows how we extend λ-calculus to include type declarations, and the analytic part shows how to determine whether or not an expression with type declarations is valid.

The syntax part is easy. We add a “:” to the notation; the colon has an expression or variable binding on its left, and a type specification on its right. It asserts that whatever is on the left side of the colon has the type specified on the right side. A few examples:

λ x : ν . x + 3

This asserts that the parameter, x, has type ν, which we’ll use as the type name for the natural numbers. (In case it’s hard to tell, that’s a Greek letter “nu” for natural.) There is no assertion of the type of the result of the function; but since we know that “+” is a function with type ν → ν → ν, we can infer that the result type of this function will be ν.

(λ x . x + 3) : ν → ν

This is the same as the previous, but with the type declaration moved out, so that it asserts the type for the lambda expression as a whole. This time we can infer that x : ν because the function type is ν → ν, which means that the function parameter has type ν.

λ x : ν, y : δ . if y then x * x else x

This is a two parameter function; the first parameter has type ν, and the second has type δ. We can infer the return type, which is ν. So the type of the full function is ν → δ → ν. This may seem surprising at first; but remember that λ-calculus really works in terms of single parameter functions; multi-parameter functions are a shorthand for currying. So really, this function is: λ x : ν . (λ y : δ . if y then x * x else x); the inner lambda is type δ → ν; the outer lambda is type ν → (δ → ν).
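The two places a type declaration can go (on the parameter, or on the expression as a whole) have direct counterparts in Scala. A minimal sketch, assuming Int stands in for ν and Boolean for δ, with invented names:

```scala
object AnnotatedLambdas {
  // Parameter annotated; the result type Int is inferred from the body.
  val addThreeA = (x: Int) => x + 3

  // Whole expression annotated; the parameter type is inferred from it.
  val addThreeB: Int => Int = x => x + 3

  // The two-parameter example, written in its curried form:
  // Int => (Boolean => Int), matching ν → (δ → ν).
  val squareIf: Int => Boolean => Int = x => y => if (y) x * x else x
}
```

In both of the first two definitions the compiler ends up with the same function type, Int => Int; only the direction of the inference differs.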

To talk about whether a program is valid with respect to types (aka well-typed), we need to introduce a set of rules for checking the validity of the type declarations. Using those rules, we can verify that the program is type-consistent.

In type analysis, we’ll talk about judgements. When we can infer the type of an expression using an inference rule, we call that inference a type judgement. Type analysis allows us to use inference and judgements to reason about types in a lambda expression. If any part of an expression winds up with an inconsistent type judgement, then the expression is invalid. If we can show that all of the components of an expression have consistent type judgements, then we can conclude that the expression is well-typed, meaning that it’s valid with respect to the type system.

Type judgements are usually written in a sequent-based notation, which looks like a fraction where the numerator consists of statements that we know to be true; and the denominator is what we can infer from the numerator. In the numerator, we normally have statements using a context, which is a set of type judgements that we already know; it’s usually written as an uppercase Greek letter. If a type context includes the judgement that x : α, I’ll write that as Γ ⊢ x : α.

Rule 1: Type Identity

This is the simplest rule: if we have no other information except a declaration of the type of a variable, then we know that that variable has the type it was declared with.

Rule 2: Type Invariance

This rule is a statement of non-interference. If we know that Γ ⊢ x : α, then inferring a type judgement about any other term cannot change our type judgement for x.

Rule 3: Function Type Inference

This statement allows us to infer function types given parameter types. If we know the type of the parameter to a function is α; and if, with our knowledge of the parameter type, we know that the type of the term that makes up the body of the function is β, then we know that the type of the function is α → β.

Rule 4: Function Application Inference

This one is easy: if we know that we have a function that takes a parameter of type α and returns a value of type β, then if we apply that function to a value of type α, we’ll get a value of type β.

These four rules are it. If we can take a lambda expression, and come up with a consistent set of type judgements for every term in the expression, then the expression is well-typed. If not, then the expression is invalid.
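In sequent form, the four rules are usually written along the following lines. This is the standard textbook presentation rather than a quotation; the metavariable names t, f, and a are choices made for this sketch, and the comma extends a context with one more judgement.

```latex
\[
\textbf{Rule 1 (type identity):}\quad
\frac{}{x : \alpha \vdash x : \alpha}
\]
\[
\textbf{Rule 2 (type invariance):}\quad
\frac{\Gamma \vdash x : \alpha \qquad x \neq y}{\Gamma,\; y : \beta \vdash x : \alpha}
\]
\[
\textbf{Rule 3 (function type inference):}\quad
\frac{\Gamma,\; x : \alpha \vdash t : \beta}{\Gamma \vdash (\lambda x : \alpha \,.\, t) : \alpha \rightarrow \beta}
\]
\[
\textbf{Rule 4 (function application inference):}\quad
\frac{\Gamma \vdash f : \alpha \rightarrow \beta \qquad \Gamma \vdash a : \alpha}{\Gamma \vdash (f\; a) : \beta}
\]
```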

So let’s try taking a look at a simple λ-calculus expression, and seeing how inference works on it:

λ x y . y x

Without any type declarations or parameters, we don’t know the exact type. But we do know that “x” has some type; we’ll call that “α”; and we know that “y” is a function that will be applied with “x” as a parameter, so it’s got parameter type α, but its result type is unknown. So using type variables, we can say “x:α,y:α→β”. We can figure out what “α” and “β” are by looking at a complete expression. So, let’s work out the typing of it with x=“3”, and y=“λ a:ν.a*a”. We’ll assume that our type context already includes “*” as a function of type “ν→ν→ν”, where ν is the type of natural numbers.

By rule 4, since *:ν→ν→ν, we can infer that the type of the expression “a*a” where “a:ν” is “ν”; therefore, by rule 3, the lambda expression has type “ν→ν”. So with type labelling, our expression is now: “(λ x y . y x) (3:ν) (λ a:ν.(a*a):ν) : ν→ν”.

So – now, we know that the parameter “x” of the first lambda must be “ν”; and “y” must be “ν→ν”; so by rule 4, we know that the type of the application expression “y x” must be “ν”; and then by rule 3, the lambda has type: “ν→(ν→ν)→ν”.

So, for this one, both α and β end up being “ν”, the type of natural numbers.
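The four rules are mechanical enough to run. Below is a small type checker, written in Scala, as a sketch under a few assumptions: there is a single base type Nat standing in for ν, every lambda carries its parameter type explicitly (so no type variables need to be solved), and all of the names (Type, Term, typeOf, applyType, and so on) are invented for this example.

```scala
object SimplyTyped {
  // Types: one base type (natural numbers, ν in the text) and function types.
  sealed trait Type
  case object Nat extends Type
  final case class Fun(param: Type, result: Type) extends Type

  // Terms; every lambda carries a declared parameter type.
  sealed trait Term
  final case class Var(name: String) extends Term
  final case class Lit(value: Int) extends Term
  final case class Lam(param: String, paramType: Type, body: Term) extends Term
  final case class App(fn: Term, arg: Term) extends Term

  // A type context: the judgements we already know.
  type Context = Map[String, Type]

  def typeOf(ctx: Context, term: Term): Either[String, Type] = term match {
    case Lit(_) => Right(Nat)
    // Rules 1 and 2: a variable has the type the context declares for it.
    case Var(name) => ctx.get(name).toRight(s"unbound variable: $name")
    // Rule 3: type the body with the parameter added to the context.
    case Lam(param, paramType, body) =>
      typeOf(ctx + (param -> paramType), body).map(bodyType => Fun(paramType, bodyType))
    // Rule 4: the argument type must match the function's parameter type.
    case App(fn, arg) =>
      typeOf(ctx, fn).flatMap { fnType =>
        typeOf(ctx, arg).flatMap(argType => applyType(fnType, argType))
      }
  }

  def applyType(fnType: Type, argType: Type): Either[String, Type] = fnType match {
    case Fun(param, result) if param == argType => Right(result)
    case Fun(param, _) => Left(s"expected an argument of type $param but got $argType")
    case other => Left(s"cannot apply a non-function of type $other")
  }

  // The context assumed in the text: "*" has type ν→ν→ν.
  val context: Context = Map("*" -> Fun(Nat, Fun(Nat, Nat)))

  // λ x:ν . λ y:(ν→ν) . y x
  val applyTo: Term = Lam("x", Nat, Lam("y", Fun(Nat, Nat), App(Var("y"), Var("x"))))
  // λ a:ν . a*a
  val square: Term = Lam("a", Nat, App(App(Var("*"), Var("a")), Var("a")))
  // (λ x y . y x) 3 (λ a . a*a)
  val example: Term = App(App(applyTo, Lit(3)), square)
  // λ x:ν . x x, the kind of self-application the type system rejects
  val selfApply: Term = Lam("x", Nat, App(Var("x"), Var("x")))
}
```

Checking applyTo gives Fun(Nat, Fun(Fun(Nat, Nat), Nat)), which is ν→(ν→ν)→ν; checking example gives Nat; and checking selfApply returns a Left, because no choice of declared type for x lets x be applied to itself.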

So, now we have a simply typed λ-calculus. The reason that it’s simply typed is because the type treatment here is minimal: the only way of building new types is through the unavoidable → constructor. Other typed lambda calculi include the ability to define parametric types, which are types expressed as functions ranging over types.

Programs are Proofs

Now we can get to the fun part. The mantra of type theory is the program is the proof. Here’s where we get our first glimpse of just what that really means!

Think about the types in the simply typed λ-calculus. Anything which can be formed from the following grammar is a λ-calculus type:

type ::= primitive | function | ( type )

primitive ::= α | β | δ | …

function ::= type→type

The catch with that grammar is that you can create type expressions which are valid type definitions, but for which you can’t write a single, complete, closed expression that actually has that type. (A closed expression is one with no free variables.) When there is an expression that has a type, we say that the expression inhabits the type; and that the type is an inhabited type. If there is no expression that can inhabit a type, we say it’s uninhabitable. Any expression which either can’t be typed using the inference rules, or which is typed to an uninhabitable type is a type error.

So what’s the difference between an inhabitable type, and an uninhabitable type?

The answer comes from something called the Curry-Howard isomorphism. For a typed λ-calculus, there is a corresponding intuitionistic logic. A type expression is inhabitable if and only if the type is a provable theorem in the corresponding logic.

The type inference rules in λ-calculus are, in fact, the same as logical inference rules in intuitionistic logic. A type α → β can be seen both as a statement that this is a function that maps from a value of type α to a value of type β, and as a logical statement that if we’re given a fact α, we could use that to infer the truth of a fact β.

If there’s a logical inference chain from an axiom (a given type assignment) to an inferred statement, then the inferred statement is an inhabitable type. If we have a type α → α, then given an inhabited type α, we know that α → α is inhabitable, because if α is a fact, then α → α is also a fact.

On the other hand, think of a different case: α → β. That’s not a theorem, unless there’s some other context that proves it. As a function type, that’s the type of a function which, without including any context of any kind, can take a parameter of type α, and return a value of a different type β. You can’t do that – there’s got to be some context which provides a value of type β – and to access the context, there’s got to be something to allow the function to access its context: a free variable. Same thing in the logic and the λ-calculus: you need some kind of context to establish “α→β” as a theorem (in the logic) or as an inhabitable type (in the λ-calculus).

What kind of context would make a type α → β inhabitable? A definition of a valid function that takes an α, and returns a β. If such a function exists, then that function is a proof of the inhabitability of the type. Literally, the program is the proof.
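The same reading works in a programming language with generic function types. The sketch below uses Scala type parameters in place of α, β, and γ; the names are invented, and it comes with a caveat: Scala as a whole is not a consistent logic, since exceptions, null, and non-termination can “inhabit” any type, so the correspondence only holds for total, side-effect-free definitions like these.

```scala
object ProgramsAsProofs {
  // α → α is a theorem; the identity function is its proof.
  def identityProof[A]: A => A = a => a

  // (α → β) → ((β → γ) → (α → γ)): implication is transitive,
  // and the proof is function composition.
  def compose[A, B, C]: (A => B) => (B => C) => (A => C) =
    f => g => a => g(f(a))

  // α → ((α → β) → β): modus ponens, proved by function application.
  def modusPonens[A, B]: A => (A => B) => B = a => f => f(a)

  // There is deliberately no definition of type A => B here: with no
  // context supplying a B, no total function of that type can be written.
}
```

Each definition type-checks without any free variables, which is what it means for its type to be inhabited.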


Over the weekend, a couple of my Foursquare coworkers and I were chatting on Twitter, and one of my smartest coworkers, a great guy named Jorge Ortiz, pointed out that type inference in Scala (the language we use at Foursquare, and also pretty much my favorite language) is Turing complete.

Somehow, I hadn’t seen this before, and it absolutely blew my mind. So I asked Jorge for a link to the proof. The link he sent me is a really beautiful blog post. It doesn’t just prove that Scala type inference is Turing complete, but it does it in a remarkably beautiful way.

Before I get to the proof, what does this mean?

A system is Turing complete when it can perform any possible computation that could be performed on any other computing device. The Turing machine is, obviously, Turing complete. So is lambda calculus, the Minsky machine, the Brainfuck computing model, and the Scala programming language itself.

If type inference is Turing complete, then that means that you can write a Scala program where, in order to type-check the program, the compiler has to run an arbitrary program to completion. It means that there are, at least theoretically, Scala programs where the compiler will take forever – literally forever – to determine whether or not a given program contains a type error. Needless to say, I consider this to be a bad thing. Personally, I’d really prefer to see the type system be less flexible. In fact, I’d go so far as to say that this is a fundamental error in the design of Scala, and a big strike against it as a language. Having a type-checking system which isn’t guaranteed to terminate is bad.

But let’s put that aside: Scala is pretty entrenched in the community that uses it, and they’ve accepted this as a tradeoff. How did the blog author, Michael Dürig, prove that Scala type checking is Turing complete? By showing how to implement a variant of lambda calculus called SKI combinator calculus entirely with types.

SKI calculus is seriously cool. We know that lambda calculus is Turing complete. It turns out that for any lambda calculus expression, there’s a way of rewriting it without any variables, and without any lambdas at all, using three canonical master functions. If you’ve got those three, then you can write anything, anything at all. The three are called S, K, and I.

The S combinator is: S = λ x y z . (x z) (y z).

The K combinator is: K = λ x y . x.

The I combinator is: I = λ x . x.

They come from intuitionistic logic, where they’re fundamental axioms that describe how intuitionistic implication works. K is the rule A ⇒ (B ⇒ A); S is the rule (A ⇒ (B ⇒ C)) ⇒ ((A ⇒ B) ⇒ (A ⇒ C)); and I is A ⇒ A.
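Before moving the combinators into the type system, it may help to see them as ordinary values. In the sketch below (plain Scala functions, with type parameters chosen here to match A, B, and C in the rules), the type of each definition is exactly the corresponding axiom with ⇒ written as =>. The helper skk is an invented name for the classic observation that S K K behaves like I.

```scala
object Combinators {
  // S = λ x y z . (x z) (y z)
  def S[A, B, C]: (A => B => C) => (A => B) => A => C =
    x => y => z => x(z)(y(z))

  // K = λ x y . x
  def K[A, B]: A => B => A = x => _ => x

  // I = λ x . x
  def I[A]: A => A = x => x

  // S K K z reduces to (K z) (K z), which is z, so S K K acts as I.
  def skk[A]: A => A = S[A, A => A, A](K[A, A => A])(K[A, A])
}
```

For example, Combinators.skk[Int](42) and Combinators.I[Int](42) both give back 42.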

Given any lambda calculus expression, you can rewrite it as a chain of SKIs. (If you’re interested in seeing how, please just ask in the comments; if enough people are interested, I’ll write it up.) What the author of the post did is show how to implement the S, K, and I combinators in Scala types.

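trait Term {
  // type-level "application" of this term to a parameter
  type ap[x <: Term] <: Term
  // type-level "evaluation" of this term to canonical form
  type eval <: Term
}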

He’s created a type Term, which is the supertype of any computable fragment written in this type-SKI. Since everything is a function, all terms have to have two methods: one of them is a one-parameter “function” which applies the term to a parameter, and the second is a “function” which simplifies the term into canonical form.

He implements the S, K, and I combinators as traits that extend Term. We’ll start with the simplest one, the I combinator.

I needs to take a parameter, so its apply type-function takes a parameter x, and returns a new type I1[x] which has the parameter encoded into it. Evaluating I1[x] does exactly what you’d want from the I combinator with its parameter – it returns it.

The apply “method” of I1 looks strange. What you have to remember is that in lambda calculus (and in the SKI combinator calculus), everything is a function – so even after evaluating I.ap[x] to some other type, it’s still a type function. So it still needs to be applicable. Applying it is exactly the same thing as applying its parameter.

So if you have any type A, if you write something like var a : I.ap[A].eval, the type of a will evaluate to A. If you apply I.ap[A].ap[Z], it’s equivalent to taking the result of evaluating I.ap[A], giving you A, and then applying that to Z.
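Put together, the I combinator described above can be sketched as follows. This is a reconstruction from the description, not necessarily character for character what the original post contains, and it carries two assumptions: the constant term c and the =:= checks are additions made here to have something to evaluate, and the code targets Scala 2, because it relies on type projection through abstract types (x#eval), which Scala 3 no longer accepts.

```scala
object TypeLevelI {
  trait Term {
    type ap[x <: Term] <: Term
    type eval <: Term
  }

  // I applied to x becomes I1[x]; I on its own is already canonical.
  trait I extends Term {
    type ap[x <: Term] = I1[x]
    type eval = I
  }

  // I1[x] evaluates to whatever x evaluates to, and applying I1[x]
  // to y is the same as applying its evaluated parameter to y.
  trait I1[x <: Term] extends Term {
    type ap[y <: Term] = eval#ap[y]
    type eval = x#eval
  }

  // A hypothetical constant: it evaluates to itself and absorbs arguments.
  trait c extends Term {
    type ap[x <: Term] = c
    type eval = c
  }

  // Compile-time checks: each line only compiles if the two types agree.
  val iReturnsItsParameter = implicitly[I#ap[c]#eval =:= c]
  val iAppliesItsParameter = implicitly[I#ap[c]#ap[c]#eval =:= c]
}
```

Nothing here runs at execution time in any interesting sense: if either equality failed to hold, the file would simply not compile.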

Michid then goes on to show examples of how to use these beasts. He implements equality testing, and then shows how to test if different type-expressions evaluate to the same thing. And all of this happens at compile time. If the equality test fails, then it's a type error at compile time!

It's a brilliant little proof. Even if you can't read Scala syntax, and you don't really understand Scala type inference, as long as you know SKI, you can look at the equality comparisons, and see how it works in SKI. It's really beautiful.


Today I’m going to show you the basic idea behind the equivalency of cartesian closed categories (CCCs) and typed lambda calculus. I’ll do that by showing you how the λ-theory of any simply typed lambda calculus can be mapped onto a CCC.

First, let’s define the term “lambda theory”. In the simply typed lambda calculus, we always have a set of base types – the types of simple atomic values that can appear in lambda expressions. A lambda theory is a simply typed lambda calculus, plus a set of additional rules that define equivalences over the base types.

So, for example, if one of the base types of a lambda calculus was the natural numbers, the lambda theory would need to include rules to define equality over the natural numbers:

x = y if x=0 and y=0; and

x = y if x=s(x’) and y=s(y’) and x’ = y’
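The two equality rules translate directly into a recursive function over a Peano-style encoding of the naturals. A minimal Scala sketch, with the names Nat, Zero, Succ, and equal chosen here for illustration:

```scala
object PeanoEquality {
  // Natural numbers built from zero and successor, as in the rules above.
  sealed trait Nat
  case object Zero extends Nat
  final case class Succ(pred: Nat) extends Nat

  def equal(x: Nat, y: Nat): Boolean = (x, y) match {
    // x = y if x = 0 and y = 0
    case (Zero, Zero) => true
    // x = y if x = s(x'), y = s(y'), and x' = y'
    case (Succ(xPred), Succ(yPred)) => equal(xPred, yPred)
    // no rule applies, so the values are not equal
    case _ => false
  }
}
```

The final case is not one of the two rules; it just records that when neither rule applies, equality cannot be derived.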

So. Suppose we have a lambda-theory L. We can construct a corresponding category C(L). The objects in C(L) are the types in L. The arrows in C(L) correspond to families of expressions in L; an arrow f : A → B corresponds to the set of expressions of type B that contain a single free variable of type A.

The semantics of the lambda-theory can be defined by a functor; in particular, a cartesian closed functor that maps from C(L) to the cartesian closed category of Sets. (It’s worth noting that this is completely equivalent to the normal Kripke semantics for lambda calculus; but when you get into more complex lambda calculi, like Hindley-Milner variants, this categorical formulation is much simpler.)

We describe how we build the category for the lambda theory in terms of a CCC using something called an interpretation function. It’s really just a notation that allows us to describe the translation recursively. The interpretation function is written using brackets: [[A]] is the categorical interpretation of the type A from lambda calculus.

So, first, we define an object for each type in L. We need to include a special type, which we call unit. The idea behind unit is that we need to be able to talk about “functions” that either don’t take any real parameters, or functions that don’t return anything. Unit is a type which contains exactly one atomic value. Since there’s only one possible value for unit, and unit doesn’t have any extractable sub-values, conceptually, it doesn’t ever need to be passed around. So it’s a “value” that never needs to get passed – perfect for a content-free placeholder.

Anyway, here we go with the base rules:

For each base type A, [[A]] is an object of the category.

[[unit]] = 1, the terminal object of the category.

Next, we need to define the typing rules for complex types:

[[A × B]] = [[A]] × [[B]]

[[A → B]] = [[B]]^[[A]]

Now for the really interesting part. We need to look at type derivations – that is, the type inference rules of the lambda calculus – to show how to do the correspondences between more complicated expressions. Just like we did in lambda calculus, the type derivations are done with a context, containing a set of type judgements. Each type judgement assigns a type to a lambda term. There are two translation rules for contexts:

[[∅]] = 1

[[Γ, x : A]] = [[Γ]] × [[A]]

We also need to describe what to do with the values of the primitive types:

For each value v : A, there is an arrow v : 1 → [[A]].

And now the rest of the rules. Each of these is of the form [[Γ ⊢ x : A]] = f, where we’re saying that Γ entails the type judgement x : A. What it means is that the type inference for an expression, together with the type information covering it, corresponds to the arrow f in the category.

Unit evaluation: [[Γ ⊢ unit : unit]] = ! : [[Γ]] → [[unit]]. (A unit expression is a special arrow “!” to the unit object.)

Simple Typed Expressions: [[Γ ⊢ c : A]] = c ∘ ! : [[Γ]] → [[A]]. (A simple value expression is an arrow composing with ! to form an arrow from Γ to the type object of c’s type.)

Free Variables: [[Γ, x : A ⊢ x : A]] = π₂ : ([[Γ]] × [[A]]) → [[A]]. (A term which is a free variable of type A is an arrow from the product of Γ and the type object A to A. That is, an unknown value of type A is some arrow whose start point will be inferred by the continued interpretation of Γ, and which ends at A. So this is going to be an arrow from either unit or a parameter type to A – which is a statement that this expression evaluates to a value of type A.)

Inferred typed expressions: [[Γ, x : A ⊢ x′ : A′]] = [[Γ ⊢ x′ : A′]] ∘ π₁, where π₁ : ([[Γ]] × [[A]]) → [[Γ]]. (If the type rules of Γ plus the judgement x : A give us x′ : A′, then the term x′ is an arrow starting from the product of the interpretation of the full type context with A, and ending at A′. This is almost the same as the previous rule: it says that this will evaluate to an arrow for an expression that results in type A′.)

Function Abstraction: [[Γ ⊢ (λ x : A . M) : A → B]] = curry([[Γ, x : A ⊢ M : B]]) : [[Γ]] → [[B]]^[[A]]. (A function maps to an arrow from the type context to an exponential [[B]]^[[A]], which is a function from A to B.)

Function application: [[Γ ⊢ (M M′) : B]] = eval ∘ ⟨[[Γ ⊢ M : A → B]], [[Γ ⊢ M′ : A]]⟩, where ⟨f, g⟩ pairs two arrows out of [[Γ]], and eval : [[B]]^[[A]] × [[A]] → [[B]]. (Function evaluation takes the eval arrow from the categorical exponent, and uses it to evaluate out the function.)

There are also two projection rules for decomposing categorical products, but they’re basically more of the same, and this is already dense enough.

The intuition behind this is:

Arrows between types are families of values. A particular value is a particular arrow from unit to a type object.

The categorical exponent in a CCC is exactly the same thing as a function type in λ-calculus; and an arrow to an exponent is the same thing as a function value. Evaluating the function is using the categorical exponent’s eval arrow to “decompose” the exponent, and produce an arrow to the function’s result type; that arrow is the value that the function evaluates to.
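This intuition can be made concrete by treating Scala types as objects and Scala functions as arrows. The sketch below is an informal analogy rather than a faithful model of a CCC, and every name in it is chosen here: tuples play the role of products, function types play the role of exponentials, and Unit plays the role of the unit object.

```scala
object CartesianClosed {
  // The eval arrow of the exponential: (B^A × A) → B.
  def eval[A, B]: ((A => B, A)) => B = { case (f, a) => f(a) }

  // Currying: an arrow (G × A) → B corresponds to an arrow G → B^A.
  // This is the shape of the function-abstraction rule, with G as the context.
  def curry[G, A, B](f: ((G, A)) => B): G => (A => B) =
    g => a => f((g, a))

  // The inverse direction, built from the exponential's application.
  def uncurry[G, A, B](f: G => (A => B)): ((G, A)) => B = {
    case (g, a) => f(g)(a)
  }

  // A particular value is an arrow from the unit object to its type.
  def value[A](a: A): Unit => A = _ => a
}
```

Applying a function is then literally a use of eval: CartesianClosed.eval[Int, Int](((n: Int) => n + 1, 2)) is 3.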

And the semantics – called functorial semantics – maps from the objects in this category, to the category of Sets; function values to function arrows; type objects to sets; values to value objects and arrows. (For example, the natural number type would be an object in the category built from the lambda theory, and the set of natural numbers in the sets category would be the target of the functor.)

Aside from the fact that this is actually a very clean way of defining the semantics of a not-so-simply typed lambda calculus, it’s also very useful in practice. There is a way of executing lambda calculus expressions as programs that is based on this, called the Categorical Abstract Machine. The best performing lambda-calculus based programming language (and my personal all-time-favorite programming language), Objective Caml, descends from Caml, whose first implementation was based on the CAM. (CAML stands for categorical abstract machine language.)

From this, you can see how the CCCs and λ-calculus are related. It turns out that that relation is not just cool, but downright useful. Concepts from category theory – like monads, pullbacks, and functors – are really useful things in programming languages! In some later posts, I’ll talk a bit about that. My current favorite programming language, Scala, is one of the languages where there’s a very active stream of work in applying categorical ideas to real-world programming problems.


Lambda calculus started off with the simple, untyped lambda calculus that we’ve been talking about so far. But one of the great open questions about lambda calculus was: was it sound? Did it have a valid model?

Church found that it was easy to produce some strange and nonsensical expressions using the simple lambda calculus. In order to try to work around those problems, and end up with a consistent system, Church introduced the concept of types, producing the simply typed lambda calculus. Once types hit the scene, things really went wild; the type systems for lambda calculi have never stopped developing: people are still finding new things to do by extending the LC type system today! Most lambda calculus based programming languages are based on the Hindley-Milner lambda calculus, which is a simplification of one of the standard sophisticated typed lambda calculi called System F. There’s even a Lambda Cube which can categorize the different type abstractions for lambda calculus (but alas, despite its name, it’s not related to the time cube.) Once people really started to understand types, they realized that the untyped lambda calculus was really just a pathologically simple instance of the simply typed lambda calculus: a typed LC with only one base type.

The semantics of lambda calculus are easiest to talk about in a typed version. For now, I’ll talk about the simplest typed LC, known as the simply typed lambda calculus. One of the really amazing things about this, which I’ll show, is that a simply typed lambda calculus is completely semantically equivalent to an intuitionistic propositional logic: each type in the program is a proposition in the logic; each β reduction corresponds to an inference step; and each complete function corresponds to a proof! Look below for how.

Types

The main thing that typed lambda calculus adds to the mix is a concept called base types. In a typed lambda calculus, you have some universe of atomic values which you can manipulate; those values are partitioned into the base types. Base types are usually named by single lower-case Greek letters. So, for example, we could have a type “σ”, which consists of the set of natural numbers; a type “τ” which corresponds to boolean true/false values; and a type “γ” which corresponds to strings.

Once we have basic types, then we can talk about the type of a function. A function maps from a value of one type (the type of parameter) to a value of a second type (the type of the return value). For a function that takes a parameter of type “γ”, and returns a value of type “δ”, we write its type as “γ → δ”. “→” is called the function type constructor; it associates to the right, so “γ → δ → ε” parses as “γ → (δ → ε)”.

To apply types to the lambda calculus, we do a couple of things. First, we need a syntax update so that we can include type information in lambda terms. And second, we need to add a set of rules to show what it means for a typed program to be valid.

The syntax part is easy. We add a “:” to the notation; the colon has an expression or variable binding on its left, and a type specification on its right. It asserts that whatever is on the left side of the colon has the type specified on the right side. A few examples:

λ x : ν . x + 3: This asserts that the parameter, x, has type ν, which we’ll use as the type name for the natural numbers. (In case it’s hard to tell, that’s a Greek letter “nu” for natural.) There is no assertion of the type of the result of the function; but since we know that “+” is a function with type ν → ν → ν, we can infer that the result type of this function will be ν.

(λ x . x + 3) : ν → ν: This is the same as the previous, but with the type declaration moved out, so that it asserts the type for the lambda expression as a whole. This time we can infer that x : ν because the function type is ν → ν, which means that the function parameter has type ν.

λ x : ν, y : δ . if y then x * x else x: This is a two parameter function; the first parameter has type ν, and the second has type δ. We can infer the return type, which is ν. So the type of the full function is ν → δ → ν. This may seem surprising at first; but remember that lambda calculus really works in terms of single parameter functions; multi-parameter functions are a shorthand for currying. So really, this function is: λ x : ν . (λ y : δ . if y then x * x else x); the inner lambda is type δ → ν; the outer lambda is type ν → (δ → ν).

To talk about whether a program is valid with respect to types (aka well-typed), we need to introduce a set of rules for type inference. Then we can verify that the program is type-consistent.

In type inference, we talk about judgements. When we can infer the type of an expression using an inference rule, we call that inference a type judgement. Type inference and judgements allow us to reason about types in a lambda expression; and if any part of an expression winds up with an inconsistent type judgement, then the expression is invalid. (When Church started doing typed LC, one of the motivations was to distinguish between values representing “atoms”, and values representing “predicates”; he was trying to avoid the Gödel-esque paradoxes, by using types to ensure that predicates couldn’t operate on predicates.)

Type judgements are usually written in a sequent-based notation, which looks like a fraction where the numerator consists of statements that we know to be true; and the denominator is what we can infer from the numerator. In the numerator, we normally have statements using a context, which is a set of type judgements that we already know; it’s usually written as an uppercase Greek letter. If a type context Γ includes the judgement that x : α, I’ll write that as Γ ⊢ x : α.

Rule 1: Type Identity

$$\frac{}{x : \alpha \vdash x : \alpha}$$

This is the simplest rule: if we have no other information except a declaration of the type of a variable, then we know that that variable has the type it was declared with.

Rule 2: Type Invariance

$$\frac{\Gamma \vdash x : \alpha, \quad x \neq y}{\Gamma + y : \beta \vdash x : \alpha}$$

This rule is a statement of non-interference. If we know that x : α, then inferring a type judgement about any other term cannot change our type judgement for x.

Rule 3: Function Type Inference

$$\frac{\Gamma + x : \alpha \vdash y : \beta}{\Gamma \vdash (\lambda x : \alpha . y) : \alpha \rightarrow \beta}$$

This statement allows us to infer function types given parameter types. If we know the type of the parameter to a function is α; and if, with our knowledge of the parameter type, we know that the type of the term that makes up the body of the function is β, then we know that the type of the function is α → β.

Rule 4: Function Application Inference

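$$\frac{\Gamma \vdash x : \alpha \rightarrow \beta, \quad \Gamma \vdash y : \alpha}{\Gamma \vdash (x\ y) : \beta}$$

This one is easy: if we know that we have a function that takes a parameter of type α and returns a value of type β, then if we apply that function to a value of type α, we’ll get a value of type β.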

These four rules are it. If we can take a lambda expression, and come up with a consistent set of type judgements for every term in the expression, then the expression is well-typed. If not, then the expression is invalid.
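To make the four rules a bit more concrete, here is a minimal sketch of a type checker in Python. Everything in it is an assumption made for illustration: the class names `Var`, `Lam`, `App` and `Fn`, the function `type_of`, and the choice to treat “*” and “3” as names already present in the starting context. Rules 1 and 2 collapse into a dictionary lookup; rules 3 and 4 each get their own branch.

```python
from dataclasses import dataclass
from typing import Union


# Types: a base type is a string such as "ν"; a function type is Fn(param, result).
@dataclass(frozen=True)
class Fn:
    param: "Type"
    result: "Type"


Type = Union[str, Fn]


# Terms: variable references, type-annotated lambdas, and applications.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    param_type: Type
    body: "Term"


@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def type_of(term, context):
    """Return the type of `term` under `context` (a dict from names to types)."""
    if isinstance(term, Var):
        # Rules 1 and 2: a variable has whatever type the context declares for it.
        if term.name not in context:
            raise TypeError(f"unbound variable {term.name}")
        return context[term.name]
    if isinstance(term, Lam):
        # Rule 3: type the body with the parameter's declaration added to the context.
        body_type = type_of(term.body, {**context, term.param: term.param_type})
        return Fn(term.param_type, body_type)
    if isinstance(term, App):
        # Rule 4: the function's parameter type must match the argument's type.
        fn_type = type_of(term.fn, context)
        arg_type = type_of(term.arg, context)
        if not isinstance(fn_type, Fn) or fn_type.param != arg_type:
            raise TypeError(f"cannot apply {fn_type} to {arg_type}")
        return fn_type.result
    raise TypeError(f"not a term: {term!r}")


NAT = "ν"
# Assumed starting context: curried multiplication and the constant 3.
context = {"*": Fn(NAT, Fn(NAT, NAT)), "3": NAT}

# λ a : ν . a * a, with "a * a" written as the curried application (* a) a.
square = Lam("a", NAT, App(App(Var("*"), Var("a")), Var("a")))
square_type = type_of(square, context)                    # ν → ν
applied_type = type_of(App(square, Var("3")), context)    # ν
```

An expression with no consistent set of judgements, such as applying “3” to itself, makes `type_of` raise a `TypeError` instead of returning a type.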

So let’s try taking a look at a simple lambda calculus expression, and seeing how inference works on it:

λ x y . y x

Without any type declarations or parameters, we don’t know the exact type. But we do know that “x” has some type; we’ll call that “α”; and we know that “y” is a function that will be applied with “x” as a parameter, so it’s got parameter type α, but its result type is unknown. So using type variables, we can say “x:α,y:α→β”. We can figure out what “α” and “β” are by looking at a complete expression. So, let’s work out the typing of it with x=“3”, and y=“λ a:ν.a*a”. We’ll assume that our type context already includes “*” as a function of type “ν→ν→ν”, where ν is the type of natural numbers.

By rule 4, we can infer that the type of the expression “a*a”, where “a:ν” and “*:ν→ν→ν”, is “ν”; therefore, by rule 3, the lambda expression has type “ν→ν”. So with type labelling, our expression is now: “(λ x y . y x) (3:ν) ((λ a:ν.(a*a):ν) : ν→ν)”.

So – now, we know that the parameter “x” of the first lambda must be “ν”; and “y” must be “ν→ν”; so by rule 4, we know that the type of the application expression “y x” must be “ν”; and then by rule 3, the lambda has type: “ν→(ν→ν)→ν”.

So, for this one, both α and β end up being “ν”, the type of natural numbers.
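As a quick sanity check on that result, here is the same expression as a small Python sketch. The names `apply_to` and `square` are mine, Python’s `int` stands in for ν, and the type hints are only annotations (Python doesn’t enforce them), so treat this as an illustration of the inferred type ν→(ν→ν)→ν and not as a type checker.

```python
from typing import Callable

Nat = int  # stand-in for the type ν of natural numbers


# (λ x y . y x), curried: takes x : ν, then y : ν → ν, and returns y x : ν.
def apply_to(x: Nat) -> Callable[[Callable[[Nat], Nat]], Nat]:
    return lambda y: y(x)


# y = λ a : ν . a * a, of type ν → ν.
def square(a: Nat) -> Nat:
    return a * a


result = apply_to(3)(square)  # "y x" becomes square(3)
print(result)  # 9
```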

So, now we have a simply typed lambda calculus. The reason that it’s simply typed is because the type treatment here is minimal: the only way of building new types is through the unavoidable → constructor. Other typed lambda calculi include the ability to define parametric types, which are types expressed as functions ranging over types.

Programs are Proofs

Here’s where it gets really fun. Think about the types in the simply typed lambda calculus. Anything which can be formed from the following grammar is a lambda calculus type:

type ::= primitive | function | ( type )

primitive ::= α | β | δ | …

function ::= type→type

The catch with that grammar is that you can create type expressions which, while they are valid type definitions, you can’t write a single, complete, closed expression which will actually have that type. (A closed expression is one with no free variables.) When there is an expression that has a type, we say that the expression inhabits the type; and that the type is an inhabited type. If there is no expression that can inhabit a type, we say it’s uninhabitable. Any expression which either can’t be typed using the inference rules, or which is typed to an uninhabitable type is a type error.

So what’s the difference between an inhabitable type and an uninhabitable type?

The answer comes from something called the Curry-Howard isomorphism. For a typed lambda calculus, there is a corresponding intuitionistic logic. A type expression is inhabitable if and only if the type is a provable theorem in the corresponding logic.

The type inference rules in lambda calculus are, in fact, the same as logical inference rules in intuitionistic logic. A type α → β can be seen as either a statement that this is a function that maps from a value of type α to a value of type β, or as a logical statement that if we’re given a fact α, we could use that to infer the truth of a fact β.

If there’s a logical inference chain from an axiom (a given type assignment) to an inferred statement, then the inferred statement is an inhabitable type. If we have a type α → α, then given an inhabited type α, we know that α → α is inhabitable, because if α is a fact, then α is also a fact.

On the other hand, think of a different case: α → β. That’s not a theorem, unless there’s some other context that proves it. As a function type, that’s the type of a function which, without including any context of any kind, can take a parameter of type α, and return a value of a different type β. You can’t do that – there’s got to be some context which provides a value of type β – and to access the context, there’s got to be something to allow the function to access its context: a free variable. Same thing in the logic and the lambda calculus: you need some kind of context to establish “α→β” as a theorem (in the logic) or as an inhabitable type (in the lambda calculus).
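To see the difference on paper, here is a short worked derivation, written as a sketch in the sequent notation from the rules section. It shows that the identity function inhabits α → α using nothing but rule 1 and rule 3, starting from an empty context:

```latex
\frac{\dfrac{}{x : \alpha \vdash x : \alpha}\ \text{(rule 1)}}
     {\vdash (\lambda x : \alpha . x) : \alpha \rightarrow \alpha}\ \text{(rule 3)}
```

Try the same thing for α → β and you get stuck right away: rule 3 needs a body of type β typed under the context x : α alone, and with only x : α to work with, none of the four rules can produce a term of type β.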

It gets better. If there is a function whose type is a theorem in the corresponding intuitionistic logic, then the program that has that type is a proof of the theorem. Each beta reduction is equivalent to an inference step in the logic. This is what programming languages geeks like me mean when we say “the program is the proof”: a well-typed program is, literally, a proof of its well-typed-ness.

To connect back to the discussion about models: the simply typed lambda calculus and its corresponding intuitionistic logic are, in a deep sense, just different reflections of the same thing. We know that intuitionistic logic has a valid model. And that, in turn, means that lambda calculus is valid as well. When we show that something is true using the lambda calculus, we can trust that it’s not an artifact of an inconsistent system.


As I said in the last post, Church came up with λ-calculus, which looks like it’s a great formal model of computation. But – there was a problem. Church struggled to find a model. What’s a model, and why would that matter? That’s the point of this post: to get a quick sense of what a model is, and why it matters.

A model is basically a mapping from the symbols of a logical system to some set of objects, such that all statements that you can prove in the logical system will be true about the corresponding objects. Note that when I say object here, I don’t necessarily mean real-world physical objects – they’re just something that we can work with, which is well-defined and consistent.

Why does it matter? Because the whole point of a system like λ-calculus is that we want to use it for reasoning. When you have a logical system like λ-calculus, you’ve built this system with its rules for a reason – because you want to use it as a tool for understanding something. The model provides you with a way of saying that the conclusions you derive using the system are meaningful. If the model isn’t correct, if it contains any kind of inconsistency, then your system is completely meaningless: it can be used to derive anything.

So the search for a model for λ-calculus is really important. If there’s a valid model for it, then it’s wonderful. If there isn’t, then we’re just wasting our time looking for one.

So, now, let’s take a quick look at a simple model, to see how a problem can creep in. I’m going to build a logic for talking about the natural numbers – that is, integers greater than or equal to zero. Then I’ll show you how invalid results can be inferred using it; and finally show you how it fails by using the model.

One quick thing, to make the notation easier to read: I’m going to use a simple notion of types. A type is a set of atoms for which some particular one-parameter predicate is true. For example, if $P(x)$ is true, I’ll say that x is a member of type P. In a quantifier, I’ll say things like $\forall x \in P: \mathrm{foo}(x)$ to mean $\forall x: P(x) \Rightarrow \mathrm{foo}(x)$. Used this way, we can say that P is a type predicate.

How do we define natural numbers using logic?

First, we need an infinite set of atoms, each of which represents one number. We pick one of them, and call it zero. To represent the fact that they’re natural numbers, we define a predicate $\mathbb{N}(x)$, which is true if and only if x is one of the atoms that represents a natural number.

Now, we need to start using predicates to define the fundamental properties of numbers. The most important property of natural numbers is that they are a sequence. We define that idea using a predicate, $\mathrm{Succ}(x, y)$, where $\mathrm{Succ}(x, y)$ is true if and only if x = y + 1. To use that to define the ordering of the naturals, we can say: $\forall x \in \mathbb{N}: \exists y \in \mathbb{N}: \mathrm{Succ}(y, x)$.

Or in english: every natural number has a successor – you can always add one to a natural number and get another natural number.

We can also define predecessor similarly, with two statements:

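$\forall x \in \mathbb{N}: \exists y \in \mathbb{N}: \mathrm{Pred}(y, x)$

$\forall x, y \in \mathbb{N}: \mathrm{Pred}(x, y) \Leftrightarrow \mathrm{Succ}(y, x)$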

So every number has a predecessor, and every number has a successor, and x is the predecessor of y if y is the successor of x.

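To be able to define things like addition and subtraction, we can use successor. Let’s define addition using a predicate Sum(x,y,z) which means “z = x + y”.

$\forall x, y \in \mathbb{N}: \exists z \in \mathbb{N}: \mathrm{Sum}(x, y, z)$

$\forall x \in \mathbb{N}: \mathrm{Sum}(x, 0, x)$

$\forall x, y, z \in \mathbb{N}: (\exists a, b \in \mathbb{N}: \mathrm{Succ}(a, x) \land \mathrm{Succ}(y, b) \land \mathrm{Sum}(a, b, z)) \Rightarrow \mathrm{Sum}(x, y, z)$

Again, in English: for any two natural numbers, there is a natural number that is their sum; x + 0 always = x; and for any natural number, x + y = z is true if (x + 1) + (y – 1) = z.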

Once we have addition, subtraction is easy:

$\forall x, y, z \in \mathbb{N}: \mathrm{Diff}(x, y, z) \Leftrightarrow \mathrm{Sum}(y, z, x)$

That’s: x-y=z if and only if x=y+z.
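If it helps to see those two definitions as something executable, here is a small Python sketch. The function names `is_sum` and `is_diff` are mine, and I’m using Python’s non-negative integers as a stand-in for the atoms; the recursion follows the English description directly: x + 0 = x, and x + y = z holds if (x + 1) + (y – 1) = z.

```python
def is_sum(x: int, y: int, z: int) -> bool:
    """Sum(x, y, z): true when z = x + y. Assumes x, y, z are natural numbers."""
    if y == 0:
        # x + 0 is always x.
        return z == x
    # x + y = z is true if (x + 1) + (y - 1) = z.
    return is_sum(x + 1, y - 1, z)


def is_diff(x: int, y: int, z: int) -> bool:
    """x - y = z if and only if x = y + z."""
    return is_sum(y, z, x)


print(is_sum(2, 3, 5))   # True
print(is_sum(2, 3, 6))   # False
print(is_diff(5, 3, 2))  # True
```

The sketch only makes sense for non-negative arguments: a negative y never reaches the y == 0 case, so the recursion would not terminate.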

We can also define greater than using addition:

. But we’ve violated that – we have both , and


So, last post, I finally defined closed cartesian categories. And I alluded to the fact that the CCCs are, essentially, equivalent to the simply typed λ calculus. But I didn’t really talk about what that meant.

Before I can get to that, you need to know what λ calculus is. Many readers are probably familiar, but others aren’t. And as it happens, I absolutely love λ calculus.

In computer science, especially in the field of programming languages, we tend to use λ calculus a whole lot. It’s also extensively used by logicians studying the nature of computation and the structure of discrete mathematics. λ calculus is great for a lot of reasons, among them:

It’s very simple.

It’s Turing complete: if a function can be computed by any possible computing device, then it can be written in λ-calculus.

It’s easy to read and write.

Its semantics are strong enough that we can do reasoning from it.

It’s got a good solid model.

It’s easy to create variants to explore the properties of various alternative ways of structuring computations or semantics.

The ease of reading and writing λ calculus is a big deal. It’s led to the development of a lot of extremely good programming languages based, to one degree or another, on the λ calculus: Lisp, ML, Haskell, and my current favorite, Scala, are very strongly λ calculus based.

The λ calculus is based on the concept of functions. In the pure λ calculus, everything is a function; there are no values except for functions. In fact, we can pretty much build up all of mathematics using λ-calculus.

With the lead-in out of the way, let’s dive in and look at λ-calculus. To define a calculus, you need to define two things: the syntax, which describes how valid expressions can be written in the calculus; and a set of rules that allow you to symbolically manipulate the expressions.

Lambda Calculus Syntax

The λ calculus has exactly three kinds of expressions:

Function definition: a function in λ calculus is an expression, written: λ param . body, which defines a function with one parameter.

Identifier reference: an identifier reference is a name which matches the name of a parameter defined in a function expression enclosing the reference.

Function application: applying a function is written by putting the function value in front of its parameter, as in x y to apply the function x to the value y.

There’s a trick that we play in λ calculus: if you look at the definition above, you’ll notice that a function (lambda expression) only takes one parameter. That seems like a very big constraint – how can you even implement addition with only one parameter?

It turns out to be no problem, because of the fact that functions are, themselves, values. Instead of writing a two parameter function, you can write a one parameter function that returns a one parameter function, which can then operate on the second parameter. In the end, it’s effectively the same thing as a two parameter function. Taking a two-parameter function, and representing it by two one-parameter functions is called currying, after the great logician Haskell Curry.

For example, suppose we wanted to write a function to add x and y. We’d like to write something like: λ x y . x + y. The way we do that with one-parameter functions is: we first write a function with one parameter, which returns another function with one parameter. The outer function takes x, and returns a function that adds x to its own parameter: λ x . (λ y . x + y).
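Here’s what that looks like as a quick Python sketch (the names `add`, `add_x` and `add_three` are just for illustration). The outer function takes x and hands back a one-parameter function that is still waiting for y:

```python
def add(x):
    # Takes x and returns a new one-parameter function that still expects y.
    def add_x(y):
        return x + y
    return add_x


add_three = add(3)    # a function that adds 3 to its parameter
print(add_three(10))  # 13
print(add(3)(4))      # 7
```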

Now that we know that adding multiple parameter functions doesn’t really add anything but a bit of simplified syntax, we’ll go ahead and use them when it’s convenient.

One important syntactic issue that I haven’t mentioned yet is closure or complete binding. For a λ calculus expression to be evaluated, it cannot reference any identifiers that are not bound. An identifier is bound if it is a parameter in an enclosing λ expression; if an identifier is not bound in any enclosing context, then it is called a free variable. Let’s look quickly at a few examples:

λ x . p x y: in this expression, y and p are free, because they’re not the parameter of any enclosing λ expression; x is bound because it’s a parameter of the function definition enclosing the expression p x y where it’s referenced.

λ x y.y x: in this expression both x and y are bound, because they are parameters of the function definition, and there are no free variables.

λ y . (λ x . p x y). This one is a tad more complicated, because we’ve got the inner λ. So let’s start there. In the inner λ, λ x . p x y, y and p are free and x is bound. In the full expression, both x and y are bound: x is bound by the inner λ, and y is bound by the outer λ. “p” is still free.

We’ll often use “free(x)” to mean the set of identifiers that are free in the expression “x”.
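Computing free(x) is a simple recursion over the three kinds of expressions. Here’s a minimal Python sketch; the `Var`, `Lam` and `App` classes are a representation I’m assuming for illustration, with multi-parameter lambdas written out in curried form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:          # identifier reference
    name: str


@dataclass(frozen=True)
class Lam:          # function definition: λ param . body
    param: str
    body: object


@dataclass(frozen=True)
class App:          # function application: fn arg
    fn: object
    arg: object


def free(expr):
    """Return the set of identifiers that are free in expr."""
    if isinstance(expr, Var):
        return {expr.name}
    if isinstance(expr, Lam):
        # The parameter is bound inside the body, so it is removed.
        return free(expr.body) - {expr.param}
    if isinstance(expr, App):
        return free(expr.fn) | free(expr.arg)
    raise TypeError(f"not an expression: {expr!r}")


p_x_y = App(App(Var("p"), Var("x")), Var("y"))                # p x y

example1 = Lam("x", p_x_y)                                     # λ x . p x y
example2 = Lam("x", Lam("y", App(Var("y"), Var("x"))))         # λ x y . y x
example3 = Lam("y", Lam("x", p_x_y))                           # λ y . (λ x . p x y)

free1 = free(example1)  # p and y are free
free2 = free(example2)  # nothing is free
free3 = free(example3)  # only p is free
```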

A λ calculus expression is valid (and thus evaluatable) only when all of its variables are bound. But when we look at smaller subexpressions of a complex expression, taken out of context, they can have free variables – and making sure that the variables that are free in subexpressions are treated right is very important.

Lambda Calculus Evaluation Rules

There are only two real rules for evaluating expressions in λ calculus; they’re called α and β. α is also called “conversion”, and β is also called “reduction”.

α is a renaming operation; basically it says that the names of variables are unimportant: given any expression in λ calculus, we can change the name of the parameter to a function as long as we change all free references to it inside the body.

So – for instance, if we had an expression like:

λ x . if (= x 0) then 1 else x^2

We can do an α to replace x with y (written “α[x/y]”) and get:

λ y . if (= y 0) then 1 else y^2

Doing α does not change the meaning of the expression in any way. But as we’ll see later, it’s important because without it, we’d often wind up with situations where a single variable symbol is bound by two different enclosing λs. This will be particularly important when we get to recursion.

β reduction is where things get interesting: this single rule is all that’s needed to make the λ calculus capable of performing any computation that can be done by a machine.

β basically says that if you have a function application, you can replace it with a copy of the body of the function with references to the parameter identifiers replaced by references to the parameter value in the application. That sounds confusing, but it’s actually pretty easy when you see it in action.

Suppose we have the application expression: (λ x . x + 1) 3. By performing a beta reduction, we can replace the application by taking the body x + 1 of the function, and substituting the value of the parameter (3) for the parameter variable symbol (x). So we replace all references to x with 3. So the result of doing a beta reduction is 3 + 1.

A slightly more complicated example is the expression:

(λ y . (λ x . x + y)) q

It’s an interesting expression, because it’s a λ expression that when applied, results in another λ expression: that is, it’s a function that creates functions. When we do beta reduction in this, we’re replacing all references to the parameter y with the identifier q; so, the result is λ x . x + q.

One more example, just for the sake of being annoying. Suppose we have: (λ x y. x y) (λ z . z * z) 3

That’s a function that takes two parameters, and applies the first one to the second one. When we evaluate that, we replace the parameter x in the body of the first function with λ z . z * z; and we replace the parameter y with 3, getting: (λ z . z * z) 3. And we can perform beta on that, getting 3 * 3.

Written formally, β says: (λ x . B) e = B[x := e] if free(e) ⊂ free(B[x := e]).

That condition on the end, “if free(e) ⊂ free(B[x := e])” is why we need α: we can only do beta reduction if doing it doesn’t create any collisions between bound identifiers and free identifiers: if the identifier “z” is free in “e”, then we need to be sure that the beta-reduction doesn’t make “z” become bound. If there is a name collision between a variable that is bound in “B” and a variable that is free in “e”, then we need to use α to change the identifier names so that they’re different.

As usual, an example will make that clearer: Suppose we have an expression defining a function, λ z . (λ x . x + z). Now, suppose we want to apply it: (λ z . (λ x . x + z)) (x + 2). In the parameter (x + 2), x is free. Now, suppose we break the rule and go ahead and do beta. We’d get “λ x . x + x + 2”. The variable that was free in x + 2 is now bound! We’ve changed the meaning of the function, which we shouldn’t be able to do. If we were to apply that function after the incorrect β, we’d get (λ x . x + x + 2) 3. By beta, we’d get 3 + 3 + 2, or 8.

What if we did α the way we were supposed to?

First, we’d do an α to prevent the name overlap. By α[x/y], we would get (λ z . (λ y . y + z)) (x + 2).

Then by β, we’d get “λ y . y + x + 2”. If we apply this function the way we did above, then by β, we’d get 3 + x + 2. 3 + x + 2 and 3 + 3 + 2 are very different results!
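The same example can be run mechanically. Below is a sketch of β with the α step built in, written in Python under a few assumptions of my own: expressions are `Var`/`Lam`/`App` objects, “x + z” is encoded as the curried application (+ x) z with “+” as just another identifier, and fresh names are generated as v0, v1, and so on (so the renamed parameter comes out as v0 where the text uses y).

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


def free(expr):
    """Set of identifiers that are free in expr."""
    if isinstance(expr, Var):
        return {expr.name}
    if isinstance(expr, Lam):
        return free(expr.body) - {expr.param}
    return free(expr.fn) | free(expr.arg)


def fresh(avoid):
    """Pick a name of the form v0, v1, ... that is not in `avoid`."""
    return next(f"v{i}" for i in count() if f"v{i}" not in avoid)


def subst(body, name, value):
    """Compute body[name := value], using α to avoid capturing free variables."""
    if isinstance(body, Var):
        return value if body.name == name else body
    if isinstance(body, App):
        return App(subst(body.fn, name, value), subst(body.arg, name, value))
    # body is a Lam from here on.
    if body.param == name:
        return body  # name is rebound here, so nothing inside refers to the outer one
    if body.param in free(value):
        # α: rename the parameter so it cannot capture a free variable of value.
        new_param = fresh(free(value) | free(body.body) | {name})
        renamed = subst(body.body, body.param, Var(new_param))
        return Lam(new_param, subst(renamed, name, value))
    return Lam(body.param, subst(body.body, name, value))


def beta(expr):
    """Perform one β step on an application whose function part is a λ."""
    if isinstance(expr, App) and isinstance(expr.fn, Lam):
        return subst(expr.fn.body, expr.fn.param, expr.arg)
    raise ValueError("not a β-redex")


def plus(a, b):
    """Encode a + b as the curried application (+ a) b."""
    return App(App(Var("+"), a), b)


# (λ z . (λ x . x + z)) (x + 2)
redex = App(Lam("z", Lam("x", plus(Var("x"), Var("z")))), plus(Var("x"), Var("2")))
result = beta(redex)  # λ v0 . v0 + (x + 2): the x from the argument stays free
```

Because the inner parameter gets renamed before the substitution, the x that came in with the argument is still in `free(result)` afterwards, which is exactly what the side condition on β demands.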

And that’s pretty much it. There’s another optional rule you can add called η-conversion. η is a rule that adds extensionality, which provides a way of expressing equality between functions.

η says that in any λ expression, I can replace the value f with the value g if and only if for all possible parameter values x, f x = g x.

What I’ve described here is Turing complete – a full effective computation system. To make it useful, and see how this can be used to do real stuff, we need to define a bunch of basic functions that allow us to do math, condition tests, recursion, etc. I’ll talk about those in my next post.

It’s also important to point out that while I’ve gone through a basic definition of λ calculus, and described its mechanics, I haven’t yet defined a model for λ-calculus. That’s quite an important omission! λ-calculus was played with by logicians for several years before they were able to come up with a complete model for it, and it was a matter of great concern that although it looked correct, the early attempts to define a model for it were failures! And without a valid model, the results of the system are meaningless. An invalid model in a logical system like λ-calculus is like a contradiction in axioms: it means that nothing that it produces is valid.

One of my favorite categorical structures is a thing called a closed cartesian category, or CCC for short. Since I’m a computer scientist/software engineer, it’s a natural: CCCs are, basically, the categorical structure of lambda calculus – and thus, effectively, a categorical model of computation. However, before we can talk about the CCCs, we need – what else? – more definitions.

Cartesian Categories

A cartesian category (note: not cartesian closed category) is a category $C$:

With a terminal object $t$, and

$\forall a, b \in \mathrm{Obj}(C)$, the objects and arrows of the categorical product $a \times b$.

So, a cartesian category is a category closed with respect to product. Many of the common categories are cartesian: the category of sets, and the category of enumerable sets, for example. And of course, the meaning of the categorical product in Set? Cartesian product of sets.

Categorical Exponentials

To get from cartesian categories to cartesian closed categories, we also need to define categorical exponentials. Like categorical product, the value of a categorical exponential is not required to be included in a category. The exponential is a complicated definition, and it’s a bit hard to really get your head around, but it’s well worth the effort. If categorical products are the categorical generalization of set products, then the categorical exponential is the categorical version of a function space. It gives us the ability to talk about structures that are the generalized version of “all functions from A to B”.

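Given two objects x and y from a category C, their categorical exponential $x^y$, if it exists in the category, is defined by a set of values:

An object $x^y$,

An arrow $\mathrm{eval}_{y,x} : x^y \times y \rightarrow x$, called an evaluation map.

$\forall z \in \mathrm{Obj}(C)$, an operation $\Lambda_C : (z \times y \rightarrow x) \rightarrow (z \rightarrow x^y)$. (That is, an operation mapping from arrows to arrows.)

These values must have the following properties:

$\forall f : z \times y \rightarrow x$ and $g : z \rightarrow x^y$:

$\mathrm{eval}_{y,x} \circ (\Lambda_C(f) \times 1_y) = f$

$\Lambda_C(\mathrm{eval}_{y,x} \circ (g \times 1_y)) = g$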

To make that a bit easier to understand, let’s turn it into a diagram.

As I alluded to earlier, you can also think of it as a generalization of a function space. $x^y$ is the set of all functions from y to x. The evaluation map is a simple description in categorical terms of an operation that applies a function from a to b (an arrow) to a value from a, resulting in a value from b.

So what does the categorical exponential mean? I think it’s easiest to explain in terms of sets and functions first, and then just step it back to the more general case of objects and arrows.

If X and Y are sets, then $X^Y$ is the set of functions from Y to X.

Now, look at the diagram:

The top part says, basically, that $g$ is a function from $Z$ to $X^Y$: so $g$ takes a member of $Z$, and uses it to select a function from $Y$ to $X$.

The vertical arrow says:

given the pair $(z, y)$, $f$ maps it to a value in $X$.

given a pair $(z, y)$, we’re going through a function. It’s almost like currying:

The vertical arrow going down is basically taking $f(z, y)$, and currying it to $g(z)(y)$.

Per the top part of the diagram, $g(z)$ selects a function from $Y$ to $X$. (That is, a member of $X^Y$.)

So, at the end of the vertical arrow, we have a pair $(g(z), y)$.

The “eval” arrow maps from the pair of a function and a value to the result of applying the function to the value.
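In the category of sets, that whole trip around the diagram can be written down as a few lines of Python. This is only a sketch for the set-and-function case: `curry` plays the role of the operation that maps arrows to arrows, `eval_` is the evaluation map, and the function `f` is a made-up example.

```python
def curry(f):
    """Turn a function of a pair (z, y) into a function of z that returns a function of y."""
    return lambda z: lambda y: f(z, y)


def eval_(pair):
    """The evaluation map: apply the function in the pair to the value in the pair."""
    g, y = pair
    return g(y)


def f(z, y):
    # Hypothetical example of a function from pairs (z, y) to values.
    return z * 10 + y


g = curry(f)

# Down the vertical arrow: the pair (z, y) becomes the pair (g(z), y).
# Along the eval arrow: that pair becomes the result of applying g(z) to y.
via_eval = eval_((g(4), 2))
direct = f(4, 2)
print(via_eval, direct)  # 42 42
```

Going through the curried function and then eval lands on the same value as applying f directly, which is the property the diagram is expressing.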

Cartesian Closed Categories

Now – the abstraction step is actually kind of easy: all we’re doing is saying that there is a structure of mappings from object to object here. This particular structure has the essential properties of what it means to apply a function to a value. The internal values and precise meanings of the arrows connecting the values can end up being different things, but no matter what, it will come down to something very much like function application.

With exponentials and products, we can finally say what the cartesian closed categories (CCCs) are. A cartesian closed category is a category that is closed with respect to both products and exponentials.

Why do we care? Well, the CCCs are in a pretty deep sense equivalent to the simply typed lambda calculus. That means that the CCCs are deeply tied to the fundamental nature of computation. The structure of the CCCs – with its closure with respect to product and exponential – is an expression of the basic capability of an effective computing system. So next, we’ll take a look at a couple of examples of what we can do with the CCCs as a categorical model of computation.

Lambda calculus started off with the simple, untyped lambda calculus that we’ve been talking about so far. But one of the great open questions about lambda calculus was: was it sound? Did it have a valid model?


So in the last few posts, I’ve been building up the bits and pieces that turn lambda calculus into a useful system. We’ve got numbers, booleans, and choice operators. The only thing we’re lacking is some kind of repetition or iteration.
In lambda calculus, all iteration is done by recursion. In fact, recursion is a pretty natural way of expressing iteration. It takes a bit of getting used to, but anyone who’s programmed in a language like Scheme, ML, or Haskell for a while gets very used to the idea, and feels frustrated coming back to a language like Java, where you need to write loops.
It can be a bit difficult if you’re not used to thinking recursively. I wrote an explanation of recursion which you can go read if you’re not used to what recursion is or how it works.
But since functions in lambda calculus don’t have names, that means that we resort to something tricky. It’s called the Y combinator, aka the lambda fixed point operator.
Let’s start by looking at a simple recursive function outside of the lambda calculus. The factorial function, n!, is the standard example:
factorial(n) = 1 if n = 0,
or factorial(n) = n*factorial(n-1) if n > 0

If we want to start trying to write that in lambda calculus, we’d need a couple of tools… We need a test for equality to zero, and we need a way of multiplying numbers; and we need a way of subtracting one.
For testing equality to zero, we’ll use a function named *IsZero*, which takes three parameters: a number, and two values. If the number is 0, it returns the first value; if it’s not 0, then it returns the second value.
For multiplication – multiplication is an iterative algorithm, so we can’t write multiplication until we work out recursion. But we’ll just handwave that for now, and have a function *Mult x y*.
And finally, for subtracting one, we’ll use *Pred x* for the predecessor of x – that is, x – 1.
So – a first stab at factorial, written with the recursive call left with a blank in it, would be:

*λ n . IsZero n 1 (Mult n (**something** (Pred n)))*

Now, the question is, what kind of “something” can we plug in to there? What we’d really like to do is plug in a copy of the function itself:

*Fact ≡ λ n . IsZero n 1 (Mult n (Fact (Pred n)))*

How can we do that? Well, the usual way of plugging something in to a lambda calculus function is through a parameter:

*Fact ≡ (λ f n . IsZero n 1 (Mult n (f (Pred n)))) Fact*

Of course, we can’t plug in a copy of the function as its own parameter that way: the name *Fact* doesn’t exist in the expression in which we’re trying to use it. You can’t use an undefined name – and in lambda calculus, the *only* way to bind a name is by passing it as a parameter to a λ expression. So what can we do?
The answer is to use something called a *combinator*. A combinator is a special kind of *higher order function* which can be defined without reference to anything but function applications. (A higher order function is a function which takes functions as parameters or returns functions as results.) The Y combinator is the special, almost magical function that makes recursion possible. Here’s what it looks like:

Y ≡ λ y . (λ x . y (x x)) (λ x . y (x x))

If you look at it, the reason for calling it Y is because it is *shaped* like a Y. To show you that more clearly, sometimes we write lambda calculus using trees. Here’s the tree for the Y combinator:
Why is the Y combinator an answer to our problem in defining the factorial function? The Y combinator is something called a *fixed point* combinator. What makes it special is the fact that for any function *f*, *Y f* evaluates to *f Y f*; which evaluates to *f (f Y f)*; which evaluates to *f (f (f Y f))*. See why it’s called Y?
Let’s try walking through “*Y f*”:
1. Expand Y: “*(λ y . (λ x . y (x x)) (λ x . y (x x))) f*”
2. β: “*(λ x . f (x x)) (λ x . f (x x))*”
3. β again: “*f ((λ x . f (x x)) (λ x . f (x x)))*”
4. Since “*Y f = (λ x . f (x x)) (λ x . f (x x))*”, what we just got in step three is “*f (Y f)*”.
See, there’s the magic of “Y”. No matter what you do, you can’t make it consume itself. Evaluating “*Y f*” will produce another copy of *f*, and leave the “Y f” part untouched.
So how do we use this crazy thing?
Remember our last attempt at defining the factorial function? Let’s look at it again:

*Fact ≡ (λ f n . IsZero n 1 (Mult n (f (Pred n)))) Fact*

Now – the trick is, “Fact” is not an identifier defined inside of “Fact”. How do we let “Fact” reference “Fact”? Well, we did a lambda abstraction to let us pass the “Fact” function as a parameter; so what we needed to do is to find a way to write “Fact” that lets us pass it to itself as a parameter.
What does “Y f” do? It expands into a call to “f” with “Y f” as its first parameter. And that “Y f” will in turn expand into “f (Y f)” – that is, a call to “f” with “Y f” as its parameter. In other words, Y f turns “f” into a recursive function with *itself* as its first parameter.
So, giving the name *Metafact* to the expression *λ f n . IsZero n 1 (Mult n (f (Pred n)))*, the factorial function is:

*Fact ≡ Y Metafact*

*(Y Metafact)* is the parameter value of “f” in the Metafact lambda; when we do β on the function, if n is zero, then it just returns 1. If it’s not zero, then we get the call to *f (Pred n)*. *f* betas to *Y Metafact*, which does that funky magic copying thing, giving us *Metafact (Y Metafact) (Pred n)*.
Voila, recursion.
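To make that concrete, here is a minimal sketch in Python. It rests on a few assumptions that are not part of the definitions above: Python integers and a conditional expression stand in for *IsZero*, *Mult*, and *Pred*, and because Python evaluates arguments eagerly, the sketch uses an eta-expanded variant of the fixed-point combinator (named `Z` here) rather than Y exactly as written.

```python
# Z: an eager-evaluation variant of the fixed-point combinator.
# Assumption: the inner x(x) is wrapped in "lambda v: ..." so that Python
# does not try to expand it before the recursive call is actually needed.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# metafact mirrors Metafact: the recursive call arrives as the parameter f.
# Assumption: native ints and "if/else" replace IsZero, Mult and Pred.
metafact = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)

# Fact = Z Metafact: no name is ever used inside its own definition.
fact = Z(metafact)

print(fact(0))  # 1
print(fact(5))  # 120
```

Notice that `metafact` never mentions `fact`; the only thing tying the knot is the combinator.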
I learned about the Y combinator back in my undergrad days, which would place it around 1989 – and I still find it rather mystifying. I do understand it now, but I can’t imagine how on earth anyone ever figured it out!
If you’re interested in this, then I highly recommend getting a copy of the book [The Little Schemer][schemer]. It’s a wonderful little book – set up like a children’s book, where the front of each page is a question, and the back of each page is the answer. It’s written in a delightfully playful style, it’s very fun and engaging, and it will teach you more than just how to program in Scheme.
As an important side-note, there are actually a couple of different versions of the Y combinator. There are different ways of evaluating lambda calculus: given an expression like “*(λ x y . x * y) 3 ((λ z . z * z) 4)*”,
we can do it in two different orders: we can first do the beta on “*(λ x y . x * y)*”, which would give us: “*3 * ((λ z . z * z) 4)*”.
Or, we could beta “*((λ z . z * z) 4)*” first: “*(λ x y . x * y) 3 (4 * 4)*”. In this case, the two orders end up with the same result; but that’s not always the case. Sometimes the order of evaluation matters – and the way that the Y combinator works is one of those times. One order will result in expanding the Y infinitely; the other will result in a clean recursive function.
The first order is what we call *lazy evaluation*: don’t evaluate the parameters to a function until they’re needed. (This is also pretty much the same thing as what we sometimes call *by name* parameter passing.) The second is called *eager evaluation*: always evaluate parameters *before* the functions that they’re passed to. (In real programming languages, Lisp, Scheme, and ML are lambda-calculus based languages that use eager evaluation; Haskell and Miranda are lambda calculus based languages that use lazy evaluation.) The Y combinator I described above is the Y for *lazy* evaluation. If we used eager evaluation, then the Y combinator above wouldn’t work – in fact, it would copy Ys forever. There is another version of Y which works for eager evaluation – in fact, it’s the one described in “The Little Schemer”, and they explain it so much better than I do that I’ll just recommend again that you head for whatever bookstore you prefer, and buy yourself a copy.
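Python is an eager language, so it makes a handy place to watch the difference. The sketch below transcribes Y as written above, plus one commonly used eager variant (often called Z); I’m not claiming it matches the book’s presentation line for line. The function `metasum` is a hypothetical example made up for this illustration.

```python
# Y exactly as defined above, transcribed into Python lambdas.
Y = lambda f: (lambda x: f(x(x)))(lambda x: f(x(x)))

# An eager-friendly variant (often called Z): x(x) is hidden behind
# "lambda v: ..." so it is only expanded when a recursive call happens.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Hypothetical example: sum of 0..n, with the recursive call passed in as f.
metasum = lambda f: lambda n: 0 if n == 0 else n + f(n - 1)

def y_diverges():
    """Return True if building Y(metasum) blows the stack under eager evaluation."""
    try:
        Y(metasum)  # x(x) is evaluated before f is ever called, forever
    except RecursionError:
        return True
    return False

print(y_diverges())    # True
print(Z(metasum)(10))  # 55
```

The only difference between the two is the extra `lambda v:` wrapper, which delays the self-application until it is needed.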
[recursion]: http://goodmath.blogspot.com/2006/03/clarifying-recursion.html
[schemer]: http://www.amazon.com/gp/redirect.html?link_code=ur2&tag=goodmathbadma-20&camp=1789&creative=9325&location=/gp/search%3F%26index=books%26keywords=little%20schemer%26_encoding=UTF8


I’m on vacation this week, so I’m posting reruns of some of the better articles from when Goodmath/Badmath was on Blogger. Today’s is a combination of two short posts on numbers and control booleans in λ calculus.
So, now, time to move on to doing interesting stuff with lambda calculus. To
start with, for convenience, I’ll introduce a bit of syntactic sugar to let us
name functions. This will make things easier to read as we get to complicated
stuff.
To introduce a *global* function (that is, a function that we’ll use throughout our lambda calculus introduction without including its declaration in every expression), we’ll use a definition like the following:
*square ≡ λ x . x × x*
This declares a function named “square”, whose definition is “*λ x . x×x*”. If we had an expression “square 4”, the definition above means that it would effectively be treated as if the expression were: “*(λ square . square 4)(λ x . x×x)*”.
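The same desugaring can be tried out directly in Python, as a rough sketch: the name `square` exists only as a lambda parameter, and Python’s `*` stands in for × (an assumption, since we haven’t defined arithmetic in lambda calculus yet).

```python
# "square 4" with the global definition expanded into a parameter binding:
# (λ square . square 4)(λ x . x × x)
result = (lambda square: square(4))(lambda x: x * x)

print(result)  # 16
```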
Numbers in Lambda Calculus
------------------------------
In some of the examples, I used numbers and arithmetic operations. But numbers don’t really exist in lambda calculus; all we really have are functions! So we need to invent some way of creating numbers using functions. Fortunately, Alonzo Church, the genius who invented the lambda calculus, worked out how to do that. His version of numbers-as-functions is called Church Numerals.
In Church numerals, all numbers are functions with two parameters:
* Zero ≡ *λ s z . z*
* One ≡ *λ s z . s z*
* Two ≡ *λ s z . s (s z)*
* Three ≡ *λ s z . s (s (s z))*
* Any natural number “n”, is represented by a Church numeral which is a function which applies its first parameter to its second parameter “n” times.
A good way of understanding this is to think of “z” as being a name for a zero-value, and “s” as a name for a successor function. So zero is a function which just returns the “0” value; one is a function which applies the successor function once to zero; two is a function which applies successor to the successor of zero, etc. It’s just the Peano arithmetic definition of numbers transformed into lambda calculus.
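Here is a small Python sketch of those numerals, written as curried lambdas. The helper `to_int` is a hypothetical convenience, not part of the encoding: it decodes a numeral by choosing “add one” as the successor and the native integer 0 as the zero value.

```python
# Church numerals: each takes a successor s, then a zero value z.
zero  = lambda s: lambda z: z
one   = lambda s: lambda z: s(z)
two   = lambda s: lambda z: s(s(z))
three = lambda s: lambda z: s(s(s(z)))

def to_int(n):
    """Hypothetical helper: decode a numeral with s = (+1) and z = 0."""
    return n(lambda k: k + 1)(0)

print(to_int(zero))   # 0
print(to_int(three))  # 3

# The numeral doesn't care what s and z are; it just applies s that many times.
print(three(lambda t: t + "!")("hi"))  # hi!!!
```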
But the really cool thing is what comes next. If we want to do addition, x + y, we need to write a function with four parameters; the two numbers to add; and the “s” and “z” values we want in the resulting number:
add ≡ *λ s z x y . x s (y s z)*
Let’s curry that, to separate the two things that are going on. First, it’s taking two parameters which are the two values we need to add; second, it needs to normalize things so that the two values being added end up sharing the same binding of the zero and successor values.
add_curry ≡ λ x y. (λ s z . (x s (y s z)))
Look at that for a moment; what that says is, to add x and y: create the church numeral “y” using the parameters “s” and “z”. Then **apply x** to that new church numeral y. That is: a number is a function which adds itself to another number.
Let’s look a tad closer, and run through the evaluation of 2 + 3:
add_curry (λ s z . s (s z)) (λ s z . s (s (s z)))
To make things easier, let’s alpha 2 and 3, so that “2” uses “s2” and “z2”, and 3 uses “s3” and “z3”;
add_curry (λ s2 z2 . s2 (s2 z2)) (λ s3 z3 . s3 (s3 (s3 z3)))
Now, let’s replace “add_curry” with its definition:

λ s z . (λ s2 z2 . s2 (s2 z2)) s ((λ s3 z3 . s3 (s3 (s3 z3))) s z)
And now let’s beta the church numeral for three. This basically just “normalizes” three: it replaces the successor and zero function in the definition of three with the successor and zero functions from the parameters to add.
λ s z . (λ s2 z2 . s2 (s2 z2)) s (s (s (s z)))
Now, here comes the really neat part. Beta again, this time on the lambda for two. Look at what we’re going to be doing here: two is a function which takes two parameters: a successor function, and a zero function. To add two and three, we’re using the successor function from the add function, and we’re using the result of evaluating three *as the value of the zero* for two:
λ s z . s (s (s (s (s z))))
And we have our result: the church numeral for five!
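The same 2 + 3 evaluation can be checked mechanically with a short Python sketch. As before, `to_int` is a hypothetical decoding helper and not part of the lambda calculus definitions.

```python
# Church numerals for two and three.
two   = lambda s: lambda z: s(s(z))
three = lambda s: lambda z: s(s(s(z)))

# add_curry ≡ λ x y . (λ s z . x s (y s z))
add_curry = lambda x: lambda y: lambda s: lambda z: x(s)(y(s)(z))

def to_int(n):
    """Hypothetical helper: decode a numeral with s = (+1) and z = 0."""
    return n(lambda k: k + 1)(0)

five = add_curry(two)(three)
print(to_int(five))  # 5
```

Here `y(s)(z)` plays the role of “three evaluated with the shared s and z”, and `x(s)` then applies the successor two more times on top of it.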
Choice in Lambda Calculus
---------------------------
Now that we have numbers in our Lambda calculus, there are only two things missing before we can express arbitrary computations: a way of expressing choice, and a way of expressing repetition. So now I’ll talk about booleans and choice; and then next post I’ll explain repetition and recursion.
We’d like to be able to write choices as if/then/else expressions, like we have in most programming languages. Following the basic pattern of the church numerals, where a number is expressed as a function that adds itself to another number, we’ll express true and false values as functions that perform an if-then-else operation on their parameters. These are sometimes called *Church booleans*. (Of course, also invented by Alonzo Church.)
* TRUE ≡ λ t f . t
* FALSE ≡ λ t f . f
So, now we can write an “if” function, whose first parameter is a condition expression, second parameter is the expression to evaluate if the condition was true, and third parameter is the expression to evaluate if the condition is false.
* IfThenElse ≡ *λ cond t f . cond t f*
For the boolean values to be useful, we also need to be able to do the usual logical operations:
* BoolAnd ≡ *λ x y . x y FALSE*
* BoolOr ≡ *λ x y . x TRUE y*
* BoolNot ≡ *λ x . x FALSE TRUE*
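These definitions translate almost symbol for symbol into Python lambdas. In the sketch below, `to_bool` is a hypothetical helper for decoding into native booleans. One caveat: Python evaluates both branch arguments of `if_then_else` before choosing one, so this only models the selection, not lazy branching.

```python
# Church booleans: a boolean picks one of its two parameters.
TRUE  = lambda t: lambda f: t
FALSE = lambda t: lambda f: f

# IfThenElse ≡ λ cond t f . cond t f
if_then_else = lambda cond: lambda t: lambda f: cond(t)(f)

bool_and = lambda x: lambda y: x(y)(FALSE)   # λ x y . x y FALSE
bool_or  = lambda x: lambda y: x(TRUE)(y)    # λ x y . x TRUE y
bool_not = lambda x: x(FALSE)(TRUE)          # λ x . x FALSE TRUE

def to_bool(b):
    """Hypothetical helper: decode a Church boolean to a Python bool."""
    return b(True)(False)

print(if_then_else(TRUE)("then")("else"))  # then
print(to_bool(bool_and(TRUE)(FALSE)))      # False
print(to_bool(bool_or(FALSE)(TRUE)))       # True
print(to_bool(bool_not(TRUE)))             # False
```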