The first is called success, the second is failure. We can compose operations on this somewhat conveniently, like we would on a monad (like Option).

Simple parser primitives

All of the above boilerplate allows us to define a parser, which succeeds if the first token in the input satisfies some given predicate pred. When it succeeds, it reads the token string, and splits the input there.

All of the above are methods on a Parser[T] class. Thanks to infix space notation in Scala, we can denote x.y(z) as x y z, which allows us to simplify our DSL notation; for instance A ~ B corresponds to A.~(B).

abstract class Parser[T] {
  // An abstract method that defines the parser function
  def apply(in: Input): ParseResult

  def ~[U](rhs: Parser[U]) = new Parser[T ~ U] {
    def apply(in: Input) = Parser.this(in) match {
      case Success(x, tail) => rhs(tail) match {
        case Success(y, rest) => Success(new ~(x, y), rest)
        case failure => failure
      }
      case failure => failure
    }
  }

  def |(rhs: => Parser[T]) = new Parser[T] {
    def apply(in: Input) = Parser.this(in) match {
      case s1 @ Success(_, _) => s1
      case failure => rhs(in)
    }
  }

  def ^^[U](f: T => U) = new Parser[U] {
    def apply(in: Input) = Parser.this(in) match {
      case Success(x, tail) => Success(f(x), tail)
      case x => x
    }
  }

  def ^^^[U](r: U): Parser[U] = ^^(x => r)
}

👉 In Scala, T ~ U is syntactic sugar for ~[T, U], which is the type of the case class we’ll define below

For the ~ combinator, when everything works, we're using ~, a case class that is equivalent to Pair, but prints the way we want and allows for the concise type-level notation above.

case class ~[T, U](_1: T, _2: U) {
  override def toString = "(" + _1 + " ~ " + _2 + ")"
}

At this point, we thus have two different meanings for ~: the ~ method that produces a Parser, and the ~(a, b) case class pair that this parser returns (all of this is encoded in the signature of the ~ method).

Note that the | combinator takes the right-hand side parser as a call-by-name argument. This is because we don’t want to evaluate it unless it is strictly needed—that is, if the left-hand side fails.

^^ is like a map operation on Option; P ^^ f succeeds iff P succeeds, in which case it applies the transformation f on the result of P. Otherwise, it fails.
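A minimal, self-contained sketch of these combinators may help. Here ParseResult is simplified to an Option of (result, remaining input), and Input to a list of string tokens; the real classes carry richer failure information, so this is only an illustration:

```scala
// Simplified sketch: Input is a list of string tokens, and a parse result
// is Some((value, remaining input)) on success, None on failure.
type Input = List[String]

abstract class Parser[T] { self =>
  def apply(in: Input): Option[(T, Input)]

  // sequence: run this parser, then rhs on the remaining input
  def ~[U](rhs: Parser[U]): Parser[(T, U)] = new Parser[(T, U)] {
    def apply(in: Input) = self(in) match {
      case Some((x, tail)) => rhs(tail).map { case (y, rest) => ((x, y), rest) }
      case None            => None
    }
  }

  // alternative: rhs is by-name, so it is evaluated only if the left side fails
  def |(rhs: => Parser[T]): Parser[T] = new Parser[T] {
    def apply(in: Input) = self(in).orElse(rhs(in))
  }

  // map the result on success, like map on Option
  def ^^[U](f: T => U): Parser[U] = new Parser[U] {
    def apply(in: Input) = self(in).map { case (x, tail) => (f(x), tail) }
  }
}

// a primitive parser that accepts exactly the token s
def token(s: String): Parser[String] = new Parser[String] {
  def apply(in: Input) = in match {
    case head :: tail if head == s => Some((head, tail))
    case _                         => None
  }
}

val bool: Parser[Boolean] =
  token("true") ^^ (_ => true) | token("false") ^^ (_ => false)
```

With the token primitive, bool accepts either literal and maps it to a Boolean; note that | only tries the right-hand side when the left side fails.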

Shorthands

We can now define shorthands for common combinations of parser combinators:

The trouble with left-recursion

Parser combinators work top-down and therefore do not allow for left-recursion. For example, the following would go into an infinite loop, as the parser for expr keeps calling itself on the same input without consuming any tokens:

👉 It used to be that the standard library contained parser combinators, but those are now a separate module. This module contains a chainl (chain-left) method that reduces after a rep for you.

Arithmetic expressions — abstract syntax and proof principles

This section follows Chapter 3 in TAPL.

Basics of induction

Ordinary induction is simply:

Suppose P is a predicate on natural numbers.
Then:
If P(0)
and, for all i, P(i) implies P(i + 1)
then P(n) holds for all n
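As a tiny worked instance, take P(n) to be the claim that the sum of the naturals up to n is n(n + 1)/2:

```latex
P(n):\quad 0 + 1 + \dots + n = \tfrac{n(n+1)}{2}
\\
\text{Base: } P(0) \text{ holds, since } 0 = \tfrac{0 \cdot 1}{2}.
\\
\text{Step: assuming } P(i),\quad
0 + 1 + \dots + i + (i+1) = \tfrac{i(i+1)}{2} + (i+1) = \tfrac{(i+1)(i+2)}{2},
\\
\text{which is exactly } P(i+1). \text{ By induction, } P(n) \text{ holds for all } n.
```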

We can also do complete induction:

Suppose P is a predicate on natural numbers.
Then:
If for each natural number n,
given P(i) for all i < n we can show P(n)
then P(n) holds for all n

It proves exactly the same thing as ordinary induction; it is simply a restated version. They're interderivable: assuming one, we can prove the other. Which one to use is simply a matter of style or convenience. We'll see some more equivalent styles as we go along.

Mathematical representation of syntax

Let’s assume the following grammar:

t ::=
true
false
if t then t else t
0
succ t
pred t
iszero t

What does this really define? A few suggestions:

A set of character strings

A set of token lists

A set of abstract syntax trees

It depends on how you read it; a grammar like the one above contains information about all three.

However, we are mostly interested in the ASTs. The above grammar is therefore called an abstract grammar. Its main purpose is to suggest a mapping from character strings to trees.

We won't be too strict with these grammars. For instance, we'll freely use parentheses to disambiguate which tree we mean to describe, even though they're not strictly part of the grammar. What matters to us here aren't strict implementation semantics, but rather having a framework to talk about ASTs. For our purposes, we'll consider two terms producing the same AST to be essentially the same; still, we'll distinguish terms that only have the same evaluation result, as they don't necessarily have the same AST.
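For instance, the abstract grammar above could be encoded as a Scala AST along these lines (the constructor names are ours):

```scala
// One possible AST encoding of the grammar for terms t
sealed trait Term
case object True  extends Term
case object False extends Term
case object Zero  extends Term
case class If(t1: Term, t2: Term, t3: Term) extends Term
case class Succ(t: Term)   extends Term
case class Pred(t: Term)   extends Term
case class IsZero(t: Term) extends Term

// "if iszero 0 then 0 else succ 0" as a tree
val example: Term = If(IsZero(Zero), Zero, Succ(Zero))
```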

How can we express our grammar as mathematical expressions? A grammar describes the legal set of terms in a program by offering a recursive definition. While recursive definitions may seem obvious and simple to a programmer, we have to go through a few hoops to make sense of them mathematically.

Mathematical representation 1

We can use a set T of terms. The grammar is then the smallest set such that:

The generating function F takes a set of terms U as input and produces “terms justified by U” as output; that is, all terms that have the items of U as subterms.

The set U is said to be closed under F or F-closed if F(U)⊆U.

The set of terms T as defined above is the smallest F-closed set. If O is another F-closed set, then T⊆O.

Comparison of the representations

We’ve seen essentially two ways of defining the set (as representation 1 and 2 are equivalent, but with different notation):

The smallest set that is closed under certain rules. This is compact and easy to read.

The limit of a series of sets. This gives us an induction principle on which we can prove things on terms by induction.

The first one defines the set “from above”, by intersecting F-closed sets.

The second one defines it “from below”, by starting with ∅ and getting closer and closer to being F-closed.

These are equivalent (we won’t prove it, but Proposition 3.2.6 in TAPL does so), but can serve different uses in practice.

Induction on terms

First, let’s define depth: the depth of a term t is the smallest i such that t∈Si.

The way we defined Si, it gets larger and larger for increasing i; the depth of a term t gives us the step at which t is introduced into the set.

We see that if a term t is in Si, then all of its immediate subterms must be in Si−1, meaning that they must have smaller depth.

This justifies the principle of induction on terms, or structural induction. Let P be a predicate on a term:

If, for each term s,
given P(r) for all immediate subterms r of s we can show P(s)
then P(t) holds for all t

All this says is that if we can prove the induction step from subterms to terms (under the induction hypothesis), then we have proven the induction.

We can also express this structural induction using generating functions, which we introduced previously.

Suppose T is the smallest F-closed set.
If, for each set U,
from the assumption "P(u) holds for every u ∈ U",
we can show that "P(v) holds for every v ∈ F(U)"
then
P(t) holds for all t ∈ T

Why can we use this?

We assumed that T was the smallest F-closed set, which means that T⊆O for any other F-closed set O.

Showing the pre-condition (“for each set U, from the assumption…”) amounts to showing that the set of all terms satisfying P (call it O) is itself an F-closed set.

Since T⊆O, every element of T satisfies P.

Inductive function definitions

An inductive definition is used to define the elements in a set recursively, as we have done above. The recursion theorem states that a well-formed inductive definition defines a function. To understand what being well-formed means, let’s take a look at some examples.

Let’s define our grammar function a little more formally. Constants are the basic values that can’t be expanded further; in our example, they are true, false, 0. As such, the set of constants appearing in a term t, written Consts(t), is defined recursively as follows:
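In Scala, one way to transcribe this kind of recursive definition (over a hypothetical Term ADT for our grammar) is:

```scala
sealed trait Term
case object True  extends Term
case object False extends Term
case object Zero  extends Term
case class If(t1: Term, t2: Term, t3: Term) extends Term
case class Succ(t: Term)   extends Term
case class Pred(t: Term)   extends Term
case class IsZero(t: Term) extends Term

// the set of constants appearing in a term
def consts(t: Term): Set[Term] = t match {
  case True | False | Zero => Set(t)         // a constant contributes itself
  case Succ(t1)            => consts(t1)
  case Pred(t1)            => consts(t1)
  case IsZero(t1)          => consts(t1)
  case If(t1, t2, t3)      => consts(t1) ++ consts(t2) ++ consts(t3)
}
```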

This seems simple, but these semantics aren’t perfect. First off, a mathematical definition simply assigns a convenient name to some previously known thing. But here, we’re defining the thing in terms of itself, recursively. And the semantics above also allow us to define ill-formed inductive definitions:

The last rule leads to infinite regress (if we implemented it, we'd expect some kind of stack overflow). We're also missing the rules for if-statements, and we have a useless rule for 0, producing empty sets.

How do we tell the difference between a well-formed inductive definition, and an ill-formed one as above? What is well-formedness anyway?

What is a function?

A relation over T,U is a subset of T×U, where the Cartesian product is defined as:

T×U={(t,u):t∈T,u∈U}

A function f from A (domain) to B (co-domain) can be viewed as a two-place relation, albeit with two additional properties:

It is total: ∀a∈A,∃b∈B:(a,b)∈f

It is deterministic: (a,b1)∈f,(a,b2)∈f⟹b1=b2

Totality ensures that the A domain is covered, while being deterministic just means that the function always produces the same result for a given input.

Induction example 1

As previously stated, Consts is a relation. It maps terms (A) into the set of constants that they contain (B). The induction theorem states that it is also a function. The proof is as follows.

Consts is total and deterministic: for each term t there is exactly one set of terms C such that (t, C) ∈ Consts. The proof is done by induction on t.

To be able to apply the induction principle for terms, we must first show that for an arbitrary term t, under the following induction hypothesis:

For each immediate subterm s of t, there is exactly one set of terms Cs such that (s,Cs)∈Consts

Then the following needs to be proven as an induction step:

There is exactly one set of terms C such that (t,C)∈Consts

We proceed by cases on t:

If t is 0, true or false

We can immediately see from the definition of Consts that there is exactly one set of terms C = {t} such that (t, C) ∈ Consts.

This constitutes our base case.

If t is succ t1, pred t1 or iszero t1

The immediate subterm of t is t1, and the induction hypothesis tells us that there is exactly one set of terms C1 such that (t1,C1)∈Consts. But then it is clear from the definition that there is exactly one set of terms C=C1 such that (t,C)∈Consts.

If t is if t1 then t2 else t3

The induction hypothesis tells us:

There is exactly one set of terms C1 such that (t1,C1)∈Consts

There is exactly one set of terms C2 such that (t2,C2)∈Consts

There is exactly one set of terms C3 such that (t3,C3)∈Consts

It is clear from the definition of Consts that there is exactly one set C=C1∪C2∪C3 such that (t,C)∈Consts.

This proves that Consts is indeed a function.

But what about BadConsts? It is also a relation, but it isn’t a function. For instance, we have BadConsts(0)={0} and BadConsts(0)={}, which violates determinism. To reformulate this in terms of the above, there are two sets C such that (0,C)∈BadConsts, namely C={0} and C={}.

Note that there are many other problems with BadConsts, but this is sufficient to prove that it isn’t a function.

The following is a congruence rule, defining where the computation rule is applied next:

                 t1 ⟶ t1′
──────────────────────────────────────────────── (E-If)
 if t1 then t2 else t3 ⟶ if t1′ then t2 else t3

We want to evaluate the condition before the branches in order to save on evaluation: until we know the condition's value, we don't know which branch should be evaluated, so we need the condition first.
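These rules can be read off as a small-step evaluator. As a sketch in Scala (restricted to the boolean fragment, with a hypothetical Term ADT):

```scala
sealed trait Term
case object True  extends Term
case object False extends Term
case class If(t1: Term, t2: Term, t3: Term) extends Term

// One step of evaluation; None means no rule applies (the term is a normal form)
def step(t: Term): Option[Term] = t match {
  case If(True, t2, _)  => Some(t2)                     // E-IfTrue
  case If(False, _, t3) => Some(t3)                     // E-IfFalse
  case If(t1, t2, t3)   => step(t1).map(If(_, t2, t3))  // E-If: evaluate the condition
  case _                => None
}
```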

Derivations

We can describe the evaluation logically from the above rules using derivation trees. Suppose we want to evaluate the following (with parentheses added for clarity): if (if true then true else false) then false else true.

In an attempt to make all this fit onto the screen, true and false have been abbreviated T and F in the derivation below, and the then keyword has been replaced with a parenthesis notation for the condition.

The final statement is a conclusion. We say that the derivation is a witness for its conclusion (or a proof for its conclusion). The derivation records all reasoning steps that lead us to the conclusion.

Inversion lemma

We can introduce the inversion lemma, which tells us how we got to a term.

Suppose we are given a derivation D witnessing the pair (t,t′) in the evaluation relation. Then either:

If the final rule applied in D was (E-IfTrue), then we have t = if true then t2 else t3 and t′ = t2 for some t2 and t3

If the final rule applied in D was (E-IfFalse), then we have t = if false then t2 else t3 and t′ = t3 for some t2 and t3

If the final rule applied in D was (E-If), then we have t = if t1 then t2 else t3 and t′ = if t1′ then t2 else t3, for some t1, t1′, t2, t3. Moreover, the immediate subderivation of D witnesses (t1, t1′) ∈ ⟶.

This is super boring, but we do need to acknowledge the inversion lemma before we can do induction proofs on derivations. Thanks to the inversion lemma, given an arbitrary derivation D with conclusion t⟶t′, we can proceed with a case-by-case analysis on the final rule used in the derivation tree.

If the final rule applied in D was (E-IfTrue), then we have t=if true then t2 else t3 and t′=t2, and the result is immediate from the definition of size

If the final rule applied in D was (E-IfFalse), then we have t = if false then t2 else t3 and t′ = t3, and the result is immediate from the definition of size

If the final rule applied in D was (E-If), then we have t=if t1 then t2 else t3 and t′=if t′1 then t2 else t3. In this case, t1⟶t′1 is witnessed by a derivation D1. By the induction hypothesis, size(t1)>size(t′1), and the result is then immediate from the definition of size

Abstract machines

An abstract machine consists of:

A set of states

A transition relation of states, written ⟶

t ⟶ t′ means that t evaluates to t′ in one step. Note that ⟶ is a relation, and that t ⟶ t′ is shorthand for (t, t′) ∈ ⟶. Often, this relation is a partial function (not necessarily covering the whole domain, and with at most one possible next state). But in general there may be many possible next states; determinism isn't a requirement here.

Normal forms

A normal form is a term that cannot be evaluated any further. More formally, a term t is a normal form if there is no t′ such that t⟶t′. A normal form is a state where the abstract machine is halted; we can regard it as the result of a computation.

Values that are normal form

Previously, we intended for our values (true and false) to be exactly that, the result of a computation. Did we get that right?

Let’s prove that a term t is a value ⟺ it is in normal form.

The ⟹ direction is immediate from the definition of the evaluation relation ⟶.

The ⟸ direction is more conveniently proven as its contrapositive: if t is not a value, then it is not a normal form, which we can prove by induction on the term t.

Since t is not a value, it must be of the form if t1 then t2 else t3. If t1 is directly true or false, then E-IfTrue or E-IfFalse apply, and we are done.

Otherwise, if t=if t1 then t2 else t3 where t1 isn’t a value, by the induction hypothesis, there is a t′1 such that t1⟶t′1. Then rule E-If yields if t′1 then t2 else t3, which proves that t is not in normal form.

All values are still normal forms. But are all normal forms values? Not in this case. For instance, succ true, iszero true, etc, are normal forms. These are stuck terms: they are in normal form, but are not values. In general, these correspond to some kind of type error, and one of the main purposes of a type system is to rule these kinds of situations out.

Multi-step evaluation

Let’s introduce the multi-step evaluation relation, ⟶∗. It is the reflexive, transitive closure of single-step evaluation, i.e. the smallest relation closed under these rules:

 t ⟶ t′
─────────            ─────────
 t ⟶∗ t′              t ⟶∗ t

 t ⟶∗ t′    t′ ⟶∗ t″
───────────────────────
        t ⟶∗ t″

In other words, it corresponds to any number of single consecutive evaluations.
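Operationally, multi-step evaluation is just "keep stepping until a normal form is reached". A self-contained Scala sketch (boolean fragment only, hypothetical Term ADT):

```scala
sealed trait Term
case object True  extends Term
case object False extends Term
case class If(t1: Term, t2: Term, t3: Term) extends Term

def step(t: Term): Option[Term] = t match {
  case If(True, t2, _)  => Some(t2)                     // E-IfTrue
  case If(False, _, t3) => Some(t3)                     // E-IfFalse
  case If(t1, t2, t3)   => step(t1).map(If(_, t2, t3))  // E-If
  case _                => None
}

// t ⟶∗ t′: iterate single steps until no rule applies
def eval(t: Term): Term = step(t) match {
  case Some(t2) => eval(t2)
  case None     => t
}
```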

Termination of evaluation

We’ll prove that evaluation terminates, i.e. that for every term t there is some normal form t′ such that t⟶∗t′.

First, let’s recall our proof that t⟶t′⟹size(t)>size(t′). Now, for our proof by contradiction, assume that we have an infinite-length sequence t0,t1,t2,… such that:

t0 ⟶ t1 ⟶ t2 ⟶ …   ⟹   size(t0) > size(t1) > size(t2) > …

But this sequence cannot exist: since size(t0) is a finite natural number, there is no infinite strictly descending chain of natural numbers starting from it. This is a contradiction.

Most termination proofs have the same basic form. We want to prove that the relation R⊆X×X is terminating — that is, there are no infinite sequences x0,x1,x2,… such that (xi,xi+1)∈R for each i. We proceed as follows:

Choose a well-suited set W with partial order < such that there are no infinite descending chains w0>w1>w2>… in W. Also choose a function f:X→W.

Show that f(x) > f(y) for all (x, y) ∈ R

Conclude that there are no infinite sequences (x0, x1, x2, …) such that (xi, xi+1) ∈ R for each i. If there were, we could construct an infinite descending chain in W.

As a side-note, partial order is defined as the following properties:

Anti-symmetry: ¬(x<y∧y<x)

Transitivity: x<y∧y<z⟹x<z

We can add a third property to achieve total order, namely x≠y⟹x<y∨y<x.

Lambda calculus

Lambda calculus is Turing complete, and is higher-order (functions are data). In lambda calculus, all computation happens by means of function abstraction and application.

Lambda calculus is equivalent in computational power to Turing machines.

Suppose we wanted to write a function plus3 in our previous language:

plus3 x = succ succ succ x

The way we write this in lambda calculus is:

plus3 = λx. succ (succ (succ x))

λx.t is written x => t in Scala, or fun x -> t in OCaml. Application of our function, say plus3(succ 0), can be written as:

(λx. succ (succ (succ x))) (succ 0)

Abstraction over functions is possible using higher-order functions, which we call λ-abstractions. An example of such an abstraction is the function g below, which takes an argument f and uses it in the function position.

g = λf. f (f (succ 0))

If we apply g to an argument like plus3, we can just use the substitution rule to see how that defines a new function.

Another example: the double function below takes two arguments, as a curried function would. First, it takes the function to apply twice, then the argument on which to apply it, and then returns f(f(y)).

double = λf. λy. f (f y)

Pure lambda calculus

Once we have λ-abstractions, we can actually throw out all other language primitives like booleans and other values; all of these can be expressed as functions, as we’ll see below. In pure lambda-calculus, everything is a function.

Variables will always denote a function, functions always take other functions as parameters, and the result of an evaluation is always a function.

Bodies of lambda abstractions extend as far to the right as possible, so λx.λy.xy means λx.(λy.xy), not λx.(λy.x)y

Scope

The lambda expression λx.t binds the variable x, with a scope limited to t. Occurrences of x inside t are said to be bound, while occurrences outside are said to be free.

Let fv(t) be the set of free variables in a term t. It’s defined as follows:

fv(x)       = {x}
fv(λx.t1)   = fv(t1) ∖ {x}
fv(t1 t2)   = fv(t1) ∪ fv(t2)
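This definition transcribes directly into Scala (using a hypothetical Term ADT for λ-terms):

```scala
sealed trait Term
case class Var(name: String) extends Term
case class Lam(param: String, body: Term) extends Term
case class App(t1: Term, t2: Term) extends Term

// free variables of a term, one case per equation above
def fv(t: Term): Set[String] = t match {
  case Var(x)      => Set(x)
  case Lam(x, t1)  => fv(t1) - x       // the binder removes x
  case App(t1, t2) => fv(t1) ++ fv(t2)
}
```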

Operational semantics

As we saw with our previous language, the rules could be distinguished into computation and congruence rules. For lambda calculus, the only computation rule is:

(λx.t12) v2 ⟶ [x ↦ v2]t12    (E-AppAbs)

The notation [x↦v2]t12 means “the term that results from substituting free occurrences of x in t12 with v2”.

The congruence rules are:

   t1 ⟶ t1′                       t2 ⟶ t2′
─────────────── (E-App1)      ─────────────── (E-App2)
 t1 t2 ⟶ t1′ t2                t1 t2 ⟶ t1 t2′

A lambda-expression applied to a value, (λx.t)v, is called a reducible expression, or redex.

Evaluation strategies

There are alternative evaluation strategies. In the above, we have chosen call by value (which is the standard in most mainstream languages), but we could also choose:

Full beta-reduction: any redex may be reduced at any time. This offers no restrictions, but in practice, we go with a set of restrictions like the ones below (because coding a fixed way is easier than coding probabilistic behavior).

Call-by-name: allows no reductions inside lambda abstractions. Arguments are not reduced before being substituted in the body of lambda terms when applied. Haskell uses an optimized version of this, call-by-need (aka lazy evaluation).

Classical lambda calculus

Classical lambda calculus allows for full beta reduction.

Confluence in full beta reduction

The congruence rules allow us to apply in different ways; we can choose between E-App1 and E-App2 every time we reduce an application, and this offers many possible reduction paths.

While the path is non-deterministic, is the result also non-deterministic? This question took a very long time to answer, but after 25 years or so, it was proven that the result is always the same. This is known as the Church-Rosser confluence theorem:

Let t,t1,t2 be terms such that t⟶∗t1 and t⟶∗t2. Then there exists a term t3 such that t1⟶∗t3 and t2⟶∗t3

Alpha conversion

Substitution is actually trickier than it looks! For instance, in the expression λx.(λy.x)y, the first occurrence of y is bound (it refers to a parameter), while the second is free (it does not refer to a parameter). This is comparable to scope in most programming languages, where we should understand that these are two different variables in different scopes, y1 and y2.

The above example had a variable name that appears both bound and free, which is something we'll try to avoid; requiring that no name is used both ways is called a hygiene condition.

We can transform an unhygienic expression into a hygienic one by renaming bound variables before performing the substitution. This is known as alpha conversion. Alpha conversion is given by the following conversion rule:

        y ∉ fv(t)
─────────────────────────── (α)
 λx.t  =α  λy. [x ↦ y]t

And these equivalence rules (an equivalence relation is reflexive, symmetric and transitive; the rules below give symmetry and transitivity):

 t1 =α t2                  t1 =α t2    t2 =α t3
─────────── (α-Symm)      ────────────────────── (α-Trans)
 t2 =α t1                        t1 =α t3

The congruence rules are as usual.

Programming in lambda-calculus

Multiple arguments

The way to handle multiple arguments is by currying: λx.λy.t

Booleans

The fundamental, universal operator on booleans is if-then-else, which is what we’ll replicate to model booleans. We’ll denote our booleans as tru and fls to be able to distinguish these pure lambda-calculus abstractions from the true and false values of our previous toy language.

We want true to be equivalent to if (true), and false to if (false). The terms tru and fls represent boolean values, in that we can use them to test the truth of a boolean value:

tru = λt. λf. t
fls = λt. λf. f

We can consider these as booleans. Equivalently tru can be considered as a function performing (t1, t2) => if (true) t1 else t2. To understand this, let’s try to apply tru to two arguments:

tru v w = (λt. (λf. t)) v w ⟶ (λf. v) w ⟶ v

This works equivalently for fls.

We can also do inversion, conjunction and disjunction with lambda calculus, which can be read as particular if-else statements:

not = λb. b fls tru
and = λb. λc. b c fls
or  = λb. λc. b tru c

not is a function that is equivalent to not(b) = if (b) false else true.

and is equivalent to and(b, c) = if (b) c else false

or is equivalent to or(b, c) = if (b) true else c
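These encodings can be mirrored in Scala. To keep Scala's types simple, this sketch specializes the values being selected to Int (the names follow the λ-terms above, but the specialization is ours):

```scala
// A Church boolean selects one of two values
type CBool = (Int, Int) => Int

val tru: CBool = (t, f) => t
val fls: CBool = (t, f) => f

// not(b) = if (b) false else true: swap what b selects
val not: CBool => CBool = b => (t, f) => b(f, t)
// and(b, c) = if (b) c else false
val and: (CBool, CBool) => CBool = (b, c) => (t, f) => b(c(t, f), f)
// or(b, c) = if (b) true else c
val or: (CBool, CBool) => CBool = (b, c) => (t, f) => b(t, c(t, f))
```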

Pairs

The fundamental operations are construction pair(a, b), and selection pair._1 and pair._2.

pair = λf. λs. λb. b f s
fst  = λp. p tru
snd  = λp. p fls

pair is equivalent to pair(f, s) = (b => b f s)

When a pair is applied to tru, it selects its first element, by definition of the boolean; that is therefore the definition of fst

Equivalently, a pair applied to fls selects its second element
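As with the booleans, this can be sketched in Scala with the components specialized to Int (a pair is a function waiting for a boolean selector):

```scala
// A selector picks one of the two components; a Church pair applies it
type Sel   = (Int, Int) => Int
type CPair = Sel => Int

val tru: Sel = (f, s) => f  // selects the first component
val fls: Sel = (f, s) => s  // selects the second component

def pair(f: Int, s: Int): CPair = b => b(f, s)
val fst: CPair => Int = p => p(tru)
val snd: CPair => Int = p => p(fls)
```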

Numbers

We've actually been representing numbers in lambda calculus all along! What we've built corresponds to what's more formally called Church numerals.

c0 = λs. λz. z
c1 = λs. λz. s z
c2 = λs. λz. s (s z)
c3 = λs. λz. s (s (s z))

Note that c0’s implementation is the same as that of fls (just with renamed variables).

Every number n is represented by a term cn taking two arguments, which are s and z (for “successor” and “zero”), and applies s to z, n times. Fundamentally, a number is equivalent to the following:

Successor scc: we apply the successor function to n (which has been correctly instantiated with s and z)

Addition add: we pass the instantiated n as the zero of m

Subtraction sub: we apply prd n times to m

Multiplication mul: instead of the successor function, we pass the addition-by-n function

Zero test iszero: zero has the same implementation as false, so we can lean on that to build an iszero function. An alternative understanding is that we're building a number, in which we use true for the zero value z. If we have to apply the successor function s once or more, we want to get false, so for the successor function we use a function that ignores its input and returns false when applied.

What about predecessor? This is a little harder, and it’ll take a few steps to get there. The main idea is that we find the predecessor by rebuilding the whole succession up until our number. At every step, we must generate the number and its predecessor: zero is (c0,c0), and all other numbers are (cn−1,cn). Once we’ve reconstructed this pair, we can get the predecessor by taking the first element of the pair.

zz  = pair c0 c0
ss  = λp. pair (snd p) (scc (snd p))
prd = λm. fst (m ss zz)
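The pair trick can be sketched with ordinary Scala Ints and tuples: start from (0, 0), apply the ss step n times, and keep the first component (this mirrors zz, ss and prd; the concrete representation is ours):

```scala
// ss maps (a, b) to (b, b + 1); after n steps from (0, 0) the pair is
// (n - 1, n) for n > 0, and stays (0, 0) for n = 0
def prd(n: Int): Int = {
  val zz = (0, 0)
  val ss = (p: (Int, Int)) => (p._2, p._2 + 1)
  (1 to n).foldLeft(zz)((p, _) => ss(p))._1
}
```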

Sidenote

The story goes that Church was stumped by predecessors for a long time. This solution finally came to him while he was at the barber, and he jumped out half shaven to write it down.

Lists

Recursion in lambda-calculus

Let’s start by taking a step back. We talked about normal forms and terms for which we terminate; does lambda calculus always terminate? It’s Turing complete, so it must be able to loop infinitely (otherwise, we’d have solved the halting problem!).

The trick to recursion is self-application:

λx.xx

From a type-level perspective, we would cringe at this. This should not be possible in the typed world, but in the untyped world we can do it. We can construct a simple infinite loop in lambda calculus as follows:

omega = (λx. x x) (λx. x x) ⟶ (λx. x x) (λx. x x)

The expression evaluates to itself in one step; it never reaches a normal form, it loops infinitely, diverges. This is not a stuck term though; evaluation is always possible.

In fact, there are no stuck terms in pure lambda calculus. Every term is either a value or reduces further.

So it turns out that omega isn’t so terribly useful. Let’s try to construct something more practical:

Y f = (λx. f (x x)) (λx. f (x x))

Now, the divergence is a little more interesting:

Y f = (λx. f (x x)) (λx. f (x x)) ⟶ f ((λx. f (x x)) (λx. f (x x))) = f (Y f) ⟶ … = f (f (Y f))

This Y is known as the Y-combinator. It still loops infinitely (note that while it works in classical lambda calculus and under call-by-name, it blows up under call-by-value), so let's try to build something more useful.

To delay the infinite recursion, we could build something like a poison pill:

poisonpill = λy. omega

It can be passed around (after all, it's just a value), but evaluating it will cause our program to loop infinitely. This is the core idea we'll use for defining the fixed-point combinator fix, which allows us to do recursion. It's defined as follows:

fix = λf. (λx. f (λy. x x y)) (λx. f (λy. x x y))

This looks a little intricate, and we won’t need to fully understand the definition. What’s important is mostly how it is used to define a recursive function. For instance, if we wanted to define a modulo function in our toy language, we’d do it as follows:

def mod(x, y) = if (y > x) x else mod(x - y, y)

In lambda calculus, we’d define this as:

mod = fix (λf. λx. λy. (gt y x) x (f (sub x y) y))

We’ve assumed that a greater-than gt function was available here.

More generally, we can define a recursive function as:

fix (λf.(recursion on f))
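This usage pattern can be sketched in Scala. Here fix cheats by using Scala's own recursion to tie the knot (the λ-term above needs no recursion), but it shows how a recursive function is expressed as fix applied to a non-recursive one:

```scala
// call-by-value fixed point: the eta-expansion (x => ...) delays the recursive call
def fix[A, B](f: (A => B) => (A => B)): A => B = x => f(fix(f))(x)

// mod with no recursion in its own body: the "recursive call" goes through self
val mod: ((Int, Int)) => Int = fix[(Int, Int), Int] { self =>
  { case (x, y) => if (y > x) x else self((x - y, y)) }
}
```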

Equivalence of lambda terms

We've seen how to define Church numerals and the successor function. How can we prove that scc cn is equal to cn+1?

The naive approach unfortunately doesn’t work; they do not evaluate to the same value.

This still seems very close. If we could simplify a little further, we do see how they would be the same.

The intuition behind the Church numeral representation was that a number n is represented as a term that “does something n times to something else”. scc takes a term that “does something n times to something else”, and returns a term that “does something n+1 times to something else”.

What we really care about is that scc c2 behaves the same as c3 when applied to two arguments. We want behavioral equivalence. But what does that mean? Roughly, two terms s and t are behaviorally equivalent if there is no “test” that distinguishes s and t.

Let's define this notion of “test” a little more precisely, and specify how we're going to observe the results of a test. We can use the notion of normalizability to define a simple notion of a test:

Two terms s and t are said to be observationally equivalent if they are either both normalizable (i.e. they reach a normal form after a finite number of evaluation steps), or both diverge.

In other words, we observe a term’s behavior by running it and seeing if it halts. Note that this is not decidable (by the halting problem).

For instance, omega and tru are not observationally equivalent (one diverges, one halts), while tru and fls are (they both halt).

Observational equivalence isn’t strong enough of a test for what we need; we need behavioral equivalence.

Two terms s and t are said to be behaviorally equivalent if, for every finite sequence of values v1, v2, …, vn, the applications s v1 v2 … vn and t v1 v2 … vn are observationally equivalent.

This allows us to assert that true and false are indeed different:

tru x Ω ⟶∗ x
fls x Ω ⟶∗ Ω

The former returns a normal form, while the latter diverges.

Types

As previously, to define a language, we start with a set of terms and values, as well as an evaluation relation. But now, we'll also define a set of types (denoted with a capitalized first letter) classifying values according to their “shape”. We can define a typing relation t : T. We must check that the typing relation is sound, in the sense that:

t : T  ∧  t ⟶∗ v    ⟹    v : T

t : T    ⟹    t is a value, or ∃t′ such that t ⟶ t′

These rules represent some kind of safety and liveness, but are more commonly referred to as preservation and progress, which we'll talk about later. The first one states that types are preserved throughout evaluation, while the second says that if we can type-check, then evaluation of t will not get stuck.

In our previous toy language, we can introduce two types, booleans and numbers:

With these typing rules in place, we can construct typing derivations to justify every pair t:T (which we can also denote as a (t,T) pair) in the typing relation, as we have done previously with evaluation. Proofs of properties about the typing relation often proceed by induction on these typing derivations.

Like other static program analyses, type systems are generally imprecise. They do not always predict exactly what kind of value will be returned, but simply a conservative approximation. For instance, if true then 0 else false cannot be typed with the above rules, even though it will certainly evaluate to a number. We could of course add a typing rule for if true statements, but there is still a question of how useful this is, and how much complexity it adds to the type system, and especially for proofs. Indeed, the inversion lemma below becomes much more tedious when we have more rules.

Properties of the Typing Relation

The safety (or soundness) of this type system can be expressed by the following two properties:

Progress: A well-typed term is not stuck.

If t:T then either t is a value, or else t⟶t′ for some t′.

Preservation: Types are preserved by one-step evaluation.

If t:T and t⟶t′, then t′:T.

We will prove these later, but first we must state a few lemmas.

Inversion lemma

Again, for types we need to state the same (boring) inversion lemma:

If true:R, then R=Bool.

If false:R, then R=Bool.

If if t1 then t2 else t3:R, then t1: Bool, t2:R and t3:R

If 0:R then R=Nat

If succ t1:R then R=Nat and t1:Nat

If pred t1:R then R=Nat and t1:Nat

If iszero t1:R then R=Bool and t1:Nat

From the inversion lemma, we can directly derive a typechecking algorithm:
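One possible transcription into Scala (Term and Type are our own hypothetical encodings of the grammar; None means “not typable”):

```scala
sealed trait Type
case object TBool extends Type
case object TNat  extends Type

sealed trait Term
case object True  extends Term
case object False extends Term
case object Zero  extends Term
case class If(t1: Term, t2: Term, t3: Term) extends Term
case class Succ(t: Term)   extends Term
case class Pred(t: Term)   extends Term
case class IsZero(t: Term) extends Term

// each case mirrors one clause of the inversion lemma
def typeOf(t: Term): Option[Type] = t match {
  case True | False => Some(TBool)
  case Zero         => Some(TNat)
  case Succ(t1)     => typeOf(t1).filter(_ == TNat)                  // Nat in, Nat out
  case Pred(t1)     => typeOf(t1).filter(_ == TNat)
  case IsZero(t1)   => typeOf(t1).filter(_ == TNat).map(_ => TBool)
  case If(t1, t2, t3) =>
    (typeOf(t1), typeOf(t2), typeOf(t3)) match {
      case (Some(TBool), Some(ty2), Some(ty3)) if ty2 == ty3 => Some(ty2)
      case _                                                 => None
    }
}
```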

Changing the type-checking rule

This doesn’t break our type system. It’s still sound, but it rejects if-else expressions that return things other than numbers (e.g. booleans). That is an expressiveness problem, not a soundness problem; our type system disallows some things that would otherwise be fine by the evaluation rules.

Adding bit

We could add a boolean-to-natural function bit(t). We’d have to add it to the grammar, add some evaluation and typing rules, and prove progress and preservation.

bit true ⟶ 0

bit false ⟶ 1

If t1 ⟶ t′1, then bit t1 ⟶ bit t′1.

If t:Bool, then bit t:Nat.
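The evaluation rules for bit translate directly into a one-step evaluation function. A minimal fragment in Scala (AST names are our own; None means no rule applies):

```scala
// A tiny term fragment including the bit construct (names are illustrative).
sealed trait Term
case object True extends Term
case object False extends Term
case object Zero extends Term
case class Succ(t: Term) extends Term
case class Bit(t: Term) extends Term

// One step of evaluation for bit, following the rules above.
def step(t: Term): Option[Term] = t match {
  case Bit(True)  => Some(Zero)            // bit true  ⟶ 0
  case Bit(False) => Some(Succ(Zero))      // bit false ⟶ 1
  case Bit(t1)    => step(t1).map(Bit(_))  // congruence: evaluate under bit
  case _          => None                  // no rule applies
}
```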

We’ll do something similar to this below, so the full proof is omitted.

Simply typed lambda calculus

Simply Typed Lambda Calculus (STLC) is also denoted λ→. The “pure” form of STLC is not very interesting at the type level (unlike the term level of pure lambda calculus), so we’ll allow base values that are not functions, such as booleans and integers. To talk about STLC, we always begin with some set of “base types”:


T ::= // types
Bool // type of booleans
T -> T // type of functions
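This grammar maps directly onto an algebraic data type; a minimal sketch (the Scala names are our own):

```scala
// T ::= Bool | T -> T, as a Scala ADT.
sealed trait Ty
case object TyBool extends Ty
case class TyArrow(from: Ty, to: Ty) extends Ty

// Example: the type (Bool -> Bool) -> Bool
val example: Ty = TyArrow(TyArrow(TyBool, TyBool), TyBool)
```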

In the following examples, we’ll work with a mix of our previously defined toy language, and lambda calculus. This will give us a little syntactic sugar.

Type annotations

We will annotate lambda-abstractions with the expected type of the argument, as follows:

λx:T1.t1

We could also omit it, and let type inference do the job (as in OCaml), but for now, we’ll do the above. This will make it simpler, as we won’t have to discuss inference just yet.

Typing rules

In STLC, we’ve introduced abstraction. To add a typing rule for it, we need to encode the concept of an environment Γ, which is a set of typing assumptions associating variables with their types. We also introduce the “turnstile” symbol ⊢, meaning that the environment can verify the right-hand side typing judgment, or that Γ must imply the right-hand side.

This additional concept must be taken into account in our definition of progress and preservation:

Progress: If Γ⊢t:T, then either t is a value or else t⟶t′ for some t′

Preservation: If Γ⊢t:T and t⟶t′, then Γ⊢t′:T

To prove these, we must take the same steps as above. We’ll introduce the inversion lemma for typing relations, and restate the canonical forms lemma in order to prove the progress theorem.

Inversion lemma

Let’s start with the inversion lemma.

If Γ⊢true:R then R=Bool

If Γ⊢false:R then R=Bool

If Γ⊢if t1 then t2 else t3:R, then Γ⊢t1:Bool, Γ⊢t2:R, and Γ⊢t3:R.

If Γ⊢x:R then x:R∈Γ

If Γ⊢λx:T1.t2:R, then R=T1→R2 for some R2 with Γ∪(x:T1)⊢t2:R2

If Γ⊢t1t2:R then there is some type T11 such that Γ⊢t1:T11→R and Γ⊢t2:T11.
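As before, this inversion lemma can be read as a typechecker: each clause becomes one case. A hedged sketch in Scala (all names are illustrative; the environment Γ is a Map from variable names to types):

```scala
// STLC types: T ::= Bool | T -> T
sealed trait Ty
case object TyBool extends Ty
case class TyArrow(from: Ty, to: Ty) extends Ty

// STLC terms, with type-annotated abstractions.
sealed trait Term
case object True extends Term
case object False extends Term
case class If(c: Term, t1: Term, t2: Term) extends Term
case class Var(name: String) extends Term
case class Abs(param: String, ty: Ty, body: Term) extends Term
case class App(fun: Term, arg: Term) extends Term

// Each case follows one clause of the inversion lemma; env plays the role of Γ.
def typeOf(env: Map[String, Ty], t: Term): Option[Ty] = t match {
  case True | False => Some(TyBool)
  case If(c, t1, t2) =>
    for {
      tc  <- typeOf(env, c) if tc == TyBool
      ty1 <- typeOf(env, t1)
      ty2 <- typeOf(env, t2) if ty1 == ty2
    } yield ty1
  case Var(x) => env.get(x)                        // x : R ∈ Γ
  case Abs(x, ty1, body) =>                        // Γ ∪ (x : T1) ⊢ body : R2
    typeOf(env + (x -> ty1), body).map(TyArrow(ty1, _))
  case App(f, a) =>                                // f : T11 -> R and a : T11
    typeOf(env, f) match {
      case Some(TyArrow(t11, r)) => typeOf(env, a).filter(_ == t11).map(_ => r)
      case _                     => None
    }
}
```

For example, the identity on booleans Abs("x", TyBool, Var("x")) gets type TyArrow(TyBool, TyBool), while a free variable or an ill-typed application is rejected with None.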

Canonical form

The canonical forms are given as follows:

If v is a value of type Bool, then it is either true or false

If v is a value of type T1→T2 then v has the form λx:T1.t2

Progress

Finally, we get to prove the progress by induction on typing derivations.

Theorem: Suppose that t is a closed, well-typed term (that is, Γ⊢t:T for some type T). Then either t is a value, or there is some t′ such that t⟶t′.

For boolean constants, the proof is immediate as t is a value

For variables, the proof is immediate as t is closed, and the precondition therefore doesn’t hold

For abstraction, the proof is immediate as t is a value

Application is the only case we must treat.

Consider t=t1t2, with Γ⊢t1:T11→T12 and Γ⊢t2:T11.

By the induction hypothesis, t1 is either a value, or it can make a step of evaluation. The same goes for t2.

If t1 can reduce, then rule E-App1 applies to t. Otherwise, if it is a value, and t2 can take a step, then E-App2 applies. Otherwise, if they are both values (and we cannot apply β-reduction), then the canonical forms lemma above tells us that t1 has the form λx:T11.t12, and so rule E-AppAbs applies to t.

Preservation

Theorem: If Γ⊢t:T and t⟶t′ then Γ⊢t′:T.

Proof: by induction on typing derivations. We proceed on a case-by-case basis, as we have done so many times before. But one case is hard: application.

For t=t1t2, such that Γ⊢t1:T11→T12 and Γ⊢t2:T11, and where T=T12, we want to show Γ⊢t′:T12.

To do this, we must use the inversion lemma for evaluation (note that we haven’t written it down for STLC, but the idea is the same). There are three subcases for it, starting with the following:

The left-hand side is t1=λx:T11.t12, and the right-hand side of application t2 is a value v2. In this case, we know that the result of the evaluation is given by t′=[x↦v2]t12.

And here, we already run into trouble, because we do not know how types behave under substitution. We will therefore need to introduce some lemmas.

Weakening lemma

Weakening tells us that we can add assumptions to the context without losing any true typing statements:

If Γ⊢t:T, and the environment Γ has no information about x—that is, x∉dom(Γ)—then the initial assumption still holds if we add information about x to the environment:

(Γ∪(x:S))⊢t:T

Moreover, the latter ⊢ derivation has the same depth as the former.

Permutation lemma

Permutation tells us that the order of assumptions in Γ does not matter.

If Γ⊢t:T and Δ is a permutation of Γ, then Δ⊢t:T.

Moreover, the latter ⊢ derivation has the same depth as the former.

Substitution lemma

Substitution tells us that types are preserved under substitution.

That is, if Γ∪(x:S)⊢t:T and Γ⊢s:S, then Γ⊢[x↦s]t:T.

The proof goes by induction on the derivation of Γ∪(x:S)⊢t:T, that is, by cases on the final typing rule used in the derivation.
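As an aside, the substitution operation [x↦s]t itself can be sketched in code. This naive version works on untyped terms (annotations omitted for brevity; names are our own) and assumes s is closed, so no capture-avoiding renaming is needed:

```scala
// Lambda terms without type annotations (illustrative names).
sealed trait Term
case class Var(name: String) extends Term
case class Abs(param: String, body: Term) extends Term
case class App(fun: Term, arg: Term) extends Term

// [x ↦ s]t, assuming s is closed (no capture-avoidance needed).
def subst(x: String, s: Term, t: Term): Term = t match {
  case Var(y)       => if (y == x) s else t
  case Abs(y, body) =>
    if (y == x) t                       // x is shadowed under this binder: stop
    else Abs(y, subst(x, s, body))
  case App(f, a)    => App(subst(x, s, f), subst(x, s, a))
}
```

The shadowing case in Abs is what makes the substitution lemma's induction go through for the abstraction case: under a binder for x itself, the assumption x:S is irrelevant.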