Let there be let

Consider the following definitions:

id = \x.x
one = id succ(id 0)

We infer id has type X -> X where X is a type variable. Should we be
allowed to substitute Nat -> Nat for X in one occurrence of id and Nat
for the other?

If we can only pick one type constant for X, then it seems we must duplicate
code:

id = \x.x
id2 = \x.x
one = id succ(id2 0)

On the other hand, before type inference, we could expand one to:

one = ((\x.x) succ)((\x.x) 0)

and only then introduce type variables. Afterwards, we can substitute different
types for different uses of a definition.

The same discussion applies to local let expressions. That is, suppose
we allow let _ = _ in _ anywhere we expect a term. Then it might be
reasonable to permit the following:

one = let f = \x.x in f succ (f 0)

Otherwise we might be forced to duplicate code:

one = let f = \x.x in let g = \x.x in f succ (g 0)

Allowing different uses of a definition to have different types is called
let-polymorphism. We demonstrate it with an interpreter based on PCF
(Programming Computable Functions), a simply typed lambda calculus with the
base type Nat and the constant 0, extended with:

pred, succ: these functions have type Nat -> Nat and return the
predecessor and successor of their input; evaluating pred 0 anywhere in
a term yields the Err term, which represents this exception.

ifz-then-else: when given 0, an ifz expression evaluates to its then
branch, otherwise it evaluates to its else branch.

For convenience, we parse all natural numbers as constants of type Nat.
We also provide an undefined keyword that throws an error.

Avoiding the fixpoint operator guarantees normalization, that is, every
program terminates. Even with this restriction, the language is
surprisingly expressive: we can sort lists without fix!

Some presentations of PCF also add the base type Bool along with constants
True and False, and replace ifz with if and iszero, which is similar to
our last interpreter.

Memoized type inference

We should mention how to evaluate local let definitions.
Suppose we write:

\x y.let z = \a b.a in z x y

Evaluating this is trivial:

eval env (Let x y z) = eval env $ beta (x, y) z

That is, we substitute the definition for the bound variable throughout the
let body, just as in a beta reduction, and evaluate the result. An easy
exercise is to add this to our previous interpreter.
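Here is a minimal Haskell sketch of let evaluation by substitution. The Term type, subst and eval are hypothetical stand-ins rather than the interpreter's actual definitions, and subst assumes all bound names are distinct, so it skips capture-avoiding renaming.

```haskell
-- Hypothetical untyped terms: just enough to show Let.
data Term = Var String | Lam String Term | App Term Term | Let String Term Term
  deriving (Eq, Show)

-- subst x d t replaces free occurrences of x in t with d.
-- Assumption: bound names are distinct, so no renaming is needed.
subst :: String -> Term -> Term -> Term
subst x d t = case t of
  Var y | y == x -> d
  Lam y b | y /= x -> Lam y (subst x d b)
  App f a -> App (subst x d f) (subst x d a)
  -- let is not recursive: y is bound only in the body b.
  Let y e b -> Let y (subst x d e) (if y == x then b else subst x d b)
  _ -> t  -- a different variable, or a lambda that shadows x

-- Evaluate to normal form; Let is just a beta reduction in disguise.
eval :: Term -> Term
eval (Let x d b) = eval (subst x d b)
eval (App f a) = case eval f of
  Lam x b -> eval (subst x a b)
  f'      -> App f' (eval a)
eval (Lam x b) = Lam x (eval b)
eval t = t
```

For instance, evaluating the encoding of \x y.let z = \a b.a in z x y yields Lam "x" (Lam "y" (Var "x")), that is, \x y.x.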

As for type inference, we could treat let as a macro: we could fully expand
all let definitions before type checking if we accept that work may be
repeated. For example:

let f = \x.x in f succ (f 0)

could be expanded to:

(\x.x) succ ((\x.x) 0)

We would determine the first (\x.x) has type _0 -> _0 where _0 is a
generated type variable, before deducing further that _0 must be Nat ->
Nat. Afterwards, we would repeat computations to determine that the second
(\x.x) has type _1 -> _1, before deducing _1 must be Nat.
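To make the macro view concrete, here is a hypothetical Haskell expand function that removes every let by substituting copies of its definition, under the same distinct-names assumption as before. Constants such as succ and 0 are modeled as free variables purely for illustration.

```haskell
data Term = Var String | Lam String Term | App Term Term | Let String Term Term
  deriving (Eq, Show)

-- Substitute d for free occurrences of x (assumes distinct bound names).
subst :: String -> Term -> Term -> Term
subst x d t = case t of
  Var y | y == x -> d
  Lam y b | y /= x -> Lam y (subst x d b)
  App f a -> App (subst x d f) (subst x d a)
  Let y e b -> Let y (subst x d e) (if y == x then b else subst x d b)
  _ -> t

-- Treat let as a macro: inline a copy of the definition at every use,
-- so the resulting term contains no Let at all.
expand :: Term -> Term
expand (Let x d b) = expand (subst x (expand d) b)
expand (Lam x b) = Lam x (expand b)
expand (App f a) = App (expand f) (expand a)
expand t = t
```

Expanding let f = \x.x in f succ (f 0) produces two separate copies of \x.x, and a type checker would then analyze each copy from scratch, which is the repeated work described above.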

However, this approach has drawbacks. Functions can be more complicated than
\x.x and let expansions can be deeply nested, leading to prohibitively many
repeated computations. Also, we may one day wish to support a recursive
variant of let, where full expansion is impossible.

Instead, we memoize. In our example above, we first use type inference once
to determine id has type X -> X where X is a type variable. Next, we mark X
as a generalized type variable. Then each time id is used in an expression,
we replace X with a newly generated ordinary type variable before proceeding
with type inference.

Universally quantified types

Memoization is also useful for understanding the theory. Rather than
vaguely saying id is a sort of macro, we say that id = \x.x has type
∀X.X -> X. The symbol ∀ indicates a given type variable is generalized.
Lambda calculus with generalized type variables from let-polymorphism is
known as the Hindley-Milner type system, or HM for short. Like simply typed
lambda calculus, HM is strongly normalizing.

We might then wonder if this ∀ notation is redundant. Since let definitions
are like macros, shouldn’t we generalize all type variables returned by the
type inference algorithm? Why would we ever need to distinguish between
generalized type variables and plain type variables if they’re always going
to be generalized?

The reason becomes clear when we consider let expressions nested inside
other terms. Our code must mix generalized and ordinary type variables, and
carefully keep track of them in order to correctly infer types. Consider the
following example from Benjamin C. Pierce, “Types and Programming Languages”,
where the language has base types Nat and Bool:

This program is invalid. But if we blithely assume all type variables in
let expressions should be generalized, then we would mistakenly conclude
otherwise. We would infer g has type ∀X.X->X. In g 0, this would
generate a fresh type variable (that we then infer should be Nat), which
never clashes with Bool, so the error would go unnoticed.

Instead, we must infer g has type X->X, that is, X is a plain type
variable and not generalized. This enables type inference to find
two contradictory constraints (X = Nat and X = Bool) and reject the term.

On the other hand, we should generalize the type variables of a let
definition that are absent from the enclosing context. For example, in the
following expression:

\f:X->X x:X. let g=\y.f in g

type inference should determine the function g has type
∀Y.Y->(X->X), that is, Y is generalized while X is not.
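A minimal sketch of this rule, assuming a hypothetical Type data type with separate constructors for plain (TV) and generalized (GV) variables: a plain variable is generalized only if no type in gamma mentions it.

```haskell
data Type = Nat | TV String | GV String | Type :-> Type
  deriving (Eq, Show)
infixr 5 :->

-- Plain (non-generalized) type variables occurring in a type.
freeTVs :: Type -> [String]
freeTVs (TV v) = [v]
freeTVs (a :-> b) = freeTVs a ++ freeTVs b
freeTVs _ = []

-- Generalize only the plain variables that the context gamma does not mention.
generalize :: [(String, Type)] -> Type -> Type
generalize gamma = go
  where
    fixed = concatMap (freeTVs . snd) gamma
    go (TV v) | v `notElem` fixed = GV v
    go (a :-> b) = go a :-> go b
    go t = t  -- Nat, GV, or a TV that gamma still refers to
```

In the example above, gamma binds f and x at types mentioning X, so X stays plain while Y becomes generalized.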

These details only matter when implementing languages. Users can blissfully
ignore the distinction, because in top-level let definitions, all type
variables are generalized, and in evaluated terms, all generalized type
variables are replaced by plain type variables. When else does a user ask
for a term’s type?

Indeed, our demo will follow Haskell and omit the (∀) symbol. We’ll say, for
example, the const function has type a -> b -> a even though a and b
are generalized type variables; its type is really ∀a b.a -> b -> a.

Halfway to Haskell

Syntax aside, we’re surprisingly close to Haskell 98, which is based on HM
extended with the fixpoint operator. We lack many base types and primitive
functions, but these have little theoretical significance.

The juicy missing pieces are algebraic data types and type classes.

Later versions of Haskell go beyond Hindley-Milner to a variant of
System F, and there are plans to go even further.
As a result, type inference is no longer guaranteed to succeed, and often the
programmer must supply annotations to help the type checker.

We would be close to ML if we had chosen eager evaluation instead of lazy
evaluation.

Definitions

Despite the advanced capabilities of HM, we can almost reuse the data
structures of simply typed lambda calculus.
In a way, we could do with less. HM is rich enough that we can get by with
no base types whatsoever. However, we’re implementing PCF so we provide Nat.

To keep the code simple, we show generalized type variables in a nonstandard
manner: we simply prepend an at sign to the variable name. It’s understood
that (@x -> y) -> @z really means ∀@x @z.(@x -> y) -> @z.
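For concreteness, here is one way these definitions might look in Haskell. The constructor names are assumptions for illustration. A lambda whose annotation was omitted would carry the placeholder type variable TV "_", and showType prints generalized variables with the at-sign convention.

```haskell
-- Types: the base type Nat, plain variables (TV), generalized variables (GV).
data Type = Nat | TV String | GV String | Type :-> Type
  deriving (Eq, Show)
infixr 5 :->

-- Hypothetical PCF terms; Lam carries its (possibly placeholder) annotation.
data Term
  = Var String
  | Lam String Type Term
  | App Term Term
  | Let String Term Term
  | Num Integer
  | Succ | Pred
  | Ifz Term Term Term
  | Undefined
  | Err
  deriving (Eq, Show)

-- Print a type, marking generalized variables with a leading '@'.
-- Arrows associate to the right, so only a left operand needs parentheses.
showType :: Type -> String
showType Nat = "Nat"
showType (TV v) = v
showType (GV v) = '@' : v
showType (a :-> b) = left a ++ " -> " ++ showType b
  where
    left t@(_ :-> _) = "(" ++ showType t ++ ")"
    left t = showType t
```

With these definitions, showType ((GV "x" :-> TV "y") :-> GV "z") gives "(@x -> y) -> @z".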

Since we follow Haskell’s convention by showing non-generalized type variables
for top-level let expressions, under normal operation we’ll never show a
generalized type variable. Roughly speaking, we lazily generalize the type
variables of let statements, that is, we store them as ordinary type variables,
and generalize them on demand during evaluation. A generalized type variable
would only be printed if we, say, added a logging statement for debugging.

Parsing

The biggest change is the parsing of types in lambda abstractions. If omitted,
we supply the type variable _ which indicates we should automatically
generate a unique variable name for it later. Any name but Nat is a
user-supplied type variable name.
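A small recursive-descent parser for type annotations sketches this using only base. The token rules and the names tokenize, parseArrow, parseAtom and parseType are our own assumptions. Arrows associate to the right, Nat is the base type, _ requests a fresh variable later, and any other name becomes a type variable.

```haskell
import Data.Char (isAlphaNum, isSpace)

data Type = Nat | TV String | GV String | Type :-> Type
  deriving (Eq, Show)
infixr 5 :->

-- Split a type expression into tokens: names, "->", "(" and ")".
tokenize :: String -> [String]
tokenize [] = []
tokenize s@(c:cs)
  | isSpace c = tokenize cs
  | c `elem` "()" = [c] : tokenize cs
  | take 2 s == "->" = "->" : tokenize (drop 2 s)
  | isName c = let (w, rest) = span isName s in w : tokenize rest
  | otherwise = [c] : tokenize cs  -- rejected later by parseAtom
  where
    isName x = isAlphaNum x || x == '_'

-- type ::= atom ("->" type)?   (right-associative arrows)
parseArrow :: [String] -> Maybe (Type, [String])
parseArrow ts = do
  (a, rest) <- parseAtom ts
  case rest of
    "->" : rest' -> do
      (b, rest'') <- parseArrow rest'
      Just (a :-> b, rest'')
    _ -> Just (a, rest)

-- atom ::= "Nat" | name | "_" | "(" type ")"
parseAtom :: [String] -> Maybe (Type, [String])
parseAtom ("(" : ts) = do
  (t, rest) <- parseArrow ts
  case rest of
    ")" : rest' -> Just (t, rest')
    _ -> Nothing
parseAtom ("Nat" : ts) = Just (Nat, ts)
parseAtom (w@(c:_) : ts)
  | isAlphaNum c || c == '_' = Just (TV w, ts)  -- "_" means: generate later
parseAtom _ = Nothing

-- Succeed only if every token is consumed.
parseType :: String -> Maybe Type
parseType s = case parseArrow (tokenize s) of
  Just (t, []) -> Just t
  _ -> Nothing
```

For example, parseType "(X -> Nat) -> _" returns Just ((TV "X" :-> Nat) :-> TV "_").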

Type Inference

We add generalized variables to our implementation of Algorithm W.
The instantiate function generates fresh type variables for any generalized
type variables. This time, we write it without the state monad so we can
compare styles. On balance, I prefer the state monad version.
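Threading the counter by hand might look roughly like this. The sketch assumes fresh variables are named _0, _1, and so on, and uses an association list so that repeated occurrences of one generalized variable map to the same fresh variable.

```haskell
data Type = Nat | TV String | GV String | Type :-> Type
  deriving (Eq, Show)
infixr 5 :->

-- Replace each generalized variable with a fresh plain variable "_n".
-- No state monad: the counter n and the mapping m are passed and returned explicitly.
instantiate :: Int -> Type -> (Type, Int)
instantiate n0 t0 = let (t, _, n) = go n0 [] t0 in (t, n)
  where
    go n m (GV v) = case lookup v m of
      Just fresh -> (fresh, m, n)  -- seen before: reuse its fresh variable
      Nothing    -> let fresh = TV ('_' : show n)
                    in (fresh, (v, fresh) : m, n + 1)
    go n m (a :-> b) =
      let (a', m1, n1) = go n m a
          (b', m2, n2) = go n1 m1 b
      in (a' :-> b', m2, n2)
    go n m t = (t, m, n)  -- Nat and plain variables are left alone
```

Starting from counter 7, instantiating @x -> y -> @x gives _7 -> y -> _7 and the next counter 8. The explicit tuples show what a state monad would hide.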

If a variable name is absent from gamma, then the term is invalid.
We abuse the GV constructor to represent this error.

We’re careful with let expressions: we only generalize those type
variables that are absent from gamma before recursively calling gather.

We always generate a fresh variable for undefined so it can fit anywhere.

The function typeOf is little more than a wrapper around gather and unify.
It applies all the substitutions found during unify to the type expression
returned by gather to compute the principal type of a given closed term
in a given context.
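Putting the pieces together, here is a compact, self-contained sketch of constraint gathering, unification and typeOf with let-polymorphism. It approximates the approach under stated assumptions and is not the interpreter's code: lambdas are unannotated, ifz is omitted to save space, and errors use Either String instead of the GV trick. For a let, it solves the definition's constraints first, applies the solution to gamma, and only then generalizes the variables gamma does not mention.

```haskell
-- Types: Nat, plain variables (TV), generalized variables (GV), and arrows.
data Type = Nat | TV String | GV String | Type :-> Type
  deriving (Eq, Show)
infixr 5 :->

-- Unannotated terms: every lambda parameter gets a fresh type variable.
data Term = Var String | Lam String Term | App Term Term | Let String Term Term
          | Num Integer | Succ | Pred | Undefined
  deriving Show

type Gamma = [(String, Type)]
type Subst = [(String, Type)]

-- Fresh variables are _0, _1, ...; the counter is threaded by hand.
fresh :: Int -> (Type, Int)
fresh n = (TV ('_' : show n), n + 1)

-- Replace each generalized variable with a fresh plain one, consistently.
instantiate :: Int -> Type -> (Type, Int)
instantiate n0 t0 = let (t, _, n) = go n0 [] t0 in (t, n)
  where
    go n m (GV v) = case lookup v m of
      Just t  -> (t, m, n)
      Nothing -> let (t, n') = fresh n in (t, (v, t) : m, n')
    go n m (a :-> b) = let (a', m1, n1) = go n m a
                           (b', m2, n2) = go n1 m1 b
                       in (a' :-> b', m2, n2)
    go n m t = (t, m, n)

apply :: Subst -> Type -> Type
apply s (TV v) = maybe (TV v) id (lookup v s)
apply s (a :-> b) = apply s a :-> apply s b
apply _ t = t

freeTVs :: Type -> [String]
freeTVs (TV v) = [v]
freeTVs (a :-> b) = freeTVs a ++ freeTVs b
freeTVs _ = []

-- Generalize only the plain variables that gamma does not mention.
generalize :: Gamma -> Type -> Type
generalize gamma = go
  where
    fixed = concatMap (freeTVs . snd) gamma
    go (TV v) | v `notElem` fixed = GV v
    go (a :-> b) = go a :-> go b
    go t = t

-- Solve equality constraints; the result is an idempotent substitution.
unify :: [(Type, Type)] -> Either String Subst
unify [] = Right []
unify ((a, b) : cs) | a == b = unify cs
unify ((TV v, t) : cs) = bind v t cs
unify ((t, TV v) : cs) = bind v t cs
unify ((a1 :-> a2, b1 :-> b2) : cs) = unify ((a1, b1) : (a2, b2) : cs)
unify ((a, b) : _) = Left ("cannot unify " ++ show a ++ " with " ++ show b)

bind :: String -> Type -> [(Type, Type)] -> Either String Subst
bind v t cs
  | v `elem` freeTVs t = Left ("infinite type: " ++ v)  -- occurs check
  | otherwise = do
      let sub = apply [(v, t)]
      s <- unify [(sub x, sub y) | (x, y) <- cs]
      Right ((v, apply s t) : s)

-- Returns the term's type, its constraints, and the next counter value.
gather :: Gamma -> Int -> Term -> Either String (Type, [(Type, Type)], Int)
gather gamma n term = case term of
  Var v -> case lookup v gamma of
    Nothing -> Left ("unbound variable: " ++ v)
    Just t  -> let (t', n') = instantiate n t in Right (t', [], n')
  Lam v b -> do
    let (a, n1) = fresh n
    (tb, cs, n2) <- gather ((v, a) : gamma) n1 b
    Right (a :-> tb, cs, n2)
  App f x -> do
    (tf, cs1, n1) <- gather gamma n f
    (tx, cs2, n2) <- gather gamma n1 x
    let (r, n3) = fresh n2
    Right (r, (tf, tx :-> r) : cs1 ++ cs2, n3)
  Let v d b -> do
    (td, cs1, n1) <- gather gamma n d
    s <- unify cs1                                  -- solve the definition first,
    let gamma' = [(x, apply s t) | (x, t) <- gamma]
        scheme = generalize gamma' (apply s td)      -- then generalize
    (tb, cs2, n2) <- gather ((v, scheme) : gamma') n1 b
    Right (tb, cs1 ++ cs2, n2)
  Num _ -> Right (Nat, [], n)
  Succ -> Right (Nat :-> Nat, [], n)
  Pred -> Right (Nat :-> Nat, [], n)
  Undefined -> let (a, n1) = fresh n in Right (a, [], n1)  -- fits anywhere

typeOf :: Gamma -> Term -> Either String Type
typeOf gamma term = do
  (t, cs, _) <- gather gamma 0 term
  s <- unify cs
  Right (apply s t)
```

Under these assumptions, let f = \x.x in f succ (f 0) gets type Nat. The term \f. let g = f in g succ (g 0) is rejected, because g's type mentions the lambda-bound f and so stays monomorphic.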

Evaluation

Once we’re certain a closed term is well-typed, we can ignore the types and
evaluate as we would in untyped lambda calculus.

If we only wanted the weak head normal form, then we could take shortcuts: we
could assume the first argument to any ifz, pred, or succ is a natural
number. However, we want the normal form, necessitating extra checks.

If we encounter an Err term, we propagate it up the tree to halt computation.
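A sketch of these checks on a hypothetical first-order fragment shows why normal forms need more care than weak head normal forms. The fragment has numbers, variables, succ, pred, ifz and Err, but no lambdas. An argument may be stuck on a variable, and an Err anywhere must take over the whole term.

```haskell
-- Hypothetical fragment: no lambdas, just enough to show the extra checks.
data Term = Var String | Num Integer | Succ Term | Pred Term | Ifz Term Term Term | Err
  deriving (Eq, Show)

eval :: Term -> Term
eval (Succ t) = case eval t of
  Num n -> Num (n + 1)
  Err   -> Err
  t'    -> Succ t'        -- stuck on a variable: keep the normal form
eval (Pred t) = case eval t of
  Num 0 -> Err            -- pred 0 raises the exception
  Num n -> Num (n - 1)
  Err   -> Err
  t'    -> Pred t'
eval (Ifz c t e) = case eval c of
  Num 0 -> eval t
  Num _ -> eval e
  Err   -> Err
  c'    -> case (eval t, eval e) of   -- stuck condition: normalize both branches
    (Err, _)  -> Err
    (_, Err)  -> Err
    (t', e')  -> Ifz c' t' e'
eval t = t
```

Here eval (Succ (Pred (Num 0))) is Err, while eval (Succ (Var "x")) stays Succ (Var "x").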

In simply typed lambda calculus, we must fix the type of the fold result.
For example, a list of integers might be represented as a right fold that
returns an integer, and we can compute the sum of a list of integers as
follows:
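In Haskell notation, a sketch of this monomorphic encoding might read as follows. NatList, nil, cons and sumList are hypothetical names, and Haskell's built-in (+) stands in for an addition function on Nat.

```haskell
-- A list is its own right fold, with the result type fixed once and for all.
type NatList = (Integer -> Integer -> Integer) -> Integer -> Integer

nil :: NatList
nil = \_ n -> n

cons :: Integer -> NatList -> NatList
cons x xs = \c n -> c x (xs c n)

-- Summing is just folding with (+) from 0.
sumList :: NatList -> Integer
sumList xs = xs (+) 0
```

Here sumList (cons 1 (cons 2 (cons 3 nil))) evaluates to 6. A fold that needs to build a list instead of an Integer does not fit this type.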

This is about as far as we can go. Without let-polymorphism, we’re stuck with
a single fold return type, limiting what we can achieve.

Hindley-Milner frees us. Thanks to generalized type variables,
a single fold can return any type we want. We can port our Haskell code to
lambda calculus to obtain a sorting function free of fix. We do use fix in
our less-than function, but in a practical language this would be a built-in
primitive. Alternatively we can use Church numerals, which have a well-known
fix-free less-than-or-equal-to function.
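The following Haskell sketch illustrates the idea with insertion sort built from folds alone. It substitutes Haskell's foldr on built-in lists for a Church-encoded list's own fold, and the Ord comparison (<=) for a hand-written less-than. Note that the fold inside insertSorted returns a pair of lists, while the fold in sortFold returns a list: one fold, two result types.

```haskell
-- Insert x into an already sorted list using only foldr. The fold's result
-- is a pair: (this suffix with x inserted, this suffix unchanged).
insertSorted :: Ord a => a -> [a] -> [a]
insertSorted x ys = fst (foldr step ([x], []) ys)
  where
    step y (inserted, original)
      | x <= y    = (x : y : original, y : original)
      | otherwise = (y : inserted, y : original)

-- Insertion sort: here foldr returns a list rather than a pair.
sortFold :: Ord a => [a] -> [a]
sortFold = foldr insertSorted []
```

For example, sortFold [3, 1, 2] returns [1, 2, 3], without any explicit recursion.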

It almost seems we’re cheating to avoid explicit loops by piggybacking off the
representation of the list, but this is merely a consequence of our strategy.
When functions represent data, we can perform complex tasks with miraculously
concise code.

One step back, ten steps forward

Hindley-Milner is considered a sweet spot in the language design space because
type inference is simple and decidable, yet the type system is powerful.
There is a blemish: type inference takes exponential time for certain
pathological cases. Luckily, they never show up in real life.
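The classic pathological case nests definitions that each apply the previous one twice. In the Haskell sketch below, f0, f1 and f2 are our own illustrative names. Each extra line squares the number of leaves in the result type, and let-polymorphism is what lets f1 use f0 at two different types. The Leaves class counts those leaves, with instance selection driven by the type.

```haskell
-- No signatures: the types are inferred, and they grow doubly exponentially.
f0 x = (x, x)
f1 y = f0 (f0 y)   -- uses f0 at two different types
f2 y = f1 (f1 y)

-- Count the () leaves of a nested pair; instances follow the type's structure.
class Leaves a where
  leaves :: a -> Int

instance Leaves () where
  leaves _ = 1

instance (Leaves a, Leaves b) => Leaves (a, b) where
  leaves (a, b) = leaves a + leaves b
```

Here leaves (f2 ()) is 16; adding f3 y = f2 (f2 y) would give 256 leaves from one more short line.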

Still, experience suggests we should weaken Hindley-Milner in practical
programming languages. Let should not be generalised automatically for local
bindings.

Removing implicit local let-polymorphism makes it easier to extend the type
system. For example, we can add type classes and sidestep the monomorphism
restriction controversy. At the same time, local let-polymorphism is rarely
used, and in any case, we can trivially declare a type when needed. In other
words, we can still support local let-polymorphism; it’s just no longer
automatic.
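For example, GHC has a MonoLocalBinds extension, which GADTs and TypeFamilies also switch on. Under it, a local binding that mentions a lambda-bound variable is not generalized automatically, but a type signature restores polymorphism. The function useBoth is a hypothetical illustration.

```haskell
{-# LANGUAGE MonoLocalBinds #-}

-- k mentions the argument n, so under MonoLocalBinds GHC would not generalize
-- it by itself; the signature declares the polymorphism we want.
useBoth :: Integer -> (Integer, Integer)
useBoth n =
  let k :: b -> Integer
      k _ = n
  in (k 'c', k True)  -- k used at Char and at Bool
```

Here useBoth 5 returns (5, 5). Without the signature, the two uses of k would demand conflicting argument types.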