About Me

I'm a software engineer in New York City. I created the the Gypsum programming language, and I have worked on optimizing the V8 JavaScript engine for mobile phones. Most of my writing focuses on compilers and programming languages. I'm also interested in 3D graphics, high performance computing, cryptography, and security.

Type inference

This entry was taken from my old Fenris development blog.
It was originally posted on May 8, 2008.
Although the Fenris project is now defunct, I think these old entries
have some general appeal, so I've added them to this blog.

Type inference is a common feature in functional languages that I
would like to bring into Fenris. Although Fenris resembles Java
syntactically, I don't think it should inherit Java's verbosity. In
most situations, it's easy for the compiler to figure out which types
you intend to use; you shouldn't really have to explicitly type every
variable or function parameter. There are also cases where type
inference would significantly clean up syntax, such as immediate
functions, arrays, and records (which now require an explicit type).

Type inference for Fenris is a difficult problem because
definitions can be made in an arbitrary order. In the normal type
inference algorithm, you can figure out types one definition at a
time, but something more imaginative will have to be done for Fenris.
The purpose of this entry is to describe the initial design (which
will probably change a lot over the next few weeks).

Type inference will definitely take place in the IR phase.
Wherever a type is needed but not given, the compiler will generate an
erased type with a unique identifier. Erased types just act as
placeholders and will be replaced after type inference using a
substitution system already in use for generics. The unique identifier
acts as a key for a constraint table in the IR environment.
Each erased type will be mapped to a list of type constraints that are
built as the program is comipiled to IR. A constraint describes a
certain property of a type. Here are some I've already thought of:

is a class

is a subtype of _

has member _

has member _ with arguments _

is a closure with arguments _, return type _

is an array

supports binary operator _ with other type _

supports unary operator _

I'll probably think of more later. Frequently, we'll need to unify
two types. This is done by taking the union of the constraint list of
each type and mapping both identifiers to the same list.

Once we have a full list of constraints, we will need to actually
assign types for the erased types. This is done by making a big list
of possible types and using the constraints to eliminate them. In some
cases, we will have several related candidates; in those cases, we
choose the most general type. If multiple candidate types are not
related, an ambiguity error is reported. If there are no candidate
types, the erased type is overconstrained, which is also an error.

Most of the above constraints rely on other erased types being
decided first. We can think of this as directed acyclic (hopefully)
graph where the nodes are erased types and the edges are dependencies.
The fixpoint algorithm is useful for solving systems like these. It
will loop until no more types can be decided, which (again hopefully)
means that all types have been decided.

After type inference is complete, we'll need to rewrite the code.
The obvious part of this involves replacing erased types with real
types, but this also includes distinguishing between record and object
operations and adding generic parameters where needed. This may
actually be fairly difficult to get right.