Gallium

Type systems

Type systems are a very effective way to improve
programming language reliability. By grouping the data manipulated by
the program into classes called types, and ensuring that operations
are never applied to types over which they are not defined
(e.g. accessing an integer as if it were an array, or calling a string
as if it were a function), a tremendous number of programming errors
can be detected and avoided, ranging from the trivial (mis-spelled
identifier) to the fairly subtle (violation of data structure
invariants). These restrictions are also very effective at thwarting
basic attacks that exploit security vulnerabilities such as buffer overflows.

The enforcement of such typing restrictions is called type checking,
and can be performed either dynamically (through run-time type tests)
or statically (at compile-time, through static program analysis). We
favour static type checking, as it catches bugs earlier and even in
rarely-executed parts of the program, but note that not all type
constraints can be checked statically if static type checking is to
remain decidable (i.e. not degenerate into full program proof).
Therefore, all typed languages combine static and dynamic
type-checking in various proportions.
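As a minimal sketch of this division of labour, the OCaml fragment below shows one error that is rejected statically and one check that has to wait until run time. The function name safe_get is a hypothetical helper introduced only for this illustration.

```ocaml
(* Rejected statically: an integer is not an array.
     let bad = (42).(0)
   The compiler reports a type error and produces no code. *)

(* Accepted statically, but the index can only be checked at run time:
   the type int array says nothing about the length of the array. *)
let safe_get (a : int array) (i : int) : int option =
  try Some a.(i) with Invalid_argument _ -> None

let () =
  let a = [| 10; 20; 30 |] in
  assert (safe_get a 1 = Some 20);   (* in bounds *)
  assert (safe_get a 7 = None)       (* dynamic bounds check failed *)
```

The sketch assumes the default compilation mode, in which array accesses are bounds-checked.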

Static type checking amounts to an automatic proof of
partial correctness of the programs that pass the compiler. The two
key words here are partial, since only type safety guarantees are
established, not full correctness; and automatic, since the
proof is performed entirely by machine, without manual assistance from
the programmer (beyond a few easy type declarations in the source).
Static type checking can therefore be viewed as the poor man's formal
methods: the guarantees it gives are much weaker than full formal
verification, but it is much more acceptable to the general population
of programmers.

Type systems and language design.

Unlike most other uses of static program analysis, static
type-checking rejects programs that it cannot prove safe.
Consequently, the type system is an integral part of the language
design, as it determines which programs are acceptable and which are
not. Modern typed languages go one step further: most of the language
design is determined by the type structure (type algebra and
typing rules) of the language and intended application area. This is
apparent, for instance, in the XDuce and CDuce domain-specific
languages for XML transformations
whose design is driven by the idea of regular expression types that
enforce DTDs at compile-time. For this reason, research on type
systems -- their design, their proof of semantic correctness (type
safety), the development and proof of associated type checking and
inference algorithms -- plays a large and central role in the field of
programming language research, as evidenced by the huge number of type
systems papers in conferences such as Principles of Programming
Languages.

Polymorphism in type systems.

There exists a fundamental tension in the field of type systems that
drives much of the research in this area. On the one hand, the desire
to catch as many programming errors as possible leads to type systems
that reject more programs, by enforcing fine distinctions between
related data structures (say, sorted arrays and general arrays). The
downside is that code reuse becomes harder: conceptually identical
operations must be implemented several times (say, copying a general array
and a sorted array). On the other hand, the desire to support code
reuse and to increase expressiveness leads to type
systems that accept more programs, by assigning a common type to
broadly similar objects (for instance, the Object type of
all class instances in Java). The downside is a loss of precision in
static typing, requiring more dynamic type checks (downcasts in Java)
and catching fewer bugs at compile-time.

Polymorphic type systems offer a way out of this dilemma by
combining precise, descriptive types (to catch more errors statically)
with the ability to abstract over their differences in pieces of
reusable, generic code that is concerned only with their commonalities.
The paradigmatic example is parametric polymorphism, which is
at the heart of all typed functional programming
languages. Many other forms of polymorphic typing have been studied
since. Taking examples from our group, the work of Rémy, Vouillon and
Garrigue on row polymorphism, integrated
in Objective Caml, extended the benefits of this approach (reusable
code with no loss of typing precision) to object-oriented programming,
extensible records and extensible variants. Another example is the
work by Pottier on subtype polymorphism, using a constraint-based
formulation of the type system.
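The two forms of polymorphism mentioned above can be illustrated with a short OCaml sketch. All names (copy, name_of, person, firm) are made up for the example; the types in comments are the ones the OCaml type checker is expected to infer.

```ocaml
(* Parametric polymorphism: a single definition for every element type.
   Inferred type: 'a array -> 'a array *)
let copy a = Array.init (Array.length a) (fun i -> a.(i))

let ints = copy [| 3; 1; 2 |]       (* int array *)
let names = copy [| "b"; "a" |]     (* string array *)

(* Precision is kept: ints is still known to hold integers, so an
   expression such as  ints.(0) ^ "x"  would be rejected statically. *)

(* Row polymorphism: any object that has a name method is accepted,
   whatever its other methods are.
   Inferred type: < name : 'a; .. > -> 'a *)
let name_of o = o#name

let person = object method name = "Ada" method age = 36 end
let firm = object method name = "ACME" method staff = 120 end

let () =
  assert (name_of person = "Ada");
  assert (name_of firm = "ACME")
```

In both cases the generic code is written once, and each use site keeps its own precise type.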

Type inference.

Another crucial issue in type systems research is type
inference: how many type annotations must be provided by the
programmer, and how many can be inferred (reconstructed) automatically
by the typechecker? Too many annotations make the language more
verbose and bother the programmer with unnecessary details. Too few
annotations make type checking undecidable, possibly requiring
heuristics, which is unsatisfactory.
Objective Caml requires explicit type information at data type
declarations and at component interfaces, but infers all
other types.
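A small sketch of this balance, using hypothetical names (shape, area, total_area, Counter): the data type and the module interface are written out explicitly, while the types of the functions are left to the type checker.

```ocaml
(* Explicit: the data type declaration is written by the programmer. *)
type shape =
  | Circle of float
  | Rect of float * float

(* Inferred: no annotation is given, yet the checker reconstructs
   area : shape -> float *)
let area = function
  | Circle r -> 3.14159 *. r *. r
  | Rect (w, h) -> w *. h

(* Inferred: total_area : shape list -> float *)
let total_area shapes =
  List.fold_left (fun acc s -> acc +. area s) 0.0 shapes

(* Explicit: the interface (signature) of a component. The types of
   the definitions inside the structure are inferred and then checked
   against the signature. *)
module Counter : sig
  type t
  val zero : t
  val incr : t -> t
  val value : t -> int
end = struct
  type t = int
  let zero = 0
  let incr n = n + 1
  let value n = n
end
```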

In order to be predictable, a type inference algorithm must be
complete. That is, it must find not just one, but all
ways of filling in the missing type annotations to form an explicitly
typed program. This task is made easier when all possible solutions to
a type inference problem are instances of a single,
principal solution.
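The notion of a principal solution can be illustrated as follows. The function compose and its two annotated variants are names chosen for this sketch; the principal type shown in the comment is the one OCaml is expected to infer, up to renaming of type variables.

```ocaml
(* Principal type inferred for compose:
   ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b *)
let compose f g x = f (g x)

(* Each annotation below is one valid way of filling in the types.
   Both are instances of the principal type, obtained by substituting
   concrete types for 'a, 'b and 'c. *)
let compose_int : (int -> int) -> (int -> int) -> int -> int =
  compose

let compose_str : (int -> string) -> (string -> int) -> string -> string =
  compose

let () =
  assert (compose_int succ succ 0 = 2);
  assert (compose_str string_of_int String.length "abc" = "3")
```

An annotation that is not an instance of the principal type, for example int -> int, is rejected by the type checker.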

Perhaps surprisingly, the strong requirements -- such as the existence of
principal types -- that are imposed on type systems by the desire to perform
type inference sometimes lead to better designs. An illustration of this is
row variables. The development of row variables was prompted by type inference
for operations on records. Indeed, previous approaches were based on subtyping
and did not easily support type inference. Row variables have proved simpler
than structural subtyping and more adequate for typechecking record update,
record extension, and objects.
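The contrast between row variables and subtyping can be sketched with OCaml objects. The names closer_to_origin, closer_closed, p and q are hypothetical. The first function is typed with a row variable, so its result keeps every method of its arguments; the second uses a closed object type and explicit subtyping coercions, so the extra method is forgotten.

```ocaml
(* Row variable: the inferred type is
   (< x : int; .. > as 'a) -> 'a -> 'a
   so the result has exactly the type of the arguments. *)
let closer_to_origin p q = if abs p#x <= abs q#x then p else q

let p = object method x = 1 method color = "red" end
let q = object method x = 5 method color = "blue" end

(* The color method is still visible on the result. *)
let c = (closer_to_origin p q)#color

(* Closed type plus subtyping: arguments must be coerced explicitly,
   and the result only offers the x method. *)
let closer_closed (p : < x : int >) (q : < x : int >) =
  if abs p#x <= abs q#x then p else q

let r = closer_closed (p :> < x : int >) (q :> < x : int >)
(* r#color would be rejected statically. *)

let () =
  assert (c = "red");
  assert (r#x = 1)
```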

Type inference encourages abstraction and code reuse. A programmer's
understanding of their own program is often initially limited to a particular
context, where types are more specific than strictly required. Type inference
can reveal the additional generality, which allows making the code more
abstract and thus more reusable.
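As a hedged illustration, suppose a programmer writes the following function with only lists of integer scores in mind. The names count_matching, even_scores and long_names are invented for this sketch.

```ocaml
(* Written while thinking about integer scores only. *)
let count_matching pred items =
  List.fold_left (fun n x -> if pred x then n + 1 else n) 0 items

(* The inferred type is more general than the intended use:
   count_matching : ('a -> bool) -> 'a list -> int *)

(* Original use: counting even scores. *)
let even_scores = count_matching (fun s -> s mod 2 = 0) [4; 7; 10]

(* Reuse revealed by the inferred type: counting long names. *)
let long_names =
  count_matching (fun s -> String.length s > 3) ["Ann"; "Pierre"]

let () =
  assert (even_scores = 2);
  assert (long_names = 1)
```

Nothing in the definition depends on the elements being integers, and the inferred type makes that fact explicit.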