2010-03-22

In a nutshell, generalized algebraic data types, or GADTs for short, are a first-class phantom type of sorts. The basic idea is to enrich the constructors of an ADT with a more expressive "return" type by allowing the instantiation of one or more type parameters. A simple motivating example is that of an abstract syntax in Peyton Jones's original paper:

data Term a where
Lit :: Int -> Term Int
Inc :: Term Int -> Term Int
IsZ :: Term Int -> Term Bool
If :: Term Bool -> Term a -> Term a -> Term a
Pair :: Term a -> Term b -> Term (a,b)
Fst :: Term (a,b) -> Term a
Snd :: Term (a,b) -> Term b

For instance, the constructor for Inc specifies the type of its argument as Term Int by constraining its application to bind the parameter a to Int, in effect specifying the "return type" of the constructor. Xavier Leroy's proposal for a syntax extension of OCaml would look like this:

Each syntax emphasizes complementary views of what "is" the essence of this generalization: Haskell's makes clear the parameter's functional dependency on the constructor; OCaml's stresses the constraints placed on a match on the constructor. In any case, this type shows the three cases a GADT might present:

No constraint, as in the constructor If

A concrete constraint, as in the constructors Lit, Inc or IsZ

An existential constraint, as in the constructors Pair, Fst or Snd

The constraints a GADT places on the parameters must be met by the functions that do a pattern-matching on the type. In general, the result type is a function of the types of the parameter. This requires polymorphic recursion which forces the use of an explicit rank-2 type for it. For instance, a function evaluating a Term a will have type a for each possible a defined in the GADT. In the following code the match for Lit forces a to be instantiated to Int, whereas the matching for IsZ forces it to Bool:

Unfortunately, the current version of OCaml (3.11) does not yet include either GADTs or rank-2 polymorphism. On July 11th, 2009, Oleg announced that OCaml's type system is expressive enough to encodeGADTs. On his site he shows three applications of the technique. However, Oleg's examples do not offer much guidance in how to apply the encoding, especially since the more complete example, that of the typed printf/scanf, is too advanced an application to use as a template. I have shown in another post how to encode polymorphic recursion with rank-2 types. Here I'll explain how to treat each of the three cases in a GADT to systematically embed the type in OCaml.

The basic technique is to use a witness for the constraint placed on the parameter. Oleg uses a module with the required semantics:

A value eq of type ('a, 'b) eq witnesses the equality of types a and b. Given such a witness and a value x of type a, apply_eq eq x is a safe type cast of x to type b. This is sufficient to force the constraints on the type parameter, as explained in the paper Oleg references. For each one of the above identified cases, Oleg's code encodes the constructor, defines a smart constructor to ease the use of the type as an embedded DSL and pattern-matches on the constructor in order to access its parameters. This is how:

For unconstrained constructors, nothing needs to be done.

For instance, the type constructor for If does not require fixing the constructor's parameter a as it is by definition equal to the type parameter:

| If of bool term * 'a term * 'a term

A smart constructor for this constructor is routine:

let if_ c t f = If (c, t, f)

(the underscore is to avoid a clash with the reserved keyword). Equally routine is pattern-matching on it:

| If (c,t,f) -> …

If desired for reasons of regularity, however, the constraint 'a = 'b for an existentially-qualified parameter b can be introduced explicitely using the last encoding.

For concretely-constrained constructors, it is necessary to witness the equality of the parameter and this concrete type.

It is not as simple, however, to augment the constructor with a value of type ('a, t) EQ.eq: since matches on those values would force the type of a to be the same for every match, it would require that t be the same concrete type everywhere. What is needed instead is an existential type, say b such that it is constrained to be t and which in turn constrains a. Oleg encodes the existential with the double negation on a rank-2 universal, by way of a witness type. In order to maximize the compactness of the code, he uses a class type to introduce the required rank-2 type:

| Lit of < m_lit : 'w . ((int, 'a) eq -> int -> 'w) -> 'w >

Here w witnesses the existence of an a such that, if successfully constrained to int, it would bear the required type int term. A smart constructor for this constructor must build the object instance which will provide the required witness:

Note first that, under the Curry-Howard isomorphism, continuations encode double negation. Note also that the typing must be explicit since it relies on structual and not nominal class types. Note finally that the encoding style is curried much as in Haskell definitions; it is entirely equivalent to encode Leroy-style tupled constructors.

Now a match on a constructor of this kind must invoke the method on the object parameter with a suitable continuation that will accept the witness as valid and coerce the argument to the constrained type:

| Lit o -> o#m_lit (fun eq n -> apply_eq eq n)

As will be apparent from the complete encoding of the Haskell example, the other cases are the same.

For existentially-constrained constructors, it is necessary to encode the existential with a rank-2 type, and witness the equality of the parameter and this type.

This case is an extension of the previous one, with the twist that the existential must be encoded separately. In this case Oleg uses a record type instead of an object type, but the encoding could use the latter equally well. For instance, in the case of Pairs, there must exist types b and c such that a is constrained to be b×c. The constructor factors out this existential:

The equality witness expresses the required intent on the types of the constructor's arguments. The smart constructor is analogous to the previous one, except that in this case the continuation has type k_pair:

Now, as I mentioned above, an evaluator has type 'a term -> 'afor all a; this requires rank-2 polymorphism. In order to keep the namespace tidy, I use a local module definition hiding a record type encoding the higher-kinded function type:

In every case the coercion must be applied at the end, to the result of recursively evaluating the subexpressions in a constructor.

This is all there is to Oleg's encoding. For me, this exegesis has two lessons: first, much as rank-2 polymorphism, GADTs are a conservative extension to Haskell's and OCaml's type system. In fact, version 3.12 will allow higher-ranked functions via explicit annotations. One can only hope that the latter would also find its way into the next release. Second, that Oleg's proof-of-concept examples bear detailed examination to find the general mechanism behind his insights.

About Me

Long-time software developer and architect by day, specializing in boring-but-crucial bits of interconnect between disparate architectures. OCaml is my language of choice, but I feel extremely comfortable with C, Java and SQL whenever the need arises.

By night, I'm a calligrapher and a seeker of the hidden light of understanding in the noise of daily living.