GADTs for dummies

From HaskellWiki

For a long time, I didn't understand what GADTs were or how they could be used. It felt like a conspiracy of silence: people who understood GADTs thought
that everything was obvious and didn't need further explanation, but I still
couldn't understand them.

Now that I have an idea of how they work, I think it really was obvious. :) So, I want to share my understanding of GADTs. Maybe the way I came to understand them could help
someone else. See also Generalised algebraic datatype.

A "type" statement of the form "type X a = ..." declares a TYPE FUNCTION
named "X". Its parameter "a" must be some type
and it returns some type as its result. We can't use "X" on data values,
but we can use it on type values. Type constructors declared with
"data" statements and type functions declared with "type" statements
can be used together to build arbitrarily complex types. In such
"computations" type constructors serve as basic "values" and type
functions as a way to process them.
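
As a minimal sketch of such a type-level "computation": the right-hand side of "X" below is an assumption chosen for illustration, and the names "unwrap", "Rows" and "rows" are made up as well.

```haskell
-- A type function: given a type "a", it returns the type "Either a a".
-- (The right-hand side is an assumption made for this sketch.)
type X a = Either a a

-- "X Int" evaluates to "Either Int Int", so both constructors carry an Int.
unwrap :: X Int -> Int
unwrap (Left n)  = n
unwrap (Right n) = n

-- Type functions and type constructors compose into larger types:
-- "Rows Bool" is the same type as "[Either (Maybe Bool) (Maybe Bool)]".
type Rows a = [X (Maybe a)]

rows :: Rows Bool
rows = [Left (Just True), Right Nothing]
```

Note that "X" never appears at the value level: it is applied only to types, inside type signatures.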

3 One more hypothetical extension - multi-value type functions

Let's add more fun! We will introduce one more hypothetical Haskell
extension - type functions that may have MULTIPLE VALUES. Say,

type Collection a = [a]
     Collection a = Set a
     Collection a = Map b a

So, "Collection Int" has "[Int]", "Set Int" and "Map String Int" as
its values, i.e. different collection types with elements of type
"Int".

Pay attention to the last statement of the "Collection" definition, where
we used the type variable "b" that was not mentioned on the left side,
nor defined in any other way. Since it's perfectly possible for the
"Collection" function to have multiple values, using some free variable on
the right side that can be replaced with any type is not a problem
at all. "Map Bool Int", "Map [Int] Int" and "Map Int Int" all are
possible values of "Collection Int" along with "[Int]" and "Set Int".

At first glance, it seems that multiple-value functions are meaningless - they
can't be used to define datatypes, because we need concrete types here. But
if we take another look, they can be useful to define type constraints and
type families.

We can also represent a multiple-value function as a predicate:

type Collection a [a]
     Collection a (Set a)
     Collection a (Map b a)

If you're familiar with Prolog, you should know that a predicate, in contrast to
a function, is a multi-directional thing - it can be used to deduce any
parameter from the other ones. For example, in this hypothetical definition:

head | Collection Int a :: a -> Int

we define a 'head' function for any Collection containing Ints.

And in this, again, hypothetical definition:

data Safe c | Collection c a = Safe c a

we deduced element type 'a' from collection type 'c' passed as the
parameter to the type constructor.

4 Back to real Haskell - type classes

After reading about all of these glorious examples, you may be wondering
"Why doesn't Haskell support full-featured type functions?" Hold your breath...
Haskell already contains them, and GHC has implemented all of the
capabilities mentioned above for more than 10 years! They were just named...
TYPE CLASSES! Let's translate all of our examples to their language:
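
The simplest case is a one-parameter class read as a predicate on types. The sketch below is only an illustration: the class name "IsSimple", its instances and the helper "simple" are hypothetical names chosen here.

```haskell
-- A class without methods acts as a predicate: it only records
-- which types belong to it.
class IsSimple a

instance IsSimple Bool
instance IsSimple Int
instance IsSimple Double

-- Accepts an argument only if its type satisfies the predicate.
simple :: IsSimple a => a -> a
simple x = x
```

With these declarations "simple True" type-checks, while "simple "text"" is rejected because there is no "IsSimple String" instance.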

The Haskell'98 standard supports type classes with only one parameter.
That limits us to defining only type predicates, that is, sets of types
that satisfy some property. But GHC and
Hugs support multi-parameter type classes that allow us to define
arbitrarily complex type functions.
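
A sketch of the hypothetical "Collection" predicate written as a multi-parameter class might look as follows. It assumes GHC with the two extensions named in the pragma and the containers package for "Set" and "Map"; the helper "element" is a made-up name that exists only to exercise the constraint.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
import Data.Map (Map)
import Data.Set (Set)

-- "Collection a c" holds when "c" is a collection with elements of type "a".
class Collection a c

instance Collection a [a]
instance Collection a (Set a)
instance Collection a (Map b a)

-- Hypothetical helper: type-checks only when the second argument has the
-- element type of the collection passed as the first argument.
element :: Collection a c => c -> a -> a
element _ x = x
```

Here "element "abc" 'x'" is accepted, while "element "abc" True" is rejected because no instance matches "Collection Bool [Char]".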

You can compare it to the hypothetical definition we gave earlier.
It's important to note that type class instances, as opposed to
function statements, are not checked in order. Instead, the most
_specific_ instance is automatically selected. So, in the Replace case, the
last instance, which is the most general instance, will be selected only if all the others
fail to match, which is what we want.
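
The "most specific instance wins" rule can be tried out with a small sketch. The class "Describe" and its instances are invented for this illustration, and one assumption should be stated loudly: current GHC versions allow overlapping instances only when they are marked with pragmas such as OVERLAPPABLE, as done below.

```haskell
{-# LANGUAGE FlexibleInstances #-}

class Describe a where
  describe :: a -> String

-- The most general instance is listed first on purpose:
-- declaration order plays no role in instance selection.
instance {-# OVERLAPPABLE #-} Describe a where
  describe _ = "some type"

-- More specific than the instance above, less specific than the one below.
instance {-# OVERLAPPABLE #-} Describe [a] where
  describe _ = "a list"

-- The most specific instance for strings.
instance Describe [Char] where
  describe _ = "a string"
```

For a "String" argument all three instances match and the last one is chosen; for "[Bool]" the list instance is chosen; for "Char" only the general instance matches.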

In many other cases this automatic selection is not powerful enough
and we are forced to use some artificial tricks or complain to the
language developers. The two most well-known language extensions
proposed to solve such problems are instance priorities, which allow
us to explicitly specify instance selection order, and '/=' constraints,
which can be used to explicitly prohibit unwanted matches.

In practice, type-level arithmetic by itself is not very useful. It becomes a
fantastic tool when combined with another feature that type classes provide -
member functions. For example:

class Collection a c where
    foldr1 :: (a -> a -> a) -> c -> a

class Num a where
    (+) :: a -> a -> a

sum :: (Num a, Collection a c) => c -> a
sum = foldr1 (+)
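
A compilable variant of this idea is sketched below. To avoid clashing with the Prelude it uses the made-up names "cfoldr1" and "csum" and relies on the Prelude's own "Num" class; the "Pair" type is likewise an assumption added just to have a second instance.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- A collection "c" with elements "a" that can be folded from the right.
class Collection a c where
  cfoldr1 :: (a -> a -> a) -> c -> a

instance Collection a [a] where
  cfoldr1 = foldr1

-- A tiny two-element collection, defined only for this sketch.
data Pair a = Pair a a

instance Collection a (Pair a) where
  cfoldr1 f (Pair x y) = f x y

-- Works for every collection type whose elements are numbers.
csum :: (Num a, Collection a c) => c -> a
csum = cfoldr1 (+)
```

Because nothing here says that the collection type determines the element type, a call needs its result type fixed, as in "csum [1, 2, 3 :: Int] :: Int".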

I'll also be glad to see the possibility of using type classes in data
declarations, like this:

data Safe c = (Collection c a) => Safe c a

but as far as I know, this is not yet implemented.
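
For what it's worth, current GHC versions accept something close to this when the extra type variable is quantified explicitly. The sketch below is one possible reading, not necessarily what was meant above: it assumes the ExistentialQuantification and FunctionalDependencies extensions, and the dependency "c -> a", the helper "collection" and the value "example" are additions made for illustration.

```haskell
{-# LANGUAGE ExistentialQuantification, FlexibleInstances,
             FunctionalDependencies, MultiParamTypeClasses #-}

-- "c -> a" states that the collection type determines the element type.
class Collection c a | c -> a

instance Collection [a] a

-- The element type "a" does not appear among the parameters of "Safe";
-- it is tied to the collection type "c" by the class constraint.
data Safe c = forall a. Collection c a => Safe c a

-- Extracts the collection part of a "Safe" value.
collection :: Safe c -> c
collection (Safe c _) = c

-- The type of the second field is deduced to be Int from "[Int]".
example :: Safe [Int]
example = Safe [1, 2, 3] 1
```

A value such as "Safe "abc" True" is rejected, since the collection type "[Char]" forces the element type to be "Char".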

UNIFICATION
...

5 Back to GADTs

If you are wondering how all of these interesting type manipulations relate to
GADTs, here is the answer. As you know, Haskell contains highly
developed ways to express data-to-data functions. We also know that
Haskell contains rich facilities to write type-to-type functions in the form of
"type" statements and type classes. But how do "data" statements fit into this
infrastructure?

My answer: they just define a type-to-data constructor translation. Moreover,
this translation may give multiple results. Say, the following definition:

data Maybe a = Just a | Nothing

defines a type-to-data-constructors function "Maybe" that has a parameter
"a" and for each "a" has two possible results - "Just a" and
"Nothing". We can rewrite it in the same hypothetical syntax that was
used above for multi-value type functions:

data Maybe a = Just a
     Maybe a = Nothing

Or how about this:

data List a = Cons a (List a)
     List a = Nil

and this:

data Either a b = Left a
     Either a b = Right b

But how flexible are "data" definitions? As you should remember, "type"
definitions were very limited in their features, while type classes,
on the other hand, were more developed than the facilities for ordinary
Haskell functions. What about the features of "data" definitions, examined as a sort of function?

On the one hand, they support multiple statements and multiple results and
can be recursive, like the "List" definition above. On the other, that's all -
no pattern matching or even type constants on the left side, and no guards.

Lack of pattern matching means that the left side can contain only free type
variables. That in turn means that the left sides of all "data" statements for a
type will be essentially the same. Therefore, repeated left sides in
multi-statement "data" definitions are omitted and instead of

data Either a b = Left a
     Either a b = Right b

we write just

data Either a b = Left a
                | Right b

And here we finally come to GADTs! It's just a way to define data types using
pattern matching and constants on the left side of "data" statements!
Let's say we want to do this:

data T String = D1 Int
     T Bool   = D2
     T [a]    = D3 (a,a)

We cannot do this using a standard data definition. So, now we must use a GADT definition:

data T a where
    D1 :: Int -> T String
    D2 :: T Bool
    D3 :: (a,a) -> T [a]

Amazed? After all, GADTs seem to be a really simple and obvious extension to
data type definition facilities.
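
To see what the "pattern matching on the left side" buys us, here is a small sketch that can be loaded into GHC. The type "T" is the one defined above; the function "value" is a made-up example, and the GADTs extension is assumed.

```haskell
{-# LANGUAGE GADTs #-}

data T a where
  D1 :: Int -> T String
  D2 :: T Bool
  D3 :: (a, a) -> T [a]

-- Each equation may return a different type, because matching on a
-- constructor tells the type checker what "a" is in that branch.
value :: T a -> a
value (D1 n)      = show n   -- here a is String
value D2          = True     -- here a is Bool
value (D3 (x, y)) = [x, y]   -- here a is a list type
```

With an ordinary "data" definition a function like "value" could not be written, since all constructors would share the same result type "T a".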

The idea here is to allow a data constructor's return type to be specified
directly:

data Term a where
    Lit  :: Int -> Term Int
    Pair :: Term a -> Term b -> Term (a,b)
    ...

In a function that performs pattern matching on Term, the pattern match gives
type as well as value information. For example, consider a function of type
"Term a -> a" whose first equation matches "Lit i" and returns "i".

If the argument matches Lit, it must have been built with a Lit constructor,
so type 'a' must be Int, and hence we can return 'i' (an Int) in the right
hand side. The same thing applies to the Pair constructor.
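
The reasoning above can be sketched as a small evaluator. The name "eval" is an assumption, and only the two constructors shown above are included; the constructors elided by "..." are left out.

```haskell
{-# LANGUAGE GADTs #-}

data Term a where
  Lit  :: Int -> Term Int
  Pair :: Term a -> Term b -> Term (a, b)

-- The result type follows the index of the term being evaluated.
eval :: Term a -> a
eval (Lit i)    = i                 -- matching Lit refines a to Int
eval (Pair a b) = (eval a, eval b)  -- matching Pair refines a to a pair type
```

For example, "eval (Pair (Lit 1) (Lit 2))" has type "(Int, Int)", with no runtime tag checks and no partial functions involved.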

6 Further reading

The best paper on type-level arithmetic using type classes I've seen
is "Faking it: simulating dependent types in Haskell"
( http://www.cs.nott.ac.uk/~ctm/faking.ps.gz ). Most of
this article comes from that work.

A great demonstration of type-level arithmetic is in the TypeNats package,
which "defines type-level natural numbers and arithmetic operations on
them including addition, subtraction, multiplication, division and GCD"
( darcs get --partial --tag '0.1' http://www.eecs.tufts.edu/~rdocki01/typenats/ )

There are plenty of GADT-related papers, but the best one for beginners
is "Fun with phantom types"
(http://www.informatik.uni-bonn.de/~ralf/publications/With.pdf).
"Phantom types" is the name this paper uses for GADTs. You should also know that
it uses an old GADT syntax. This paper is a must-read because it
contains numerous examples of practical GADT usage - a theme completely
omitted from my article.