GADTs for dummies

From HaskellWiki

For a long time, I didn't understand what GADTs are and how they can be used. It was a sort of conspiracy of silence - people who understand GADTs think
it is all obvious and needs no explanation, but I still
couldn't understand.

Now I have an idea of how it works, and I think that it really was obvious :) I want to share my understanding - maybe my way of grasping GADTs can help
someone else. See also [[Generalised_algebraic_datatype]].

A "type" statement with a parameter declares a TYPE FUNCTION, here named
"X". Its parameter "a" must be some type, and it returns some type as its
result. We can't use "X" on data values, but we can use it on type values.
Type constructors declared with "data" statements and type functions
declared with "type" statements are used together to build arbitrarily
complex types. In such "computations", type constructors serve as basic
"values" and type functions as a way to process them.
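As a minimal sketch of the idea, here is how type functions written as "type" statements can be composed in ordinary Haskell. Only the name "X" comes from the text; its right-hand side and the names "Pairs", "Y", "sample" and "samples" are assumptions made for illustration.

```haskell
-- Hypothetical definition: the name "X" comes from the text, the
-- right-hand side is an assumption chosen for illustration.
type X a = Either a a

-- Another type function (hypothetical), built from type constructors.
type Pairs a = [(a, a)]

-- Type functions compose: Y a expands to [(Either a a, Either a a)].
type Y a = Pairs (X a)

-- "X Int" is just another way to write "Either Int Int".
sample :: X Int
sample = Left 42

samples :: Y Bool
samples = [(Left True, Right False)]
```

The compiler expands "X" and "Y" while type checking, so "sample" can be used anywhere an "Either Int Int" is expected.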

Here we have just defined a list of simple types: the implied result of
all the statements written for "IsSimple" is the value True, and the value
False for anything else. Essentially, "IsSimple" is nothing less than a
TYPE PREDICATE!

I really love it! :) How about constructing a predicate that traverses a
complex type trying to decide whether it contains "Int" anywhere?

3 One more hypothetical extension - multi-value type functions

Let's add more fun! We will introduce one more hypothetical Haskell
extension - type functions that may have MULTIPLE VALUES. Say,

type Collection a = [a]
     Collection a = Set a
     Collection a = Map b a

So, "Collection Int" has "[Int]", "Set Int" and "Map String Int" as
its values, i.e. different collection types with elements of type
"Int".

Pay attention to the last statement of the "Collection" definition, where
we've used a type variable "b" that was neither mentioned on the left side
nor defined in any other way. That is perfectly possible: the
"Collection" function has multiple values anyway, so using a free
variable on the right side that can be replaced with any type is not a
problem at all - "Map Bool Int", "Map [Int] Int" and "Map Int Int" are all
possible values of "Collection Int", along with "[Int]" and "Set Int".

At first glance, multiple-value functions seem meaningless - they can't
be used to define datatypes, because we need concrete types there. But at
second glance :) we find them useful for defining type constraints and
type families.

We can also represent a multiple-value function as a predicate:

type Collection a [a]
     Collection a (Set a)
     Collection a (Map b a)

If you remember Prolog, you can guess that a predicate, in contrast to a
function, is a multi-purpose thing - it can be used to deduce any
parameter from the other ones. For example, in this hypothetical definition:

head | Collection Int a :: a -> Int

we define a 'head' function for any Collection containing Ints.

And in this, again, hypothetical definition:

data Safe c | Collection c a = Safe c a

we deduce the element type 'a' from the collection type 'c' passed as the
parameter to the type constructor.

4 Back to real Haskell - type classes

Reading all those glorious examples, you may be wondering - why doesn't
Haskell support full-featured type functions yet? Hold your breath...
Haskell already contains them, and GHC at least has implemented all the
mentioned abilities for more than 10 years! They were just named...
TYPE CLASSES! Let's translate all our examples to their language:

The Haskell'98 standard supports type classes with only one parameter,
which limits us to defining only type predicates like this one. But GHC
and Hugs support multi-parameter type classes, which allow us to define
arbitrarily complex type functions.
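As a hedged sketch of this translation, the block below writes a one-parameter class as a type predicate and a two-parameter class as the predicate form of "Collection" from the earlier hypothetical syntax. The particular list of "simple" types and the helper "onlySimple" are assumptions made for illustration.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
import Data.Map (Map)
import Data.Set (Set)

-- One-parameter class: a type predicate. Each instance plays the role
-- of one "IsSimple T = True" statement. The chosen types are assumed.
class IsSimple a
instance IsSimple Bool
instance IsSimple Int
instance IsSimple Double

-- Two-parameter class: the predicate form of "Collection".
-- "c" is a collection type whose elements have type "a".
class Collection a c
instance Collection a [a]
instance Collection a (Set a)
instance Collection a (Map b a)

-- Hypothetical helper: accepted only for types satisfying the predicate.
onlySimple :: IsSimple a => a -> a
onlySimple = id
```

With these definitions, "onlySimple (3 :: Int)" type checks, while "onlySimple "text"" is rejected because there is no "IsSimple String" instance.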

You can compare it to the hypothetical definition we gave earlier.
It's important to note that type class instances, unlike function
statements, are not checked in order. Instead, the most _specific_
instance is selected automatically. So, in the "Replace" case, the last
instance, which is the most general one, will be selected only if all the
others fail to match, and that is what we want.
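The selection rule can be observed with a small sketch. It does not reproduce the "Replace" class; the class "Describe" and its instances are hypothetical, and the sketch assumes a GHC version that supports the OVERLAPPABLE pragma. The most general instance is deliberately written first to show that source order does not matter.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Hypothetical class used only to observe which instance is chosen.
class Describe a where
    describe :: a -> String

-- The most general instance, written first on purpose. OVERLAPPABLE
-- lets more specific instances take precedence over it.
instance {-# OVERLAPPABLE #-} Describe a where
    describe _ = "some other type"

-- More specific instances: chosen whenever they match.
instance Describe Int where
    describe _ = "Int"

instance Describe [Char] where
    describe _ = "String"
```

Here "describe (1 :: Int)" uses the "Int" instance and "describe True" falls back to the general one.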

In many other cases this automatic selection is not powerful enough,
and we are forced to use some artificial tricks or complain to the
language developers. The two best-known language extensions proposed to
solve such problems are instance priorities, which allow the instance
selection order to be specified explicitly, and '/=' constraints, which
can be used to explicitly prohibit unwanted matches.

In practice, type-level arithmetic by itself is not very useful. It
becomes a really strong weapon when combined with another feature that
type classes provide - member functions. For example:

class Collection a c where
    foldr1 :: (a -> a -> a) -> c -> a
class Num a where
    (+) :: a -> a -> a
sum :: (Num a, Collection a c) => c -> a
sum = foldr1 (+)
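A runnable variant of this fragment might look as follows. It is a sketch under a few assumptions: the Prelude's own "Num" class is reused instead of being redefined, the Prelude versions of "foldr1" and "sum" are hidden to avoid name clashes, and the list instance is added for illustration.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
import Prelude hiding (foldr1, sum)
import qualified Prelude as P

-- "c" is a collection type with elements of type "a".
class Collection a c where
    foldr1 :: (a -> a -> a) -> c -> a

-- Assumed instance: lists are collections of their element type.
instance Collection a [a] where
    foldr1 = P.foldr1

-- Works for any collection of numbers, not just lists.
sum :: (Num a, Collection a c) => c -> a
sum = foldr1 (+)
```

Because nothing here says that the collection type determines the element type, a call usually needs the result type spelled out, as in "sum [1, 2, 3 :: Int] :: Int".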

I would also be glad to see the possibility of using type classes in data
declarations like this:

data Safe c = (Collection c a) => Safe c a

but afaik this is also not yet implemented.

UNIFICATION
...

5 Back to GADTs

If you are wondering how all these interesting type manipulations relate
to GADTs, now is the time to give you the answer. As you know, Haskell
contains highly developed ways to express data-to-data functions. Now
we also know that Haskell contains rich facilities for writing
type-to-type functions in the form of "type" statements and type classes.
But how do "data" statements fit into this infrastructure?

My answer: they just define a type-to-data-constructors translation.
Moreover, this translation may give multiple results. Say, the
following definition:

data Maybe a = Just a | Nothing

defines a type-to-data-constructors function "Maybe" that has a parameter
"a" and, for each "a", has two possible results - "Just a" and
"Nothing". We can rewrite it in the same hypothetical syntax that was
used above for multi-value type functions:

data Maybe a = Just a
     Maybe a = Nothing

Or how about this:

data List a = Cons a (List a)
     List a = Nil

and this:

data Either a b = Left a
     Either a b = Right b

But how flexible are "data" definitions? As you should remember,
"type" definitions were very limited in their features, while type
classes, on the contrary, are much more developed than the facilities of
ordinary Haskell functions. What about the features of "data"
definitions, examined as a sort of function?

On the one hand, they support multiple statements and multiple
results and can be recursive, like the "List" definition above. On the
other hand, that's all - no pattern matching or even type constants on
the left side, and no guards.

The lack of pattern matching means that the left side can contain only
free type variables, which in turn means that the left sides of all
"data" statements for one type will be essentially the same. Therefore,
repeated left sides in multi-statement "data" definitions are omitted,
and instead of

data Either a b = Left a
     Either a b = Right b

we write just

data Either a b = Left a
                | Right b

And here, finally, come GADTs! A GADT is just a way to define data types
using pattern matching and constants on the left side of "data"
statements! How about this:
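In real GHC syntax, each constructor of a GADT states which instance of the type it builds, which corresponds to writing a pattern or constant on the left side of a "data" statement. The following is a minimal sketch under that reading; the type "T", its constructors and the function "toValue" are hypothetical names chosen for illustration.

```haskell
{-# LANGUAGE GADTs #-}

-- Each constructor fixes the type argument of its result, playing the
-- role of "T String = ...", "T Bool = ..." and "T [a] = ..." statements.
data T a where
    D1 :: Int -> T String
    D2 :: T Bool
    D3 :: (a, a) -> T [a]

-- Matching on a constructor tells the type checker what "a" is,
-- so each branch may return a value of a different concrete type.
toValue :: T a -> a
toValue (D1 n)      = show n
toValue D2          = True
toValue (D3 (x, y)) = [x, y]
```

An ordinary "data" declaration could not express "toValue", because all its constructors would have to build the same "T a" for a free "a".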

If the argument matches Lit, it must have been built with a
Lit constructor, so type 'a' must be Int, and hence we can return 'i'
(an Int) on the right-hand side. The same reasoning applies to the Pair
constructor.
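The argument above can be sketched in code. The constructor names Lit and Pair and the variable 'i' come from the text; the type name "Term", the function name "eval" and the exact constructor signatures are assumptions made for illustration.

```haskell
{-# LANGUAGE GADTs #-}

-- Assumed shape of the type under discussion.
data Term a where
    Lit  :: Int -> Term Int                      -- forces a to be Int
    Pair :: Term a -> Term b -> Term (a, b)      -- forces a pair type

-- In the Lit branch the checker knows a is Int, so returning i is fine.
-- In the Pair branch it knows the result type is a pair.
eval :: Term a -> a
eval (Lit i)    = i
eval (Pair x y) = (eval x, eval y)
```

For instance, "eval (Pair (Lit 1) (Lit 2))" has type "(Int, Int)".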

6 Further reading

The best paper on type-level arithmetic using type classes I've seen
is "Faking it: simulating dependent types in Haskell"
( http://www.cs.nott.ac.uk/~ctm/faking.ps.gz ). Most of my
article just duplicates his work.

A great demonstration of type-level arithmetic is the TypeNats package,
which "defines type-level natural numbers and arithmetic operations on
them including addition, subtraction, multiplication, division and GCD"
( darcs get --partial --tag '0.1' http://www.eecs.tufts.edu/~rdocki01/typenats/ )

There are plenty of GADT-related papers, but the best one for beginners
remains "Fun with phantom types"
(http://www.informatik.uni-bonn.de/~ralf/publications/With.pdf).
In that paper, "phantom types" is another name for GADTs. You should also
know that the paper uses the old GADT syntax. It is a must-read because it
contains numerous examples of practical GADT usage - a topic completely
omitted from my article.

The first line I added tells the compiler that the Collection predicate
has two parameters and that the second parameter determines the first.
Based on this restriction, the compiler can detect and prohibit attempts
to define different element types for the same collection.
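In GHC such a restriction is written as a functional dependency. The sketch below assumes a class head of the form "Collection a c | c -> a"; the member function "firstElem" and the list instance are hypothetical additions for illustration.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

-- "| c -> a" says: the collection type c determines the element type a.
class Collection a c | c -> a where
    firstElem :: c -> Maybe a   -- hypothetical member function

instance Collection a [a] where
    firstElem []      = Nothing
    firstElem (x : _) = Just x

-- Uncommenting the next line is rejected by the compiler, because it
-- would give the collection type [Bool] a second element type:
-- instance Collection Int [Bool] where firstElem _ = Nothing
```

Thanks to the dependency, the element type no longer needs an annotation: in "firstElem "abc"" the compiler deduces that the result is a "Maybe Char".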