A Working Programmer's Guide to Type-Indexed Values

Parametric polymorphism is a basic mechanism in ML for writing code that is
generic, i.e., that can be used on multiple different types. To get the basic
idea of how what parametric polymorphism is, think about the following simple
example.

moduleM:sig(* Takes a list and returns a stuttered version, e.g., [1;2;3] is mapped to [1;1;2;2;3;3] *)valdouble:'alist->'alistend=structletrecdouble=function|[]->[]|hd::tl->hd::hd::doubletlend

In the type signature for double, the expression 'a is a type variable,
meaning that this function can be used with an arbitrary type plugged in for
'a. The reason that the type variable shows up is that the code of double
doesn’t depend in any way on the properties of the elements of the list.
At first glance, parametric polymorphism doesn’t seem all that
powerful. After all, how useful is it to write functions that can only be
generalized over all possible types? More often than not you want to write
functions that can be used over some narrow universe of types with particular
properties. Object oriented languages provide this functionality with subtyping,
and Haskell lets you get at this with type-classes. What do you do in ML?

It turns out that ML does allow you to write functions like this in a quite
straightforward and ordinary way. A simple example can be found be examining the
signature of a standard sort function:

valsort:cmp:('a->'a->int)->'alist->'alist

The signature above can be used on a value of any type, provided that you can
also provide a comparison function for that type. In other words, a polymorphic
function can take advantage of the idiosyncratic capabilities of the types it
deals with, but the ability to take advantage of those capabilities must be
passed in along with the values in question.

This is used in small ways throughout any non-trivial ML codebase. But we can
use this in a more structured way by creating what are sometimes called
type-indexed values. A type-indexed value is a value used to represent a set
of capabilities associated with a type. Here’s an example of a simple
type-indexed value for capturing number-ness. In what follows, the type-indexed
value is Num.Type.t, and the rest of the Num module is just utility
functions to make the interface pretty.

You’ll note that the definition above of Type.int and Type.float are
basically boilerplate. Because the modules in question themselves have a fairly
standardized interface, we could instead use a functor to create these
type-indexed values without the boilerplate:

This is yet another advantage one gets from having standardized interfaces.

If type indexed-values look similar to Haskell’s type-classes, it’s because they
are. In my limited understanding of Haskell, the implementation is similar as
well, in that under the cover, Haskell passes around dictionaries of functions
which play the same role that the Type.ts play here.

The number typeclass described above is just an example, and not something I’ve
felt the need for in practice. But here are some places where we’ve used
type-indexed values to good effect:

Serialization

The latest (unreleased) version of the bin_prot macros that we use for binary
serialization and deserialization now come with a type-indexed value that ties
together all the little bits that you need to use the library. Before we did
that, one could only instantiate useful bin-prot functionality using the module
language. Now, we can do it using ordinary polymorphic functions.

Little languages

Sometimes we design domain-specific languages embedded in the type system. It is
often useful to have values representing the different types that can be
generated in the language. For example, we use this as part of a set of SQL
bindings to represent types that we know how to convert to and from SQL.

Containers

We’ve started experimenting with type-indexed values representing the
container-hood of a given object. This is a little trickier than the previous
examples, since the type-indexed value has two type parameters, one for the type
of the container, and one for the type of the elements of the container. In the
end, this let’s you write functions with signatures like

max:('a,'b)Container.t->cmp:('a->'a->int)->'b->'a

and use it to find the maximum element of a list (where the type-indexed value
has type ('a, 'a list) Container.t) or an array (('a, 'a array) Container.t)
or a string ((char,string) Container.t).

Type-indexed values obviously have their downsides: they can be somewhat
inconvenient syntactically, since you need to explicitly pass them along; and
they sacrifice some performance because it leads you to call closures where you
could otherwise call static functions that could be inlined instead. But
overall, they are a flexible and elegant way of writing generic code in ML.