A Typed Markup Language Based On Haskell

by Phil Freeman on 2013/05/18

In this post, I'm going to implement a simple markup language including a parser and type checker. The language's syntax will be based on Haskell's syntax, and the type system will look a little bit like System F.

Why am I interested in implementing such a markup language? Well, markup languages like JSON are good at describing the values in a dataset, but not their types, and XML extends this by allowing you to specify a schema, which corresponds to a set of typing rules. Languages like XAML allow you to extend this idea even futher by using an actual type system as a sort of global schema, and then extending the expressiveness of the language with things like subclass polymorphism.

In my opinion, XAML is great for quickly bootstrapping a DSL without having to worry about parsing, but the lack of generics makes it impossible to encode some simple constraints in the schema. I often find myself wanting a markup language which looks like Haskell and which uses similar typing rules.

The simplicity of the language below allows for a very simple implementation: typing is a simple bottom-up pass over a term.

Language Definition and Examples

I would like separate languages for type declarations and values. Here are some examples of the sorts of things I would like to be able to write (note, not all of these are expected to pass the typechecker):

Notice that the type definition language supports things like type constructors and phantom types to allow me to encode simple constraints in the type of a declaration. For example, I can enforce the fact that the Zip constructor takes two lists of the same length. Also, there are no higher order functions, since there are no user-defined functions other than constructors, and no data declarations: all types are open for extension.

The value language copies several ideas from Haskell: infix constructors using backticks, and application using both juxtaposition and $. Other than that, the syntax is very simple.

Here are the data types which represent terms and types, along with their pretty printing functions:

There is quite a bit going on here, but for the most part, the parser is assembled from reusable pieces in Text.Parsec.Token and some utility functions such as sepBy. The parser serves as a fairly good definition of the language syntax: the tm and ty parsers are just constructed using buildExpressionParser from the precedence tables for the term and type constructors respectively.

Implementing a Typechecker

The next step is to write the typechecker. This is actually quite simple since there are no function types other than constructors, and all quantification over free type variables implicitly happens at the top level.

The first step is to implement a unification function. This is very easy: types are either equal modulo variable substitution, or they do not unify at all:

One important function which is used in the typechecker is specialize, which is used to apply a polymorphic function to an argument. The idea is that specialize f x should return the most general type of f x. The helper function isFunction is used to ensure that each type variable was unified with at most one type.

This is good - the ill-typed expressions returned a typing error, and the well-typed expressions returned the correct types, including the correctly inferred placeholders for phantom types.

There are a few other small additions to the language that I would like to make such as infix operators, kind checking, better error messages and support for other primitive types.

A Mini Language For Relational Queries

As a more useful example, I've put together a minimal DSL for partially typed relational queries. The type system is used to make sure pairs of columns equated in joins have the same type, and to track the tables used in a query. Try replacing some of the constructors to see the ways in which the type system is used!

Implementing a Kind Checker

Ensuring terms are well-typed is pretty useless if the type language is itself typeless, so let's implement a kind checker for type definitions in a schema.

The kind checker below is a modified version of a type checker from an earlier post. It works by generating kind constraints by performing a bottom-up traversal of a type, and then solving tose constraints by substitution.