Thursday, August 23, 2012

These days, most people are familiar with using monads to structure functional programs, and are also familiar with Haskell's do-notation for writing monadic code. Slightly less well-known is the fact that the do-notation actually originates in Moggi's original 1991 paper on monads, in which he gave the following type-theoretic introduction and elimination rules for monadic types:

Wednesday, August 8, 2012

In this post, I'll give an OCaml implementation of the pattern match compiler from my previous post. Just for fun,
it will also compile patterns into a series of tests.

We begin by giving some basic types. The type var
of variables is just strings, and we have a type tp of
types, which are just units, sums, products, void types, and a default
Other to handle all non-matchable types (e.g.,
functions). The type pat is the type of patterns. We have
patterns for variables, wildcards, units, pairs, and left and right
injections.
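As a rough sketch, the types described above might be declared as follows. The constructor names (One, Times, PVar, and so on) are assumptions made for illustration; only the names var, tp, pat, and Other come from the text.

```ocaml
(* Variables are just strings. *)
type var = string

(* Types: unit, products, void, sums, and a default Other for
   non-matchable types such as functions. Constructor names other
   than Other are assumptions of this sketch. *)
type tp =
  | One
  | Times of tp * tp
  | Zero
  | Sum of tp * tp
  | Other

(* Patterns: variables, wildcards, units, pairs, and the left and
   right injections. Constructor names are assumptions. *)
type pat =
  | PVar of var
  | PWild
  | PUnit
  | PPair of pat * pat
  | PInl of pat
  | PInr of pat

(* Example: the pattern (x, inl _), which could be matched against
   a value of type Other * (1 + 1). *)
let example_pat : pat = PPair (PVar "x", PInl PWild)
let example_tp : tp = Times (Other, Sum (One, One))
```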

Finally, we have the type exp of expressions. We've thrown in
the minimum number of constructs necessary to elaborate pattern matching. Namely,
we have variables, let-binding, unit and pair elimination (both given in the "open-scope"
style), an abort (for the empty type), and a case expression for sums.
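One plausible shape for exp is sketched below. The constructor names and argument orders are assumptions; the text only fixes which constructs exist.

```ocaml
type var = string

(* Expressions: just enough to elaborate pattern matching.
   Constructor names and argument orders are assumptions. *)
type exp =
  | EVar of var
  | ELet of var * exp * exp               (* let x = e1 in e2 *)
  | ELetUnit of exp * exp                 (* let () = e1 in e2 *)
  | ELetPair of var * var * exp * exp     (* let (x, y) = e1 in e2 *)
  | EAbort of exp                         (* abort e, where e has the empty type *)
  | ECase of exp * var * exp * var * exp  (* case e of inl x -> e1 | inr y -> e2 *)

(* Example: let (a, b) = p in case a of inl x -> x | inr y -> b *)
let example : exp =
  ELetPair ("a", "b", EVar "p",
            ECase (EVar "a", "x", EVar "x", "y", EVar "b"))
```

In the "open-scope" style, the unit and pair eliminators take the body in which the bound variables are in scope as an explicit argument, rather than using projections.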

First, we give some utility operations. gensym does fresh name generation. This is an effect, but a simple one, and should be easy enough to transcribe into (say) monadic Haskell.

let gensym = let r = ref 0 in fun () -> incr r; Printf.sprintf "u%d" !r

The filtermap operation maps an optional function over a list, returning the non-None values. I really should work out the general story for this someday -- there's obviously a nice story involving distributive laws at work here.
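A minimal sketch of filtermap, assuming it takes the function first and the list second. Recent OCaml standard libraries (4.08 and later) provide the same operation as List.filter_map.

```ocaml
(* Apply f to every element, keeping the payloads of the Some
   results in their original order and dropping the Nones. *)
let rec filtermap (f : 'a -> 'b option) (xs : 'a list) : 'b list =
  match xs with
  | [] -> []
  | x :: rest ->
    (match f x with
     | None -> filtermap f rest
     | Some y -> y :: filtermap f rest)

(* Usage: square the even numbers and drop the odd ones. *)
let squares_of_evens =
  filtermap (fun n -> if n mod 2 = 0 then Some (n * n) else None) [1; 2; 3; 4]
```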

Unlike the simplify_unit and simplify_pair routines, the simplification routines for sums return
an option, which we filtermap over. This is because when we want the inl patterns, the inr
patterns are well-typed, but not needed. So we need to filter them out, but it's not actually an error to see them in the
branch list.
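To make this concrete, here is a hedged sketch of what the sum simplifiers could look like. The names simplify_inl and simplify_inr, the constructor names, and the representation of a branch as a pair of a pattern list and a body are all assumptions. The sketch also assumes that lead variable patterns have already been turned into wildcards, and it uses the standard library's List.filter_map (OCaml 4.08 and later) in place of filtermap.

```ocaml
type pat =
  | PVar of string
  | PWild
  | PUnit
  | PPair of pat * pat
  | PInl of pat
  | PInr of pat

(* Keep a branch if its lead pattern can match a left injection.
   An inr pattern is well-typed but irrelevant here, so it yields
   None rather than an error. *)
let simplify_inl (ps, e) =
  match ps with
  | PInl p :: rest -> Some (p :: rest, e)
  | PWild :: rest -> Some (PWild :: rest, e)
  | PInr _ :: _ -> None
  | _ -> failwith "simplify_inl: lead pattern is not a sum pattern"

(* The symmetric routine for right injections. *)
let simplify_inr (ps, e) =
  match ps with
  | PInr p :: rest -> Some (p :: rest, e)
  | PWild :: rest -> Some (PWild :: rest, e)
  | PInl _ :: _ -> None
  | _ -> failwith "simplify_inr: lead pattern is not a sum pattern"

(* Split a branch list into the branches relevant to each side. *)
let left_branches bs = List.filter_map simplify_inl bs
let right_branches bs = List.filter_map simplify_inr bs
```

Note that a wildcard branch survives on both sides, while an injection branch survives on exactly one.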

Monday, August 6, 2012

One of the most practically useful features of functional programming
languages is pattern matching, and one of the most important
quality-of-implementation features for an implementation is the
exhaustiveness checking for pattern matches. In this post, I will give
the simplest algorithm for coverage checking that I know.

I previously gave one in my paper Focusing
on Pattern Matching. I was never completely satisfied with
that algorithm, since it was harder to understand than I liked. It
worked by giving a judgment to prove that no match would fail, rather
than directly proving that all matches covered. The negation made
thinking about extensions (such as pattern matching for dependent
types) harder than it should be, and it also made it harder to get a
match compilation algorithm.

So in this blog post, I'll give a better algorithm. It's
super-simple, and is about as naive as it should be -- there are only
six rules in total. It's not totally industrial strength, but (a) it's
certainly good enough to use in a prototype compiler, and (b) it's easy
enough to beef up.

There's also some cool proof theory lurking here that has yet to be worked out.
I'll point it out as we go along.
$$
\newcommand{\cover}[2]{{#1} \;\mathsf{covers}\;{#2}}
\newcommand{\elaborate}[3]{{#1} \;\mathsf{covers}\;{#2} \leadsto {#3}}
\newcommand{\To}{\Rightarrow}
\newcommand{\inl}{\mathsf{inl}}
\newcommand{\inr}{\mathsf{inr}}
$$

The types are sums and products, plus a type $o$ to indicate functions and
other non-matchable types. The patterns are just what you'd expect -- $()$ for
the unit pattern, $(p, p')$ for pairs, and $\inl(p)$ and $\inr(p)$ for the
left and right branches of sums. (Of course, there is no pattern for the
empty type!) We also have wildcard $\_$ and variable $x$ patterns.
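Read as a grammar, the description above amounts to roughly the following. This is a reconstruction from the prose, and the symbols $1$ and $0$ for the unit and empty types are assumed notation.

$$
\begin{array}{lcl}
A & ::= & 1 \mid A \times B \mid 0 \mid A + B \mid o \\
p & ::= & () \mid (p, p') \mid \mathsf{inl}(p) \mid \mathsf{inr}(p) \mid \_ \mid x
\end{array}
$$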

Since our algorithm works inductively, we will destructure patterns into their
components, and we will need to keep track of lists of patterns. A pattern
list (with metavariable $ps$) is just the obvious thing --- a comma-separated
list of patterns.

We also need branch lists -- a pattern match clause consists of a list
of patterns and expressions. So we use branch lists to say what body is associated
with each pattern list (a pattern list, since we are going to destructure the
pattern given for each arm).

We will also write $\overrightarrow{ps \To e}$ to indicate branch lists. If I write
a list like $\overrightarrow{(p_1, p_2), ps \To e}$, then I mean that every element
of the branch list starts with a pair pattern, and similarly for the other
constructors.

This algorithm is type-directed, so we introduce a typed context $\Gamma$
to keep track of the types of the expressions being destructured.

The first two rules handle wildcards and variables. The first rule
says that if the lead pattern of every pattern list in the branch set
is a wildcard, we can just drop it and keep going. The second rule
says that you can turn any lead variable pattern into a wildcard
pattern.
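These two rules can be sketched as operations on branch lists. The function names var_to_wild and drop_wild are hypothetical, as are the constructor names, and a branch is assumed to be a pair of a pattern list and a body. This coverage-only version simply forgets the variable name; an elaborating version would presumably bind it in the body instead.

```ocaml
type pat =
  | PVar of string
  | PWild
  | PUnit
  | PPair of pat * pat
  | PInl of pat
  | PInr of pat

(* Second rule: a lead variable pattern matches exactly what a
   wildcard matches, so for coverage we can replace it by one. *)
let var_to_wild (ps, e) =
  match ps with
  | PVar _ :: rest -> (PWild :: rest, e)
  | _ -> (ps, e)

(* First rule: if every branch starts with a wildcard, drop that
   wildcard from every branch. Returns None when the rule does not
   apply, i.e. when some lead pattern is not a wildcard. *)
let drop_wild bs =
  let drop (ps, e) =
    match ps with
    | PWild :: rest -> Some (rest, e)
    | _ -> None
  in
  let dropped = List.filter_map drop bs in
  if List.length dropped = List.length bs then Some dropped else None
```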

The third rule handles the unit pattern. It is nice and easy --- it
says that if every lead pattern is either a unit or wildcard pattern,
we can drop it and keep going.

The fourth rule handles tuples, and is almost as easy. It says that
if we have a lead pattern $(p_1, p_2), ps \To e$, we can rewrite it to
$p_1, p_2, ps \To e$. We also have to remember to double any lead
wildcard pattern $\_, ps' \To e'$ into $\_, \_, ps' \To e'$, since we
are destructuring a product into two subpatterns.
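A small sketch of the pair rule on a single branch follows. The name simplify_pair appears in the implementation post, but its signature here, the constructor names, and the branch representation are assumptions. Lead variables are assumed to have been turned into wildcards already.

```ocaml
type pat =
  | PVar of string
  | PWild
  | PUnit
  | PPair of pat * pat
  | PInl of pat
  | PInr of pat

(* Flatten a lead pair pattern into its two components, and double
   a lead wildcard so that every branch keeps the same length. *)
let simplify_pair (ps, e) =
  match ps with
  | PPair (p1, p2) :: rest -> (p1 :: p2 :: rest, e)
  | PWild :: rest -> (PWild :: PWild :: rest, e)
  | _ -> failwith "simplify_pair: lead pattern is not a pair pattern"
```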

The fifth rule handles the void type. It just succeeds, since there
are no patterns of empty type!

The sixth rule handles sums. It says that if we are scrutinizing a
value of type $A + B$, then we can separate the branch list into those
branches handling the left case $\inl(p), ps \To e$ and those branches
handling the right case $\inr(p'), ps' \To e'$, and then check them at
$A$ and $B$ respectively. Here, we have to remember to send wildcards
$\_, ps'' \To e''$ to both sides, since a wildcard can match
both the left- and right-cases.
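Putting the rules together, a coverage checker might look roughly like the sketch below. It is type-directed rather than a literal transcription of the rules, and it rests on several assumptions: the constructor and function names are invented, the pattern lists are assumed to be well-typed against the context (otherwise it raises Failure), and the base case for the empty context (at least one branch must remain) is a guess that the text does not spell out.

```ocaml
type tp = One | Times of tp * tp | Zero | Sum of tp * tp | Other

type pat =
  | PVar of string
  | PWild
  | PUnit
  | PPair of pat * pat
  | PInl of pat
  | PInr of pat

let ill_typed () = failwith "ill-typed pattern list"

(* Variable rule: a lead variable behaves like a wildcard. *)
let wild (ps, e) =
  match ps with PVar _ :: rest -> (PWild :: rest, e) | _ -> (ps, e)

(* Wildcard rule, used here at non-matchable types. *)
let simplify_wild (ps, e) =
  match ps with PWild :: rest -> (rest, e) | _ -> ill_typed ()

(* Unit rule: drop a lead unit or wildcard pattern. *)
let simplify_unit (ps, e) =
  match ps with (PUnit | PWild) :: rest -> (rest, e) | _ -> ill_typed ()

(* Pair rule: flatten pairs and double wildcards. *)
let simplify_pair (ps, e) =
  match ps with
  | PPair (p1, p2) :: rest -> (p1 :: p2 :: rest, e)
  | PWild :: rest -> (PWild :: PWild :: rest, e)
  | _ -> ill_typed ()

(* Sum rule helpers: wildcards are sent to both sides. *)
let simplify_inl (ps, e) =
  match ps with
  | PInl p :: rest -> Some (p :: rest, e)
  | PWild :: rest -> Some (PWild :: rest, e)
  | PInr _ :: _ -> None
  | _ -> ill_typed ()

let simplify_inr (ps, e) =
  match ps with
  | PInr p :: rest -> Some (p :: rest, e)
  | PWild :: rest -> Some (PWild :: rest, e)
  | PInl _ :: _ -> None
  | _ -> ill_typed ()

(* covers ctx bs: do the branches bs cover every value of the
   types listed in ctx? *)
let rec covers ctx bs =
  match ctx with
  | [] -> (match bs with [] -> false | _ -> true)  (* assumed base case *)
  | t :: ctx' ->
    let bs = List.map wild bs in
    match t with
    | Zero -> true  (* void rule: nothing to cover *)
    | One -> covers ctx' (List.map simplify_unit bs)
    | Other -> covers ctx' (List.map simplify_wild bs)
    | Times (a, b) -> covers (a :: b :: ctx') (List.map simplify_pair bs)
    | Sum (a, b) ->
      covers (a :: ctx') (List.filter_map simplify_inl bs)
      && covers (b :: ctx') (List.filter_map simplify_inr bs)
```

For instance, with booleans encoded as Sum (One, One), the branch list for the patterns (inl _, inl _) and (inr _, _) does not cover pairs of booleans, because nothing handles (inl _, inr _); adding a branch for (_, inr _) makes it exhaustive.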

And that's it!

The open problem in proof theory is a bit of sleight of hand in my
notation. You understand exactly what I mean when I take a
meta-pattern like $\overrightarrow{(p_1, p_2), ps \To e}$, and turn it into $\overrightarrow{p_1, p_2,
ps \To e}$. However, formalizing this is trickier than it looks, and
it's not known how to give a proof-theoretic characterization of this
kind of pattern (according to Rob Simmons, who I asked early this year
about it).

Thursday, August 2, 2012

IMO, this is one of the best introductions to category theory around, and it's a shame that it's been out of print for so long. (I don't actually own a copy myself -- I just repeatedly borrow my spouse's copy.) Many thanks to them for their kindness in making it freely available.