I’m a software engineer at a place. I like the work and the people, and I learn a lot from my teammates. Many of them work very hard, so much that they don’t enjoy programming for fun anymore. I still love recreational programming, but in a peculiar sense.

When I come home from work, I try to prove theorems in a proof assistant. Usually the theorems are related to functional programming or type systems.

It’s a masochistic hobby. Convincing a computer that a theorem is true can be quite difficult compared to convincing a human. But if I can get a computer to accept my proof, then a) I must have a pretty good understanding of it, and b) it really must be right!

I use the Coq theorem prover for formalizing proofs. Coq is a beautiful and simple language [1]—far simpler than the languages I use at work! Its beauty is hidden beneath a rather unsightly IDE [2]:

The proof is on the left. The green highlighted part has been verified by Coq. As I am working on a proof, I can ask Coq to step forward through it with Ctrl+Down, and I can use Ctrl+Up to step backward. At any point if my proof is wrong, Coq will highlight the error in red. It’s similar to how a type checker works; in fact, it’s exactly the same in a very deep sense. I’d love to explain what I mean by that, but I’ll have to save that for a separate article.

In the screenshot, I’m currently in the middle of a proof. The upper right pane shows what I need to prove (in this case, f x = x) and also what hypotheses I can use to prove it (shown above the horizontal line). The lower right pane shows helpful information if I request it, such as the type of something or a list of definitions/lemmas/theorems related to it.

Recently I set up a small GitHub repository with a continuous integration system that checks my proofs when I push to a branch, preventing me from merging it into master if there is a mistake. It’s impossible to commit incorrect math to this repository [3]!

Proving theorems in a machine-checked language is invigorating once you get the hang of it. It feels like doing a puzzle. And when you finish a long proof of a nontrivial result, it’s a remarkable sensation unlike anything else.

Case study: domain theory

When you write down a recursive function in any language, it’s actually not obvious that the definition is mathematically well-defined. This week I was reading about the theoretical underpinning of recursive definitions: fixed points. Whenever you define something recursively, you are technically taking the least fixed point of a continuous function (yes, you read that correctly!). There is some really cool math behind this called domain theory.
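To make the fixed-point reading of recursion concrete, here is a minimal Haskell sketch (not part of the Coq development described in this article). The names factorialStep, bottom, and approximations are made up for illustration. The recursive function is obtained as the fixed point of a non-recursive "step" function, and the same function can be approached from below by applying the step repeatedly to a function that is defined nowhere.

```haskell
import Data.Function (fix)

-- One "unrolling" of factorial. It is not recursive itself: the
-- recursive call is replaced by the parameter `self`.
factorialStep :: (Integer -> Integer) -> Integer -> Integer
factorialStep self n = if n <= 0 then 1 else n * self (n - 1)

-- The recursive function is a fixed point of the step function.
factorial :: Integer -> Integer
factorial = fix factorialStep

-- The least element of the domain: a function defined nowhere.
bottom :: Integer -> Integer
bottom _ = error "bottom: undefined everywhere"

-- The chain bottom, step bottom, step (step bottom), ...
-- The k-th approximation is defined on inputs smaller than k.
approximations :: [Integer -> Integer]
approximations = iterate factorialStep bottom
```

Loaded into GHCi, factorial 5 and (approximations !! 6) 5 agree, while (approximations !! 3) 5 hits bottom. The fixed point is the limit of this chain of ever more defined approximations.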

Before this week, I had a vague understanding of all this. I do a fair amount of functional programming, but I hardly knew anything about domain theory. Today, I finished proving the main result: the Kleene fixed-point theorem. This was a difficult proof for me, but now I consider myself pretty well-versed in the basic ideas of domain theory.

The proof is not very remarkable on paper, but convincing a computer to accept it is another matter entirely. I started by writing down the definitions below. The first step to understanding a theory is being able to formally define the main ideas! I don’t expect the casual reader to dissect all these definitions, but I include them here to give you a sense of what it looks like to write them in Coq:

To give you an idea of what an actual proof looks like in Coq, below is a proof of one of the easier lemmas above. The proof is virtually impossible to read without stepping through it interactively, so don’t worry if it doesn’t make any sense to you here.

There’s nothing better than typing Qed at the end of a long proof and watching it turn green as Coq verifies it. What’s really amazing to me is that Stephen Kleene probably proved this without the help of a computer, but I was able to verify it with the highest possible scrutiny. It’s as if he wrote an impressive program without ever having run it, and it turned out not to have any bugs! This is more than just an analogy. When you work with Coq, you become intimately familiar with the interpretation of propositions as types and proofs as programs.

Case study: soundness of a type system

A successful formula for writing papers about programming language theory is the following:

1. Define the syntax of a small but novel language (usually based on the lambda calculus).

2. Define a type system for the language (a set of rules that specify which programs are legal).

3. Define a semantics for the language (i.e., describe what happens when you run a program).

4. Prove that the type system prevents programs from “getting stuck” according to the semantics.

The theorem that is proven in that last step is called soundness. A few months ago, I wanted to try doing a formally verified soundness proof.

The simply-typed lambda calculus is essentially the smallest programming language with higher-order functions and a meaningful type system [4]. I started by defining the abstract syntax of the language in Coq:

To keep this article short, I won’t describe how the language works. There are many good introductions. My favorite is Chapter 9 of Benjamin C. Pierce’s wonderful book called Types and Programming Languages.

Below are the typing rules. You can add whatever base types you like (e.g., Booleans or integers) to the simply-typed lambda calculus, but to keep things simple I only have a rather useless “unit” type which has only a single value (akin to an empty tuple).
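For readers who find Haskell easier to follow than Coq, the standard typing rules for this language can also be read as a small type checker. This is a sketch for illustration only, not the Coq formalization from this article. The names Type, Term, Context, and typeOf are assumptions made for the example.

```haskell
-- Types: the unit type and function types.
data Type = TUnit | TArrow Type Type
  deriving (Eq, Show)

-- Terms: the unit value, variables, annotated abstractions, applications.
data Term
  = Unit
  | Var String
  | Abs String Type Term
  | App Term Term
  deriving (Eq, Show)

-- A typing context maps variable names to their types.
type Context = [(String, Type)]

-- Each equation corresponds to one typing rule.
-- Nothing means the term is ill-typed.
typeOf :: Context -> Term -> Maybe Type
typeOf _ Unit = Just TUnit
typeOf ctx (Var x) = lookup x ctx
typeOf ctx (Abs x t body) = TArrow t <$> typeOf ((x, t) : ctx) body
typeOf ctx (App f a) =
  case (typeOf ctx f, typeOf ctx a) of
    (Just (TArrow t1 t2), Just t1') | t1 == t1' -> Just t2
    _ -> Nothing
```

For example, typeOf [] (App (Abs "x" TUnit (Var "x")) Unit) yields Just TUnit, while typeOf [] (App Unit Unit) yields Nothing because the unit value is not a function.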

To specify the semantics, I defined a small-step operational semantics. The main idea is that you define a relation which says what steps a program takes when it is run. There are many ways to define the semantics of a programming language, and this is just one of them.
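As a rough illustration of the idea (again in Haskell, and not the Coq relation from this article), a call-by-value small-step semantics can be sketched as a function that performs at most one step. The names isValue, subst, step, and eval are hypothetical. The substitution is deliberately naive: it assumes the program is closed, so that the values being substituted contain no free variables and capture cannot occur.

```haskell
data Type = TUnit | TArrow Type Type
  deriving (Eq, Show)

data Term
  = Unit
  | Var String
  | Abs String Type Term
  | App Term Term
  deriving (Eq, Show)

-- Values are the terms that are finished running.
isValue :: Term -> Bool
isValue Unit = True
isValue (Abs _ _ _) = True
isValue _ = False

-- subst x v t replaces free occurrences of x in t with v.
-- Naive: assumes v is closed, so no variable capture can happen.
subst :: String -> Term -> Term -> Term
subst _ _ Unit = Unit
subst x v (Var y) = if x == y then v else Var y
subst x v (Abs y t body)
  | x == y = Abs y t body
  | otherwise = Abs y t (subst x v body)
subst x v (App f a) = App (subst x v f) (subst x v a)

-- One step of evaluation. Nothing means no rule applies: the term
-- is either a value or stuck.
step :: Term -> Maybe Term
step (App f a)
  | not (isValue f) = (\f' -> App f' a) <$> step f
  | not (isValue a) = App f <$> step a
step (App (Abs x _ body) v) = Just (subst x v body)
step _ = Nothing

-- Keep stepping until no rule applies.
eval :: Term -> Term
eval t = maybe t eval (step t)
```

A term is stuck when it is not a value and step returns Nothing, as with App Unit Unit. Soundness says that well-typed closed programs never reach such a term.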

The proof, including definitions and lemmas, is about 500 lines long. For such a simple language, this was quite a difficult undertaking! I recall working on this for an entire weekend, probably 8 hours both days. Now I really appreciate when research papers include a formally verified soundness proof.

Conclusion

I think of Coq as an extension of my own ability to reason logically. When I’m in doubt about something, I open the Coq IDE and try to prove it. I think the reason this is so valuable to me is that I often mull over functional programming, types, logic, algorithms, etc. These kinds of things are well-suited to formalization.

Unfortunately, machine-checked theorem proving is not very easy to learn, which is probably why it’s so esoteric. It’s difficult to prove nontrivial results without a graduate-level understanding of type theory (even then it’s still difficult). The most accessible introduction is Benjamin C. Pierce’s Software Foundations, which is freely available online. After that, read Adam Chlipala’s Certified Programming with Dependent Types, which is also freely available online.

I am not an expert in Coq by any means, and my proofs are probably longer and clunkier than they need to be. I only know basic techniques, but a nice thing about logic is that the underlying rules are very simple. I have reached a point where I can prove anything in Coq that I can rigorously prove on paper (given enough time). The two proofs I described in this article took me a few days each, and I’m steadily getting better.

Footnotes

[1] Strictly speaking, the language itself is called Gallina, and “Coq” refers to the system as a whole.

Subtyping is a tricky topic in programming language theory. The trickiness comes from a pair of frequently misunderstood phenomena called covariance and contravariance. This article will explain what these terms mean.

The following notation will be used:

A ≼ B means A is a subtype of B.

A → B is the type of functions for which the argument type is A and the return type is B.

x : A means x has type A.

A motivating question

Suppose I have these three types:

Greyhound ≼ Dog ≼ Animal

So Greyhound is a subtype of Dog, and Dog is a subtype of Animal. Subtyping is usually transitive, so we’ll say Greyhound is also a subtype of Animal.

Question: Which of the following types could be subtypes of Dog → Dog?

Greyhound → Greyhound

Greyhound → Animal

Animal → Animal

Animal → Greyhound

How do we answer this question? Let f be a function which takes a Dog → Dog function as its argument. We don’t care about the return type. For concreteness, we can say f : (Dog → Dog) → String.

Now I want to call f with some function g. Let’s see what happens when g has each of the four types above.

1. Suppose g : Greyhound → Greyhound. Is f(g) type safe?

No, because f might try to call its argument (g) with a different subtype of Dog, like a GermanShepherd.

2. Suppose g : Greyhound → Animal. Is f(g) type safe?

No, for the same reason as (1).

3. Suppose g : Animal → Animal. Is f(g) type safe?

No, because f might call its argument (g) and then try to make the return value bark. Not every Animal can bark.

4. Suppose g : Animal → Greyhound. Is f(g) type safe?

Yes—this one is safe. f might call its argument (g) with any kind of Dog, and all Dogs are Animals. Likewise, it may assume the result is a Dog, and all Greyhounds are Dogs.

What’s going on?

So this is safe:

(Animal → Greyhound) ≼ (Dog → Dog)

The return types are straightforward: Greyhound is a subtype of Dog. But the argument types are flipped around: Animal is a supertype of Dog!

To state this strange behavior in the proper jargon, we allow function types to be covariant in their return type and contravariant in their argument type. Covariance in the return type means A ≼ B implies (T → A) ≼ (T → B) (A stays on the left of the ≼, and B stays on the right). Contravariance in the argument type means A ≼ B implies (B → T) ≼ (A → T) (A and B flipped sides).
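One way to see why the arrows flip is to model A ≼ B as an explicit conversion function from A to B. Haskell has no subtyping, so the following is only a sketch under that modeling assumption, and all names in it are made up for illustration. Converting a function's result uses the conversion in the same direction, while converting its argument forces us to compose on the other side.

```haskell
newtype Animal = Animal String deriving (Eq, Show)
newtype Dog = Dog String deriving (Eq, Show)
newtype Greyhound = Greyhound String deriving (Eq, Show)

-- Greyhound ≼ Dog and Dog ≼ Animal, modeled as upcast functions.
greyhoundToDog :: Greyhound -> Dog
greyhoundToDog (Greyhound name) = Dog name

dogToAnimal :: Dog -> Animal
dogToAnimal (Dog name) = Animal name

-- Covariance: A ≼ B gives (T → A) ≼ (T → B).
covariantResult :: (a -> b) -> (t -> a) -> (t -> b)
covariantResult up f = up . f

-- Contravariance: A ≼ B gives (B → T) ≼ (A → T).
contravariantArgument :: (a -> b) -> (b -> t) -> (a -> t)
contravariantArgument up f = f . up

-- (Animal → Greyhound) ≼ (Dog → Dog), built from the two rules.
asDogToDog :: (Animal -> Greyhound) -> (Dog -> Dog)
asDogToDog = covariantResult greyhoundToDog . contravariantArgument dogToAnimal

-- A sample function of type Animal → Greyhound.
rescue :: Animal -> Greyhound
rescue (Animal name) = Greyhound (name ++ "'s friend")
```

There is no way to write the analogous conversion from Greyhound -> Greyhound to Dog -> Dog with these upcasts, because it would need a function from Dog to Greyhound.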

Fun fact: In TypeScript, argument types are bivariant (both covariant and contravariant), which is unsound (although since TypeScript 2.6 you can fix this with --strictFunctionTypes or --strict). Eiffel also got this wrong, making argument types covariant instead of contravariant.

What about other types?

Question: Could List<Dog> be a subtype of List<Animal>?

The answer is a little nuanced. If lists are immutable, then it’s safe to say yes. But if lists are mutable, then definitely not!

Why? Suppose I need a List<Animal> and you pass me a List<Dog>. Since I think I have a List<Animal>, I might try to insert a Cat into it. Now your List<Dog> has a Cat in it! The type system should not allow this.

Formally: we can allow the type of immutable lists to be covariant in its type parameter, but the type of mutable lists must be invariant (neither covariant nor contravariant) in its type parameter.
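The same point can be sketched in Haskell by again modeling A ≼ B as a conversion function from A to B. The MutableList type below is a toy, purely functional stand-in for a list that supports insertion, and every name in it is hypothetical. Converting an immutable list needs only the upcast, but converting something you can also insert into needs a conversion in both directions.

```haskell
-- Immutable lists are covariant: an upcast on elements is enough.
upcastList :: (a -> b) -> [a] -> [b]
upcastList = map

-- A toy "mutable" list: you can read its contents and insert into it.
data MutableList a = MutableList
  { contents :: [a]
  , insert :: a -> MutableList a
  }

fromList :: [a] -> MutableList a
fromList xs = MutableList {contents = xs, insert = \x -> fromList (x : xs)}

-- Reading uses `up`, but inserting uses `down`. With only Dog ≼ Animal
-- there is no `down` from Animal to Dog, so the conversion is impossible.
convert :: (a -> b) -> (b -> a) -> MutableList a -> MutableList b
convert up down list =
  MutableList
    { contents = map up (contents list)
    , insert = \y -> convert up down (insert list (down y))
    }
```

The need for both up and down is exactly what invariance means: the conversion exists only when the two element types are interchangeable.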

Years ago my colleague Gustavo asked how I would represent physical units like m/s or kg*m/s^2 as types so the compiler can check that they match up and cancel correctly. F# supports this natively, but it felt weird to have it baked into the type system. It seemed too ad hoc, though I didn’t know of anything better.

Today I was thinking about this again, and I found a way to do it in Haskell. The main idea is to represent units of measure as function spaces. For example, the unit m/s can be encoded as the type Seconds -> Meters. The numerator becomes the return type, and the denominator the argument type. Function types can be composed to form more interesting units, such as Seconds -> Seconds -> Meters for acceleration. Products can be represented by higher-order functions. A special type class will enable us to easily convert a Double into a quantity with any units of measure and vice versa.

There are a handful of packages on Hackage for doing dimensional analysis. This article will demonstrate a simple, portable way to do it without relying on any language extensions. As we will see in Example 3, this implementation sometimes requires the programmer to provide proofs of unit equivalence (which can be tedious at times). This is less convenient than other libraries, but it’s an interesting exhibition of the power of vanilla Haskell.

Units as function spaces

First, we define some types for base units. We will never instantiate these types, so we don’t specify any constructors. They are only used for type checking.

data Meter
data Kilogram
data Second

Next, we need types which actually represent quantities with units. For the base units above, we define BaseQuantity as follows:

newtype BaseQuantity a = BaseQuantity Double

It’s just a wrapper for Double, but written as a phantom type. The type parameter a keeps track of the base unit. For example, BaseQuantity Meter is a type which represents a length.

A quotient a / b of two units a and b will be represented by the function space b -> a. For example, m / s becomes BaseQuantity Second -> BaseQuantity Meter. To make intentions clear, we formalize this idea as a type synonym:

type Quotient a b = b -> a

We also need a BaseQuantity for dimensionless quantities like π. We could define a new base type for this, but () does the job nicely:

type Dimensionless = BaseQuantity ()

We can also define the multiplicative inverse a^-1 as the quotient 1 / a.

type Inverse a = Quotient Dimensionless a

A product a * b can be represented as a / b^-1:

type Product a b = Quotient a (Inverse b)

A helpful synonym to make square units like m^2 easier to read:

type Square a = Product a a

All quantities have some numeric value. We formalize this in Haskell using a type class:

Quotients of quantities are quantities as well. To construct a Quotient from a Double, we define a function which destructs its argument, multiplies the result by the given Double, and constructs a quantity of the return type (the numerator unit). To destruct a Quotient, we first construct 1 in the denominator unit (the argument type), use the quotient to convert it into the numerator unit, and destruct the result.
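The description above can be sketched as follows. The class name Quantity and the example value speed are assumptions made for this illustration. The names construct and destruct follow the text, and the instance bodies are written to match the prose description step by step.

```haskell
data Meter
data Second

newtype BaseQuantity a = BaseQuantity Double

type Quotient a b = b -> a

-- Every quantity can be built from, and turned back into, a Double.
class Quantity a where
  construct :: Double -> a
  destruct :: a -> Double

-- Base quantities just wrap and unwrap the number.
instance Quantity (BaseQuantity a) where
  construct = BaseQuantity
  destruct (BaseQuantity x) = x

-- Quotients are functions from the denominator unit to the numerator unit.
instance (Quantity q, Quantity r) => Quantity (q -> r) where
  -- Destruct the argument, scale it, and rebuild in the numerator unit.
  construct x = \y -> construct (x * destruct y)
  -- Feed in 1 of the denominator unit and destruct what comes out.
  destruct f = destruct (f (construct 1))

type Length = BaseQuantity Meter
type Time = BaseQuantity Second
type Velocity = Quotient Length Time

-- A hypothetical velocity of 30 m/s.
speed :: Velocity
speed = construct 30
```

With these definitions, destruct speed gives back 30, and destruct (speed (construct 2)) gives 60: two seconds at 30 m/s.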

Here we demonstrate the correspondence between quantities and functions. Velocity is a synonym for Length / Time, but it’s also a function from Time to Length. Given a Time, we can simply “apply” a Velocity to it to get a Length:

tripDistance :: Length
tripDistance = trainVelocity tripDuration

So multiplication of a / b by b is just function application.

Example 3: Manual proofs of unit equality

Let’s define a function that takes a Length and a Velocity and returns a Time. First try:

Haskell doesn’t know that Length / Velocity = Time. If this is indeed true, there will be a way to manipulate the program (without destructing the quantity) to make it type check by swapping arguments, defining new functions, using the quotientAxiom, etc. This is similar to what one would do in a proof assistant like Agda or Coq.

So we have:

distance ./. velocity :: Length / Velocity

Velocity is a type synonym for Length / Time, so we actually have:

distance ./. velocity :: Length / (Length / Time)

We can apply the quotientAxiom to get:

quotientAxiom (distance ./. velocity) :: Time / (Length / Length)

Under the interpretation of units as function spaces, we have:

quotientAxiom (distance ./. velocity) :: (Length -> Length) -> Time

We can apply id to cancel the Lengths and get a Time. Putting it all together:
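A self-contained sketch of the whole derivation might look like the following. The definitions of ./. and quotientAxiom here are assumptions chosen to match the types shown in the steps above, and the function name travelTime is hypothetical. The Quantity class follows the prose description of construct and destruct.

```haskell
data Meter
data Second

newtype BaseQuantity a = BaseQuantity Double

type Quotient a b = b -> a

class Quantity a where
  construct :: Double -> a
  destruct :: a -> Double

instance Quantity (BaseQuantity a) where
  construct = BaseQuantity
  destruct (BaseQuantity x) = x

instance (Quantity q, Quantity r) => Quantity (q -> r) where
  construct x = \y -> construct (x * destruct y)
  destruct f = destruct (f (construct 1))

type Length = BaseQuantity Meter
type Time = BaseQuantity Second
type Velocity = Quotient Length Time

-- Division of quantities (assumed definition).
infixl 7 ./.
(./.) :: (Quantity a, Quantity b) => a -> b -> Quotient a b
x ./. y = construct (destruct x / destruct y)

-- a / (b / c) = c / (b / a). The numeric value is unchanged;
-- only the type is rearranged (assumed definition).
quotientAxiom ::
  (Quantity a, Quantity b, Quantity c) =>
  Quotient a (Quotient b c) -> Quotient c (Quotient b a)
quotientAxiom = construct . destruct

-- Length / Velocity = Time, with the proof steps spelled out:
-- divide, rearrange with the axiom, then cancel Length / Length with id.
travelTime :: Length -> Velocity -> Time
travelTime distance velocity = quotientAxiom (distance ./. velocity) id
```

For instance, destruct (travelTime (construct 100) (construct 10)) evaluates to 10: covering 100 m at 10 m/s takes 10 s.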

Conclusion

We can do type safe dimensional analysis by encoding units as function spaces. The basic pattern is:

Construct whatever quantities you want using the construct function. A type annotation can be used to specify the units, or you can let type inference figure out the units automatically.

Do type safe operations on these quantities using the operations .+., .*., etc. You may need to manually rearrange units or cancel them to satisfy the type checker, but this will always be possible if your units are actually correct (exercise for the reader: prove it!).

When you are done with calculation and want to use the resulting quantities, the destruct function will convert them into Doubles.

It’s inconvenient that we sometimes have to provide manual proofs of unit equivalence. It would be nice if rearranging and canceling units were completely automatic. But at least we have type safety!

Recently I was working on a parser, and I thought the grammar might be ambiguous. I looked around for a tool to help me detect ambiguities in context-free grammars, but I couldn’t find anything that worked. So I built CFG Checker!

Determining whether an arbitrary context-free grammar is ambiguous is undecidable in general; the problem is only semi-decidable. The best we can do is generate all derivations in a breadth-first fashion and look for two which produce the same sentential form. So that’s exactly what CFG Checker does. If the input grammar is ambiguous, CFG Checker will eventually find a minimal ambiguous sentential form. If the grammar is unambiguous, CFG Checker will either reach this conclusion or loop forever.
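The breadth-first search can be sketched in Haskell roughly as follows. This is not CFG Checker's actual implementation: the names Symbol, Result, leftmostSteps, findAmbiguity, and exprRules are hypothetical. The sketch explores leftmost derivations only and adds a step budget so that it gives up instead of looping forever. It also assumes the rule list contains no duplicate productions, and it uses Data.Set from the containers package that ships with GHC.

```haskell
import Control.Monad (foldM)
import qualified Data.Set as Set

data Symbol = T Char | N String
  deriving (Eq, Ord, Show)

-- A production: a nonterminal name and its right-hand side.
type Production = (String, [Symbol])

data Result = Ambiguous [Symbol] | Unambiguous | GaveUp
  deriving (Eq, Show)

-- Every sentential form reachable by expanding the leftmost nonterminal.
leftmostSteps :: [Production] -> [Symbol] -> [[Symbol]]
leftmostSteps rules form =
  case break isNonterminal form of
    (prefix, N name : rest) ->
      [prefix ++ rhs ++ rest | (lhs, rhs) <- rules, lhs == name]
    _ -> []
  where
    isNonterminal (N _) = True
    isNonterminal _ = False

-- Breadth-first search over leftmost derivations. Each form is expanded
-- once, so reaching an already seen form means that two different
-- leftmost derivations produce it.
findAmbiguity :: Int -> [Production] -> String -> Result
findAmbiguity budget rules start =
  go budget (Set.singleton [N start]) [[N start]]
  where
    go _ _ [] = Unambiguous -- every derivation explored, no clash
    go 0 _ _ = GaveUp -- budget exhausted, no answer
    go n seen (form : queue) =
      case foldM add (seen, []) (leftmostSteps rules form) of
        Left duplicate -> Ambiguous duplicate
        Right (seen', new) -> go (n - 1) seen' (queue ++ new)
    add (seen, new) form
      | form `Set.member` seen = Left form
      | otherwise = Right (Set.insert form seen, new ++ [form])

-- The classic ambiguous grammar E -> E + E | a.
exprRules :: [Production]
exprRules = [("E", [N "E", T '+', N "E"]), ("E", [T 'a'])]
```

On exprRules with start symbol "E", the search reports the sentential form a + E + E, which can be reached by expanding either the left or the right E of E + E first. On a grammar such as S -> a S | b it never finds a clash and eventually returns GaveUp.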

Here’s how it works. You specify the grammar as a series of production rules:

My colleague Esther proposed the following challenge: given a string, decompose it into elemental symbols from the periodic table (if possible). For example, Hi Esther becomes H I Es Th Er. In general there might be no solutions, one solution, or several.

I implemented it in Haskell with dynamic programming. The elementize function does all the work, using the list monad to compute all possible solutions.
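A minimal sketch of this approach is shown below. It is a reconstruction under assumptions rather than the original code: the elements list is a small hypothetical subset of the periodic table, and the memo table is a lazily evaluated list indexed by position in the string, so each suffix is solved at most once.

```haskell
import Control.Monad (guard)
import Data.Char (isAlpha, toLower)
import Data.List (isPrefixOf, tails)

-- A small subset of element symbols, just enough for the examples.
elements :: [String]
elements =
  ["H", "He", "I", "Es", "Th", "Er", "S", "C", "Co", "O", "N", "No", "Ni"]

-- All ways to write the letters of the input as a sequence of symbols.
elementize :: String -> [[String]]
elementize input = head table
  where
    letters = map toLower (filter isAlpha input)
    -- table !! i holds every decomposition of the suffix starting at i.
    -- Laziness makes this a memo table.
    table = [solve i suffix | (i, suffix) <- zip [0 ..] (tails letters)]
    solve _ "" = [[]]
    solve i suffix = do
      symbol <- elements -- try every element (list monad)
      guard (map toLower symbol `isPrefixOf` suffix)
      rest <- table !! (i + length symbol) -- reuse the solved suffix
      return (symbol : rest)
```

With this element list, elementize "Hi Esther" produces the single solution H I Es Th Er, elementize "con" produces both C O N and Co N, and elementize "x" produces no solutions.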