Wednesday, December 25, 2013

You may have heard that Haskell is "great for equational reasoning", but perhaps you didn't know precisely what that meant. This post will walk through an intermediate example of equational reasoning to show how you can interpret Haskell code by hand by applying substitutions as if your code were a mathematical expression. This process of interpreting code by substituting equals-for-equals is known as equational reasoning.

Equality

In Haskell, when you see a single equals sign it means that the left-hand and right-hand sides are substitutable for each other both ways. For example, if you see:

x = 5

That means that wherever you see an x in your program, you can substitute it with a 5, and, vice versa, anywhere you see a 5 in your program you can substitute it with an x. Substitutions like these preserve the behavior of your program.

In fact, you can "run" your program just by repeatedly applying these substitutions until your program is just one giant main. To illustrate this, we will begin with the following program, which should print the number 1 three times:

import Control.Monad
main = replicateM_ 3 (print 1)

replicateM_ is a function that repeats an action a specified number of times. Its type when specialized to IO is:

replicateM_ :: Int -> IO () -> IO ()

The first argument is an Int specifying how many times to repeat the action and the second argument is the action we wish to repeat. In the above example, we specify that we wish to repeat the print 1 action three times.
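Before proving anything, one quick empirical check is to count how often the repeated action runs. This is only a sketch: countRuns is a hypothetical helper, not part of Control.Monad, and it swaps print 1 for a counter increment so that the result can be inspected.

```haskell
import Control.Monad (replicateM_)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Hypothetical helper: run a counter-incrementing action n times via
-- replicateM_ and report how many times it actually ran.
countRuns :: Int -> IO Int
countRuns n = do
    counter <- newIORef (0 :: Int)
    replicateM_ n (modifyIORef counter (+ 1))
    readIORef counter
```

Evaluating countRuns 3 in GHCi yields 3. A check like this only covers the inputs you happen to try, which is why a derivation by substitution is more convincing.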

But what if you don't believe me? What if you wanted to prove to yourself that it repeated the action three times? How would you prove that?

Use the source!

You can locate the source to Haskell functions using one of three tricks:

Use Hayoo!, which is like Hoogle, but searches a larger package database and is stricter about matches

Use Google and search for "hackage <function>". This also works well for searching for packages.

Using any of those tricks we would locate replicateM_ here and then we can click the Source link to the right of it to view its definition, which I reproduce here:

replicateM_ n x = sequence_ (replicate n x)

The equals sign means that any time we see something of the form replicateM_ n x, we can substitute it with sequence_ (replicate n x), for any choice of n or x. For example, if we choose the following values for n and x:

n = 3
x = print 1

... then we obtain the following more specific equation:

replicateM_ 3 (print 1) = sequence_ (replicate 3 (print 1))

We will use this equation to replace our program's replicateM_ command with an equal program built from sequence_ and replicate:

main = sequence_ (replicate 3 (print 1))

The equation tells us that this substitution is safe and preserves the original behavior of our program.

Now, in order to simplify this further we must consult the definitions of both replicate and sequence_. When in doubt about which one to pick, always pick the outermost function because Haskell is lazy and evaluates everything from outside to in.
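The definitions consulted at this step can be sketched as follows. This assumes sequence_ is a right fold with (>>) and that foldr is given by the usual two-equation definition. The primed names are local stand-ins so the block does not clash with the Prelude, and the actual library source may be written differently (for example with a local worker function).

```haskell
-- Stand-in for sequence_: run every action in the list, discarding results.
sequence_' :: Monad m => [m a] -> m ()
sequence_' ms = foldr' (>>) (return ()) ms

-- Substituting that definition into the running example would give:
--   main = foldr (>>) (return ()) (replicate 3 (print 1))

-- Stand-in for foldr, written as two equations that are selected by whether
-- the third argument is an empty or a non-empty list.
foldr' :: (a -> b -> b) -> b -> [a] -> b
foldr' _ z []       = z
foldr' k z (y : ys) = y `k` foldr' k z ys
```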

Here we see two equations. Both equations work both ways, but we don't really know which equation to apply unless we know whether or not the third argument to foldr is an empty list or not. We must evaluate the call to replicate to see whether it will pass us an empty list or a non-empty list, so we consult the definition of replicate:
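A sketch of the definitions involved, assuming replicate is defined through take and repeat as the surrounding text implies. The primed names are local stand-ins, not the library source. Note that take' is the function with three equations, the first of which is guarded by a predicate.

```haskell
-- Stand-in for replicate: take the first n elements of an endless list of x.
replicate' :: Int -> a -> [a]
replicate' n x = take' n (repeat x)

-- Stand-in for take. The first equation only applies when n <= 0; the other
-- two are selected by whether the list argument is empty or non-empty.
take' :: Int -> [a] -> [a]
take' n _ | n <= 0 = []
take' _ []         = []
take' n (x : xs)   = x : take' (n - 1) xs
```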

Here we see three equations. The first one has a predicate, saying that the equality is only valid if n is less than or equal to 0. In our case n is 3, so we skip that equation. However, we cannot distinguish which of the latter two equations to use unless we know whether repeat (print 1) produces an empty list or not, so we must consult the definition of repeat:
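A minimal sketch of repeat, assuming the straightforward recursive definition (repeat' is a local stand-in; the library version may be written with sharing in mind). The firstIsAvailable check is a hypothetical illustration that one substitution is enough to expose the first element without evaluating the rest.

```haskell
-- Stand-in for repeat: an endless list of the same value.
repeat' :: a -> [a]
repeat' x = x : repeat' x

-- Pattern matching only needs the outermost (:) constructor, so laziness
-- lets us inspect the head of an infinite list without looping forever.
firstIsAvailable :: Bool
firstIsAvailable = case repeat' 'x' of
    (y : _) -> y == 'x'
    []      -> False
```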

The buck stops here! Although repeat is infinitely recursive, we don't have to fully evaluate it. We can just evaluate it once and lazily defer the recursive call, since all we need to know for now is that the list has at least one value. This now provides us with enough information to select the third equation for take that requires a non-empty list as its argument:
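The substitutions up to this point can be sketched as a sequence of equal programs. To keep the block checkable, each step is generalized over the repeated action x (the post uses x = print 1); the names step1 to step3 are hypothetical.

```haskell
step1, step2, step3 :: Monad m => m () -> m ()

-- After expanding repeat once inside the call to take:
step1 x = foldr (>>) (return ()) (take 3 (x : repeat x))

-- After applying the third equation of take (non-empty list, n > 0):
step2 x = foldr (>>) (return ()) (x : take 2 (repeat x))

-- After applying the second equation of foldr (non-empty list):
step3 x = x >> foldr (>>) (return ()) (take 2 (repeat x))
```

With x = print 1, every step prints 1 three times. For a pure check, x = ([1], ()) uses the pair monad from base to record each run, and every step then evaluates to ([1,1,1], ()).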

Now our Haskell runtime knows enough information to realize that it needs to print a single 1. The language is smart and will execute the first print statement before further evaluating the call to foldr.

We can repeat this process two more times, cycling through evaluating repeat, take, and foldr, which emit two additional print commands:

This time the first equation matches, because n is now equal to 0. However, we also know the third equation will match because repeat will emit a new element. Whenever more than one equation matches Haskell takes the first one by default, so we use the first equation to substitute the call to take with an empty list:
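The final substitutions can be sketched the same way, again generalized over the repeated action x so that the steps can be compared (the post uses x = print 1; the names step4 to step6 are hypothetical).

```haskell
step4, step5, step6 :: Monad m => m () -> m ()

-- After three rounds of repeat, take and foldr, the counter has reached 0:
step4 x = x >> (x >> (x >> foldr (>>) (return ()) (take 0 (repeat x))))

-- First equation of take: a count of 0 gives the empty list.
step5 x = x >> (x >> (x >> foldr (>>) (return ()) []))

-- First equation of foldr: an empty list gives the base value.
step6 x = x >> (x >> (x >> return ()))
```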

There it is! We've fully expanded out our use of replicateM_ to prove that it prints the number 1 three times. For reasons I won't go into, we can also remove the final return () and finish with:

main = do
    print 1
    print 1
    print 1

Equational reasoning - Part 1

Notice how Haskell has a very straightforward way to interpret all code: substitution. If you can substitute equals-for-equals then you can interpret a Haskell program on pen and paper. Substitution is the engine of application state.

Equally cool: we never needed to understand the language runtime to know what our code did. The Haskell language has a very limited role: ensure that substitution is valid and otherwise get out of our way.

Unlike imperative languages, there are no extra language statements such as for or break that we need to understand in order to interpret our code. Our printing "loop" that repeated three times was just a bunch of ordinary Haskell code. This is a common theme in Haskell: "language features" are usually ordinary libraries.

Also, had we tried the same pen-and-paper approach to interpreting an imperative language we would have had to keep track of temporary values somewhere in the margins while evaluating our program. In Haskell, all the "state" of our program resides within the expression we are evaluating, and in our case the "state" was the integer argument to take that we threaded through our substitutions. Haskell code requires less context to understand than imperative code because Haskell expressions are self-contained.

We didn't need to be told to keep track of state. We just kept mindlessly applying substitution and perhaps realized after the fact that what we were doing was equivalent to a state machine. Indeed, state is implemented within the language just like everything else.

Proof reduction

Proving the behavior of our code was really tedious, and we're really interested in proving more generally reusable properties rather than deducing the behavior of specific, disposable programs. However, we spent a lot of effort to prove our last equation, so we want to pick our battles wisely and only spend time proving equations that we can reuse heavily.

So I will propose that we should only really bother to prove the following four equations in order to cover most common uses of replicateM_:
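As a sketch, assuming the four equations are the zero, addition, one and multiplication laws that the following paragraphs describe, they can be written as comments and spot-checked. The action tick and the law names are hypothetical; tick uses the pair monad from base as a pure log, and a finite check like this is evidence, not a proof.

```haskell
import Control.Monad (replicateM_)

-- A pure "logging" action: running it appends one 1 to the log.
tick :: ([Int], ())
tick = ([1], ())

-- replicateM_ 0 x = return ()
lawZero :: Bool
lawZero = replicateM_ 0 tick == return ()

-- replicateM_ (m + n) x = replicateM_ m x >> replicateM_ n x
lawPlus :: Int -> Int -> Bool
lawPlus m n =
    replicateM_ (m + n) tick == (replicateM_ m tick >> replicateM_ n tick)

-- replicateM_ 1 = id
lawOne :: Bool
lawOne = replicateM_ 1 tick == id tick

-- replicateM_ (m * n) = replicateM_ m . replicateM_ n
lawTimes :: Int -> Int -> Bool
lawTimes m n =
    replicateM_ (m * n) tick == (replicateM_ m . replicateM_ n) tick
```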

The last two equations are written in a "point-free style" to emphasize that replicateM_ distributes in a different way over multiplication. If you expand those two equations out to a "point-ful style" you get:
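A sketch of the expanded forms, assuming the point-free equations are replicateM_ 1 = id and replicateM_ (m * n) = replicateM_ m . replicateM_ n. The action tick and the check names are hypothetical. Note that the first equation only type-checks when the action x returns ().

```haskell
import Control.Monad (replicateM_)

-- A pure "logging" action: the pair monad from base appends the first
-- components, so each run of tick records one 1.
tick :: ([Int], ())
tick = ([1], ())

-- replicateM_ 1 x = x
pointfulOne :: Bool
pointfulOne = replicateM_ 1 tick == tick

-- replicateM_ (m * n) x = replicateM_ m (replicateM_ n x)
pointfulTimes :: Int -> Int -> Bool
pointfulTimes m n =
    replicateM_ (m * n) tick == replicateM_ m (replicateM_ n tick)
```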

These four master equations are still very tedious to prove all in one go, but we can break this complex task into smaller bite-sized tasks. As a bonus, this divide-and-conquer approach will also produce several other useful and highly reusable equations along the way.

Let's begin by revisiting the definition of replicateM_:

replicateM_ n x = sequence_ (replicate n x)

Like many Haskell functions, replicateM_ is built from two smaller composable pieces: sequence_ and replicate. So perhaps we can also build our proofs from smaller and composable proofs about the individual behaviors of sequence_ and replicate.

Indeed, replicate possesses a set of properties that are remarkably similar to replicateM_:
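A sketch of what those properties look like, assuming they mirror the replicateM_ equations with [] in place of return (), (++) in place of (>>), and Kleisli composition in the list monad in place of function composition. The check names are hypothetical and the checks are spot tests, not proofs.

```haskell
import Control.Monad ((<=<))

-- replicate 0 x = []
repZero :: Bool
repZero = replicate 0 'a' == []

-- replicate (m + n) x = replicate m x ++ replicate n x
repPlus :: Int -> Int -> Bool
repPlus m n = replicate (m + n) 'a' == replicate m 'a' ++ replicate n 'a'

-- replicate 1 = return          (return for lists builds a one-element list)
repOne :: Bool
repOne = replicate 1 'a' == return 'a'

-- replicate (m * n) = replicate m <=< replicate n
repTimes :: Int -> Int -> Bool
repTimes m n = replicate (m * n) 'a' == (replicate m <=< replicate n) 'a'
```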

These four replicate equations are easier to prove than the corresponding replicateM_ equations. If we can somehow prove these simpler equations, then all we have to do is then prove that sequence_ lifts all of the replicate proofs into the equivalent replicateM_ proofs:
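A sketch of the lifting equations, assuming they state that sequence_ maps [] to return (), (++) to (>>), the list return to id, and Kleisli composition to function composition. The names are hypothetical. The third equation only makes sense for actions that return (), and the fourth is only checked here for f = replicate m and g = replicate n, which is all the replicateM_ proofs need; it is not claimed for arbitrary f and g.

```haskell
import Control.Monad ((<=<))

-- A pure "logging" action built on the pair monad from base.
tick :: ([Int], ())
tick = ([1], ())

-- sequence_ [] = return ()
seqEmpty :: Bool
seqEmpty = sequence_ [] == (return () :: ([Int], ()))

-- sequence_ (xs ++ ys) = sequence_ xs >> sequence_ ys
seqAppend :: Int -> Int -> Bool
seqAppend i j = sequence_ (xs ++ ys) == (sequence_ xs >> sequence_ ys)
  where
    xs = replicate i tick
    ys = replicate j tick

-- sequence_ . return = id       (return here is the list return)
seqReturn :: Bool
seqReturn = (sequence_ . (return :: b -> [b])) tick == id tick

-- sequence_ . (f <=< g) = (sequence_ . f) . (sequence_ . g)
seqCompose :: Int -> Int -> Bool
seqCompose m n =
    (sequence_ . (f <=< g)) tick == ((sequence_ . f) . (sequence_ . g)) tick
  where
    f, g :: ([Int], ()) -> [([Int], ())]
    f = replicate m
    g = replicate n
```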

It might not be obvious at first, but the above four equations for sequence_ suffice to transform the replicate proofs into the analogous replicateM_ proofs. For example, this is how we would prove the first replicateM_ equation in terms of the first replicate equation and the first sequence_ equation:
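A sketch of that derivation, assuming the first equations are replicate 0 x = [] and sequence_ [] = return (). Each list element is one line of the proof, so all elements should be equal; proofZero is a hypothetical name.

```haskell
import Control.Monad (replicateM_)

proofZero :: Monad m => m () -> [m ()]
proofZero x =
    [ replicateM_ 0 x
    , sequence_ (replicate 0 x)   -- definition of replicateM_
    , sequence_ []                -- replicate 0 x = []
    , return ()                   -- sequence_ [] = return ()
    ]
```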

Equational reasoning - Part 2

We reduced our proofs of the replicateM_ properties to smaller proofs for replicate and sequence_ properties. The overhead of this proof reduction is tiny and we gain the benefit of reusing proofs for replicate and sequence_.

As programmers we try to reuse code when we program and the way we promote code reuse is to divide programs into smaller composable pieces that are more reusable. Likewise, we try to reuse proofs when we equationally reason about code and the way we encourage proof reuse is to divide larger proofs into smaller proofs using proof reduction. In the above example we reduced the four equations for replicateM_ into four equations for replicate and four equations for sequence_. These smaller equations are equally useful in their own right and they can be reused by other people as sub-proofs for their own proofs.

However, proof reuse also faces the same challenges as code reuse. When we break up code into smaller pieces sometimes we take things too far and create components we like to think are reusable but really aren't. Similarly, when we reduce proofs sometimes we pick sub-proofs that are worthless and only add more overhead to the entire proof process. How can we sift out the gold from the garbage?

I find that the most reusable proofs are category laws or functor laws of some sort. In fact, every single proof from the previous section was a functor law in disguise. To learn more about functor laws and how they arise everywhere you can read another post of mine about the functor design pattern.

Proof techniques

This section will walk through the complete proofs for the replicate equations to provide several worked examples and to also illustrate several useful proof tricks.

I deliberately write these proofs to be reasonably detailed and to skip as few steps as possible. In practice, though, proofs become much easier the more you equationally reason about code because you get better at taking larger steps.

Let's revisit the equations we wish to prove for replicate, in point-ful form:

We can't proceed further unless we know whether or not m + n is greater than 0. For simplicity we'll assume that m and n are non-negative.

We then do something analogous to "case analysis" on m, pretending it is like a Peano number. That means that m is either 0 or positive (i.e. greater than 0). We'll first prove our equation for the case where m equals 0:
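The base case of replicate (m + n) x = replicate m x ++ replicate n x might be written out as the following chain, in which every element should equal its neighbours. This is a sketch under the stated assumption that n is non-negative; baseCase is a hypothetical name.

```haskell
-- Each list element is one line of the proof for m = 0.
baseCase :: Int -> a -> [[a]]
baseCase n x =
    [ replicate (0 + n) x
    , replicate n x                    -- 0 + n = n
    , [] ++ replicate n x              -- [] ++ ys = ys, read right to left
    , replicate 0 x ++ replicate n x   -- replicate 0 x = [], read right to left
    ]
```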

Now we can use induction to reuse the original premise since m' is strictly smaller than m. Since we are assuming that m is non-negative this logical recursion is well-founded and guaranteed to eventually bottom out at the base case where m equals 0:
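The inductive case, with m = 1 + m', can be sketched as a chain of equal expressions. It assumes replicate n x = take n (repeat x) and non-negative m' and n; inductiveCase is a hypothetical name, and the check only samples small values.

```haskell
-- Each list element is one line of the proof for m = 1 + m'.
inductiveCase :: Int -> Int -> a -> [[a]]
inductiveCase m' n x =
    [ replicate ((1 + m') + n) x
    , take ((1 + m') + n) (repeat x)           -- definition of replicate
    , take ((1 + m') + n) (x : repeat x)       -- definition of repeat
    , x : take (m' + n) (repeat x)             -- third equation of take
    , x : replicate (m' + n) x                 -- definition of replicate, backwards
    , x : (replicate m' x ++ replicate n x)    -- induction hypothesis
    , (x : replicate m' x) ++ replicate n x    -- definition of (++), backwards
    , replicate (1 + m') x ++ replicate n x    -- earlier steps, backwards
    ]
```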

This completes the proof for both cases so the proof is "total", meaning that we covered all possibilities. Actually, that's a lie because really rigorous Haskell proofs must account for the possibility of non-termination (a.k.a. "bottom"). However, I usually consider proofs that don't account for non-termination to be good enough for most practical purposes.

Hopefully these proofs give an idea of the amount of effort involved in proving properties of moderate complexity. I omitted the final part of proving the sequence_ equations in the interest of space, but they make for a great exercise.

Equational reasoning - Part 3

Reasoning about Haskell differs from reasoning about code in other languages. Traditionally, reasoning about code would require:

building a formal model of a program's algorithm,

reasoning about the behavior of the formal model using its axioms, and

proving that the program matches the model.

In a purely functional language like Haskell you formally reason about your code within the language itself. There is no need for a separate formal model because the code is already written as a bunch of formal equations. This is what people mean when they say that Haskell has strong ties to mathematics, because you can reason about Haskell code the same way you reason about mathematical equations.

This is why Haskell syntax for function definitions deviates from mainstream languages. All function definitions are just equalities, which is why Haskell is great for equational reasoning.

This post illustrated how equational reasoning in Haskell can scale to larger complexity through the use of proof reduction. A future post of mine will walk through a second tool for reducing proof complexity: type classes inspired by mathematics that come with associated type class laws.

Thursday, December 19, 2013

One of the main deficiencies of transformers is that the MonadTrans class does not let you lift functions like catchError. The mtl library provides one solution to this problem, which is to type-class catchError and throwError using the MonadError type class.

That's not to say that transformers has no solution for lifting catchError and throwError; it's just really verbose. Each module provides a liftCatch function that you use to lift a catchError function from the base monad to the transformed monad.

For example, Control.Monad.Trans.State provides the following liftCatch function:
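As a sketch of what such a function looks like, the following defines a minimal local StateT and a liftCatch modelled on the one in transformers. The local newtype and the catchEither helper are assumptions made so that the block stands alone; the library's own definition may differ in its type synonyms and strictness.

```haskell
-- Minimal local stand-in for the state transformer.
newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }

-- Lift a catch operation from the base monad m to StateT s m. If the
-- action fails, the handler runs with the state from before the action.
liftCatch
    :: (m (a, s) -> (e -> m (a, s)) -> m (a, s))
    -> StateT s m a
    -> (e -> StateT s m a)
    -> StateT s m a
liftCatch catchE m h =
    StateT $ \s -> runStateT m s `catchE` \e -> runStateT (h e) s

-- Hypothetical catch operation for Either, used as the base monad.
catchEither :: Either e a -> (e -> Either e a) -> Either e a
catchEither (Left e)  h = h e
catchEither (Right a) _ = Right a
```

The catch function has to be passed explicitly at every layer of the stack, which is the verbosity the surrounding text refers to.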

The k in the above function will be a series of composed liftCatch functions that we will apply to catchError. However, in the spirit of the lens library we will rename these liftCatch functions to be less verbose and more hip and sexy:

This approach has a few advantages over the traditional MonadError approach:

You get improved type inference

You get type errors earlier in development. With MonadError the compiler will not detect an error until you try to run your monad transformer stack.

You get better type errors. MonadError errors will arise at a distance where you call runErrorT even though the logical error is probably at the site of the catchError function.

Functional references are first class and type classes are not

However, we don't want to lift just catchError. There are many other functions that transformers can lift such as local, listen, and callCC. An interesting question would be whether there is some elegant abstraction that packages all these lifting operations into a simple type in the same way that lenses package getters, setters, traversals, maps, and zooms into a single abstraction. If there were, then we could reuse the same references for catching, listening, and other operations that are otherwise difficult to lift: