Mainly math and haskell

Monads as used in functional programming are ‘one way’. The monad type class provides a way to put values into the monad (with return) but no way to get values out.

123

classMonadmwherereturn::a->ma(>>=)::ma->(a->mb)->mb

If you want to get values out of a monad you have to write a function yourself. Such a function would have a type h :: m a -> a where m is a particular monad. It’d be nice if there were some guarantees about these functions should behave. We’re stepping outside the jurisdiction of the monad laws, so you don’t get any help there. So what are the relevant laws?

If you put a value into the monad and immediately extract a value, you want to get the same value you put in. In other words you want h (return a) = a, which is the same as

1

h.return=id

If you have two layers of monad m (m a) then there are two ways you could get a value out. You could lift h into the monad with liftM (or fmap) to get values out of the inner monad, and then hit the whole thing with h. Alternatively you could use the monadic join to collapse the two layers of the monad into one, and then apply h once. It would be nice if you get the same result either way, i.e. if h (join x) = h (fmap h x), or

1

h.join=h.fmaph

Taken together, these two laws are exactly the laws of an algebra over the monad. Categorically, an algebra over a monad M is an object A together with a map h : MA → A such that the following diagrams commute:

Note that the categorical definitions says that you have to specify an object as well as a morphism. In the category Hask objects are types. If we want to specify a type class for algebras, it must be a multi-parameter type class where you specify the underlying type as well as the monad - i.e. the algebra doesn’t have to be polymorphic over all types, it is allowed to vary depending on the type of values in the monad.

In terms of do notation, the algebra laws mean that the following program transformation is valid, where xs has type m (m a):

12

h$dox<-xs=h$dox<-xsreturn(hx)x

Examples

Enough theory - what are some examples of algebras? For the list monad, both sum and product are list algebras for any numeric type. You can check that they satisfy the algebra laws, i.e.

12

sum(returnx)==x-- for any x :: asum(joinxs)==sum(fmapsumxs)-- for any xs :: [[a]]

and similarly for product.

Note that the function length isn’t a list algebra, since length (join [[1],[2,3]]) is 3, but length (fmap length [[1],[2,3]]) is 2.

The function head is nearly a list algebra, only failing because it isn’t defined for the empty list. In fact it’s an algebra over the type of non-empty lists.

Similarly maximum and minimum are algebras over the type of non-empty lists.

More generally, if a is an instance of the Monoid type class, then foldr mappend mempty is a list algebra. This is reflected in the definition of the fold function in Data.Foldable, which has type

1

fold::(Foldablef,Monoidm)=>fm->m

showing that fold is an f-algebra for any f that is an instance of Foldable. This gives algebras over Maybe, Either a, Tree and many more.

Given any value a of type r then h m = runReader m a is a Reader r-algebra.

Similarly h m = evalState m a is a State s-algebra, and fst . runWriter is a Writer w-algebra.

For the monad of probability spaces, the expectation is an algebra. If P(PA) is a distribution of probability distributions, then it doesn’t matter if we take inner expectations before taking the outer expectation, or if we collapse the entire distribution first and takes its expectation. This is related to the law of iterated expectation. Note that higher moments aren’t algebras, and nor are central moments.

I had hoped that for the parser monad I could define an algebra by

12345678

str::String-- any string that can be parsedright::Eitherab->bright(Left_)=error""right(Rightb)=bh::Parsera->ahp=right(parsep""str)

However, this fails the algebra laws. In particular, if we have a parser of type Parser (Parser a), the outer parser might need to read some of the input string before the inner parser will match correctly, so we can’t just map over the inner parser. So I wonder if there are any algebras for the parser monad?

For the IO monad, the obvious candidate for an algebra is

1

unsafePerformIO::IOa->a

My intuition is that this shouldn’t be an algebra, i.e. it should fail to respect one of the algebra laws. However, I haven’t been able to construct an example where it fails yet. Anyone have any ideas?

There are many different trees t that all return the same value when you call either runTree t True or runTree t False. If two trees s and t return the same value under runTree for all values of the boolean argument, say that s is equivalent to t and write s ~ t.

This satisfies the axioms for an equivalence relation – reflexivity (every tree is equivalent to itself), symmetry (if s ~ t then t ~ s) and transitivity (if r ~ s and s ~ t then r ~ t). So it makes sense to consider the set of equivalence classes of binary trees under this relation.

What may be surprising is that the type of “equivalence classes of binary trees” is also a monad. To define the monad we have to define return and bind for it. It’s nontrivial to do this in Haskell, but relatively easy mathematically.

Let C (for class) be the operation that maps a tree to its equivalence class, and R (for representative) be the operation that selects a particular tree from an equivalence class. Then it must be true that C(R(x)) = x. It isn’t true in general R(C(t)) = t, but we will always get a tree that’s equivalent to the one we put in.

We define ‘return a’ for the monad to be the equivalence class C(Leaf a).

To define x ≫= f first choose a representative of x and bind it to the function R ○ f using the normal bind operation for trees. Then take the equivalence class of the resulting tree. Symbolically,

So we have a monad - but what monad is it? Interestingly, it is exactly equivalent to Reader Bool.

To see why, note that two trees whose far left branch and far right branch are the same are equivalent (because runTree either traverses all the way down the left of the tree, or all the way down the right). So specifying an equivalence class of trees just requires specifying a pair of values of type a, i.e. the type a2. This is equivalent to the type 2 → a, which is the same as functions Bool -> a, which is the same as Reader Bool a.

Generalizing to all readers

To generalize to arbitrary readers we should consider r-ary trees Tree r a rather than binary trees – that is, trees that have r branches at each node. To deal with fact that r might be infinite we work with functions r -> Tree r a, which are equivalent to (Tree r a)^r, an r-tuple of trees. In Haskell,

and again we can write a function runTree that, given an r, takes the same branch of the tree repeatedly until it reaches a value:

123

runTree::Treera->r->arunTree(Leafa)r=arunTree(Nodeg)r=runTree(gr)r

Consider two trees equivalent if they give the same result when fed to runTree r, for all values of r. Then the equivalence class of trees is a monad in the same way as before, and this monad is exactly the Reader r monad. To make this really obvious we could rename the data type and constructors, and define the runReader and ask functions:

Thoughts

One reason for thinking of readers as trees rather than using the more familiar definition is that the monadic join operator on trees T maps T ○ T structures of size n to T structures of size n (where by ‘size’ I mean the number of leaves in the tree). This isn’t true for monads in general – for example, two layers of Reader Bool a is isomorphic to a4, which gets mapped to a2 by monadic join. So join preserves the combinatorial interpretation of the tree.

This is related to some stuff I’m thinking about to do with combinatorial species and how they relate to functional programming. A comment by Brent Yorgey sparked the idea for this post.

It seems likely that given any monad T and a family of equivalence relations ~a on Ta, the set of equivalence classes under the relation ~ is always a monad, but I haven’t thought much about it.

I think this is all extremely similar to a post written a few years back by Dan Piponi where he worked throught he details for the state monad instead of reader.

Most languages have some form of printf function that lets you print formatted strings. In Python it looks like this

1

print"The %s river is %d miles long"%("Amazon",4000)

In C it looks like this

1

printf("The %s river is %d miles long\n","Amazon",4000)

What they have in common is that they take a format string as their first argument, followed by a variable number of arguments, of varying type. The arguments are formatted, interpolated into the string and then printed to the screen.

The immediate problem you face if you want to write printf in Haskell is the type system. All functions have a type, and the type determines what arguments it can accept. You can write polymorphic functions that accept a range of input types, but writing a function that accepts a variable number of arguments? That seems tough.

I recently read through the source of the Text.Printf module, and was impressed by how concise and self-contained it is, and especially by the neat trick that allows printf to take a variable number of arguments. In this post I’ll walk through creating a simplified, cut-down version of Text.Printf to give a flavor of the module.

I’ll need the following uncontroversial language extensions:

1

{-# LANGUAGE TypeSynonymInstances, FlexibleInstances #-}

First define a data type to hold things that are possible arguments to printf (in our cut-down module, we will only care about interpolating strings and integers into the format string).

12

dataUPrintf=StringString|IntegerInteger

With this in hand, we could write the universal printf function, uprintf. This takes a format string and a list of arguments to be interpolated into the string, and returns the formatted string. Its type signature is therefore:

1

uprintf::String->[UPrintf]->String

All printf functionality will eventually be implemented in terms of this function. We’ll hold off on giving its implementation until later in the module, and focus on how we can write the polyvariadic, polymorphic printf function.

We get polymorphism using type classes. Define a type class for types that can be converted into a UPrintf, along with a couple of useful instances:

There are multiple valid return types for the printf function. For example, we might want to return a String, or an IO action that outputs to a file or to stdout, or more generally any instance of MonadIO. So introduce a type class for valid return types:

12

classPrintfTypetwherespr::String->[UPrintf]->t

The spr function takes a format string and a list of suitable arguments to be interpolated into the string, and returns something. The key is in what the valid return types are. Certainly, it makes sense to return a string by calling uprintf (we reverse the argument list, for reasons that will become clear in a bit)

Here’s the cool part. If a is an instance of PrintfArg and t is an instance of PrintfType, then a function a -> t is also a valid instance of PrintfType – you should picture it as consuming the a, and outputting another valid return type (which might be another function).

This is one way to implement polyvariadic functions in Haskell. Functions are curried by default so it’s easy to incrementally consume arguments, and return a function that consumes the rest of the arguments.

This is why we need to reverse the argument list when we make the final call to uprintf – the arg list acts like a stack, so that the last arguments pushed onto it are the first ones out. We need to reverse the stack at the end so that we format the arguments in the right order.

Now to write printf, just call spr with an empty argument list – the supplied arguments will be consed onto the list by spr until there are none remaining, and uprintf is called:

12

printf::PrintfTypet=>String->tprintfformatString=sprformatString[]

The rest is just details. The uprintf function incrementally consumes the format string, building the output up as it goes. If it exhausts either the format string or the argument string while the other still has elements remaining, then it throws an error. Otherwise it watches for the % signs that signal a value to be interpolated, and calls the correct formatting function:

123456789

uprintf::String->[UPrintf]->Stringuprintf""[]=""uprintf""_=fmterroruprintf('%':_:_)[]=argerroruprintf('%':c:cs)(u:us)=fmtcu++uprintfcsusuprintf(c:cs)us=c:uprintfcsusfmterror=error"Reached end of format string with args remaining."argerror=error"Insufficient args for format string"

The fmt function looks at what type of value is to be interpolated, and makes a call to the appropriate function for converting the argument into a string:

Our simple module only allows for strings and integers to be interpolated into the format string, and doesn’t allow for complicated format strings like "%8d" or "%6.2f". These features can be implemented by adding more cases to fmt – the fundamental architecture that allows for polymorphic, polyvariadic functions is in place.

I wrote this as a comment on Hacker News before polishing it up to put it here – see original.

The question “How do you swap two variables without using a temporary variable?” is a favorite of unimaginative interviewers everywhere. If you haven’t seen it, the typical solution is three bitwise exclusive or (xor) operations in succession:

That looks a bit different - instead of performing the same operation three times, we performed an addition followed by two subtractions.

Can we describe all possible solutions? Sure - you just need the concept of a mathematical group. A group is a set of values G together with a binary operation (traditionally written as a filled dot · ) that satisfy four axioms:

Closure – for all values a and b in G their multiplication a · b is also in G.

Associativity – for all a, b and c in G you have (a · b) · c = a · (b · c).

Identity – there is a value e in G such that for every a in G, the equation e · a = a · e = a holds.

Inverses – for every a in G there is an inverse a-1 such that the equations a-1 · a = e = a · a-1 hold.

If the types of a and b form a group, then we can pull off the ‘xor trick’ like this (using the left arrow for assignment)

Note that it’s important that we write b-1 · a in the last assignment and not a · b-1 since you can’t in general assume that the group you’re working with is commutative.

Groups come up a lot in programming, so there are many ways to pull off the variable swapping trick:

It works when a and b are integers, since integers form a group where the operation is addition, the identity is 0 and the inverse of n is -n. Remarkably we don’t have to worry about integer overflow when adding two machine ints, since integers also form a group under addition modulo N.

It works when a and b have fixed-length binary representations, since bitstrings form a group where the operation is bitwise xor (the ^ operator), the identity is a string of 0s and every string is its own inverse (this last property is why when you do this variable swap trick with xor, you end up repeating the same operation three times).

It almost works when a and b are nonzero floats – it’s nearly (but not quite) true that nonzero floats form a group where the operation is multiplication, the identity is 1 and the inverse of x is 1/x. It’s not quite a group, because of issues like over- and underflow, NaN values and non-associativity of floating point multiplication.

It works when a and b are matrices of fixed dimensions, which form a group where the operation is addition, the identity is the zero matrix and the inverse is given by elementwise negation.

It almost works when a and b are square, nonsingular matrices of the same dimension, which form a group where the operation is matrix multiplication, the identity is the identity matrix and inverses are given by the matrix inverse (again, this only works modulo floating point weirdness).

It works when a and b are permutations, which are a group under composition, with identity given by the null permutation and inverses are ‘undo all the swaps’.

So there you go - a multitude of ways to swap two variables without using a temporary variable. With math!

Edit: It was pointed out to me by a commenter on Reddit that you don’t need the full properties of a group. A quasigroup is a set Q with three binary operations: multiplication *, left division \ and right division / which satisfy the following axioms:

Every group is automatically a quasigroup if we define a * b = a · b, a / b = a · b-1 and a \ b = a-1 · b. So quasigroups are a generalization of groups. If we have a quasigroup then the xor trick can be expressed as

You can follow through the assignments using the quasigroup axioms where appropriate to prove that this really doe swap the two variables.

It’s exciting that every Latin square defines a quasigroup, and complete solutions to Sudoku are examples of Latin squares. Therefore every problem of the type “swap two variables without using a temp variable” can be solved using a sufficiently large Sudoku.

I gave a talk on this subject at London Haskell in November 2012. A video of the talk is on YouTube and slides are on GitHub. This is Part 3 in a series; you can also read Part 1 and Part 2.

Last time I described recursive types, showed how they can be written as algebraic equations, and you you can solve the equations to deduce interesting facts about types. In this post I’ll explain what it means to do calculus with types.

Zippers

The regular Haskell list is a linked list. While they are simple to define and convenient to work with, they have disadvantages. If you want to append an element to the list of length n, it takes O(n) time. Accessing or modifying elements partway through the list is also slow. Linked lists don’t support efficient random access.

You could solve this with an array, but immutable arrays have their own problems. Data sharing is harder, for example, if you want the elements of the array to be contiguous in memory.

Fortunately, not all applications require random access. Sometimes you just want to keep track of your location within a list, with the ability to move around in the list, possibly modifying elements as you go.

This is exactly the functionality offered by a zipper, which is a data structure with a pointer to a distinguished location, called the focus.

List Zippers

You can write a zipper for any data structure, but I’ll focus on lists.

To store the pointer we could wrap the list up with an Int that stores the location currently in focus. That’s fraught with all kind of difficulties, though – and it doesn’t meet our requirements for efficient access to the element in focus.

Instead, we define a new type composed of the element in focus, coupled with a pair of lists - one of which contains the elements before the focus, and the other the elements after the focus. Schematically it looks like this:

The element in focus is in the center, and the two lists on either side. In Haskell we would write:

data Zipper a = Z [a] a [a]

For a zipper to be useful we need to define functions that move the focus left and right. For efficiency reasons we store the first list in reverse order – we need to efficiently add and remove elements from the end of the list, rather than the head:

Because a zipper is a product type, it’s possible to tease it apart, decomposing it into a single piece of data of paired with two lists. The two lists tell us where the focus is, and we complete the zipper by filling in the piece of data that’s in focus. Algebraically we have

$$ Z(a) = a \cdot L(a) ^ 2 $$

which is clearly equivalent to the previous expression. The first part of the product is just a piece of data, and we can think of the second as a data structure with a hole. When we ‘fill in’ the hole by pairing it with some data, we get a zipper. This ‘data structure with a hole’ goes by the name of a one-hole context. For the list zipper, this looks like:

The empty circle is a hole, representing ‘no data’ – it’s a placeholder for data. The type that contains no data is the unit type (), so we can say that the type of a hole is ().

There’s nothing special about list zippers – any zipper can be decomposed into a one-hole context, and a piece of data that fills the hole.

One-Hole Contexts for Tuples

I said that you can write a zipper for any data type, so let’s write some zippers for tuples – that is, types of the form \( a ^ n \) for some whole number n.

One-Tuples

The simplest case is a one-tuple, which is just a single piece of data. A zipper for a one-tuple is trivial - there’s only one location for data, so there’s only one focus – the zipper for a one-tuple a is just a.

But remember that a zipper is a piece of data paired with a one-hole context, so what is the one-hole context? There’s only one place to put the hole, and once we’ve put a hole in a single piece of data, we’re just left with the hole! Schematically, the pairing of the data with the one-hole context looks like this:

$$ ( a, \circ )$$

The type of the one-hole context has to be (), or 1, so that when we paired it with the data a, we get a back again.

Two-Tuples

In a two-tuple there are two possible locations for the hole. Remembering that a sum represents a choice, either the left constructor or the right constructor, the pairing of data and one-hole context looks like:

$$ \Big( a , \; (\circ, a) + (a,\circ) \Big)$$

Remembering that the type of the hole is 1, the type of the one-hole context is a + a, or 2a.

Three-Tuples

In a three-tuple there are three possible locations for the hole. The pairing of data and one-hole context looks like

$$ \Big( a, \;(\circ,a,a) + (a,\circ,a) + (a,a,\circ) \Big) $$

The type of the one-hole context is a² + a² + a², or 3a².

Finding the pattern

Perhaps you’ve already seen the pattern. In the left column are the types of n-tuples for n = 1, 2, 3 and in the right column are their corresponding zippers:

If you’ve studied any calculus, you will immediately spot the pattern – for an n-tuple of type \( a ^ n \), the type of the zipper is \( n a ^ {n-1} \), which is exactly the derivative of \( a ^ n \).

In fact, this pattern always applies! This was noted by Conor McBride in a paper, whose title gives the key insight:

“The Derivative of a Regular Type is its Type of One-Hole Contexts”

If you take a type expression and differentiate it, the result is the type of one-hole contexts for that type. This makes it easy to figure out how to write zippers for arbitrary data types.

Calculus of Types

I’ll use the notation \( \partial_a \) to mean ‘take the derivative with respect to a’. You may already be familiar with the notation \( d / da\) for differentiation – this is no different, but it has the advantage that it can be typeset on one line!

How far can we push the analogy between taking the derivative of a type and finding its type of one-hole contexts? Is it valid to differentiate a type in all contexts, or do we have to be careful? We should certainly check that the operation of differentiation makes sense when applied to the types we already know about, for example.

Constants

Here’s a simple rule from calculus – the derivative of any constant is zero:

$$ \partial_a({\rm const}) = 0 $$

To interpret this for types, remember that the derivative operator d/da makes holes in a data structure that contains data of type a. This equation is saying that if a type doesn’t have any data of type a, then its one-hole context is Void, i.e. you can’t create any instances of it.

Sums

The sum rule for derivatives is that for any f(a) and g(a), differentiating the sum is the same as differentiating individually and then summing:

$$ \partial_a (f(a) + g(a)) = \partial_a f(a) + \partial_a g(a) $$

This tells us that making a hole in the sum type is equivalent to making a hole in each of the summands, and taking their sum. In more explicit language, if F a and G a are types with one-hole contexts DF a and DG a, then the one-hole context of

type Sum a = Add (F a) (G a)

is

type DSum a = Add (DF a) (DG a)

Products

Interpreted as a statement about types, this tells us that making a hole in the product of two types is equivalent to either making a hole in the first type (and leaving the second one as it is) or making a hole in the second type (and leaving the first one as it is). That is, the one-hole context of

type Prod a = (F a, G a)

is

type DProd a = Add (DF a, G a) (F a, DG a)

Composition

We haven’t yet met composition. This occurs when you have a data structure F that holds elements of type G a. In Haskell, we could write this as

data Compose f g a = Compose (f (g a))

where f and g have kind * -> * and a has kind *, although its simpler to keep the syntax light and just write f (g a).

In calculus the chain rule tells us how to differentiate compositions:

As a type equation, this says that to make a hole in a layered data structure, we need a product – one half of the product tells us where the hole is in the outer structure, and the other half keeps track of the hole in the inner structure. That is, the one-hole context of

type Comp a = F (G a)

is

type DComp a = (DG a, DF (G a))

Deriving Zippers

I said that knowing that the derivative of an ADT corresponds to its type of one-hole contexts makes it easy to derive zippers for arbitrary data types. Now I’ll justify that assertion.

List Zippers

The type of lists is

$$ L(a) = \frac{1}{1-a} $$

We can differentiate this using the quotient rule and the chain rule, finding

$$ \partial_a L(a) = \frac{1}{(1-a) ^ 2} = L(a) ^ 2 $$

which says that the one-hole context of a list is a pair of lists, exactly as we saw earlier. To get the list zipper we take the product of an a with a pair of lists L(*a).

Tree Zippers

Remember that the type of binary trees T(a) is defined by the recursive equation

$$ T(a) = 1 + a \cdot T(a) ^ 2 $$

We could solve for T(a) as we did in the previous post, and differentiate the result. However, it’s much easier to use implicit differentiation to differentiate the above expression, to get

To make sense of this, remember that the type 1 / (1 - x) is synonymous with the type L(x), of lists of x. So the tree zipper is

$$ \partial T (a) = T(a) ^ 2 \cdot L\big(2 \cdot a \cdot T(a)\big) $$

That is, the one-hole context of a tree is a product of two trees, and a list holding elements of type (Bool, a, Tree a). It looks like this:

The hole is our current location, the focus. Beneath the hole are two trees, which hold the part of the tree that we haven’t yet traversed. In this picture they are both single-element trees.

The sequence of red nodes leading back to the top of the tree is the past - the elements of the tree that we traversed to end up at the hole. Each node is associated with a Bool, which tells us whether we took the left or the right branch after passing through that node, and a Tree a which contains all the elements that we missed by not going down the other path.

Exercise: Code up the zipper for binary trees, remembering that a zipper is a one-hole context paired with data. You’ll need to write functions left and right to take the left or right path at a particular node, and up to go back up the tree.

Exercise: Rose trees are trees with arbitrarily many branches from each node. In Haskell you would define them by

data Rose a = Rose a [Rose a]

What does a rose tree zipper look like?

Next Time

In this post we learned what it means to take the derivative of an ADT, and how to manipulate the type algebra to automatically derive zippers for arbitrary data types.

Next time I’ll talk about assigning a meaning to subtraction and division of types – and I might get on to mentioning combinatorial species.

See how we’ve replace the data declaration with a type declaration? That means that Maybe a is no longer a new type – it’s just a synonym for a type we already knew about.

But we can take it further. Notice that Nothing only has one value, so it’s equivalent to (). Similarly, Just is a container with a single value of type a, so it’s equivalent to a. We therefore have

type Maybe a = Add () a

But that’s the same as saying that Maybe a is the same as 1 + a. All that Maybe does is add one more possible value to a type.

Recursive Types

Lists

The basic list in Haskell is a linked list. A list of as is either empty, denoted [], or it is a cons of a single a onto another list of as, denoted a : as. If we wanted to define lists ourself, we could write something like

data List a = Nil | Cons a (List a)

Let’s take a second to look at the structure of this declaration. Just like Maybe, the List type is the sum of two simpler types.

The first summand is Nil, a nullary constructor which is equivalent to ().

The second summand is Cons a (List a), which is a product consisting of an a and a list of as.

The algebraic form, if we write lists as L(a), is

$$ L(a) = 1 + a \cdot L(a) $$

It seems as though we should be able to write the list type in Haskell as

type List a = Add () (a, (List a))

but in fact that won’t compile. The reason is that type synonyms are expanded out at compile time, after type checking, but before compilation. This definition would never finish expanding – it would just keep growing as

Add () (a, Add () (a, Add () (a, ...)))

and so on. The reason that this doesn’t work is to do with the way that Haskell handles recursive type definitions (like most languages with sophisticated type systems, it uses isorecursive rather than equirecursive types).

The way around it is to use a newtype declaration instead of a type declaration, and wrap the type inside a new constructor L:

newtype List a = L (Add () (a, List a))

This is just to satisfy the type checker – when the code is compiled the additional constructor will be optimized away, and what we’re left with will be identical to the type declaration above.

If I write the type List a as L(a) then this declaration for lists says that

$$ L(a) = 1 + a \cdot L(a) $$

To see what lists really are, we can start expanding the definition by repeated substitution. In the expansion, whenever I see a type of the form a · a, I’ll replace it with a² to save space.

Exercise: the expression a² can denote a pair (a, a) or a function 2 → a. Show these are the same.

This is telling us that a list of a is either the empty list, or it’s a list containing a single a, or a list of two a, or three a, etc. Maybe you already knew that - but it’s neat that the algebra is telling us!

But now, here is something really cool. Let’s start with our equation for lists, and instead of repeatedly substituting, we’ll forget for a moment that the objects are types and pretend that they’re arbitrary mathematical expressions. That means we’re justified in using whatever techniques we like to “solve for L(a)”.

First subtract a · L(a) from both sides:

$$ L(a) - a \cdot L(a) = 1 $$

Now factor out the L(a) on the left-hand side:

$$ (1 - a) \cdot L(a) = 1 $$

And finally, divide both sides by 1 - a:

$$ L(a) = \frac{1}{1-a} $$

This looks nonsensical, because we don’t know what it means to subtract one type from another, and we certainly don’t know what it means to divide one type by another. But is it telling us something interesting anyway?

If you studied calculus, you may remember that many functions have an expansion as a Taylor Series. We can ask Wolfram Alpha what the Taylor series of 1 / (1 - a) is, and it tells us:

$$ L(a) = 1 + a + a ^ 2 + a ^ 3 + a ^ 4 + \cdots $$

That is, the series expansion of L(a) is exactly what we deduced from our repeated substitutions. Although we abused the algebra horribly, by performing operations that are completely unjustified on types, we still ended up with a sensible answer.

Trees

Consider the type of binary trees with values at the nodes. In Haskell you can write this as

data Tree a = Empty | Node a (Tree a) (Tree a)

In a way that is probably becoming familiar, we can look at this as a sum of two types - a nullary type equivalent to (), and a product type. This time it is a product of three terms, but that’s not a problem - we just use a nested product, of the form (a, (b, c)).

The definition in terms of operators that we already know about (again, using a newtype to allow us to have a recursive definition) is

newtype Tree a = T (Add () (a, (Tree a, Tree a)))

In algebraic language, with T(a) for the type of trees, we would write

$$ T(a) = 1 + a\cdot T(a)^2 $$

To get a better handle on what trees are, we could apply repeated substitution in the same way we did for lists, but it would be much messier. Instead, can we apply the trick of rearranging the equation to solve for T(a)?

First, rearrange to get everything on one side of the equation:

$$ a \cdot T(a)^2 - T(a) + 1 = 0 $$

We can now recognize this as a quadratic equation for T(a), which we can solve using the quadratic formula, to get

$$ T(a) = \frac{1 - \sqrt{1 - 4 a}}{2a} $$

This makes even less sense than the equation for lists. What on earth does it mean to take the square root of a type? But, keeping calm, we again ask Wolfram Alpha for the series expansion, and it tells us

$$ T(a) = 1 + a + 2a ^ 2 + 5a ^ 3 + 14a ^ 4 + \cdots $$

How do we make sense of this? The first two terms tell us that a tree can either be empty (if it is Empty) or it can have one value of type a (if it is Node a Empty Empty).

Then the next term tells us that a tree can contain two values of type a, in two different ways, while the next term tells us that there can be three values of type a in five different ways.

But that’s exactly what we notice if we enumerate all of the binary trees, grouping them by the number of values they contain. There is one tree with no values and one tree with one value. Then there are two trees with two values and five trees with three values – as in this picture (borrowed from the excellent book Analytic Combinatorics by Flajolet and Sedgewick).

The equation counts the number of different binary trees that can exist! This counting property is related to the simple value-counting examples we saw in the first post, and also related to combinatorial species, as written about extensively by Brent Yorgey. Species and types have a lot in common, although they are not the same thing.

Seven Trees In One

If we restrict to trees that contain only the unit type, i.e. Tree (), then the a appearing in T(a) is equal to 1, and we can write the defining equation of trees as

$$ T = 1 + T^2 $$

By playing with the algebra, and repeatedly using the fact that \( T ^ 2 = T - 1 \), we can deduce that

which, when interpreted in the language of types, says that a six-tuple of trees is equivalent to the unit type. In other words, there is only one six-tuple of trees.

This is clearly nonsense, so what’s gone wrong? Even more mysterious is that if we just multiply both sides of the equation by T, we get

$$ T ^ 7 = T $$

which isn’t nonsense - it says that a seven-tuple of trees is equivalent to a single tree.

At first glance this isn’t a profound result. Any two types that have a countably infinite number of possible values are equivalent - that’s what being countable means.

However, it’s more subtle than this. In the paper Seven Trees In One, Andreas Blass not only shows how to find an explicit map from seven trees to one and back again (essentially he tells you how to write the from and to functions from the previous post), but he shows that the functions never have to look more than four levels deep into any of the trees.

Finally, he explains why deducing that \( T ^ 7 = T \) is valid, but \( T ^ 6 = 1 \) is not. It turns out that you can turn a derivation like the one I gave above, which uses subtraction, into an ‘honest’ proof that doesn’t use subtraction, and hence is valid for types – but only when the power of T that you start with is one more than a multiple of 6. Hence you can reduce \( T ^ 7 \) to \( T \) (because 7 = 6 + 1) but you can’t reduce \(T ^ 6\) to 1.

An Explanation?

It would be fair if you come away thinking that this post has raised more questions than it has supplied answers. For example,

What does it mean to subtract, divide or take square roots of types?

What else you can do to a type equation that still has a meaningful computational interpretation?

As a teaser for the last question, a mathematical structure that models types is the semiring. That’s just a way of saying that you can add and multiply objects, and you have objects corresponding to 0 and 1. There is a paper by Marcelo Fiore and Tom Leinster that shows that if you can start with a complex numbert defined by the equation

$$ t = p(t) $$

for some polynomial p, and deduce

$$ q_1(t) = q_2(t) $$

where q1 and q2 aren’t constant (they can’t just be 1) then the same result is true for semirings. In particular, it is true for types, and you can find a proof that doesn’t use subtraction, or any other operators that are illegal for types.

Next time, I’ll explain what zippers are, and describe how to do calculus with types.

I gave a talk on this subject at London Haskell in November 2012. A video of the talk is on YouTube and slides are on GitHub.

In this series of posts I’ll explain why Haskell’s data types are called algebraic - without mentioning category theory or advanced math.

The algebra you learned in high school starts with numbers (e.g. 1, 2, 3 …) and operators (e.g. addition and multiplication). The operators give you a way to combine numbers and make new numbers from them. For example, combining 1 and 2 with the operation of addition gives you another number, 3 - a fact that we normally express as

$$ 1 + 2 = 3 $$

When you get a little older you are introduced to variables (e.g. x, y, z …) which can stand for numbers. Further still, and you learn about the laws that algebra obeys. Laws like

$$ 0 + x = x $$
$$ 1 \cdot x = x$$

which hold for all values of x. There are other laws as well, which define properties of numbers or of operations.

When mathematicans talk about algebra, they mean something more general than this. A mathematical algebra has three parts:

Objects are the ‘things’ of the algebra. The collection of objects defines what we’re talking about.

Operations give us ways to combine old things to make new things.

Laws are relationships between the objects and the operations.

In high school algebra the objects are numbers and the operations are addition, multiplication and friends.

The algebra of Haskell types

In the algebra of Haskell types, the objects are types, for example Bool and Int. The operators take types that already exist, and generate new types from them. An example is the type constructor Maybe. It’s not a type itself, but you use it to create types - for example Maybe Bool and Maybe Int, which are types. Another example is Either, which creates a new type from two old types - for example Either Int Bool.

Counting

A connection to the more familiar algebra of numbers can be seen by counting the possible values that a type has. Take Bool, defined by

data Bool = False | True

There are two values that an object of type Bool can have – it is either False or True (technically it could also be undefined – a fact that I’m going to ignore for the rest of the post). Loosely, the type Bool corresponds to the number ‘2’ in the algebra of numbers.

If Bool is 2, then what is 1? It should be a type with only one value. In the computer science literature such a type is often called Unit and defined as

data Unit = Unit

In Haskell there is already a type with only one value - it’s called () (pronounced “unit”). You can’t define it yourself, but if you could it would look like

data () = ()

Using this counting analogy, Int corresponds to the number \(2 ^ {32}\), as this is the number of values of type Int (at least, it is on my machine).

Addition

In principle we could types corresponding to 3, 4, 5 and so on. Sometimes we might have a genuine need to do this - for example, the type corresponding to 7 is useful for encoding days of the week. But it would be nicer if we could build up new types from old. This is where the operators of the algebra come in.

A type corresponding to addition is

data Add a b = AddL a | AddR b

That is, the type a + b is a tagged union, holding either an a or a b. To see why this corresponds to addition, we can revisit the counting argument. Let’s say that a is Bool and b is (), so that there are 2 values a and 1 value for b. How many values of type Add Bool () are there? We can list them out:

addValues = [AddL False, AddL True, AddR ()]

There are three values, and 3 = 2 + 1. This is often called a sum type. In Haskell the sum type is often called Either, defined as

data Either a b = Left a | Right b

but I’ll stick with Add.

Multiplication

A type corresponding to multiplication is

data Mul a b = Mul a b

That is, the type a · b is a container holding both an a and a b. The counting argument justifies the correspondence with multiplication - if we fix a and b to both be Bool, the possible values of the type Mul Bool Bool are

There are four values, and 4 = 2 x 2. This is often called a product type. In Haskell the product is the pair type:

data (,) a b = (a, b)

but I’ll stick with Mul.

Zero

Using addition and multiplication we can generate types corresponding to all the numbers from 1 upwards - but what about 0? That would be a type with no values. It sounds odd, but you can define such a type:

data Void

Notice that there are no constructors in the data definition, so you can’t ever construct a value of type Void – it has zero values, just as we wanted!

Laws in the algebra of Haskell types

What are the laws for the types we’ve just defined? Just like in the algebra of numbers, a law will assert the equality of two objects - in our case, the objects will be types.

However, when I talk about equality, I don’t mean Haskell equality, in the sense of the (==) function. Instead, I mean that the two types are in one-to-one correspondence – that is, when I say that two types a and b are equal, I mean that you could write two functions

from :: a -> b
to :: b -> a

that pair up values of a with values of b, so that the following equations always hold (here the == is genuine, Haskell-flavored equality):

to (from a) == a
from (to b) == b

For example, I contend that the types Bool and Add () () are equivalent. I can demonstrate the equivalence with the following functions:

I’ll use the triple equality symbol, ===, to denote this kind of equivalence between types.

Laws for sum types

Here are two laws for addition:

Add Void a === a

which says that there are as many values of type Add Void a as there are of type a, and

Add a b === Add b a

which says that it doesn’t matter which order you add things in. These laws are probably more familiar to you in the algebra of numbers as

$$ 0 + x = x $$
$$ x + y = y + x $$

If you fancy an exercise, you can demonstrate the correctness of the laws in the Haskell algebra – either with a counting argument, or by writing the functions from and to.

Laws for product types

There are three useful laws for multiplication:

Mul Void a === Void

which says that if you multiply anything by Void, you get Void back,

Mul () a === a

which says that if you multiply by () you don’t change anything, and

Mul a b === Mul b a

which says that it doesn’t matter which order you multiply in. The more familiar forms of these laws are

$$ 0 \cdot x = 0 $$
$$ 1 \cdot x = x $$
$$ x \cdot y = y \cdot x $$

Two more exercises: (i) prove the validity of these laws in the Haskell algebra, and (ii) explain why we don’t need laws of the form

Mul a Void === Void
Mul a () === a

There’s also a law that relates the addition and multiplication operators:

Mul a (Add b c) === Add (Mul a b) (Mul a c)

This one is a bit trickier to reason about, but writing the corresponding from and to functions isn’t too hard. The arithmetic version of this law is the friendlier-looking

$$ a \cdot (b + c) = a \cdot b + a \cdot c $$

called the distributive law.

Function types

As well as concrete types like Int and Bool, in Haskell you also have function types, like Int -> Bool or Double -> String. How do these fit into the algebra?

To figure this out we can go back to the counting argument. How many functions of type a → b are there?

Let’s be concrete, and fix a and b to both be Bool. The value False can map to either False or True, and similarly for the value True - thus there are \(2 \cdot 2 = 2 ^ 2 = 4\) possible functions Bool -> Bool. To be really explicit, we could enumerate them:

A common question amongst people learning Haskell is whether I/O is pure or not. Haskell advertises itself as a purely functional programming language, but I/O looks like it’s inherently impure - for example, the function getLine, which gets a line from stdin, returns a different result depending on what the user types:

Prelude> x <- getLine
Hello
Prelude> x
"Hello"

How can this possibly be pure?

In this post I want to explain exactly why I/O in Haskell is pure. I’ll do it by building up data structures that represent blocks of code. These data structures can later be executed, and they cause effects to occur - but until that point we’ll always work with pure functions, never with effects.

Let’s look at a simplified form of I/O, where we only care about reading from stdin, writing to stdout and returning a value. We can model this with the following data type:

123

dataIOActiona=Returna|PutString(IOActiona)|Get(String->IOActiona)

That is, an IOAction is one of the following three things:

A container for a value of type a,

A container holding a String to be printed to stdout, followed by another IOAction a, or

A container holding a function from String -> IOAction a, which can be applied to whatever String is read from stdin.

Notice that the only terminal constructor is Return – that means that any IOAction must be a combination of Get and Put constructors, finally ending in a Return.

Some simple actions include the one that prints to stdout before returning ():

1

puts=Puts(Return())

and the action that reads from stdin and returns the string unchanged:

1

get=Get(\s->Returns)

To build up a language for doing I/O we need to be able to combine and sequence actions. We want the ability to perform an IOAction a followed by an IOAction b, and return some result.

In fact, we could have the second IOAction depend on the return value of the first one - that is, we need a sequencing combinator of the following type:

1

seqio::IOActiona->(a->IOActionb)->IOActionb

We want to take the IOAction a supplied in the first argument, get its return value (which is of type a) and feed that to the function in the second argument, getting an IOAction b out, which can be sequenced with the first IOAction a.

That’s a bit of a mouthful, but writing this combinator isn’t too hard. When we reach the final Return, we apply the function f to get a new action. For the other constructors, we keep the form of the action the same, and just thread seqio through the constructor:

It will surprise no one to learn that IOAction is a monad. In fact we’ve already defined the necessary bind operation in seqio, so getting the Monad instance is trivial:

123

instanceMonadIOActionwherereturn=Return(>>=)=seqio

The main benefit of doing this is that we can now sequence actions using Haskell’s do notation, which desugars into calls to (>>=), and hence to seqio. Our earlier hello example can now be written as:

123456

hello2=doput"What is your name?"name<-getput"What is your age?"age<-getput("Hello, "++name++"!")put("You are "++age++" years old!")

Remember though, that this is still just defining a value of type IOAction () - no code is executed, and no effects occur! So far, this post is 100% pure.

To see the effects, we need to define a function that takes an IOAction a and converts it into a value of type IO a, which can then be executed by the interpreter or the runtime system. It’s easy to write such a function just by turning it into the approprate calls to putStrLn and getLine.

You can now load up GHCi and apply run to any action – a value of type IO a will be returned, and then immediately executed by the interpreter:

*Main> run hello
What is your name?
Chris
What is your age?
29
Hello Chris!
You are 29 years old

Is there any practical use to this?

Yes - an IOAction is a mini-language for doing I/O. In this mini language you are restricted to only reading from stdin and writing to stdout - there is no accessing files, spawning threads or network I/O.

In effect we have a “safe” domain-specific language. If a user of your program or library supplies a value of type IOAction a, you know that you are free to convert it to an IO a using run and execute it, and it will never do anything except reading from stdin and writing to stdout (not that those things aren’t potentially dangerous in themselves, but…)