2008-08-15

As Haskell gains popularity, a dozen articles or more a month about them make the front page at the Programming reddit. Most deal with the necessity of using and understanding monads to control side-effects in Haskell. This is only natural and to be expected since, in Haskell, they're a language design decision, and as such as inescapable as laziness and as structural as the predefined operator precedence.

My background is in OCaml (gratuitous biographical aside: I found a notebook with OCaml code I wrote in '96), and so I view the IO and attendant monads with some disdain. Let me explain myself: my view is that using them to control side-effects is a rather pedestrian application of monads.

There's another, much higher-level view of them, and that is monads as categorical constructions. Category theory, by its very nature as a mathematical framework, is a language in which to express relations and operations between things. It's the mathematical equivalent of the GoF's patterns. So, reading an article like this one left me with the slightly uneasy feeling of having simultaneously not understood much of it (because I don't know Haskell, and because the article has no context to speak of) and knowing that an isomorphism between the IO monad and anything else cannot allow you to "weave in and out of using IO": freeness demands that the "other" side of the isomorphism contain, encode or fake enough of IO so as to be undistinguishable from it (the type of IO' as presented in the article lent me evidence of this). This is where category theory comes handy: understanding by pattern-matching.

The practical upshot of this is that it enables a third view of monads, Moggi's, as capsules of meaning that are, if not composable, certainly pluggable. First in Notions of computation and monads, then in Monads and Effects (whence IO and friends), Moggi shows that monads are the right tool for the (hard) job of making your code independent of, or at least parameterizable by, semantics. Monads are plug-ins.

In what way are monads plug-ins? Well, a plug-in must have two things: specific behavior that you can invoke to augment your program in unforeseen ways even after you've written it, and a standard interface by which the program can call it. Monads have a rather minimalist interface:

First comes a type, α m: this is the "protocol" by which your program and the plug-in communicate without one having to "know too much" (or more than the standardized interface allows) about the other. It lets the monad say, in effect: "for me to be arbitrary but compatible, you'll have to accept that I do things my way". The monad gets to have its "private space" in which to do its job, without fear that the program will mess up its stuff.

Then comes the method return, that the program uses to pass values down to the monad. It lets the code say, "here, have this value for your own use". Finally, there's the operation bind that lets the program and the monad maintain a conversation. With it, the program can say to the monad "oh, when you are done with anything, here's another thing I need you to do", to which the monad responds "OK, I'll take that and give you back the result". This "wiring" aspect to the bind is emphasized in it being usually called >>=, an infix operator.

This is all nice, operational and anthropomorphizing, but hardly illuminating. The fact that the monad functions as a protocol is formalized in the monad laws which every concrete, "plug-in" monad must abide to:

bind (return a) k ≡ ka

bind m return ≡ m

bind (bind mk) h ≡ bind m (fun x -> bind (kx) h)

The first law states that return on the left of bind does nothing. The second law states that return on the right of bind does nothing. The third law means that a bind of binds associates freely. But that's exactly what a monoid is: a "thing" with an associative operation that has a neutral, "do-nothing" element on the left and right of it.

Consider by way of example the concatenation of strings, denoted by ^. There is a special string, "" or the empty string, that "doesn't change" a string when catenated to it: for all strings s, "" ^ s is the same as s, and so is s ^ "". Also, it doesn't matter which string you concatenate before and which after; it's the order in which you concatenate that matters. In symbols, s ^ (t ^ u) is the same as (s ^ t) ^ u.

You might think, what have monoids to do with monads? The answer is that monads "behave" algebraically as monoids. You might think next, aren't monads monoids, then? Well, no. The crucial difference between a monoid and a monad is that, packed inside the return and the bind operations of a monad there is meaning tucked in. It could send output to a console. It could update a memory location. It could try simultaneously different possible computations and return any, or all of them. It could record transactional data. The key is that the monad laws are a contract that allow you, no matter how complicated the "stuff" the monad does underneath, operate with it in a structured, predictable way. Arbitrary behavior under a fixed interface. A plug-in, in short.

If you followed me so far you might have a nagging doubt by now. How do you "escape" the monad, once you use it? That is, return serves to put plain old values in a "wrapper" the monad can use but, how can you get back what the monad did with it? Where's the function with type α m → α? The bad news is that there isn't any… in general. That is, there is no guaranteed way to get back a value from an arbitrary monad. However, many useful monads do provide an escape to "retrieve" the useful work the monad did.

At this point I should show a concrete monad. A classical example (one that I'll eventually need to use) is the state monad. It allows to "plug-in" statefulness to an otherwise "pure" or "stateless" function:

This monad is multi-parametric, in the sense that not only the values your function computes are arbitrary (the α in the monad's type), but the states you might want to track are too. Suppose you want to tally the arithmetic operations a function uses to calculate a value. The state can be as simple as a single integer, or as complex as a tuple recording separate counts for additions and subtractions, multiplications, divisions, square roots and so on. Whichever the case, the key to this monad is that an impure function can be split into a pure part and a state transformer part that takes an input state and returns an output state together with the value of the function. With that, it's not difficult to see that if return is to be a neutral element, it must pass its input state unchanged. In other words, the effect of a value is nil. In yet other words, a value is always pure. Also, bind must keep track of intermediate states between effects, and thread them appropriately.

Immutable state is no state at all but just another value, so this monad comes with a pair of operations get and put that allow your program to retrieve the state at some point of your function, and modify it. Also, after your computation terminates, you should be able to retrieve not only its final value with eval but also to access the final state with run. This variant assumes you'll want one or the other; it is possible to get both at once without having to re-compute the function.

Note that get is a monadic value, not a function. This might not make sense unless you look into the monad and see how it is constructed. For a less operational view, think of get as representing, by itself, whatever the state is at that point in the program. Since it's a monadic value, the only useful thing to do with it is to bind it to a function:

This is all quite tedious to repeat each time I need a monad. Fortunately, most of the operations you can build out of monadic bind and monadic return are structural, and so generic on the particular monad. Enter the Monad functor:

seq allows you to invoke the monad on the left not for its value but for its effect. join lets you "squash down" a monadic monad to the underlying monad; the fact that it is generic and independent of the underlying semantics means that the monad itself, viewed as a categorical functor, is idempotent, that is, it's useless to monadize a monad. fmap "maps" a function inside a monad, which effectively makes a functor out of a monad. liftmn is an indexed family of mappers for functions of one, two… arguments, I've stopped at three but you could go on as you need.

A list of monads can be composed together to turn them "inside out" into a monad of lists. That is what mapm and sequence do, commuting the monad and the list, with or without transforming the list's elements first. Also, a list of monads can be evaluated in sequence purely by their side effects, that is where the mapm_ and sequence_ come in handy (the names of these functions are exactly the same as that of their Haskell counterparts).

Lastly, the two very common operations return and seq have very convenient operators associated with them. With all this, I can write the following:

This monad interleaves formatted writing operations, fmt, as a side effect of whatever the code using it does. What is special of this monad is that the "escape" function, write ships out all the accumulated shows to a named output file, safely closing it even in the event of an exception.

With this, I'm ready to show you how to actually consume plug-ins from your program. Suppose you want to have a simple graphics DSL. I have a program with exactly such a need, and as a MacOS X user, the easiest way I found to satisfy it was to generate EPS and to use /usr/bin/open to handle the data for me. But I didn't want to encapsulate (!) my drawing code inside a PostScript interface, but needed it to be as "abstract" as possible (my original back-end was a Mathematica one). I settled on a simple hybrid:

The module is not parameterized by a monad, but it does include a monad that will provide the "glue" to whatever back-end I want to write. As you can see, it is rather primitive but includes operations for drawing in color. It is worth mentioning that this is not a so-called retained interface, but an immediate one, and so save runs the drawing operations on a "sub-context" that is restored when all the commands are shipped out. Also, the type that carries the drawing operations is an alias for the unit monadic value, as drawing doesn't really have a value. My first implementation was the EPS back-end:

In this case, all drawing is to be output to a file, so the monad that implements the plug-in driver is Out. The code uses fmt extensively to format the PostScript commands. For instance, an EPS file must specify a bounding box against which all drawing will be clipped to:

The monad is used purely for its side-effects, namely writing to a file, so I don't bind, I always seq. After the drawing is complete, I actually want to see it on screen; for that, I write a temporary EPS file that I pass to /usr/bin/open, which in turn opens Preview, which converts the file to PDF and shows it:

Polygons are built out of a sequence of points. The first one is the starting point and must be passed to moveto, the rest are to be arguments of linetos. A mapping would do, except for the fact that the code is in monadic style, and must use mapm_:

Success! A decagon with fat vertices. The only problem with an EPS back-end is that it is difficult to get the optimal width and height for an image. In other words, it would be ideal to have some auto-scaling. For that, I have to know the size of a drawing before turning it into PostScript code. Enter the Bounds back-end:

I keep track of the bounding box of the commands seen by the interface, as a rectangle given by two pairs of coordinates. The initial, empty bounds are such that I can accumulate a point simply by a series of minimum and maximum operations:

This allows me to scale up the drawing to fit the boundaries of the EPS file, or to adjust the size of the EPS file so that no space is wasted. The important thing is that the code that contains the drawing commands is left unchanged. Monads are plug-ins.

4 comments:

Anonymous
said...

I read this bit...

Then comes the method return, that the program uses to pass values down to the monad. It lets the code say, "here, have this value for your own use". Finally, there's the operation bind that lets the program and the monad maintain a conversation. With it, the program can say to the monad "oh, when you are done with anything, here's another thing I need you to do", to which the monad responds "OK, I'll take that and give you back the result". This "wiring" aspect to the bind is emphasized in it being usually called >>=, an infix operator.

...and had a bit of an "aha" moment. It seems like a monad (by this description - I'm no expert) is like a fork or a pipe. You open up a pipe to another process and you can communicate with it via it's handle. Is that a good analogy?

About Me

Long-time software developer and architect by day, specializing in boring-but-crucial bits of interconnect between disparate architectures. OCaml is my language of choice, but I feel extremely comfortable with C, Java and SQL whenever the need arises.

By night, I'm a calligrapher and a seeker of the hidden light of understanding in the noise of daily living.