Why Do Monads Matter?

(A Side Note: I’ve been formulating the final thoughts on this post for about a week now. In an entirely unrelated coincidence, a good friend of mine and fellow Haskell programmer, Doug Beardsley, ended up writing two posts about monads over the weekend as well. Weird! But don’t fret; this isn’t really the same thing at all. I’m not writing to teach Haskell programmers how to use monads. I’m writing about a kind of intuition about why these concepts turn out to matter in the first place. You won’t find much here by way of how to program in Haskell.)

Category Theory for Software Development?

Match made in heaven? Or abstraction distraction?

If you’re a software developer, have you heard about monads and wondered what they were? Have you tried to learn Haskell, and struggled with them? Wondered why people worry so much about them? Have you watched the videos from Microsoft’s “Channel 9” and heard a bunch of researchy Microsoft folk talk about them, but had trouble relating them to your day-to-day programming experience?

Or if you’re interested in mathematics, have you heard murmurs in the past about how category theory interests computer science people? Looked for some clear statement of why we care, and what problems we might be interested in? Wondered if it’s really true at all? Perhaps you are like a friend of mine (and a first-rate algebraist, too, so it’s entirely reasonable to have these questions) who asked me about this a year or so ago, remembered hearing a lot of excitement in the early 90s about category theory and computer science, but never heard whether it had really panned out or was a dead end?

These are the kinds of questions I begin with. My goal is to demonstrate for you, with details and examples:

Where category-based intuition and ideas, and monads in particular, come from in computer programming.

Why the future of programming does lie in these ideas, and their omission in today’s mainstream languages has cost us dearly.

What the state of the art looks like in applying category-based ideas to problems in computer programming.

If you’re coming into this without a knowledge of category theory, never fear; this may be one of the gentlest introductions to the idea of categories and monads that you will find. But you’ll want to slow down and take a moment to understand the definition of a category and related ideas like function composition; these are absolutely crucial. Then you want to completely skip or just skim through the section called “What’s This Got To Do With Monads?” where I tell you how what we’re talking about here relates to the traditional math meaning of monads. Don’t worry, you don’t need to know that at all.

On the other hand, if you’re a mathematician, you may want to skim the bits where I review basic category theory, and just dig in where I am talking about the computer programming perspective. Just be forewarned, my introduction to monads will be via Kleisli categories, so take a minute when we get to that part and make sure you’re familiar with how the relationship works out.

Ready? Here goes!

Computer Programming and Functions: A Tenuous Relationship

Quick quiz: Do computer programmers use functions?

Ask any computer programmer you know, and you will hear: YES! Functions are some of the most basic tools computer programmers use. Then you’ll get odd looks, for asking such a silly question. Of course computer programmers use functions. That’s like asking if carpenters use nails! Right?

The truth, though, is a bit more complicated. To a mathematician, a function is just an association of input values to output values… and that is all! Any two functions that associate the same input values to the same output values are the same. Yes, you can represent functions by formulas (sometimes, anyway), but you can also represent them with just tables of inputs and outputs, or if they are functions between real numbers, as graphs. If you ask computer programmers for examples of functions, though, you will start hearing about some pretty bizarre things. I call these the “I must have skipped that day of calculus” functions. These are things that computer programmers are quite happy referring to as functions, but that to a mathematician are not really functions at all!

“Functions” that return randomly chosen numbers… and if evaluated several times, will give a different answer each time.

“Functions” that return one answer on Sundays, but a different answer on Mondays, yet another on Tuesdays, and so on.

“Functions” that cause words to appear on some nearby computer screen every time you calculate their values.

What’s going on here? Most computer programmers go about their lives happily calling these things functions, but really they are something different. But wait a second! They do have quite a lot in common with functions. Namely, they have: (a) parameters, representing their domain; and (b) return values, representing their range. (Many computer programmers are happy to talk about functions that have no parameters, or no return values… but there’s no need to be overly picky here. We can just regard their domains and ranges as one-element sets, so that no actual information is conveyed, but we can keep up appearances.)

Even more importantly, these “functions” share one more thing with the functions of mathematicians: they are constantly being composed by taking the result from one function and passing it along as a parameter to another function. When I say composed, I mean it almost exactly in the basic mathematics sense of function composition: (f · g)(x) = f(g(x)). In fact, the whole reason our “functions” exist at all is to be composed with each other! Once upon a time, in the early days of computers, we liked to keep track of information by just sticking it in known places in the computer’s memory; but all this shared knowledge about where to find information made it hard to write parts separately and fit them together, so we mostly switched to this idea of functions and composition instead.

Here’s the executive summary so far:

When computer programmers talk about functions, they do not mean exactly what mathematicians do.

What they do mean is the idea of having inputs (domains), outputs (ranges), and most importantly composition.

Along Came The Category…

So in the previous section, we ended up with our hands full of things that sort of look like functions. They have domains and ranges, and they can be composed. But at the same time, they are not functions in the mathematics sense. Baffling? No, not really. Mathematicians deal with stuff like that a lot. They have a name for systems of function-esque things of exactly that form. That name is… cue the drumroll, please… CATEGORIES!

In math-speak, categories are:

collections of “objects” (you should think of sets),

and “arrows” (you should think of functions between sets),

where each arrow has a domain and a range,

each object has an “identity” arrow (think of the identity function, where f(x) = x)

and arrows can be composed when the domains and ranges match up right.

Before we agree to call something a category, we also throw in a few rules, such as if you compose any function with an identity, it doesn’t actually change, and composing functions obeys the associative property. These should be unsurprising, so if they seem strange to you, please take a moment, grab a pencil, and try working it out using the definition of function composition earlier: (f · g)(x) = f(g(x)), and simplifying.
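If you would rather let a computer do the pencil work, here is a minimal sketch in Python. The names compose, identity, double, increment, and square are my own, chosen just for illustration, and checking a handful of inputs is of course a spot check, not a proof.

```python
def compose(f, g):
    """Return the function f . g, that is, x -> f(g(x))."""
    return lambda x: f(g(x))

def identity(x):
    """The identity function: f(x) = x."""
    return x

# Three small example functions (hypothetical, for illustration only).
def double(x):
    return x * 2

def increment(x):
    return x + 1

def square(x):
    return x * x

for x in range(-3, 4):
    # Composing with an identity changes nothing.
    assert compose(double, identity)(x) == double(x)
    assert compose(identity, double)(x) == double(x)
    # Composition is associative: the grouping does not matter.
    left = compose(compose(double, increment), square)
    right = compose(double, compose(increment, square))
    assert left(x) == right(x)
```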

The nice thing about categories is this: it’s not just some pointless abstraction that a bunch of mathematicians made up. Categories are defined that way because people have looked at literally hundreds of things that all look sort of like functions with domains and ranges and compositions. Things from algebra, like groups and rings and vector spaces; things from analysis, like metric spaces and topological spaces; things from combinatorics, like elements of partially ordered sets and paths in graphs; things from formal logic and foundations, like proofs and propositions. Almost without fail, they can be described using all the ideas we just looked at! In short, categories are the right intuition for talking about composing things with domains and ranges, which is exactly the situation we’re in.

The Four Horsemen of the Catapocalypse

Now you can see why categories come into the picture: they are the right intuition for things that maybe aren’t functions, but can be composed like functions. But just because a category exists doesn’t mean it’s worth talking about. What makes this worth talking about is that the category-related ideas aren’t just there, but actually express common concerns for computer programmers.

It’s now time to get a little more specific, and introduce the four examples that will guide us the rest of the way through this exploration. Each example highlights one way that the “functions” used by computer programmers might be different from the functions that mathematicians talk about. These examples represent actual kinds of problems that computer programmers have run into and solved, and we’ll look more at the practical side of them later. For now, we’ll just be happy getting familiar with the general ideas.

The First Horseman: Failure

The first problem is failure. Computer programmers do lots of things that might fail. Reading from files (they might not exist, or on a computer with more than one user, they might not be set to allow you to read them), talking over the internet (the network might be broken or too slow), even just doing plain old calculations with a large amount of data (you might run out of memory). Because of this, dealing with failure is a constant concern.

In general, in modern computer programming tools, it’s always understood that a function might fail. You may get an answer, but you also may get a reason that the task could not be completed. When that happens, programmers are responsible for dealing with it properly: letting someone know, cleaning up the leftover mess in computer memory from a half-complete task, and just otherwise putting the pieces back together. A major factor in programming techniques or tools is how easy they make it for programmers to cope with the constant possibility of failure.

The Second Horseman: Dependence

The second problem is dependence on outside information. While functions of mathematics are nice and self-contained, computer programmers often don’t have that luxury. Computer programs are messes of configuration. Even simple mobile phones have pages and pages of settings. What language does the user speak? How often should you save off a copy of their work? Should you encrypt communication over the network? Rare is the application today that doesn’t have a “Settings” or “Preferences” menu item. In many other contexts, too, computer programs depend on information that is a sort of “common knowledge” throughout the application, or some part of the application.

Ways of dealing with this have progressed through the ages. When everything was stored in well-known memory locations anyway, it was easy enough to just look there for information you need; but that led to problems when different parts of a program needed different information and sections of programs could step on each other’s toes. The massively influential technique known as object-oriented programming can be seen as partly an attempt to solve exactly this problem by grouping functions into a context with information that they depend on. The simplest and most flexible answer would be to just pass the information around to all the functions where it is needed… but when that’s a lot of places, passing around all those parameters can be very, very inconvenient.

The Third Horseman: Uncertainty

The third problem is uncertainty, also known as non-determinism. A normal function associates an input to an output. A non-deterministic function associates an input to some number of possible outputs. Non-determinism is less well-known than the first two problems, but possibly only because it hasn’t yet seen a convincing solution in a general purpose language! Consider:

Theoretical computer science talks about non-determinism all the time, because it’s the right approach for discussing a lot of computational problems, ranging from parsing to search to verification. That language just hasn’t made its way into the programming practice.

Non-determinism comes up when querying, searching, or considering many possible answers. These are precisely the places that programmers end up relying on a variety of domain specific languages, ranging from SQL to Prolog, and more recently language-integrated technologies like LINQ.

Even with specialized languages for heavy-duty querying and search tasks, we still end up writing a lot of our own nested looping and control structures for the purpose of looking through possibilities when it’s not worth crossing that language barrier. This kind of thing is responsible for some of the more complex code structures you find these days.

While the first two problems of failure and dependence are at least partly solved by current mainstream programming languages, non-determinism is as yet solved mostly by special-purpose sub-languages, with LINQ as the notable exception.

The Fourth Horseman: Destruction

Finally, the fourth problem is destruction. Evaluating a math-type function is observable only in that you now know the answer. But in computer programming, functions can have permanent effects on the world: displaying information, waiting on responses from other computers or people, printing documents, even quite literally exploding things, if they are running on military systems! Because of this, things that aren’t specified in mathematics, like the order in which evaluation happens, matter quite a lot here.

The destructive nature (by which we just mean having effects that can’t be undone) of computer programming functions has plenty of consequences. It makes programming more error-prone. It makes it harder to divide up a task and work on different parts simultaneously, such as you might want to do with a modern multi-core computer, because doing the parts in the wrong order might be incorrect. But at the same time, these destructive effects are in a sense the whole point of computer programming; a program that has no observable effects would not be worth running! So in practically all mainstream programming languages, our functions do have to cope with the problem of destruction.

Back To The Function

Now we’ve seen the faces of some problems we find in the computer programming world. We build software that might fail, has to deal with a ton of extra context, models non-deterministic choice, and sometimes has observable effects on the world that constrain when we can perform the computation.

It may now seem that we’ve left the nice and neat world of mathematical functions far behind. We have not! On closer inspection, we’ll see that if we can just squint hard enough, each of these quasi-functions can actually be seen as true, honest-to-goodness functions after all. There is a cost, though. To turn them into real functions, we need to change the range of those functions to something else. Let’s see how it works for each of our function types in turn:

Functioning With Failure

Our first example of pseudo-functions were those that might fail. It’s not hard to see that a function that could fail is really just a function whose results include two things:

successes, which are the intended possible results; and

failures, which are descriptions of why the attempt failed.

So for any set A, we’ll define a new set called Err(A) to be just A together with possible reasons we might have failed. Now a possibly failing function from a set A to a set B is really just an ordinary function from A to Err(B).
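Here is one way this might look in Python. It's only a sketch: the names Ok, Failure, and safe_reciprocal are made up for this example, and I'm assuming that a failure reason is just a string.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Ok:
    """A success: one of the intended results."""
    value: Any

@dataclass(frozen=True)
class Failure:
    """A failure: a description of why the attempt failed."""
    reason: str

def safe_reciprocal(x):
    """An ordinary function from numbers to Err(numbers)."""
    if x == 0:
        return Failure("division by zero")
    return Ok(1 / x)
```

Notice that safe_reciprocal never raises anything. Every input maps to exactly one output, which happens to be either an Ok or a Failure.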

Functioning With Dependence

Our second type of pseudo-functions were those that depended on information that they got from the world around them: perhaps preferences or application settings. We play a similar trick here, but for a set A, we will define the set Pref(A) to be the set of functions from application settings to the set A. Now watch closely: a function in context from A to B is just an ordinary function from A to Pref(B). In other words, you give it a value from the set A, and it gives you back another function that maps from application settings to the set B.

As confusing as that might sound, a function whose range is another function is really just a function of two parameters, except that it takes its parameters one at a time! Take a minute to convince yourself of this. The conversion between these two equivalent ideas is sometimes called “currying”. So by changing the range of our function, we actually effectively added a new parameter, and it now receives the application settings as a parameter. Remember that except for being inconvenient (we’ll deal with that later), that’s exactly what we wished for.
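A small Python sketch may help with the convincing. The function names and the "language" setting are assumptions of mine; the settings are modeled as a plain dictionary.

```python
def greeting(name):
    """A function from names to Pref(str): it returns a function of the settings."""
    def with_settings(settings):
        word = "Bonjour" if settings["language"] == "fr" else "Hello"
        return f"{word}, {name}"
    return with_settings

def greeting_uncurried(name, settings):
    """The same thing, written as a function of two parameters."""
    return greeting(name)(settings)
```

Calling greeting("Ada") doesn't produce a string yet. It produces something that is still waiting for the settings, and that is exactly the extra parameter we added.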

Functioning With Uncertainty

This is perhaps the most obvious example of all. Our third type were those that represent non-determinism: instead of one specific answer, they have many possible answers. This is easy enough: for each set A, define P(A) to be the power set of A, whose members are themselves sets of values of A. Then a non-deterministic function from A to B is just an ordinary function from A to P(B).
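As a quick sketch in Python, assuming we represent a member of P(A) as a frozenset, here is a made-up example function that returns every integer whose square is its input:

```python
import math

def square_roots(x):
    """An ordinary function from int to P(int): all integer square roots of x."""
    if x < 0:
        return frozenset()          # no possible answers
    r = math.isqrt(x)
    if r * r != x:
        return frozenset()          # not a perfect square
    return frozenset({r, -r})       # two answers (or just one, when x is 0)
```

The empty set, a one-element set, and a two-element set are all perfectly good results here.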

Functioning With Destruction

Our final trick is to deal with functions that have destructive effects. Here we’ll need to be a bit more elaborate in constructing a new range: for each set A, we define IO(A) (standing for input/output, which captures the notion of effects that interact with the rest of the world). An element of the set IO(A) is a list of instructions for obtaining a member of A. It is not a member of A, merely a way to obtain one, and that procedure might have any number of observable effects.

Now we play the same trick and change the range: a destructive function from A to B is just an ordinary plain old mathematical function from A to IO(B). In other words, if you give me an A, then as a plain old function I can’t actually do the steps to get a B, but I can certainly tell you what they are.
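To make that concrete, here is a rough Python sketch. I'm assuming that a member of IO(A) can be modeled as a zero-argument function that performs the steps when called, and I use a list named screen as a stand-in for the outside world. Both choices, and the name say, are mine.

```python
screen = []  # stand-in for "some nearby computer screen"

def say(message):
    """A function from str to IO(int): it returns instructions and performs nothing."""
    def instructions():
        screen.append(message)   # the observable effect
        return len(message)      # the member of A we obtain
    return instructions

action = say("hello")
assert screen == []              # we only have the instructions so far

result = action()                # now someone actually follows them
assert screen == ["hello"]
```

Calling say is a plain old function call with no effects. The effects only happen later, when whoever holds the instructions decides to carry them out.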

But what about composition? It’s great to be back in the world of plain functions, but remember what got us here in the first place? We liked functions because we liked composition; but it seems we’ve now lost composition! If I have a possibly failing function from A to B, and another from B to C, well now I’ve turned them into functions from A to Err(B) and then B to Err(C). Those function domains and ranges don’t match up, and I can’t compose them!

Oh no…

Hold Your Horses, Heinrich Kleisli to the Rescue!

Well, all is not lost. I just haven’t yet told you how to compose these “special” functions.

Because some math dude found these things before us, we call our “special” functions by a name: Kleisli arrows. There are two things going on here at once, so keep your eyes open: first, Kleisli arrows are just plain old ordinary functions, but with weird-looking ranges. Since they are just functions, you can compose them as functions, and that’s just fine. But at the same time, they are “special”, and we can compose them as Kleisli arrows, too.

Remember what we decided earlier? The right way to think about composition is by talking about a category. Sets are a category, and that’s fine if you want plain function composition. But now we want a new kind of category, too. It’s called the Kleisli category. If you don’t remember what all the parts of a category are, take a second to review them. To define a category, I need objects, arrows, identities, and composition.

To keep things simple, the objects in this new category will be the same: they are just sets of things.

The arrows in this category are, unsurprisingly, the Kleisli arrows.

I haven’t told you yet what the identities and composition look like, so let’s do that next.

First, we look at failure. We’re given a failure Kleisli arrow from A to B, and one from B to C. We want to compose them into a Kleisli arrow from A to C. In other words, we have an ordinary function from A to Err(B), and a function from B to Err(C), and we want one from A to Err(C). Take a minute to think about what to do.

The central idea of error handling is that if the first function gives an error, then we should stop and report the error. Only if the first function succeeds should we continue on to the second function, and give the result from that (regardless of whether it’s an error or a success).

To summarize:

If g(x) is an error, then (f · g)(x) = g(x).

If g(x) is a success, then (f · g)(x) = f(g(x)).

To complete the definition of a category, we also need to decide about the identity Kleisli arrows. These are the ones that don’t do anything, so that if you compose them with any other Kleisli arrow, it doesn’t change the other one. Identities are functions from A to Err(A), and it turns out these are just the functions f(x) = x, just like for sets. Notice that means they never return an error; only a successful result.
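Here is how that composition might look as a Python sketch. All the names are invented for this example. One wrinkle of my chosen representation: since I tag successes with Ok, the identity has to wrap its input in Ok instead of returning it bare, and the second function receives the value inside the Ok.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Ok:
    value: Any

@dataclass(frozen=True)
class Failure:
    reason: str

def compose_err(f, g):
    """Kleisli composition for failure: run g first, then f only if g succeeded."""
    def composed(x):
        first = g(x)
        if isinstance(first, Failure):
            return first             # stop and report the error
        return f(first.value)        # continue with the successful result
    return composed

def identity_err(x):
    """The Kleisli identity: always succeeds with its input."""
    return Ok(x)

# Two example arrows that might fail (hypothetical).
def parse_int(text):
    try:
        return Ok(int(text))
    except ValueError:
        return Failure(f"not an integer: {text!r}")

def reciprocal(n):
    if n == 0:
        return Failure("division by zero")
    return Ok(1 / n)

parse_then_invert = compose_err(reciprocal, parse_int)
```

With this, parse_then_invert("4") succeeds, while "0" and "x" each report the first failure encountered along the way, and nobody had to write an if statement at the call site.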

I’ll run more briefly through the remaining three examples, but I encourage readers who aren’t yet clear on how this will work to write them out in more detail and use this opportunity to become more comfortable with defining a second category of Kleisli arrows.

Next we have Kleisli arrows for dependence, which are functions from A to Pref(B). Recall that adding the Pref to the range is equivalent to adding a new parameter for the application preferences. The key idea here is that if I have two functions that both need to know the application preferences, I should give the same preferences to both. Then composing two of these Kleisli arrows just builds a new function that gets the extra preferences parameter, and passes the same one along to the two component parts. And identities? A Kleisli identity will get that extra preferences parameter, but will ignore it and just return its input anyway.
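In Python, a sketch of this composition could look like the following. The settings are again just a dictionary, and apply_tax, format_price, and the setting names are assumptions I made up for the example.

```python
def compose_pref(f, g):
    """Kleisli composition for dependence: hand the same settings to both parts."""
    def composed(x):
        def with_settings(settings):
            intermediate = g(x)(settings)
            return f(intermediate)(settings)
        return with_settings
    return composed

def identity_pref(x):
    """The Kleisli identity: accepts the settings, ignores them, returns its input."""
    return lambda settings: x

# Two example arrows that each consult the settings (hypothetical).
def apply_tax(price_cents):
    return lambda settings: price_cents + price_cents * settings["tax_percent"] // 100

def format_price(price_cents):
    return lambda settings: f"{price_cents} {settings['currency']}"

price_label = compose_pref(format_price, apply_tax)
my_settings = {"tax_percent": 10, "currency": "EUR"}
```

The caller supplies my_settings once, at the very end, and compose_pref takes care of threading it through both steps.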

The Kleisli arrows for uncertainty, or non-determinism, are functions from A to P(B), the power set of B. The key idea for non-determinism is that at each stage, we want to try all possible values that might exist at this point, and collect the results from all of them. So the composition calculates the second function for each possible result of the first, then the resulting possibilities are merged together with a set union. The identities, of course, aren’t really non-deterministic at all, and just return one-element sets containing their input.
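A sketch of the same idea in Python, with frozensets standing in for members of P(B). The example arrow neighbors is made up: from a number, you may step one down or one up.

```python
def compose_set(f, g):
    """Kleisli composition for non-determinism: try f on every result of g, then union."""
    def composed(x):
        results = set()
        for y in g(x):
            results |= f(y)
        return frozenset(results)
    return composed

def identity_set(x):
    """The Kleisli identity: exactly one possible answer, the input itself."""
    return frozenset({x})

def neighbors(n):
    """A non-deterministic step (hypothetical): move one down or one up."""
    return frozenset({n - 1, n + 1})

two_steps = compose_set(neighbors, neighbors)
```

Starting from 0, two_steps collects every place you could be after two steps. The two different routes back to 0 collapse into a single element, because we are merging with a set union.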

Finally, Kleisli arrows for destructive effects are functions from A to IO(B). The key idea here is to combine instructions by following them in a step-by-step manner: first do one, then the next. So the composition writes instructions to perform the first action, look up the second action from the result, and then perform the second action, in that order. A Kleisli identity here is just an instruction to do nothing at all and announce the input as the result. So for each of the four motivating examples, we created a new category, the Kleisli category.
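For the destructive case, here is a Python sketch under the assumption that a member of IO(B) is modeled as a zero-argument function that performs its steps when called. The list named screen is my stand-in for the outside world, and the example arrows are hypothetical.

```python
screen = []  # stand-in for the outside world

def compose_io(f, g):
    """Kleisli composition for effects: write instructions that do g, then f, in order."""
    def composed(x):
        def instructions():
            intermediate = g(x)()        # perform the first action
            return f(intermediate)()     # look up the second action, then perform it
        return instructions
    return composed

def identity_io(x):
    """The Kleisli identity: do nothing at all and announce the input as the result."""
    return lambda: x

def announce(text):
    def instructions():
        screen.append(text)
        return len(text)
    return instructions

def report_length(n):
    def instructions():
        screen.append(f"{n} characters")
        return n
    return instructions

program = compose_io(report_length, announce)("hello")
assert screen == []      # composing only wrote the instructions
result = program()       # following them performs both effects, in order
```

Composition never runs anything. It only produces a longer list of instructions, and the order of the effects is fixed by how the arrows were composed.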

These new categories have their own function-like things, and related ideas of composition and identities, that express the unique nature of each specific problem. By using the appropriate notion of composition in the right Kleisli category, you can solve any of these long-standing computer programming problems in a nice composable way.

And that’s why you should care about monads.

Monads?!? Oh yes, I should mention that we’ve just learned about monads. We simply forgot to use the word.

What’s This Got To Do With Monads?

This section is for those of you who want to know how the stuff we said earlier is related to monads as they are understood in mathematics. If you open Wikipedia, or most category theory textbooks, and look up monads, they won’t look very much like what we just did. You’ll see something about an endofunctor, and two natural transformations, and properties about commuting triangles and squares.

We haven’t talked about functors at all, much less natural transformations… so how could we have possibly learned about monads? It turns out there’s more than one way to describe monads. The one we’ve just gone through is an entirely valid one. The shifts we made to the ranges of our functions earlier — Err, Pref, P, and IO — are actually examples of monads. To make sure they are monads in the conventional math way, we’d have to work pretty hard: first, prove that they are functors. Then build two natural transformations called η and µ, and prove that they are natural. Finally, prove the three monad laws.

But wait, there’s an easier way! Heinrich Kleisli, whom we’ve already met from the categories earlier, pointed out that if you can build a category like the ones we did in the last section, whose arrows are just functions with a modified range, then your category is guaranteed to also give you a monad. That’s quite convenient, because as computer programmers, we tend to care a lot more about our Kleisli arrows than we do about a mathematician’s idea of monads. Remember, those Kleisli arrows are exactly the modified notion of functions that we were already using, long before we ever heard a word about category theory! And Kleisli tells us that as long as composition works the way we expect with our Kleisli arrows (namely, that it’s associative and the identities act like identities), then all that other stuff we’re supposed to prove to show we have a monad just happens for us automatically.

Still, it’s an interesting side question to look at the relationship between the two. I won’t give all the details, but I’ll give the structure, and then leave the interested reader with some familiarity with category theory to fill in the proofs of the relevant properties. We’ll use Err as our monad, just to pick a specific example, but nothing here is specific to Err.

We start with Err, which is already a map from objects to objects. But the traditional definition of a monad also requires that it be a functor. That is, given a function f from A to B, I need a way to construct a function Err(f) from Err(A) to Err(B). I do it as follows: in the underlying category (not the Kleisli category, just the category of sets), I find an identity function from Err(A) to Err(A). Then I find a Kleisli identity from B to Err(B). I compose that Kleisli identity in the underlying category with f, and get a function from A to Err(B). I can now do a Kleisli composition of the identity from Err(A) to Err(A) and the function from A to Err(B), and get a function from Err(A) to Err(B). That’s the one I’ll call Err(f).

Next, I need a natural transformation η, from the identity functor to Err. This is easy: the components of η are the Kleisli identities.

Finally, I need a natural transformation µ from Err² to Err. To get the component of µ at A, I take the identity functions in the underlying category from Err(Err(A)) to Err(Err(A)), and then from Err(A) to Err(A), and I combine them with Kleisli composition to get a function from Err(Err(A)) to Err(A). This is the component of µ at A.
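For readers who like to see such constructions run, here is a Python sketch of all three pieces for Err, built from nothing but Kleisli composition and identities. The representation (Ok and Failure) and every name are my own assumptions. The trick to watch for is that a plain identity on Err(A) gets reused as a Kleisli arrow from Err(A) to A.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Ok:
    value: Any

@dataclass(frozen=True)
class Failure:
    reason: str

def compose_err(f, g):
    """Kleisli composition for Err: run g, then f only if g succeeded."""
    def composed(x):
        first = g(x)
        return first if isinstance(first, Failure) else f(first.value)
    return composed

def eta(x):
    """The Kleisli identity, which is also the component of eta."""
    return Ok(x)

def plain_identity(x):
    """An identity in the underlying category of sets."""
    return x

def err_map(f):
    """Err(f): Kleisli-compose (eta . f) with the plain identity on Err(A)."""
    to_err = lambda a: eta(f(a))                   # a function from A to Err(B)
    return compose_err(to_err, plain_identity)     # a function from Err(A) to Err(B)

def mu(nested):
    """The component of mu: Kleisli-compose two plain identities."""
    return compose_err(plain_identity, plain_identity)(nested)
```

So err_map applies a function underneath a success and passes failures through, while mu flattens a success-of-a-success and keeps whichever failure it finds first.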

The construction in the opposite direction is easier. Given a monad Err with η and µ, the Kleisli category is constructed as follows.

The identities are just the components of η.

Given a function f from A to Err(B) and a function g from B to Err(C), I compose the two as µ · Err(g) · f.
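And here is the opposite direction as a Python sketch, again with made-up names and my own Ok and Failure representation. This time eta, err_map, and mu are written directly, and the Kleisli composition is assembled from them as µ · Err(g) · f.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Ok:
    value: Any

@dataclass(frozen=True)
class Failure:
    reason: str

def eta(x):
    """Component of eta: wrap a value as a success."""
    return Ok(x)

def err_map(g):
    """The functor Err on functions: apply g underneath a success."""
    def mapped(e):
        return e if isinstance(e, Failure) else Ok(g(e.value))
    return mapped

def mu(nested):
    """Component of mu: flatten Err(Err(A)) down to Err(A)."""
    return nested if isinstance(nested, Failure) else nested.value

def kleisli(f, g):
    """Compose f (from A to Err(B), run first) with g (from B to Err(C))."""
    return lambda x: mu(err_map(g)(f(x)))

# Two example arrows (hypothetical).
def parse_int(text):
    try:
        return Ok(int(text))
    except ValueError:
        return Failure(f"not an integer: {text!r}")

def reciprocal(n):
    return Failure("division by zero") if n == 0 else Ok(1 / n)
```

Here kleisli(parse_int, reciprocal) behaves just like the stop-at-the-first-error composition described earlier, even though it was assembled from the functor and the two natural transformations.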

Again, the details and the proofs of the appropriate monad and category laws are left to the reader. I hope this brief aside has been useful. I now return to using the word “monad” but talking about monads via Kleisli categories.

Joining The Monadic Revolution

Once again, let’s pause to sum up.

Computer programmers like to work by composing some things together, which we call functions.

They aren’t functions in the obvious way… but they do make up a category.

Actually, they are functions after all, but only if you squint and change the ranges into something weirder.

The category that they form is called a Kleisli category, and it’s basically another way of looking at monads.

These monads / Kleisli categories nicely describe the techniques we use to solve practical problems.

It’s not just about those four examples, either. Those are typical of many, many more ideas about programming models that can be described in the same framework. I think it’s fair to sum up and say, at this point, that someone interested in studying and analyzing programming languages and models should be familiar with some ideas from category theory, and with monads in particular.

But still, what about the humble computer programmer, who is not designing a new language, is not writing research papers analyzing programming languages, but just wants to solve ordinary everyday problems? That’s a fair question. As long as monads remain just a mathematical formalism for understanding what computer programmers mean by functions, the practicing computer programmer has a good claim to not needing to understand them.

It’s becoming clear, though, that monads are on their way into practical programming concerns, too. In the past, these Kleisli arrows, the modified notions of “function” used by computer programmers, were built into our programming languages. Functions in C used a Kleisli arrow, and C++ functions used a different one. The language specification would tell us what is and what is not possible using a function in this language, and if we wanted something different, too bad. Maybe once a decade, we’d make the swap to a brand new programming language, and bask in the warm rays of some new language features for a while.

The Past: Error Handling

Consider the Err monad, which gave us functions that might fail and report their failure in structured ways. Modulo some details and extensions, this is basically structured exception handling. Looking to history, programmers worked without exception handling in their programming languages for many years. Of course, languages like C are all Turing complete, and can solve any possible computational problem, proper error handling included. But we don’t apply categories to think about possible computations; categories are for thinking about composition. Without exception handling in the notion of a “function” that’s provided by languages like C, programmers were left to do that composition by hand.

As a result, any C function that could fail had to indicate that failure using a return value. In many cases, conventional wisdom built up saying things like “return values are for indicating success or failure, not for giving back answers”. Coding conventions called for most if not all function calls to be followed with if statements checking for failure, and the resulting code was borderline unreadable. This was the heyday of flowcharts and pseudo-code, because no one expected to be able to understand real code at a glance! In reality, though, programmers only checked for errors when they thought they were possible, and a lot of errors went undetected. Programs were often unreliable, and likely untold billions of dollars were spent on extra development work and troubleshooting.

What was the reason for this? It’s quite simple: the C programming language and others of its time provided an insufficient kind of Kleisli arrow! If their Kleisli arrow had included the functionality from the Err monad we defined above, this could have been avoided. But the notion of what a function means in C is fixed, so the answer was to deal with it, and eventually migrate to a different programming language, rewriting a lot of software, and likely costing another untold billions of dollars.

The Present: Global Variables and Context

What about the Pref monad, and others like it? As discussed earlier, this is about defining computations in a larger context of available information and state of the world.

In the past, we had global variables, the slightly more modern equivalent of just storing information at a known place in computer memory. Quick and dirty, but even 30 years ago, programmers knew they were the wrong answer, and wouldn’t be manageable for larger programs. Object oriented programming tried to alleviate the problem a little, by having functions run in a specific “object” that serves as their context, and that was implicitly passed around at least within the implementation of the object itself. To get this, everyone effectively had to change programming languages to get a better Kleisli arrow again. But even so, object-oriented languages don’t give a perfect answer to this problem.
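As a rough sketch of the alternative, a Pref-style Kleisli arrow is just a function that also receives the settings, and composition hands the same settings to every step. The Prefs record, its fields, and the functions below are hypothetical names chosen for this example, not anything defined earlier.

```haskell
import Data.Char (toUpper)

-- Hypothetical application settings, for illustration only.
data Prefs = Prefs { userName :: String, shout :: Bool }

-- A value that depends on the settings.
type Pref a = Prefs -> a

-- Kleisli composition for Pref: both steps see the same settings,
-- so nothing needs to live in a global variable.
composePref :: (b -> Pref c) -> (a -> Pref b) -> (a -> Pref c)
composePref g f x prefs = g (f x prefs) prefs

-- Builds a greeting using the configured user name.
greet :: String -> Pref String
greet greeting prefs = greeting ++ ", " ++ userName prefs

-- Finishes a sentence, loudly or quietly depending on the settings.
emphasize :: String -> Pref String
emphasize s prefs
  | shout prefs = map toUpper s ++ "!"
  | otherwise   = s ++ "."

greetUser :: String -> Pref String
greetUser = emphasize `composePref` greet
```

Here greetUser "Hello" (Prefs "Ada" True) gives "HELLO, ADA!", and the same call with False gives "Hello, Ada."; the settings are supplied once, at the outside.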

This point is about the future, but I’ll start by pointing out that everything here is already possible; it just requires an appropriate choice of programming language!

One current challenge for the computer programming community is finding effective ways to handle parallelism. Ironically, while past examples have focused on the problem of putting too little power into a language’s Kleisli arrow, the problem this time is too much! Plain (also known as “pure”) functions present lots of opportunities for parallelism. When code is executed in parallel, it may run faster, or if the parallelism is poorly designed it may even run slower, but in any case it will certainly still give the same answer. But when the Kleisli arrow incorporates destructive updates, that is no longer the case. Now parallelism is risky, and might give unexpected or incorrect results due to so-called race conditions.

We can’t just remove destructive updates from a language’s Kleisli arrow, though. A program that has no observable effects at all isn’t useful. What is useful is the ability to separate the portions of code that perform destructive update from those that just compute pure functions. So for the first time, we need more than one kind of Kleisli arrow in the same language!

There is already at least one language that offers precisely this. Programmers in the Haskell language can build their own monads, and work in the Kleisli category of a monad of their choosing. The programming language offers a nice syntax for making this approach readable and easy to use. If something might fail, you can throw it in Err. If it needs access to the application settings, throw it in Pref. If it needs to do input or output, throw it in IO. Haskell web application frameworks and similar projects start by defining an appropriate monad with the appropriate features for that kind of application.
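Here is a small sketch of what that separation looks like in practice. The function names are invented for the example; Data.IORef is the standard mutable-reference module in Haskell's base library. The type of each function says which kind of arrow it is: square cannot perform a destructive update, while sumSquaresIO is allowed to.

```haskell
import Data.IORef (newIORef, modifyIORef', readIORef)

-- A plain (pure) function: its type rules out destructive updates,
-- so evaluating it in parallel cannot change the answer.
square :: Int -> Int
square x = x * x

-- An arrow in the IO Kleisli category: it allocates a mutable cell
-- and destructively updates it once per list element.
sumSquaresIO :: [Int] -> IO Int
sumSquaresIO xs = do
  total <- newIORef 0
  mapM_ (\x -> modifyIORef' total (+ square x)) xs
  readIORef total
```

The pure equivalent, sum (map square xs), gives the same total; the difference is that only the IO version carries the ordering constraints that make parallelism risky.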

Another current trend in the computer programming community is toward building more domain-specific programming models. The language Erlang became popular specifically for providing a new programming model with advantages for parallelism. Microsoft’s .NET framework incorporates LINQ, which offers a programming model that’s better for bulk processing and querying of collections of data. Rails popularized domain-specific languages for web applications. Other languages offer continuations as a way to specify computations more flexibly. All of these are examples of working in new and different Kleisli arrows that capture exactly the model appropriate for a given task.

It comes down to this: If we believe that there is one single notion of “function” that is most appropriate for all of computer programming, then as practical programmers we can find a language that defines functions that way, and then forget about the more general idea of monads or Kleisli arrows as a relic of theoreticians. But it’s not looking that way. The programming community is moving quickly toward different notions of what a function means for different contexts, for different tasks, even for different individual applications. So it’s useful to have the language, the tools, and the intuition for comparing different procedural abstractions. That’s what monads give us.

Abstraction Over Monads

Using a language with a choice of monads offers some other advantages here, too. It gives us back our abstraction. In Haskell, for example, it’s possible to write code that is applicable in multiple different monads. A surprising amount of the programming done with one monad in mind actually has meaning in very different monads! For example, consider the following Haskell type:

sequence :: Monad m => [m a] -> m [a]

What this means is that for any monad, which we’ll call M, sequence converts from a list of values of M(A) into M(List(A)), the monad applied to lists themselves. Let’s take a minute to consider what this means for each of our four examples. For Err, it takes a list of results that might be failures, and if any of them are failures, it fails; but if not, then it gives back a list of all the results. It’s basically a convenient way to check a whole list of computations for a failure. For Pref, it takes a single set of application preferences, and distributes that to everything in the list, giving back a list of the results. For the power-set monad, P, it would take a list of sets, and give back a set of all the ways to choose one item from each set. And for IO, it takes a list of instruction cards, and gives back the single card with instructions for doing all of them in turn. Amazingly, this one function, which had only one implementation, managed to make sense and do something useful for all four of our examples of monads!
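The four cases can be tried directly with stand-ins from Haskell's standard library. This is a sketch under some assumptions: Either String stands in for Err, plain functions from a shared argument stand in for Pref, and lists stand in for the power-set monad (so duplicates and ordering are kept, which real sets would not do). In current versions of the standard library, sequence has the more general type (Traversable t, Monad m) => t (m a) -> m (t a), of which the list version above is a special case.

```haskell
-- Err stand-in: every computation succeeded, so we get all the results.
errAllGood :: Either String [Int]
errAllGood = sequence [Right 1, Right 2, Right 3]

-- Err stand-in: one failure makes the whole list fail.
errOneBad :: Either String [Int]
errOneBad = sequence [Right 1, Left "boom", Right 3]

-- Pref stand-in: the single shared argument is given to every function.
prefExample :: Int -> [Int]
prefExample = sequence [(+ 1), (* 2), subtract 3]

-- Power-set stand-in: every way to choose one item from each list.
choices :: [[Int]]
choices = sequence [[1, 2], [3, 4]]

-- IO: one action that performs each action in turn and collects the results.
ioExample :: IO [String]
ioExample = sequence [pure "a", pure "b"]
```

Evaluating these gives Right [1,2,3], Left "boom", [11,20,7] for prefExample 10, and [[1,3],[1,4],[2,3],[2,4]] for choices.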

Along with a choice of monads comes the ability to abstract over that choice, and write meaningful code that works in any monad that you do end up choosing.

Between all of these forces, I predict that within the next ten years, software developers will be expected to discuss monads in the same way that most developers currently have a working vocabulary of design patterns or agile methodologies.

Beyond Monads: More Categorical Programming

While most of this has been about monads, I don’t want to leave anyone with the impression that monads are the only influence of categories in computer programming. All of the following ideas have found their way into programming practice, mostly (so far) within the Haskell programming language community because of its flexibility and a deep academic culture and history.

Monad transformers are a powerful technique for combining the effects of more than one monad to build rich and powerful programming models.

Functors and applicative functors (a.k.a. strong lax monoidal functors for mathematicians) are weaker than monads, but more widely applicable.

Other kinds of categories that are not Kleisli categories can often be defined and composed to solve specific problems. Freyd categories are also useful.

I’ll stop there, but only as an encouragement to look more into the various abstractions from category theory that programmers have found useful. A good starting point is the (Haskell-specific) Typeclassopedia by Brent Yorgey. That’s just a door into the many possibilities of applying category-based ideas and intuitions in computer programming.

But I hope I was able to convey how these ideas aren’t just made up, but are actually the natural extension of what computer programmers have been doing for decades.

” But if you start asking computer programmers for examples of functions, you will start hearing about some pretty bizarre things. I call these the “I must have skipped that day of calculus” functions.”

You might want to call them If-I-skipped-philosophy-or-logic-class-and-missed-the-session-on-equivocation-I-might-end-up-calling-these-‘I skipped calculus’-functions. While it’s true that most functions in the programming context are not functions according to a strict mathematical definition, that’s ok. Many words have different definitions based on the context in which they are used.

Thanks for the comment! The point here isn’t to just shake our heads at people using the word function in those ways. I absolutely agree that if you’re programming in C, you ought to say “function” to mean a C function, because that’s the shared vocabulary of the language. Rather, the point there is to ask what those things really are, since they are not true functions. The answer then is (a) they are actually different things for different programming languages; C, for example, defines a different procedural abstraction than C++ or Java; and (b) in nearly all cases, they are the arrows in some kind of Kleisli category.

“Why the future of programming does lie in these ideas, and their omission in today’s mainstream languages has cost us dearly.”

Interesting article, but you really failed to deliver on this. Your article delivers on motivating that category theory provides a useful way of formalising computer software, particularly that which has an explicit notion of Monad. However, the future of programming in an everyday sense does not need either of these ideas to be explicit. I do agree that a strong notion of pure function is important, but understanding that does not require understanding category theory. And, existing imperative languages like e.g. Java or C could easily support a notion of pure function without having any explicit concept of a Monad. We’ll just add a “pure” modifier, with a few simple rules — done.

Sure, but then you’d also need to reconcile the differences in the concept of functions or subroutines in C versus Java versus PL/SQL, for example. It’s not a black and white question of “pure function” versus “subroutine”. The definition of a subroutine has evolved over time and been adapted to meet different needs, both general-purpose and domain-specific, and historically it’s meant a change of programming language each time.

Now, if there is one procedural abstraction to rule them all, some kind of perfect answer to what a subroutine should be, then we should forget about all this monad nonsense and just all go use that one. But if different notions of procedures are appropriate for different programming tasks, then it’s nice to have the tools to talk about, build, and abstract over the one that’s appropriate for the task you’re handling right now. That’s why monads come up not just for analyzing semantics and programming languages, but also for solving programming problems. Right now, there are only a few languages where things like that are possible with a sufficiently convenient syntax to be worth doing: Haskell, and a (upcoming? new? I’m not sure if it’s released yet) version of Microsoft’s F#. But programmers across even more communities are starting to think about monads in more limited contexts, and it’s a matter of time until the language features catch up.

You missed the most important reason why we have “functions” (subroutines).

Functions are ABBREVIATIONS that allow us to make the same calculation in more than one place without duplicating the code for the calculation.

They’re a bit like theorems in that way, actually, since theorems are also abbreviations that allow you to shorten proofs.

Of course you could just inline all “functions” of a program, but most likely you’re then looking at a combinatorial explosion. Just like how proof systems without theorems are less powerful than proof systems _with_ theorems (where “less powerful” means that any proof of a certain statement in the less powerful system is longer than any proof of the same statement in the more powerful system). In other words, without “functions”, our programs would compile to huge monsters of assembly code.

I’m not sure I agree that reuse is the most important reason for introducing the notion of functions. Programmers in Commodore 64 BASIC achieved code reuse with GOSUB, despite not having any notion of parameters and return values. No one called those things functions. Of course composability helps make this kind of reuse more convenient, so it’s definitely a part of the reason we like composition.

You can’t actually inline all functions anyway, in any language that treats functions as values. Even function pointers in C make this exercise impossible, because the language’s built-in qsort needs to be passed a reference to your custom_compare_stuff function. Functions are more than abbreviations, they’re also a layer of indirection: letting one function leave “gaps” in its behavior, to be filled in later by referring to another as-yet-undetermined function.
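The same “gap” shows up in Haskell's standard sortBy, which takes the comparison as an argument. A brief sketch, with byLength and descending as made-up comparison functions:

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- One way to fill the gap left open by sortBy: order strings by length.
byLength :: String -> String -> Ordering
byLength = comparing length

-- Another way to fill the same gap: descending numeric order.
descending :: Int -> Int -> Ordering
descending = flip compare

shortestFirst :: [String]
shortestFirst = sortBy byLength ["ccc", "a", "bb"]

largestFirst :: [Int]
largestFirst = sortBy descending [3, 1, 2]
```

sortBy itself is compiled without knowing which comparison it will eventually be given, which is why it cannot simply be inlined away.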

Thanks a lot for this article. I finally understand the link between category theory and monads. The other comments might have valid points for other things to talk about, but I think you achieve the main objective to explain that link.

Also, your point about other languages being influenced by this is spot on. I can point to this example

First, let me say, this is a great, well written, and thought-provoking post. Thanks for taking the time to write it.

Now that I’ve said that… I think you are completely and totally wrong. :)

The argument for functional languages is usually (implicitly) that mathematical models are the right way to think about software. You do this when you say that C functions are “not true functions” (meaning in the mathematical sense).

The evidence for this argument is that mathematics has very powerful tools for working with the same kinds of things that software deals with, and these tools are easy to reason about.

But this is not the only way to approach programming or language design. This tension between the mathematically-driven and the, for lack of a better word, pragmatically-driven approaches has always existed, and I tend to think that the pragmatically-driven approach (meaning defining language concepts based on what many programmers find easy to work with) will probably win. Case in point, Google recently released a new programming language with a type system that is unsound by design! Crazy, right? They obviously could have designed a sound type system if they wanted to, but they thought the type system would be more useful the way they did it.

I think monads are interesting and powerful ideas and will certainly be embraced by new (functional) languages, but I don’t think they will necessarily be part of most or even many popular programming languages in the future. Most programmers today do not have a computer science background. In the future this ratio will be tipped even further to the side of the non-CS programmers. Until someone finds a way to make monads appeal to these programmers (which might happen), they won’t gain widespread adoption.

One point of the article is that typical programming languages already use a limited set of monads “under the surface”. Exception handling is an obvious example. Making it possible for the developer to add new monads (as you can in Haskell) gives the developer enormous power to build better software faster.

I’ve read plenty of monad tutorials of all kinds and styles in the past; still, yours was intriguing enough to draw my attention. Thanks for a fascinating read.

I’d like to play the role of skeptic this time. All these promises of the monadic approach are well known and repeated in numerous articles. Still, I feel something important is missing here. I guess everybody who has tried to write a non-trivial application in Haskell knows how quickly it gets clumsy. With all these different monads and stacks of monad transformers, it seems they’re never done right, and every change in their structure triggers massive rewrites.

Just compare, for example, the ease of controlling a piece of state in OOP languages with juggling a couple of state monads stacked on top of each other. Is it just a syntax difference? I doubt it. Yes, OOP languages have been around for four decades or so, and have had enough time to evolve. But the idea of a monadic approach to programming is at least twenty years old too.

If we have a sound theoretical framework, why does it apply so painfully to real life? Maybe it’s too rigorous, forcing us to artificially separate ideas and features that would better be fused? Maybe it fails to capture some significant factors? Or maybe we’re just currently applying it the wrong way?

Dmitry, I think you’re talking about two things at once. Certainly using mutable state in a global location is, in the small anyway, a lot more convenient than passing information around between functions. The problem is that the more global approach doesn’t scale to more complex problems, limits your design choices, makes concurrent applications more dangerous, etc. Sometimes you can do it and it’s fine. Other times you need to pass things around. And sometimes a compromise approach (like OOP’s passing around object references with mutable fields) is okay. This has nothing to do with monads at all.

Where monads come in is this: would you rather explicitly write functions of the form (a -> s -> (s, b))? Or would you rather work with (a -> State s b)? If you’re composing a lot of these functions through a significant part of your application, probably the latter. (For one-off functions, maybe the first is easier.) It is *this* choice, the one between explicit state threading through parameters and results versus using the State monad, that is relevant here. If you’d like to use global state, then go for it and the consequences will be the consequences of global state, and have nothing to do with monads.
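To make that choice concrete, here is a sketch of the same computation written both ways. The State type below is a small stand-in defined just for this example, using the (s, b) ordering from the types above; the State in Haskell's libraries orders the pair the other way around. The label functions are hypothetical.

```haskell
-- Minimal stand-in for a State monad, defined here for illustration.
newtype State s b = State { runState :: s -> (s, b) }

instance Functor (State s) where
  fmap f (State run) = State $ \s -> let (s', x) = run s in (s', f x)

instance Applicative (State s) where
  pure x = State $ \s -> (s, x)
  State runF <*> State runX = State $ \s ->
    let (s1, f) = runF s
        (s2, x) = runX s1
    in (s2, f x)

instance Monad (State s) where
  State run >>= k = State $ \s ->
    let (s', x) = run s in runState (k x) s'

-- Explicit style, a -> s -> (s, b): tag a name with a counter value.
labelExplicit :: String -> Int -> (Int, String)
labelExplicit name n = (n + 1, show n ++ ":" ++ name)

-- Composing by hand means threading n0, n1, n2 through every step.
labelTwoExplicit :: String -> String -> Int -> (Int, [String])
labelTwoExplicit a b n0 =
  let (n1, la) = labelExplicit a n0
      (n2, lb) = labelExplicit b n1
  in (n2, [la, lb])

-- Monadic style, a -> State s b: the same step, wrapped.
label :: String -> State Int String
label name = State (labelExplicit name)

-- The counter is threaded by >>=, not by the programmer.
labelTwo :: String -> String -> State Int [String]
labelTwo a b = do
  la <- label a
  lb <- label b
  pure [la, lb]
```

Both labelTwoExplicit "x" "y" 0 and runState (labelTwo "x" "y") 0 produce (2, ["0:x", "1:y"]). Neither version involves global state.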

Not at all. It took more than 30 years for OO to be generally accepted even though it is a small extension to procedural programming. So why should this (which is a huge step up compared with going from procedural to OO) be any faster for people to adopt?

Hi Morten,
let me clarify a bit. I’m not speaking about the lack of immediate adoption. It’s the lack of expressive power and composability. There are tons and tons of questions around: people ask how to do this or that in Haskell, a long debate follows, but none of the proposed answers feels quite right, and the original poster goes away disappointed, groaning that it was trivial in language X.

I’m too young to remember this, but I’ve read that most early complaints about OOP concerned poor performance, not lack of convenience. I’m not an OOP advocate; I just feel something is wrong with Haskell’s current approach if it requires so much thinking to implement things which are easy in other languages.

Morten / Apr 27 2012 4:55 am

I understand where you are coming from, Dmitry. However, I must admit I smiled reading this part: “lack of expression power and composability” :-) Haskell is light years ahead of C++ when it comes to expressive power and composability. And I am saying that with 30+ years of development experience behind me, primarily using C++. However, I also know that it is very, very hard to understand why that is without learning Haskell, which is a major reason why adoption is slow. It is basically a chicken and egg situation. It is such a big step up from C++ (and C#/Java) that you really have to learn it to understand why it is worth learning. It isn’t just a minor evolutionary improvement on what you already know (like C# and Java in some ways are compared with C++). It is a revolutionary big-bold-change-the-way-you-think step up :-) It was very, very hard to learn for an old dog like me, but man, was it worth it. Highly recommended.

Dmitry Vyal / Apr 27 2012 5:30 am

Well, I understand your feelings, Morten. They remind me of my own sensations when I first started to get all this great stuff :) Just keep in mind that the monadic approach is not the sole attempt at restricting side effects. And please, don’t think that one either praises it or doesn’t get it. Take a look at the papers on Disciple (http://disciple.ouroborus.net/), for example. And think about why, if monads are so undoubtedly cool, people are looking for other ways of doing things.

Morten / Apr 27 2012 6:22 am

Hey, absolutely, Dmitry. I don’t think we as a software development community ever will (or should) stop looking for better ways to do things. That’s why my message is this: learn new stuff. Don’t believe that what you know is all there is to know; or that what you know is somehow “optimum” or “the best”. And when learning new stuff, learn something that is truly different (C++ if you know Haskell, Haskell if you know C++) instead of minor variations of what you already know (Java/C# if you know C++). Haskell/Monads is an example of that. Hard to learn if you only know C++ family languages. Worth learning. Worth having in your toolbox of solutions. Other recommended things to learn are Clojure/Scheme/Lisp until you truly “get” macros (another memorable “light bulb” experience for me :-) and Erlang (understanding how very hard industrial concurrency problems can be solved with an elegant functional programming language).

Matt Ford / Jun 28 2012 3:58 am

Hi, I’ve been trying to understand the ‘trick’ of extending the range that allows us to “function with dependence”. By way of an example I define the application settings to be:

3. Side effects. As you say, mathematical (Cartesian) functions are merely associations, even though they sometimes smell like verbs. But … a system with state could have a function applied to it that changes the state. (Eg, “everything on the screen” is part of the image of the function, along with the return value … the codomain being the cross product of possible screen states with possible return values … crossed with the domains of global variables that could be changed by the function-with-side-effects.)

So here’s how I think of categories: a category is one entire model, with all of the necessary context for interpretation. You have both the things (dots) and how they relate to each other (arrows).

Then a functor between two categories tells you “category A is like category B.”

The difference to set theory is that sets don’t necessarily come along with all of the context for interpretation. Like having most of the parts to a car and some of the ways that some of the parts go together could be a “space” in old maths. But a category requires all of the diagrams that show us how the parts of the car connect and what they do, how to put them together, and also every part has to be included.

I don’t think I agree with the characterization of functors as a kind of similarity. Similarity ought to be an equivalence relation, whereas functors are very much one-directional. Furthermore, there ALWAYS exists a functor from any category to any non-empty category… just map every object to a fixed destination and every morphism to the identity there. So the mere existence of a functor tells you very little. The focus ought to be on what that functor is doing.

for any set A, we’ll define a new set called Err(A) to be just A together with possible reasons we might have failed. Now a possibly failing function from a set A to a set B is really just an ordinary function from A to Err(B).

After reading many articles about Monads, I finally get an intuitive feel for what I can do with them in software engineering.
Thanks for this great article. When the time comes, I promise now that I will not write my own tutorial, as seems to be customary, but just link here. Hard to resist the temptation.

Hi, I’m studying your article. I have a comment on the “Functioning with Uncertainty” section, where you state that a non-deterministic function f:A->B can be seen as a common function f:A->P(B).

What I would have done is this: define T as the (continuous) set of time points and rewrite the function: f:TxA->B (to be curried…), such that it returns a different output depending on when it is evaluated.

Thank you for this wonderful article. I’ve been reading up on (and working with) monads for a while, but there are two key insights I haven’t seen before:

– The idea of a category as an “extended” function, and that category theory gives us techniques of bringing composability to these extended functions.
– I was somewhat confused by the definition of Haskell Monad instances like Reader and State because they consisted of functions. Your description of how these can just be seen as adding an additional parameter to the original function makes their implementation totally clear.

Thank you for this very interesting article (I’m a little late, I know!). I would like to translate it into French if you give me permission. I’m trying to convince fellow programmers to embrace functional programming and you’re giving excellent arguments in this post. It would be published on my own blog, hosted by developpez.com (http://www.developpez.net/forums/blogs/643915-stendhal666/). I thank you in advance for your answer.

Since this post is somewhat old and the blog hasn’t been updated in more or less a year, I fear the author might not check it anymore. I haven’t seen any copyright on that content, so I will publish it without waiting any further. But if the author finds my request and wishes to deny me the right to republish his post, I’ll be glad to remove it from my blog. Just leave a comment in response; I should be notified by email.