One of the tropes of the golden era of point-n-click adventure games is, would you believe it, the pointing and clicking. In particular, pointing where you’d like the avatar to go, and clicking to make it happen. This post will explore how I made that happen in my neptune game engine.

The first thing we need to do is indicate to the game which parts of the background should be walkable. Like we did for marking hotspots, we’ll use an image mask. Since we have way more density in an image than we’ll need for this, we’ll overlay it on the hotspot mask.

Again, if the room looks like this:

room background

Our mask image would look like this:

room mask

Here, the walkable section of the image is colored in blue. You’ll notice there’s a hole in the walk mask corresponding to the table in the room; we wouldn’t want our avatar to find a path that causes him to walk through the table.

However, there is something important to pay attention to here: namely, that we're making an adventure game. Which is to say that our navigation system doesn't need to be all that good; progress in the game is blocked more by storytelling and puzzles than it is by the physical location of the player (unlike, for example, in a platformer game). If the avatar does some unnatural movement as he navigates, it might be immersion-breaking, but it's not going to be game-breaking.

Which means we can half ass it, if we need to. But I’m getting ahead of myself.

The first thing we’re going to need is a function which samples our image mask and determines if a given position is walkable.
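A minimal sketch of what that might look like (Image, getPixel and the blue constant are hypothetical stand-ins for whatever your image library actually provides):

isWalkable :: Image -> Pos -> Bool
isWalkable img pos = getPixel img pos == walkableColor
  where
    -- the colour we painted the walkable region with in the mask
    walkableColor = blue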

Currying this function against our image mask gives us a plain ol’ function which we can use to query walk-space.

In a 3D game, you’d use an actual mesh to mark the walkable regions, rather than using this mask thing. For that purpose, from here on out we’ll call this thing a navmesh, even though it isn’t strictly an appropriate name in our case.

Because pathfinding algorithms are defined in terms of graphs, the next step is to convert our navmesh into a graph. There are lots of clever ways to do this, but remember, we’re half-assing it. So instead we’re going to do something stupid and construct a square graph by sampling every \(n\) pixels, and connecting it to its orthogonal neighbors if both the sample point and its neighbor are walkable.

It looks like this:

graph building

Given the navmesh, we sample every \(n\) points, and determine whether or not to put a graph vertex there (white squares are vertices, the black squares are just places we sampled.) Then, we put an edge between every neighboring vertex (the white lines.)

We’re going to want to run A* over this graph eventually, which is implemented in Haskell via Data.Graph.AStar.aStar. This package uses an implicit representation of this graph rather than taking in a graph data structure, so we’ll construct our graph in a manner suitable for aStar.

But first, let’s write some helper functions to ensure we don’t get confused about whether we’re in world space or navigation space.
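Something like this, say (Nav, Pos and sampleRate are hypothetical names; Nav is quantized to our sample grid, while Pos is in pixels):

newtype Nav = Nav (Int, Int)
  deriving (Eq, Ord, Generic, Hashable)  -- Hashable via DeriveAnyClass, for aStar

worldToNav :: Pos -> Nav
worldToNav (Pos x y) = Nav (x `div` sampleRate, y `div` sampleRate)

navToWorld :: Nav -> Pos
navToWorld (Nav (x, y)) = Pos (x * sampleRate) (y * sampleRate)

With those in hand, the implicit-graph function that aStar wants might look roughly like this (inNavBounds is another hypothetical helper; HS is Data.HashSet imported qualified):

neighbors :: (Nav -> Bool) -> Nav -> HashSet Nav
neighbors walkable nav@(Nav (x, y)) = HS.fromList $ do
  (dx, dy) <- [(-1, 0), (1, 0), (0, -1), (0, 1)]
  let candidate = Nav (x + dx, y + dy)
  guard $ walkable nav
  guard $ inNavBounds candidate
  guard $ walkable candidate
  pure candidate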

We use the list monad here to construct all of the possible neighbors – those which are left, right, above and below our current location, respectively. We then guard on each: we ensure our current nav point is walkable, that our candidate neighbor is within nav bounds, and finally that the candidate itself is walkable. We need to do the candidate's walkable check last, since everything will explode if we try to sample a pixel that is not in the image.

Aside: if you actually have a mesh (or correspondingly a polygon in 2D), you can bypass all of this sampling nonsense by tessellating the mesh into triangles, and using the results as your graph. In my case I didn’t have a polygon, and I didn’t want to write a tessellating algorithm, so I went with this route instead.

Finally, we need a distance function, which we will use both as our A* heuristic and as our actual distance metric. The exact metric we use doesn't matter, so long as it increases monotonically with the true distance. We'll use distance squared, because it has this monotonic property we want, and it saves us from paying the cost of computing square roots.
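For example:

distSqr :: Nav -> Nav -> Int
distSqr (Nav (x1, y1)) (Nav (x2, y2)) = dx * dx + dy * dy
  where
    dx = x1 - x2
    dy = y1 - y2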

The result is technically correct, in that it does in fact get from our source location to our destination. But it's obviously half-assed. This isn't the path that a living entity would take; as a general principle, we try not to move in rectangles if we can help it.

We can improve on this path by attempting to shorten it. In general this is a hard problem, but we can get most of the way there by giving it the old college try.

Our algorithm to attempt to shorten will be a classic divide and conquer approach – pick the two endpoints of your current path, and see if there is a straight line between the two that is walkable throughout its length. If so, replace the path with the line you just constructed. If not, subdivide your path in two, and attempt to shorten each half of it.

Before we actually get into the nuts and bolts of it, here’s a quick animation of how it works. The yellow circles are the current endpoints of the path being considered, and the yellow lines are the potential shortened routes. Whenever we can construct a yellow line that doesn’t leave the walkable region, we replace the path between the yellow circles with the line.

path shortening

The “divide and conquer” bit of our algorithm is easy to write. We turn our path list into a Vector so we can randomly access it, and then call out to a helper function sweepWalkable to do the nitty gritty stuff. We append the src and dst to the extrema of the constructed vector because aStar won’t return our starting point in its found path, and because we quantized the dst when we did the pathfinding, so the last node on the path is the closest navpoint, rather than being where we asked the character to move to.
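In broad strokes, it might look like this (assuming Data.Vector is imported qualified as V; sweepWalkable is fleshed out below):

shorten :: (Nav -> Bool) -> Nav -> Nav -> [Nav] -> [Nav]
shorten walkable src dst path =
    V.toList . go . V.fromList $ [src] <> path <> [dst]
  where
    go v
      | V.length v <= 2 = v
      | sweepWalkable walkable (V.head v) (V.last v) =
          V.fromList [V.head v, V.last v]
      | otherwise =
          let mid      = V.length v `div` 2
              (xs, ys) = V.splitAt mid v
              -- share the midpoint between the two halves
              left     = go (V.snoc xs (V.head ys))
              right    = go ys
          in V.init left <> right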

The final step, then, is to figure out what this sweepWalkable thing is. Obviously it wants to construct a potential line between its endpoints, but we don’t want to have to sample every damn pixel. Remember, we’re half-assing it. Instead, we can construct a line, but actually only sample the nav points that are closest to it.

In effect this is “rasterizing” our line from its vector representation into its pixel representation.

Using the Pythagorean theorem in navigation space will give us the “length” of our line in navigation space, which corresponds to the number of navpoints we’ll need to sample.

We can then subdivide our line into that many segments, and find the point on the grid that is closest to the end of each. These points correspond with the nodes that need to be individually walkable in order for our line itself to be walkable. This approach will fail for tiny strands of unwalkable terrain that slice through otherwise walkable regions, but maybe just don't do that? Remember, all we want is for it to be good enough – half-assing it and all.
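One way to write that (again, only a sketch):

sweepWalkable :: (Nav -> Bool) -> Nav -> Nav -> Bool
sweepWalkable walkable (Nav (x1, y1)) (Nav (x2, y2)) =
  let -- the "length" of the line in nav space, i.e. how many samples to take
      len = sqrt (fromIntegral ((x2 - x1) ^ 2 + (y2 - y1) ^ 2)) :: Double
      n   = max 1 (ceiling len) :: Int
      sample i =
        let t = fromIntegral i / fromIntegral n :: Double
        in Nav ( x1 + round (t * fromIntegral (x2 - x1))
               , y1 + round (t * fromIntegral (y2 - y1))
               )
  in all (walkable . sample) [0 .. n]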

The other day, I found myself working on the interaction subsystem of my game engine. I want the game to play like Monkey Island 3, which means you can click on the ground to walk there. You can also click and hold on an interactive piece of scenery in order to have a context-sensitive menu pop-up, from which you can choose how to interact with the object in question. If you’re not familiar with the genre, watching a few minutes of the video linked above should give you some idea of what I’m trying to build.

An adventure game in which you’re unable to interact with anything isn’t much of a game, and that’s where we left the engine. So it seemed like a thing to focus on next.

I knew that the click/hold interaction I wanted formed some sort of DFA, so I unwisely headed down that garden path for a bit. After implementing for a while, I found myself with a state machine whose denotation was type DFA s e a = s -> e -> Either s a, where s is the state of the machine, e is the type of an edge transition, and a is the eventual output of the machine. Upon finishing it, however, it became clear that I had fallen into an abstraction hole. I spent a bunch of time figuring out the implementation of this thing, and then afterwards realized it didn't actually solve my problem. Whoops. Amateur Haskell mistake :)

The problem is that transitioning into some state might need to perform a monadic action in order to generate the next edge. For example, when you press down on the mouse button, we need to start a timer which will open the action menu when it expires. This could be alleviated by changing Either to These and letting a ~ (Monad m => m b), but that struck me as a pretty ugly hack, and getting the implementation of the denotation to work again was yucky.

So I decided that instead maybe I should write a dumb version of what I wanted, and find out how to abstract it later if I should need similar machinery again in the future. I burned my DFA implementation in a fire.

This posed a problem, though, because if I wanted to write this for real I was going to need things to actually interact with, and I didn’t yet have those. I decided to put the interaction sprint on hold, in order to focus more on having things with which to interact.

One abstraction I think in terms of when working with adventure games is that of the hotspot. A hotspot is a mask on the background image which indicates a static piece of interesting geometry. For example, a window that never moves would be baked into the background image of the room, and then a hotspot would be masked on top of it to allow the character to interact with it.

For example, if our room looks like this (thanks to MI2 for the temporary art):
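The machinery behind this might look something like the following mkHotspot helper (Image, Color and getPixel are hypothetical stand-ins for the engine's actual types):

mkHotspot
    :: Image            -- the hotspot mask
    -> (Color -> Bool)  -- does this colour select our hotspot?
    -> Hotspot
    -> Pos
    -> Maybe Hotspot
mkHotspot img match hs pos =
  if match (getPixel img pos)
    then Just hs
    else Nothing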

We now bake the first three parameters of this function when we construct our level definition.

In order to test these things, I added a field _hsName :: Hotspot -> String so I could check whether my logic worked. The next step was to bind the click event to call the Pos -> Maybe Hotspot that I curried out of mkHotspot and stuck into my Room datastructure (_hotspots :: Room -> Pos -> Maybe Hotspot).

I clicked around a bunch, and found that print . fmap _hsName $ _hotspots currentRoom mousePos lined up with the door when I clicked on it. It seemed to be working, so I considered my first yak shave successful: I now had something in the world that I could interact with.

The next step was to code up a little bit of the DFA I was originally working on. I decided that I should make the avatar walk to the place you clicked if it wasn’t a hotspot.

So: when the mouse is pressed, see if it was over top of a hotspot. If so, print out the name of it. Otherwise, check the navmesh of the room, and see if that’s a valid place to walk. If so, update any entity who has the isAvatar component and set its pathing component to be the location we want.
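In ecstasy-flavoured pseudocode, that looks something like this (the names here are approximate, not the engine's literal API):

case event of
  MousePressed pos
    | Just hs <- _hotspots currentRoom pos ->
        putStrLn $ _hsName hs
    | _navmesh currentRoom pos ->
        emap $ do
          with isAvatar
          pure defEntity'
            { pathing = Set $ NavTo pos
            }
    | otherwise -> pure ()
  _ -> pure ()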

The engine at this point already has navigation primitives, which is why this works. We’ll discuss how the navmesh is generated and used in another devlog post.

I ran this code and played around with it for a while. Everything looked good – after I remembered to set isAvatar on my player entity :)

The next step was to implement timers that would have a callback, and could be started and stopped. I’d need support for these in order to wait a little bit before opening up the action menu. Thankfully, timers are super easy: just have an amount of time you decrement every frame until it hits zero, and then do the necessary action. I came up with this model for timers:
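Something like this (Time and the Game monad come from the engine):

data Timer = Timer
  { _tTime     :: Time     -- how long until the timer fires
  , _tCallback :: Game ()  -- what to do when it does
  }

data TimerType
  = TimerCoin
  deriving (Eq, Ord)

-- and, inside GlobalState:
--   _timers :: Map TimerType Timer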

A Timer is just an amount of remaining time and something to do afterwards. It’s stored in the GlobalState with a TimerType key. I originally thought about using a bigger type (such as Int) as my timer key, but realized that would make canceling specific timers harder as it would imply they’re given a non-deterministic key when started. The interface for starting and canceling timers turned out to be trivial:
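Assuming lensy access to GlobalState via a hypothetical setGlobals helper:

startTimer :: TimerType -> Time -> Game () -> Game ()
startTimer tt t cb =
  setGlobals $ timers . at tt ?~ Timer t cb

cancelTimer :: TimerType -> Game ()
cancelTimer tt =
  setGlobals $ timers . at tt .~ Nothing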

The only thing left is to update timers and run their callbacks when it’s time. I fucked around with this implementation too hard, trying to find a completely lensy way of doing it, but eventually settled on this ugly fromList . toList thing:
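A sketch of where I landed (getGlobals and setGlobals are the same hypothetical helpers as above; M is Data.Map):

updateTimers :: Time -> Game ()
updateTimers dt = do
  ts  <- getGlobals $ view timers
  ts' <- for ts $ \t ->
           if _tTime t - dt <= 0
             then Nothing <$ _tCallback t  -- fire the callback, drop the timer
             else pure . Just $ t { _tTime = _tTime t - dt }
  setGlobals $
    timers .~ M.fromList (catMaybes . fmap sequence $ M.toList ts')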

ts' is a traversal over the Map of timers that decrements each of their times, optionally runs their callbacks, and then returns a Maybe Timer for each one. The last line is where the interesting bit is – sequence over a (TimerType, Maybe Timer) is a Maybe (TimerType, Timer), which we can then insert back into our Map as we construct it – essentially filtering out any timers which have expired.

Finally we can get back to our DFA. Instead of printing out the name of the hotspot you clicked on, we can now start a timer that will update our game state. I added a field to GlobalState:
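Roughly:

data InputDFA
  = IStart
  | IBeforeCoin
  | ICoinOpen Pos Hotspot
  deriving Eq

-- and, inside GlobalState:
--   _gInputDFA :: InputDFA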

The idea is that we start in state IStart, transition into IBeforeCoin when we start the timer, and into ICoinOpen when the timer expires. Additionally, if the user releases the mouse button, we want to cancel the timer. All of this becomes:
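Give or take the exact names, something like:

case (event, _gInputDFA globals) of
  (MousePressed pos, IStart) ->
    case _hotspots currentRoom pos of
      Just hs -> do
        startTimer TimerCoin 0.5 $
          setGlobals $ gInputDFA .~ ICoinOpen pos hs
        setGlobals $ gInputDFA .~ IBeforeCoin
      Nothing -> walkAvatarTo pos

  (MouseReleased _, IBeforeCoin) -> do
    cancelTimer TimerCoin
    setGlobals $ gInputDFA .~ IStart

  (MouseReleased pos, ICoinOpen menuPos hs) -> do
    -- dispatch on where the mouse was released over the menu;
    -- the machinery for this is built below
    onCoinRelease hs menuPos pos
    setGlobals $ gInputDFA .~ IStart

  _ -> pure ()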

If you care, try to trace through these cases and convince yourself that this logic is correct. The reason we have a position stored inside the ICoinOpen is so that we know where the mouse was when the user started holding their mouse down. This corresponds to where we should draw the action menu.

This is done in the drawing routine by checking the current state of _gInputDFA – if it’s ICoinOpen it means the menu is up and we need to draw it.

The last thing is to figure out how to map where you release your mouse button on the menu to the interaction we should perform. Our action menu looks like this:

the action menu

From left to right, these squares represent talking/eating, examining, and manipulating. We need some way of mapping a location on this image to a desired outcome.

Doing rectangle collision is easy enough – we define a bounding box and a test to see if a point is inside of it (as well as some auxiliary functions for constructing and moving BBs, elided here):
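A sketch:

data BB = BB
  { bbLeft   :: Float
  , bbRight  :: Float
  , bbTop    :: Float
  , bbBottom :: Float
  } deriving (Eq, Show)

inBB :: BB -> Pos -> Bool
inBB BB {..} (Pos x y) =
  and [ x >= bbLeft, x < bbRight, y >= bbTop, y < bbBottom ]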

The abstraction is my amazingly-named BBSurface, which is a mapping of BBs to values of some type a. We can find a Maybe a on the BBSurface by just checking if the point is in any of the bounding boxes. If it is, we return the first value we find.
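In code (listToMaybe is from Data.Maybe):

newtype BBSurface a = BBSurface [(BB, a)]

getBBSurface :: BBSurface a -> Pos -> Maybe a
getBBSurface (BBSurface bs) pos =
  listToMaybe [ a | (bb, a) <- bs, inBB bb pos ]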

All that’s left is to construct one of these BBSurfaces for the coin, and then to move it to the position indicated inside the ICoinOpen. Easy as pie. Pulling everything together, and our interactive menu works as expected. Great success!

Perhaps you could explain a little bit about your choice to write ecstasy rather than to use apecs? I’ve not used apecs, I’m just interested as I had done some limited research into writing games in Haskell and apecs seemed to have traction.

That seems like a really good idea, and combined with the fact that I really haven’t published anything about ecstasy suggested I actually write about it!

What is an ECS?

So before diving in, let's take a look at the problem an entity-component-system (ECS) solves. Let's say we're writing a simple 2D platformer: we'll have dudes who can run around and jump on platforms.

The way I'd go about writing this before knowing about ECS would be to implement one feature at a time, generally using the player character to test it as I went. I'd write functions that look something like this:
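Something in this spirit (all of the types here are hypothetical):

moveActor :: Controller -> Actor -> Actor
moveActor ctrl actor =
  -- assuming a Num instance for positions
  actor { aPos = aPos actor + ctrlDir ctrl * actorSpeed actor }

updatePlayer :: Controller -> GameState -> GameState
updatePlayer ctrl st =
  st { gsPlayer = moveActor ctrl (gsPlayer st) }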

On the surface this feels good. We’ve reused the code for moveActor for both the player and any other dudes on the level who might want to walk around. It feels like we can build up from here, and compose pieces as we go.

Which is true if you’re really patient, good at refactoring, or have spent a lot of time building things like this and know where you’re going to run afoul. Because you’re always going to run afoul in software.

The problem with our first attempt at this code is that it codifies a lot of implicit assumptions about our game. For example, did you notice that it implies we’ll always have an Actor for the player? It seems like a reasonable assumption, but what if you want to play a cut-scene? Or how about if you don’t want to always have control over the player? Maybe you’ve just been hit by something big that should exert some acceleration on you, and you don’t want to be able to just press the opposite direction on the control stick to negate it.

All of a sudden, as you try to code for these things, your simple moveActor function takes more and more parameters about the context of the circumstances in which it's running. And what's worse is that often the rules of how these behaviors should play out will change depending on whether it's the player or some enemy in the level. We're left with a conundrum – should we build ad-hoc infrastructure around the callers of moveActor, or should we put all of the logic inside of it?

As you can imagine, it pretty quickly becomes a mess.

In one of the few times I’ll praise object-oriented programming, I have to say that its inheritance-based polymorphism lends itself well to this problem. You can build more complicated and specific behaviors out of your ancestors’ behaviors. Unfortunately, this approach bucks the OOP best-practice of “composition over inheritance.”

ECS takes what I consider to be the functional-programming equivalent of this OOP strategy. Its fundamental stake in the ground is that rather than representing your universe of game objects as an array-of-structs, you instead represent it as a struct-of-arrays. Conceptually, this is a cognitive shift that means going from something like this (a sketch, with the component types left abstract):
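data GameObject = GameObject
  { position :: Position
  , velocity :: Velocity
  , graphics :: Graphics
  }

type Universe = Array GameObject

to a universe shaped more like this:

data Universe = Universe
  { positions  :: Array Position
  , velocities :: Array Velocity
  , graphics   :: Array Graphics
  }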

This has some profound repercussions. First of all, notice that we have no guarantee that our Arrays are the same length, which implies that not every GameObject need have all of its possible components.

All of a sudden, we can pick and choose which components an entity has. Entities, now instead of being explicitly modeled by a GameObject are implicitly defined by an Int corresponding to their index in all of the arrays.

From here, we can now write specific, global behaviors that should manipulate the components of an entity. We can avoid a lot of our previous ad-hoc machinery by essentially running a map that performs pattern matching on only the components we want to care about. For example, we can say that we only want to draw entities who have both a position and a graphics. We want to apply gravity to all entities that have a velocity, but don’t have the notAffectedByGravity flag.
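Concretely, you can picture the universe as something like this, with each component stored in its own map keyed by entity:

data World = World
  { worldPositions  :: IntMap Position
  , worldVelocities :: IntMap Velocity
  , worldGraphics   :: IntMap Graphics
  }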

I haven’t dug too much into the internals of apecs, so this representation might not be perfect, but it’s good enough for us to get an understanding of what’s going on here.

We can now use some of apecs’ primitives to, for example, transfer our velocity over to our position:

rmap $ \(Position p, Velocity v) -> Position $ p + v

This rmap function is something I’d describe as “fucking magic.” You pass it a lambda, it inspects the type of the lambda, uses the tuple of its input to determine which components an entity must have, and then will update the components of the corresponding output tuple.

At first, this seems like a fine abstraction, but it breaks down pretty quickly when used in anger. For example, what if you want to run a function over Position that only works if you don't have a Velocity? Or if you want to remove a component from an entity? apecs can do it, but good luck finding the right function. Do you want cmap, cmapM, cmapM_, cimapM, cimapM_, rmap', rmap, wmap, wmap' or cmap'? After a week of working with the library, I still couldn't make heads or tails of which function I needed in any given circumstance. I'm sure there's a mnemonic here somewhere, but I'm not bright enough to figure it out.

When you do eventually find the right function, doing anything other than a pure map from one component to another becomes an exercise in futility and magic pattern matching. There’s this thing called Safe you sometimes need to pattern match over, or produce, and it roughly corresponds to when you’re not guaranteed to have all of the components you asked for.

There are several other gotchas, too. For example, you can construct an entity by providing a tuple of the components you want to set. Unfortunately, due to apecs’ design, this thing must be type-safe. Which means you can’t construct one based on runtime data if you’re loading the particular components from e.g. a level editor. Well, you can, if you’re willing to play “existentialize the dictionary” and learn enough of the underlying library (and quirks of Haskell’s type inference algorithm) in order to convince the compiler what you’re doing is sound.

One final gotcha I'll mention is that this magic tuple stuff is provided through typeclasses which are generated for the library by Template Haskell. Out of the box, you only get support for 5-tuples, which means you can't easily construct entities with more components than that. Furthermore, changing the TH to generate more results in exponential growth of your compile times.

None of this is to say that apecs is bad software. It’s actually pretty brilliant in terms of its technology; I just feel as though its execution is lacking. It depends on a lot of tricks that I wouldn’t consider to be idiomatic Haskell, and its usability suffers as a consequence.

Ecstasy

So with all of the above frustrations in mind, and a lot of time to kill in a Thai airport, I felt like I could make a better ECS. Better is obviously subjective for things like this, but I wanted to optimize it for being used by humans.

My explicit desiderata were:

Keep boilerplate to a minimum.

The user shouldn’t ever bump into any magic.

I think ecstasy knocks it out of the park on both of these fronts. Before diving into how it all works, let’s take a peek at how it’s used. We can define our components like so:
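(This is roughly the shape ecstasy asks for; the exact API has shifted a little between versions.)

data EntWorld f = Entity
  { pos :: Component f 'Field Pos
  , vel :: Component f 'Field Vel
  , gfx :: Component f 'Field Graphics
  } deriving (Generic)

type Entity = EntWorld 'FieldOf

The velocity-transfer example from earlier then becomes something like:

emap $ do
  p <- query pos
  v <- query vel
  pure defEntity'
    { pos = Set $ p + v
    }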

Ecstasy clearly wins on minimizing the definition-side of boilerplate, but it seems like we’ve gained some when we actually go to use these things. This is true, but what we buy for that price is flexibility. In fact, emap is powerful enough to set, unset and keep components, as well as branch on whether or not a component is actually there. Compare this to the ten functions with different signatures and semantics that you need to keep in mind when working with apecs, and it feels like more of a win than the syntax feels like a loss.

So the question I’m sure you’re wondering is “how does any of this work?” And it’s a good question. Part of the reason I wrote this library was to get a feel for the approach and for working with GHC.Generics.

The idea comes from my colleague Travis Athougies and his mind-meltingly cool library beam. The trick is to get the library user to define one semantic type that makes sense in their domain, and then to use tricky type system extensions in order to corral it into everything you need. beam uses this approach to model database tables; ecstasy uses it to provide both a struct-of-arrays for your components, as well as just a struct corresponding to a single entity.

As you’d expect, the sorcery is inside of the Component type family. We can look at its definition:
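In lightly simplified form, it looks something like this:

data StorageType   = FieldOf | WorldOf | SetterOf
data ComponentType = Field   | Unique

data Update a = Set a | Unset | Keep

type family Component (s :: StorageType)
                      (c :: ComponentType)
                      (a :: *) :: * where
  Component 'FieldOf  c       a = Maybe a
  Component 'SetterOf c       a = Update a
  Component 'WorldOf  'Field  a = IntMap a
  Component 'WorldOf  'Unique a = Maybe (Int, a)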

This Component thing spits out different types depending on if you want a record for the entity ('FieldOf), an updater to change which components an entity has ('SetterOf), or the actual universe container to hold all of this stuff ('WorldOf). If we’re building an entity record, every component is a Maybe. If we’re describing a change to an entity, we use data Update a = Set a | Unset | Keep. If we want a place to store all of our entities, we generate an IntMap for every 'Field. There’s also support for adding components that are uniquely owned by a single entity, but we won’t get into that today.

The trick here is that we get the user to fill in the c :: ComponentType when they define the components, and ask them to keep the s :: StorageType polymorphic. The library then can instantiate your EntWorld f with different StorageTypes in order to pull out the necessary types for actually plumbing everything together.

We use the Generic derivation on EntWorld in order to allow ourselves to construct the underlying machinery. For example, when you’re defining an entity, you don’t want to be able to Keep the old value of its components, since it didn’t have any to begin with. We can use our Generic constraint in order to generate a function toSetter :: EntWorld 'FieldOf -> EntWorld 'SetterOf which takes an entity record and turns it into an entity update request, so that we don’t actually need special logic to construct things. The Generic constraint also helps generate default values of the EntWorld 'WorldOf and other things, so that you don’t need to write out any boilerplate at the value level in order to use these things.

The actual how-to-do of the GHC.Generics is outside of the scope of today’s post, but you can read through the source code if you’re curious.

I’m ravenously working my way through Austin Kleon’s excellent book Show Your Work. One of the points that most resounded with me was to, as you might anticipate, show your work. But more importantly, to share it every day. I’ve decided to take up that challenge in documenting the development of some of my bigger projects. The goal has a few facets: to show how I work and the struggles that I face while writing Haskell on a day-to-day basis; to lend my voice towards the art of game programming in Haskell; and to bolster my 2018 publishing goals.

I want to make an old school point-and-click adventure game in the style of Monkey Island or Full Throttle. I’ve wanted to make one for as long as I can remember, and I finally have a concept and some amount of script that I think would be fitting for the medium. I spent roughly two days searching for engines to run this baby on, and I didn’t have any luck whatsoever.

adventure - an old adventure game engine I wrote back in ’12 or so. It requires writing a lot of lua, and appears to have bitrotten since then. I couldn’t get it to compile.

Godot/Escoria - Escoria doesn’t appear to run on recent versions of Godot.

Visionaire - I successfully got the editor running on WINE, but it couldn’t draw anything, so I could edit everything but had no visual feedback on anything.

Bladecoder Adventure Engine - I fought to compile this for a while, and eventually succeeded, but got scared of it. It’s written by a single guy in a language I never want to touch, and decided the risk factor was too high.

Unity Adventure Creator - looks promising, but requires forking out 70 euros before you can try it. As someone who is unemployed and knows nothing about Unity, that's a pretty steep price to determine whether or not the project will work for my purposes.

So it looks like we’re SOL. The existing engines don’t seem like they’re going to cut it. Which means we’re going to need to roll our own.

Fortunately I’ve rolled a few of my own already. This wasn’t my first rodeo. There’s the previously mentioned adventure, an unnamed XNA/C# one I wrote before knowing about source control which is unfortunately lost to the sands of time, and one I most recently put together as a technical demo for a project a friend and I were going to work on. The friend pulled out, unfortunately, so the project died, but that means I have a starting point.

The engine as it existed had basic functionality for pathing around a bitmap, moving between rooms, and basic support for interacting with the environment. Unwisely, it was also a testbed for lots of type-trickery involving existentially pushing around types to manage the internal state of things in the game. It was intended that we’d do all of our game scripting directly in Haskell, and this seemed like the only approach to have that work.

So my first order of business was to tear out all of the existential stuff. I've learned since that you should always avoid existentializing things unless you are really, really sure you know what you're doing. It's a soft and slow rule, but more often than not I regret existentializing things. The new plan was to script the game with a dedicated scripting language, so Haskell would never need to know about any of the internal state.

Since writing the first draft of this game engine, I’ve published a library called ecstasy. It’s an entity-component system that allows you to describe behaviors over components of things, and then compose all of those behaviors together. The magic here is that you can write a function that only manipulates the components you need, and the library will lift it over all entities such a behavior would be relevant to. This means you can pick-and-choose different behaviors for game objects without needing to do a lot of heavy plumbing to get everything to play nicely with one another.

And so the next step was to hook up ecstasy to my existing engine. I didn’t want to alter any of the game’s behavior yet, so entities managed by ecstasy would have to exist completely parallel to the ones managed by the existing engine.

I defined my ecstasy component type with the most minimal support for drawing things on the screen.
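It looked something like this (Picture here is assumed to be gloss's drawable type):

data EntWorld f = Entity
  { pos :: Component f 'Field Pos
  , gfx :: Component f 'Field Picture
  } deriving (Generic)

type Entity = EntWorld 'FieldOf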

There was some silly plumbing necessary to connect my old, convoluted Game monad with the System monad provided by ecstasy. That’s what this ms@(s, _) and Game' silliness is here; little shims that can run the two monads simultaneously and reconcile the results. It was pretty gnarly, but thankfully only a hack until I could convert enough of the game logic over to being exclusively managed by ecstasy.

I think that’s where we’ll leave the dev blog for today. I want to get us roughly caught up to the present in terms of getting from there-to-here in order to provide a better overall view of what game development in Haskell looks like. But I’m also pretty anxious to actually get some work done, rather than just describing work I have done. I expect the posts to get more technical as we get closer to being caught up, when I don’t need to depend on my memory for what changes were made.

Next time we’ll discuss ripping out most of the silly global variables that used to be in play, and talk about how an ECS better models things like “what should the camera be focused on?” and “how should characters navigate the space?”

I have a (not very controversial) feeling that people don't feel as though algebra is actually a thing you can use for stuff. I fall into this trap myself often, despite being someone who does math for a living, and so I suspect this is a pretty widespread phenomenon. Let me explain.

For example, consider the equation:

\[
(x + y)(x - y) = x^2 - y^2
\]

This is known as the difference of squares. Let’s work through the derivation of it together:
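\[
\begin{aligned}
&\phantom{{}={}} (x + y)(x - y) \\
&= x^2 - xy + yx - y^2 \\
&= x^2 - xy + xy - y^2 \\
&= x^2 - y^2
\end{aligned}
\]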

Recall that we can use the FOIL method to get from the first line to the second.

I implore you to read through this proof carefully, and convince yourself of its truthfulness – even if you don’t consider yourself a “math” person. Believe it or not, there’s a point I’m getting to.

Anyway – by all accounts, this difference of squares thing is a pretty humdrum theorem. Who really cares, right? Let’s switch gears for a bit and talk about something more interesting.

Recall that \(20 \times 20 = 400\). As an interesting question, without actually computing it, let’s think about the product \(19 \times 21\). What does this equal? It seems like it could also be \(400\) – after all, all we did was take one away from the left side of the times and move it to the right.

In fact, if you work it out, \(19 \times 21 = 399\). That’s kind of interesting: somehow we lost a \(1\) by shuffling around the things we were multiplying.

An intriguing question to ask yourself is whether this is always true, or whether we’ve just gotten lucky with the examples we looked at.

But the more interesting question, in my opinion, is what happens if we go from \(19 \times 21 = 399\) to \(18\times22\). Will we lose another \(1\) when we fiddle with it? Or will something else happen? Form an opinion on what the answer will be before continuing.
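As it happens:

\[
18 \times 22 = 396
\]

This time we lost another \(3\). The pattern here is no accident – stepping \(n\) away from \(20\) on both sides is just our difference of squares in disguise, with \(x = 20\) and \(y = n\):

\[
(20 - n)(20 + n) = 20^2 - n^2 = 400 - n^2
\]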

Neat, right? Even if you carefully read through the proof of the difference of squares earlier, you might not have noticed that we’ve been playing with them the entire time! I blame western math education for this; too often are equations presented only to be solved, and never to be thought about. It’s a disservice we’ve done to ourselves.

The takeaway of all of this, in my opinion, is that we should spend some time thinking about the notion of equality, about the \(=\) symbol. Ever since looking at this difference of squares thing, I’ve started viewing \(=\) not as the symbol which separates the left side of an equation from the right, but as a transformation. The \(=\) sign transforms something we can experience into something we can manipulate, and back again.

What I mean by that is that it’s a lot easier to conceptualize \(22\times18\) than it is to think about \((x+y)(x-y)\). The numeric representation is better suited for human minds to experience, while the algebraic expression is better at twiddling. We know how to twiddle algebra, but twiddling numbers themselves is rather meaningless.

In terms of everyday usefulness, this isn’t particularly helpful, except that it’s often easier to compute a difference of squares than it is to do the multiplication naively. If you can recognize one, you could probably impress someone with your mental arithmetic – but, again, it’s not going to revolutionize your life in any way.

All of this is to say that math is neat. Even if you don’t see any practical value in this stuff, hopefully you’ll agree that there might be interesting puzzles to be found here. And, as it turns out, algebra can be a satisfying tool for solving these puzzles.

Thanks to Matt Parsons for proof-reading an early version of this post.

Context

At work recently I’ve been working on a library to get idiomatic gRPC support in our Haskell project. I’m quite proud of how it’s come out, and thought it’d make a good topic for a blog post. The approach demonstrates several type-level techniques that in my opinion are under-documented and exceptionally useful in using the type-system to enforce external contracts.

Thankfully the networking side of the library had already been done for me by Awake Security, but the interface feels like a thin wrapper on top of C bindings. I'm very, very grateful that it exists, but I wouldn't expect myself to be able to use it in anger without causing an uncaught type error somewhere along the line. I'm sure I'm probably just using it wrong, but the library's higher-level bindings all seemed to be targeted at Awake's implementation of protobuffers.

We wanted a version that would play nicely with proto-lens, which, at time of writing, has no official support for describing RPC services via protobuffers. If you’re not familiar with proto-lens, it generates Haskell modules containing idiomatic types and lenses for protobuffers, and can be used directly in the build chain.

So the task was to add support to proto-lens for generating interfaces to RPC services defined in protobuffers.

My first approach was to generate the dumbest possible thing that could work – the idea was to generate records containing fields of the shape Request -> IO Response. Of course, with a network involved there is a non-negligible chance of things going wrong, so this interface should expose some means of dealing with errors. However, the protobuffer spec is agnostic about the actual RPC backend used, and so it wasn’t clear how to continue without assuming anything about the particulars behind errors.

More worrisome, however, was that RPCs can be marked as streaming – on the side of the client, server, or both. This means, for example, that a method marked as server-streaming has a different interface on either side of the network:
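For example, the two sides of a server-streaming method might look roughly like this (Request and Response standing in for the method's protobuffer types):

-- on the server: receive one request, and emit responses as they become ready
Request -> (Response -> IO ()) -> IO ()

-- on the client: send one request, and consume responses as they arrive
Request -> (IO (Maybe Response) -> IO r) -> IO r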

This is problematic. Should we generate different records corresponding to which side of the network we’re dealing with? An early approach I had was to parameterize the same record based on which side of the network, and use a type family to get the correct signature:
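The sketch looked something like this:

data Side = ClientSide | ServerSide

type family MethodSig (side :: Side) req resp :: * where
  MethodSig 'ServerSide req resp = req -> (resp -> IO ()) -> IO ()
  -- GHC will reject this equation; type families
  -- may not produce polymorphic types
  MethodSig 'ClientSide req resp =
    forall r. req -> (IO (Maybe resp) -> IO r) -> IO r

data ServerStreamingService side = ServerStreamingService
  { myMethod :: MethodSig side Request Response
  }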

This seems like it would work, but in fact the existence of the forall on the client-side is “illegally polymorphic” in GHC’s eyes, and it will refuse to compile such a thing. Giving it up would mean we wouldn’t be able to return arbitrarily-computed values on the client-side while streaming data from the server. Users of the library might be able to get around it by invoking IORefs or something, but it would be ugly and non-idiomatic.

So that, along with wanting to be backend-agnostic, made this approach a no-go. Luckily, my brilliant coworker Judah Jacobson (who is coincidentally also the author of proto-lens), suggested we instead generate metadata for RPC services in proto-lens, and let backend library code figure it out from there.

With all of that context out of the way, we’re ready to get into the actual meat of the post. Finally.

Generating Metadata

According to the spec, a protobuffer service may contain zero or more RPC methods. Each method has a request and response type, either of which might be marked as streaming.

While we could represent this metadata at the term-level, that won’t do us any favors in terms of getting type-safe bindings to this stuff. And so, we instead turn to TypeFamilies, DataKinds and GHC.TypeLits.

For reasons that will become clear later, we chose to represent RPC services via types, and methods in those services as symbols (type-level strings). The relevant typeclasses look like this:
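In lightly simplified form:

class Service s where
  type ServiceName s :: Symbol

class Service s => HasMethod s (m :: Symbol) where
  type MethodInput       s m :: *
  type MethodOutput      s m :: *
  type IsClientStreaming s m :: Bool
  type IsServerStreaming s m :: Bool

For a service MyService with a single bidirectionally-streaming method, the generated code would look something like:

data MyService = MyService

instance Service MyService where
  type ServiceName MyService = "myService"

instance HasMethod MyService "biDiStreaming" where
  type MethodInput       MyService "biDiStreaming" = Request
  type MethodOutput      MyService "biDiStreaming" = Response
  type IsClientStreaming MyService "biDiStreaming" = 'True
  type IsServerStreaming MyService "biDiStreaming" = 'True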

You’ll notice that these typeclasses perfectly encode all of the information we had in the protobuffer definition. The idea is that with all of this metadata available to them, specific backends can generate type-safe interfaces to these RPCs. We’ll walk through the implementation of the gRPC bindings together.

The Client Side

The client side of things is relatively easy. We can use the HasMethod instance directly:
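For the non-streaming variety, the signature might look roughly like this (GRPCError stands in for the backend's error type; the body, which shells out to the underlying networking bindings, is elided):

runNonStreamingClient
    :: ( HasMethod s m
       , IsClientStreaming s m ~ 'False
       , IsServerStreaming s m ~ 'False
       )
    => s
    -> Proxy m
    -> MethodInput s m
    -> IO (Either GRPCError (MethodOutput s m))
runNonStreamingClient = undefined  -- elided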

Would-be callers attempting to use the wrong function for their method will now be warded off by the type-system, due to the equality constraints being unable to be discharged. Success!

The actual usability of this code leaves much to be desired (it requires being passed a proxy, and the type errors are absolutely disgusting), but we’ll circle back on improving it later. As it stands, this code is type-safe, and that’s good enough for us for the time being.

The Server Side

Method Discovery

Prepare yourself (but don’t panic!): the server side of things is significantly more involved.

In order to run a server, we’re going to need to be able to handle any sort of request that can be thrown at us. That means we’ll need an arbitrary number of handlers, depending on the service in question. An obvious thought would be to generate a record we could consume that would contain handlers for every method, but there’s no obvious place to generate such a thing. Recall: proto-lens can’t, since such a type would be backend-specific, and so our only other strategy down this path would be Template Haskell. Yuck.

Instead, recall that we have an instance of HasMethod for every method on Service s – maybe we could exploit that information somehow? Unfortunately, without Template Haskell, there’s no way to discover typeclass instances.

But that doesn’t mean we’re stumped. Remember that we control the code generation, and so if the representation we have isn’t powerful enough, we can change it. And indeed, the representation we have isn’t quite enough. We can go from a HasMethod s m to its Service s, but not the other way. So let’s change that.

If we ensure that the ServiceMethods s type family always contains an element for every instance of HasService, we’ll be able to use that info to discover our instances. For example, our previous MyService will now get generated thusly:
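instance Service MyService where
  type ServiceName    MyService = "myService"
  type ServiceMethods MyService = '["biDiStreaming"]

We can then walk that list at the type-level with a helper class, something like this (modulo the usual pile of language extensions):

class HasAllMethods s (xs :: [Symbol])
instance HasAllMethods s '[]
instance (HasMethod s x, HasAllMethods s xs) => HasAllMethods s (x ': xs)

class HasAllMethods s (ServiceMethods s) => Service s where
  type ServiceName    s :: Symbol
  type ServiceMethods s :: [Symbol]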

We can think of xs here as the list of constraints we want. Obviously if we don’t want any constraints (the '[] case), we trivially have all of them. The other case is induction: if we have a non-empty list of constraints we’re looking for, that’s the same as looking for the tail of the list, and having the constraint for the head of it.

Read through these instances a few times; make sure you understand the approach before continuing, because we’re going to keep using this technique in scarier and scarier ways.

With this HasAllMethods superclass constraint, we can now convince ourselves (and, more importantly, GHC), that we can go from a Service s constraint to all of its HasMethod s m constraints. Cool!

Typing the Server

We return to thinking about how to actually run a server. As we’ve discussed, such a function will need to be able to handle every possible method, and, unfortunately, we can’t pack them into a convenient data structure.

Our actual implementation of such a thing might take a list of handlers. But recall that each handler has different input and output types, as well as different shapes depending on which bits of it are streaming. We can make this approach work by existentializing away all of the details.
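A sketch of the existentialized pieces (the ByteString shape is a stand-in for whatever the networking layer actually wants):

data AnyHandler = AnyHandler
  { ahMethodName :: String
  , ahHandler    :: ByteString -> IO ByteString
  }

runGRPCServer :: [AnyHandler] -> IO ()
runGRPCServer = undefined  -- hands the handlers over to the networking layer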

While it works as far as the actual implementation of the underlying gRPC goes, we're left with a great sense of uneasiness. We have no guarantees that we've provided a handler for every method, and the very nature of existentialization means we have absolutely no guarantees that any of these things are the right type.

Our only recourse is to somehow use our Service s constraint to put a prettier facade in front of this ugly-if-necessary implementation detail.

The actual interface we’ll eventually provide will, for example, for a service with two methods, look like this:

runServer :: HandlerForMethod1 -> HandlerForMethod2 -> IO ()

Of course, we can’t know a priori how many methods there will be (or what type their handlers should have, for that matter). We’ll somehow need to extract this information from Service s – which is why we previously spent so much effort on making the methods discoverable.

The technique we’ll use is the same one you’ll find yourself using again and again when you’re programming at the type-level. We’ll make a typeclass with an associated type family, and then provide a base case and an induction case.

class HasServer s (xs :: [Symbol]) where
  type ServerType s xs :: *

We need to make the methods xs explicit as parameters in the typeclass, so that we can reduce them. The base case is simple – a server with no more handlers is just an IO action:
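instance HasServer s '[] where
  type ServerType s '[] = IO ()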

The idea is that as we pull methods x off our list of methods to handle, we build a function type that takes a value of the correct type to handle method x, which will take another method off the list until we’re out of methods to handle. This is exactly a type-level fold over a list.
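In code (eliding the extra constraints we'll need in a moment):

instance (HasMethod s x, HasServer s xs) => HasServer s (x ': xs) where
  type ServerType s (x ': xs)
      = MethodHandler (MethodInput  s x)
                      (MethodOutput s x)
                      (IsClientStreaming s x)
                      (IsServerStreaming s x)
     -> ServerType s xs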

The only remaining question is “what is this MethodHandler thing?” It’s going to have to be a type family that will give us back the correct type for the handler under consideration. Such a type will need to dispatch on the streaming variety as well as the request and response, so we’ll define it as follows, and go back and fix HasServer later.
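A sketch of the dispatch:

type family MethodHandler i o (cs :: Bool) (ss :: Bool) :: * where
  MethodHandler i o 'False 'False = i -> IO o
  MethodHandler i o 'True  'False = IO (Maybe i) -> IO o
  MethodHandler i o 'False 'True  = i -> (o -> IO ()) -> IO ()
  MethodHandler i o 'True  'True  = IO (Maybe i) -> (o -> IO ()) -> IO ()

For our MyService example, ServerType now expands to:

ServerType MyService '["biDiStreaming"]
  = (IO (Maybe Request) -> (Response -> IO ()) -> IO ()) -> IO ()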

and, if we had other methods defined for MyService, they’d show up here with the correct handler type, in the order they were listed in ServiceMethods MyService.

Implementing the Server

Our ServerType family now expands to a function type which takes a handler value (of the correct type) for every method on our service. That turns out to be more than half the battle – all we need to do now is to provide a value of this type.

The generation of such a value is going to need to proceed in perfect lockstep with the generation of its type, so we add to the definition of HasServer:
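Something like this (again a sketch):

class HasServer s (xs :: [Symbol]) where
  type ServerType s xs :: *
  runServerImpl :: [AnyHandler] -> ServerType s xs

class HasMethodHandler i o (cs :: Bool) (ss :: Bool) where
  existentialize :: MethodHandler i o cs ss -> AnyHandler

instance HasServer s '[] where
  type ServerType s '[] = IO ()
  runServerImpl handlers = runGRPCServer handlers

instance ( HasMethodHandler (MethodInput  s x)
                            (MethodOutput s x)
                            (IsClientStreaming s x)
                            (IsServerStreaming s x)
         , HasServer s xs
         ) => HasServer s (x ': xs) where
  type ServerType s (x ': xs)
      = MethodHandler (MethodInput  s x)
                      (MethodOutput s x)
                      (IsClientStreaming s x)
                      (IsServerStreaming s x)
     -> ServerType s xs
  runServerImpl handlers handler
    = runServerImpl (existentialize handler : handlers)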

where existentialize is a new class method we add to HasMethodHandler. We will elide it here, because it is just a function MethodHandler i o cs ss -> AnyHandler and is not particularly interesting if you're familiar with existentialization.

It’s evident here what I meant by handlers being an explicit accumulator – our recursion adds the parameters it receives into this list so that it can pass them eventually to the base case.

There's a problem here, however. Reading through this implementation of runServerImpl, you and I both know what the right-hand side means; unfortunately, GHC isn't as clever as we are. If you try to compile it right now, GHC will complain about the non-injectivity of HasServer as implied by the call to runServerImpl (and also about HasMethodHandler and existentialize, but for the exact same reason).

The problem is that there's nothing constraining the type variables s and xs on runServerImpl. I always find this error confusing (and I suspect everyone does), because in my mind it's perfectly clear from the HasServer s xs in the instance constraint. However, because ServerType is a type family without any injectivity declarations, we can't learn s and xs from ServerType s xs.

Let’s see why. For a very simple example, let’s look at the following type family:
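type family NotInjective a :: * where
  NotInjective Int  = ()
  NotInjective Bool = ()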

Here we have NotInjective Int ~ () and NotInjective Bool ~ (), which means even if we know NotInjective a ~ () it doesn’t mean that we know what a is – it could be either Int or Bool.

This is the exact problem we have with runServerImpl: even though we know what type runServerImpl has (it must be ServerType s xs, so that the type on the left-hand of the equality is the same as on the right), that doesn’t mean we know what s and xs are! The solution is to explicitly tell GHC via a type signature or type application:
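With TypeApplications (and ScopedTypeVariables, so that s and xs from the instance head are in scope), the offending line becomes:

  runServerImpl handlers handler
    = runServerImpl @s @xs (existentialize handler : handlers)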

Client-side Usability

Sweet and typesafe all of this might be, but the user-friendliness on the client-side leaves a lot to be desired. As promised, we’ll address that now.

Removing Proxies

Recall that the runNonStreamingClient function and its friends require a Proxy m parameter in order to specify the method you want to call. However, m has kind Symbol, and thankfully we have some new extensions in GHC for turning Symbols into values.

We can define a new type, isomorphic to Proxy, but which packs the fact that it is a KnownSymbol (something we can turn into a String at runtime):
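Something like this:

data WrappedMethod (sym :: Symbol) where
  WrappedMethod :: KnownSymbol sym => WrappedMethod sym

instance (KnownSymbol sym, sym ~ sym')
      => IsLabel sym (WrappedMethod sym') where
  fromLabel = WrappedMethod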

This sym ~ sym' thing is known as the constraint trick for instances, and is necessary here to convince GHC that this can be the only possible instance of IsLabel that will give you back a WrappedMethod.

Now turning on the {-# LANGUAGE OverloadedLabels #-} pragma, we’ve changed the syntax to call these client functions from the ugly:

runBiDiStreamingClient MyService (Proxy @"biDiStreaming")

into the much nicer:

runBiDiStreamingClient MyService #biDiStreaming

Better “Wrong Streaming Variety” Errors

The next step in our journey to delightful usability is remembering that the users of our library are only human, and at some point they are going to call the wrong run*Client function on their method with a different variety of streaming semantics.

At the moment, the errors they're going to get when they try that will be a few stanzas long, the most informative of which will be something along the lines of unable to match 'False with 'True. Yes, it's technically correct, but it's entirely useless.

Instead, we can use the TypeError machinery from GHC.TypeLits to make these error messages actually helpful to our users. If you aren't familiar with it: if GHC ever encounters a TypeError constraint, it will die with an error message of your choosing.
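For example, we might guard runNonStreamingClient with a constraint like this (a sketch; Constraint comes from Data.Kind, TypeError and ErrorMessage from GHC.TypeLits):

type family NonStreamingCheck (cs :: Bool) (ss :: Bool) :: Constraint where
  NonStreamingCheck 'False 'False = ()
  NonStreamingCheck 'True  'True  = TypeError
    ( 'Text "Called 'runNonStreamingClient' on a bidi-streaming method."
    ':$$: 'Text "Perhaps you meant 'runBiDiStreamingClient'."
    )
  NonStreamingCheck 'True  'False = TypeError
    ( 'Text "Called 'runNonStreamingClient' on a client-streaming method."
    ':$$: 'Text "Perhaps you meant 'runClientStreamingClient'."
    )
  NonStreamingCheck 'False 'True  = TypeError
    ( 'Text "Called 'runNonStreamingClient' on a server-streaming method."
    ':$$: 'Text "Perhaps you meant 'runServerStreamingClient'."
    )

adding NonStreamingCheck (IsClientStreaming s m) (IsServerStreaming s m) to its context,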

and similarly for our other client functions. Reduction of the resulting boilerplate is left as an exercise to the reader.

With all of this work out of the way, we can test it:

runNonStreamingClient MyService #biDiStreaming

Main.hs:45:13: error:
• Called 'runNonStreamingClient' on a bidi-streaming method.
Perhaps you meant 'runBiDiStreamingClient'.
• In the expression: runNonStreamingClient MyService #biDiStreaming

Amazing!

Better “Wrong Method” Errors

The other class of errors we expect our users to make is to attempt to call a method that doesn’t exist – either because they made a typo, or are forgetful of which methods exist on the service in question.

As it stands, users are likely to get about six stanzas of error messages, from No instance for (HasMethod s m) to Ambiguous type variable 'm0', and other terrible things that leak our implementation details. Our first thought might be to somehow emit a TypeError constraint if we don’t have a HasMethod s m instance, but I’m not convinced such a thing is possible.

But luckily, we can actually do better than any error messages we could produce in that way. Since our service is driven by a value (in our example, the data constructor MyService), by the time things go wrong we do have a Service s instance in scope. Which means we can look up our ServiceMethods s and give some helpful suggestions about what the user probably meant.

The first step is to implement a ListContains type family so we can determine if the method we’re looking for is actually a real method.
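A sketch:

type family ListContains (n :: k) (hs :: [k]) :: Bool where
  ListContains n '[]       = 'False
  ListContains n (n ': hs) = 'True
  ListContains n (x ': hs) = ListContains n hs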

In the base case, we have no list to look through, so our needle is trivially not in the haystack. If the head of the list is the thing we’re looking for, then it must be in the list. Otherwise, take off the head of the list and continue looking. Simple really, right?

We can now use this thing to generate an error message in the case that the method we’re looking for is not in our list of methods:
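Something along these lines:

type family RequireHasMethod s (m :: Symbol) (found :: Bool) :: Constraint where
  RequireHasMethod s m 'True  = ()
  RequireHasMethod s m 'False = TypeError
    ( 'Text "No method "
    ':<>: 'ShowType m
    ':<>: 'Text " available for service '"
    ':<>: 'ShowType s
    ':<>: 'Text "'."
    ':$$: 'Text "Available methods are: "
    ':<>: 'ShowType (ServiceMethods s)
    )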

Replacing our final ShowType with ShowList in RequireHasMethod now gives us error messages of the following:

Main.hs:54:15: error:
• No method "missing" available for service 'MyService'.
Available methods are: "biDiStreaming"

Absolutely gorgeous.

Conclusion

This is where we stop. We’ve used type-level metadata to generate client- and server-side bindings to an underlying library. Everything we’ve made is entirely typesafe, and provides gorgeous, helpful error messages if the user does anything wrong. We’ve found a practical use for many of these seemingly-obscure type-level features, and learned a few things in the process.

“It is up to us, as people who understand a problem at hand, to try and teach the type system as much as we can about that problem. And when we don’t understand the problem, talking to the type system about it will help us understand. Remember, the type system is not magic, it is a logical reasoning tool.”

This resounds so strongly in my soul, and maybe it will in yours too. If so, I encourage you to go forth and find uses for these techniques to improve the experience and safety of your own libraries.

Whose article “Opaleye’s sugar on top” was a strong inspiration on me, and subsequently on this post.

Today’s classic functional programming paper we will review is Meijer et al.’s Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire. The exciting gist of the paper is that all explicit recursion can be factored out into a few core combinators. As such, the reasoning is that we should instead learn these combinators (or “recursion schemes” as they’re called), rather than doing our own ad-hoc recursion whenever we need it.

Despite being a marvelous paper, it falls into the all-too-common flaw of functional programming papers, which is to have an absolutely horrible title. “Bananas”, “lenses”, “envelopes” and “barbed wire” correspond to obscure pieces of syntax invented to express these ideas. In our treatment of the literature, we will instead use standard Haskell syntax, and refer to the paper as Functional Programming with Recursion Schemes.

Specialized Examples of Recursion Schemes

Catamorphisms over Lists

Catamorphisms refer to a fold over a datastructure. A mnemonic to remember this is that a catamorphism tears down structures, and that if that structure were our civilization it’d be a catastrophe.

By way of example, Meijer et al. present the following specialization of a catamorphism over lists:

Let default :: b and step :: a -> b -> b; then a list-catamorphism h :: [a] -> b is a function of the following form:

h [] = default
h (a : as) = step a (h as)

This definition should look pretty familiar; if you specialize the function foldr to lists, you’ll see it has the type:

foldr :: (a -> b -> b) -> b -> [a] -> b

We can view foldr as taking our values step :: a -> b -> b and default :: b, and then giving back a function that takes an [a] and computes some b. For example, we can write a few of the common prelude functions over lists as catamorphisms of this form.
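For instance:

length :: [a] -> Int
length = foldr (\_ n -> 1 + n) 0

filter :: (a -> Bool) -> [a] -> [a]
filter p = foldr (\a as -> if p a then a : as else as) []

The paper also gives a fusion law for these catamorphisms. Rendered in Haskell terms – and provided that f is strict, f def = def', and f (step a b) = step' a (f b) – it reads:

f . foldr step def = foldr step' def'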

which intuitively says that you can “fuse” a catamorphism with a subsequent composition into a single catamorphism.

Anamorphisms over Lists

If a catamorphism refers to a “fold”, an anamorphism corresponds to an unfold of a data structure. A good mnemonic for this is that an anamorphism builds things up, just like anabolic steroids can be an easy way to build up muscle mass.

Meijer et al. present this concept over lists with the following (again, very specialized) definition:
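Let produce :: b -> Maybe (a, b); then a list-anamorphism h :: b -> [a] is a function of the following form:

h b = case produce b of
  Nothing      -> []
  Just (a, b') -> a : h b'

This is exactly Data.List's unfoldr :: (b -> Maybe (a, b)) -> b -> [a].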

An interesting case is that of map :: (a -> b) -> [a] -> [b]. We note that both the input and output of this function are lists, and thus might suspect the function can be written as either a catamorphism or an anamorphism. And indeed, it can be:
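-- as a catamorphism:
map :: (a -> b) -> [a] -> [b]
map f = foldr (\a bs -> f a : bs) []

-- and as an anamorphism:
map :: (a -> b) -> [a] -> [b]
map f = unfoldr $ \as ->
  case as of
    []        -> Nothing
    (a : as') -> Just (f a, as')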

Hylomorphisms over Lists

A hylomorphism over lists is a recursive function of type a -> b whose call-tree is isomorphic to a list. A hylomorphism turns out to be nothing more than a catamorphism following an anamorphism; the anamorphism builds up the call-tree, and the catamorphism evaluates it.

An easy example of hylomorphisms is the factorial function, which can be naively (i.e., without recursion schemes) implemented as follows:

fact :: Int -> Int
fact 0 = 1
fact n = n * fact (n - 1)

When presented like this, it’s clear that fact will be called a linear number of times in a tail-recursive fashion. That sounds a lot like a list to me, and indeed we can implement fact as a hylomorphism:
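fact :: Int -> Int
fact = foldr (*) 1
     . unfoldr (\n -> if n <= 0 then Nothing else Just (n, n - 1))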

The hylomorphic representation of fact works by unfolding its argument n into a list [n, n-1 .. 1], and then folding that list by multiplying every element in it.

However, as Meijer et al. point out, this implementation of fact is a little unsatisfactory. Recall that the natural numbers are themselves an inductive type (data Nat = Zero | Succ Nat); however, according to the paper, there is no easy catamorphism (nor anamorphism) that implements fact.

Paramorphisms

Enter paramorphisms: intuitively, catamorphisms that also have access to the current state of the structure being torn down.
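Specialized to the naturals (my rendering, not the paper's syntax): let base :: b and step :: Nat -> b -> b; then a paramorphism h :: Nat -> b is a function of the form

h Zero     = base
h (Succ n) = step n (h n)

where step receives not only the recursive result h n, but also the rest of the structure n itself. That extra context is exactly what fact needs:

data Nat = Zero | Succ Nat

paraNat :: b -> (Nat -> b -> b) -> Nat -> b
paraNat base _    Zero     = base
paraNat base step (Succ n) = step n (paraNat base step n)

fact :: Nat -> Int
fact = paraNat 1 (\n r -> (1 + toInt n) * r)
  where
    toInt Zero     = 0
    toInt (Succ n) = 1 + toInt n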

General Recursion Schemes

Intuition

As you’ve probably guessed, the reason we’ve been talking so much about these recursion schemes is that they generalize to all recursive data types. The trick, of course, is all in the representation.

Recall the standard definition of list:

data List a = Nil | Cons a (List a)

However, there’s no reason we need the explicit recursion in the Cons data structure. Consider instead, an alternative, “fixable” representation:

data List' a x = Nil' | Cons' a x

If we were somehow able to convince the typesystem to unify x ~ List' a x, we’d get the type List' a (List' a (List' a ...)), which is obviously isomorphic to List a. We’ll look at how to unify this in a second, but a more pressing question is “why would we want to express a list this way?”.

It’s a good question, and the answer is we’d want to do this because List' a x is obviously a functor in x. Furthermore, in general, any datastructure we perform this transformation on will be a functor in its previously-recursive x parameter.

We're left only more curious, however. What good is it to us if List' a x is a functor in x? It means that we can replace x with some other type b which is not isomorphic to List a. If you squint and play a little loose with the type isomorphisms, this specializes fmap :: (x -> b) -> List' a x -> List' a b to (List a -> b) -> List' a (List a) -> List' a b.

Notice the List a -> b part of this function – that’s a fold of a List a into a b! Unfortunately we’re still left with a List' a b, but this turns out to be a problem only in our handwaving of x ~ List' a x, and the actual technique will in fact give us just a b at the end of the day.

Algebras

An f-algebra is a function of type f z -> z (for some particular z), which intuitively removes the structure of an f. If you think about it, this is spiritually what a fold does; it removes some structure as it reduces to some value.

As it happens, the type of an f-algebra is identical to the parameters required by a catamorphism. Let's look at some List' a-algebras and see how they correspond with our previous examples of catamorphisms.
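For instance, the parameters of our earlier length and filter catamorphisms become:

lengthAlg :: List' a Int -> Int
lengthAlg Nil'        = 0
lengthAlg (Cons' _ n) = 1 + n

filterAlg :: (a -> Bool) -> List' a [a] -> [a]
filterAlg p Nil'         = []
filterAlg p (Cons' a as) = if p a then a : as else as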

Coalgebras

f-algebras correspond succinctly to the parameters of catamorphisms over fs. Since catamorphisms are dual to anamorphisms, we should expect that by turning around an algebra we might get a representation of the anamorphism parameters.

And we'd be right. Such a thing is called an f-coalgebra, of type z -> f z, and it corresponds exactly to these parameters. Let's look at our previous examples of anamorphisms through this lens:
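For instance, the countdown that drove our fact hylomorphism, written as a coalgebra:

countDown :: Int -> List' Int Int
countDown n = if n <= 0 then Nil' else Cons' n (n - 1)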

You might have noticed that these coalgebras don’t line up as nicely as the algebras did, namely because the produce functions return a type of Maybe (a, b), while the coalgebras return a List' a b. Of course, these types are isomorphic (Nothing <=> Nil', Just (a, b) <=> Cons' a b); it’s just that the authors of unfoldr didn’t have our List' functor to play with.

From Algebras to Catamorphisms

As we have seen, f-algebras correspond exactly to the parameters of a catamorphism over an f. But how can we actually implement the catamorphism? We’re almost there, but first we need some machinery.

type family Fixable t :: * -> *

The type family Fixable takes a type to its fixable functor representation. For example, we can use it to connect our List a type to List' a:

type instance Fixable (List a) = List' a

Now, assuming we have a function toFixable :: t -> Fixable t t, which for lists looks like this:
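
toFixable :: List a -> List' a (List a)
toFixable Nil         = Nil'
toFixable (Cons a as) = Cons' a as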

Very cool. What we’ve built here is general machinery for tearing down any inductive data structure t. All we need is its Fixable t representation and the function toFixable :: t -> Fixable t t (recursion-schemes calls this project). These definitions turn out to be completely mechanical and, thankfully, can be derived automatically for you via the recursion-schemes package.
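
Concretely, the machinery might look something like this sketch, bundling toFixable (and its inverse, for later) into a class:

{-# LANGUAGE FlexibleContexts #-}

class Fixpoint t where
  toFixable   :: t -> Fixable t t
  fromFixable :: Fixable t t -> t

cata :: (Fixpoint t, Functor (Fixable t)) => (Fixable t z -> z) -> t -> z
cata phi = phi . fmap (cata phi) . toFixable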

Coalgebras and Anamorphisms

We can turn all of our machinery around in order to implement anamorphisms. We’ll present this material quickly without much commentary, since there are no new insights here.
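
Turned around, the anamorphism is the mirror image of cata (a sketch against the Fixpoint class above):

ana :: (Fixpoint t, Functor (Fixable t)) => (z -> Fixable t z) -> z -> t
ana psi = fromFixable . fmap (ana psi) . psi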

General Paramorphisms

Because there is nothing interesting about hylomorphisms when viewed via our Fixable machinery, we skip directly to paramorphisms.

Recall that a paramorphism is a fold that has access to both the accumulated value being constructed as well as the remainder of the structure at any given point in time. We can represent such a thing “algebraically”: Fixable t (t, z) -> z. With a minor tweak to cata, we can get para:
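
-- A sketch: pair each recursive position with its untouched subterm.
para :: (Fixpoint t, Functor (Fixable t)) => (Fixable t (t, z) -> z) -> t -> z
para phi = phi . fmap (\t -> (t, para phi t)) . toFixable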

Miscellaneous Findings

All Injective Functions are Catamorphisms

Meijer et al. make the somewhat-unsubstantiated claim that all injective functions are catamorphisms. We will reproduce their proof here, and then work through it to convince ourselves of its correctness.

Let f :: A -> B be a strict function with left-inverse g. Then for any φ :: F A -> A, we have:

g . f = id
f . cata φ = cata (f . φ . fmap g)

Taking φ = fromFixable, we immediately get that any strict injective function can be written as a catamorphism:

f = cata (f . fromFixable . fmap g)

Sounds good? I guess? Meijer et al. must think I’m very smart, because it took me about a week of bashing my head against this proof before I got it. There were two stumbling blocks for me, which we’ll tackle together.

To jog our memories, we’ll look again at the definition of cata:

cata φ = φ . fmap (cata φ) . toFixable

There are two claims we need to tackle here, the first of which is that given φ = fromFixable:

f . cata φ = cata (f . φ . fmap g)

We can show this by mathematical induction. We’ll first prove the base case, by analysis of the Nil case over list-algebras.
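
With φ = fromFixable, both sides collapse to the same thing:

  (f . cata fromFixable) Nil
= f (fromFixable (fmap (cata fromFixable) (toFixable Nil)))
= f (fromFixable (fmap (cata fromFixable) Nil'))
= f (fromFixable Nil')             -- fmap over Nil' does nothing

  cata (f . fromFixable . fmap g) Nil
= (f . fromFixable . fmap g) (fmap (cata (f . fromFixable . fmap g)) Nil')
= f (fromFixable (fmap g Nil'))    -- the outer fmap does nothing
= f (fromFixable Nil')             -- and neither does the inner one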

Great! That’s the base case tackled. It’s easy to see why this generalizes away from lists to any data structure; the only way to terminate the recursion of a cata is for one of the type’s data constructors to not be recursive (such as Nil). If the data constructor isn’t recursive, its Fixable representation must be a constant functor, and thus the final fmap will always fizzle itself out.
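
The inductive step lines up the same way; assuming the claim already holds for the tail as:

  cata (f . fromFixable . fmap g) (Cons a as)
= (f . fromFixable . fmap g) (fmap (cata (f . fromFixable . fmap g)) (Cons' a as))
= f (fromFixable (Cons' a (g (cata (f . fromFixable . fmap g) as))))
= f (fromFixable (Cons' a (g (f (cata fromFixable as)))))    -- induction hypothesis
= f (fromFixable (Cons' a (cata fromFixable as)))            -- g . f = id
= (f . cata fromFixable) (Cons a as)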

As you can see here, subsequent expansions of cata line their gs and fs up in such a way that they cancel out. Also, we know from our experience looking at the base case that the final g will always fizzle out, and so we don’t need to worry about it only being a left-inverse.

The other stumbling block for me was that cata fromFixable = id, but this turns out to be trivial:

cata fromFixable
= fromFixable . fmap (cata fromFixable) . toFixable

Eventually this will all bottom out when it hits the constant functor, which will give us a giant chain of fromFixable . toFixables, which is obviously id.

To circle back to our original claim that all injective functions are catamorphisms, we’re now ready to tackle it for real.
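
Chaining the two pieces together:

  f
= f . id
= f . cata fromFixable               -- cata fromFixable = id
= cata (f . fromFixable . fmap g)    -- the fusion claim above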

All Surjective Functions are Anamorphisms

Anamorphisms are dual to catamorphisms, and surjective functions are dual (in \(\mathbb{Set}\)) to injective functions. Therefore, we can get this proof via duality from the proof that injective functions are catamorphisms. \(\blacksquare\)

Closing Remarks

Functional Programming with Recursion Schemes has some other (in my opinion) minor contributions about this stuff, such as how catamorphisms preserve strictness, but I feel like we’ve tackled the most interesting pieces of it. It is my hope that this review will serve as a useful complement in understanding the original paper.

One of my long-term goals since forever has been to get good at music. I can sightread music, and I can play music by ear – arguably I can play music well. But this isn’t to say that I am good at music; I’m lacking any theory which might take me from “following the path” of music to “navigating” music.

Recently I took another stab at learning this stuff. Every two years or so I make an honest-to-goodness attempt at learning music theory, but inevitably run into the same problems over and over again. The problem is that I have yet to find any music education resources that communicate on my wavelength.

Music education usually comes in the form of “here are a bunch of facts about music; memorize them and you will now know music.” As someone who got good at math because it was the only subject he could find that didn’t require a lot of memorization, this is a frustrating situation to be in for me. Music education, in other words, presents too many theorems and too few axioms.

My learning style prefers to know the governing fundamentals, and derive results when they’re needed. It goes without saying that this is not the way most music theory is taught.

Inspired by my recent forays into learning more mathematics, I’ve had an (obvious) insight into how to learn things, and that’s to model them in systems I already understand. I’m pretty good at functional programming, so it seemed like a pretty reasonable approach.

I’ve still got a long way to go, but this post describes my first attempt at modeling music, and, vindicating my intuitions, shows how we can derive value out of this model.

Music from First Principles

I wanted to model music, but it wasn’t immediately obvious how to actually go about doing that. I decided to write down the few facts about music theory I did know: there are notes.

Because Haskell doesn’t let you use # willy-nilly, I decided to mark sharps with apostrophes.
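
A sketch of what I ended up with (the deriving list is an assumption of mine; Enum and Bounded power the arithmetic below):

data Note = C | C' | D | D' | E | F | F' | G | G' | A | A' | B
  deriving (Eq, Ord, Show, Enum, Bounded)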

I knew another fact, which is that the sharp keys can also be described as flat keys – they are enharmonic. I decided to describe these as pattern synonyms, which may or may not have been a good idea. Sometimes the name of the note matters, but sometimes it doesn’t, and I don’t have a great sense of when that is. I resolved to reconsider this decision if it caused issues down the road.
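
Something like this, with hypothetical names for the flats:

{-# LANGUAGE PatternSynonyms #-}

pattern Df :: Note
pattern Df = C'

pattern Bf :: Note
pattern Bf = A'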

The next thing I knew was that notes have some notion of distance between them. This distance is measured in semitones, which correspond to the smallest pitch difference you can play on a piano. Such a distance is called an interval, and the literature has standard names for intervals of different sizes:
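
Sketched as a type (with a guessed name for the tritone):

data Interval
  = Unison | Min2 | Maj2 | Min3 | Maj3 | Perf4 | Tritone
  | Perf5  | Min6 | Maj6 | Min7 | Maj7
  deriving (Eq, Ord, Show, Enum, Bounded)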

It’s pretty obvious that intervals add in the usual way, since they’re really just names for different numbers of semitones. We can define addition over them, with the caveat that if we run out of interval names, we’ll loop back to the beginning. For example, this will mean we’ll call an octave a Unison, and an 11th a Perf4. Since this is “correct” if you shift down an octave every time you wrap around, we decide not to worry about it:
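
-- wrap around at the octave
iPlus :: Interval -> Interval -> Interval
iPlus a b = toEnum ((fromEnum a + fromEnum b) `mod` 12)

-- > iPlus Maj7 Min2
-- Unison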

This “wrapping around” structure while adding should remind us of our group theory classes; in fact intervals are exactly the group \(\mathbb{Z}/12\mathbb{Z}\) – a property shared by the hours on a clock where \(11 + 3 = 2\). That’s certainly interesting, no?

If intervals represent distances between notes, we should be able to subtract two notes to get an interval, and add an interval to a note to get another note.
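
Sketches of that arithmetic (the names are my own):

iAbove :: Note -> Interval -> Note
iAbove n i = toEnum ((fromEnum n + fromEnum i) `mod` 12)

iMinus :: Note -> Interval -> Note
iMinus n i = toEnum ((fromEnum n - fromEnum i) `mod` 12)

iBetween :: Note -> Note -> Interval
iBetween a b = toEnum ((fromEnum a - fromEnum b) `mod` 12)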

Looks good so far! Encouraged by our success, we can move on to trying to model a scale.

Scales

This was my first stumbling block – what exactly is a scale? I can think of a few: C major, E major, Bb harmonic minor, A melodic minor, and plenty others! My first attempt was to model a scale as a list of notes.

Unfortunately, this doesn’t play nicely with our mantra of “axioms over theorems”. Represented as a list of notes, it’s hard to find the common structure between C major and D major.

Instead, we can model a scale as a list of intervals. Under this lens, all major scales will be represented identically, which is a promising sign. I didn’t know what those intervals happened to be, but I did know what C major looked like:

cMajor :: [Note]
cMajor = [C, D, E, F, G, A, B]

We can now write a simple helper function to extract the intervals from this:
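
-- a sketch, using the iBetween helper from above
intervalsOf :: [Note] -> [Interval]
intervalsOf []             = []
intervalsOf ns@(tonic : _) = fmap (`iBetween` tonic) ns

major :: [Interval]
major = intervalsOf cMajor
-- [Unison,Maj2,Maj3,Perf4,Perf5,Maj6,Maj7]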

Seems reasonable; the presence of all those major intervals is probably why they call it a major scale. But while memorizing the intervals in a scale is likely a fruitful exercise, it’s no good to me if I want to actually play a scale. We can write a function to add the intervals in a scale to a tonic in order to get the actual notes of a scale:

transpose :: Note -> [Interval] -> [Note]
transpose n = fmap (iAbove n)

> transpose A major
[A,B,C',D,E,F',G']

Looking good!

Modes

The music theory I’m actually trying to learn with all of this is jazz theory, and my jazz theory book talks a lot about modes. A mode of a scale, apparently, is playing the same notes, but starting on a different one. For example, G mixolydian is actually just the notes in C major, but starting on G (meaning it has an F♮, rather than F#).
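
Modeling a mode as a rotation of the scale (a sketch; numbering modes from 1):

mode :: Int -> [Interval] -> [Interval]
mode n is = intervalsOf (drop (n - 1) notes ++ take (n - 1) notes)
  where notes = transpose C is

> transpose G (mode 5 major)
[G,A,B,C,D,E,F]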

That has an F♮, all right. Everything seems to be proceeding according to our plan!

Something that annoys me about modes is that “G mixolydian” has the notes of C, not of G. This means the algorithm I need to carry out in my head to jam with my buddies goes something as follows:

G mixolydian?

Ok, mixolydian is the fifth mode.

So what’s a perfect fifth below G?

It’s C!

What’s the C major scale?

OK, got it.

So I want to play the C major scale but starting on a different note.

What was I doing again?

That’s a huge amount of thinking to do on a key change. Instead, what I’d prefer is to think of “mixolydian” as a transformation on G, rather than having to backtrack to C. I bet there’s an easier mapping from modes to the notes they play. Let’s see if we can’t tease it out!

So to figure out what are the “mode rules”, I want to compare the intervals of C major (ionian) to C whatever, and report back any which are different. As a sanity check, we know from thinking about G mixolydian that the mixolydian rules should be Maj7 => Min7 in order to lower the F# to an F♮.
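
A sketch along those lines, comparing interval patterns position by position:

modeRules :: Int -> [(Interval, Interval)]
modeRules n = filter (uncurry (/=)) (zip major (mode n major))

> modeRules 5
[(Maj7,Min7)]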

What this does is construct the intervals of C ionian and of C whatever, zip them up, and then remove any pairs which are the same. What we’re left with is pairs of intervals that have changed while moving modes.

Very cool. Now I’ve got something actionable to memorize, and it’s saved me a bunch of mental effort to compute on my own. My new strategy for determining D dorian is “it’s D major but with a minor 3rd and 7th”.

Practicing

My jazz book suggests that practicing every exercise along the circle of fifths would be formative. The circle of fifths is the sequence of notes you get by successively going up or down a perfect 5th, starting from C. In jazz, allegedly, it is more valuable to go down, so that’s what we’ll build:

circleOfFifths :: [Note]
circleOfFifths = iterate (`iMinus` Perf5) C

This is an infinite list, so we’d better be careful when we look at it:

> take 5 circleOfFifths
[C,F,A',D',G']

Side note: we get to every note via the circle of fifths because there are 12 distinct notes (one for each semitone in the octave). A perfect fifth, being 7 semitones, is coprime with 12, meaning it will never get stuck in a smaller cycle. Math!

Ok, great! So now I know which notes to start my scales on. An unfortunate property of the jazz circle of fifths is that going down by fifths means you quickly get into the freaky scales they don’t teach 7 year olds. You get into the weeds where the scales start on black notes and don’t adhere to your puny human intuitions about fingerings.

A quick google search suggested that there is no comprehensive reference for “what’s the fingering for scale X”. However, that same search did provide me with a heuristic – “don’t use your thumb on a black note.”

That’s enough for me to go on! Let’s see if we can’t write a program to solve this problem for us. It wasn’t immediately obvious to me how to generate potential fingerings, but it seems like we’ll need to know which notes are black:
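
isBlack :: Note -> Bool
isBlack n = n `elem` [C', D', F', G', A']

With that, we can write down some candidate finger patterns (the particular lists here are stand-ins of my own; 1 is the thumb):

allFingerings :: [[Int]]
allFingerings = [ [1, 2, 3, 1, 2, 3, 4]
                , [1, 2, 3, 4, 1, 2, 3]
                ]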

That’s it. That’s all the fingerings I know. Don’t judge me. It’s obvious that none of my patterns as written will avoid putting a thumb on a black key in the case of, for example, Bb major, so we’ll make a concession and say that you can start anywhere in the finger pattern you want.
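
A sketch of the search, rotating each pattern and keeping rotations that never put the thumb on a black key:

fingeringsFor :: [Note] -> [[Int]]
fingeringsFor notes =
  [ fingering
  | pat <- allFingerings
  , n   <- [0 .. length pat - 1]
  , let fingering = take (length notes) (drop n (cycle pat))
  , not (any (\(f, note) -> f == 1 && isBlack note) (zip fingering notes))
  ]

> head (fingeringsFor (transpose A' major))
[2,3,1,2,3,4,1]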

So it doesn’t work amazingly, but it does in fact find fingerings that avoid putting a thumb on a black key. We could tweak how successful this function is by putting more desirable fingerings earlier in allFingerings, but as a proof of concept this is good enough.

That’s about as far as I’ve taken this work so far, but it’s already taught me more about music theory than I’d learned in 10 years of lessons (in which, admittedly, I skipped the theory sections). More to come on this topic, probably.

One of the most exciting papers I’ve read in a long time is James and Sabry’s Information Effects. It starts with the hook “computation is a physical process which, like all other physical processes, is fundamentally reversible,” and it goes from there. If that doesn’t immediately pull you in, perhaps some of the subsequent PL jargon will – it promises a “typed, universal, and reversible computation model in which information is treated as a linear resource”.

I don’t know about you, but I was positively shaking with anticipation at this point. That’s one heck of an abstract.

After some philosophy and overview of the paper, James and Sabry dive into the appetizer in a section titled “Thermodynamics of Computation and Information”. They give the following definition:

DEFINITION 2.2 (Entropy of a variable). Let \(b\) be a (not necessarily finite) type whose values are labeled \(b_1\), \(b_2\), \(\ldots\). Let \(\xi\) be a random variable of type \(b\) that is equal to \(b_i\) with probability \(p_i\). The entropy of \(\xi\) is defined as \(- \sum p_i \log{p_i}\).

and the following, arguably less inspired definition:

DEFINITION 2.3 (Output entropy of a function). Consider a function f : a -> b where b is a (not necessarily finite) type whose values are labeled \(b_1\), \(b_2\), \(\ldots\). The output entropy of the function is given by \(- \sum q_j \log{q_j}\) where \(q_j\) indicates the probability of the function to have value \(b_j\).

We can say now that a function is reversible if and only if the entropy of its arguments is equal to the entropy of its output. Which is to say that the gain in entropy across the function is 0.

Of course, as astute students of mathematics we know that reversibility of a function is equivalent to whether that function is an isomorphism. While this is how we will prefer to think of reversibility, the definition in terms of entropy brings up interesting questions of pragmatics that we will get to later.

James et al. present the following language, which we have reproduced here translated into Haskell. The language is first order, and so we will ignore function types, giving us the types:
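
A sketch of those types, under assumed Haskell spellings (hiding Prelude’s Bool and not, both of which get redefined):

{-# LANGUAGE TypeOperators #-}

import Prelude hiding (Bool, not)

data U     = U                 -- the unit type
data a + b = InL a | InR b     -- coproducts
data a * b = Pair a b          -- products

type Bool = U + U              -- 'true' on the left, 'false' on the right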

The language presented is based around the notion of type isomorphisms, and so in order to model this language in Haskell, we’ll need the following types:

data a <=> b = Iso
  { run :: a -> b
  , rev :: b -> a
  }

This type a <=> b represents an isomorphism between type a and type b, as witnessed by the pair of functions run and rev. This probably isn’t the best encoding of an isomorphism, but for our purposes it will be sufficient.

The implementations of these terms are all trivial, being that they are purely syntactic isomorphisms. They will not be reproduced here, but can be found in the code accompanying this post. The motivated reader is encouraged to implement these for themself.

With the terms of our algebra out of the way, we’re now ready for the operators. We are presented with the following:
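
A sketch of a few of them, under assumed names (composition is written (>>>) here):

sym :: (a <=> b) -> (b <=> a)
sym (Iso f g) = Iso g f

(>>>) :: (a <=> b) -> (b <=> c) -> (a <=> c)
Iso f g >>> Iso f' g' = Iso (f' . f) (g . g')

(.+) :: (a <=> b) -> (c <=> d) -> ((a + c) <=> (b + d))
Iso f g .+ Iso f' g' = Iso to from
  where
    to   (InL a) = InL (f a)
    to   (InR c) = InR (f' c)
    from (InL b) = InL (g b)
    from (InR d) = InR (g' d)

(.*) :: (a <=> b) -> (c <=> d) -> ((a * c) <=> (b * d))
Iso f g .* Iso f' g' =
  Iso (\(Pair a c) -> Pair (f a) (f' c))
      (\(Pair b d) -> Pair (g b) (g' d))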

But how does our ifthen combinator actually work? Recall that Bool = U + U, meaning that we can distribute the Us across the pair, giving us the type (U * a) + (U * a). The left branch (of type U * a) of this coproduct has an inhabitant if the incoming boolean was true.

We can thus bimap over the coproduct. Since the left case corresponds to an incoming true, we can apply an isomorphism over only that branch. Because we want to transform the incoming a by the combinator c, we then bimap over our U * a with id .* c – not touching the U but using our combinator.

Finally, we need to repackage our (U * a) + (U * a) into the correct return type Bool * a, which we can do by factoring out the a. Factoring is the inverse of distributing, and so we can use the sym operator to “undo” the distrib.
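
Putting that walkthrough into code (a sketch; distrib and idIso are assumed names):

idIso :: a <=> a
idIso = Iso id id

distrib :: ((a + b) * c) <=> ((a * c) + (b * c))
distrib = Iso to from
  where
    to (Pair (InL a) c)   = InL (Pair a c)
    to (Pair (InR b) c)   = InR (Pair b c)
    from (InL (Pair a c)) = Pair (InL a) c
    from (InR (Pair b c)) = Pair (InR b) c

ifthen :: (a <=> a) -> ((Bool * a) <=> (Bool * a))
ifthen c = distrib >>> ((idIso .* c) .+ idIso) >>> sym distrib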

It’s crazy, but it actually works! We can run these things to convince ourselves. Given:

not :: Bool <=> Bool
not = swapP  -- move a left ('true') to a right ('false'), and vice versa.

James et al. are eager to point out that ifthen (ifthen not) :: Bool * (Bool * Bool) <=> Bool * (Bool * Bool) is the Toffoli gate – a known universal reversible gate. Because we can implement Toffoli (and due to its universality), we can thus implement any boolean expression.

Recursion and Natural Numbers

Given two more primitives, James and Sabry show us how we can extend this language to be “expressive enough to write arbitrary looping programs, including non-terminating ones.”

We’ll need to define a term-level recursion axiom:

trace :: (a + b <=> a + c) -> (b <=> c)

The semantics of trace are as follows: given an incoming b (or, symmetrically, a c), lift it into InR b :: a + b, and then run the given iso over it looping until the result is an InR c, which can then be returned.
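
A sketch of those semantics:

trace :: (a + b <=> a + c) -> (b <=> c)
trace (Iso f g) = Iso (loopF . InR) (loopB . InR)
  where
    loopF v = case f v of
      InL a -> loopF (InL a)    -- feed the internal state back in
      InR c -> c                -- done; release the result
    loopB v = case g v of
      InL a -> loopB (InL a)
      InR b -> b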

Notice here that we have introduced potentially non-terminating looping. Combined with our universal boolean expressiveness, this language is now Turing-complete, meaning it is capable of computing anything computable. Furthermore, by construction, we also have the capability to compute backwards – given an output, we can see what the original input was.

You might be concerned that the potential for partiality given by the trace operator breaks the bijection upon which all of our reversibility has been based. This, we are assured is not a problem, because a divergence is never actually observed, and as such, does not technically violate the bijectiveness. It’s fine, you guys. Don’t worry.

There is one final addition we need, which is the ability to represent inductive types:
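
A sketch: fold and unfold witness the isomorphism Nat ≅ U + Nat.

data Nat = Z | S Nat

fold :: (U + Nat) <=> Nat
fold = Iso to from
  where
    to (InL U) = Z
    to (InR n) = S n
    from Z     = InL U
    from (S n) = InR n

unfold :: Nat <=> (U + Nat)
unfold = sym fold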

What’s interesting here is that the introduction of 0 is an isomorphism between U and Nat, as we should expect since 0 is a constant.

Induction on Nats

The paper teases an implementation of isEven for natural numbers – from the text:

For example, it is possible to write a function even? :: Nat * Bool <=> Nat * Bool which, given inputs (n, b), reveals whether n is even or odd by iterating not n-times starting with b. The iteration is realized using trace as shown in the diagram below (where we have omitted the boxes for fold and unfold).

Emphasis mine. The omitted fold and unfold bits of the diagram are the actual workhorses of the isomorphism, and their omission caused me a few days of work to rediscover. I have presented the working example here to save you, gentle reader, from the same frustration.

The insight is this – our desired isomorphism has type Nat * a <=> Nat * a. Due to its universally quantified nature, we are unable to pack any information into the a, and thus, to be reversible, the Nat must be the same on both sides. Since we are unable to clone arbitrary values given our axioms (seriously! try it!), our only solution is to build a resulting Nat up from 0 as we tear apart the one we were given.

We can view the a in trace :: (a + b <=> a + c) -> (b <=> c) as “scratch space” or “intermediate state”. It is clear that in order to execute upon our earlier insight, we will need three separate pieces of state: the Nat we’re tearing down, the Nat we’re building up, and the a along for the ride.

For reasons I don’t deeply understand, other than it happened to make the derivation work, we also need to introduce a unit to the input of our traced combinator.

For clarity, we’ll annotate the natural number under construction as Nat'.

When the iteration begins, our combinator receives an InR whose contents are of type U * (Nat * a), corresponding to the fact that there is not yet any internal state. From there we can factor out the Nat * a:

All of a sudden this looks like a more tenable problem. We now have a product of (conceptually) a Maybe Nat', the Nat being torn down, and our a. We can fold :: U + Nat <=> Nat our Nat', which will give us 0 in the case that the state hasn’t yet been created, or \(n+1\) in the case it has.

And finally, we apply our step iso to the internal state (we do this after the distrib so that we don’t apply the combinator if the incoming number was 0). The fruits of our labor are presented in entirety:
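
The full combinator term is a mouthful, so here instead is a sketch of what it computes extensionally – the combinator is applied (or, going backwards, un-applied) once per successor:

iterNat :: (a <=> a) -> ((Nat * a) <=> (Nat * a))
iterNat c = Iso (go (run c)) (go (rev c))
  where
    go f (Pair n a)  = Pair n (applyN n f a)
    applyN Z     _ a = a
    applyN (S m) f a = applyN m f (f a)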

Lo and behold, the types now line up, and thus quod erat demonstrandum. The implementation of isEven is now trivial:

isEven :: Nat * Bool <=> Nat * Bool
isEven = iterNat not

which computes if a Nat is even in the case the incoming Bool is false, and whether it is odd otherwise.

Lists

James and Sabry provide a sketch of how to define lists, but I wanted to flesh out the implementation to test my understanding.

For reasons I don’t pretend to understand, Haskell won’t let us partially apply a type synonym, so we’re forced to write a higher-kinded data definition in order to describe the shape of a list.

-- To be read as @type ListF a b = U + (a * b)@.
data ListF a b
  = Nil
  | Cons a b

We can then get the fixpoint of this in order to derive a real list:

type List a = Fix (ListF a)
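
-- Assuming the usual fixpoint of a functor:
newtype Fix f = Fix (f (Fix f))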

And to get around the fact that we had to introduce a wrapper datatype in order to embed this into Haskell, we then provide an eliminator to perform “pattern matching” on a List a. In a perfect world, this function would just be sym fold, but alas, we must work with what we have.

liste :: List a <=> U + (a * List a)
liste = Iso to from
  where
    to (Fix Nil)          = InL U
    to (Fix (Cons a b))   = InR (Pair a b)
    from (InL U)          = Fix Nil
    from (InR (Pair a b)) = Fix (Cons a b)

And given that, we can write an isomorphism between any a and any b. The catch, of course, is that you can never witness such a thing, since it obviously doesn’t exist. Nevertheless, we can use it to convince the type checker that we’re doing the right thing in cases that would diverge anyway.

Finally we can implement nil using the same trick we did for zero – use trace to vacuously introduce exactly the type we need, rip out the result, and then divergently reconstruct the type that trace expects.

Induction on Lists

In a manner spiritually similar to iterNat, we can define iterList :: (a * z <=> b * z) -> (List a * z <=> List b * z). The semantics are mostly what you’d expect from its type, except that the resulting List b is in reverse order due to having to be constructed as the List a was being destructed. We present the implementation here for completeness but without further commentary.
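
Well, almost without commentary – the implementation below is again extensional rather than the combinator term itself; note the accumulator, which is what reverses the result:

iterList :: (a * z <=> b * z) -> ((List a * z) <=> (List b * z))
iterList c = Iso (go (run c)) (go (rev c))
  where
    go f (Pair as z0) = walk as (Fix Nil) z0
      where
        walk (Fix Nil)          acc z = Pair acc z
        walk (Fix (Cons a as')) acc z =
          let Pair b z' = f (Pair a z)
          in  walk as' (Fix (Cons b acc)) z'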

Remnants

The bulk of the remainder of the paper is an extension to the reversible semantics above, introducing create :: U ~> a and erase :: a ~> U where (~>) is a non-reversible arrow. We are shown how traditional non-reversible languages can be transformed into the (~>)-language.

Of more interest is James and Sabry’s construction which in general transforms (~>) (a non-reversible language) into (<=>) (a reversible one). But how can such a thing be possible? Obviously there is a trick!

The trick is this: given a ~> b, we can build h * a <=> g * b where h is “heap” space, and g is “garbage”. Our non-reversible functions create and erase thus become reversible functions which move data from the heap and to the garbage respectively.

Unfortunately, this is a difficult thing to model in Haskell, since the construction requires h and g to vary based on the axioms used. Such a thing requires dependent types, which, while possible, is quite an unpleasant undertaking. Trust me, I actually tried it.

However, just because it’s hard to model entirely in Haskell doesn’t mean we can’t discuss it. We can start with the construction of (~>):
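
A sketch as a GADT; Arr, Compose, Create, and Erase are named in the text below, while First and Left' (the product and coproduct lifts) are names I’ve assumed:

{-# LANGUAGE GADTs, TypeOperators #-}

data a ~> b where
  Arr     :: (a <=> b) -> (a ~> b)
  Compose :: (a ~> b) -> (b ~> c) -> (a ~> c)
  First   :: (a ~> b) -> ((a * c) ~> (b * c))
  Left'   :: (a ~> b) -> ((a + c) ~> (b + c))
  Create  :: U ~> a
  Erase   :: a ~> U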

With the ability to create and erase information, we’re (thankfully) now able to write some everyday functions that you never knew you missed until trying to program in the iso language without them. James et al. give us what we want:

We are also provided with the ability to clone a piece of information, given by structural induction. Cloning U is trivial, and cloning a pair is just cloning its projections and then shuffling them into place. The construction of cloning a coproduct, however, is more involved:

It should be quite clear that this arrow language of ours is now more-or-less equivalent to some hypothetical first-order version of Haskell (like Elm?). As witnessed above, information is no longer a linear commodity. A motivated programmer could likely get work done in a 9 to 5 with what we’ve built so far. It probably wouldn’t be a lot of fun, but it’s higher level than C at the very least.

The coup de grace of Information Effects is its construction lifting our arrow language back into the isomorphism language. The trick is to carefully construct heap and garbage types to correspond exactly with what our program needs to create and erase. We can investigate this further by case analysis on the constructors of our arrow type:

Arr :: (a <=> b) -> (a ~> b)

As we’d expect, an embedding of an isomorphism in the arrow language is already reversible. However, because we need to introduce a heap and garbage anyway, we’ll use unit.

Since we can’t express the typing judgment in Haskell, we’ll use a sequent instead:
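
Presumably it mirrors the judgment for create below, with a trivial unit heap and unit garbage:

\[
\frac{}{\lifted{\text{arr}\;f}{\u\times a}{\u\times b}}
\]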

Assuming we have a way of describing this type in Haskell, all that’s left is to implement the lifting of our iso into the enriched iso language:

lift (Arr f) = id .* f

Compose :: (a ~> b) -> (b ~> c) -> (a ~> c)

Composition of arrows proceeds likewise in a rather uninteresting manner. Here, we have two pairs of heaps and garbages, results from lifting each of the arrows we’d like to compose. Because composition will run both of our arrows, we’ll need both heaps and garbages in order to implement the result. By this logic, the resulting heap and garbage types are pairs of the incoming ones.

Lifting arrows over products again is uninteresting – since we’re doing nothing with the second projection, the only heap and garbage we have to work with are those resulting from the lifting of our arrow over the first projection.

Finally, we get to an interesting case. In the execution of Left, we may or may not use the incoming heap. We also need a means of creating a b + c given a b, or given a c. Recall that in our iso language, we do not have create (nor, relatedly, leftA) at our disposal, and so this is a hairier problem than it sounds at first.

We can solve this problem by requiring both a b + c and a c + b from the heap. Remember that the Toffoli construction (what we’re implementing here) will create a reversible gate with additional inputs and outputs that gives the same result when all of its inputs have their default values (ie. the same as those provided by create’s semantics). This means that our incoming b + c and c + b will both be constructed with InL.

Given this, we can thus perform case analysis on the incoming a + c, and then use leftSwap from earlier to move the resulting values into their coproduct.

What does the garbage situation look like? In the case we had an incoming InL, we will have used up our function’s heap, as well as our b + c, releasing the g, b (the default value swapped out of our incoming b + c), and the unused c + b.

If an InR was input to our isomorphism, we instead emit the function’s heap h, the unused b + c, and the default c originally in the heap’s coproduct.

The home stretch is within sight. We have only two constructors of our arrow language left. We look first at Create:

Create :: U ~> a

Because we’ve done all of this work to thread through a heap in order to give us the ability to create values, the typing judgment should come as no surprise:

\[
\frac{}{\lifted{\text{create}}{a\times\u}{\u\times a}}
\]

Our heap contains the a we want, and we drop our incoming U as garbage. The implementation of this is obvious:

lift Create = swapT

We’re left with Erase, whose type looks suspiciously like running Create in reverse:

Erase :: a ~> U

This is no coincidence; the two operations are duals of one another.

\[
\frac{}{\lifted{\text{erase}}{\u\times a}{a\times\u}}
\]

As expected, the implementation is the same as Create:

lift Erase = swapT

And we’re done! We’ve now constructed a means of transforming any non-reversible program into a reversible one. Success!

Summary

Still here? We’ve come a long way, which we’ll briefly summarize. In this paper, James and Sabry have taken us through the construction of a reversible language, given a proof that it’s Turing-complete, and given us some simple constructions on it. We set out on our own to implement lists and derived map for them.

We then constructed a non-reversible language (due to its capability to create and erase information), and then gave a transformation from this language to our earlier reversible language – showing that non-reversible computing is a special case of its reversible counterpart.

Information Effects ends with a short discussion of potential applications, which won’t be replicated here.

Commentary (on the physics)

Assuming I understand the physics correctly (which I probably don’t), the fact that these reversible functions do not increase entropy implies that they should be capable of shunting information for near-zero energy. Landauer’s Principle and Szilard’s engine suggest that information entropy and thermodynamic entropy are one and the same; if we don’t increase entropy in our computation of a function, there is nowhere for us to have created any heat.

That’s pretty remarkable, if you ask me. Together with our construction from any non-reversible program to a reversible one, it suggests we should be able to cut down on our CPU power usage by orders of magnitude.

Commentary (on where to go from here)

An obvious limitation of what we’ve built here today is that it is first-order, which is to say that functions are not first-class citizens. I can think of no immediate problem with making reversible functions first-class; we’d need to move our (<=>) directly into the language.

id would provide introduction of this type, and (>>) (transitivity) would allow us to create useful values of the type. We’d also need a new axiom:

apply :: a * (a <=> b) <=> b * (b <=> a)

which would allow us to use our functions. We should also expect the following theorems (which may or may not be axioms) due to our iso language forming a cartesian closed category:

Due to the symmetry of (<=>), both of these are equivalent to create and erase. I think the reason these are not theorems, despite U being the terminal object, is that (<=>) requires arrows in both directions, but U only has incoming arrows.

The thesis of Wadler’s Theorems for Free! is that, given a most-general (taking as few constraints on its values as possible) polymorphic type signature, we can generate for free a theorem to which any inhabitant of such a type must adhere.

Translating into familiar Haskell notation, Wadler gives the following example:

r :: [x] -> [x]

From this, as we shall see, it is possible to conclude that r satisfies the following theorem: for all types a and b and every total function f : a -> b we have:

map f . r = r . map f

He explains:

The intuitive explanation of this result is that r must work on lists of x for any type x. Since r is provided with no operations on values of type x, all it can do is rearrange such lists, independent of the values contained in them. Thus applying f to each element of a list and then rearranging yields the same result as rearranging and then applying f to each element.

This passage is somewhat misleading: r above is not restricted only to rearrangements, r can also structurally manipulate the list; for example, it can duplicate the first element and drop the middle three if it so pleases.

Wadler continues, with what might be one of the greatest lines in an academic paper:

This theorem about functions of type [x] -> [x] is pleasant but not earth-shaking. What is more exciting is that a similar theorem can be derived for every type.

“Exciting” isn’t exactly the word I’d use, but I’d certainly settle for “neat”! What I do find exciting, however, is that Wadler makes the claim that these theorems can be derived not only for Hindley-Milner type systems, but also for System-F. Hindley-Milner is Haskell98’s everyday type system; System-F is what you get when you turn on RankNTypes too.

But enough dilly dally. If you’re anything like me, you’re just aching to know what the secret here is. And it’s this: we can build a structurally inductive function from types to set-theoretic mathematical relations. The elements of the relations are theorems about inhabitants of the original type: our “theorems for free”.

If you’re not super comfortable with what it means to be a relation (I wasn’t when I started writing this), it’s a set of pairs of things which are related somehow. For example, we can write less-than for the natural numbers as a relation:

\((0, 1) \in (<_\mathbb{N})\)

\((0, 2) \in (<_\mathbb{N})\)

\((1, 2) \in (<_\mathbb{N})\)

\((0, 3) \in (<_\mathbb{N})\)

\((1, 3) \in (<_\mathbb{N})\)

\((2, 3) \in (<_\mathbb{N})\)

\((0, 4) \in (<_\mathbb{N})\)

… and so on

Here, \((<_\mathbb{N})\) is understood to be the name of the relation/set. We can write it more formally in set-builder notation:
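
\[
(<_\mathbb{N}) = \myset{x, y}{x \in \mathbb{N},\; y \in \mathbb{N},\; x < y}
\]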

which says that the pair \((x, y)\), plucking \(x \in \mathbb{N}\) and \(y \in \mathbb{N}\), is in our set only when \(x < y\).

It is interesting to note that a function \(f : A \to B\) is a special case of a relation. We will denote such a function-viewed-as-a-relation \(\reln{\hat{f}}\), since we are computer scientists, and to us, functions are not sets. We can define \(\reln{\hat{f}}\) as:

\[
\reln{\hat{f}} = \myset{a, f\;a}{a \in A}
\]

As a notational convention, we will name particular relations with scripted letters (eg. \(\reln{A}\)) and write out the sets they are a relation between as \(X \Leftrightarrow Y\). Therefore, \(\rel{A}{X}{Y}\) is a relation named \(\reln{A}\) which relates the sets \(X\) and \(Y\).

And so the trick is as follows; we can inductively transform type constructors into relations. It is these relations which are the “theorems for free” we have been hearing so much about. Wadler gives the following constructions:

Concrete Types

A concrete type \(T\) (for example, Bool or Char) has only the following relation:

\[
\rel{T}{T}{T} = \myset{x, x}{x \in T}
\]

This is an “identity relation”, and it states that values of concrete types are related only to themselves. Unsurprisingly, this relation can be expressed in Haskell as the (monomorphized) id :: T -> T function.

All this is to say that we can’t get any “interesting” theorems for free if we only have monomorphic types to deal with.

Product Types

Given two relations \(\rel{A}{A}{A'}\) and \(\rel{B}{B}{B'}\), we can form a product relation \(\rel{A\times B}{(A\times B)}{(A' \times B')}\) by the construction:
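
\[
\rel{A\times B}{(A\times B)}{(A'\times B')} = \myset{(a, b), (a', b')}{(a, a') \in \reln{A} \wedge (b, b') \in \reln{B}}
\]

Specialized to functions, the construction in Haskell is just (a sketch):

prodRel :: (a -> a') -> (b -> b') -> (a, b) -> (a', b')
prodRel f g (a, b) = (f a, g b)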

If you’re familiar with the bimap function provided by the Bifunctor class, prodRel is a special case of that.

This technique of specializing a relation \(\reln{A}\) to a function \(\reln{\hat{f}}\) turns out to be a very useful trick for actually getting results out of the technique. I’m trying to emphasize this point since I missed it my first few times through the paper, and was subsequently rather stumped.

List Types

If we have a relation \(\reln{A}\), we can construct a relation \(\rel{[A]}{[A]}{[A']}\):
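
\[
\rel{[A]}{[A]}{[A']} = \myset{[x_1, \ldots, x_n], [x_1', \ldots, x_n']}{(x_i, x_i') \in \reln{A} \;\text{for each}\; i}
\]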

That is, lists are related if they have the same length and corresponding elements are related. In the special case where \(\reln{A}\) is a function, \(\reln{[A]}\) is the familiar map :: (a -> b) -> [a] -> [b] function.
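
Function Types

Given two relations \(\rel{A}{A}{A'}\) and \(\rel{B}{B}{B'}\), we can construct the function relation \(\rel{A\to B}{(A\to B)}{(A'\to B')}\):

\[
\rel{A\to B}{(A\to B)}{(A'\to B')} = \myset{f, f'}{(a, a') \in \reln{A} \implies (f\;a, f'\;a') \in \reln{B}}
\]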

This can be understood as: related functions take related values in the domain to related values in the codomain.

Wadler is careful to point out that even if \(\reln{\hat{g}}\) and \(\reln{\hat{h}}\) are functions, the resulting relation \(\reln{\hat{g}\to\hat{h}}\) is not a function, but instead a proof that \(f' \circ g = h \circ f\), given any pair \((f, f')\in\reln{\hat{g}\to\hat{h}}\).

Universally Quantified Types

Finally, Wadler brings us to types of the form forall x. f x, where f is some type alias of kind * -> *. For example, we might use the type alias type F z = [z] -> [z] in order to denote the Haskell type forall x. [x] -> [x].

Wadler:

Let \(\reln{F(X)}\) be a relation depending on \(X\). Then \(\reln{F}\) corresponds to a function from relations to relations, such that for every relation \(\rel{A}{A}{A'}\) there is a corresponding relation \(\rel{F(A)}{F(A)}{F(A')}\).

There is nothing interesting going on here except for the substitution of the type \(\reln{A}\) for the type variable \(\reln{X}\).

We can interpret this as two polymorphic expressions are related if they preserve their relationship under being monomorphized to any possible type. This property can be hard to see in Haskell, since the language makes it a difficult thing to violate.

Coproduct Types

As an attentive reader, you might be scratching your head right now. Why were we given constructions on lists, but not on coproducts? The paper is mysteriously quiet on this point; my best guess is that it was written in 1989 and perhaps that was before coproducts were well understood.

Regardless, with the practice we’ve gained from going through the above constructions, we should be able to build the coproduct relation ourselves.

Given two relations, \(\rel{A}{A}{A'}\) and \(\rel{B}{B}{B'}\), we can construct the coproduct relation \(\rel{(A|B)}{(A|B)}{(A'|B')}\) as follows:
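
\[
\rel{(A|B)}{(A|B)}{(A'|B')} = \myset{\text{inl}\;a, \text{inl}\;a'}{(a, a') \in \reln{A}} \cup \myset{\text{inr}\;b, \text{inr}\;b'}{(b, b') \in \reln{B}}
\]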

which again, if you’re familiar with Bifunctor, is just bimap in disguise.

Generating Free Theorems

With all of that foreplay out of the way, we’re now ready to tackle the meat of the paper. Wadler gives the central contribution of the article:

Proposition. (Parametricity.) If t is a … term of type T, then \((t, t) \in \reln{T}\) where \(\reln{T}\) is the relation corresponding to the type T.

That this is a proposition (ie. “assumed to be true”) is troubling, given that we just went through all of the work to construct these relations. But we will persevere, and in fact see later why this must be true.

We will repeat Wadler’s derivation of the originally presented theorem here:
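
Parametricity gives us \((r, r) \in \reln{\forall X.\,[X]\to[X]}\), which unfolds to: for every relation \(\rel{A}{A}{A'}\) and every pair \((xs, xs') \in \reln{[A]}\), we have \((r\;xs, r\;xs') \in \reln{[A]}\).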

We can now specialize this with the trick above – assume our relation is a function. In particular, we will simplify our derivation by equating \(\rel{A}{A}{A'}=\reln{\hat{f}} : A\to A'\).

This substitution means that we now know \((x, f\;x)\in\reln{\hat{f}}\). We also know the special case of the list relation means that the relation \(\reln{[\hat{f}]}\) contains only pairs of the form \((xs, \text{map}\;f\;xs)\).
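
Substituting: since \((xs, \text{map}\;f\;xs) \in \reln{[\hat{f}]}\), we get \((r\;xs, r\;(\text{map}\;f\;xs)) \in \reln{[\hat{f}]}\), which says exactly that \(\text{map}\;f\;(r\;xs) = r\;(\text{map}\;f\;xs)\) – that is, map f . r = r . map f.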

That’s pretty cool, if you come to think about it. We came up with a theorem about our function r knowing nothing more about it than its type. This implies that every function of type forall x. [x] -> [x] will share this property, and more generally, that all expressions with the same type will share the same free theorem.

Wadler’s next example is folds of type forall x y. (x -> y -> y) -> y -> [x] -> y. However, if you can follow the above derivation, you’ll be able to follow his working of folds. I wanted to go out on my own and find a free theorem not provided by the paper.

Although id :: forall a. a -> a seemed to be too trivial, I still wanted an easy example, so I went for const :: forall a b. a -> b -> a. Before cranking out the theorem, I wasn’t sure what it would look like, so it seemed like a good candidate. My derivation is as follows:
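
Parametricity gives \((\text{const}, \text{const}) \in \reln{\forall A.\,\forall B.\,A\to B\to A}\). Specializing both relations to functions \(\hat{f} : A \to A'\) and \(\hat{g} : B \to B'\), and unfolding the function relation twice, we get, for all \(x\) and \(y\):

\[
f\;(\text{const}\;x\;y) = \text{const}\;(f\;x)\;(g\;y)
\]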

Very snazzy! Maybe Wadler is onto something with all of this stuff. The remainder of the paper is a tighter formalization of the preceding, as well as an extension of it into System F. Finally it provides a proof that fixpoints don’t violate parametricity, which crucially gives us access to inductive types and recursive functions.

At this point, however, we have enough of an understanding of the technique for the purpose of this essay, and we’ll accept the remainder of Wadler89 without further ado.

Commentary (on the computer science)

Neat! The fact that we can derive theorems for terms given their most general type means that giving functions the “correct” type must be important. For example, if we monomorphize a function of type a -> b -> a to Bool -> String -> Bool, we can no longer reason about it; despite its implementation being identical.

What’s perhaps more interesting about this to me is what it implies about looking for functions. I recall once asking some coworkers if they had an implementation of Int -> [a] -> [[a]], which they suggested could be replicate @[a]. While it typechecks, it’s obviously not the implementation I wanted, since that is not the most general type of replicate :: Int -> a -> [a].

I think this realization is the most important contribution of the paper for an every-day Haskell programmer. Darn! We could have skipped all of the math!

Commentary (on the mathematics)

Three observations in this paper gave me pause.

The first curious feature is that all of Wadler’s examples of generating theorems for free involve specialization of the relation \(\rel{A}{A}{A'} = \reln{\hat{f}}:A\to A'\). Why is this? Is the relation machinery itself overkill?

The second odd thing is that, when the relations are specialized to functions, the constructions of the product, coproduct, and list relations all just happen to be instances of Bifunctor (just squint and pretend like lists have a phantom type parameter to make this statement true). Suspicious, no?

The coup de grace is that when its arguments are specialized to functions, the function relation \((f, f') \in \reln{\hat{g}\to\hat{h}}\) itself reduces to a proof of \(f' \circ g = h \circ f\). Call me crazy, but that looks like too big a coincidence to be… well, a coincidence.

What do I mean? Good question. The definition of a natural transformation \(\mathcal{N} : F\to G\) between two functors (for convenience, let’s say they’re both \(\mathcal{Hask}\to\mathcal{Hask}\): the traditional functors we think about in everyday Haskell), is:
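
\[
\mathcal{N}_B \circ F\,f = G\,f \circ \mathcal{N}_A \quad\text{for every}\; f : A \to B
\]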

We can understand such a thing in Haskell as looking at the arrows as functions, and the objects (the things that the functions are between) as types. Therefore, a natural transformation \(\mathcal{N} : F\to G\) takes a function f :: A -> B to the equation \(\mathcal{N}_B \circ Ff = Gf \circ \mathcal{N}_A\). Remind you of anything we’ve looked at recently?

A natural transformation is a mapping from one functor to another; which we can express in Haskell as:

type Nat f g = (Functor f, Functor g) => forall x. f x -> g x

Remember how our relation constructors, when specialized to functions, turned out to be (bi)functors? As a matter of fact, we can view our relation for concrete types as the Identity functor, and so the rabbit hole continues.

But why must we specialize our relations to functions in all of our free theorem analysis? Well by specializing to functions, we ensure they’re arrows in \(\mathcal{Hask}\). Given that our identity, product, coproduct, and list relation constructions are functors from \(\mathcal{Hask}\to\mathcal{Hask}\) (ie. “endofunctors”), this means our analysis must stay in the realm of Haskell. Which makes sense, since our original goal was to prove things about Haskell types.

The pieces of the puzzle have mostly come together. We must specialize our relations to arrows in order to force our other relations to form (endo)functors in Haskell. Once we have endofunctors, we can use our function relation as a natural transformation as the only way of introducing non-trivial equations into our analysis (the so-called naturality condition). All that’s left before we can definitively claim that Wadler’s free theorems are nothing more than basic applications of category theory is a categorical notion of the universally quantified relation.

Let’s look again at the definition of our universally quantified construction:
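
\[
\reln{\forall X.\,F(X)} = \myset{g, g'}{(g_A, g'_{A'}) \in \reln{F(A)} \;\text{for every relation}\; \rel{A}{A}{A'}}
\]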

Two universally quantified expressions are related if they maintain relatedness under any substitution of their type variable. Honestly, I don’t have a great idea about where to go from here, but I’ve got three intuitions about how to proceed. In order of obviousness:

The \(\forall\) here looks like a smoking gun compared to the expression of a natural transformation in Haskell. Maybe this construction is just an artifact of being expressed in set theory, and in fact is the other side of the coin as the function relation’s natural transformation.

Relatedly, would we get more insight if we looked at a universally quantified type in Haskell that didn’t contain a function type?

Do we get any hints if we specialize the \(\reln{F(A)}\) relation to a function?

The first bullet isn’t actionable, so we’ll keep it in mind as we go through the other points.

However, the second bullet is interesting. Interesting because if we look at universally quantified types that don’t involve functions, we’ll find that they aren’t interesting. For example:
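
{-# LANGUAGE ExplicitForAll #-}

-- Hypothetical examples: with no function arrow available, the only
-- inhabitants are ones that never mention the type variable.
noA :: forall a. Maybe a
noA = Nothing

alsoNoA :: forall a. [a]
alsoNoA = []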

The only inhabitants of these types are ones that don’t contain any as at all. Given this realization, it seems safe to say that our first bullet point is correct; that universal construction is the other side of the coin to the natural transformation created by our function relation, manifest as an artifact for reasons only the eldritch set-theoretical gods know.