Tying the Knot, Redux

tl;dr: Use RWS and feed the W to the R for great justice.

A few weeks ago, while working on a side project, I went through someepicgyrations with the help of Stack Overflow, trying to figure out a good way to tie a particular kind of “knot” in Haskell. A bit of background first:

In an imperative language, where mutable data structures are common, if you want to build a data structure containing cycles (such as a circular linked list), you typically construct your nodes and then fix up the links after construction. For example:

Fine, this was Java, so you got what you paid for. Let’s say you want to do this in Haskell now, in which data is immutable (unless you do some shenanigans with STRefs, for example). You’d declare your data type:

First, let bindings are recursive, meaning that variables declared and bound within a single let block can all reference one another regardless of order. In other words, the variable name n1 is in scope for the declaration of n0 (and vice-versa, of course).

Second, expressions in Haskell are evaluated lazily by default. When we declare let n0 = Node 0 n1, it merely promises that when somebody needs the value of n0, it will be computed as a node containing the integer 0 and referencing the node n1, whatever that ends up being. We know that there is a node named n1, because (as discussed above) it was declared within the same let binding, but we don’t care about its value until the point when somebody eventually “forces” it.

These are very different from a typical imperative language, in which you’d expect a statement like to construct a node immediately and blow up, because n1 is “uninitialized” or “null” or some such nonsense. Anyway, this is a simple form of a technique called tying the knot. If you want to declare a bunch of interdependent variables, comprising some sort of recursive data structure, then declare them all in a single let binding, make sure nothing prematurely forces their strict evaluation in the scope of the let, and you’re golden.

But what if you’re trying to build up a big, possibly cyclic data structure based on some user input? You’re going to be doing some sort of parsing, probably using monadic parser combinators or something similar, and because your “knot” depends on user input, there isn’t any obvious way to structure your code such that the knot is tied in a single let binding. This problem was the essence of my thrashing and flailing about on Stack Overflow.

Let’s talk about monads for a minute. I won’t get into much detail on this, as it’s sort of a separate large-ish topic, but I want to introduce the Reader, Writer and State monads for anybody who’s not seen them before:

The Reader monad, sometimes called the Environment monad, simply lets you ask for some bit of immutable “global” state in the midst of a computation. For example:

But whatever. Question: how is this different from regular function application?

printenv :: Show r => r -> String
printenv env = show env

Answer: it’s not. The map and >>= operations just represent function composition and application. You provide the function’s argument when you run the monad, and that argument is what ask gives you. That is all.

The Writer monad sounds intuitively like it should be the opposite of Reader somehow, but it’s probably best understood as its own separate thing. So never mind the fact that it has a function called tell (the obvious counterpart to ask)! Writer gives you something like a stream you can output to in the midst of a computation. To be precise, that stream can be any monoid, but just for an intuitive real-world example let’s say it’s a list of strings representing some logging information. For example:

log :: Show a => a -> Writer [String] ()
log a = tell [show a]

Show and tell, nice.

Finally, Reader and Writer provide a good way of isolating their respective effects from some other stateful monadic computation. We have a number of common monads representing computations that may not be idempotent or referentially transparent, such as IO and ST, but one of the simplest and most common is State. So let’s roll with that.

Here we’re emulating the unary ++i and i++ operations from languages like C. The get and put functions are sort of the primitive operations to read and write the State. For fun, we can simplify these further using a common helper, modify, and applicative style (thanks to Reddit user LordGreyhawk for the comment):

preincrement = modify (+1) *> get
postincrement = get <* modify (+1)

Now, Reader, Writer and State are three great tastes that taste so great together, that there’s another monad called RWS which simply lumps them into a single type. For example, we can write:

type Contrived a = RWS Int [Int] Int a
contrived :: Contrived ()
contrived = do
env <- ask -- read from the fixed environment
s0 <- get -- grab the current state
let s1 = s0 + env -- increment the current state by the environment value
put s1 -- update the state to the new, incremented value
tell [s1] -- log the new value

Finally, you probably want to run one of these things eventually, which looks like:

In other words, running an RWS computation consumes a reader environment and an initial state, and produces both a value, a final state and some writer output.

Okay, monadic digression time over. This all turns out to be very convenient for knot-tying in general: all we need to do is feed the writer output back into the reader environment. Doing so, exploiting lazy evaluation, makes it so that you can indirectly reference your eventual output before you have produced it! The Ouroboros in question looks like this:

First, we say that Knot is a type synonym for an RWS monad, in which the reader and writer types are declared to be the same k thing. We haven’t talked yet about what might be a useful concrete type for k; we’ll get to that shortly. Next, tie takes a Knot and some initial state, and just like runRWS, it produces a triple containing some output value, the final state, and again one of these k things.

So check out the let binding: we call runRWS, which produces a triple containing a value k as its writer output. And what parameter do we pass as the reader environment for runRWS? Aha: k again. This is how we feed the snake its tail.

Now, let’s try to figure out an appropriate concrete data type for k. There’s no single correct answer for this, it depends on the problem you’re trying to solve, but let’s figure one example out just to get some satisfaction here.

Take our circular linked list example from before. I’ll simplify it just a little bit here, and add a Show instance so we can see what happens when we run our stuff:

data Node = Node {
value :: Int,
next :: Node
} deriving Show

Okay, now we want to parse a list of pairs of Ints into our structure. For example, we want [(0, 1), (1, 2), (2, 0)] to be equivalent to the following code:

Since the second component of each pair is referencing some other node by its list offset, this means we need a Map from list offsets to the nodes that will eventually be found there. The Map will be our k type for this example. We’ll write to this map (which is conveniently a monoid) using tell, and query it using ask. We’ll also want some state to keep track of what our current offset is and what raw data we have left to parse. Let’s do this:

We get the current state and see how much input we have remaining. If we’re done, we’re done. Otherwise, pull the current node’s value and reference to the next node off of the list. Look the reference up in our map (which we will have constructed by the time we’re done) to find the next node, and write the newly constructed current node to the writer output. Update the state and recurse.

This was obviously a toy example. It would be much more realistic to be accepting input from some external source such as a file, in which case your input is monadic. The Knot monad can be easily adapted into a KnotT monad transformer, on top of whatever underlying monad is needed (e.g. IO). Just implement it in terms of RWST instead of RWS.

I demonstrated knot-tying here in the context of parsing, but it works just as well in the context of serialization. The Map in that case is flipped: you need to look up the offsets at which each Node will eventually be stored, before they have necessarily been written there. This requires a bit of care to ensure that you don’t force strict evaluation of these offsets while writing. That also probably means you can’t do any nice output buffering performance tricks, but I’ll just say “whatever” to that for now.