Wednesday, July 6, 2011

Do (side) effects really need a total order?

Monads and applicative functors enforce a total ordering on sequencing of effects - often not the right choice. Sure, you could write your distributed or concurrent program to globally sequence effects, but the result is often unnatural, less efficient, and requires more coordination than truly dictated by the problem domain. At the other end of the spectrum we have commutative monads and applicatives - structures where liftM2 f a b == liftM2 (flip f) b a. These give maximum flexibility in how effects may be sequenced, but the full commutativity requirement is just not possible for most situations. In a commutative monad, there can be no dependencies between the effects - an effect cannot depend in any way on what effects have come before, only on the set of effects that constitute it.

As an aside, it makes more sense to talk about the commutative applicative functor underlying a monad, rather than speaking of "commutative monads". The reason is the usual definition of commutativity of effects for monads - that do { a is equal to do { b - is only even defined if e2 does not reference a. That is, the code cannot be actually monadic - for effects that commute, it must be possible to describe the behavior in terms of some underlying applicative. Applicative lifts the global ordering on the production of the effects (we no longer have to wait until all previous effects have generated to produce the current effect), but not the global ordering on their sequencing (since we cannot rearrange effects without changing meaning).

What I'd like is the ability to specify and reason with a partial order on effects - this would incidentally cover commutativity as the degenerate case where all effects are incomparable. I'm not sure how to express this as an API, though. Here was a first attempt:

The idea here is that if commutes a b, then lift2 f a b == lift2 (flip f) b a. This enables a few things - sequence and traverse could now work very differently - they might internally rearrange the effects in some way that is more efficient, then reconstruct the original sequence at the end. Or imagine the stream of effects one might feed to an iteratee - the iteratee could choose to skip a value in the input - the corresponding effect needed to generate this input could then be placed in a buffer and sequenced in later only if a subsequent effect depends on it. Hence the iteratee would not be forced to wait for evaluation of effects it was not interested in!

This interface would need to be implemented a bit differently than most applicatives. Consider the state applicative. The commutes a b translates to the claim that a and b operate on separate portions of the state. E.g., the state is a pair and a only touches the first element of the pair and b only touches the second. We might say there is some lensing going on, and the two foci of the lenses do not overlap. But it should be clear that the regular representation of state - s -> (a,s) - is not going to work, as we have no knowledge of what this opaque function might do with the given s. We need some sort of ADT to track the portion of the state each state action is given access to, so we can explicitly reason about dependencies and how they overlap.

I am only just starting to think about these possibilities. I'd love to hear if this has been looked into before, or if anyone else has experience with this sort of thing.

13 comments:

Have a look at how Darcs reasons about commutativity of patches. I've implemented a database migrations sequencer before, based loosely on Darcs's theory of operation, that allowed reasoning about commutativity of database schema changes.

@Runar - yeah I was thinking of darcs in the back of my mind as I wrote this. I should take a closer look at how darcs determines whether things commute.

One thing is that darcs is solving a slightly harder problem, since when it commutes two patches a, b, there is some transformation that is applied to both so their flipped composition yields the same result as before (a b == (commuted b) (commuted a)). My impression is that figuring out this transformation is most of the difficulty of patch theory. It might be interesting to represent this ability with another another typeclass - perhaps with commute :: m a -> m b -> Maybe (m b, m a).

I would be satisfied with simply describing the dependencies without any ability to associate a transformation with commutation.

You will have to explain your database migration project to me, it sounds pimp. :)

This is related to what lazy evaluation normally gives you, I think, and the lazy state monad is a very interesting example [1, 2].

But it sounds like you want something more than laziness, I guess you also want speculative execution, maybe? Laziness will already allow computations to commute whenever possible, but it will also allow unforced "effects" to disappear, which is maybe not what you have in mind. So maybe a sepeculative state monad is more like what you have in mind, where computations will only block when they actually need to wait on a future? [3] may be relevant, but I think it's a log uglier than what you have in mind...

@Matt - thanks for the links. I have not really played with lazy state. Honestly, I am not totally sure what I have in mind. :) Just exploring really - I've just been thinking a lately about how monads, applicatives oversequentialize, and this doesn't work for distributed/concurrent apps.

If you'd like a big-picture view of proper effect-systems in action, check out Disciple: http://disciple.ouroborus.net/wiki/Language/Overview

Of course, it's not up to par with Haskell at the moment, and it would probably take reading the thesis and perhaps messing with the compiler to get precisely what you want. However, from an academic, non-practical point of view, it might be interesting.

This is one area where uniqueness types can do better than monads, when they're applicable. ST computations sequence all side effects linearly, but if you instead use uniqueness typed arrays, there's no reason one such array can't be used concurrently with another. The threads of execution only need to meet up when you're moving things between the arrays or something similar.

Nice article. I think the management and execution of side effects is in need of some serious research. Pure code affords us many luxuries like parallelism, reordering, rewriting, that we lose when we start dealing with effects. However, if we can encode more information about our effects in the type system, we can still gain some of these benefits to some degrees. We can't just run them and see what happens, we don't need to just treat them as taboo - we need something in the middle.

I wrote an article along similar lines to this one of yours. Thought you might be interested.

All good questions! But your "usual definition of commutativity of effects for monads" is only usual in Haskell. Underlying the equation between the two "do" expressions is the observation that, for a strong monad M (ie one with a "strength" t : (a * M b) -> M (a * b)), there are two obvious ways of transforming (M a * M b) to M (a * b); we say that the monad is commutative if these two ways coincide. That formulation runs into no problems with bound variables; those are simply an artifact of the "do" notation. (But you're still right about applicative functors; these coincident transformations are part of M being monoidal.)

@Jeremy - thank you for the clarification. That makes perfect sense. Also, since you are commenting on my blog, just wanted to say that "Essence of the Iterator Pattern" is an awesome paper. :)

@Alex - I have looked at Disciple a little bit. Effect systems are cool, but I was thinking of something dynamic, such that I can reorder and/or parallelize effectful computations at runtime based on their dependencies (which are not known until runtime). A (static) effect system doesn't help with that. But it seems what I want is something like effect tracking that I can inspect and compute with at runtime.

@techtangents - read your post. I suspect you don't need dependent types though. The file being modified by a particular IO action doesn't need to be reflected in the type. You can get away with just having IO carry a runtime tag indicating which file(s) it is modifying, say, and a function you can call to determine if two IO actions commute (which would compare the two sets of tags or something). You can then use this to implement a smarter sequence function that runs actions on different files in parallel, or reorders computations, etc.

Or imagine the stream of effects one might feed to an iteratee - the iteratee could choose to skip a value in the inputI am afraid I do not understand what is a “stream of effects.” Examples are of great use when expressing ideas when you can not express it formally. I have such an example. There are 'readIORef' and 'writeIORef' in @base.Data.IORef@, and 'readIORef's commute, but no 'writeIORef' commute with anything.