Wednesday, March 3, 2010

Follow up to 'Actors are not a good concurrency model'

My recent post on actors generated a lot of comments. I'll try to summarize some of them here and offer my responses.

To start, a number of people took issue with my working definition of actors, claiming that in implementation X, actors are composable. We can quibble about my definition, but I hope the deeper point was clear - side effects and state hurt composability and usage of actors as I defined them is stateful. James Iry has also made some of the same points about the statefulness of Erlang style actors.

But, if you don't like my definition, if you'd like to claim that an "actor" is actually just a function A => Future[B] and that therefore "actors" are composable, then I don't really have a problem with that (although, why call this model 'actors'? Why not call it 'functions from A => Future[B]' or 'the Kleisli arrow for futures'?). But if the actor model also includes the ability to asynchronously "send a message" to another arbitrary actor, and if the expression representing this message send does not evaluate to a future containing the result sent back by the receiving actor, then my subsequent arguments about the lack of composability still apply. The takeaway from my post, even if you think my definitions are bogus, shouldn't be "Aha!! Actors are okay! There is no issue with using them, even in stateful, non-composable ways".
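To make the "functions from A => Future[B]" view concrete, here is a minimal Scala sketch of Kleisli composition for futures. The names (`andThenK`, `parse`, `double`) are illustrative, not from any actor library; the only real machinery is `Future.flatMap`:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object KleisliFutures {
  // Kleisli composition for Future: glue an A => Future[B] and a
  // B => Future[C] into an A => Future[C]. Under the hood it is flatMap.
  def andThenK[A, B, C](f: A => Future[B], g: B => Future[C]): A => Future[C] =
    a => f(a).flatMap(g)

  // Two "actors" in the composable sense: each takes a request and
  // asynchronously produces a response that the caller gets back.
  val parse: String => Future[Int] = s => Future(s.trim.toInt)
  val double: Int => Future[Int]   = n => Future(n * 2)

  def main(args: Array[String]): Unit = {
    val pipeline: String => Future[Int] = andThenK(parse, double)
    println(Await.result(pipeline(" 21 "), 5.seconds)) // prints 42
  }
}
```

Note that the composite is again an `A => Future[B]`-shaped value, so it can be fed into further compositions — which is exactly the property a fire-and-forget `A => Unit` message send lacks.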

This brings up another point - I don't think everyone commenting (both on my post and James' above) is working with the same ideas about what it means for a function or expression to be pure. The "standard" definition is that an expression (such as sending a message to an actor) is side-effect free if it is referentially transparent. And while there are some nuances to the definition of referential transparency, I think everyone familiar with the concept would agree that functions from A => Unit can't possibly be RT unless they are literally the constant unit function.
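A tiny Scala sketch makes the point about `A => Unit` concrete. The classic test for referential transparency is that naming an expression and reusing the name must be indistinguishable from inlining the expression; a side-effecting `log` (a made-up function, standing in for any message send) fails it:

```scala
object RTExample {
  var count = 0                                 // observable mutable state
  def log(msg: String): Unit = { count += 1 }   // an A => Unit with an effect

  // Returns (calls observed when the expression is inlined twice,
  //          calls observed when it is named once and the name reused).
  def demo(): (Int, Int) = {
    count = 0
    val inlined = (log("hi"), log("hi"))        // two distinct calls
    val twoCalls = count

    count = 0
    val x = log("hi")                           // one call...
    val named = (x, x)                          // ...reused by name
    val oneCall = count

    (twoCalls, oneCall)
  }

  def main(args: Array[String]): Unit =
    // If log were referentially transparent, both programs would behave
    // identically; they don't, so log is not RT.
    println(demo()) // prints (2,1)
}
```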

Looking through the various responses, I notice that no one really argued with my claim that side-effects hurt composability. I'd be interested if anyone can poke holes in my argument here (and by that I mean finding some problem with my logic, not disputing my definitions).

A number of people responded to my negative offhand remarks on OOP. Pointing out problems with OOP was not really the point of my post. Obviously I'm no fan of OOP and maybe someday I'll write more about that. The general point I was making is we should be wary of adopting a "better" technology without understanding the underlying problem that technology purports to solve. Doing so inevitably leads to solutions with a lot of incidental complexity that fail to even solve the underlying problems fully. For solutions like this, you'll often see advocates unable to really give a formal argument for why that solution is superior, instead falling back on pointing to particular examples, harping on how convenient certain things are, and gushing about various intangibles like how "beautiful" it is. There's nothing wrong with such advocacy, but if you cannot formalize your argument that one solution is better than another, chances are you do not fully understand the underlying problem and are therefore ignorant of whether some more direct, simpler solution exists.

Moving on, Chris Quenelle had this interesting comment, claiming that any form of explicit parallelism is unnecessary:

If your program is purely functional, the compiler can assign threads to whichever chunks of calculation it wants to. Hence you don't need actors. Or any other form of explicit parallelism. The purpose of explicit synchronization is to manage the timing of side-effects in the presence of parallelism.

I'm sympathetic to this point of view, but I do think there needs to be something more. I've tinkered with the Future monad, which is explicit in the sense that you have to decide when you are writing code in that monad, but implicit in that it only requires you to indicate dependencies between computations, not to specify how those computations are scheduled onto threads, how many threads are used, etc. But I believe this breaks down for distributed computation, where the topology of the concurrency is a static or semi-static structure that must be under the control of the programmer. I've also found the monadic style doesn't work so well for "pipeline" parallelism. But more on that in a later post.
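A small Scala sketch of what "only indicate dependencies" means with the standard library's `Future` (the arithmetic here is just a placeholder workload):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object FutureDeps {
  def sum(): Int = {
    // Launch two independent computations; the library, not the
    // programmer, decides which threads run them and when.
    val fx = Future { 1 + 1 }
    val fy = Future { 2 + 2 }

    // The for-comprehension only states dependencies: the sum depends on
    // both fx and fy. Threads and scheduling are never mentioned.
    val s = for { x <- fx; y <- fy } yield x + y
    Await.result(s, 5.seconds)
  }

  def main(args: Array[String]): Unit =
    println(sum()) // prints 6
}
```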

Ulf Wiger had a pretty interesting comment in which he argued (I think, Ulf, correct me if I'm wrong) that the compositional style often leads to excessive dependencies between components. This is bad given that "the thing that often kills large projects is dependency management". I would buy the claim that large projects can be killed by poor dependency management, although of course projects can fail due to lots of other reasons, too. But I view this as completely separate from my general arguments about how side-effects kill composability. Even when programming purely functionally, nothing stops you from duplicating programming work to avoid relying on some shared (not yet completed) dependency. For instance, even if two modules could in principle both be implemented using some shared generic code, it might be worth not building this out if the communication overhead and additional dependency would create a bottleneck for the overall team. This is analogous to the situation that often arises in parallel algorithms, where one can improve runtimes by duplicating some work but running things in parallel. In any case, nothing about purely functional programming precludes you from duplicating work like this if it makes sense. But if you do decide you'd like to reuse and compose code, you have the means to do it.

Lastly, there was one commenter who disagreed with my claim that "intuitiveness" does not justify use of actors as a programming model: "The link to our intuition and hence ability to leverage prior experience is paramount in being able to comprehend complex systems."

This is sort of a loaded statement. Let me unpack it a bit. First, what is really meant by intuition here? When you learn something, anything, you develop an intuition for it which of course is helpful. But if this is the case, what does it really mean to say that one technology is more intuitive than another? Assuming you understand both technologies and have an intuition for both, is one more "intuitive" than the other? With respect to what? Perhaps that simply means you are more familiar with one than the other?

No, I suspect what is actually meant by 'intuitive' here is "analogous to the physical world". But is there really anything particularly magical about the intuition each of us has from our understanding of the physical world? The more you try to pin down the supposed benefits of this intuition, the more they seem to vanish. In programming, the inferences one can make by analogy to the physical world are either too vague and too unreliable to be useful, or so simple that nothing is gained by tying them back to a physical system. Intuitions can inspire you as you explore a design space, but actual programming requires much more precise thinking than the vague intuitions we all have; fuzzy models here are for the most part "worse than useless". We're better off building, understanding, and developing an intuition for some new, simpler, more precise abstraction.

14 comments:

Side effects hurt composability inasmuch as they narrow the possibilities. id :: a -> a is perfectly composable, but not very useful; every step toward adding some semantics to a function makes it harder to compose (i.e. via Curry-Howard, the stronger the guarantees our theorems give us, the narrower the possible compositions). So the issue isn't side effects per se, but more specific types. Side effects reduce composability because they force us to care about the effects.

One shouldn't underestimate the power of defaults. Haskell is pure by default, and even though it has things like unsafePerformIO, most programmers don't use that unless they really have to. Correspondingly, while you can write functional code in C, most C programmers do not do that by default, but stick to the model that is most intuitively expressed by the language.

My experience is that this tendency grows stronger as projects grow, and also as schedules become increasingly tight. Departing from the "model default" requires consensus, which means extra work, a coordination of wills, etc. For the lone programmer, this is trivial (there is only one will). In a 100-man organization, it's painful, and in a 1000-man project, it borders on the impossible.

@Daniel - I'd agree that side effects hurt composability by limiting possible combinations. But I see a difference in how side effects propagate. The composition of two functions with effects is generally the union of these effects (not always true, depending on the effect). The result of applying this composed function can only be used in places permitting BOTH effects. Because of how this propagates, the effects pile up and such code tends to become unusable outside the context in which it was originally written.

On the other hand, the composition of two pure functions is constrained in its combinations only by its input and output types. Composing two pure functions doesn't "remember" the intermediate type.

As an example, if I have a function ComplicatedType => B, and a function A => ComplicatedType, and I compose them, yielding an A => B, that function is just as composable as any other A => B function. In contrast, if ComplicatedType => B has an effect, that effect will usually be observable in the composed function as well (depending on the effect).
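A short Scala sketch of this "forgetting" (the type and function names are invented for illustration):

```scala
object ComposeExample {
  // A hypothetical intermediate type; callers of the composed function
  // never need to know it exists.
  case class ComplicatedType(raw: String, checksum: Int)

  val parse: String => ComplicatedType =
    s => ComplicatedType(s, s.length)
  val summarize: ComplicatedType => Int =
    c => c.checksum

  // The composite has type String => Int; ComplicatedType has vanished
  // from its interface, so it composes like any other String => Int.
  val f: String => Int = parse andThen summarize

  def main(args: Array[String]): Unit =
    println(f("hello")) // prints 5
}
```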

@Paul - Yes, I agree with you. It's just that I think the effect problem isn't that big if we model effects so that they compose better, as you mention. Effects get a bad name (wrt composability) because most languages just create one big effect bag instead of more granular ones (e.g. cont vs shift/reset).

@Daniel - when you say making effects more fine-grained, do you mean in the sense of giving explicit programmer control over their scope, as in a usingCloneOfCurrentUniverse { ... } block? Or do you just mean having more fine-grained information about what the effects are. For instance, an effect indicating that only *this particular file* is modified by this function.

Both, but mostly the second (the first is implied). Let's try this idea using an imaginary (dependently typed) language:

map (e1:Effect) (e2:Effect) (f:a -(e1)> b) : [a] -(e1+e2)> [b] = ...

In this scenario we can see exactly which effects are caused by the map application. If we have scoped effect contexts (i.e. control delimiters or local regions) we can then encapsulate the effects in a piece of code (like using runState and friends). Combining both we can reason about effects inside and outside a scope:

parseWith (p:Parser a e1) (f:File) : -(e1+Read f)> a

Inside parseWith we could create an array and use shift/reset, but since both are inside parseWith, only the external effects would leak.

@Daniel - Ah, okay, I see. Very interesting. Thank you for that explanation.

Question about how this plays out: consider function composition of two functions, f and g. You have several choices as far as effects propagation - allow both effects to propagate, allow only f's, allow only g's, or neither. Obviously, at this level, you have no real basis to make a decision about limiting effects propagation and so your only real choice is to pass on both effects. Wouldn't this be the case for most HOFs?

Or maybe you'd just stipulate that to compose two functions, both must be pure, and it is a type error to try to do otherwise?

I've never actually programmed in a language with an effect system like this, but it seems like you might end up with the same sorts of problems - functions with effects become difficult to reuse elsewhere, difficult to compose with other functions, etc. Although I can certainly see that it's much easier to avoid bugs due to unintentional effect propagation, and it becomes easier to implement pure functions using local effects.

@Paul - Yes we want to pass up all effects on HOFs, otherwise we lose a bunch of great free theorems. It's essential to keep the effects visible and offer good combinators to deal with them.

Contrast, in Haskell, (.) vs (>>=) vs (>>>), all three are (in a sense) function composition but dealing with different possible effects.
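The same contrast can be sketched in Scala (assuming Scala 2.13+ for `toIntOption`): plain `andThen` for pure composition, and `flatMap`-based Kleisli composition when an effect such as possible failure must be threaded through. The names below are illustrative:

```scala
object Compositions {
  // Pure composition: no effects to track, analogous to (.).
  val inc: Int => Int = _ + 1
  val dbl: Int => Int = _ * 2
  val pure: Int => Int = inc andThen dbl

  // Kleisli composition for Option, analogous to monadic sequencing:
  // the "effect" of possible failure is threaded through by flatMap.
  def parse(s: String): Option[Int]  = s.toIntOption
  def recip(n: Int): Option[Double]  = if (n == 0) None else Some(1.0 / n)
  val effectful: String => Option[Double] = s => parse(s).flatMap(recip)

  def main(args: Array[String]): Unit = {
    println(pure(3))        // prints 8
    println(effectful("4")) // prints Some(0.25)
    println(effectful("0")) // prints None
    println(effectful("x")) // prints None
  }
}
```

Each composition operator is "function composition for a particular effect", which is the point of the contrast above.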

This language is mostly Haskell, but with much finer-grained effects than the IO monad: functions using IO are harder to reuse in other contexts, so we have to be careful to write as little IO code as possible. OTOH we can use local effects like State and Cont inside pure functions.

I would say that intuition is defined by what a life-time of learning has done to the lower levels of your brain.

To "intuit" something is to act on knowledge that is so deeply embedded in the structures of your unconscious that you no longer require the prefrontal cortex to get involved in your reasoning.

While you can certainly retrain your brain and thus gain new intuitions, the earlier assessment that intuition is key to understanding complex systems is probably correct. It is simply not possible to rely on your limited consciousness for that.

So I guess my point is that it is a trade-off. New ways of thinking may be better, but there is a cost associated with replacing old investments for new ones.

I don't think I disagree with your definition of intuition, but I think it's possible to develop intuitions for things quite rapidly, in days, weeks, or months, if you're willing. I don't think "a lifetime" is needed. And of course, the more new concepts you learn, the easier it becomes to learn others, etc.

So to me one's present intuitions are not really a strong point in favor of any programming technique.

To be honest, I think many of the people arguing so fervently for things to be "intuitive" have a resistance to learning new things and are to some extent rationalizing their own deficiencies. "It's not intuitive" has kinda become a blanket excuse to avoid engaging with an unfamiliar concept.

@Paul I agree with you that actors break composability in the way FP builds combinator functions. But I disagree that they are not a good concurrency model.

I think you are confusing Erlang processes, an unfaithful implementation, with the actual actor model. For instance, an actor in the model has no concept of a mailbox. A mailbox would itself be another actor.

Actors are a model of computation which is inherently concurrent. It is inspired by physics and not by a logical system like the typed lambda calculus, as Haskell is.

In the model, an actor is the fundamental unit of computation, so it has processing, storage (memory), and communication. When I think of this, I think of a completely isolated hardware component: when an actor sends a message to another actor, it's an IO operation, hence a side effect. So computation with actors is closer to distributed systems, and to compose systems with actors you have to think in terms of distributed systems. You could implement a fork of Haskell with something like the Symmetric Modal Lambda Calculus. Inside every actor you can (or might need to) be referentially transparent while you are processing a message, until you have to change the state for the next message.

One thing the actor model can give, which logical systems can't, is indeterminacy.

For Haskell, as it is, a concurrency model like CSP (a la Go) with channels would be better suited. You can type your channels and bind stuff.

Watch Erik Meijer talking with and interviewing Carl Hewitt about what the Actor Model is.

I fully agree with Paul's remarks about intuition. More often than not, intuition is the wrong approach to solving problems, if only because we all have our own intuition (not necessarily compatible with the intuition of others). Here is an example. Consider a car that drives straight into a wall at 50 km/h. This causes some damage to the car. Now consider two identical cars driving straight into each other, each at 50 km/h. This causes some (identical) damage to the cars. I would bet that some people would "intuitively" think that, in the second case, there is more damage to the car(s) than in the first case...