Sunday, May 3, 2009

Notes on dataflow concurrency

At a conference, I heard someone say something to the effect of: "Declarative dataflow concurrency is awesome. You can't get deadlocks! It all just works, and it is so beautiful".

If you are unfamiliar with dataflow concurrency, it is a technique used in some languages, where a program, if it reaches a statement that requires the use of a variable which hasn't been assigned a value yet, pauses until the variable has been assigned by another thread and then continues the computation as if nothing had happened.

It is pretty neat, but I thought to myself - "Hang on, if we have two threads that each wait for the other to assign a variable, won't we get a circular dependency between them? Or if there is just one executing thread, won't that block forever? I don't understand, that doesn't sound like much different from what I'm used to...?"

The computation models of this book use [dataflow programming]. This is unreasonable in a sequential system, since the program will wait forever. It is reasonable in a concurrent system, where it could be part of normal operation that some other thread binds the varialbe. [Dataflow programming] introduces a new kind of program error, namely a suspension that waits forever. For example, if a variable name is misspelled then it will never be bound. A good debugger should detect when this occurs.

So even if they wait on an infinitely unbound variable instead of lock that is never released, the experience for the user would appear to be much the same - a program that "hangs" without producing the desired result. I wrote this little snipped in Mozart to demonstrate:

declare

Y

X

thread X=Y+1 end

Y=X+1 end

{Browse X}

And indeed the browse window never appears. The main thread blocks on line 5, the spawned thread on line 4, each waiting for an unbound variable.

But even though it hasn't been stated outright in the book (at least not yet, I'm at page 350), I think there is an important difference compared to threaded programs as they are usually written in, for instance, Java. The Mozart program is deterministic (if written in a completely declarative style). It will *always* block, no matter how the threads are scheduled, which makes this programming error easier to detect and (hopefully) to locate and fix.

So what the person at the conference perhaps meant, was that the programs written in dataflow concurrency style don't have *race conditions*.