Distributed Deterministic Dataflow Programming

Christopher Meiklejohn
Software Engineer @Basho and Graduate Student

Erlang implements a message-passing execution model in which concurrent processes send each other asynchronous messages. This model is inherently non-deterministic: a process can receive messages from any process that knows its process identifier, so the number of possible executions grows exponentially with the number of messages received. Concurrent programs in non-deterministic languages are notoriously hard to prove correct and have led to many well-known disasters.
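
To give a rough sense of that growth, here is a minimal sketch in Python (chosen only for illustration; it is not Erlang and not part of the talk). It assumes each sender's own messages arrive in the order they were sent, and it counts every order in which a single receiver could observe messages from several senders. The names `interleavings` and `count_orders` are hypothetical helpers introduced for this example.

```python
def interleavings(queues):
    """Yield every arrival order that keeps each sender's messages in order."""
    if not any(queues):
        # Every queue is empty: exactly one (empty) order remains.
        yield []
        return
    for i, queue in enumerate(queues):
        if queue:
            # Deliver the next message from sender i, then recurse on the rest.
            remaining = queues[:i] + [queue[1:]] + queues[i + 1:]
            for tail in interleavings(remaining):
                yield [queue[0]] + tail


def count_orders(senders, messages_each):
    """Count the arrival orders for `senders` processes sending `messages_each` messages."""
    queues = [
        [f"s{s}m{m}" for m in range(messages_each)]
        for s in range(senders)
    ]
    return sum(1 for _ in interleavings(queues))


# Two senders, each sending 1, 2, 3, then 4 messages to one receiver.
counts = [count_orders(2, n) for n in range(1, 5)]
print(counts)  # [2, 6, 20, 70]
```

Even with only two senders the count roughly quadruples with each extra message per sender, and a receiver whose state depends on arrival order may behave differently in each case.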

In addition, Erlang natively provides distribution and clustering as part of the runtime environment. This lets processes on different instances of the virtual machine communicate asynchronously across the network, which further increases the amount of non-determinism.

We propose an alternative execution model for Erlang, namely deterministic dataflow programming. This execution model provides concurrency while eliminating all observable non-determinism: given the same input values, a program written in deterministic dataflow style will always return the same output values, or never return. Our proposed solution operates transparently over distributed Erlang, providing highly available, fault-tolerant, deterministic computations.
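
The core idea can be sketched with single-assignment dataflow variables. The following is a minimal illustration in Python, not the Erlang implementation presented in the talk: `DataflowVar`, `bind`, `read`, and `run_once` are hypothetical names, and the sketch assumes a variable may be bound at most once while readers block until it is bound.

```python
import threading


class DataflowVar:
    """A single-assignment variable: readers block until it is bound."""

    def __init__(self):
        self._bound = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def bind(self, value):
        """Bind the variable once; rebinding to a different value is an error."""
        with self._lock:
            if self._bound.is_set():
                if self._value != value:
                    raise ValueError("already bound to a different value")
                return
            self._value = value
            self._bound.set()

    def read(self):
        """Block until the variable is bound, then return its value."""
        self._bound.wait()
        return self._value


def run_once():
    x, y, total = DataflowVar(), DataflowVar(), DataflowVar()
    # The consumer is started first on purpose: it simply waits for its inputs.
    workers = [
        threading.Thread(target=lambda: total.bind(x.read() + y.read())),
        threading.Thread(target=lambda: y.bind(2)),
        threading.Thread(target=lambda: x.bind(40)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return total.read()


print(run_once())  # 42, regardless of how the threads are scheduled
```

Because a value can only be observed after it has been bound, and can never change afterwards, thread scheduling cannot affect the result. If `x` were never bound, the consumer would block forever, which corresponds to the "never return" case.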

About Christopher

Christopher Meiklejohn loves distributed systems and programming languages. Previously, Christopher worked at Basho Technologies, Inc. on the distributed key-value store, Riak. Christopher develops a programming language for distributed computation, called Lasp. Christopher is currently a Ph.D. student at the Université Catholique de Louvain in Belgium.