Visually designing distributed systems in Clojure

This story begins a few months ago. I was being introduced to a fairly complex piece of distributed software built by some fairly intelligent guys.

In a certain moment of the presentation, the person speaking started explaining the inner workings of the system using a huge flowchart diagram full of boxes, arrows and diamonds. The diagram was used to sequence the concurrent execution of different parts of the system so the designers could be “quite sure” that they have dealt with all potential race conditions and conflicts in the construction of this particular distributed system.

One of my conclusions after the talk was that there should be better ways of designing distributed systems where you can assert some basic properties of the software being built rather than ‘believing’ in its correctness.

Petri nets, also known as place transition networks in its most basic variant, are a well known and quite old formalism for modeling distributed systems.

Petri nets are built using three main components:

places: locations where data token can be stored and retrieved

transitions: nodes in the network that can retrieve tokens from places and put new tokens in different places when fired

arcs: joining places and transitions

Operational semantics of Petri nets are really simple: If all the places linked by incoming arcs into a transition contain tokens, that transition can be fired. If the transition is fired, the tokens are retrieved from the places and moved to the places with outgoing arcs from the transition.
Parallel execution, iteration and other interesting constructs can be modeled using this kind of nets. The formalism can also be proved to be Turing-complete if a special kind of arc, known as inhibitor arc is added to the model.
Petri nets can be extended in different ways: using “colored” tokens, hierarchical composition of networks, etc. Petri nets are also interesting because they have a foundational role in the development of later process calculi like Milner’s CCS calculus and Pi calculus. Petri nets are also heavily used in the description of process workflows.

Tools for developing Petri nets

Being mainly a graphical notation for modeling distributed systems, there are many tools for designing and simulating the execution of Petri nets. One of my favorites is WoPeD (http://woped.ba-karlsruhe.de/woped) developed at the Karlsruhe University. The most interesting feature of WoPeD is that it makes possible to export Petri net diagrams as XML documents using the Petri Net Markup Language, PNML (http://www.pnml.org/).

Tokengame uses this format to parse the description of a network and execute the right Clojure code.

When editing a Petri net that is going to be run with Tokengame, it is necessary to add the fully qualified name of a Clojure function or the code of a clojure lambda function to every transition in the net. These functions will be executed in different nodes when Tokengame is running.

Communication primitives

The kind of systems that can be built using Tokengame depend on a small set of primitive operations that can be combined to obtain more complex systems:

sequence

In this examples two arithmetic operations are applied in a sequence to the tokens inserted in the ‘in’ place. The results are stored in the ‘out’ place.

parallel execution

In this example the tokens inserted into the ‘in’ place are duplicated by the first transition labeled ‘dup’ using the Clojure ‘identity’ function. These tokens are then processed in parallel by two different transitions: ‘t2’ and ‘t3’. The intermediary results are stored in the places ‘p4’ and ‘p5’ and finally combined by the final transition using Clojure’s ‘or’ function.

conditional execution

Conditional execution is the result of connecting several transitions to the same ingoing place. The result is that the transitions are fired randomly if a token is present. In this example tokens containing lists of numbers are filtered by Clojure’s ‘even?’ or ‘odd?’ predicate. Conditional execution is specially useful to model 1 server – n client job processing scenarios.

iteration

Iteration is the result of creating a cycle in a Petri net. In this example inserting a number in the ‘in’place will trigger a loop where the number is duplicated and inserted in the ‘out’ place.

Once the Petri net has been modeled and it has been saved as a .pnml document, it can be directly executed using the Tokengame library.

Tokengame includes an executable class that can be used to perform several useful tasks with the net description.

The only prerequisite that must be met is that RabbitMQ version 2.0 must be installed and running in some reachable node. Tokengame uses Rabbit as the communication mechanism among the processes simulating the Petri net description.

In order to test the network, it can be executed in a single node using different threads in the a single JVM. For instance, to try the parallel execution network example shown before, the following command can be executed:

Once the network is being executed, we can use the ‘watch-place’ command to check the sequence of tokens being inserted into a certain place of the net. For example, to see the sequence of of tokens being inserted in the final ‘out’ place, the following command can be executed:

The presence of the new token in the ‘in’ place should fire the ‘dup’ transition and trigger the execution of the rest of the net. If the execution is successful, the process watching the ‘out’ transition should register the arrival of a new token with the correct value:

*** TOKEN - Sun Aug 29 23:10:04 CEST 2010 : false

All the transitions in the net can use a common logger function ‘remote-log’ that can collect log messages from different nodes in the system.

Once the net has been tested it can be run distributed using the ‘run-transition’ command to run a certain transition in a network node.

All the commands accept an array of parameters to describe the connection to the right RabbitMQ server.

A classical example
Once we have reviewed the basic operations available to describe a distributed system with a Petri net. We can take a look at a well known example: counting words.

A possible system for counting words can be modeled as follows:

In the ‘lines’ places new lines of text can be inserted. they will be split by the ‘t1’ transition executing the function ‘mapper’. The result will be tokens containing pairs ‘(word, occurrences)’ that will be stored in the ‘pairs’ place.

The ‘t2’ transition retrieves the pairs from the ‘pair’ place and a map data strcture from the ‘out’ place. With these data the transition will update the map with new word count. As a result of firing the ‘t2’ transition the updated map will be stored again in the ‘out’ place.

The ‘mapper’ and ‘reducer’ functions are simple Clojure function that invoke the ‘fire’ function to store a new token in a certain outgoing transition. If a function being executed in a transition doesn’t call to the ‘fire’ function, the value returned by the function will be stored by Tokengame in all the outgoing places for that transition. To control which tokens are stored in which places, the ‘fire’ function must be called in a explicit way from the function code as a side-effect:

Any number of ‘mapper’ transitions can be run in parallel. In the same way an additional layer of ‘partitioner’ tasks can be added to the network to reduce the list of pairs in parallel.

Some conclussions

Tokengame is just a small weekend experiment. Some important things, like the support for ‘confussion’ constructs in a Petri net are missing.

Anyway it is just a example of how distributed systems could be built using a simple formal model that allows developers to assert important properties for the system, using regular functions that can be distributed through the nodes in the systems and using a graphical notation that makes easy the design and the documentation of the software.