Jobim: an actors library for Clojure

Over the last few days I’ve been working on an actors library for Clojure built
on top of RabbitMQ and ZooKeeper. I’ve called this little piece of software
Jobim. The source code is available on GitHub.

External dependencies

Jobim depends on RabbitMQ 2.X for dispatching messages between JVM
nodes. RabbitMQ is a reliable, high-performance messaging solution, supporting
version 0.9 of the AMQP messaging protocol and built on top of Erlang’s OTP
platform.
I’ve been using RabbitMQ in my day-to-day work for several months and it is one
of the best alternatives you can find for building a message queue.

Jobim also has a dependency on Apache’s ZooKeeper. ZooKeeper is a really
impressive piece of software that, in the purest UNIX tradition, does only one
thing but does it in the best possible way. In this case ZooKeeper allows a set
of distributed processes to manage a shared tree of nodes and get notified
whenever this tree is modified by other processes. This basic functionality
provides out-of-the-box support for using ZooKeeper as a powerful directory
service as well as a group membership service. It can also be extended to
provide a whole bunch of high-level coordination services like priority queues
or two-phase commit protocols.
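
The idea of a shared tree with change notifications can be illustrated inside a
single JVM. The following is only an analogy, assuming a Clojure atom as a
stand-in for the tree and a watch as a stand-in for a notification; it does not
use ZooKeeper’s API, and the paths and names (tree, membership-log, members) are
made up for the example.

```clojure
(require '[clojure.string :as str])

;; In-process stand-in for the shared tree: a map from path to node data.
(def tree (atom {}))

;; Every membership snapshot observed by the watcher, oldest first.
(def membership-log (atom []))

(defn members
  "Returns the set of paths registered under /nodes/ in a tree snapshot."
  [snapshot]
  (set (filter #(str/starts-with? % "/nodes/") (keys snapshot))))

;; The watch plays the role of a notification: it fires on every change to the
;; tree and records the new member set only when the group has changed.
(add-watch tree :membership
           (fn [_key _ref old-tree new-tree]
             (when (not= (members old-tree) (members new-tree))
               (swap! membership-log conj (members new-tree)))))

(swap! tree assoc "/nodes/a" {:host "10.0.0.1"})   ; node a joins
(swap! tree assoc "/nodes/b" {:host "10.0.0.2"})   ; node b joins
(swap! tree assoc "/config/timeout" 30)            ; not a membership change
(swap! tree dissoc "/nodes/a")                     ; node a leaves
```

After these four changes the log holds three snapshots, one per membership
change. A group membership service built on a real shared tree follows the same
pattern, with the processes living in different machines.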

Turning the JVM into a Node

Jobim actors need to be executed in a JVM that is executing a Node service. JVM
nodes are aware of the existence of other nodes, can exchange messages and
coordinate their activities.

A node service can be started in the JVM using a single function,
jobim/bootstrap-node. This function receives as a parameter a path to a
configuration file where the name of the node as well as the connection
options to RabbitMQ and ZooKeeper must be stated.

Nodes are aware of changes in the list of available nodes. They are also notified about
shutdown of nodes or about nodes not being reachable due to network partitions.

Creating an actor

A Jobim actor can be started using any Clojure function. This function can use
two special functions, send! and receive, to send messages to and receive
messages from other actors in the same node or in a different node.
In order to send a message, the actor needs to know the PID of the actor
receiving the message. This PID is passed as the first argument of the send!
function. The payload of the message is the second argument.

Jobim uses the standard Java serialization mechanism to build the payload of the
messages. This means that any object implementing java.io.Serializable can be
sent inside a message.

For instance, we can send Date objects to our ping actor:

=> (send! *pid* [(self) (java.util.Date.)])
ok
=> (receive)
#

It is possible to change the serialization and deserialization mechanism used by
a node by rebinding the jobim.core/default-encode and jobim.core/default-decode
symbols to suitable functions.
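
As a reference point for what such a pair of functions has to do, here is a
minimal sketch of a Java serialization round trip in plain Clojure. The names
java-encode and java-decode are hypothetical, and it is an assumption that an
encoder turns an object into a byte array and a decoder does the reverse; the
exact signatures Jobim expects are not shown here.

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream
                  ObjectInputStream ObjectOutputStream))

(defn java-encode
  "Serializes any java.io.Serializable object into a byte array."
  [obj]
  (let [buffer (ByteArrayOutputStream.)]
    ;; Closing the object stream flushes everything into the buffer.
    (with-open [out (ObjectOutputStream. buffer)]
      (.writeObject out obj))
    (.toByteArray buffer)))

(defn java-decode
  "Rebuilds an object from a byte array produced by java-encode."
  [^bytes data]
  (with-open [in (ObjectInputStream. (ByteArrayInputStream. data))]
    (.readObject in)))

;; Round trip of a payload shaped like the one in the example above:
;; a vector holding a (made-up) PID string and a Date.
(def payload ["some-pid" (java.util.Date. 0)])
(def round-tripped (java-decode (java-encode payload)))
```

Clojure’s persistent vectors, strings and java.util.Date all implement
java.io.Serializable, so the decoded value is equal to the original payload.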

Going distributed

The most basic building block for distributed computing in Jobim is the
jobim/rpc-call function. This function receives a node identifier, a
string containing a function and an array of arguments and tries to invoke
that function in the remote node. rpc-call returns immediately
without knowing the result of the invocation in the remote node. If we want to
retrieve the result of the invocation we can use the blocking variant,
jobim/rpc-blocking-call, which blocks until the result is returned or
an exception is thrown.
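
The difference between the two variants can be sketched locally in plain
Clojure. This is not Jobim’s implementation: the helper names (invoke-by-name,
call-async, call-blocking) are hypothetical, there is no remote node involved,
and a future stands in for the work done elsewhere. It only shows how a
function named by a string can be invoked, and how a fire-and-forget call
differs from a blocking one.

```clojure
(defn invoke-by-name
  "Resolves a fully qualified function name given as a string and applies it
  to args. Throws if the name does not resolve to a var."
  [fn-name args]
  (if-let [f (resolve (symbol fn-name))]
    (apply f args)
    (throw (IllegalArgumentException. (str "Unknown function: " fn-name)))))

(defn call-async
  "Fire-and-forget: starts the invocation on another thread and returns :ok
  immediately, without waiting for the result."
  [fn-name args]
  (future (invoke-by-name fn-name args))
  :ok)

(defn call-blocking
  "Runs the invocation on another thread and blocks until it finishes,
  returning the result or rethrowing the original failure."
  [fn-name args]
  (try
    @(future (invoke-by-name fn-name args))
    (catch java.util.concurrent.ExecutionException e
      ;; deref wraps the failure; unwrap it so callers see the real cause.
      (throw (.getCause e)))))

(call-async "clojure.core/+" [1 2])     ; returns :ok, the sum is discarded
(call-blocking "clojure.core/+" [1 2])  ; returns 3
```

The caller of the non-blocking variant never learns whether the invocation
succeeded, which is why the blocking variant is the one to use when the result
matters.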

RPC functions in Jobim accept node identifiers. We can transform the name of a
node, perhaps already retrieved with the jobim/nodes function, into a
node identifier using the jobim/resolve-node-name function.

One especially important use of the RPC functions is to start new actors in
other nodes by invoking the jobim/spawn function remotely. If we use the
blocking variant of the RPC function, we will retrieve the PID of the remote
actor and we can start sending messages to it.

As long as we have the PID of an actor, we will be able to exchange messages
with it. Besides, since the PID is just a string, we can pass the PIDs inside
messages allowing actors to be “mobile actors” in a Pi-Calculus sense.

Nevertheless, it is sometimes convenient to be able to look up an actor using
a constant reference that we know beforehand, for instance an alias, so we can
communicate with the actor without needing to know its PID.

Jobim supports this use case with the jobim/register-name
function. Using this function, we can provide a name for the PID of a process
that will be globally available to all nodes in the system.

Registered names can be queried using the jobim/registered-name
function in a similar way to the jobim/nodes function for node
names and node identifiers.

We can transform a registered name into an actor PID using the
jobim/resolve-name function, so we can pass it as an argument in
jobim/send! function calls.

Erlang systems have a very particular approach to error handling that consists
of not preventing failures but reacting quickly after a failure happens, most of
the time by restarting the failing component.

The basic mechanism behind this approach is the “linking” of processes. When two
Erlang processes are linked, any failure in one of the two processes will
produce a failure signal in the other process that, if not properly handled,
will cause the other process to fail.

Special processes, known as supervisors, take care of creating and linking to
child processes as well as handling exceptions in the children according to some
kind of recovery policy.

Distributed Erlang applications are usually arranged as trees of processes where
the process at a node handles the errors in the leaves of that node and, if it
is not able to recover from an error, dies and bubbles the error up to the upper
level.

Jobim provides limited support for this style of error handling with the
jobim/link function. The link function receives the PID of an actor
as an argument and links the calling actor and that actor bidirectionally.

From this point on, any error in one actor, or the failure of the node where one
of the actors is running, will produce a special message signaling the error in
the other actor.

This means that linked processes in Jobim must explicitly look for this kind of
message and handle it, maybe throwing an exception, to obtain a behaviour
similar to that of OTP applications.
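
The bookkeeping behind bidirectional links, and the check a linked actor has to
perform by hand, can be sketched in plain Clojure. Everything here is
hypothetical: the function names, the in-memory mailboxes and the shape of the
signal message ({:signal :link-broken ...}) are assumptions made for the
example, not the messages Jobim actually delivers.

```clojure
(def mailboxes (atom {}))   ; pid -> vector of pending messages
(def links (atom {}))       ; pid -> set of linked pids

(defn link-sketch!
  "Records the link between a and b in both directions."
  [a b]
  (swap! links #(-> %
                    (update a (fnil conj #{}) b)
                    (update b (fnil conj #{}) a)))
  :ok)

(defn notify-failure!
  "Delivers a failure signal to the mailbox of every actor linked to the
  failed one."
  [failed-pid cause]
  (doseq [peer (get @links failed-pid)]
    (swap! mailboxes update peer (fnil conj [])
           {:signal :link-broken :from failed-pid :cause cause})))

(defn handle-message
  "What a linked actor has to do explicitly: detect the signal and fail too.
  Any other message is returned unchanged for normal processing."
  [msg]
  (if (= :link-broken (:signal msg))
    (throw (ex-info "Linked actor failed" msg))
    msg))
```

An actor that runs every received message through a check like handle-message
dies when its peer dies, which approximates the default behaviour of linked
Erlang processes.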

Evented actors

The actors introduced so far are each executed in their own Java thread. This is
a huge problem since a JVM will start throwing OutOfMemoryError after a few
thousand threads are created.

On the other hand, Erlang systems can handle millions of concurrent actors in a
single node, using a preemptive scheduler that grants a fixed number of
reductions to each Erlang process being executed. Erlang processes are extremely
lightweight and can benefit from features like the linking of processes
previously discussed.

A possible alternative for building systems using a large number of actors is to
use “evented actors” so a single Java thread can execute different actors. This
solution has been explored in Scala actors.
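
The core of the evented approach can be sketched in plain Clojure: actors are
reduced to handler functions, all messages go through one shared queue, and a
single thread runs whichever handler the next message is addressed to. The
names below (handlers, events, spawn-evented-sketch, send-sketch!,
run-dispatcher) are hypothetical and this is not how Jobim implements its
evented actors; it only shows why no thread per actor is needed.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue))

;; pid -> handler function; handlers only ever run on the dispatching thread.
(def handlers (atom {}))

;; One shared queue of [pid message] pairs for every evented actor.
(def events (LinkedBlockingQueue.))

(defn spawn-evented-sketch
  "Registers a handler and returns a fresh PID string. No thread is created."
  [handler]
  (let [pid (str (java.util.UUID/randomUUID))]
    (swap! handlers assoc pid handler)
    pid))

(defn send-sketch!
  "Queues a message for the actor identified by pid."
  [pid msg]
  (.put events [pid msg])
  :ok)

(defn run-dispatcher
  "Processes n queued events on the calling thread, one handler call at a time."
  [n]
  (dotimes [_ n]
    (let [[pid msg] (.take events)]
      (when-let [handler (get @handlers pid)]
        (handler msg)))))

;; Ten thousand actors served by the single thread that runs the dispatcher.
(def received (atom 0))
(def pids
  (doall (repeatedly 10000
                     #(spawn-evented-sketch (fn [_msg] (swap! received inc))))))
(doseq [pid pids] (send-sketch! pid :ping))
(run-dispatcher (count pids))
```

The price of this design is that a handler must never block, since every other
actor sharing the thread would stall behind it.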

Jobim evented actors rely on three special functions: jobim/react, which is
equivalent to the receive function of a regular actor; jobim/react-loop, which
creates a recursive evented actor; and jobim/spawn-evented, which creates a new
evented actor returning its PID. This PID can be used with the regular
jobim/send! function to send messages to the evented actor.

The following is an evented implementation of the previously defined ping actor:

Erlang is an incredible platform for building distributed reliable
systems. An actors library providing support for distributed failure
signals and tolerance to network partitions can be a nice addition to Clojure’s
own concurrency mechanisms for building distributed applications on the JVM.
It could also be mixed with the different distribution mechanisms available on
the JVM.

Jobim is just an experiment in how this kind of system could be built using two
beautiful pieces of software, RabbitMQ and ZooKeeper.