Messages and Agents

In this post, we'll look at the message-based (or actor-based) approach to concurrency.

In this approach, when one task wants to communicate with another, it sends it a message, rather than contacting it directly. The messages are put on a queue, and the receiving task (known as an "actor" or "agent") pulls the messages off the queue one at a time to process them.

This message-based approach has been applied to many situations, from low-level network sockets (built on TCP/IP) to enterprise-wide application integration systems (for example, MSMQ or IBM WebSphere MQ).

From a software design point of view, a message-based approach has a number of benefits:

You can manage shared data and resources without locks.

You can easily follow the "single responsibility principle", because each agent can be designed to do only one thing.

It encourages a "pipeline" model of programming with "producers" sending messages to decoupled "consumers", which has additional benefits:

The queue acts as a buffer, eliminating waiting on the client side.

It is straightforward to scale up one side or the other of the queue as needed in order to maximize throughput.

Errors can be handled gracefully, because the decoupling means that agents can be created and destroyed without affecting their clients.

From a practical developer's point of view, what I find most appealing about the message-based approach is that when writing the code for any given actor, you don't have to hurt your brain by thinking about concurrency. The message queue forces a "serialization" of operations that otherwise might occur concurrently. And this in turn makes it much easier to think about (and write code for) the logic for processing a message, because you can be sure that your code will be isolated from other events that might interrupt your flow.

With these advantages, it is not surprising that when a team inside Ericsson wanted to design a programming language for writing highly-concurrent telephony applications, they created one with a message-based approach, namely Erlang. Erlang has now become the poster child for the whole topic, and has created a lot of interest in implementing the same approach in other languages.

How F# implements a message-based approach

F# has a built-in agent class called MailboxProcessor. These agents are very lightweight compared with threads - you can instantiate tens of thousands of them at the same time.
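As a minimal sketch of what an agent looks like (a "hello world" before the real examples below), here is an agent that simply prints each message it receives:

```fsharp
// a simple agent that prints every message it receives
let printerAgent = MailboxProcessor.Start(fun inbox ->

    // the message processing function
    let rec messageLoop () = async {

        // read a message
        let! msg = inbox.Receive()

        // process the message
        printfn "message is: %s" msg

        // loop to top
        return! messageLoop ()
        }

    // start the loop
    messageLoop ()
    )

// post some messages onto the agent's queue
printerAgent.Post "hello"
printerAgent.Post "hello again"
```

`Post` puts a message on the agent's internal queue and returns immediately; the agent pulls the messages off and processes them one at a time on its own.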

These are similar to the agents in Erlang, but unlike the Erlang ones, they do not work across process boundaries, only in the same process.
And unlike a heavyweight queueing system such as MSMQ, the messages are not persistent. If your app crashes, the messages are lost.

But these are minor issues, and can be worked around. In a future series, I will go into alternative implementations of message queues. The fundamental approach is the same in all cases.

In the rest of this post we'll look at two more substantial examples:

Managing shared state without locks

Serialized and buffered access to shared IO

In both of these cases, a message-based approach to concurrency is elegant, efficient, and easy to program.

Managing shared state

Let's look at the shared state problem first.

A common scenario is that you have some state that needs to be accessed and changed by multiple concurrent tasks or threads.
We'll use a very simple case, and say that the requirements are:

A shared "counter" and "sum" that can be incremented by multiple tasks concurrently.

Changes to the counter and sum must be atomic -- we must guarantee that they will both be updated at the same time.

The locking approach to shared state

Using locks or mutexes is a common solution for these requirements, so let's write some code using a lock, and see how it performs.

First let's write a static LockedCounter class that protects the state with locks.

open System
open System.Threading
open System.Diagnostics

// a utility function
type Utility() =
    static let rand = new Random()

    static member RandomSleep() =
        let ms = rand.Next(1,10)
        Thread.Sleep ms

// an implementation of a shared counter using locks
type LockedCounter() =

    static let _lock = new Object()

    static let mutable count = 0
    static let mutable sum = 0

    static let updateState i =
        // increment the counters and...
        sum <- sum + i
        count <- count + 1
        printfn "Count is: %i. Sum is: %i" count sum

        // ...emulate a short delay
        Utility.RandomSleep()

    // public interface to hide the state
    static member Add i =
        // see how long a client has to wait
        let stopwatch = new Stopwatch()
        stopwatch.Start()

        // start lock. Same as C# lock{...}
        lock _lock (fun () ->

            // see how long the wait was
            stopwatch.Stop()
            printfn "Client waited %i" stopwatch.ElapsedMilliseconds

            // do the core logic
            updateState i
            )
        // release lock

Some notes on this code:

This code is written using a very imperative approach, with mutable variables and locks.

The public Add method uses the built-in lock function to acquire and release the lock around the critical section. This is the equivalent of the lock{...} statement in C#.

We've also added a stopwatch to measure how long a client has to wait to get the lock.

The core "business logic" is the updateState method, which not only updates the state, but adds a small random wait as well to emulate the time taken to do the processing.
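The timing results discussed below come from exercising the counter from several tasks in parallel. The original test harness is not shown here; a sketch under that assumption might look like this (makeCountingTask is a hypothetical helper):

```fsharp
// a hypothetical helper: create a task that adds to a counter several times
let makeCountingTask addFunction = async {
    for i in [1..3] do
        addFunction i
    }

// create 5 tasks that all use LockedCounter.Add, and run them in parallel
[1..5]
|> List.map (fun _ -> makeCountingTask LockedCounter.Add)
|> Async.Parallel
|> Async.RunSynchronously
|> ignore
```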

Oh dear! When several tasks run in parallel, most of them end up waiting quite a while. If two tasks want to update the state at the same time, one must wait for the other's work to complete before it can do its own work, which hurts performance.

And if we add more and more tasks, the contention will increase, and the tasks will spend more and more time waiting rather than working.

The message-based approach to shared state

Let's see how a message queue might help us. Here's the message based version:

type MessageBasedCounter() =

    static let updateState (count, sum) msg =
        // increment the counters and...
        let newSum = sum + msg
        let newCount = count + 1
        printfn "Count is: %i. Sum is: %i" newCount newSum

        // ...emulate a short delay
        Utility.RandomSleep()

        // return the new state
        (newCount, newSum)

    // create the agent
    static let agent = MailboxProcessor.Start(fun inbox ->

        // the message processing function
        let rec messageLoop oldState = async {

            // read a message
            let! msg = inbox.Receive()

            // do the core logic
            let newState = updateState oldState msg

            // loop to top
            return! messageLoop newState
            }

        // start the loop
        messageLoop (0,0)
        )

    // public interface to hide the implementation
    static member Add i = agent.Post i

Some notes on this code:

The core "business logic" is again in the updateState method, which has almost the same implementation as the earlier example, except that the state is immutable: a new state is created and returned to the main loop.

The agent reads messages (simple ints in this case) and then calls the updateState method.

The public method Add posts a message to the agent, rather than calling the updateState method directly

This code is written in a more functional way; there are no mutable variables and no locks anywhere. In fact, there is no code dealing with concurrency at all!
The code only has to focus on the business logic, and is consequently much easier to understand.

We can't measure the waiting time for the clients, because there is none!
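For comparison, the same kind of parallel test can be pointed at MessageBasedCounter. Again, the exact harness is an assumption; a sketch:

```fsharp
// a hypothetical helper, as before: a task that posts several increments
let makeCountingTask addFunction = async {
    for i in [1..3] do
        addFunction i
    }

// create 5 tasks that all use MessageBasedCounter.Add, and run them in parallel
[1..5]
|> List.map (fun _ -> makeCountingTask MessageBasedCounter.Add)
|> Async.Parallel
|> Async.RunSynchronously
|> ignore
```

Each call to Add just posts a message and returns immediately, so the clients never block; the agent drains the queue at its own pace.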

Shared IO

A similar concurrency problem occurs when accessing a shared IO resource such as a file:

If the IO is slow, the clients can spend a lot of time waiting, even without locks.

If multiple threads write to the resource at the same time, you can get corrupted data.

Both problems can be solved by using asynchronous calls combined with buffering -- exactly what a message queue does.

Next, we'll consider the example of a logging service that many clients will write to concurrently.
(In this trivial case, we'll just write directly to the Console.)

We'll first look at an implementation without concurrency control, and then at an implementation that uses message queues to serialize all requests.

IO without serialization

In order to make the corruption very obvious and repeatable, let's first create a "slow" console that writes each individual character in the log message
and pauses for a millisecond between each character. During that millisecond, another thread could be writing as well, causing an undesirable
interleaving of messages.
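Such a slow console, and an UnserializedLogger that writes to it directly, might be sketched like this (the names slowConsoleWrite and UnserializedLogger match those used later in this post, but the exact implementation is an assumption):

```fsharp
/// write a string to the console one character at a time,
/// pausing for a millisecond between each character
let slowConsoleWrite msg =
    msg |> String.iter (fun ch ->
        System.Threading.Thread.Sleep(1)
        System.Console.Write ch
        )

// a logger with no concurrency control at all
type UnserializedLogger() =
    // public interface
    member this.Log msg = slowConsoleWrite msg

// test in isolation
let unserializedLogger = UnserializedLogger()
unserializedLogger.Log "hello"
```

Run in isolation this works fine, but as soon as several threads call Log at once, their characters interleave and the output is garbage.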

Serialized IO with messages

So what happens when we replace the UnserializedLogger with a SerializedLogger class that encapsulates a message queue?

The agent inside SerializedLogger simply reads a message from its input queue and writes it to the slow console. Again there is no code dealing with concurrency and no locks are used.

type SerializedLogger() =

    // create the mailbox processor
    let agent = MailboxProcessor.Start(fun inbox ->

        // the message processing function
        let rec messageLoop () = async {

            // read a message
            let! msg = inbox.Receive()

            // write it to the log
            slowConsoleWrite msg

            // loop to top
            return! messageLoop ()
            }

        // start the loop
        messageLoop ()
        )

    // public interface
    member this.Log msg = agent.Post msg

// test in isolation
let serializedLogger = SerializedLogger()
serializedLogger.Log "hello"

So now we can repeat the earlier unserialized example but using the SerializedLogger instead. Again, we create five child tasks and run them in parallel:
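A sketch of that parallel test, under the assumption of a hypothetical makeLoggingTask helper like the one used for the counters:

```fsharp
// a hypothetical helper: a task that logs one message
let makeLoggingTask (logger: SerializedLogger) msg = async {
    logger.Log msg
    }

// create 5 child tasks and run them in parallel
let logger = SerializedLogger()
[1..5]
|> List.map (fun i -> makeLoggingTask logger (sprintf "message %i" i))
|> Async.Parallel
|> Async.RunSynchronously
|> ignore
```

This time each message comes out intact: the tasks merely post to the queue, and the agent writes the messages to the slow console one at a time.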