Thursday, 4 December 2008

STM with control: Communication for retrying transactions

While working on the join pattern with guards and propagation implementation, we (my PhD student Edmund and me) came across an interesting problem which points out the current limited control/communication capabilities of retrying STM transactions (in Haskell). More on join patterns later. Here's the gist of the problem and a solution in form of a new Haskell library.

The task

We're given a list of switches, they can either be on or off

> data Switch = On | Off

If all switches are on, we switch them all off and report True If one switch is off, we do nothing and report False.

In a sequential setting, this task is straightforward. But what if the switches are shared among concurrent threads performing the above task?

In a concurrent setting, the task must be formulated as follows.

Atomically (If all switches are on, we switch them all off and report TrueIf one switch is off, we do nothing and report False. )

We first check that all switches are on. Only then we switch all of them off. Isolation and atomicity of STM are important here.

Isolation guarantees that no intermediate step is visible to any other thread. After checking that all switches are on, we have the guarantee that they are still on, once we switch them off. Atomicity guarantees that all operations are executed all at once or not all which implies that some of the initially read switch values has changed. In this case, we abort the transaction and discard all changes so far and start over again.

It's tempting to immediately turn a switch off once we've read that the switch is on. But what if we read that a switch is still on? We need some additionally functionality to explicitly abort the transactions. That's what the retry and orElse combinators are for.

The loop reads the status of each switch and turns the switch off if the switch is on. Otherwise, we 'abort' by returning False. For example, in case of the initial list [On, On, Off, On] we reach the third switch and the intermediate result [Off, Off, Off, On]. We can't just naively exit at this stage because the tasks demands that all changes are done atomically, ie to its full extent or not at all.

Here comes the rescue. The retry combinator allows the programmer to explicitly abort a transaction. That is, we discard all changes (which is important) here. The orElse combinator allows the programmer to compose two transaction. The second transaction will be executed if the first transaction retries. Exactly what we require.

Making the task more interesting

Suppose each switch is attached to an action, sort of a continuation. Let's say an IO action.

> type Cnt = IO ()

We want to return the continuation of the switch which first 'failed'. Here's the updated task description.

Atomically (If all switches are on, we switch them all off and report TrueOtherwise, return the continuation of the first turned on switch.)

The signature of the function we need to implement

verifyAndSwitchOffCnt :: [(TVar Switch, Cnt)] -> IO Cnt

What are our options how to implement the above?

We could adapt our first solution verifyAndSwitchOff. This is pretty straightforward. What about verifyAndSwitchOff2? Previously, we simply return False in the second orElse branch. For the more interesting task, we need to iterate over the list of pairs of switches and continuation and select the continuation of the first switch which is still on. This means that in both case we possibly need to iterate over the list twice. This is inefficient and leads to clumsy code.

Below we make use of an extension of STM where a retrying transaction can pass some information to the second transaction composed with orElse.

The extension is implemented as the STMControl.STMC library. The new primitives are:

STMC stands for STM with more control. The first parameter refers to the type of the value transmited in case of a retryC. The parameter in the orElseC combinator is now a function which accepts the result transmitted of the retrying transaction.

The implementation is fairly simple and uses an (abused) reader monad on top of STM. Here's finally an application of the new functionality.

The loop checks and resets all switches. In case a switch is turned off we return the attached continuation. We use the Maybe type to indicate failure (Just c) or success (Nothing). In case of success, we return the donothing continuation. Otherwise, we retryC to discard all changes made so far and pass the continuation to the second transaction which then returns this continuation.

I'd say compared to the other options, the STMC solution looks pretty clean and easy to understand.

Making the task even more interesting

That's the final extension, I promise. Suppose that to each switch there's also an attached STM Bool transaction. In addition to testing that the switch is off, we also test that the STM Bool transaction evaluates to True. Otherwise, we discard our changes and return the attached continuation.

The required changes to our previous solution are minor. In fact, I can't see how to implement the even more interesting task with STM alone. I believe that this task strictly requires STMC or something similar.

The complete source code of this example is available as part of the stmcontrol package on the Haskell platform hackage.

Implementation details

What if the transaction is aborted just after execution of L1? Is our use of unsafeIOToSTM (un)safe? It's safe. It's okay to discard the IO action in case the transaction is rolled back. The important point is that if the transaction is explicitly retried, the IO action has been executed (which is the case as far as I understand the interplay of STM and unsafeIOToSTM).

4 comments:

You could also do this with an exception, instead of the retry/orElse pair, since an exception aborts the transaction in the scope of the handler. Using a custom exception type would avoid the need for the reader monad too, I think.

Interesting thought. You're saying that an exception raised within a transaction will cancel all updates and the control then flows to the handler which catches the exception. Indeed, no retry/orElse and no reader monad required.

In fact, it almost seems that retry/orElse are redundant given we can raise exceptions in transactions?