I’ve been writing a couple of papers recently based on CHP. The latest is (provisionally) titled “Abstractions for Message-Passing” (naming suggestions are welcome!). It’s a combination of general CHP, process wiring, process composition and behaviours. I’m going to try submitting it to ICFP — I hope it’s a good enough fit, being the application of FP-like ideas to message-passing within an FP language. I’m posting it here, partly so you can read it if interested (it effectively provides a summary of a lot of my recent work in CHP), and partly in the hope that I might get some constructive feedback (small or big). The deadline is this Friday afternoon (2nd April), which I realise is now quite close, but in case anyone has time to read it this week I’d welcome any comments — for example, I can’t help but feel I’m missing more related work. Link to the current draft is here.

I’ve been doing some benchmarking with Criterion recently, and wanted to write up the results. However, I didn’t really understand how it was arriving at some of the statistics that it spits out, particularly the confidence intervals. One book on the subject later, and I think I’ve got it. I thought I’d write it down in case anyone else needed to know. (I’m not a statistics expert though, so please do send corrections if necessary.) The mean result from Criterion is straightforward; the benchmark is run (typically 100 times) and the mean is calculated in the standard way (total time divided by count). But Criterion also gives 95% confidence intervals for the mean, and these require some more detailed explanation.

Confidence Intervals

Confidence intervals are a notoriously tricky part of statistics. A confidence interval of 95% means that if you run many separate experiments, and calculate a confidence interval for each of them, 95% of those intervals will contain the true mean. The kicker being that you have no way of knowing whether the interval you have for your one experiment is one of the 95% that contain the mean, or one of the 5% that don’t.

One way to calculate a confidence interval is to make an assumption about the distribution of the data. It is important to realise that this is an assumption. For example, you might be tempted to assume the times were normally distributed — but do you really believe that your times come from a perfectly symmetric bell-curve? Run times are often skewed, closer to an exponential distribution, or simply too noisy to pick a distribution that could have generated them. (You can perform a statistical test for the validity of the normality assumption in your sample, if you’re interested.) If you did pick a distribution and calculate its parameters — e.g. the variance of a normal distribution — you could say where 95% of the values are likely to fall, because you know what shape the distribution is. But if you don’t know the shape of the real distribution (from which your noisy sample was drawn), you need a different technique; Criterion uses the bootstrap.

The Bootstrap

Imagine for a moment that rather than just one mean, you instead had 1000 means. A straightforward way to form a 95% confidence interval would be to arrange those means in ascending order, and pick the 2.5% mark (roughly the 25th smallest mean) and the 97.5% mark (roughly the 975th smallest mean). This method makes no new assumptions, and derives the interval purely from the data. Of course, you don’t have the time to wait around and record enough samples to form 1000 means. (Some of my benchmarks took on the order of 20 minutes to collect the 100 values for a single mean; for 1000 means I would have to wait a long time!)
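If we pretend for a moment that we already have a large collection of means, the percentile method is a few lines of Haskell. This is a sketch of my own (the function name is mine, not Criterion's):

```haskell
import Data.List (sort)

-- Given many sample means, take the empirical 2.5% and 97.5% marks
-- of the sorted list to form a 95% confidence interval.
percentileCI :: [Double] -> (Double, Double)
percentileCI means = (sorted !! lo, sorted !! hi)
  where
    sorted = sort means
    n      = length means
    lo     = floor   (0.025 * fromIntegral (n - 1))
    hi     = ceiling (0.975 * fromIntegral (n - 1))

main :: IO ()
main = print (percentileCI [1..1000])  -- prints (25.0,976.0)
```

With 1000 means, this picks out the 25th and 976th smallest values, matching the rough description above.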

The bootstrap technique generates a lot of means by “faking” many more samples than you actually recorded and taking the mean of each of them. It takes your single sample of 100, and generates many more samples (in Criterion, 100000 by default) by randomly picking 100 values from it. Obviously, if you picked 100 values from a set of 100 without replacement you’d always end up with the original set; the bootstrap picks with replacement, so each new fake sample is very likely to repeat some of the original values and to omit others. The confidence interval can then be formed from these fake samples as described in the previous paragraph.

That is the intuition of the bootstrap method of calculating confidence intervals. In fact, Criterion uses bootstrap with bias correction and acceleration. These two extra measures affect how the confidence intervals are calculated by attempting to adjust for skew and sensitivity in the data. Unfortunately the reference books I have available are not very illuminating on the intuition behind bias and acceleration — so you and I will have to remain in the dark, pending a better reference.

The Code

Often, programmers learn a system best by tinkering and seeing what happens. In that spirit, here is a little code that takes a sample and prints out the bootstrapped mean confidence intervals for it (as Criterion would derive them):
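The original snippet isn't reproduced here, but the essence — minus Criterion's bias correction and acceleration — can be sketched in plain Haskell. Everything below is my own stand-in, not Criterion's code; in particular, a small deterministic linear-congruential generator takes the place of a proper random-number source, to keep the sketch dependency-free:

```haskell
import Data.List (sort)

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- A deterministic LCG stands in for a real RNG (an assumption of this
-- sketch; Criterion uses a proper generator).
pseudoRandoms :: Int -> [Int]
pseudoRandoms = tail . iterate (\s -> 6364136223846793005 * s + 1442695040888963407)

-- Percentile bootstrap: resample the data with replacement many times,
-- take the mean of each fake sample, and read the 2.5%/97.5% marks off
-- the sorted list of means.
bootstrapCI :: Int -> [Double] -> (Double, Double)
bootstrapCI resamples xs = (means !! lo, means !! hi)
  where
    n     = length xs
    idxs  = chunksOf n (map (`mod` n) (pseudoRandoms 42))
    means = sort [ mean (map (xs !!) is) | is <- take resamples idxs ]
    lo    = floor   (0.025 * fromIntegral (resamples - 1))
    hi    = ceiling (0.975 * fromIntegral (resamples - 1))
    chunksOf k ys = let (a, b) = splitAt k ys in a : chunksOf k b

main :: IO ()
main = print (bootstrapCI 1000 [1..100])
```

Running this on the sample [1..100] (true mean 50.5) gives an interval straddling 50.5, a few units wide either side.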

I’ve also now created a public darcs repository on patch-tag that holds the CHP code:

darcs get http://patch-tag.com/r/twistedsquare/chp

While I’m tidying things up: I wired up the blog to a twitter account a while ago and then never told anyone about it. If you want to follow the blog through twitter (I don’t yet use the account for much more than that), you can do so.

While giving a talk last week, someone asked about obeying the arrow laws with CHP processes. The long and the short of it is that the arrow laws mean that arr id >>> arr id should be equivalent to arr id. But with the instance I have in chp-plus, the former is two processes and the latter one — and each process introduces one place of buffering. If you put in the same input you’ll get the same output with both, but with different buffering behaviour (this is already pointed out in the documentation). The same issue came up again later in the day with process fusion (more on that another time).

Extended Inputs

The way to resolve the problem of how much buffering to introduce is quite simple in retrospect: don’t introduce any buffering, anywhere. Then there is no issue of adding or removing it with the laws. We can easily eliminate buffering in CHP using extended input. Extended input means that when you read from a channel, you keep the writer waiting longer than usual, while the reader performs an extra action. The API in CHP is quite straightforward:

extReadChannel :: ReadableChannel r => r a -> (a -> CHP b) -> CHP b

The function takes an incoming channel end and an action to perform with the value, and performs an extended input using that action. Note that if you need the read value after the action, you’ll have to return it yourself as part of the “b” type.
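For example, a plain identity process and an extended-input identity process could be written like this. This is a sketch against the chp package's API; the process names (id', extReadId) are mine:

```haskell
import Control.Monad (forever)
import Control.Concurrent.CHP

-- Plain identity: read a value, then (as a separate step) write it on.
id' :: Chanin a -> Chanout a -> CHP ()
id' input output = forever (readChannel input >>= writeChannel output)

-- Extended identity: write the value on *during* the extended input,
-- keeping the original writer waiting until the onward write is done.
extReadId :: Chanin a -> Chanout a -> CHP ()
extReadId input output = forever (extReadChannel input (writeChannel output))
```

The extended version never holds a value between a read and a write, so it introduces no buffering.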

The plain identity process (which reads a value and then, in a separate step, writes it on) and the extended identity process (which writes the value on during the extended input) share the same type but have slightly different behaviour. The key thing is that two of these extended identity processes composed next to each other have the same behaviour as one of these extended identity processes: extReadId <=> extReadId = extReadId. So if we build our arrow instance on these, we can avoid the buffering problem and fully obey the arrow laws.

Proof

This claim of identical behaviour can actually be proved using the FDR model checker for CSP. FDR doesn’t know about extended inputs, but an extended input can be formalised using a channel communication followed by an acknowledgement; the extended action takes place before the acknowledgement is sent. So here is the extended identity process from above in FDR script, parameterised by two pairs of data and ack channels:

The two composed processes synchronise together on mid2C and midack2C (but not the other channels). We then hide (backslash is hide) these events so that they are not visible outside. Without this hiding we would not have equivalence because some outside process could synchronise on mid2C and break everything; hiding makes sure these events are contained. All we then need are some more channel declarations and our assertions of equality:

The last two lines express refinement; you can think of this as expressing that one process is as general in its behaviour as another. The refinement operator is anti-symmetric, so expressing it twice in different directions like this is asserting equality of the two processes — a strong relationship. Putting all the above together we can run FDR and get the vital output:

Inductively, if composing two extended identity processes together is the same as one extended identity process, any chain of N extended identity processes is equivalent to a single one. And the map process (on which the arrow instance is based) has the same communication pattern, so the result applies there too if we construct an extended map process: extMap f <=> extMap g = extMap (g . f)

Next Monday morning I’m going to be in Portland (Oregon), and will be giving a talk at Galois on the CHP library. On the off-chance that there are any readers from that area: the talk is open to all, so feel free to come along. It will be an introduction to the library and some of its capabilities.

Update: the talk seemed to go well; here are the Galois talk slides (with notes). I’ve added various notes amongst the slides to help it be understandable without me gesturing at them. The chats afterwards with some of the Galois guys were very useful and interesting, and have given me a lot to think (and blog) about in CHP.

An MVar in Haskell is a shared variable that is either full or empty. Trying to write to a full one, or read from an empty one, will cause you to block. It can be used as a one-place buffered asynchronous channel. Suppose you didn’t want choice, or conjunction, or any of the fancy features of CHP, but you did want to build a synchronous channel using MVars. How would you do it?

MVars — The Obvious Way

There is a very straightforward way to turn asynchronous channels into synchronous channels: you form a pair of channels, and use one for the writer to send the reader the data, and the other for the reader to send the writer an acknowledgement:
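The original code isn't shown above; a minimal sketch of that two-MVar design (all the names are mine) might look like this:

```haskell
import Control.Concurrent

-- One MVar carries the data from writer to reader; the other carries
-- the reader's acknowledgement back, making the exchange synchronous.
data SyncChan a = SyncChan (MVar a) (MVar ())

newSyncChan :: IO (SyncChan a)
newSyncChan = SyncChan <$> newEmptyMVar <*> newEmptyMVar

writeSync :: SyncChan a -> a -> IO ()
writeSync (SyncChan dat ack) x = do putMVar dat x  -- deposit the value
                                    takeMVar ack   -- wait for the reader

readSync :: SyncChan a -> IO a
readSync (SyncChan dat ack) = do x <- takeMVar dat -- wait for the value
                                 putMVar ack ()    -- wake the writer
                                 return x

main :: IO ()
main = do c <- newSyncChan
          _ <- forkIO (writeSync c (42 :: Int))
          readSync c >>= print  -- prints 42
```

The writer always waits for the acknowledgement, which is what produces the extra block analysed below.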

Let’s assume that context-switching is the major cost in these algorithms, and examine how many times a process must block in the above algorithm. We know that this must be at least one; whoever arrives first will have to block to wait for the second participant.

We’ll start with what happens if the writer arrives first. The writer arrives, puts the value into the data MVar, then blocks waiting for the ack. The reader arrives, takes the data and sends the ack, at which point the writer wakes up. So in this case: one block.

If the reader arrives first, it will block waiting for the data MVar. The writer will arrive, put the value into the data MVar, then block waiting for the ack. The reader will wake up, take the data and send the ack; then the writer will wake up. So here we had two blocks. The writer blocking is unnecessary; if the reader was already there waiting, there is no need for the writer to wait for an ack — it could just deposit the value and go, if it knew the reader was there.

MVars — The Faster Way

There are several ways to remove that second block. One way is to have an MVar as a sort of status MVar. When the reader or writer arrives, they try to put into this MVar. If they succeed, they are first and they wait on a second MVar. If they fail, they are second and act accordingly, emptying the status variable and waking up the first party:
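Again the original code isn't reproduced here; below is my own sketch of that status-MVar design, for a single writer and single reader per channel (a Just in the status MVar means the writer arrived first, carrying its value):

```haskell
import Control.Concurrent

-- The status MVar decides who arrived first; the slot MVar wakes the
-- first party.  Whoever arrives second never blocks.
data FastChan a = FastChan (MVar (Maybe a)) (MVar (Maybe a))

newFastChan :: IO (FastChan a)
newFastChan = FastChan <$> newEmptyMVar <*> newEmptyMVar

writeFast :: FastChan a -> a -> IO ()
writeFast (FastChan status slot) x =
  do first <- tryPutMVar status (Just x)
     if first
       then () <$ takeMVar slot       -- first: block until the reader acks
       else do _ <- takeMVar status   -- reader already waiting: clear status,
               putMVar slot (Just x)  -- hand the value over without blocking

readFast :: FastChan a -> IO a
readFast (FastChan status slot) =
  do first <- tryPutMVar status Nothing
     if first
       then do Just x <- takeMVar slot   -- first: block until the value comes
               return x
       else do Just x <- takeMVar status -- writer already waiting: take value,
               putMVar slot Nothing      -- ack to wake the writer
               return x

main :: IO ()
main = do c <- newFastChan
          _ <- forkIO (writeFast c "hello")
          readFast c >>= putStrLn  -- prints hello
```

Whichever order the two parties arrive in, exactly one of them blocks: the loser of the tryPutMVar race finds the status MVar full and completes the exchange immediately.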

This version is, in my benchmarks, twice as fast as the first version, which suggests that context-switching really is the expensive part of these algorithms. In fact, I started out with the first version in this post, but CHP’s more featured and complex algorithms were coming out faster because CHP only ever blocks once. It was only when I improved the MVar version to the second one above that the results were as I expected.

I’m currently writing a paper on CHP’s performance for conjunction, which I have been optimising recently. The problem with a new feature like conjunction is that there is nothing else to benchmark it against! But I can benchmark the effect that supporting conjunction has on performance for simple channel communications and other things that don’t feature conjunction.

Two of my comparisons are simple synchronous channels based on MVars and STM. These don’t support choice between events — you can’t choose between writing to two synchronous channels built on MVars or STM without more machinery on top. But they are fast. Another comparison is the CML package, which does support such choice between events — the performance of CML merits its own post some time (in short: fine normally, but terrible if you use its choice operator a lot — unless I have made a mistake in the benchmark).

I also wanted to benchmark an implementation that supported choice but not conjunction, based on STM. Version 1.0.0 of CHP fulfils these criteria, but it was badly designed and totally unoptimised — and I know from my recent optimisations how bad its performance might be. So I constructed an optimised version of channels with choice but no conjunction. I was surprised at how short the algorithm was, and realised that it could be explained in a blog post. So here it is.

Implementing Event Choice with STM

Let’s be clear on the problem, first. I want an Event type such that I can say “wait for event A or event B, whichever happens first, but only engage in one of them”. Then I want someone else to be able to concurrently say “wait for event B or event C or event D, whichever happens first, but only engage in one of them” and have the algorithm resolve it all. STM doesn’t achieve this by itself; you can use orElse to choose between writing to two variables, but that doesn’t suffice for multiple people engaging in events with each other.

We begin with a helper function — one of those functions that is general enough that it might almost feature in a standard library. Here it is:
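The helper itself isn't reproduced above. Judging from how it is used later (waiting for a Maybe-holding transactional variable to become a Just), a plausible reconstruction — mine, not necessarily the original — is:

```haskell
import Control.Concurrent.STM

-- Block (via retry) until the variable holds a Just, then return the
-- contents.  General enough that it could almost be in a library.
waitForJust :: TVar (Maybe a) -> STM a
waitForJust tv = do m <- readTVar tv
                    case m of
                      Nothing -> retry
                      Just x  -> return x

main :: IO ()
main = do tv <- newTVarIO (Just (5 :: Int))
          atomically (waitForJust tv) >>= print  -- prints 5
```

The retry aborts the transaction and sleeps until one of the variables it read changes, so this is an efficient blocking wait rather than a busy loop.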

Next we’ll declare our data-types. Our Event contains a constant enrollment count (the number of processes required to synchronise together), and a transactional variable holding a list of current offers, each with an associated STM action. An offer is in turn a list of events which uses a ThreadId as a unique identifier; think of an offer as saying: I offer to engage in exactly one of the events in the list, and I’m waiting until I do:

Adding an offer to an event is as simple as adding it to the list of offers. My modifyTVar' function has type (a -> a) -> TVar a -> STM () and applies the modification function to the contents of the TVar, but it adds a little strictness that helps performance:
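With the type given above, a sketch of that strict modify (note the function-first argument order stated in the text, which differs from the modifyTVar' that later appeared in the stm package, hence the hiding clause):

```haskell
import Control.Concurrent.STM hiding (modifyTVar')

-- Apply f to the contents of the TVar, forcing the new value with $!
-- as it is written, so unevaluated thunks don't pile up in the TVar.
modifyTVar' :: (a -> a) -> TVar a -> STM ()
modifyTVar' f tv = do x <- readTVar tv
                      writeTVar tv $! f x

main :: IO ()
main = do tv <- newTVarIO (10 :: Int)
          atomically (modifyTVar' (+1) tv)
          readTVarIO tv >>= print  -- prints 11
```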

We also define a function for checking if an event is able to complete (when we are making an offer on it). This takes an event, and an action to perform if the event can complete. It then reads the current offers from the event’s transactional variable. If the enrollment count is equal to the number of current offers plus one (the offer now being made), it can complete. Completion involves performing all the associated STM actions, and then using a revoke function to remove the offers (which have now chosen this event, say: A) from all the events that they had offered on (e.g. event A, event B, event C):
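The declarations and the completion check aren't reproduced above; here is a self-contained reconstruction of roughly that shape. All the names are mine, and the representation (each offer stored alongside its STM action) is an assumption based on the description:

```haskell
import Control.Concurrent (ThreadId)
import Control.Concurrent.STM

-- An event: a constant enrollment count, plus the current offers
-- (each paired with its associated STM action).
data Event = Event { enrolled :: Int
                   , offersTV :: TVar [(Offer, STM ())] }

-- An offer: which events one process is offering on, identified by
-- its ThreadId.
data Offer = Offer { offerId     :: ThreadId
                   , offerEvents :: [Event] }

-- Try to complete an event, given the action of the offer now being
-- made.  If the enrollment count equals the existing offers plus one,
-- run every action and revoke the offers everywhere; return whether
-- completion happened.
tryComplete :: Event -> STM () -> STM Bool
tryComplete ev newAct =
  do offers <- readTVar (offersTV ev)
     if enrolled ev == length offers + 1
       then do newAct
               mapM_ snd offers            -- run each offer's action
               mapM_ (revoke . fst) offers -- remove the offers everywhere
               return True
       else return False

-- Remove an offer from every event it was made on.
revoke :: Offer -> STM ()
revoke off = mapM_ rm (offerEvents off)
  where rm ev = do os <- readTVar (offersTV ev)
                   writeTVar (offersTV ev)
                             [ o | o@(o', _) <- os, offerId o' /= offerId off ]

main :: IO ()
main = do done <- newTVarIO False
          offs <- newTVarIO []
          let ev = Event 1 offs         -- enrollment count of one
          ok <- atomically (tryComplete ev (writeTVar done True))
          d  <- readTVarIO done
          print (ok, d)  -- prints (True,True)
```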

We only require one further function. This function, offerAll, handles the creation of an offer: it checks whether any of the events in the offer can complete immediately, and otherwise records the offer against each event and waits for one of them to be completed by a later participant. It must use two transactions for this; one to make the offers (this transaction needs to finish for the offers to be visible to the other participants) and one to wait for an event to be completed. A crucial part of the function is not just knowing that an offer completed, but also knowing which one. For this we construct a TVar of our own into which a result can be written. This starts off as Nothing, and we later wait for it to become a Just value. We augment the user-supplied action-on-completion to also write into this TVar. The design of the algorithm as a whole ensures that this variable will only ever be written to once. Here is offerAll:
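The two-transaction shape can be seen in isolation in this sketch. The name and types are mine; the real offerAll also builds the offer, checks each event for immediate completion, and records the offers:

```haskell
import Control.Concurrent.STM

-- Transaction 1 publishes the offer (and must commit, so that the
-- offer becomes visible to other participants); transaction 2 then
-- blocks until someone fills in the result variable.
offerThenWait :: (TVar (Maybe a) -> STM ()) -> IO a
offerThenWait publish =
  do resultTV <- newTVarIO Nothing          -- written exactly once, later
     atomically (publish resultTV)          -- first transaction commits here
     atomically $ do m <- readTVar resultTV -- second transaction: wait...
                     maybe retry return m   -- ...until the result appears

main :: IO ()
main = offerThenWait (\tv -> writeTVar tv (Just (3 :: Int))) >>= print
```

Here the publish action happens to complete the offer itself (writing the result immediately), standing in for the "can complete straight away" case; otherwise the second transaction would retry until another participant wrote the Just.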

This is all that is needed for events with choice at both ends. You call offerAll with a list of offers and it gives you back the value you associated with that offer.

The Public API

To wrap this into a communication channel with a CML-like API, we wrap it up as follows. First we declare an SEvent type (named after CML, hence the re-use of the term event for another meaning) that represents a synchronisation action; this is a list (of choices), each containing an internal event, an action to perform during the completion of the offer, and one to perform afterwards that will yield a return value (which we can use for a Functor instance):

You can see that the choose function simply joins lists of choices together. We define our synchronisation function using offerAll, which will return the corresponding afterwards-STM-action for the chosen event, which we then execute using atomically:

The send function puts the value to send into the variable during the original event completion, and then afterwards the reader takes the value out of the variable at its leisure. (The implementation assumes that the same participants will use the channel each time; an extra level of indirection could be added to make the implementation more flexible in this regard.)

The code in this post provides nearly the same functionality as the CML library, but my tests indicate it is faster. I have now uploaded this code (with some documentation) as the sync package on Hackage. This provides a useful “lite” alternative to CHP that runs in the IO monad, and an alternative implementation of most of the features of the CML package.