2012 July 10

Announcing split-channel

The split-channel package is a new library that is a small variation on Control.Concurrent.Chan. The most obvious change is that it splits the channel into sending and receiving ports. This has at least two advantages: first, it enables the type system to constrain program behavior more finely; second, a SendPort can have zero ReceivePorts associated with it, and messages written to such a channel can be garbage collected.
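For a taste of what the finer-grained types buy you, here is a small sketch. The new, send, and receive names below follow split-channel's Control.Concurrent.Chan.Split module, but treat the exact signatures as my assumptions and check the Hackage docs:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.Chan.Split  -- from the split-channel package

-- The type guarantees this worker can only consume messages;
-- with a plain Chan, nothing stops a "reader" from also writing.
logger :: ReceivePort String -> IO ()
logger r = receive r >>= putStrLn >> logger r

main :: IO ()
main = do
    (s, r) <- new          -- s :: SendPort String, r :: ReceivePort String
    _ <- forkIO (logger r)
    send s "hello"
    threadDelay 100000     -- give the logger a moment to print
```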

This library started life last fall as part of my experiments in adding support for PostgreSQL’s asynchronous notifications to Chris Done’s native pgsql-simple library. The initial motivation was that if a notification arrived and nobody was listening, I wanted to be able to garbage collect it. However, the type advantages are what keep me coming back.

Beyond the primary change, this library has a number of other small improvements over Control.Concurrent.Chan: the deprecated thread-unsafe functions aren’t there, and several operators have been added or improved, most notably listen, sendMany, fold, and split.

listen attaches a new ReceivePort to an existing SendPort. By contrast, Chan only provides the ability to duplicate an existing ReceivePort.

Edit: I was mistaken: listen is essentially equivalent to dupChan, whereas duplicate is new.
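Sketched usage of the two, assuming the names above and dupChan-like broadcast semantics, where each port receives its own copy of every message sent after the port exists:

```haskell
import Control.Concurrent.Chan.Split  -- from the split-channel package

main :: IO ()
main = do
    (s, r1) <- new
    r2 <- listen s       -- fresh ReceivePort from the SendPort (dupChan-like)
    r3 <- duplicate r1   -- fresh ReceivePort from an existing ReceivePort
    send s "broadcast"
    -- each port gets its own copy of the message
    mapM_ (\r -> receive r >>= putStrLn) [r1, r2, r3]
```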

sendMany sends a list of messages atomically. It’s a better name than writeList2Chan, which is not atomic and is only a convenience function written in terms of send. However, writeList2Chan does work on infinite streams, whereas sendMany does not.
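A sketch of the difference, assuming split-channel's sendMany: with writeList2Chan, another thread's write could land between “one” and “two”, whereas here the batch is enqueued as a unit.

```haskell
import Control.Concurrent.Chan.Split  -- from the split-channel package
import Control.Monad (replicateM)

main :: IO ()
main = do
    (s, r) <- new
    -- The three messages are enqueued as one atomic operation, so a
    -- concurrent sender cannot interleave its own message among them.
    sendMany s ["one", "two", "three"]
    replicateM 3 (receive r) >>= mapM_ putStrLn
```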

fold is a generalization of getChanContents: instead of always producing a lazy list of messages, it folds an arbitrary function over them, potentially avoiding the construction of an intermediate data structure.
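To illustrate the idea with a base-only sketch (this is my illustration of the technique, not split-channel's actual fold; see the Hackage docs for its real signature): getChanContents is essentially a lazy fold with (:), and generalizing the combining function means no intermediate list need ever be built.

```haskell
import Control.Concurrent.MVar
import System.IO.Unsafe (unsafeInterleaveIO)

-- A Chan-style stream: a linked list of MVars, empty MVar = end.
type Stream a = MVar (Item a)
data Item a   = Item a (Stream a)

-- Lazily fold f over the stream, in the style of getChanContents;
-- foldStream (:) recovers the lazy-list behavior.
foldStream :: (a -> b -> b) -> Stream a -> IO b
foldStream f s = unsafeInterleaveIO $ do
    Item x rest <- readMVar s
    f x <$> foldStream f rest

main :: IO ()
main = do
    end <- newEmptyMVar
    mid <- newMVar (Item (2 :: Int) end)
    hd  <- newMVar (Item 1 mid)
    xs  <- foldStream (:) hd
    print (take 2 xs)   -- only forces as much of the stream as needed
```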

split cuts an existing channel into two channels. It gives you back a new ReceivePort associated with the existing SendPort, and a new SendPort associated with the existing ReceivePorts. This is a more general operator than one I’ve used in a few places to transparently swap out backend services.
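By way of illustration, here is a sketch; note that the signature of split used below is my guess reconstructed from the description above, not taken from the package docs:

```haskell
import Control.Concurrent.Chan.Split  -- from the split-channel package

-- Assumed signature, reconstructed from the prose above:
--   split :: SendPort a -> IO (ReceivePort a, SendPort a)
main :: IO ()
main = do
    (s, r)   <- new
    (r', s') <- split s
    send s  "now arrives at the fresh ReceivePort r'"
    send s' "now arrives at the original ReceivePort r"
    receive r' >>= putStrLn
    receive r  >>= putStrLn
```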

Chan does not provide the split operator, though one could be added. However, I am skeptical that this is a good idea: it’s just a little too effect-ful for comfort. I think that putting a SendPort in an MVar tends to be a better idea than using split, even though it does introduce another layer of indirection.
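The indirection I mean can be sketched with base alone, using plain Chans standing in for SendPorts: producers send through an MVar holding the current send end, so the backend can be swapped atomically without touching the producers.

```haskell
import Control.Concurrent.Chan
import Control.Concurrent.MVar

-- Producers go through the MVar, never holding a channel directly.
sendVia :: MVar (Chan a) -> a -> IO ()
sendVia var msg = readMVar var >>= \ch -> writeChan ch msg

-- Swapping the backend is just replacing the channel in the MVar.
swapBackend :: MVar (Chan a) -> Chan a -> IO ()
swapBackend var newCh = modifyMVar_ var (\_ -> return newCh)

main :: IO ()
main = do
    a   <- newChan
    b   <- newChan
    var <- newMVar a
    sendVia var "to A"
    swapBackend var b
    sendVia var "to B"
    readChan a >>= putStrLn
    readChan b >>= putStrLn
```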

Finally, a few acknowledgements are in order: primarily, Control.Concurrent.Chan and its authors and contributors, and secondarily, Joey Adams for GHC Bug #5870, the fix of which has been incorporated into split-channel.

I can’t look at your lib right now since hackage appears to be down, but did you see my library ‘chan-split’? I’m interested in seeing what you’ve done (especially curious how you managed to let the GC collect chan receive ends)

I did see your library, actually, but only after I’d released mine. The key is that I didn’t build a wrapper around Control.Concurrent.Chan, I reimplemented the idea. (It’s a very simple idea and implementation, by the way. Did you ever look at the source, or read the Concurrent Haskell paper?)

The problem is that a Chan is basically a pair consisting of a SendPort and ReceivePort, so an active channel necessarily has a reference to at least one ReceivePort. Thus messages will build up in that channel if somebody is talking and nobody is listening. So I got rid of that pair and made a few other small improvements.
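Here is roughly what I mean, as a self-contained sketch (illustrative only, not the package's actual source): the channel is a linked list of MVars, and each end is just a mutable pointer into that list.

```haskell
import Control.Concurrent.MVar

type Stream a = MVar (Item a)        -- an empty MVar is the current end
data Item a   = Item a (Stream a)

newtype SendPort a    = SendPort    (MVar (Stream a))  -- points at the hole
newtype ReceivePort a = ReceivePort (MVar (Stream a))  -- points at next item

new :: IO (SendPort a, ReceivePort a)
new = do
    hole <- newEmptyMVar
    s    <- newMVar hole
    r    <- newMVar hole
    return (SendPort s, ReceivePort r)

send :: SendPort a -> a -> IO ()
send (SendPort s) x = modifyMVar_ s $ \hole -> do
    newHole <- newEmptyMVar
    putMVar hole (Item x newHole)
    return newHole

-- readMVar leaves the item in place, so duplicated ReceivePorts each
-- see every message; dropping the last ReceivePort drops the only
-- reference to unread items, letting the GC reclaim them.
receive :: ReceivePort a -> IO a
receive (ReceivePort r) = modifyMVar r $ \hole -> do
    Item x rest <- readMVar hole
    return (rest, x)

main :: IO ()
main = do
    (s, r) <- new
    send s (1 :: Int)
    send s 2
    x <- receive r
    y <- receive r
    print (x + y)
```

Since a SendPort holds no reference to any ReceivePort, a channel whose receive ends have all been dropped keeps only the hole alive, and everything written to it becomes garbage.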

Yes, I remember being surprised that Chans were implemented so simply with MVar. Your lib has quite different semantics and warrants a new implementation, but the naming clash is unfortunate. Do you have any interest in providing an instance for my SplitChan class?

A couple of questions: doesn’t the existence of ‘listen’ weaken your advantage no. 1, since we can always get a receive port from a SendPort? And won’t it be difficult to use ‘listen’ effectively (i.e. without throwing messages “into the aether” non-deterministically) in real concurrent code?

Well, same semantics, just wrapped up in a slightly different interface. The module name clash between our packages is somewhat unfortunate, I agree, but I doubt that it’ll be a problem, and I don’t really want to add any dependencies other than base.

Regarding listen and race conditions, I don’t think it’s going to be especially problematic compared to any other concurrent operator. I would reemphasize that listen is the equivalent of dupChan, so you can create exactly the same problems as you can with Chan. And I think it enhances the potential advantage of having zero ReceivePorts; without listen, that situation would be a lot less useful, in my opinion. I mean, the program that initially motivated this library required the use of listen.

I would point out that the “zero receive ports” case is something of a special situation, and that the real advantage is the finer-grained types. I’ve only found one use for that situation, which is my fork of the pgsql-simple library, and that code was only in production for a couple of months before being replaced with the postgresql-simple library. And even then, the motivation was providing pgsql-simple with some desirable properties; the client of that library didn’t actually make use of those particular properties.

If not for the fact that I really like having SendPorts separate from ReceivePorts, I probably wouldn’t still be using this library in other projects, and probably wouldn’t have released it on hackage.

Do you have any benchmarks that you’re using to measure GC behavior vs Chan? In all of my simple tests of vanilla Chan (on GHC 7.4.1), unless optimizations are turned completely off, messages are successfully garbage collected as we write, when there are no more readers (i.e. no more references to the fst / read side of the Chan).

I certainly see the benefit to making it easier for the GC to do its duties, but I’m wondering if there are any cases where you’ve seen your implementation have better behavior than straight Chan with optimizations on.

No, I haven’t benchmarked. I find that surprising, but only a little bit. (I’m not surprised by being surprised at GHC optimizations.) It’s probably better not to rely on this particular optimization, though, unless it’s a particularly well-understood and reliable optimization. I would expect it to be relatively easy to come up with use cases where the optimization doesn’t kick in, for reasons that may be less than immediately obvious.

Yeah, I’m not too surprised that GHC manages to optimize Chan’s ReceivePort away in this simple benchmark. And it may well do so in more realistic use cases too. You may find it enlightening to study GHC’s core output, and probe some of the limitations of the set of optimizations that accomplishes this.