Using Transducers with Core.async in ClojureScript

I was working on a pretty big post last year when I was sidelined with a new job and started writing a book, Learning ClojureScript. Working on the book has given me a lot of opportunities to explore areas of the ClojureScript ecosystem that I wouldn't otherwise have explored, and some of the learnings that have resulted have been pretty cool.

Specifically, two of the things I've been learning about lately are core.async and transducers. Core.async is a highly popular Clojure(Script) library for embedding CSP-style concurrent program design in libraries and applications; transducers are composable functions useful for performing transformations on data.

There are a number of articles about core.async already floating around on the internet, but transducers appear to remain largely misunderstood. Speaking personally, I've had quite a bit of trouble wrapping my mind around them, and one of the things that has helped me tighten up my understanding has been getting a better grasp of their specific use cases.

In general, I think the fundamental insight for understanding transducers is to realize that their application is not so different from other higher-order functions, but that they're applied to slightly different (though no less valuable) contexts. Thus far, the most awesome examples I've seen have been ones in which transducers were used in combination with core.async to build data transformations into asynchronous application message queues, so that's what I'll focus on here.
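To make the comparison with other higher-order functions concrete, here is a small sketch using only core functions. The name `xf` is just an illustrative choice; the point is that a single transducer can be reused with `into`, `transduce`, and `sequence` without knowing anything about where its input comes from.

```clojure
;; A transducer built by composing two standard ones.
;; With comp, transducers apply left to right: filter first, then map.
(def xf
  (comp (filter odd?)
        (map inc)))

;; The same transformation applied in three different contexts.
(into [] xf (range 10))        ;; => [2 4 6 8 10]
(transduce xf + 0 (range 10))  ;; => 30
(sequence xf (range 10))       ;; => (2 4 6 8 10)
```

A core.async channel is simply one more context that accepts a transducer like this.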

Let's get into it.

Error Handling Patterns

One of the advantages of core.async is that it encourages programs with a clear separation of concerns. Data is produced in one part of your application, and consumed and processed in another.

This separation of concerns has a drawback, however. If we have code that expects to be able to read from a channel, what happens when errors occur on the input side of the channel? We could catch and handle errors on the input side, but there are a number of circumstances in which that might be undesirable. For instance, what if there are many places in the codebase from which we put values into this channel? It is not unreasonable to imagine that we might want a central pipeline for handling errors.

There are a few different ways of going about this. One possibility would be to create a separate channel for error messages and to have an independent listener for that channel (a pattern discussed on the ClojureScript mailing list).

Let’s assume we have an application where we're regularly polling the kitten factory to make sure production is running smoothly. If things are running smoothly, we'll create a report, and if they go badly, we'll log that to our exception tracking service.
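As a rough sketch of the separate-channel approach (not the code from the mailing list thread), it might look something like the following. Every name here is hypothetical: `status-chan`, `error-chan`, and `poll-factory!` are stand-ins, and `fetch-status` is assumed to be a zero-argument function that either returns a non-nil status or throws.

```clojure
(ns kitten-factory.separate-channels
  (:require [cljs.core.async :refer [chan put! <!]])
  (:require-macros [cljs.core.async.macros :refer [go-loop]]))

;; Hypothetical channels: one for healthy reports, one for failures.
(def status-chan (chan))
(def error-chan (chan))

(defn poll-factory!
  "Calls fetch-status (assumed: a zero-arg fn that returns a non-nil
  status or throws) and routes the outcome to the matching channel."
  [fetch-status]
  (try
    (put! status-chan (fetch-status))
    (catch js/Error e
      (put! error-chan e))))

;; Consumer for the happy path: build a report.
(go-loop []
  (when-some [status (<! status-chan)]
    (js/console.log "Report:" (pr-str status))
    (recur)))

;; Independent consumer for failures: forward to exception tracking.
(go-loop []
  (when-some [err (<! error-chan)]
    (js/console.error "Logging to exception tracker:" (.-message err))
    (recur)))
```

The cost of this design is that every producer has to know about both channels.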

Side note: What even happens at the kitten factory? Is it a factory that makes kittens, or do kittens run the factory? If they’re running the factory, what are they making – balls of yarn? I have trouble imagining this is a very efficient factory.

An alternative pattern that we could embrace would be one in which we passed the errors themselves into the channel and checked to see if a message was an error when consuming. This is a pattern that David Nolen has written about previously on his blog.

If you read his post, he refers to a <? macro, the source code for which is available on his blog's GitHub repo. The <? macro just checks to see if the value that’s been pulled off of the channel is an error, and if it is, throws. Without this, we’d just be passing the error around as an ordinary value: nothing would ever actually throw it, so there would be nothing for us to catch.

We'll copy the relevant code from his blog's GitHub repo into our app, modulo a few changes:
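A minimal sketch of this pattern, written from the description above rather than copied verbatim from the repo, might look like this. To keep everything in one ClojureScript file, the sketch writes out `(throw-err (<! ch))` by hand, which is the check the <? macro is described as performing. The names `kitten-chan`, `poll-factory!`, and `fetch-status` are hypothetical.

```clojure
(ns kitten-factory.errors-as-values
  (:require [cljs.core.async :refer [chan put! <!]])
  (:require-macros [cljs.core.async.macros :refer [go-loop]]))

(defn error? [x]
  (instance? js/Error x))

(defn throw-err
  "Throws x if it is an error; otherwise returns it unchanged."
  [x]
  (if (error? x)
    (throw x)
    x))

;; Hypothetical channel carrying either a status or a js/Error.
(def kitten-chan (chan))

(defn poll-factory!
  "fetch-status is assumed to return a non-nil status or throw."
  [fetch-status]
  (put! kitten-chan
        (try (fetch-status)
             (catch js/Error e e))))

;; The consumer rethrows errors as they are taken, so an ordinary
;; try/catch can handle them in one place.
(go-loop []
  (when-some [report (try
                       (throw-err (<! kitten-chan))
                       (catch js/Error e
                         (js/console.error "Factory error:" (.-message e))
                         ::error))]
    (when-not (= ::error report)
      (js/console.log "Report:" (pr-str report)))
    (recur)))
```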

This is a perfectly good solution, but do we really need that <? macro? This is a great chance to use a pure function (a transducer!) instead. Transducers can be passed as an optional argument to the chan constructor function, and they'll perform the given transformation on any values passed through the channel.

This means that we have an opportunity to write a transducer for our channel that just maps our helpful throw-err function over values as they pass through the channel. Let's see what that looks like:
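Here is one way this could be sketched. One detail matters: an exception thrown inside a channel's transducer does not surface in the consumer's go block. Instead, core.async hands it to the optional third argument of `chan`, an exception handler, and places any non-nil return value from that handler on the channel. The names `kitten-chan` and `log-error!` are hypothetical, and the buffer size of 1 is an arbitrary choice (a channel with a transducer must have a buffer).

```clojure
(ns kitten-factory.transducer-errors
  (:require [cljs.core.async :refer [chan put! <!]])
  (:require-macros [cljs.core.async.macros :refer [go-loop]]))

(defn throw-err
  "Throws x if it is an error; otherwise returns it unchanged."
  [x]
  (if (instance? js/Error x)
    (throw x)
    x))

(defn log-error!
  "Central error handler (hypothetical). Returning nil means nothing
  is placed on the channel in place of the failed value."
  [e]
  (js/console.error "Factory error:" (.-message e))
  nil)

;; buffer size, transducer, exception handler
(def kitten-chan
  (chan 1 (map throw-err) log-error!))

;; The consumer now only ever sees healthy reports.
(go-loop []
  (when-some [report (<! kitten-chan)]
    (js/console.log "Report:" (pr-str report))
    (recur)))
```

Producers can put either statuses or errors onto `kitten-chan`; the errors are diverted to `log-error!` before any consumer sees them.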

Let's take a look at another example - this one courtesy of my friend Allen (@arohner), with a few modifications.

Batching Data

Let's say we've got a similar setup to earlier, but instead we want to batch up at least 10 events before we send a report back to cat hq. If we were going to design this without transducers, we'd probably do something like the following:
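A hand-rolled version might be sketched along these lines. The names (`event-chan`, `batch-chan`, `add-event`) are hypothetical. This sketch emits a batch as soon as it holds 10 events and, for brevity, does not flush a partial batch when the input channel closes.

```clojure
(ns kitten-factory.manual-batching
  (:require [cljs.core.async :refer [chan <! >!]])
  (:require-macros [cljs.core.async.macros :refer [go-loop]]))

(def batch-size 10)

(def event-chan (chan))  ; individual events go in here
(def batch-chan (chan))  ; full batches come out here

(defn add-event
  "Pure helper. Returns [pending ready], where ready is a full batch
  or nil if the batch is not yet full."
  [pending event]
  (let [pending (conj pending event)]
    (if (>= (count pending) batch-size)
      [[] pending]
      [pending nil])))

;; Accumulate events in loop state and forward each full batch.
(go-loop [pending []]
  (when-some [event (<! event-chan)]
    (let [[pending ready] (add-event pending event)]
      (when ready
        (>! batch-chan ready))
      (recur pending))))
```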

This could probably be cleaned up some more, but at the very least it feels like we're doing more work than we should be. By contrast, if we were to use a transducer to batch these up, it would look like the following:
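A sketch of the transducer version, assuming the standard `partition-all` transducer is an acceptable batching rule (groups of 10, with any remainder flushed when the channel closes). The names `batch-xf` and `event-chan` are hypothetical.

```clojure
(ns kitten-factory.transducer-batching
  (:require [cljs.core.async :refer [chan put! <!]])
  (:require-macros [cljs.core.async.macros :refer [go-loop]]))

;; Groups incoming values into vectors of up to 10.
(def batch-xf (partition-all 10))

;; Producers put single events; consumers take whole batches.
(def event-chan (chan 1 batch-xf))

(go-loop []
  (when-some [batch (<! event-chan)]
    (js/console.log "Reporting" (count batch) "events to cat hq")
    (recur)))
```

The accumulation state lives inside the transducer, so there is no second channel and no loop state to manage.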

This is unquestionably much cleaner than the version without transducers. As Allen put it: "You put one kitten in, and take ten out. You can't explain that."

Conclusion

The thing that stands out to me about transducers is that they empower us in a way that is simple. As a language feature, it's easy to imagine an implementation where performing data transformations on asynchronous streams could have been over-engineered and a nightmare to write code for. Instead, Clojure(Script)'s transducers are easy to read, write and reason about.

There are plenty of other applications for transducers that I haven't talked about in this post, but for me, the core.async tie-in is where their value was first made clear. If you think transducers are n.b.d. and can explain why I can't find that image of Rich Hickey pasted onto the guy from Ancient Aliens on Google, feel free to let me know.