Generalizing getNumberEnum

In part 2 of this series, we created a getNumberEnum function with a type signature:

getNumberEnum :: MonadIO m => Enumerator Int m b

If you don't remember, this means getNumberEnum produces a stream of Ints. In particular, our getNumberEnum function read lines from stdin, converted them to ints and fed them into an iteratee. It stopped reading lines when it saw a "q".

But this functionality seems like it could be useful outside the realm of Ints. We may like to deal with the original Strings, for example, or Bools, or a bunch of other things. We could easily define a more generalized function which simply doesn't do the String to Int conversion:

What we've done here is wrap around the original iteratee. As usual, we first need to unwrap the Iteratee constructor and the monad to get at the heart of the Step value. Once we have that innerStep value, we pass it to the go function, which simply transforms that values in the Stream value from Strings to Ints.

Even more general

Of course, it would be nice if we could apply this transformation to any iteratee. To start with, let's just pass the inner iteratee and the mapping function as parameters.

Nothing much to see here, it's basically identical to the previous version. What's funny is that enumerator comes built in with a map function to do just this, but it has a significantly different type signature:

What's with all this extra complication in type signature? Well, it's not necessary for map itself, but it is necessary for a whole bunch of other similar functions. But let's focus on this map for a second so we don't get lost: the first argument is the same old mapping function we had before. The second argument is a Step value. This isn't really so surprising: in our mapIter, we took an Iteratee with the same parameters, and we internally just unwrapped it to a Step.

But what's happening with that return value? Remembering the meanings for all these datatypes, it's an Iteratee which will be fed a stream of aOuts and return a Step (aka, a new iteratee, right?). This kind of makes intuitive sense: we've introduced a middle man which accepts input from one source and transforms a Step to a newer state.

But now perhaps the trickiest part of the whole thing: how do we actually use this map function? It turns out that an Enumeratee is close enough in type signature to an Enumerator that we can just do:

map read $$ sumIter

But the type signature on that turns out to be a little bit weird:

Iteratee String m (Step Int m Int)

Remembering that an Iteratee is just a wrapped up Step, what we've got here is an iteratee that takes Strings and returns an Iteratee, which in turn takes Ints and produces an Int. Having this fancy result allows us to do one of our great tricks with iteratees: plug in data from multiple sources. For example, we could plug some Strings into this whole ugly thing, run it, get a new iteratee which takes Ints, feed that some Ints and get an Int result.

(If all that went over your head, don't worry. I won't be talking about that kind of stuff any more.)

But often times, we don't need all of that power. We just want to stick our enumeratee onto our iteratee and get a new iteratee. In our case, we want to attach our map onto the sumIter to produce a new iteratee that takes Strings and returns Ints. In order to do that, we need a function like this:

What's interesting about the approach here is how similar it looks to an Enumerator. We're doing a lot of the same things: checking if the Step value is a Continue; if it's not, then simply return it. Then we capitalize on the Iteratee Monad instance, using the head function to pop two values out of the stream. If there's no more data, we return the original Continue value: just like with an Enumerator, we don't give an EOF so that we can feed more data into the iteratee later. If there is data, we pass it off to the iteratee, get our new step value and then loop.

Here, we read lines, skip every other input, convert the Strings to Ints and sum them.

Real life examples: http-enumerator package

I started working on these tutorials as I was working on the http-enumerator package. I think the usage of enumeratees there is a great explanation of the benefits they can offer in real life. There are three different ways the response body can be broken up:

Chunked encoding. In this case, the web server gives a hex string specifying the length of the next chunk and then that chunk. At the end, it sends a 0 to indicate the end of that response.

Content length. Here, the web server sends a header before any of the body is sent specifying the total length of the body.

Nothing at all. In this case, the response body lasts until an end-of-file.

In addition, the body may or may not be GZIP compressed. We end up with the following enumeratees, each with type signature Enumeratee ByteString ByteString m b: chunkedEncoding, contentLength and ungzip. We then get to do something akin to:

We create a chain: the data from the server is fed into the parseBody function. In the case of chunked encoding, the data is processed appropriately and then headers are filtered out. If we are dealing with content length, then only the specified number of bytes are read. And in the case of neither of those, parseBody is a no-op.

Whatever the case may be, the raw response body is then fed into decompress. If the body is GZIPed, then ungzip inflates it, otherwise decompress is a no-op. Finally, the parsed and inflated data is fed into the user-supplied bodyIteratee function. The user remains blissfully unaware of any steps the data took to get to him/her.