enumerator: simple, efficient incremental IO for Haskell

Say you want to read in a huge file. You're going to calculate its checksum, or count how many newlines it has, or whatever. How do you do that when the file's larger than your machine's memory?

In most languages, the answer is "write a loop". You read the file in small chunks, then run those chunks through whatever processing you want to do. Each loop might do something different with the data, but they've all got the same boilerplate structure.
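As a rough illustration of that boilerplate, here is a minimal sketch of such a loop written in Haskell with strict ByteStrings. The name countNewlines and the 4096-byte chunk size are assumptions made for this example; they do not come from any library.

```haskell
import qualified Data.ByteString as B
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- Count newline bytes in a file, holding at most one chunk in memory.
-- (Hypothetical helper for illustration only.)
countNewlines :: FilePath -> IO Int
countNewlines path = withBinaryFile path ReadMode (loop 0)
  where
    chunkSize = 4096 -- arbitrary chunk size, chosen for the example

    loop :: Int -> Handle -> IO Int
    loop acc h = do
      chunk <- B.hGetSome h chunkSize
      if B.null chunk
        then return acc -- an empty chunk means end of file
        else do
          -- 10 is the byte value of '\n'
          let acc' = acc + B.count 10 chunk
          acc' `seq` loop acc' h
```

Only the line that updates the accumulator is specific to counting newlines; the open/read/test-for-EOF scaffolding would be repeated in every other loop of this kind.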

Haskell programmers noticed that if you squint a bit, files look like really long lists of bytes. Haskell already has tons of functions which work on lists, so all they needed to do to get easy file processing was trick the compiler. The trick these programmers used is called "lazy I/O".
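For comparison, a minimal sketch of the lazy I/O style, assuming the lazy ByteString module from the bytestring package. The name countNewlinesLazy is made up for this example.

```haskell
import qualified Data.ByteString.Lazy as BL

-- Count newline bytes by treating the file as one long lazy list of bytes.
-- (Hypothetical helper for illustration only.)
countNewlinesLazy :: FilePath -> IO Int
countNewlinesLazy path = do
  bytes <- BL.readFile path -- nothing is read yet; chunks are loaded on demand
  -- Ordinary list functions drive the actual reads. 10 is the byte for '\n'.
  return $! length (filter (== 10) (BL.unpack bytes))
```

The explicit loop is gone, but so is any visible indication of when the file is actually read or closed.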

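It turns out that lazy I/O has a big downside: it makes reasoning about a program's resource requirements very difficult. A lazily read file is only closed once its contents have been fully consumed or garbage collected, and neither event is visible in the code. Servers based on lazy I/O tend to run out of file descriptors, or allocate huge amounts of memory, without any obvious way to fix them.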

Another approach to the problem is to go back to the original buffer/loop design, and chop it up. Loops are split into a data source (or enumerator), a data sink (or iteratee), and intermediate data transformers (or enumeratees). These pieces compose just like basic list functions, so it's easy to build up complex data processors from reusable components.
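To make the three roles concrete, here is a deliberately simplified, pure toy model. Every type and function in it (Sink, Source, fromList, counter, filtering, run) is invented for this sketch; the library's real types are different and also handle chunked input, errors, and effects in an underlying monad.

```haskell
-- A sink (iteratee) has either finished with a result or wants more input.
-- Nothing signals end of input.
data Sink i r = Done r | More (Maybe i -> Sink i r)

-- A source (enumerator) feeds values into a sink and returns its new state.
type Source i r = Sink i r -> Sink i r

-- Source: feed the elements of a list, stopping early if the sink is done.
fromList :: [i] -> Source i r
fromList _ (Done r) = Done r
fromList [] s = s
fromList (x : xs) (More k) = fromList xs (k (Just x))

-- Sink: count how many values arrive.
counter :: Sink i Int
counter = go 0
  where
    go n = More step
      where
        step Nothing = Done n
        step (Just _) = let n' = n + 1 in n' `seq` go n'

-- Transformer (enumeratee): wrap a sink so it only sees matching values.
filtering :: (i -> Bool) -> Sink i r -> Sink i r
filtering _ (Done r) = Done r
filtering p (More k) = More step
  where
    step Nothing = filtering p (k Nothing)
    step (Just x)
      | p x = filtering p (k (Just x))
      | otherwise = filtering p (More k) -- drop x, keep the sink unchanged

-- Signal end of input and extract the result, if the sink produces one.
run :: Sink i r -> Maybe r
run (Done r) = Just r
run (More k) = case k Nothing of
  Done r -> Just r
  More _ -> Nothing -- the sink kept asking for input after the end
```

In this model, run (fromList "a\nb\n" (filtering (== '\n') counter)) evaluates to Just 2. The source, the transformer, and the sink are each written once and can be recombined freely, which is the property the paragraph above describes.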

Here's a quick example; we're going to count how many Unicode characters are in a UTF-8 file.