Recently I've been working on a library for generating large JSON[1]
documents quickly. Originally I started writing it in Haskell, but
quickly encountered performance problems. After exhausting my (meager)
supply of optimization ideas, I rewrote some of it in C, with dramatic
results. Namely, the C solution is:

* 7.5 times faster than the fastest Haskell I could write (both using
  raw pointer arrays)
* 14 times faster than a somewhat functional version (uses monads, but
  no explicit IO)
* more than 30 times faster than fancy functional solutions with
  iteratees, streams, etc.

I'm wondering if string processing is simply a Haskell weak point,
performance-wise. The problem involves many millions of very small
strings (usually <10 characters). The C solution can copy directly
from string literals into a fixed buffer and flush it occasionally,
while even the fastest Haskell version has a lot of overhead from
copying arrays around.
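
For reference, the C approach described above looks roughly like the
following. This is a minimal sketch, not the actual library code: the
names Writer, writer_put, and PUT_LIT, and the 4096-byte buffer size,
are made up for illustration, and error handling is omitted.

```c
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 4096  /* hypothetical size; tune for the workload */

/* A fixed output buffer that is flushed to `out` when it fills up. */
typedef struct {
    char   data[BUF_SIZE];
    size_t used;
    FILE  *out;
} Writer;

static void writer_init(Writer *w, FILE *out)
{
    w->used = 0;
    w->out  = out;
}

/* Write any buffered bytes to the stream and reset the buffer. */
static void writer_flush(Writer *w)
{
    fwrite(w->data, 1, w->used, w->out);  /* return value ignored in this sketch */
    w->used = 0;
}

/* Append `len` bytes, flushing first if they would not fit. */
static void writer_put(Writer *w, const char *s, size_t len)
{
    if (len > BUF_SIZE - w->used)
        writer_flush(w);
    if (len > BUF_SIZE) {
        /* Larger than the whole buffer: bypass it entirely. */
        fwrite(s, 1, len, w->out);
        return;
    }
    memcpy(w->data + w->used, s, len);
    w->used += len;
}

/* Append a string literal; sizeof gives its length at compile time. */
#define PUT_LIT(w, lit) writer_put((w), (lit), sizeof(lit) - 1)
```

A caller would emit each fragment with something like
PUT_LIT(&w, "true") and call writer_flush once at the end. Because
sizeof on a literal is a compile-time constant, there is no strlen per
string, and each small string costs one bounds check and one memcpy.
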
Dons suggested I was "doing it wrong", so I'm posting on -cafe in the
hopes that somebody can tell me how to get better performance without
resorting to C.

Here's the fastest Haskell version I could come up with. It discards
all error handling, validation, and correctness in the name of
performance, but still can't get anywhere near C:

http://hpaste.org/fastcgi/hpaste.fcgi/view?id=16423

[1] http://json.org/