Data.BufferBuilder is an efficient library for incrementally building
up ByteStrings, one chunk at a time. Early benchmarks show it
is over twice as fast as ByteString Builder, primarily because
BufferBuilder is built upon an ST-style restricted monad and
mutable state instead of ByteString Builder's monoidal AST.

Internally, BufferBuilder is backed by a few C functions.
Examination of GHC's output shows nearly optimal code generation
with no intermediate thunks -- and thus, continuation passing and
its associated indirect jumps and stack traffic only occur when
BufferBuilder is asked to append a non-strict ByteString.

I benchmarked four approaches with a URL encoding benchmark:

State monad, concatenating ByteStrings: 6.98 us

State monad, ByteString Builder: 2.48 us

Crazy explicit RealWorld baton passing with unboxed state: 28.94 us (GHC generated really awful code for this, but see the revision history for the technique)