Bλog

The Blaze Builder

A fast builder monoid for string concatenation
Published on August 5, 2010 under the tag haskell

What is blaze-builder?

For our work on BlazeHtml, we were looking at string concatenation in Haskell from quite a low level. After some time, we decided we needed a Builder structure. Because we think it is useful for other libraries and programs as well, we have now released it as a separate package on Hackage.

The blaze-builder package is divided into three modules: Core, Utf8 and Html.

Core: provides the core functionality, such as inserting raw bytes and bytestrings.

Utf8: provides functions to insert String and Text values. These values are then internally stored as UTF-8.

Html: provides functions to insert HTML-escaped String and Text values.

Some might argue that HTML escaping functions do not belong in a builder package. However, this functionality was needed to get BlazeHtml (expect a 0.2 soon!) fast enough, and it does not introduce extra dependencies for the blaze-builder package (which only depends on base, text, and bytestring). So if you don’t need it, simply don’t import Html, as in the example below.

An example

We’re not going to use the HTML-related functions in our example, so we do not need to import the Html module.
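
A minimal sketch of such an example follows. It assumes the module names of later blaze-builder releases (Blaze.ByteString.Builder and Blaze.ByteString.Builder.Char.Utf8) rather than the Core and Utf8 modules named above. The Person type and its values are hypothetical, chosen only for illustration.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L

-- Assumption: module names as in later blaze-builder releases.
import Blaze.ByteString.Builder (Builder, toLazyByteString)
import Blaze.ByteString.Builder.Char.Utf8 (fromShow, fromString)

-- Hypothetical record, used only to have something to render.
data Person = Person
    { name :: String
    , age  :: Int
    }

-- Hypothetical sample data.
people :: [Person]
people =
    [ Person "Alice" 30
    , Person "Bob" 25
    ]

-- Render one person as a single line. Builder is a Monoid, so the
-- pieces are glued together with mconcat.
fromPerson :: Person -> Builder
fromPerson p = mconcat
    [ fromString (name p)
    , fromString " is "
    , fromShow (age p)
    , fromString " years old.\n"
    ]

-- Render all people by concatenating the per-person builders.
fromPeople :: [Person] -> Builder
fromPeople = mconcat . map fromPerson

-- Run the builder to obtain the final output.
result :: L.ByteString
result = toLazyByteString (fromPeople people)
```

Printing the value with L.putStr result would write one line per person.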

result is now a lazy bytestring, which is a list of strict bytestrings. An important property of blaze-builder is that all these strict bytestrings will have a nice length of about 32 kB (bigger chunks mean less overhead when, for example, you are sending this string over the network).
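
The chunk structure of a lazy bytestring can be inspected directly with toChunks. The following sketch assumes the module name of later blaze-builder releases (Blaze.ByteString.Builder). It concatenates 10000 copies of "<img>" and lists the length of each strict chunk. The names manySmallPieces and chunkLengths are hypothetical.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as SC
import qualified Data.ByteString.Lazy as L

-- Assumption: module name as in later blaze-builder releases.
import Blaze.ByteString.Builder (Builder, fromByteString, toLazyByteString)

-- 10000 small strict bytestrings appended one after another.
manySmallPieces :: Builder
manySmallPieces = mconcat (replicate 10000 (fromByteString (SC.pack "<img>")))

-- Length in bytes of every strict chunk in the resulting lazy bytestring.
chunkLengths :: [Int]
chunkLengths = map S.length (L.toChunks (toLazyByteString manySmallPieces))
```

The lengths always add up to 50000 bytes (10000 pieces of 5 bytes each). How those bytes are split into chunks depends on the buffer strategy of the release in use, so no particular list of lengths is claimed here.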

Differences with Data.Binary.Builder

As some of you might know, the idea of having a “Builder” was stolen from the binary library. Initially, we used that builder, but we have since written a new builder from scratch to bring optimal speed to Blaze. Here are some differences:

blaze-builder focuses on the concatenation of many small strings. You can read “small” as, say, chunks of less than 4 kB.

blaze-builder fixes the internal representation to UTF-8.

blaze-builder provides extra functionality for HTML-related strings.

It is also faster for these small strings. Here are some benchmarks in which we timed the concatenation of 10000 small string pieces ("<img>") in different formats using the two builders (measured with Criterion):