Sunday, January 19, 2014

Rewrite rules are a powerful tool that you can use to optimize Haskell code without breaking backwards compatibility. This post will illustrate how I use rewrite rules to implement a form of shortcut fusion for my pipes stream programming library. I compare pipes performance before and after adding shortcut fusion, and I also compare the performance of pipes-4.0.2 against conduit-1.0.10 and io-streams-1.1.4.0.

This post also includes a small introduction to Haskell's rewrite rule system for those who are interested in optimizing their own libraries.

Edit: This post originally used the term "stream fusion", but Duncan Coutts informed me that the more appropriate term for this is probably "short-cut fusion".

Rule syntax

The following rewrite rule from pipes demonstrates the syntax for defining optimization rules.
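Reconstructed from the discussion below (the exact label and layout are my guess at how it appears in the library):

```haskell
{-# RULES
    "for p yield" forall p .
        for p yield = p
  #-}
```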

All rewrite rules are substitution rules, meaning that they instruct the compiler to replace anything that matches the left-hand side of the rule with the right-hand side. The above rewrite rule says to always replace for p yield with p, no matter what p is, as long as everything type-checks before and after substitution.

Rewrite rules are typically used to substitute code with equivalent code of greater efficiency. In the above example, for p yield is a for loop that re-yields every element of p. The re-yielded stream is behaviorally indistinguishable from the original stream, p, because all that for p yield does is replace every yield in p with yet another yield. However, while both sides of the equation behave identically their efficiency is not the same; the left-hand side is less efficient because for loops are not free.

Safety

Rewrite rules are not checked for correctness. The only thing the compiler does is verify that the left-hand side and right-hand side of the equation both type-check. The programmer who creates the rewrite rule is responsible for proving that the substitution preserves the original behavior of the program.

In fact, rewrite rules can be used to rewrite terms from other Haskell libraries without limitation. For this reason, modules with rewrite rules are automatically marked unsafe by Safe Haskell, and they must be explicitly marked Trustworthy to be used by code marked Safe.

You can verify a rewrite rule is safe using equational reasoning. If you can reach the right-hand side of the equation from the left-hand side using valid code substitutions, then you can prove (with some caveats) that the two sides of the equation are functionally identical.

pipes includes a complete set of proofs for its rewrite rules for this purpose. For example, the above rewrite rule is proven here, where (//>) is an infix synonym for for and respond is a synonym for yield:

for = (//>)
yield = respond

This means that equational reasoning is useful for more than just proving program correctness. You can also use derived equations to optimize your program, assuming that you already know which side of the equation is more efficient.

Laws

Rewrite rules have a significant limitation: we cannot anticipate every expression that downstream users might build using our libraries. So how can we optimize as much code as possible without an excessive proliferation of rewrite rules?

I have anecdotally observed that equations inspired by category theory prove to be highly versatile optimizations that fit within a small number of rules. These equations include:

Category laws

Functor laws

Natural transformation laws (i.e. free theorems)

The first half of the shortcut fusion optimization consists of three monad laws, which are a special case of category laws. For those new to Haskell, the three monad laws are:
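For reference, here are the three monad laws in Haskell notation:

```haskell
-- Left identity
return x >>= f  =  f x

-- Right identity
m >>= return  =  m

-- Associativity
(m >>= f) >>= g  =  m >>= (\x -> f x >>= g)
```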

If you take these three laws and replace (>>=)/return with for/yield (and rename m to p, for 'p'ipe), you get the following "for loop laws":

-- Looping over a yield simplifies to function application
for (yield x) f = f x
-- Re-yielding every element returns the original stream
for p yield = p
-- You can transform two passes over a stream into a single pass
for (for p f) g = for p (\x -> for (f x) g)

This analogy to the monad laws is precise because for and yield are actually (>>=) and return for the ListT monad when you newtype them appropriately, and they really form a Monad in the Haskell sense of the word.
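We can spot-check this correspondence with a toy stand-in: the ordinary list monad, where `(>>=)` plays the role of for and a singleton list plays the role of yield. The names `forL` and `yieldL` below are mine, not part of pipes:

```haskell
-- Toy check of the "for loop laws" using the ordinary list monad.
-- (>>=) stands in for `for` and a singleton list stands in for
-- `yield`, mirroring the claim that for/yield are (>>=)/return
-- for ListT.

forL :: [a] -> (a -> [b]) -> [b]
forL = (>>=)

yieldL :: a -> [a]
yieldL x = [x]

main :: IO ()
main = do
    let f x = [x, x + 1]
        g y = [y * 2]
        p   = [1, 2, 3] :: [Int]
    -- Law 1: looping over a yield simplifies to function application
    print (forL (yieldL 5) f == f 5)
    -- Law 2: re-yielding every element returns the original stream
    print (forL p yieldL == p)
    -- Law 3: two passes over a stream fuse into a single pass
    print (forL (forL p f) g == forL p (\x -> forL (f x) g))
```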

What's amazing is that these monad laws also double as shortcut fusion optimizations when we convert them to rewrite rules. We already encountered the second law as our first rewrite rule, but the other two laws are useful rewrite rules, too:
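Here are the remaining two laws written as a single RULES pragma (my reconstruction of how they appear in pipes; note the semicolon separating the two rules):

```haskell
{-# RULES
    "for (yield x) f" forall x f .
        for (yield x) f = f x

  ; "for (for p f) g" forall p f g .
        for (for p f) g = for p (\x -> for (f x) g)
  #-}
```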

Note that the RULES pragma lets you group multiple rules together, as long as you separate them by semicolons. Also, there is no requirement that the rule label must match the left-hand side of the equation, but I use this convention since I'm bad at naming rewrite rules. This labeling convention also helps when diagnosing which rules fired (see below) without having to consult the original rule definitions.

Free theorems

These three rewrite rules alone do not suffice to optimize most pipes code. The reason is that most idiomatic pipes code is not written in terms of for loops. For example, consider the map function from Pipes.Prelude:
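In pipes, map is itself a for loop over cat, the identity pipe (this matches the library's definition, modulo the exact type signature):

```haskell
map :: Monad m => (a -> b) -> Pipe a b m r
map f = for cat (\x -> yield (f x))
```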

A free theorem is an equation that you can prove solely from studying the types of all terms involved. I will omit the proof of this free theorem for now, but I will discuss how to derive free theorems in detail in a follow-up post. For now, just assume that the above equations are correct, as codified by the following rewrite rule:
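The free theorem in question equates composing any pipe with map f to looping over that pipe and re-yielding f of every element. As a rewrite rule (my reconstruction):

```haskell
{-# RULES
    "p >-> map f" forall p f .
        p >-> map f = for p (\y -> yield (f y))
  #-}
```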

With this rewrite rule the compiler can begin to implement simple map fusion. To see why, we'll compose two map pipes and then pretend that we are the compiler, applying rewrite rules at every opportunity. Every time we apply a rewrite rule we will refer to the rule by its corresponding string label:
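Here is one plausible reconstruction of that derivation, labeling each step with the rule that fires:

```haskell
map f >-> map g

-- "p >-> map f" rule
= for (map f) (\y -> yield (g y))

-- Definition of map
= for (for cat (\x -> yield (f x))) (\y -> yield (g y))

-- "for (for p f) g" rule
= for cat (\x -> for (yield (f x)) (\y -> yield (g y)))

-- "for (yield x) f" rule
= for cat (\x -> yield (g (f x)))
```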

So those rewrite rules sufficed to fuse the two map passes into a single pass. You don't have to take my word for it, though. For example, let's verify that these rewrite rules fire for the following sample program, which increments, doubles, and then discards every number from 1 to 100000000:
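A program matching that description might look like the following sketch (it assumes the pipes package; discard and each come from the Pipes module):

```haskell
import Pipes
import qualified Pipes.Prelude as P

main :: IO ()
main = runEffect $
    for (each [1..10^8 :: Int] >-> P.map (+1) >-> P.map (*2)) discard
```

You can watch the rules fire by compiling with GHC's -ddump-rule-firings flag.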

I've highlighted the rule firings that correspond to map fusion, although there are many other rewrite rules that fire (including more shortcut fusion rule firings).

Shortcut fusion

We don't have to limit ourselves to just fusing maps. Many pipes in Pipes.Prelude have an associated free theorem that rewrites pipe composition into an equivalent for loop. After these rewrites, the "for loop laws" go to town on the pipeline and fuse it into a single pass.
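For example (a hand-written sketch, not taken verbatim from the library), a three-stage pipeline collapses to a single loop over the source:

```haskell
-- Before fusion: three stages
each xs >-> map f >-> map g

-- After the rewrite rules fire: one pass, no intermediate pipes
for (each xs) (\x -> yield (g (f x)))
```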

This is the kind of single pass loop we might have written by hand if we were pipes experts, but thanks to rewrite rules we can write high-level, composable code and let the library automatically rewrite it into efficient and tight loops.

Note that not all pipes are fusible. For example, the take pipe cannot be fused because it cannot be rewritten in terms of a for loop.

Benchmarks

These rewrite rules make fusible pipe stages essentially free. To illustrate this I've set up a criterion benchmark testing running time as a function of the number of map stages in a pipeline.

In fact, once you have just two stages in your pipeline, pipes greatly outperforms conduit and breaks roughly even with io-streams. To show this I've written up a benchmark comparing pipes performance against these libraries for both pure loops and loops that are slightly IO-bound (by writing to /dev/null).

I hypothesize that pipes is slightly slower than io-streams on IO-bound code because of the cost of calling lift, whereas io-streams operates directly within the IO monad at all times.

These benchmarks should be taken with a grain of salt. All three libraries are most frequently used in strongly IO-bound scenarios, where the overhead of each library is pretty much negligible. However, this still illustrates how big of an impact shortcut fusion can have on pure code paths.

Conclusions

pipes is a stream programming library with a strong emphasis on theory and the library's contract with the user is a set of laws inspired by category theory. My original motivation behind proving these laws was to fulfill the contract, but I only later realized that the for loop laws doubled as fortuitous shortcut fusion optimizations. This is a recurring motif in Haskell: thinking mathematically pays large dividends.

For this reason I like to think of Haskell as applied category theory: I find that many topics I learn from category theory directly improve my Haskell code. This post shows one example of this phenomenon, where shortcut fusion naturally falls out of the monad laws for ListT.