GHC's rewrite rules (invoked by the RULES pragma) offer a powerful way to optimise your program. This page is a place for people who use rewrite rules to collect thoughts about how to use them.

If you aren't already familiar with RULES, read the section on rewrite rules in the GHC user's guide first.

* Use the flag <tt>-ddump-simpl-stats</tt> to see how many rules actually fired.

* For even more detail use <tt>-ddump-simpl-stats -ddump-simpl-iterations</tt> to see the core code at each iteration of the simplifier. Note that this produces '''lots''' of output, so you'll want to direct the output to a file or pipe it to <tt>less</tt>. Looking at this output can help you figure out why rules are not firing when you expect them to do so.

* Another tip for discovering why rules do not fire is to use the flag <tt>-dverbose-core2core</tt>, which (amongst other things) produces the AST after every rule is fired. This can help you examine whether, for example, one rule is creating an expression that prevents another rule from firing.

* You need to be careful that your identifiers aren't inlined before your RULES have a chance to fire. Consider

<haskell>
{-# INLINE fusable #-}

{-# RULES "fusable/aux" forall x y.
    fusable x (aux y) = faux x y ; #-}

nonFusable x y = fusable x (aux y)
</haskell>

: You may be surprised when the rule for <hask>fusable</hask> does not fire. It may well be that <hask>fusable</hask> was inlined before rules were applied.

: To control this we add a <hask>NOINLINE</hask> or an <hask>INLINE [1]</hask> pragma to identifiers we want to match in rules, to ensure they haven't disappeared by the time rule matching comes around.

To have rewrite rules fire in code interpreted in GHCi, you'll need to explicitly ask for <tt>-frewrite-rules</tt> in an options pragma at the start of your file.
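That is, something like the following at the top of the file (flag name as of the GHC 6.6/6.8 series; later versions renamed it <tt>-fenable-rewrite-rules</tt>):

<haskell>
{-# OPTIONS_GHC -frewrite-rules #-}
module Main where
</haskell>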

=== Structure of simplification process ===


There are currently the simplifier phases "gentle", 2, 1, 0, each consisting of 4 iterations.
Starting with GHC 6.10 you can alter these numbers with the command line options <code>-fsimplifier-phases</code> and <code>-fmax-simplifier-iterations</code>.
However, within each iteration rules are applied repeatedly, until no rule can be applied any more. Rules eventually stop applying because the simplifier works through each expression in a particular direction, from outer to inner or the reverse. In practice the direction is always the same, but you should not rely on a particular order.

Phase control is how the base library keeps its <hask>map</hask> rules from fighting each other: up to (but not including) phase 1, the "map" rule rewrites all saturated applications of <hask>map</hask> into build/foldr form, hoping for fusion to happen. In phases 1 and 0, that rule is switched off, <hask>build</hask> is inlined, and the "mapList" rule rewrites the remaining foldr/mapFB forms back into plain <hask>map</hask>. It's important that these two rules aren't both active at once (along with build's unfolding), else the rules would loop forever; hence the activation control.
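The rules for <hask>map</hask> in the base library illustrate this phase control. Slightly simplified, and made self-contained here with a local <hask>mapFB</hask> (the real definitions live in GHC.Base), they look roughly like this:

<haskell>
import Prelude hiding (map)
import qualified Prelude
import GHC.Exts (build)

-- The worker that fusion rewrites 'map' into.
mapFB :: (b -> l -> l) -> (a -> b) -> a -> l -> l
mapFB c f = \x ys -> c (f x) ys
{-# INLINE [0] mapFB #-}

map :: (a -> b) -> [a] -> [b]
map f xs = Prelude.map f xs
{-# NOINLINE [1] map #-}   -- keep 'map' around so the rules can match it

{-# RULES
"map"     [~1] forall f xs. map f xs = build (\c n -> foldr (mapFB c f) n xs)
"mapList" [1]  forall f.    foldr (mapFB (:) f) [] = map f
  #-}
</haskell>

The <hask>[~1]</hask>/<hask>[1]</hask> annotations ensure the two rules are never active in the same phase.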

Which rule will fire? "f->g" or "h->g"? (Each rule disables the other.)

Answer: rules are matched against the AST for expressions basically ''bottom-up'' rather than top-down. In this example, "h->g" is the rule that fires. But due to the nature of inlining and so on, there are absolutely no guarantees about this kind of behaviour. If you really need to control the order of matching, phase control is the only reliable mechanism.
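For concreteness, here is a minimal pair of rules with exactly this each-disables-the-other shape. The names <hask>f</hask>, <hask>g</hask> and <hask>h</hask> are invented for illustration, with definitions chosen so that both rules are semantically valid:

<haskell>
f :: Int -> Int
f y = y + 1
{-# NOINLINE f #-}

g :: Int -> Int
g x = x + 2

h :: Int -> Int
h x = x + 1
{-# NOINLINE h #-}

{-# RULES
"f->g" forall x. f (h x) = g x        -- consumes the 'h' that "h->g" would match
"h->g" forall x. h x     = g (x - 1)  -- destroys the 'f (h x)' pattern for "f->g"
  #-}
</haskell>

Matching bottom-up, GHC encounters the inner <hask>h x</hask> first, so "h->g" fires and the <hask>f (h x)</hask> pattern for "f->g" is gone.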

=== [[Confluent term rewriting system]] ===

Since there is no guarantee of a particular order of rule application, beyond the control by phases, you should ensure that the result is the same independent of the order of rule application. This property of a term rewriting system is called confluence. See for example:

<haskell>
{-# RULES
"project/project" forall x.
    project (project x) = project x ;

"project/foo" forall x.
    project (foo x) = projectFoo x ;
  #-}

f = project . project . foo
</haskell>

For this set of rewriting rules it matters whether you apply "project/project" or "project/foo" first to the body of <hask>f</hask>. In the first case you can additionally apply "project/foo", yielding <hask>projectFoo x</hask>, whereas the second case leaves you with <hask>project (projectFoo x)</hask>. To make the system confluent you should add the rule

<haskell>
project (projectFoo x) = projectFoo x
</haskell>

You can complete a rule system this way by hand, although it would be quite a nice thing for GHC to automate.

+

+

We assume that non-confluent rewriting systems are bad design,

+

but it is not clear how to achieve confluence for any system.

+

+
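Whatever <hask>project</hask>, <hask>foo</hask> and <hask>projectFoo</hask> concretely are, all three rules must be semantic identities, or confluence would be beside the point. A quick check with placeholder definitions (invented here: an idempotent <hask>project</hask>, and <hask>projectFoo = project . foo</hask>):

<haskell>
-- Placeholder definitions for illustration only.
project :: [Int] -> [Int]
project = filter even          -- idempotent: filtering twice = filtering once

foo :: Int -> [Int]
foo n = [n, n + 1, n + 2]

projectFoo :: Int -> [Int]
projectFoo = project . foo     -- the fused form
</haskell>

With these, <hask>project (project x) == project x</hask>, <hask>project (foo x) == projectFoo x</hask> and <hask>project (projectFoo x) == projectFoo x</hask> all hold, so the completed system rewrites every term to the same normal form regardless of the order in which the rules fire.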

=== Pair rules ===

It is often useful to provide two implementations of a function, and have rewrite rules pick which version to use depending on context. In both GHC's foldr/build fusion, and more extensively in Data.ByteString's stream fusion system, pair rules are used to allow the compiler to choose between two implementations of a function.

Consider the rules:

<haskell>
"FPS length -> fused" [~1]
    length = F.strConsumerBi F.lengthS

"FPS length -> unfused" [1]
    F.strConsumerBi F.lengthS = length
</haskell>

This rule pair tells the compiler to rewrite occurrences of <hask>length</hask> to a stream-fusible form in early simplification phases, hoping for fusion to happen. However, if by phase 1 (remember that phases count down) the fusible form remains unfused, it is better to rewrite it back to the unfused-but-fast implementation of length. A similar trick is used for <hask>map</hask> in the base libraries.

As we want to match <hask>length</hask> in the rules, we need to ensure that it isn't inlined too soon:

<haskell>
length :: ByteString -> Int
length (PS _ _ l) = assert (l >= 0) $ l
{-# INLINE [1] length #-}
</haskell>

and we need <hask>strConsumerBi</hask> to stick around for even longer:

<haskell>
strConsumerBi :: (Stream -> a) -> (ByteString -> a)
strConsumerBi f = f . readStrUp
{-# INLINE [0] strConsumerBi #-}

lengthS :: Stream -> Int
lengthS ...
{-# INLINE [0] lengthS #-}
</haskell>

Pair rules thus provide a useful mechanism to allow a library to provide multiple implementations of a function, picking the best one to use based on context.

=== Custom specialisation rules ===

Another use for rules is to replace a particular use of a slow, polymorphic function with a custom monomorphic implementation.

Consider:

<haskell>
zipWith :: (Word8 -> Word8 -> a) -> ByteString -> ByteString -> [a]
</haskell>

This is a bit slow, but useful. It's often used to zip ByteStrings into a new ByteString, that is:

<haskell>
pack (zipWith f p q)
</haskell>

We'd like to spot this, and throw away the intermediate <hask>[a]</hask> that is created; a rule can rewrite the whole <hask>pack (zipWith f p q)</hask> expression to a fused implementation.

This is useful for higher order functions as well. As of GHC 6.6, the rule LHS syntax has been relaxed, allowing sections and lambda abstractions to appear. Previously, only applications of the following form were valid:

<haskell>
"FPS specialise break (x==)" forall x.
    break ((==) x) = breakByte x
</haskell>

That is, replace occurrences of <hask>break (x==)</hask> with the optimised <hask>breakByte</hask> function. This code illustrates how higher order functions can be rewritten to optimised first order equivalents, for special cases like <hask>(==)</hask>. In the case of Data.ByteString, functions using <hask>(==)</hask> or <hask>(/=)</hask> are much faster when implemented with memchr(3), and we can use rules to do this, as long as it is possible to match sections. In GHC 6.6 we can now write:

<haskell>
"FPS specialise break (==x)" forall x.
    break (==x) = breakByte x

"FPS specialise break (x==)" forall x.
    break (x==) = breakByte x
</haskell>

Some fragility remains in these rules though, as described below.

=== Literals, dictionaries and sections ===

Consider:

<haskell>
break (== 10)
</haskell>

Hopefully, this can be rewritten to a <hask>breakByte 10</hask> call. However, the combination of sections, literals and dictionaries for Eq makes this rather fragile.

The rule for break ends up translated by GHC as:

<haskell>
forall ($dEq :: base:GHC.Base.Eq base:GHC.Word.Word8)
       (x :: base:GHC.Word.Word8)

break (base:GHC.Base.== @ base:GHC.Word.Word8 $dEq x) =
    breakByte x
</haskell>

Notice the LHS: an application of the selector to a (suitably-typed) Eq dictionary. GHC does very little simplification on LHSs, because if it does too much, the LHS doesn't look like you thought it did. Here it might perhaps be better to simplify to GHC.Word.Word8.==, by selecting from the dictionary, but GHC does not do that.

When this rule works, GHC generates exactly that pattern; we get

<haskell>
eq = (==) deq
main = ... break (\x. eq x y) ...
</haskell>

GHC is anxious about substituting eq inside the lambda, but it does it because (==) is just a record selector, and hence is very cheap.

But when we put a literal inline, we get an (Eq a) constraint and a (Num a) constraint (from the literal). Ultimately, 'a' turns out to be Int, by defaulting, but we don't know that yet. So GHC picks the Eq dictionary from the Num dictionary:

<haskell>
eq = (==) ($p1 dnum)
main = ... break (\x. eq x y) ...
</haskell>

Now the 'eq' doesn't look quite so cheap, and it isn't inlined, so the rule does not fire. However, GHC 6.6 has been modified to believe that nested selection is also cheap, so that makes the rule fire.

The underlying lesson is this: the only robust way to make rules fire is if the LHS is a normal form. Otherwise GHC may miss the fleeting moment at which (an instance of) the rule LHS appears in the program. The way you ensure this is with inline phases: don't inline LHS stuff until later, so that the LHS stuff appears in the program more than fleetingly.

But in this case you have (==) on the LHS, and you have no phase control there. So it gets inlined right away, so the rule doesn't match any more. The only way the rule "works" is because GHC catches the pattern right away, before (==) is inlined. Not very robust.

To make this robust, you'd have to say something like

<haskell>
instance Eq Word8 where
    (==) = eqWord8

eqWord8 = ..
{-# NOINLINE [1] eqWord8 #-}

{-# RULES
"FPS specialise break (x==)" forall x.
    break (x `eqWord8`) = breakByte x
  #-}
</haskell>


=== Rules and method sharing ===


GHC by default instantiates overloaded methods by partially applying the original overloaded identifier. This facilitates sharing of multiple method instances with one global definition. However, since a new function name is created during this process, rules matching the original names will not fire. Here is an example from <tt>Control.Arrow</tt>:


<haskell>
class Arrow a where
    arr   :: (b -> c) -> a b c
    first :: a b c -> a (b,d) (c,d)
    (>>>) :: a b c -> a c d -> a b d

{-# RULES
"compose/arr" forall f g . arr f >>> arr g = arr (f >>> g)
"first/arr"   forall f .   first (arr f) = arr (first f)
...
  #-}
</haskell>

Consider an instance of an arrow and some code on which the rules above should fire:


<haskell>
newtype SF a b = SF ([a] -> [b])

instance Arrow SF where
    arr f = SF (map f)
    ...

foo :: SF (Int,Int) (Int,Int)
foo = first (arr (+1)) >>> first (arr (+2) >>> arr (+3))
</haskell>

GHC would generate intermediate code like:


<haskell>
dsf :: Arrow SF
dsf = ...

first_1 = Control.Arrow.first SF dsf
arr_1   = Control.Arrow.arr   SF dsf

foo = first_1 (arr_1 (+1)) ...etc...
</haskell>

Due to the introduction of <tt>first_1</tt> and <tt>arr_1</tt>, the rules no longer match since the names have changed.


The solution is to switch off sharing with the <tt>-fno-method-sharing</tt> flag.


=== Coexistence of fusion frameworks ===

I'd like to use my own fusion framework on an existing data structure, because I want to experiment with it or because I have a specific application and I want to optimise the fusion framework for it. How can I disable the fusion rules shipped with that data structure, or at least defer them until the optimizer is finished with my rules?

Answer: second part of the question first. Asserting that your rules are used before the standard rules is not possible with [[GHC]] up to version 6.8; the current system is quite monolithic in this respect. It would be a nice application of a more sophisticated rule control system that allowed any number of simplifier phases, with explicit statements about which phase is to be entered after which other phase.

First part of the question: you may wrap the data structure in a <hask>newtype</hask> or, to be entirely safe, redefine the data structure. This means that several functions have to be lifted to the wrapped data type. This is tedious, but given that you are building an application-specific fusion framework, the set of basic functions will differ from that of the general data structure. You might have planned to make your data type distinct anyway, be it for the <hask>Arbitrary</hask> class of [[QuickCheck]]. Remember to attach a NOINLINE pragma to the wrapped functions, otherwise the compiler may unpack the wrappers and start fusion on the underlying data structure.
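A sketch of the wrapping approach, with invented names; <hask>MySeq</hask> wraps an ordinary list so that the list library's own fusion rules no longer see it:

<haskell>
newtype MySeq a = MySeq [a]

myMap :: (a -> b) -> MySeq a -> MySeq b
myMap f (MySeq xs) = MySeq (map f xs)
{-# NOINLINE myMap #-}   -- keep the wrapper opaque, so rules on plain 'map' cannot fire inside

{-# RULES
"myMap/myMap" forall f g x. myMap f (myMap g x) = myMap (f . g) x
  #-}
</haskell>

Your own rules, such as "myMap/myMap" above, then have the wrapped functions all to themselves.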

=== Interaction of INLINE and RULES ===


Rules can be seen as alternative function definitions. They are special in that they do not allow pattern matching on the left-hand side, but they do allow expressions as arguments, using variables bound by the <hask>forall</hask>. Since pattern matching can always be moved into a <hask>case</hask> expression, rules are in this sense the more flexible way to define functions. Having alternative definitions means that the compiler has to decide which one to use: the original function definition (by an explicit call or by [[inlining]]/[[unfolding]]) or one of the optimizer rules ("a rule fires").

Now the critical question is: How do inlining and rule application interact?

Inlining is always an option for the compiler, whether you use the INLINE pragma or not. The compiler measures some kind of size of a function and decides whether it is small enough to be inlined; the INLINE pragma effectively reduces this measured size, but inlining can still be omitted. In that sense RULES are applied more aggressively, because they pay no attention to function size.

It's interesting to note that declaring a function as INLINE disables fusion inside the function's body. E.g., you can expect that
<haskell>
doubleMap f g = map f . map g
</haskell>
is fused to
<haskell>
doubleMap f g = map (f . g)
</haskell>
However, if you add
<haskell>
{-# INLINE doubleMap #-}
</haskell>
then the function definition is not fused to <hask>map (f . g)</hask>. The compiler expects that the function will never be called as is, and thus skips fusion. That is, you cannot control the application order of rules by wrapping expressions that shall be fused first in function definitions.

However, since the compiler may decide not to inline, it may leave you with a call to the unoptimised <hask>doubleMap</hask>. You can prevent this, e.g. by:

<haskell>
{-# NOINLINE doubleMap #-}
doubleMap f g = map f . map g -- will be fused

{-# RULES
"doubleMap" forall f g. doubleMap f g = map f . map g
  #-}
</haskell>

This makes sure that the right hand side of <hask>doubleMap</hask> will be optimised for those cases when the rule doesn't fire, e.g. when <hask>doubleMap</hask> is applied to fewer than two arguments.

=== Interaction of SPECIALISE and INLINE ===

<hask>SPECIALISE</hask> pragmas are also a kind of rule: calls to a function at a specific [[type class dictionary]] are replaced by calls to a version of the function instantiated at that type. If you want a function to be used the inlined way, it may be a bad idea to add a <hask>SPECIALISE</hask> pragma, since this will replace a call to the function by a call to a specialised version instead of inlining it.
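For example (an invented polymorphic function):

<haskell>
genericSum :: Num a => [a] -> a
genericSum = foldr (+) 0

-- Generates a version of genericSum specialised to Int, plus a rule
-- rewriting suitably-typed calls to use it instead of the generic code.
{-# SPECIALISE genericSum :: [Int] -> Int #-}
</haskell>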


== Future of rules in GHC ==


GHC has much too rigid a notion of phases up to version 6.8. There are precisely 3, namely 2 then 1 then 0, and that does not give enough control. Really we should let you give arbitrary names to phases, express constraints (A must be before B), and run a constraint solver to map phase names to a linear ordering.

The current system is horribly non-modular. (See the Haskell-Cafe thread ''Properties of optimizer rule application?''.)