Removing Impediments to Loop Fusion through Code Transformations

Bob Blainey, Christopher Barton, Jos&eacute Nelson Amaral

Loop fusion is a common optimization technique that takes several
loops and combines them into a single large loop. Most of the existing
work on loop fusion concentrates on the heuristics required to
optimize an objective function, such as data reuse or creation of
instruction level parallelism opportunities. Often, however, the code
provided to a compiler has only small sets of loops that are control
flow equivalent, normalized, have the same iteration count, are
adjacent, and have no fusion-preventing dependences. This paper
focuses on code transformations that create more opportunities for
loop fusion in the IBM\reg XL compiler suite that generates code for
the IBM family of PowerPC\reg processors. In this compiler an
objective function is used at the loop distributor to decide which
portions of a loop should remain in the same loop nest and which
portions should be redistributed. Our algorithm focuses on eliminating
conditions that prevent loop fusion. By generating maximal fusion our
algorithm increases the scope of later transformations. We tested our
improved code generator in an IBM pSeries\tm 690 machine equipped with
a POWER4\tm processor using the SPEC CPU2000 benchmark suite. Our
improvements to loop fusion resulted in three times as many loops
fused in a subset of CFP2000 benchmarks, and four times as many for a
subset of CINT2000 benchmarks.