Instruction scheduling is a critical phase of compilation for extracting
large amounts of ILP from a program. In this work we present the status
of the implementation of an architecture-independent, aggressive global
instruction scheduler based on treegions. We also present a technique for
efficiently tail duplicating treegions that considers the tradeoff between
code size and ILP. The implementation currently resides on the
sched-treegion-branch.

A treegion is a non-linear, single-entry, multiple-exit region of code
containing basic blocks that constitute a subgraph of the control-flow-graph
(CFG). Building large regions is critical for enabling a compiler to
exploit large amounts of parallelism through speculation. Unlike other
region formation algorithms, such as traces and superblocks, treegions take
into account multiple execution paths, producing larger regions and more
opportunities for speculation. In addition, treegions do not require special
architectural features (e.g., predication for hyperblocks) for region formation.
GCC currently supports both linear regions, in the form of superblocks (tracer.c)
and extended basic blocks (sched-ebb.c), as well as non-linear regions
(sched-rgn.c), which are limited to loop-free procedures and reducible inner
loops. Unlike superblocks and extended basic blocks, treegions are formed from
the CFG alone and do not require profile information. Further, treegion
formation typically results in much larger regions than the current region
formation implementation.

Natural treegion formation (i.e., treegions formed without tail duplication)
begins at the entry point of a procedure, which forms the root of a new
treegion. Starting at this root, the CFG is traversed and successor basic
blocks are absorbed into the treegion if they are not merge points (i.e.,
blocks with multiple predecessor edges). Once all possible blocks have been added
to the treegion, the leaf nodes, all of which are merge points, are added to
a saplings list. These saplings form the roots of other treegions. For each
sapling the same process is applied until all basic blocks in the CFG have been
consumed. Treegion formation requires only a single pass over the CFG. Treegions
also simplify the calculation of dominators, since within a treegion every basic
block dominates all of the blocks below it in the tree.
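
The formation steps above can be sketched in a few lines. This is only an
illustration, not the code on the branch: the CFG representation (a dictionary
mapping each block name to its list of successors) and the name
form_treegions are assumptions made for the example.

```python
from collections import deque


def form_treegions(succs, entry):
    """Partition a CFG into natural treegions (no tail duplication).

    succs is a hypothetical CFG representation: a dict mapping each basic
    block to the list of its successor blocks. Returns a list of treegions,
    each a list of blocks with the root first.
    """
    # Count predecessor edges so merge points can be recognized.
    pred_count = {}
    for block, targets in succs.items():
        pred_count.setdefault(block, 0)
        for target in targets:
            pred_count[target] = pred_count.get(target, 0) + 1

    treegions = []
    assigned = set()
    saplings = deque([entry])  # roots of treegions still to be formed

    while saplings:
        root = saplings.popleft()
        if root in assigned:
            continue  # a merge point can be queued once per incoming edge
        region = [root]
        assigned.add(root)
        work = [root]
        while work:
            block = work.pop()
            for succ in succs.get(block, []):
                if succ in assigned:
                    continue
                if pred_count[succ] > 1:
                    # Merge point: it becomes the root of a later treegion.
                    saplings.append(succ)
                else:
                    # Single predecessor: absorb into the current treegion.
                    region.append(succ)
                    assigned.add(succ)
                    work.append(succ)
        treegions.append(region)
    return treegions
```

For a diamond-shaped CFG (A branches to B and C, which both flow into D), the
blocks A, B and C form one treegion and the merge point D starts the next.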

In this section we discuss our method for efficient tail duplication, with
treegions being the unit of duplication. Our heuristic considers the increase
in code size relative to the increase in ILP from tail duplication. This metric,
referred to as the Instantaneous Code Size Efficiency (ICSE), is defined as
the change in IPC relative to the increase in code size after tail duplication.

To calculate the ICSE of a tail duplicate candidate the IPC of a region must
be known at compile time. Since this information is not available, we have
developed a heuristic to calculate the estimated execution time of a treegion.
We define the estimated execution time of a multi-path treegion as the sum of
the expected execution times of the paths through the treegion, each weighted by
its execution frequency. The execution frequency of each path is
determined through profiling or other heuristics. The expected execution time
of any path is the maximum of the data dependence bound and the resource bound.
The data dependence bound is the height of the data dependence graph (DDG) for
a given treegion, counting only true data dependences. The resource
bound is computed as the number of instructions in the treegion divided by the
issue width of the target machine.
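
As a rough illustration of this estimate, the sketch below computes both bounds
and the frequency-weighted sum. Several simplifications are assumptions of the
example, not statements about the implementation: every instruction has unit
latency, both bounds are evaluated per path, and the names dependence_height,
path_time and treegion_time are hypothetical.

```python
def dependence_height(instrs, true_deps):
    """Length of the longest chain of true dependences among instrs.

    true_deps maps an instruction to the instructions it truly depends on.
    Unit latency is assumed, and the dependence graph must be acyclic.
    """
    members = set(instrs)
    memo = {}

    def height(instr):
        if instr not in memo:
            producers = [p for p in true_deps.get(instr, ()) if p in members]
            memo[instr] = 1 + max((height(p) for p in producers), default=0)
        return memo[instr]

    return max((height(i) for i in instrs), default=0)


def path_time(instrs, true_deps, issue_width):
    """Expected time of one path: max of dependence and resource bounds."""
    resource_bound = len(instrs) / issue_width
    return max(dependence_height(instrs, true_deps), resource_bound)


def treegion_time(paths, true_deps, issue_width):
    """Sum of path times, each weighted by the path's execution frequency.

    paths is a list of (frequency, instructions) pairs.
    """
    return sum(freq * path_time(instrs, true_deps, issue_width)
               for freq, instrs in paths)
```

A path with a long dependence chain is limited by the dependence bound, while a
path with many independent instructions is limited by the issue width.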

The tail duplication process begins by calculating the ICSE for each tail
duplication candidate. Each control edge between a parent and child treegion
is a potential candidate with the child being the target for duplication.
After the ICSE has been calculated for each candidate, the best candidate is
chosen for tail duplication. After duplication, the affected treegions are
re-formed and the candidate list is updated. Tail duplication continues until
either no more candidates exist or no more candidates are above the ICSE threshold.
Discussion of this optimal threshold can be found in [1]. Under code-size or
compile-time constraints, treegion size may also be limited by the number of
basic blocks and/or the number of instructions contained within the treegion.
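
The greedy selection loop described above might look like the following
sketch. It is not the implementation on the branch: the IPC estimates are taken
as given inputs, and the names icse, select_tail_duplicates, evaluate and
apply_duplication are hypothetical. The caller supplies evaluate (the ICSE of a
candidate) and apply_duplication (which performs the duplication, re-forms the
affected treegions and returns the updated candidate list).

```python
def icse(ipc_before, ipc_after, size_before, size_after):
    """Change in estimated IPC per unit of added code size."""
    growth = size_after - size_before
    if growth <= 0:
        raise ValueError("tail duplication is expected to add code")
    return (ipc_after - ipc_before) / growth


def select_tail_duplicates(candidates, evaluate, apply_duplication, threshold):
    """Repeatedly duplicate the best candidate until none beats the threshold.

    candidates: (parent, child) treegion edges.
    evaluate: hypothetical callback returning the ICSE of a candidate.
    apply_duplication: hypothetical callback that duplicates the child and
        returns the refreshed candidate list.
    """
    chosen = []
    candidates = list(candidates)
    while candidates:
        best = max(candidates, key=evaluate)
        if evaluate(best) <= threshold:
            break  # no remaining candidate is above the ICSE threshold
        chosen.append(best)
        candidates = apply_duplication(best, candidates)
    return chosen
```

Limits on the number of basic blocks or instructions per treegion could be
enforced inside apply_duplication by dropping candidates that would exceed them.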

Due to the acyclic nature of treegions, the Haifa scheduler does not
require any modifications to schedule treegions. We do, however, have a
number of enhancements currently in development to improve the performance
of treegion scheduling.

The goal of Tree Traversal Scheduling (TTS) is to speed up every execution
path through the treegion. This is accomplished by prioritizing speculative
instructions from different paths which compete for limited resources. Profile
information is used to prioritize the scheduling of basic blocks within a
treegion. Tree traversal scheduling consists of two steps: (1) construction of
the control/data dependence graph to perform instruction ordering and
(2) scheduling of the instructions in the treegion.

For each treegion, instructions are prioritized based on (a) execution frequency,
(b) an exit count heuristic to resolve ties from (a), and (c) data dependence height
to resolve ties from (b). Heuristic (a) gives priority to the most frequently
executed path. Heuristic (b) gives priority to instructions that help more exits,
which has the potential to make more speculative instructions ready for execution.
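
The three-level tie-breaking can be expressed as a lexicographic sort key. The
sketch below is illustrative only: the Instr record and its field names are
hypothetical, and it assumes that a larger value wins at every level (in
particular, that a greater dependence height means higher priority).

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical per-instruction record used only for this example."""
    name: str
    exec_freq: float   # (a) execution frequency of the instruction's path
    exit_count: int    # (b) number of treegion exits the instruction helps
    dep_height: int    # (c) data dependence height


def priority_key(instr):
    # Tuples compare element by element, so (b) only matters when (a) ties
    # and (c) only matters when (b) ties. Values are negated so that larger
    # values sort first.
    return (-instr.exec_freq, -instr.exit_count, -instr.dep_height)


def prioritize(instrs):
    """Return the instructions ordered from highest to lowest priority."""
    return sorted(instrs, key=priority_key)
```

Because the key is a tuple, adding or reordering heuristics only requires
changing priority_key.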

The algorithm for treegion scheduling is as follows: (1) For a treegion, sort the
basic blocks in depth-first traversal order, visiting the child block with the
highest execution frequency first. (2) Begin list scheduling at the root basic
block. (3) While scheduling a basic block, consider for speculation the
instructions in the blocks it dominates. (4) Repeat step 3 until all basic
blocks in the treegion have been scheduled.
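
Step 1 of this algorithm, the frequency-guided depth-first ordering, can be
sketched as below. Only the block ordering is shown, not the list scheduling or
speculation of steps 2 to 4. The tree representation (a dictionary from a block
to its children within the treegion) and the name order_blocks are assumptions
of the example.

```python
def order_blocks(children, freq, root):
    """Depth-first order of a treegion's blocks, hottest child first.

    children maps a block to its child blocks inside the treegion;
    freq maps a block to its execution frequency. Both are hypothetical
    representations chosen for this example.
    """
    order = []
    stack = [root]
    while stack:
        block = stack.pop()
        order.append(block)
        # Push children in ascending frequency so that the most frequently
        # executed child is on top of the stack and is visited next.
        for child in sorted(children.get(block, []), key=lambda c: freq[c]):
            stack.append(child)
    return order
```

The scheduler would then visit blocks in the returned order, starting at the
root.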

Copyright (C)
Free Software Foundation, Inc.
Verbatim copying and distribution of this entire article is
permitted in any medium, provided this notice is preserved.