Jason Lee Eckhardt wrote:>> As for gcc. Vladimir Makarov has implemented a FA-based hazard detection> module for the gcc instruction scheduler and an accompanying pipeline> description language (as an extension of the current machine desc. facility).> The code currently is in a development branch at gcc.gnu.org. It is to be> merged into the main sources when the code review process is done. Vlad> and I have both implemented production schedulers with his FA-based code.

Hi, Jason.

It is in the main line of the gcc repository now. So some gcc ports
using the DFA scheduler will be probably in 3.2 release of gcc.

In brief, gcc DFA scheduler is gcc Haifa interblock scheduler with
recognizing pipeline hazards based on DFA (deterministic finite state
automaton). Modern RISC and VLIW processors issue insns when the data
are ready for given insn and the processor functional units are free
to execute the insn. A pipeline hazard recognizer aims to recognize
processor stalls because of busy functional units.

The DFA pipeline hazard recognizer uses DFA for this. Each state
encodes the current and planned future reservations of the functional
units. Arcs between states are marked by insns. If there is an
output arc marked by an insn in the automaton, then the insn can be
issued in given state (i.e. all necessary functional units are free in
given processor state). The destination state encodes the functional
unit reservation after issuing the insn. There is always arc marked
by advancing processor cycle from each state. So DFA describing a
processor pipelines is closely connected (there is a way from each its
state to any other state).

Recognizing pipeline hazards based on the DFA is much faster than
other approaches. The more functional units are used in a processor
pipeline description, the faster DFA based pipeline hazard recognizer
is. It is really important. E.g. more 15% of ia64 gcc time spent in
pipeline hazard recognizer and bundling. It is important to implement
insn schedulers trying more heuristics, more insn schedules
(e.g. Muchnik in his book about optimized compiler mentioned the
problem of inadequate insn scheduling heuristics for superscalar RISC
processors). Usage of DFA also simplifies implementation of insn
scheduler algorithms (e.g. you could easy to compare processor states
in kernel recognition software pipelining algorithms).

With the user point of view, gcc DFA description is a set of regular
expressions. Each expression describes a functional unit reservation
by an insn, e.g.

p1+div,div*10
(p1|p2)+(int1|int2)

The first expression could describe non pipelined division (because
the functional unit `div' is reserved during all insn execution) which
is issued into the 1st pipeline (p1). The second one describes an
integer insn which is issued into the 1st pipeline (p1) and, if it is
busy, into the 2nd pipeline and which is executed on the first integer
unit (int1) and, if it is busy, on the 2nd integer unit (int2).

Gcc DFA description is more powerful than the previous one and
permits more accurate descriptions. It is more important for
classical RISC and VLIW processor scheduling (e.g. some embedded
benchmarks for ultrasparc was sped up to 15%) and less important for
out-of-order and/or speculative RISC processors.

Gcc DFA pipeline hazard recognizer can be used for VLIW packing
(insn bundling) too. Although separation of insn scheduling and VLIW
packing tasks might do a big impact on code quality. Fortunately, you
could easy describe VLIW packing rules for the DFA scheduler too.

The gcc DFA pipeline hazard recognizer infrastructure has many new
features in comparison with other DFA approaches:

1. Usage of alternatives. You could describe that an insn reserves
these units or other units.

2. Possibility of describing NDFA. The processor itself is a
deterministic device. Therefore DFA is usually used to describe
it. It means that the first possible alternative of the unit
reservation is chosen in given processor state (see the examples
above). In NDFA, the alternative operator means that any
alternative can be chosen. So you could describe all possible
VLIW packing and (even inserting NOP insns for VLIW packing).

3. Exclusion, presence, and absence sets which permits to describe
easily VLIW packing and some complex cases for RISC processors.

4. Implementation of more complex queries (e.g. minimal time in
which given insn can be issued in given state) in the pipeline
hazard recognizer interface. You also could query unit
reservation in given state (the automaton minimization is not
possible in this case).

Actually, gcc could have the biggest benefits from such
infrastructure because it is the single compiler which is ported to
majority of the processors. Having a huge feedback (from Richard
Henderson, Jeff Law, Jason Eckhardt, David Miller, Jan Hubicka, Dan
Nicolaescu, David Edelhson and tens of other developers, sorry if I
did not name you) I see the advantage of open source model to design
and improve the infrastructure.