In article <2964@ima.ima.isc.com>, markhall@pyrps5.pyramid.com (Mark Hall) writes:> I'm interested in knowing whether any of the common code> generator-generators (Glanville, Cattell, Ganapathi, etc) are used in> any production compilers. ...

I've tangentially worked on two and designed a third code generator using
CGGs, so I guess I've something to offer.

The first was the SofTech ALS Ada compiler for the Vax. I worked on this
towards the end of the implementation phase. This was a fairly direct
implementation of the technique described in Glanville's thesis. Graham was
actually a consultant during the early design stages for this compiler.
Glanville didn't really address global optimization/register allocation
issues and the ALS compiler dealt with this by doing on-the-fly register
allocation based on lifetime information produced by the global optimizer
which worked on a largely machine independent expanded DIANA tree. The parser
generator was written by a SofTech engineer. It was written in Ada, the
parsers were in Ada and the parser tables were Ada aggregates. A minor but
important change in the language accepted by the parser (compared to
Glanville's design) was to factor addressing modes. This was crucial to bring
the size of the parse tables for the VAX down to merely LARGE.

This compiler was retargeted to various of the Intel microchips (8086, 80286,
?) and to a navy computer (AN/UYK-?). The local code generation was
acceptable, but for the VAX and AN/UYK the code quality was poor. The reason
for this was that the register management scheme really did not work at all
well. The code quality for the Intel machines was fairly good. This was
partly because register management is just not very important and partly
because more tuning was done on the compilers.

The SofTech code generator was redesigned (by a group headed by me and a
colleague) for the NTT/DIPS machine. This is a variation on the IBM 370
architecture. Instead of using the straight LR approach of Graham and
Glanville, we went to an affix grammar approach. This is similar too, but not
the same as Ganapathi's approach. For this we contracted with MetaWare to
modify their TWS to produce an affix grammar parser. The actual language used
is described in the MetaWare TWS manual (such as it is). It shows a notable
Ada influence. Again the parser tables are Ada aggregates, and the parser
skeleton is written in Ada.

We also changed a number of aspects of the code generation/register allocation
process. The machine description used was not for the actual machine, but
rather for a similar virtual machine having all three operand instructions and
an infinite supply of all types of registers. [In contrast to the IBM
machine, the DIPS has separate arithmetic and address registers.] Following
code generation, we then do flow analysis, replacement of three operand
instructions by two operand instructions and register binding using a register
coloring approach.

I left SofTech before the implementation was complete, but I've been assured
by friends who are still there that the quality of code generated is quite
good. We actually had a contractual commitment to produce code within some
fixed percentage of the execution speed of the best hand assembled code for a
set of benchmarks and we met that goal.

All of these compilers are quite slow. This was primarily because the front
end and global optimizer are slow, but this just allowed the code generator to
slip by. These are slow code generators, but this seems to be a reflection
that speed was not important rather than an inability to be fast.

The most recent CGG based compiler I was involved with is the Retargetable
Back End at Prime Computer. Aspects of that project are discussed in two
recent papers:

The major change here was that Scott Turner came up with a different parsing
method. I'm told that Weisgerber and Wilhelm at the Universitat des
Saarlandes have devised a similar tree matching code generator.

Two compilers were built at Prime using this technique for code generation.
They were used internally, but never released due to a change in corporate
plans. The speed was not good and the code quality was poor. Again the local
code quality was quite good, but nothing was done take advantage of global
information. In particular the problem of integrating register management
well had not been solved when the project was cancelled.

As mentioned in <2971@ima.ima.isc.com> grow@druhi.att.com (Gary Oblock)
mentions the Intermetrics AIE Ada compiler as being based on PQCC technology.
Most of the recent Intermetrics compilers have been inspired by the PQCC
project and the Ada compiler came much closer to using the full PQCC approach,
but it's probably too strong to say it was based on the PQCC technology.

This also illustrates why the term "Code generator generator" is ill-advised.
The original hope of the PQCC project was to largely automate the production
of compilers using as input language and machine specifications. At
Intermetrics a large suite of tools have been built to help automate the
building of compilers, but the process has not gotten terribly much faster or
cheaper. Similarly at SofTech, the reason for going with the Graham-Glanville
technique was to produce retargetable compilers. But the process of
retargetting these compilers has not proved particularly easier than building
hand crafted code generators from well-defined specifications. The reason for
this is that local code selection - the part that can be done by parsing, tree
matching, what have you - is not the hardest part of code generation, so there
remains a substantial hand crafted part.

This is a discussion of the University of Karlsruhe Ada compiler which is
based on the GAG (Generator based on Attribute Grammars) and CGSS (Code
Generator Synthesis System). At least the VAX version of this compiler is
available from a small German company, Systeam A.G.

This describes the new code generators being used in the Apollo compiler for
their new RISC machines. It is based on bottom-up tree matching and was
implemented using LEX, YACC and C code.

Finally I mention that COMPASS has an automated code generator production
system (PEARL?) based on a lexical analysis technique. This is discussed in a
COMPASS technical report by Kathleen Knobe and Joan Lukas.

Based on my experience, both direct and indirect, I've come to feel that the
value of machine generated code generators (or, more likely, machine generated
code selectors) is in providing a framework for generating AND MAINTAINING
many different compilers. And this not because making one new compiler is
much faster this way than in a less automated fashion, but because this
enforces a discipline and uniformity which is very hard to achieve without
automation.

I would be very interested in hearing other viewpoints on this question.