[comp.compilers] code generator-generators

markhall@pyrps5.pyramid.com (Mark Hall) (11/30/88)

I'm interested in knowing whether any of the common code
generator-generators (Glanville, Cattell, Ganapathi, etc) are used in
any production compilers.  Assuming they are not, is this just a
manifestation of academia vs. industry, or are there real reasons for
not using them in (*ahem*) real compilers?

Pointers and opinions welcome.  Particularly from those with experience
with more than one CGG.

-Mark Hall (smart mailer): markhall@pyramid.pyramid.com
	{amdahl|decwrl|sun|seismo|lll-lcc}!pyramid!markhall
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

grow@druhi.att.com (12/02/88)

In article <2964@ima.ima.isc.com>, markhall@pyrps5.pyramid.com (Mark Hall) writes:
> I'm interested in knowing whether any of the common code
> generator-generators (Glanville, Cattell, Ganapathi, etc) are used in
> any production compilers.  ...

The AIE Ada compiler, done by Intermetrics Incorporated of Cambridge, Mass.,
used the PQCC (Production Quality Compiler Compiler) technology.  This
compiler does very sexy instruction selection and register allocation.
Anybody who mentions this compiler is liable to say that it was a real dog
as far as its speed of compilation was concerned.  This is probably true,
but the blame should be placed on the intermediate representation (DIANA,
to be exact) rather than on the code generator technology!

The ALS Ada compiler, done by the SofTech Corporation of Waltham, Mass.,
used the Graham-Glanville technology.  This compiler's code wasn't nearly
as sexy as the code produced by the AIE, but it was supposed to be fairly
good.  The same thing said above about the AIE's speed of compilation
applies here as well.  Note that I believe the ALS compiler is still under
development, so any comments made about it must be taken as relevant only
to its early history!

                                 -- Gary Oblock
                                    Compiler Consultant to Bell Labs
                                    att!druhi!grow
                                    (303)538-4169

knighten@encore.uucp (Bob Knighten) (12/06/88)

In article <2964@ima.ima.isc.com>, markhall@pyrps5.pyramid.com (Mark Hall) writes:
> I'm interested in knowing whether any of the common code
> generator-generators (Glanville, Cattell, Ganapathi, etc) are used in
> any production compilers.  ...

I've worked tangentially on two code generators built with CGGs and designed
a third, so I guess I've something to offer.

The first was the SofTech ALS Ada compiler for the Vax.  I worked on this
towards the end of the implementation phase.  This was a fairly direct
implementation of the technique described in Glanville's thesis.  Graham was
actually a consultant during the early design stages for this compiler.
Glanville didn't really address global optimization/register allocation
issues; the ALS compiler dealt with this by doing on-the-fly register
allocation based on lifetime information produced by the global optimizer,
which worked on a largely machine-independent expanded DIANA tree.  The parser
generator was written by a SofTech engineer.  It was written in Ada, the
parsers were in Ada and the parser tables were Ada aggregates.  A minor but
important change in the language accepted by the parser (compared to
Glanville's design) was to factor addressing modes.  This was crucial to bring
the size of the parse tables for the VAX down to merely LARGE.
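
To make the factoring concrete, here is a small sketch in C of the
difference.  The rules are hypothetical, VAX-flavored ones of my own
invention, not the actual ALS grammar; each table entry pairs a
prefix-linearized pattern with an instruction template.

    struct rule { const char *pattern; const char *emit; };

    /* Unfactored: each opcode needs one rule for every combination of
       operand addressing modes, roughly O(ops * modes^2) rules. */
    struct rule unfactored[] = {
        { "r := + r r",      "addl3 r,r,r"   },
        { "r := + r ind.r",  "addl3 (r),r,r" },
        { "r := + ind.r r",  "addl3 r,(r),r" },
        /* ... one entry for every remaining mode combination ... */
    };

    /* Factored: the addressing modes collapse into one nonterminal "a",
       so the table grows as roughly O(ops + modes) instead. */
    struct rule factored[] = {
        { "a := r",          "r"    },      /* register direct   */
        { "a := ind.r",      "(r)"  },      /* register deferred */
        { "a := disp.c.r",   "c(r)" },      /* displacement      */
        { "r := + a a",      "addl3 a,a,r" },
        { "r := * a a",      "mull3 a,a,r" },
    };

With over a dozen VAX addressing modes and a few hundred opcodes, that
quadratic-to-linear change is what brings the tables down from impossible
to merely LARGE.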

This compiler was retargeted to various of the Intel microprocessors (8086,
80286, ?) and to a Navy computer (AN/UYK-?).  The local code generation was
acceptable everywhere, but for the VAX and the AN/UYK the overall code
quality was poor, because the register management scheme really did not work
at all well.  The code quality for the Intel machines was fairly good, partly
because register management is just not very important on those machines and
partly because more tuning was done on those compilers.

The SofTech code generator was redesigned (by a group headed by me and a
colleague) for the NTT/DIPS machine.  This is a variation on the IBM 370
architecture.  Instead of using the straight LR approach of Graham and
Glanville, we went to an affix grammar approach.  This is similar to, but
not the same as, Ganapathi's approach.  For this we contracted with MetaWare
to
modify their TWS to produce an affix grammar parser.  The actual language used
is described in the MetaWare TWS manual (such as it is).  It shows a notable
Ada influence.  Again the parser tables are Ada aggregates, and the parser
skeleton is written in Ada.
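
To give the flavor of the affix approach, here is a minimal sketch in C;
the rule encoding and names are entirely my own invention, not the
MetaWare TWS language.  An affix grammar rule is a context-free rule plus
constraints over affixes (attributes), and the parser may reduce by the
rule only when the constraints hold.

    #include <stdbool.h>

    enum regclass { ARITH, ADDR };      /* DIPS has both register files */

    struct affix { enum regclass class; int size; };

    /* Affix constraint for a hypothetical rule
           reg ::= "+" reg reg
       both operands must be same-size values in arithmetic registers. */
    static bool add_rule_ok(struct affix a, struct affix b)
    {
        return a.class == ARITH && b.class == ARITH && a.size == b.size;
    }

Where a plain Graham-Glanville grammar must encode such distinctions by
multiplying out nonterminals (arith_reg_4, arith_reg_8, addr_reg, and so
on), the affix grammar keeps one rule and moves the distinctions into the
predicate.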

We also changed a number of aspects of the code generation/register allocation
process.  The machine description used was not for the actual machine, but
rather for a similar virtual machine having only three-operand instructions
and an infinite supply of all types of registers.  [In contrast to the IBM
machine, the DIPS has separate arithmetic and address registers.]  Following
code generation, we then do flow analysis, replacement of three-operand
instructions by two-operand instructions, and register binding using a
register-coloring approach.
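
As a sketch of the lowering step (with a hypothetical IR and emit
routines of my own, not the actual SofTech code), the replacement of a
three-operand virtual instruction looks roughly like this:

    #include <stdio.h>

    /* One virtual instruction: dst := src1 op src2.  Because the virtual
       machine has an infinite register supply, dst is always a fresh
       virtual register and never aliases src1 or src2. */
    struct insn { const char *op; int dst, src1, src2; };

    static void emit_move(int d, int s) { printf("\tMOVE\tv%d,v%d\n", d, s); }
    static void emit2(const char *op, int d, int s)
    {
        printf("\t%s\tv%d,v%d\n", op, d, s);
    }

    /* Two-operand form: dst := src1, then dst := dst op src2.  The
       coloring pass later tries to give dst and src1 the same color,
       which turns the MOVE into a deletable self-move. */
    static void lower(struct insn i)
    {
        emit_move(i.dst, i.src1);
        emit2(i.op, i.dst, i.src2);
    }

Flow analysis supplies the liveness information, and the coloring pass
then folds the unbounded virtual register set onto the separate
arithmetic and address register files.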

I left SofTech before the implementation was complete, but I've been assured
by friends who are still there that the quality of code generated is quite
good.  We actually had a contractual commitment to produce code within some
fixed percentage of the execution speed of the best hand-assembled code for
a set of benchmarks, and we met that goal.

All of these compilers are quite slow.  This is primarily because the front
end and global optimizer are slow, which just allowed the code generators to
slip by.  They are slow code generators, but this seems to reflect the fact
that speed was not a priority rather than any inherent inability to be fast.

The most recent CGG based compiler I was involved with is the Retargetable
Back End at Prime Computer.  Aspects of that project are discussed in two
recent papers:

Prescott K. Turner, "Up-down parsing with prefix grammars",
SIGPLAN Notices, Vol. 21 No. 12 (Dec. 1986) pp. 167-174

David Spector and Prescott K. Turner.  "Limitations of Graham-Glanville style
code generation", SIGPLAN Notices, Vol. 22 No. 2 (Feb. 1987) pp. 100-108

The major change here was that Scott Turner came up with a different parsing
method.  I'm told that Weisgerber and Wilhelm at the Universität des
Saarlandes have devised a similar tree matching code generator.

Two compilers were built at Prime using this technique for code generation.
They were used internally, but never released due to a change in corporate
plans.  The speed was not good, and the overall code quality was poor.
Again the local code quality was quite good, but nothing was done to take
advantage of global information.  In particular, the problem of integrating
register management well had not been solved when the project was cancelled.

In <2971@ima.ima.isc.com>, grow@druhi.att.com (Gary Oblock) mentions the
Intermetrics AIE Ada compiler as being based on PQCC technology.  Most of the
recent Intermetrics compilers have been inspired by the PQCC project, and the
Ada compiler came much closer to using the full PQCC approach, but it's
probably too strong to say it was based on the PQCC technology.

This also illustrates why the term "Code generator generator" is ill-advised.
The original hope of the PQCC project was to largely automate the production
of compilers, using language and machine specifications as input.  At
Intermetrics a large suite of tools has been built to help automate the
building of compilers, but the process has not gotten terribly much faster or
cheaper.  Similarly at SofTech, the reason for going with the Graham-Glanville
technique was to produce retargetable compilers.  But the process of
retargeting these compilers has not proved particularly easier than building
hand-crafted code generators from well-defined specifications.  The reason for
this is that local code selection - the part that can be done by parsing, tree
matching, or what have you - is not the hardest part of code generation, so
there remains a substantial hand-crafted part.

There are other production compilers based on CGGs:

Rodney Farrow, "Experience with an attribute grammar-based compiler",
SIGPLAN Notices, 1982

This is a report on Intel's Pascal-86 compiler.

Hans-Stephan Jansohn, Rudolf Landwehr, Jochen Hayek and Michael Thatner,
"Generating MC68000 code for Ada", SIGPC Notes, Vol. 6 No. 2 (1983) pp. 81-87
(1983 ACM Conference on Personal and Small Computers)

This is a discussion of the University of Karlsruhe Ada compiler which is
based on the GAG (Generator based on Attribute Grammars) and CGSS (Code
Generator Synthesis System).  At least the VAX version of this compiler is
available from a small German company, Systeam A.G.

John Yates and Robert Schwartz, "Dynamic programming and industrial-strength
instruction selection:  Code generation by tiring, but not exhaustive, search",
SIGPLAN Notices, Vol 23 No. 10 (October 1988) pp. 131-140 

This describes the new code generators being used in the Apollo compiler for
their new RISC machines.  It is based on bottom-up tree matching and was
implemented using LEX, YACC and C code.
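
For those who haven't seen the technique, here is a toy sketch in C of
dynamic-programming instruction selection over an expression tree.  The
two patterns are invented for illustration and have nothing to do with
Apollo's actual machine description.

    #include <limits.h>

    enum op { CONST, ADD, MUL };

    struct node {
        enum op op;
        struct node *left, *right;
        int cost;          /* min cost of computing node into a register */
        const char *rule;  /* winning pattern, consulted when emitting   */
    };

    static void select_insns(struct node *n)
    {
        if (n->op == CONST) { n->cost = 1; n->rule = "LI   r,c"; return; }
        select_insns(n->left);
        select_insns(n->right);
        /* Pattern 1: a generic register-register instruction covering
           just this node. */
        n->cost = n->left->cost + n->right->cost + 1;
        n->rule = "OP   r,r,r";
        /* Pattern 2: multiply by a constant folds into an immediate
           form, covering two tree nodes with a single instruction. */
        if (n->op == MUL && n->right->op == CONST) {
            int c = n->left->cost + 1;
            if (c < n->cost) { n->cost = c; n->rule = "MULI r,r,c"; }
        }
    }

Because every subtree's optimal cover is known by the time its parent is
considered, the search is bottom-up and linear in the size of the tree -
tiring, perhaps, but not exhaustive.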

Finally I mention that COMPASS has an automated code generator production
system (PEARL?) based on a lexical analysis technique.  This is discussed in a
COMPASS technical report by Kathleen Knobe and Joan Lukas.

Based on my experience, both direct and indirect, I've come to feel that the
value of machine-generated code generators (or, more likely, machine-generated
code selectors) is in providing a framework for generating AND MAINTAINING
many different compilers.  And this is not because making one new compiler is
much faster this way than in a less automated fashion, but because it enforces
a discipline and uniformity which are very hard to achieve without automation.

I would be very interested in hearing other viewpoints on this question.

Bob Knighten                        
Encore Computer Corp.
257 Cedar Hill St.        
Marlborough, MA 02172   
(508) 460-0500 ext. 2626

Arpanet:  knighten@multimax.arpa
Usenet:   {bu-cs,decvax,necntc,talcott}!encore!knighten