[comp.compilers] intermediate representation

napi@rangkom.MY (Mohd Hanafiah b. Abdullah) (08/07/90)

I would like to know what people think is the best Intermediate Representation
(IR) to be used for highly effective optimizations and code generation, and
it should be portable.  An IR is one of the most crucial considerations in
designing a competent compiler.

Examples of IRs that I know:
(1) Abstract-syntax-tree (looks like Scheme code)
(2) DAG
(3) Three address code
(4) P-code
(5) Stanford's U-code

I'm leaning more towards the "abstract-syntax-tree", since it is portable
(machine independent), readable, keeps high-level structures (e.g. loops)
visible for effective optimizations, and is language independent most of
the time.
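
Roughly, the contrast I have in mind, as a small C sketch (the struct and
field names are made up for illustration, not taken from any real compiler):

    /* An AST node: operators with children; the shape follows the source. */
    struct ast_node {
        enum { N_CONST, N_VAR, N_ADD, N_MUL, N_ASSIGN } op;
        struct ast_node *left, *right;   /* children; NULL for leaves      */
        int              value;          /* constant value or symbol index */
    };

    /* A three-address tuple: a flat instruction over virtual registers.   */
    struct tac_insn {
        enum { T_ADD, T_MUL, T_LOAD, T_STORE, T_COPY } op;
        int dst, src1, src2;             /* virtual register numbers       */
    };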

What do you think?

Please respond by email since I don't have online access to USENET
news.  Thanks.

Hanafiah
[But copy your messages to compilers if they're of general interest,
please. -John]
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus| world}!esegue.  Meta-mail to compilers-request@esegue.

briscoe-duke@CS.YALE.EDU (Duke Briscoe) (08/09/90)

You might be interested in the following PhD thesis:

Compilation by Program Transformation

Richard Kelsey
Yale University
May 1989

Abstract: This dissertation describes a simple compiler, based on concepts
from denotational semantics, that can be used to compile standard programming
languages and produces object code as efficient as that of production
compilers.  The compiler uses only source-to-source transformations.  The
transformations are performed on programs that have been translated into an
intermediate language resembling the lambda calculus.  The output of the
compiler, while still in the intermediate language, can be trivially
translated into machine code for the target machine.  The
compilation-by-transformation strategy is simple: the goal is to remove any
dependencies on the intermediate language semantics that the target machine
cannot implement directly.  Front-ends have been written for Pascal and
Basic, and the compiler produces code for the MC68020 microprocessor.

-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus| world}!esegue.  Meta-mail to compilers-request@esegue.

preston@rice.edu (Preston Briggs) (08/09/90)

In article <1990Aug07.153407.8877@esegue.segue.boston.ma.us> you write:
>I would like to know what people think is the best Intermediate Representation
>(IR) to be used for highly effective optimizations and code generation, and
>it should be portable.

>Examples of IRs that I know:
>(1) Abstract-syntax-tree (looks like Scheme code)
>(2) DAG
>(3) Three address code
>(4) P-code
>(5) Stanford's U-code

First, I'd classify (5) as an extension of (4) and I might group (2) with
(3).

I like low-level 3-address code (even lower level than assembly).  Several
optimizing compilers have been built using such a representation -- the best
known is the PL.8 compiler built by researchers at IBM.

I don't much like IL's based on stack machines (roughly 4 and 5), at least
for optimization purposes.  Optimization wants explicit registers to hold
intermediate results.

AST's seem too high-level for doing lots of optimization.  We want the
details of array accesses, etc. exposed to the optimizer.

Actually, I'll qualify this a little.  I believe there are many optimizations
that can be carried out most effectively on a high-level representation (for
example, those requiring dependence analysis) and many that should be carried
out on a low level representation (e.g., CSE elimination, strength
reduction).  This is the sort of approach we've taken locally.
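
For instance, as a rough sketch of my own (not from any particular
compiler), here is the sort of detail a low-level form exposes for a 2-D
array access:

    /* Source level: one opaque operation, which is all an AST records.  */
    int fetch(int a[][20], int i, int j)
    {
        return a[i][j];
    }

    /* Lowered by hand: each step of the address arithmetic is a separate
       operation the optimizer can common up, strength-reduce, or move.
       (Scaling by sizeof(int) is implicit in the C pointer arithmetic.)  */
    int fetch_lowered(const int *a, int i, int j)
    {
        int        t1 = i * 20;     /* row offset        */
        int        t2 = t1 + j;     /* element offset    */
        const int *t3 = a + t2;     /* element address   */
        return *t3;                 /* the load itself   */
    }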

Naturally, there are counter-examples to everything I've said.  AST based
compilers include Bliss-11, the PQCC based systems, and some of the Ada
efforts.  There's also Guy Steele's Rabbit compiler.  Fred Chow's thesis
describes an optimizing Pascal compiler based on (5).  I believe Chow's work
is also the basis for the MIPS compilers.  Some version of (4) is used in
Powell's Modula-2 compiler.

Preston
[Anybody tried program dependency graphs, per Ferrante et al. at IBM.  From
the articles I've seen, they look pretty neat. -John]
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus| world}!esegue.  Meta-mail to compilers-request@esegue.

mod@westford.ccur.com (508) (08/10/90)

In article <1990Aug07.153407.8877@esegue.segue.boston.ma.us> napi@rangkom.MY (Mohd Hanafiah b. Abdullah) writes:
>I would like to know what people think is the best Intermediate Representation
                            ...
>(1) Abstract-syntax-tree (looks like Scheme code)
                            ...
>I'm leaning more towards the "abstract-syntax-tree", since it is portable
>(machine independent), readable, keeps high-level structures (e.g. loops)
>visible for effective optimizations, and is language independent most of
>the time.
                            ...

I'm just a beginner at this and I recently built a "student" compiler in a
university Compiler Construction class.  The text and the instructor were
both presenting an LR(1)/tuple implementation coded in Pascal.  I pushed hard
for (and obtained) permission to instead use a Lex/Yacc/AST approach written
in C.  My amateur's opinion is probably not worth much here, but it seems
apparent to me that AST's are the easiest IR to manipulate, especially
considering the amount of information they can contain.  I didn't use DAG's
(they seem like an inspired twist on the notion of an AST) because I found
their (perceived) complexity a bit daunting.  Well, enough noise - now for my
question:

Why do you make several references to the "visual" attributes of an AST?  My
instructor and textbook (Fischer/LeBlanc) only touched lightly on AST's and I
had to infer a lot from that little bit.  Is there a standard textual
representation of AST's?  Mine only existed as data structures in memory,
although the Dragon book mentioned that there is a kind of postfix
representation.  Is that it?

Thanks,

Michael O'Donnell    (508)392-2915    home(508)251-7576    fax(508)692-8307
...!{harvard,uunet,petsd}!masscomp!mod, mod@westford.ccur.com
Concurrent Computer Corporation, 1 Technology Way, Westford, MA  01886
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus| world}!esegue.  Meta-mail to compilers-request@esegue.

grover@brahmand.Eng.Sun.COM (Vinod Grover) (08/10/90)

In article <1990Aug08.171640.13892@esegue.segue.boston.ma.us> Preston Briggs <preston@rice.edu> writes:
>AST's seem too high-level for doing lots of optimization.  We want the
>details of array accesses, etc. exposed to the optimizer.
>
>Actually, I'll qualify this a little.  I believe there are many optimizations
>that can be carried out most effectively on a high-level representation (for
>example, those requiring dependence analysis) and many that should be carried
>out on a low level representation (e.g., CSE elimination, strength
>reduction).  This is the sort of approach we've taken locally.

First of all, there is no reason why an AST-based IR cannot have low-level
features (such as array/memory accesses).  The Bliss-11 compiler had the '.'
(dot) operator for representing memory fetches.  One nice property of the
dot operator was that it could be used both in the lhs and the rhs position
of an assignment; e.g., a[i] would be represented as .(a + .i).  This made
a lot of optimizations quite simple and uniform; e.g., a memory access could
be treated as a CSE.
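
A rough C-level analogue of the point (purely illustrative): once the
address computation is an explicit subexpression, the same CSE machinery
that handles arithmetic handles the memory access:

    /* a[i] = a[i] + 1, with the address computation made explicit.
       The address expression (a + i) is an ordinary subexpression, so
       ordinary CSE shares it between the fetch and the store.          */
    void bump(int *a, int i)
    {
        int *t = a + i;     /* the common subexpression: one address    */
        *t = *t + 1;        /* used by both the load and the store      */
    }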

Second, as Bliss-11 showed, global CSE elimination can be done at a high
level.  Similarly, strength reduction isn't that hard at a high level.  Karl
Ottenstein showed, in an IEEE paper several years ago, that strength
reduction is easy to do on PDGs and can easily be extended to ASTs.  (I do
not have the reference handy but can post it if anyone is interested.)

Vinod Grover
Sun Microsystems
grover@sun.com
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus| world}!esegue.  Meta-mail to compilers-request@esegue.

preston@titan.rice.edu (Preston Briggs) (08/10/90)

I wrote:
>>AST's seem too high-level for doing lots of optimization.  We want the
>>details of array accesses, etc. exposed to the optimizer.

>>I believe there are many optimizations that can be carried out 
>>most effectively on a high-level representation (for example, 
>>those requiring dependence analysis) and many that should be 
>>carried out on a low level representation (e.g., CSE elimination, 
>>strength reduction).

grover@brahmand.Eng.Sun.COM (Vinod Grover) writes:
>First of all, there is no reason why an AST based IR cannot have low-level
>features (such as array/memory accesses).

>Second, as Bliss-11 showed, global CSE elimination can be done at a high
>level. Similarly, strength reduction isn't that hard at a high level. Karl
>Ottenstein showed, in an IEEE paper several years ago, that strength
>reduction is easy to do on PDGs and can easily be extended to ASTs. (I do
>not have the reference handy but can post it, if anyone is interested.)

I'm mostly presenting my "religious convictions" here; I agree that there
are many ways to skin a cat.  However, Ottenstein's paper serves as a
nice illustration.

He discusses strength reduction of induction variables in the context of
DO/FOR-style loops, with the iteration variable, the increment expressions,
and the limit tests all clearly marked by the header node.

A more general algorithm (say Cocke and Kennedy) would perform strength 
reduction over any loop (while loops, loops built out of if's and goto's), 
would discover all the induction variables (not just the loop control 
variable), and would make provision for linear function test replacement.  
Other algorithms extend partial redundancy elimination to discover strength 
reductions in linear regions (I've seen versions by Chow and by Dhamdhere).

The point is that people are seduced by the apparent simplicity of
analysis of ASTs.  Don't be fooled!  Without care, you'll have to write
routines to analyze DO loops, WHILE loops, REPEAT-UNTILs, LOOPs with
EXITs, etc...  I write routines that work with more traditional flow
graphs (directed graphs) -- all loops look the same to me.  Ditto for CASE
statements, IF statements, GOTO's, ad nauseum.
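
As a hedged sketch of my own (illustrative C, not compiler output): once
lowered to tests and gotos, a WHILE loop and a REPEAT-UNTIL hand the
optimizer essentially the same directed graph:

    /* Both functions sum a[0..n-1]; written with explicit tests and
       gotos, the while-loop and repeat-loop versions present a CFG-based
       optimizer with the same kind of flow graph.                       */
    int sum_while(const int *a, int n)
    {
        int s = 0, i = 0;
    top:
        if (!(i < n)) goto done;    /* loop test         */
        s += a[i++];                /* loop body         */
        goto top;                   /* back edge         */
    done:
        return s;
    }

    int sum_repeat(const int *a, int n)   /* assumes n > 0 */
    {
        int s = 0, i = 0;
    body:
        s += a[i++];                /* loop body         */
        if (i < n) goto body;       /* test at the bottom */
        return s;
    }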

I also wonder about the memory requirements of a detailed AST; that is,
one with all the fun of each multidimensional array access written out in
tree form.  PDG's and PDW's are even scarier in this respect.

While it may be that AST fans can show ways around my complaints, can they
also give me some positive reasons to switch?  What's the gain?

(speaking as a devil's advocate, of course)

-- 
Preston Briggs				looking for the great leap forward
preston@titan.rice.edu
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus| world}!esegue.  Meta-mail to compilers-request@esegue.

larus@primost.cs.wisc.edu (James Larus) (08/12/90)

In my opinion (a phrase sadly missing from other replies to this message),
both RTL and AST/DAG representations are equivalent for a compiler designed
and written for a single machine.  I base this belief on having written or
worked on compilers that use both types of IR. Both forms represent the
program in a form that is easy to traverse and manipulate.  However, I don't
believe that the equivalence holds for retargetable compilers.  RTL appears
to be a much better IR for retargetable code generators.  The
Davidson-Fraser approach produces fast, practical code generators, while
retargetable CGs for ASTs are still mainly university projects.

A more interesting question is: how would you demonstrate that one approach
is superior to the other one?  If computer science were a science, we could
just lock up a sample of programmers in a room and have them write a
compiler of a randomly chosen type and compare the final products.

/Jim
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus| world}!esegue.  Meta-mail to compilers-request@esegue.

grover@brahmand.Eng.Sun.COM (Vinod Grover) (08/12/90)

In article <1990Aug09.180627.18848@esegue.segue.boston.ma.us> I wrote:

>reduction is easy to do on PDGs and can easily be extended to ASTs. (I do
>not have the reference handy but can post it, if anyone is interested.)

Here is the reference:

    Karl Ottenstein, "A Simplified Framework for Reduction in Strength", 
    IEEE Transactions on Software Engineering, Vol 15, No 1, Jan 89, 
    pp. 86-92.

Vinod Grover
Sun Microsystems
grover@sun.com
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus| world}!esegue.  Meta-mail to compilers-request@esegue.

hankd@dynamo.ecn.purdue.edu (Hank Dietz) (08/12/90)

In article <1990Aug08.171640.13892@esegue.segue.boston.ma.us>:
>In article <1990Aug07.153407.8877@esegue.segue.boston.ma.us>:
>>(1) Abstract-syntax-tree (looks like Scheme code)
>>(2) DAG
>>(3) Three address code
>>(4) P-code
>>(5) Stanford's U-code
>
>First, I'd classify (5) as an extension of (4) and I might group (2) with
>(3).
>
>I like low-level 3-address code (even lower level than assembly).

I agree 100%.  BTW, (3) is a walk embedding (2).  Catch is that (1) is
easier to automate, so, for example, PCCTS is headed that direction....

>AST's seem too high-level for doing lots of optimization.  We want the
>details of array accesses, etc. exposed to the optimizer.

You're making an assumption here that simply doesn't hold...  the AST
elements need not be *just* the syntax elements from the program, but can
be any transformation thereof.  In other words, it can be quite low
level, just like code tuples.

>[Anybody tried program dependency graphs, per Ferrante et al. at IBM.  From
>the articles I've seen, they look pretty neat. -John]

Yes and no.  We've played with a similar form for automatic parallelization
at the process packaging stage, but it doesn't work so well for some
lower-level things....  The truth is that most optimizing, parallelizing,
compilers use several different intermediate forms which are all
interconnected (one form annotates another).  No big deal...  that is,
unless you want to print out a program's intermediate form
representation....  :-)

					-hankd@ecn.purdue.edu

PS: Why is there a gratuitous ref to Scheme in (1)...?
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus| world}!esegue.  Meta-mail to compilers-request@esegue.

jouvelot@brokaw.lcs.mit.edu (Pierre Jouvelot) (08/13/90)

In article <1990Aug9.225859.10175@rice.edu> preston@titan.rice.edu
(Preston Briggs) writes:

>>>AST's seem too high-level for doing lots of optimization.  We want the
>>>details of array accesses, etc. exposed to the optimizer.

>The point is that people are seduced by the apparent simplicity of
>analysis of ASTs.  Don't be fooled!  Without care, you'll have to write
>routines to analyze DO loops, WHILE loops, REPEAT-UNTILs, LOOPs with
>EXITs, etc...  I write routines that work with more traditional flow
>graphs (directed graphs) -- all loops look the same to me.  Ditto for CASE
>statements, IF statements, GOTO's, ad nauseum.

With the AST approach, the cleanest way to deal with this overabundance of
control structures is to define a single "generic" loop construct; all the
control structures can then be trivially desugared (e.g., by the parser)
into your own version of loop.  This is the only loop your transformation
algorithms will ever have to deal with (besides, admittedly, the
unparser).
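
As a hedged sketch of the desugaring (illustrative C, not PIPS's actual
rules), the single generic loop can simply be "loop forever plus an explicit
exit"; the surface forms map onto it mechanically:

    /* One generic loop (in C: for(;;) plus break) covers the usual forms:
         WHILE c DO body      ->  loop { if (!c) exit; body }
         REPEAT body UNTIL c  ->  loop { body; if (c) exit }
         FOR i := 1 TO n ...  ->  i := 1; loop { if (i > n) exit; body; i := i+1 }
       The transformation phases then only ever see the generic form.     */
    int sum_to(int n)               /* FOR i := 1 TO n DO s := s + i       */
    {
        int s = 0, i = 1;
        for (;;) {
            if (i > n) break;       /* the single, explicit exit test      */
            s += i;
            i++;
        }
        return s;
    }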

>While it may be that AST fans can show ways around my complaints, can they
>also give me some positive reasons to switch?  What's the gain?

It happens that a lot of transformations applied by
optimizing/vectorizing/parallelizing compilers can be expressed concisely
in a recursive manner, i.e. can be defined by induction on your AST. This
gives a nice and powerful framework to express your algorithms and a data
structure that is perfectly matched to it.

The PIPS parallelizing compiler for Fortran, currently under development at
the Ecole des Mines by Remi Triolet, Francois Irigoin and me, pushes this
"generic" approach to ASTs to its most "extremist" point.  The whole
Fortran77 syntax is expressed in only a couple of dozen different datatypes.
More precisely, the whole IR, plus the information generated by the
different phases of the parallelizer, is described in one page of domain
equations (we use a tool akin to IDL, called NewGen and developed in-house,
that generates, from a high-level description of domains, functions that
manipulate values of those types; all the PIPS data types are implemented
in NewGen, so that records and other union data structures are never
explicitly used by programmers).

Pierre
--
Pierre Jouvelot
. CAI, Ecole des Mines, Paris (France): jouvelot@ensmp.fr
. LCS, MIT, Cambridge (USA): jouvelot@brokaw.lcs.mit.edu
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus| world}!esegue.  Meta-mail to compilers-request@esegue.

csmith@uunet.UU.NET (Chris Smith) (08/13/90)

In article <1990Aug9.225859.10175@rice.edu> Preston Briggs writes:

   >>AST's seem too high-level for doing lots of optimization.

   A more general algorithm (say Cocke and Kennedy) would perform strength 
   reduction over any loop (while loops, loops built out of if's and goto's), 
   would discover all the induction variables (not just the loop control 
   variable), and would make provision for linear function test replacement.  

These algorithms *are* tree based, if not AST based -- the first step is
to do some control flow analysis to write the program as a sequence of
properly nested loops.  You can do the same optimizations on the loops no
matter how you discover them.

   The point is that people are seduced by the apparent simplicity of
   analysis of ASTs.  Don't be fooled!  Without care, you'll have to write
   routines to analyze DO loops, WHILE loops, REPEAT-UNTILs, LOOPs with
   EXITs, etc...  I write routines that work with more traditional flow
   graphs (directed graphs) -- all loops look the same to me.  Ditto for CASE
   statements, IF statements, GOTO's, ad nauseum.

Depends on how abstract the ASTs are.  All of those loops can be easily
written as infinite trivial loops with breaks, plus other stuff in the
right places.  You need sequential composition, alternation,
loop/break/continue, and nothing else, to handle everything except goto.
(maybe alternation should be an n-way branch, maybe it makes sense to have
a special form for C switch, but such details don't matter much).  Gotos
that don't jump into statements from outside are pretty easy too (as break
or continue in a properly positioned loop).

   While it may be that AST fans can show ways around my complaints, can they
   also give me some positive reasons to switch?  What's the gain?

It runs in linear time, and it does each thing that needs doing (and/or
bit vectors, scan through them) exactly once.  Probably faster.

You have a hold on both ends of an interesting region.  Say, in

	if ... then
	   ... a + b ...
	end if
	... a + b ...

where the if doesn't modify a+b and you're allowed to raise exceptions
prematurely.  You can insert a+b before the if.  Partial redundancy
elimination won't.  (It would insert a+b into an else branch, but there
isn't one.)  Iterative algorithms have info about one end of a region
whose other end is implicitly the procedure entry or exit, or sometimes
the enclosing loop entry or exit.  Tree-based algorithms extend this.

In
	loop
	 if ... a + b ... end if ...
	end loop

If a+b is loop invariant and doesn't raise exceptions or you don't care,
you can hoist it.  You might not want to if it's under switch rather than
if.  With trees, you can.
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus| world}!esegue.  Meta-mail to compilers-request@esegue.

preston@titan.rice.edu (Preston Briggs) (08/13/90)

I wrote:
>A more general algorithm (say Cocke and Kennedy) would perform strength 
>reduction over any loop (while loops, loops built out of if's and goto's), 

Chris Smith replied:
>These algorithms *are* tree based, if not AST based

I disagree.  They're based on directed graphs.

> -- the first step is to do some control flow analysis

over the control flow graph, right?

> to write the program as a sequence of properly nested loops.  You can do
>the same optimizations on the loops no matter how you discover them.

Remember, I was arguing about Ottenstein's paper, which had an explicit
DO-loop header node, with subnodes that distinguished the induction variable,
loop body, and termination conditions.  My point was that such an
approach, while simple, is not general.

A sub-point is that the loops are discovered in an explicit pass over
the control flow graph (which is built in an explicit pass over the IL), 
not while parsing.  AST's are built while parsing, though they aren't
the same as parse trees.  That's why there's an "S" for Syntax in AST.

Parsing is local.  It's hard to discover global facts (like the connection of
goto's and labels) while parsing.

>Gotos that don't jump into statements from outside are pretty easy too

I don't agree that even the "simple" goto's are easy.  Break's and
Continue's are easy since they're bound to enclosing loop structures.
Imagine you're parsing along (we'll say Pascal), building your AST, and you
see a goto...

	while i < limit do begin
	    i := i + 1;
	    x := a[i];
	    if x = 0 then goto 10;
	    ...
	end;

what do you do?  Is it a tiny local branch that describes a little
if-then-else?  Is it a giant break statement (perhaps out of several levels
of loop)?  Is it a backwards branch describing a loop?  Is it a forwards
branch describing part of a loop?

And what are you going to do about Fortran?

>It runs in linear time, and it does each thing that needs doing (and/or
>bit vectors, scan through them) exactly once.  Probably faster.
						^ eh?

If you've got a language that restricts you to reducible routines (not a bad
thing!), then you can do data flow analysis in a linear number of bit-vector
steps.

This is indeed quick (and it's also simple), but it isn't general since
you can't handle irreducible routines.  You could do node splitting,
but that's potentially exponential.

>You have a hold on both ends of an interesting region.  Say, in
>
>	if ... then
>	   ... a + b ...
>	end if
>	... a + b ...
>
>where the if doesn't modify a+b and you're allowed to raise exceptions
>prematurely.  You can insert a+b before the if.  Partial redundancy
>elimination won't.  (It would insert a+b into an else branch, but there
>isn't one.)  

Use one of the edge placement algorithms,
such as

	A Fast Algorithm for Code Movement Optimisation
	Dhamdhere
	SIGPLAN Notices, Volume 23, Number 10

or

	A Solution to a Problem with Morel and Renvoise's
	`Global Optimization by Suppression of Partial Redundancy'
	Dreschler and Stadel
	TOPLAS, Volume 10, Number 4 (1988)

>Iterative algorithms have info about one end of a region
>whose other end is implicitly the procedure entry or exit, or sometimes
>the enclosing loop entry or exit.  Tree-based algorithms extend this.
>
>In
>	loop
>	 if ... a + b ... end if ...
>	end loop
>
>If a+b is loop invariant and doesn't raise exceptions or you don't care,
>you can hoist it.  You might not want to if it's under switch rather than
>if.  With trees, you can.

I'm a little baffled here.  You seem to be arguing iterative analysis vs.
some sort of interval-based analysis.  Maybe that's what you intended and
I've misunderstood the point of your article.  We could certainly use one
form of interval analysis or another if desired, though there is always the
ugliness of irreducible flow graphs.  Node splitting is a solution, but it's
a lot of code for me to write when I know a simpler way.

At any rate, partial redundancy elimination wouldn't hoist a+b in the
example shown since it isn't always executed (the IF condition may never be
true).  It's a conservative design decision, but doesn't reflect on the rest
of the discussion that I can see.

-- 
Preston Briggs				looking for the great leap forward
preston@titan.rice.edu
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus| world}!esegue.  Meta-mail to compilers-request@esegue.

preston@titan.rice.edu (Preston Briggs) (08/13/90)

In article <1990Aug11.203048.848@mintaka.lcs.mit.edu> jouvelot@brokaw.lcs.mit.edu (Pierre Jouvelot) writes:

>With the AST approach, the cleanest way to deal with this overabundance of
>control structures is to define a single "generic" loop construct; all the
>control structures can then be trivially desugared (e.g., by the parser)
>into your own version of loop.

I don't see the trivial desugaring by the parser, particularly with an IF -
GOTO form of the loop.

I can see discovering the entire CFG and then doing interval analysis to
discover a nice tree-like grouping of nested control structures a la Sharir
(or others), but you're still punting on irreducible flow graphs.

However, thanks for the interesting post on your design.  It's fun to see
how other people approach problems.

-- 
Preston Briggs				looking for the great leap forward
preston@titan.rice.edu
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus| world}!esegue.  Meta-mail to compilers-request@esegue.

jouvelot@brokaw.lcs.mit.edu (Pierre Jouvelot) (08/14/90)

In article <1990Aug12.212723.1247@rice.edu> preston@titan.rice.edu 
(Preston Briggs) writes:

>In article <1990Aug11.203048.848@mintaka.lcs.mit.edu> I write:
>
>>With the AST approach, the cleanest way to deal with this overabundance of
>>control structures is to define a single "generic" loop construct; all the
>>control structures can then be trivially desugared (e.g., by the parser)
>>into your own version of loop.
>
>I don't see the trivial desugaring by the parser, particularly with an IF -
>GOTO form of the loop.
>
>I can see discovering the entire CFG and then doing interval analysis to
>discover a nice tree-like grouping of nested control structures a la Sharir
>(or others), but you're still punting on irreducible flow graphs.

Well, you almost got a point here. The way the PIPS system deals with
irreducible flow graphs is interesting. Basically, the AST data type
definition looks something like this (I use the straightforward NewGen
syntax):

    instruction = block:instruction* + test + call + loop + unstructured ;

    test = condition:expression x true:instruction x false:instruction ;

    unstructured = instruction x predecessors:unstructured*
			       x successors:unstructured* ;

    ....

which says that an instruction is either a block (which is a list of
instructions), a test (which has a condition expression and a true and false
instruction), a call or a loop (not specified here) or an "unstructured". An
"unstructured" datatype denotes a control structure that could correspond to
an irreducible graph. The key point is that the instruction that appears in
an "unstructured" value can be *any* instruction (i.e., can be a block, or a
test ... or even another control graph!).  The usual data flow algorithms
seem to carry over to this kind of recursive approach (although I still
have to find the time to prove it :-).
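
For readers who don't know NewGen, a rough C transliteration of those
domains (illustrative only; the real PIPS types are generated by NewGen) is
a recursive tagged union:

    /* Illustrative C rendering of the domains above; take this only
       as a sketch of the idea.                                          */
    struct expression;          /* left abstract here                    */
    struct instruction;

    struct block        { int n; struct instruction **stmts; };
    struct test         { struct expression  *condition;
                          struct instruction *iftrue, *iffalse; };
    struct unstructured { struct instruction *instr;  /* *any* instruction */
                          int npreds, nsuccs;
                          struct unstructured **preds, **succs; };

    struct instruction {
        enum { IS_BLOCK, IS_TEST, IS_CALL, IS_LOOP, IS_UNSTRUCTURED } tag;
        union {
            struct block        block;
            struct test         test;
            struct unstructured unstructured;
            /* call and loop omitted, as in the domains above            */
        } u;
    };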

This way, you keep the power of ASTs while still being able to deal with
arbitrary control graphs, which our parallelizer is pretty much unable to do
anything with anyway.  However, notice that if you can figure out (as I do)
that a "dirty" control graph is embedded inside an otherwise "nice" DO loop,
then from the outside the loop looks like a well-structured one which can
be parallelized; this is a great plus of this approach.

Pierre
--
. CAI, Ecole des Mines, Paris (France): jouvelot@ensmp.fr
. LCS, MIT, Cambridge (USA): jouvelot@brokaw.lcs.mit.edu
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

marti@antares.inf.ethz.ch (08/14/90)

In article <1990Aug07.153407.8877@esegue.segue.boston.ma.us>
napi@rangkom.MY (Mohd Hanafiah b. Abdullah) writes:
>I would like to know what people think is the best Intermediate Representation
>(IR) to be used for highly effective optimizations and code generation, and
>it should be portable.

Many language designers/implementors have turned to using C as an intermediate
representation language, for example in the following research projects and/or
products:

- C++ translator from AT&T (cfront)
- portable Cedar compiler developed at Xerox PARC
- Eiffel compiler from ISE
- Modula-3 compilers developed at DEC SRC and the (now defunct?) Olivetti
  Research Center

Compilation speed using C as an IRL may not be overwhelming, but this solution
is certainly portable.  Of course, the quality of the generated machine code
depends substantially on the C compiler used as a backend.

--
Robert Marti                        Phone:    +41 1 254 72 60
Institut fur Informationssysteme
ETH-Zentrum                         Internet: marti@inf.ethz.ch
CH-8092 Zurich, Switzerland         UUCP:     ...uunet!mcvax!ethz!marti
[Probably not what the original author had in mind, but a good thought
nonetheless. -John]
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

muller@procope.pa.dec.com (Eric Muller) (08/14/90)

In article <1990Aug13.214614.16644@esegue.segue.boston.ma.us>, marti@antares.inf.ethz.ch writes:

|> Many language designers/implementors have turned to using C as an 
|> intermediate representation language, for example in the following 
|> research projects and/or products:
|>   ...
|> - Modula-3 compilers developed at DEC SRC and the (now defunct?) Olivetti
|>   Research Center
|> 
|> Compilation speed using C as an IRL may not be overwhelming, but this 
|> solution is certainly portable.  Of course, the quality of the generated 
|> machine code depends substantially on the C compiler used as a backend.

I am working on the SRC Modula-3 system. Bill Kalsow wrote most of the
compiler before I joined and he chose C to achieve portability. This
certainly works: our system runs on VAX/Ultrix, DECstation/Ultrix,
SPARC/SunOs, Apollo DN 4500/DomainOs, IBM R6000/AIX 3.1, AIX/PS2, IBM RT/IBM
4.3, HP 9000/300/HP-UX, without too much pain.

However, we do not view C as an intermediate language, but rather as a
portable assembler. The compiler maintains a private AST as the intermediate
representation.

The speed of compilation is not too bad: the Modula-3 to C compilation time
is of the same order as the C to .o compilation. The quality of generated
code depends a lot on the quality of the C compiler. This should not be very
surprising, as we are trying to have a simple compiler: many intermediate
expressions become separate assignments in C, for example. I have compared
the assembly generated for equivalent Modula-3 and C programs, using the -O2
optimizations of /bin/cc on a DECstation; in many cases, the same assembly
is generated. We may spend some time to further improve that aspect.
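
To give a concrete (made-up) flavor of what "separate assignments" means,
the generated C for a single Modula-3 assignment might look roughly like
this:

    /* Hypothetical output for   x := a * b + c * d;   -- every
       intermediate value gets its own temporary and assignment, and we
       rely on the C compiler's optimizer to fold them back together.   */
    int a, b, c, d, x;

    void assign_x(void)
    {
        int t1, t2;
        t1 = a * b;
        t2 = c * d;
        x  = t1 + t2;
    }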

One of the advantages of generating C is that we can assume some runtime
support. Modula-3 has exceptions and requires the presence of a Thread
module (lightweight processes). setjmp/longjmp are enough to implement that.
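
A minimal sketch of that technique in C (illustrative only; the actual SRC
runtime also has to cope with nesting and threads):

    #include <setjmp.h>
    #include <stdio.h>

    /* One handler frame; a real runtime keeps a stack of these, one per
       active TRY, and typically one such stack per thread.              */
    static jmp_buf handler;

    static void may_raise(int bad)
    {
        if (bad)
            longjmp(handler, 1);      /* RAISE: unwind to the handler    */
    }

    int main(void)
    {
        if (setjmp(handler) == 0) {   /* TRY                             */
            may_raise(1);
            puts("not reached");
        } else {                      /* EXCEPT                          */
            puts("exception caught");
        }
        return 0;
    }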

My own conclusion is that generating C or using a native back-end does not
influence the quality of the generated code that much.  Choosing one way or
the other is a compromise between compilation speed and portability.

Eric Muller.
------------------------------------------------------------------------------
System Research Center - 130 Lytton Av. - Palo Alto, CA 94301 - (415) 853 2193
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

pd@complex.Eng.Sun.COM (Peter C. Damron) (08/15/90)

I think that the discussion of intermediate representations has
centered too much on the abstract structure of the IR.

There are 3 pieces that I can think of to an IR:
1. The abstract machine model
2. The operators and their semantics
3. The abstract structure (AST, 3-address, PDG, etc.)

I would say that #1 and #2 are more important than #3 in determining
how hard it is to optimize over that IR.

For #1, the usual choices are either a stack machine, or a register
based stack machine.

For #2, I'm not sure that there is much of a consensus about what
the best set of operators is.  Any thoughts?

What about "ROSIR" -- reduced operator set intermediate representation?

Just a thought,
Peter.

----------------------------
Peter C. Damron
Sun Microsystems, Inc.
Advanced Languages, MS 12-40
2550 Garcia Ave.
Mtn. View, CA  94043

pdamron@eng.sun.com
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

staff@cadlab.sublink.org (Alex Martelli) (08/18/90)

marti@antares.inf.ethz.ch writes:

>Many language designers/implementors have turned to using C as an
>intermediate representation language, for example in the following research
>projects and/or
[...omitted for brevity...]
>Compilation speed using C as an IRL may not be overwhelming, but this
>solution is certainly portable.  Of course, the quality of the generated
>machine code depends substantially on on the C compiler used as a backend.

...and don't forget Fortran... (see my msg of today in comp.lang.fortran-
f2c + Sun's cc produce code which runs almost twice as fast as Sun's
f77, for a strictly numerical code -2d complex fft- on a Sparcstation 1!).
-- 
Alex Martelli - CAD.LAB s.p.a., v. Stalingrado 45, Bologna, Italia
Email: (work:) staff@cadlab.sublink.org, (home:) alex@am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434; 
Fax: ++39 (51) 366964 (work only; any time of day or night).
[I knew that Sun's f77 wasn't great, but that's embarrassing. -John]
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

ok@goanna.cs.rmit.OZ.AU (Richard A. O'Keefe) (08/20/90)

In article <250@cadlab.sublink.ORG>, staff@cadlab.sublink.org (Alex Martelli) writes:
> marti@antares.inf.ethz.ch writes:
> ...and don't forget Fortran... (see my msg of today in comp.lang.fortran-
> f2c + Sun's cc produce code which runs almost twice as fast as Sun's
> f77, for a strictly numerical code -2d complex fft- on a Sparcstation 1!).

On reading this I immediately went and converted a small numerical code
from Fortran to C and ran it on an Encore MultiMax (NS 32532 processors)
running UMAX V 2.2M.  The f77 and cc compilers on that machine do not appear
to be ATT or Berkeley derived; the f77 compiler in particular is described
as "an advanced, optimizing compiler".  The code didn't exploit any of the
micro-tasking or parallel features of Encore Fortran or Encore C, and no
special declarations ('register' or the like) were used.  Both used double
precision.  Result:  the C code ran twice as fast (4 seconds / 8 seconds).

The Fortran code used 2D arrays.  The C code used 1D arrays, with strength
reduction (a standard optimising technique) and the replacement of indexing
by pointer increments (also a standard technique; the NS32532 doesn't have
auto-??crement addressing) both done by hand.  f2c starts to sound
interesting...
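
For readers who haven't seen the transformation, here is a hedged sketch
(illustrative code, not the benchmark itself) of what the hand rewrite looks
like:

    /* 2D indexing, roughly as the Fortran (or naive C) expresses it.    */
    void scale2d(double a[][100], int n, double k)
    {
        int i, j;
        for (i = 0; i < n; i++)
            for (j = 0; j < 100; j++)
                a[i][j] *= k;
    }

    /* Hand-rewritten 1D version: the subscript arithmetic is strength-
       reduced away and the array is walked with one incremented pointer. */
    void scale1d(double *a, int n, double k)
    {
        double *p   = a;
        double *end = a + (long)n * 100;
        while (p < end)
            *p++ *= k;
    }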

So it wasn't just Sun.  (I say "wasn't" because as the original poster
pointed out, Sun have some new compilers.)
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

han@cs.rochester.edu (02/21/91)

   I am taking a compiler seminar course that emphasizes back-end
optimization issues.   In order to get hands-on experience, I would like to
practice applying various optimization techniques on the intermediate
representation produced by the front end.   (I have not decided on the
front end yet.  I might write one or pick one up and modify it.)   At this
point, I am looking for intermediate representations that I can play with.
So, my first question is:  Are there well documented (and, hopefully,
state-of-the-art) definitions of intermediate representations available?
How can I get them?  (Does the RTL that GCC uses count as one such IR?)
   Actually, my fellow students and I have taken a look at the RTL dump
produced by GCC.   To our surprise, the RTL dump showed strong dependency
on the target machine even at the earliest stage of optimization.  
The RTL dump on SPARC looked very different from that on a Sun3 workstation.
We had expected that the machine dependency would show up at a much later
stage, and that we would be able to try optimization with RTL without
worrying about machine dependencies.
So, my second question is: Is it generally expected for an intermediate
representation to be machine dependent?   How far toward the end of code
generation phase can machine dependencies be prevented from showing up?
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

lavinus@csgrad.cs.vt.edu (02/23/91)

Hi!

In answer to both your questions, I hope: ideally, intermediate
representations should not be at all machine dependent (beyond the fact that
they might be dependent on, say, the Von Neumann architecture).  When I think
of intermediate code, I think of something along the lines of P-code (by
Wirth), which is a sort of machine code for a simple, hypothetical,
stack-based machine, and is quite generic and non-machine-dependent.  That
might be a good one for your purposes.  By the way, could you (or a
colleague) suggest a good book or reference on such back-end compiler
optimization?  
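
To show how simple such a hypothetical stack machine can be, here is a toy
sketch in C (illustrative only, not real P-code, which also has loads,
stores, jumps, and calls):

    /* A toy evaluator for a P-code-like stack machine: push constants,
       add, multiply.                                                   */
    #include <stdio.h>

    enum op { PUSH, ADD, MUL, HALT };
    struct insn { enum op op; int arg; };

    static int run(const struct insn *code)
    {
        int stack[64], sp = 0;
        for (;; code++) {
            switch (code->op) {
            case PUSH: stack[sp++] = code->arg;            break;
            case ADD:  sp--; stack[sp-1] += stack[sp];     break;
            case MUL:  sp--; stack[sp-1] *= stack[sp];     break;
            case HALT: return stack[sp-1];
            }
        }
    }

    int main(void)
    {
        /* 2 + 3 * 4 compiles to postfix: 2 3 4 * +  */
        struct insn prog[] = { {PUSH,2}, {PUSH,3}, {PUSH,4},
                               {MUL,0}, {ADD,0}, {HALT,0} };
        printf("%d\n", run(prog));   /* prints 14 */
        return 0;
    }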

Thanks and good luck...

Joe Lavinus
--
Joseph W. Lavinus
Virginia Tech, Blacksburg, Virginia
email: lavinus@csgrad.cs.vt.edu
[I suspect that after you read all about compiler optimization you'll be more
sympathetic to machine dependent intermediate code. -John]
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

mike@vlsivie.tuwien.ac.at (Michael K. Gschwind) (02/23/91)

In article <9102210223.AA15327@cyan.cs.rochester.edu> han@cs.rochester.edu writes:
>(Does the RTL that GCC uses count as one such IR?)

I guess it does.

>   Actually, my fellow students and I have taken a look at the RTL dump
>produced by GCC.   To our surprise, the RTL dump showed strong dependency
>on the target machine even at the earliest stage of optimization.  

This is not the fault of the RTL language, but rather of the front end.
The front end uses different intermediate code generation strategies
depending on the available instructions, endianness, etc.  For example, if
your target machine does not have a multiply instruction, it is reasonable
for the front end to emit code which calls a multiply function.  At least
you can then profit from optimizations which reduce function call cost.
(On the other hand, you might argue that information is lost and may cause
problems, say, when trying to perform strength reduction and the like.
RTL adds a tag to such function calls, explaining what the function does:

(call_insn 15 14 16 (parallel[ 
           (set (reg:SI 1)		 ; assign to register 1
               (call (mem:SI (reg:SI 35)); the return value of this
                   (const_int 0)))	 ; function call
           (use (reg:SI 1))
       ] ) -1 (nil)
   (nil))

(insn 16 15 17 (set (reg:SI 36) 	; copy the return value
       (reg:SI 1)) -1 (nil)		; to register 36
   (expr_list:REG_EQUAL (mult:SI (reg/v:SI 32)
           (mem:SI (reg:SI 33)))
       (insn_list:REG_RETVAL 10 (nil))))

The REG_EQUAL note says that reg:SI 36 really contains the value computed
by multiplying registers 32 and 33.)

>The RTL dump on SPARC looked very different from that on a Sun3 workstation.
>We had expected that the machine dependency would show up at a much later
>stage, and that we would be able to try optimization with RTL without
>worrying about machine dependencies.

I wouldn't worry too much about machine dependencies. The worst case is
(I guess) that some optimizations do not benefit certain architectures.
But this is probably due to the target processor's peculiarities. Condition
code optimization for example can _never_ benefit a CPU without 
condition codes - so what? 

>So, my second question is: Is it generally expected for an intermediate
>representation to be machine dependent?   How far toward the end of code
>generation phase can machine dependencies be prevented from showing up?

Probably depends on how you define machine dependency and the kind of IR.
In RTL, the language itself is nearly completely machine independent
(except when it comes to things like little/big endianness, which 
would be difficult to change after optimizations depending on this 
have been performed).  The code emitted, however, depends very much
on the target processor - postincrement language elements, for example, 
are never used for targets without postincrement addressing modes. The 
sooner you introduce machine dependencies, the sooner you can optimize
aggressively. On the other hand, the language itself should be 
as machine independent as possible, to allow easy retargeting. 


			bye,
				mike

Michael K. Gschwind, Dept. for VLSI Design, Vienna University of Technology
mike@vlsivie.tuwien.ac.at
mike@vlsivie.uucp
e182202@awituw01.bitnet
Voice: (++43).1.58801 8144
Fax:   (++43).1.569697
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

rfg@ncd.com (Ron Guilmette) (02/26/91)

In article <9102210223.AA15327@cyan.cs.rochester.edu> han@cs.rochester.edu writes:
>
>   Actually, my fellow students and I have taken a look at the RTL dump
>produced by GCC.   To our surprise, the RTL dump showed strong dependency
>on the target machine even at the earliest stage of optimization.  

For what it's worth, I thought that I should mention that GCC actually
deals with two different internal representations of a program.  There
is a high level representation (TREEs) and also the low-level representation
that you mentioned (i.e. RTL).

The high level representation is fairly machine independent; however, it is
*not* something that you would want to try applying optimizations to.
(Well, at least GCC doesn't try to do that.  Some people may say that
the high level tree representation would be a good thing to be munging
for certain "high level" optimizations.  I won't argue that point either
way.)

-- 

// Ron Guilmette
// Internet: rfg@ncd.com      uucp: ...uunet!lupine!rfg
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

djones@decwrl.dec.com (Dave Jones) (02/28/91)

As you know, there are lots of optimizations that can be described in a
machine-independent way as correctness-preserving algorithm-transformations.
You really want to work those optimizations on the machine-independent
intermediate representation, but whether or not they really win or lose is
machine-dependent.  It looks like a situation where the "front end" and "back end"
need to collaborate, rather than run one after the other.  I recently wrote a
special-purpose Pascal compiler on that principle, and I've started my second
"back end" for it (SPARC). But alas, the application does not warrant
investing much in optimization at this time, because the Pascal layers of
these application programs account for a very small percentage of the
execution-time. So I don't have any war stories to tell re. optimization. A
really nifty implementation of the scheme might use coroutines.

-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

tarvydas@turing.toronto.edu (Paul Tarvydas) (03/04/91)

A good basis for any intermediate language is described in:

	R.C. Holt, "Data Descriptors: A Compile-Time Model of Data and
	Addressing", ACM TOPLAS, vol. 9, no. 3 (July 1987), 367-389. 

The representation gives you a way to uniformly describe all run-time data
at compile time - this includes memory locations, registers, manifest
constants, etc.

You get many of the local optimizations, such as address-mode arithmetic,
constant folding, and manifest subscripting, almost "for free" when you use
this representation.
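
As a rough sketch of the flavour (my own simplification, not Holt's exact
formulation), a data descriptor bundles a base, a displacement, and an
indirection level into one compile-time value:

    /* Simplified "data descriptor": what the compiler knows at compile
       time about where a value lives.  This is an illustrative reduction
       of Holt's descriptors, not a faithful copy.                        */
    enum base_kind { B_NONE, B_REGISTER, B_VARIABLE };

    struct data_descriptor {
        int            indirection;   /* levels of indirection applied    */
        enum base_kind base_kind;     /* no base, a register, or a symbol */
        int            base;          /* register number / symbol index   */
        long           displacement;  /* constant offset, or the whole
                                         value for a manifest constant    */
    };

    /* Example of "address arithmetic for free": indexing by a manifest
       constant (a field offset, a constant subscript already scaled)
       just folds into the displacement at compile time.                  */
    struct data_descriptor add_manifest_offset(struct data_descriptor d,
                                               long offset)
    {
        d.displacement += offset;
        return d;
    }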

The representation is very clean and simple to work with, so your project
can be pushed quite a long way before it starts to quake with complexity.

There are a number of theses and papers dealing with compilers built on
top of this technology, most notably

	J.R.Cordy and R.C.Holt, "Code Generation Using an Orthogonal
	Model", Software Practice and Experience, vol. 20, no. 3,
	March 1990, pp 301-320.

which describes a declarative tree- and menu-driven method for emitting
code and building coder-compilers (eg. like Graham-Glanville and
Davidson/Fraser/Wendt).  The code generation algorithms are the simplest
I've seen (and that includes Small-C :-), yet they produce very good code.
The simplicity of this method makes it ripe for picking and extending.
(I've used this work in industry for a few years now and have had very
favourable results).

You might also be interested in Peter James Ashwood-Smith's MSc thesis,
"The Source Level Optimization of Turing Plus", March 7, 1987, at the
University of Toronto's CSRI, which applies global optimizations to a
compiler built with Data Descriptors and S/SL (a compiler-building tool -
see R.C. Holt, J.R. Cordy and D.B. Wortman, "An Introduction to S/SL:
Syntax/Semantic Language", ACM TOPLAS, vol.  4, no. 2 (April 82),
149-178).

If you are interested in more papers in this area, I'd be glad to supply
more pointers.

Paul Tarvydas
TS CONTROLS
5 Bowness Court
Etobicoke, Ont.
Canada
M9B 5Z8

tel: (416)-234-0889, fax: (416)-234-9193
uucp: tarvydas@tsctrl,       on-failure: tarvydas@turing.toronto.edu
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.