napi@rangkom.MY (Mohd Hanafiah b. Abdullah) (08/07/90)
I would like to know what people think is the best Intermediate Representation (IR) to use for highly effective optimizations and code generation; it should also be portable. The IR is one of the most crucial considerations in designing a competent compiler. Examples of IRs that I know:

(1) Abstract-syntax-tree (looks like Scheme code)
(2) DAG
(3) Three address code
(4) P-code
(5) Stanford's U-code

I'm leaning towards the abstract syntax tree, since it is portable (machine independent), readable, keeps high-level structures (e.g. loops) visible for effective optimizations, and is language independent most of the time.

What do you think? Please respond by email since I don't have online access to USENET news. Thanks.

Hanafiah
[But copy your messages to compilers if they're of general interest, please. -John]
briscoe-duke@CS.YALE.EDU (Duke Briscoe) (08/09/90)
You might be interested in the following PhD thesis:

    Compilation by Program Transformation
    Richard Kelsey
    Yale University, May 1989

Abstract: This dissertation describes a simple compiler, based on concepts from denotational semantics, that can be used to compile standard programming languages and produces object code as efficient as that of production compilers. The compiler uses only source-to-source transformations. The transformations are performed on programs that have been translated into an intermediate language resembling the lambda calculus. The output of the compiler, while still in the intermediate language, can be trivially translated into machine code for the target machine. The compilation-by-transformation strategy is simple: the goal is to remove any dependencies on the intermediate language semantics that the target machine cannot implement directly. Front-ends have been written for Pascal and Basic, and the compiler produces code for the MC68020 microprocessor.
preston@rice.edu (Preston Briggs) (08/09/90)
In article <1990Aug07.153407.8877@esegue.segue.boston.ma.us> you write:
>I would like to know what people think is the best Intermediate
>Representation (IR) to use for highly effective optimizations and code
>generation; it should also be portable.
>Examples of IRs that I know:
>(1) Abstract-syntax-tree (looks like Scheme code)
>(2) DAG
>(3) Three address code
>(4) P-code
>(5) Stanford's U-code

First, I'd classify (5) as an extension of (4), and I might group (2) with (3).

I like low-level 3-address code (even lower level than assembly). Several optimizing compilers have been built using such a representation -- the best known is the PL.8 compiler built by researchers at IBM.

I don't much like IL's based on stack machines (roughly 4 and 5), at least for optimization purposes. Optimization wants explicit registers to hold intermediate results.

AST's seem too high-level for doing lots of optimization. We want the details of array accesses, etc. exposed to the optimizer.

Actually, I'll qualify this a little. I believe there are many optimizations that can be carried out most effectively on a high-level representation (for example, those requiring dependence analysis) and many that should be carried out on a low-level representation (e.g., CSE elimination, strength reduction). This is the sort of approach we've taken locally.

Naturally, there are counter-examples to everything I've said. AST-based compilers include Bliss-11, the PQCC-based systems, and some of the Ada efforts. There's also Guy Steele's Rabbit compiler. Fred Chow's thesis describes an optimizing Pascal compiler based on (5); I believe Chow's work is also the basis for the MIPS compilers. Some version of (4) is used in Powell's Modula-2 compiler.

Preston
[Anybody tried program dependence graphs, per Ferrante et al. at IBM? From the articles I've seen, they look pretty neat. -John]
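To make "lower level than assembly" concrete, here is a minimal sketch in C of the kind of tuples such an IR might hold for x := a[i] + b. The opcode names, virtual-register names, and the 4-byte integer size are illustrative assumptions, not the actual PL.8 format:

    #include <stdio.h>

    /* One three-address tuple: dst := src1 op src2.
       Registers are virtual (unlimited), as in register-based IRs. */
    typedef struct {
        const char *op;
        const char *dst, *src1, *src2;
    } Quad;

    int main(void) {
        /* Lowering of  x := a[i] + b  with the address arithmetic
           made explicit, so the optimizer can see it:              */
        Quad code[] = {
            { "mul",   "t1", "i",  "4"  },  /* t1 := i * 4 (scale index) */
            { "add",   "t2", "&a", "t1" },  /* t2 := base of a + offset  */
            { "load",  "t3", "t2", ""   },  /* t3 := mem[t2]             */
            { "add",   "t4", "t3", "b"  },  /* t4 := t3 + b              */
            { "store", "x",  "t4", ""   },  /* mem[&x] := t4             */
        };
        for (size_t k = 0; k < sizeof code / sizeof code[0]; k++)
            printf("%-6s %-3s %-3s %s\n", code[k].op, code[k].dst,
                   code[k].src1, code[k].src2);
        return 0;
    }

With the multiply, the add, and the load as separate tuples, each intermediate value has an explicit register, which is exactly what CSE and strength reduction want to work on.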
mod@westford.ccur.com (Michael O'Donnell) (08/10/90)
In article <1990Aug07.153407.8877@esegue.segue.boston.ma.us> napi@rangkom.MY (Mohd Hanafiah b. Abdullah) writes:
>I would like to know what people think is the best Intermediate
>Representation ...
>(1) Abstract-syntax-tree (looks like Scheme code)
...
>I'm leaning towards the abstract syntax tree, since it is portable (machine
>independent), readable, keeps high-level structures (e.g. loops) visible for
>effective optimizations, and is language independent most of the time.
...

I'm just a beginner at this: I recently built a "student" compiler in a university compiler construction class. The text and the instructor were both presenting an LR(1)/tuple implementation coded in Pascal. I pushed hard for (and obtained) permission to instead use a Lex/Yacc/AST approach written in C. My amateur's opinion is probably not worth much here, but it seems apparent to me that AST's are the easiest IR to manipulate, especially considering the amount of information they can contain. I didn't use DAG's (they seem like an inspired twist on the notion of an AST) because I found their (perceived) complexity a bit daunting.

Well, enough noise -- now for my question: why do you make several references to the "visual" attributes of an AST? My instructor and textbook (Fischer/LeBlanc) only touched lightly on AST's, and I had to infer a lot from that little bit. Is there a standard textual representation of AST's? Mine only existed as data structures in memory, although the Dragon book mentioned that there is a kind of postfix representation. Is that it?

Thanks,
Michael O'Donnell   (508)392-2915   home (508)251-7576   fax (508)692-8307
...!{harvard,uunet,petsd}!masscomp!mod, mod@westford.ccur.com
Concurrent Computer Corporation, 1 Technology Way, Westford, MA 01886
grover@brahmand.Eng.Sun.COM (Vinod Grover) (08/10/90)
In article <1990Aug08.171640.13892@esegue.segue.boston.ma.us> Preston Briggs <preston@rice.edu> writes:
>AST's seem too high-level for doing lots of optimization. We want the
>details of array accesses, etc. exposed to the optimizer.
>
>Actually, I'll qualify this a little. I believe there are many optimizations
>that can be carried out most effectively on a high-level representation (for
>example, those requiring dependence analysis) and many that should be carried
>out on a low-level representation (e.g., CSE elimination, strength
>reduction). This is the sort of approach we've taken locally.

First of all, there is no reason why an AST-based IR cannot have low-level features (such as array/memory accesses). The Bliss-11 compiler had the '.' (dot) operator for representing memory fetches. One nice property of the dot operator was that it could be used in both the lhs and rhs positions of an assignment operator. For example, a[i] would be represented as

    .(a + .i)

This made a lot of optimizations quite simple and uniform; e.g., a memory access could be treated as a CSE.

Second, as Bliss-11 showed, global CSE elimination can be done at a high level. Similarly, strength reduction isn't that hard at a high level. Karl Ottenstein showed, in an IEEE paper several years ago, that strength reduction is easy to do on PDGs and can easily be extended to ASTs. (I do not have the reference handy but can post it, if anyone is interested.)

Vinod Grover
Sun Microsystems
grover@sun.com
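A sketch in C of what such an AST with an explicit fetch node might look like; the node kinds and layout are invented for illustration (and a word-addressed machine is assumed, so the index needs no scaling, as in the .(a + .i) example above):

    #include <stdio.h>

    /* AST with an explicit fetch ('.') node, Bliss-11 style.
       Names and layout are illustrative only.                */
    typedef enum { NAME, CONST, ADD, FETCH, ASSIGN } Kind;

    typedef struct Node {
        Kind kind;
        const char *name;            /* NAME nodes   */
        int value;                   /* CONST nodes  */
        struct Node *left, *right;   /* operands     */
    } Node;

    int main(void) {
        /* The address expression  a + .i  is built once: */
        Node i      = { NAME,  "i" };
        Node a      = { NAME,  "a" };
        Node fetchi = { FETCH, 0, 0, &i };           /* .i        */
        Node addr   = { ADD,   0, 0, &a, &fetchi };  /* a + .i    */

        /* a[i] := .(a + .i) + 1 : the same addr node is the store
           target (lhs) and, under a FETCH, the load (rhs), so the
           address computation is visibly one shared subexpression. */
        Node one    = { CONST, 0, 1 };
        Node load   = { FETCH, 0, 0, &addr };        /* .(a + .i) */
        Node sum    = { ADD,   0, 0, &load, &one };
        Node stmt   = { ASSIGN, 0, 0, &addr, &sum };

        printf("address subtree shared: %s\n",
               stmt.left == stmt.right->left->left ? "yes" : "no");
        return 0;
    }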
preston@titan.rice.edu (Preston Briggs) (08/10/90)
I wrote:
>>AST's seem too high-level for doing lots of optimization. We want the
>>details of array accesses, etc. exposed to the optimizer.
>>I believe there are many optimizations that can be carried out
>>most effectively on a high-level representation (for example,
>>those requiring dependence analysis) and many that should be
>>carried out on a low level representation (e.g., CSE elimination,
>>strength reduction).

grover@brahmand.Eng.Sun.COM (Vinod Grover) writes:
>First of all, there is no reason why an AST-based IR cannot have low-level
>features (such as array/memory accesses).
>Second, as Bliss-11 showed, global CSE elimination can be done at a high
>level. Similarly, strength reduction isn't that hard at a high level. Karl
>Ottenstein showed, in an IEEE paper several years ago, that strength
>reduction is easy to do on PDGs and can easily be extended to ASTs. (I do
>not have the reference handy but can post it, if anyone is interested.)

I'm mostly presenting my "religious convictions" here; I agree that there are many ways to skin a cat. However, Ottenstein's paper serves as a nice illustration. He discusses strength reduction of induction variables in the context of DO/FOR-style loops, with the iteration variable clearly marked by the header node, as are the increment expressions and the limit tests. A more general algorithm (say Cocke and Kennedy) would perform strength reduction over any loop (while loops, loops built out of if's and goto's), would discover all the induction variables (not just the loop control variable), and would make provision for linear function test replacement. Other algorithms extend partial redundancy elimination to discover strength reductions in linear regions (I've seen versions by Chow and by Dhamdhere).

The point is that people are seduced by the apparent simplicity of analysis of ASTs. Don't be fooled! Without care, you'll have to write routines to analyze DO loops, WHILE loops, REPEAT-UNTILs, LOOPs with EXITs, etc... I write routines that work with more traditional flow graphs (directed graphs) -- all loops look the same to me. Ditto for CASE statements, IF statements, GOTO's, ad nauseam.

I also wonder about the memory requirements of a detailed AST; that is, one with all the fun of each multidimensional array access written out in tree form. PDG's and PDW's are even scarier in this respect.

While it may be that AST fans can show ways around my complaints, can they also give me some positive reasons to switch? What's the gain? (speaking as a devil's advocate, of course)

--
Preston Briggs
looking for the great leap forward
preston@titan.rice.edu
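For readers who haven't seen it, the transformation under discussion has roughly this shape when written out in source terms (applied by hand here; a compiler would do it on the IR):

    #include <stdio.h>

    /* Before: the subscript hides a multiply, a + i*sizeof(int),
       recomputed on every trip through the loop.                  */
    int sum_indexed(const int a[], int n) {
        int s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* After strength reduction plus linear function test replacement:
       the multiply becomes a pointer bump, and the exit test is
       rewritten against the derived induction variable p, not i.     */
    int sum_reduced(const int a[], int n) {
        int s = 0;
        for (const int *p = a, *limit = a + n; p < limit; p++)
            s += *p;
        return s;
    }

    int main(void) {
        int a[] = { 1, 2, 3, 4 };
        printf("%d %d\n", sum_indexed(a, 4), sum_reduced(a, 4));
        return 0;
    }

The rewritten test (p < limit instead of i < n) is the "linear function test replacement" mentioned above; once it is done, i may be dead and can disappear entirely.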
larus@primost.cs.wisc.edu (James Larus) (08/12/90)
In my opinion (a phrase sadly missing from other replies to this message), both RTL and AST/DAG representations are equivalent for a compiler designed and written for a single machine. I base this belief on having written or worked on compilers that use both types of IR. Both forms represent the program in a form that is easy to traverse and manipulate.

However, I don't believe that the equivalence holds for retargetable compilers. RTL appears to be a much better IR for retargetable code generators. The Davidson-Fraser approach produces fast, practical code generators, while the retargetable code generators for ASTs are still mainly university projects.

A more interesting question is: how would you demonstrate that one approach is superior to the other? If computer science were a science, we could just lock up a sample of programmers in a room, have each write a compiler of a randomly chosen type, and compare the final products.

/Jim
grover@brahmand.Eng.Sun.COM (Vinod Grover) (08/12/90)
In article <1990Aug09.180627.18848@esegue.segue.boston.ma.us> I wrote:
>reduction is easy to do on PDGs and can easily be extended to ASTs. (I do
>not have the reference handy but can post it, if anyone is interested.)

Here is the reference:

    Karl Ottenstein, "A Simplified Framework for Reduction in Strength,"
    IEEE Transactions on Software Engineering, Vol. 15, No. 1
    (January 1989), pp. 86-92.

Vinod Grover
Sun Microsystems
grover@sun.com
hankd@dynamo.ecn.purdue.edu (Hank Dietz) (08/12/90)
In article <1990Aug08.171640.13892@esegue.segue.boston.ma.us>:
>In article <1990Aug07.153407.8877@esegue.segue.boston.ma.us>:
>>(1) Abstract-syntax-tree (looks like Scheme code)
>>(2) DAG
>>(3) Three address code
>>(4) P-code
>>(5) Stanford's U-code
>
>First, I'd classify (5) as an extension of (4), and I might group (2) with
>(3).
>
>I like low-level 3-address code (even lower level than assembly).

I agree 100%. BTW, (3) is a walk embedding (2). The catch is that (1) is easier to automate, so, for example, PCCTS is headed that direction....

>AST's seem too high-level for doing lots of optimization. We want the
>details of array accesses, etc. exposed to the optimizer.

You're making an assumption here that simply doesn't hold... the AST elements need not be *just* the syntax elements from the program, but can be any transformation thereof. In other words, an AST can be quite low level, just like code tuples.

>[Anybody tried program dependence graphs, per Ferrante et al. at IBM? From
>the articles I've seen, they look pretty neat. -John]

Yes and no. We've played with a similar form for automatic parallelization at the process-packaging stage, but it doesn't work so well for some lower-level things....

The truth is that most optimizing, parallelizing compilers use several different intermediate forms, all interconnected (one form annotates another). No big deal... that is, unless you want to print out a program's intermediate-form representation.... :-)

-hankd@ecn.purdue.edu

PS: Why is there a gratuitous ref to Scheme in (1)...?
jouvelot@brokaw.lcs.mit.edu (Pierre Jouvelot) (08/13/90)
In article <1990Aug9.225859.10175@rice.edu> preston@titan.rice.edu (Preston Briggs) writes:
>>>AST's seem too high-level for doing lots of optimization. We want the
>>>details of array accesses, etc. exposed to the optimizer.
>The point is that people are seduced by the apparent simplicity of
>analysis of ASTs. Don't be fooled! Without care, you'll have to write
>routines to analyze DO loops, WHILE loops, REPEAT-UNTILs, LOOPs with
>EXITs, etc... I write routines that work with more traditional flow
>graphs (directed graphs) -- all loops look the same to me. Ditto for CASE
>statements, IF statements, GOTO's, ad nauseam.

With the AST approach, the cleanest way to deal with this overabundance of control structures is to define a single "generic" loop construct; all the control structures can then be trivially desugared (e.g., by the parser) into your own version of loop. This is the only loop your transformation algorithms will ever have to deal with (besides, admittedly, the unparser).

>While it may be that AST fans can show ways around my complaints, can they
>also give me some positive reasons to switch? What's the gain?

It happens that a lot of the transformations applied by optimizing/vectorizing/parallelizing compilers can be expressed concisely in a recursive manner, i.e., can be defined by induction on your AST. This gives a nice and powerful framework for expressing your algorithms, and a data structure that is perfectly matched to it.

The PIPS parallelizing compiler for Fortran, currently under development at the Ecole des Mines by Remi Triolet, Francois Irigoin, and me, takes this "generic" approach to ASTs to its most "extremist" point. The whole Fortran 77 syntax is expressed in only a couple of dozen different datatypes. More precisely, the whole IR, plus the information generated by the different phases of the parallelizer, is described in one page of domain equations. (We use a tool akin to IDL, called NewGen and developed in-house, that generates, from a high-level description of domains, functions that manipulate values of those types; all the PIPS data types are implemented in NewGen, so that records and other union data structures are never explicitly used by programmers.)

Pierre
--
Pierre Jouvelot
. CAI, Ecole des Mines, Paris (France): jouvelot@ensmp.fr
. LCS, MIT, Cambridge (USA): jouvelot@brokaw.lcs.mit.edu
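Concretely, the desugaring might look like this; the sketch below (in C, used only as notation, with hand-desugared loops and invented function names) maps two different surface loops onto the one generic for(;;){...break...} shape, which is then the only loop an optimizer ever sees:

    #include <stdio.h>

    /* "while n > 0 do n := n div 2" desugared: test at the top.    */
    int halvings_while(int n) {
        int trips = 0;
        for (;;) {
            if (!(n > 0)) break;       /* the single generic exit   */
            n /= 2; trips++;
        }
        return trips;
    }

    /* "repeat n := n div 2 until n = 0" desugared: test at the
       bottom, but the loop construct itself is the same.           */
    int halvings_repeat(int n) {
        int trips = 0;
        for (;;) {
            n /= 2; trips++;
            if (n == 0) break;
        }
        return trips;
    }

    int main(void) {
        printf("%d %d\n", halvings_while(8), halvings_repeat(8));
        return 0;
    }

This is not PIPS's actual loop domain, just an illustration of the idea that DO, WHILE, and REPEAT differ only in where the exit test lands inside one generic construct.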
csmith@uunet.UU.NET (Chris Smith) (08/13/90)
In article <1990Aug9.225859.10175@rice.edu> Preston Briggs writes:
>>AST's seem too high-level for doing lots of optimization.
>
>A more general algorithm (say Cocke and Kennedy) would perform strength
>reduction over any loop (while loops, loops built out of if's and goto's),
>would discover all the induction variables (not just the loop control
>variable), and would make provision for linear function test replacement.

These algorithms *are* tree based, if not AST based -- the first step is to do some control flow analysis to write the program as a sequence of properly nested loops. You can do the same optimizations on the loops no matter how you discover them.

>The point is that people are seduced by the apparent simplicity of
>analysis of ASTs. Don't be fooled! Without care, you'll have to write
>routines to analyze DO loops, WHILE loops, REPEAT-UNTILs, LOOPs with
>EXITs, etc... I write routines that work with more traditional flow
>graphs (directed graphs) -- all loops look the same to me. Ditto for CASE
>statements, IF statements, GOTO's, ad nauseam.

Depends on how abstract the ASTs are. All of those loops can be easily written as infinite trivial loops with breaks, plus other stuff in the right places. You need sequential composition, alternation, loop/break/continue, and nothing else, to handle everything except goto. (Maybe alternation should be an n-way branch, maybe it makes sense to have a special form for C switch, but such details don't matter much.) Gotos that don't jump into statements from outside are pretty easy too (as break or continue in a properly positioned loop).

>While it may be that AST fans can show ways around my complaints, can they
>also give me some positive reasons to switch? What's the gain?

It runs in linear time, and it does each thing that needs doing (and/or bit vectors, scan through them) exactly once. Probably faster.

You have a hold on both ends of an interesting region. Say, in

    if ... then
        ... a + b ...
    end if
    ... a + b ...

where the if doesn't modify a+b and you're allowed to raise exceptions prematurely. You can insert a+b before the if. Partial redundancy elimination won't. (It would insert a+b into an else branch, but there isn't one.)

Iterative algorithms have info about one end of a region whose other end is implicitly the procedure entry or exit, or sometimes the enclosing loop entry or exit. Tree-based algorithms extend this. In

    loop
        if ... a + b ... end if
        ...
    end loop

if a+b is loop invariant and doesn't raise exceptions (or you don't care), you can hoist it. You might not want to if it's under switch rather than if. With trees, you can.
preston@titan.rice.edu (Preston Briggs) (08/13/90)
I wrote:
>A more general algorithm (say Cocke and Kennedy) would perform strength
>reduction over any loop (while loops, loops built out of if's and goto's),

Chris Smith replied:
>These algorithms *are* tree based, if not AST based

I disagree. They're based on directed graphs.

>-- the first step is to do some control flow analysis

over the control flow graph, right?

>to write the program as a sequence of properly nested loops. You can do
>the same optimizations on the loops no matter how you discover them.

Remember, I was arguing about Ottenstein's paper, which had an explicit DO-loop header node, with subnodes that distinguished the induction variable, loop body, and termination conditions. My point was that such an approach, while simple, is not general.

A sub-point is that the loops are discovered in an explicit pass over the control flow graph (which is built in an explicit pass over the IL), not while parsing. AST's are built while parsing, though they aren't the same as parse trees. That's why there's an "S" for Syntax in AST. Parsing is local; it's hard to discover global facts (like the connection of goto's and labels) while parsing.

>Gotos that don't jump into statements from outside are pretty easy too

I don't agree that even the "simple" goto's are easy. Break's and Continue's are easy since they're bound to enclosing loop structures. Imagine you're parsing along (we'll say Pascal), building your AST, and you see a goto...

    while i < limit do begin
        i := i + 1;
        x := a[i];
        if x = 0 then goto 10;
        ...
    end;

What do you do? Is it a tiny local branch that describes a little if-then-else? Is it a giant break statement (perhaps out of several levels of loop)? Is it a backwards branch describing a loop? Is it a forwards branch describing part of a loop? And what are you going to do about Fortran?

>It runs in linear time, and it does each thing that needs doing (and/or
>bit vectors, scan through them) exactly once. Probably faster.
                                                ^ eh?

If you've got a language that restricts you to reducible routines (not a bad thing!), then you can do data flow analysis in a linear number of bit-vector steps. This is indeed quick (and it's also simple), but it isn't general since you can't handle irreducible routines. You could do node splitting, but that's potentially exponential.

>You have a hold on both ends of an interesting region. Say, in
>
>    if ... then
>        ... a + b ...
>    end if
>    ... a + b ...
>
>where the if doesn't modify a+b and you're allowed to raise exceptions
>prematurely. You can insert a+b before the if. Partial redundancy
>elimination won't. (It would insert a+b into an else branch, but there
>isn't one.)

Use one of the edge placement algorithms, such as

    A Fast Algorithm for Code Movement Optimisation
    Dhamdhere
    SIGPLAN Notices, Volume 23, Number 10

or

    A Solution to a Problem with Morel and Renvoise's "Global
    Optimization by Suppression of Partial Redundancies"
    Drechsler and Stadel
    TOPLAS, Volume 10, Number 4 (1988)

>Iterative algorithms have info about one end of a region whose other end
>is implicitly the procedure entry or exit, or sometimes the enclosing
>loop entry or exit. Tree-based algorithms extend this. In
>
>    loop
>        if ... a + b ... end if
>        ...
>    end loop
>
>if a+b is loop invariant and doesn't raise exceptions (or you don't care),
>you can hoist it. You might not want to if it's under switch rather than
>if. With trees, you can.

I'm a little baffled here. You seem to be arguing iterative analysis vs. some sort of interval-based analysis. Maybe that's what you intended and I've misunderstood the point of your article. We could certainly use one form of interval analysis or another if desired, though there is always the ugliness of irreducible flow graphs. Node splitting is a solution, but it's a lot of code for me to write when I know a simpler way.

At any rate, partial redundancy elimination wouldn't hoist a+b in the example shown, since it isn't always executed (the IF condition may never be true). It's a conservative design decision, but doesn't reflect on the rest of the discussion that I can see.

--
Preston Briggs
looking for the great leap forward
preston@titan.rice.edu
preston@titan.rice.edu (Preston Briggs) (08/13/90)
In article <1990Aug11.203048.848@mintaka.lcs.mit.edu> jouvelot@brokaw.lcs.mit.edu (Pierre Jouvelot) writes:
>With the AST approach, the cleanest way to deal with this overabundance of
>control structures is to define a single "generic" loop construct; all the
>control structures can then be trivially desugared (e.g., by the parser)
>into your own version of loop.

I don't see the trivial desugaring by the parser, particularly with an IF-GOTO form of the loop.

I can see discovering the entire CFG and then doing interval analysis to discover a nice tree-like grouping of nested control structures a la Sharir (or others), but you're still punting on irreducible flow graphs.

However, thanks for the interesting post on your design. It's fun to see how other people approach problems.

--
Preston Briggs
looking for the great leap forward
preston@titan.rice.edu
jouvelot@brokaw.lcs.mit.edu (Pierre Jouvelot) (08/14/90)
In article <1990Aug12.212723.1247@rice.edu> preston@titan.rice.edu (Preston Briggs) writes:
>In article <1990Aug11.203048.848@mintaka.lcs.mit.edu> I write:
>
>>With the AST approach, the cleanest way to deal with this overabundance of
>>control structures is to define a single "generic" loop construct; all the
>>control structures can then be trivially desugared (e.g., by the parser)
>>into your own version of loop.
>
>I don't see the trivial desugaring by the parser, particularly with an
>IF-GOTO form of the loop.
>
>I can see discovering the entire CFG and then doing interval analysis to
>discover a nice tree-like grouping of nested control structures a la Sharir
>(or others), but you're still punting on irreducible flow graphs.

Well, you've almost got a point here. The way the PIPS system deals with irreducible flow graphs is interesting. Basically, the AST data type definition looks something like this (in the straightforward NewGen syntax):

    instruction = block:instruction* + test + call + loop + unstructured ;
    test = condition:expression x true:instruction x false:instruction ;
    unstructured = instruction x predecessors:unstructured* x
                   successors:unstructured* ;
    ...

which says that an instruction is either a block (a list of instructions), a test (a condition expression plus a true and a false instruction), a call, a loop (not specified here), or an "unstructured". An "unstructured" datatype denotes a control structure that could correspond to an irreducible graph. The key point is that the instruction that appears in an "unstructured" value can be *any* instruction (i.e., it can be a block, or a test... or even another control graph!). The usual data flow algorithms seem to carry over to this kind of recursive approach (although I still have to find the time to prove it :-).

This way, you keep the power of ASTs while still being able to deal with arbitrary control graphs, about which our parallelizer is pretty much unable to do anything anyway. Notice, however, that if you can figure out (as I do) that a "dirty" control graph is embedded inside an otherwise "nice" DO loop, then from the outside the loop looks like a well-structured one which can be parallelized; this is a great plus of this approach.

Pierre
--
. CAI, Ecole des Mines, Paris (France): jouvelot@ensmp.fr
. LCS, MIT, Cambridge (USA): jouvelot@brokaw.lcs.mit.edu
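In a language without NewGen, the domains quoted above would come out as an ordinary tagged union. A rough C transliteration (the field and type names here are guesses for illustration, not PIPS code):

    /* Rough C rendering of the NewGen domain equations above. */
    typedef struct Instruction Instruction;
    typedef struct Unstructured Unstructured;
    struct Expression;                 /* opaque in this sketch */
    struct Call;
    struct Loop;

    typedef struct {                   /* test = condition x true x false */
        struct Expression *condition;
        Instruction *if_true, *if_false;
    } Test;

    struct Unstructured {              /* a node in an arbitrary CFG      */
        Instruction   *body;           /* may itself be a block, a test,
                                          or even another control graph   */
        Unstructured **predecessors;   /* null-terminated arrays          */
        Unstructured **successors;
    };

    struct Instruction {
        enum { BLOCK, TEST, CALL, LOOP, UNSTRUCTURED } tag;
        union {
            Instruction  **block;      /* null-terminated list            */
            Test           test;
            struct Call   *call;
            struct Loop   *loop;
            Unstructured  *unstructured;
        } u;
    };

The UNSTRUCTURED arm is the escape hatch: all the reducible control flow stays in the tree, and only the genuinely messy regions pay the price of general graph algorithms.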
marti@antares.inf.ethz.ch (08/14/90)
In article <1990Aug07.153407.8877@esegue.segue.boston.ma.us> napi@rangkom.MY (Mohd Hanafiah b. Abdullah) writes:
>I would like to know what people think is the best Intermediate
>Representation (IR) to use for highly effective optimizations and code
>generation; it should also be portable.

Many language designers/implementors have turned to using C as an intermediate representation language, for example in the following research projects and/or products:

- the C++ translator from AT&T (cfront)
- the portable Cedar compiler developed at Xerox PARC
- the Eiffel compiler from ISE
- the Modula-3 compilers developed at DEC SRC and the (now defunct?)
  Olivetti Research Center

Compilation speed using C as an IRL may not be overwhelming, but this solution is certainly portable. Of course, the quality of the generated machine code depends substantially on the C compiler used as a backend.

--
Robert Marti                         Phone:    +41 1 254 72 60
Institut fur Informationssysteme     Internet: marti@inf.ethz.ch
ETH-Zentrum                          UUCP:     ...uunet!mcvax!ethz!marti
CH-8092 Zurich, Switzerland

[Probably not what the original author had in mind, but a good thought nonetheless. -John]
muller@procope.pa.dec.com (Eric Muller) (08/14/90)
In article <1990Aug13.214614.16644@esegue.segue.boston.ma.us>, marti@antares.inf.ethz.ch writes:
|> Many language designers/implementors have turned to using C as an
|> intermediate representation language, for example in the following
|> research projects and/or products:
|> ...
|> - the Modula-3 compilers developed at DEC SRC and the (now defunct?)
|>   Olivetti Research Center
|>
|> Compilation speed using C as an IRL may not be overwhelming, but this
|> solution is certainly portable. Of course, the quality of the generated
|> machine code depends substantially on the C compiler used as a backend.

I am working on the SRC Modula-3 system. Bill Kalsow wrote most of the compiler before I joined, and he chose C to achieve portability. This certainly works: our system runs on VAX/Ultrix, DECstation/Ultrix, SPARC/SunOS, Apollo DN 4500/DomainOS, IBM R6000/AIX 3.1, AIX/PS2, IBM RT/IBM 4.3, and HP 9000/300/HP-UX, without too much pain.

However, we do not view C as an intermediate language, but rather as a portable assembler. The compiler maintains a private AST as the intermediate representation. The speed of compilation is not too bad: the Modula-3 to C compilation time is of the same order as the C to .o compilation.

The quality of the generated code depends a lot on the quality of the C compiler. This should not be very surprising, as we are trying to have a simple compiler: many intermediate expressions become separate assignments in C, for example. I have compared the assembly generated for equivalent Modula-3 and C programs, using the -O2 optimizations of /bin/cc on a DECstation; in many cases, the same assembly is generated. We may spend some time to further improve that aspect.

One of the advantages of generating C is that we can assume some runtime support. Modula-3 has exceptions and requires the presence of a Thread module (lightweight processes); setjmp/longjmp are enough to implement that.

My own conclusion is that generating C versus using a native back end does not much influence the quality of the generated code. Choosing one way or the other is a compromise between compilation speed and portability.

Eric Muller.
------------------------------------------------------------------------------
System Research Center - 130 Lytton Av. - Palo Alto, CA 94301 - (415) 853 2193
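The "many intermediate expressions become separate assignments" style looks roughly like this; a hypothetical sketch, not actual SRC Modula-3 compiler output (the function and temporary names are invented):

    /* Source statement (Modula-3-ish):   x := a[i] + b * c
       Flattened C that a simple compiler might emit for it:        */
    void stmt_42(int a[], int i, int b, int c, int *x) {
        int _t1, _t2, _t3;
        _t1 = a[i];        /* every intermediate value gets a name  */
        _t2 = b * c;
        _t3 = _t1 + _t2;
        *x  = _t3;         /* and -O2 in the C backend cleans it up */
    }

The point of the flattening is that the generating compiler stays simple; recombining the temporaries into good machine code is delegated entirely to the C compiler's optimizer.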
pd@complex.Eng.Sun.COM (Peter C. Damron) (08/15/90)
I think that the discussion of intermediate representations has centered too much on the abstract structure of the IR. There are three pieces that I can think of to an IR:

1. The abstract machine model
2. The operators and their semantics
3. The abstract structure (AST, 3-address, PDG, etc.)

I would say that #1 and #2 are more important than #3 in determining how hard it is to optimize over that IR.

For #1, the usual choices are either a stack machine or a register-based machine. For #2, I'm not sure that there is much of a consensus about what the best set of operators is. Any thoughts? What about "ROSIR" -- reduced operator set intermediate representation?

Just a thought,
Peter.
----------------------------
Peter C. Damron
Sun Microsystems, Inc.
Advanced Languages, MS 12-40
2550 Garcia Ave.
Mtn. View, CA 94043
pdamron@eng.sun.com
staff@cadlab.sublink.org (Alex Martelli) (08/18/90)
marti@antares.inf.ethz.ch writes:
>Many language designers/implementors have turned to using C as an
>intermediate representation language, for example in the following research
>projects and/or [...omitted for brevity...]
>Compilation speed using C as an IRL may not be overwhelming, but this
>solution is certainly portable. Of course, the quality of the generated
>machine code depends substantially on the C compiler used as a backend.

...and don't forget Fortran... (see my msg of today in comp.lang.fortran: f2c + Sun's cc produce code which runs almost twice as fast as Sun's f77, for a strictly numerical code (a 2D complex FFT) on a Sparcstation 1!).

--
Alex Martelli - CAD.LAB s.p.a., v. Stalingrado 45, Bologna, Italia
Email: (work:) staff@cadlab.sublink.org, (home:) alex@am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434
Fax: ++39 (51) 366964 (work only; any time of day or night).

[I knew that Sun's f77 wasn't great, but that's embarrassing. -John]
ok@goanna.cs.rmit.OZ.AU (Richard A. O'Keefe) (08/20/90)
In article <250@cadlab.sublink.ORG>, staff@cadlab.sublink.org (Alex Martelli) writes:
> marti@antares.inf.ethz.ch writes:
> ...and don't forget Fortran... (see my msg of today in comp.lang.fortran:
> f2c + Sun's cc produce code which runs almost twice as fast as Sun's
> f77, for a strictly numerical code (a 2D complex FFT) on a Sparcstation 1!).

On reading this I immediately went and converted a small numerical code from Fortran to C and ran it on an Encore MultiMax (NS32532 processors) running UMAX V 2.2M. The f77 and cc compilers on that machine do not appear to be AT&T- or Berkeley-derived; the f77 compiler in particular is described as "an advanced, optimizing compiler". The code didn't exploit any of the micro-tasking or parallel features of Encore Fortran or Encore C, and no special declarations ('register' or the like) were used. Both versions used double precision.

Result: the C code ran twice as fast (4 seconds vs. 8 seconds).

The Fortran code used 2D arrays. The C code used 1D arrays, with strength reduction (a standard optimising technique) done by hand, and with indexing replaced by pointer increments (also a standard technique; the NS32532 doesn't use auto-??crement addressing), also by hand. f2c starts to sound interesting...

So it wasn't just Sun. (I say "wasn't" because, as the original poster pointed out, Sun have some new compilers.)
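The two hand transformations described have roughly this shape (a sketch in C, not the actual benchmark; function names invented, and 8-byte doubles assumed in the comment):

    /* 2D indexing, as the Fortran version had it (transliterated): */
    double sum2d(int n, double a[n][n]) {
        double t = 0.0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                t += a[i][j];     /* address = base + (i*n + j)*8   */
        return t;
    }

    /* 1D, with the strength reduction and the pointer increments
       done by hand, as in the C version of the benchmark:          */
    double sum1d(int n, const double *a) {
        double t = 0.0;
        const double *p = a, *end = a + (long)n * n;
        while (p < end)
            t += *p++;            /* one pointer bump per element   */
        return t;
    }

The multiply per access in the 2D version is exactly what a strength-reducing compiler is supposed to remove automatically; doing it by hand measures how much the compilers were leaving on the table.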
han@cs.rochester.edu (02/21/91)
I am taking a compiler seminar course that emphasizes back-end optimization issues. To get hands-on experience, I would like to practice applying various optimization techniques to the intermediate representation produced by the front end. (I have not decided on the front end yet; I might write one, or pick one up and modify it.) At this point, I am looking for intermediate representations that I can play with. So, my first question is:

Are there well-documented (and, hopefully, state-of-the-art) definitions of intermediate representations available? How can I get them? (Does the RTL that GCC uses count as one such IR?)

Actually, my fellow students and I have taken a look at the RTL dump produced by GCC. To our surprise, the RTL dump showed strong dependency on the target machine even at the earliest stage of optimization. The RTL dump on a SPARC looked very different from that on a Sun-3 workstation. We had expected that machine dependency would show up at a much later stage, and that we would be able to try optimizations on RTL without worrying about machine dependencies. So, my second question is:

Is it generally expected for an intermediate representation to be machine dependent? How far toward the end of the code generation phase can machine dependencies be prevented from showing up?
lavinus@csgrad.cs.vt.edu (02/23/91)
Hi! In answer to both your questions, I hope: ideally, intermediate representations should not be machine dependent at all (beyond the fact that they might be dependent on, say, the von Neumann architecture). When I think of intermediate code, I think of something along the lines of P-code (by Wirth), which is a sort of machine code for a simple, hypothetical, stack-based machine, and is quite generic and non-machine-dependent. That might be a good one for your purposes.

By the way, could you (or a colleague) suggest a good book or reference on such back-end compiler optimization?

Thanks and good luck...
Joe Lavinus
--
Joseph W. Lavinus
Virginia Tech, Blacksburg, Virginia
email: lavinus@csgrad.cs.vt.edu

[I suspect that after you read all about compiler optimization you'll be more sympathetic to machine-dependent intermediate code. -John]
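For flavor, here is a toy stack machine in that spirit, written in C; the four-opcode instruction set is invented for illustration (real P-code is much richer). The statement a := b + c compiles to LOD b; LOD c; ADD; STO a:

    #include <stdio.h>

    /* A toy P-code-flavored stack machine (opcodes invented). */
    enum { LIT, LOD, STO, ADD, HLT };
    typedef struct { int op, arg; } Insn;

    int main(void) {
        int mem[3] = { 0, 7, 35 };   /* a=mem[0], b=mem[1], c=mem[2] */
        Insn prog[] = {              /* a := b + c                   */
            { LOD, 1 },              /* push b                       */
            { LOD, 2 },              /* push c                       */
            { ADD, 0 },              /* pop two, push their sum      */
            { STO, 0 },              /* pop into a                   */
            { HLT, 0 },
        };
        int stack[16], sp = 0;
        for (Insn *pc = prog; ; pc++) {
            switch (pc->op) {
            case LIT: stack[sp++] = pc->arg;            break;
            case LOD: stack[sp++] = mem[pc->arg];       break;
            case STO: mem[pc->arg] = stack[--sp];       break;
            case ADD: sp--; stack[sp-1] += stack[sp];   break;
            case HLT: printf("a = %d\n", mem[0]);       return 0;
            }
        }
    }

Note how the intermediate result of b + c never gets a name: it lives only on the stack. That anonymity is exactly Preston's earlier complaint about stack-machine IRs for optimization.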
mike@vlsivie.tuwien.ac.at (Michael K. Gschwind) (02/23/91)
In article <9102210223.AA15327@cyan.cs.rochester.edu> han@cs.rochester.edu writes:
>(Does the RTL that GCC uses count as one such IR?)

I guess it does.

>Actually, my fellow students and I have taken a look at the RTL dump
>produced by GCC. To our surprise, the RTL dump showed strong dependency
>on the target machine even at the earliest stage of optimization.

This is not the fault of the RTL language, but rather that of the front end. The front end uses different intermediate code generation strategies depending on available instructions, big-endianness, etc. For example, if your target machine does not have a multiply instruction, it is reasonable for the front end to emit code which calls a multiply function; now at least you can profit from optimizations which reduce function call cost. (On the other hand, you might argue that information is lost, which may cause problems, say, when trying to perform strength reduction and the like. RTL adds a tag to such function calls, explaining what the function does:

    (call_insn 15 14 16 (parallel[
        (set (reg:SI 1)                   ; assign to register 1
             (call (mem:SI (reg:SI 35))   ; the return value of this
                   (const_int 0)))        ; function call
        (use (reg:SI 1))
        ] ) -1 (nil) (nil))

    (insn 16 15 17 (set (reg:SI 36)       ; copy the return value
                        (reg:SI 1))       ; to register 36
        -1 (nil)
        (expr_list:REG_EQUAL (mult:SI (reg/v:SI 32)
                                      (mem:SI (reg:SI 33)))
            (insn_list:REG_RETVAL 10 (nil))))

The REG_EQUAL says that reg:SI 36 really contains the value computed by multiplying registers 32 and 33.)

>The RTL dump on a SPARC looked very different from that on a Sun-3
>workstation. We had expected that machine dependency would show up at a
>much later stage, and that we would be able to try optimizations on RTL
>without worrying about machine dependencies.

I wouldn't worry too much about machine dependencies. The worst case (I guess) is that some optimizations do not benefit certain architectures, but this is probably due to the target processor's peculiarities. Condition code optimization, for example, can _never_ benefit a CPU without condition codes; so what?

>So, my second question is: Is it generally expected for an intermediate
>representation to be machine dependent? How far toward the end of the code
>generation phase can machine dependencies be prevented from showing up?

That probably depends on how you define machine dependency, and on the kind of IR. In RTL, the language itself is nearly completely machine independent (except when it comes to things like little/big-endianness, which would be difficult to change after optimizations depending on it have been performed). The code emitted, however, depends very much on the target processor: postincrement language elements, for example, are never used for targets without postincrement addressing modes.

The sooner you introduce machine dependencies, the sooner you can optimize aggressively. On the other hand, the language itself should be as machine independent as possible, to allow easy retargeting.

bye,
mike

Michael K. Gschwind, Dept. for VLSI Design, Vienna University of Technology
mike@vlsivie.tuwien.ac.at    mike@vlsivie.uucp    e182202@awituw01.bitnet
Voice: (++43).1.58801 8144   Fax: (++43).1.569697
rfg@ncd.com (Ron Guilmette) (02/26/91)
In article <9102210223.AA15327@cyan.cs.rochester.edu> han@cs.rochester.edu writes:
>Actually, my fellow students and I have taken a look at the RTL dump
>produced by GCC. To our surprise, the RTL dump showed strong dependency
>on the target machine even at the earliest stage of optimization.

For what it's worth, I thought that I should mention that GCC actually deals with two different internal representations of a program. There is a high-level representation (TREEs) and also the low-level representation that you mentioned (i.e., RTL).

The high-level representation is fairly machine independent; however, it is *not* something that you would want to try applying optimizations to. (Well, at least GCC doesn't try to do that. Some people may say that the high-level tree representation would be a good thing to be munging for certain "high-level" optimizations. I won't argue that point either way.)

--
// Ron Guilmette
// Internet: rfg@ncd.com    uucp: ...uunet!lupine!rfg
djones@decwrl.dec.com (Dave Jones) (02/28/91)
As you know, there are lots of optimizations that can be described in a machine-independent way, as correctness-preserving algorithm transformations. You really want to perform those optimizations on the machine-independent intermediate representation, but whether they win or lose is machine-dependent. It looks like a situation where the "front end" and "back end" need to collaborate, rather than run one after the other.

I recently wrote a special-purpose Pascal compiler on that principle, and I've started my second "back end" for it (SPARC). But alas, the application does not warrant investing much in optimization at this time, because the Pascal layers of these application programs account for a very small percentage of the execution time. So I don't have any war stories to tell re. optimization.

A really nifty implementation of the scheme might use coroutines.
tarvydas@turing.toronto.edu (Paul Tarvydas) (03/04/91)
A good basis for any intermediate language is described in:

    R.C. Holt, "Data Descriptors: A Compile-Time Model of Data and
    Addressing," ACM TOPLAS, Vol. 9, No. 3 (July 1987), pp. 367-389.

The representation gives you a way to uniformly describe all run-time data at compile time -- this includes memory locations, registers, manifest constants, etc. You get many of the local optimizations, such as address-mode arithmetic, constant folding, and manifest subscripting, almost "for free" when you use this representation. The representation is very clean and simple to work with, so your project can be pushed quite a long way before it starts to quake with complexity.

There are a number of theses and papers dealing with compilers built on top of this technology, most notably:

    J.R. Cordy and R.C. Holt, "Code Generation Using an Orthogonal Model,"
    Software Practice and Experience, Vol. 20, No. 3 (March 1990),
    pp. 301-320.

which describes a declarative tree- and menu-driven method for emitting code and building coder-compilers (e.g., like Graham-Glanville and Davidson/Fraser/Wendt). The code generation algorithms are the simplest I've seen (and that includes Small-C :-), yet they produce very good code. The simplicity of this method makes it ripe for picking and extending. (I've used this work in industry for a few years now and have had very favourable results.)

You might also be interested in Peter James Ashwood-Smith's MSc thesis, "The Source Level Optimization of Turing Plus," March 7, 1987, at the University of Toronto's CSRI, which applies global optimizations to a compiler built with Data Descriptors and S/SL. (S/SL is a compiler-building tool; see R.C. Holt, J.R. Cordy, and D.B. Wortman, "An Introduction to S/SL: Syntax/Semantic Language," ACM TOPLAS, Vol. 4, No. 2 (April 1982), pp. 149-178.)

If you are interested in more papers in this area, I'd be glad to supply more pointers.

Paul Tarvydas
TS CONTROLS
5 Bowness Court
Etobicoke, Ont.
Canada M9B 5Z8
tel: (416)-234-0889, fax: (416)-234-9193
uucp: tarvydas@tsctrl, on-failure: tarvydas@turing.toronto.edu
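For readers without the paper handy: a data descriptor denotes, roughly, some number of levels of indirection applied to a base plus a constant displacement. The C sketch below simplifies from memory; the field names and the exact algebra are guesses, and the TOPLAS paper has the real definition:

    /* A compile-time data descriptor, roughly after Holt (TOPLAS 1987):
       the denoted value is k levels of indirection applied to the
       contents of a base register plus a constant displacement.
       Simplified sketch only; details differ from the paper.         */
    typedef struct {
        int  k;        /* indirection count; k == 0 means a manifest
                          constant whose value is 'disp'              */
        int  base;     /* register number, or -1 for no register     */
        long disp;     /* constant displacement                      */
    } DataDesc;

    /* Constant folding falls out of the representation: adding two
       manifest constants just adds their displacements.             */
    DataDesc dd_fold_add(DataDesc x, DataDesc y) {
        DataDesc r = { 0, -1, x.disp + y.disp };
        return r;
    }

    /* So does record-field addressing: if a record's value is one
       indirection from base+disp, the field at offset f is one
       indirection from base+(disp+f), and no code need be emitted.  */
    DataDesc dd_field(DataDesc rec, long offset) {
        rec.disp += offset;    /* assumes rec.k >= 1 */
        return rec;
    }

Optimizations like these happen as pure descriptor arithmetic at compile time, which is what "almost for free" means above: no instructions are generated until a descriptor finally has to be materialized.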