compilers@ima.UUCP (01/10/86)
[from kurt at ALLEGRA/FLUKE (Kurt Guntheroth)]
Organization: John Fluke Mfg. Co., Inc., Everett, WA
1. When I first learned about optimization, I thought the ideal
optimizer would work in the following way: Parse the source into
trees/dags/IL/whatever and reorder and simplify the trees to the
optimal equivalent program. Then generate code by any reasonably good
technique. The code would be almost perfect since it was generated
from an optimal program. Now I find out that this doesn't work too
well. Real machines are so un-orthogonal that you inevitably
de-optimize the code when you generate machine instructions. People
seem to be concentrating more and more on the machine language,
generating machine instructions simply and then optimizing the
instructions by performing translations permitted by some rule set
(grammars, tables, etc.). People like Fraser swear by this method.
Other people I have talked to say that almost any optimization might
de-optimize code for a given processor by making it more difficult to
generate some instruction sequence. What do you practitioners
consider the 'right' way to do things? It seems that optimizations
involving moving code would be much more difficult if you do them on
machine instructions (the grammars don't handle it too well). What
about this, oh gurus?
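For concreteness, here is a minimal sketch in C of the rule-set approach
described above: a table-driven peephole pass over adjacent instruction
pairs. The three-field instruction format, the `*' wildcard, and the
single store/load rule are all invented for illustration; this shows the
general flavor of such optimizers, not Fraser's actual system.

    #include <stdio.h>
    #include <string.h>

    /* A toy three-field instruction: opcode and two operands. */
    struct insn { char op[8]; char a[8]; char b[8]; };

    /* One rewrite rule: if two adjacent instructions match the
       pattern pair, replace them with the single replacement. */
    struct rule { struct insn pat[2]; struct insn repl; };

    /* "*" in a pattern field matches any operand. */
    static int fmatch(const char *pat, const char *f)
    {
        return strcmp(pat, "*") == 0 || strcmp(pat, f) == 0;
    }

    static int imatch(const struct insn *pat, const struct insn *i)
    {
        return fmatch(pat->op, i->op) && fmatch(pat->a, i->a) &&
               fmatch(pat->b, i->b);
    }

    /* Example rule: a store to X followed immediately by a load of
       the same register from X can drop the redundant load. */
    static struct rule rules[] = {
        { { { "ST", "R0", "X" }, { "LD", "R0", "X" } },
          { "ST", "R0", "X" } },
    };

    /* Slide a two-instruction window over the code and apply any
       rule that matches; returns the new instruction count. */
    static int peephole(struct insn *code, int n)
    {
        int i, r, nrules = (int)(sizeof rules / sizeof rules[0]);
        for (i = 0; i + 1 < n; i++)
            for (r = 0; r < nrules; r++)
                if (imatch(&rules[r].pat[0], &code[i]) &&
                    imatch(&rules[r].pat[1], &code[i + 1])) {
                    code[i] = rules[r].repl;
                    memmove(&code[i + 1], &code[i + 2],
                            (size_t)(n - i - 2) * sizeof *code);
                    n--;
                }
        return n;
    }

    int main(void)
    {
        struct insn code[] = {
            { "LD", "R0", "Y" }, { "ST", "R0", "X" }, { "LD", "R0", "X" },
        };
        int i, n = peephole(code, 3);
        for (i = 0; i < n; i++)             /* prints LD R0 Y / ST R0 X */
            printf("%s %s %s\n", code[i].op, code[i].a, code[i].b);
        return 0;
    }

Notice that code motion has no natural expression in this window-and-rule
framework, which is exactly the difficulty raised above.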
2. I once considered generating code for the 6502. This miserable
little processor doesn't even have a 16-bit register, so you must form
and move addresses one byte at a time (ugh). I suspect the problems of
generating code for the 6502 must be similar to the problems of
generating good code for a segmented machine like the x86, where the
low address byte is like an offset and the high address byte is like a
segment. Any comments?
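As a concrete illustration, here is a small C model of the
byte-at-a-time address arithmetic the 6502 forces on a code generator.
It mimics the clear-carry/add-low/add-high instruction sequence a
compiler would have to emit; the type and function names are invented
for the example.

    #include <stdio.h>

    typedef unsigned char byte;

    /* The 6502 can only move and add 8 bits at a time, so a 16-bit
       address is kept as a low/high byte pair and any arithmetic
       must propagate the carry by hand -- roughly the CLC / ADC lo /
       ADC hi sequence a compiler would emit. */
    struct addr { byte lo, hi; };

    static struct addr addr_add(struct addr base, unsigned offset)
    {
        struct addr r;
        unsigned sum = base.lo + (offset & 0xff);
        r.lo = (byte)sum;                       /* add the low byte     */
        r.hi = (byte)(base.hi + (offset >> 8)   /* add the high byte    */
                      + (sum >> 8));            /* ...plus the carry    */
        return r;
    }

    int main(void)
    {
        struct addr a = { 0xf0, 0x12 };         /* $12F0           */
        struct addr b = addr_add(a, 0x0120);    /* + $0120 = $1410 */
        printf("$%02X%02X\n", b.hi, b.lo);
        return 0;
    }

The low/high split does resemble the offset/segment split on the x86,
except that the x86 segment arithmetic overlaps (segment * 16 + offset)
rather than concatenating cleanly.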
3. How do I get the GNU compiler tools? Ideally they would be small
enough to post to usenet's net.sources. Also, is there a public domain
S/SL compiler? S/SL is a tool from U of Toronto that builds
table-driven ad hoc top-down compilers. This may sound strange, but it
seems to combine many of the nice intuitive features of ad hoc
recursive descent compilers with the speed and compactness of
table-driven parsers. Building the S/SL translator is not difficult
(especially compared to writing an LALR parser generator), but I
haven't done it yet and I am interested to hear whether somebody else
has.
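To show the flavor of such a table-driven top-down recognizer, here is
a miniature sketch in C. The opcode set below is invented for the
example and is far smaller than S/SL's real instruction set; the point
is only that the grammar lives in a table and a tiny interpreter walks
it, rather than each rule being a hand-written C function.

    #include <stdio.h>

    /* Opcodes for a miniature table walker.  Each instruction is
       three ints: opcode, argument, and (for JNE) a branch target. */
    enum { MATCH, JNE, JUMP, CALL, RET, ACCEPT };

    /* The "program": the grammar  expr -> 'n' { '+' 'n' }
       compiled into table entries. */
    static const int table[] = {
        /*  0 expr: */ CALL,   18, 0,   /* call num                 */
        /*  3 loop: */ JNE,   '+', 15,  /* no '+' ahead? -> accept  */
        /*  6       */ MATCH, '+', 0,
        /*  9       */ CALL,   18, 0,   /* call num                 */
        /* 12       */ JUMP,    3, 0,
        /* 15       */ ACCEPT,  0, 0,
        /* 18 num:  */ MATCH, 'n', 0,
        /* 21       */ RET,     0, 0,
    };

    static const char *input;
    static int pos;

    /* The walker: a few dozen lines interpret the whole grammar. */
    static int run(void)
    {
        int pc = 0, sp = 0, stack[32];
        for (;;) {
            int op = table[pc], a = table[pc + 1], b = table[pc + 2];
            switch (op) {
            case MATCH:                      /* consume one token   */
                if (input[pos] != a) return 0;
                pos++; pc += 3; break;
            case JNE:                        /* branch on lookahead */
                pc = (input[pos] != a) ? b : pc + 3; break;
            case JUMP: pc = a; break;
            case CALL: stack[sp++] = pc + 3; pc = a; break;
            case RET:  pc = stack[--sp]; break;
            case ACCEPT:                     /* all input consumed? */
                return input[pos] == '\0';
            }
        }
    }

    int main(void)
    {
        input = "n+n+n"; pos = 0;
        printf("%s\n", run() ? "accepted" : "rejected");
        return 0;
    }

The explicit call stack in the walker is what replaces the C runtime
stack of a recursive descent compiler, which is where the compactness
comes from.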
4. Has anybody seen good books on actually implementing optimization?
I have Wulf's book on BLISS -- I mean, are there any others?
I can find literature about optimization, but it is generally at a very
abstract level. I would like to steal implementation-level ideas if I
can, instead of reinventing all the wheels in the world.
Kurt Guntheroth
John Fluke Mfg. Co., Inc.
{uw-beaver,decvax!microsof,ucbvax!lbl-csam,allegra,ssc-vax}!fluke!kurt
[I haven't seen any books, but there have been many articles on machine-
specific optimizations. Look at the November 1980 issue of the IBM Journal
of R+D, or at the various compiler construction conference proceedings
published by Sigplan. -John]

compilers@ima.UUCP (01/14/86)
[from Chris Torek <harvard!mimsy.umd.edu!gymble!chris>]

Like yourself, I `thought the ideal optimizer would ... parse the
source into trees/dags/IL/whatever and reorder and simplify [these] to
the optimal equivalent program'.  If I understand correctly, the
problem with this is that the compiler's back end has to do some really
strange things with some of what the front end might generate.  But if
that is the case, then it seems to me the *real* problem is a lack of
communication from the back end to the front end.

What is needed is a way for the code generator to tell the optimizer
`x is not good', `y is very good', `z is only worthwhile under the
following conditions', and so forth.  Maybe a weighted,
production-system style database, interpreted by the front end during
optimization, would be the way to do this.  Then you could say `if
there is repeated division by a power of two inside a loop, try to
determine whether the dividend is positive, even if you have to put an
``if'' around the outside and make two loops', or `loops that count
down to zero are twice as good as those that count up to a constant',
or whatever.  (The former lets you directly use arithmetic shifts
instead of integer divides on two's complement machines---this is
extremely important on many microprocessors---and the latter allows
use of the `sob' style VAX instructions or the NS32000 count-to-zero
instructions.)  Of course, with enough rules this might produce the
world's slowest compiler. . . . :-)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:  seismo!umcp-cs!chris
CSNet: chris@umcp-cs
ARPA:  chris@mimsy.umd.edu

[My understanding of the PL.8 compiler from IBM is that it first
compiles into an extremely low-level intermediate code, then does
machine-independent optimizations such as common subexpression
elimination and loop-invariant removal, and then does register
assignment and instruction selection.  The intermediate code has an
infinite number of registers and very simple operations, so that real
machine instructions usually subsume several intermediate instructions.
I don't know how they deal with giving hints to the intermediate code
generator so that it can use fancy real machine instructions, but this
was part of the 801 RISC project, which had a machine with very few
"clever" instructions.  They get superb code, but the compiler is so
slow that I gather nothing less than a 370/168 will do. -John]
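To make the division example above concrete, here is a minimal C sketch
of the transformation Chris describes; `series' and `series_opt' are
invented example functions, and the remark about / on negative operands
assumes the truncating behavior typical of most compilers.

    #include <stdio.h>

    /* On a two's complement machine, x >> 1 rounds toward minus
       infinity while x / 2 (on most compilers) truncates toward
       zero: -3 >> 1 is -2, but -3 / 2 is -1.  So the cheap shift
       is only safe when the dividend is known non-negative. */

    /* Straightforward form: an integer divide every iteration. */
    long series(int x)
    {
        long s = 0;
        while (x != 0) { s += x; x /= 2; }
        return s;
    }

    /* Transformed form: dividing by 2 never changes the sign of
       x, so one test hoisted outside ("put an `if' around the
       outside and make two loops") lets the common non-negative
       case run with an arithmetic shift instead of a divide. */
    long series_opt(int x)
    {
        long s = 0;
        if (x >= 0)
            while (x != 0) { s += x; x >>= 1; }  /* shift form  */
        else
            while (x != 0) { s += x; x /= 2; }   /* divide form */
        return s;
    }

    int main(void)
    {
        printf("%ld %ld\n", series(100), series_opt(100));   /* both 197  */
        printf("%ld %ld\n", series(-100), series_opt(-100)); /* both -197 */
        return 0;
    }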
compilers@ima.UUCP (01/16/86)
[from allegra!sbcs!debray (Saumya Debray)]
It isn't entirely surprising that transforming a HLL program P1 to an
equivalent "optimal" (!?) HLL program P2 and then generating code from
P2 might not always yield the best results. The virtual machine
corresponding to the high-level language is really very different from
that corresponding to the low-level one, and one might expect their cost
criteria to be different as well.
I have a couple of comments against restricting optimization
_exclusively_ to low-level code, however:
(1) The idea behind code optimization is to preserve equivalence
between programs at the HLL level while improving the estimated cost
at the low level. However, if optimization is restricted to object
code, then we are necessarily required to preserve equivalence
at the object code level. This may be overly restrictive, since it
may not take into account the fact that certain aspects of the
object-code program are irrelevant at the HLL level.
(2) The level of abstraction provided by a HLL may make certain
optimizations much simpler at that level than at the object code
level. For example, the transformation of a recursive program
to compute the factorial function,
    fact(N) = if N = 0 then 1 else N * fact(N-1)
to tail-recursive and thence to iterative form is fairly
straightforward at the HLL level, using the associativity of *. It's
not obvious to me how the corresponding transformation would go
working only on object code.
-Saumya Debray
SUNY at Stony Brook
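For concreteness, here are the three stages of the factorial
transformation Debray describes, rendered in C (with the caveat that a
compiler would perform this on its intermediate form, not on source
text).

    #include <stdio.h>

    /* Original recursive form: fact(N) = if N = 0 then 1
       else N * fact(N-1).  The multiply happens after the
       recursive call returns, so it is not a tail call. */
    long fact(long n)
    {
        return n == 0 ? 1 : n * fact(n - 1);
    }

    /* Tail-recursive form: associativity of * lets the pending
       multiplications be folded into an accumulator, so the
       recursive call is the last thing the function does. */
    long fact_acc(long n, long acc)
    {
        return n == 0 ? acc : fact_acc(n - 1, n * acc);
    }

    /* Iterative form: a tail call is just a jump, so the
       tail-recursive version rewrites directly into a loop. */
    long fact_iter(long n)
    {
        long acc = 1;
        while (n != 0) {
            acc *= n;
            n--;
        }
        return acc;
    }

    int main(void)
    {
        /* all three print 3628800 */
        printf("%ld %ld %ld\n", fact(10), fact_acc(10, 1), fact_iter(10));
        return 0;
    }

The step that uses associativity is the move from fact to fact_acc: it
reorders the multiplications, which is easy to justify at the HLL level
but hard to see in a stream of object code.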