[mod.compilers] Object code optimization

compilers@ima.UUCP (01/10/86)

[from kurt at ALLEGRA/FLUKE (Kurt Guntheroth)]

Organization: John Fluke Mfg. Co., Inc., Everett, WA

1.  When I first learned about optimization, I thought the ideal
optimizer would work in the following way:  Parse the source into
trees/dags/IL/whatever and reorder and simplify the trees to the
optimal equivalent program.  Then generate code by any reasonably good
technique.  The code would be almost perfect since it was generated
from an optimal program.  Now I find out that this doesn't work too
well.  Real machines are so un-orthogonal that you inevitably
de-optimize the code when you generate machine instructions.  People
seem to be concentrating more and more on the machine language,
generating machine instructions simply and then optimizing the
instructions by performing translations permitted by some rule set
(grammars, tables, etc.).  People like Fraser swear by this method.
Other people I have talked to say that almost any optimization might
de-optimize code for a given processor by making it more difficult to
generate some instruction sequence.  What do you practitioners
consider the 'right' way to do things? It seems that optimizations
involving moving code would be much more difficult if you do them on
machine instructions (the grammars don't handle it too well).  What
about this, oh gurus?
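
To make the rule-set idea concrete, here is a minimal sketch in C of
such a peephole pass over a two-instruction window; the "mov" syntax
and the single redundant-move rule are hypothetical, just enough to
show the shape of the technique:

    /* Sketch of table-style peephole rewriting: scan adjacent
       instruction pairs, and when a pair matches a rule, rewrite
       and rescan.  Assumes condition codes are dead here. */
    #include <stdio.h>
    #include <string.h>

    struct insn { char text[32]; };

    /* Rule: "mov X,Y" followed by "mov Y,X" -- the second move is
       redundant and can be deleted. */
    static int redundant_move(struct insn *a, struct insn *b)
    {
        char x[16], y[16], p[16], q[16];
        if (sscanf(a->text, "mov %15[^,],%15s", x, y) != 2) return 0;
        if (sscanf(b->text, "mov %15[^,],%15s", p, q) != 2) return 0;
        return strcmp(x, q) == 0 && strcmp(y, p) == 0;
    }

    int main(void)
    {
        struct insn code[] = {
            {"mov r0,r1"}, {"mov r1,r0"}, {"add r1,r2"}
        };
        int n = sizeof code / sizeof code[0], i, j;

        for (i = 0; i + 1 < n; ) {
            if (redundant_move(&code[i], &code[i + 1])) {
                for (j = i + 1; j + 1 < n; j++)
                    code[j] = code[j + 1];
                n--;            /* rule fired: rescan this window */
            } else {
                i++;
            }
        }
        for (i = 0; i < n; i++)
            printf("%s\n", code[i].text);
        return 0;
    }

A real rule set would be a table of pattern/replacement pairs rather
than one hard-wired predicate, but the driver loop looks much the same.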

2.  I once considered generating code for the 6502.  This miserable
little processor doesn't even have a 16-bit register, so you must form
and move addresses one byte at a time (ugh).  I suspect the problems of
generating code for the 6502 must be similar to the problems of
generating good code for a segmented machine like the x86, where the
low address byte is like an offset and the high address byte is like a
segment.  Any comments?
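
To show what "one byte at a time" means, here is a sketch in C (the
6502 sequence in the comment is the usual zero-page pointer idiom,
not taken from any particular compiler):

    /* Sketch: a 16-bit array address must be built in 8-bit halves,
       with the carry from the low byte rippling into the high byte,
       which is what the commented 6502 sequence does. */
    #include <stdio.h>

    int main(void)
    {
        unsigned base  = 0x1234;          /* address of an array  */
        unsigned index = 0x00F0;          /* byte offset into it  */
        unsigned addr  = (base + index) & 0xFFFF;

        /* 6502: CLC            ; clear carry
                 LDA #<base     ; low byte of base
                 ADC #<index    ; add low byte, set carry
                 STA ptr        ; low half of pointer
                 LDA #>base     ; high byte of base
                 ADC #>index    ; add high byte plus carry
                 STA ptr+1      ; high half of pointer     */
        unsigned lo = addr & 0xFF;        /* like the x86 offset  */
        unsigned hi = (addr >> 8) & 0xFF; /* like the x86 segment */

        printf("addr=%04X lo=%02X hi=%02X\n", addr, lo, hi);
        return 0;
    }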

3.  How do I get the GNU compiler tools?  Are they small enough to
post to Usenet's net.sources?  Also, is there a public domain S/SL
compiler?  S/SL is a tool from U of Toronto that builds table-driven
ad hoc top-down compilers.  This may sound strange, but it seems to
combine many of the nice intuitive features of ad hoc recursive descent
compilers with the speed and compactness of table-driven parsers.
Building the S/SL translator is not difficult (especially compared to
writing an LALR parser generator), but I haven't done it yet and I'd
be interested to hear if somebody else has.

4.  Has anybody seen good books on actually implementing good
optimization?  I already have Wulf's book on BLISS -- I mean any
others.  I can find literature about optimization, but it is generally
at a very abstract level.  I would like to steal implementation-level
ideas if I can, instead of reinventing all the wheels in the world.
Kurt Guntheroth
John Fluke Mfg. Co., Inc.
{uw-beaver,decvax!microsof,ucbvax!lbl-csam,allegra,ssc-vax}!fluke!kurt

[I haven't seen any books, but there have been many articles on machine-
specific optimizations.  Look at the November 1980 issue of the IBM Journal
of R+D, or at the various compiler construction conference proceedings 
published by SIGPLAN.  -John]

compilers@ima.UUCP (01/14/86)

[from Chris Torek <harvard!mimsy.umd.edu!gymble!chris>]

Like yourself, I `thought the ideal optimizer would ... parse the
source into trees/dags/IL/whatever and reorder and simplify [these]
to the optimal equivalent program'.  If I understand correctly,
the problem with this is that the compiler's back end has to do
some really strange things for some of what the front end might
generate.  But if this is the case, then it seems to me the *real*
problem is a lack of communication from the back end to the front
end.  What is needed is a way that the code generator can tell the
optimizer, `x is not good', `y is very good', `z is only worthwhile
under the following conditions', etc.

Maybe a weighted production-system-style database that is interpreted
by the front end during optimization would be the way to do this.
Then you could say `if there is repeated division by a power of
two inside a loop, try to determine whether the dividend is positive,
even if you have to put an ``if'' around the outside and make two
loops', or `loops that count down to zero are twice as good as
those that count up to a constant', or whatever.  (The former lets
you directly use arithmetic shifts instead of integer divides on
two's complement machines---this is extremely important on many
microprocessors---and the latter allows use of the `sob' style VAX
instructions or the NS32000 count-to-zero instructions.)
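
Here is a sketch in C of that first rule in action; the helper
all_nonnegative() is hypothetical, standing in for whatever analysis
(or run-time test) establishes the sign of the dividends:

    /* If every dividend is known to be >= 0, a divide by 8 can
       become an arithmetic shift.  On two's-complement machines
       n >> 3 matches n / 8 only for n >= 0, because the division
       truncates toward zero while the shift rounds down. */
    #include <stdio.h>

    static int all_nonnegative(const int *a, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            if (a[i] < 0) return 0;
        return 1;
    }

    static void eighths(const int *a, int *b, int n)
    {
        int i;
        if (all_nonnegative(a, n)) {    /* the hoisted ``if'' ... */
            for (i = 0; i < n; i++)
                b[i] = a[i] >> 3;       /* ... buys a cheap shift */
        } else {
            for (i = 0; i < n; i++)
                b[i] = a[i] / 8;        /* general case: divide   */
        }
    }

    int main(void)
    {
        int a[4] = { 16, 24, 7, 40 }, b[4], i;
        eighths(a, b, 4);
        for (i = 0; i < 4; i++)
            printf("%d ", b[i]);
        putchar('\n');
        return 0;
    }

The count-down rule is the same sort of thing: rewrite
"for (i = 0; i < n; i++)" as "for (i = n; --i >= 0; )" when the body
doesn't care about the order, and the loop test falls out of the
decrement for free on machines with sob-style instructions.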

Of course with enough rules this might produce the world's slowest
compiler. . . .  :-)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

[My understanding of the PL.8 compiler from IBM is that first it compiles
into an extremely low-level intermediate code, then it does machine-
independent optimizations such as common subexpression elimination and
loop-invariant removal, and finally it does register assignment and
instruction selection.
The intermediate code has an infinite number of registers and very simple 
operations, so that real machine instructions usually subsume several 
intermediate instructions.  I don't know how they deal with giving hints to 
the intermediate code generator so that it can use fancy real machine 
instructions, but this was part of the 801 RISC project, which had a machine 
with very few "clever" instructions.  They get superb code, but the 
compiler is so slow that I gather nothing less than a 370/168 will do.  
-John] 
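
To picture the scheme John describes, here is a sketch (the IL
notation, the t1, t2, ... temporaries, and the 4-byte int size are
assumptions for illustration, not PL.8's actual notation):

    #include <stdio.h>

    int main(void)
    {
        int a[8] = {0}, i = 3;

        a[i] = a[i] + 1;
        /* The statement above might expand into simple IL ops, each
           writing a fresh temporary from an unlimited register pool:

               t1 := addr a        t4 := t1 + t3
               t2 := load i        t5 := load [t4]
               t3 := t2 * 4        t6 := t5 + 1
                                   store [t4] := t6

           Machine-independent passes (CSE, loop-invariant removal)
           run on this form; instruction selection later folds several
           IL ops into one real instruction -- e.g. a single VAX incl
           with an indexed operand can cover the load, add, and store. */

        printf("a[%d] = %d\n", i, a[i]);   /* prints a[3] = 1 */
        return 0;
    }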

compilers@ima.UUCP (01/16/86)

[from allegra!sbcs!debray (Saumya Debray)]

It isn't entirely surprising that transforming an HLL program P1 to an
equivalent "optimal" (!?) HLL program P2 and then generating code from
P2 might not always yield the best results.  The virtual machine
corresponding to the high-level language is really very different from
that corresponding to the low-level one, and one might expect their cost
criteria to be different as well.

I have a couple of comments against restricting optimization
_exclusively_ to low-level code, however:

(1) The idea behind code optimization is to preserve equivalence
    between programs at the HLL level while improving the estimated cost
    at the low level.  However, if optimization is restricted to object
    code, then we are necessarily required to preserve equivalence
    at the object code level.  This may be overly restrictive, since it
    may not take into account the fact that certain aspects of the
    object code programs are irrelevant at the HLL level.  (For
    example, an evaluation order the HLL leaves unspecified is frozen
    into the instruction stream once code is generated, and an object
    code optimizer has no way to know it is free to change it.)

(2) The level of abstraction provided by a HLL may make certain
    optimizations much simpler at that level than at the object code
    level.  For example, the transformation of a recursive program
    to compute the factorial function,
    
	fact(N) = if N = 0 then 1 else N * fact(N-1)

    to tail-recursive and thence to iterative form is fairly
    straightforward at the HLL level, using the associativity of *
    (see the sketch below).  It's not obvious to me how the
    corresponding transformation would go working only on object code.
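
A sketch in C of the three stages of that transformation (the
accumulator argument is the standard trick, not anything specific to
one compiler):

    #include <stdio.h>

    /* Direct transcription: the multiply happens after the recursive
       call returns, so this is not tail recursive. */
    static long fact(long n)
    {
        return n == 0 ? 1 : n * fact(n - 1);
    }

    /* Associativity of * lets the pending multiplications be folded
       into an accumulator, making the recursive call a tail call. */
    static long fact_acc(long n, long acc)
    {
        return n == 0 ? acc : fact_acc(n - 1, n * acc);
    }

    /* A tail call is just a jump, so the tail-recursive form becomes
       a loop -- the iterative version a compiler could emit. */
    static long fact_iter(long n)
    {
        long acc = 1;
        while (n != 0) {
            acc = n * acc;
            n = n - 1;
        }
        return acc;
    }

    int main(void)
    {
        /* All three print 720. */
        printf("%ld %ld %ld\n", fact(6), fact_acc(6, 1), fact_iter(6));
        return 0;
    }

The point for the discussion above is that the associativity fact
driving the first step lives at the source level; by the time the
program is object code, the multiply is just another instruction.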

-Saumya Debray
SUNY at Stony Brook