[comp.compilers] a practical UNCOL

johnson@p.cs.uiuc.edu (Ralph Johnson) (05/11/89)

The work of Davidson and Fraser points the way to a practical UNCOL.
We have been using a register transfer langauge as an intermediate
language in an optimizing compiler for Smalltalk.  Davidson and Fraser
designed a machine independent peephole optimizer that would translate
a machine language program into register transfers, optimize the
resulting program, and combine the transfers back into machine language.
They claimed that register transfers can be a useful UNCOL. Early
attempts at UNCOLs, they claimed, were essentially union languages
that tried to include all the features that any machine might have.
In contrast, register transfers contain only the intersection of
features that any machine might have.  They claimed that their
combining algorithm would reconstruct more complicated instructions.

Our use of register transfers as an intermediate language has been
very successful.  Our original system was quite similar to PO, which
Davidson wrote.  The main difference was that we generated register
transfers directly.  Also, we attempted more global optimizations
and better register assignment, though this was not significantly
better than what Davidson and Fraser did.  Our version of the
combining algorithm was much faster than theirs because we could
use a lot of tricks in Smalltalk.  However, we gradually changed
our code generator until it is now quite different from PO.  In
particular, our intermediate data structure is similar to Static
Single Assignment (SSA), invented by folks at IBM.  We think that
we now have a very elegant and reasonably efficient low-level
representation for programs that is easy to retarget to different
machines.

The code generator has only been targeted to two different machines,
so we don't yet have proof that it is really portable.  One of the
machines is the 68020 and the other is a local horizontally microcoded
machine.  However, our model stretches to fit both those machines,
which is a pretty good sign.  I don't think we will have any trouble
with RISC machines.

In summary, I think that a very simple-minded register transfer language
is a good candidate for an UNCOL, using a combining algorithm similar
to that of Davidson and Fraser for converting UNCOL programs into machine
language for each kind of machine.

Ralph Johnson
[Isn't this the same model that GCC uses, fairly successfully?  In any event,
I wonder how well it addresses machines like the 360 and to some extent the
PDP-10 where there are register asymmetries, e.g. the famous 360 BXLE
instruction which does "r1 += r2, if r1 <= r3 goto x", but r2 and r3 have to
be in an even-odd register pair.  I know that IBM's PL.8 uses a similar
scheme to generate 370 code, but believe that it has a lot of hints to help
put things in the right registers.  I know that such instructions are now
considered old-fashioned, but there are a lot of old-fashioned machines. -John]
certain registers in 
[From Ralph Johnson <johnson@p.cs.uiuc.edu>]
--
Send compilers articles to compilers@ima.isc.com or, perhaps, Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

pardo@june.cs.washington.edu (David Keppel) (05/13/89)

Ralph Johnson <johnson@p.cs.uiuc.edu> writes:
>[Fraser/Davidson RTL IR]
>[moderator: isn't that what gcc uses (successfully)?]

My compilers prof has pointed out to me that matching for
optimizations and instruction selection in an RTL (Fraser/Davidson,
IBM PL.801) is a special case of matching in a DAG (directed acyclic
graph) -structured ir (Graham/Glanville/Henry).

One important distinction is that all matching in an RTL is done at
the level of things that usually look like RISC machine instructions.
Thus the code quality is better when there is a good match between the
instructions in the machine-independent RTL and the instructions in
the physical hardware.

	;-D on  ( The code degenerator )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo
--
Send compilers articles to compilers@ima.isc.com or, perhaps, Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

compilers-sender@ima.ima.isc.com (05/18/89)

> The work of Davidson and Fraser points the way to a practical UNCOL.
> We have been using a register transfer langauge as an intermediate
> language in an optimizing compiler for Smalltalk.  Davidson and Fraser
> designed a machine independent peephole optimizer that would translate
> a machine language program into register transfers, optimize the
> resulting program, and combine the transfers back into machine language.

But an RTL really only addresses part of the UNCOL problem, namely one part
of the executable code.  There are some knotty problems in the data side--
think about the usual nasties like FORTRAN EQUIVALENCE and other problems
of defining the "memory model" assumed by the intermediate language.  Para-
meter passing has been another sticking point--although maybe machines are
more similar now than they were 10-15 years ago.  (Does anyone still use
copy/restore for FORTRAN?)

I'll grant that their work *does* suggest some useful progress for the part
of the UNCOL problem it does address.

The trouble with trying to figure out why various UNCOL attempts didn't
succeed is that it's usually not possible to point to any single area and
say "THAT's where the big problem is; that's why it didn't work."  It tends
to be a myriad of little problems that overwhelms a putative solution as
it tries to migrate from experimental to practical.  (That's also why I was
fussing earlier about OSF's two-machine/one-language requirement--it may
not be enough to find problems.)
---
Dick Dunn      UUCP: {ncar,nbires}!ico!rcd           (303)449-2870
[Last time I checked, IBM 370s use copy/restore because the lack of indirect
addressing makes direct reference slow. I suggest that one of the ANDF target
machines for the demo should be one with a wordsize not a power of two. -John]
--
Send compilers articles to compilers@ima.isc.com or, perhaps, Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

johnson@p.cs.uiuc.edu (Ralph Johnson) (05/18/89)

Question from editor:
[Isn't this the same model that GCC uses, fairly successfully?  In any event,
I wonder how well it addresses machines like the 360 and to some extent the
PDP-10 where there are register asymmetries, e.g. the famous 360 BXLE
instruction which does "r1 += r2, if r1 <= r3 goto x", but r2 and r3 have to
be in an even-odd register pair.  I know that IBM's PL.8 uses a similar
scheme to generate 370 code, but believe that it has a lot of hints to help
put things in the right registers.  I know that such instructions are now
considered old-fashioned, but there are a lot of old-fashioned machines. -John]

It is quite similar to the model that GCC uses.  GCC has also been influenced
by the work of Robert Henry at U. of Washington.

Either our compiler or PO will handle instructions that combine assignments
with changes of control flow.  I think that other table-driven code
generators, like Henry's, will also handle those kind of instructions.
The technique we use for register allocation will handle register asymmetries
with no problem.  I don't recall how PO did it.

In our compiler, converting register transfers into machine instructions
creates constraints on register allocation.  For example, when the register
transfer r3 = r1 + r2 gets converted to a 68020 ADD instruction, registers
r1 and r3 are constrained to be the same, because ADD is a two-address
instruction.  If this constraint cannot be met (i.e. if r1 is used later)
then the register allocator will have to introduce a new logical register.
We use a graph-coloring based register allocation algorithm.
In general, these kinds of algorithms are good at handling constraints.

Ralph Johnson
[From Ralph Johnson <johnson@p.cs.uiuc.edu>]
--
Send compilers articles to compilers@ima.isc.com or, perhaps, Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request