preston@rice.edu (Preston Briggs) (11/23/89)
[replying to Bob Scheulen] Thanks for the (interesting!) reply. I'm also posting this to comp.compilers, so there'll be some extra English and some (perhaps overly-simplistic) examples now and then. I hope no one is offended.

-----

One of the distinctions you make between Chow and Chaitin is that Chow allocates before code generation and Chaitin after. It's possible, however, to write a Chow-style allocator that runs after instruction selection, as reported by Larus and Hilfinger (see the reference section at the end). Chow allocates early as an aid to retargetability, recognizing the loss in object code efficiency. In any case, it seems clear that allocation after instruction selection is superior in terms of the resulting object code.

----

Regarding lifetimes, Chaitin uses the same definition you propose. In the following example

    if x then a := 5 else a := 6 endif
    if y then b := a else c := a endif

Chaitin would say that "a" is a single live range despite the multiple definitions and multiple uses. People at HP have used the term "web", which seems descriptive. In contrast, consider

    a := 5
    b := a + c
    ...
    a := 10
    d := a * q

Here, we have two distinct live ranges that may be given different colors. (A note for those who might misunderstand: both schemes attempt to allocate scalar variables and temporaries introduced by the optimizer. I just use variables in examples for clarity.)

Chaitin explains all this clearly :-) in his '82 paper, with the simple phrase "usedef chaining plus getting the right number of names". Referring instead to the '81 paper, we find a rather more complete description in Section 3, "The Concept of Interference". This is an important section for all potential colorers; it explains the difference between interference and liveness analysis. That is, just because two registers are live (in a data flow sense) at some point doesn't mean they interfere. An important example is the COPY instruction.

    rA := rB

does not imply that rA interferes with rB, even if rB is still live (referenced again later). This leads us to...

------

Subsumption, or coalescing. Subsumption has lots of effects. Imagine we're faced with the copy above. If rA and rB don't interfere, we can rename one live range to the other everywhere and delete the copy. Let's call the combination of the two live ranges rC. The degree of rC may be higher than deg(rA) or deg(rB), making it more difficult to color. Precisely,

    deg(rC) = deg(rA) + deg(rB) - card(neighbor(rA) intersect neighbor(rB))

That is, the neighbors that rA and rB have in common don't count against rC. Additionally, these common neighbors each have their degree lowered, since they have one less neighbor (rC instead of rA and rB). So, we'd really like to coalesce nodes (live ranges) that share many neighbors in the graph. Practically, I coalesce everywhere I can; however, some people won't coalesce if the resulting degree is too high. I feel this is pessimistic, since many nodes of high degree can be colored.

Additionally, when the COPY instruction is deleted, other interferences may be eliminated. For example,

    rA := something
    rB := rA            (1)
    rC := rB            (2)
    write(rA, rB, rC)
    stop

Obviously, rA, rB, and rC can be in the same register. However, rC and rA interfere with each other due to copy 2. After we coalesce rC and rB (which don't interfere), copy 2 is eliminated along with any interferences it induced. Now we can eliminate copy 1, achieving the desired result.
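To see where that interference comes from, here's a toy sketch (Python; the instruction encoding and the single-block backwards scan are my own simplification, since a real builder works from global liveness) of graph construction with the copy rule applied to the example above:

    # Instructions as (op, defs, uses); 'copy' means dst := src.
    block = [
        ('def',   ['rA'], []),              # rA := something
        ('copy',  ['rB'], ['rA']),          # copy 1: rB := rA
        ('copy',  ['rC'], ['rB']),          # copy 2: rC := rB
        ('write', [],     ['rA', 'rB', 'rC']),
    ]

    edges = set()                           # interference edges
    live = set()                            # backwards liveness; empty at block exit
    for op, defs, uses in reversed(block):
        for d in defs:
            for l in live:
                # A copy's target does NOT interfere with the copy's source,
                # even though the source may still be live afterwards.
                if l != d and not (op == 'copy' and l in uses):
                    edges.add(frozenset((d, l)))
        live -= set(defs)
        live |= set(uses)

    print(edges)    # just {rA, rC}: copy 2 defines rC while rA is still live

Coalescing rB and rC deletes copy 2; rebuilding the graph afterwards makes the rA-rC edge vanish, which is what frees us to eliminate copy 1 as well.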
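And to make the degree arithmetic concrete, a minimal sketch of coalescing itself, assuming a made-up interference graph (neither the graph nor the representation comes from the papers):

    # Interference graph: node -> set of neighbors (undirected).
    graph = {
        'rA': {'x', 'y', 'z'},
        'rB': {'y', 'z', 'w'},
        'x':  {'rA'},
        'y':  {'rA', 'rB'},
        'z':  {'rA', 'rB'},
        'w':  {'rB'},
    }

    def coalesce(graph, a, b, merged):
        # Merge live ranges a and b (which must not interfere) into a
        # single node named `merged`, as subsumption does when it
        # renames one live range to the other and deletes the copy.
        assert b not in graph[a], "can't coalesce interfering live ranges"
        neighbors = graph[a] | graph[b]
        graph[merged] = neighbors
        for n in neighbors:                 # rewire the rest of the graph
            graph[n] -= {a, b}
            graph[n].add(merged)
        del graph[a], graph[b]

    common = graph['rA'] & graph['rB']      # shared neighbors
    dA, dB = len(graph['rA']), len(graph['rB'])
    coalesce(graph, 'rA', 'rB', 'rC')

    # deg(rC) = deg(rA) + deg(rB) - card(neighbor(rA) intersect neighbor(rB))
    assert len(graph['rC']) == dA + dB - len(common)
    # ...and each common neighbor now has one less neighbor than before.

The more neighbors the two live ranges share, the cheaper the coalesce, which is the point of the formula.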
(Remember that all these trivial examples extend across the entire procedure, with the interference graph reflecting precisely the needed information.)

Finally, subsumption is useful for handling register constraints. The usual example is a two-address ADD instruction that requires that the destination register be the same as a source register. During coalescing, we simply attempt to coalesce the source and destination. If they interfere, then a copy instruction must be introduced later (which is why we all like 3-address instructions). A more complex example might be a MULtiply instruction that requires its operands be in two particular registers and writes its result into a particular register pair. Here we'd need to introduce copies (say, during instruction selection) from the operand temporaries to the required machine registers. Additionally, we'd introduce a copy from the least significant result register to the result temporary. Then, while building the interference graph, we'd note that the multiply instruction causes all live temporaries to interfere with the various machine registers. Finally, while coalescing, we'd attempt to get rid of all the copies. This sounds like lots of work, but it can all be table driven and falls out as a natural bonus of doing coalescing. The effect is that the relevant temporaries tend to end up "pre-colored" to the required color. It's easy to imagine worse cases, but most can be handled naturally.

You also allude to the fact that some registers may be more expensive for certain instructions than others (at least, I think that's what you meant). This is more difficult to handle. My tendency would be to punt and blame the architect. I'm happy to get *any* coloring. Generally, I'd think that minimizing spills would be the dominant priority. But, supposing I had to work on such a machine, I'd take the approach you suggest and order the nodes as they are removed from the graph, so that more important live ranges tend to be colored first. When simplifying the graph, Chaitin looks for any node of low degree (less than k, the number of registers). We could simply be pickier and say, "from the nodes of low degree, choose the least important". Note that this will blow your linear time bound for coloring. You could keep O(n log n) by maintaining the nodes of low degree in a heap. Additionally, it conflicts with the heuristic suggested by Bernstein et al. of choosing the node with the greatest degree less than k.

-------

Where do all the copies come from? In our compiler, there *are* lots of silly copies. We don't worry about them earlier because we know the register allocator will get 'em. It's particularly nice because it can do a thorough job, globally and safely (without the need for lots of special case code). (Note also that local analysis, say at code generation time or peephole optimization time, cannot get rid of *all* the useless copies. You need global analysis.) Other copies come from handling certain machine constraints (as above). The most important source, however, comes from handling the procedure calling convention. On many machines, the calling conventions dictate that some number of the parameters will be passed in certain machine registers. We handle this by computing all the arguments in temporaries and then inserting copies from each temporary to the appropriate machine register. Later, subsumption will tend to force each of the arguments to be computed in the correct register.
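As a concrete picture of that, here's a minimal sketch (Python; the tuple IR and the four-register argument convention are invented for illustration and don't correspond to any particular machine's ABI):

    # Hypothetical convention: the first four integer arguments go in a0..a3.
    ARG_REGS = ['a0', 'a1', 'a2', 'a3']

    def lower_call(callee, arg_temps, code):
        # Copy each argument temporary into the register the convention
        # demands.  The copies look wasteful, but coalescing will usually
        # merge each temporary with its argument register and delete them.
        assert len(arg_temps) <= len(ARG_REGS)   # ignore stack arguments here
        for temp, reg in zip(arg_temps, ARG_REGS):
            code.append(('copy', reg, temp))     # reg := temp
        code.append(('call', callee, ARG_REGS[:len(arg_temps)]))

    code = []
    code.append(('add',  't1', 'x', 'y'))        # t1 := x + y
    code.append(('load', 't2', 'p'))             # t2 := mem[p]
    lower_call('f', ['t1', 't2'], code)
    # After coalescing, t1 and t2 tend to be computed directly in a0 and a1.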
This copy-then-coalesce scheme is especially simple in the presence of lots of optimization, which tends to make other methods difficult.

------------

Looking again at Chow, you're right that he won't spill given zillions of registers. I had mis-remembered another feature of his algorithm. I had also forgotten about partitioning the registers into local and global. I agree that it looks wrong compared to Chaitin. Note that Larus and Hilfinger's version eliminates the local allocation phase in favor of splitting the basic blocks into (quite small) pieces. This also improves the quality of the interference graph. Note, however, that they still find it useful to reserve 2 registers to aid in inserting the required loads and stores. A friend examining the assembly from the MIPS (think Chow) Fortran compiler reports that they also seem to split basic blocks, though into relatively large chunks.

----------

Finishing up, ...

Looking briefly at the code for GCC, they do color, though they don't seem to do any live range splitting or coalescing. Additionally, I'm not sure how detailed the interference graph is (laziness on my part, as the source is right there).

I also do code selection using ideas from Davidson and Fraser (in particular, the HOP paper). Of course, for RISC machines it's exceptionally easy.

Spilling often gives sub-optimal code. The easy example is a very long live range, spanning many basic blocks. If we have to spill it because one of the basic blocks is especially crowded, then Chaitin will spill it everywhere, even in blocks where there's very little register pressure. Chow would split the live range into several pieces, isolating the "high pressure block". The isolated part is later spilled and the rest of the live range can be colored.

I think you want enough registers and regularity that you can select instructions with impunity. That is, select first, color later. Machines where this seems a bad idea are machines that we should avoid. (No smiley) Optimal instruction selection is NP-Complete. Optimal register allocation is NP-Complete. If you stir them together, it doesn't get easier. An adequate register set provides the buffer needed to "separate our concerns."

I've never read McKusick's thesis (though people here have said good things about it). I've looked at Cattell some (TN-Binding). It might be cool for constrained machines. I've gotten the impression in (very limited) conversations that coloring had pretty much replaced TN-Binding as the method of choice, even at Tartan.

-------------

Since many people have asked, ...

There are several papers for Chaitin's method. None of them is really a summary, though, so you want to get them all.

    Register Allocation via Coloring
    Chaitin, Auslander, Chandra, Cocke, Hopkins, Markstein
    Computer Languages 6 (1981), pp. 47-57

    Register Allocation and Spilling via Coloring
    Chaitin
    Proceedings of the Sigplan '82 Symposium on Compiler Construction, pp. 98-105

    Spill Code Minimization Techniques for Optimizing Compilers
    Bernstein, Golumbic, Mansour, Pinter, Goldin, Krawczyk, Nahshon
    Proceedings of the Sigplan '89 Conference on Programming Language Design and Implementation, pp. 258-263

    Coloring Heuristics for Register Allocation
    Briggs, Cooper, Kennedy, Torczon
    Proceedings of the Sigplan '89 Conference on Programming Language Design and Implementation, pp. 275-284

Chow's method is described in two papers:

    Register Allocation by Priority-based Coloring
    Chow and Hennessy
    Proceedings of the Sigplan '84 Symposium on Compiler Construction, pp. 222-231

    Register Allocation in the SPUR Lisp Compiler
    Larus and Hilfinger
    Proceedings of the Sigplan '86 Symposium on Compiler Construction, pp. 255-263

You could also get Fred Chow's thesis from Stanford.

-------

Reading over your mail, I'm slightly concerned that I may have misunderstood what you meant by "register constraints." If so, give me a hint and I'll try again.

Preston Briggs
preston@titan.rice.edu
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue.  Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.