[comp.compilers] Reg. Alloc. - Graph Coloring - Copy Minimization

jsp@milton.u.washington.edu (Jeff Prothero) (10/23/90)

Question: Has anyone looked at coloring heuristics designed to
minimize the number of copies needed by the resulting code?

Background: The register allocation selected can affect the number of MOVE
instructions generated, since by assigning two temps with consecutive
lifetimes to the same register, one can avoid the need to do an explicit copy
between them.  I noticed this when I tried using Static Single Assignment
analysis to produce a finer-grain register interference graph before
coloring.  That is, I break existing variables into smaller pieces connected
by copies, so as to give the register colorer the option of not always
keeping a given variable in registers, or of keeping it in different
registers at different times.  (If the colorer elects not to use this
freedom, the copies will be optimized away later.)

In general, increasing the number of degrees of freedom before searching for
a solution should be a Good Thing, but in this case the extra degrees of
freedom are bought at the cost of potentially introducing extra copy
instructions.  Unfortunately, existing coloring heuristics seem to be
designed only to pack the registers as full as possible, ignoring the cost of
the copies introduced by a given coloring.  (While this effect is
particularly obvious in the fine-grain interference graphs I have, it should
always be a potential problem.)  Further, existing coloring heuristics don't
seem easily adaptable to the problem -- they propagate information roughly
perpendicular to the direction needed to minimize copies.  (I think one needs
to do something like propagate preferred registers outward from values pinned
in known registers, such as parameters passed or returned in registers.  This
propagates information between dense clusters in the interference graph,
where typical heuristics handle these clusters independently.)

Has anyone looked at this issue?  Is it wrong-headed to do so?  Should one
have two separate sets of coloring heuristics, one for use when the registers
are very full, and one for use when registers are going begging?  Are there
practical heuristics which can be tuned to handle both situations?  (I
presume simulated annealing, for example, could be so tuned, but would be too
slow to be practical?)

 jsp@milton.u.washington.edu
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

markhall@pyrps5.pyramid.com (Mark Hall) (10/26/90)

In article <9010222019.AA27142@milton.u.washington.edu> Jeff Prothero <jsp@milton.u.washington.edu> writes:
>Question: Has anyone looked at coloring heuristics designed to
>minimize the number of copies needed by the resulting code?

Yes.  In fact, the original "Chaitin et al" paper describes an algorithm to
coalesce nodes of an interference graph when there's a register copy
operation between two registers.  When we implemented our coloring 
algorithm for the Pyramid MIServer about 2 years ago, we did some
further research into the problem.

>Background: The register allocation selected can affect the number of MOVE
>instructions generated, since by assigning two temps with consecutive
>lifetimes to the same register, one can avoid the need to do an explicit copy
>between them.  

The lifetimes can actually be overlapping.  More on that later.

>In general, increasing the number of degrees of freedom before searching for
>a solution should be a Good Thing, but in this case the extra degrees of
>freedom are bought at the cost of potentially introducing extra copy
>instructions.  

Fortunately for us at Pyramid, the MIServer architecture has upwards of
48 registers available per stack frame.  Our first cut at a coloring
algorithm didn't consider copy elimination.  However, the first time we
compiled the source distribution with the coloring compiler and looked,
there were *way* too many unused registers lying around.  We figured we
better do something else with them quickly before the hardware guys saw
that we weren't using them anymore and tried to take them back :-).  That's
when we looked into the copy elimination problem.

>Unfortunately, existing coloring heuristics seem to be
>designed only to pack the registers as full as possible, ignoring the cost of
>the copies introduced by a given coloring.

The technique we chose interacted well with our coloring technique
(which is priority-based).  We use a priority-based copy elimination
algorithm (yawn...).  When you coalesce two nodes of the IG, you are
doing it based on a particular copy operation that appears somewhere in
the instruction stream you are analyzing.  Well, if that copy
instruction appears in a loop, you certainly want to get rid of it
before you get rid of one that isn't in a loop.  So you assign
priorities to the copy operations, and then you coalesce nodes in the
IG based on those priorities.

The technique worked well for us.  I think this was mostly because
the MIServer is a 2-address architecture, which requires more register
copies than a 3-address machine does.  For example, the naive code generated 
for the C line:

	i = j + k;
is:
	movw	Rj,Rt	#move register j into a temp reg
	addw	Rk,Rt	#add register k to the temp reg
	movw	Rt,Ri	#move the temp reg back to register i.

Now with some effort, any decent code generator can produce:

	movw	Rj,Ri	#move register j into a Register i
	addw	Rk,Ri	#add register k to the temp reg

But if "i" can share a register with either of k or j,
it takes a copy-elimination register allocator to get rid of both of
the register copies, yeilding:

	addw	Rk,Rij
or
	addw	Rj,Rik

To do this well, you have to play some games when building the 
interference graph.  Specifically, if you are doing lifetime analysis
on the instruction:

	i = k;

And "i" remains alive after the instruction (ie, the lifetimes overlap), you
might still be able to assign the same register to both i and k, so
you don't want to put the edge (i,k) into your interference graph (IG)
based on this instruction.  Of course, the edge might eventually appear in the
IG due to some *other* instruction, and that's OK, it just means you can't
coalesce the nodes.

>Has anyone looked at this issue?  Is it wrong-headed to do so?  Should one
>have two separate sets of coloring heuristics, one for use when the registers
>are very full, and one for use when registers are going begging?  

For us, since we have so many registers, it was appropriate to coalesce nodes
without worrying about how it affected coloring.  However, this may be true
even if you have only a few registers.  I say this because coalescing nodes
of an IG doesn't affect the "colorability" of the graph in a linear way.
That is, coalescing two nodes of the IG doesn't necessarily cause the
chromatic number to increase by one.  For the average program, you might be
able to coalese 10 nodes before increasing the chromatic number.  Moreover,
if the copy instruction is in a loop, you want to get rid of it even if you
raise the chromatic number beyond the number of available registers! (so long
as this doesn't cause you to spill any registers of higher priority than the
register copy instruction).  I maintain that there's a good thesis in somehow
quantifying this relationship.

>Are there practical heuristics which can be tuned to handle both situations?
>(I presume simulated annealing, for example, could be so tuned, but would be
>too slow to be practical?)

Now that I think of it, one simple thing to do is to check the combined 
number of incident edges which would result from coalescing two nodes.  If
this number is less than the number of available registers, then you can
coalesce without worry.

Watch out for a sinister sub-problem here.  Even if you have an
infinite number of registers, it's still an Extremely Hard Problem
(I'll bet it's NP-complete) to coalesce IG nodes such that you minimize
the "weights" of the remaining copy operations.  You are constrained by
the interference edges of the IG, and the problem gets really ugly.  We
looked at heuristic solutions using Min-cut/Max-flow algorithms (which 
are much better than annealing for this problem because you can retain an
acceptable compile time using them) but none of these seemed to be worth
the "extra effort" (ie, Pyramid doesn't pay me for pure research).

-Mark Hall (smart mailer): markhall@pyrps5.pyramid.com
	   (uucp paths): {ames|decwrl|sun|seismo}!pyramid!markhall
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.