jsp@milton.u.washington.edu (Jeff Prothero) (10/23/90)
Question: Has anyone looked at coloring heuristics designed to minimize the number of copies needed by the resulting code? Background: The register allocation selected can affect the number of MOVE instructions generated, since by assigning two temps with consecutive lifetimes to the same register, one can avoid the need to do an explicit copy between them. I noticed this when I tried using Static Single Assignment analysis to produce a finer-grain register interference graph before coloring. That is, I break existing variables into smaller pieces connected by copies, so as to give the register colorer the option of not always keeping a given variable in registers, or of keeping it in different registers at different times. (If the colorer elects not to use this freedom, the copies will be optimized away later.) In general, increasing the number of degrees of freedom before searching for a solution should be a Good Thing, but in this case the extra degrees of freedom are bought at the cost of potentially introducing extra copy instructions. Unfortunately, existing coloring heuristics seem to be designed only to pack the registers as full as possible, ignoring the cost of the copies introduced by a given coloring. (While this effect is particularly obvious in the fine-grain interference graphs I have, it should always be a potential problem.) Further, existing coloring heuristics don't seem easily adaptable to the problem -- they propagate information roughly perpendicular to the direction needed to minimize copies. (I think one needs to do something like propagate preferred registers outward from values pinned in known registers, such as parameters passed or returned in registers. This propagates information between dense clusters in the interference graph, where typical heuristics handle these clusters independently.) Has anyone looked at this issue? Is it wrong-headed to do so? Should one have two separate sets of coloring heuristics, one for use when the registers are very full, and one for use when registers are going begging? Are there practical heuristics which can be tuned to handle both situations? (I presume simulated annealing, for example, could be so tuned, but would be too slow to be practical?) jsp@milton.u.washington.edu -- Send compilers articles to compilers@esegue.segue.boston.ma.us {ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.
markhall@pyrps5.pyramid.com (Mark Hall) (10/26/90)
In article <9010222019.AA27142@milton.u.washington.edu> Jeff Prothero <jsp@milton.u.washington.edu> writes: >Question: Has anyone looked at coloring heuristics designed to >minimize the number of copies needed by the resulting code? Yes. In fact, the original "Chaitin et al" paper describes an algorithm to coalesce nodes of an interference graph when there's a register copy operation between two registers. When we implemented our coloring algorithm for the Pyramid MIServer about 2 years ago, we did some further research into the problem. >Background: The register allocation selected can affect the number of MOVE >instructions generated, since by assigning two temps with consecutive >lifetimes to the same register, one can avoid the need to do an explicit copy >between them. The lifetimes can actually be overlapping. More on that later. >In general, increasing the number of degrees of freedom before searching for >a solution should be a Good Thing, but in this case the extra degrees of >freedom are bought at the cost of potentially introducing extra copy >instructions. Fortunately for us at Pyramid, the MIServer architecture has upwards of 48 registers available per stack frame. Our first cut at a coloring algorithm didn't consider copy elimination. However, the first time we compiled the source distribution with the coloring compiler and looked, there were *way* too many unused registers lying around. We figured we better do something else with them quickly before the hardware guys saw that we weren't using them anymore and tried to take them back :-). That's when we looked into the copy elimination problem. >Unfortunately, existing coloring heuristics seem to be >designed only to pack the registers as full as possible, ignoring the cost of >the copies introduced by a given coloring. The technique we chose interacted well with our coloring technique (which is priority-based). We use a priority-based copy elimination algorithm (yawn...). When you coalesce two nodes of the IG, you are doing it based on a particular copy operation that appears somewhere in the instruction stream you are analyzing. Well, if that copy instruction appears in a loop, you certainly want to get rid of it before you get rid of one that isn't in a loop. So you assign priorities to the copy operations, and then you coalesce nodes in the IG based on those priorities. The technique worked well for us. I think this was mostly because the MIServer is a 2-address architecture, which requires more register copies than a 3-address machine does. For example, the naive code generated for the C line: i = j + k; is: movw Rj,Rt #move register j into a temp reg addw Rk,Rt #add register k to the temp reg movw Rt,Ri #move the temp reg back to register i. Now with some effort, any decent code generator can produce: movw Rj,Ri #move register j into a Register i addw Rk,Ri #add register k to the temp reg But if "i" can share a register with either of k or j, it takes a copy-elimination register allocator to get rid of both of the register copies, yeilding: addw Rk,Rij or addw Rj,Rik To do this well, you have to play some games when building the interference graph. Specifically, if you are doing lifetime analysis on the instruction: i = k; And "i" remains alive after the instruction (ie, the lifetimes overlap), you might still be able to assign the same register to both i and k, so you don't want to put the edge (i,k) into your interference graph (IG) based on this instruction. Of course, the edge might eventually appear in the IG due to some *other* instruction, and that's OK, it just means you can't coalesce the nodes. >Has anyone looked at this issue? Is it wrong-headed to do so? Should one >have two separate sets of coloring heuristics, one for use when the registers >are very full, and one for use when registers are going begging? For us, since we have so many registers, it was appropriate to coalesce nodes without worrying about how it affected coloring. However, this may be true even if you have only a few registers. I say this because coalescing nodes of an IG doesn't affect the "colorability" of the graph in a linear way. That is, coalescing two nodes of the IG doesn't necessarily cause the chromatic number to increase by one. For the average program, you might be able to coalese 10 nodes before increasing the chromatic number. Moreover, if the copy instruction is in a loop, you want to get rid of it even if you raise the chromatic number beyond the number of available registers! (so long as this doesn't cause you to spill any registers of higher priority than the register copy instruction). I maintain that there's a good thesis in somehow quantifying this relationship. >Are there practical heuristics which can be tuned to handle both situations? >(I presume simulated annealing, for example, could be so tuned, but would be >too slow to be practical?) Now that I think of it, one simple thing to do is to check the combined number of incident edges which would result from coalescing two nodes. If this number is less than the number of available registers, then you can coalesce without worry. Watch out for a sinister sub-problem here. Even if you have an infinite number of registers, it's still an Extremely Hard Problem (I'll bet it's NP-complete) to coalesce IG nodes such that you minimize the "weights" of the remaining copy operations. You are constrained by the interference edges of the IG, and the problem gets really ugly. We looked at heuristic solutions using Min-cut/Max-flow algorithms (which are much better than annealing for this problem because you can retain an acceptable compile time using them) but none of these seemed to be worth the "extra effort" (ie, Pyramid doesn't pay me for pure research). -Mark Hall (smart mailer): markhall@pyrps5.pyramid.com (uucp paths): {ames|decwrl|sun|seismo}!pyramid!markhall -- Send compilers articles to compilers@esegue.segue.boston.ma.us {ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.