preston@rice.edu (Preston Briggs) (11/23/89)
[replying to Bob Scheulen] Thanks for the (interesting!) reply. I'm also posting this to comp.compilers, so there'll be some extra English and some (perhaps overly-simplistic) examples now and then. I hope no one is offended.

-----

One of the distinctions you make between Chow and Chaitin is that Chow allocates before code generation and Chaitin after. It's possible, however, to write a Chow-style allocator that runs after instruction selection, as reported by Larus and Hilfinger (see the reference section at the end). Chow allocates early as an aid to retargetability, recognizing the loss in object code efficiency. In any case, it seems clear that allocation after instruction selection is superior in terms of the resulting object code.

----

Regarding lifetimes, Chaitin uses the same definition you propose. In the following example

    if x then a := 5 else a := 6 endif
    if y then b := a else c := a endif

Chaitin would say that "a" is a single live range despite the multiple definitions and multiple uses. People at HP have used the term "web", which seems descriptive. In contrast, consider

    a := 5
    b := a + c
    ...
    a := 10
    d := a * q

Here, we have two distinct live ranges that may be given different colors. (A note for those who might misunderstand: both schemes attempt to allocate scalar variables and temporaries introduced by the optimizer. I just use variables in examples for clarity.)

Chaitin explains all this clearly :-) in his '82 paper, with the simple phrase "usedef chaining plus getting the right number of names". Referring instead to the '81 paper, we find a rather more complete description in Section 3, "The Concept of Interference". This is an important section for all potential colorers; it explains the difference between interference and liveness analysis. That is, just because two registers are live (in a data flow sense) at some point doesn't mean they interfere. An important example is the COPY instruction.

    rA := rB

does not imply that rA interferes with rB, even if rB is still live (referenced again later). This leads us to...

------

Subsumption, or coalescing. Subsumption has lots of effects. Imagine we're faced with the copy above. If rA and rB don't interfere, we can rename one live range to the other everywhere and delete the copy. Let's call the combination of the two live ranges rC. The degree of rC may be higher than deg(rA) or deg(rB), making it more difficult to color. Precisely,

    deg(rC) = deg(rA) + deg(rB) - card(neighbor(rA) intersect neighbor(rB))

That is, the neighbors that rA and rB have in common don't count against rC. Additionally, these common neighbors each have their degree lowered, since they have one less neighbor (rC instead of rA and rB). So, we'd really like to coalesce nodes (live ranges) that share many neighbors in the graph. Practically, I coalesce everywhere I can; however, some people won't coalesce if the resulting degree is too high. I feel this is pessimistic, since many nodes of high degree can be colored.

Additionally, when the COPY instruction is deleted, other interferences may be eliminated. For example,

    rA := something
    rB := rA            (1)
    rC := rB            (2)
    write(rA, rB, rC)
    stop

Obviously, rA, rB, and rC can be in the same register. However, rC and rA interfere with each other due to copy 2. After we coalesce rC and rB (which don't interfere), copy 2 is eliminated along with any interferences it induced. Now we can eliminate copy 1, achieving the desired result.
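To see where that interference comes from, here's a toy sketch (Python; the instruction encoding and the single-block backwards scan are my own simplification, since a real builder works from global liveness) of graph construction with the copy rule applied to the example above:

    # Instructions as (op, defs, uses); 'copy' means dst := src.
    block = [
        ('def',   ['rA'], []),              # rA := something
        ('copy',  ['rB'], ['rA']),          # copy 1: rB := rA
        ('copy',  ['rC'], ['rB']),          # copy 2: rC := rB
        ('write', [],     ['rA', 'rB', 'rC']),
    ]

    edges = set()                           # interference edges
    live = set()                            # backwards liveness; empty at block exit
    for op, defs, uses in reversed(block):
        for d in defs:
            for l in live:
                # A copy's target does NOT interfere with the copy's source,
                # even though the source may still be live afterwards.
                if l != d and not (op == 'copy' and l in uses):
                    edges.add(frozenset((d, l)))
        live -= set(defs)
        live |= set(uses)

    print(edges)    # just {rA, rC}: copy 2 defines rC while rA is still live

Coalescing rB and rC deletes copy 2; rebuilding the graph afterwards makes the rA-rC edge vanish, which is what frees us to eliminate copy 1 as well.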
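And to make the degree arithmetic concrete, a minimal sketch of coalescing itself, assuming a made-up interference graph (neither the graph nor the representation comes from the papers):

    # Interference graph: node -> set of neighbors (undirected).
    graph = {
        'rA': {'x', 'y', 'z'},
        'rB': {'y', 'z', 'w'},
        'x':  {'rA'},
        'y':  {'rA', 'rB'},
        'z':  {'rA', 'rB'},
        'w':  {'rB'},
    }

    def coalesce(graph, a, b, merged):
        # Merge live ranges a and b (which must not interfere) into a
        # single node named `merged`, as subsumption does when it
        # renames one live range to the other and deletes the copy.
        assert b not in graph[a], "can't coalesce interfering live ranges"
        neighbors = graph[a] | graph[b]
        graph[merged] = neighbors
        for n in neighbors:                 # rewire the rest of the graph
            graph[n] -= {a, b}
            graph[n].add(merged)
        del graph[a], graph[b]

    common = graph['rA'] & graph['rB']      # shared neighbors
    dA, dB = len(graph['rA']), len(graph['rB'])
    coalesce(graph, 'rA', 'rB', 'rC')

    # deg(rC) = deg(rA) + deg(rB) - card(neighbor(rA) intersect neighbor(rB))
    assert len(graph['rC']) == dA + dB - len(common)
    # ...and each common neighbor now has one less neighbor than before.

The more neighbors the two live ranges share, the cheaper the coalesce, which is the point of the formula.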
(Remember that all these trivial examples extend across the entire procedure, with the interference graph reflecting precisely the needed information.)

Finally, subsumption is useful for handling register constraints. The usual example is a two-address ADD instruction that requires that the destination register be the same as a source register. During coalescing, we simply attempt to coalesce the source and destination. If they interfere, then a copy instruction must be introduced later (which is why we all like 3-address instructions). A more complex example might be a MULtiply instruction that requires its operands be in two particular registers and writes its result into a particular register pair. Here we'd need to introduce copies (say, during instruction selection) from the operand temporaries to the required machine registers. Additionally, we'd introduce a copy from the least significant result register to the result temporary. Then, while building the interference graph, we'd note that the multiply instruction causes all live temporaries to interfere with the various machine registers. Finally, while coalescing, we'd attempt to get rid of all the copies. This sounds like lots of work, but it can all be table driven and falls out as a natural bonus of doing coalescing. The effect is that the relevant temporaries tend to end up "pre-colored" to the required color. It's easy to imagine worse cases, but most can be handled naturally.

You also allude to the fact that some registers may be more expensive for certain instructions than others (at least, I think that's what you meant). This is more difficult to handle. My tendency would be to punt and blame the architect. I'm happy to get *any* coloring. Generally, I'd think that minimizing spills would be the dominant priority. But, supposing I had to work on such a machine, I'd take the approach you suggest and order the nodes as they are removed from the graph, so that more important live ranges tend to be colored first. When simplifying the graph, Chaitin looks for any node of low degree (less than k, the number of registers). We could simply be pickier and say, "from the nodes of low degree, choose the least important". Note that this will blow your linear time bound for coloring. You could keep O(n log n) by maintaining the nodes of low degree in a heap. Additionally, it conflicts with the heuristic suggested by Bernstein et al. of choosing the node with the greatest degree less than k.

-------

Where do all the copies come from? In our compiler, there *are* lots of silly copies. We don't worry about them earlier because we know the register allocator will get 'em. It's particularly nice because it can do a thorough job, globally and safely (without the need for lots of special case code). (Note also that local analysis, say at code generation time or peephole optimization time, cannot get rid of *all* the useless copies. You need global analysis.) Other copies come from handling certain machine constraints (as above). The most important source, however, comes from handling the procedure calling convention. On many machines, the calling conventions dictate that some number of the parameters will be passed in certain machine registers. We handle this by computing all the arguments in temporaries and then inserting copies from each temporary to the appropriate machine register. Later, subsumption will tend to force each of the arguments to be computed in the correct register.
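As a concrete picture of that, here's a minimal sketch (Python; the tuple IR and the four-register argument convention are invented for illustration and don't correspond to any particular machine's ABI):

    # Hypothetical convention: the first four integer arguments go in a0..a3.
    ARG_REGS = ['a0', 'a1', 'a2', 'a3']

    def lower_call(callee, arg_temps, code):
        # Copy each argument temporary into the register the convention
        # demands.  The copies look wasteful, but coalescing will usually
        # merge each temporary with its argument register and delete them.
        assert len(arg_temps) <= len(ARG_REGS)   # ignore stack arguments here
        for temp, reg in zip(arg_temps, ARG_REGS):
            code.append(('copy', reg, temp))     # reg := temp
        code.append(('call', callee, ARG_REGS[:len(arg_temps)]))

    code = []
    code.append(('add',  't1', 'x', 'y'))        # t1 := x + y
    code.append(('load', 't2', 'p'))             # t2 := mem[p]
    lower_call('f', ['t1', 't2'], code)
    # After coalescing, t1 and t2 tend to be computed directly in a0 and a1.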
This copy-then-coalesce scheme is especially simple in the presence of lots of optimization, which tends to make other methods difficult.

------------

Looking again at Chow, you're right that he won't spill given zillions of registers. I had mis-remembered another feature of his algorithm. I had also forgotten about partitioning the registers into local and global. I agree that it looks wrong compared to Chaitin. Note that Larus and Hilfinger's version eliminates the local allocation phase in favor of splitting the basic blocks into (quite small) pieces. This also improves the quality of the interference graph. Note, however, that they still find it useful to reserve 2 registers to aid in inserting the required loads and stores. A friend examining the assembly from the MIPS (think Chow) Fortran compiler reports that they also seem to split basic blocks, though into relatively large chunks.

----------

Finishing up, ...

Looking briefly at the code for GCC, they do color, though they don't seem to do any live range splitting or coalescing. Additionally, I'm not sure how detailed the interference graph is (laziness on my part, as the source is right there).

I also do code selection using ideas from Davidson and Fraser (in particular, the HOP paper). Of course, for RISC machines it's exceptionally easy.

Spilling often gives sub-optimal code. The easy example is a very long live range, spanning many basic blocks. If we have to spill it because one of the basic blocks is especially crowded, then Chaitin will spill it everywhere, even in blocks where there's very little register pressure. Chow would split the live range into several pieces, isolating the "high pressure block". The isolated part is later spilled and the rest of the live range can be colored.

I think you want enough registers and regularity that you can select instructions with impunity. That is, select first, color later. Machines where this seems a bad idea are machines that we should avoid. (No smiley) Optimal instruction selection is NP-Complete. Optimal register allocation is NP-Complete. If you stir them together, it doesn't get easier. An adequate register set provides the buffer needed to "separate our concerns."

I've never read McKusick's thesis (though people here have said good things about it). I've looked at Cattell some (TN-Binding). It might be cool for constrained machines. I've gotten the impression in (very limited) conversations that coloring had pretty much replaced TN-Binding as the method of choice, even at Tartan.

-------------

Since many people have asked, ...

There are several papers for Chaitin's method. None of them is really a summary, though, so you want to get them all.

    Register Allocation via Coloring
    Chaitin, Auslander, Chandra, Cocke, Hopkins, Markstein
    Computer Languages 6 (1981), pp. 47-57

    Register Allocation and Spilling via Coloring
    Chaitin
    Proceedings of the Sigplan '82 Symposium on Compiler Construction, pp. 98-105

    Spill Code Minimization Techniques for Optimizing Compilers
    Bernstein, Golumbic, Mansour, Pinter, Goldin, Krawczyk, Nahshon
    Proceedings of the Sigplan '89 Conference on Programming Language Design and Implementation, pp. 258-263

    Coloring Heuristics for Register Allocation
    Briggs, Cooper, Kennedy, Torczon
    Proceedings of the Sigplan '89 Conference on Programming Language Design and Implementation, pp. 275-284

Chow's method is described in two papers:

    Register Allocation by Priority-based Coloring
    Chow and Hennessy
    Proceedings of the Sigplan '84 Symposium on Compiler Construction, pp. 222-231

    Register Allocation in the SPUR Lisp Compiler
    Larus and Hilfinger
    Proceedings of the Sigplan '86 Symposium on Compiler Construction, pp. 255-263

You could also get Fred Chow's thesis from Stanford.

-------

Reading over your mail, I'm slightly concerned that I may have misunderstood what you meant by "register constraints." If so, give me a hint and I'll try again.

Preston Briggs
preston@titan.rice.edu
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue.  Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.