dgb@cs.washington.edu (David Bradlee) (11/23/89)
This is a response to Preston Briggs' article from a couple of days ago. It's from a friend with whom I've worked at Microsoft.

***************************************************
From microsoft!bobs@beaver Tue Nov 21 15:34:39 1989
Reply-to: bobs@bobs (Bob Scheulen)

Hi - A friend of mine sent me a copy of one of your news postings because I've been looking at both Chaitin & Chow. I don't read news, so apologies if I'm out of date on the discussion.

Background: We (MS) have implemented a scheme like Chow's for an 8086/286 C compiler. There are a few major differences between us and Chow: first, instead of assuming the live range is the whole procedure, we split a variable into its true lifetimes (see the note below about what I mean by that). Secondly, we don't split ranges at all; we just give up (except we have a special hack for loops...).

Definition of a lifetime: the set of all definitions of and references to a variable which MUST be the same variable (and so must get the same register). This corresponds to the definition of liveness used in global optimization (in fact we discover the lifetime by tracing through all contiguous blocks where the variable is live until no more can be added -- yes, I know this is inefficient). Under this definition of lifetime, a variable could be split into its separate lifetimes in the user source (e.g. a user variable "i" could be changed to "i0", "i1", etc.). I point this out because it does not agree with either Chaitin or Chow. I admit I don't understand the implications of (or reason for) Chaitin's definition. I am not sure I even understand what Chaitin means, but he apparently limits a live range to those uses which only one definition reaches. I assume this means that uses which can be reached by multiple definitions are in their own live range. One would think this could create tons of copies, but maybe that's why I don't understand why subsumption is good for much. Apologies if this didn't make much sense.
Specific comments:

>>The big win for Chaitin is the coalescing (subsumption) phase he uses ...

In terms of coloring, it would seem that all subsumption can do is make things less colorable, because it is effectively extending the live range. For example, isn't it possible that the first pseudo register (the src of the copy) can be colored while the other can't? I also don't understand where all these copies come from (except possibly as a result of the definition of live range Chaitin uses... see the discussion above). The other obvious cases are silly copies generated by the code generator because it needs to preserve the original register contents. We solve this problem with a lot of special-case code in the code generator so that it never generates useless copies. If we didn't do this we would get extra demand on the scratch registers (remember that Chow partitions the register set), which could cause extra spills, but I think a smart peephole optimizer would be able to clean that up. It would seem that Chaitin could do subsumption after coloring (assuming I'm right that subsumption has no positive effect on colorability).

>>If you have a zillion registers,
>>you never spill with either method, so they're equal in that case.
>
>Chaitin won't spill, but Chow will in some cases. A very long live range,
>with very few references, will be split and some portions spilled.
>Additionally, the lack of coalescing in Chow causes different code (extra
>copies). Finally, the coarser interference graph and inferior coloring
>heuristic produce worse colorings. Practically (less than a zillion
>registers), this means that Chow will spill when Chaitin can still color.

If you had a zillion registers Chow won't spill: Chow does not split ranges until after coloring fails (Chow p. 81). Chow claims to be able to deal with smaller-than-basic-block granularity by just splitting the basic block into sub-blocks (see Chow's thesis, p. 73). I certainly wouldn't defend this as a good solution.
Chaitin's fine-grained interference is clearly better. As previously discussed, as far as I know Chow doesn't need coalescing. In terms of working with a machine with few registers, I think Chaitin has the advantage because he doesn't partition the register set. Chaitin (or any post-generation coloring) allows you to look at the whole problem of allocation at once. The other alternatives are McKusick's variable partition, or Cattell's prepass to guess what scratch registers the codegen will need. Neither of these alternatives accounts for the possibility of different code sequences you can generate based on the availability of extra scratch registers (because Cattell regenerates, he could use extra registers, but he has no cost analysis to determine whether to leave extra registers free).

Note that this argument ignores the problem of constrained register assignment. The fact that register usage is constrained (on the 86-family) is a problem no matter which technique you use. Our solution in the framework of Chow was twofold: first, you must constrain each node to interfere with whatever fixed registers its range interferes with. This is no different from the solution proposed by Chaitin. The other part is to give each range its "best" register. I didn't write the code, so I can't say how much of this we do, but clearly Chow has a slight advantage here because registers are allocated in "maximum benefit" order (so the maximum-benefit variable gets its first choice of registers). An obvious hack to Chaitin would be to add an extra pass to reassign the colors in the same maximum-benefit order. Or maybe you could always push nodes on the stack in spill-cost order (even if they're unconstrained)... or... I'm not familiar enough with Chaitin to really know.
Even if you did this you still have to deal with two problems: do you assume you have an infinite number of registers in each constraint class, and if you do, how do you correct the code when you have to spill (which on the 86/286 is often)? Our reason for using Chow in the first place was that this problem seemed very difficult. On the 386, where the register set is small and the constraint problem isn't so bad, Chaitin may well be a good choice. I believe GCC (Gnu) uses some kind of post-generation coloring (if they're really like Davidson-Fraser, then it's probably Freiburghouse's use-count scheme). In the little fiddling I did with GCC it generated quite good code.

From my point of view the advantage of Chaitin (or any post-generation scheme) is in avoiding partitioning and giving more accurate cost analysis. The latter is because the 386, being a CISC machine, has complex addressing modes and alternate instruction sequences that demand different numbers of registers, so determining the benefit of register assignment by scanning il (as we do with Chow's pre-generation scheme) makes cost analysis difficult.

I think Chow vs. Chaitin could be summarized as:

Chow: pre-generation (color on intermediate code, partitioned register set); goal-directed allocation; uses range splitting.

Chaitin: post-generation (color on instructions, uniform register usage); goal-directed spilling; range is the true live range, but no other splitting done.

And the questions become: Which method has better information to do the cost analysis? Which method deals best with the "registers affect the code, code affects the registers" problem? When does spilling give sub-optimal results? What is the effect of range splitting? etc.

- Bob Scheulen
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue. Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.
larus@primost.wisc.EDU (James Larus) (11/28/89)
Just a note on Bob Scheulen's (microsoft!bobs@beaver) article. There's no reason why Chow's algorithm can't be post-generation (i.e., allocate registers for the actual instruction set). I used it that way in the SPUR Lisp compiler. Of course, assembly code for RISC machines looks a lot like intermediate code.

/Jim
cik@l.cc.purdue.edu (Herman Rubin) (02/25/90)
In article <1990Feb20.155619.3121@esegue.segue.boston.ma.us>, napi@rangkom.MY (Mohd Hanafiah b. Abdullah) writes:
> How does one perform register allocation on scalar variables that may
> be pointed to by others?
>
> I am curious because, a pointer assumes that the variable it points to
> resides in memory, but in actuality the variable resides in a register.
>
> I am sure there is a solution to this problem and has been asked before
> in the newsgroup, but could you please e-mail the answer to me anyway?

Unfortunately, this is not the case. I have frequently wanted to do this, and I have no difficulty doing it in machine language. The only mainframes whose instruction sets I have known that are appropriately set up in hardware for this are the long-obsolete UNIVAC 1108 and 1110. There are some I have seen for which this can be done with difficulty, but it would not be faster than keeping the items in memory.

[The PDP-10/DEC-20 has addressable registers as well, but as pointed out elsewhere that's only a small part of the problem. As soon as you call another routine, it's liable to save the register and put something else there, and you lose. -John]
wendyt@cs.washington.edu (Wendy Thrash) (02/27/90)
>[The PDP-10/DEC-20 has addressable registers as well, but as pointed out
>elsewhere that's only a small part of the problem. As soon as you call
>another routine, it's liable to save the register and put something else
>there, and you lose. -John]

The Pyramid architecture (Pyramid Technology) combines addressable registers (which take a bit of work to address, but not _too_ much) with register windows. As long as variables that have had their addresses taken are assigned to parameter registers or local registers, but not temporary registers, their registers are not accessible to called routines. (More precisely, they cannot be accessed in any quasi-legitimate way by a subroutine, and certainly will not be accessed by the compilers.) I believe their architecture solves the problems discussed here, but I've been told that the hardware cost to make this register addressing work is significant (or perhaps it was "enormous").

Disclaimer: Perhaps I should say "our" architecture, since I may technically still be an employee of Pyramid Technology, on leave. Maybe my old boss will see this and let me know whether I still "work" there. :-)
Moss@cs.umass.edu (Eliot Moss) (02/27/90)
In article <1990Feb20.155619.3121@esegue.segue.boston.ma.us> napi@rangkom.MY (Mohd Hanafiah b. Abdullah) writes:
How does one perform register allocation on scalar variables that may
be pointed to by others?
I am curious because, a pointer assumes that the variable it points to
resides in memory, but in actuality the variable resides in a register.
Well, you either do not allocate it to a register, or you propagate its value
to its memory home whenever (a) it may have changed since the last time you
did so, and (b) there is a possibility that the memory value may be picked up
through an aliasing pointer. If (a) and (b) occur together often enough, then
there is little benefit to putting the variable in a register, but if they
occur rarely compared with uses of the variable, then there may be benefit. Of
course, you could use a machine such as the DEC PDP-10 where the registers
*are* addressable memory (the first 16 locations) ... :-) Eliot
--
J. Eliot B. Moss, Assistant Professor
Department of Computer and Information Science
Lederle Graduate Research Center
University of Massachusetts
Amherst, MA 01003
(413) 545-4206; Moss@cs.umass.edu
dds@cc.ic.ac.uk (Diomidis Spinellis) (02/28/90)
In article <1990Feb24.210757.4202@esegue.segue.boston.ma.us> cik@l.cc.purdue.edu (Herman Rubin) writes:
>In article <1990Feb20.155619.3121@esegue.segue.boston.ma.us>, napi@rangkom.MY (Mohd Hanafiah b. Abdullah) writes:
>> How does one perform register allocation on scalar variables that may
>> be pointed to by others? [...]
>and I have no difficulty using machine language. The only mainframes whose
>instruction set I have known which are appropriately set up in hardware for
>this are the long obsolete UNIVAC 1108 and 1110. [...]
>[The PDP-10/DEC-20 has addressable registers as well, but as pointed out
>elsewhere that's only a small part of the problem. As soon as you call
>another routine, it's liable to save the register and put something else
>there, and you lose. -John]

Addressing registers is possible in the Texas Instruments TM 990 series of minis and the TMS-9900 and TMS-9950 microprocessors. All 16 general-purpose registers are in a memory block pointed to by a single processor register, the workspace pointer. With some care one can implement register windowing. Saving the registers is then just a change of the workspace pointer; pointers that point to registers in previous stack frames still point to the correct object. On the 9995 the registers can be located in on-chip RAM, so performance is not hurt by the cost of external memory accesses.

--
Diomidis Spinellis                   Internet: dds@cc.ic.ac.uk
Department of Computing              UUCP:     ...!ukc!iccc!dds
Imperial College                     JANET:    dds@uk.ac.ic.cc
London SW7 2BZ