dgb@cs.washington.edu (David Bradlee) (11/23/89)
This is a response to Preston Briggs' article from a couple of days ago. It's from a friend with whom I've worked at Microsoft.

***************************************************
From microsoft!bobs@beaver Tue Nov 21 15:34:39 1989
Reply-to: bobs@bobs (Bob Scheulen)

Hi - A friend of mine sent me a copy of one of your news postings because I've been looking at both Chaitin & Chow. I don't read news, so apologies if I'm out of date on the discussion.

Background: We (MS) have implemented a scheme like Chow's for an 8086/286 C compiler. There are a few major differences between us and Chow: first, instead of assuming the live range is the whole procedure, we split a variable into its true lifetimes (see the note below about what I mean by that). Secondly, we don't split ranges at all; we just give up (except we have a special hack for loops...).

Definition of a lifetime: the set of all definitions of and references to a variable which MUST be the same variable (and so must get the same register). This corresponds to the definition of liveness used in global optimization (in fact we discover the lifetime by tracing through all contiguous blocks where the variable is live until no more can be added -- yes, I know this is inefficient). Under this definition of lifetime, a variable could be split into its separate lifetimes in the user source (e.g. a user variable "i" could be changed to "i0", "i1", etc.). I point this out because it does not agree with either Chaitin or Chow. I admit I don't understand the implications of (or reason for) Chaitin's definition. I am not sure I even understand what Chaitin means, but he apparently limits a live range to those uses which only one definition reaches. I assume this means that uses which can be reached by multiple definitions are in their own live range. One would think this could create tons of copies, but maybe that's why I don't understand why subsumption is good for much. Apologies if this didn't make much sense.
Specific comments:

>>The big win for Chaitin is the coalescing (subsumption) phase he uses ...

In terms of coloring, it would seem that all subsumption can do is make things less colorable, because it is effectively extending the live range. For example, isn't it possible that the first pseudo register (the src of the copy) can be colored while the other can't? I also don't understand where all these copies come from (except possibly as a result of the definition of live range Chaitin uses... see the discussion above). The other obvious cases are silly copies generated by the code generator because it needs to preserve the original register contents. We solve this problem with a lot of special-case code in the code generator so that it never generates useless copies. If we didn't do this we would get extra demand on the scratch registers (remember that Chow partitions the register set), which could cause extra spills, but I think a smart peephole optimizer would be able to clean that up. It would seem that Chaitin could do subsumption after coloring (assuming I'm right that subsumption has no positive effect on colorability).

>>If you have a zillion registers,
>>you never spill with either method, so they're equal in that case.
>
>Chaitin won't spill, but Chow will in some cases. A very long live range,
>with very few references, will be split and some portions spilled.
>Additionally, the lack of coalescing in Chow causes different code (extra
>copies). Finally, the coarser interference graph and inferior coloring
>heuristic produce worse colorings. Practically (less than a zillion
>registers), this means that Chow will spill when Chaitin can still color.

If you had a zillion registers Chow won't spill: Chow does not split ranges until after coloring fails (Chow p. 81). Chow claims to be able to deal with smaller-than-basic-block granularity by just splitting the basic block into sub-blocks (see Chow's thesis, p. 73). I certainly wouldn't defend this as a good solution.
Chaitin's fine-grained interference is clearly better. As previously discussed, as far as I know Chow doesn't need coalescing. In terms of working with a machine with few registers, I think Chaitin has the advantage because he doesn't partition the register set. Chaitin (or any post-generation coloring) allows you to look at the whole problem of allocation at once. The other alternatives are McKusick's variable partition, or Cattell's prepass to guess what scratch registers the codegen will need. Neither of these alternatives accounts for the possibility of different code sequences you can generate based on the availability of extra scratch registers (because Cattell regenerates, he could use extra registers, but he has no cost analysis to determine whether to leave extra registers free).

Note that this argument ignores the problem of constrained register assignment. The fact that register usage is constrained (on the 86-family) is a problem no matter which technique you use. Our solution in the framework of Chow was twofold: first, you must constrain each node to interfere with whatever fixed registers its range interferes with. This is no different from the solution proposed by Chaitin. The other part is to give each range its "best" register. I didn't write the code, so I can't say how much of this we do, but clearly Chow has a slight advantage here because registers are allocated in "maximum benefit" order (so the maximum-benefit variable gets its first choice of registers). An obvious hack to Chaitin would be to add an extra pass to reassign the colors in the same maximum-benefit order. Or maybe you could always push nodes on the stack in spill-cost order (even if they're unconstrained)... or... I'm not familiar enough with Chaitin to really know.
Even if you did this you still have to deal with two problems: do you assume you have an infinite number of registers in each constraint class, and if you do, how do you correct the code when you have to spill (which on the 86/286 is often)? Our reason for using Chow in the first place was that this problem seemed very difficult. On the 386, where the register set is small and the constraint problem isn't so bad, Chaitin may well be a good choice. I believe GCC (Gnu) uses some kind of post-generation coloring (if they're really like Davidson-Fraser, then it's probably Freiburghouse's use-count scheme). In the little fiddling I did with GCC it generated quite good code.

From my point of view the advantage of Chaitin (or any post-generation scheme) is in avoiding partitioning and giving more accurate cost analysis. The latter is because the 386, being a CISC machine, has complex addressing modes and alternate instruction sequences that demand different numbers of registers, so determining the benefit of register assignment by scanning il (as we do with Chow's pre-generation scheme) makes cost analysis difficult.

I think Chow vs. Chaitin could be summarized as:

Chow: pre-generation (color on intermediate code, partitioned register set); goal-directed allocation; uses range splitting.

Chaitin: post-generation (color on instructions, uniform register usage); goal-directed spilling; range is the true live range, but no other splitting done.

And the questions become: Which method has better information to do the cost analysis? Which method deals best with the "registers affect the code, code affects the registers" problem? When does spilling give sub-optimal results? What is the effect of range splitting? etc.

- Bob Scheulen
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue. Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.
larus@primost.wisc.EDU (James Larus) (11/28/89)
Just a note on Bob Scheulen's (microsoft!bobs@beaver) article. There's no reason why Chow's algorithm can't be post-generation (i.e., allocate registers for the actual instruction set). I used it that way in the SPUR Lisp compiler. Of course, assembly code for RISC machines looks a lot like intermediate code.

/Jim
cik@l.cc.purdue.edu (Herman Rubin) (02/25/90)
In article <1990Feb20.155619.3121@esegue.segue.boston.ma.us>, napi@rangkom.MY (Mohd Hanafiah b. Abdullah) writes:
> How does one perform register allocation on scalar variables that may
> be pointed to by others?
>
> I am curious because, a pointer assumes that the variable it points to
> resides in memory, but in actuality the variable resides in a register.
>
> I am sure there is a solution to this problem and has been asked before
> in the newsgroup, but could you please e-mail the answer to me anyway?

Unfortunately, this is not the case. I have frequently wanted to do this, and I have no difficulty doing it in machine language. The only mainframes whose instruction sets I have known that are appropriately set up in hardware for this are the long-obsolete UNIVAC 1108 and 1110. There are some I have seen for which this can be done with difficulty, but it would not be faster than keeping the items in memory.

[The PDP-10/DEC-20 has addressable registers as well, but as pointed out elsewhere that's only a small part of the problem. As soon as you call another routine, it's liable to save the register and put something else there, and you lose. -John]
wendyt@cs.washington.edu (Wendy Thrash) (02/27/90)
>[The PDP-10/DEC-20 has addressable registers as well, but as pointed out
>elsewhere that's only a small part of the problem. As soon as you call
>another routine, it's liable to save the register and put something else
>there, and you lose. -John]

The Pyramid architecture (Pyramid Technology) combines addressable registers (which take a bit of work to address, but not _too_ much) with register windows. As long as variables that have had their addresses taken are assigned to parameter registers or local registers, but not temporary registers, their registers are not accessible to called routines. (More precisely, they cannot be accessed in any quasi-legitimate way by a subroutine, and certainly will not be accessed by the compilers.) I believe their architecture solves the problems discussed here, but I've been told that the hardware cost to make this register addressing work is significant (or perhaps it was "enormous").

Disclaimer: Perhaps I should say "our" architecture, since I may technically still be an employee of Pyramid Technology, on leave. Maybe my old boss will see this and let me know whether I still "work" there. :-)
Moss@cs.umass.edu (Eliot Moss) (02/27/90)
In article <1990Feb20.155619.3121@esegue.segue.boston.ma.us> napi@rangkom.MY (Mohd Hanafiah b. Abdullah) writes:
How does one perform register allocation on scalar variables that may
be pointed to by others?
I am curious because, a pointer assumes that the variable it points to
resides in memory, but in actuality the variable resides in a register.
Well, you either do not allocate it to a register, or you propagate its value
to its memory home whenever (a) it may have changed since the last time you
did so, and (b) there is a possibility that the memory value may be picked up
through an aliasing pointer. If (a) and (b) occur together often enough, then
there is little benefit to putting the variable in a register, but if they
occur rarely compared with uses of the variable, then there may be benefit. Of
course, you could use a machine such as the DEC PDP-10 where the registers
*are* addressable memory (the first 16 locations) ... :-) Eliot
--
J. Eliot B. Moss, Assistant Professor
Department of Computer and Information Science
Lederle Graduate Research Center
University of Massachusetts
Amherst, MA 01003
(413) 545-4206; Moss@cs.umass.edu
dds@cc.ic.ac.uk (Diomidis Spinellis) (02/28/90)
In article <1990Feb24.210757.4202@esegue.segue.boston.ma.us> cik@l.cc.purdue.edu (Herman Rubin) writes:
>In article <1990Feb20.155619.3121@esegue.segue.boston.ma.us>, napi@rangkom.MY (Mohd Hanafiah b. Abdullah) writes:
>> How does one perform register allocation on scalar variables that may
>> be pointed to by others? [...]
>and I have no difficulty using machine language. The only mainframes whose
>instruction set I have known which are appropriately set up in hardware for
>this are the long obsolete UNIVAC 1108 and 1110. [...]
>[The PDP-10/DEC-20 has addressable registers as well, but as pointed out
>elsewhere that's only a small part of the problem. As soon as you call
>another routine, it's liable to save the register and put something else
>there, and you lose. -John]

Addressing registers is possible in the Texas Instruments TM 990 series of minis and the TMS-9900 and TMS-9950 microprocessors. All 16 general-purpose registers are in a memory block pointed to by a single processor register, the workspace pointer. With some care one can implement register windowing. Saving the registers is then just a change of the workspace pointer; pointers that point to registers in previous stack frames still point to the correct object. On the 9995 the registers can be located in on-chip RAM, so performance is not hurt by the cost of external memory accesses.

--
Diomidis Spinellis                   Internet: dds@cc.ic.ac.uk
Department of Computing              UUCP:     ...!ukc!iccc!dds
Imperial College                     JANET:    dds@uk.ac.ic.cc
London SW7 2BZ