[comp.arch] One obvious reference for coloring and registers

pcg@cs.aber.ac.uk (Piercarlo Grandi) (01/28/91)

I was rereading the recent register count debate, and I realized that I
had mentioned David Wall on the poverty of conventional static based
register coloring, and then that probably me and Preston Briggs read
more recent work in quite different ways; I have realized that one of
such more recent papers I was unconsciously thinking of all the time,
but I never mentioned explicitly.

It is Chow&Hennessy on priority based register coloring. In my biased
and malicious way of looking at things, I had gotten from it the
impression that it said:

1) static register coloring is baaaad.

2) priorities based on expected frequency of dynamic use should be
attached to values; *after* this is done coloring should be run to
allocate the higher priority values to registers.

3) even pretty simple estimates of the dynamic use of values (assume all
loops run for 10 times) seem to give good results, except for some
cases.

4) on a MIPS R2000 the use of priorities before coloring means that, for
a fairly wide spectrum on nonfloat applications, one gets comparable
performance (but I think not the same code size) with just 8 (4+4) spare
registers instead of the 21 (12+9) available (results for the
intermediate cases are also given).

It is interesting to compare these numbers with my own smaller scale
experiments with the GNU cc register allocator on the same platform.

The paper is in the October 1990 issues of TOPLAS; it was submitted
January 1988. I have not been aware of it until recently therefore, but
I have an interesting speculation about somebody else.

John Mashey had explicitly denied (after I had specifically asked him on
these screens) having any results available about varying the number of
registers used by the MIPS code generator (he said some old experiments
maybe had been done, but had been forgotten). Strange, because one of
the authors of the paper works for MIPSco, John Mashey is personally
thanked by both authors, and in the paper it is written that the results
of the research described in the paper have been incorporated into
MIPSco's commercial compiler suite.

My speculation is: if it had been known earlier that MIPSco knew that
there might be reason to think that about half their register set is
redundant in scalar implementation of their architecture, their having
long term plans to go superscalar would have been made very obvious much
earlier than they wished.

Rumour mongering again... :-)
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

preston@ariel.rice.edu (Preston Briggs) (01/29/91)

pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>...
>I have realized that one of such more recent papers I was unconsciously
>thinking of all the time, but I never mentioned explicitly.
>It is Chow&Hennessy on priority based register coloring.

I wondered and eventually decided we were arguing in different
directions.  I usually think in terms of Chaitin's work.
I apologize for charging off on a tangent.

>In my biased and malicious way of looking at things, I had gotten
>from it the impression that it said:
>1) static register coloring is baaaad.

I'm unclear on what's meant by "static coloring."
Do you mean Chaitin's scheme?  Chow's comparison
of the two schemes is fairly kind to Chaitin.
I didn't see anything to warrent _bad_, much less "baaaad".

Preston Briggs

pcg@cs.aber.ac.uk (Piercarlo Grandi) (02/01/91)

On 28 Jan 91 22:27:04 GMT, preston@ariel.rice.edu (Preston Briggs) said:

pcg> [ ... on the Chow&Hennessy TOPLAS Oct. 1990 paper on priority based
pcg> register coloring ... ]

pcg> In my biased and malicious way of looking at things, I had gotten
pcg> from it the impression that it said: 1) static register coloring is
pcg> baaaad.

preston> I'm unclear on what's meant by "static coloring."  Do you mean
preston> Chaitin's scheme?  Chow's comparison of the two schemes is
preston> fairly kind to Chaitin.  I didn't see anything to warrent
preston> _bad_, much less "baaaad".

Well, you was warned: my way of looking at things is malicious and
biased. I did emphasize the "baaaad" a little too much. :-).

More seriously, the paper itself exists just because the authors thought
they could do better than vanilla register coloring a la Chaitin. They
do start from it -- hey nobody ever denied that coloring is a cool idea
for distributing values to registers, *once you have decided which ones
should be in registers*. The problem is that coloring is a mechanism,
which if used on its own implies a policy of putting into registers
variables that are statically frequently referenced.

The Chow&Hennessy paper shows that having register coloring (or any
other register allocation scheme, I would add, even if coloring is nice)
guided by some predictive caching policy based on estimates of runtime
frequency of reference is vastly preferable, even if the heuristics are
fairly simple minded, and that under it a small number of registers is
sufficient to capture the locality of a fair number of nonfloat
applications.

In one particular point (11.4, "Effects of allocating constants", p.529)
they write, quite interestingly:

  The result for _ccom_ demonstrates the shortcomings on relying on static
  program information in performing optimizations.

Hear hear! :-)

  In _ccom_, there are a number of program loops that seldom interate more
  than once. Our register allocator arbitrarily assumes that each loop is
  executed ten times, so it uses calle-saved registers to keep constants
  occurring inside the loops. Because of the lower number of iterations,
  the gain due to the allocation is not sufficient to offset the costs of
  saving and restoring the calle-saved registers. This effect can be
  prevented by the sue of profile information in performing register
  allocation.

I would say that in this brief quote there is a fractal microcosm of the
entire register policy allocation debate.

I quite like the entire article -- clearly valuable in terms of both
insight and raw data. As to insight, I am especially fond of 6.,
"Differences from Chaitin's approach", p.515, and 11., "Performance
measurements", p.524.

Quantitative architecture indeed.


Now let's switch to my usual speculation: in a sense vanilla register
coloring is perfectly adequate for the MIPS architecture, because it has
so many registers that a predictive policy, for example priorities, is
not that terribly important, as normally one can cache all values
irrespective of frequency of dynamic or static use.

It seems that the main benefit of priority based register coloring is
that it reduces substantially the number of registers needed to achieve
a certain level of performance (this seems to be the conclusion also of
David Wall's results, which also seem to indicate a factor of three in
the reduction for dynamic vs. static), but registers are not a scarce
resource for the MIPS. So in practice the Chow&Hennessy paper is
pointless!

But one can make some amusing hypothesis here:

1) Chow&Hennessy just like to do research for its own sake.

2) Chow&Hennessy were surprised by their own results.

3) Chow&Hennessy were looking forward to superscalar machines, where
minimizing the register usage for one microthread of computation is
important so that renaming or other hardware trickery are not needed to
map a smaller architectural register file onto a larger implementational
one.

I guess all three hypothesises are true are true in varying degrees, as
well as many others.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk