[comp.arch] CRegs

hankd@pur-ee.UUCP (Hank Dietz) (10/11/88)

In article <6011@june.cs.washington.edu>, pardo@june.cs.washington.edu (David Keppel) writes:
> In article <9864@cup.portal.com> bcase@cup.portal.com writes:
> >[ "lazy" used to be called "dribble back" ]
> 
> I believe that the term was coined by Richard L. Sites
> in his seminal paper from the Caltech VLSI conference
> of 1979, _How to use 1000 Registers_.  Good paper.
> Read it.

I agree that this is a very interesting paper, but it really isn't all that
pertinent to the discussion of register windows or of lazy ops...  in the
paper (as I read it :-) Sites is actually trying to extend the utility of
register-like memory with the concept of a generalized Short Term Memory
(STM) cell.  His STM cells basically look like cache cells, with both data
and address fields, but are referenced and managed like registers -- by
register names.  Unfortunately, this does not help much.

The problem with the STM as proposed by Sites is that you can't put anything
in an STM that you couldn't put in a conventional register.  For example:

	int i, j, *p;

	if (rand() & 1) p = &i; else p = &j;
	i = 5;
	j = 6;
	*p = 7;
	printf("%d %d %d\n", i, j, *p);

In this code, you can't keep the values of i, j, and *p in registers across
the statement "*p = 7;" because if you did, you wouldn't get the right value
from either the register for i or the register for j.  The printf should
result in either:

	5 7 7

or:

	7 6 7

but you will not get those answers if you try to keep any of i, j, and *p in
registers (or STMs) across "*p = 7;".  The problem is that i and j are
AMBIGUOUSLY ALIASED with *p.  It might seem an obscure example, but it is
actually very common -- happening with many array references, pointer
references, and references to call-by-address function arguments.

Step back a moment.  Conventional machines don't just have registers...  they
also have a thing called cache.  Hmm.  Putting aliased objects in cache is
Ok, because they will associatively access the same value for references to
the same address.  The problem is that you don't know where in cache each
value comes from...  so some things will be bumped-out by mistake, you don't
get to use (short, as compared to memory addresses) register names for
accessing values, etc.

Now that I've told you what's wrong and I've planted the seed of how to solve
the problem, it is time to admit that Chi's and my "mysterious" CReg
invention I've mentioned a number of times in the net news is the solution
to this problem.  Quite literally, a CReg cell is an STM cell, but unlike
STMs, CRegs are set associative.  "Like a cache?", you say.  No.  CRegs are
associative in sets grouped by REGISTER NAME, not by memory address as in a
cache.

For example, a machine with 16 CRegs might associate them as 4 sets of 4:
{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11}, and {12, 13, 14, 15}.  Now, if
some items are ambiguously aliased, I simply make sure the compiler puts
them in registers (CRegs, actually) within the same set -- using conventional
load operations.  Ah, but they don't act like conventional registers because
they are associative...  for example, suppose that CReg 0 holds i's value
and address and CReg 1 is to be loaded with *p's value and address.  If *p's
address just happens to be the same as i's address, the load doesn't cause a
memory reference:  the value of CReg 0's data field is copied into CReg 1's
data field.  Suppose next that another value is placed in CReg 0; if CReg 1's
address field is the same as that of CReg 0, then the value is also placed
in CReg 1's data field.  *ANYTHING* can be placed in a CReg, and the values
are automatically kept coherent (provided the values are non-volatile, as
X3J11 would put it).  Incidentally, lazy stores also become a piece of cake
to implement in this environment...  and, of course, you don't need store
instructions very often because putting something in a CReg IMPLIES the place
in memory where that value should be spilled/stored-back to.

You say 16 of these CRegs isn't enough, especially if there will not be any
(data) cache?  Consider this.  As reported earlier, Chi and I did some big
simulations and found that 10 registers were usually enough to avoid spills
if your compiler really did things right, and only about 20% of all
references are ambiguously aliased in most programs, hence about 12 or 13
CRegs should be plenty.  Our (admitedly preliminary) results bear this out.
(We can discuss the use of CRegs for instructions later....)

So, how much do CRegs help?  Well here's a little table for a C quicksort
benchmark with data CRegs (static numbers for a simple RISCish processor):

				Regs	Regs	CRegs
				Only	+ Cache	Only
Total instruction words		94	94	50
Total memory references (min)	157	124	72
Total memory references (max)	157	157	80

Oh well.  I guess now I've "let the cat out of the bag" (what an odd
expression...)... anyway there will be a paper on CRegs and the associated
compiler technology (alias set computations) next month at the
Supercomputing '88 conference and other materials on CRegs will be available
from Purdue shortly.  We're currently working on the design of a large scale
multiprocessor using CRegs. Chi and I were going to patent the invention, but
that's yet another academic horror story, and in any case we can talk freely
about CRegs now.

     __         /|
  _ |  |  __   / |  Compiler-oriented
 /  |--| |  | |  |  Architecture
/   |  | |__| |_/   Researcher from
\__ |  | | \  |     Purdue
    \    |  \  \
	 \      \   hankd@ee.ecn.purdue.edu   office: (317) 494 3357