hankd@pur-ee.UUCP (Hank Dietz) (10/11/88)
In article <6011@june.cs.washington.edu>, pardo@june.cs.washington.edu (David Keppel) writes: > In article <9864@cup.portal.com> bcase@cup.portal.com writes: > >[ "lazy" used to be called "dribble back" ] > > I believe that the term was coined by Richard L. Sites > in his seminal paper from the Caltech VLSI conference > of 1979, _How to use 1000 Registers_. Good paper. > Read it. I agree that this is a very interesting paper, but it really isn't all that pertinent to the discussion of register windows or of lazy ops... in the paper (as I read it :-) Sites is actually trying to extend the utility of register-like memory with the concept of a generalized Short Term Memory (STM) cell. His STM cells basically look like cache cells, with both data and address fields, but are referenced and managed like registers -- by register names. Unfortunately, this does not help much. The problem with the STM as proposed by Sites is that you can't put anything in an STM that you couldn't put in a conventional register. For example: int i, j, *p; if (rand() & 1) p = &i; else p = &j; i = 5; j = 6; *p = 7; printf("%d %d %d\n", i, j, *p); In this code, you can't keep the values of i, j, and *p in registers across the statement "*p = 7;" because if you did, you wouldn't get the right value from either the register for i or the register for j. The printf should result in either: 5 7 7 or: 7 6 7 but you will not get those answers if you try to keep any of i, j, and *p in registers (or STMs) across "*p = 7;". The problem is that i and j are AMBIGUOUSLY ALIASED with *p. It might seem an obscure example, but it is actually very common -- happening with many array references, pointer references, and references to call-by-address function arguments. Step back a moment. Conventional machines don't just have registers... they also have a thing called cache. Hmm. Putting aliased objects in cache is Ok, because they will associatively access the same value for references to the same address. The problem is that you don't know where in cache each value comes from... so some things will be bumped-out by mistake, you don't get to use (short, as compared to memory addresses) register names for accessing values, etc. Now that I've told you what's wrong and I've planted the seed of how to solve the problem, it is time to admit that Chi's and my "mysterious" CReg invention I've mentioned a number of times in the net news is the solution to this problem. Quite literally, a CReg cell is an STM cell, but unlike STMs, CRegs are set associative. "Like a cache?", you say. No. CRegs are associative in sets grouped by REGISTER NAME, not by memory address as in a cache. For example, a machine with 16 CRegs might associate them as 4 sets of 4: {0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11}, and {12, 13, 14, 15}. Now, if some items are ambiguously aliased, I simply make sure the compiler puts them in registers (CRegs, actually) within the same set -- using conventional load operations. Ah, but they don't act like conventional registers because they are associative... for example, suppose that CReg 0 holds i's value and address and CReg 1 is to be loaded with *p's value and address. If *p's address just happens to be the same as i's address, the load doesn't cause a memory reference: the value of CReg 0's data field is copied into CReg 1's data field. Suppose next that another value is placed in CReg 0; if CReg 1's address field is the same as that of CReg 0, then the value is also placed in CReg 1's data field. *ANYTHING* can be placed in a CReg, and the values are automatically kept coherent (provided the values are non-volatile, as X3J11 would put it). Incidentally, lazy stores also become a piece of cake to implement in this environment... and, of course, you don't need store instructions very often because putting something in a CReg IMPLIES the place in memory where that value should be spilled/stored-back to. You say 16 of these CRegs isn't enough, especially if there will not be any (data) cache? Consider this. As reported earlier, Chi and I did some big simulations and found that 10 registers were usually enough to avoid spills if your compiler really did things right, and only about 20% of all references are ambiguously aliased in most programs, hence about 12 or 13 CRegs should be plenty. Our (admitedly preliminary) results bear this out. (We can discuss the use of CRegs for instructions later....) So, how much do CRegs help? Well here's a little table for a C quicksort benchmark with data CRegs (static numbers for a simple RISCish processor): Regs Regs CRegs Only + Cache Only Total instruction words 94 94 50 Total memory references (min) 157 124 72 Total memory references (max) 157 157 80 Oh well. I guess now I've "let the cat out of the bag" (what an odd expression...)... anyway there will be a paper on CRegs and the associated compiler technology (alias set computations) next month at the Supercomputing '88 conference and other materials on CRegs will be available from Purdue shortly. We're currently working on the design of a large scale multiprocessor using CRegs. Chi and I were going to patent the invention, but that's yet another academic horror story, and in any case we can talk freely about CRegs now. __ /| _ | | __ / | Compiler-oriented / |--| | | | | Architecture / | | |__| |_/ Researcher from \__ | | | \ | Purdue \ | \ \ \ \ hankd@ee.ecn.purdue.edu office: (317) 494 3357