aglew@oberon.crhc.uiuc.edu (Andy Glew) (07/05/90)
How often are quantities that *might* be allocated to registers *not*
allocated to registers, because of the remote *possibility* of aliasing
via some circuitous pointers?

Hare brained idea: allocate quantities that *might* be aliased to
registers anyway. Provide a register to contain the true memory address
of the aliased quantity, which causes a trap when the address is
accessed (or automagically forwards to/from the register). Not only are
aliasing problems avoided, but you've got a set of data address
breakpoint registers as well! (ie. this technique could be
experimentally evaluated on machines that have data address
breakpoints).

Evaluation framework:

    Na          is the number of such data address breakpoint registers
                provided.
    V(1)..V(Na) are possibly aliased values placed in registers.
    Tr          is the performance (CPI) assuming V(1)..V(Na) are all in
                registers, and no aliasing occurs.
    Tm          is the performance (CPI) assuming V(1)..V(Na) are placed
                in memory whenever there is a possibility of aliasing.
    Fa          is the frequency of actual aliasing.
    Ca          is the cost of handling the aliasing, using the data
                address breakpoints (large in SW, less but still
                significant in HW).

Obviously, Tr = Tr( V(1)..V(Na) ) = Tr(Na), and so on. Then we compare
Tr(1-Fa) + Tr(Fa)(1+Ca) to Tm for a particular Na.

Unfortunately, I don't have any idea of what the parameters are like. I
have the impression that Fa is low, and I can ballpark Ca. But what is
Tm/Tr? Given the amount of fuss (number of papers) about aliasing in
compilers, Tm/Tr would seem to be large, but I'm not sure. Especially as
Na is varied?

(Well, let's see if I've embarrassed myself as much as I did in my last
posting, where I said that 24-10=4. ;-} )
--
Andy Glew, aglew@uiuc.edu
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus | world}!esegue.  Meta-mail to compilers-request@esegue.
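The break-even arithmetic above can be sketched directly. This is a toy
calculation; the parameter values used below (Tr, Tm, Fa, Ca) are pure
guesses, not measurements:

```c
#include <assert.h>

/* Effective CPI when the possibly-aliased values live in registers:
   the trap/forwarding cost Ca is paid only on the fraction Fa of
   runs where aliasing actually occurs.  This is just
   Tr*(1-Fa) + Tr*Fa*(1+Ca), i.e. Tr*(1 + Fa*Ca). */
static double cpi_registers(double Tr, double Fa, double Ca)
{
    return Tr * (1.0 - Fa) + Tr * Fa * (1.0 + Ca);
}

/* Registers win whenever their effective CPI stays below Tm,
   the CPI with the values kept in memory. */
static int registers_win(double Tr, double Tm, double Fa, double Ca)
{
    return cpi_registers(Tr, Fa, Ca) < Tm;
}
```

For example, with Tr = 1.0, Tm = 1.2, and Ca = 100, registers win as
long as Fa < (Tm/Tr - 1)/Ca = 0.002; at Fa = 0.01 the trap cost
dominates and memory wins.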
rfg@ncd.com (Ron Guilmette) (07/07/90)
In article <1990Jul05.155937.13214@esegue.segue.boston.ma.us> you write:
>How often are quantities that *might* be allocated to registers *not*
>allocated to registers, because of the remote *possibility* of
>aliasing via some circuitous pointers?
>
> Hare brained idea: allocate quantities that *might* be aliased to
>registers anyway. Provide a register to contain the true memory
>address of the aliased quantity, which causes a trap when the address
>is accessed (or automagically forwards to/from the register). Not
>only are aliasing problems avoided, but you've got a set of data
>address breakpoint registers as well! (ie. this technique could be
>experimentally evaluated on machines that have data address
>breakpoints).

Actually, this sounds like a marvelous idea to me!

I guess that the only drawback would be the cost of the additional
hardware to support the data breakpoints, but (as you noted) some
machines already have such facilities, so it can't be THAT expensive.

Offhand, I'd say that this could potentially work really well, except
that I think that you would want (a) a set of data-breakpoint registers
which had a one-to-one relationship with the actual registers (but you
probably only need to cover about half of them for a typical 32-register
RISC machine) and (b) special load and store instructions for
potentially aliased data values.

The special load instruction would not only load the data value into the
specified data register, it would also force the address used for the
load into the (associated) data-breakpoint register. Finally, it would
set a "valid" bit associated with the given data-breakpoint register.

The special store instruction would simply unset the "valid" bit for the
specified data-breakpoint register (thus preventing further data
accesses to the address indicated in the given breakpoint register from
being treated as if the data value in the data register were still the
valid (and most up-to-date) one).
The idea of getting the hardware to automagically satisfy references to
temporarily "register resident" data values directly from the registers
would eliminate the overhead of the (software) handling of breakpoints,
and would make this idea even better.

As I said, I think that it would *not* be necessary to associate one
data-breakpoint register with each actual data register. Keep in mind
that registers often hold computational results as well as simple copies
of variable values. (And don't forget registers like PC, SP, and FP.
These don't need the special treatment either.)

Having a machine that did this stuff would effectively render alias
analysis (in compilers) "impotent and obsolete".

> Given the amount of fuss (number of papers) about aliasing in compilers...

It is certainly a pain in the $#^%$$^$%$ to do good alias analysis
statically in software (as opposed to dynamically and in hardware),
especially for everybody's current favorite language, C. Further, I'd
have to guess that static (compile-time) mechanisms for dealing with
(i.e. preventing) aliasing problems could NEVER match the performance
characteristics of the hardware-based dynamic scheme you have proposed.

In short... I like it. Too bad I don't build chips! I just program the
suckers. :-)
--
// Ron Guilmette (rfg@ncd.com)
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus | world}!esegue.  Meta-mail to compilers-request@esegue.
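A minimal sketch of these special load/store semantics. Everything here
(the names, the NBREAK size, the forwarding-on-read behavior) is my own
illustration of the proposal, not a real instruction set:

```c
#include <stddef.h>

#define NBREAK 4   /* number of data-breakpoint registers (illustrative) */

/* One data-breakpoint register per covered data register: the address
   the register shadows, plus a "valid" bit. */
struct bp_reg {
    const int *addr;
    int        valid;
};

static struct bp_reg bp[NBREAK];
static int           reg[NBREAK];   /* the covered data registers */

/* Special load: load the value, record the address, set the valid bit. */
static void load_special(int r, const int *p)
{
    reg[r]      = *p;
    bp[r].addr  = p;
    bp[r].valid = 1;
}

/* Special store: the register copy is no longer authoritative for that
   address, so write memory and clear the valid bit. */
static void store_special(int r, int *p, int v)
{
    *p          = v;
    bp[r].valid = 0;
}

/* Any other memory read checks the breakpoint registers and forwards
   from the register file on a hit (the "automagic" case). */
static int mem_read(const int *p)
{
    for (int r = 0; r < NBREAK; r++)
        if (bp[r].valid && bp[r].addr == p)
            return reg[r];   /* forwarded from the register */
    return *p;
}

/* Demo: an aliased read sees the register copy, not stale memory. */
static int demo(void)
{
    int x = 7;
    load_special(0, &x);
    reg[0] = 42;                   /* register now newer than memory */
    int via_alias = mem_read(&x);  /* 42, forwarded from reg[0] */
    store_special(0, &x, 5);       /* writes memory, clears valid bit */
    return via_alias * 100 + mem_read(&x);
}
```

A real machine would do the associative check on every access in
parallel with the cache lookup; doing it in a software loop, as here, is
only for exposition.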
alvitar@xavax.com (Phillip Harbison) (07/09/90)
In article <1990Jul05.155937.13214@esegue.segue.boston.ma.us>
aglew@oberon.crhc.uiuc.edu (Andy Glew) writes:
> How often are quantities that *might* be allocated to registers *not*
> allocated to registers, because of the remote *possibility* of
> aliasing via some circuitous pointers?
>
> Hare brained idea: allocate quantities that *might* be aliased to
> registers anyway. Provide a register to contain the true memory
> address of the aliased quantity, which causes a trap when the address
> is accessed (or automagically forwards to/from the register).

This sounds like an interesting idea. What if we just added a tag to
each register in the CPU? The tag would store the address which the
register represents. A valid bit would be set if the tag was in use, and
cleared for empty or scratch registers. A dirty bit could be set when
the register was not consistent with memory. All load/store instructions
would have to check the tags, so the tags would have to be fully
associative. Registers with the dirty bit set could be automatically
written back when they were reused.
--
Live: Phil Harbison, Xavax, P.O. Box 7413, Huntsville, AL 35807
Uucp: alvitar@xavax.com
Bell: 205-883-4233, 205-880-8951
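The tagged-register behavior described above can be sketched as follows.
The structure and function names are invented for illustration, and the
fully associative check is done with a plain loop:

```c
#include <stddef.h>

#define NREGS 4   /* illustrative register-file size */

/* Each register carries a tag (the address it shadows), a valid bit,
   and a dirty bit. */
struct tagged_reg {
    int  value;
    int *tag;
    int  valid;
    int  dirty;
};

static struct tagged_reg rf[NREGS];

/* Every load checks all tags associatively; a hit is served from the
   register file, possibly returning a newer-than-memory value. */
static int load(int *p)
{
    for (int i = 0; i < NREGS; i++)
        if (rf[i].valid && rf[i].tag == p)
            return rf[i].value;
    return *p;
}

/* Every store likewise checks the tags; on a hit it updates the
   register and marks it dirty instead of writing memory immediately. */
static void store(int *p, int v)
{
    for (int i = 0; i < NREGS; i++)
        if (rf[i].valid && rf[i].tag == p) {
            rf[i].value = v;
            rf[i].dirty = 1;
            return;
        }
    *p = v;
}

/* Reusing a register for a new address writes back a dirty old value. */
static void bind(int i, int *p)
{
    if (rf[i].valid && rf[i].dirty)
        *rf[i].tag = rf[i].value;    /* automatic lazy write-back */
    rf[i].value = *p;
    rf[i].tag   = p;
    rf[i].valid = 1;
    rf[i].dirty = 0;
}

/* Demo: stores hit the register, and reuse flushes the dirty value. */
static int demo(void)
{
    int x = 1, y = 2;
    bind(0, &x);
    store(&x, 10);          /* hits rf[0]; x itself still holds 1 */
    int seen = load(&x);    /* 10, served from the register file */
    bind(0, &y);            /* reuse forces write-back: x becomes 10 */
    return seen * 100 + x;
}
```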
baum@Apple.COM (Allen J. Baum) (07/10/90)
>In article <1990Jul9.073203.3358@xavax.com> alvitar@xavax.com (Phillip Harbison) writes:
>....
>This sounds like an interesting idea. What if we just added a tag
>to each register in the CPU. The tag would store the address which
>the register represents.

Then again, why not just use the AT&T CRISP chips? Their 'register' file
is really a 'top-of-stack' cache. CRISP is a memory-to-memory
architecture, so there really are no registers, but one of its four
addressing modes is stack-pointer-relative, with a 5-bit displacement,
which effectively gives you 32 registers. If you go indirect through
these 'registers', and it hits in the cache, it gets the value from the
'registers' automagically. Procedure calls can advance the stack
pointer, which gives you variable-size register windows.
--
baum@apple.com  (408) 974-3385
{decwrl,hplabs}!amdahl!apple!baum
preston@titan.rice.edu (Preston Briggs) (07/10/90)
Andy Glew writes:
>> Hare brained idea: allocate quantities that *might* be aliased to
>> registers anyway. Provide a register to contain the true memory
>> address of the aliased quantity, which causes a trap when the address
>> is accessed (or automagically forwards to/from the register). Not
>> only are aliasing problems avoided, but you've got a set of data
>> address breakpoint registers as well! (ie. this technique could be
>> experimentally evaluated on machines that have data address
>> breakpoints).

And Ron Guilmette replied:
> Actually, this sounds like a marvelous idea to me! ...
> Having a machine that did this stuff would effectively render alias
> analysis (in compilers) "impotent and obsolete".

I disagree. Alias analysis has many uses during optimization. For
example, during value numbering (with constant folding), this example
might arise:

	*p = 5;
	*q = -5;	/* does this change *p's value or not? */
	if (*p > 0) {
	    ...
	} else {
	    ...
	}

With adequate alias analysis, we might be able to determine that p and q
don't point at the same location; therefore, we'd be able to eliminate
the test and the entire else clause. With inadequate alias analysis (or
none at all), we must be conservative and emit code for the test. It's
easy to construct similar examples with common subexpression
elimination, loop invariant code motion, &c.

The register allocation question is a little more difficult. But
consider that many of the values kept in registers are the result of
common subexpression elimination or loop invariant code motion, and you
can get a hint of the problem. Here's a (very contrived) example:

	do {
	    *p = a[*q];
	    p++;
	} while (p < pp);

Now what we'd really like is to hold a[*q] in a register throughout the
loop, giving

	r = a[*q];
	do {
	    *p = r;
	    p++;
	} while (p < pp);

But what if p = q at some point during the loop? Boom! Note that we're
not keeping *q in a register, so there's no hint from the proposed trap
hardware.
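The hazard can be run concretely. The array contents and the aliasing
pattern below are invented so that p walks over *q mid-loop:

```c
/* The loop as written: a[*q] is re-fetched each iteration, so it keeps
   tracking any aliasing between p and q. */
static void fill_safe(int *p, int *pp, int *q, const int *a)
{
    do {
        *p = a[*q];
        p++;
    } while (p < pp);
}

/* The "optimized" loop with a[*q] hoisted into a register: wrong
   whenever p writes over *q during the loop. */
static void fill_hoisted(int *p, int *pp, int *q, const int *a)
{
    int r = a[*q];
    do {
        *p = r;
        p++;
    } while (p < pp);
}

/* Demo: q points into the destination array, so the third store
   changes *q and the fourth iteration fetches a different a[*q]. */
static int demo(void)
{
    const int a[4] = {3, 9, 9, 4};
    int safe[4] = {0, 0, 0, 0};
    int fast[4] = {0, 0, 0, 0};

    fill_safe(safe, safe + 4, &safe[2], a);       /* safe[3] == a[3] == 4 */
    fill_hoisted(fast, fast + 4, &fast[2], a);    /* fast[3] == a[0] == 3 */
    return safe[3] * 10 + fast[3];                /* results diverge */
}
```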
Basically, it seems as though we can't keep the results of any
expression that includes aliased elements in a register. More to the
point, we can't do any (?) optimization for such expressions.

Phil Harbison writes:
>This sounds like an interesting idea. What if we just added a tag
>to each register in the CPU. The tag would store the address which
>the register represents. A valid bit would be set if the tag was
>in use, and cleared for empty or scratch registers. A dirty bit
>could be set when the register was not consistent with memory. All
>load/store instructions would have to check the tags, so the tags
>would have to be fully associative. Registers with the dirty bit
>set could be automatically written back when they were reused.

The whole scheme strikes me as "cache." Hank Dietz has argued (I
believe) that the decision to keep a variable in cache or register
should be based on whether or not it is aliased.

For data breakpoints, perhaps there's an easy modification of cache
hardware?

(BTW, a similar post will appear someday on comp.compilers, but it's
been delayed somewhat.)
--
Preston Briggs
looking for the great leap forward
preston@titan.rice.edu
chc@briar.Philips.Com (Chi-Hung Chi) (07/10/90)
>Phil Harbison writes:
>>This sounds like an interesting idea. What if we just added a tag
>>to each register in the CPU. The tag would store the address which
>>the register represents. A valid bit would be set if the tag was
>>in use, and cleared for empty or scratch registers. A dirty bit
>>could be set when the register was not consistent with memory. All
>>load/store instructions would have to check the tags, so the tags
>>would have to be fully associative. Registers with the dirty bit
>>set could be automatically written back when they were reused.

This is what my thesis advisor (Hank Dietz) and I call the "CReg"
(Cache Register) structure. Two years ago, we had a preliminary paper
about CRegs which appeared in Supercomputing '88. Some updated technical
reports about the implementation and allocation of CRegs and their
extension to multiprocessors are under preparation and should be ready
very soon.

>>The whole scheme strikes me as "cache." Hank Dietz has argued
>>(I believe) that the decision to keep a variable in cache or
>>register should be based on whether or not it is aliased.
>>
>>For data breakpoints, perhaps there's an easy modification
>>of cache hardware?

This is the primary functional difference between registers and cache
from a compiler's viewpoint. See my thesis for details.

- Chi-Hung Chi (chc@philabs.philips.com)
aglew@oberon.crhc.uiuc.edu (Andy Glew) (07/11/90)
I doubt that you would want to add a tag to all registers. It could be a
distribution property - one breakpoint register might be enough to cover
90% of the not-provably-unaliased values. "Could be" - this needs
quantitative analysis. I'm hoping that some compiler jock may already
have some numbers.
--
Andy Glew, aglew@uiuc.edu
mash@mips.COM (John Mashey) (07/15/90)
In article <1990Jul06.194618.4957@esegue.segue.boston.ma.us>
rfg@ncd.com (Ron Guilmette) writes:
>> Hare brained idea: allocate quantities that *might* be aliased to
>> registers anyway. Provide a register to contain the true memory
>> address of the aliased quantity, which causes a trap when the address
>> is accessed (or automagically forwards to/from the register). Not
>> only are aliasing problems avoided, but you've got a set of data
>> address breakpoint registers as well! (ie. this technique could be
>> experimentally evaluated on machines that have data address
>> breakpoints).

Some of this sounds interesting, and some may be useful in the future,
for various applications. However, one must be careful, especially in a
world of LIW, super-scalar, super-pipelined, and
super-scalar-super-pipelined multiple-issue machines (i.e., all RISCs
that expect to be competitive in the next few years), that you don't
stick something in a critical path that blows your cycle time by 50%.

Maybe this is a good time to expound a little on a related widespread
fantasy that might be called: When You Have A Zillion Transistors On A
Chip, All Of Your Problems Go Away. Most of the following is
over-simplified discussion of EE stuff from a software guy's viewpoint;
maybe some real VLSI types will correct goofs and expound more on this
topic.

It is clear that more space on a die helps a lot, and lets you do things
like:

  bigger on-chip caches
	a wonderful thing: regular, dense, and transistors rather than
	wires; this includes I-caches, D-caches, TLBs, branch-target
	buffers, pre-decoded instruction buffers, etc.

  monster-fast FP units and other arithmetic units
	for some kinds of units (like multipliers), more space ==>
	faster, which reduces the latency of an operation; always a
	good thing.

  more copies of functional units, or more pipelining
	increases the repeat rate for an operation, which may help some
	kinds of things.

  wider busses
	increase intra-chip bandwidth.
On the other hand, there are some nasty facts of life (for CMOS, anyway):

1) FAN-OUT is not free. Put another way, the more loads on a bus, the
   slower it is. Bigger transistors help, up to a point, but what
   usually happens is that you must cascade the gates to keep the total
   delay minimized.

2) FAN-IN is not free either.

3) WIRES DON'T SHRINK as fast as transistors (because the resistance
   increases as they get narrower). Hence, as you do shrinks to increase
   the speed and get more on a chip, the wires can gobble up more of the
   space.

Put another way:

1) The more things listening to you, the slower you are.
2) The more things you listen to, the slower you are.
3) Don't think you can run monster busses all over the place for free.

All of this says that people STILL have to think very hard about delays
in the critical paths in a CPU. The faster you go, the more you're
likely to be doing things in parallel, but if you're not careful, these
factors can bite you badly, especially in a single-chip design that can
only dissipate so much heat.
--
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:	mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS:	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus | world}!esegue.  Meta-mail to compilers-request@esegue.
hankd@dynamo.ecn.purdue.edu (Hank Dietz) (07/15/90)
In article <1990Jul05.155937.13214@esegue.segue.boston.ma.us>
aglew@oberon.crhc.uiuc.edu (Andy Glew) writes:
>How often are quantities that *might* be allocated to registers *not*
>allocated to registers, because of the remote *possibility* of
>aliasing via some circuitous pointers?
>
> Hare brained idea: allocate quantities that *might* be aliased to
>registers anyway. Provide a register to contain the true memory
>address of the aliased quantity, which causes a trap when the address
>is accessed (or automagically forwards to/from the register).

If you look back in the IEEE proceedings from Supercomputing '88, you'll
see a paper that Chi-Hung Chi and I did addressing exactly this issue
(the title is something like "CRegs: A New Type of Memory for Accessing
Arrays and Pointers"). The CReg mechanism augments registers with
address fields which are used associatively to keep the register
contents coherent... we also pointed out that with proper compiler
analysis, the associativity could be limited to a rather small set size
without ill effect (e.g., a set size of 4 rather than fully associative;
this keeps the speed of CRegs competitive with that of ordinary
registers).

>Evaluation framework: ...
>Unfortunately, I don't have any idea of what the parameters are like.

Some preliminary results on CRegs appear in the aforementioned paper.
The results were pretty good. However, some of the most significant
benefits are not the obvious ones. E.g., you don't need to do explicit
stores as often (ever?); you can even use a dirty bit and a lazy store
mechanism. Likewise, you don't need as many loads, and you certainly
don't need to fetch as many addresses, which reduces the INSTRUCTION
FETCH bandwidth required.

-hankd@ecn.purdue.edu

PS: Chi and I had actually tried to get this stuff patented back in
1987, and it looked like we had some very nice claims, but Purdue
University basically put it on hold until it had been over a year since
public disclosure... oh well.
PPS: We've also looked at using CRegs for instructions (vs. data).
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus | world}!esegue.  Meta-mail to compilers-request@esegue.
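A rough sketch of the limited-associativity lookup described above.
Details such as the set-selection function, the sizes, and the
replacement policy are my own guesses for illustration, not taken from
the CRegs paper:

```c
#include <stdint.h>

#define SET_SIZE 4   /* associativity within a set, as in the posting */
#define NSETS    4   /* illustrative number of sets */

/* A CReg entry: a register value plus the address it is bound to. */
struct creg {
    int       value;
    uintptr_t tag;
    int       valid;
};

static struct creg cregs[NSETS][SET_SIZE];

/* Low address bits select the set, so an access only compares SET_SIZE
   tags instead of searching the whole register file. */
static unsigned set_of(const int *p)
{
    return (unsigned)(((uintptr_t)p / sizeof(int)) % NSETS);
}

static int creg_read(const int *p)
{
    struct creg *set = cregs[set_of(p)];
    for (int i = 0; i < SET_SIZE; i++)
        if (set[i].valid && set[i].tag == (uintptr_t)p)
            return set[i].value;   /* hit: serve from the CReg */
    return *p;                     /* miss: go to memory */
}

static void creg_bind(const int *p, int v)
{
    struct creg *set = cregs[set_of(p)];
    set[0].value = v;              /* trivial replacement policy */
    set[0].tag   = (uintptr_t)p;
    set[0].valid = 1;
}

/* Demo: a read misses before binding and hits afterwards. */
static int demo(void)
{
    int x = 5;
    int miss = creg_read(&x);   /* 5: address not bound yet */
    creg_bind(&x, 9);           /* register copy diverges from memory */
    return miss * 10 + creg_read(&x);
}
```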
torbenm@diku.dk (Torben Ægidius Mogensen) (07/15/90)
rfg@ncd.com (Ron Guilmette) writes:
>In article <1990Jul05.155937.13214@esegue.segue.boston.ma.us> you write:
>>How often are quantities that *might* be allocated to registers *not*
>>allocated to registers, because of the remote *possibility* of
>>aliasing via some circuitous pointers?
>>
>> Hare brained idea: allocate quantities that *might* be aliased to
>>registers anyway. Provide a register to contain the true memory
>>address of the aliased quantity, which causes a trap when the address
>>is accessed (or automagically forwards to/from the register). Not
>>only are aliasing problems avoided, but you've got a set of data
>>address breakpoint registers as well! (ie. this technique could be
>>experimentally evaluated on machines that have data address
>>breakpoints).

>Actually, this sounds like a marvelous idea to me!
>I guess that the only drawback would be the cost of the additional hardware
>to support the data breakpoints, but (as you noted) some machines already
>have such facilities, so it can't be THAT expensive.

There are two other (simpler) ways to circumvent the aliasing problem in
a processor architecture.

One is to assign fixed addresses to the registers, e.g. let the first N
addresses refer to the N registers. The processor would recognize that
all the top bits in the address are 0, and use registers instead. I
believe this was done in some PDP-11s. [actually, the PDP-6/10/20 -John]

The other is to have no data registers at all, but keep everything in
memory. This is not as costly as it may sound, as cache memory can be
accessed (almost) as fast as register memory. An added benefit is fast
process switching. Register allocation becomes a non-issue, and register
saving/loading in procedure calls is done invisibly on a "call-by-need"
basis, as cache cells are overwritten. The only problem is address size
in instructions, but this can be solved in several ways. One is to use
stack-top-relative addressing similar to what Simon Peyton Jones
suggested recently.
Another way is the method used in the TI 99000 series: use a pointer to
a block of memory that can be addressed using short addresses. This
block was actually called the "register block". This series of
processors was actually quite interesting, and it is a shame that TI
dropped it in favor of the 8086. (Some people may remember the TI 99/4
home computer, which used a 99000 processor. It was notoriously slow,
but this was (I have heard) mainly due to bad programming and system
design.)

Torben Mogensen (torbenm@diku.dk)
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus | world}!esegue.  Meta-mail to compilers-request@esegue.
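The first scheme (registers living at fixed low addresses, as on the
PDP-10) amounts to an address decoder in front of every access. A toy
sketch, with sizes chosen purely for illustration:

```c
#define NREGS 16   /* the first NREGS addresses name registers */

static int registers[NREGS];
static int memory[256];   /* toy address space above the registers */

/* Every access decodes the address: small addresses (all top bits 0)
   hit the register file, so register and memory operands are
   interchangeable and aliasing is resolved by the decode itself. */
static int read_word(unsigned addr)
{
    return (addr < NREGS) ? registers[addr] : memory[addr - NREGS];
}

static void write_word(unsigned addr, int v)
{
    if (addr < NREGS)
        registers[addr] = v;
    else
        memory[addr - NREGS] = v;
}

/* Demo: address 3 names register 3; address 100 is ordinary memory. */
static int demo(void)
{
    write_word(3, 11);     /* lands in registers[3] */
    write_word(100, 22);   /* lands in memory */
    return read_word(3) * 100 + read_word(100);
}
```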
mike@vlsivie.at (Inst.f.Techn.Informatik) (07/16/90)
In article <1990Jul06.194618.4957@esegue.segue.boston.ma.us>,
rfg@ncd.com (Ron Guilmette) writes:
> In article <1990Jul05.155937.13214@esegue.segue.boston.ma.us> you write:
> >...
> > Hare brained idea: allocate quantities that *might* be aliased to
> >registers anyway. Provide a register to contain the true memory
> >address of the aliased quantity, ...
>
> Actually, this sounds like a marvelous idea to me!

It most certainly is a marvelous idea! And it works! And it has been
implemented. Here is a short description:

As opposed to the suggestions made in these postings, there is no
overhead for loading the addresses into breakpoint registers with extra
instructions. It is done automagically. It also uses a dynamic
allocation scheme for the register values and can use *LOTS* of
registers (anywhere from 8 KB to 64 KB).

IT EVEN HAS A NAME: it's called ``cache memory''. Access times are short
and, if integrated on the chip, can be as fast as a register access.

bye,
	mike :-)

Michael K. Gschwind			mike@vlsivie.at
Technical University, Vienna		mike@vlsivie.uucp
Voice: (++43).1.58801 8144		e182202@awituw01.bitnet
Fax:   (++43).1.569697
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus | world}!esegue.  Meta-mail to compilers-request@esegue.
aglew@dwarfs.crhc.uiuc.edu (Andy Glew) (07/16/90)
>IT EVEN HAS A NAME: it's called ``cache memory''. Access times are short
>and if integrated on the chip can be as fast as a register access.

As I am sure Mr. Gschwind knows, cache implemented on-chip is not as
fast as register access, chiefly because cache is seldom multiported, as
well as for other reasons such as loading and the number of addresses
that can be specified in an instruction.

If you will, we are talking about the space formed by the cross-product
of the following parameters:

    ADDRESSING   = by memory address and/or by register number
    MULTIPORTING = 1r/w .. 2r/1w ...
    SIZE/SPEED   = large/slow .. small/fast

The standard configurations are:

    CACHE    = addressed by memory address (associatively),
               single-ported, large/slow (at least wrt. registers)
    REGISTER = addressed by number, multiported, small/fast

What I was asking about, and what Hank Dietz et al. and others have
explored, is whether there are other configurations between these two
extremes that might be useful.
--
Andy Glew, aglew@uiuc.edu
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus | world}!esegue.  Meta-mail to compilers-request@esegue.
vestal@src.honeywell.com (Steve Vestal) (07/20/90)
In article <1990Jul15.205606.19343@esegue.segue.boston.ma.us>
mike@vlsivie.at (Inst.f.Techn.Informatik) writes:
> IT EVEN HAS A NAME: it's called ``cache memory''. Access times are short
> and if integrated on the chip can be as fast as a register access.

CRegs do have some appeal over caches for hard real-time applications.
Cache access times can vary; many developers of hard real-time systems
just disable them. In contrast, my impression is that CReg access times
can be as predictable as normal register access times. This helps when
determining the worst-case execution time for a code fragment.

The problem of processor design for reliable hard real-time applications
is interesting (if often ignored). Commercial chip developers tend to
minimize average execution time instead of minimizing worst-case
execution time or minimizing execution-time variability. Things like
caches, the use of data-dependent algorithms for arithmetic, register
windowing done in a way that makes procedure call times variable, etc.
cause some problems.

Does anyone have benchmark results for the current crop of
microprocessors where all caching has been disabled?

Steve Vestal
Mail:  Honeywell S&RC MN65-2100, 3660 Technology Drive, Minneapolis MN 55418
Phone: (612) 782-7049
Internet: vestal@src.honeywell.com
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus | world}!esegue.  Meta-mail to compilers-request@esegue.