kenton@abyss.zk3.dec.com (Jeff Kenton OSG/UEG) (02/06/91)
A question regarding delays when loading words from memory:

	instruction 0:	lw t0,foo
	instruction 1:
	instruction 2:
	instruction 3:

When is the data available in t0? If the data word "foo" is in the cache, it is available by instruction 2. If it is not cached and you have to go to memory do you have to wait longer? If you have to wait for memory does the processor stall? Always? Or only if you try to use the results of the "lw" in t0?

Thanks for any help.

-----------------------------------------------------------------------------
== jeff kenton        Consulting at kenton@decvax.dec.com ==
== (617) 894-4508     (603) 881-0011 ==
-----------------------------------------------------------------------------
mash@mips.COM (John Mashey) (02/08/91)
In article <540@decvax.decvax.dec.com.UUCP> kenton@abyss.zk3.dec.com (Jeff Kenton OSG/UEG) writes:
>A question regarding delays when loading words from memory:
>
>	instruction 0:	lw t0,foo
>	instruction 1:
>	instruction 2:
>	instruction 3:
>
>When is the data available in t0? If the data word "foo" is in the cache, it
>is available by instruction 2. If it is not cached and you have to go to
>memory do you have to wait longer? If you have to wait for memory does the
>processor stall? Always? Or only if you try to use the results of the "lw"
>in t0?

foo becomes available in instruction 2. If it is a cache miss, the main pipeline stalls, does the refill, and continues, regardless of how far off the usage of foo might occur.

Do NOT, repeat do NOT, ever assume that in instruction 1 the data in t0 is the value it held before instruction 0. This is "undefined", on purpose, because, for example, MIPS-II processors (i.e., R6000/R4000) include the interlock, so that instruction 1, if it uses t0, will stall.
--
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
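
As a rough illustration of the rule in the reply above, here is a minimal sketch of what a scheduler has to do for a part without the load interlock: if the instruction right after a lw reads the loaded register, pad the delay slot with a nop. The tuple layout, the register names other than t0, and the function name fill_load_delay_slots are assumptions made for this example; they are not taken from any real MIPS assembler.

```python
# Hypothetical instruction model: (opcode, destination register, source registers).
NOP = ("nop", None, ())


def fill_load_delay_slots(program):
    """Insert a nop after any lw whose result is read by the very next instruction."""
    scheduled = []
    for index, (op, dest, srcs) in enumerate(program):
        scheduled.append((op, dest, srcs))
        if op != "lw" or index + 1 >= len(program):
            continue
        next_srcs = program[index + 1][2]
        if dest in next_srcs:
            # The delay-slot instruction would read the register before the
            # loaded value is defined, so pad the slot.
            scheduled.append(NOP)
    return scheduled


# "instruction 1" uses t0 immediately: needs padding on a non-interlocked part.
hazard = [("lw", "t0", ("gp",)), ("addu", "t1", ("t0", "t2"))]

# An unrelated instruction already fills the delay slot: nothing to add.
safe = [
    ("lw", "t0", ("gp",)),
    ("addu", "t3", ("t4", "t5")),
    ("addu", "t1", ("t0", "t2")),
]

print(fill_load_delay_slots(hazard))
print(fill_load_delay_slots(safe))
```

On a processor with the interlock, the nop is unnecessary because the hardware stalls instead; code padded this way stays correct on both kinds of part.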
jfc@athena.mit.edu (John F Carr) (02/10/91)
Why did the original design not have the load interlock, and why does the processor stall on a cache miss? Do MIPS-2 processors act the same way on a cache miss?

Making the load delay optional increases code density and makes the compiler's job easier.
--
John Carr (jfc@athena.mit.edu)
mark@mips.COM (Mark G. Johnson) (02/11/91)
In article <1991Feb9.221451.22230@athena.mit.edu> jfc@athena.mit.edu (John F Carr) writes:
>
>Why did the original design not have the load interlock, and why
>does the processor stall on a cache miss?
>

Unfortunately this sounds very much like a classroom assignment. Usually "the net" lets such questions go unanswered, lest the poster be robbed of the opportunity of figuring out the answer for her/himself.
jfc@athena.mit.edu (John F Carr) (02/11/91)
In article <45762@mips.mips.COM> cprice@mips.COM (Charlie Price) writes:
>> and why does the
>>processor stall on a cache miss? Do MIPS-2 processors act the same way on a
>>cache miss?

>What else is there to do?
>The only way not to stall when the instruction or data that you
>want isn't available is if you are prepared to forge ahead and
>try to execute instructions out of order.

I meant cache miss for data, not instruction fetch. If the data being loaded is not in cache, it should be possible to continue anyway with unrelated instructions. This does require parts of the processor to stall when the data is needed if the load takes longer than expected, but the newer MIPS processors do this anyway. With such a design, the assembler would put as many unrelated instructions as possible after a load.

I brought this up because I do much of my programming on an IBM RT, which allows 2 outstanding load operations which normally need 5 cycles to complete. If you are accessing slower memory (such as an I/O device), data takes longer to become available but the processor doesn't stop unless it needs to. I was surprised that a newer architecture with a faster clock rate had a simpler way of handling load delay.

Over the next few months I will be running some simulations on a MIPS-based workstation. I expect the data size will need to be larger than the cache size, so I will be getting a lot of cache misses. Because the processor waits until the data is in cache, the slowdown when a program's data outgrows the cache is larger than it would otherwise be.

I've never designed a microprocessor nor have I ever taken any courses on this subject, but it seems to me that stalling on loads only when the data is needed would have been a better choice. No-ops could be eliminated, performance could be increased for some programs, and it would not have been necessary to change the instruction set later when the load delay time changed.
--
John Carr (jfc@athena.mit.edu)
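
The trade-off argued in the post above can be made concrete with a toy cycle count. The sketch below compares two policies on a data cache miss: stalling the whole pipeline until the refill completes, versus stalling only when an instruction reads a register whose load is still outstanding. Everything here is an assumption for illustration: a single-issue in-order model, a miss penalty of 10 cycles, no limit on outstanding loads, and the name count_cycles. It does not model any real MIPS or IBM RT pipeline.

```python
def count_cycles(program, miss_penalty, stall_on_use):
    """Count cycles for a toy single-issue, in-order pipeline.

    program is a list of (opcode, dest, srcs, misses) tuples, where misses
    marks a lw that misses the data cache.
    """
    clock = 0
    ready_at = {}  # register -> cycle at which an outstanding load delivers it
    for op, dest, srcs, misses in program:
        # Stall until every source register with a pending load has arrived.
        for reg in srcs:
            clock = max(clock, ready_at.get(reg, 0))
        clock += 1  # issue this instruction
        if op == "lw" and misses and stall_on_use:
            # Let the pipeline run on; only consumers of dest will wait.
            ready_at[dest] = clock + miss_penalty
            continue
        if op == "lw" and misses:
            # Stall-on-miss: the whole pipeline waits for the refill.
            clock += miss_penalty
        if dest is not None:
            # dest now holds an ordinary result, not a pending load.
            ready_at.pop(dest, None)
    # Account for loads still outstanding when the program ends.
    return max([clock] + list(ready_at.values()))


MISS_PENALTY = 10  # assumed refill time in cycles

example = [
    ("lw", "t0", ("gp",), True),           # misses the cache
    ("addu", "t1", ("t2", "t3"), False),   # unrelated to t0
    ("addu", "t4", ("t5", "t6"), False),   # unrelated to t0
    ("addu", "t7", ("t0", "t1"), False),   # first use of t0
]

print(count_cycles(example, MISS_PENALTY, stall_on_use=False))
print(count_cycles(example, MISS_PENALTY, stall_on_use=True))
```

Under these assumptions the stall-on-miss policy takes 14 cycles and the stall-on-use policy takes 12: the saving is exactly the two unrelated instructions that overlap with the refill. The gain therefore depends on how many independent instructions can be scheduled between a load and its first use.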
cprice@mips.COM (Charlie Price) (02/12/91)
In article <1991Feb11.043136.14845@athena.mit.edu> jfc@athena.mit.edu (John F Carr) writes:
>In article <45762@mips.mips.COM> cprice@mips.COM (Charlie Price) writes:
>
>>The only way not to stall when the instruction or data that you
>>want isn't available is if you are prepared to forge ahead and
>>try to execute instructions out of order.
>
>I meant cache miss for data, not instruction fetch.

Clearly it would be faster not to stall on cache misses -- if you can do that at the same speed. I don't design these things, but the folks who do tell me that this is VERY complicated.

>I brought this up because I do much of my programming on an IBM RT, which
>allows 2 outstanding load operations which normally need 5 cycles to
>complete.

The RT doesn't have a cache, right? That makes some difference in how you approach loads and stores -- and some difference in the performance of the system.
--
Charlie Price    cprice@mips.mips.com    (408) 720-1700
MIPS Computer Systems / 928 Arques Ave. / Sunnyvale, CA 94086-23650