[comp.sys.mips] Load Delays -- a question

kenton@abyss.zk3.dec.com (Jeff Kenton OSG/UEG) (02/06/91)

A question regarding delays when loading words from memory:

	instruction 0:	lw	t0,foo
	instruction 1:
	instruction 2:
	instruction 3:

When is the data available in t0? If the data word "foo" is in the cache, it
is available by instruction 2.  If it is not cached and you have to go to
memory do you have to wait longer? If you have to wait for memory does the
processor stall? Always? Or only if you try to use the results of the "lw"
in t0?

Thanks for any help.


-----------------------------------------------------------------------------
==	jeff kenton		Consulting at kenton@decvax.dec.com        ==
==	(617) 894-4508			(603) 881-0011			   ==
-----------------------------------------------------------------------------

mash@mips.COM (John Mashey) (02/08/91)

In article <540@decvax.decvax.dec.com.UUCP> kenton@abyss.zk3.dec.com (Jeff Kenton OSG/UEG) writes:

>A question regarding delays when loading words from memory:
>
>	instruction 0:	lw	t0,foo
>	instruction 1:
>	instruction 2:
>	instruction 3:
>
>When is the data available in t0? If the data word "foo" is in the cache, it
>is available by instruction 2.  If it is not cached and you have to go to
>memory do you have to wait longer? If you have to wait for memory does the
>processor stall? Always? Or only if you try to use the results of the "lw"
>in t0?

foo becomes available in instruction 2.
If it is a cache miss, the main pipeline stalls, does the refill, and
continues, regardless of how far off the usage of foo might occur.
Do NOT, repeat do NOT, ever assume that in instruction 1 t0 still holds
the value it had before instruction 0.  This is "undefined", on purpose,
because, for example, MIPS-II processors (i.e., R6000/R4000) include the
interlock, so that instruction 1, if it uses t0, will stall.
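
To make the delay-slot rule concrete, here is a minimal Python sketch of
what an assembler for a processor without the load interlock has to do: if
the instruction right after a load reads the loaded register, a nop goes
between them. The (opcode, destination, sources) tuple format, the LOADS
set, and the name fill_load_delay_slots are made up for this illustration.
A real assembler or compiler would first try to move a useful instruction
into the slot, and would also have to handle a load at the very end of a
block, which this sketch ignores.

```python
# Hypothetical mini-representation: (opcode, destination, sources).
LOADS = {"lw", "lh", "lhu", "lb", "lbu"}


def fill_load_delay_slots(program):
    """Insert a nop after any load whose result the very next instruction reads."""
    out = []
    for i, (op, dest, srcs) in enumerate(program):
        out.append((op, dest, srcs))
        if op in LOADS and i + 1 < len(program):
            next_srcs = program[i + 1][2]
            if dest in next_srcs:
                # The next instruction would read the register in the delay slot.
                out.append(("nop", None, ()))
    return out


program = [
    ("lw",   "t0", ("foo",)),
    ("addu", "t1", ("t0", "t2")),  # reads t0 in the delay slot: needs a nop
    ("lw",   "t3", ("bar",)),
    ("addu", "t4", ("t1", "t2")),  # does not touch t3: the slot is usefully filled
    ("addu", "t5", ("t3", "t4")),
]

scheduled = fill_load_delay_slots(program)
for op, dest, srcs in scheduled:
    print(op, dest or "", ",".join(srcs))
```

With the interlock in hardware, the nop is no longer required for
correctness; the processor stalls for the one cycle instead.
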
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

jfc@athena.mit.edu (John F Carr) (02/10/91)

Why did the original design not have the load interlock, and why does the
processor stall on a cache miss?  Do MIPS-2 processors act the same way on a
cache miss?  Making the load delay optional increases code density and makes
the compiler's job easier.

--
    John Carr (jfc@athena.mit.edu)

mark@mips.COM (Mark G. Johnson) (02/11/91)

In article <1991Feb9.221451.22230@athena.mit.edu> jfc@athena.mit.edu (John F Carr) writes:
   >
   >Why did the original design not have the load interlock, and why
   >does the processor stall on a cache miss?
   >

Unfortunately this sounds very much like a classroom assignment.
Usually "the net" lets such questions go unanswered, lest the
poster be robbed of the opportunity of figuring out the answer
for her/himself.

jfc@athena.mit.edu (John F Carr) (02/11/91)

In article <45762@mips.mips.COM> cprice@mips.COM (Charlie Price) writes:

>> and why does the
>>processor stall on a cache miss?  Do MIPS-2 processors act the same way on a
>>cache miss?

>What else is there to do?

>The only way not to stall when the instruction or data that you 
>want isn't available is if you are prepared to forge ahead and 
>try to execute instructions out of order.

I meant cache miss for data, not instruction fetch.  If the data being
loaded is not in cache, it should be possible to continue anyway with
unrelated instructions.  This does require parts of the processor to stall
when the data is needed if the load takes longer than expected, but the
newer MIPS processors do this anyway.  With such a design, the assembler
would put as many unrelated instructions as possible after a load.
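
The difference between the two policies can be sketched with a toy cycle
count in Python. This is only an illustration under loud assumptions:
in-order single issue, one cycle per instruction, any number of outstanding
loads, no limit on memory bandwidth, and the one-cycle load delay on a hit
is ignored. The tuple format (opcode, destination, sources, misses) and the
10-cycle miss penalty are invented for the example, not taken from any real
machine.

```python
def cycles_stall_on_miss(program, miss_penalty):
    """Blocking model: a load that misses stalls the whole pipeline."""
    cycles = 0
    for op, dest, srcs, misses in program:
        cycles += 1
        if op == "lw" and misses:
            cycles += miss_penalty
    return cycles


def cycles_stall_on_use(program, miss_penalty):
    """Non-blocking model: stall only when a source register is not ready yet."""
    ready_at = {}  # register -> first cycle in which its value can be read
    cycle = 0      # earliest cycle in which the next instruction may issue
    finish = 0     # cycle by which every result so far has been produced
    for op, dest, srcs, misses in program:
        issue = max([cycle] + [ready_at.get(r, 0) for r in srcs])
        latency = 1 + (miss_penalty if (op == "lw" and misses) else 0)
        if dest is not None:
            ready_at[dest] = issue + latency
        finish = max(finish, issue + latency)
        cycle = issue + 1
    return finish


program = [
    ("lw",   "t0", ("a0",),      True),   # this load misses the cache
    ("addu", "t1", ("t2", "t3"), False),  # three instructions that do not need t0
    ("addu", "t4", ("t2", "t5"), False),
    ("addu", "t6", ("t4", "t1"), False),
    ("addu", "t7", ("t0", "t6"), False),  # first use of the loaded value
]

print(cycles_stall_on_miss(program, 10))  # 15
print(cycles_stall_on_use(program, 10))   # 12
```

In this toy model the saving is exactly the three unrelated instructions
that overlap with the refill, which is why the scheduling distance between
a load and its first use would matter under a stall-on-use design.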

I brought this up because I do much of my programming on an IBM RT, which
allows 2 outstanding load operations which normally need 5 cycles to
complete.  If you are accessing slower memory (such as an I/O device), data
takes longer to become available but the processor doesn't stop unless it
needs to.  I was surprised that a newer architecture with a faster clock rate
had a simpler way of handling load delay.

Over the next few months I will be running some simulations on a MIPS-based
workstation.  I expect the data size will need to be larger than the cache
size, so I will be getting a lot of cache misses.  The decision to have the
processor wait until data is in cache makes the slowdown when a program
outgrows the cache larger than it would otherwise be.  I've never designed a
microprocessor nor have I ever taken any courses on this subject, but it
seems to me that stalling on loads only when the data is needed would have
been a better choice.  No-ops could be eliminated, performance could be
increased for some programs, and it would not have been necessary to change
the instruction set later when the load delay time changed.
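
A rough way to see how much a stall-on-every-miss design costs once the
data no longer fits in the cache is the usual back-of-the-envelope formula:
cycles per instruction = base CPI + (loads per instruction) * (miss rate) *
(miss penalty). The Python sketch below just evaluates it. Every number in
it is an illustrative assumption, not a measurement of any MIPS system, and
only data-load misses are counted.

```python
def effective_cpi(base_cpi, loads_per_instruction, miss_rate, miss_penalty):
    """Average cycles per instruction when every load miss stalls the pipeline."""
    return base_cpi + loads_per_instruction * miss_rate * miss_penalty


# Assumed figures: 1 load per 4 instructions, 10-cycle refill.
fits_in_cache = effective_cpi(1.0, 0.25, 0.02, 10)   # 2% of loads miss
exceeds_cache = effective_cpi(1.0, 0.25, 0.40, 10)   # 40% of loads miss

print(fits_in_cache, exceeds_cache)
print("slowdown factor:", exceeds_cache / fits_in_cache)
```

A stall-on-use design would shrink the effective miss penalty term by
however many independent instructions can be overlapped with each refill,
which depends entirely on the program.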

--
    John Carr (jfc@athena.mit.edu)

cprice@mips.COM (Charlie Price) (02/12/91)

In article <1991Feb11.043136.14845@athena.mit.edu> jfc@athena.mit.edu (John F Carr) writes:
>In article <45762@mips.mips.COM> cprice@mips.COM (Charlie Price) writes:
>
>>The only way not to stall when the instruction or data that you 
>>want isn't available is if you are prepared to forge ahead and 
>>try to execute instructions out of order.
>
>I meant cache miss for data, not instruction fetch.

Clearly it would be faster not to stall on cache misses
-- if you can do that at the same speed.
I don't design these things, but the folks who do tell me
that this is VERY complicated.

>I brought this up because I do much of my programming on an IBM RT, which
>allows 2 outstanding load operations which normally need 5 cycles to
>complete. 

The RT doesn't have a cache, right?
That makes some difference in how you approach loads and stores --
and some difference in the performance of the system.
-- 
Charlie Price    cprice@mips.mips.com        (408) 720-1700
MIPS Computer Systems / 928 Arques Ave. / Sunnyvale, CA   94086-23650