[comp.arch] m88k memory stalls

markhall@pyramid.pyramid.com (Mark Hall) (06/27/88)

In article <2465@winchester.mips.COM> mash@winchester.UUCP (John Mashey) writes:

>In article <1098@nud.UUCP> tom@nud.UUCP (Tom Armistead) writes:
>>                         For example, in the following code sequence
>>(assume the ld is a cache miss):
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>>1)	ld	r2,r3,0		; Get value.
>>2)	add	r3,r3,16	; Bump pointer
>>3)	add	r2,r2,1		; Increment value.
>>4)	sub	r4,r4,1		; Dec count.
>>
>>the instruction unit will stall on instruction 3 since it attempts to
>>use stale data.  

>Thanx: we weren't sure whether it had multiple streams or not.
>The example seems to indicate that the 88K indeed has a load with
>2 cycles of latency (i.e., cycles 2 & 3 above).  
> ...
>Note that our numbers say that in our machines, it would cost us
>10-15% in overall performance to go from 1 cycle latency to 2,
>and the similarity of machines probably means about the same amount
>for an 88K.
>-- 
>-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>

Tom said to assume that the load caused a cache miss, so the example
given does not imply that the load has 2 delay cycles.  I think
everyone designing fast new machines HOPES that the m88k will have
some glaring defect like that.  I know I sincerely hope that it has
at least ONE delay cycle, or all our geese are cooked!  ;-)

Now, wasn't that a clever way to give an example and still not give 
away the timings?

-Mark Hall (smart mailer): markhall@pyramid.pyramid.com
	   (uucp paths  ): 
		{amdahl|decwrl|sun|seismo|lll-lcc}!pyramid!markhall

andrew@frip.gwd.tek.com (Andrew Klossner) (06/28/88)

|>>                         For example, in the following code sequence
|>>(assume the ld is a cache miss):
|   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|>>
|>>1)	ld	r2,r3,0		; Get value.
|>>2)	add	r3,r3,16	; Bump pointer
|>>3)	add	r2,r2,1		; Increment value.
|>>4)	sub	r4,r4,1		; Dec count.
|>>
|>>the instruction unit will stall on instruction 3 since it attempts to
|>>use stale data.  
|
|>Thanx: we weren't sure whether it had multiple streams or not.
|>The example seems to indicate that the 88K indeed has a load with
|>2 cycles of latency (i.e., cycles 2 & 3 above).  
|
|Tom said to assume that the load caused a cache miss, so the example
|given does not imply that the load has 2 delay cycles.

On the 88k, a load that hits in the cache does in fact have two delay cycles.
The data memory load/store pipeline is three deep.
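
To make the hit case concrete, here is a toy issue-timing model of the
four-instruction example quoted above.  It is only a sketch, and its
assumptions are mine rather than anything from 88k documentation: one
instruction issues per cycle, in order; an instruction stalls until its
source registers are ready; an ALU result is usable by the very next
instruction; and a load result is unusable for `load_delay` issue slots
after the load.  The names `issue_cycles`, `PROGRAM`, and `load_delay`
are made up for the illustration.

```python
def issue_cycles(program, load_delay):
    """Return the cycle in which each instruction issues.

    program    -- list of (opcode, dest_reg, [source_regs]) tuples
    load_delay -- issue slots after a load during which its result
                  cannot be used (assumed model, not 88k documentation)
    """
    ready = {}      # register -> first cycle in which it may be read
    cycle = 0
    issued = []
    for opcode, dest, sources in program:
        cycle += 1  # earliest slot: one instruction per cycle, in order
        # Stall until every source register is ready.
        cycle = max([cycle] + [ready.get(reg, 0) for reg in sources])
        issued.append(cycle)
        extra = load_delay if opcode == "ld" else 0
        ready[dest] = cycle + 1 + extra
    return issued


# The quoted sequence, reduced to its register dependencies.
PROGRAM = [
    ("ld",  "r2", ["r3"]),   # 1) Get value.
    ("add", "r3", ["r3"]),   # 2) Bump pointer.
    ("add", "r2", ["r2"]),   # 3) Increment value (needs the loaded r2).
    ("sub", "r4", ["r4"]),   # 4) Dec count.
]

for delay in (1, 2):
    print(delay, issue_cycles(PROGRAM, delay))
```

Under this model, one delay cycle lets instruction 3 issue in cycle 3
with no stall, because instruction 2 fills the slot.  With two delay
cycles, instruction 3 slips to cycle 4, so the sequence stalls for one
cycle even when the load hits.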

  -=- Andrew Klossner   (decvax!tektronix!tekecs!andrew)       [UUCP]
                        (andrew%tekecs.tek.com@relay.cs.net)   [ARPA]