markhall@pyramid.pyramid.com (Mark Hall) (06/27/88)
In article <2465@winchester.mips.COM> mash@winchester.UUCP (John Mashey) writes:
>In article <1098@nud.UUCP> tom@nud.UUCP (Tom Armistead) writes:
>> For example, in the following code sequence
>>(assume the ld is a cache miss):
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>>1) ld r2,r3,0 ; Get value.
>>2) add r3,r3,16 ; Bump pointer
>>3) add r2,r2,1 ; Increment value.
>>4) sub r4,r4,1 ; Dec count.
>>
>>the instruction unit will stall on instruction 3 since it attempts to
>>use stale data.
>Thanx: we weren't sure whether it had multiple streams or not.
>The example seems to indicate that the 88K indeed has a load with
>2 cycles of latency (i.e., cycles 2 & 3 above).
> ...
>Note that our numbers say that in our machines, it would cost us
>10-15% in overall performance to go from 1 cycle latency to 2,
>and the similarity of machines probably means about the same amount
>for an 88K.
>--
>-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>

Tom said to assume that the load caused a cache miss, so the example
given does not imply that the load has 2 delay cycles. I think
everyone designing fast new machines HOPES that the m88k will have some
glaring defect like that. I know I sincerely hope that it has at least
ONE delay cycle, or all our gooses are cooked! ;-)

Now, wasn't that a clever way to give an example and still not give
away the timings?

-Mark Hall (smart mailer): markhall@pyramid.pyramid.com
           (uucp paths ): {amdahl|decwrl|sun|seismo|lll-lcc}!pyramid!markhall
andrew@frip.gwd.tek.com (Andrew Klossner) (06/28/88)
|>> For example, in the following code sequence
|>>(assume the ld is a cache miss):
|   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|>>
|>>1) ld r2,r3,0 ; Get value.
|>>2) add r3,r3,16 ; Bump pointer
|>>3) add r2,r2,1 ; Increment value.
|>>4) sub r4,r4,1 ; Dec count.
|>>
|>>the instruction unit will stall on instruction 3 since it attempts to
|>>use stale data.
|
|>Thanx: we weren't sure whether it had multiple streams or not.
|>The example seems to indicate that the 88K indeed has a load with
|>2 cycles of latency (i.e., cycles 2 & 3 above).
|
|Tom said to assume that the load caused a cache miss, so the example
|given does not imply that the load has 2 delay cycles.

On the 88k, a load on cache hit does in fact have two delay cycles.
The data memory load/store pipeline is three deep.

-=- Andrew Klossner (decvax!tektronix!tekecs!andrew) [UUCP]
                    (andrew%tekecs.tek.com@relay.cs.net) [ARPA]
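
The question in this thread is how many instructions after a load must
wait for its result. The sketch below is a toy model only, not 88100
timing: it assumes a single-issue, in-order machine where an ALU result
is usable by the next instruction and a load result is usable only
after `load_delay` further cycles. The names `Instr`, `stall_cycles`,
and `load_delay`, and the miss penalty of 10 cycles, are made up for
illustration.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str                  # mnemonic; only "ld" is treated specially
    dest: str                # register written
    srcs: Tuple[str, ...]    # registers read


def stall_cycles(program, load_delay):
    """Per-instruction stall counts for a toy single-issue, in-order pipeline.

    Assumption: a load's result becomes readable `load_delay` cycles later
    than an ALU result would; everything else has zero delay cycles.
    """
    ready = {}    # register -> first cycle at which its new value can be read
    cycle = 0     # cycle in which the next instruction would like to issue
    stalls = []
    for ins in program:
        # Wait until every source register is readable.
        issue = max([cycle] + [ready.get(r, 0) for r in ins.srcs])
        stalls.append(issue - cycle)
        delay = load_delay if ins.op == "ld" else 0
        ready[ins.dest] = issue + 1 + delay
        cycle = issue + 1
    return stalls


# Tom Armistead's four-instruction sequence from the quoted article.
SEQ = [
    Instr("ld",  "r2", ("r3",)),   # 1) ld  r2,r3,0
    Instr("add", "r3", ("r3",)),   # 2) add r3,r3,16
    Instr("add", "r2", ("r2",)),   # 3) add r2,r2,1
    Instr("sub", "r4", ("r4",)),   # 4) sub r4,r4,1
]

for d in (1, 2, 10):               # 10 stands in for an arbitrary miss penalty
    print(d, stall_cycles(SEQ, d))
```

Under these assumptions, instruction 3 does not stall with one delay
cycle, stalls for one cycle with two, and stalls for much longer on a
miss. That is why a stall on instruction 3 in a cache-miss example
cannot by itself distinguish one delay cycle from two.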