[comp.arch] SPARC vs. MIPS on gcc really, single-word refill

mash@mips.COM (John Mashey) (01/04/89)
In article <486@blake.acs.washington.edu> lgy@blake.acs.washington.edu (Laurence Yaffe) writes:
>In article <10436@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
...
>-	if a benchmark has a high data cache miss rate, and block-fetch
>-		DOESN'T work (compress is the notorious example), then
>-		a 2000 is not as much better as the clock-rate difference.
>-		(Compress is notorious because it hashes data into a huge
>-		sparse array & 1-word-refilled data caches are BETTER
>-		than N-word-refilled caches, which is not often true.)
					     ^^^^^^^^^^^^^^^^^^^^^^^
>    I'm curious about the basis for this judgement.  In much of my own 
>recent work, I've been dealing with several large, integer programs which
>do special purpose symbolic algebra.  Much of the execution time of these
>programs is devoted to searches in a large ordered hash table (~50 Kb),
>plus assorted string operations which typically only access the first
>few characters in a string.  These appear to be examples of programs
>for which multi-word data cache refill is not helpful.  For example,
>comparing a MIPS M/120 (16.7 MHz) versus a 20 MHz M/2000, I've found:

>Program #1 ("obsgen")	MIPS M/120 (-O2)	 821 sec
>			MIPS M/2000 (-O2;3.10)	 795

>Program #2 ("scrgen")	MIPS M/120 (-O2)	 808 sec
>			MIPS M/2000 (-O2;3.10)	 826

>Obviously, these two programs may not be representative of "typical"
>programs (whatever those are).  However, I would not be surprised if
>many "data-management" type programs (with large hash tables, binary
>trees, etc.) have similar behavior - namely better performance with
>single word data cache refill.

As usual, it depends on the kind of programs you run.  If you keep the rest
of the system design constant, and just vary the refill-size, each benchmark
will have an optimal refill-size or pair of refill-sizes, and of course,
they can vary quite radically.  Across the mix of benchmarks we use,
either 8 or 16 words was the right number for the M/2000, and we picked 16
because it helped some of the linear algorithms while generally not
bothering the others particularly.  There is no doubt that you run into
cache-buster programs, naturally [that's why we include compress in our
internal benchmark suite, as it helps drag the average down.]
I suspect that at least a 2-word refill would be appropriate for longer-
latency memory systems: most floating-point programs at least access doubles,
and many list-processing programs have at least 2 links or link+data.
The fundamental problem, of course, is that machines that run fast with
large expandable memories are inherently further away from their DRAM
in cycle counts than machines that are either slower or don't need
supermini+ memory systems.  As I noted, you can always write cache-busters
that will drag a machine down to DRAM speed.  The only way to win
is statistically, i.e., bigger caches to get reasonable cache miss ratios
for bigger and bigger problems, and still.....
Fortunately, caches do keep getting bigger and bigger.
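
To make the refill-size trade-off concrete, here is a toy sketch, not a
measurement of any real machine.  Everything in it is an assumption made
for illustration: a direct-mapped data cache of 16K words, a miss cost of
10 cycles of latency plus 1 cycle per word refilled, a linear scan standing
in for the "linear algorithms", and random probes into a 1M-word table
standing in for compress-style hashing.  The names stall_cycles and compare
are hypothetical.

```python
import random

def stall_cycles(addresses, cache_words, line_words, latency=10, per_word=1):
    """Total miss-stall cycles for a direct-mapped cache.

    Assumed cost model: every miss pays `latency` cycles to reach DRAM plus
    `per_word` cycles for each word of the refilled line.
    """
    n_lines = cache_words // line_words
    tags = [None] * n_lines          # block number currently held by each line
    stall = 0
    for addr in addresses:
        block = addr // line_words   # memory block containing this word
        index = block % n_lines      # direct-mapped: exactly one possible slot
        if tags[index] != block:     # miss: refill the whole line
            tags[index] = block
            stall += latency + per_word * line_words
    return stall

def compare(cache_words=16384, refill_sizes=(1, 2, 4, 8, 16)):
    """Return {refill size: (linear-scan stalls, hashed-probe stalls)}."""
    rng = random.Random(0)           # fixed seed so runs are repeatable
    linear = list(range(65536))      # sequential walk through an array
    hashed = [rng.randrange(1 << 20) for _ in range(50000)]  # sparse probes
    return {
        n: (stall_cycles(linear, cache_words, n),
            stall_cycles(hashed, cache_words, n))
        for n in refill_sizes
    }

if __name__ == "__main__":
    for n, (lin, hsh) in compare().items():
        print(f"refill={n:2d} words  linear={lin:8d}  hashed={hsh:8d}")
```

Under these assumptions the linear scan stalls less as the refill size
grows, since one miss brings in words that are about to be used, while the
hashed probes stall more, since each miss pays to transfer neighbors that
are almost never touched.  A real memory system (early restart on the
needed word, different latency/transfer ratios) will shift the numbers, but
that is the shape of the per-benchmark optimum described above.
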
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086