[comp.arch] CPU chip cache sizes, was Re: Harvard Architecture

grenley@nsc.nsc.com (George Grenley) (03/12/88)

In article <1071@PT.CS.CMU.EDU> lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) writes:
>In article <3460011@hpsrla.HP.COM> brucek@hpsrla.HP.COM (Bruce Kleinman) 
> writes about the 68030:
>>Ahh, those massive 256-Byte caches are really going to speed this puppy up :-)

I'm sure they will.  Heaven knows it needs it...

>Plus, the two caches access in parallel (versus the one cache of the 68020).

As do the two caches on the 32532.

>Plus, the caches now take one clock (versus 2 on the 68020).

Likewise on the 532.

>Plus, the caches now have burst refill (if the board designer supports it,
>of course.)

So does the 532.  We also have larger caches (512-byte instruction, 1K-byte
2-way set-associative data).  I have seen the studies on hit rate vs size for
the 532; since the 030 is a roughly similar architecture I expect it has
the same tradeoffs.  256 bytes is better than no bytes, but it is still
pretty small.

George Grenley
NSC

mash@mips.COM (John Mashey) (03/12/88)

In article <5009@nsc.nsc.com> grenley@nsc.UUCP (George Grenley) writes:
>So does the 532.  We also have larger caches (512-byte instruction, 1K-byte
>2-way set-associative data).  I have seen the studies on hit rate vs size for
>the 532; since the 030 is a roughly similar architecture I expect it has
>the same tradeoffs.  256 bytes is better than no bytes, but it is still
>pretty small.

I recall there was speculation when the 68030 was announced that the
D-cache might actually cost you performance in general applications,
and that people would end up turning it off [unlike the I-cache,
where even a small cache is almost always useful].  
However, I've seen no data published one way or another on this yet,
and I don't have any.
Do you (or anybody else) have any good data on a 256-byte cache with
16 16-byte lines? (i.e., the 68030 D-cache)
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

north@Apple.COM (Donald N. North) (03/15/88)

In article <1853@winchester.mips.COM> mash@winchester.UUCP (John Mashey) writes:
>I recall there was speculation when the 68030 was announced that the
>D-cache might actually cost you performance in general applications,
>Do you (or anybody else) have any good data on a 256-byte cache with
>16 16-byte lines? (i.e., the 68030 D-cache)

Having had some '030 experience of late, I have found that (in real working
hardware) the D-cache is always a 'win', even though it is small by most
standards.  I have yet to find a benchmark (Dhrystone, any others of the small
integer class) or any real application code in which performance is lower
with the cache enabled than with it disabled.  The typical performance
improvement ranges from a low of 5% to a high of 25%; the 'average' for larger
applications appears to be about 20%.

In looking at the cache organization (direct mapped, 16 16-byte lines), one could
construct a sequence of references (accessing data locations exactly 256 bytes
apart, for example) that causes thrashing in particular cache lines.  This is a
problem in all direct mapped caches, and one would think it to be especially
severe with so few entries (16 in the '030's case).  I suspect
this is one reason for the relatively low performance improvement figures; the
other is that the cache is just too small to be 'really' useful except in limited
situations.  Two that come immediately to mind are pushing stack arguments that
are then accessed relatively soon, and accesses to local stack storage.
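
The thrashing case described above can be sketched with a toy model.  This is
only an illustration, assuming a plain direct mapped cache of 16 lines of
16 bytes indexed by address bits 4-7; it ignores per-longword valid bits, writes,
and burst fill, and the names (line_index, tag, hit_rate) and the addresses are
made up for the example, not taken from any Motorola documentation.

```python
LINE_SIZE = 16   # bytes per cache line (figure quoted in the thread)
NUM_LINES = 16   # number of lines (figure quoted in the thread)

def line_index(addr):
    """Which cache line an address maps to (assumed: simple modulo indexing)."""
    return (addr // LINE_SIZE) % NUM_LINES

def tag(addr):
    """Upper address bits that tell apart addresses sharing one line."""
    return addr // (LINE_SIZE * NUM_LINES)

def hit_rate(addresses):
    """Fraction of references that hit in an initially empty cache."""
    tags = [None] * NUM_LINES
    hits = 0
    total = 0
    for addr in addresses:
        total += 1
        idx = line_index(addr)
        if tags[idx] == tag(addr):
            hits += 1
        else:
            tags[idx] = tag(addr)   # miss: replace whatever was in the line
    return hits / total if total else 0.0

# Hypothetical reference streams: two locations 256 bytes apart share a line
# and evict each other; two locations 16 bytes apart use different lines.
thrash = [0x1000, 0x1100] * 100
friendly = [0x1000, 0x1010] * 100

print(hit_rate(thrash), hit_rate(friendly))
```

Under these assumptions the 256-byte stride never hits, while the 16-byte
stride misses only on the first touch of each line.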

Don North   -----   Apple Computer, Inc.   -----   Advanced Technology Group
UUCP: {voder,nsc,dual,sun}!apple!north                CSNET: north@apple.com
{{ Facts are facts,  but any opinions expressed are my own,  and *do not* }}
{{ represent any viewpoint, official or otherwise, of Apple Computer, Inc.}}

lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) (03/17/88)

In article <7672@apple.Apple.Com> north@apple.UUCP (Donald N. North) writes:
>Having had some '030 experience as of late, I have found that (in real working
>hardware) the D-cache is always a 'win', even though it is small by most
>standards.  I have yet to find a benchmark (Dhrystone, any others of the small
>integer class) or some real application code in which the performance is less
>when the cache is enabled than disabled.  The typical performance improvement
>ranges from a low of 5% to a high of 25%; the 'average' for larger applications
>appears to be about 20%.

You didn't mention how many wait states there are on a memory access. Also,
you didn't mention whether the board supports burst-fill.

I assume that the 20% is for a hot board. I would expect that the cache gets
more useful as boards get slower. In particular, the 68030 should show a much
bigger advantage over a 68020 when given a slow 8-bit-wide memory - i.e. a
minimum configuration.
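
A back-of-envelope way to see why the cache should matter more on a slow board
is the usual average-access-time formula.  The sketch below is only that: the
50% hit rate and the clock counts are invented placeholders, not measurements
of any 68030 board, and it models data accesses alone rather than whole-program
run time.

```python
def avg_access_clocks(hit_rate, hit_clocks, miss_clocks):
    """Average clocks per data access with the cache enabled."""
    return hit_rate * hit_clocks + (1.0 - hit_rate) * miss_clocks

def access_speedup(hit_rate, hit_clocks, miss_clocks):
    """Ratio of uncached to cached access time (every uncached access
    is assumed to cost miss_clocks)."""
    return miss_clocks / avg_access_clocks(hit_rate, hit_clocks, miss_clocks)

# Hypothetical boards: same assumed hit rate, increasingly slow memory.
ASSUMED_HIT_RATE = 0.5    # placeholder, not a measured figure
ASSUMED_HIT_CLOCKS = 1    # placeholder cost of a cache hit
for miss_clocks in (2, 4, 8, 16):
    s = access_speedup(ASSUMED_HIT_RATE, ASSUMED_HIT_CLOCKS, miss_clocks)
    print(f"memory access = {miss_clocks:2d} clocks -> speedup {s:.2f}x")
```

With the hit rate held fixed, the speedup on data accesses grows as the memory
gets slower, which is the direction of the argument above; how large the
effect is on a real board depends on the numbers asked about.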

I was recently surprised to learn that 68020 minimum configurations weren't
just showing up in minimum-cost systems. Apparently, some are embedded in
other systems, doing the sort of thing that an 8-bitter could hack (like,
hardware diagnostics). I assume that the 68030 will show up eventually in
this role.

Of course, 68020's have also been used as IO controllers and the like.  Does
anyone have insight into the minimum/maximum aspects of these uses, or the
likelihood of SPARC/MIPS/etc pushing into these roles?
-- 
	Don		lindsay@k.gp.cs.cmu.edu    CMU Computer Science