[comp.arch] SRAMs vs. cache-chips

johnw@astroatc.UUCP (John F. Wardale) (12/08/87)

John Mashey (who is always interesting) writes:

>  Of delivered RISC machines, the ones that use standard
>SRAMs {MIPS, SPARC} outperform those that have special-purpose
>cache-mmu chips {Clipper}.  The AMD29000 (no special RAM chip designs)
>and Motorola 78000 {cache/mmu chips} will add a few more data points.

I always thought that the "problem" with the Clipper was that its
path to memory was too long:  (long latency from virtual address to
data available).

Is this an effect of the cache/mmu chips, or a more separable
issue?  In other words, can one make a good, fast RISC box with
mmu/cache chips, and still maintain good memory response (a la MIPS
and maybe AMD29000)?

Any opinions, net-speculation, etc.??  (John Mashey--comment?)

-- 
					John Wardale
... {seismo | harvard | ihnp4} ! {uwvax | cs.wisc.edu} ! astroatc!johnw

To err is human, to really foul up world news requires the net!

mash@mips.UUCP (John Mashey) (12/13/87)

In article <636@astroatc.UUCP> johnw@astroatc.UUCP (John F. Wardale) writes:
...kind words....
>>  Of delivered RISC machines, the ones that use standard
>>SRAMs {MIPS, SPARC} outperform those that have special-purpose
>>cache-mmu chips {Clipper}.  The AMD29000 (no special RAM chip designs)
>>and Motorola 78000 {cache/mmu chips} will add a few more data points.

Note: Colin Plumb pointed out to me that there was supposed to be a
cache chip [AM29062] that was being worked on.  I should have been
more precise in what I said in the first place. The 29K literature
says that all sorts of memory configurations are possible, unlike:
	Clipper, where the design clearly needs the CAMMU chips.
	78K, where 78200 CMMU chips are also clearly expected (I don't
		have the info to know if you can use 78K alone easily).

Thus, there are perhaps 3 combinations dealing with external caches:
1) CPU and cache-mmu chips are clearly a set, and CPU is difficult or
impossible to use alone, at least in virtual-memory systems that are fast.
	Clipper; 78000+78200(?)
2) CPU might have special cache chips, or else builds caches out of
standard SRAMs plus glue.
	29K (?); SPARC (?)
3) CPU just uses standard SRAMs for caches, i.e., cache control is on-chip.
	R2000.

>I always thought that the "problem" with the Clipper was that its
>path to memory was too long:  (long latency from virtual address to
>data available).

>Is this an effect of the cache/mmu chips, or a more separable
>issue?  In other words, can one make a good, fast RISC box with
>mmu/cache chips, and still maintain good memory response (a la MIPS
>and maybe AMD29000)?

This still seems unresolved: one must always be careful to consider
the implementation, i.e., there are plenty of reasonable architectural
ideas whose implementations cause them to look worse than they are.
That's one of the reasons why one wants more data points.
For example, there are some gross structural similarities between the
Clipper and 78K [whose existence Motorola finally admitted in public
last week, although not by name].  The gross similarities:
	a) CPU = CPU + on-chip floating-point
	b) Cache-mmu chips [separate I&D], with physical caches.
From the skimpy simulated performance data available [simulated Dhrystone
1.0, @ 20MHz, under varying optimizations], plus the implication of
2-cycle loads from some of the pipeline charts, we'd expect that
a 20MHz 78K would probably be what we'd call 10-(12-minus) mips,
i.e., a little faster than the 15MHz R2000 in our M/1000 boxes that
have long memory latencies, but slower than a 16.7MHz one with shorter
latencies.  If the compilers don't get where they think they will,
then it might be less.  **WARNING*** all of this is extrapolating from
really skimpy data.  Presumably we'll know more when it really gets announced.

With regard to the general issue [partitioning, and cache-response],
one can observe that there is always a tug-of-war between two things:
a) The wish to keep minimal-cycle latencies between CPU and caches,
for loads, stores, and I-fetches.
b) The wish to minimize the performance degradation caused by
cache misses, sometimes by 
	b1) Lowering the refill penalty.
OR
	b2) Lowering the cache-miss rate.

Items in b1) include widening the memory bus, using longer cache lines
(up to some point), trickery with the order of fetching, lowering the
latency to memory (hard: DRAMs get bigger, but not much faster), etc.
Most of these are somewhat or mostly independent of a).

Items in b2) include making the caches bigger, or making them more
set-associative.  Bigger caches (up to a point) are easy, but past that
they cost board space, and then there can be cache-bus electrical
problems, although these can be pushed out with design sneakiness.
Making them more associative definitely lowers the cache-miss rate, but
most of the associative-cache designs cost you time on loads/stores,
or require other complexities [i.e., a lot of hardware].  If you have
special-purpose cache-mmu chips [possible BIAS on] you may well NOT be
able to track the speed and cost decreases of the SRAM vendors,
because what you're essentially building is lots of SRAM with some
other special circuitry, while the SRAM vendors are hammering the costs
down and lowering the cycle times by making large volumes of them.
(MIPS bias is to build CPUs with cache control on them and rely on the
SRAM vendors for the caches; it's no accident that our silicon partners are
very, very good SRAM folks!) The other issue with special cache/mmu
chips is whether or not you can gang them together.  If you can't,
you can be horribly shocked when you throw large, demanding applications,
or UNIX kernel code at small caches.  The earliest Clipper numbers
(from InterGraph, a year ago) made it look like what we'd call 3.5-4-mips
user-level, and maybe 1.5-mips for UNIX kernel-heavy mixes (skimpy data,
may have changed since then.)
[possible BIAS off].

Anyway, the design tradeoff often ends up being a tug of war between
wanting to reduce the miss rate (via associativity, for example)
while wanting NOT to impact the cpu-cache interface, while trying to
deal with cost, board-space issues, and trying to outguess semiconductor
trends.....a whole bunch of fun, but not for anyone who wants to take
many years to design systems!
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086