[comp.arch] Cache On Chip

keith@mips.COM (Keith Garrett) (09/26/90)

In article <10550@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:
>
>It occurs to me that as chips get bigger, we have to cross an
>interesting discontinuity.
>
>Today, a million transistor chip (like the 68040 or i860) will have 8
>KB of onchip cache.  Of course, that's not enough: a high end system
>needs 128 KB or even 1 MB per processor.  When we get to 50 or 100
>MT/chip, then we can do that.
>
>The discontinuity is that for intermediate chip sizes, we _don't_
>want to just have an intermediate sized cache. The reason is that
>high speed is best served by having a fast primary cache, and a
>slightly slower secondary cache.  But notice the following simulation
>results (High Performance Systems, Sept89 P. 76):
>
>	Sustained Native MIPS for an idealized 100 MHz RISC CPU:
>
>Secondary Cache in KB =	64	128	256	512
>
>Primary =  8 KB		49	62	75	80
>Primary = 16 KB		51	65	79	85
>Primary = 32 KB		52	67	83	89
>
>...that is, the cache faults slow the CPU from 100 MIPS to 89 MIPS
>(best case) or 49 MIPS (worst case).  The important thing here is to
>read these numbers across, and then read them down.  Notice that
>throughput just isn't that sensitive to the size of the primary
>cache.
these numbers are interesting, but it would be useful to see the
effect of the primary cache size a) without a secondary cache, and b)
with primary miss times 2-3 times longer than primary hit times, that is,
2-3 cycle miss penalties.
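
for reference, the "read across, then read down" comparison can be tallied
directly from the quoted table. this is only a rough sketch: the function
names are made up for illustration, and the only data is the twelve MIPS
figures quoted above. note that reading across spans an 8x change in
secondary size, while reading down spans only a 4x change in primary size.

```python
# Sustained MIPS from the quoted table, indexed by primary cache size (KB).
# Columns correspond to secondary cache sizes of 64, 128, 256, 512 KB.
SECONDARY_KB = [64, 128, 256, 512]
MIPS = {
    8:  [49, 62, 75, 80],
    16: [51, 65, 79, 85],
    32: [52, 67, 83, 89],
}


def gain_across(primary_kb):
    """Speedup from growing the secondary cache 64 KB -> 512 KB (one row)."""
    row = MIPS[primary_kb]
    return row[-1] / row[0]


def gain_down(column):
    """Speedup from growing the primary cache 8 KB -> 32 KB (one column)."""
    return MIPS[32][column] / MIPS[8][column]


if __name__ == "__main__":
    for primary_kb in MIPS:
        print(f"primary {primary_kb:2d} KB: across = {gain_across(primary_kb):.2f}x")
    for column, secondary_kb in enumerate(SECONDARY_KB):
        print(f"secondary {secondary_kb:3d} KB: down = {gain_down(column):.2f}x")
```

every row gains more from the larger secondary cache than any column gains
from the larger primary cache, which is the point being made in the quoted
text. it says nothing about cases a) and b) above, since the table has no
data for them.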
>
>I conclude that between 1 MT and 100 MT, there is a region where we
>can't get the secondary cache on-chip, and a primary cache big enough
>to fill up the chip would have nil or negative benefit.
this seems rather pessimistic. current ram technology is 256Kb for srams
and 4Mb for drams. 1Mb and 16Mb are just around the corner. assuming about 30%
of the stored bits go to checkbits and tags (that's 3 extra bits/byte, or
11 bits/byte total), 128KB ~ 1.4Mb and 1MB ~ 11Mb. i don't know the relative
die sizes, but the technology should be similar for the '040, a 256Kb sram,
and a 1Mb dram. assuming those three are the same size, we can calculate
relative sizes for new chips providing sram secondary cache, or dram
secondary cache.

for 256Kb srams:
   128KB cache uses 5.6 rams + processor = 6.6 ram equivalents
   1MB cache uses 44 rams + processor = 45 ram equivalents
for 1Mb srams:
   128KB cache uses 1.4 rams + processor = 2.4 ram equivalents
   1MB cache uses 11 rams + processor = 12 ram equivalents
for 4Mb drams:
   128KB cache uses .4 rams + processor = 1.4 ram equivalents
   1MB cache uses 3 rams + processor = 4 ram equivalents
for 16Mb drams:
   128KB cache uses .1 rams + processor = 1.1 ram equivalents
   1MB cache uses .7 rams + processor = 1.7 ram equivalents
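
the arithmetic behind these figures can be reproduced with a short script.
this is a sketch under the same crude assumptions as above: 11 stored bits
per cache byte, and the processor counted as exactly one ram equivalent in
every technology. the function and constant names are made up for
illustration. it does not round 128KB up to 1.4Mb first, so a few results
come out slightly lower than the hand-rounded ones (e.g. 6.5 rather than 6.6).

```python
# Assumption from the post: 8 data bits + 3 bits of checkbits/tags per byte.
STORED_BITS_PER_BYTE = 11

# RAM capacities in megabits (1 Mb = 1024 Kb), as listed in the post.
RAM_MEGABITS = {
    "256Kb sram": 0.25,
    "1Mb sram": 1.0,
    "4Mb dram": 4.0,
    "16Mb dram": 16.0,
}


def cache_megabits(cache_kbytes):
    """Total stored bits (data + overhead) for a cache, in megabits."""
    return cache_kbytes * STORED_BITS_PER_BYTE / 1024


def ram_equivalents(cache_kbytes, ram_megabits):
    """Die area in 'ram equivalents': cache rams plus one for the processor."""
    return cache_megabits(cache_kbytes) / ram_megabits + 1


if __name__ == "__main__":
    for name, megabits in RAM_MEGABITS.items():
        for cache_kbytes in (128, 1024):
            total = ram_equivalents(cache_kbytes, megabits)
            print(f"{name:>10}: {cache_kbytes:4d}KB cache -> {total:.1f} ram equivalents")
```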

these numbers are very crude. perhaps someone can share info about
the relative die sizes and process CD's for the various processors and rams.
the worst number here (45 ram equivalents) suggests that you are only off by a factor of 2, but
i suspect that 2-5x the '040 is a more reasonable estimate of equivalent die
area to include a reasonable secondary cache.
BTW, dram isn't bad for secondary cache if you can avoid address muxing.
there was a startup that flared up a couple of years ago, offering sram
equivalents using dram technology. perhaps someone else remembers their name.
their worst case times were ~30ns. refresh got in the way, but i think that
you can schedule that reasonably well for an integrated cache.
-- 
Keith Garrett        "This is *MY* opinion, OBVIOUSLY"
      Mips Computer Systems, 930 Arques Ave, Sunnyvale, Ca. 94086
      (408) 524-8110     keith@mips.com  or  {ames,decwrl,prls}!mips!keith