[comp.arch] Discontinuity

lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (09/24/90)

It occurs to me that as chips get bigger, we have to cross an
interesting discontinuity.

Today, a million-transistor chip (like the 68040 or i860) will have 8
KB of on-chip cache.  Of course, that's not enough: a high end system
needs 128 KB or even 1 MB per processor.  When we get to 50 or 100
MT/chip, then we can do that.

The discontinuity is that for intermediate chip sizes, we _don't_
want to just have an intermediate sized cache. The reason is that
high speed is best served by having a fast primary cache, and a
slightly slower secondary cache.  But notice the following simulation
results (High Performance Systems, Sept89 P. 76):

	Sustained Native MIPS for an idealized 100 MHz RISC CPU:

Secondary Cache in KB =	64	128	256	512

Primary =  8 KB		49	62	75	80
Primary = 16 KB		51	65	79	85
Primary = 32 KB		52	67	83	89

...that is, the cache faults slow the CPU from 100 MIPS to 89 MIPS
(best case) or 49 MIPS (worst case).  The important thing here is to
read these numbers across, and then read them down.  Notice that
throughput just isn't that sensitive to the size of the primary
cache.
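
To make the across/down reading concrete, here is a small Python sketch
that computes the relative gains directly from the table above.  The
function names are illustrative only; the numbers are the ones quoted
from the simulation.

```python
# Sustained MIPS from the table above.
# Rows: primary cache size in KB.  Columns: secondary cache size in KB.
SECONDARY_KB = [64, 128, 256, 512]
MIPS = {
    8:  [49, 62, 75, 80],
    16: [51, 65, 79, 85],
    32: [52, 67, 83, 89],
}

def gain_across(primary_kb):
    """Percent gain from growing the secondary cache from 64 KB to 512 KB."""
    row = MIPS[primary_kb]
    return 100.0 * (row[-1] - row[0]) / row[0]

def gain_down(secondary_kb):
    """Percent gain from growing the primary cache from 8 KB to 32 KB."""
    col = SECONDARY_KB.index(secondary_kb)
    return 100.0 * (MIPS[32][col] - MIPS[8][col]) / MIPS[8][col]

for p in MIPS:
    print(f"primary {p:2d} KB: secondary 64->512 KB gives +{gain_across(p):.0f}%")
for s in SECONDARY_KB:
    print(f"secondary {s:3d} KB: primary 8->32 KB gives +{gain_down(s):.0f}%")
```

Reading across, an 8x larger secondary cache is worth roughly 63 to 71
percent; reading down, a 4x larger primary cache is worth roughly 6 to
11 percent.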

I conclude that between 1 MT and 100 MT, there is a region where we
can't get the secondary cache on-chip, and a primary cache big enough
to fill up the chip would have nil or negative benefit.

-- 
Don		D.C.Lindsay

usenet@nlm.nih.gov (usenet news poster) (09/24/90)

In article <10550@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:
>[...]
>high speed is best served by having a fast primary cache, and a
>slightly slower secondary cache.  But notice the following simulation
 ^^^^^^^^^^^^^^^
>results (High Performance Systems, Sept89 P. 76):
>
>	Sustained Native MIPS for an idealized 100 MHz RISC CPU:
>
>Secondary Cache in KB =	64	128	256	512
>
>Primary =  8 KB		49	62	75	80
>Primary = 16 KB		51	65	79	85
>Primary = 32 KB		52	67	83	89
>
>...that is, the cache faults slow the CPU from 100 MIPS to 89 MIPS
>(best case) or 49 MIPS (worst case).  The important thing here is to
>read these numbers across, and then read them down.  Notice that
>throughput just isn't that sensitive to the size of the primary
>cache.

Isn't the key here the use of a secondary cache which is only slightly
slower?  

>I conclude that between 1 MT and 100 MT, there is a region where we
>can't get the secondary cache on-chip, and a primary cache big enough
>to fill up the chip would have nil or negative benefit.

What would the results be in a system configured without a secondary cache?
Maybe the high end system will require secondary cache. Might not a big chip 
which didn't require secondary cache be an overall win on lower end systems?

>-- 
>Don		D.C.Lindsay

David States

my@dtg.nsc.com (Michael Yip) (09/24/90)

Instead of just expanding the on-chip cache size, there are many things
that we can do to use up those 49 million transistors (;-).

We can probably integrate some specialized functional units on-chip.
For example, units for vector processing and units for offloading
the routing of messages (for Transputers).  Or even put many processors
on the same Si and have parallel processing on one chip.

I am not an expert in computer architecture, but it is interesting to
see what we can do in a few years (hopefully).  

-- Mike
   my@dtg.nsc.com

chrism@mips.COM (Christopher Marino) (09/27/90)

In article <10550@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:
>
>It occurs to me that as chips get bigger, we have to cross an
>interesting discontinuity.

I don't dispute your observation that there is somewhat of a
discontinuity here. However, there are implementation issues that complicate
the problem a little.

The MIPS 6280 is a machine that confirms the data that you present.  Here is 
a machine that has a relatively small primary cache and a large secondary
cache. (16 KB primary data, 64 KB primary instruction, 512 KB shared secondary)
The penalty of a primary cache miss depends a lot on how you build it (speed
of the parts, datapaths etc.). Given the constraints of this design, the
benefits of larger primary caches could not be justified.

My guess is that the design of the secondary cache would be optimized to take
full advantage of the primary cache that does get integrated on the chip.
What comes to mind is that faster parts and wider datapaths would be used for
the secondary cache.  This would alter the performance characteristics of the
machine as the secondary cache gets smaller (the penalty of a miss would be
smaller).
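
As a rough illustration of how the primary-miss penalty enters the
picture, here is a minimal Python sketch of a two-level stall model.
It is not the model behind the quoted simulation: the formula is a
standard textbook approximation, and every parameter value below
(references per instruction, miss rates, penalties in cycles) is a
made-up assumption chosen only to show the direction of the effect.

```python
def sustained_mips(clock_mhz, refs_per_instr, l1_miss, l1_penalty,
                   l2_miss, mem_penalty):
    """Estimate sustained MIPS for a CPU with a base CPI of 1.

    l1_penalty  -- cycles lost on a primary miss that hits in the secondary
    mem_penalty -- extra cycles lost when the secondary also misses
    All inputs are hypothetical; this is an approximation, not a simulator.
    """
    stall_cpi = refs_per_instr * l1_miss * (l1_penalty + l2_miss * mem_penalty)
    return clock_mhz / (1.0 + stall_cpi)

# Hypothetical workload: same miss rates, two secondary-cache designs.
slow_l2 = sustained_mips(100, 1.3, 0.05, l1_penalty=4, l2_miss=0.10, mem_penalty=30)
fast_l2 = sustained_mips(100, 1.3, 0.05, l1_penalty=2, l2_miss=0.10, mem_penalty=30)

print(f"slower secondary: {slow_l2:.1f} MIPS")
print(f"faster secondary: {fast_l2:.1f} MIPS")
```

With the miss rates held fixed, halving the assumed primary-miss
penalty raises the estimate, which is the sense in which faster parts
and wider datapaths change the trade-off.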

Getting the primary cache on chip will change many of the working
assumptions that were used to generate the data that you present.
>

>Secondary Cache in KB =	64	128	256	512
>
>Primary =  8 KB		49	62	75	80
>Primary = 16 KB		51	65	79	85
>Primary = 32 KB		52	67	83	89
>
>...that is, the cache faults slow the CPU from 100 MIPS to 89 MIPS
>(best case) or 49 MIPS (worst case).  The important thing here is to
>read these numbers across, and then read them down.  Notice that
>throughput just isn't that sensitive to the size of the primary
>cache.
>
>I conclude that between 1 MT and 100 MT, there is a region where we
>can't get the secondary cache on-chip, and a primary cache big enough
>to fill up the chip would have nil or negative benefit.
>
>-- 
>Don		D.C.Lindsay