jesup@cbmvax.commodore.com (Randell Jesup) (02/27/91)
Random question concerning the 68040: what do people think about the utility/cost effectiveness/need for external caches (given that it has ?4-way? associative 4K I and D caches internally and a single external bus)? What sort of speedups/cache size do you think you'd be likely to get?

I would suspect you don't need as large caches as with most "risc" chips, because of the more complex, higher-density instructions, but how much effect does this really have (are there any recent figures out there)?

What about external caches on other CISC's, such as 68030's, x86's (yech), etc? Certainly at some point you get insufficient gain for the expense of adding more cache (I know, "insufficient" is a subjective term). I'm interested in where people think the crossovers are (and I suppose for RISC's too while we're at it).

--
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup
The compiler runs
Like a swift-flowing river
I wait in silence.  (From "The Zen of Programming")  ;-)
torrie@cs.stanford.edu (Evan Torrie) (02/27/91)
jesup@cbmvax.commodore.com (Randell Jesup) writes:

> Random question concerning the 68040: what do people think about
>the utility/cost effectiveness/need for external caches (given that it
>has ?4-way? associative 4K I and D caches internally and a single
>external bus.

I don't have any figures for the 68040, but for a very good explanation of the details and tradeoffs in cache design, take a look at Steven Przybylski's "Cache and Memory Hierarchy Design: A Performance-Directed Approach", Morgan Kaufmann, 1990.

There are, of course, many issues which would dictate whether adding a second-level cache is "worth it". Workload is a big factor - Unix-type development environments are very different from personal computers. Cost also plays an important part. If you're striving for the last percentage point of performance, you can afford to spend a lot on the cache. If you're prepared to sacrifice peak speed in order to get a low-cost machine (as it seems NeXT's designers have chosen), the 4K I/D caches are probably sufficient.

Writing from a MIPS RISC perspective, Przybylski suggests that 4K caches are far from optimal. He suggests 64K - 256K for an external cache, and argues a case for direct-mapped over set-associative caches.

You mention the issue of code density on the 040 vs RISC-type machines. I wonder if this will actually be less of a factor than it is with the 68030. I believe it's true that the 040, taking its RISC-like approach, is actually optimised for the very simple addressing modes, and will have a lower overall CPI if the code contains more of these simple instructions in place of an instruction using a complicated 680x0 addressing mode. Perhaps someone else can confirm this, along with whether this is being implemented in any 68040-specific compilers.

> What about external caches on other CISC's, such as 68030's, x86's
>(yech), etc? Certainly at some point you get insufficient gain for the
>expense of adding more cache (I know, "insufficient" is a subjective term).
>I'm interested in where people think the crossovers are (and I suppose for
>RISC's too while we're at it).

My opinion... For a Unix-type workload, 64K-256K of cache is about where you should be now. For a PC-type workload, 32K-64K. Apple seems to have done studies on its IIfx and IIci cache designs which indicate that anything more than 32K of cache for the Mac design is overkill.

Other crossover points... set associativity of more than 2-4 is wasted. A block size of 4 words - 8 words is usually optimal.

My $0.02 worth...

--
------------------------------------------------------------------------------
Evan Torrie.  Stanford University, Class of 199?  torrie@cs.stanford.edu
"And in the death, as the last few corpses lay rotting in the slimy thoroughfare, the shutters lifted in inches, high on Poacher's Hill, and
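The crossover points mentioned above (cache size, associativity, block size) are the sort of thing one can explore with a trace-driven simulation. What follows is only a minimal sketch, not anything from the posts: the function `miss_rate`, the 4 KB cache with 16-byte (4-word) blocks, and the synthetic trace are all assumptions chosen for illustration. The trace is deliberately pathological (two arrays exactly one cache size apart, accessed alternately), so it shows how conflict misses arise and says nothing about real Unix or PC workloads.

```python
from collections import OrderedDict


def miss_rate(addresses, cache_bytes, block_bytes, ways):
    """Simulate an LRU set-associative cache and return the fraction of misses.

    ways == 1 gives a direct-mapped cache.
    """
    num_sets = cache_bytes // (block_bytes * ways)
    # One ordered dict per set: keys are tags, order is least to most recent.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        index = block % num_sets
        tag = block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return misses / len(addresses)


# Hypothetical parameters and a synthetic trace (assumptions, not measurements).
CACHE_BYTES = 4096
BLOCK_BYTES = 16

trace = []
for i in range(1000):
    offset = (i % 64) * 4
    trace.append(offset)                # element of array A
    trace.append(CACHE_BYTES + offset)  # element of array B, one cache size away

for ways in (1, 2, 4):
    rate = miss_rate(trace, CACHE_BYTES, BLOCK_BYTES, ways)
    print(f"{ways}-way, {BLOCK_BYTES}-byte blocks: miss rate {rate:.3f}")
```

On this particular trace the direct-mapped configuration misses on every access, while 2-way and 4-way behave identically, which is at least consistent with the remark that associativity beyond 2-4 buys little. A real study would substitute recorded address traces for the synthetic one.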
lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (02/28/91)
In article <19330@cbmvax.commodore.com> jesup@cbmvax.commodore.com (Randell Jesup) writes:

> Random question concerning the 68040: what do people think about
>the utility/cost effectiveness/need for external caches (given that it
>has ?4-way? associative 4K I and D caches internally and a single
>external bus.

Claimed results from HP (for the HP 425t and HP 425s, both 25 MHz) are:

KB of external cache:      0      128

overall SPECmark          11      11.8
Integer SPECs             12.3    12.9
Float SPECs               10.2    11

Note that this is at 25 MHz. What this data says is that the disparity between on-chip cache and main memory is not extreme enough. [For this benchmark suite] few systems can justify adding an intermediate level to the memory hierarchy.

A higher clock rate, or a slower main memory, would cause a bigger disparity. Eventually, the external cache would be reasonable, to reduce the penalty of the on-chip cache misses.

Note, by the way, that second-level caches have to be much larger than first-level caches. This is because the first-level cache skims the cream, and any second-level cache sees an address stream with most of its locality removed. With bad locality, a small cache isn't going to help. Luckily, the second-level cache only has to be prompt, not screamingly fast, so it isn't *that* expensive to build one.

--
Don  D.C.Lindsay .. temporarily at Carnegie Mellon Robotics
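To put the HP figures above in proportion, the relative gain from the 128 KB external cache can be worked out directly. This is a small sketch using only the numbers quoted in the post; the names `results` and `percent_gain` are made up for illustration.

```python
# Figures as quoted in the post (HP 425t / 425s, 25 MHz 68040):
# (without external cache, with 128 KB external cache)
results = {
    "overall SPECmark": (11.0, 11.8),
    "Integer SPECs": (12.3, 12.9),
    "Float SPECs": (10.2, 11.0),
}


def percent_gain(without_cache, with_cache):
    """Relative improvement from adding the external cache, in percent."""
    return (with_cache / without_cache - 1.0) * 100.0


gains = {name: percent_gain(a, b) for name, (a, b) in results.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:.1f}% faster with the external cache")
```

That works out to roughly 5% on the integer programs and 7-8% on the overall and floating-point figures, which is the modest gain the post is describing.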
edwardm@hpcuhd.HP.COM (Edward McClanahan) (03/02/91)
> > Random question concerning the 68040: what do people think about
> >the utility/cost effectiveness/need for external caches (given that it
> >has ?4-way? associative 4K I and D caches internally and a single
> >external bus.

> My opinion... For a Unix type workload, 64K-256K of cache is about
> where you should be now. For a PC type workload, 32K-64K. Apple seems
> to have done studies on its IIfx and IIci cache designs which indicate
> that anything more than 32K of cache for the Mac design is overkill.

Another issue to consider is cache coherency in systems with multiple CPUs or DMA (e.g. most of the machines mentioned). I do not know what the '040 internal caches do to solve this problem, but external caches which don't address it (cache coherency) can actually be quite a hassle to program around.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Edward McClanahan
Hewlett Packard Company     -or-     edwardm@cup.hp.com
Mail Stop 42UN
11000 Wolfe Road                     Phone: (480)447-5651
Cupertino, CA 95014                  Fax:   (408)447-5039
mash@mips.com (John Mashey) (03/05/91)
In article <12143@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:
>
>In article <19330@cbmvax.commodore.com>
>	jesup@cbmvax.commodore.com (Randell Jesup) writes:
>> Random question concerning the 68040: what do people think about
>>the utility/cost effectiveness/need for external caches (given that it
>>has ?4-way? associative 4K I and D caches internally and a single
>>external bus.
>
>Claimed results from HP (for the HP 425t and HP 425s, both 25 MHz) are:
>
>KB of external cache:      0      128
>
>overall SPECmark          11      11.8
>Integer SPECs             12.3    12.9
>Float SPECs               10.2    11
>
>Note that this is at 25 MHz. What this data is saying, is that the
>disparity between onchip cache, and main memory, is not extreme
>enough. [For this benchmark suite] few systems can justify adding an
>intermediate level to the memory hierarchy.
>
>A higher clock rate, or a slower main memory, would cause a bigger
>disparity. Eventually, the external cache would be reasonable, to
>reduce the penalty of the onchip cache misses.

>Note, by the way, that second-level caches have to be much larger
>than first-level caches. This is because the first-level cache skims
>the cream, and any second-level cache sees an address stream with
>most of its locality removed. With bad locality, a small cache isn't
>going to help. Luckily, the second-level cache only has to be prompt,
>not screamingly fast, so it isn't *that* expensive to build one.

Yes, all of this is an interesting effect; maybe somebody who knows the details can help sort out some of the following conjectures. First, a few observations:

1) Using MHz/SPECint as an approximation for CPI, we get about 2.0, +/- a little, depending on whether there is an external cache. The lower the CPI, the worse cache-missing hurts you; the higher the CPI, the less it hurts. For comparison, does anyone have any SPEC numbers for 486s that do NOT have secondary caches? (The CPI is similar, but the interface is probably different.)

2) The miss rate for the on-chip caches is, of course, identical between the two configurations.

3) Thus, this really just says: the average miss penalty is only slightly better with the external cache than without it, for these programs, on a 68040.

4) Let's consider some reasons why that might be:

   a) Main memory is fast, using page-mode DRAM or whatever to get fairly fast refills, but also (if that's what they're doing) it costs you some to switch between pages. One would assume that the secondary cache buys MORE in a larger machine with a larger (and probably longer-latency) memory system, and LESS in a tightly coupled workstation / embedded design.

   b) Perhaps there is something in the 68040 interface + external secondary cache control that has a higher penalty than one would expect. I assume the secondary cache is writeback (?). Maybe the design requires flushing dirty data back to DRAM before initiating the refill? Maybe there are extra cycles for synchronizing everything?

Anyway, maybe somebody who actually knows can post a few details, since the rest of us are just guessing.

--
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086
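Observations 1) to 3) above can be put into numbers. The sketch below uses the rule of thumb from the post (MHz/SPECint as a rough stand-in for CPI) with the HP integer figures, and then the usual first-order model CPI = base CPI + (misses per instruction) x (average miss penalty). The misses-per-instruction value is purely an assumed placeholder, since the thread gives no measured on-chip miss rate; all the names are illustrative.

```python
CLOCK_MHZ = 25.0
SPECINT_NO_L2 = 12.3    # integer SPEC figure without external cache (from the thread)
SPECINT_WITH_L2 = 12.9  # integer SPEC figure with 128 KB external cache (from the thread)


def approx_cpi(clock_mhz, spec_int):
    """Rule of thumb from the post: MHz / SPECint as a rough CPI estimate."""
    return clock_mhz / spec_int


cpi_without = approx_cpi(CLOCK_MHZ, SPECINT_NO_L2)
cpi_with = approx_cpi(CLOCK_MHZ, SPECINT_WITH_L2)
cpi_saved = cpi_without - cpi_with

# ASSUMPTION: hypothetical on-chip misses per instruction. It is the same in
# both configurations (observation 2), so only the miss penalty can differ.
MISSES_PER_INSTRUCTION = 0.03

# From CPI = base + misses_per_instruction * penalty, the CPI difference
# implies this reduction in average miss penalty (in cycles).
penalty_saved = cpi_saved / MISSES_PER_INSTRUCTION

print(f"approx. CPI without external cache: {cpi_without:.2f}")
print(f"approx. CPI with external cache:    {cpi_with:.2f}")
print(f"implied miss-penalty reduction:     {penalty_saved:.1f} cycles "
      f"(at an assumed {MISSES_PER_INSTRUCTION} misses/instruction)")
```

Both CPI estimates land close to 2.0, as the post says. The implied penalty reduction scales inversely with whatever miss rate is assumed, so it should be read as a shape of the argument rather than a measurement.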
mash@mips.com (John Mashey) (03/05/91)
Oops, I forgot. A really good analysis of memory hierarchy design is:

Steven A. Przybylski, CACHE AND MEMORY HIERARCHY DESIGN: A Performance-Directed Approach, Morgan Kaufmann, San Mateo, CA, 1990.

There's a lot of analysis on multi-level caches.

--
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086