johnw@astroatc.UUCP (John F. Wardale) (12/08/87)
John Mashey (who is always interesting) writes:
> Of delivered RISC machines, the ones that use standard
> SRAMs {MIPS, SPARC} outperform those that have special-purpose
> cache-mmu chips {Clipper}. The AMD29000 (no special RAM chip designs)
> and Motorola 78000 {cache/mmu chips} will add a few more data points.

I always thought that the "problem" with the Clipper was that its path
to memory was too long (long latency from virtual-address to
data-available).

Is this an effect of the cache/mmu chips, or a more separable issue?
In other words, can one make a good, fast RISC box with mmu/cache
chips and still maintain good memory response (a la MIPS and maybe
AMD29000)?

Any opinions, net-speculation, etc.??  (John Mashey--comment?)
-- 
John Wardale
... {seismo | harvard | ihnp4} ! {uwvax | cs.wisc.edu} ! astroatc!johnw

To err is human, to really foul up world news requires the net!
mash@mips.UUCP (John Mashey) (12/13/87)
In article <636@astroatc.UUCP> johnw@astroatc.UUCP (John F. Wardale) writes:
	...kind words....
>> Of delivered RISC machines, the ones that use standard
>> SRAMs {MIPS, SPARC} outperform those that have special-purpose
>> cache-mmu chips {Clipper}. The AMD29000 (no special RAM chip designs)
>> and Motorola 78000 {cache/mmu chips} will add a few more data points.

Note: Colin Plumb pointed out to me that there was supposed to be a
cache chip [AM29062] that was being worked on.  I should have been more
precise in what I said in the first place.  The 29K literature says that
all sorts of memory configurations are possible, unlike:
	Clipper, where the design clearly needs the CAMMU chips.
	78K, where 78200 CMMU chips are also clearly expected
	(I don't have the info to know if you can use the 78K alone easily).

Thus, there are perhaps 3 combinations dealing with external caches:
1) CPU and cache-mmu chips are clearly a set, and the CPU is difficult
   or impossible to use alone, at least in virtual-memory systems that
   are fast.  Clipper; 78000+78200(?)
2) CPU might have special cache chips, or else builds caches out of
   standard SRAMs plus glue.  29K (?); SPARC (?)
3) CPU just uses standard SRAMs for caches, i.e., cache control is
   on-chip.  R2000.

>I always thought that the "problem" with the clipper was that its
>path to memory was too long: (long latency Virtual-addr to
>data-available).
>Is this an effect of the cache/mmu chips, or a more separable
>issue?  In other words, can one make a good, fast RISC box with
>mmu/cache chips, and still maintain good memory response (a la MIPS
>and maybe AMD29000)?

This still seems unresolved: one must always be careful to consider the
implementation, i.e., there are plenty of reasonable architectural ideas
whose implementations cause them to look worse than they are.  That's
one of the reasons why one wants more data points.

For example, there are some gross structural similarities between the
Clipper and the 78K [whose existence Motorola finally admitted in public
last week, although not in a named fashion].  The gross similarities:
a) CPU = CPU + on-chip floating-point
b) Cache-mmu chips [separate I&D], with physical caches.

From the skimpy simulated performance data available [simulated 1.0
Dhrystones, @ 20MHz, under varying optimizations], plus the implication
of 2-cycle loads from some of the pipeline charts, we'd expect that a
20MHz 78K would probably be what we'd call 10-(12-minus) mips, i.e., a
little faster than the 15MHz R2000 in our M/1000 boxes that have long
memory latencies, but slower than a 16.7MHz one with shorter latencies.
If the compilers don't get where they think they will, then it might be
less.  **WARNING** all of this is extrapolating from really skimpy data.
Presumably we'll know more when it really gets announced.

With regard to the general issue [partitioning, and cache-response],
one can observe that there is always a tug-of-war between two things:
a) The wish to keep minimal-cycle latencies between CPU and caches,
   for loads, stores, and I-fetches.
b) The wish to minimize the performance degradation caused by cache
   misses, sometimes by
   b1) Lowering the refill penalty, OR
   b2) Lowering the cache-miss rate.
Items in b1) include widening the memory bus, using longer cache lines
(up to some point), trickery with the order of fetching, lowering the
latency to memory (hard: DRAMs get bigger, but not much faster), etc.
Most of these are somewhat or mostly independent of a).  Items in b2)
include making the caches bigger, or making them more set-associative.
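A rough back-of-the-envelope way to see the a)-vs-b) tug-of-war is the
usual average-memory-access-time arithmetic, AMAT = hit time + miss
rate * miss penalty.  The little C sketch below plugs in invented round
numbers (they are not measurements of any of the machines discussed)
just to show how b1) and b2) trade off against a):

/*
 * Back-of-the-envelope average memory-access time, in CPU cycles:
 *     AMAT = hit_time + miss_rate * miss_penalty
 * The numbers below are invented round figures for illustration only,
 * not measurements of the R2000, Clipper, 29K, or 78K.
 */
#include <stdio.h>

static double amat(double hit_time, double miss_rate, double miss_penalty)
{
    return hit_time + miss_rate * miss_penalty;
}

int main(void)
{
    /* Baseline: 1-cycle hits, 5% misses, 12-cycle refill. */
    printf("baseline            %.2f cycles\n", amat(1.0, 0.05, 12.0));

    /* b1) Halve the refill penalty (e.g., a wider memory bus). */
    printf("wider bus           %.2f cycles\n", amat(1.0, 0.05, 6.0));

    /* b2) Halve the miss rate (bigger or more associative cache)... */
    printf("bigger cache        %.2f cycles\n", amat(1.0, 0.025, 12.0));

    /* ...but if associativity adds a cycle to every hit, a) loses. */
    printf("2-way, slower hit   %.2f cycles\n", amat(2.0, 0.025, 12.0));

    return 0;
}

With those made-up numbers, halving either the refill penalty or the
miss rate helps about equally, but paying an extra cycle on every hit
to get the lower miss rate is a net loss -- which is exactly the
loads/stores penalty that most associative-cache designs pay, as noted
below.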
Bigger caches (up to a point) are easy, but then they start to cost
board space, and then there can be cache-bus electrical problems,
although these can be pushed out with design sneakiness.  Making them
more associative definitely lowers the cache-miss rate, but most of the
associative-cache designs cost you time on loads/stores, or require
other complexities [i.e., a lot of hardware].

If you have special-purpose cache-mmu chips [possible BIAS on], you may
well NOT be able to track the speed and cost decreases of the SRAM
vendors, because what you're essentially building is lots of SRAM with
some other special circuitry, while the SRAM vendors are hammering the
costs down and lowering the cycle times by making large volumes of
them.  (The MIPS bias is to build CPUs with cache control on them and
rely on the SRAM vendors for the caches; it's no accident that our
silicon partners are very, very good SRAM folks!)

The other issue with special cache/mmu chips is whether or not you can
gang them together.  If you can't, you can be horribly shocked when you
throw large, demanding applications, or UNIX kernel code, at small
caches.  The earliest Clipper numbers (from InterGraph, a year ago)
made it look like what we'd call 3.5-4-mips user-level, and maybe
1.5-mips for UNIX kernel-heavy mixes (skimpy data, may have changed
since then).  [possible BIAS off]

Anyway, the design tradeoff often ends up being a tug of war between
wanting to reduce the miss rate (via associativity, for example) while
wanting NOT to impact the cpu-cache interface, while trying to deal
with cost and board-space issues, and trying to outguess semiconductor
trends.....a whole bunch of fun, but not for anyone who wants to take
many years to design systems!
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
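As a toy illustration of the associativity point above, the C sketch
below runs one invented address trace through a direct-mapped cache and
a 2-way set-associative LRU cache of the same total size; the sizes and
the trace are made up for the example and are not modeled on the
Clipper, 78K, 29K, or R2000:

/*
 * Toy miss-rate comparison: a direct-mapped cache and a 2-way
 * set-associative LRU cache, both 8 lines of 16 bytes.  Everything
 * here is invented for illustration, not a model of a real machine.
 */
#include <stdio.h>

#define NLINES     8               /* total lines in each cache       */
#define LINE_BYTES 16
#define NSETS_2W   (NLINES / 2)    /* 2-way cache: 4 sets of 2 ways   */

static unsigned dm_tag[NLINES];             /* direct-mapped tags     */
static int      dm_valid[NLINES];
static unsigned sa_tag[NSETS_2W][2];        /* 2-way tags             */
static int      sa_valid[NSETS_2W][2];
static int      sa_lru[NSETS_2W];           /* which way is LRU       */

static int dm_access(unsigned addr)         /* returns 1 on a miss    */
{
    unsigned line = addr / LINE_BYTES;
    unsigned idx  = line % NLINES;
    unsigned tag  = line / NLINES;

    if (dm_valid[idx] && dm_tag[idx] == tag)
        return 0;
    dm_valid[idx] = 1;
    dm_tag[idx]   = tag;
    return 1;
}

static int sa_access(unsigned addr)         /* returns 1 on a miss    */
{
    unsigned line = addr / LINE_BYTES;
    unsigned set  = line % NSETS_2W;
    unsigned tag  = line / NSETS_2W;
    int w;

    for (w = 0; w < 2; w++)
        if (sa_valid[set][w] && sa_tag[set][w] == tag) {
            sa_lru[set] = 1 - w;            /* other way becomes LRU  */
            return 0;
        }
    w = sa_lru[set];                        /* miss: replace LRU way  */
    sa_valid[set][w] = 1;
    sa_tag[set][w]   = tag;
    sa_lru[set]      = 1 - w;
    return 1;
}

int main(void)
{
    /* Two half-cache-sized buffers placed one full cache apart, so
       their lines collide pairwise in the direct-mapped cache but
       share sets comfortably in the 2-way cache. */
    unsigned a = 0x1000;
    unsigned b = a + NLINES * LINE_BYTES;
    int pass, i, refs = 0, dm_miss = 0, sa_miss = 0;

    for (pass = 0; pass < 10; pass++)
        for (i = 0; i < NSETS_2W * LINE_BYTES; i += LINE_BYTES) {
            dm_miss += dm_access(a + i);
            sa_miss += sa_access(a + i);
            dm_miss += dm_access(b + i);
            sa_miss += sa_access(b + i);
            refs += 2;
        }

    printf("direct-mapped: %d misses in %d references\n", dm_miss, refs);
    printf("2-way LRU:     %d misses in %d references\n", sa_miss, refs);
    return 0;
}

On this trace the direct-mapped cache misses on every reference (the
two buffers collide pairwise in it), while the 2-way cache misses only
on the first pass -- the miss-rate win that associativity buys, before
you account for whatever it costs the cpu-cache interface.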