sritacco@hpdmd48.HP.COM (Steve Ritacco) (02/22/90)
Ok, let's talk some architecture stuff. Why is it that the RIOS has a
bigger data cache than instruction cache? This defies conventional
wisdom. Data caches are less effective than instruction caches and are
usually made small because their hit ratio doesn't increase with size
as rapidly as an instruction cache's. If I had to guess what is going
on, I would guess that access to the I-cache is very wide, to support
super-scalar, so they crammed all they could on the CPU chip.

This seems to be pretty effective. IBM has shown that the super-scalar
architecture works, which up to this point I wasn't convinced of. The
benefits are tangible. A 20 MHz CPU with 8K I-cache and 32K D-cache
SPECmarked at 22-something. That is quite impressive. The R3000, which
to date seemed the most efficient CPU/system implementation, has been
displaced for the moment.

I don't doubt that the R4000 will out-perform RIOS; there is one thing
I wonder about, though. Will the R4000 out-perform RIOS by brute force,
that is, high integration and very high clock speeds, or will it beat
it by providing greater architectural efficiency? The only SPARC
implementations that beat MIPS have a major clock-speed advantage (not
to mention higher implementation cost).

One other concern is the hype associated with super-scalar. In the EE
Times article someone from MIPS stated that the R4000 was going to be
super-scalar. That's the first time I had heard that. It made me wonder
if it is true, or just an attempt to ride the hype wave. If better
performance can be had with a simpler (non-super-scalar) design due to
less complexity, why not tell the world? That is what the whole
RISC-CISC thing was about in the first place. The America chip set
seems quite complex to me! How does the complexity of a design like it
compare with CISC complexity? Is it more manageable for some reason?

... On a side note, does IBM think workstation boxes need to be ugly to
be impressive, or was their industrial design team out to lunch?
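[The hit-ratio argument above can be made concrete with a toy
average-memory-access-time (AMAT) calculation. The one-cycle hit time,
20-cycle miss penalty, and hit rates below are illustrative assumptions
only, not figures for the RIOS, R3000, or any other machine in this
thread.]

```python
# Toy AMAT model: AMAT = hit_time + miss_rate * miss_penalty.
# All parameters are invented for illustration; the point is only the
# shape of the curve, i.e. the diminishing returns of ever-higher hit
# rates on a fast cache.

def amat(hit_rate, hit_time=1, miss_penalty=20):
    """Average memory access time in cycles."""
    return hit_time + (1 - hit_rate) * miss_penalty

for h in (0.90, 0.95, 0.98, 0.99):
    print(f"hit rate {h:.2f}: AMAT = {amat(h):.2f} cycles")
```

Going from a 90% to a 95% hit rate saves a full cycle per access under
these assumptions, while going from 98% to 99% saves only 0.2 cycles -
which is Glew's diminishing-returns point below, made numerically.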
------------------------------------ These comments are only my own, and do not reflect the views of my employer ... ____________________________________
aglew@oberon.csg.uiuc.edu (Andy Glew) (02/23/90)
>From: sritacco@hpdmd48.HP.COM (Steve Ritacco)
>
>Why is it that the RIOS has a bigger data cache than instruction cache?
>This defies conventional wisdom. Data caches are less effective than
>instruction caches and are usually made small because their hit ratio
>doesn't increase with size as rapidly as an instruction cache's. If I
>had to guess what is going on, I would guess that access to the I-cache
>is very wide, to support super-scalar, so they crammed all they could
>on the CPU chip. This seems to be pretty effective. IBM has shown that
>the super-scalar architecture works, which up to this point I wasn't
>convinced of. The benefits are tangible. A 20 MHz CPU with 8K I-cache
>and 32K D-cache SPECmarked at 22-something. That is quite impressive.
>The R3000, which to date seemed the most efficient CPU/system
>implementation, has been displaced for the moment.

Huh?!? I-caches get better hit rates than D-caches, but you quickly
reach a point of diminishing returns - your I-cache hit rate is so good
that improving it doesn't make much difference to your performance.
Moreover, if your memory system cycles at a rate comparable to your
processor, but just has a large latency, then you can suck instructions
out of memory, except at branches.

I just went to a talk by Mudge of Michigan, whose group is building a
GaAs MIPS 6000, where the speaker had to justify an I-cache bigger than
the D-cache. In that case, they wanted a direct-mapped, virtually
indexed primary D-cache, so the D-cache size was limited by the page
size of the architecture, to avoid synonym problems. Since you don't
worry about synonyms in the I-cache, the I-cache could be made larger.
--
Andy Glew, aglew@uiuc.edu
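[The synonym constraint Glew describes can be sketched numerically: a
direct-mapped, virtually indexed cache avoids synonyms as long as its
index bits fall entirely inside the untranslated page offset. The 4 KB
page and 32-byte line below are illustrative assumptions, not the
Michigan design's actual parameters.]

```python
# Sketch: when is a virtually indexed, direct-mapped cache free of
# synonym problems? If (index bits + line-offset bits) fit within the
# page-offset bits, the index is the same in the virtual and physical
# address, so two virtual aliases of one physical line always map to
# the same cache set. Sizes here are assumptions for illustration.

PAGE_SIZE = 4096   # bytes; assumed architecture page size
LINE_SIZE = 32     # bytes per cache line; assumed

def index_is_physical(cache_size):
    """True if the cache index lies entirely within the page offset,
    i.e. the index is untranslated and synonyms cannot occur."""
    sets = cache_size // LINE_SIZE
    index_bits = sets.bit_length() - 1            # log2(number of sets)
    offset_bits = LINE_SIZE.bit_length() - 1      # log2(line size)
    page_offset_bits = PAGE_SIZE.bit_length() - 1
    return index_bits + offset_bits <= page_offset_bits

for size in (2048, 4096, 8192, 16384):
    print(f"{size:>5}-byte D-cache: synonym-free = {index_is_physical(size)}")
```

Under these assumptions the D-cache tops out at the 4 KB page size,
while the I-cache, where synonyms don't matter, can grow past it - which
is exactly the asymmetry Glew reports.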
mash@mips.COM (John Mashey) (02/24/90)
In article <14900004@hpdmd48.HP.COM> sritacco@hpdmd48.HP.COM (Steve
Ritacco) writes:
>Ok, let's talk some architecture stuff.
>Why is it that the RIOS has a bigger data cache than instruction cache?
>This defies conventional wisdom. Data caches are less effective than
>instruction caches and are usually made small because their hit ratio
>doesn't increase with size as rapidly as an instruction cache's. If I
>had to guess what is going on, I would guess that access to the I-cache
>is very wide, to support super-scalar, so they crammed all they could
>on the CPU chip. This seems to be pretty effective. IBM has shown that
>the super-scalar architecture works, which up to this point I wasn't
>convinced of. The benefits are tangible. A 20 MHz CPU with 8K I-cache
>and 32K D-cache SPECmarked at 22-something. That is quite impressive.
>The R3000, which to date seemed the most efficient CPU/system
>implementation, has been displaced for the moment.

1) Super-scalar works, at least for getting at more of the low-level
parallelism of FP code. This is clearly shown by the IBM systems.

2) It's not clear that it works as well for integer code, or that their
specific case does. The possibilities are:
	a) Compilers will get better for integer code (likely).
	b) Compilers will get better for integer code by a lot (unlikely).
It is always possible that there's a whole lot of mileage still to be
gained, but past experience says to doubt it; it's not as if this
compiler technology is raw new technology: IBM has been doing excellent
optimization for a long time. Certainly, our experience has been that
most of the micro-level scheduling improvements over the last few years
have been in the FP area rather than the integer area. Anyway, I'd
counsel keeping an open mind, but I'd also advise against simply
believing some IBM marketing guy who says "Oh, we haven't really taken
advantage of that" - that it's going to get magically better.
On the other hand, if one of their good technical folks like Marty
Hopkins says there's a big jump coming, then one should pay serious
attention.

3) In general, this does raise an interesting issue, which is comparing
cache sizes. Some people build smaller, special-purpose,
N-way-set-associative caches; some people build various-sized,
direct-mapped caches from standard SRAM. Both ways are legitimate, and
there are various tradeoffs in terms of power, space, and cost. One
thing to be careful of is saying that some design "did it with a small
cache," because one would also want to know how much a special-purpose
cache chip costs. I don't claim to be unbiased: I like using standard
SRAMs, because they always get cheap, and so far I think chip costs
argue in my favor, but there are legitimate reasons for doing it the
other way, too. Of course, we're not likely to know for sure the cost
of the IBM chips, so it's not so easy to compare.

Note, just in case anyone is misled by the following: there is no
announced product from MIPS called an R4000....

>I don't doubt that the R4000 will out-perform RIOS; there is one thing
>I wonder about, though. Will the R4000 out-perform RIOS by brute
>force, that is, high integration and very high clock speeds, or will
>it beat it by providing greater architectural efficiency? The only
>SPARC implementations that beat MIPS have a major clock-speed
>advantage (not to mention higher implementation cost). One other
>concern is the hype associated with super-scalar. In the EE Times
>article someone from MIPS stated that the R4000 was going to be
>super-scalar. That's the

As far as I know, no one from MIPS who knows has ever publicly said
that it would be super-scalar (or that it wouldn't).
What we generally say is: there are various flavors of multiple-issue
machines - superscalar, superpipelined, and VLIW (or maybe short VLIW,
which is what I'd call the i860 and DN10000, sort of) - that anybody
who wants to be competitive in the current round of chips needs to do
one of these, that of course we've been working on this for several
years and are familiar with the variations, and that we are doing one,
or some combination, but we explicitly refuse to disclose which flavor
we're using in the R?000.

At some panel session within the last year or so, I commented that
architectural simulation is a necessity, because this area is far
beyond human intuition. I said, for instance, that we'd burned huge
numbers of cycles simulating the effects of being able to do various
pairs of instructions simultaneously, and comparing the results, and
that we'd been thinking about "supersonic" pipelines for years.

Actually, what I think has happened is that super-scalar has become
like RISC. That is, for a while lots of people were convinced that if
something had register windows, it was RISC. Right now, almost anything
with an aggressive pipeline gets called super-scalar, because for most
people the implementation nuances are irrelevant.

>first time I had heard that. It made me wonder if it is true, or just
>an attempt to ride the hype wave. If better performance can be had
>with a simpler (non-super-scalar) design due to less complexity, why
>not tell

As usual, I recommend Hennessy's article in the September 89 UNIX
Review; it also contains references to other articles with good studies
of super-scalar issues.
--
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
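[The kind of pairing experiment Mashey mentions - measuring how many
cycles a trace takes if certain pairs of instructions can issue
together - can be sketched in a few lines. The pairing rule (one
integer op may issue alongside one FP op) and the trace below are
invented for illustration; they do not describe any MIPS or IBM
design.]

```python
# Toy dual-issue simulation: walk an instruction trace and count
# cycles, issuing two instructions in one cycle whenever an adjacent
# pair consists of one 'int' and one 'fp' op, else issuing one.
# The pairing rule is a hypothetical example, not a real machine's.

def cycles_dual_issue(trace):
    """Cycles to issue the trace under the assumed int/fp pairing rule."""
    cycles = i = 0
    while i < len(trace):
        if i + 1 < len(trace) and {trace[i], trace[i + 1]} == {"int", "fp"}:
            i += 2  # the mixed pair issues together in one cycle
        else:
            i += 1  # no legal pair: single issue
        cycles += 1
    return cycles

trace = ["int", "fp", "fp", "int", "int", "fp", "int"]
print(len(trace), "instructions in", cycles_dual_issue(trace), "cycles")
# Single issue would take len(trace) cycles; the ratio is the speedup.
```

Real simulations of this kind sweep many pairing rules over large
traces, which is why Mashey describes burning "huge numbers of cycles"
on them.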
mash@mips.COM (John Mashey) (02/24/90)
In article <AGLEW.90Feb22110902@oberon.csg.uiuc.edu>
aglew@oberon.csg.uiuc.edu (Andy Glew) writes:
>I-caches get better hit rates than D-caches, but you quickly reach a
>point of diminishing returns - your I-cache hit rate is so good that
>improving it doesn't make much difference to your performance.
>Moreover, if your memory system cycles at a rate comparable to your
>processor, but just has a large latency, then you can suck
>instructions out of memory, except at branches.

Note, also, that it depends on the kind of programs you're emphasizing.
Systems that run lots of users, execute the kernel a lot, or run big
DBMSs have different characteristics than ones aimed more at technical
number crunching, especially of the linear-algebra type.
--
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
mitch@oakhill.UUCP (Mitch Alsup) (02/27/90)
# Donning flame-retardant suit

In article <36426@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>Note, just in case anyone is misled by the following: there is no
>announced product from MIPS called an R4000....

Aye, Captain!

>As far as I know, no one from MIPS who knows has ever publicly said
>that it would be super-scalar (or that it wouldn't).
.....
>... that anybody who wants to be competitive in the current round of
>chips needs to do one of these ...

Does this imply what it appears to imply?

>............ I said, for instance, that we'd
>burned huge numbers of cycles simulating the effects of being able to
        ^^^^^^^^^^^^^^^^^^^^^
Presumably all of these CPU cycles are limited in performance by the
capability of the machines to process double-precision floating-point
numbers? :-)

>do various pairs of instructions simultaneously, and comparing results,
            ^^^^^
Only pairs????? Cf. the IBM 6000.

>and that we'd been thinking about "supersonic" pipelines for years.
                                    ^^^^^^^^^^
Do you require afterburners to achieve the "supersonic" transition? ;-)
Or have you kept the Reynolds number low?

# Removing flame-retardant suit

>-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>

Mitch Alsup		M88000 Design Group
Mike Shebanow		M88000 Design Group
DISCLAIMER: <We speak for ourselves only>