hansen@mips.UUCP (Craig Hansen) (05/14/87)
In article <16658@amdcad.AMD.COM>, phil@amdcad.AMD.COM (Phil Ngai) writes:
> Let's not forget about a branch target cache. With memories capable of
> supplying high bandwidth (nibble mode, page mode, video memories, etc)
> all you really need is to handle the start up time. By caching only
> about four instructions at the branch target, you can run loops at
> full speed no matter how long they are. By caching several branch
> targets you can handle nested loops, etc.

Don't forget that you must interleave load and store references into that
memory system too. A "real man's" memory system with ECC, of a serious size,
and bandwidth for I/O transactions doesn't respond to a memory request in
80 nanoseconds. (This assumes two cycles in on/off chip delays and branch
target selection.) This means that you'd better cache more than four
instructions in your branch target cache, and the average basic block size
is in the range of six to eight instructions. (This is for MIPS
instructions; AMD instructions would be more because of their weaker
load/store/branch instructions.) The branches that terminate these basic
blocks are taken more than 50% of the time, so I don't see how a branch
target cache performs comparably to an instruction cache in a real system
environment.

I guess the AMD folks are excited about their new child, but when reality
sets in, and you try to build a real UNIX-based computer system out of this
fine controller part, you'll be sorely disappointed if you expected 17 MIPS
at 25 MHz with a nibble-mode DRAM as your primary memory system. Based on
their benchmarks (puzzle, dhrystone, and a tiny piece of linpack; sorry, but
that's all they've got to compare with...) with an external cache, external
cache control hardware, and a fast main memory system, a 29000 at 25 MHz that
you can build next year doesn't perform any faster than a MIPS R2000 system
at 16.7 MHz that you can build now.

-- 
Craig Hansen
Manager, Architecture Development
MIPS Computer Systems, Inc.
...decwrl!mips!hansen

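[Editorial sketch, not part of the original post.] The 80 ns figure in the
post above appears to follow from simple arithmetic: at 25 MHz a cycle is
40 ns, a four-instruction branch target entry covers four cycles, and two of
those go to on/off chip delays and branch target selection, leaving two
cycles (80 ns) for the memory to respond. The Python below reconstructs that
reasoning under stated assumptions: one instruction issued per cycle, the
memory access starting when the branch is taken, and a two-cycle overhead
taken from the parenthetical in the post. The function names and the 200 ns
example latency are hypothetical, chosen only for illustration.

```python
import math


def max_latency_covered_ns(clock_mhz, btc_instructions, overhead_cycles=2):
    """Longest memory latency (ns) one branch target entry can hide.

    Assumes one instruction per cycle; overhead_cycles models the on/off
    chip delays and branch target selection mentioned in the post.
    """
    cycle_ns = 1000.0 / clock_mhz
    return max(0, btc_instructions - overhead_cycles) * cycle_ns


def btc_entries_needed(clock_mhz, memory_latency_ns, overhead_cycles=2):
    """Instructions per branch target entry needed to avoid a stall."""
    cycle_ns = 1000.0 / clock_mhz
    memory_cycles = math.ceil(memory_latency_ns / cycle_ns)
    return overhead_cycles + memory_cycles


if __name__ == "__main__":
    # Four cached instructions at 25 MHz hide at most 80 ns of latency.
    print(max_latency_covered_ns(25, 4))   # 80.0
    # A hypothetical 200 ns memory system would need seven per entry.
    print(btc_entries_needed(25, 200))     # 7
```

Under this model, any memory system slower than 80 ns stalls every taken
branch, which is the basis for the claim that four instructions per target
are not enough.
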
bcase@apple.UUCP (Brian Case) (05/15/87)
In article <388@dumbo.UUCP> hansen@mips.UUCP (Craig Hansen) writes:
>instructions). The branches that terminate these basic blocks are taken
>more than 50% of the time, so I don't see how a branch target cache performs
>comparably to an instruction cache, in a real system environment.

it can perform comparably because, logically, part of the instruction cache
is in the external rams. the branch target cache is only there to cover the
latency of the initial access in that external ram (how many times do we
have to say this?). so, in essence, the Am29000 branch target cache can
cache 32 loops of *any* size. this is most certainly not true of traditional
instruction caches. on the other hand, this is not necessarily a very
important attribute; the point is that a branch target cache can perform,
and in fact does perform, as well as an instruction cache. i feel that it
should be i (or someone defending the am29000) who should be asking *you* to
defend a traditional instruction cache relative to the branch target cache
(always assuming an external burst mode memory for the branch target cache,
of course. the burst mode memory is a critical part of the concept.).

>I guess the AMD folks are excited about their new child, but when reality
>sets in, and you try to build a real UNIX-based computer system out of this
>fine controller part, you'll be sorely disappointed if you expected 17 MIPS
>at 25 MHz with a nibble-mode DRAM as your primary memory system. Based on

*not* nibble-mode, video dram; there is a big difference.

>their benchmarks (puzzle, dhrystone, and a tiny piece of linpack; sorry, but
>that's all they've got to compare with...) with an external cache, external
>cache control hardware, and a fast main memory system, a 29000 at 25 MHz that
>you can build next year doesn't perform any faster than a MIPS R2000 system
>at 16.7 MHz that you can build now.

i will be one of the first to say that mipsco has a good part. you guys have
done a great job in the sense that you have gotten great performance at
lower clock rates. but what happens to your bus at 25, 30, 35 mhz? i'm not
saying that you guys are going to fail to fix things, but let's not be
casting stones! how do you know that amd will be sorely disappointed in its
expectations of 17 mips at 25 mhz? there are accurate simulations to support
just this data! i am not trying to say you are wrong, but don't just make
the statement, back it up with solid data or at least some observations you
have made about problems with the architecture/implementation! please,
follow the lead of your co-worker john mashey.

sorry, i know the note i am responding to is old, so if this issue has
already been put to bed, forgive me. i have been off the net for a while
because of a job change.

bcase

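[Editorial sketch, not part of the original post.] The "loops of *any* size"
claim above can be illustrated with a toy cycle count. This is a deliberately
crude model, not a description of the Am29000 or of any MIPS part. It assumes
one instruction per cycle, a burst-mode memory that streams sequential
instructions with no stall after a fixed startup latency, and, for
comparison, a small fully associative LRU instruction cache with
one-instruction lines that pays the full miss penalty on every miss. It
ignores the load/store contention raised earlier in the thread. All function
names and parameter values are hypothetical.

```python
from collections import OrderedDict


def btc_loop_cycles(loop_len, iterations, entry_len, startup_latency):
    """Cycles to run a loop with a branch target cache plus burst memory.

    The first entry into the loop misses the branch target cache and pays
    the full startup latency. Later iterations hit, and the cached
    entry_len instructions hide that many cycles of startup latency.
    Intended for loops at least entry_len instructions long.
    """
    cycles = 0
    target_cached = False
    for _ in range(iterations):
        if target_cached:
            stall = max(0, startup_latency - entry_len)
        else:
            stall = startup_latency
            target_cached = True
        cycles += stall + loop_len
    return cycles


def icache_loop_cycles(loop_len, iterations, cache_size, miss_penalty):
    """Cycles to run the same loop from a small LRU instruction cache."""
    cache = OrderedDict()
    cycles = 0
    for _ in range(iterations):
        for pc in range(loop_len):
            if pc in cache:
                cache.move_to_end(pc)          # mark as most recently used
                cycles += 1
            else:
                cycles += 1 + miss_penalty
                cache[pc] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
    return cycles


if __name__ == "__main__":
    for loop_len in (50, 100):
        print(loop_len,
              btc_loop_cycles(loop_len, 10, entry_len=4, startup_latency=4),
              icache_loop_cycles(loop_len, 10, cache_size=64, miss_penalty=4))
```

In this model the branch target cache scheme costs the same per iteration
whatever the loop length, while the 64-entry cache degrades sharply once the
loop no longer fits. Whether the four-instruction entry actually covers the
startup latency of a realistic memory system is exactly the point in dispute
in this thread, and a real instruction cache with multi-word line refill
would fare better than this one does.
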
howard@cpocd2.UUCP (Howard A. Landman) (05/20/87)
In article <3460001@hpsrla.HP.COM> brucek@hpsrla.HP.COM (Bruce Kleinman) writes:
>CISC CPUs tend to be, uh,
>rather complex. And, therefore, the CPU usually dominates the chip.
>
>RISC CPUs tend to be rather simple. And,
>therefore, the CPU is usually a small portion of the chip.

This is approximately true, depending on what you mean by "CPU". However,
I find that it is usually far more enlightening to consider the fraction of
the chip dedicated to computation/storage versus the fraction dedicated to
control.

Last time I looked, every CISC microprocessor ever made spent 50% to 75% of
the chip area on control, with an average of about 2/3. Usually a big chunk
of this is the ucode ROMs. This leaves only 25% to 50% of the chip to do the
useful work. It's easy to see this in most chip photos. Look for the
datapath and register file, squeezed into one end of the chip. Everything
else is control.

I don't have numbers for all the RISC processors, but the RISC I spent 6% of
its area on control; roughly one-tenth the CISC percentage. And I'm pretty
sure MIPS is well under half. Numbers, Craig? Anyone?

-- 
Howard A. Landman
...!intelca!mipos3!cpocd2!howard
howard%cpocd2%sc.intel.com@RELAY.CS.NET
"You just ask them?"

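[Editorial sketch, not part of the original post.] The "roughly one-tenth"
comparison above can be checked directly from the figures quoted in the
post: 6% control area for RISC I against a CISC average of about 2/3. The
helper names below are hypothetical and the inputs are only the numbers
stated in the post, not independent measurements.

```python
def control_ratio(risc_control, cisc_control):
    """Ratio of RISC control-area fraction to CISC control-area fraction."""
    return risc_control / cisc_control


def useful_fraction(control):
    """Fraction of the die left for datapath and storage."""
    return 1 - control


if __name__ == "__main__":
    # RISC I at 6% versus the quoted CISC average of about 2/3.
    print(round(control_ratio(0.06, 2 / 3), 2))          # 0.09
    # The CISC range of 50% to 75% control leaves 25% to 50% for useful work.
    print(useful_fraction(0.75), useful_fraction(0.50))  # 0.25 0.5
```
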
brucek@hpsrla.HP.COM (Bruce Kleinman) (05/22/87)
+---------
| Last time I looked, every CISC microprocessor ever made spent
| 50% to 75% of the chip area on control, with an average of about 2/3.
| [....]
| This leaves only 25% to 50% of the chip to do useful work.
+---------

My point exactly. Consider the newest Transputer, the T8 I believe. A RISCy
chip which happens to be microcoded, curiously enough. Very tight
instruction set, and a stack-like programming model. Fixed-width
instructions, a scant 8 bits wide. Honest.

Sound simple? It is. The instruction decode and microcode occupy about 10%
of the die. Hmmm, what to do with the rest of that real estate. The Inmos
folks decided on 4K bytes of RAM, a floating point unit, a DRAM controller,
and four high-speed serial links. They have experimented with a version that
replaces the floating point unit with a Winchester controller.

My point here is not to glorify the Transputer (although I am exceedingly
impressed by the chip). I use it to illustrate the possibilities opened up
by "simple instruction set computers" as a whole. MIPS II has a 4K byte
cache. The AMD 29000 has 192 registers, a branch target cache, and an MMU.
The Fairchild Clipper has a floating point unit (the 8K bytes of cache and
the pair of MMUs are on two separate dies). The possibilities seem nearly
endless.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Bruce Kleinman
Hewlett Packard -- Network Measurements Division
Santa Rosa, California
....hplabs!hpsrla!brucek
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~