mark@mips.UUCP (Mark G. Johnson) (12/16/87)
Quoting from author jesup@pawl22.pawl.rpi.edu (Randell E. Jesup) in article <140@imagine.PAWL.RPI.EDU> of comp.arch on date 13 Dec 87 13:02:20 GMT
> Given current technology, r2000 could probably be scaled
> to about 20 MHz. However, custom RISC designs in CMOS are
> now reaching 40 MHz, which would be impossible with the
> double-clocked interface currently on the r2000. Perhaps
> the interface could be removed, given enough pins, but
> that gets you back into the packaging limits.

"Impossible" is quite a strong word. "Difficult", sure. But he's saying that a 2.4X improvement of a first-chip-designed-at-a-startup-company, 2-micron-generic-silicon-foundry device is IMPOSSIBLE.

A few things might change :-) :-) between now (16.7 MHz) and 40 MHz. Principal among these is experience; several different systems using this double-clocked approach have now been built (by SGI, MIPS, and others) and their properties have been measured and analyzed. Weaknesses, if any :-) :-), can be improved, and strengths can be exploited.

Other factors conspire to make the job of building a 40 MHz double-clocked interface not "impossible":

1. Cache RAM access times will continue to decrease, likely at the same rate as the processor clock, since SRAM vendors now build RISC chips (including SPARC, R2000, Am29K). So RAM access time will probably stay at 40-50% of processor cycle time. {presently 60 ns cycle, 25-30 ns RAM access}. The rest of the cycle is used up by setup & hold times, bus drive (slew) times, timing uncertainties, and "margin".

2. Surface mount packages (having the *same number* of leads, 144) might be used instead of the current 144 pin Pin Grid Array. Their lower inductance and better controlled impedance can decrease dispersion and improve signal quality. Such packages, available today, are more than 2.4X better than the existing PGA package, so the net percentage of the cycle wasted in package-induced "timing slop" would decrease.

3. Output voltage drive levels might shrink from the present 0.0 volts and 5.0 volts, to (an example) 0.4V and 2.7V. This speeds up output transitions (dT = dV * C/I) without increasing switching noise. Less of the cycle (in percentage terms) would be spent slewing the bus around.

4. The clock generation and distribution technology may get a factor of > 2.4X more precise. If this happened, the fraction of the cycle lost to "slop" (timing edge uncertainty) would go down.

5. BiCMOS fab processes might be employed, permitting the use of open emitter, Wired-OR interfacing to ECL-compatible cache RAM chips. Current ECL RAMs are about 15-20 nsec, a smaller fraction of the current processor cycle (60 nsec) than current MOS RAM chips. So the timing margins *improve*. Additionally, multiple-driver "collisions" or "contention" are non-deadly in the wired-OR ECL structure, (unlike CMOS tristate busses), such that the time between disabling X and enabling Y onto the bus, can be reduced dramatically. And there's reason to believe that BiCMOS RAM access times will scale at about the same rate as traditional CMOS RAMs.

Taken individually or as a group, these scenarios indicate (at least to me) that 40 MHz "double clocked cache interfaces" are indeed possible, and might in fact be as robust (or moreso) than existing implementations at 16.7 MHz.

Regards,

-Mark Johnson *** DISCLAIMER: The opinions above are personal. ***
UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!mark TEL: 408-720-1700 x208
US mail: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
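As a rough numeric check on points 1 and 3 of the post above, the slew relation dT = dV * C/I and the RAM-access fraction can be worked through in a few lines. This is only a sketch: the 50 pF bus load and the 50 mA drive current are assumed illustrative values and do not come from the post. The voltage levels, the 60 ns cycle, and the 25-30 ns access times are the ones quoted.

```python
# Sketch of the arithmetic in points 1 and 3. LOAD_PF and DRIVE_MA are
# ASSUMED illustrative values; they do not come from the original post.
LOAD_PF = 50.0    # assumed bus capacitance, picofarads
DRIVE_MA = 50.0   # assumed output drive current, milliamps
CYCLE_NS = 60.0   # processor cycle time quoted in the post


def slew_time_ns(dv_volts, load_pf, drive_ma):
    """dT = dV * C / I. With C in pF and I in mA the result is in ns."""
    return dv_volts * load_pf / drive_ma


def cycle_fraction(time_ns, cycle_ns):
    """Fraction of one processor cycle consumed by time_ns."""
    return time_ns / cycle_ns


full_swing = slew_time_ns(5.0 - 0.0, LOAD_PF, DRIVE_MA)     # 0 V to 5 V
reduced_swing = slew_time_ns(2.7 - 0.4, LOAD_PF, DRIVE_MA)  # 0.4 V to 2.7 V

print(f"full swing slew:    {full_swing:.2f} ns")
print(f"reduced swing slew: {reduced_swing:.2f} ns")
print(f"RAM access share of cycle: "
      f"{cycle_fraction(25.0, CYCLE_NS):.0%} to {cycle_fraction(30.0, CYCLE_NS):.0%}")
```

For a fixed load and drive current the slew time scales directly with the voltage swing, so the reduced swing takes 2.3/5.0 of the full-swing time whatever values are assumed for C and I.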
oconnor@sunray.steinmetz (Dennis M. O'Connor) (12/18/87)
An article by hansen@mips.UUCP (Craig Hansen) says
-] Quoting jesup@pawl22.pawl.rpi.edu (Randell E. Jesup) :
-]] The real problem is the fact that your chip edge is clocked at twice your
-]] instruction frequency. Running a higher-speed clock than the instruction
-]] rate is fine, and makes internal design much easier. However, packaging
-]] technology will be your limiting factor for some time to come, not really
-]] ram speed per se. For the large number of pins required, it is hard to
-]] find packages certified at that speed.
-]
-]For speeds well above 40 MHz in CMOS technology, our studies suggest that this
-]will not be a limiting factor at all ...

You missed the point. The relevant number isn't 40MHz, it's 40MIPS. To achieve that, you'll need 80 MHz. Mr. Jesup ( hi Randell ) has personal knowledge of a 40MHz 40MIPS CMOS microprocessor, as do I, that is real silicon, really working, right now. Those are of course raw machine MIPS, not equivalent anything MIPS. Check out the upcoming ISSCC conference for more details.

-] ... But the other company's chip had double-frequency clock inputs,
-] too, and when you compared the two chips at their specified clock
-]rate, ours runs more than twice as fast (benchmark-wise), and theirs had double
-]the input clock rate. Talk about double-speak!

The chip Randell and I know of uses a two-phase 40MHz clock to achieve 40 MIPS. The smallest important time interval is 2ns. You'll need, I think, a four-phase 80MHz clock for 40 MIPS. I don't know what the smallest important time interval will be for your clocks.

-] [ ... a bunch of non sequiturs deleted ... ]
-] Because SRAMs are used as technology drivers for new CMOS and BiCMOS
-] technologies, MIPS can be assured of a good supply of highly
-] aggressive SRAMs that will work with the MIPS part.

Actually, GE's 1.25 micron AVLSI CMOS process is one of the best going (ask IBM about it) and never had anything to do with RAMs.
Oh since you brought it up, the machine Randell and I know of uses 20ns CMOS RAMs to run at 40MIPS with no wait states. What will you need?

-]The real problem with the other RISC designs ...
-] [massive generalization deleted]

What limited-omniscience you must have, to be able to tell all of us the ONE TRUE PROBLEM with "other" RISC designs. :-) Baloney. What do you know about MY RISC machine? Nothing. So pay attention at ISSCC and learn a few things.

ALSO, An article by mark@mips.UUCP (Mark G. Johnson) says:
-] Quoting from author jesup@pawl22.pawl.rpi.edu (Randell E. Jesup)
-]] Given current technology, r2000 could probably be scaled
-]] to about 20 MHz. However, custom RISC designs in CMOS are
-]] now reaching 40 MHz, which would be impossible with the
-]] double-clocked interface currently on the r2000. Perhaps
-]] the interface could be removed, given enough pins, but
-]] that gets you back into the packaging limits.
-]
-]"Impossible" is quite a strong word. "Difficult", sure. But he's
-]saying that a 2.4X improvement of a first-chip-designed-at-a-startup-
-]company, 2-micron-generic-silicon-foundry device is IMPOSSIBLE.
-]
-]A few things might change :-) :-) between now (16.7 MHz) and 40 MHz.

No No NO, 40 _MIPS_, not MHz. 80MHz for you guys. And since it is already here, your changes better happen yesterday. Randell, you shoulda said 40MIPS, you know how easily confused people get :-)

-] [ ... standard "learn from experience" stuff deleted ... ]
-]
-]Other factors conspire to make the job of building a 40 MHz
-]double-clocked interface not "impossible":
-]
-] 1. Cache RAM access times will continue to decrease, likely
-] at the same rate as the processor clock, since SRAM vendors
-] now build RISC chips (including SPARC, R2000, Am29K). So
-] RAM access time will probably stay at 40-50% of processor
-] cycle time. {presently 60 ns cycle, 25-30 ns RAM access}.
Anyone who thinks system access time for a given memory system architecture will decrease linearly with RAM access into the tens-of-nanoseconds range is dreaming. Go figure your peak amps per volt of swing when trying to charge 120pF in 3 nanoseconds. Run that through your bonding wires and smoke it :-). Also the idea that there is a linear relationship between how fast a particular fab technology can make RAMs and how fast the technology can make CPUs ignores a lot of important differences between the two. For instance, how much difference do you think having two levels of metal makes to a RAM, and how much to a CPU ? Think about it.

-] 2. Surface mount packages (having [...] leads, 144) might be
-] used instead of the [... PGA]. ... lower inductance ...
-] controlled impedance ... improve signal quality ...
-] more than 2.4X better than the existing PGA package ...
-] "timing slop" would decrease.

Package isn't that important. The dielectric constant of your substrate and the size of your input protection are probably going to limit you first. Besides, surface-mounts beyond 132 or so have a tendency to jump off the board when thermal-cycled. No fun.

-]
-] 3. Output voltage drive levels might shrink ... to (an
-] example) 0.4V and 2.7V. ...speeds up output transitions...
-] without increasing switching noise. Less of the cycle (in
-] percentage terms) would be spent slewing the bus around.

Yes, this is true. But the processor Randell and I know of ( but aren't really allowed to say much about ) uses good old 5V swings, like the fast CMOS RAMs it's hooked to and the Sun that drives talks to it :-) If IT goes to 3V, well, we'll have to see. Don't have crystal ball. Also, what about second order effects in the MOS transistors, and what about noise rejection ? It's just not as simple as lowering the supply.

-] 4. The clock generation and distribution technology may
-] get a factor of > 2.4X more precise ...
-] "slop" (timing edge uncertainty) would go down.
"You cannae change the laws of physics, laws of physics, laws of physics, You cannae change the laws of physics, laws of physics, Captain !" from _Star Trekkin'_, by The Firm.

" Our new supraluminal wave guide... " "Don't put your finger near the board, the stray capacitance will kill the clock system..."

Look, using transmission lines instead of wires is one way to go and yes it would help. But until then you're just doing that old RC game. In the imperfect noisy nonlinear real world. Sigh.

-] 5. BiCMOS fab processes might be employed, permitting the
-] use of open emitter, Wired-OR interfacing to ECL-compatible
-] cache RAM chips. Current ECL RAMs are about 15-20 nsec,
-] a smaller fraction of the current processor cycle (60 nsec)
-] than current MOS RAM chips. So the timing margins
-] *improve*. Additionally, multiple-driver "collisions" or
-] "contention" are non-deadly in the wired-OR ECL structure,
-] (unlike CMOS tristate busses), such that the time between
-] disabling X and enabling Y onto the bus, can be reduced
-] dramatically. And there's reason to believe that BiCMOS
-] RAM access times will scale at about the same rate as
-] traditional CMOS RAMs.

Sure, but when YOU use 20ns RAM you get what, 20MIPS? And when I use 20ns RAM I get (am getting) 40MIPS. So who is better positioned to take advantage of new RAM technology?

-]Taken individually or as a group, these scenarios indicate (at least
-]to me) that 40 MHz "double clocked cache interfaces" are indeed
-]possible, and might in fact be as robust (or moreso) than existing
-]implementations at 16.7 MHz.
-]
-]Regards,
-]
-]-Mark Johnson *** DISCLAIMER: The opinions above are personal. ***

Sure, and warp drive may be safer than skiing. Cutting edge NOW is 40MIPS at 40MHz-2-phase using 20ns RAMs. The future may belong to GaAs. Are you in the present or the past ?

Gee, I wish I could tell you more about our chip. Maybe after ISSCC I'll be able to. Like, knock off your socks, fur sure.
-- Dennis O'Connor oconnor@sungoddess.steinmetz.UUCP ?? ARPA: OCONNORDM@ge-crd.arpa "If I have an "s" in my name, am I a PHIL-OSS-IF-FER?"
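The "peak amps per volt of swing" challenge in the post above (charging 120 pF in 3 ns) can be worked out from I = C * dV/dt. The sketch below assumes a constant charging current over the whole edge, which is a simplification of this note and not something the post states; a real driver's peak current would be higher than this average figure. The 5 V swing and the 0.4 V to 2.7 V swing are the two cases mentioned in the thread.

```python
def charge_current_ma(load_pf, swing_volts, edge_ns):
    """Average current to move load_pf through swing_volts in edge_ns.

    I = C * dV / dt, assuming a constant-current ramp.
    With C in pF and dt in ns, the result comes out in mA.
    """
    return load_pf * swing_volts / edge_ns


LOAD_PF = 120.0   # capacitance quoted in the post
EDGE_NS = 3.0     # charging time quoted in the post

per_volt = charge_current_ma(LOAD_PF, 1.0, EDGE_NS)
full_swing = charge_current_ma(LOAD_PF, 5.0, EDGE_NS)
reduced_swing = charge_current_ma(LOAD_PF, 2.7 - 0.4, EDGE_NS)

print(f"per volt of swing: {per_volt:.0f} mA")
print(f"5 V swing:         {full_swing:.0f} mA per pin")
print(f"2.3 V swing:       {reduced_swing:.0f} mA per pin")
```

Under these assumptions that is about 40 mA per volt of swing for each driven line, before multiplying by the number of pins switching at once.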
jesup@pawl22.pawl.rpi.edu (Randell E. Jesup) (12/18/87)
In article <1145@mips.UUCP> mark@mips.UUCP (Mark G. Johnson) writes:
>Quoting from author jesup@pawl22.pawl.rpi.edu (Randell E. Jesup)
> > Given current technology, r2000 could probably be scaled
> > to about 20 MHz. However, custom RISC designs in CMOS are
> > now reaching 40 MHz, which would be impossible with the
> > double-clocked interface currently on the r2000. Perhaps
> > the interface could be removed, given enough pins, but
> > that gets you back into the packaging limits.
>
>"Impossible" is quite a strong word. "Difficult", sure. But he's
>saying that a 2.4X improvement of a first-chip-designed-at-a-startup-
>company, 2-micron-generic-silicon-foundry device is IMPOSSIBLE.

Oops. You're right, I shouldn't have said 'impossible'. Take that as rephrased to 'very tough'.

>Other factors conspire to make the job of building a 40 MHz
>double-clocked interface not "impossible":
>
> 1. Cache RAM access times will continue to decrease, likely
> at the same rate as the processor clock, since SRAM vendors
> now build RISC chips (including SPARC, R2000, Am29K). So
> RAM access time will probably stay at 40-50% of processor
> cycle time. {presently 60 ns cycle, 25-30 ns RAM access}.
> The rest of the cycle is used up by setup & hold times,
> bus drive (slew) times, timing uncertainties, and "margin".

If you're willing to pay for it, you can get 20ns cycle SRAMs in reasonable sizes now.

> 2. Surface mount packages (having the *same number* of
> leads, 144) might be used instead of the current 144 pin
> Pin Grid Array. Their lower inductance and better
> controlled impedance can decrease dispersion and improve
> signal quality. Such packages, available today, are
> more than 2.4X better than the existing PGA package, so
> the net percentage of the cycle wasted in package-induced
> "timing slop" would decrease.

You can get leadless chip carriers now with 144 'pins' that are certified to run at 40MHz.
However, to 'double-clock' I assume they would have to be certified at 80MHz, which they are not. PGAs at that speed are right out (though perhaps not impossible :-)

> 3. Output voltage drive levels might shrink from the present
> 0.0 volts and 5.0 volts, to (an example) 0.4V and 2.7V.
> This speeds up output transitions (dT = dV * C/I) without
> increasing switching noise. Less of the cycle (in
> percentage terms) would be spent slewing the bus around.

This may well be the way the world goes. However, at least for the next few years, I think we're stuck with 0-5V.

> 4. The clock generation and distribution technology may
> get a factor of > 2.4X more precise. If this happened,
> the fraction of the cycle lost to "slop" (timing edge
> uncertainty) would go down.

Definitely helps, but you can get some VERY accurate clocks if you put some real thought into it and limit the loading. Things like 2 40MHz non-overlapping clocks, for example.

> 5. BiCMOS fab processes might be employed, permitting the
> use of open emitter, Wired-OR interfacing to ECL-compatible
> cache RAM chips. Current ECL RAMs are about 15-20 nsec,
> a smaller fraction of the current processor cycle (60 nsec)
> than current MOS RAM chips. So the timing margins
> *improve*. Additionally, multiple-driver "collisions" or
> "contention" are non-deadly in the wired-OR ECL structure,
> (unlike CMOS tristate busses), such that the time between
> disabling X and enabling Y onto the bus, can be reduced
> dramatically. And there's reason to believe that BiCMOS
> RAM access times will scale at about the same rate as
> traditional CMOS RAMs.

I'm not a silicon person, so I'll take your word. However, as I said earlier, 20-25ns cycle time SRAMs exist and can be bought now. In fact, all 40MHz RISC chip designs being worked on that I know about will need them. Does the BiCMOS require that ALL chips be BiCMOS? (sounds like it would.)

>Taken individually or as a group, these scenarios indicate (at least
>to me) that 40 MHz "double clocked cache interfaces" are indeed
>possible, and might in fact be as robust (or moreso) than existing
>implementations at 16.7 MHz.

I think your real limiting factor, even with some of those you stated above, will be packaging technology. You can build a 40MHz RISC chip with single-clocked interfaces with current tech. With better tech, you could build better ones, more or less as fast as the interface lets you go. It seems to be the limiting factor at the bleeding edge right now.

The 16MHz double-clocked RISC chips are now putting about the same load on the technology as a 30-32MHz single-clocked chip would. For double-clocking to be a win, you have to have packaging tech that is at least twice as fast as you can make your CPU cycle times go. With 16MHz now that's easy, it is not possible (caveat:-) with 40MHz now.

(Note that my earlier article was about current state-of-the-art, not what can happen in the next n years. There are 40MHz RISC chips out there now, using 1.25u. My comment on scaling was that the r2000 chip couldn't do 40MHz given CURRENT vlsi, packaging, and ram technologies (once again, caveat:-))

If you believe that packaging tech will improve sufficiently faster than CPU cycle times, maybe it will continue to be reasonable. However, it seems that packaging is NOT keeping up right now.

One last caveat: I know there are new packaging technologies under development that may (more like will) turn the world upside down. However, my earlier comment was about NOW, not 1-3 years from now.

>Regards,
>
>-Mark Johnson *** DISCLAIMER: The opinions above are personal. ***
>UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!mark TEL: 408-720-1700 x208
>US mail: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

     //  Randell Jesup                Lunge Software Development
    //   Dedicated Amiga Programmer   13 Frear Ave, Troy, NY 12180
 \\//    lunge!jesup@beowulf.UUCP     (518) 272-2942
  \/     (uunet!steinmetz!beowulf!lunge!jesup)
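The packaging argument in the post above comes down to one multiplication: a double-clocked interface toggles its pins at twice the instruction rate, so the package has to be certified for that doubled rate. A minimal sketch follows, using only numbers from the thread (16.7 MHz and 40 MHz instruction rates, a 40 MHz package certification). The function names are made up for illustration.

```python
def pin_rate_mhz(instr_rate_mhz, transfers_per_cycle=2):
    """Rate the package pins must sustain; 2 transfers/cycle = double-clocked."""
    return instr_rate_mhz * transfers_per_cycle


def max_instr_rate_mhz(package_limit_mhz, transfers_per_cycle=2):
    """Highest instruction rate a package certified at package_limit_mhz allows."""
    return package_limit_mhz / transfers_per_cycle


# Today's double-clocked part: comparable to a ~33 MHz single-clocked chip.
print(pin_rate_mhz(16.7))
# A 40 MHz double-clocked part would need pins good for 80 MHz.
print(pin_rate_mhz(40.0))
# A package certified at 40 MHz caps a double-clocked design at 20 MHz,
# but allows a single-clocked design to run at the full 40 MHz.
print(max_instr_rate_mhz(40.0))
print(max_instr_rate_mhz(40.0, transfers_per_cycle=1))
```

This treats package certification as the only limit, which is the simplification the post itself makes; RAM speed and clock skew are left out.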
mash@mips.UUCP (John Mashey) (12/20/87)
The following is a general discussion prompted by Dennis O'Connor's posting, followed by a few detailed comments interspersed with extracts from the posting itself. Dennis alluded many times to a GE 40-native-mips CPU about which he can't say much until after ISSCC.

[BTW: I've occasionally confused people by using "we", making them think I speak for MIPS Computer Systems, which I do NOT. I sometimes use the editorial "we" out of personal habit, and sometimes I say "we", in that my comments and opinions may arise from discussions with other people. In this case, Mark Johnson and Craig Hansen had useful comments that I've incorporated, but I'm responsible for any errors.]

GENERAL

1) Fortunately, it's not long until ISSCC, which will help, since it's hard to have any reasonable discussion without data, and we understand the ISSCC rules that prevent prior publication, so I won't push for details that would cause problems.

2) The statements marked 1 & 2 below (on mips) could use some clarification.

a) Remember that we consistently use 1 mips = vax11/780-with-decent compilers-running-variety-of-real-programs-not-dhrystones-or-toy-benchmarks type mips, i.e., something people can actually measure and evaluate.

b) 40MHz is not a priori 40mips (of the kind we described above); statement 1 (way below: "raw machine MIPS") makes it clear that Dennis understands this, but some of the other statements seem to MIX apples and oranges. Note that even getting 40 native-mips from 40MHz implies certain things about external cache/memory latencies, miss penalties, branch-handling, MMU-interference, etc. I do assume 40MIPS is for something other than a tight-loop of adds or nops in an on-chip cache. (a 25MHz 68020 is a 12.5MIPS thing by that rule).

c) This business has more than once seen spectacular claims, based on peak native-mips-ratings, that had little to do with the actual performance on real benchmarks.
Until we see the ISSCC paper, it's hard to guess what actual performance might be. I certainly believe that a machine should be able to do its clock rates in NOPS or ADDs; doing loads, stores, and branches is harder, especially for real-sized programs that actually miss in caches now and then.

3) After ISSCC, if the paper itself doesn't reveal such information, perhaps you can post some real benchmark numbers. We'd be glad to send you a copy of the MIPS benchmark tape for a nominal cost (not me: mail to ....!mips!mannos), and we'll be glad to compare notes.

4) Can you say more on what you're asserting the actual performance is? If not now, at least after ISSCC? For example, do you think that, in a real system, that people can build, is it:
a) 40X faster than 11/780 (4X MIPS M/1000)
b) 30X "" ...(3X)
c) 20X "" ...(2X)
Also, you haven't mentioned floating point. Can you at least say if the ISSCC paper will discuss it?

5) There are several ways of convincing somebody that a computer can achieve a given performance:
REALITY: Here's the machine. Benchmark it and see.
HINT: For a future machine, here are some hints about the ways in which it might be done.
DESIGN: Here is what the future design looks like, and here are the innovations and sneaky designs we use to make it work.

REALITY is always preferable: existence is a virtue: if I see a system remake the UNIX kernel, boot it, and then compile/run Spice, I believe it might even be a Real Machine, subject to any evidence to the contrary. This is hard to do with future designs, unless someone has really great simulators and can convince me of that.

We often tell people under nondisclosure a lot of HINT, and even some DESIGN. Sometimes HINT alone can sound like hand-waving, which we dislike, but DESIGN inevitably discloses details that are highly proprietary, and this is something we can't do in a forum like comp.arch.
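The gap between peak native MIPS and delivered native MIPS described in point 2 can be sketched numerically. The 25 MHz 68020 figure of 12.5 MIPS "by that rule" implies a best case of 2 clocks per instruction. The miss rate and miss penalty below are purely hypothetical placeholders chosen for illustration; they are not measurements of the RPM40, the R2000, or any other chip discussed here, and none of this converts native MIPS into VAX-relative mips.

```python
def peak_native_mips(clock_mhz, min_clocks_per_instr):
    """Best-case native MIPS: every instruction takes the minimum clock count."""
    return clock_mhz / min_clocks_per_instr


def effective_native_mips(clock_mhz, base_cpi, misses_per_instr, miss_penalty_clocks):
    """Native MIPS once cache-miss stalls are added to the base CPI."""
    cpi = base_cpi + misses_per_instr * miss_penalty_clocks
    return clock_mhz / cpi


# The 68020 example from the post: 25 MHz at 2 clocks/instruction best case.
print(peak_native_mips(25.0, 2))   # 12.5

# A 40 MHz, 1 clock/instruction machine: 40 peak native MIPS.
print(peak_native_mips(40.0, 1))   # 40.0

# HYPOTHETICAL stall figures, for illustration only:
# 2 misses per 100 instructions, 10 clocks lost per miss.
print(round(effective_native_mips(40.0, 1.0, 0.02, 10), 1))
```

Even modest assumed stall figures pull the delivered rate noticeably below the peak, which is why measured benchmarks are requested throughout the post.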
Anyway, we look forward to seeing the GE ISSCC paper, and even more, to some live benchmark numbers to give appropriate perspective to the 40-mips characterization.

SPECIFIC NOTES ON POSTING
--------------------------
In article <8252@steinmetz.steinmetz.UUCP> sunray!oconnor@steinmetz.UUCP writes:
>An article by hansen@mips.UUCP (Craig Hansen) says
....
>-]For speeds well above 40 MHz in CMOS technology, our studies suggest that this
>-]will not be a limiting factor at all ...

>You missed the point. The relevant number isn't 40MHz, it's 40MIPS. To
>achieve that, you'll need 80 MHz. Mr. Jesup ( hi Randell ) has
>personal knowledge of a 40MHz 40MIPS CMOS microprocessor, as do I,
>that is real silicon, really working, right now. Those are of course
>raw machine MIPS, not equivalent anything MIPS. Check out the upcoming
-------------^^^^ statement 1:
>ISSCC conference for more details.

.....bunch of discussion on clock-rates....

>Actually, GE's 1.25 micron AVLSI CMOS process is one of the best going
>(ask IBM about it) and never had anything to do with RAMs. Oh since
>you brought it up, the machine Randell and I know of uses 20ns CMOS RAMs
>to run at 40MIPS with no wait states. What will you need?

>-]The real problem with the other RISC designs ...
>-] [massive generalization deleted]

>What limited-omniscience you must have, to be able to tell all of us
>the ONE TRUE PROBLEM with "other" RISC designs. :-) Baloney.
>What do you know about MY RISC machine? Nothing. So pay
>attention at ISSCC and learn a few things.

Just before "The real problem..", Craig had referred to 2 specific other RISC designs, and was discussing cache issues, by context. He never stated that this was the ONE TRUE PROBLEM of ALL other risc designs.

>No No NO, 40 _MIPS_, not MHz. 80MHz for you guys. And since it is
>already here, your changes better happen yesterday. Randell, you
>shoulda said 40MIPS, you know how easily confused people get :-)

....much EE stuff deleted...
>Sure, but when YOU use 20ns RAM you get what, 20MIPS? And when I
>use 20ns RAM I get (am getting) 40MIPS. So who is better positioned
---------------------------------^^^^^^ statement 2
>to take advantage of new RAM technology?.....

>Sure, and warp drive may be safer than skiing. Cutting edge NOW is
>40MIPS at 40MHz-2-phase using 20ns RAMs. The future may belong to
>GaAs. Are you in the present or the past ?

It's hard to tell when we are. We think we're going along the edge of what's practical for us to build, sell, and quickly improve for the commercial marketplace, and we think what we do will scale well in the better technologies that we're now getting access to. It's quite possible that the chip you speak of will indeed blow everyone's socks off, and that we are indeed living in the past. On the other hand, if you haven't done exceedingly careful performance modeling of this in realizable system environments, you might get surprised, as have numerous other people. Perhaps you could describe your performance simulation environment, and what kinds of programs are simulated to predict performance?

>Gee, I wish I could tell you more about our chip. Maybe after
>ISSCC I'll be able to. Like, knock off your socks, fur sure.

Many people here will look forward to the ISSCC paper with strong interest, and even more to seeing some good benchmarks.
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
oconnor@sunray.steinmetz (Dennis M. O'Connor) (12/22/87)
An (excellent) article by mash@winchester.UUCP (John Mashey) says :
-]GENERAL
-]1) Fortunately, it's not long until ISSCC ... and we understand
-]the ISSCC rules that prevent prior publication ...

Thanks. As one of the architects of this design, I'm just itchin' to discuss it, so I'm frustrated by the rules but understand why. I'm glad you understand too.

-]2) The statements marked 1 & 2 below (on mips) could use some clarification.
-] a) Remember that we consistently use 1 mips = vax11/780-with-decent
-] compilers-running-variety-of-real-programs-not-dhrystones-or-toy-
-] benchmarks type mips, i.e., something people can actually measure
-] and evaluate.

We've only got some computed VAX equivalence numbers, so I wouldn't want to tell them.

-] b) 40MHz is not a priori 40mips (of the kind we described above);
-] statement 1 (way below: "raw machine MIPS") makes it clear that Dennis
-] understands this, but some of the other statements seem to MIX
-] apples and oranges. Note that even getting 40 native-mips from
-] 40MHz implies certain things about external cache/memory latencies,
-] miss penalties, branch-handling, MMU-interference, etc. I do assume
-] 40MIPS is for something other than a tight-loop of adds or nops in
-] an on-chip cache. (a 25MHz 68020 is a 12.5MIPS thing by that rule).

The chip always runs 40MIPS all the time. We do have interlocks that can sometimes require NOPs ( one O or two ?? a new debate :-), and we get cache misses very occasionally as well, so our performance is, well, usually less than 40MIPS certainly. Which is indeed why I've only spoken of raw machine MIPs. "But they're HONEST raw machine MIPs!" :-)

-] c) This business has more than once seen spectacular claims, based on peak
-] native-mips-ratings ... Until we see the ISSCC paper, it's hard to
-] guess what actual performance might be. I certainly believe that
-] a machine should be able to do its clock rates in NOPS or ADDs;
-] doing loads, stores, and branches is harder, especially for real-sized
-] programs that actually miss in caches now and then.

The RPM40 chip does its clock rate on loads and stores. There are no cache misses on loads and stores. Branches can cause cache misses, but barring misses we do full clock on them too. I really wish I could talk about cache design, ooh it's so good! But fortunately ISSCC is not so far away.

-]
-]3) After ISSCC, if the paper itself doesn't reveal such information,
-]perhaps you can post some real benchmark numbers.

I'll post everything I can. But as I will explain later, it's probably comparing Red Delicious to Macintosh ( the apples ) to compare RPM40 to DEC, Sun, MIPS or Motorola. Different target environment.

-]Also, you haven't mentioned floating point. Can you at least say if the
-]ISSCC paper will discuss it?

This year's ISSCC does not ( I believe ) discuss the FPU. That may have to wait till NEXT year's ISSCC, I fear. Even tho we've silicon of it Now.

-]5) There are several ways of convincing somebody that a computer can
-]achieve a given performance:
-] REALITY: Here's the machine. Benchmark it and see.
-] HINT: For a future machine, here are some hints about the ways in
-] which it might be done.
-] DESIGN: Here is what the future design looks like, and here are the
-] innovations and sneaky designs we use to make it work.
-]
-]REALITY is always preferable: existence is a virtue:
-]if I see a system remake the UNIX kernel, boot it, and then compile/run
-]Spice, I believe it might even be a Real Machine, subject to any evidence
-]to the contrary. This is hard to do with future designs ...
-]
-]... HINT alone can sound like hand-waving ... DESIGN inevitably
-] discloses details ... highly proprietary ... can't do in ... comp.arch.
After ISSCC I hope I can talk design : this was a non-classified DARPA project, and GE is NOT in the computer business. Maybe I'll be allowed to publish. I hope so : I think we did some great work !

-]... we look forward to ... GE ISSCC paper ... live benchmark numbers

Our live benchmarks, to be applicable to our design environment, would be different from yours. Unless you've got an ATF ( Advanced Tactical Fighter) aerodynamic control surfaces controller benchmark :-).

Well, now for some notes :

First, I hope people caught that my article was intended to be light-hearted. I've got no bones to pick or axes to grind, I'm just EXTREMELY happy to have some of my ideas in working silicon. Feels real good. And yes I'm proud of my work. But I can't really disclose it yet, which is frustrating. The chip has IMHO some nice new things in it, which IMHO will be taken up with enthusiasm when they go public ( the ideas, not the chip ).

But realize : this is a MILITARY chip. Not commercial. $/MIP is not a real factor in its design. MIPS/Watt, MIPS/sq-cm, MIPS/package were bigger drivers. Rad-Hard was a factor as well. So it may never make a UNIX kernel, or run Spice. Or be for sale without a satellite :-)

So why bring it up ? Because it runs at 40MHz/MIPS. Since we've actually built a 40MHz/MIPS chip, well, as John Mashey says, reality is better than design. We have dealt with some of the nasty problems CMOS encounters driving even 30pF at 40MHz. Or trying to turn a bus around ( from send to receive ) in 3ns. And it is tough. No question. Hi-speed CPUs was the original subject, I think ?

Thank you, John Mashey, for an excellent reply that pointed out where my posting was unclear. I hope I've clarified things some with this.
--
Dennis O'Connor oconnor@sungoddess.steinmetz.UUCP ??
ARPA: OCONNORDM@ge-crd.arpa
"If I have an "s" in my name, am I a PHIL-OSS-IF-FER?"
ward@cfa.harvard.EDU (Steve Ward) (12/24/87)
>
> The RPM40 chip.....
>
>
> After ISSCC I hope I can talk design : this was a non-classified
> DARPA project, and GE is NOT in the computer business. Maybe I'll
> be allowed to publish. I hope so : I think we did some great work !
>
> --
> Dennis O'Connor oconnor@sungoddess.steinmetz.UUCP ??
> ARPA: OCONNORDM@ge-crd.arpa
> "If I have an "s" in my name, am I a PHIL-OSS-IF-FER?"

I believe that all non-classified federal research must be made available to U.S. citizens. The Freedom of Information Act can be used to obtain such information if it is not otherwise forthcoming.

DON'T JUMP TO CONCLUSIONS! Clearly the researcher(s) have rights to privacy and confidentiality during the research, so nobody can or should be able to force premature disclosures. However, a research contract with DARPA will call for research results in the form of reports, findings, papers, or whatever to be reported to DARPA according to the contract timetable, which might even be only upon completion/termination of the research grant/contract. All such non-classified documents and research materials may be requested by anyone. If the research is classified then only DOD contractors with the need to know and industrial and foreign spies :-) will gain access. Of course, one can always appeal the DOD classification, as well.

All of this is stated here to pose a question: Dennis O'Connor seemed to have doubt as to what he could make public in the long term about his work. It seems to me that if his work is sponsored by a non-classified DARPA grant or research contract, that his work MUST be made public as it is reported to DARPA, meaning that at the very least, the documents and other information he gives to DARPA should be publicly available in roughly the timeframe in which DARPA receives such information. Again, preliminary information and general communications with DARPA are not what I am talking about, but only final results and findings.

Now for my parachute: We do a lot of federally-sponsored research here, but almost always with NASA. Our work is also of a non-classified nature. Our scientists live to publish, so the only secrecy is prior to publishing, then all the beans are publicly spilled. There certainly may be more to this situation than meets my eye. I am not an expert on federally-sponsored research legalities.

Hopefully when the time is right, all technical information regarding the 40MIPS/MHZ beastie will be revealed.
oconnor@sungoddess.steinmetz (Dennis M. O'Connor) (12/29/87)
I didn't say enough about why I might not be able to publish. But it's NOT because of the Government. There are indeed reports you can probably get from the Gov. Printing Office about our work. And the government is NOT stopping publication.

But I work for GE, GE has the rights to the RPM40 work, and if GE doesn't want it published, that's GE's right. Just like it is/was IBM's right to not publish LOTS of its research, including stuff IBM was never going to use.

I have no problems with this. Sure, I'd like to publish. But if GE says no, I won't. I'll just move on to other interesting GE work. No problem. So what if GE doesn't let me do some things I'd like : I still think working at GE CR&D is lotsa fun.
--
Dennis O'Connor oconnor@sungoddess.steinmetz.UUCP ??
ARPA: OCONNORDM@ge-crd.arpa
"If I have an "s" in my name, am I a PHIL-OSS-IF-FER?"
Steve_D_Wilson@cup.portal.com (01/11/88)
Just a quick comment on your statements concerning ECL. There are two points that should be made.

1) Current ECL RAMs aren't at 15-20 ns as stated, they're down at 7 to 8 ns and with the advent of self-timed write circuitry, will approach 3 to 5 ns cycle times.

2) When you WIRE-OR ECL drivers you pay a price in performance. There is a timing penalty that must be paid when you have more than one driver on the line, and it isn't small. At the last company I worked at the expected hit for using a WIRE-OR was greater than 2 ns WITHOUT adding in the delay due to driver separation, reflections and such.
jesup@pawl19.pawl.rpi.edu (Randell E. Jesup) (01/12/88)
In article <2367@cup.portal.com> Steve_D_Wilson@cup.portal.com writes:
>1) Current ECL RAMs aren't at 15-20 ns as stated, they're down
> at 7 to 8 ns and with the advent of self-timed write
> circuitry, will approach 3 to 5 ns cycle times.

Sorry if you misunderstood, 15-20ns was for static CMOS RAMs, not for ECL. I hope they can get that fast, as we will see GaAs chips of >100 MHz, maybe even 200 MHz, eventually. Of course, they have to get GaAs to yield with lots of gates, first.

     //  Randell Jesup                        Lunge Software Development
    //   Dedicated Amiga Programmer           13 Frear Ave, Troy, NY 12180
 \\//    beowulf!lunge!jesup@steinmetz.UUCP   (518) 272-2942
  \/     (uunet!steinmetz!beowulf!lunge!jesup) BIX: rjesup