schow@bcarh61.bnr.ca (Stanley T.H. Chow) (11/13/89)
It seems to me the MIPS 55-MIPS (@ 60 MHz?) ECL system (chip set?)
is the "classical" approach for RISC designs to get higher
throughput. They do it by upping the clock rate.

Intel has gone the SuperScalar route. Their i960CA is said to be
66 MIPS @ 33 MHz. They have put the cleverness into multiple
execution units.

Here is the $64,000 question:

   Which part is easier to integrate into a real system?

Please note that we have concrete real examples here. Theoretical
discussion is nice, but real data-points are more interesting.

Other interesting questions:

   Which system has a larger "domain" over which it actually
   achieves quoted figures?

   What other systems/chips/... are claiming over 50 MIPS?

   How do these systems compare in terms of cost (design and per unit)?

Stanley Chow        BitNet:  schow@BNR.CA
BNR                 UUCP:    ..!psuvax1!BNR.CA.bitnet!schow
(613) 763-2831               ..!utgpu!bnr-vpa!bnr-rsc!schow%bcarh61
Me? Represent other people? Don't make them laugh so hard.
hawkes@mips.COM (John Hawkes) (11/14/89)
In article <1358@bnr-rsc.UUCP> schow%BNR.CA.bitnet@relay.cs.net (Stanley T.H. Chow) writes:
>
>It seems to me the MIPS 55-MIPS (@ 60 MHz?) ECL system (chip set?)
>is the "classical" approach for RISC designs to get higher
>throughput. They do it by upping the clock rate.
>
>Intel has gone the SuperScalar route. Their i960CA is said to be
>66 MIPS @ 33 MHz. They have put the cleverness into multiple
>execution units.

Once again, let's not confuse apples and oranges. Using the MIPS performance
benchmark suite, the MIPS R6000-based *system* achieves 55 Vax-MIPS at
67 MHz. Since it's not a superscalar design, the system executes 67 million
*instructions* per second at 67 MHz. The ECL chipset is not the limiting
factor at this clock rate.

The i960 *chip* executes a theoretical max of 66 million *instructions* per
second at 33 MHz -- two per cycle. I haven't heard Intel make any claims
about how fast a Unix *system* would execute real applications.

The Atlantic Research Corporation, an independent group, has done some
comparisons between the MIPS R3000 (25 MHz) and a 20-MHz 80960 executing Ada
programs (the "Common Avionics Processor Ada Benchmark Suite"), and they
discovered that the R3000 was usually more than twice as fast on hand-coded
programs, and overall was more than five times faster on compiled programs.

>Here is the $64,000 question:
>
>   Which part is easier to integrate into a real system?

What kind of "real system"? The R6000 is designed to be the heart of a
large, general-purpose compute and/or file-system server. I don't think the
same is true of the i960.
--
John Hawkes
{ames,decwrl}!mips!hawkes OR hawkes@mips.com
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (11/15/89)
In article <1358@bnr-rsc.UUCP>, schow@bcarh61.bnr.ca (Stanley T.H. Chow) writes:
| Other interesting questions:
|
|   Which system has a larger "domain" over which it actually
|   achieves quoted figures?
|
|   What other systems/chips/... are claiming over 50 MIPS?
|
|   How do these systems compare in terms of cost (design and per unit)?

Question one is the kicker. I don't care (as a user/buyer) how many mips a
CPU can perform, just how fast my stuff runs. For some programs which don't
overlap f.p. with other CPU work, the Intel will not deliver full potential.
For others which do, particularly if the non-f.p. ops are the kind which
seem to require more than one RISC op to perform but might be a single op in
CISC, I would expect the Intel to look very good.

Actually this gets beyond the RISC/CISC discussion back to the "fast serial
vs. parallel" track, since the Intel gets the rating by putting execution
units in parallel. This implies that there are big losses of performance if
the compiler doesn't keep the mix right, etc.
--
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see that
the world is flat!" - anon
scarter@gryphon.COM (Scott Carter) (11/16/89)
In article <31329@winchester.mips.COM> hawkes@mips.COM (John Hawkes) writes:
>In article <1358@bnr-rsc.UUCP> schow%BNR.CA.bitnet@relay.cs.net (Stanley T.H. Chow) writes:
>>
>>It seems to me the MIPS 55-MIPS (@ 60 MHz?) ECL system (chip set?)
>>is the "classical" approach for RISC designs to get higher
>>throughput. They do it by upping the clock rate.
>>
>>Intel has gone the SuperScalar route. Their i960CA is said to be
>>66 MIPS @ 33 MHz. They have put the cleverness into multiple
>>execution units.
>
>Once again, let's not confuse apples and oranges. Using the MIPS performance
>benchmark suite, the MIPS R6000-based *system* achieves 55 Vax-MIPS at 67 MHz.
>Since it's not a superscalar design, the system executes 67 million
>*instructions* per second at 67 MHz. The ECL chipset is not the limiting
>factor at this clock rate.
>
>The i960 *chip* executes a theoretical max of 66 million *instructions* per
>second at 33 MHz -- two per cycle. I haven't heard Intel make any claims
>about how fast a Unix *system* would execute real applications.

Note that the above statement applies to the i960_CA_, whereas the quote
below applies to the i960[KA,KB,MC,XA].

Also, note that at 67 MHz the R6000 can in theory be executing two integer
instructions (it still has the asynch mult/div unit, no?) as well as, I
would guess, two FP instructions. However, it can only ISSUE one instruction
per cycle. The 960CA can issue three instructions per cycle to the chosen
three of four execute units. I believe Intel has figures showing that on
average they could in fact issue two instructions per clock _average_ [over
what program set?], hence the 960CA can legitimately be called 66 Native
MIPS average with 99 Native MIPS peak. How this will work out in "reality",
who knows? I'm looking forward to Specmarks for a 960CA Real System!
>The Atlantic Research Corporation, an independent group, has done some
>comparisons between the MIPS R3000 (25-MHz) and a 20-MHz 80960 executing Ada
>programs (the "Common Avionics Processor Ada Benchmark Suite"), and they
>discovered that the R3000 was usually more than twice as fast on hand-coded
>programs, and overall was more than five times faster on compiled programs.

This comparison was to the 960_XA_, which was crippled by the register-port
design needed to get the windows on the chip. Steve McGeady posted here a
while ago on why Intel made the choices they did - the above comparison says
essentially nothing about how a 960CA, with relatively few register-file /
bypass conflicts, would fare. The JIAWG benchmarks are pretty silly anyway.

>John Hawkes
>{ames,decwrl}!mips!hawkes OR hawkes@mips.com
preston@titan.rice.edu (Preston Briggs) (11/17/89)
In article <22303@gryphon.COM> scarter@gryphon.COM (Scott Carter) writes:
>ISSUE one instruction per cycle. The 960 CA can issue three instructions per
>cycle to the chosen three of four execute units. I believe Intel has figures
>showing that on the average they could infact issue two instructions per clock
>_average_ [over what program set?], hence the 960CA can legitimately be called
>66 Native MIPS average with 99 Native MIPS peak.

I think that's too optimistic. We've played some with an i860 on an
evaluation board. The supplied compilers didn't attempt to issue more than
1 instruction/cycle (out of a max of three).

On a simple matrix multiply (single-precision fp),

	multiplying 2 100x100 matrices took .52 seconds (3.8 MFlops)
	multiplying 2 400x400 matrices took 86 seconds (1.5 MFlops)

versus a peak of 66 MFlops. The poor performance on the larger size shows
the effect of the small on-chip data cache.

Using the VAST front-end, with hand-coded vector primitives, gives about
8.5 MFlops.

Reworking by hand, being especially careful of the cache, gives about
26.5 MFlops, for either size. (This can be improved, but I think only
slightly.) This is fairly hot, though still not 66 MFlops.

The challenge is getting compilers to take advantage of tiny caches and long
pipelines and multi-instruction issue, as noted below:

>>discovered that the R3000 was usually more than twice as fast on hand-coded
>>programs, and overall was more than five times faster on compiled programs.

Sounds like the MIPS compilers are more mature. Certainly it's an easier
target.

Preston Briggs
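[The "reworking by hand, being especially careful of the cache" that Briggs
describes is essentially loop blocking (tiling). A minimal sketch in C, not
Briggs' actual code; the tile size B is a guess that one would tune to the
i860's small on-chip data cache (on the order of 8KB):]

```c
#define N  400   /* matrix dimension, as in the 400x400 case */
#define B  20    /* tile size (divides N evenly) -- a guess; tune to the cache */

static float a[N][N], b[N][N], c[N][N];

/* Naive triple loop: walks b column-wise, so for large N nearly every
   inner-loop reference to b can miss a small data cache. */
void matmul_naive(void)
{
    int i, j, k;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            float s = 0.0f;
            for (k = 0; k < N; k++)
                s += a[i][k] * b[k][j];
            c[i][j] = s;
        }
}

/* Blocked version: works on BxB tiles so that each tile of a, b, and c
   stays cache-resident while it is reused, instead of being refetched
   from memory on every pass. */
void matmul_blocked(void)
{
    int i, j, k, ii, jj, kk;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            c[i][j] = 0.0f;
    for (ii = 0; ii < N; ii += B)
        for (kk = 0; kk < N; kk += B)
            for (jj = 0; jj < N; jj += B)
                for (i = ii; i < ii + B; i++)
                    for (k = kk; k < kk + B; k++) {
                        float aik = a[i][k];
                        for (j = jj; j < jj + B; j++)
                            c[i][j] += aik * b[k][j];
                    }
}
```

[Both loops do the same 2*N^3 flops; only the memory reference pattern
differs, which is why the blocked form can close much of the gap between
1.5 MFlops and the hand-tuned 26.5 MFlops figure.]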
tim@electron.amd.com (Tim Olson) (11/20/89)
In article <22303@gryphon.COM> scarter@gryphon.COM (Scott Carter) writes:
| The 960 CA can issue three instructions per
| cycle to the chosen three of four execute units. I believe Intel has figures
| showing that on the average they could infact issue two instructions per clock
| _average_ [over what program set?], hence the 960CA can legitimately be called
| 66 Native MIPS average with 99 Native MIPS peak.

The i960CA decoder can dispatch up to 3 instructions per cycle. However,
the decoder looks at 4 instructions at a time, and it appears that the
decoder cannot be loaded with the next set of 4 instructions until the
current set of instructions has all been dispatched. Therefore, the "99
Native MIPS peak" can only be attained for one clock cycle; the left-over
instruction in the decoder would be dispatched by itself in the next clock
cycle. In reality, it is "66 Native MIPS peak".

| How this will work out in
| "reality" who knows? I'm looking forward to Specmarks for a 960CA Real
| System!

Since the i960CA is targeted at embedded control applications, and has no
MMU nor floating-point, I don't think you will ever see Specmarks for it.
However, Intel released performance numbers for it at the i960CA
announcement. The numbers were for a 33-MHz i960CA running with 64KB of
15ns SRAM and 1MB of 4-cycle-initial-access, 3-cycle-subsequent-access
DRAM. The SRAM was used for instruction memory and the DRAM was used for
data memory.

The benchmarks run were Dhrystone 1.1, Buffer Copy, "Travelling Salesman"
solution by simulated annealing, Pi (compute pi to 500 places), quicksort,
bubblesort, integer matrix multiply, CCITT image compression, and Bezier
curve calculation. Intel compared its i960CA board running this benchmark
suite with a 68030 (20MHz), an i960KA (20MHz), and an Am29000 (16MHz) board.
However, the board they used to benchmark the Am29000 was not designed for
performance; rather, it was designed to test the functionality of ADAPT
(Advanced Development and Prototyping Tool) hardware debuggers. To provide
a fairer comparison, I requested the benchmark sources from Intel, to run
on a 30MHz Am29000 board (manufactured by YARC Systems). This board uses
2-way interleaved, 100ns DRAM memory for instructions and 35ns SRAM for
data.

I received sources for the non-proprietary benchmarks, compiled them with
the current version of the MetaWare HighC29k compiler, and ran them on the
YARC card. Here are the final results:

		    Absolute Performance

benchmark           68030    960KA   Am29000    960CA
                    20MHz    20MHz    30MHz     33MHz
quicksort (ms)        286      135       51        50
bubblesort (ms)       291      180       65        85
pi-500 (ms)          6999     3510     1398      1624
anneal (ms)         37210    20910     8119      8388
matmult (us)       186552    74873    49135     26898
dhrystone 1.1        5484    14196    44876     41600

	    Performance Relative to 68030 Board

benchmark           68030    960KA   Am29000    960CA
                    20MHz    20MHz    30MHz     33MHz
quicksort            1.00     2.12     5.61      5.72
bubblesort           1.00     1.62     4.48      3.42
pi-500               1.00     1.99     5.01      4.31
anneal               1.00     1.78     4.58      4.44
matmult              1.00     2.49     3.80      6.94
dhrystone 1.1        1.00     2.59     8.18      7.59
-------------------------------------------------------
geom mean            1.00     2.07     5.11      5.20

    Performance Normalized to 20MHz, Relative to 68030 Board

benchmark           68030    960KA   Am29000    960CA
quicksort            1.00     2.12     3.74      3.47
bubblesort           1.00     1.62     2.98      2.07
pi-500               1.00     1.99     3.34      2.61
anneal               1.00     1.78     3.06      2.69
matmult              1.00     2.49     2.53      4.20
dhrystone 1.1        1.00     2.59     5.46      4.60
-------------------------------------------------------
geom mean            1.00     2.07     3.41      3.15

Thus, it would appear that the "66 Native MIPS", 33MHz i960CA is about the
same performance as the 20 Native MIPS (18 VAX-equivalent MIPS), 30MHz
Am29000.
--
Tim Olson
Advanced Micro Devices
(tim@amd.com)
scarter@gryphon.COM (Scott Carter) (11/21/89)
In article <3024@brazos.Rice.edu> preston@titan.rice.edu (Preston Briggs) writes:
>In article <22303@gryphon.COM> scarter@gryphon.COM (Scott Carter) writes:
>
>>ISSUE one instruction per cycle. The 960 CA can issue three instructions per
>>cycle to the chosen three of four execute units. I believe Intel has figures
>>showing that on the average they could infact issue two instructions per clock
>>_average_ [over what program set?], hence the 960CA can legitimately be called
>>66 Native MIPS average with 99 Native MIPS peak.
>
>I think that's too optimistic.
>We've played some with an i860 on an evaluation board.
>The supplied compilers didn't attempt to issue more than
>1 instruction/cycle (out of a max of three).
>
>On a simple matrix multiply (single precision fp),
>
>	multiplying 2 100x100 matrices took .52 seconds (3.8 MFlops)
>	multiplying 2 400x400 matrices took 86 seconds (1.5 MFlops)
>
>versus a peak of 66 MFlops. The poor performance on the larger
>size shows the effect of the small on-chip data cache.
>
>Using the VAST front-end, with hand coded vector primitives
>gives about 8.5 MFlops.
>
>Reworking by hand, being especially careful of the cache,
>gives about 26.5 MFlops, for either size.
>(This can be improved, but I think only slightly).
>This is fairly hot, though still not 66 MFlops.
>
>The challenge is getting compilers to take advantage of
>tiny caches and long pipelines and multi-instruction issue,
>as discovered below
>
>>>discovered that the R3000 was usually more than twice as fast on hand-coded
>>>programs, and overall was more than five times faster on compiled programs.
>
>Sounds like the MIPS compilers are more mature. Certainly it's an
>easier target.
>
>Preston Briggs

[How did I wind up writing something which could be interpreted as defending
the 960?]

1) Thanks for the _Data_ on the 860. It's on the order of what I would have
guessed - nice to have it confirmed by someone with actual knowledge :).
2) I'm not sure that any meaningful extrapolation can be made from the 860
to the 960CA, given that their instruction-parallelism mechanisms are
utterly different. Comparison to something like the Super Titan (on integer
codes) would be rather more appropriate.

3) Agreed that comparison on Real Programs (tm) [or at least Real
Benchmarks (tm?)] is the only thing to go from. I merely pointed out that
for Intel to claim 66 Native MIPS is not a priori any more illegitimate
than most other vendors' native-MIPS claims. Kudos to MIPS for trying not
to mention anything other than Real Program numbers.

4) I would disagree about the MIPS _Ada_ compiler being better than the
Intel/BiiN 960 Ada compiler (agree wholeheartedly on C/Pascal/FORTRAN). We
found that the performance ratio between the R3000 and the 960XA was much
wider on our own [somewhat larger than JIAWG] benchmarks in C, Pascal, and
FORTRAN than in Ada, either JIAWG or some other internal benchmarks.

5) Based on the code generated for the 960XA for the JIAWG benchmarks, I
have to say I can't believe in two instructions per clock for the 960CA on
this set (this is a GUESS only - any data I might have cannot be posted),
but I do think the 960CA might well do twice as many useful instructions
per clock ON THIS BENCHMARK SET as an R3000, given what their Ada compilers
generated. Your mileage will undoubtedly vary.

6) If we need to express our religious loyalty, mine is with the R3000.

Scott Carter
mcg@mipon2.intel.com (Steven McGeady) (11/28/89)
In article <1358@bnr-rsc.UUCP>, schow@bcarh61.bnr.ca (Stanley T.H. Chow) writes:
>
> Intel has gone the SuperScalar route. Their i960CA is said to be
> 66 MIPS @ 33 MHz. They have put the cleverness into multiple
> execution units.
>
> Here is the $64,000 question:
>
>    Which part is easier to integrate into a real system?
>
> Please note that we have concrete real examples here. Theoretical
> discussion is nice, but real data-points are more interesting.

Here is a "real data-point". Heurikon Corp. (Madison, WI) is now selling
960CA boards with on-board SCSI, Ethernet (82596), 4Mb DRAM (near-zero
wait-state, i.e. 1-0-0-0 read, 0 ws write), multiple serial lines, VME bus
interface, VSB bus, and more for $2995 in quantity 100. All this fits on a
standard (small, not Sun-sized) VME board.

> Other interesting question:
>
>    Which system has a larger "domain" over which it actually
>    achieves quoted figures?

The 960CA is an embedded controller. It contains 4-channel DMA, dynamic
per-region bus sizing, sophisticated interrupt control, etc. I would
suspect that it would perform admirably in most embedded applications.

The MIPS R6000 is a *system*. It runs UNIX very well, apparently. The 960CA
does not now and will never run UNIX, as it lacks a memory management unit.

S. McGeady
Intel Corp.
mcg@ishark.Berkeley.EDU (Steven McGeady) (11/28/89)
In article <31329@winchester.mips.COM>, hawkes@mips.COM (John Hawkes) writes:
> The Atlantic Research Corporation, an independent group, has done some
> comparisons between the MIPS R3000 (25-MHz) and a 20-MHz 80960 executing Ada
> programs (the "Common Avionics Processor Ada Benchmark Suite"), and they
> discovered that the R3000 was usually more than twice as fast on hand-coded
> programs, and overall was more than five times faster on compiled programs.

The 20MHz 960 referred to here is the Military 80960MC part, *not* the
960CA. The 960MC hit silicon in 1985 and has not been upgraded since then.
ARC did not measure the 960CA, even though that would have been a more
representative measurement. The part measured was running in a PC/AT
plug-in board. The MIPS system it is being compared to is a full system
with a significantly-sized off-chip cache.

The 960CA would perform approximately 2x *faster* than the MIPS R3000 on
the hand-coded versions of the benchmarks. For compiled code, if the code
were written in C, we would also perform approximately 2x faster. The code
in question was compiled with a beta-release Ada compiler available last
spring. Mr. Hawkes is doing the expected in attempting to show MIPS'
processor in the best light, but not in Mr. Mashey's spirit of "full
disclosure". If people are more interested in these tests, I will see how
much information JIAWG will allow to be released, and release it here.

S. McGeady
Intel Corp.
mcg@mipon2.intel.com (Steven McGeady) (11/28/89)
In article <28107@amdcad.AMD.COM>, tim@electron.amd.com (Tim Olson) writes:
>
> In article <22303@gryphon.COM> scarter@gryphon.COM (Scott Carter) writes:
> | The 960 CA can issue three instructions per
> | cycle to the chosen three of four execute units. I believe Intel has figures
> | showing that on the average they could infact issue two instructions per clock
> | _average_ [over what program set?], hence the 960CA can legitimately be called
> | 66 Native MIPS average with 99 Native MIPS peak.
>
> The i960CA decoder can dispatch up to 3 instructions per cycle.
> However, the decoder looks at 4 instructions at a time, and it appears
> that the decoder cannot be loaded with the next set of 4 instructions
> until the current set of instructions have all been dispatched.

This is not correct. The instruction decoder contains a rolling quad-word
window into which instructions are loaded (potentially) every cycle. The
reason that we do not claim 99 MIPS (none of our advertising claims this
number, to the best of my knowledge - those who have heard me speak hear me
say jokingly that we run at 99 MIPS for "one whole cycle") is that for
three instructions to be dispatched, one must be a branch. A branch
requires that a non-next line of instructions from the i-cache be loaded,
and this is not accomplished at the full rate.

> Intel compared its i960CA board running this benchmark suite with a
> 68030 (20MHz), an i960KA(20MHz), and an Am29000(16MHz) board.
> However, the board they used to benchmark the Am29000 was not designed
> for performance; rather, it was designed to test the functionality of
> ADAPT (Advanced Development and Prototyping Tool) hardware debuggers.

This is an interesting piece of history re-invention. Step Engineering,
the current manufacturer of the STEB board, received the design of the
board from AMD (the board has an AMD copyright on it).
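[The difference between the two readings of the decoder is easy to model:
a decoder that refills only after its 4-instruction group drains averages at
most 2 dispatches per cycle, while a rolling window that tops up every cycle
can sustain 3. This is a toy model for illustration only, ignoring the
branch constraint McGeady describes; it is not a description of the actual
i960CA hardware:]

```c
/* Cycles to dispatch n instructions, at most `width` per cycle, when the
   decoder holds `group` instructions and refills only after all of them
   have been dispatched (Olson's reading). */
long cycles_fixed_group(long n, int group, int width)
{
    long cycles = 0, done = 0;
    while (done < n) {
        long in_group = (n - done < group) ? n - done : group;
        while (in_group > 0) {
            int issued = (in_group < width) ? (int)in_group : width;
            in_group -= issued;
            done += issued;
            cycles++;
        }
    }
    return cycles;
}

/* Rolling window: the decoder tops itself up every cycle, so it can
   sustain `width` dispatches per cycle (McGeady's description). */
long cycles_rolling(long n, int width)
{
    return (n + width - 1) / width;
}
```

[For 1200 straight-line instructions, the fixed-group model takes 600
cycles (2.0 IPC: each group of 4 dispatches as 3 + 1), while the rolling
window takes 400 cycles (3.0 IPC).]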
Apparently, the board was designed this way because it is impossible to
build a 29K system using normal DRAMs and achieve better performance. We
attempted to put faster RAMs in the STEB board, and to increase the clock
speed to 20MHz, and neither worked. We chose the STEB board not because it
was slow (even we didn't expect it to be so slow) but because it is the
only available board with a prototyping area on which we could add an SBX
connector to interface the graphics cards on which we displayed the
benchmark results.

> To provide a more fair comparison, I requested the benchmark sources
> from Intel, to run on a 30MHz Am29000 board (manufactured by YARC
> Systems). This board uses 2-way interleaved, 100ns DRAM memory for
> instructions and 35ns SRAM for data.

This board contains separate instruction and data memory (using the 29k's
Harvard bus), each of which is interleaved (according to published data
I've been able to find on the board). The 30MHz 29k's are apparently
hand-sorted - we know of no volume shipments of these parts. This board is
in no way comparable in cost, parts count, interface complexity, or
usability to the 960CA board that was used.

> I received sources for the non-proprietary benchmarks, compiled them
> with the current version of the MetaWare HighC29k compiler, and ran
> them on the YARC card. Here are the final results:
>
> [tables showing the 29k approximately at par with 960CA]

We supplied Mr. Olson with the sources to these benchmarks, as an effort
to bring an end to the warring that has been going on over benchmarking.
In exchange for freely supplying these, Mr. Olson agreed that we would be
given the resulting source code back, along with a copy of the compiler
that produced it, prior to publication of the results. Mr. Olson has
chosen to ignore those commitments and publish numbers without noting what
compiler was used, and without providing us (or anyone else - we also
supplied the benchmarks to Michael Sleator of Microprocessor Report) with
the ability to check their validity.

It should be noted that the 960CA benchmarks were compiled with the
current GNU GCC compiler, which does *no* instruction scheduling, and thus
fails to take advantage of the multiple-instruction-issue capability of
the 960CA. We have been working on an instruction-scheduling compiler, but
it is not available for release at this time.

The lesson that this has served to teach me, who argued with our marketing
department that we should release these benchmarks to AMD under the noted
restrictions, is that we were foolish to trust AMD's word regarding
feedback of the results from the benchmarks. Thus, I place no trust in
these numbers presented as representing any kind of objective reality.
Furthermore, I have learned my lesson with regard to cooperating. The
benchmark wars will now most certainly be taken out of the hands of
technologists and be placed back in the hands of marketing departments.

I will reiterate here my advice to customers attempting to determine the
relative speed of the two processors: run your own benchmarks on a board
with a memory system relevant to the design you plan to build. The YARC
board's memory design is an example of the most expensive memory-system
design that one can attach to the 29k - it bears no resemblance to what
can be expected with a combined I&D DRAM memory system, which is where the
only true comparison lies.

In short, don't believe AMD's benchmark numbers, and don't believe ours.
Don't believe simulators, because AMD's is well known for overstating
performance. Believe your own benchmarks. And note that the STEB board is
much closer to most embedded designs than the YARC board, and that the 960
is much more usable in the average design than the 29k.

S. McGeady
Intel Corp.
mcg@mipon2.intel.com (Steven McGeady) (11/28/89)
In article <22514@gryphon.COM>, scarter@gryphon.COM (Scott Carter) writes:
>
> 2) I'm not sure that any meaningful extrapolation can be made from the 860 to
> the 960CA, given that their instruction parallelism mechanisms are utterly
> different. Comparison to something like the Super Titan (on integer codes)
> would be rather more appropriate.

No meaningful comparison is useful here. The 860 is a floating-point
near-VLIW processor; the 960 is an integer superscalar embedded processor.
The 860 achieves parallelism between floating-point and integer operations
using parallel pipelines; the 960 achieves parallelism between integer and
memory operations by using parallel instruction dispatch.

> claim 66 Native Mips is not a priori any more illegitimate than most other
> vendors native MIPS claims.

In technical forums, I have always been careful to distinguish the cases
where the 960CA could be expected to run at this rate.

> 4) I would disagree about the Mips _Ada_ compiler being better than the
> Intel/Biin 960 Ada compiler (agree wholeheartedly on C/Pascal/FORTRAN).

While the original MIPS/Verdix Ada compiler was not up to snuff with their
C technology, it was still reasonably good. MIPS has released new numbers
(the ones that Mr. Hawkes referred to) based on a new release of their
compiler.

> We found that the performance ratio between the R3000 and the 960XA was much
> wider on [somewhat larger than JIAWG] our own benchmarks in C, Pascal, and
> FORTRAN than in Ada, either JIAWG or some other internal benchmarks.
As I mentioned in a previous article, this ignores the following facts:

1) The 960MC/960XA is the original silicon generation of the 960
architecture, and is wholly unrelated to the 960CA -- you can expect us to
apply the CA's superscalar techniques to other levels of the architecture,
but we're not yet saying when;

2) The benchmarks were run on systems that are in no way comparable: a PC
plug-in board (or possibly the execrable Multibus-I EXV board, or the 16MHz
BiiN systems), versus the MIPS systems with large caches;

3) The current compiler does not attempt any CA parallel-dispatch
optimizations. The 960CA was released with working silicon, but
unfortunately, the compilers are a little behind.

> 5) Based on the code generated for the 960XA for the JIAWG benchmarks, I have
> to say I can't believe in two instructions per clock for the 960CA on this
> set (this is a GUESS only - any data I might have cannot be posted),

As stated in other articles, I would be astonished if you got a sustained
rate of two instructions per clock over the balance of a large benchmark.
Parallel instruction dispatch is much more complicated than this - the idea
is to reduce the overall latency of instructions. I have noted several
times that we expect that parallel instruction dispatch will allow us to
bring our cycles per instruction down to very close to 1 per clock in this
generation of chips, which is substantially better than most other
architectures when you consider that 960 code is 20-30% denser than that of
comparable RISCs.

> 6) If we need to express our religious loyalty, mine is with the R3000.

No surprise here - I'll leave my loyalty as an exercise to the reader.

S. McGeady
Intel Corp.
rogerk@mips.COM (Roger B.A. Klorese) (11/28/89)
In article <5277@omepd.UUCP> mcg@mipon2.intel.com (Steven McGeady) writes:
>we also
>supplied the benchmarks to Michael Sleator of Microprocessor Report

Michael Slater of Microprocessor Report is not Michael Sleator of Stardent.
--
ROGER B.A. KLORESE      MIPS Computer Systems, Inc.
phone: +1 408 720-2939  928 E. Arques Ave.  Sunnyvale, CA 94086
rogerk@mips.COM  {ames,decwrl,pyramid}!mips!rogerk
"I want to live where it's always Saturday." -- Guadalcanal Diary
aburto@marlin.NOSC.MIL (Alfred A. Aburto) (11/29/89)
In article <28107@amdcad.AMD.COM> tim@amd.com (Tim Olson) writes:
>
>To provide a more fair comparison, I requested the benchmark sources
>from Intel, to run on a 30MHz Am29000 board (manufactured by YARC
>Systems). This board uses 2-way interleaved, 100ns DRAM memory for
>instructions and 35ns SRAM for data.

While your comparative results are very interesting, and useful as a point
of reference, I must throw in some words of caution. That is, the
individual results are highly dependent upon the machine-code-generating
efficiency of the various compilers used. In order to achieve really
useful relative comparisons of performance, we must somehow demonstrate
that the compilers generate reasonably optimal (or 'typical', or equally
degraded :-)) machine code for each benchmark. I know, for example, that
the 68030 @ 25 MHz coupled with 'the right stuff' (hardware and compiler)
can achieve roughly 10K (V1.1) Dhrystones/sec (as compared to the 5.5K
result posted).

Al Aburto
aburto@marlin.nosc.mil
tim@nucleus.amd.com (Tim Olson) (11/30/89)
In article <1256@marlin.NOSC.MIL> aburto@marlin.nosc.mil.UUCP (Alfred A. Aburto) writes:
| While your comparative results are very interesting, and useful as a point
| of reference, I must throw in some words of caution. That is, the individual
| results are highly dependent upon the machine code generating efficiency of
| the various compilers used.

Yes, your point is well taken. However, it is usually quite hard to
isolate these kinds of things when benchmarking real systems; about all
you can say is that System X, using compiler Y, achieved these results on
benchmark Z.

| I know for example the 68030 @ 25 MHz
| coupled with 'the right stuff' (hardware and compiler) can achieve roughly
| 10K (V1.1) Dhrystones/sec (as compared to the 5.5K result posted).

Right. This is why I requested the benchmarks from Intel in the first
place. They also showed the Am29000 running much slower than it could, so
I re-ran the benchmarks on a different board to present corrected results.
I reported these, along with Intel's original i960KA, i960CA, and 68030
results, but I didn't feel that it was my place to also present new
numbers for the 68030 -- I invite Motorola to do so.

Our benchmark philosophy has been to include results for common systems
(i.e. a VAX 11/780 running 4.3bsd, and a Sun 3/60), for two reasons:

	1) The results can be easily verified by 3rd parties.

	2) More people have direct experience with these systems and have
	   a good feel for their performance levels.

-- Tim Olson
   Advanced Micro Devices
   (tim@amd.com)
mash@mips.COM (John Mashey) (12/01/89)
In article <5275@omepd.UUCP> mcg@ishark.Berkeley.EDU (Steven McGeady) writes:
>
>In article <31329@winchester.mips.COM>, hawkes@mips.COM (John Hawkes) writes:
>
>> The Atlantic Research Corporation, an independent group, has done some
>> comparisons between the MIPS R3000 (25-MHz) and a 20-MHz 80960 executing Ada
>> programs (the "Common Avionics Processor Ada Benchmark Suite"), and they
>> discovered that the R3000 was usually more than twice as fast on hand-coded
>> programs, and overall was more than five times faster on compiled programs.
>
>The 20MHz 960 referred to here is the Military 80960MC part, *not* the
>960CA. The 960MC hit silicon in 1985 and has not been upgraded since
>then. ARC did not measure the 960CA, even though that would have been
>a more representative measurement. The part measured was running in a
>PC/AT plug-in board. The MIPS system it is being compared to is a full
>system with a significantly-sized off-chip cache.
>
>The 960CA would perform approximately 2x *faster* than the MIPS R3000
>on the handcoded versions of the benchmarks. For compiled code, if
>the code were written in C, we would also perform approximately 2x
>faster. The code in question was compiled with a beta-release Ada
>compiler available last spring. Mr. Hawkes is doing the expected
>in attempting to show MIPS' processor in the best light, but not in
>Mr. Mashey's spirit of "full disclosure". If people are more
>interested in these tests, I will see how much information JIAWG will
>allow to be released, and release it here.

Well, I'm not sure any of this means a whole lot. The tests were done in
April, so 960CAs weren't available, and of course MCs and CAs are
extremely different (people sometimes get confused by the variations in
Intel nomenclature of versions :-). John cited one of the few results
known to be available, as such results are not the easiest things to come
by.
Also, in the spirit of "full disclosure", note that they may have later run it on a 25MHz R3000, but the report I saw used a 16.7MHz R2000 in an M/120. The 960 was an EVA-960KB board, including 1MB of DRAM, and 64KB of 35ns SRAM. "All benchmarks were run out of SRAM". Note, of course, that for almost any chip, a single-board-computer design is FASTER than a larger/expandable design with multiple boards, because you usually can build a tighter memory system. Thus, the fact that something is a plugin board to a PC is irrelevant: the bigger system has to bear performance burdens that a plugin board does not, and having everything in SRAM is likely to be faster than having SRAM cache in front of DRAM. The benchmarks I saw included some floating-point, which I suspect would not have pleased the 960CA.....

Of course, it is hard to say much about any of this, as the ones I saw were all small benchmarks anyway, hence taken with a grain of salt. This certainly relates to the discussion at Microprocessor Forum regarding the difficulties of benchmarking embedded systems, i.e., doing UNIX ones is bad enough, but the embedded world is really a zoo by comparison! I certainly expect the 960CA to be noticeably faster. I also note that even when people are trying hard to do sensible and fair benchmarking, it's easy to say things that are subject to argument. (At the M-F, there was an interesting sequence between the i960 & AMD 29K crews that illustrated this, especially with regard to evaluation boards.) Anyway, peace.

Finally, it probably doesn't matter a whole bunch, at least in terms of what fired all of this 960 vs R3000 stuff up in the first place. As **most** people interested in this area know, the i960 and MIPS were chosen last summer as the 2 architectures of choice by the JIAWG committee (after a "spirited" competition, I believe).
For various reasons having nothing to do with computer architecture, these two choices have tended to ripple through the defense community as 32-bit military RISC standards, which is why, of course, the JIAWG battle was pretty hard-fought in the first place. I'll soon post an analysis of a related effort [the SAE committee's recommendations], as an example relevant to the difficulty of embedded evaluations, and also relevant to one of my favored topics: interpretation and (mis)interpretation of data. In preparation for that you might want to read the Suntech Journal, Autumn issue, page ST8, "SPARC Scores in DARPA/SAE Architecture Test", which reminded me of the SAE.
-- 
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
mash@mips.COM (John Mashey) (12/01/89)
This note:
1) Analyzes the Society of Automotive Engineers (SAE)'s final report "FINAL REPORT, 32 BIT COMMERCIAL ISA TASK GROUP, AS-5, SAE" ..... which came out in September or October, I think.(?)
2) Discusses an article representing the results of that report.

The objective was: "the 32 Bit Commercial ISA Task Group was established to evaluate suitability of existing commercial architectures for use as general purpose processors in avionic and other embedded applications"

The approach was to request applications from any vendor who wanted to propose things, and they got AMD29K, Intergraph Clipper, MIPS R3000, NS32000, Sun SPARC, and Zilog ZS80000. "A set of criteria were established and relative weights set." This was split into:
	60%: functionality of the instruction sets (general)
	20%: capabilities of the current implementation
	20%: performance

What this means is that there were a bunch of criteria, with points assigned by discussion of the committee; i.e., there could be 10 points for some section, and chips might be given anywhere from 2 to 8 points, then normalized to the maximum found; that is, the one with 8 would get 1 point, and the one with 2 would get 2/8 = .25. Totals were:

"Results:
             29000   R3000   32532   SPARC
General      42.88   40.12   42.56   43.40
Current      10.89   13.52   13.65   13.86
Performance   4.90   14.50   10.92   16.00
Total:       58.67   68.14   67.14   73.26

Observations: The most significant point of the results is the very small spread of the point values."

They go on to note that AMD didn't have an Ada compiler available at the time, and so got zapped on performance. They also note that they scaled up the scores for MIPS and SPARC because faster chips became available than what had been benchmarked. They noted the difficulty of establishing objective criteria, saying: "To this end, four meetings and the intervening months were devoted to establishing the criteria against which the ISAs would be evaluated.
As in any other venture, if we were to start over, we would probably produce a somewhat different set of criteria, with results that might be more valuable in their ability to differentiate between the ISAs....It was also noted that when actual evaluation was started, the meaning of several of the criteria were obscure and had to be clarified.

Conclusions: Since these ISAs, and their implementations, are competing in the market place, it is not surprising that none of the ISAs were exceptionally better or worse than any of the others...Due to there not being a typical application, it is not possible to make a definitive general recommendation. In general, any of the ISAs will serve well. Given a specific application, with its own priorities and constraints, one of the implementations will probably serve that purpose better than another."

*************************
Thus, the outcome of the study, clearly stated, was:
a) It's hard to create objective criteria.
b) They cannot make any definitive recommendations of one over another.
*************************

The next section gives the various details of rating points, for the first two categories. These were done by consensus scoring of features. For example:

"Support for cache coherency
AMD 29000       2
MIPS R3000      5
National 32000  2
Sun SPARC       8"

(There are pages of such things; some of the numbers make sense, some are inexplicable to me, but that's OK. This particular one is somewhat inexplicable... Some of the ratings directly contradict the findings of people like JMI, whose C Executive runs on many micros, and which MEASURED things like interrupt-handling and context switching, rather than consensus-estimating them.)

Under "Current implementations", there were good things like:

"How many compatible performance variations are available?
AMD29000        1
MIPS R3000      3
National 32000  5
Sun SPARC       5"

(Interesting: it doesn't matter whether an implementation covers a wide range of performance; what counts is the number of different ones. Note that the .4 difference (5/5 - 3/5) accounts for more than the full difference in the final ratings for this section.....)

Finally, we come to the benchmark section, which contains additional ratings of the type above, plus one section for actual benchmarks. Sun SPARC is given 50 points (24.5 mips), and the R3000 39.1 (19.15 mips). I deleted the NS32532 column for space reasons, and added the data column at the right (which was the Ada compiler, -O, and whose results were available May 1989 and posted shortly thereafter (I think) on the JIAWG bboard by the TI folks). The benchmarks total 2200 Lines Of Code of Ada, and are a mixture of integer and floating point, as follows:

bin_clst     binning & clustering: 135 LOC, integer
boomult      multiplies boolean matrices together, 102 LOC
des1         encryption, 346 LOC
dig_fil      64-bit FFT, 647 LOC
eightqueens  integer, 98 LOC
finite2      char->float conversions, 165 LOC
flmult       float matrix multiplication, 106 LOC
inmult       integer matrix mult, 81 LOC
kalman       flt/integer, matrices, 324 LOC
shell        shell sort, 52 LOC, integer
substrsrch   substring text search, 103 LOC

Now, here is the data presented in the report, plus my addition of the last column:

             VAX 11/780  VAX 11/785  R3000   SPARC   R3000 -O
             DEC         DEC         MIPS    Inc.    MIPS
                                     25MHz   25MHz   25MHz

Times in milliseconds, followed by results in MIPS, normalized to VAX 11/780 = 1 (Note 3):

bin_clst          0.51      0.48      0.05    0.08    0.04
boomult         981       658       246      49.99   29
des1            160       111                13.33
dig_fil      111000      2830        70     106.66   55
eightqueens      30        21         1.58    1.65    1.29
finite2          12         9         0.70    0.71     .60
flmult          765       429        81      65      24
inmult          789       495       104              53
kalman          480       330        57      51.66   27
shell             5         3.1       0.48    0.47     .31
substrsrch       12         9         0.65    0.55     .35

bin_clst          1.00      1.06     10.20    6.38   20.00
boomult           1.00      1.49      3.99   19.62   33.80
des1              1.00      1.44             12.00
dig_fil (note 3)  0.03      1.00     40.43   26.53   51.5
eightqueens       1.00      1.43     18.99   18.18   23.25
finite2           1.00      1.33     17.14   16.90   20.00
flmult            1.00      1.78      9.44   11.77   31.87
inmult            1.00      1.59      7.59           14.89
kalman            1.00      1.45      8.42    9.29   17.78
shell             1.00      1.61     10.42   10.64   16.13
substrsrch        1.00      1.33     18.46   21.82   34.28

Average           0.91      1.41     14.51   15.31   26.35

Average for 33MHz R3000 and 40MHz SPARC:     19.15   25.16

Note 3) dig_fil results are normalized to VAX 11/785 results.

Data sources:
VAX results provided by JIAWG/WPAFB
R3000 results provided by TI
SPARC results provided by Sun
-------------------------------------------------------
-------------------------------------------------------

Now, here's a good exercise for the reader: what do you believe from the data above? What conclusions can you draw, and why? What problems might there be?

1. The benchmarks are very short: remember the times are in milliseconds, that is, numbers as low as 40 microseconds are listed.
	=> benchmarks should be longer

2. There are holes in the data. The des1 entry for MIPS is missing (there was an obscure bug in the Ada front-end at that point). The inmult benchmark for Sun was missing, for reasons I don't know. It is very difficult to compute averages of data where it's missing, because some benchmarks are tougher than others, and if your best or worst benchmark gets left out, it can affect the results.
(This is why it's so nice to have the SPEC benchmarks: it was always a pain getting a complete set of numbers for the MIPS Performance Brief.)
	=> delete the rows that have missing data.

3. The average is an arithmetic average, NOT a geometric mean. (Geometric mean is a better measure for analyzing ratios.)
	=> use geometric mean for averaging ratios.
Also, one of the datapoints is normalized differently (to a 785).

4. If you compute the geometric means, having deleted the two rows that are missing data, you get: MIPS: 12.63, SPARC: 14.36, MIPS (opt): 25.8.

5. Just scaling up clock rates is meaningless; computers don't work that way, because the memory systems are relevant. Suppose you give SPARC a 40MHz clock rate: that gets its geometric mean to 14.36 x 40/25 = 22.98, i.e., not as fast as the MIPS at 25MHz....

6. Of course, the variance of all this data is pretty high: with 9 data points used, the 95% confidence levels for the 3 are:
MIPS:    [7, 23.5]
SPARC:   [10.6, 20.7]
MIPS -O: [18.9, 36.4]

Anyway, this is why the committee carefully said that the overall data didn't mean very much. Of course, the committee report came out AFTER the JIAWG decision was made [i.e., it was irrelevant to that], and this report explicitly did NOT recommend anything as the architecture for military projects.

Lessons:
1) It's hard to evaluate things on paper. I think the committee tried hard, in a really difficult job, but it's real hard...
2) It's always a good idea to look behind the summaries a bit.
3) It's important to understand the difference between numbers that mean something, and numbers that don't. The committee did understand that there was insufficient difference to prove anything.
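The recomputation in points 4-6 can be checked mechanically from the nine table rows that remain after des1 and inmult are deleted. A minimal sketch in Python (anachronistic for this thread, obviously; the t-value of 2.306 for 8 degrees of freedom, and computing the intervals from the arithmetic mean and sample standard deviation, are my assumptions about how the quoted confidence levels were obtained):

```python
import math
from statistics import mean, stdev

# VAX-relative ratios from the SAE table, with the two rows that are
# missing data (des1, inmult) deleted, as suggested in point 2 above.
ratios = {
    "MIPS":    [10.20, 3.99, 40.43, 18.99, 17.14, 9.44, 8.42, 10.42, 18.46],
    "SPARC":   [6.38, 19.62, 26.53, 18.18, 16.90, 11.77, 9.29, 10.64, 21.82],
    "MIPS -O": [20.00, 33.80, 51.5, 23.25, 20.00, 31.87, 17.78, 16.13, 34.28],
}

def geomean(xs):
    # Geometric mean -- the appropriate average for ratios.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

T_975_8DF = 2.306  # Student's t, 95% two-sided, 8 degrees of freedom

for name, xs in ratios.items():
    gm = geomean(xs)
    hw = T_975_8DF * stdev(xs) / math.sqrt(len(xs))
    print(f"{name}: geomean {gm:.2f}, 95% CI [{mean(xs) - hw:.1f}, {mean(xs) + hw:.1f}]")

# Scaling the 25MHz SPARC geometric mean by 40/25, as in point 5:
print(f"'40MHz' SPARC: {geomean(ratios['SPARC']) * 40 / 25:.2f}")
```

This reproduces the figures quoted above: geometric means of 12.63, 14.36, and 25.8; intervals matching [7, 23.5], [10.6, 20.7], and [18.9, 36.4]; and a scaled SPARC mean of about 22.98.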
Now, everyone interprets data a bit differently. Just for fun, let's look at how Frank Yien and Scott Thorpe of Sun interpreted this, in SunTech Journal, Autumn 1989, page ST8, in the article called:
	"SPARC Scores In DARPA/SAE Architecture Test"

(THERE'S BEEN PLENTY OF DATA; NOW WE GET SOME "MARKETING" ANALYSIS; QUIT NOW IF YOU DON'T LIKE THAT STUFF. I INCLUDE THIS BECAUSE I'VE ALREADY GOTTEN QUESTIONS FROM PEOPLE ABOUT IT, AND THE ARTICLE HAS APPARENTLY BEEN GIVEN TO PEOPLE ABROAD TO PROVE THAT SPARC WAS SOMEHOW A U.S. RECOMMENDED STANDARD....)

The article leads off with:
"In a recent comparison of leading 32-bit architectures by DARPA (the Defense Advanced Research Projects Agency), the SPARC architecture was ranked as the top processor architecture for use in military projects."
	Well, it had the highest numbers, but they weren't significant, and the committee said so. Of course, it didn't matter much anyway, because the key decisions were being made somewhere else, and the choices elsewhere [MIPS & Intel] reflected what the large contractors decided in doing serious evaluations.

"Finally, SPARC won the benchmark category, without using the most powerful SPARC implementations available from SPARC manufacturers today. The 80-MHz ECL SPARC implementation was not used in these comparisons;"
	Of course it wasn't; the embedded avionics market is not excited by ECL, and Sun didn't have an ECL system for them to benchmark anyway. So what does ECL SPARC have to do with it?

"instead, the 40-MHz CMOS SPARC implementation was benchmarked and still won easily, since the others have only 33-MHz chips."
	They didn't benchmark a 40-MHz implementation; they benchmarked a 25MHz one and then multiplied by 40/25. Note that no 40MHz SPARC SYSTEM has yet been announced, much less delivered.
It didn't win easily; it won barely @ 25MHz, and if they had reported the correspondingly-optimized MIPS numbers, a 40MHz SPARC (not yet delivered in a system) is seen from the chart above to be SLOWER than a 25MHz R3000 [slower on the average, and slower on 8 out of the 11 benchmarks, the only exceptions being eightqueens, finite2, and shell, hardly the larger/more realistic tests].

"note that military benchmarks are very demanding and closely resemble compute-intensive engineering/simulation environments."
	Military benchmarks can be demanding all right, but some of these are very small benchmarks. Some of these benchmarks are realistic, and some are pretty small; none have any real-time component that I could see. If you believe there's a correlation between these benchmarks and engineering ones, that's good, because MIPS is faster. If you don't believe there's much correlation, that's fine too....

"SPARC is winning the technology battle: It is the frequency leader in both CMOS and ECL technologies and ranks first in independent tests. SPARC hardware and software vendors are well positioned for the future."
	Well, each to their own.... Note that the real war for the 32-bit RISC embedded defense standard seems to have 2 winners, and SPARC wasn't one of them.... It's possible that some people missed this, although it sure made the defense magazines...
-- 
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
khb@chiba.Sun.COM (Keith Bierman - SPD Advanced Languages) (12/01/89)
I must confess to not having followed the SAE stuff closely (I must have spent a good minute or so glancing at the suntech blurb until now :>). Thanks for bringing this article to my attention. When I get back to work (taking next week off :> :>) perhaps I can coax someone to loan me their copy of the report itself.

The Right Honorable J.M. sez:
>This note:
>1) Analyzes the Society of Automotive Engineers (SAE)'s final report
>"FINAL REPORT, 32 BIT COMMERCIAL ISA TASK GROUP, AS-5, SAE" .....
>The objective was: "the 32 Bit Commercial ISA Task Group was established
>to evaluate suitability of existing commercial architectures for use as
>general purpose processors in avionic and other embedded applications"

Does anyone happen to know how/when/why SAE became the arbiter of avionic applications?

>The approach was to request applications from any vendor who wanted to
>propose things, and they got AMD29K, Intergraph Clipper, MIPS R3000,
>NS32000, Sun SPARC, and Zilog ZS80000. "A set of criteria were
>established and relative weights set." This was split into:
>	60%: functionality of the instruction sets (general)
>	20%: capabilities of the current implementation
>	20%: performance
	^^^

So while we performance weenies can carp about their benchmarking techniques and/or data reduction techniques (What, no SVD, covariance and sensitivity analyses? :>), the results of the benchmark section would appear to be of much less entertainment value than their evaluation of the functionality of the instruction sets (I thought all the machines were Turing complete .... so this must have been the area the group spent the most time locked in discussion).

>"Results:
>             29000   R3000   32532   SPARC
>General      42.88   40.12   42.56   43.40
----------------------------------------------
>The most significant point of the results is the very small spread of the
>point values."

Agreed, the difference between the high (SPARC) and the low (MIPS) is only 7.6%.
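The scoring arithmetic behind that spread is easy to restate. A quick Python sketch (the cache-coherency criterion and the General row are taken from the quoted report; the normalize-to-max rule is from Mashey's description of the procedure):

```python
# Normalize-to-max scoring as described in the report: each chip's raw
# points on a criterion are divided by the best score found for it.
def normalize(raw):
    best = max(raw.values())
    return {chip: pts / best for chip, pts in raw.items()}

# The "Support for cache coherency" criterion quoted from the report:
coherency = normalize({"AMD 29000": 2, "MIPS R3000": 5, "NS32000": 2, "SPARC": 8})
# SPARC -> 1.0, MIPS R3000 -> 0.625, the other two -> 0.25

# The "General" category totals, and the high-low spread noted above:
general = {"29000": 42.88, "R3000": 40.12, "32532": 42.56, "SPARC": 43.40}
spread = (max(general.values()) - min(general.values())) / max(general.values())
print(f"spread: {spread:.1%}")  # prints "spread: 7.6%"
```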
But as this section is pure gedanken study, the rationale employed is probably of great interest to this group. If someone has the report handy and is a good typist, posting it would be a service (more entertaining than yet another posting of xstones, or 8queens :>). Since this was a gedanken study there is NO measurement error (or other source of sampling error), so there is little justification for using the same statistical tools we use for measuring benchmarking activities.

It is not surprising that this section was counted more heavily than the other two, as the folks who build missiles, planes, and spacecraft are more concerned about long term issues than the hot chip of the week (galileo, for instance, is 1802 based ... and it is possibly the most complex deep space probe yet flown).

>(There are pages of such things; some of the numbers make sense, some
>are inexplicable to me, but that's OK. This particular one is somewhat
>inexplicable... Some of the ratings directly contradict the findings
>of people like JMI, whose C Executive runs on many micros, and which
>MEASURED things like interrupt-handling and context switching,
	^^^^^^^^

Ah, "data". If you can measure it with a stop watch, it is part of "performance" or "current implementation". Without pondering their report long and deep I can't begin to second guess them; but don't mix that up with the gedanken study, whose intent is to crystal-ball gaze: it takes a lot of years to develop an embedded system, so one tries very, very hard to pick the technology that will be ripe when you are ... typically 5 to 10 years down the road (at least where I came from).

>Under "Current implementations", there were good things like:
>"How many compatible performance variations are available?
>AMD29000 1 MIPS R3000 3 National 32000 5 Sun SPARC 5"
>(Interesting: it doesn't matter whether an implementation covers
>a wide range of performance, what counts is the number of different
>ones.

Yep.
If I want to guess what will really be on the shelf in 10 years, I want the one with the most suppliers ... at least one is likely to have stuck around. 1 is a really bad number (if for no other reason, it usually adds several inches of paper to the documents which must be approved for your project to fly) .... there are of course other reasons.

>I deleted the NS32532 column for space reasons, and added the data column
>at the right (which was the Ada compiler, -O, and whose results were
>available May 1989 and posted shortly thereafter (I think) on the JIAWG
>bboard by the TI folks.

One presumes they left out optimized results for some bizarre reason of their own. I have played with both the Verdix and Telesoft SPARC Ada compilers and both come with optimizers.

....perf figures and analysis ....

>	Of course, the committee report came out AFTER
>the JIAWG decision was made [i.e., it was irrelevant to that],
>and this report explicitly did NOT recommend anything as the architecture
>for military projects.

There are non-military government embedded projects. There are non-JIAWG projects (at least there were when I lived in that universe). As I recall it is rare for such committees to ever come out and say BUY IBM or anything like that :> They come up with fancy numeric ranking schemes to shield themselves from anything that tacky. Also a lot of such projects end up with close figures .... most readers (and writers in that world) rely on the ranking (at least they used to).

>Lessons:
>1) It's hard to evaluate things on paper. I think the committee tried
>hard, in a really difficult job, but it's real hard...

But real necessary. Long term projects require long term thinking.

>2) It's always a good idea to look behind the summaries a bit.

And at the background of the organization(s) involved, past recommendations, projects which relied on or ignored the recommendations (implicit as well as explicit) and how they turned out (including funding battles, etc.)
>3) It's important to understand the difference between numbers that
>mean something, and numbers that don't. The committee did understand
>that there was insufficient difference to prove anything.

Perhaps. I don't know the SAE (other than as the folks who, along with God and Honda, tell me what oil to put into my motorcycles). One must know how to read behind the words.

>Now, everyone interprets data a bit differently. Just for fun, let's look
>at how Frank Yien and Scott Thorpe of Sun interpreted this, in
>SunTech Journal, Autumn 1989, page ST8, in the article called:
>	"SPARC Scores In DARPA/SAE Architecture Test"
>(THERE'S BEEN PLENTY OF DATA; NOW WE GET SOME "MARKETING" ANALYSIS;

Data? The SAE chose a 60-40 split of "that which cannot be measured with a stopwatch or ruler" vs. "let's take our places and do the 100-nanosec dash". Taking the resulting numbers and renormalizing, computing means and etc. isn't data. It's analysis. Since it's not being done to engineer a product, or to elucidate the logic of the report, it's all been "marketing" analysis. Very interesting and entertaining mind you, but this warning is a bit late. It is a very nice rhetorical move though.

>The article leads off with:
>"In a recent comparison of leading 32-bit architectures by DARPA (the Defense
>Advanced Research Projects Agency), the SPARC architecture was ranked
>as the top processor architecture for use in military projects."
>	Well, it had the highest numbers, but they weren't significant,
>	and the committee said so.

How many reports of this nature say otherwise? As far as I know, it's a standard disclaimer like "your mileage may vary".

>	Of course, it didn't matter much anyway, because the key
>	decisions were being made somewhere else, and the choices elsewhere
>	[MIPS & Intel] reflected what the large contractors decided in
>	doing serious evaluations.

Perhaps, perhaps not....
Excerpted from a press release dated Nov 27:

SPEC To Develop Chip Set

SPEC has been contracted by NASA to develop a high-performance GaAs RISC processor to demonstrate the inherent speed and radiation-hardness advantages of GaAs. Multiple GaAs SPARC processors will be included in a demonstration board that SPEC is building to look at GaAs capabilities. The board will include four to eight GaAs SPARC processors, GaAs array communications coprocessors and GaAs floating point coprocessors. Under the agreement with SPEC, Sun can license SPEC's GaAs-based SPARC design with the option to have it manufactured by one of the six semiconductor vendors now manufacturing SPARC microprocessors. Initial samples of the GaAs SPARC processor will be available late in 1990.

Note that this isn't the benchmarking group that ee-times, sun, mips, hp, et al. set up. Instead it is Systems & Processes Engineering Corporation (SPEC), which provides systems engineering services and manufactured products to the aerospace industry, international and U.S. commercial business, and to government agencies. Located in Austin, Tex., SPEC is a privately owned company. This brings up a question ... did the SPEC benchmarking group do a name search?

The array communications coprocessor is a GaAs implementation of a proprietary SPEC inter-processor communications architecture. The coprocessor provides a tightly coupled message/data passing interface between processors in a multi-processor computer system. The floating point coprocessor supports 32-bit and 64-bit operations in a highly pipelined mode with a peak throughput of one floating point operation per cycle. These three components make a complete chip set that SPEC will incorporate into board-level products for the commercial marketplace.

"These are the building blocks necessary to build the high-performance systems of the future. Single- and multiple-processor GaAs workstations will form the high end of performance in the 1990s.
Other technologies will not be able to approach the performance of GaAs," said SPEC President Randolph E. Noster. According to SPEC's chief scientist, Dr. Gary B. McMillian, the GaAs SPARC processor and coprocessors are being designed to operate at 200 MHz, with performance at 800 to 1600 MIPS in a four- to eight-processor implementation. SPEC plans to use a VME/FutureBus implementation, which will provide enough bus bandwidth to support the multiple, high-speed processors.

>"Finally, SPARC won the benchmark category, without using the most powerful
>SPARC implementations available from SPARC manufacturers today. The 80-MHz
>ECL SPARC implementation was not used in these comparisons;"
>	Of course it wasn't; the embedded avionics market is not
>	excited by ECL, and Sun didn't have an ECL system for them to
>	benchmark anyway.

Not necessarily true. Gould/Encore, Elxsi and others have participated in the big iron/high powered embedded system marketplace. And, as the clipping I included above notes, GaAs is of more than passing interest (rad hard, widely used in some battlefield stuff) in some circles. Late breaking events in what used to be the USSR may make some of the high performance/rad hard research less important ... or perhaps it will free up resources to do more serious space research.... but as late as Monday some folks still thought they had funding for such projects.

While I am certainly NOT prepared to say sun has an ECL machine, I find it interesting that John is so positive that we don't. Some folks use the old algorithm "announce, take orders, design, ship, test" but it is increasingly dangerous to rely on it. I'd be willing to bet that, for instance, IBM will be able to ship its RIOS box close to whatever date IBM sez it can after announcement.

..... misc hype from the suntech article <omitted>

>Well, each to their own.... Note that the real war for the 32-bit RISC
>embedded defense standard seems to have 2 winners...
Sometimes battles last longer than one round.

Keith H. Bierman    |*My thoughts are my own. !! kbierman@sun.com
It's Not My Fault   | MTS --Only my work belongs to Sun*
I Voted for Bill &  | Advanced Languages/Floating Point Group
Opus                | "When the going gets Weird .. the Weird turn PRO"

And in this case, my boss probably thinks I'm home asleep, not wasting valuable computer cycles on the net. I may not even be speaking for me.... I meant to go home hours ago.

"There is NO defense against the attack of the KILLER MICROS!" Eugene Brooks

Nor should there be. --khb
henry@utzoo.uucp (Henry Spencer) (12/05/89)
In article <128680@sun.Eng.Sun.COM> khb@chiba.Sun.COM (Keith Bierman - SPD Advanced Languages) writes: >(galileo, for instance, is 1802 based ... and it is possibly the most >complex deep space probe yet flown). It's also a twenty-year-old design built ten years ago. Galileo has waited a *long* time to fly, due to an excruciating series of problems with launch vehicles and upper stages. (In some ways this is a good thing, because a major design defect in Galileo's thrusters was discovered less than a year ago...!) It is definitely the most complex deep-space mission yet flown, but is not representative of technology that would be used today. -- Mars can wait: we've barely | Henry Spencer at U of Toronto Zoology started exploring the Moon. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
brooks@maddog.llnl.gov (Eugene Brooks) (12/05/89)
In article <1989Dec4.171505.22203@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >It's also a twenty-year-old design built ten years ago. Galileo has waited >a *long* time to fly, due to an excruciating series of problems with launch >vehicles and upper stages. (In some ways this is a good thing, because a >major design defect in Galileo's thrusters was discovered less than a year >ago...!) It is definitely the most complex deep-space mission yet flown, >but is not representative of technology that would be used today. You mean not representative of technology that would be used in a mission designed today, built 10 years from now, and flown 20 years from now. Technology changes, but the way such missions are arranged and delayed does not... Please note the lack of a smilie... brooks@maddog.llnl.gov, brooks@maddog.uucp
cooper@hpsrad.enet.dec.com (g.d.cooper in the shadowlands) (12/06/89)
In article <40547@lll-winken.LLNL.GOV>, brooks@maddog.llnl.gov (Eugene Brooks) writes...
>In article <1989Dec4.171505.22203@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>>It's also a twenty-year-old design built ten years ago. Galileo has waited
>>a *long* time to fly, due to an excruciating series of problems with launch
>>vehicles and upper stages. (In some ways this is a good thing, because a
>>major design defect in Galileo's thrusters was discovered less than a year
>>ago...!) It is definitely the most complex deep-space mission yet flown,
>>but is not representative of technology that would be used today.

This reminds me of a similar anecdote about the Apollo program, if I remember correctly. All of the logic was RTL and they couldn't fit sufficient gates into the design, which is why Armstrong had to manually pilot the lunar lander. The original idea was for a computer-controlled landing.

>You mean not representative of technology that would be used in a
>mission designed today, built 10 years from now, and flown 20 years from now.
>Technology changes, but the way such missions are arranged and delayed does
>not... Please note the lack of a smilie...

And they could have used TTL by the time Apollo was ready to go, but it would have required a total redesign of all of the electronics and n billion $s. As a side note, I believe that NASA was the last large scale user of RTL components.

Can you say archaic,
	shades

============================================================================
| He paid too high a price for living | Geoffrey D. Cooper           |
| too long with a single dream.....   | cooper@hpsrad.enet.dec.com   |
|-------------------------------------| business (508) 467-3678      |
| decwrl!hpsrad.enet.dec.com!cooper   | home (617) 925-1099          |
============================================================================
Note: I'm a consultant. My opinions are *MY* opinions.
khb@chiba.Sun.COM (chiba) (12/12/89)
In article <603@ryn.esg.dec.com> cooper@hpsrad.enet.dec.com (g.d.cooper in the shadowlands) writes:
>>> misc comments on the state of NASA tech, from misc. folks
>This reminds me of a similar anecdote about the Apollo program, if I
>remember correctly. All of the logic was RTL and they couldn't fit...
> ....
>
>And they could have used TTL by the time Apollo was ready to go but it
>would have required a total redesign of all of the electronics and n
>billion $s.
>
>As a side note, I believe that NASA was the last large scale user of
>RTL components.
>	Can you say archaic,
>	shades

Also quite reliable. In the lab we can use all sorts of new toys. By First Customer Shipment one expects the worst defects to be known and fixed. A couple of years later the vendor makes a new widget and the old one goes to the new guy/gal on the block. If there is an odd failure mode that takes 5 years to show up, it's not a problem.

It takes a considerable amount of calendar time to make it out to Deep Space (say Jupiter ... which is closer than lots of other nice places to visit). VGR has done real well; a few hw failures, but nothing which caused the mission to fail. It can be argued that it would be better to build cheaper spacecraft quicker, and launch lots. But unless and until we can ensure waves of spacecraft, each one has to be near perfect, or the whole project is a total loss. This implies a much more conservative set of design/management rules. The robotic arm of NASA (JPL, et al) has done a really fine job.

My remark about 1802's was not a joke; it was certainly NOT intended as criticism. I want the latest technology (bugs and all) on my desk. I want something reliable for my file server. I want something safe in my motorcycle (enough risks as it is) and I most certainly want something really safe (and therefore probably old) in anything flying in deep space.

cheers

Keith H. Bierman    |*My thoughts are my own. !!
kbierman@sun.com It's Not My Fault | MTS --Only my work belongs to Sun* I Voted for Bill & | Advanced Languages/Floating Point Group Opus | "When the going gets Weird .. the Weird turn PRO"