irf@kuling.UUCP (Bo Thide') (03/30/91)
By popular demand, here is a follow-up to my earlier posting, with individual SPECmark ratings for the HP Apollo 9000/700 Snake family.

----------------------------------------------------------------------------------------
            gcc     espr.   li      eqntott spice   doduc   nasa7   matrix  fpppp   tomcatv
----------------------------------------------------------------------------------------
HP9000/730  46.5    55.2    50.3    52.6    60.9    64.0    73.7    273.3   107.0   67.4
HP9000/720  35.2    42.5    36.1    40.6    46.9    48.6    58.0    210.0   81.4    52.9
----------------------------------------------------------------------------------------

In John D. McCalpin's (mccalpin@perelandra.cms.udel.edu) terms, the 264 MB/sec bus speed gives the 720 a "Streaming MFLOPS" rating of 11. Here's John's table from his comp.benchmarks posting (I'll let John put in the correct Snake figures; an update would be welcome, John!)

Performance Summary Table - LINPACK
-----------------------------------
                 MFLOPS MFLOPS MFLOPS MFLOPS  Price   MFLOPS/Million$
System           Peak   Max    Lnpk   Stream  $10**6  Max     Stream
-----------------------------------------------------------------------
IBM 550          82     62     27     12      0.13    477     92
MIPS RC6280      24     16     10     8       0.20    80      40
IBM 320          40     29     9      6       <0.02   1450    300
-----------------------------------------------------------------------
Convex C-210     50     44     17     9       ~0.5    88      18
Convex C-240     200    166    26     36      ~1.6    104     23
-----------------------------------------------------------------------
Cray Y/MP-1      333    324    25     150     ~3.0    108     50
Cray Y/MP-8      2664   2144   275    1200    ~16.0   134     75
-----------------------------------------------------------------------
1xIBM 3090E VF   116    71     13     11      ~3.0    24      4
2xIBM 3090E VF   232    141    26*    22      ~5.0    28      4
3xIBM 3090E VF   348    210    39*    33      ~7.0    30      5
-----------------------------------------------------------------------
SGI 4D/310       10     8      6      3
SGI 4D/380       80     52     48*    3       ~0.20   260     15
-----------------------------------------------------------------------
Stardent 3010    32     25     10     6
Stardent 3040    128    77     12     11      ~0.25   308     44
-----------------------------------------------------------------------
IBM 320          40     29     9      6       <0.02   1450    300
8x IBM 320       320    232*   72*    48      0.11    1450*   300
16x IBM 320      640    464*   144*   96      0.21    1450*   300
-----------------------------------------------------------------------
(*) indicates extrapolated figures.

Here's another comparison table:

====================================================================
                 The HP 720: How It Stacks Up

COMPANY/PRODUCT              PRICE    MIPS  SPEC   Price Per  Price Per
                                            marks  MIPS       SPECmark

Hewlett-Packard/             $12,000  57    55.5   $211       $216
HP 9000 Model 720

IBM/                         $9,725   29.5  24.6   $330       $395
RISC System/6000 Model 320

Digital Equipment/           $12,500  27.3  19.9   $458       $628
DECstation 5000 Model 200 MX

Sun Microsystems/            $15,000  28.5  21     $526       $714
SPARCstation 2

Units include monochrome monitors and no disks, except for the IBM 320,
which has a 120-Mbyte disk.
Sources: "HP Apollo 9000 Series 700 Performance Brief", the companies,
and UNIX Today!
=====================================================================

Many have questioned the statement in my earlier posting that Motif 1.2 will be available for the Snakes. Even though I found this stated explicitly in three different official HP publications (HP numbers 5091-0977E, 5091-0979E, and 5091-0980E), I am now inclined to believe that it is a typo. Any official comments from HP on this?

Bo
---
  ^    Bo Thide'--------------------------------------------------------------
 |I|   Swedish Institute of Space Physics, S-755 91 Uppsala, Sweden
 |R|   Phone: (+46) 18-303671. Telex: 76036 (IRFUPP S). Fax: (+46) 18-403100
/|F|\  INTERNET: bt@irfu.se  UUCP: ...!mcvax!sunic!irfu!bt
~~U~~  -----------------------------------------------------------------sm5dfw
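The two "Price Per" columns are plain division, so they are easy to check. Below is a minimal Python sketch with the four rows typed in by hand from the comparison table. The dictionary layout and the function name are only for illustration.

```python
# Rows typed in from the UNIX Today! comparison table: (price in $, MIPS, SPECmark).
TABLE = {
    "HP 9000 Model 720": (12_000, 57.0, 55.5),
    "IBM RS/6000 Model 320": (9_725, 29.5, 24.6),
    "DECstation 5000 Model 200 MX": (12_500, 27.3, 19.9),
    "SPARCstation 2": (15_000, 28.5, 21.0),
}

def price_per(price, rating):
    """Dollars per unit of rating, rounded to whole dollars like the table."""
    return round(price / rating)

for name, (price, mips, specmark) in TABLE.items():
    print(f"{name:30s} ${price_per(price, mips)}/MIPS  "
          f"${price_per(price, specmark)}/SPECmark")
```

Every recomputed value matches the table, so the arithmetic is not in question. The dispute in the follow-ups concerns what the MIPS column is measuring.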
mash@mips.com (John Mashey) (03/31/91)
In article <2004@kuling.UUCP> bt@irfu.se (Bo Thide') writes:
>----------------------------------------------------------------------------------------
>            gcc     espr.   li      eqntott spice   doduc   nasa7   matrix  fpppp   tomcatv
>----------------------------------------------------------------------------------------
>HP9000/730  46.5    55.2    50.3    52.6    60.9    64.0    73.7    273.3   107.0   67.4
>HP9000/720  35.2    42.5    36.1    40.6    46.9    48.6    58.0    210.0   81.4    52.9
>----------------------------------------------------------------------------------------

Note that the above is highly important information if you compare mips-ratings with the SPECint subset (the first 4) and look at the overall behavior patterns, including the effect on matrix300. (This is quite legal, by the way.) However, SPEC has ALWAYS insisted that you see all 10 numbers, and this is one more reminder of the reason: you can be completely misled about the performance pattern of the machine if all you see is a SPECmark.

[Somebody from HP earlier claimed that the SPECmark understated the performance on real programs, not toys .... B.S. As a generic statement, that was simply nonsense, and a misstatement of what a SPECmark (alone) means, and doesn't mean....]

>Here's another comparison table:

PLEASE: do we have to keep seeing cost/MIPS, where everybody computes mips differently? I'll express a little irritation at this: people MIGHT have computed Price/SPECmark and Price/SPECint, where the latter is an intelligent approximation to Price/VAX-mips-integer.

Of course, if I had a machine whose SPECfp is substantially higher than its SPECint, and I were a marketeer:
 a) I'd quote SPECmarks to get the effect from the FP.
 b) I'd quote mips-ratings (based on dhrystone) to get a good-looking price/mips.
 c) I'd avoid quoting a price/SPECint, although that is a hugely more predictive number, and whose data HAD to be available to compute the overall SPECmark.... even if the value is quite good (which it is), because b) is likely to be better....
 d) To see the kinds of distortions introduced, observe that the DS5000 is 20-25% FASTER on SPECint than the IBM 320, but you'd never guess that from this table. (The DS5000 and IBM 320 should have roughly equal $/SPECint, not the DS5000 costing 1.3X+ more.)
 e) In general, it is much easier to compare performance than price, because different vendors use different pricing for the various pieces you may end up needing. As a buyer, you should always compare prices of configurations you're likely to buy and use. Some vendors have very low entry prices, but other things cost more. (I don't know whether this is true or not with the HPs; I haven't seen a price list yet with enough numbers to know. Maybe someone else has.)

>====================================================================
>                 The HP 720: How It Stacks Up
>
>COMPANY/PRODUCT              PRICE    MIPS  SPEC   Price Per  Price Per
>                                            marks  MIPS       SPECmark
>
>Hewlett-Packard/             $12,000  57    55.5   $211       $216
>HP 9000 Model 720
>
>IBM/                         $9,725   29.5  24.6   $330       $395
>RISC System/6000
>Model 320
>
>Digital Equipment/           $12,500  27.3  19.9   $458       $628
>DECstation 5000
>Model 200 MX
>
>Sun Microsystems/            $15,000  28.5  21     $526       $714
>SPARCstation 2

Summary: Snakes look like a good implementation of a good architecture; the FP got a good boost, mostly from the new compilers available in June; the integer performance remains closely on the line of well-implemented single-issue, 1-level cache RISCs, i.e., SPECint = .75-.80X MHz. But please, let's terminate this $/dhrystone-mips trash....
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086
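Mashey's point (d) can be checked using only the table's prices and his stated 20-25% SPECint advantage. The absolute SPECint values cancel out. Here is a Python sketch; the constant and function names are invented for this example.

```python
# Relative $/SPECint, DS5000/200 vs. IBM RS/6000 320, from the table's prices
# and the claim that the DS5000 is 20-25% faster on SPECint.
PRICE_DS5000 = 12_500
PRICE_IBM320 = 9_725

def relative_dollars_per_specint(speed_ratio):
    """($/SPECint of DS5000) / ($/SPECint of IBM 320), where
    speed_ratio = SPECint(DS5000) / SPECint(IBM 320)."""
    price_ratio = PRICE_DS5000 / PRICE_IBM320
    return price_ratio / speed_ratio

for r in (1.20, 1.25):
    print(f"SPECint ratio {r:.2f}: DS5000 costs "
          f"{relative_dollars_per_specint(r):.2f}x as much per SPECint")
```

In both cases the DS5000 comes out only about 3-7% more expensive per SPECint, which is roughly equal, as Mashey says. The table's $/MIPS column implies a 1.39x gap ($458 vs. $330).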
mbk@jacobi.ucsd.edu (Matt Kennel) (03/31/91)
In article <2004@kuling.UUCP> bt@irfu.se (Bo Thide') writes:
>----------------------------------------------------------------------------------------
>            gcc     espr.   li      eqntott spice   doduc   nasa7   matrix  fpppp   tomcatv
>----------------------------------------------------------------------------------------
>HP9000/730  46.5    55.2    50.3    52.6    60.9    64.0    73.7    273.3   107.0   67.4
>HP9000/720  35.2    42.5    36.1    40.6    46.9    48.6    58.0    210.0   81.4    52.9
>----------------------------------------------------------------------------------------

Methinks the Attack of the Killer Micros is shaping up as the Mother Of All Battles.

Matt K
mbk@inls1.ucsd.edu
irf@kuling.UUCP (Bo Thide') (04/01/91)
OOPS! I made a mistake when reading off and entering the individual SPEC ratings for the Cobra (HP Apollo 9000/720). I hope the following table is correct. The full "HP Apollo 9000 Series 700 Performance Brief" report is very comprehensive (41 pages, with a reference list) and contains many more benchmark ratings and comparisons (with, hopefully, fewer errors) than my excerpts posted to the net. You can get copies of this report from your nearest HP office.

----------------------------------------------------------------------------------------
            gcc     espr.   li      eqntott spice   doduc   nasa7   matrix  fpppp   tomcatv
----------------------------------------------------------------------------------------
HP9000/730  46.5    55.2    50.3    52.6    60.9    64.0    73.7    273.3   107.0   67.4
HP9000/720  35.2    42.5    38.1    40.6    46.9    48.6    58.0    210.0   81.4    52.9
                            ^^^^
----------------------------------------------------------------------------------------

So the Cobra integer SPECmark (SPECint) becomes (35.2*42.5*38.1*40.6)^(1/4) = 39.0, exactly as claimed by HP.

Bo
---
  ^    Bo Thide'--------------------------------------------------------------
 |I|   Swedish Institute of Space Physics, S-755 91 Uppsala, Sweden
 |R|   Phone: (+46) 18-303671. Telex: 76036 (IRFUPP S). Fax: (+46) 18-403100
/|F|\  INTERNET: bt@irfu.se  UUCP: ...!mcvax!sunic!irfu!bt
~~U~~  -----------------------------------------------------------------sm5dfw
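Bo's check generalizes. In this SPEC release the summary figures are geometric means of the ten ratios. The first four (gcc, espresso, li, eqntott) give the integer mean and the last six give the floating-point mean. The Python sketch below works under that assumption and uses the corrected 720 row; the helper names are hypothetical.

```python
import math

# SPEC ratios in column order: gcc, espresso, li, eqntott (integer), then
# spice, doduc, nasa7, matrix300, fpppp, tomcatv (floating point).
RATIOS = {
    "HP9000/730": [46.5, 55.2, 50.3, 52.6, 60.9, 64.0, 73.7, 273.3, 107.0, 67.4],
    "HP9000/720": [35.2, 42.5, 38.1, 40.6, 46.9, 48.6, 58.0, 210.0, 81.4, 52.9],
}

def geomean(xs):
    """Geometric mean, computed via logs to avoid overflow in the product."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def spec_summary(ratios):
    """Return (SPECint, SPECfp, SPECmark): first 4 are integer, last 6 FP."""
    return geomean(ratios[:4]), geomean(ratios[4:]), geomean(ratios)

for machine, ratios in RATIOS.items():
    i, f, m = spec_summary(ratios)
    print(f"{machine}: SPECint {i:.1f}  SPECfp {f:.1f}  SPECmark {m:.1f}")
```

For the 720 this gives about 39.0 (integer), 70.2 (FP) and 55.5 overall. The overall figure matches the 55.5 SPECmark in the comparison table. With the original li value of 36.1, the integer mean would have been about 38.5.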
maf@hpfcso.FC.HP.COM (Mark Forsyth) (04/02/91)
>From: mash@mips.com (John Mashey)
>
>PLEASE: do we have to keep seeing cost/MIPS, where everybody
>computes mips differently. I'll express a little irritation at this:
>people MIGHT have computed Price/SPECmark and Price/SPECint,
>where the latter is an intelligent approximation to Price/VAX-mips-integer.

And why not Price/SPECfp also ? Don't workstation customers still care highly about performance on FP applications ? Why did SPEC choose six FP intensive benchmarks if these are not important ? Is the importance of a benchmark suite proportional to how competitive MIPs is on it ? Why not also include I/O, graphics, X11, etc. ?

>Of course, if I had a machine whose SPECfp is substantially higher than
>its SPECint, and I were a marketeer:
>	a) I'd quote SPECmarks to get the effect from the FP
>	b) I'd quote mips-ratings (based on dhrystone) to get a good-looking
>	price/mips.
>	c) I'd avoid quoting a price/SPECint, although that is a hugely
>	more predictive number, and whose data HAD to be available to
>	compute the overall SPECmark.... even if the value is quite good,
>	(which it is) because b) is likely to be better....

And if I were HP, I'd quote (which they have) performance on many other types of workloads in a 42-page performance brief. It includes all of the SPEC components as well as Workstation Labs' Khornerstones, X windows (x11perf 1.2), networking, disk I/O, Ansys (scientific applications), graphics, and the "toy" benchmarks (which are included because customers and press ask for them, and think you are hiding something if you try to tell them they're not important). The $/anything comparisons you are so miffed about come from the press (UNIX Today, I believe), not HP documentation.
>
>Summary: Snakes look like a good implementation of a good architecture;
>the FP got a good boost, mostly from the new compilers avail in June;

Even without the new compilers, the FPspec/MHz (0.94) is about 30% higher than that of the DECstation 5000/200 (0.74), and absolute FP spec is about 2.6 times higher. The new compilers raise these to 43% and 4.9 times, respectively.

>the integer performance remains closely on
             ^^^^^^^^^^^
>the line of well-implemented single-issue, 1-level cache RISCs,
>i.e., SPECint = .75-.80X MHz.

The integer "performance" is 2.7 times that of the 25 MHz R3000 (DEC 5000/200). Speed is an extremely important high-performance design technique. If you normalize it out, you end up with a truly meaningless indicator of performance, and one that users shouldn't and don't care about.

>
>-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>

- Mark Forsyth
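Mashey's rule of thumb and Forsyth's objection can both be read off the same numbers. The Python sketch below divides each Snake's integer geometric mean by its clock rate. The clock rates (50 MHz for the 720, 66 MHz for the 730) are assumptions here, because the thread states only the 66 MHz figure.

```python
import math

def geomean(xs):
    """Geometric mean of positive ratios."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# (gcc, espresso, li, eqntott ratios, ASSUMED clock in MHz).
# The 720 row uses the corrected li value of 38.1.
INT_RATIOS = {
    "HP9000/720": ([35.2, 42.5, 38.1, 40.6], 50.0),
    "HP9000/730": ([46.5, 55.2, 50.3, 52.6], 66.0),
}

for machine, (ratios, mhz) in INT_RATIOS.items():
    specint = geomean(ratios)
    print(f"{machine}: SPECint {specint:.1f} at {mhz:.0f} MHz "
          f"-> {specint / mhz:.2f} SPECint/MHz")
```

Both machines come out near 0.77-0.78 per MHz, inside the .75-.80 band, yet their absolute SPECint differs by roughly the clock ratio. The per-MHz figure says something about the design. The absolute figure is what a user sees.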
rcd@ico.isc.com (Dick Dunn) (04/03/91)
maf@hpfcso.FC.HP.COM (Mark Forsyth) writes:
> >From: mash@mips.com (John Mashey)
> >...people MIGHT have computed Price/SPECmark and Price/SPECint,...
> And why not Price/SPECfp also ? Don't workstation customers still care
> highly about performance on FP applications ? Why did SPEC choose six
> FP intensive benchmarks if these are not important ?...

You're begging your own question. If 6 out of 10 of the SPEC benchmarks are FP-intensive, SPECmark itself is a significant indicator of FP performance. In fact, the SPEC suite has been criticized for being too heavily weighted toward FP as it is. While that's arguable at best, it does seem that a hot FP processor is going to have a hot SPECmark figure unless the integer performance is terminally lame.

I suppose that "SPECfp" is interesting if you really want to focus tightly on FP performance, but the situation is not symmetric between carving int performance out of SPECmark and carving FP performance out:
 - SPECmark is already weighted toward FP.
 - After you set aside "balanced" (in the SPECmark sense) int-vs-FP usage of workstations, mostly-int use is more common than mostly-FP use.

ALL of the $/CPUbenchmark figures are bogus, but that's another flame.

> ...Is the importance
> of a benchmark suite proportional to how competitive MIPs is on it ?

Did you mean MIPS? (I.e., was that just a gratuitous slam at Mash?)

> Why not also include I/O, graphics, X11, etc. ?

Well, if you're interested in X, you're not much interested in performance. Oops! (ducks quickly; large, heavy, dull object [box of X manuals?] slams into the wall behind him...:-)

Seriously, what are we talking about--CPU performance or system performance? For CPU performance, there's too much else in the way for I/O performance to be a useful indicator. Graphics and X (interesting distinction!) may or may not be relevant to CPU performance, depending on how much of the work is left to the CPU vs. how much other hardware helps or hinders.
Of course, it would be *more* useful to look at overall system performance, in which case I/O and graphics performance measures are welcome (in fact, necessary) additions. Just be sure that everybody changes the channel at the same time if we're going to start talking overall performance.

> The $/anything comparisons you are so miffed about come from the press
> (UNIX Today, I believe), not HP documentation.

The ones he was complaining about seemed to have been posted to this newsfroup. Let's separate complaints about HP (which I don't think I've seen much--I think comments have generally been complimentary) from complaints about bogus numbers (which are deserved regardless of source).
-- 
Dick Dunn     rcd@ico.isc.com -or- ico!rcd       Boulder, CO   (303)449-2870
   The Official Colorado State Vegetable is now the "state legislator".
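Dunn's remark that SPECmark is "already weighted toward FP" can be made exact. The SPECmark is the geometric mean of all ten ratios, four of them integer and six FP, so it equals SPECint^0.4 x SPECfp^0.6. Below is a small Python sketch of that identity; the function names are illustrative.

```python
import math

def geomean(xs):
    """Geometric mean of positive ratios."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def specmark_from_parts(specint, specfp, n_int=4, n_fp=6):
    """SPECmark rebuilt from the two sub-means.

    The geometric mean of all n_int + n_fp ratios equals
    SPECint**(n_int/n) * SPECfp**(n_fp/n), i.e. FP carries 60% of the weight."""
    n = n_int + n_fp
    return specint ** (n_int / n) * specfp ** (n_fp / n)

# Sensitivity: double one half of the suite while holding the other fixed.
print(f"2x SPECfp  -> SPECmark x {2 ** 0.6:.2f}")
print(f"2x SPECint -> SPECmark x {2 ** 0.4:.2f}")
```

Doubling FP performance alone multiplies the SPECmark by about 1.52. Doubling integer performance alone multiplies it by only about 1.32.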
mash@mips.com (John Mashey) (04/03/91)
In article <8840021@hpfcso.FC.HP.COM> maf@hpfcso.FC.HP.COM (Mark Forsyth) writes:
>>From: mash@mips.com (John Mashey)
>>
>>PLEASE: do we have to keep seeing cost/MIPS, where everybody
>>computes mips differently. I'll express a little irritation at this:
>>people MIGHT have computed Price/SPECmark and Price/SPECint,
>>where the latter is an intelligent approximation to Price/VAX-mips-integer.
>
>And why not Price/SPECfp also ? Don't workstation customers still care
>highly about performance on FP applications ? Why did SPEC choose six
>FP intensive benchmarks if these are not important ? Is the importance
>of a benchmark suite proportional to how competitive MIPs is on it ?
>Why not also include I/O, graphics, X11, etc. ?

To short-circuit this before it gets out of hand (maf and I have had side-conversations already, calming things down, I think), here is where I think all of this came from:
 a) It was the understanding of various members of SPEC that if you published SPECmarks, the full-disclosure form would be available at the same time, in particular all 10 numbers. Most people thought that the license says exactly that, but it says something slightly vaguer.
 b) In general, SPEC members, if they provided a SPECmark, also provided the 10 numbers. In particular, various members of SPEC, from day 1, have been adamant in NOT signing up to anything that didn't have full disclosures of whatever sort the full-disclosure consensus agreed on. (me, among others :-) For instance, some companies regularly send out the SPEC form along with the initial press release on a product.
 c) Unfortunately, not everybody at HP was involved in all of the discussions to this effect, and the license wording is not as explicit as people thought it had been. In addition, in large companies, the SPEC members do not have as much direct influence on marketing as they might in smaller ones.
As a result, although CLEARLY not intended by the HP SPEC folks, and DEFINITELY not signed up to by many of the people who've helped SPEC exist, there was an important period of time during which:
 a) The analysts and press had the SPECmark number and a MIPS-rating, because that is what they'd been given.
 b) They beat up unmercifully on various HP competitors. Since all they had was MIPS and SPECmark, that's what they used, and whether intended or not, those 2 numbers alone are misleading.
 c) When a press person or analyst called you up, there was no rational reply of any sort possible, because you couldn't get the 10 individual numbers. I.e., you were handed a situation in which what you'd thought you'd agreed to was a certain kind of disclosure (to avoid the problems of single-number things), and you'd supported that, but you were now getting hammered with exactly the thing you thought you hadn't agreed to.... (It was bad enough that the numbers are quite good; it was worse not knowing the shape of the curves.)
 d) The numbers are now available, and one can do whatever analysis makes any sense on them, and the 10 numbers are MUCH more meaningful than 1 number, especially when there is a high variance. Both integer and FP are important; it is important to distinguish them, not mix them together; it is important to see all 10 numbers to see the variation pattern. $/SPECmark is fine, so is $/SPECfp. However, using either of those two, and then using $/dhry-mips instead of $/SPECint, is uncool.... (And I know it doesn't originate with the SPEC folks, HP or otherwise. Almost EVERYBODY involved in SPEC has "interesting" times "educating" their marketing groups....)
 e) Anyway, SPEC is working to make sure the rules are clarified in everybody's mind, and it's no big deal in the long run. HP usually does an excellent, credible job documenting its performance; the current performance document is fine.
The main issue here was the gap in time between when SPECmark numbers were given out and when the full 10-number disclosure was available, and we just have to be clearer about the rules of the game.... This is especially important as we move from the (relatively) easy CPU benchmarks towards more complex systems benchmarks. ...well, back to work... -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash DDD: 408-524-7015, 524-8253 or (main number) 408-720-1700 USPS: MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086
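One benchmark can move the single number by a surprising amount. To see this, take the corrected 720 ratios and drop matrix300. The Python sketch below does exactly that; the nine-benchmark mean is only a sensitivity check, not a SPEC metric.

```python
import math

def geomean(xs):
    """Geometric mean of positive ratios."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

NAMES = ["gcc", "espresso", "li", "eqntott", "spice",
         "doduc", "nasa7", "matrix300", "fpppp", "tomcatv"]
HP720 = [35.2, 42.5, 38.1, 40.6, 46.9, 48.6, 58.0, 210.0, 81.4, 52.9]

full = geomean(HP720)
without_matrix = geomean([r for n, r in zip(NAMES, HP720) if n != "matrix300"])
spread = max(HP720) / min(HP720)   # best ratio over worst ratio

print(f"SPECmark (all 10):         {full:.1f}")
print(f"geomean without matrix300: {without_matrix:.1f}")
print(f"max/min spread:            {spread:.1f}x")
```

With all ten ratios the mean is about 55.5. Without matrix300 it drops to about 47.9, and the best ratio is roughly six times the worst. A lone SPECmark hides exactly this kind of variance.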
pf@diab.se (Per Fogelstr|m) (04/03/91)
In article <8840021@hpfcso.FC.HP.COM> maf@hpfcso.FC.HP.COM (Mark Forsyth) writes:
>>From: mash@mips.com (John Mashey)
>
>>the line of well-implemented single-issue, 1-level cache RISCs,
>>i.e., SPECint = .75-.80X MHz.
>
>The integer "performance" is 2.7 times the 25 MHz R3000 (DEC 5000/200).
>Speed is an extremely important high performance design technique. If
>you normalize it out you end up with a truly meaningless indicator of
>performance and one that users shouldn't and don't care about.
>
>>
>>-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
>
>- Mark Forsyth

I think Mashey's statement is correct. It's not meaningless to compare normalized SPECint, because it gives You a good indicator of how well the architecture is implemented. The proof is the SPECint figure for SPARC, which is very low compared to others. You might guess why. My conclusion from this is that even if the clock frequency is increased for the SPARC chips, they will not achieve the same performance as others built with the same chip technology, i.e., the headroom is lower.

If we could push the clock frequency for the R3000 up to 66 MHz, it would, if we scale the results, perform equally well with the HP9000/730. Of course this only shows that the architectures are performing about the same, though there are no 66 MHz R3000s, only 40 MHz. But then there is the R4000......

Well, this was the technical point of view, and it would not help the customers that want the boxes today, but I'm a design engineer.

So if I wanted the best price/performance solution for my FP-intensive application today, I would probably choose an HP9000/730.

Ok, You are welcome to flame, but send marketing trash to /dev/null.
-- 
Per Fogelstrom, Diab Data AB
SNAIL: Box 2029, S-183 02 Taby, Sweden
ANALOG: +46 8-7680660
EMAIL: mcsun!sunic!diab!pf or pf@diab.se
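Per's thought experiment is simple linear clock scaling. The Python sketch below makes two assumptions. First, the R3000 SPECint is derived from Forsyth's "2.7 times" claim rather than taken from a SPEC report. Second, the 730 runs at 66 MHz.

```python
# Naive linear clock scaling of a 25 MHz R3000 (DECstation 5000/200).
HP730_SPECINT = 51.0                  # ~geometric mean of the 730's integer ratios
R3000_SPECINT = HP730_SPECINT / 2.7   # ASSUMPTION: implied by "2.7 times the R3000"

def scale_linearly(specint, from_mhz, to_mhz):
    """Assumes SPECint is exactly proportional to clock rate; this ignores
    memory latency, which does not shrink when the CPU clock goes up."""
    return specint * to_mhz / from_mhz

scaled = scale_linearly(R3000_SPECINT, 25.0, 66.0)
print(f"implied R3000 SPECint at 25 MHz: {R3000_SPECINT:.1f}")
print(f"linearly scaled to 66 MHz:       {scaled:.1f}  (730: {HP730_SPECINT:.1f})")
```

The naive estimate lands within a few percent of the 730, which is all Per claims. It says nothing about whether such a part could actually be built, and that is the point the later replies dispute.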
mash@mips.com (John Mashey) (04/04/91)
In article <569@diab.se> pf@diab.UUCP (Per Fogelstr|m) writes:
>In article <8840021@hpfcso.FC.HP.COM> maf@hpfcso.FC.HP.COM (Mark Forsyth) writes:
>>>From: mash@mips.com (John Mashey)
>>
>>>the line of well-implemented single-issue, 1-level cache RISCs,
>>>i.e., SPECint = .75-.80X MHz.
>>The integer "performance" is 2.7 times the 25 MHz R3000 (DEC 5000/200).
>>Speed is an extremely important high performance design technique. If
>>you normalize it out you end up with a truly meaningless indicator of
>>performance and one that users shouldn't and don't care about.
>I think Mashey's statement is correct. It's not meaningless to compare
>normalized SPECint because it gives You a good indicator of how well
>the architecture is implemented. The proof is the SPECint figure for
>SPARC, which is very low compared to others. You might guess why.

As maf points out, (anything)/MHz is fairly irrelevant to end users. Of course, this is a newsgroup on computer architecture, and my note was a successor to some earlier postings (I don't recall from whom) about SPECint/MHz. Such numbers ((actual benchmark)/MHz) are of interest to architects, of course, since:
 a) SPECint/MHz is probably as close as you can get to a measurable integer CPI, where you can get real numbers for lots of machines. [Because it is truly difficult to get real Cycles-Per-Instruction numbers, unless you are the computer architect with all of the real, likely-to-be-proprietary tools.] Many arguments, real or bogus, revolve around CPIs, hence it is useful to actually have some realistic metrics to compare on. Of course, it is also useful to look at SPECmarks/MHz, SPECfp/MHz, LINPACK MFLOPS/MHz, etc., i.e., as long as the benchmark is something you think is meaningful.
 b) An open issue (based on machines on the market) is whether or not you can get SPECint/MHz much better than .75 to .85. It is CLEAR that you can get SPECmark and SPECfp better.
Somewhere in Hennessy and Patterson it talks about available low-level parallelism, and differences thereof regarding types of code. But again, end users should usually care less ... even I, in "end user mode", have been known to purchase computers with processors in them whose SPECint/Mhz would probably be .1 or so :-) -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash DDD: 408-524-7015, 524-8253 or (main number) 408-720-1700 USPS: MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086
mlord@bwdls58.bnr.ca (Mark Lord) (04/04/91)
In article <....HP.COM> maf@hpfcso.FC.HP.COM (Mark Forsyth) writes:
<>From: mash@mips.com (John Mashey)
<>
<>the integer performance remains closely on
<>the line of well-implemented single-issue, 1-level cache RISCs,
<>i.e., SPECint = .75-.80X MHz.
<
<The integer "performance" is 2.7 times the 25 MHz R3000 (DEC 5000/200).
<Speed is an extremely important high performance design technique. If
<you normalize it out you end up with a truly meaningless indicator of
<performance and one that users shouldn't and don't care about.
From a comp.arch point of view, I think the .75-.80X MHz number is of
more value than architecturally irrelevant "my clock is double what yours is"
flame-fests.
So, can we now get back to discussing how best to advance the state-of-the-art
in computer architectures... ?
--
MLORD@BNR.CA Ottawa, Ontario *** Personal views only ***
clc5q@madras.cs.Virginia.EDU (Clark L. Coleman) (04/06/91)
In article <569@diab.se> pf@diab.UUCP (Per Fogelstr|m) writes:
>
>I think Mashey's statement is correct. It's not meaningless to compare
>normalized SPECint because it gives You a good indicator of how well
>the architecture is implemented.

I believe that the main point is not being responded to here. If I complicate the architecture with all kinds of hardware that makes the critical path longer, and add instructions to the ISA that slow the cycle down, then it is an architectural issue, not just a sign of a poor implementation.

I think the real question for comp.arch is: given the same semiconductor process (e.g. any particular current CMOS 1.0 micron process), implement various architectures in that process as well as you can; then what is the resulting performance? There are system-level issues that I am leaving out of the equation, I realize, but at least the question is infinitely more relevant than "normalized SPECint" comparisons.

Let's try a hypothetical. The JCN computer company is designing a new workstation that is specifically geared toward performing well on the SPECint benchmarks. Two competing design teams develop prototypes. (JCN has too much cash to burn, apparently.) One team comes up with a prototype that is implemented in the company's own 1.0 micron CMOS process; it runs at 50 MHz and achieves a SPECint of 40. The other team, comprised of blithering idiots, comes up with a chip that interprets high-level code in a terribly complex circuit with such long critical paths that it can only run at 1 MHz in 1.0 micron CMOS. It achieves a SPECint of 0.9, however, giving it a better SPECint/MHz ratio than the other processor.

Naturally, the company chooses to market the slower processor: it has a provably "superior" architecture, based on the all-important SPECint/MHz ratio, and that ratio will be great advertising fodder.
The more than 40-fold performance disadvantage must just be "implementation", not bad architecture, according to the marketing MBA genius who chooses the slower chip, "because if the first chip had only a 1 MHz clock, it would have poorer benchmarks than the second, and clock rate is just an implementation matter." Unfortunately, the team that designed the first chip leaves and starts its own company, kicking the heck out of JCN in the marketplace. The MBA then lays off a few technical staff and decides they need a bigger advertising budget. THE END.

Seriously, I cannot believe I am reading so many people claim that MHz is 100% implementation, 0% architecture.

>If we could push the clock frequency for the R3000 up to 66 MHz, it would,
>if we scale the results, perform equally well with the HP9000/730.

And will the HP9000/730 sit still while you do that? Can the R3000 be implemented TODAY in HP's technology at 66 MHz? Does anybody really believe that?

>Well, this was the technical point of view, and it would not help the
>customers that want the boxes today, but I'm a design engineer.

"Technical point of view"?? It is just a total misunderstanding of the computer architecture issues that constrain implementation and affect the clock speed.

>
>So if I wanted the best price/performance solution for my FP-intensive
>application today, I would probably choose an HP9000/730.

Yes, and you would say, "I sure wish JCN could improve their implementation so I could get their superior architecture instead. Darn! Why don't they do a better job of implementing over there?" And as time went by, their implementation would get better and faster, but so would the first machine. And the ratio would remain approximately 40 to 1 in favor of the first machine until it started to run into physical limitations that have to do with clock speed (bus noise, etc.). But it seems highly improbable that the JCN turkey will ever catch up.

I have been watching HP's products for a decade.
It always seemed to me that they were lagging the market in semiconductor implementation. Only their less efficient HP 9000/500 architecture got the benefit of their best NMOS process when it became available --- other machines started getting the same process several YEARS later (there must be some horrible stories of corporate inertia lurking around there.) I said to myself years ago that if HP were to implement the HP-PA stuff in state of the art semiconductors, they would blow away the competition. Prophecy fulfilled in 1991; the competitors ARE using a process that is just as good as HP's, other postings notwithstanding; and the detractions I see on this thread bespeak a lack of architecture understanding, or commercial envy and axe-grinding in a few cases. Put up or shut up, workstation vendors: Tell us what design rules would be required to achieve HP's Specint numbers. Most of you have the numbers on your "future evolution" sheets already. I suppose it is proprietary info, of course; just don't keep posting bull about how HP's success is just semiconductor process. When will we see a 66MHz SPARC ? When we have 0.6 micron processes for it? Let's be honest and cut the *****. As for Per, my comments are directed not at you, but at the corporate axe-grinders and assorted sideline sour grapes throwers. ----------------------------------------------------------------------------- "The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offence." E.W.Dijkstra, 18th June 1975. ||| clc5q@virginia.edu (Clark L. Coleman)
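Coleman's hypothetical is easy to tabulate. The per-MHz metric and the absolute metric rank his two invented chips in opposite order. The Python sketch below uses only the figures from his post, which are made-up numbers, not measurements.

```python
# Coleman's two hypothetical JCN prototypes (his invented figures).
chips = {
    "fast, simple design": {"mhz": 50.0, "specint": 40.0},
    "slow, complex design": {"mhz": 1.0, "specint": 0.9},
}

for name, c in chips.items():
    print(f"{name}: SPECint {c['specint']:.1f}, "
          f"SPECint/MHz {c['specint'] / c['mhz']:.2f}")

ratio = chips["fast, simple design"]["specint"] / chips["slow, complex design"]["specint"]
print(f"absolute performance ratio: {ratio:.1f}x")
```

The complex chip wins on SPECint/MHz (0.90 vs. 0.80) but loses by more than 40x on SPECint. His argument rests on that gap: the clock rate a design can reach in a given process is partly an architectural outcome.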
sgreene@leland.Stanford.EDU (Spencer Greene) (04/06/91)
In article <12914@goofy.Apple.COM> russell@apple.com (Russell Williams) writes: >To summarize, John Mashey has spoken of SPECint / MHz as a measure of >architecture independent of absolute MHz, while Mark Forsyth has indicated >that ability to build a higher MHz implementation given the same process >technology is also a relevant measure of an architecture, stating that >HP's process technology is typical. . . > >The unanswered question seems to be: can the HP architecture be designed >with fewer gate delays than others? The issue, as has been pointed out, is one of headroom. Even if you agree that HP's cpu clock is *entirely* due to architecture, you cannot conclude that HP has the same ability to linearly scale clock rate as other vendors. For example, it is more likely that MIPS can go from 33 Mhz to 66 than that HP can go from 66 to 132. CPU issues aside, everyone is using essentially the same SRAMs, board traces, etc., and Amdahl's Law says that after cranking the CPU clock rate to a certain point these other issues will begin to dominate. Not that they *can't* be made to keep up in a 132-Mhz system, just that the expense is prohibitive compared to other solutions. Of course, this does not detract from the fact that HP has developed today a system which tests some of the purported limits of 1-processor RISC, while other vendors talk of unexploited headroom. However, it does suggest that to evaluate potential in a product line beyond the short term, we should look at the sophistication of the vendor's multiprocessor hardware, and (perhaps more importantly) software. Small wonder that SMP was all the rage at recent trade shows. ----------- Spencer Greene sgreene@leland.stanford.edu
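Greene's Amdahl's Law argument can be sketched with a two-part time model. The CPU-bound part of run time scales with the clock, while time spent waiting on SRAMs and board traces does not. In the Python sketch below, the memory-bound fractions are hypothetical, not measurements of any machine.

```python
def speedup_from_clock(clock_factor, memory_fraction):
    """Amdahl's Law for a clock increase: the CPU-bound fraction of run time
    shrinks by clock_factor, the memory-bound fraction (measured at the old
    clock) stays the same. Returns old_time / new_time."""
    cpu_fraction = 1.0 - memory_fraction
    return 1.0 / (cpu_fraction / clock_factor + memory_fraction)

# Hypothetical memory-bound shares of run time.
for mem in (0.0, 0.1, 0.2, 0.3):
    print(f"memory-bound {mem:.0%}: 2x clock -> {speedup_from_clock(2.0, mem):.2f}x")
```

Even a 20% memory-bound share turns a 2x clock increase into about 1.67x overall. This is why headroom depends on more than the CPU itself.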
preston@ariel.rice.edu (Preston Briggs) (04/06/91)
ram@shukra.Eng.Sun.COM (Renu Raman) writes:
>Does anybody know (exactly how) HP was able to improve matrix-300's
>spec-ratio from 30-odd to 200+ - Just curious....

Well, matrix-300 is normally a cache buster. With their rather large D-cache, it shouldn't be nearly as severe, though the problem is still larger than the cache used for the tests (more than 2 Meg vs. 128K or 256K). The multiply-accumulate instruction surely helps too. Also, it's possible to get a factor of 2 to 3 (or more?) by inlining and reworking the loops extensively. If their compilers can do this, there will be many happy customers.

Preston Briggs
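Briggs' "more than 2Meg vs 128 or 256K" comparison is simple footprint arithmetic. A sketch, assuming matrix300 works on three 300x300 matrices of 8-byte (double-precision) elements; the post does not describe the benchmark's data layout, so the matrix count and element size are assumptions.

```python
def matrix_bytes(n: int, elem_bytes: int = 8) -> int:
    """Storage for one n x n matrix with elem_bytes-sized elements."""
    return n * n * elem_bytes


def working_set_bytes(n: int, num_matrices: int, elem_bytes: int = 8) -> int:
    """Combined storage for num_matrices n x n matrices."""
    return num_matrices * matrix_bytes(n, elem_bytes)


if __name__ == "__main__":
    # ASSUMPTION: three 300 x 300 double-precision matrices.
    ws = working_set_bytes(300, 3)
    for cache_kb in (128, 256):
        ratio = ws / (cache_kb * 1024)
        print(f"{ws} bytes vs {cache_kb}K D-cache: {ratio:.1f}x the cache size")
```

Under these assumptions the data is roughly 8 to 16 times the cache size, so a traversal that does not reuse cache lines promptly keeps evicting them.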
ram@shukra.Eng.Sun.COM (Renu Raman) (04/06/91)
Does anybody know (exactly how) HP was able to improve matrix-300's spec-ratio from 30-odd to 200+ - Just curious.... renu raman -- -------------------------------- Renukanthan Raman ARPA:ram@sun.com M/S 16-11, 2500 Garcia Avenue, TEL :415-336-1813 Sun Microsystems, Mt. View, CA 94043
maf@hpfcso.FC.HP.COM (Mark Forsyth) (04/08/91)
<To summarize, John Mashey has spoken of SPECint / MHz as a measure of
<architecture independent of absolute MHz, while Mark Forsyth has indicated

This is a reasonable measure of architectures if the speeds are similar. It is more difficult to achieve low CPI at higher speeds, since memory speed doesn't always scale with processor speed, producing higher cache and TLB miss penalties (measured in cycles). For comparing 33 and 66 MHz designs it probably doesn't make a big difference, except in some of the more cache-intensive benchmarks (like some of the SPEC FP benchmarks).

<that ability to build a higher MHz implementation given the same process
<technology is also a relevant measure of an architecture, stating that
<HP's process technology is typical. Mashey countered that HP's technology
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

HP's CMOS technology is excellent and very well suited to high-speed designs. The other poster who made that statement has been corrected. The only "typical" thing about it is the 1.0 micron channel lengths.

<is faster than average.
<
<The unanswered question seems to be: can the HP architecture be designed
<with fewer gate delays than others? I think there are 33 MHz R3000s and
<66 MHz Snakes. Is this entire difference attributable to process technology?
<When superscalar implementations become common, a similar question will
<arise for them, and it's not obvious to me that the architectures which
<allow the fewest gate delays and best CPI numbers in a single-issue design
<will still have the advantage in a multiple-issue design.

Speed is a function of circuit design methodology, processor partitioning, chip floorplanning, feature set, etc., as well as technology and architecture. All of these are interrelated, and you have to review all of the choices with every generation to determine the best way to optimize cost, performance, and schedule for each specific product. Every RISC architecture is capable of higher speeds or superscalar implementations.
It will be interesting to compare all of the choices for the next generation. - Mark Forsyth (my opinions, not HP's) maf@hpesmaf
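Forsyth's point, that cache and TLB miss penalties grow when memory does not keep pace with the processor clock, can be illustrated with the usual effective-CPI model. This is a minimal sketch: the base CPI, miss rate, and 300 ns miss latency are hypothetical numbers, not measurements of any machine discussed in the thread.

```python
def effective_cpi(base_cpi: float, misses_per_instr: float,
                  miss_latency_ns: float, clock_mhz: float) -> float:
    """Average cycles per instruction when each miss costs a fixed latency
    in nanoseconds, i.e. memory speed does not scale with the CPU clock."""
    cycle_ns = 1000.0 / clock_mhz                # one clock period in ns
    penalty_cycles = miss_latency_ns / cycle_ns  # same latency, more cycles
    return base_cpi + misses_per_instr * penalty_cycles


def native_mips(cpi: float, clock_mhz: float) -> float:
    """Millions of instructions per second for a given CPI and clock."""
    return clock_mhz / cpi


if __name__ == "__main__":
    # Hypothetical workload: base CPI 1.2, 0.02 misses/instruction, 300 ns per miss.
    for mhz in (33.0, 66.0):
        cpi = effective_cpi(1.2, 0.02, 300.0, mhz)
        print(f"{mhz:.0f} MHz: CPI {cpi:.2f}, {native_mips(cpi, mhz):.1f} native MIPS")
```

Doubling the clock doubles the miss penalty in cycles, so throughput rises by less than 2x; with a miss rate near zero the gap nearly vanishes, which is why the effect shows up mainly in cache-intensive benchmarks.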
mjs@hpfcso.FC.HP.COM (Marc Sabatella) (04/09/91)
> [re: matrix300 performance on Snakes ]
>Also, it's possible to get a factor of 2 to 3 (or more?)
>by inlining and reworking the loops extensively.
>If their compilers can do this, there will be many happy customers.

matrix300 has a big array deliberately traversed in the "wrong" order. By simply having the compiler transform the loops from row-major to column-major traversal, you can get huge wins. I believe this is what happened.

This is not my favorite benchmark. However, even if you throw out the spiffy matrix300 number, the Snakes' SPECmarks look mighty good.
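The effect Sabatella describes can be seen by listing the memory offsets a doubly nested loop touches. Fortran stores arrays column-major, so element A(i, j) sits at offset i + j*n_rows (0-based here). The sketch below is only an illustration of loop interchange, not HP's actual compiler transformation; a real compiler may interchange the loops only when it can prove the loop-carried dependences allow it.

```python
def column_major_offset(i: int, j: int, n_rows: int) -> int:
    """0-based element offset of A(i, j) in column-major (Fortran) storage."""
    return i + j * n_rows


def access_offsets(n_rows: int, n_cols: int, inner: str = "j") -> list:
    """Offsets touched by a doubly nested loop over A.

    inner="j": the inner loop sweeps across a row (stride n_rows, the "wrong" order).
    inner="i": the inner loop sweeps down a column (stride 1, after interchange).
    """
    offsets = []
    if inner == "j":
        for i in range(n_rows):
            for j in range(n_cols):
                offsets.append(column_major_offset(i, j, n_rows))
    elif inner == "i":
        for j in range(n_cols):
            for i in range(n_rows):
                offsets.append(column_major_offset(i, j, n_rows))
    else:
        raise ValueError("inner must be 'i' or 'j'")
    return offsets


def strides(offsets: list) -> list:
    """Distance, in elements, between consecutive accesses."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


if __name__ == "__main__":
    before = access_offsets(4, 4, inner="j")
    after = access_offsets(4, 4, inner="i")
    assert sorted(before) == sorted(after)  # same elements, different order
    print(strides(before)[:3], strides(after)[:3])  # [4, 4, 4] [1, 1, 1]
```

Assuming 300-row arrays of doubles, the original order jumps 2400 bytes between consecutive accesses, so nearly every access lands on a different cache line; after interchange, consecutive accesses share a line.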
murf@cypress.UUCP (Colin Murphy) (04/10/91)
The tradeoffs made in processor micro-architecture are influenced by process capability and package availability. The process capability consists not only of a delay-per-gate number, but also of a number for the total count of transistors and wires possible per die.

I am very familiar with the ROSS Technology SPARC 40 MHz 7C601 CPU, and barely familiar with the MIPS architecture and chips. First, some data on process (Cypress Semiconductor is our foundry for this):

                 ROSS Tech.   H-P        MIPS
  Gate oxide     195 Ang      200 Ang    exact unknown

IDT and Performance do have comparable processes available. In terms of transistor physics, the first two processes are comparable. (I no longer even talk about Leff, because with the advent of LDD and dished punch-through control implants, Leff is more a function of the particular measurement technique used than of reality.) So, at first glance, H-P has just out-engineered the rest of us.

Now let's check out the "secondary" process characteristics:

                               ROSS Tech.   H-P        MIPS
  contacted metal 1 pitch      4.0u         2.6u       ?about the same as Cypress?
  contacted metal 2 pitch      4.6u         2.6u       ?a little larger than Cypress?
  contacted metal 3 pitch      none         6.0u       none
  die size (per side)          310 mils     550 mils   ?~330 mils, differs by vendor
                               7.9mm        14.0mm     ?~8.4mm
  transistors                  104K         479K       ?<<200K, if memory serves

I do not consider the H-P process to use the same generation of interconnect; it is at least one generation more advanced in pitch, yield, and the use of three levels of metal interconnect. That is, H-P puts a lot more wires and transistors, closer together, on the Snake than are on either the 7C601 or the R3000A.

clc5q@virginia.edu (Clark L. Coleman) writes:
>In article <569@diab.se> pf@diab.UUCP (Per Fogelström) writes:
>>If we could push the clock frequency for the R3000 up to 66MHz it would,
>>if we scale the results, perform equally well with the HP9000/730.
>
>And will the HP9000/730 sit still while you do that? Can the R3000 be
>implemented TODAY in HP's technology at 66MHz ?
>Does anybody really believe that?

Why not? I do, with some corrections to long paths. And while I am another axe grinder, I am grinding mine to use on MIPSco, and on H-P and IBM. The R3000A is limited by using an obsolete package so that it will be pin-compatible with the previous designs, as would any CPU that used the same pinout for more than three years. Let's look at package technology:

                    ROSS Tech.   H-P      MIPS
                    7C601        Snake    R3000
  number of pins    207          408      176

The number of pins used on the 7C601 was limited both by the package technology available and by the size of the die coupled with the minimum pad pitch on the die. What was the effect of this? H-P has a full Harvard chip with a 64-bit-wide data bus; the 7C601 uses a 32-bit-wide combined instruction and data bus with one address bus; the R3000 uses separate 32-bit instruction and data buses with a multiplexed address bus. The H-P chip has a built-in advantage, one that is especially important for double-precision floating point.

BTW, the multiplexed address bus is probably what is limiting the R3000A to 33 MHz systems. I would guess that the chip itself is more capable, not that the end user cares.

Historical note: the Intel ?3001? 2-bit bit-slice and the 8008 were made obsolete by the AMD 2901 4-bit bit-slice and the Intel 8080. Why? Because the first two were in 18-pin packages, and the second two were in the brand-new 40-pin package, circa 1974-76.

-- quote out of order --
>Seriously, I cannot believe I am reading so many people claim that MHz
>is 100% implementation, 0% architecture.
>I said to myself years ago that if HP were to implement the HP-PA stuff in
>state of the art semiconductors, they would blow away the competition.
>Prophecy fulfilled in 1991; the competitors ARE using a process that is just
>as good as HP's, other postings notwithstanding; and the detractions I see on
>this thread bespeak a lack of architecture understanding, or commercial envy
>and axe-grinding in a few cases.
What is there about the HP-PA architecture that allows for a faster implementation given equal levels of technology? I would like to know so I can go beat up on some architects. 8^)

Seriously, the next SPARC chips will have more transistors and wires per die and use more pins. This is the result of designing in 1989-91 as opposed to 1986-88, and has nothing to do with ISA, but everything to do with micro-architecture and economics.
--
Colin Murphy - ROSS Technology, Inc, daver!cypress!murf - (408) 943-2887
"The many, the humble, the implementors of the SPARC custom CMOS IU"
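Colin Murphy's process and package figures can be checked and extended with a little arithmetic. The sketch below converts the quoted die sizes from mils to millimetres, derives die area and transistor density from the quoted numbers, and then applies a deliberately simplified bus model (an assumption of this sketch, not something from the post) to show why a 64-bit Harvard data path helps double-precision loads. The MIPS column is left out because those figures were marked as uncertain, and the Snake's instruction-bus width is not given, so the model simply assumes its instruction fetch proceeds in parallel on the separate bus.

```python
MM_PER_MIL = 0.0254  # 1 mil = 0.001 inch = 0.0254 mm


def mils_to_mm(mils: float) -> float:
    return mils * MM_PER_MIL


def die_area_mm2(side_mils: float) -> float:
    """Area of a square die whose side is given in mils."""
    side_mm = mils_to_mm(side_mils)
    return side_mm * side_mm


def transistors_per_mm2(transistors: int, side_mils: float) -> float:
    return transistors / die_area_mm2(side_mils)


def bus_transfers(n_bytes: int, bus_width_bits: int) -> int:
    """Bus transfers needed to move n_bytes over a bus_width_bits-wide bus."""
    width_bytes = bus_width_bits // 8
    return -(-n_bytes // width_bytes)  # ceiling division


if __name__ == "__main__":
    # Die side (mils) and transistor count as quoted in the post.
    chips = {"7C601": (310, 104_000), "Snake": (550, 479_000)}
    for name, (side_mils, transistors) in chips.items():
        print(f"{name}: {mils_to_mm(side_mils):.1f} mm/side, "
              f"{die_area_mm2(side_mils):.0f} mm^2, "
              f"{transistors_per_mm2(transistors, side_mils):.0f} transistors/mm^2")

    # Simplified model: transfers on the busiest bus for one 32-bit instruction
    # that loads one 8-byte double. Ignores pipelining, caches, and
    # address-bus multiplexing.
    print("Snake, 64-bit data bus:", bus_transfers(8, 64))
    print("R3000, 32-bit data bus:", bus_transfers(8, 32))
    print("7C601, combined 32-bit bus:", bus_transfers(4, 32) + bus_transfers(8, 32))
```

By this crude measure the Snake has over three times the die area and roughly 1.5 times the transistor density of the 7C601, which fits the interconnect argument in the post; the bus model gives 1, 2, and 3 transfers respectively for a double-precision load.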