[comp.arch] More Snake bytes.

irf@kuling.UUCP (Bo Thide') (03/30/91)

By popular demand, here is a follow-up to my earlier posting, with
individual SPEC benchmark ratings for the HP Apollo 9000/700 Snake family. 


-----------------------------------------------------------------------------
            gcc  espr. li   eqntott spice doduc nasa7 matrix fpppp tomcatv
-----------------------------------------------------------------------------
HP9000/730  46.5 55.2  50.3 52.6    60.9  64.0  73.7  273.3  107.0 67.4
HP9000/720  35.2 42.5  36.1 40.6    46.9  48.6  58.0  210.0   81.4 52.9
-----------------------------------------------------------------------------


In John D. McCalpin's (mccalpin@perelandra.cms.udel.edu) terms the 264
MB/sec bus speed gives the 720 a "Streaming MFLOPS" rating of 11.
Here's John's table from his comp.benchmarks posting (I let John put in
the correct Snake figures; an update would be welcome, John!)
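
The rating of 11 is consistent with dividing the bus bandwidth by 24 bytes
per floating-point operation (three 8-byte operands per flop, as in
a(i) = b(i) + c(i)).  That divisor is an assumption about how the figure was
derived, not something stated in this posting.  A minimal sketch:

```python
# Assumption: one flop needs three 8-byte memory operands (24 bytes/flop).
BYTES_PER_FLOP = 3 * 8

def streaming_mflops(bus_mb_per_sec: float) -> float:
    """Memory-bound MFLOPS estimate from sustained bus bandwidth in MB/sec."""
    return bus_mb_per_sec / BYTES_PER_FLOP

hp720_stream = streaming_mflops(264)
print(hp720_stream)  # 11.0
```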

               Performance Summary Table - LINPACK
                -----------------------------------

                MFLOPS  MFLOPS  MFLOPS  MFLOPS  Price   MFLOPS/Million$
System          Peak    Max     Lnpk    Stream  $10**6   Max    Stream
-----------------------------------------------------------------------
IBM 550          82      62       27     12      0.13    477      92
MIPS RC6280      24      16       10      8      0.20     80      40
IBM 320          40      29        9      6     <0.02   1450     300
-----------------------------------------------------------------------
Convex C-210     50      44       17      9     ~0.5      88      18
Convex C-240    200     166       26     36     ~1.6     104      23
-----------------------------------------------------------------------
Cray Y/MP-1     333     324       25    150     ~3.0     108      50
Cray Y/MP-8    2664    2144      275   1200    ~16.0     134      75
-----------------------------------------------------------------------
1xIBM 3090E VF  116      71       13     11     ~3.0      24       4
2xIBM 3090E VF  232     141       26*    22     ~5.0      28       4
3xIBM 3090E VF  348     210       39*    33     ~7.0      30       5
-----------------------------------------------------------------------
SGI 4D/310       10       8        6      3
SGI 4D/380       80      52       48*     3     ~0.20    260      15
-----------------------------------------------------------------------
Stardent 3010    32      25       10      6
Stardent 3040   128      77       12     11     ~0.25    308      44
-----------------------------------------------------------------------
IBM 320          40      29        9      6     <0.02   1450     300
8x  IBM 320     320     232*      72*    48      0.11   1450*    300
16x IBM 320     640     464*     144*    96      0.21   1450*    300
-----------------------------------------------------------------------
(*) indicates extrapolated figures.
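
The two right-hand columns appear to be the Max and Stream MFLOPS divided by
the price in millions of dollars, rounded to a whole number.  A small sketch
that re-derives them for four rows copied from the table; the rounding rule
is an assumption, and the extrapolated (*) rows are left out:

```python
# (system, MFLOPS Max, MFLOPS Stream, price in millions of dollars)
rows = [
    ("IBM 550",       62, 12, 0.13),
    ("MIPS RC6280",   16,  8, 0.20),
    ("SGI 4D/380",    52,  3, 0.20),
    ("Stardent 3040", 77, 11, 0.25),
]

def per_million(mflops: float, price_millions: float) -> int:
    """MFLOPS per million dollars, rounded to the nearest integer."""
    return round(mflops / price_millions)

ratios = {
    name: (per_million(mx, price), per_million(stream, price))
    for name, mx, stream, price in rows
}

for name, (mx, stream) in ratios.items():
    print(f"{name:14s} {mx:5d} {stream:5d}")
```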



Here's another comparison table:
====================================================================
           The HP 720:  How It Stacks Up

COMPANY/PRODUCT     PRICE       MIPS    SPEC    Price Per   Price Per
                                        marks   MIPS        SPECmark

Hewlett-Packard/    $12,000     57      55.5    $211        $216
HP 9000 Model 720

IBM/                $9,725      29.5    24.6    $330        $395
RISC System/6000
Model 320

Digital Equipment/  $12,500     27.3    19.9    $458        $628
DECstation 5000
Model 200 MX

Sun Microsystems/   $15,000     28.5    21      $526        $714
SPARCstation 2

Units include monochrome monitors and no disks,
except for IBM 320, which has a 120-Mbyte disk     

Sources: "HP Apollo 9000 Series 700 Performance Brief",
	 Companies and UNIX Today!

=====================================================================
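
The last two columns of this table are the list price divided by the MIPS
rating and by the SPECmark.  The sketch below recomputes them from the
figures in the table; rounding to whole dollars is assumed:

```python
# (system, price in dollars, MIPS rating, SPECmark), copied from the table
systems = [
    ("HP 9000 Model 720",            12000, 57.0, 55.5),
    ("IBM RISC System/6000 320",      9725, 29.5, 24.6),
    ("DECstation 5000 Model 200 MX", 12500, 27.3, 19.9),
    ("Sun SPARCstation 2",           15000, 28.5, 21.0),
]

price_per = {
    name: (round(price / mips), round(price / specmark))
    for name, price, mips, specmark in systems
}

for name, (per_mips, per_specmark) in price_per.items():
    print(f"{name:30s} ${per_mips:4d}/MIPS  ${per_specmark:4d}/SPECmark")
```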



Many have questioned the statement in my earlier posting that Motif 1.2
will be available for the Snakes.  Well, even though I found this stated
explicitly in three different official HP publications (HP numbers
5091-0977E, 5091-0979E, and 5091-0980E) I am now inclined to believe
that this is a typo.  Any official comments from HP on this?

Bo

---
   ^   Bo Thide'--------------------------------------------------------------
  |I|        Swedish Institute of Space Physics, S-755 91 Uppsala, Sweden
  |R|  Phone: (+46) 18-303671.  Telex: 76036 (IRFUPP S).  Fax: (+46) 18-403100 
 /|F|\          INTERNET: bt@irfu.se      UUCP: ...!mcvax!sunic!irfu!bt  
 ~~U~~ -----------------------------------------------------------------sm5dfw

mash@mips.com (John Mashey) (03/31/91)

In article <2004@kuling.UUCP> bt@irfu.se (Bo Thide') writes:
>-----------------------------------------------------------------------------
>            gcc  espr. li   eqntott spice doduc nasa7 matrix fpppp tomcatv
>-----------------------------------------------------------------------------
>HP9000/730  46.5 55.2  50.3 52.6    60.9  64.0  73.7  273.3  107.0 67.4
>HP9000/720  35.2 42.5  36.1 40.6    46.9  48.6  58.0  210.0   81.4 52.9
>-----------------------------------------------------------------------------
Note that the above is highly important information, if you compare
mips-ratings with the SPECint subset (the first 4), and the overall behavior
patterns, including the effect on matrix300.  (This is quite legal, by the
way.)  However, SPEC has ALWAYS insisted that you see all 10 numbers,
and this is one more reminder of the reason, because you can be completely
misled about the performance pattern of the machine if all you see is
a SPECmark.  [Somebody from HP earlier claimed that the SPECmark understated
the performance on real programs, not toys .... B.S.  As a generic statement,
that was simply nonsense, and a misstatement of what a SPECmark (alone)
means, and doesn't mean....]
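
To illustrate how much one outlier can move the single number, here is a
rough sketch using the HP9000/730 row quoted above.  It takes the SPECmark
as the geometric mean of the ten ratios; the "without matrix300" figure is
only an illustration, not an official SPEC metric:

```python
from statistics import geometric_mean

# HP9000/730 SPEC ratios as quoted in the table above.
ratios_730 = {
    "gcc": 46.5, "espresso": 55.2, "li": 50.3, "eqntott": 52.6,
    "spice": 60.9, "doduc": 64.0, "nasa7": 73.7, "matrix300": 273.3,
    "fpppp": 107.0, "tomcatv": 67.4,
}

specmark_all = geometric_mean(list(ratios_730.values()))
specmark_without_matrix = geometric_mean(
    [value for name, value in ratios_730.items() if name != "matrix300"]
)

print(f"all ten benchmarks:  {specmark_all:.1f}")
print(f"matrix300 left out:  {specmark_without_matrix:.1f}")
```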

>Here's another comparison table:
PLEASE: do we have to keep seeing cost/MIPS, where everybody
computes mips differently?  I'll express a little irritation at this:
people MIGHT have computed Price/SPECmark and Price/SPECint,
where the latter is an intelligent approximation to Price/VAX-mips-integer.
Of course, if I had a machine whose SPECfp is substantially higher than
its SPECint, and I were a marketeer:
	a) I'd quote SPECmarks to get the effect from the FP
	b) I'd quote mips-ratings (based on dhrystone) to get a good-looking
	price/mips.
	c) I'd avoid quoting a price/SPECint, although that is a hugely
	more predictive number, and whose data HAD to be available to
	compute the overall SPECmark....  even if the value is quite good,
	(which it is) because b) is likely to be better....
	d) To see the kinds of distortions introduced, let's observe
	that the DS5000 is 20-25% FASTER on SPECint than the IBM 320, 
	but you'd never guess that from this table.  (The DS5000 and
	IBM 320 should have roughly equal $/SPECint, not DS5000 costing
	1.3X+ more).
	e) In general, it is much easier to compare performance than
	price, because different vendors use different pricings
	for the various pieces you may end up needing.  As a buyer,
	you should always compare prices of configurations you're
	likely to buy and use.  Some vendors have very low entry prices,
	but other things cost more.  (I don't know whether this is true or
	not with the HPs; I haven't seen a price list yet with enough
	numbers to know.  Maybe someone else has.)
	
>====================================================================
>           The HP 720:  How It Stacks Up
>
>COMPANY/PRODUCT     PRICE       MIPS    SPEC    Price Per   Price Per
>                                        marks   MIPS        SPECmark
>
>Hewlett-Packard/    $12,000     57      55.5    $211        $216
>HP 9000 Model 720
>
>IBM/                $9,725      29.5    24.6    $330        $395
>RISC System/6000
>Model 320
>
>Digital Equipment/  $12,500     27.3    19.9    $458        $628
>DECstation 5000
>Model 200 MX
>
>Sun Microsystems/   $15,000     28.5    21      $526        $714
>SPARCstation 2

Summary: Snakes look like a good implementation of a good architecture;
the FP got a good boost, mostly from the new compilers avail in June;
the integer performance remains closely on
the line of well-implemented single-issue, 1-level cache RISCs,
i.e., SPECint = .75-.80X MHz.
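
A quick check of that ratio for the HP9000/730, as a sketch only: SPECint is
taken as the geometric mean of the four integer ratios quoted above, and the
66 MHz clock is an assumption that is not stated in this post.

```python
from statistics import geometric_mean

# gcc, espresso, li, eqntott ratios for the HP9000/730, from the quoted table
int_ratios_730 = [46.5, 55.2, 50.3, 52.6]
CLOCK_MHZ_730 = 66.0  # assumed clock rate, not given in this post

specint_730 = geometric_mean(int_ratios_730)
specint_per_mhz = specint_730 / CLOCK_MHZ_730

print(f"SPECint {specint_730:.1f}, SPECint/MHz {specint_per_mhz:.2f}")
```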

But please, let's terminate this $/dhrystone-mips trash....
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086

mbk@jacobi.ucsd.edu (Matt Kennel) (03/31/91)

In article <2004@kuling.UUCP> bt@irfu.se (Bo Thide') writes:
>-----------------------------------------------------------------------------
>            gcc  espr. li   eqntott spice doduc nasa7 matrix fpppp tomcatv
>-----------------------------------------------------------------------------
>HP9000/730  46.5 55.2  50.3 52.6    60.9  64.0  73.7  273.3  107.0 67.4
>HP9000/720  35.2 42.5  36.1 40.6    46.9  48.6  58.0  210.0   81.4 52.9
>-----------------------------------------------------------------------------


Methinks the Attack of the Killer Micros is shaping up as
the Mother Of All Battles.




Matt K
mbk@inls1.ucsd.edu

irf@kuling.UUCP (Bo Thide') (04/01/91)

OOPS!  I made a mistake when reading off and entering the individual
SPEC ratings for the Cobra (HP9000/720).  I hope the following table is
correct.  The full "HP Apollo 9000 Series 700 Performance Brief" report
is very comprehensive (41 pages, with Reference list) and contains many
more benchmark ratings and comparisons (with, hopefully, fewer errors)
than my excerpts posted to the net.  You can get copies of this report
from your nearest HP office.


-----------------------------------------------------------------------------
            gcc  espr. li   eqntott spice doduc nasa7 matrix fpppp tomcatv
-----------------------------------------------------------------------------
HP9000/730  46.5 55.2  50.3 52.6    60.9  64.0  73.7  273.3  107.0 67.4
HP9000/720  35.2 42.5  38.1 40.6    46.9  48.6  58.0  210.0   81.4 52.9
                       ^^^^
-----------------------------------------------------------------------------

So the Cobra I-SPECmark becomes (35.2*42.5*38.1*40.6)^(1/4) = 39.0, exactly
as claimed by HP.
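
The arithmetic can be re-run as below.  The sketch compares the corrected li
rating (38.1) with the one from the earlier posting (36.1); the variable
names are only illustrative.

```python
from statistics import geometric_mean

# gcc, espresso, li, eqntott ratios for the HP9000/720
corrected = [35.2, 42.5, 38.1, 40.6]   # li as corrected in this posting
earlier = [35.2, 42.5, 36.1, 40.6]     # li as given in the first posting

specint_corrected = geometric_mean(corrected)
specint_earlier = geometric_mean(earlier)

print(f"corrected: {specint_corrected:.1f}")
print(f"earlier:   {specint_earlier:.1f}")
```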


Bo

---
   ^   Bo Thide'--------------------------------------------------------------
  |I|        Swedish Institute of Space Physics, S-755 91 Uppsala, Sweden
  |R|  Phone: (+46) 18-303671.  Telex: 76036 (IRFUPP S).  Fax: (+46) 18-403100 
 /|F|\          INTERNET: bt@irfu.se      UUCP: ...!mcvax!sunic!irfu!bt  
 ~~U~~ -----------------------------------------------------------------sm5dfw

maf@hpfcso.FC.HP.COM (Mark Forsyth) (04/02/91)

>From: mash@mips.com (John Mashey)
>
>PLEASE: do we have to keep seeing cost/MIPS, where everybody
>computes mips differently.  I'll express a little irritation at this:
>people MIGHT have computed Price/SPECmark and Price/SPECint,
>where the latter is an intelligent approximation to Price/VAX-mips-integer.

And why not Price/SPECfp also ?  Don't workstation customers still care
highly about performance on FP applications ?  Why did SPEC choose six
FP intensive benchmarks if these are not important ?  Is the importance
of a benchmark suite proportional to how competitive MIPs is on it ?
Why not also include I/O, graphics, X11, etc. ?

>Of course, if I had a machine whose SPECfp is substantially higher than
>its SPECint, and I were a marketeer:
>	a) I'd quote SPECmarks to get the effect from the FP
>	b) I'd quote mips-ratings (based on dhrystone) to get a good-looking
>	price/mips.
>	c) I'd avoid quoting a price/SPEcint, although that is a hugely
>	more predictive number, and whose data HAD to be available to
>	compute the overall SPECmark....  even if the value is quite good,
>	(which it is) because b) is likely to be better....

And if I were HP I'd quote (which they have) performance on many other
types of workloads in a 42 page performance brief. It includes all of
the SPEC components as well as Workstation Labs' Khornerstones, X windows
(x11perf 1.2), networking, disk I/O, Ansys (scientific applications), 
graphics, as well as the "toy" benchmarks (which are included because 
customers and press ask for them, and think you are hiding something if
you try to tell them it's not important).

The $/anything comparisons you are so miffed about come from the press
(UNIX Today, I believe), not HP documentation. 

>
>Summary: Snakes look like a good implementation of a good architecture;
>the FP got a good boost, mostly from the new compilers avail in June;

Even without the new compilers the FPspec/MHz (0.94) is about 30% higher
than the DECstation 5000/200 (0.74) and absolute FP spec is about 2.6 
times.  The new compilers raise these to 43% and 4.9 times, respectively. 

>the integer performance remains closely on
             ^^^^^^^^^^^ 
>the line of well-implemented single-issue, 1-level cache RISCs,
>i.e., SPECint = .75-.80X MHz.

The integer "performance" is 2.7 times the 25 MHz R3000 (DEC 5000/200). 
Speed is an extremely important high performance design technique. If 
you normalize it out you end up with a truly meaningless indicator of
performance and one that users shouldn't and don't care about.

>
>-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>

- Mark Forsyth

rcd@ico.isc.com (Dick Dunn) (04/03/91)

maf@hpfcso.FC.HP.COM (Mark Forsyth) writes:
> >From: mash@mips.com (John Mashey)
> >...people MIGHT have computed Price/SPECmark and Price/SPECint,...

> And why not Price/SPECfp also ?  Don't workstation customers still care
> highly about performance on FP applications ?  Why did SPEC choose six
> FP intensive benchmarks if these are not important ?...

You're begging your own question.  If 6 out of 10 of the SPEC benchmarks
are FP-intensive, SPECmark itself is a significant indicator of FP
performance.  In fact, the SPEC suite has been criticized for being too
heavily weighted toward FP as it is.  While that's arguable at best, it
does seem that a hot FP processor is going to have a hot SPECmark figure
unless the integer performance is terminally lame.

I suppose that "SPECfp" is interesting if you really want to focus tightly
on FP performance, but the situation is not symmetric between carving int
performance out of SPECmark and carving FP performance out:
	- SPECmark is already weighted toward FP
	- after you set aside "balanced" (in the SPECmark sense) int-vs-FP
	  usage of workstations, mostly-int use is more common than mostly-
	  FP use.
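
Since the SPECmark is the geometric mean of four integer and six FP ratios,
it factors as SPECint^0.4 * SPECfp^0.6, which is the weighting referred to
above.  A rough sketch using two 720 figures quoted elsewhere in this thread
(SPECmark 55.5, SPECint 39.0); the backed-out SPECfp is plain arithmetic
here, not an HP number:

```python
N_INT, N_FP = 4, 6
W_INT = N_INT / (N_INT + N_FP)   # 0.4: weight of the integer subset
W_FP = N_FP / (N_INT + N_FP)     # 0.6: weight of the FP subset

specmark_720 = 55.5   # from the comparison table earlier in the thread
specint_720 = 39.0    # from the corrected integer figures

# Solve SPECmark = SPECint**W_INT * SPECfp**W_FP for SPECfp.
specfp_720 = (specmark_720 / specint_720 ** W_INT) ** (1 / W_FP)
recombined = specint_720 ** W_INT * specfp_720 ** W_FP

print(f"implied SPECfp: {specfp_720:.1f}")
```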

ALL of the $/CPUbenchmark figures are bogus, but that's another flame.

>...Is the importance
> of a benchmark suite proportional to how competitive MIPs is on it ?

Did you mean MIPS?  (I.e., was that just a gratuitous slam at Mash?)

> Why not also include I/O, graphics, X11, etc. ?

Well, if you're interested in X, you're not much interested in performance.
Oops! (ducks quickly; large, heavy, dull object [box of X manuals?] slams
into the wall behind him...:-)

Seriously, what are we talking about--CPU performance or system
performance?  For CPU performance, there's too much else in the way for
I/O performance to be a useful indicator.  Graphics and X (interesting
distinction!) may or may not be relevant to CPU performance, depending on
how much of the work is left to the CPU vs how much other hardware helps or
hinders.

Of course, it would be *more* useful to look at overall system performance,
in which case I/O and graphics performance measures are welcome (in fact,
necessary) additions.  Just be sure that everybody changes the channel at
the same time if we're going to start talking overall performance.

> The $/anything comparisons you are so miffed about come from the press
> (UNIX Today, I believe), not HP documentation. 

The ones he was complaining about seemed to have been posted to this
newsfroup.  Let's separate complaints about HP (which I don't think I've
seen much--I think comments have generally been complimentary) from
complaints about bogus numbers (which are deserved regardless of source).
-- 
Dick Dunn     rcd@ico.isc.com -or- ico!rcd       Boulder, CO   (303)449-2870
   The Official Colorado State Vegetable is now the "state legislator".

mash@mips.com (John Mashey) (04/03/91)

In article <8840021@hpfcso.FC.HP.COM> maf@hpfcso.FC.HP.COM (Mark Forsyth) writes:
>>From: mash@mips.com (John Mashey)
>>
>>PLEASE: do we have to keep seeing cost/MIPS, where everybody
>>computes mips differently.  I'll express a little irritation at this:
>>people MIGHT have computed Price/SPECmark and Price/SPECint,
>>where the latter is an intelligent approximation to Price/VAX-mips-integer.
>
>And why not Price/SPECfp also ?  Don't workstation customers still care
>highly about performance on FP applications ?  Why did SPEC choose six
>FP intensive benchmarks if these are not important ?  Is the importance
>of a benchmark suite proportional to how competitive MIPs is on it ?
>Why not also include I/O, graphics, X11, etc. ?
To short-circuit this before it gets out of hand (maf and I have had
side-conversations already, calming things down, I think),
where all of this came from, I think, is:
a) It was the understanding of various members of SPEC that if you
published SPECmarks, that the full-disclosure form be available at the
same time, in particular all 10 numbers.  Most people thought that the
license says exactly that, but it says something slightly vaguer.

b) In general, SPEC members, if they provided a SPECmark,
also provided the 10 numbers.  In particular, various members of SPEC,
from day 1, have been adamant in NOT signing up to anything that
didn't have full disclosures of whatever sort the full disclosure
consensus agreed on. (me, among others :-)  For instance, some companies
regularly send out the SPEC form along with the initial press release
on a product.

c) Unfortunately, not everybody at HP was involved in all of the
discussions to this effect, and the license wording is not as explicit
as people thought it had been. In addition, in large companies,
the SPEC members do not have as much direct influence on marketing as
they might in smaller ones. As a result, although CLEARLY not intended
by the HP SPEC folks, and DEFINITELY not signed up to by many of the
people who've helped SPEC exist, there was an important period of time
during which:
	a) The analysts and press had the SPECmark number and a MIPS-rating,
	because that is what they'd been given.
	b) They beat up unmercifully on various HP competitors.
	Since all they had was MIPS and SPECmark, that's what they used,
	and whether intended or not, those 2 numbers alone are
	misleading. 
	c) When a press person or analyst called you up, there was no
	rational reply of any sort possible, because you couldn't
	get the 10 individual numbers.  I.e., you were handed a situation
	in which what you'd thought you'd agreed to was a certain kind
	of disclosure (to avoid the problems of single-number things),
	and you'd supported that, but you were now getting hammered
	with exactly the thing you thought you hadn't agreed to....
	(It was bad enough that the numbers are quite good;
	it was worse not knowing the shape of the curves.)

d) The numbers are now available, and one can do whatever analysis
makes any sense on them, and the 10 numbers are MUCH more meaningful
than 1 number, especially when there is a high variance.
Both integer and FP are important; it is important to distinguish,
not mix them together; it is important to see all 10 numbers to see
the variation pattern.  $/SPECmark is fine, so is $/SPECfp.  However,
using either of those two, and then using $/dhry-mips instead of
$/SPECint is uncool.... (And I know it doesn't originate with the SPEC
folks, HP or otherwise.  Almost EVERYBODY involved in SPEC has
"interesting" times "educating" their marketing groups....)

e) Anyway, SPEC is working to make sure the rules are clarified in
everybody's mind, and it's no big deal in the long run.  HP usually
does an excellent, credible job documenting its performance - the current
performance document is fine.  The main issue here was the gap in time
between when SPECmark numbers were given out and when the full
10-number disclosure was available, and we just have to be clearer about the
rules of the game....  This is especially important as we move from
the (relatively) easy CPU benchmarks towards more complex systems benchmarks.

...well, back to work...
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086

pf@diab.se (Per Fogelstr|m) (04/03/91)

In article <8840021@hpfcso.FC.HP.COM> maf@hpfcso.FC.HP.COM (Mark Forsyth) writes:
>>From: mash@mips.com (John Mashey)
>
>>the line of well-implemented single-issue, 1-level cache RISCs,
>>i.e., SPECint = .75-.80X MHz.
>
>The integer "performance" is 2.7 times the 25 MHz R3000 (DEC 5000/200). 
>Speed is an extremely important high performance design technique. If 
>you normalize it out you end up with a truely meaningless indicator of
>performance and one that users shouldn't and don't care about.
>
>>
>>-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
>
>- Mark Forsyth

I think Mashey's statement is correct. It's not meaningless to compare
normalized SPECint, because it gives you a good indicator of how well
the architecture is implemented. The proof is the SPECint figure for
SPARC, which is very low compared to the others. You might guess why.
My conclusion from this is that even if the clock frequency is increased
for the SPARC chips, they will not achieve the same performance as others
built with the same chip technology, i.e., the headroom is lower.
If we could push the clock frequency of the R3000 up to 66 MHz it would,
if we scale the results, perform about as well as the HP9000/730.
Of course this only shows that the architectures are performing about the
same, since there is no 66 MHz R3000, only a 40 MHz one. But then there is
the R4000......
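
The scaling argument can be written out as below.  This is only a sketch: it
assumes performance scales linearly with clock rate, and it assumes the
quoted 2.7x integer advantage refers to the HP9000/730.

```python
R3000_MHZ = 25.0         # DECstation 5000/200 clock, as quoted above
TARGET_MHZ = 66.0        # hypothetical R3000 clock from the paragraph above
HP_INT_ADVANTAGE = 2.7   # quoted integer speedup over the 25 MHz R3000

# Under linear scaling, a 66 MHz R3000 would be this much faster than a
# 25 MHz one.
clock_ratio = TARGET_MHZ / R3000_MHZ

# Remaining HP advantage once the clock difference is scaled away.
residual_advantage = HP_INT_ADVANTAGE / clock_ratio

print(f"clock ratio {clock_ratio:.2f}, residual advantage {residual_advantage:.2f}")
```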

Well, this was the technical point of view, and it would not help the
customers that want the boxes today, but I'm a design engineer.

So if I wanted the best price/performance solution for my FP-intensive
application today, I would probably choose an HP9000/730.

OK, you are welcome to flame, but send marketing trash to /dev/null.

-- 
Per Fogelstrom,  Diab Data AB
SNAIL: Box 2029, S-183 02 Taby, Sweden
ANALOG: +46 8-7680660
EMAIL: mcsun!sunic!diab!pf  or  pf@diab.se

mash@mips.com (John Mashey) (04/04/91)

In article <569@diab.se> pf@diab.UUCP (Per Fogelstr|m) writes:
>In article <8840021@hpfcso.FC.HP.COM> maf@hpfcso.FC.HP.COM (Mark Forsyth) writes:
>>>From: mash@mips.com (John Mashey)
>>
>>>the line of well-implemented single-issue, 1-level cache RISCs,
>>>i.e., SPECint = .75-.80X MHz.

>>The integer "performance" is 2.7 times the 25 MHz R3000 (DEC 5000/200). 
>>Speed is an extremely important high performance design technique. If 
>>you normalize it out you end up with a truely meaningless indicator of
>>performance and one that users shouldn't and don't care about.

>I think Mashey's statement is correct. It's not meaningless to compare
>normalized SPECint because it gives You a good indicator on how well
>the architecure is implemented. The proof is the SPECint figure for
>Sparc which is very low compared to others. You might guess why.

As maf points out, the (anythings)/Mhz is fairly irrelevant to
end users.  Of course, this is a newsgroup on computer architecture,
and my note was a successor to some earlier postings (I don't recall
from whom) about SPECint/MHz.
Such numbers (actual benchmark)/Mhz are of interest to architects,
of course, since:
	a) SPECint/Mhz is probably as close as you can get to a measurable
	integer CPI, where you can get real numbers for lots of machines.
	[Because it is truly difficult to get real
	Cycle-Per-Instruction numbers, unless you are the computer
	architect with all of the real, likely-to-be-proprietary tools.]
	Many arguments, real or bogus, revolve around CPIs,
	hence it is useful to actually have some realistic metrics
	to compare on.  Of course, it is also useful to look at
	SPECmarks/Mhz, SPECfp/Mhz,  LINPACK MFLOPS/Mhz, etc, i.e., as
	long as the benchmark is something you think is meaningful.
	b) An open issue (based on machines on the market)
	is whether or not you can get the
	SPECint/MHz much better than .75 to .85.  It is CLEAR that
	you can get SPECmark and SPECfp better.  Somewhere in Hennessy
	and Patterson it talks about available low-level parallelism,
	and differences thereof regarding types of code.
But again, end users should usually care less ... even I, in "end user
mode", have been known to purchase computers with processors in them
whose SPECint/Mhz would probably be .1 or so :-)
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086

mlord@bwdls58.bnr.ca (Mark Lord) (04/04/91)

In article <....HP.COM> maf@hpfcso.FC.HP.COM (Mark Forsyth) writes:
<>From: mash@mips.com (John Mashey)
<>
<>the integer performance remains closely on
<>the line of well-implemented single-issue, 1-level cache RISCs,
<>i.e., SPECint = .75-.80X MHz.
<
<The integer "performance" is 2.7 times the 25 MHz R3000 (DEC 5000/200). 
<Speed is an extremely important high performance design technique. If 
<you normalize it out you end up with a truely meaningless indicator of
<performance and one that users shouldn't and don't care about.

From a  comp.arch  point of view, I think the .75-.80X MHz number is of
more value than architecturally irrelevant "my clock is double what yours is"
flame-fests.

So, can we now get back to discussing how best to advance the state-of-the-art
in computer architectures... ?
-- 
MLORD@BNR.CA  Ottawa, Ontario *** Personal views only ***

clc5q@madras.cs.Virginia.EDU (Clark L. Coleman) (04/06/91)

In article <569@diab.se> pf@diab.UUCP (Per Fogelstr|m) writes:
>
>I think Mashey's statement is correct. It's not meaningless to compare
>normalized SPECint because it gives You a good indicator on how well
>the architecure is implemented.

I believe that the main point is not being responded to here. If I complicate
the architecture with all kinds of hardware that makes the critical path
longer, and add instructions to the ISA that slow the cycle down, then
it is an architectural issue, not just a sign of a poor implementation.

I think the real question for comp.arch is: Given the same semiconductor
process (e.g. any particular current CMOS 1.0 micron process), implement
various architectures in that process as best as you can --- then what is
the resulting performance? There are system level issues that I am leaving
out of the equation, I realize, but at least the question is infinitely
more relevant than "normalized Specint" comparisons.

Let's try a hypothetical. The JCN computer company is designing a new
workstation that is specifically geared towards performing well on the
Specint benchmarks. Two competing design teams develop prototypes.
(JCN has too much cash to burn, apparently.) One team comes up with a
prototype that is implemented in the company's own 1.0 micron CMOS process,
and it runs at 50MHz and achieves a Specint of 40. The other team,
comprised of blithering idiots, comes up with a chip that interprets
high-level code in a terribly complex circuit that has such long
critical paths that it can only run at 1MHz in 1.0 micron CMOS.
It achieves a Specint of 0.9, however, giving it a better Specint/MHz
ratio than the other processor. Naturally, the company chooses to market
the slower processor  --- it has a provably "superior" architecture,
based on the all-important Specint/MHz ratio, and that ratio will be
great advertising fodder. The more than 40-fold performance ratio
disadvantage must just be "implementation", not bad architecture,
according to the marketing MBA genius who chooses the slower chip,
"because if the first chip had only a 1MHz clock, it would have poorer
benchmarks than the second, and clock rate is just an implementation
matter."
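
The numbers in the hypothetical work out as follows; this small sketch uses
only the figures given above, with illustrative names:

```python
def specint_per_mhz(specint: float, mhz: float) -> float:
    """Clock-normalized integer rating."""
    return specint / mhz

team_one = {"mhz": 50.0, "specint": 40.0}   # the fast 50 MHz prototype
team_two = {"mhz": 1.0, "specint": 0.9}     # the 1 MHz interpreter chip

ratio_one = specint_per_mhz(team_one["specint"], team_one["mhz"])
ratio_two = specint_per_mhz(team_two["specint"], team_two["mhz"])
speed_advantage = team_one["specint"] / team_two["specint"]

print(f"Specint/MHz: {ratio_one:.2f} vs {ratio_two:.2f}")
print(f"actual speed advantage of the first chip: {speed_advantage:.1f}x")
```

The second chip wins on the normalized ratio while losing by more than 40x
in delivered performance, which is the point of the story.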

Unfortunately, the team that designed the first chip leaves and starts
their own company, kicking the heck out of JCN in the marketplace.
The MBA then lays off a few technical staff and decides they need a
bigger advertising budget. THE END.

Seriously, I cannot believe I am reading so many people claim that MHz
is 100% implementation, 0% architecture.

>If we could push the clock frequency for the R3000 up to 66Mhz it would,
>if we scale the results, perform equally well with the HP9000/730.

And will the HP9000/730 sit still while you do that? Can the R3000 be
implemented TODAY in HP's technology at 66MHz ? Does anybody really
believe that?

>Well, this was the technical point of view, and it would not help the
>customers that want the boxes today, but I'm an design engineer.

"Technical point of view" ?? It is just a total misunderstanding of
computer architecture issues that constrain implementation and affect
the clock speed.

>
>So if i wanted the best price/performance solution for my fp intensive
>application today, i would probably chose an HP9000/730.

Yes, and you would say, "I sure wish JCN could improve their implementation
so I could get their superior architecture instead. Darn! Why don't they
do a better job of implementing over there?" And as time went by, their
implementation would get better and faster, but so would the first machine.
And the ratio would remain approximately 40 to 1 in favor of the first
machine until it started to run into physical limitations that have to do
with clock speed (bus noise, etc.) But it seems highly improbable that the
JCN turkey will ever catch up.

I have been watching HP's products for a decade. It always seemed to me
that they were lagging the market in semiconductor implementation. Only
their less efficient HP 9000/500 architecture got the benefit of their
best NMOS process when it became available --- other machines started
getting the same process several YEARS later (there must be some horrible
stories of corporate inertia lurking around there.) I said to myself years
ago that if HP were to implement the HP-PA stuff in state of the art
semiconductors, they would blow away the competition. Prophecy fulfilled
in 1991; the competitors ARE using a process that is just as good as HP's,
other postings notwithstanding; and the detractions I see on this thread
bespeak a lack of architecture understanding, or commercial envy and
axe-grinding in a few cases.

Put up or shut up, workstation vendors: Tell us what design rules would
be required to achieve HP's Specint numbers. Most of you have the numbers
on your "future evolution" sheets already. I suppose it is proprietary
info, of course; just don't keep posting bull about how HP's success is
just semiconductor process. When will we see a 66MHz SPARC ? When we
have 0.6 micron processes for it? Let's be honest and cut the  *****.

As for Per, my comments are directed not at you, but at the corporate
axe-grinders and assorted sideline sour grapes throwers.



-----------------------------------------------------------------------------
"The use of COBOL cripples the mind; its teaching should, therefore, be 
regarded as a criminal offence." E.W.Dijkstra, 18th June 1975.
|||  clc5q@virginia.edu (Clark L. Coleman)

sgreene@leland.Stanford.EDU (Spencer Greene) (04/06/91)

In article <12914@goofy.Apple.COM> russell@apple.com (Russell Williams) writes:
>To summarize, John Mashey has spoken of SPECint / MHz as a measure of 
>architecture independent of absolute MHz, while Mark Forsyth has indicated 
>that ability to build a higher MHz implementation given the same process 
>technology is also a relevant measure of an architecture, stating that 
>HP's process technology is typical. . .
>
>The unanswered question seems to be: can the HP architecture be designed 
>with fewer gate delays than others?

The issue, as has been pointed out, is one of headroom.  Even if you
agree that HP's cpu clock is *entirely* due to architecture, you cannot
conclude that HP has the same ability to linearly scale clock rate as
other vendors.  For example, it is more likely that MIPS can go from
33 MHz to 66 than that HP can go from 66 to 132.  CPU issues aside,
everyone is using essentially the same SRAMs, board traces, etc., 
and Amdahl's Law says that after cranking the CPU clock rate to a
certain point these other issues will begin to dominate.  Not that they
*can't* be made to keep up in a 132-MHz system, just that the expense is
prohibitive compared to other solutions.
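
As a rough illustration of the headroom argument above, here is a minimal
Python sketch of Amdahl's Law applied to clock scaling. The 70/30 split between
time that scales with the CPU clock and time bound by SRAMs and board traces is
a made-up assumption, not a measurement of any machine in this thread.

```python
def overall_speedup(cpu_fraction, clock_ratio):
    """Amdahl's Law: only the clock-bound fraction of run time gets faster."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if clock_ratio <= 0:
        raise ValueError("clock_ratio must be positive")
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / clock_ratio)


def cpu_fraction_after(cpu_fraction, clock_ratio):
    """Share of the new, shorter run time that still scales with the clock."""
    scaled = cpu_fraction / clock_ratio
    return scaled / ((1.0 - cpu_fraction) + scaled)


# Hypothetical starting point: 70% of run time scales with the CPU clock.
start_fraction = 0.70

# First doubling of the clock (think 33 -> 66 MHz).
first_doubling = overall_speedup(start_fraction, 2.0)

# Second doubling (think 66 -> 132 MHz) starts from a smaller CPU-bound share.
later_fraction = cpu_fraction_after(start_fraction, 2.0)
second_doubling = overall_speedup(later_fraction, 2.0)

print(f"first doubling:  {first_doubling:.2f}x")
print(f"second doubling: {second_doubling:.2f}x")
```

Under these assumed numbers the second doubling buys less than the first,
because the fixed SRAM and board delays make up a larger share of what is left.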

Of course, this does not detract from the fact that HP has developed
today a system which tests some of the purported limits of 1-processor 
RISC, while other vendors talk of unexploited headroom.  However,
it does suggest that to evaluate potential in a product line beyond
the short term, we should look at the sophistication of the vendor's
multiprocessor hardware, and (perhaps more importantly) software.
Small wonder that SMP was all the rage at recent trade shows.

-----------
Spencer Greene                             sgreene@leland.stanford.edu

preston@ariel.rice.edu (Preston Briggs) (04/06/91)

ram@shukra.Eng.Sun.COM (Renu Raman) writes:
>Does anybody know (exactly how) HP was able to improve matrix-300's 
>spec-ratio from 30-odd to 200+  - Just curious....

Well, matrix-300 is normally a cache buster.
With their rather large D-cache, it shouldn't be
nearly as severe, though the problem is still larger than
the cache used for the tests (more than 2Meg vs 128 or 256K).

The multiply-accumulate instruction surely helps too.

Also, it's possible to get a factor of 2 to 3 (or more?)
by inlining and reworking the loops extensively.
If their compilers can do this, there will be many happy customers.

Preston Briggs

ram@shukra.Eng.Sun.COM (Renu Raman) (04/06/91)

Does anybody know (exactly how) HP was able to improve matrix-300's 
spec-ratio from 30-odd to 200+  - Just curious....

renu raman
--
--------------------------------
   Renukanthan Raman				ARPA:ram@sun.com
   M/S 16-11, 2500 Garcia Avenue,               TEL :415-336-1813
   Sun Microsystems, Mt. View,  CA 94043

maf@hpfcso.FC.HP.COM (Mark Forsyth) (04/08/91)

<To summarize, John Mashey has spoken of SPECint / MHz as a measure of 
<architecture independent of absolute MHz, while Mark Forsyth has indicated 

This is a reasonable measure of architectures if the speed is similar.
It is more difficult to achieve low CPI at higher speeds since memory
speed doesn't always scale with processor speed, producing higher cache
and TLB miss penalties. For comparing 33 to 66MHz designs it probably 
doesn't make a big difference, except in some of the more cache intensive
benchmarks (like some SPEC FP benchmarks).
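
A minimal Python sketch of the effect described above: if memory latency in
nanoseconds stays fixed while the clock doubles, each miss costs twice as many
cycles and CPI rises. The base CPI, miss rate, and latency below are
hypothetical values chosen only to show the arithmetic.

```python
def miss_penalty_cycles(memory_latency_ns, clock_mhz):
    """Cycles lost per miss; one cycle lasts 1000 / clock_mhz nanoseconds."""
    return memory_latency_ns * clock_mhz / 1000.0


def effective_cpi(base_cpi, misses_per_instruction, memory_latency_ns, clock_mhz):
    """Base CPI plus the average stall cycles per instruction."""
    penalty = miss_penalty_cycles(memory_latency_ns, clock_mhz)
    return base_cpi + misses_per_instruction * penalty


def ns_per_instruction(cpi, clock_mhz):
    """Average time per instruction in nanoseconds."""
    return cpi * 1000.0 / clock_mhz


# Hypothetical workload and memory system (assumptions, not vendor data).
BASE_CPI = 1.2
MISSES_PER_INSTRUCTION = 0.02
MEMORY_LATENCY_NS = 300.0

cpi_33 = effective_cpi(BASE_CPI, MISSES_PER_INSTRUCTION, MEMORY_LATENCY_NS, 33.0)
cpi_66 = effective_cpi(BASE_CPI, MISSES_PER_INSTRUCTION, MEMORY_LATENCY_NS, 66.0)
speedup = ns_per_instruction(cpi_33, 33.0) / ns_per_instruction(cpi_66, 66.0)

print(f"CPI at 33 MHz: {cpi_33:.3f}")
print(f"CPI at 66 MHz: {cpi_66:.3f}")
print(f"speedup from doubling the clock: {speedup:.2f}x")
```

With a small miss rate the gap between the two CPI figures is modest, which
matches the remark that 33 versus 66 MHz matters mostly for cache-intensive
codes.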

<that ability to build a higher MHz implementation given the same process 
<technology is also a relevant measure of an architecture, stating that 
<HP's process technology is typical.  Mashey countered that HP's technology 
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
HP's CMOS technology is excellent and very well suited to high-speed
designs. The other poster who made that statement has been corrected.
The only "typical" thing about it is the 1.0 micron channel lengths.

<is faster than average.
<
<The unanswered question seems to be: can the HP architecture be designed 
<with fewer gate delays than others?  I think there are 33MHz R3000s and 66 
<MHz Snakes.  Is this entire difference attributable to process technology? 
<When superscalar implementations become common, a similar question will 
<arise for them, and it's not obvious to me that the architectures which 
<allow the fewest gate delays and best CPI numbers in a single-issue design 
<will still have the advantage in a multiple-issue design.

Speed is a function of circuit design methodology, processor partitioning,
chip floorplanning, feature set, etc. etc. as well as technology and 
architecture. All of these are interrelated and you have to review all 
of the choices with every generation to determine the best way to optimize 
cost, performance, and schedule for each specific product. Every RISC 
architecture is capable of higher speeds or superscalar implementations.
It will be interesting to compare all of the choices for the next generation.

- Mark Forsyth (my opinions, not HP's)
  maf@hpesmaf 

mjs@hpfcso.FC.HP.COM (Marc Sabatella) (04/09/91)

> [re: matrix300 performance on Snakes ]
>Also, it's possible to get a factor of 2 to 3 (or more?)
>by inlining and reworking the loops extensively.
>If their compilers can do this, there will be many happy customers.

matrix300 has a big array deliberately traversed in the "wrong" order.
By simply having the compiler transform the loops from row-major to
column-major traversal, you can get huge wins.  I believe this is what happened.
This is not my favorite benchmark.  However, even if you throw out the
spiffy matrix300 number, Snakes SPECmarks look mighty good.
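
To make the loop transformation concrete, here is a small Python sketch of
loop interchange. It assumes Fortran-style column-major storage, where element
(i, j) of an n-by-n array sits at offset i + j*n; the function names are
illustrative only and say nothing about how HP's compiler actually does it.

```python
from collections import Counter


def offsets(n, inner):
    """Memory offsets touched while visiting every element of an n-by-n array.

    inner="row": the row index i varies fastest (the interchanged loop nest).
    inner="col": the column index j varies fastest (the "wrong" order).
    """
    def address(i, j):
        return i + j * n  # column-major layout, as assumed in the lead-in

    if inner == "row":
        return [address(i, j) for j in range(n) for i in range(n)]
    if inner == "col":
        return [address(i, j) for i in range(n) for j in range(n)]
    raise ValueError("inner must be 'row' or 'col'")


def dominant_stride(sequence):
    """Most common distance between consecutive accesses."""
    steps = Counter(b - a for a, b in zip(sequence, sequence[1:]))
    return steps.most_common(1)[0][0]


n = 300
wrong_order = offsets(n, "col")
interchanged = offsets(n, "row")

print("dominant stride, wrong order: ", dominant_stride(wrong_order))
print("dominant stride, interchanged:", dominant_stride(interchanged))
```

Both loop nests touch exactly the same elements; only the order changes. The
interchanged version walks memory with unit stride, so each cache line is used
fully before it is evicted.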

murf@cypress.UUCP (Colin Murphy) (04/10/91)

The tradeoffs made in processor micro-architecture are influenced by process
capability and package availability.  The process capability consists not only
of a delay per gate number, but also a number for the total count of
transistors and wires possible per die.

I am very familiar with the ROSS Technology SPARC 40 MHz 7C601 CPU, and barely 
familiar with the MIPS architecture and chips.

First some data on process:  ( Cypress Semiconductor is our foundry for this )

              ROSS Tech.   H-P       MIPs

Gate oxide    195 Ang     200 Ang   exact unknown, IDT and Performance
                                    do have comparable processes available.

In terms of transistor physics the first two processes are comparable. (I no
longer even talk about Leff because with the advent of LDD and dished punch
through control implants Leff is a function more of the particular measurement
technique used than of reality.)

So, at first glance H-P has just out-engineered the rest of us.  Now let's check
out the "secondary" process characteristics.

                           ROSS Tech.    H-P       MIPs

contacted metal 1 pitch       4.0u       2.6u   ?about the same as cypress?

contacted metal 2 pitch       4.6u       2.6u   ?a little larger than cypress?

contacted metal 3 pitch       none       6.0u      none

die size (per side)           310 mils   550 mils  ?~330 mils, differs by vendor
                               7.9mm     14.0mm    ?~8.4mm

transistors                   104K       479K      ?<<200K, if memory serves
                                                    
I do not consider the H-P process to use the same generation of interconnect;
it is at least one generation more advanced in pitch, yield, and the use of
three levels of metal interconnect.  That is, H-P puts a lot more wires and
transistors, closer together, on the Snake than are on either the 7C601 or
the R3000A.
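
The table figures above can be turned into rough density numbers. The Python
sketch below uses only the ROSS and H-P columns (the MIPS entries are marked
uncertain in the table). It assumes a square die and, as a crude upper bound,
metal 1 tracks spanning the full die side with no allowance for pads or
power routing.

```python
# Figures copied from the process table above; MIPS is omitted because
# its entries are marked as uncertain there.
chips = {
    "ROSS 7C601": {"side_mm": 7.9, "transistors": 104_000, "metal1_pitch_um": 4.0},
    "H-P Snake": {"side_mm": 14.0, "transistors": 479_000, "metal1_pitch_um": 2.6},
}


def transistors_per_mm2(chip):
    """Average transistor density, assuming a square die."""
    return chip["transistors"] / chip["side_mm"] ** 2


def metal1_tracks(chip):
    """Crude upper bound on parallel metal 1 wires across one die side."""
    return chip["side_mm"] * 1000.0 / chip["metal1_pitch_um"]


for name, chip in chips.items():
    print(f"{name}: {transistors_per_mm2(chip):.0f} transistors/mm^2, "
          f"about {metal1_tracks(chip):.0f} metal 1 tracks per side")

ross, hp = chips["ROSS 7C601"], chips["H-P Snake"]
density_ratio = transistors_per_mm2(hp) / transistors_per_mm2(ross)
track_ratio = metal1_tracks(hp) / metal1_tracks(ross)
print(f"H-P/ROSS density ratio: {density_ratio:.2f}, track ratio: {track_ratio:.2f}")
```

On these assumptions the H-P part comes out ahead on both transistors per
unit area and wiring tracks per side, before the third metal layer is counted.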

clc5q@virginia.edu (Clark L. Coleman) writes:
>In article <569@diab.se> pf@diab.UUCP (Per Fogelstr|m) writes:
>>If we could push the clock frequency for the R3000 up to 66Mhz it would,
>>if we scale the results, perform equally well with the HP9000/730.
>
>And will the HP9000/730 sit still while you do that? Can the R3000 be
>implemented TODAY in HP's technology at 66MHz ? Does anybody really
>believe that?

Why not?  I do, with some corrections to long paths, and while I am another
axe grinder, I am grinding mine to use on MIPSco, and H-P and IBM.  The R3000A
is limited by using an obsolete package so that it will be pin compatible with
the previous designs, as would any cpu that used the same pin out for more than
three years.

Let's look at package technology:

                     ROSS Tech.     H-P       MIPs
                        7C601      snake     R3000

number of pins           207        408       176

The number of pins used on the 7C601 was limited by both the package technology
available and the size of the die coupled with the minimum pad pitch on the die.

What was the effect of this?  H-P has a full Harvard chip with a 64-bit wide
data bus.  The 7C601 uses a 32-bit wide combined instruction and data bus with
one address bus.  The R3000 uses separate 32-bit instruction and data buses
with a multiplexed address bus.  The H-P chip has a built-in advantage, one
that is especially important for double precision floating point.  BTW, the
multiplexed address bus is probably what is limiting the R3000A to 33 MHz
systems; I would guess that the chip itself is more capable, not that the end
user cares.
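
A small Python sketch of why bus width matters for double precision. Bus
widths and clock rates are the ones quoted in this thread; the assumption that
each bus completes one transfer per CPU clock is mine and is only meant to give
a peak figure, not a measured bandwidth.

```python
import math


def transfers_per_operand(operand_bits, bus_bits):
    """Bus transfers needed to move one operand of the given size."""
    return math.ceil(operand_bits / bus_bits)


def peak_data_mb_per_s(bus_bits, clock_mhz):
    """Peak MB/s, assuming one transfer per clock (an assumption, see above)."""
    return bus_bits / 8 * clock_mhz


# (data bus width in bits, clock in MHz) as quoted in the thread.
buses = {
    "H-P Snake": (64, 66.0),
    "ROSS 7C601": (32, 40.0),  # combined bus, also carries instruction fetches
    "MIPS R3000": (32, 33.0),
}

DOUBLE_BITS = 64

for name, (width, mhz) in buses.items():
    print(f"{name}: {transfers_per_operand(DOUBLE_BITS, width)} transfer(s) "
          f"per double, peak {peak_data_mb_per_s(width, mhz):.0f} MB/s")
```

A 64-bit double moves in one transfer on the 64-bit bus and needs two on a
32-bit bus, and on the 7C601 those transfers also compete with instruction
fetches for the same wires.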

Historical note: The Intel ?3001? 2-bit bit-slice and the 8008 were made 
obsolete by the AMD 2901 4-bit bit-slice and the Intel 8080.  Why?  Because
the first two were in 18-pin packages, and the second two were in the brand
new 40-pin package, circa 1974-76.

-- quote out of order --
>Seriously, I cannot believe I am reading so many people claim that MHz
>is 100% implementation, 0% architecture.

>I said to myself years ago that if HP were to implement the HP-PA stuff in
>state of the art semiconductors, they would blow away the competition.
>Prophecy fulfilled in 1991; the competitors ARE using a process that is just
>as good as HP's, other postings notwithstanding; and the detractions I see on
>this thread bespeak a lack of architecture understanding, or commercial envy
>and axe-grinding in a few cases.

What is there about the HP-PA architecture that allows for a faster
implementation given equal levels of technology?  I would like to know so I
can go beat up on some architects. 8^) 

Seriously, the next SPARC chips will have more transistors and wires per die
and use more pins.  This is the result of designing in 1989-91 as opposed to
1986-88, and has nothing to do with ISA, but everything to do with micro
architecture and economics.
-- 

Colin Murphy - ROSS Technology, Inc, daver!cypress!murf - (408) 943-2887
"The many, the humble, the implementors of the SPARC custom CMOS IU"