[comp.benchmarks] SPECmarks

gcrum@aludra.usc.edu (Gary Crum) (11/08/90)

If anyone has information (e.g., references to articles) about
"SPECmarks", please post some; it'll be appreciated by many
readers here, I'm sure!

Gary

tds@cbnewsh.ATT.COM (antonio.desimone) (11/09/90)

Could someone post a simple description of SPECmark ratings for the
lazy outsider?  One of the trade rags (Better Workstations and
Gardens, or something) published a chart that apparently was pulled
out of context from another source, without enough supporting material to
make it comprehensible.  

I gather that it's a composite measure of some kind.  What are the
components?  Are the individual measures what's usually presented?
What does it mean when I see a single SPECmark rating?  E.g., "the
SPARCstation (TM) 2 family, at 21 SPECmarks..."  How are the weights
chosen?
-- 
Tony DeSimone
AT&T Bell Laboratories
Holmdel, NJ 07733
antonio_desimone@att.com

khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (11/09/90)

In article <TDS.90Nov8174648@cbnewsh.ATT.COM> tds@cbnewsh.ATT.COM (antonio.desimone) writes:

   Could someone post a simple description of SPECmark ratings for the
   lazy outsider?  One of the trade rags (Better Workstations and
   Gardens, or something) published a chart that apparently was pulled
   out of context from another source, without enough supporting material to
   make it comprehensible.  

   I gather that it's a composite measure of some kind.  What are the
   components?  Are the individual measures what's usually presented?
   What does it mean when I see a single SPECmark rating?  E.g., "the
   SPARCstation (TM) 2 family, at 21 SPECmarks..."  How are the weights
   chosen?

the benchmarks are

gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, and
tomcatv.

The SPECmark is computed by taking, for each benchmark, the ratio of
its run time on a reference VAX to its run time on the machine under
test, and then computing the geometric mean of the 10 ratios.
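
In case the arithmetic isn't obvious, it boils down to something like
this (untested sketch; the run times are made up for illustration and
are NOT the published reference times):

    /* cc specmark.c -lm */
    #include <stdio.h>
    #include <math.h>

    #define NBENCH 10

    int main(void)
    {
        /* hypothetical run times in seconds -- NOT real SPEC data */
        double vax_time[NBENCH]  = { 1400, 2300, 24000, 1900, 20000,
                                     6200, 1100,  4500,  3000,  2600 };
        double test_time[NBENCH] = {  100,  150,  1200,    90,   800,
                                      400,   60,   200,   150,   120 };
        double log_sum = 0.0;
        int i;

        for (i = 0; i < NBENCH; i++) {
            /* SPECratio: >1 means faster than the reference VAX */
            double ratio = vax_time[i] / test_time[i];
            log_sum += log(ratio);
        }

        /* geometric mean = exp(arithmetic mean of the logs) */
        printf("SPECmark: %.1f\n", exp(log_sum / NBENCH));
        return 0;
    }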
--
----------------------------------------------------------------
Keith H. Bierman    kbierman@Eng.Sun.COM | khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33			 | (415 336 2648)   
    Mountain View, CA 94043

theune@aplcen.apl.jhu.edu (Peter Theune) (11/09/90)

khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) writes:

>The SPECmark is computed by taking, for each benchmark, the ratio of
>its run time on a reference VAX to its run time on the machine under
>test, and then computing the geometric mean of the 10 ratios.

I _THINK_ that the VAX used for comparison is a VAX-11/780.  I seem to
recall reading recently that this was the machine used for comparison.  I
don't know whether or not a floating point processor was installed....

Just my $.02.

Corrections are welcome...

Peter Theune

theune@aplcen.apl.jhu.edu  theune@mailer.jhuapl.edu

rehrauer@apollo.HP.COM (Steve Rehrauer) (11/10/90)

In article <1990Nov9.012540.28546@aplcen.apl.jhu.edu> theune@aplcen.apl.jhu.edu (Peter Theune) writes:
>khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) writes:
>
>>the benchmarks are
>
>>gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp,
>>and tomcatv.

Footnote: these ten tests comprise the SPEC 1.0 suite (the current
revision is actually 1.2, I think).  SPEC (the Systems Performance
Evaluation Cooperative) plans to release suite 2.0 sometime soon, I
believe.  Several new tests will be present in the new suite; there
will probably be some that test I/O and overall "system-type stuff"
performance, as well as new floating-point and integer tests.

There's also a lesser-known variant of SPEC known as the "Throughput"
suite, I believe, which involves simultaneously running multiple copies
of each test per processor.  This is intended to provide some indication
of a system's behaviour under more typical load conditions; i.e., does
the system hum for a single user process, but squeal like a stuck hog
when several people simultaneously begin doing several things on it?
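
As I understand it, the mechanics amount to something like the
following (a toy sketch, not SPEC's actual harness; "./benchmark"
stands in for one of the tests):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <time.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int ncopies = (argc > 1) ? atoi(argv[1]) : 4;  /* e.g. one per CPU */
        time_t start = time(NULL);
        int i;

        for (i = 0; i < ncopies; i++) {
            if (fork() == 0) {
                /* child: run one copy of the benchmark */
                execlp("./benchmark", "benchmark", (char *)NULL);
                _exit(1);                              /* exec failed */
            }
        }
        for (i = 0; i < ncopies; i++)
            wait(NULL);                                /* reap all copies */

        printf("%d copies finished in %ld seconds\n",
               ncopies, (long)(time(NULL) - start));
        return 0;
    }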

>>The SPECmark is computed by taking, for each benchmark, the ratio of
>>its run time on a reference VAX to its run time on the machine under
>>test...
>
>I _THINK_ that the VAX used for comparison is a VAX-11/780.  I seem to
>recall reading recently that this was the machine used for comparison.  I
>don't know whether or not a floating point processor was installed....

It is an 11/780.  I also don't know what hardware configuration was used.
If you get the full SPEC report, all that information is included:
processor type, clock speed, OS revision, compiler revision, compiler
switches used to produce the figures, amount of memory, etc.

>>...and then computing the geometric mean of the 10 ratios.

Yep, that's what it is.  I'm told the SPEC members originally didn't want
to provide a "boiled down" number, since, like most simplistic "bottom line"
figures, it really doesn't give you an accurate picture of how the system
performs across the range of tests.  An unbalanced system might have huge
swings across the various tests.  The chart, which most glossy sales adverts
omit, will show this.  The "SPECmark" figure doesn't.
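
To make that concrete with made-up numbers: a machine scoring 20 on
all ten tests and a machine scoring 40 on five tests and 10 on the
other five both come out at (40^5 * 10^5)^(1/10) = sqrt(40 * 10) = 20
SPECmarks, though they'll feel like very different machines depending
on your workload.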

However, it was inevitable that people would want to make comfortable
"X to Y" comparisons, and at least "SPECmarks" are far, far less misleading
than "MIPS".
--
"I feel lightheaded, Sam.  I think my      | (Steve) rehrauer@apollo.hp.com
 brain is out of air.  But it's kind of    | The Apollo Systems Division of
 a neat feeling..." -- Freelance Police    |       Hewlett-Packard

mutchler@zule.EBay.Sun.COM (Dan Mutchler) (11/10/90)

In article <4de7f24e.20b6d@apollo.HP.COM> rehrauer@apollo.HP.COM (Steve Rehrauer) writes:

   Yep, that's what it is.  I'm told the SPEC members originally didn't want
   to provide a "boiled down" number, since, like most simplistic "bottom line"
   figures, it really doesn't give you an accurate picture of how the system
   performs across the range of tests.  An unbalanced system might have huge
   swings across the various tests.  The chart, which most glossy sales adverts
   omit, will show this.  The "SPECmark" figure doesn't.

   However, it was inevitable that people would want to make comfortable
   "X to Y" comparisons, and at least "SPECmarks" are far, far less misleading
   than "MIPS".

Well, maybe...  If the two systems X and Y have fairly balanced
performance then the single number is reasonable, but a current
example is the RS/6000.  It comes in at 27 SPECmarks for the geometric
mean, but that is due largely to fantastic floating point performance
on vectorizable code.  A user who does little floating point will find
that on the integer portion of the suite it scores about 15 SPECmarks,
making it much closer to its competitors.

I agree with the original SPEC position. Look at all ten numbers and
only use the ones that mean something to your application. The boiled
down number can be just as misleading as MIPS.
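
For what it's worth, suite 1.0 splits four integer codes (gcc,
espresso, li, eqntott) from six floating-point codes (spice, doduc,
nasa7, matrix300, fpppp, tomcatv), so separate means are easy to
compute yourself.  A quick sketch (the ratios are invented, picked to
resemble the RS/6000 numbers above):

    /* cc splitmean.c -lm */
    #include <stdio.h>
    #include <math.h>

    static double geomean(const double *r, int n)
    {
        double log_sum = 0.0;
        int i;
        for (i = 0; i < n; i++)
            log_sum += log(r[i]);
        return exp(log_sum / n);
    }

    int main(void)
    {
        /* gcc, espresso, li, eqntott -- invented SPECratios */
        double int_ratios[4] = { 15.0, 14.0, 16.0, 15.0 };
        /* spice, doduc, nasa7, matrix300, fpppp, tomcatv */
        double fp_ratios[6]  = { 25.0, 30.0, 55.0, 70.0, 35.0, 60.0 };

        printf("integer mean: %.1f\n", geomean(int_ratios, 4));
        printf("fp mean:      %.1f\n", geomean(fp_ratios, 6));
        return 0;
    }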

--

Dan Mutchler                       | ARPA/Internet:  mutchler@zule.EBay.Sun.COM
Sun Federal System Engineer        | UUCP:           ...!sun!mutchler
--------------------------------------------------------------------------
There is no such thing as sanity.
And that is the sanest fact.
                  --Mark Knopfler

rehrauer@apollo.HP.COM (Steve Rehrauer) (11/10/90)

In article <MUTCHLER.90Nov9092236@zule.EBay.Sun.COM> mutchler@zule.EBay.Sun.COM (Dan Mutchler) writes:
>In article <4de7f24e.20b6d@apollo.HP.COM> rehrauer@apollo.HP.COM (Steve Rehrauer) writes:
>   However, it was inevitable that people would want to make comfortable
>   "X to Y" comparisons, and at least "SPECmarks" are far, far less misleading
>   than "MIPS".
>
>Well, maybe...  If the two systems X and Y have fairly balanced
>performance then the single number is reasonable, but a current
>example is the RS/6000.  It comes in at 27 SPECmarks for the geometric
>mean, but that is due largely to fantastic floating point performance
>on vectorizable code.  A user who does little floating point will find
>that on the integer portion of the suite it scores about 15 SPECmarks,
>making it much closer to its competitors.

Yes, and I admire their strategy of giving the vectorizing preprocessor
to every buyer.  Making it standard means they needn't mention it in
the fine print anywhere.  Smart marketing (even if you don't get that
sort of performance everywhere across the suite).

>I agree with the original SPEC position. Look at all ten numbers and
>only use the ones that mean something to your application. The boiled
>down number can be just as misleading as MIPS.

True, and I agree.  On the other hand, I feel that people who are
hell-bent on evaluating a system by a single number would find some
other silly figure to fixate upon if SPECmarks weren't quoted, and
deserve what they get.  And the charts ARE sometimes published in
trade rags; _EE Times_ is usually good about this (makes sense since
they're a SPEC member :-).

I was merely pointing out that at least the SPEC suite represents a
moderate mix of code, as opposed to a single trivial integer test or
two.  As always, buyer beware.
--
"I feel lightheaded, Sam.  I think my      | (Steve) rehrauer@apollo.hp.com
 brain is out of air.  But it's kind of    | The Apollo Systems Division of
 a neat feeling..." -- Freelance Police    |       Hewlett-Packard

pack@acd.acd.ucar.edu (Dan Packman) (11/10/90)

In article <MUTCHLER.90Nov9092236@zule.EBay.Sun.COM> mutchler@zule.EBay.Sun.COM (Dan Mutchler) writes:
>...
>Well, maybe...  If the two systems X and Y have fairly balanced
>performance then the single number is reasonable, but a current
>example is the RS/6000.  It comes in at 27 SPECmarks for the geometric
>mean, but that is due largely to fantastic floating point performance
>on vectorizable code.  A user who does little floating point will find
>that on the integer portion of the suite it scores about 15 SPECmarks,
>making it much closer to its competitors.
>
>I agree with the original SPEC position. Look at all ten numbers and
>only use the ones that mean something to your application. The boiled
>down number can be just as misleading as MIPS.

Excellent point.  The best benchmark is clearly one's own application
or set of applications, including multi-process loads.

One nitpick is that the RS/6000 is not 'vectorizing' but pipelining
operations.  Typically the block-mode algorithms in LAPACK allow better
vectorization on vector machines and better cache hit rates on scalar
machines.  This is presumably true of the more 'vectorizable' parts
of the SPEC suite.
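
For anyone who hasn't seen the blocking trick, here is a toy
illustration in C (nothing to do with the actual LAPACK source): work
on the matrices in small tiles so that each tile stays cache-resident
while it is reused:

    #include <stdio.h>

    #define N 256
    #define B 32     /* tile size; tune to the cache */

    static double a[N][N], b[N][N], c[N][N];

    int main(void)
    {
        int ii, jj, kk, i, j, k;

        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++) {
                a[i][j] = 1.0;
                b[i][j] = 1.0;
                c[i][j] = 0.0;
            }

        /* c = a * b, computed one BxB tile at a time */
        for (ii = 0; ii < N; ii += B)
            for (kk = 0; kk < N; kk += B)
                for (jj = 0; jj < N; jj += B)
                    for (i = ii; i < ii + B; i++)
                        for (k = kk; k < kk + B; k++)
                            for (j = jj; j < jj + B; j++)
                                c[i][j] += a[i][k] * b[k][j];

        printf("c[0][0] = %g (expect %d)\n", c[0][0], N);
        return 0;
    }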

Of some interest to me is that Sun products seem not to improve in
performance with the new LAPACK versus the old LINPACK code, whereas all
other scalar machines I have tested do show at least a 20% speedup.
Does this mean that Sun manages 100% cache hit rates on the old code?
Or is there something specific to the SPARC architecture here?  Ideas?

						Dan

Dan Packman     NCAR                         INTERNET: pack@ncar.UCAR.EDU
(303) 497-1427  P.O. Box 3000                   CSNET: pack@ncar.CSNET
                Boulder, CO  80307       DECNET  SPAN: 9.367::PACK