[comp.arch] SPECmarks

roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (10/23/89)

A trade rag here in Holland announced that SPEC (System Performance
Evaluation Cooperative) has published some benchmark results (so where
are they, Mashey? :-). These were the figures it gave:


Machine          Processor    Clock (MHz)  Time (sec)   SPECmark

Mips M/2000      R3000        25                235.0       16.5
Moto 8864DP      88100 (2x)   20                511.8       15.1
Apollo DN10000   Prism        18.2              267.0       14.5
Sun Sparc 330    CY7C601?     25                343.7       11.3
Mips M/120-5     R2000        16.67             345.3       11.2
Decstation 3100  R2000        16.67             381.4       10.1
HP9000 834       PA-RISC      15                408.0        9.5
Mips RC2030      R2000        16.67             417.6        9.3
Sun Sparc I      MB86901?     20                468.5        8.3
Moto 8864SP      88100        20                473.3        8.2
Moto 8608        88100        20                496.5        7.8
Decstation 2100  R2000        12.5              518.7        7.5
HP9000 370       68030        33                980.3        3.0
HP9000 340       68030        16.67            2432.4        1.6
Vax 11/780                                                   1.0

The rag also said that the first version of the benchmarks consisted of 
about 10 C and Fortran programs that measured integer and floating point
performance. Future benchmarks would measure other aspects of applications
such as the performance of disks, graphics and networking.
Furthermore, benchmarks were in preparation for measurement of system
performance in multi-user and networked environments.
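As I understand the SPEC Release 1.0 definition, each benchmark gets a SPECratio (the VAX 11/780 reference time divided by the time on the machine under test), and the SPECmark is the geometric mean of the ten ratios, which is why the Vax row reads 1.0. A minimal Python sketch of that calculation; the two-benchmark suite and all the times in it are invented for illustration, not SPEC data:

```python
import math


def spec_ratio(reference_sec: float, measured_sec: float) -> float:
    """How many times faster than the reference (VAX 11/780) run this benchmark ran."""
    return reference_sec / measured_sec


def specmark(reference_secs: list[float], measured_secs: list[float]) -> float:
    """Geometric mean of the per-benchmark SPECratios."""
    if not reference_secs or len(reference_secs) != len(measured_secs):
        raise ValueError("need exactly one measured time per reference time")
    log_sum = sum(math.log(spec_ratio(r, m))
                  for r, m in zip(reference_secs, measured_secs))
    return math.exp(log_sum / len(reference_secs))


# Hypothetical two-benchmark suite, 100 s reference time for each benchmark.
ref = [100.0, 100.0]
lopsided = [10.0, 40.0]   # very fast on one benchmark, slower on the other
balanced = [25.0, 25.0]   # same 50 s total, spread evenly

print(round(specmark(ref, lopsided), 3))  # geometric mean of ratios 10.0 and 2.5
print(round(specmark(ref, balanced), 3))  # geometric mean of ratios 4.0 and 4.0
```

Both hypothetical runs take 50 seconds in total, yet the lopsided one scores 5.0 against 4.0, so a total-time column and a SPECmark column need not rank machines in the same order.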

No mention was made of whether or not floating point coprocessors were 
used where applicable.

Also note that the anomalous values for the 8864DP (511.8 sec, 15.1 SPECmark)
have to do with its dual-processor implementation.

Ok, guys 'n gals, start analyzing...:-)

-- 
Artificial Intelligence: 
When computers start selling stocks because it's Friday the 13th....
Roelof Vuurboom  SSP/V3   Philips TDS Apeldoorn, The Netherlands   +31 55 432226
domain: roelof@idca.tds.philips.nl             uucp:  ...!mcvax!philapd!roelof

fotland@hpihoah.HP.COM (David Fotland) (10/25/89)

You could define an architectural figure of merit as Specmark/Clock freq.

A higher value indicates that this architecture gets more performance for the
same clock frequency.  I calculate:


Prism:	.80
MIPS:	.56-.67
HP-PA:	.63
SPARC:	.45
88100:	.39-.41
68030:	.09
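The arithmetic can be checked against the table in the first post. A small Python sketch; the family grouping is read off the Processor column, and the dual-processor 8864DP is left out because its 15.1 comes from two CPUs:

```python
# (machine, processor family, clock in MHz, SPECmark), from the table in the first post.
TABLE = [
    ("Mips M/2000", "MIPS", 25.0, 16.5),
    ("Moto 8864DP", "88100", 20.0, 15.1),
    ("Apollo DN10000", "Prism", 18.2, 14.5),
    ("Sun Sparc 330", "SPARC", 25.0, 11.3),
    ("Mips M/120-5", "MIPS", 16.67, 11.2),
    ("Decstation 3100", "MIPS", 16.67, 10.1),
    ("HP9000 834", "HP-PA", 15.0, 9.5),
    ("Mips RC2030", "MIPS", 16.67, 9.3),
    ("Sun Sparc I", "SPARC", 20.0, 8.3),
    ("Moto 8864SP", "88100", 20.0, 8.2),
    ("Moto 8608", "88100", 20.0, 7.8),
    ("Decstation 2100", "MIPS", 12.5, 7.5),
    ("HP9000 370", "68030", 33.0, 3.0),
    ("HP9000 340", "68030", 16.67, 1.6),
]


def fom_by_family(rows, exclude=("Moto 8864DP",)):
    """Return {family: (lowest, highest)} SPECmark per MHz across the listed systems."""
    ranges = {}
    for machine, family, mhz, mark in rows:
        if machine in exclude:  # dual-processor result, not comparable per MHz
            continue
        fom = mark / mhz
        lo, hi = ranges.get(family, (fom, fom))
        ranges[family] = (min(lo, fom), max(hi, fom))
    return ranges


for family, (lo, hi) in fom_by_family(TABLE).items():
    print(f"{family:6s} {lo:.3f} to {hi:.3f}")
```

Worked by hand, the ratios are: Prism 0.797; MIPS 0.558 to 0.672; HP-PA 0.633; SPARC 0.415 to 0.452; 88100 0.39 to 0.41; 68030 0.091 to 0.096. The SPARC low end is the Sparc I, and the spread within one family (the R2000/R3000 systems alone span 0.56 to 0.67) shows the figure depends on the whole system, not only the instruction set.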


This seems to clearly show the advantage of RISC over 68K type machines.  I
think it also seems to show the disadvantage of register windows, since
PRISM, MIPS, and PA don't have them and SPARC and 88K have them.

David Fotland

shankar@SRC.Honeywell.COM (Subash Shankar) (10/26/89)

In article <4420015@hpihoah.HP.COM> fotland@hpihoah.HP.COM (David Fotland) writes:

>You could define an architectural figure of merit as Specmark/Clock freq.


I don't like this measure, since improved performance at the same processor
clock frequency does not take into account the possible range of clock
rates for the processor in question.  Two processors may not be constructible
at the same clock frequencies due to either processor architecture limitations,
or memory speeds.  I like Specmark/Memory Access Frequency better, though
this has problems too, of course.





---
Subash Shankar             Honeywell Systems & Research Center
voice: (612) 782 7558      US Snail: 3660 Technology Dr., Minneapolis, MN 55418
shankar@src.honeywell.com  srcsip!shankar

meissner@dg-rtp.dg.com (Michael Meissner) (10/26/89)

In article <4420015@hpihoah.HP.COM> fotland@hpihoah.HP.COM (David Fotland) writes:

|  You could define an architectural figure of merit as Specmark/Clock freq.
|  
|  A higher value indicates that this architecture gets more performance for the
|  same clock frequency.  I calculate:

	...  (various machines left out)

|  This seems to clearly show the advantage of RISC over 68K type machines.  I
|  think it also seems to show the disadvantage of register windows, since
|  PRISM, MIPS, and PA don't have them and SPARC and 88K have them.

Huh?  The 88K DOES NOT have register windows.  Maybe you were thinking
of the AMD 29000?

--

Michael Meissner, Data General.				If compiles were much
Uucp:		...!mcnc!rti!xyzzy!meissner		faster, when would we
Internet:	meissner@dg-rtp.DG.COM			have time for netnews?

peter@ficc.uu.net (Peter da Silva) (10/26/89)

Personally, I think Specmark-per-dollar for delivered systems is a much
better measure in anything resembling the real world.
-- 
Peter da Silva, *NIX support guy @ Ferranti International Controls Corporation.
Biz: peter@ficc.uu.net, +1 713 274 5180. Fun: peter@sugar.hackercorp.com. `-_-'
"That particular mistake will not be repeated.  There are plenty of        'U`
 mistakes left that have not yet been used." -- Andy Tanenbaum (ast@cs.vu.nl)

swarren@eugene.uucp (Steve Warren) (10/26/89)

In article <4420015@hpihoah.HP.COM> fotland@hpihoah.HP.COM (David Fotland) writes:
>
>You could define an architectural figure of merit as Specmark/Clock freq.
>
>A higher value indicates that this architecture gets more performance for the
>same clock frequency.  I calculate:

                                 [...]

Did you really use clock frequency?  Or did you use bus cycles, which makes
much more sense?  For example, 680x0 arch. uses four clock cycles per
memory access.  So you need to divide the clock freq. by four to get
comparable numbers.  Etc.
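Warren's adjustment amounts to dividing the clock by the clocks per memory access before taking the ratio. A Python sketch; the 4 clocks per access for the 680x0 is his figure, while the 1 clock per access used for the RISC machine is purely an assumption for illustration:

```python
def fom_per_access_rate(specmark: float, clock_mhz: float, clocks_per_access: int) -> float:
    """SPECmark divided by the memory-access rate (millions of accesses per second)."""
    access_rate = clock_mhz / clocks_per_access
    return specmark / access_rate


# HP9000 370: 68030 at 33 MHz, SPECmark 3.0, 4 clocks per access (Warren's figure).
cisc = fom_per_access_rate(3.0, 33.0, 4)
# Mips M/2000: R3000 at 25 MHz, SPECmark 16.5, ASSUMING 1 clock per access.
risc = fom_per_access_rate(16.5, 25.0, 1)
print(f"68030 {cisc:.2f}  R3000 {risc:.2f}  ratio {risc / cisc:.1f}")
```

Under those assumptions the gap shrinks from roughly 7x on a per-clock basis (0.66 against 0.09) to under 2x.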

--Steve
-------------------------------------------------------------------------
	  {uunet,sun}!convex!swarren; swarren@convex.COM

khb%chiba@Sun.COM (Keith Bierman - SPD Advanced Languages) (10/27/89)

>You could define an architectural figure of merit as Specmark/Clock freq.

The joy of making a definition is that it is by construction true.
Completely divorced from reality, but true.

	"It is common, for example, to find papers reporting benchmark
	results as if they were an objective measure of processor
	comparison. The experimenter in this case is varying the
	architecture, implementation, clock rates, bus width,
	technology, system structure, system software and compilers. A
	single element, the benchmark is fixed..."

Tredennick in Microprocessor Report 1989

>This seems to clearly show the advantage of RISC over 68K type machines.  I
>think it also seems to show the disadvantage of register windows, since
>PRISM, MIPS, and PA don't have them and SPARC and 88K have them.

According to my Moto handout, the 88K has a total of 32 registers, used for
both FP and integer units (as opposed to SPARC and MIPS with split
register utilization). No windows.

Attempting to define a single figure of merit to compare _systems_
which is meaningful is hard. Attempting to use a single figure of
merit to judge architecture is futile.

Register windows may be good, bad or indifferent ... the SPEC _times_
don't provide an experimental proof one way or the other. All of these
systems have different memory systems (thus bandwidth to memory),
implementation features (cycles per FP multiply for example), etc. 



Keith H. Bierman    |*My thoughts are my own. !! kbierman@sun.com
It's Not My Fault   |	MTS --Only my work belongs to Sun* 
I Voted for Bill &  | Advanced Languages/Floating Point Group            
Opus                | "When the going gets Weird .. the Weird turn PRO"

"There is NO defense against the attack of the KILLER MICROS!"
			Eugene Brooks

rajivp@sunshade.Sun.COM (Rajiv Patel) (10/27/89)

In article <4420015@hpihoah.HP.COM> fotland@hpihoah.HP.COM (David Fotland) writes:
>
>You could define an architectural figure of merit as Specmark/Clock freq.
>
>A higher value indicates that this architecture gets more performance for the
>same clock frequency.  I calculate:
>
>
>Prism:	.80
>MIPS:	.56-.67
>HP-PA:	.63
>SPARC:	.45
>88100:	.39-.41
>68030:	.09
>
>
>This seems to clearly show the advantage of RISC over 68K type machines.  I
>think it also seems to show the disadvantage of register windows, since
>PRISM, MIPS, and PA don't have them and SPARC and 88K have them.
>
>David Fotland

This is the kind of article which provides the industry with proof that even a
good attempt to standardize the benchmarking process can be really twisted by
individuals and companies who would like to do so.

SPEC decided to publish the SPECmark. They explicitly do not publish any other
numbers that might confuse people, but then everyone wants to extrapolate...

Specmark/Clock frequency as David defines it is by far the most far-fetched figure
of merit for an architecture. It might be used (though I wouldn't suggest it) as
a figure of merit for an implementation, BUT NEVER for an architecture. How could one
explain a changing figure of merit for one architecture (say 68K) across different
chips: 68000, 68020, 68030, 68040 and so on? For an architecture, shouldn't the
figure of merit remain constant?

If one really does want to go with such figures of merit, then how about some
of these :-)
Technical
---------
SPECmark / # instructions
SPECmark / # mem. accesses
SPECmark * highest clock available
SPECmark / your favourite value

Marketing
---------
SPECmark / $$$
SPECmark / # applications
SPECmark / your favourite value


I guess what I would like to convey is this: let us not twist and turn the
numbers published by SPEC. This will only confuse more people and give a bad
name to the business of benchmarking, which as it is no one trusts.


Rajiv.

rec@dg.dg.com (Robert Cousins) (10/27/89)

In article <4420015@hpihoah.HP.COM> fotland@hpihoah.HP.COM (David Fotland) writes:
>This seems to clearly show the advantage of RISC over 68K type machines.  I
>think it also seems to show the disadvantage of register windows, since
>PRISM, MIPS, and PA don't have them and SPARC and 88K have them.
 
Before comparing numbers, one should make sure that the compared machines are
comparable.  It is easy to look at numbers based upon various CPUs and ignore the
remaining system characteristics such as RAM capacity and speed, cache size,
OS, peripheral speed, etc.  The SPEC benchmarks are heavily tilted toward
Fortran floating point and can have large memory usage in some cases. These
have SYSTEM performance implications which can dwarf the CPU impact.

Actually, the 88K does not have register windows.  However, the 88K numbers
which were published are not based on representative hardware and software.
You will see better SPEC numbers on 88K machines very soon now.
While I agree that register windows are not the total salvation which some
people believe them to be, the real point is that RISC does beat CISC quite
handily.

Lastly, computing a "figure of merit" based upon a benchmark is quite dangerous.
The 88K, for example, is exceptionally good at certain benchmarks. So is the 80386.
If one searches for the proper combination of benchmarks, one could prove almost
anything.  In a previous life as a consultant, I was told an apocryphal story concerning
the then fastest computer on earth, the TI ASC, and the TI 990 minicomputer. It appears
that DOD wouldn't buy the ASC without a COBOL compiler due to DOD rules, even though
the DOD would never run anything but FORTRAN on the machine.  As a result, the COBOL
for the 990 was fudged over onto the ASC in some form of interpreted mode (details
escape me, it's been many years).  Consequently, the 990 would handily outperform
the ASC on all manner of COBOL benchmarks, including the US Steel (the "standard
COBOL benchmark of the day").



>
>David Fotland


Robert Cousins
Dept. Mgr, Workstation Dev't.
Data General Corp.

Speaking for myself alone.

khb%chiba@Sun.COM (Keith Bierman - SPD Advanced Languages) (10/27/89)

In article <MEISSNER.89Oct25172147@twohot.rtp.dg.com> meissner@dg-rtp.dg.com (Michael Meissner) writes:
>
>Huh?  The 88K DOES NOT have register windows.  Maybe you were thinking
>of the AMD 29000?

Neither does the AMD 29000. It has a large register file (192 sticks in
my memory) and a way to "relabel" the registers (sorta like a hw
pointer to the physical registers).

I don't think Dave could have been thinking of the AMD 29000 because it wasn't
in the SPEC numbers he posted.

Keith H. Bierman    |*My thoughts are my own. !! kbierman@sun.com
It's Not My Fault   |	MTS --Only my work belongs to Sun* 
I Voted for Bill &  | Advanced Languages/Floating Point Group            
Opus                | "When the going gets Weird .. the Weird turn PRO"

"There is NO defense against the attack of the KILLER MICROS!"
			Eugene Brooks

jdarcy@encore.UUCP (Jeff d'Arcy) (10/27/89)

swarren@eugene.uucp (Steve Warren):
> Did you really use clock frequency?  Or did you use bus cycles, which makes
> much more sense?  For example, 680x0 arch. uses four clock cycles per
> memory access.  So you need to divide the clock freq. by four to get
> comparable numbers.  Etc.

This is a particularly important point when comparing RISC vs. CISC.  One of
the major goals of CISC is to allow higher clock frequencies by simplifying
the design, but memory access time is another matter.

Jeff d'Arcy     OS/Network Software Engineer     jdarcy@encore.com
  Encore has provided the medium, but the message remains my own

jdarcy@encore.UUCP (Jeff d'Arcy) (10/27/89)

jdarcy@encore.UUCP (Jeff d'Arcy):
> the major goals of CISC is to allow higher clock frequencies by simplifying
                     ^^^^  OOPS!

I meant to say RISC, not CISC.

Jeff d'Arcy     OS/Network Software Engineer     jdarcy@encore.com
  Encore has provided the medium, but the message remains my own

scarter@gryphon.COM (Scott Carter) (10/28/89)

In article <4420015@hpihoah.HP.COM> fotland@hpihoah.HP.COM (David Fotland) writes:
>
>You could define an architectural figure of merit as Specmark/Clock freq.

Indeed, this can be a useful value WITHIN "similar" architectures.
>
>Prism:	.80
>MIPS:	.56-.67
>HP-PA:	.63
>SPARC:	.45
>88100:	.39-.41
>68030:	.09
>
>
>This seems to clearly show the advantage of RISC over 68K type machines.  I
>think it also seems to show the disadvantage of register windows, since
>PRISM, MIPS, and PA don't have them and SPARC and 88K have them.
>
>David Fotland
Of course, the 88K does NOT have register windows.  What it does have is
a unified register file, with only two 32-bit read ports and one write port.
This can be killer on double-precision FP, even though the FP mult pipe has
first priority for writeback slots.  IMHO:  unified register files are a
bad idea if you're going to be doing lots of DP.  I wonder how the tradeoff
would have gone to devote less die area to the multiplier (which is used on
integer as well) to provide a small FP register file?
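The port argument can be put in numbers with a simplified model that ignores forwarding and assumes each 64-bit double occupies a pair of 32-bit registers, so a double-precision operation reads four 32-bit words and writes two. A Python sketch using the two read ports and one write port described above:

```python
import math


def regfile_cycles(read_words: int, write_words: int,
                   read_ports: int, write_ports: int) -> tuple[int, int]:
    """Minimum cycles spent reading operands and writing results for one operation."""
    return math.ceil(read_words / read_ports), math.ceil(write_words / write_ports)


def max_ops_per_cycle(read_words: int, write_words: int,
                      read_ports: int, write_ports: int) -> float:
    """Upper bound on sustained operations per cycle set by port bandwidth alone."""
    return min(read_ports / read_words, write_ports / write_words)


PORTS = (2, 1)  # two 32-bit read ports, one 32-bit write port
print("single precision:", regfile_cycles(2, 1, *PORTS), max_ops_per_cycle(2, 1, *PORTS))
print("double precision:", regfile_cycles(4, 2, *PORTS), max_ops_per_cycle(4, 2, *PORTS))
```

In this model a DP operation holds the single write port for two cycles, and each of those cycles is one an integer result cannot use; a separate FP register file with its own ports would remove that contention.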

The Prism has a high FOM because of split instruction mode, which none of the
other processors in this list have.  Wonder how the i860 will do?

Scott Carter

kahn@batcomputer.tn.cornell.edu (Shahin Kahn) (10/28/89)

In article <4420015@hpihoah.HP.COM> fotland@hpihoah.HP.COM (David Fotland) writes:
>You could define an architectural figure of merit as Specmark/Clock freq.

You should also include compiler quality somehow.  A good architecture
with a lousy code generator shouldn't get a bad mark for architecture.

McGuire@Solbourne.COM (Jim McGuire) (10/30/89)

In article <21569@gryphon.COM> scarter@gryphon.COM (Scott Carter) writes:

>In article <4420015@hpihoah.HP.COM> fotland@hpihoah.HP.COM (David Fotland) writes:

>>You could define an architectural figure of merit as Specmark/Clock freq.
>Indeed, this can be a useful value WITHIN "similar" architectures.
>>Prism:	.80
>>MIPS:	.56-.67
>>HP-PA:	.63
>>SPARC:	.45
>>88100:	.39-.41
>>68030:	.09
>>This seems to clearly show the advantage of RISC over 68K type machines.  I
>>think it also seems to show the disadvantage of register windows, since
>>PRISM, MIPS, and PA don't have them and SPARC and 88K have them.
>>David Fotland

> ...stuff deleted...
>The Prism has a high FOM because of split instruction mode, which none of the
>other processors in this list have.  Wonder how the i860 will do?
>Scott Carter

On page 68 of the October 16, 1989 Electronic Engineering Times is
a figure entitled "SPEC Benchmark Release 1.0 Summary", which happens
to be for the Apollo DN10010 (PRISM).  Reading the fine print under the
"Notes/Summary of Changes" column:
	* gcc: symout.c not compiled -D_BUILTINS to avoid missing
	  declaration of getcwd()
	* spice2g6: F77 version 10.5 used instead of 10.7: code for DISTO
	  subroutine (not called in benchmark) commented out for F77
	  10.5's benefit
	* fppp: DFLOAT changed to DBLE in fmtest.f and gfloat.f
	# matrix300: code for SAXPY replaced with vec_$dmult_add;
	  compiled with -OPT 3

I found these comments rather interesting, in particular the replacement of
the SAXPY function with a presumably optimized assembly routine.  Also,
the comment "for F77 10.5's benefit" could be interpreted in several
ways.
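For readers who haven't met it, SAXPY is the BLAS level-1 operation y <- a*x + y. The portable version is essentially the loop in this Python sketch; what vec_$dmult_add does internally is not stated in the fine print, so nothing here describes it:

```python
def saxpy(a: float, x: list[float], y: list[float]) -> None:
    """In-place y <- a*x + y over two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i, xi in enumerate(x):
        y[i] += a * xi


y = [1.0, 2.0, 3.0]
saxpy(2.0, [1.0, 1.0, 1.0], y)
print(y)  # [3.0, 4.0, 5.0]
```

If, as its name suggests, matrix300 spends most of its time in this inner loop, swapping the loop for a hand-tuned routine changes what the benchmark measures.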

What's going on here with the SPEC benchmarks?  I thought that this
effort was at last an honest attempt to come up with something that
could be considered a fair comparison between systems.  Naive me :-)!

If people are going to reduce the SPEC numbers to one "FOM", as Mr.
Fotland has done, and ignore the "# ..code for SAXPY.." comments,
then it seems to me that this is just more marketing hype. 

Does this mean any vendor can rewrite any portion of the SPEC tests
(or all of it), make a small footnote to the effect:
	* All tests rewritten in KILLER assembly
and then the world will just quote the final SPEC number?

I would prefer to put a stop to this nonsense right at the start.
But then I've already admitted to being naive! :-)
I want to see the SPEC numbers for unmodified benchmark sources.
If "symout.c" doesn't compile with the "-D_BUILTINS" (whatever
that means), I think this should be flagged as a failure to
function in a "standard" environment.  If the standard is not
portable, then fix the standard.

-- 
Jim McGuire, Solbourne Computer			Speaking for
mcguire@solbourne.com				myself only!
...boulder!stan!McGuire

rkc@XN.LL.MIT.EDU (rkc) (02/14/90)

Pardon my ignorance, but what EXACTLY is a specmark?  

	-Rob

gillies@p.cs.uiuc.edu (02/22/90)

> It will be difficult for such a benchmark to avoid caching effects, with
> all the sizable/big/humongous caches that are now appearing.

Hold on a minute.  I think some people are making blanket statements
about caching and performance that aren't true.  

A good benchmark should give you *some* idea of what a cache fault
costs.  But a trivial 5-line program will cause continuous cache
faults -- just allocate a monster piece of memory and access random
words for a while.
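A Python sketch of that kind of program: it times random word reads from a large buffer against the same reads done in ascending order. The buffer and read counts are arbitrary assumptions, and interpreter overhead per access is comparable to a miss here, so expect a smaller gap than a C version would show:

```python
import random
import time
from array import array


def timed_sum(buf: array, order: list[int]) -> tuple[int, float]:
    """Sum buf[i] for i in order; return (sum, elapsed seconds)."""
    start = time.perf_counter()
    total = 0
    for i in order:
        total += buf[i]
    return total, time.perf_counter() - start


def cache_fault_demo(n_words: int = 1 << 22, n_reads: int = 1 << 20,
                     seed: int = 0) -> tuple[float, float]:
    """Return (random-order seconds, ascending-order seconds) for the same reads."""
    rng = random.Random(seed)
    buf = array("q", range(n_words))  # the "monster piece of memory", 8 bytes per word
    idx = [rng.randrange(n_words) for _ in range(n_reads)]
    rand_sum, rand_t = timed_sum(buf, idx)        # scattered reads
    seq_sum, seq_t = timed_sum(buf, sorted(idx))  # same words, cache-friendly order
    if rand_sum != seq_sum:
        raise RuntimeError("both passes should read the same words")
    return rand_t, seq_t


if __name__ == "__main__":
    r, s = cache_fault_demo()
    print(f"random {r:.3f} s, ascending {s:.3f} s")
```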

Now maybe a good benchmark should test for a separate instruction
cache, if the machine has it.  But having a huge benchmark is THE
WRONG answer, since it is not a time-independent solution.  You need
to find a way to test the instruction cache without depending upon the
fact that today's caches are 32-128K, since tomorrow's caches may well
be 4-8 megabytes, and next year's may be 32-64 MEGABYTES.  Clearly,
SPEC would be outdated very quickly.  It may even be true that
self-modifying code is the best way to do this.

I don't disagree that you need to test a spectrum of operations; the
SPEC benchmark is good in this respect.  But keeping SPEC large is a
poor way to test for caching effects.


Don Gillies, Dept. of Computer Science, University of Illinois
1304 W. Springfield, Urbana, Ill 61801      
ARPA: gillies@cs.uiuc.edu   UUCP: {uunet,harvard}!uiucdcs!gillies

henry@utzoo.uucp (Henry Spencer) (02/23/90)

In article <76700146@p.cs.uiuc.edu> gillies@p.cs.uiuc.edu writes:
>I don't disagree that you need to test a spectrum of operations; the
>SPEC benchmark is good in this respect.  But keeping SPEC large is a
>poor way to test for caching effects.

Why?  Most of the applications that people will be running are similarly
large.  Surely the way to test for caching effects on large programs is
to run large programs?
-- 
"The N in NFS stands for Not, |     Henry Spencer at U of Toronto Zoology
or Need, or perhaps Nightmare"| uunet!attcan!utzoo!henry henry@zoo.toronto.edu

ingoldsb@ctycal.UUCP (Terry Ingoldsby) (02/23/90)

In article <76700146@p.cs.uiuc.edu>, gillies@p.cs.uiuc.edu writes:
> 
> > It will be difficult for such a benchmark to avoid caching effects, with
> > all the sizable/big/humongous caches that are now appearing.
> 
...
> A good benchmark should give you *some* idea of what a cache fault
> costs.  But a trivial 5-line program that will cause continuous cache
> faults -- just allocate a monster piece of memory and access random
> words for a while.
> 
> Now maybe a good benchmark should test for a separate instruction
> cache, if the machine has it.  But having a huge benchmark is THE

I don't think that is the idea behind SPEC.  My understanding is that
SPEC hopes to show system performance under real world conditions with
real world programs.  It doesn't really matter if the benchmark fits
in cache as long as the benchmark represents a typical size application
program.  For example, if a vendor manages to build a 10 MByte cache
(which presumably is large enough to contain a major portion of most
programs) then I see no reason to penalize the vendor by concocting
a benchmark that does random jumps to outside of the cache.

Vendors who build small caches (say 1K) will compare very poorly to
the 10 MByte system, even though processor speed might be identical.
Performance on real applications is all we should hope for from
benchmarks.
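One standard way to see how identical processors diverge is average memory access time: hit time plus miss rate times miss penalty. A Python sketch with made-up numbers (40 ns hit time, 400 ns miss penalty, and miss rates of 20% and 1% standing in for the 1K and 10 MByte caches):

```python
def amat_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_ns + miss_rate * miss_penalty_ns


small = amat_ns(40.0, 0.20, 400.0)  # hypothetical 1K cache
large = amat_ns(40.0, 0.01, 400.0)  # hypothetical 10 MByte cache
print(f"{small:.0f} ns vs {large:.0f} ns per access")
```

With the processor unchanged, the small-cache system waits almost three times as long per memory reference under these assumed numbers, which is the kind of difference a real-program benchmark ought to expose.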


-- 
  Terry Ingoldsby                ctycal!ingoldsb@calgary.UUCP
  Land Information Systems                 or
  The City of Calgary       ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb

wayne@dsndata.uucp (Wayne Schlitt) (02/23/90)

In article <1990Feb22.175317.12898@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
> 
> In article <76700146@p.cs.uiuc.edu> gillies@p.cs.uiuc.edu writes:
> >I don't disagree that you need to test a spectrum of operations; the
> >SPEC benchmark is good in this respect.  But keeping SPEC large is a
> >poor way to test for caching effects.
> 
> Why?  Most of the applications that people will be running are similarly
> large.  Surely the way to test for caching effects on large programs is
> to run large programs?


i don't really disagree with what henry says, but i would like to point
out that not all real programs are large and not all large programs need to
have large caches.

i am sure there _are_ real programs that can fit entirely in the 68030's
massive (:-) 256 byte data and instruction caches.  i am sure there
are a lot more real programs that can fit entirely in an 8k cache.  i
am sure that a lot of real programs will fit nicely in 32k-128k
caches.  i don't think it is necessary nor is it correct to make your
benchmarks bust every sized cache.
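Whether a program "fits" is mostly a question of its working set. A toy direct-mapped cache simulator in Python makes the point; the 256-byte size mirrors the 68030 figure quoted above, the 16-byte line is an assumption, and the loop touches one word per line so that every miss shows up:

```python
def simulate_direct_mapped(cache_bytes: int, line_bytes: int,
                           footprint_bytes: int, passes: int) -> tuple[int, int]:
    """Return (hits, accesses) for a loop sweeping footprint_bytes, one access per line."""
    n_lines = cache_bytes // line_bytes
    tags = [None] * n_lines  # which memory block each cache line currently holds
    hits = accesses = 0
    for _ in range(passes):
        for addr in range(0, footprint_bytes, line_bytes):
            block = addr // line_bytes
            index, tag = block % n_lines, block // n_lines
            accesses += 1
            if tags[index] == tag:
                hits += 1
            else:
                tags[index] = tag  # miss: fetch the block, evicting the previous one
    return hits, accesses


for footprint in (128, 256, 512):
    hits, accesses = simulate_direct_mapped(256, 16, footprint, passes=10)
    print(f"{footprint:4d}-byte loop: {hits}/{accesses} hits")
```

The loops that fit hit on every pass after the first; the one twice the cache size misses on every access in this model, because each block is evicted by its alias before it is reused.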

i think the SPEC approach is correct.  use real programs that need a
reasonable range of cache requirements, and let the user be the judge
as to whether the SPECmark is applicable to the real programs that
they want to run.



-wayne

gillies@m.cs.uiuc.edu (02/24/90)

Several people have made the point that SPEC is supposed to be a
"real-world" program benchmark, consisting of "today's" programs.  
My understanding is that a benchmark is a standard, just as a 1-meter
bar of platinum stored in France is a standard.  Computers are
measured by the SPEC standard.  Standards should be timeless.  The
only time we change them is when we can simplify them and perhaps
simultaneously make them more precise (e.g. measuring meters using
atomic emissions).

I question the utility of a benchmark that is not a standard, a
benchmark that is an ever-moving target composed of "today's"
programs, a piece of software incapable of predicting the performance
of "tomorrow's" programs on "today's" computers.

Originally, Mr. Henry M. Spencer complained that a small benchmark
would have a difficult time testing caching effects.  Well, SPEC is
no better than Dhrystone in this respect.  SPEC is just as frozen in
time as Dhrystone was, and 5 years from now SPEC will be just as
useless at testing caching, since caches will certainly be over a
megabyte by then, even on PCs.

mash@mips.COM (John Mashey) (02/25/90)

In article <3300102@m.cs.uiuc.edu> gillies@m.cs.uiuc.edu writes:

>Several people have made the point that SPEC is supposed to be a
>"real-world" program benchmark, consisting of "today's" programs.  
>My understanding is that a benchmark is a standard, just as a 1-meter
>bar of platinum stored in France is a standard.  Computers are
>measured by the SPEC standard.  Standards should be timeless.  The
>only time we change them is when we can simplify them and perhaps
>simultaneously make them more precise (e.g. measuring meters using
>atomic emissions).
>
>I question the utility of a benchmark that is not a standard, a
>benchmark that is an ever-moving target composed of "today's"
>programs, a piece of software incapable of predicting the performance
>of "tomorrow's" programs on "today's" computers.
>
>Originally, Mr. Henry M. Spencer complained that a small benchmark
>would have a difficult time testing caching effects.  Well, SPEC is
>no better than Dhrystone in this respect.  SPEC is just as frozen in
>time as Dhrystone was, and 5 years from now SPEC will be just as
>useless at testing caching, since caches will certainly be over a
>megabyte by then, even on PCs.


It has been publicly stated many times that:
	a) Once we put a benchmark in the set, the reference time is frozen,
	to avoid the "VAX-mips-time-warp" problem of comparing against
	different things.
	b) We are actively working to add benchmarks to the set over time.
	c) If we decide a benchmark has become useless, or was a bad idea,
	or turns out not to do what we thought it did, or whatever, we
	have no inhibition from:
		1) Simply deleting it.
		2) Modifying it to create a new benchmark, under a new number,
		and deleting the old one.
	What we DON'T do is modify one of the components, avoiding having
	multiple variants floating around that get easily confused and
	mis-stated.  It won't bother us at all to supersede Suite 1.0
	with something better.....

I'm sad to hear that what we've done so far is "no better than Dhrystone",
because if that's true, a whole bunch of us have wasted, in toto, at
least several million $ to try to do something better....

In any case, constructive criticism is always welcomed; these things are
hardly perfect.....
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

lm@snafu.Sun.COM (Larry McVoy) (02/26/90)

>In article <3300102@m.cs.uiuc.edu> gillies@m.cs.uiuc.edu writes:
> [doesn't like SPEC]

In article <36438@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>I'm sad to hear that what we've done so far is "no better than Dhrystone",
>because if that's true, a whole bunch of us have wasted, in toto, at
>least several million $ to try to do something better....

I, for one, think SPEC is great.  We use it a lot in house for
performance tuning.  I get warm fuzzies because I can look at N
different machines as measured by SPEC and the numbers ``make sense''
to me; SPEC seems to do a good job of measuring performance in a
machine independent manner.

On the other hand, SPEC is not the be-all and end-all.  No benchmark
is.  If I could design the ideal benchmark, I'd design something that
had a bunch of knobs that I could turn, like an I/O knob, a CPU knob, a
memory knob, etc.  I don't have this, so I run several different
benchmarks that measure these sorts of things.  SPEC is one, Musbus is
another, and we have several internal/proprietary benchmarks as well.
Some people don't like you to quote one figure from one benchmark - I
like to see all the figures from all the benchmarks.  The more data you
have the easier it is to weed out the spikes.
---
What I say is my opinion.  I am not paid to speak for Sun, I'm paid to hack.
    Besides, I frequently read news when I'm drjhgunghc, err, um, drunk.
Larry McVoy, Sun Microsystems     (415) 336-7627       ...!sun!lm or lm@sun.com

alan@oz.nm.paradyne.com (Alan Lovejoy) (02/26/90)

In article <3300102@m.cs.uiuc.edu> gillies@m.cs.uiuc.edu writes:
>I question the utility of a benchmark that is not a standard, a
>benchmark that is an ever-moving target composed of "today's"
>programs, a piece of software incapable of predicting the performance
>of "tomorrow's" programs on "today's" computers.

And just how relevant would ANY benchmark designed in 1955 be to today's
computing?  Just how relevant will today's software resource utilization
profiles be to the computing of the middle of the next century? Consider 
the differences between Cobol, Smalltalk-80 and a neural net program!  Then
consider that such differences will seem minuscule to the people of 2050!


____"Congress shall have the power to prohibit speech offensive to Congress"____
Alan Lovejoy; alan@pdn; 813-530-2211; AT&T Paradyne: 8550 Ulmerton, Largo, FL.
Disclaimer: I do not speak for AT&T Paradyne.  They do not speak for me. 
Mottos:  << Many are cold, but few are frozen. >>     << Frigido, ergo sum. >>

aglew@dwarfs.csg.uiuc.edu (Andy Glew) (02/27/90)

>Several people have made the point that SPEC is supposed to be a
>"real-world" program benchmark, consisting of "today's" programs.  
>My understanding is that a benchmark is a standard, just as a 1-meter
>bar of platinum stored in France is a standard.  Computers are
>measured by the SPEC standard.  Standards should be timeless. 

You have your timeless standards for computer systems performance
- things like "do 1E9 multiplies of random numbers".
Trouble is, the timeless standards only measure a few dimensions
of systems performance, and computer architects keep inventing new
dimensions that make the timeless standards irrelevant.
--
Andy Glew, aglew@uiuc.edu

gillies@p.cs.uiuc.edu (02/28/90)

In Article, mash@mips.com (John Mashey) writes:
> I'm sad to hear that what we've done so far is "no better than Dhrystone",
> because if that's true, a whole bunch of us have wasted, in toto, at
> least several million $ to try to do something better....

Sorry to hear about the money.  My friends tell me that Consumer
Reports is an excellent source of information on managing your money
wisely. 


Don Gillies, Dept. of Computer Science, University of Illinois
1304 W. Springfield, Urbana, Ill 61801      
ARPA: gillies@cs.uiuc.edu   UUCP: {uunet,harvard}!uiucdcs!gillies