[comp.benchmarks] Running the SPEC benchmarks

gordoni@chook.adelaide.edu.au (Gordon Irlam) (04/05/91)

When running the SPEC 1.2 benchmarks the times reported are supposed to
be "the best times that can be consistently reproduced".

How should the word "consistently" be interpreted?

As meaning "nearly always".  E.g. "We ran the benchmark and consistently
got a time of less than 90.6 seconds."

Or as meaning "more than once".  E.g. "One in every 7 runs gives a time
of 90.6 seconds or better, and thus we are able to consistently reproduce
the reported time."

Is it permitted to run a benchmark several times in succession, rather than
cycling through the suite in order, if this gives better results?  (For
instance the operating system might dynamically tune its memory management
algorithms in response to the decreased demand for new pages.)
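
To make the distinction concrete, the two orderings would look roughly
like the following (the benchmark names and the "make run" target are
only illustrative - they are not the actual SPEC 1.2 scripts):

    # Cycling through the suite in the prescribed order:
    for b in gcc espresso li eqntott
    do
        ( cd $b && time make run )
    done

    # Running one benchmark back to back, so its working set stays warm
    # and the OS can settle into a favourable state between runs:
    for i in 1 2 3
    do
        ( cd gcc && time make run )
    done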

Running benchrun results in 3 sh's, make, and tee occupying memory as
well as the benchmark.  On a system with little memory this could result
in some extra paging when the benchmark first starts to run.  In addition,
a tee is likely to cause several undesirable context switches.

Is it permissible to run the benchmark from make?  What about typing in
the time command directly to the shell?  (This would avoid a cp, which
might make it easier for some operating systems to keep the executable in
memory.  It would also avoid the cost of scheduling the write-back of the
files that would have been cp'ed - the update daemon is likely to perform
a sync while the benchmark is running.  For gcc we are talking about
writing back roughly 1MB.)

What about typing in the cp, and then waiting 30 seconds before starting
the benchmark?  If you were very clever you could time when make starts
running so that a sync occurs just before the cp has finished, but before
the benchmark starts running.  Is this allowed?

Are the results always required to be written out to a file, or, once the
results have been validated, can /dev/null be used?
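
For concreteness, the alternatives above amount to something like the
following shell fragments (the file names and make targets are invented,
not the actual SPEC 1.2 harness layout):

    # Standard route: benchrun drives make and pipes through tee, which
    # leaves extra sh, make, and tee processes resident during the run.

    # Running the benchmark from make directly, without tee:
    ( cd gcc && make run )

    # Typing the time command directly to the shell, skipping the cp of
    # the executable that the makefile would otherwise perform:
    time ./gcc <input.i >gcc.out

    # Doing the cp by hand, then waiting for update's periodic sync to
    # flush the copied files before the timed run begins:
    cp ../build/gcc .
    sleep 30
    time ./gcc <input.i >gcc.out

    # Discarding the output instead of writing it to a file:
    time ./gcc <input.i >/dev/null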

Should the SPECmark figure be computed using infinite precision arithmetic,
or should the rounded SPECratios of each benchmark be used?
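
For reference, the SPECmark is the geometric mean of the ten SPECratios,
so the question is whether each ratio gets rounded before the mean is
taken.  A small sketch of the two calculations (the ratios below are
invented, not taken from any published result):

    # Geometric mean of ten hypothetical SPECratios, computed once from
    # the full-precision values and once from values rounded to one
    # decimal place; the two can disagree in the last reported digit.
    echo "23.14 17.86 21.03 19.47 25.91 18.22 20.55 22.78 16.94 24.31" |
    awk '{
        full = 1; rounded = 1
        for (i = 1; i <= NF; i++) {
            full    *= $i
            rounded *= int($i * 10 + 0.5) / 10
        }
        printf "full precision: %.3f\n", full ^ (1 / NF)
        printf "rounded ratios: %.3f\n", rounded ^ (1 / NF)
    }'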

Which of the above possibilities, while strictly permitted, would be
considered to depart from the honest, cooperative spirit in which SPEC
was established?

Which of the above possibilities do most vendors employ?

                                            Thanks for any help,

                                                Gordon Irlam
                                                (gordoni@cs.adelaide.edu.au)

lewine@cheshirecat.webo.dg.com (Donald Lewine) (04/06/91)

In article <2767@sirius.ucs.adelaide.edu.au>, gordoni@chook.adelaide.edu.au (Gordon Irlam) writes:
|> When running the SPEC 1.2 benchmarks the times reported are supposed to
|> be "the best times that can be consistently reproduced".
	[[Many ways to cheat deleted]]
|> Which of the above possibilities do most vendors employ?
|> 
It is fair game for a customer to say, "I want you to reproduce your
claimed SPECmark numbers on *my* configuration as part of the 
acceptance test."  Vendors must be able to say, "No problem" or 
point out that the SPECmark numbers cannot be reproduced because the
customer is buying a smaller configuration than the published one.

In any event, the vendor must be able to tell the customer what
SPECmark (or any other) benchmark result to expect.  The customer
should insist on getting that result prior to paying.  Enough
customers do insist that vendors tend to live by the rules and don't
send output to /dev/null or do other tricks they would be unwilling
to repeat in front of a customer.

On the other hand, Data General uses two different C compilers.
The best compiler for gcc is not the best compiler for espresso.  We
document which compiler is used for which benchmark.  One would need
to buy both compilers to reproduce our exact results.

--------------------------------------------------------------------
Donald A. Lewine                (508) 870-9008 Voice
Data General Corporation        (508) 366-0750 FAX
4400 Computer Drive. MS D112A
Westboro, MA 01580  U.S.A.

uucp: uunet!dg!lewine   Internet: lewine@cheshirecat.webo.dg.com

jvm@hpfcso.FC.HP.COM (Jack McClurg) (04/07/91)

Without answering the questions asked here (I think you have an excellent
series of questions; I hesitate to answer, since my answer might be taken
as "official" and it is not; some reported results probably have used some
of the techniques described; BTW, which do you think are legitimate?), I
will merely point out that SPEC distributes the source code to the SPEC
benchmarks so that the results printed in the SPEC newsletter may be
reproduced.  Also, the results summary page should have complete
configuration information and notes sufficient to ensure reproducibility.

If you have difficulty duplicating results, call the vendor for assistance,
and then tell us what transpired.  This should include instances where
your results were better than what the vendor claimed.

Jack McClurg
jack_mcclurg@fc.hp.com
303-229-2126
Chair, SPEC Steering Committee