[comp.arch] Benchmarking Systems' Performance

kjmcdonell@water.UUCP (05/19/87)

Recent discussions and suggestions have included ...
(a) acupuncture tests (dc bashing, getpid() thumping, ...)
(b) the stone family (whet, dhry, dhamp, ....)
(c) ``stone soup'' (some combination of the above)
(d) monster recompilation (some/all of the Unix source)

If we are serious about measuring system performance, then the following
features and factors must be recognized:
(a) a single-figure-of-merit is not very useful
(b) results must be statistically reliable (see the sketch after this list)
(c) measurements must reflect computing activity that is representative
    of the usage at *your* site
(d) real-time delays cannot be ignored if the processing includes a significant
    interactive component
(e) *predicted* performance (e.g. saturation, response-time degradation) with
    varying load is the most useful outcome of any benchmarking
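
To make (b) a little more concrete, the sketch below (not part of MUSBUS, and
assuming only a Bourne shell, awk, and a `date' that understands +%s) runs a
single workload script several times and reports the spread of the elapsed
times as well as the mean; `workload.sh' is just a placeholder name for
whatever profile you use:

#!/bin/sh
# Sketch only: repeat one workload and summarise the elapsed times.
# "workload.sh" is a placeholder for any workload profile script.
RUNS=10
i=1
: > times.out
while [ $i -le $RUNS ]
do
	start=`date +%s`
	sh workload.sh > /dev/null 2>&1
	end=`date +%s`
	expr $end - $start >> times.out
	i=`expr $i + 1`
done
# one line per run, then mean and standard deviation (in seconds)
cat times.out
awk '{ s += $1; ss += $1 * $1; n++ }
     END { m = s / n
           printf "mean %.1f  sdev %.1f over %d runs\n", m, sqrt(ss/n - m*m), n }' times.out

If the standard deviation turns out to be a sizeable fraction of the mean,
no single figure from those runs is worth quoting; either run more
repetitions or find out what else the machine was doing.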

I have constructed a benchmarking suite that provides a testbed for
measuring system performance for varying numbers of emulated users.  Objective
(c) above is met by the specification of workload profiles as shell scripts
(for the shell of your choice, e.g. /bin/sh, /bin/csh, an SQL interpreter,
your favourite interactive program, ...).  This software, known as the MUSBUS
suite, has circulated widely (given the e-mail I receive!) and is the subject
of a talk to be given at the forthcoming Usenix meeting in Phoenix.
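
For illustration only -- the conventions MUSBUS actually uses for its
profiles may well differ -- a workload profile in the above sense can be as
simple as a shell script that mimics one user's mix of work:

#!/bin/sh
# Illustrative workload profile (not the MUSBUS format): one "user's"
# mix of file shuffling, searching, sorting and a small compile-and-run,
# all in a private scratch directory.  Assumes a cc on the machine.
dir=/tmp/work.$$
mkdir $dir
cd $dir || exit 1

ls -l /usr > junk			# something to search and sort
grep lib junk > /dev/null
sort junk > junk.sorted
wc junk junk.sorted > /dev/null

cat > small.c <<'EOF'
#include <stdio.h>
int main() { long i, s = 0; for (i = 0; i < 100000; i++) s += i;
             printf("%ld\n", s); return 0; }
EOF
cc -O -o small small.c && ./small > /dev/null

cd /; rm -rf $dir

The point is that the profile is ordinary shell input, so it can be skewed
towards editing, compilation, database queries or anything else that
dominates the work at your site.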

But because the results for one workload profile provide little useful
information for other processing environments, and because *no* globally
representative workload profile exists, I have discouraged widespread
publication and/or tabulation of comparative results.  If you are serious
about equipment selection, and plan to use benchmark results as input to the
decision-making process, then you should be willing to define a representative
workload profile for your environment, and run the tests on the competing
machines to produce reliable and comparable results -- MUSBUS provides
a test-bed environment that makes this possible with minimal effort.

No single test, and no battery of discrete tests, is going to be of long-term
validity in predicting the performance of a system in a particular processing
environment -- we need portable performance tools that can be configured to
measure total system performance (e.g. throughput or elapsed time) for
specific workload profiles.
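
As a rough sketch of the kind of measurement intended (MUSBUS itself does a
great deal more than this, so treat it purely as an illustration), one can
start N concurrent copies of the workload profile and record the elapsed
time at each load level; again `workload.sh' is a placeholder and a `date'
that understands +%s is assumed:

#!/bin/sh
# Sketch only: for each load level start that many concurrent copies of
# the workload profile and record the wall-clock time until all finish.
for users in 1 2 4 8 16
do
	start=`date +%s`
	n=1
	while [ $n -le $users ]
	do
		sh workload.sh > /dev/null 2>&1 &
		n=`expr $n + 1`
	done
	wait				# until every emulated "user" is done
	end=`date +%s`
	echo "$users users: `expr $end - $start` seconds elapsed"
done

Plotting the elapsed time (or the corresponding throughput) against the
number of emulated users is what exposes saturation and response-time
degradation, i.e. the predictive information asked for in (e) above.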

----
Ken J. McDonell				kjmcdonell@er.waterloo.cdn
    currently visiting the 		kjmcdonell@water.uucp
    University of Waterloo from

Dept. Computer Science			kenj@moncsbruce.oz
Monash University, Clayton
AUSTRALIA 3168