[comp.arch] Definition of Benchmark

reiter@endor.harvard.edu (Ehud Reiter) (03/06/87)

I'm currently looking at teaching an AI-ish natural language program about
Dhrystone benchmarks.  The Poor Little AI Program (PLAIP for short) is
supposed to "understand" what people say about Dhrystones.  However, the
process of describing Dhrystones to PLAIP brings up several questions:

1) I've told PLAIP that Dhrystone measurements are made on "computational
environments", consisting of {computer, operating system, compiler, compiler
switch settings that determine optimization and run-time error checking}.
However, people often refer to the Dhrystones of a computer (e.g. "the VAX
8600 has 6000 Dhrystones"), not of a computational environment.  What does
this mean?
   a)  Completely meaningless
   b)  The maximum Dhrystone figure over all computational environments that
include that computer and satisfy the Dhrystone rules.  Note that this means
the Dhrystone rating of a computer might change if a new compiler were
introduced.
   c)  Something else
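
To make option (b) concrete, here is a minimal sketch of what PLAIP might be
told.  The class name, field names, machine names, and every Dhrystone figure
below are made-up placeholders for illustration, not real measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """A "computational environment" as described in question 1."""
    computer: str
    operating_system: str
    compiler: str
    switches: str  # optimization and run-time error checking settings


def machine_rating(measurements, computer):
    """Option (b): the best Dhrystone figure over all measured
    environments that include `computer`."""
    scores = [score for env, score in measurements.items()
              if env.computer == computer]
    if not scores:
        raise ValueError(f"no measurements for {computer!r}")
    return max(scores)


# Hypothetical figures, keyed by environment (not real data).
measurements = {
    Environment("machine_a", "os_x", "cc_v1", "no-opt"): 1000,
    Environment("machine_a", "os_x", "cc_v2", "no-opt"): 1200,
    Environment("machine_b", "os_y", "cc_v1", "no-opt"): 900,
}

print(machine_rating(measurements, "machine_a"))
```

Under this reading, adding one more entry for a new compiler on machine_a
can raise the machine's rating even though the hardware is unchanged.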

2) I've also defined "performance" as an attribute of computational
environments, not computers, and the same question arises, since people
often refer to the performance or speed of a computer, not a computational
environment (e.g. "the SUN/3-150 is twice as fast as a VAX-11/780").

3) Since PLAIP knows that Dhrystone is not supposed to be run under full
optimization, but real programs are, it knows that the computational
environment of a Dhrystone run is different from the computational
environment of a real program.  Do I tell PLAIP that
   a)  The Dhrystone computational environment is completely unrelated, in
terms of "performance", to the computational environment of real programs.
   b)  The "performance" of the two computational environments is related (how?)

4) I've told PLAIP that "performance" is not a scalar number, but a function
which maps pairs of (program, input data) to an execution time.  However,
people seem to treat performance as a scalar in statements like the
above-mentioned "the SUN/3-150 is twice as fast as a VAX-11/780".  Assuming
for a moment that "performance" really were a property of computers, not
computational environments, what does this sentence mean?
   a)  Meaningless
   b)  The performance function of a SUN/3-150 is always exactly twice the
performance function of a VAX-11/780, for all (program, input data) pairs.
   c)  The performance function of a SUN/3-150 is usually close to twice
the performance function of a VAX-11/780 (with what error distribution?).
   d)  Something else.
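
Options (b) and (c) can be told apart mechanically once the performance
functions are tabulated.  The sketch below assumes "twice as fast" means the
slower machine's execution time divided by the faster machine's is 2; the
function names, workloads, and timings are all hypothetical, and using the
mean and standard deviation of the ratios is only one possible way to
summarize the "error distribution".

```python
from statistics import mean, pstdev


def speed_ratios(times_slow, times_fast):
    """Per-workload ratio of execution times (slow / fast), over the
    (program, input data) pairs measured on both machines."""
    common = times_slow.keys() & times_fast.keys()
    return {w: times_slow[w] / times_fast[w] for w in common}


def always_exactly(ratios, factor, tol=1e-9):
    """Option (b): every workload shows exactly the same speed ratio."""
    return all(abs(r - factor) <= tol for r in ratios.values())


def summarize(ratios):
    """Option (c): centre and spread of the per-workload ratios."""
    values = list(ratios.values())
    return mean(values), pstdev(values)


# Hypothetical execution times in seconds (not real data).
times_a = {("sort", "small"): 2.0, ("sort", "large"): 8.0, ("parse", "doc"): 3.0}
times_b = {("sort", "small"): 1.0, ("sort", "large"): 4.0, ("parse", "doc"): 2.0}

ratios = speed_ratios(times_a, times_b)
print(always_exactly(ratios, 2.0))
print(summarize(ratios))
```

With these placeholder timings, one workload has a ratio of 1.5, so reading
(b) fails and only a statistical reading such as (c) remains.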

5) Since PLAIP knows that Dhrystones are scalars, and "performance" is a
function, it is hard to specify a relation between the two.  Do
I tell PLAIP that
   a) There is no relation
   b) The "performance" of a computational environment is, when restricted
to a certain class of programs (which class?  Not all programs, since
Dhrystone doesn't measure floating point and I/O) within some
error tolerance (what?) of the value of an unspecified generic
performance function (what?) multiplied by the Dhrystone figure.
   c) Something else.
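
Option (b) can be written down as a check, with each "(what?)" left as a
parameter.  This sketch follows the wording of (b) literally: the generic
function's value is multiplied by the Dhrystone figure, and a relative
tolerance is assumed.  The function name, the workload class, and all the
numbers are hypothetical.

```python
def relation_holds(perf, generic, dhrystones, workload_class, rel_tol):
    """Option (b): for every workload in the chosen class, the measured
    performance lies within rel_tol (relative) of
    generic[workload] * dhrystones."""
    for w in workload_class:
        predicted = generic[w] * dhrystones
        if abs(perf[w] - predicted) > rel_tol * abs(predicted):
            return False
    return True


# Hypothetical values (not real data).  The floating-point workload is
# measured but left out of the class, as the post suggests it must be.
generic = {"w1": 0.5, "w2": 0.25}
perf = {"w1": 510.0, "w2": 240.0, "fp": 9000.0}
workload_class = ["w1", "w2"]

print(relation_holds(perf, generic, 1000, workload_class, rel_tol=0.1))
print(relation_holds(perf, generic, 1000, workload_class, rel_tol=0.01))
```

Whether the relation "holds" depends entirely on the three unspecified
choices: the class of programs, the tolerance, and the generic function.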


I think these questions are important, because not only is PLAIP confused,
but I'm confused too, and I suspect that John Q. ComputerUser is also confused.
I think before we start introducing yet more benchmarks, arguing that the
new benchmark is "better" than an old benchmark, we need to define what
exactly a benchmark is supposed to be, and what it is supposed to do, and do
it in terms rigorous enough to satisfy PLAIP.  Once we've agreed what a
benchmark is supposed to do, we can then try to define a "goodness measure"
on benchmarks, and then (and only then!) can we really say that one benchmark
is "better" than another.

					Ehud Reiter
					reiter@harvard	(ARPA,BITNET,UUCP)

P.S.  PLAIP does not really exist - it is a "thought experiment" which
someone suggested I try.