reiter@endor.harvard.edu (Ehud Reiter) (03/06/87)
I'm currently looking at teaching an AI-ish natural language program about Dhrystone benchmarks. The Poor Little AI Program (PLAIP for short) is supposed to "understand" what people say about Dhrystones. However, the process of describing Dhrystones to PLAIP brings up several questions:

1) I've told PLAIP that Dhrystone measurements are made on "computational environments", consisting of {computer, operating system, compiler, compiler switch settings that determine optimization and run-time error checking}. However, people often refer to the Dhrystones of a computer (e.g. "the VAX 8600 has 6000 Dhrystones"), not of a computational environment. What does this mean?
   a) Completely meaningless.
   b) The maximum Dhrystones under all computational environments with that computer (which satisfy the Dhrystone rules). Note that this means the Dhrystone rating of a computer might change if a new compiler were introduced.
   c) Something else.

2) I've also defined "performance" as an attribute of computational environments, not computers, and the same question arises, since people often refer to the performance or speed of a computer, not a computational environment (e.g. "the SUN/3-150 is twice as fast as a VAX-11/780").

3) Since PLAIP knows that Dhrystone is not supposed to be run under full optimization, but real programs are, it knows that the computational environment of a Dhrystone run is different from the computational environment of a real program. Do I tell PLAIP that
   a) The Dhrystone computational environment is completely unrelated, in terms of "performance", to the computational environment of real programs?
   b) The "performance" of the two computational environments is related (how?)

4) I've told PLAIP that "performance" is not a scalar number, but a function which maps pairs of (program, input data) to an execution time. However, people seem to treat performance as a scalar in statements like the above-mentioned "the SUN/3-150 is twice as fast as a VAX-11/780".
   Assuming for a moment that "performance" really was a property of computers, not computational environments, what does this sentence mean?
   a) Meaningless.
   b) The performance function of a SUN/3-150 is always exactly twice the performance function of a VAX-11/780, for all (program, input data) pairs.
   c) The performance function of a SUN/3-150 is usually close to twice the performance function of a VAX-11/780 (with what error distribution?).
   d) Something else.

5) Since PLAIP knows that Dhrystones are scalars, and "performance" is a function, it is hard to specify a relation between the two. Do I tell PLAIP that
   a) There is no relation?
   b) The "performance" of a computational environment is, when restricted to a certain class of programs (which class? Not all programs, since Dhrystone doesn't measure floating point and I/O), within some error tolerance (what?) of the value of an unspecified generic performance function (what?) multiplied by the Dhrystone figure?
   c) Something else.

I think these questions are important, because not only is PLAIP confused, I'm confused, and I suspect that John Q. ComputerUser is also confused. I think before we start introducing yet more benchmarks, arguing that the new benchmark is "better" than an old benchmark, we need to define what exactly a benchmark is supposed to be, and what it is supposed to do, and do it in terms rigorous enough to satisfy PLAIP. Once we've agreed what a benchmark is supposed to do, we can then try to define a "goodness measure" on benchmarks, and then (and only then!) can we really say that one benchmark is "better" than another.

Ehud Reiter
reiter@harvard (ARPA,BITNET,UUCP)

P.S. PLAIP does not really exist - it is a "thought experiment" which someone suggested I try.
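
For anyone who wants to poke at the definitions in questions 1, 2 and 4, here is a minimal Python sketch of them. Everything in it is an assumption made for illustration: the names (Environment, machine_dhrystones, speed_ratios, fraction_within), the machines "machine-A" and "machine-B", and all the numbers are hypothetical and are not measurements of any real system. It models a computational environment as a tuple, reading 1(b) as a maximum over environments, and performance as a table from (program, input data) to execution time, so that readings 4(b) and 4(c) become different tests on the same list of ratios.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Environment:
    """A 'computational environment' as described in question 1."""
    computer: str
    operating_system: str
    compiler: str
    switches: Tuple[str, ...]


# Performance as a function: (program, input data) -> execution time in seconds.
Workload = Tuple[str, str]
Performance = Dict[Workload, float]


def machine_dhrystones(computer: str, ratings: Dict[Environment, float]) -> float:
    """Reading 1(b): the best rating over all environments on that computer.

    Raises ValueError if no environment uses the given computer.
    """
    return max(r for env, r in ratings.items() if env.computer == computer)


def speed_ratios(slow: Performance, fast: Performance) -> List[float]:
    """Per-workload ratio time_slow / time_fast over the workloads both ran."""
    shared = sorted(slow.keys() & fast.keys())
    return [slow[w] / fast[w] for w in shared]


def fraction_within(ratios: List[float], k: float, rel_tol: float) -> float:
    """Fraction of ratios within rel_tol (relative) of the claimed factor k."""
    hits = sum(1 for r in ratios if abs(r - k) <= rel_tol * k)
    return hits / len(ratios)


# Hypothetical ratings: the same computer under two compilers.
ratings = {
    Environment("machine-A", "os-1", "cc-1", ("-O",)): 1500.0,
    Environment("machine-A", "os-1", "cc-2", ("-O",)): 1800.0,
    Environment("machine-B", "os-1", "cc-1", ("-O",)): 3000.0,
}
best_a = machine_dhrystones("machine-A", ratings)

# Hypothetical execution times, in seconds, for three workloads.
perf_a: Performance = {
    ("sort", "10k records"): 4.0,
    ("compile", "small.c"): 10.0,
    ("matmul", "100x100"): 6.0,
}
perf_b: Performance = {
    ("sort", "10k records"): 2.0,
    ("compile", "small.c"): 5.5,
    ("matmul", "100x100"): 2.0,
}

ratios = speed_ratios(perf_a, perf_b)
exactly_twice = fraction_within(ratios, 2.0, 0.0) == 1.0   # reading 4(b)
mostly_twice = fraction_within(ratios, 2.0, 0.10)          # reading 4(c)
typical_ratio = median(ratios)
```

With these made-up numbers, reading 4(b) fails (one workload runs three times faster, not twice), while the median ratio is still 2.0. That is the gap between "always exactly twice" and "usually close to twice", and the 10% tolerance passed to fraction_within is exactly the arbitrary choice that question 4(c) asks about.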