[comp.benchmarks] More on benchmarking

eugene@eos.arc.nasa.gov (Eugene Miya) (11/16/90)

I forgot to add, as a problem or issue: composition (call it
combination).  comp.arch has had this as a discussion of
"holistic" versus "reductionistic" benchmarking.  I hesitate
to use the words: Gestalt benchmarking.  I posed the question
when visiting Los Alamos a couple of years back:
	Does the whole of a benchmark time equal the sum of the parts?
The question came from the psychologist Fritz Perls' comment:
the whole is greater than the sum of its parts.  Cautiously,
half of the audience raised hands in favor of "equal to the sum of
the parts"; the other half said no, it's greater than the sum.
People began to discuss why: optimization, multi-bank memory effects,
etc.  Sciences tend to be built on analyzing parts (breaking
things down) and putting them back together.  This discussion
disturbed a lot of people.  It should, since it shows our problem.
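
As a minimal sketch of what is at stake (my own illustration, not
anyone's official methodology), consider timing two made-up kernels
separately and then back to back; cache and memory-bank behavior on a
given machine can pull the "whole" away from the sum of the parts:

/* Hypothetical example: the array size and kernels are arbitrary. */
#include <stdio.h>
#include <time.h>

#define N 1000000
static double a[N], b[N], c[N];

static double seconds(void)
{
    return (double) clock() / CLOCKS_PER_SEC;
}

int main(void)
{
    double t0, t_copy, t_triad, t_whole;
    int i;

    for (i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

    t0 = seconds();
    for (i = 0; i < N; i++) c[i] = a[i];                /* part 1 */
    t_copy = seconds() - t0;

    t0 = seconds();
    for (i = 0; i < N; i++) c[i] = c[i] + 3.0 * b[i];   /* part 2 */
    t_triad = seconds() - t0;

    t0 = seconds();
    for (i = 0; i < N; i++) c[i] = a[i];                /* parts 1+2,   */
    for (i = 0; i < N; i++) c[i] = c[i] + 3.0 * b[i];   /* timed as one */
    t_whole = seconds() - t0;

    printf("sum of parts %f s, whole %f s\n", t_copy + t_triad, t_whole);
    return 0;
}

Whether the two numbers agree is exactly the open question: on some
machines they will, on others optimization or memory effects will
separate them.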

Next time, if I may, I want to pose some simple (naive) questions based
on the above and show some real data.  But first, I wish to enumerate
the benchmarking efforts I am aware of.  On a long drive (6 hours)
I came up with the idea of a paper, which I've outlined,
"Who's on the bench?"  It's about who the different people in
benchmarking are, and how their positions affect the way they view
benchmarking: users, marketeers, architects; it looks at the adversarial
relationships among these people.  The idea came from the five W's.
Like "Do you trust benchmarks you don't run yourself?"  "Why?"

A useful portion is just to enumerate efforts:
First there are the classic professional societies: ACM/SIGMETRICS, CMG,
etc., and to a lesser degree ACM/SIGARCH.  Many people see some of these
people as "high brow" theorists of distant practical value: they speak
of queueing network models and Mean Value Analysis.  They have the
performance mailing list from Vanderbilt (current).  Some make a lot of
money consulting to tune systems.  I think we must make better efforts
to unite theory with practice.
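
For the curious, here is a minimal sketch of exact Mean Value Analysis
for a single-class closed queueing network (my own toy illustration;
the service demands, think time, and population below are made up):

#include <stdio.h>

#define K 3                                 /* number of service centers    */

int main(void)
{
    double D[K] = { 0.005, 0.030, 0.017 };  /* service demands (s), assumed */
    double Z = 1.0;                         /* think time (s), assumed      */
    double Q[K] = { 0.0, 0.0, 0.0 };        /* mean queue lengths           */
    double R[K], Rtot = 0.0, X = 0.0;
    int n, k, N = 20;                       /* customer population, assumed */

    for (n = 1; n <= N; n++) {
        Rtot = 0.0;
        for (k = 0; k < K; k++) {
            R[k] = D[k] * (1.0 + Q[k]);     /* residence time at center k   */
            Rtot += R[k];
        }
        X = n / (Z + Rtot);                 /* system throughput            */
        for (k = 0; k < K; k++)
            Q[k] = X * R[k];                /* new mean queue lengths       */
    }
    printf("N=%d: throughput %f jobs/s, response time %f s\n", N, X, Rtot);
    return 0;
}

The recursion itself is trivial; the hard part is measuring service
demands that mean anything, which is where benchmarking comes back in.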

Then there are a host of smaller, less formal efforts:
SPEC (I think generally good, and others agree) "Too workstation oriented"
	is one criticism.
PERFECT (U. Ill.) "too supercomputer oriented" is one comment I've heard.
NIST (the National Institute of Standards and Technology): they have
	a mail daemon, a child of netlib.
NETLIB (Dongarra's numerical software child of the LINPACK libraries,
	a good thing).
Transaction Processing Benchmark-- Jim Gray at Tandem, and that community.
Two efforts on graphics benchmarking--the Bay Area ACM/SIGGRAPH effort
(which I helped to start; a good start, but currently stagnant, partially
my fault as I had "other duties."  It resulted in a session with 300+
people in attendance, lots of interest in benchmarking graphics systems,
and proceedings) and Ken Anderson's GPC effort.
Some very good work at IBM Yorktown Heights, LANL, and other sites.
Gabriel's LISP benchmarks.
Miscellaneous individual benchmarks (not efforts): Whetstone, Gibson
mix, Dhrystone, Livermore Loops, Byte, NAS Kernels, ad nauseam.
SEI's Ada benchmarks.

It is the latter which interests me.  Benchmarking has borrowed nothing
from the functional testing community, and supposedly testing people
have interesting tools.  Computer graphics is sometimes used poorly to
present results.  View the video tape of the Supercomputing '89 workshop
on benchmarking.  Even SPEC has had problems (line graphs of nominal
data).

Some of you might be able to add to the list.  Certainly many companies
have benchmarking groups, but many are afterthoughts to hardware
development.

--e.n. miya, NASA Ames Research Center, eugene@eos.arc.nasa.gov
  {uunet,mailrus,most gateways}!ames!eugene
  AMERICA: CHANGE IT OR LOSE IT.