[comp.benchmarks] Preliminary Miya babblings on benchmarking

eugene@eos.arc.nasa.gov (Eugene Miya) (11/15/90)

While attending SPEC it became very clear to me that
	1) we (all benchmarkers) are great philosophizers, no, talkers,
		well, something less than doers.  This is a hunch;
		my opinion can change at any time.
	2) None of us really knows what we are doing, but it is very
		clear to me that measurement is part of a process of
		consensus.  I can't come up with a solution unless
		you agree.  (A prof in UCB's history department
		enlightened me with the story of the Meter two years ago.)
	We are blind philosophers trying to count a horse's teeth.

Basic problems as I see them.  I have some of this stuff written down
and really elaborated, but I've never finished the paper (4+ years).

I. The tension of simplicity.  We need things to be simple: we need to
port codes, we need to understand what we are testing, and we need to be
able to comprehend the results {and make use of them}.  Everyone wants to
predict, but our methods of description are poor.  No glory in descriptions.
We seek linear models.  That's a real problem.  We see the subproblems
of portability, statistics, social consequences.  I stop here.
[Portability is amusing: I've tried to collect the world's smallest
benchmarks in a few cases: APL, dc, and others.  Hey, I'm a theoretical
mathematician by training; if a benchmark exists, there must exist
a smallest one.  If a smallest one exists, then.....I'll talk about it later.]
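
To give the flavor of the "smallest benchmark" idea in something more
common than APL or dc, here is a minimal timing loop in C.  This is my
own sketch, not one of the collected specimens: it ports anywhere with
an ANSI C compiler and you can comprehend it at a glance, and for
exactly that reason it measures almost nothing (see I).

/* Minimal timing-loop benchmark: simple, portable, and nearly
 * meaningless.  One number out; no I/O, memory, or system behavior. */
#include <stdio.h>
#include <time.h>

#define N 1000000L

int main(void)
{
    volatile double x = 0.0;    /* volatile: keep the work "actual" */
    clock_t t0, t1;
    long i;

    t0 = clock();
    for (i = 0; i < N; i++)
        x += 1.0;               /* nominally one FLOP per trip */
    t1 = clock();

    printf("%ld adds in %g sec\n", N,
           (double)(t1 - t0) / CLOCKS_PER_SEC);
    return 0;
}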

II. The problem of equivalence.  "How do you compare apples to oranges
(Bananas or Crays[tm])?"  I think it is possible, but I note two problems:
1) citing the A-to-O comparison is a great way to kill a discussion.
The biologists got beyond that point.  2) If you get to an "apples to
Apples" comparison, I find people argue "Macintoshes can't be compared
to Golden Delicious" and you are back where you started.  Subproblems:
what's a "real" program?  John Hennessy brought this up.  The problem is
NOT what's "real"; the problem is what's "representative."  A benchmark
is a surrogate.  You typically don't want the "real program": it takes
too long to port, to run, etc. (see I).  Also here are the issues of
"repetition" and "reproducibility," two subtle issues.  "Optimization"
is another issue.  I think we need to develop the distinction of actual
versus virtual work (see the sketch after this paragraph).  Can of
worms.  The PERFECT Club knows this; David Kuck (UIUC) was taped in
Reno.  I jokingly call my stuff "<Perfect" and the prototype is
"<<Perfect."  Cheap laugh.
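
Here is a sketch of that can of worms, assuming an aggressive
optimizing compiler (my illustration, not PERFECT Club code).  Nothing
ever reads "sum," so a compiler is free to delete the loop; the timer
then reports the cost of virtual work, work we requested but the
machine never performed.

/* The actual-vs-virtual-work trap. */
#include <stdio.h>
#include <time.h>

int main(void)
{
    double sum = 0.0;
    clock_t t0, t1;
    long i;

    t0 = clock();
    for (i = 0; i < 10000000L; i++)
        sum += (double)i;       /* result is never used... */
    t1 = clock();

    /* ...so an optimizer may remove the loop entirely, and the time
     * below measures virtual work.  Printing sum, or declaring it
     * volatile, forces the work to be actual. */
    printf("time: %g sec\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    return 0;
}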

III. The special problem of parallelism.  It's a type of optimization.
"Massive parallelism" is THE big buzzword for 2001.  How do I compare
my Cray to my CM-2 for my COBOL program.......  The problem (linear
models) is that software engineering discovered Brooks's Law and its
corollaries: the mythical man-month; we confuse work and effort;
putting people on a late project only makes it later; 1 woman * 9
months = 1 baby, but 9 women * 1 month != 1 baby.  We will rediscover
the Mythical MIP or MFLOPS.  Some programs are not partitionable (a
sketch of the arithmetic follows).  Other big problems.
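
The non-partitionable point is just Amdahl's law arithmetic (my gloss,
with an assumed 5% serial fraction): if a fraction f of a program
cannot be partitioned, P processors buy a speedup of at most
1/(f + (1-f)/P).

/* Amdahl's law: the Mythical MFLOPS in one line of arithmetic. */
#include <stdio.h>

int main(void)
{
    double f = 0.05;    /* assumed serial (non-partitionable) fraction */
    int p;

    for (p = 1; p <= 1024; p *= 4)
        printf("P = %4d   speedup = %6.2f\n",
               p, 1.0 / (f + (1.0 - f) / (double)p));
    return 0;
}

Even with 1024 processors, a 5% serial fraction caps the speedup near
20.  More hardware, same mythical MFLOPS.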

IV. Communication.  Benchmarking is just another form of testing, yet
the functional-testing literature is nearly devoid of performance
measurement.  That's done by someone else.  Someone else's bailiwick.
Benchmarkers are just now learning about graphics, "visualization,"
and other branches of computing.  Oh boy!  And we get to re-invent
the wheel (re-inventing the ILLIAC again), etc.

Our ideas behind benchmarking are too simplistic (I).  It's because we
have a "job to do."  The problem is that we are so focused on that job,
we don't see that we are pushing the limits of technologies like
benchmarking and timing; to make progress, we have to advance the
state of our tools.

If we look for simple things, we will find them.  Only simple things.
So in the coming weeks, I'll try to share a few things and some real data.
We have to make some changes in the way we do things.

My problem: get off my butt and formally write some of this stuff up,
figure out a way to do some of it, and get time to think quietly
about it before someone else does 8^).

--e.n. miya, NASA Ames Research Center, eugene@eos.arc.nasa.gov
  Babbling in chunks.  Sorry for the pontification; I thought
  it needed to be stated.
  {uunet,mailrus,most gateways}!ames!eugene
  AMERICA: CHANGE IT OR LOSE IT.

  "DON'T BENCHMARK ENRAPPEL."  -- MR.  Let me know if you read this, Martha.