[comp.benchmarks] performance

eugene@eos.arc.nasa.gov (Eugene Miya) (11/16/90)

In article <MCCALPIN.90Nov15090025@pereland.cms.udel.edu>
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
I noted 10% performance variation in VAXen.
>Eugene> 10% acceptable?  In some cases yes, others no.  
John>
John>I am surprised that such a sensible person as Eugene would imply that
John>*any* benchmark number had a precision of <10%.  I don't believe that
John>it is possible to take any combination of "general-purpose" benchmarks

I'm sorry, let me make a clarification.  If you check the original
post, I mentioned nothing about benchmarks.  The performance variation
measured was strictly cycle time.  If you want the start of the
references, let me know.

Me sensible?  I'm honored.  I think we both know that some aspects of
benchmarking are counter-intuitive.  We have to challenge a few
fundamental ideas.

P.S. What's "general-purpose"?  And you hit a key word which I forgot
to mention when posting last evening: "combination."  That's
significant.  It only occurred to me after I got home, and you remind
me of it here.

John>and use that data to predict your application (or workload)
John>performance to within 10%.  In fact, it is all too easy to have 10%
John>changes in the performance of your application itself if (as is
John>inevitable) it is run under conditions that differ from the formal
John>benchmark test.  Minor changes like operating system or compiler
John>upgrades, changes in the system background load, or even disk
John>fragmentation can produce 10% changes in wall-clock time quite
John>easily....

Good points.  We should explore these in discussion.
I hope I can post some numbers.  Some codes merely need a compiler switch.

John>So how precise do I think the numbers are?  Well, with 6 or so years
John>of experience in performance evaluation of supercomputers and
John>high-performance workstations, I can generally [i.e., not always]
John>estimate the performance of my codes to within 20-25% based on a broad
John>suite of benchmark results (LINPACK 100x100, LINPACK 1000x1000,
John>Livermore Loops, hardware description with cycle counts, and maybe a
John>bit more).  (I deliberately ignore PERFECT since the one code that I
John>know in some detail [the ocean model from GFDL/Princeton] is a mess,
John>and I would not blame a compiler at all for having trouble vectorizing
John>or optimizing it [or even understanding what it is supposed to be
John>doing!]).

I don't think the solution will lie SOLELY in benchmark programs.
I don't think Hennessy made that clear enough, and I want to try to show
some of that, but I must also respect certain copyright and
non-disclosure information.  We have to instrument OSes more, the hardware,
and the compiler (what am I forgetting?).  Sadly, this won't immediately help
the guy who just wants to buy a PC, but it will help architects and compiler
writers and I hope it "trickles down."

--e.n. miya, NASA Ames Research Center, eugene@eos.arc.nasa.gov
  {uunet,mailrus,most gateways}!ames!eugene
  AMERICA: CHANGE IT OR LOSE IT.