[comp.org.usenix] Benchmarking

eugene@pioneer.arpa (Eugene Miya N.) (05/21/87)

In article <6024@steinmetz.steinmetz.UUCP> William E. Davidsen Jr writes:
>
>After doing benchmarks for about 15 years now, I will assure everyone
>that the hard part is not getting reproducible results, but in (a)
>deciding how these relate to the problem you want to solve, and (b) getting
>people to believe that there is no "one number" which can be used to
>characterize performance. If pressed I use the reciprocal of the total
>real time to run the suite. It's as good as any other voodoo number...

Yes I agree, and I have not had to do it that long.  Let's take a moment
to study ways to relate or characterize end-users' applications:
1) without gross generalizations, but with real quantitative data,
and 2) using common ideas and tools.  Okay?  Static as well as dynamic
tools.  What can we tell independent of machines and languages?
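
For concreteness, the "voodoo number" Davidsen describes above is just the
reciprocal of the total real (wall-clock) time for the whole suite.  Here is
a minimal sketch in C of computing it; the commands in the suite[] list are
made-up placeholders, not anyone's actual benchmark harness:

/* Davidsen's single figure of merit: 1 / (total real time of the suite). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    /* Hypothetical suite; substitute the programs you actually care about. */
    const char *suite[] = { "./compile_test", "./sort_test", "./edit_test" };
    int i, n = sizeof(suite) / sizeof(suite[0]);
    time_t start = time(NULL);
    double elapsed;

    for (i = 0; i < n; i++)
        if (system(suite[i]) != 0)
            fprintf(stderr, "warning: %s did not exit cleanly\n", suite[i]);

    elapsed = difftime(time(NULL), start);
    /* One number, as requested; it says nothing about which program
     * dominated the total, or why. */
    printf("total real time %.0f s, figure of merit %g (1/s)\n",
           elapsed, elapsed > 0.0 ? 1.0 / elapsed : 0.0);
    return 0;
}

Run it on an otherwise idle machine, since anything else sharing the system
inflates the real time and deflates the number.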

Second:
There are lots of disciplines which use and abuse single figures of merit,
and which manage to get away from them.  Consider: earlier in the season
(the end of the ski season, really), the base of NS was a sea of mud, while
2/3 of the way up the mountain, in a sheltered area, the snow gauge read
5.5 feet.  You think we have problems with measurement?  Is an average
(integrate the depth function over the whole area of the ski resort, then
divide by that area) a reasonable way to characterize resort coverage?
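
Spelled out (LaTeX only for legibility here; the two depths are just the
readings above), that single number is

    \bar{d} \;=\; \frac{\iint_{A_{\mathrm{resort}}} d(x,y)\,dx\,dy}{A_{\mathrm{resort}}}

which, for a resort that is mud (0 feet) at the base and 5.5 feet up high,
lands somewhere in between and describes neither spot.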

Do we buy cars on single figures of merit?  If not, then how many?

Consider cardiology and heart function.  Single figures are used: heart
rates; but EKGs are much better, because they portray more.  A picture is
worth a thousand words?  Try embedding one on the net with any good
resolution.

Yes, we can get away from them, but we have to take others with us.  I'd
better stop before Alan Smith totally loses respect (he probably has
already).

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene

nerd@percival.UUCP (Michael Galassi) (05/25/87)

In article <415@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:

>As larry says, real page-thrashers are highly dependent on a lot of attributes.
>That doesn't mean they're bad tests, merely that they're extremely hard
>to do in a controlled way.  In particular, you often see radically different
>results according to buffer cache sizes, for example.
>
>-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>

I've not seen this stated around here, so I'll state it.
Benchmarks can be divided into two major categories:
those which exercise the processor (CPU, FPU, MMU, etc.) and those which
exercise the WHOLE computer (i.e. i/o system too).  For the person who
is evaluating a CPU family for a new design I can see where the first
class of benchmarks comes in VERY handy, but for the rest of us (those who
want to buy a computer, install UNIX, and generate accounts) the MIPS,
FLOPS, *stones, etc. that the CPU will do are rarely of much interest.
I care much more about how the system will handle with a dozen users
all doing real tasks (vi, cc, f77, rn, rogue, or whatever) than I do
about the time it takes the CPU to find the first X primes when it
is not installed in its cardcage where god wanted it to be.
I guess I don't care much about the "a lot of attributes" individually,
but rather how they all work together.  Give me anything that overall
performs well (so long as there is no intel cpu in it) and I'll be
pleased as pie.
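
To make the split concrete, here is a minimal sketch in C of the two kinds
of test: one loop that exercises only the processor, and one that drags the
rest of the system (file I/O, buffer cache, disk) in as well.  The sizes,
the prime limit, and the scratch-file name are made up for illustration;
this is not anyone's published benchmark.

/* Same skeleton, two workloads: cpu_test() is pure computation,
 * io_test() writes, reads back, and removes a scratch file, so the
 * buffer cache and the disk get a say in the result. */
#include <stdio.h>
#include <time.h>

static long cpu_test(void)
{
    long i, j, primes = 0;
    for (i = 2; i < 20000; i++) {          /* naive prime count: CPU only */
        int prime = 1;
        for (j = 2; j * j <= i; j++)
            if (i % j == 0) { prime = 0; break; }
        primes += prime;
    }
    return primes;
}

static long io_test(void)
{
    static char buf[8192];                 /* zero-filled scratch block */
    long blocks;
    FILE *f = fopen("scratch.tmp", "w+");
    if (f == NULL)
        return -1;
    for (blocks = 0; blocks < 1024; blocks++)   /* write 8 MB ... */
        fwrite(buf, sizeof(buf), 1, f);
    rewind(f);
    while (fread(buf, sizeof(buf), 1, f) == 1)  /* ... and read it back */
        ;
    fclose(f);
    remove("scratch.tmp");
    return blocks;
}

int main(void)
{
    time_t t0;

    t0 = time(NULL);
    printf("primes found: %ld\n", cpu_test());
    printf("CPU-only test:     %.0f s real\n", difftime(time(NULL), t0));

    t0 = time(NULL);
    printf("blocks written: %ld\n", io_test());
    printf("whole-system test: %.0f s real\n", difftime(time(NULL), t0));
    return 0;
}

Note that on a machine with a big buffer cache the 8 MB may never touch the
disk at all, which is exactly the "radically different results according to
buffer cache sizes" problem Mashey describes above.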
-michael
-- 
If my employer knew my opinions he would probably look for another engineer.

	Michael Galassi, Frye Electronics, Tigard, OR
	..!{decvax,ucbvax,ihnp4,seismo}!tektronix!reed!percival!nerd