[comp.arch] Sampling

eugene@pioneer.arpa (Eugene Miya N.) (09/26/87)

In article <704@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:
>1) Pick a REPRESENTATIVE set of benchmarks.
> . . .
>Data, not anecdotes.
>-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>

John, please do tell us how you feel this should be done.  Joanne's
IEEE paper also says this (job mix), but when you get down to it, this
is a stochastic process.  I have my opinions, she has hers, and you have
yours.  What's representative? ;-)

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene

mash@mips.UUCP (John Mashey) (09/26/87)

In article <2882@ames.arpa> eugene@pioneer.UUCP (Eugene Miya N.) writes:
>In article <704@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:
>>1) Pick a REPRESENTATIVE set of benchmarks.
>> . . .
>>Data, not anecdotes.

>John, please do tell us how you feel this should be done.  Joanne's
>IEEE paper also says this (job mix), but when you get down to it, this
>is a stochastic process.  I have my opinions, she has hers, and you have
>yours.  What's representative? ;-)

1) This obviously has no right answer: it depends on what your application
domain is, and it certainly is a stochastic process. [See Doduc story below].

2) Most vendors have mixes that they have, rightly or wrongly, decided
are representative.  Presumably these grew by agglomeration.  DEC's mix
has a variety of languages, apparently slightly weighted towards FORTRAN.
HP's is likewise, but apparently more weighted towards commercial
COBOL applications.  I'd assume that those who measure in fractions of
a Cray care more about vector applications. :-)
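The "job mix" idea above can be sketched as a weighted average of per-benchmark runtimes.  This is a minimal illustration only: the benchmark names, weights, and timings are all made up, and no vendor's actual mix looks like this.

```python
# Hypothetical job mix: per-benchmark runtimes (seconds) on two machines,
# weighted by how much of the real workload each benchmark represents.
# All numbers are invented for illustration.

mix = {                 # benchmark -> workload weight (sums to 1.0)
    "fortran_sim": 0.5,
    "c_compile":   0.3,
    "cobol_batch": 0.2,
}

runtimes = {            # benchmark -> {machine: seconds}
    "fortran_sim": {"A": 100.0, "B": 80.0},
    "c_compile":   {"A": 60.0,  "B": 70.0},
    "cobol_batch": {"A": 40.0,  "B": 50.0},
}

def mix_time(machine):
    """Weighted runtime of the whole mix on one machine."""
    return sum(w * runtimes[b][machine] for b, w in mix.items())

for m in ("A", "B"):
    print(m, mix_time(m))
```

Shifting the weights around can easily flip which machine "wins" the comparison, which is exactly why the choice of a representative mix is contentious.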

3) If the question meant "what does MIPSco think is representative",
one must first look at what our system priorities are [my opinion, unofficial]:
	a) Run user-level integer programs fast.
	b) Run user-level scalar FP programs fast. [very close 2nd]
		(This doesn't mean we don't care about vector FP, just that
		the machines don't pretend to be vector machines.)
	c) Be reasonable multi-user machines, i.e., don't damage
	kernel code, context-switching, etc, too much to get a), and b).
	d) Run other things reasonably well [even COBOL!] if possible.
OK, so what DON'T we like as representative?
	Tiny benchmarks, or most synthetic ones (Dhrystone, Whetstone,
	Linpack, Livermore Loops, etc).  We run them in self-protection,
	and look at odd effects when we see them, but try to be careful
	with the results.
	Of these, we probably like them in the reverse order of the listing.
What do we like as representative programs:
	C compiler, assembler, debugger [as large integer programs]
	N/troff, yacc, diff, grep, sort, etc are OK.
	Spice, espresso, timberwolf [CAD pgms, mostly floating-point]
	Doduc [for intense non-vector FP]
	There are a bunch of applications and multi-user tests we've gotten from
	various potential customers [hence, unnameable, and not publishable]
	that we like, such as ray-tracing/rendering, PCB-routers, etc.

4) Nhuan Doduc's story is an interesting example of "representative".  The Doduc
benchmark is a 5300-line FORTRAN program that does Monte Carlo simulations
of some aspects of nuclear reactors.  We like it because it is a good stress
test for intense scalar floating point.  Doduc provides numbers for
a hundred machines or so, which seems surprising at first glance.  Why
should this particular program have been run on so many machines?
(Numbers are more easily available for some machines than Whetstones!)
This happened because particle physicists in
Europe were accustomed to doing big batch runs that analyze particle
events.  The nature of these runs is that you should use X% of your
total time crunching through the data, and then use the last (100-X)%
of your time summarizing the data, making histograms, etc.  If you
don't finish this before you run out of time [big batch runs, remember,
// JOB cards with time limits, etc], you've wasted it all.  As different
machine types proliferated, people wanted to know how fast machines
would run this kind of processing.  For whatever reason, people found that
Doduc's program correlated quite well with actual performance on these tasks,
so they ended up using it to calibrate their machines.
....."representative" is whatever works!
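The time-budget arithmetic in that story can be sketched as follows.  X and the job numbers are hypothetical; the point is only that overrunning the crunch phase wastes the entire run.

```python
# Sketch of the batch time-budget logic described above: spend X% of the
# job's time limit crunching events, reserve the rest for summarizing.
# If crunching overruns its share, the summary never finishes and the
# whole run is wasted.  All numbers here are hypothetical.

def run_fits(time_limit, crunch_time, summarize_time, x=0.8):
    """True if the job finishes: crunching stays within X% of the
    limit, and the total stays within the limit itself."""
    return (crunch_time <= x * time_limit and
            crunch_time + summarize_time <= time_limit)

print(run_fits(100.0, 75.0, 20.0))   # fits: 75 <= 80 and 95 <= 100
print(run_fits(100.0, 85.0, 10.0))   # crunching overran its 80% share
```

A benchmark that predicts crunch-phase speed well thus predicts whether the whole batch job fits its // JOB card time limit, which is why Doduc's program became a de facto calibration tool.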
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086