eugene@pioneer.arpa (Eugene Miya N.) (05/21/87)
In article <6024@steinmetz.steinmetz.UUCP> William E. Davidsen Jr writes:
>After doing benchmarks for about 15 years now, I will assure everyone
>that the hard part is not getting reproducible results, but in (a)
>deciding how these relate to the problem you want to solve, and (b) getting
>people to believe that there is no "one number" which can be used to
>characterize performance. If pressed I use the reciprocal of the total
>real time to run the suite. It's as good as any other voodoo number...

Yes, I agree, and I have not had to do it that long. Let's take a moment
to study ways to relate or characterize end-users' applications:
1) without gross generalizations, but with real quantitative data, and
2) using common ideas and tools. Okay? Static as well as dynamic tools.
What can we tell independent of machines and languages?

Second: there are lots of disciplines which use and abuse single figures
of merit and get away with them. Consider: earlier in the season (end of
ski season, really), the base of NS was a sea of mud, while 2/3 of the
way up the mountain, in a sheltered area, the snow gauge read 5.5 feet.
You think we have problems with measurement? Is an average

	(integral integral over {all area of ski resort} depth(x,y) dx dy) / area

a reasonable way to characterize resort coverage? Do we buy cars on
single figures of merit? If not, then how many? Consider cardiology:
heart function. Single figures are used: heart rates. But EKGs are much
better; they portray more. Picture worth a thousand words? Try embedding
one on the net with any good resolution. Yes, we can get away with it,
but we have to take others with us.

I had better stop before Alan Smith totally loses respect (he probably
has already).

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
{hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene
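Davidsen's "voodoo number" above (the reciprocal of total real time for
the suite) can be sketched in a few lines. This is a minimal
illustration, not anyone's actual tooling; the benchmark names and
timings below are made up for the example.

```python
# Sketch of a single figure of merit: 1 / (total wall-clock seconds
# to run the whole suite). Higher is "faster", but as the post notes,
# one number hides which workloads dominate the total.

def figure_of_merit(real_times):
    """Return the reciprocal of the total real time of a suite.

    real_times: dict mapping benchmark name -> wall-clock seconds.
    """
    total = sum(real_times.values())
    return 1.0 / total

# Hypothetical suite timings, in seconds:
suite = {"compile": 42.0, "sort": 18.5, "fft": 9.5}
print(figure_of_merit(suite))
```

Note that two suites with very different workload mixes can produce the
same number, which is exactly the objection being made.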
nerd@percival.UUCP (Michael Galassi) (05/25/87)
In article <415@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:
>As larry says, real page-thrashers are highly dependent on a lot of attributes.
>That doesn't mean they're bad tests, merely that they're extremely hard
>to do in a controlled way. In particular, you often see radically different
>results according to buffer cache sizes, for example.
>
>-john mashey

DISCLAIMER: <generic disclaimer, I speak for me only, etc>

I've not seen this stated around here, so I'll do it. Benchmarks can be
divided into two major categories: those which exercise the processor
(CPU, FPU, MMU, etc.) and those which exercise the WHOLE computer (i.e.,
the i/o system too). For the person who is evaluating a CPU family for a
new design, I can see where the first class of benchmarks comes in VERY
handy, but for the rest of us (those who want to buy a computer, install
UNIX, and generate accounts) the MIPS, FLOPS, *stones, etc. that the cpu
will do are rarely of much interest. I care much more about how the
system will handle a dozen users all doing real tasks (vi, cc, f77, rn,
rogue, or whatever) than I do about the time it takes the cpu to find
the first X primes when it is not installed in its cardcage where god
wanted it to be. I guess I don't care much about the "a lot of
attributes" individually, but rather about how they all work together.
Give me anything that overall performs well (so long as there is no
intel cpu in it) and I'll be pleased as pie.

-michael
-- 
If my employer knew my opinions he would probably look for another engineer.
Michael Galassi, Frye Electronics, Tigard, OR
..!{decvax,ucbvax,ihnp4,seismo}!tektronix!reed!percival!nerd
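The two categories above can be seen directly by timing a workload two
ways: CPU time charges only for processor work, while elapsed (real)
time also counts i/o waits. The sketch below is illustrative only; the
workloads are toy stand-ins, not real benchmarks.

```python
# Contrast CPU-only measurement with whole-system measurement:
# time.process_time() counts CPU seconds for this process, while
# time.perf_counter() counts wall-clock seconds, including i/o waits.
import os
import tempfile
import time

def timed(fn):
    """Run fn() and return (cpu_seconds, wall_seconds)."""
    c0, w0 = time.process_time(), time.perf_counter()
    fn()
    return time.process_time() - c0, time.perf_counter() - w0

def cpu_work():
    # Exercises only the processor (category one).
    sum(i * i for i in range(200_000))

def io_work():
    # Exercises the i/o system too (category two): write and sync a file.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for _ in range(50):
            f.write(b"x" * 65536)
            f.flush()
            os.fsync(f.fileno())
    os.remove(path)

for name, fn in [("cpu-bound", cpu_work), ("io-bound", io_work)]:
    c, w = timed(fn)
    print(f"{name}: cpu={c:.3f}s wall={w:.3f}s")
```

For the cpu-bound case the two numbers track each other; for the
io-bound case the wall-clock time typically exceeds the CPU time, which
is exactly the gap that processor-only benchmarks never see.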