radford@calgary.UUCP (Radford Neal) (09/22/86)
In article <20954@rochester.ARPA>, crowl@rochester.ARPA (Lawrence Crowl) writes:

> The table below is a reorganization of the following table.  [I omitted
> this table in this posting.  It contained the times for the benchmarks in
> seconds.  RN]
>
>                           Relative Performance
>
> processor       80286  80386  68000  68020   68020   32032  32100   32100
> cache (MHz)     (10)   (16)   (8)    N (16)  C (16)  (10)   N (18)  C (18)
>
> string search   1.37   1.00   3.85   2.25    1.08    3.51   4.71    1.89
> bit manipulate  5.52   2.09   5.91   2.25    1.00    5.29   3.58    1.74
> linked list     3.08   1.70   4.11   1.79    1.00    2.90   2.36    1.28
> quicksort       4.07   2.49   4.39   2.05    1.00    3.12   3.12    1.32
> matrix trans    6.42   2.05   5.49   1.58    1.00    4.33   3.04    1.47
>
> average         4.09   1.87   4.75   1.98    1.02    3.83   3.36    1.54

The averaging in this table is done incorrectly.  As noted in a recent CACM
article, normalized benchmark results should be averaged with a geometric
mean, not an arithmetic mean.  The geometric mean of N numbers is the Nth
root of their product.  This method gives the correct results:

RIGHT average   3.60   1.79   4.68   1.97    1.02    3.74   3.28    1.52

In this case, it doesn't seem to make all that much difference in the
conclusions.  It can though; consider the following example:

                  Machine A     Machine B

  Benchmark 1:   10 seconds     5 seconds
  Benchmark 2:   10 seconds    20 seconds

Look at the results of normalizing these figures to Machine A and then
taking the arithmetic mean of the results:

                  Machine A     Machine B

  Benchmark 1:       1.0           0.5
  Benchmark 2:       1.0           2.0

  arith. mean:       1.0           1.25

Machine B is thus 25% slower than Machine A, right?  Wrong.  Look at what
happens when you take the *same* benchmark results, normalize to Machine B,
and take the arithmetic mean:

                  Machine A     Machine B

  Benchmark 1:       2.0           1.0
  Benchmark 2:       0.5           1.0

  arith. mean:       1.25          1.0

Now Machine B comes out looking faster!  If you take the geometric mean,
however, Machine A and Machine B look equally fast regardless of how you
normalize the results.

   Radford Neal
   The University of Calgary
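[The normalization flip described above is easy to reproduce; here is a
small sketch in modern Python, using the raw times from the example.  The
function names are illustrative, not from any paper in the thread.]

```python
import math  # math.prod requires Python 3.8+

# Raw benchmark times (seconds) from the example above.
times_a = [10.0, 10.0]   # Machine A
times_b = [5.0, 20.0]    # Machine B

def arith_mean(xs):
    return sum(xs) / len(xs)

def geo_mean(xs):
    # Nth root of the product of N numbers.
    return math.prod(xs) ** (1.0 / len(xs))

# Normalize to Machine A: each of B's times divided by A's time.
norm_to_a = [b / a for a, b in zip(times_a, times_b)]   # [0.5, 2.0]
# Normalize to Machine B instead: A's times divided by B's.
norm_to_b = [a / b for a, b in zip(times_a, times_b)]   # [2.0, 0.5]

print(arith_mean(norm_to_a))   # 1.25 -- B looks 25% slower than A
print(arith_mean(norm_to_b))   # 1.25 -- now A looks 25% slower than B!

# The geometric mean gives the same verdict under either normalization:
print(geo_mean(norm_to_a))     # 1.0
print(geo_mean(norm_to_b))     # 1.0
```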
eugene@ames.UUCP (Eugene Miya) (09/25/86)
> The averaging in this table is done incorrectly.  As noted in a recent CACM
> article, normalized benchmark results should be averaged with a geometric
> mean, not an arithmetic mean.
>    Radford Neal
>    The University of Calgary

There is no clear cut evidence that the geometric mean is any more correct
than any other [Re: the Fleming and Wallace paper].  Jack Wolton of
Los Alamos published a paper in 1984 (IEEE Compcon, with the title
"Bottleneckology: something" -- my proceedings are lent out at the moment).
That paper touted the HARMONIC mean as the "correct" mean.  This makes both
suspect.  The proof offered by W&F is not a sufficiently rigorous proof,
and I think a poor proof is worse than no proof.

The science and art of measurement are independent of statistics (I realize
this is an oversimplification).  We resort to statistics because our
measurement tools on the computer are so poor.  What sense is there in
averaging numbers intended to show some sort of peak performance?  (Don't
say "average peak performance.")

I suggest reading two texts:

%A Darrell Huff
%T How to Lie with Statistics
%I Norton
%C NY
%D 1954

%A Edward R. Tufte
%T The Visual Display of Quantitative Information
%I Graphics Press
%C Cheshire, CT
%D 1983

Separate note on performance measurement: I have been talking to colleagues
at Berkeley, LBL, LLNL, and other locations.  We will be having an informal
dinner meeting, probably to be held at UC Berkeley (to the chagrin of the
Stanford people) sometime near the end of October.  I am trying to think of
a name to characterize this group: New Generation Performance Measurement
Group (naw, make it a Ring), Bay Area Performance Measurement Ring, Those
People Interested in Improving Performance Measurement People.  Anyway, if
you are in the Bay Area and are seriously interested, contact me.  We have
several mail correspondents like Jack Dongarra and Ken Dymond, but we want
to have this meeting, too.

Added reference:

%A Philip J. Fleming
%A John J. Wallace
%T How Not to Lie with Statistics: The Correct Way to Summarize Benchmark Results
%J CACM
%V 29
%N 3
%D March 1986
%P 218-221
%X Waste of a good title.

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  com'on do you trust Reply commands with all these different mailers?
  {hplabs,ihnp4,dual,hao,decwrl,tektronix,allegra}!ames!aurora!eugene
  eugene@ames-aurora.ARPA
peters@cubsvax.UUCP (Peter S. Shenkin) (09/27/86)
In article <ames.1675> eugene@ames.UUCP (Eugene Miya) writes:
>> The averaging in this table is done incorrectly. As noted in a recent CACM
>> article, normalized benchmark results should be averaged with a geometric
>> mean, not an arithmetic mean.
>>    Radford Neal
>>    The University of Calgary
>
>There is no clear cut evidence that the geometric mean is any more correct
>than any other [Re: the Fleming and Wallace paper].  Jack Wolton of
>Los Alamos published a paper in 1984...
>This paper touted the HARMONIC mean as the "correct" mean....

Note that if the values being averaged don't have too much spread, all
these means are about the same.  Also, even if the distributions of values
have large spread, I believe the distributions being compared have to be
rather different in form for the different methods of calculating the mean
to give different rank-orders.  (But I seem to have missed the original
article to which this refers, so I'm not sure what's being compared!)

Peter S. Shenkin     Columbia Univ. Biology Dept., NY, NY  10027
{philabs,rna}!cubsvax!peters         cubsvax!peters@columbia.ARPA
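[The small-spread observation is easy to check numerically; a sketch in
modern Python, with made-up ratio data chosen only to illustrate the point:]

```python
import math  # math.prod requires Python 3.8+

def arith(xs):
    return sum(xs) / len(xs)

def geo(xs):
    return math.prod(xs) ** (1.0 / len(xs))

def harm(xs):
    return len(xs) / sum(1.0 / x for x in xs)

low_spread  = [0.9, 1.0, 1.1]    # ratios close together
high_spread = [0.1, 1.0, 10.0]   # ratios spanning two orders of magnitude

for xs in (low_spread, high_spread):
    # With low spread the three means nearly agree; with high spread
    # they disagree wildly (here: about 3.7 vs 1.0 vs 0.27).
    print([round(f(xs), 3) for f in (arith, geo, harm)])
```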
radford@calgary.UUCP (Radford Neal) (09/29/86)
In article <1675@ames.UUCP>, eugene@ames.UUCP (Eugene Miya) writes:

> > The averaging in this table is done incorrectly. As noted in a recent CACM
> > article, normalized benchmark results should be averaged with a geometric
> > mean, not an arithmetic mean.
>
> There is no clear cut evidence that the geometric mean is any more correct
> than any other [Re: the Fleming and Wallace paper].  Jack Wolton of
> Los Alamos published a paper in 1984 (IEEE Compcon) [touting] the
> HARMONIC mean as the "correct" mean...
> The proof offered by W&F is not a sufficiently rigorous proof.
> And I think a poor proof is worse than no proof.

Could you elaborate on what's wrong with the proof offered by W&F?  I
re-read the article last night, and they seem to me to prove what they set
out to prove.  The harmonic mean certainly doesn't work for normalized
numbers.

W&F do not claim that the geometric mean is the only good way to average
benchmarks, just the only way to average *normalized* benchmarks.  If you
have a good idea of your job mix, a weighted arithmetic mean of the raw
data is the way to go.  You can then normalize this mean to one machine if
you feel like it.

Regardless, the case *against* arithmetic means of normalized numbers seems
completely incontrovertible, whatever one considers the "best" replacement.
It's not acceptable for the results to depend on which machine one
arbitrarily decides to normalize to.

The paper discussed here is:

   Fleming, Philip J. and Wallace, John J.  "How not to lie with
   statistics: The correct way to summarize benchmark results",
   Communications of the ACM, Vol. 29, No. 3 (March 1986).

   Radford Neal
   The University of Calgary
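[Both claims above can be checked with the example times from earlier in
the thread; a sketch in modern Python, with an assumed 50/50 job mix:]

```python
# Raw times (seconds) from the earlier example in this thread.
times_a = [10.0, 10.0]
times_b = [5.0, 20.0]

def harm(xs):
    return len(xs) / sum(1.0 / x for x in xs)

# The harmonic mean of normalized results also depends on the reference:
b_rel_a = [b / a for a, b in zip(times_a, times_b)]   # [0.5, 2.0]
a_rel_b = [a / b for a, b in zip(times_a, times_b)]   # [2.0, 0.5]
print(harm(b_rel_a))   # 0.8 -- B appears faster than A
print(harm(a_rel_b))   # 0.8 -- and A appears faster than B: a contradiction

# A weighted arithmetic mean of the *raw* times has no such problem:
weights = [0.5, 0.5]   # assumed job mix, purely illustrative
wa = sum(w * t for w, t in zip(weights, times_a))
wb = sum(w * t for w, t in zip(weights, times_b))
print(wa, wb)          # 10.0 12.5 -- A is faster for this mix, no ambiguity
```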
hansen@mips.UUCP (10/03/86)
The geometric and harmonic means are MEANINGLESS for benchmarking machines.
They are only useful for fudging the results to try to make an unbalanced,
ill-designed machine look good.  Look at it this way: if you've got two
jobs to get done and two machines take the following times:

             Machine A    Machine B
  Job 1       10 sec       20 sec
  Job 2       10 sec        5 sec

Does anyone disagree that Machine A takes 20 seconds, and Machine B takes
25 seconds?  It should be obvious that Machine A is the faster machine FOR
THE GIVEN WORKLOAD.  Even if the times were more extreme:

             Machine A    Machine B
  Job 1       10 sec       20 sec
  Job 2       10 sec        1 sec

Still, Machine B is slower.  It doesn't matter a damn that Job 2 executes
ten times faster on Machine B, because it was so slow on Job 1 that it had
already lost the race before even starting Job 2.

Now, if you run Job 2 more often than Job 1, or Job 2 is more closely
representative of the workload you intend for the machine, then sure, go
ahead and adjust the weighting.  However, the geometric and harmonic means
take neither of these factors into account, and effectively use
inconsistent weightings between machines.

Doesn't anyone remember the parable of the Tortoise and the Hare?  I
suppose someone will now try to convince us that the Tortoise should have
been the winner via the geometric mean!

-- 
Craig Hansen               | "Evahthun' tastes
MIPS Computer Systems      |  bettah when it
...decwrl!mips!hansen      |  sits on a RISC"
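[The disagreement Hansen describes can be made concrete in a few lines of
modern Python, using the raw times from his more extreme example:]

```python
import math  # math.prod requires Python 3.8+

# Hansen's extreme example: raw job times in seconds.
times_a = [10.0, 10.0]   # Machine A
times_b = [20.0,  1.0]   # Machine B

# Total elapsed time for the given workload: A wins, 20 s to 21 s.
print(sum(times_a), sum(times_b))   # 20.0 21.0

# Per-job speed ratios of B relative to A (>1 means B is faster on that job):
ratios = [a / b for a, b in zip(times_a, times_b)]   # [0.5, 10.0]
geo = math.prod(ratios) ** (1.0 / len(ratios))
print(geo)   # ~2.24 -- the geometric mean calls B over twice as fast
```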