mash@mips.COM (John Mashey) (12/27/90)
In article <15379@ogicse.ogi.edu> borasky@ogicse.ogi.edu (M. Edward Borasky) writes:
>Thank you for at least driving another stake in "bc benchmark"'s heart.
>However, as you and I know, there is a tremendous need out there for
>[sigh] [gasp] A SINGLE NUMBER to characterize JUST EXACTLY HOW FAST
>ANY GIVEN COMPUTER IS. I have my own personal favorite which I will
>not belabor because everyone has his own personal favorite. My question
>is this: just as you and I believe that vampires don't exist, do you
>believe that a single number that measures a computer's speed doesn't
>exist? I won't state MY belief to avoid bias in the discussion. My
>use of the word "bias" in the preceding sentence is a HINT on my belief!

There is no One Number that predicts performance. Let me restate the
hypothesis more precisely: let ON(a) be the One Number for machine a.
We'd then expect ON(a) / ON(b) to predict the relative performance of
machines a and b on all benchmarks. Well, that number doesn't exist, and
it is easily proved not to, even by looking just at the published SPEC
benchmarks.

How about saying that ON(a) / ON(b) predicts the relative performance on
any benchmark within 10%? Well, that doesn't exist either, from the same
data.

How about saying that ON(a) / ON(b) predicts the relative performance
within a factor of 10? That is, suppose ON(a) / ON(b) == 1.0; then the
prediction would be OK as long as "a" was no more than 10X faster than
"b" on any benchmark, or vice versa. This might exist, and might even
cover the SPEC data, although one may have to go even higher, like
allowing a factor of 20X off. (I'm just unpacking, and don't have the
numbers handy.) For example, try comparing a CISC micro (like a 486),
which has good integer performance but whose VAX-relative floating-point
performance is pretty low, with a vector machine (like the Stardent) or
with the IBM RS/6000, both of whose floating-point performance tends to
be much stronger than their VAX-relative integer performance.
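The argument above can be made concrete with a small sketch. The machines and scores below are hypothetical (not real SPEC data), chosen only to mimic the integer-strong CISC micro vs. floating-point-strong vector box comparison: collapse each machine's per-benchmark scores into one number (a geometric mean here), then measure how far the ON(a)/ON(b) ratio can be from the actual per-benchmark ratios.

```python
# Hypothetical illustration of the One Number argument: how badly can a
# single figure of merit mispredict individual benchmark ratios?
from statistics import geometric_mean

# Made-up per-benchmark scores (bigger = faster). "cisc_micro" is strong
# on integer work, "vector_box" on floating point -- names are invented.
scores = {
    "cisc_micro": {"int_a": 20.0, "int_b": 18.0, "fp_a": 4.0,  "fp_b": 5.0},
    "vector_box": {"int_a": 10.0, "int_b": 12.0, "fp_a": 40.0, "fp_b": 60.0},
}

def one_number(machine):
    """Collapse all of a machine's scores into a single number ON(m)."""
    return geometric_mean(scores[machine].values())

def worst_case_error(a, b):
    """Largest factor by which ON(a)/ON(b) misses any per-benchmark ratio."""
    predicted = one_number(a) / one_number(b)
    worst = 1.0
    for bench in scores[a]:
        actual = scores[a][bench] / scores[b][bench]
        # Error factor, symmetric in direction (over- or under-prediction).
        worst = max(worst, predicted / actual, actual / predicted)
    return predicted, worst

predicted, worst = worst_case_error("cisc_micro", "vector_box")
print(f"ON ratio predicts {predicted:.2f}x; off by up to {worst:.1f}x")
```

With even these four benchmarks the single-number ratio is off by roughly 5X on the integer tests, which is the point: the wider the workload mix, the worse the worst case gets.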
Of course, something that predicts only within a factor of 10X to 20X is
pretty useless..... But even if it were within 20-40%, it's still pretty
bad. (Note, for example, that published Dhrystone results easily
mis-predict SPEC integer benchmarks pretty badly, i.e., it is quite easy
for machine "a" to be 25% faster on Dhrystone than "b", and end up 25%
SLOWER on more realistic integer benchmarks.)
-- 
-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:   mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:    408-524-7015, 524-8253 or (main number) 408-720-1700
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
aburto@marlin.NOSC.MIL (Alfred A. Aburto) (01/03/91)
In article <44353@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>(Note, for example, that published Dhrystone results easily mis-predict
>SPEC integer benchmarks pretty badly, i.e., it is quite easy for machine
>"a" to be 25% faster on Dhrystone than "b", and end up 25% SLOWER on more
>realistic integer benchmarks.)
>--
>-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>

This is an interesting observation (result). Dhrystone was intended to be
REPRESENTATIVE of TYPICAL integer programs. That is, hundreds (I believe)
of programs were analyzed to come up with the (ahem) 'typical' high-level
language instructions and their frequency of usage. In view of this I
would, at first sight, expect Dhrystone to be MORE accurate than SPEC,
since SPEC is based upon only a few integer programs.

What happened? Why does Dhrystone fail? Is it due to:

(a) The instruction mix is WRONG?

(b) Optimization problems? This is not a problem in my view --- we just
need people to report results using various compiler options; then we
gain a more proper perspective on the variation in performance. Of
course, in general, people tend to publicly report the 'Max' or 'Best'
performance; the 'Min' or 'Mean' results are more difficult to find. I
know Dhrystone (1.0, 1.1, 2.0, 2.1) can be optimized a great deal (up to
a factor of 2 or so, because I've done it), but this should not be a
problem as long as we know which result corresponds to which compiler
options --- this helps to define the RANGE of expected performance (Min,
Max, and/or Std. Dev.) with a given compiler and system, and also the
'Mean' or 'Median' performance.

(c) The program size is TOO small? I suppose that if it were not for
caching (cache size) effects then program size should not be a problem,
but I'm no expert ...

(d) Something else?

Why should one expect the integer SPEC results to be more 'accurate' than
the Dhrystone? I'm just wondering.
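The range-reporting idea in point (b) is easy to sketch. The Dhrystones/second figures and option names below are invented for illustration; the point is the summary shape (Min, Max, Mean, Median, Std. Dev.) rather than quoting only the best run:

```python
# Sketch of reporting a RANGE of benchmark results across compiler
# options, instead of only the 'Best' number. All figures are made up.
from statistics import mean, median, stdev

# Hypothetical Dhrystones/second on one machine, varying only options.
runs = {
    "-O0": 9100.0,
    "-O1": 14500.0,
    "-O2": 17200.0,
    "-O2 -finline": 18900.0,
}

values = list(runs.values())
print(f"Min:    {min(values):8.0f}  ({min(runs, key=runs.get)})")
print(f"Max:    {max(values):8.0f}  ({max(runs, key=runs.get)})")
print(f"Mean:   {mean(values):8.0f}")
print(f"Median: {median(values):8.0f}")
print(f"StdDev: {stdev(values):8.0f}")
print(f"Max/Min spread: {max(values) / min(values):.1f}x")
```

With these made-up numbers the Max/Min spread is about 2.1x, which is consistent with the "factor of 2 or so" from optimization that the post mentions; a table like this says far more than the Max alone.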
What is a 'typical' program or a 'typical' frequency of instruction
usage? It seems to me there is no one real 'typical' anything, but rather
a wide variety of 'typical' programs, instruction mixes, and frequencies
of usage depending upon the application. Real programs also show a great
variation in performance. I noticed this recently in a Scientific
American article (Jan 1991) which showed a comparison of 13 different
real programs on a wide variety of supercomputers. The per-program
'megaflop' variation in performance was truly tremendous, especially for
the fastest systems (a Cray and a NEC computer, I think).

Al Aburto
aburto@marlin.nosc.mil