patel@uicsg.UUCP (02/25/87)
As an author of the benchmark DHAMPSTONE (posted earlier on this notes file)
and a researcher in this field, I should point out the following serious
misconceptions/myths about the above two benchmarks.  I hope people stop
misusing these benchmarks after they have read this.

Myth 1. It measures the cache performance.

	The benchmarks did not incorporate any referencing behavior at the
	address level.  That is, they do not try to mimic the spatial or
	temporal locality of systems programs.  Even worse, because the
	benchmarks were designed to be very small, they will fit in most
	reasonable caches.  Thus running a benchmark on a cache machine in
	stand-alone mode will cause no cache misses except the initial
	loading.  For comparison purposes, running the benchmark on two
	different cache machines is acceptable as long as it is understood
	that the relative performance is that of the CPU without the usual
	cache degradation.  To put it differently, the measures show the
	performance with an infinite cache.  Comparing a cache machine with
	a non-cache machine is also acceptable, with the same understanding
	that the performance of the cache machine will be slightly
	overestimated.

Myth 2. It measures the paging performance.

	Same comments as above.

Myth 3. It measures the compiler performance.

	The benchmarks were based on high-level language statistics.  No
	information was incorporated regarding optimizing compilers and how
	they might alter the statistics.  In fact, using a good optimizing
	compiler will totally distort the performance figures.  Since the
	benchmarks are synthetic programs which do not do any useful
	computation, but merely reproduce the various measures (CALL
	frequency, parameter distribution, etc.), some statements in the
	program are not logically necessary.  For example, the statement
	x = y + z; where x is never used again in the program is not
	logically necessary and will be removed by a smart compiler.
	However, the statement was introduced to produce the desired
	statistics on parameter and variable referencing, type of operation,
	and so on.  The benchmarks were designed to evaluate machine
	performance, not compiler performance.  Some impact of the compiler
	on the performance is unavoidable.  But to minimize the distortions,
	DO NOT USE AN OPTIMIZING COMPILER when using the above benchmarks.

Myth 4. It measures the absolute machine performance.

	Benchmarks provide only a relative measure of performance, not an
	absolute one.  Thus if a machine X runs with a speed of 100
	Dhampstones (Dhampstone/sec), there is no simple way to predict the
	speed of your "troff" or "cc" job.  However, if another machine Y
	has a speed of 25 Dhampstones, then one can safely assume that a
	reasonable mix of systems programs (troff, edit, compile, etc.)
	would run 4 times faster on machine X than on machine Y.

Janak H. Patel
University of Illinois
ARPA: uicsg!patel@uiuc.edu
UUCP: ihnp4!uiucdcs!uicsg!patel
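To make the dead-code point in Myth 3 above concrete, here is a minimal,
hypothetical C fragment in the spirit of a synthetic benchmark (it is not
taken from the Dhampstone or Dhrystone sources).  A compiler that performs
dead-store elimination is free to delete the whole assignment, so the
statement contributes to the intended statistics only when no optimizer is
used:

	int y, z;

	void proc()
	{
		int x;

		x = y + z;	/* x is never used again: a dead store.
				   Compiled without optimization, the loads,
				   add, and store all execute, as the
				   benchmark's statistics intend; an
				   optimizing compiler may remove the whole
				   statement, so the measured operation mix
				   no longer matches the one designed in. */
	}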
mash@mips.UUCP (03/01/87)
In article <500002@uicsg> patel@uicsg.UUCP writes:
>
>
>As an author of a benchmark DHAMPSTONE (posted earlier on this notes file)
>and a researcher in this field I should point out the following
>serious misconceptions/myths about the above two benchmarks.  I hope
>people stop misusing these benchmarks after they have read this.
>
	....a number of generally reasonable comments....
>Myth 3. It measures the compiler performance.
>
>	The benchmarks were based on high level language statistics.  No
>	information was incorporated regarding optimizing compilers and how
>	they might alter the statistics.  In fact, using a good optimizing
>	compiler will totally distort the performance figures......
>	type of operation and so on.  Benchmarks were designed to evaluate
>	machine performance and not compiler performance.  Some
>	impact of the compiler on the performance is unavoidable.  But to
>	minimize the distortions, DO NOT USE AN OPTIMIZING COMPILER when
>	using the above benchmarks.
>
>Myth 4. It measures the absolute machine performance.
>
>	Benchmarks provide only a relative measure of performance not an
>	absolute one.  Thus if a machine X runs with a speed of 100
>	Dhampstones (Dhampstone/sec) there is no simple way to predict
>	the speed of your "troff" or "cc" job.

Agree.

>	However, if another machine Y has a speed of 25 Dhampstones then one
>	can safely assume that a reasonable mix of systems programs
>	(troff, edit, compile etc) would run 4 times faster on machine X than
>	on machine Y.

Unfortunately, this last statement is just as misleading as the ways that
people often use these numbers.  For example, assume that X is a machine
designed to be used with optimizing compilers, that it has good ones which
can gain (say) 20% in performance on exactly things like troff, diff, etc.,
and that the normal system uses them.  Y is a machine that is NOT designed
for optimizing compilers, or at least doesn't have them.  In this case, the
"real" performance difference is about 5X, rather than 4X.

The point is: if everyone has the same compiler technology, then this sort
of benchmark perhaps has a gross correlation with performance.  But if
there are substantial differences in technology, the benchmarks ought to be
fixed up so you can benchmark the technology you use every day, instead of
having to cripple it.

This whole argument is like auto racing: some cars have turbochargers and
some don't.  You then have a race to discover the actual speeds of the
cars, but you make those with turbochargers turn them off, and then claim
the results tell you real performance.

[Obviously, I have an axe to grind, in that we designed our chips as good
compiler targets, and we provide serious optimizers.]

Finally, on the Dhrystone topic, people may have noticed that the new IBM
PC/RT is rated at 6500 Dhrystones, and claimed to be 4.5 RISC MIPS
[whatever that is].  However, if you look carefully, you find that those
are 1.0 Dhrystones, not 1.1, so subtract 15% right away, giving about 5500.
Then, assuming that the new "Advanced C Compiler" is the one we think it
is, it's in the same league as ours, and ours can add 15-20% to Dhrystone
numbers, although some of that comes from dead-code elimination, which (we
think) IBM turned off when they were running the benchmark.  So now, it's
really hard to tell what the number means, but one would guess that the new
RT is something like a 3-3.5 Mips machine, on the scale where 1 Mips =
VAX 11/780 running 4.3BSD, although much of its floating point comes out at
less than a Sun3/160.
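As a quick back-of-the-envelope check of the two adjustments above, here is
a small C sketch.  The 20% optimizer gain, the 15% Dhrystone 1.0-to-1.1
correction, and the raw scores are simply the figures quoted in these
postings, not new measurements:

	#include <stdio.h>

	int main(void)
	{
		/* Patel's example: machine X = 100 Dhampstones, Y = 25. */
		double x = 100.0, y = 25.0;

		/* Same compiler technology on both machines: the raw ratio. */
		printf("raw ratio X/Y:             %.1f\n", x / y);          /* 4.0 */

		/* If X also gains about 20% from the optimizers it was
		   designed for, and Y has none, the "real" gap is nearer 5X. */
		printf("X with 20%% optimizer gain: %.1f\n", (x * 1.2) / y); /* 4.8 */

		/* IBM PC/RT: 6500 Dhrystones, but version 1.0; taking off
		   roughly 15% to compare against 1.1 numbers gives ~5500.   */
		printf("6500 Dhrystone 1.0 ~ %.0f Dhrystone 1.1\n", 6500.0 * 0.85);

		return 0;
	}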
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD: 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086