[comp.arch] Dhrystone and Dhampstone

patel@uicsg.UUCP (02/25/87)

As the author of the benchmark DHAMPSTONE (posted earlier on this notes file)
and a researcher in this field, I should point out the following
serious misconceptions/myths about the above two benchmarks.  I hope
people stop misusing these benchmarks after they have read this.

Myth 1.  It measures the cache performance.
   
   The benchmarks do not incorporate any referencing behavior at the
   address level.  That is, they do not try to mimic the spatial or
   temporal locality of systems programs.  Even worse, because the benchmarks
   were designed to be very small, they will fit in most reasonable
   caches.  Thus running a benchmark on a cache machine in stand-alone
   mode will cause no cache misses except the initial loading.  For
   comparison purposes, running the benchmark on two different cache
   machines is acceptable as long as it is understood that the relative
   performance is that of the CPU without the usual cache
   degradation.  To put it differently, the measures show the performance
   with an infinite cache.  Comparing a cache machine with a non-cache
   machine is also acceptable, with the same understanding that the
   performance of the cache machine will be slightly overestimated.

Myth 2.  It measures the paging performance.

   Same comments as above.

Myth 3.  It measures the compiler performance.

   The benchmarks were based on high-level language statistics.  No
   information was incorporated regarding optimizing compilers and how
   they might alter the statistics.  In fact, using a good optimizing
   compiler will totally distort the performance figures.  Since the
   benchmarks are synthetic programs which do not do any useful
   computation, but merely reproduce the various measures (CALL frequency,
   parameter distribution, etc.), some statements in the program are not
   logically necessary.  For example, the statement x = y + z; where x is
   never used again in the program is not logically necessary and will be
   removed by a smart compiler.  However, the statement was introduced to
   produce the desired statistics on parameter and variable referencing,
   type of operation, and so on.  The benchmarks were designed to evaluate
   machine performance, not compiler performance.  Some
   impact of the compiler on the performance is unavoidable.  But to
   minimize the distortions, DO NOT USE AN OPTIMIZING COMPILER when
   using the above benchmarks.

Myth 4.  It measures the absolute machine performance.

   Benchmarks provide only a relative measure of performance, not an
   absolute one.  Thus if a machine X runs with a speed of 100
   Dhampstones (Dhampstone/sec), there is no simple way to predict
   the speed of your "troff" or "cc" job.
   However, if another machine Y has a speed of 25 Dhampstones, then one
   can safely assume that a reasonable mix of systems programs
   (troff, edit, compile, etc.) would run 4 times faster on machine X than
   on machine Y.


Janak H. Patel
University of Illinois
ARPA: uicsg!patel@uiuc.edu
UUCP: ihnp4!uiucdcs!uicsg!patel

mash@mips.UUCP (03/01/87)

In article <500002@uicsg> patel@uicsg.UUCP writes:
>
>
>As an author of a benchmark DHAMPSTONE (posted earlier on this notes file)
>and a researcher in this field I should point out the following
>serious misconceptions/myths about the above two benchmarks.  I hope
>people stop misusing these benchmarks after they have read this.
>
....a number of generally reasonable comments....
>Myth 3.  It measures the compiler performance.
>
>   The benchmarks were based on high level language statistics.  No
>   information was incorporated regarding  optimizing compilers and how
>   they might alter the statistics.  In fact, using a good optimizing
>   compiler will totally distort the performance figures......
>   type of operation and so on.  Benchmarks were designed to evaluate
>   machine performance and not compiler performance.  Some
>   impact of the compiler on the performance is unavoidable.  But to
>   minimize the distortions, DO NOT USE AN OPTIMIZING COMPILER when
>   using the above benchmarks.
>
>Myth 4.  It measures the absolute machine performance.
>
>   Benchmarks provide only a relative measure of performance not an
>   absolute one.  Thus if a machine X runs with a speed of 100
>   Dhampstones (Dhampstone/sec) there is no simple way to predict
>   the speed of your "troff" or "cc" job.
Agree.
>   However,  if another machine Y has a speed of 25 Dhampstones then one
>   can safely assume that a reasonable mix of systems programs
>   (troff, edit, compile etc) would run 4 times faster on machine X than
>   on machine Y.

Unfortunately, this last statement is just as misleading as the ways
that people often use these numbers.
For example, assume that X is a machine designed to be used with optimizing
compilers, and it has good ones that can gain (say) 20% performance (on
exactly such things as troff, diff, etc.), and the normal system uses them.
Y is a machine that is NOT designed for optimizing compilers, or at least
doesn't have them.
In this case, the "real" performance difference is about 5X, rather than
4X.

The point is: if everyone has the same compiler technology, then this
sort of benchmark perhaps has a gross correlation with performance.
But if there are substantial differences in technology, the benchmarks
ought to get fixed up so you can benchmark the technology you use
every day, instead of having to cripple it.

This whole argument is like auto racing: some cars have turbochargers
and some don't.  You then have a race to discover the actual speeds
of the cars, but you make those with turbochargers turn them off,
and then claim the results tell you real performance.
[Obviously, I have an axe to grind, in that we designed our chips as
good compiler targets, and we provide serious optimizers.]

Finally, on the Dhrystone topic, people may have noticed that the
new IBM PC/RT is rated at 6500 Dhrystones, and claimed to be 4.5 RISC
MIPS [whatever that is].  However, if you look carefully, you find that
those are version 1.0 Dhrystones, not 1.1, so subtract 15% right away,
giving about 5500.  Then, assuming that the new "Advanced C Compiler"
is the one we think it is, it's in the same league as ours, and ours
can add 15-20% to Dhrystone numbers, although some of that comes from
dead-code elimination, which (we think) IBM turned off when they were
running the benchmark.  So now, it's really hard to tell what the number
means, but one would guess that the new RT is something like a 3-3.5 Mips
machine, on the scale where 1 Mips = a VAX 11/780 running 4.3BSD, although
much of its floating point comes out slower than a Sun3/160.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086