[comp.lang.ada] Dhrystone 2.1 : RATIONALE File

karl@grebyn.com (Karl Nyberg) (07/15/89)

    Dhrystone Benchmark (Ada Version 2): Rationale and Measurement Rules


                 Reinhold P. Weicker
                 Siemens AG, E STE 35
                 Postfach 3220
                 D-8520 Erlangen
                 Germany (West)




1.  Why a Version 2 of Dhrystone?

The Dhrystone benchmark  program  [1]  has  become  a  popular  benchmark  for
CPU/compiler   performance   measurement,   in   particular  in  the  area  of
minicomputers, workstations, PCs and microprocessors.  It apparently satisfies
a  need  for  an  easy-to-use  integer benchmark; it gives a first performance
indication which is more meaningful than MIPS numbers which, in their  literal
meaning  (million  instructions  per  second), cannot be used across different
instruction sets (e.g. RISC  vs.  CISC).   With  the  increasing  use  of  the
benchmark, it seems necessary to reconsider the benchmark and to check whether
it can still fulfill this function.  Version 2 of Dhrystone is the  result  of
such a re-evaluation; it has been made for two reasons:

o As far as it is  possible  without  changes  to  the  Dhrystone  statistics,
  optimizing   compilers   should   be  prevented  from  removing  significant
  statements.  It has  turned  out  in  the  past  that  optimizing  compilers
  suppressed  code  generation for too many statements (by "dead code removal"
  or  "dead  variable  elimination").   This  has  led  to  the  danger   that
  benchmarking  results obtained by a naive application of Dhrystone - without
  inspection of the code that was generated - could become meaningless.

o Dhrystone has been published in Ada [1], and versions in Ada, Pascal  and  C
  have  been  distributed  by  Reinhold Weicker via floppy disk.  However, the
  version that was used most often for benchmarking has been the version  made
  by  Rick  Richardson  by another translation from the Ada version into the C
  programming language; this has been the version  distributed  via  the  UNIX
  network Usenet [2].

  There has been an obvious need for a common C version of Dhrystone,  and  in
  the  process of publication of a version 2 for C [3], it became necessary to
  update the Ada version as well.  There should be, as far as  possible,  only
  one  version  of  Dhrystone  per  language such that results can be compared
  without restrictions.  In order to  allow  cross-language  comparisons,  the
  Ada,  Pascal,  and  C versions should be maintained together; they have been
  updated for version 2.1 in a consistent way.

Dhrystone uses only the "Pascal subset" of Ada; it cannot be used  to  measure
the  efficiency  of  implementation  for  Ada-specific  features like tasking,
generics etc.  However, often the "Pascal subset" language  features  will  be
the  ones  most often used in practical programs; so it is not unreasonable to
have a benchmark program that is restricted  to  these  features.   Experience
with previous measurements has shown that a common prejudice "Ada programs run
slower than programs written in other languages" is not true: While  the  very
first  Ada  compilers  sometimes  generated  slow code, this does not hold any
longer for the present generation of Ada compilers. If correct comparisons are
made  (i.e.  Ada  runtime  checks disabled for comparison with other languages
that do not have runtime checks), it turns out that Ada compilers can generate
code  that  is  as  fast  as  the code generated from other languages, or even
faster.

The  overall  policy  for  version  2  has  been  that  the  distribution   of
statements,  operand types and operand locality described in [1] should remain
unchanged as much as possible.  (Very few changes were necessary; their impact
should be negligible.)  Also, the order of statements should remain unchanged.
Although I am aware of some critical remarks on the benchmark - I  agree  with
several  of them - and know some suggestions for improvement, I didn't want to
change the benchmark into something different from what has  become  known  as
"Dhrystone"; the confusion generated by such a change would probably  outweigh
the benefits. If I were to write a new benchmark program, I wouldn't  give  it
the  name  "Dhrystone"  since  this  denotes  the  program  published  in [1].
However, I do recognize  the  need  for  a  larger  number  of  representative
programs  that can be used as benchmarks; users should always be encouraged to
use more than just one benchmark.

The new versions (version 2.1 for Ada, Pascal and C) will  be  distributed  as
widely  as possible.  (Version 2.1 differs from the C version 2.0 published in
[3] only in a few corrections for minor deficiencies found by users of version
2.0.)   Readers  who  want to use the benchmark for their own measurements can
obtain a copy in machine-readable form on floppy disk (MS-DOS or XENIX format)
from the author.


2.  Overall Characteristics of Version 2

In general, version 2  follows  -  in  the  parts  that  are  significant  for
performance  measurement,  i.e.   within  the  measurement loop - the original
(Ada) version.  The original publication of  Dhrystone  did  not  contain  any
statements  for  time measurement since they are necessarily system-dependent.
However, it turned out that  it  is  not  enough  just  to  enclose  the  main
procedure  of  Dhrystone  in a loop and to measure the execution time.  If the
variables that are computed are not used somehow, there is the danger that the
compiler considers them as "dead variables" and suppresses code generation for
a part of the statements. Therefore in version 2 all variables are printed  at
the  end  of  the  program.  This  also  permits some plausibility control for
correct execution of the benchmark.
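
As a minimal illustration of this point (a C sketch only, not part of the
Dhrystone sources; the names "work", "run_loop" and "report" are hypothetical
stand-ins), a value that is printed after the loop cannot be treated as a dead
variable, so the compiler has to keep the computation that produces it:

```c
#include <stdio.h>

/* Hypothetical stand-in for the benchmark body; not Dhrystone code. */
static int work(int run_index, int seed)
{
    return (seed * 3 + run_index) % 7;
}

/* The loop result is returned to the caller, so it stays "live". */
int run_loop(int number_of_runs)
{
    int int_loc = 0;
    for (int run_index = 1; run_index <= number_of_runs; ++run_index)
        int_loc = work(run_index, int_loc);
    return int_loc;
}

/* Printing the final value is the observable use that prevents
   dead variable elimination; it also allows a plausibility check. */
void report(int number_of_runs)
{
    printf("Int_Loc: %d\n", run_loop(number_of_runs));
}
```

If the call to printf were dropped and the result ignored, an optimizing
compiler would be free to remove the whole loop.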

At several places in the benchmark, code has been added, but only in  branches
that  are  not  executed. The intention is that optimizing compilers should be
prevented from moving code out of the measurement loop, or from removing  code
altogether.  Statements that are executed have been changed in very few places
only.  In these cases, only the role of some operands has been changed, and it
was   made  sure  that  the  numbers  defining  the  "Dhrystone  distribution"
(distribution of statements, operand types and locality) still hold as much as
possible.   Except for sophisticated optimizing compilers, execution times for
version 2.1 should be the same as for previous versions.

Because of the self-imposed limitation that the order and distribution of  the
executed  statements  should  not  be  changed,  there  are  still cases where
optimizing compilers may not generate code for some statements. To  a  certain
degree,  this  is  unavoidable  for  small synthetic benchmarks.  Users of the
benchmark are advised to check the code listings to see whether code is
generated for all statements of Dhrystone.

Contrary to the suggestion in the published paper and its realization  in  the
versions previously distributed, no attempt has been made to subtract the time
for the measurement loop overhead. (This calculation has proven  difficult  to
implement  in  a  correct  way,  and  its omission makes the program simpler.)
However, since the loop check is now part of the benchmark, this does have  an
impact  -  though a very minor one - on the distribution statistics which have
been updated for this version.
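
A timing harness without overhead subtraction might look roughly like the
following C sketch.  It is an assumption for illustration, not the code of any
distributed version: the function names are hypothetical, and the standard C
"clock" function stands in for whatever system-dependent timer is available.

```c
#include <time.h>

/* Runs per second from the raw elapsed time; no loop overhead is
   subtracted.  Returns 0.0 when the time was too short to measure. */
double dhrystones_per_second(long number_of_runs, double elapsed_seconds)
{
    if (elapsed_seconds <= 0.0)
        return 0.0;
    return (double)number_of_runs / elapsed_seconds;
}

/* Times number_of_runs calls of a hypothetical benchmark body.  The
   loop check and increment are deliberately part of what is timed. */
double time_runs(long number_of_runs, void (*body)(long))
{
    clock_t begin = clock();
    for (long run_index = 1; run_index <= number_of_runs; ++run_index)
        body(run_index);
    clock_t end = clock();
    return (double)(end - begin) / CLOCKS_PER_SEC;
}
```

The number of runs should be chosen large enough that the elapsed time is well
above the resolution of the timer used.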


3.  Discussion of Individual Changes

In this section, all changes are described that affect  the  measurement  loop
and  that  are  not  just renamings of variables. All remarks refer to the Ada
version; the other language versions have been updated similarly.

In addition to adding  the  measurement  loop  and  the  printout  statements,
changes have been made at the following places:

o In procedure "Proc_0", three statements have been added in the  non-executed
  "then" part of the statement

        if Enum_Loc = Pack_2.Func_1 (Char_Index, 'C')

  they are

        String_Loc_2 := "DHRYSTONE PROGRAM, 3'RD STRING";
        Int_Loc_2 := Run_Index;
        Int_Glob := Run_Index;

  The string assignment prevents  movement  of  the  preceding  assignment  to
  String_Loc_2 (5'th statement of "Proc_0") out of the measurement loop. (This
  happened with another language and compiler.)  The assignment  to  Int_Loc_2
  prevents  value  propagation  for  Int_Loc_2, and the assignment to Int_Glob
  makes the value of Int_Glob possibly dependent on the value of Run_Index.

o In the three arithmetic computations at the end of the measurement  loop  in
  "Proc_0",  the  role  of  some variables has been exchanged, to prevent the
  division from just cancelling out the multiplication as it was  in  [1].   A
  very   smart  compiler  might  have  recognized  this  and  suppressed  code
  generation for the division.

o For Proc_2, no code has been changed, but the values of the actual parameter
  have changed due to changes in "Proc_0".

o In Proc_4, the second assignment has been changed from

        Bool_Loc := Bool_Loc or Bool_Glob;

  to

        Bool_Glob := Bool_Loc or Bool_Glob;

  It now assigns a value to a global variable  instead  of  a  local  variable
  (Bool_Loc);   Bool_Loc  would  be  a  "dead  variable"  which  is  not  used
  afterwards.

o In Func_1, the statement

        Pack_1.Char_Glob_1 := Char_Loc_1;

  was added in the non-executed "else" part of the "if" statement, to  prevent
  the suppression of code generation for the assignment to Char_Loc_1.

o In Func_2, the second character comparison statement has been changed to

        if Char_Loc = 'R'

  ('R' instead of 'X') because  a  comparison  with  'X'  is  implied  in  the
  preceding "if" statement.

  Also in Func_2, the statement

        Pack_1.Int_Glob := Int_Loc;

  has been added in the non-executed part of the last "if" statement, in order
  to prevent Int_Loc from becoming a dead variable.

o In Func_3, a non-executed "else" part has been added to the "if"  statement.
  While  the  program  would  not be incorrect without this "else" part, it is
  considered bad programming practice if a function  can  be  left  without  a
  return  value.  Also,  Ada requires that leaving a function without a return
  value raises an exception, and even though this exception is  never  raised,
  the presence of an exception handler may impact execution time.

  To compensate for this change, the (non-executed) "else" part  in  the  "if"
  statement of Proc_3 was removed.
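
The effect of a non-executed branch, as used in several of the changes above,
can be sketched in C as follows.  This is an illustration under assumptions,
not the benchmark code: "rarely_true" and "measure" are hypothetical names,
the first string literal is assumed, and in Dhrystone the condition involves a
call into another compilation unit (Pack_2.Func_1), so that the compiler
cannot see that the branch is never taken.

```c
#include <string.h>

int Int_Glob = 0;   /* global: stores to it are visible elsewhere */

/* Stand-in for a condition that is false on every iteration.  Here it
   is defined locally only to keep the sketch self-contained. */
int rarely_true(int run_index)
{
    return run_index < 0;
}

int measure(int number_of_runs, char string_out[31])
{
    char string_loc[31] = "";
    int int_loc = 0;

    for (int run_index = 1; run_index <= number_of_runs; ++run_index) {
        /* Without the branch below, both assignments would be loop
           invariant and could be moved out of the loop. */
        strcpy(string_loc, "DHRYSTONE PROGRAM, 2'ND STRING");
        int_loc = 3;
        if (rarely_true(run_index)) {
            /* Never executed, but makes the values loop-dependent. */
            strcpy(string_loc, "DHRYSTONE PROGRAM, 3'RD STRING");
            int_loc = run_index;
            Int_Glob = run_index;
        }
    }
    strcpy(string_out, string_loc);
    return int_loc;
}
```

Since the added statements are never executed, they cost no execution time;
they only restrict what the compiler may assume about the variables.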

The distribution statistics have been changed only  by  the  addition  of  the
measurement loop iteration (1 additional statement, 4 additional local integer
operands) and by the change in Proc_4  (one  operand  changed  from  local  to
global).  The distribution statistics in the comment headers have been updated
accordingly.
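
The change in Proc_4 can be pictured with the following C sketch.  Only the
changed assignment is taken from the text above; the first statement of each
function, the initial values and the function names are assumptions made for
illustration.

```c
#include <stdbool.h>

bool Bool_Glob = false;    /* global variable                       */
char Char_Glob_1 = 'A';    /* assumed value, for illustration only  */

/* Old form: the result is stored in a local variable that is never
   read again, so a compiler may drop the whole computation. */
void proc_4_old(void)
{
    bool Bool_Loc = (Char_Glob_1 == 'A');
    Bool_Loc = Bool_Loc | Bool_Glob;   /* dead store */
    (void)Bool_Loc;                    /* silences "unused" warnings */
}

/* New form: the result is stored in a global variable, which is
   visible outside the procedure, so the store has to be kept. */
void proc_4_new(void)
{
    bool Bool_Loc = (Char_Glob_1 == 'A');
    Bool_Glob = Bool_Loc | Bool_Glob;
}
```

The operator "|" is used rather than "||" to mirror the Ada "or", which
evaluates both operands.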


4.  String Operations

The string operations (string assignment and string comparison) have not  been
changed, to keep the program consistent with the original version.

There has been some concern, mostly from users of the C version,  that  string
operations  are  over-represented  in  the program, and that execution time is
dominated by these operations.  This was true in  particular  when  optimizing
compilers  removed  too much code in the main part of the program; this should
have been mitigated in version 2.

It should be noted that this is a  language-dependent  issue:   Dhrystone  was
first  published  in  Ada, and with Ada or Pascal semantics, the time spent in
the string operations is,  at  least  in  all  implementations  known  to  me,
considerably  smaller than in C.  In Ada and Pascal, assignment and comparison
of strings are operators defined in the language, and the upper bounds of  the
strings  occurring  in  Dhrystone  are  part  of the type information known at
compilation time.  The compilers can therefore generate efficient inline  code
whereas  in  C,  the  string  operations  must  be expressed in terms of the C
library functions "strcpy" and "strcmp".  (This is probably  the  main  reason
why  on  most systems known to me, the Ada and Pascal versions are faster than
the C version.)

I admit that the  string  comparison  in  Dhrystone  terminates  later  (after
scanning  20  characters)  than most string comparisons in real programs.  For
consistency with the original benchmark, I didn't change the  program  despite
this weakness.
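
The difference between the two styles of string handling, and the point at
which the comparison terminates, can be sketched in C as follows.  The sketch
is illustrative only: the type and function names are hypothetical, and the
string length of 30 is an assumption based on the literal quoted in section 3.

```c
#include <string.h>

#define STR_LEN 30                  /* assumed fixed string length  */
typedef char Str_30[STR_LEN + 1];   /* +1 for C's terminating NUL   */

/* C library style: the end of the string is found by scanning. */
int compare_c_style(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Fixed-bound style: the length is a compile-time constant, which
   allows a compiler to generate inline code for the comparison. */
int equal_fixed(const Str_30 a, const Str_30 b)
{
    return memcmp(a, b, STR_LEN) == 0;
}

/* Number of characters examined up to and including the first
   mismatch, or STR_LEN if the strings are equal. */
int chars_scanned(const Str_30 a, const Str_30 b)
{
    int i = 0;
    while (i < STR_LEN && a[i] == b[i])
        ++i;
    return i < STR_LEN ? i + 1 : STR_LEN;
}
```

For two strings that share the prefix "DHRYSTONE PROGRAM, " and differ in the
next character, chars_scanned yields 20, which corresponds to the late
termination mentioned above.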


5.  Intended Use of Dhrystone

When Dhrystone is used, the following "ground rules" apply:

o Separate compilation (Ada and C versions)

  As mentioned in [1], Dhrystone was written  to  reflect  actual  programming
  practice  in  systems  programming.   The  division into several compilation
  units (5 in the Ada version, 2 in the C version)  is  intended,  as  is  the
  distribution of inter-module and intra-module subprogram calls.  Although on
  many systems there will be no difference in execution time  to  a  Dhrystone
  version  where  all  compilation units are merged into one file, the rule is
  that separate compilation should  be  used.   The  intention  is  that  real
  programming  practice,  where  programs  consist  of  several  independently
  compiled units, should  be  reflected.   This  also  implies  that  the
  compiler,  while  compiling  one  unit,  has no information about the use of
  variables, register allocation etc.  occurring in other  compilation  units.
  Although  in  real  life  compilation  units  will  probably  be larger, the
  intention is that these effects  of  separate  compilation  are  modeled  in
  Dhrystone.

  A few language systems have post-linkage optimization available (e.g., final
  register allocation is performed after linkage).  This is a borderline case:
  Post-linkage  optimization  involves  additional  program  preparation  time
  (although  not  as  much  as  compilation in one unit) which may prevent its
  general use in practical programming.  I think that  since  it  defeats  the
  intentions given above, it should not be used for Dhrystone.

  Unfortunately, ISO/ANSI  Pascal  does  not  contain  language  features  for
  separate  compilation.   Although  most  commercial Pascal compilers provide
  separate compilation in some way, we cannot use it for Dhrystone since  such
  a  version  would  not  be portable.  Therefore, no attempt has been made to
  provide a Pascal  version  with  several  compilation  units.   When  Pascal
  results  are  compared with Ada or C results, it should be kept in mind that
  this difference can influence execution times.

o Results with and without runtime checks should be reported; default  results
  are those with runtime checks suppressed (Ada version)

  It is customary in benchmarking to publish only the fastest results possible
  for  the  particular  hardware/compiler  combination,  and therefore runtime
  checks are almost always disabled.  This is contrary to the  Ada  philosophy
  that the default case is the case "runtime checks enabled".  Since Dhrystone
  is often used for cross-language comparisons, and since other languages have
  either  no concept of runtime checks at all (C) or have runtime checks as an
  optional, non-standardized feature only (Pascal), default results should  be
  results  with all runtime checks suppressed.  However, Ada results should be
  reported for the case  "all  runtime  checks  enabled"  also;  a  comparison
  between the two values shows how much thought the compiler implementors
  have given to the idea that runtime checks should be implemented as
  efficiently as possible.  Dhrystone intentionally contains several
  statements where the compiler can recognize that a particular constraint is
  always satisfied, and where the corresponding constraint checks can be
  suppressed.

o No procedure merging (no pragma "inline")

  Although Dhrystone contains some very short procedures where execution would
  benefit  from  procedure  merging (inlining, macro expansion of procedures),
  procedure merging is not to be used.  The reason is that the  percentage  of
  procedure  and  function  calls  is  part of the "Dhrystone distribution" of
  statements contained in [1].  This restriction does not hold for the  string
  functions  of  the  C  version  since ANSI C allows an implementation to use
  inline code for these functions.

o Other optimizations are allowed, but they should be indicated

  It is often hard to draw an exact line between "normal code generation"  and
  "optimization"  in  compilers:  Some compilers perform operations by default
  that are invoked in other compilers only  when  optimization  is  explicitly
  requested.  Also, we cannot prevent people from trying, in benchmarking, to
  achieve results that look as good as possible.  Therefore, optimizations
  performed
  by  compilers  -  other  than  those  listed  above - are not forbidden when
  Dhrystone execution times are measured.  Dhrystone is  not  intended  to  be
  non-optimizable  but  is  intended  to  be  about  as  optimizable as normal
  programs.   For  example,  there  are  several  places  in  Dhrystone  where
  performance   benefits   from   optimizations   like   common  subexpression
  elimination, value  propagation  etc.,  but  normal  programs  usually  also
  benefit  from  these  optimizations.   Therefore,  no  effort  was  made  to
  artificially  prevent  such  optimizations.   However,  measurement  reports
  should  indicate  which  compiler  optimization  levels  have been used, and
  reporting results with different levels of  compiler  optimization  for  the
  same hardware is encouraged.

Of course, for experimental  purposes,  post-linkage  optimization,  procedure
merging and/or compilation in one unit can be done to determine their effects.
However,  Dhrystone  numbers  obtained  under  these  conditions   should   be
explicitly  marked as such; "normal" Dhrystone results should be understood as
results obtained following the ground rules listed above.

In any case, for serious performance evaluation, users are advised to ask  for
code  listings  and  to  check  them carefully.  In this way, when results for
different systems are  compared,  the  reader  can  get  a feeling for how much
performance  difference is due to compiler optimization and how much is due to
hardware speed.


6.  Acknowledgements

This Ada version 2.1 of Dhrystone follows closely the C version  2.1.  The  C
version  has been developed in cooperation with Rick Richardson (Tinton Falls,
NJ); it incorporates many ideas from the "Version 1.1" distributed  previously
by  him  over the UNIX network Usenet.  I also thank Chaim Benedelac (National
Semiconductor), David Ditzel (SUN), Earl Killian and John Mashey (MIPS),  Alan
Smith  and  Rafael  Saavedra-Barrera  (UC  at  Berkeley)  for  their help with
comments on earlier versions of the benchmark.


7.  Bibliography

[1]
   Reinhold P. Weicker: Dhrystone: A Synthetic Systems Programming Benchmark.
   Communications of the ACM 27, 10 (Oct. 1984), 1013-1030

[2]
   Rick Richardson: Dhrystone Benchmark Summary (and Program Text)
   Informal Distribution via "Usenet", Last Versions Known to  me:  Sept.  21,
   1987 (Version 1.1) and December 4, 1988 (Version 2)

[3]
   Reinhold P. Weicker: Dhrystone  Benchmark:  Rationale  for  Version  2  and
   Measurement Rules; Program Text (C Version 2.0)
   SIGPLAN Notices 23,8 (Aug. 1988), 49-62