karl@grebyn.com (Karl Nyberg) (07/15/89)
Dhrystone Benchmark (Ada Version 2): Rationale and Measurement Rules
Reinhold P. Weicker
Siemens AG, E STE 35
Postfach 3220
D-8520 Erlangen
Germany (West)
1. Why a Version 2 of Dhrystone?
The Dhrystone benchmark program [1] has become a popular benchmark for
CPU/compiler performance measurement, in particular in the area of
minicomputers, workstations, PCs and microprocessors. It apparently satisfies
a need for an easy-to-use integer benchmark; it gives a first performance
indication which is more meaningful than MIPS numbers which, in their literal
meaning (million instructions per second), cannot be used across different
instruction sets (e.g. RISC vs. CISC). With the increasing use of the
benchmark, it seems necessary to reconsider the benchmark and to check whether
it can still fulfill this function. Version 2 of Dhrystone is the result of
such a re-evaluation; it has been made for two reasons:
o As far as it is possible without changes to the Dhrystone statistics,
optimizing compilers should be prevented from removing significant
statements. It has turned out in the past that optimizing compilers
suppressed code generation for too many statements (by "dead code removal"
or "dead variable elimination"). This has led to the danger that
benchmarking results obtained by a naive application of Dhrystone - without
inspection of the code that was generated - could become meaningless.
o Dhrystone has been published in Ada [1], and versions in Ada, Pascal and C
have been distributed by Reinhold Weicker via floppy disk. However, the
version that was used most often for benchmarking has been the version made
by Rick Richardson by another translation from the Ada version into the C
programming language; this has been the version distributed via the UNIX
network Usenet [2].
There has been an obvious need for a common C version of Dhrystone, and in
the process of publication of a version 2 for C [3], it became necessary to
update the Ada version as well. There should be, as far as possible, only
one version of Dhrystone per language such that results can be compared
without restrictions. In order to allow cross-language comparisons, the
Ada, Pascal, and C versions should be maintained together; they have been
updated for version 2.1 in a consistent way.
Dhrystone uses only the "Pascal subset" of Ada; it cannot be used to measure
the efficiency of implementation for Ada-specific features like tasking,
generics etc. However, often the "Pascal subset" language features will be
the ones most often used in practical programs; so it is not unreasonable to
have a benchmark program that is restricted to these features. Experience
with previous measurements has shown that a common prejudice "Ada programs run
slower than programs written in other languages" is not true: While the very
first Ada compilers sometimes generated slow code, this does not hold any
longer for the present generation of Ada compilers. If correct comparisons are
made (i.e. Ada runtime checks disabled for comparison with other languages
that do not have runtime checks), it turns out that Ada compilers can generate
code that is as fast as the code generated from other languages, or even
faster.
The overall policy for version 2 has been that the distribution of
statements, operand types and operand locality described in [1] should remain
unchanged as much as possible. (Very few changes were necessary; their impact
should be negligible.) Also, the order of statements should remain unchanged.
Although I am aware of some critical remarks on the benchmark - I agree with
several of them - and know some suggestions for improvement, I didn't want to
change the benchmark into something different from what has become known as
"Dhrystone"; the confusion generated by such a change would probably outweigh
the benefits. If I were to write a new benchmark program, I wouldn't give it
the name "Dhrystone" since this denotes the program published in [1].
However, I do recognize the need for a larger number of representative
programs that can be used as benchmarks; users should always be encouraged to
use more than just one benchmark.
The new versions (version 2.1 for Ada, Pascal and C) will be distributed as
widely as possible. (Version 2.1 differs from the C version 2.0 published in
[3] only in a few corrections for minor deficiencies found by users of version
2.0.) Readers who want to use the benchmark for their own measurements can
obtain a copy in machine-readable form on floppy disk (MS-DOS or XENIX format)
from the author.
2. Overall Characteristics of Version 2
In general, version 2 follows - in the parts that are significant for
performance measurement, i.e. within the measurement loop - the original
(Ada) version. The original publication of Dhrystone did not contain any
statements for time measurement since they are necessarily system-dependent.
However, it turned out that it is not enough just to enclose the main
procedure of Dhrystone in a loop and to measure the execution time. If the
variables that are computed are not used somehow, there is the danger that the
compiler considers them as "dead variables" and suppresses code generation for
a part of the statements. Therefore in version 2 all variables are printed at
the end of the program. This also permits some plausibility control for
correct execution of the benchmark.
At several places in the benchmark, code has been added, but only in branches
that are not executed. The intention is that optimizing compilers should be
prevented from moving code out of the measurement loop, or from removing code
altogether. Statements that are executed have been changed in very few places
only. In these cases, only the role of some operands has been changed, and it
was made sure that the numbers defining the "Dhrystone distribution"
(distribution of statements, operand types and locality) still hold as much as
possible. Except for sophisticated optimizing compilers, execution times for
version 2.1 should be the same as for previous versions.
Because of the self-imposed limitation that the order and distribution of the
executed statements should not be changed, there are still cases where
optimizing compilers may not generate code for some statements. To a certain
degree, this is unavoidable for small synthetic benchmarks. Users of the
benchmark are advised to check code listings to verify that code is generated
for all statements of Dhrystone.
Contrary to the suggestion in the published paper and its realization in the
versions previously distributed, no attempt has been made to subtract the time
for the measurement loop overhead. (This calculation has proven difficult to
implement in a correct way, and its omission makes the program simpler.)
However, since the loop check is now part of the benchmark, this does have an
impact - though a very minor one - on the distribution statistics which have
been updated for this version.
3. Discussion of Individual Changes
In this section, all changes are described that affect the measurement loop
and that are not just renamings of variables. All remarks refer to the Ada
version; the other language versions have been updated similarly.
In addition to adding the measurement loop and the printout statements,
changes have been made at the following places:
o In procedure "Proc_0", three statements have been added in the non-executed
"then" part of the statement
if Enum_Loc = Pack_2.Func_1 (Char_Index, 'C')
they are
String_Loc_2 := "DHRYSTONE PROGRAM, 3'RD STRING";
Int_Loc_2 := Run_Index;
Int_Glob := Run_Index;
The string assignment prevents movement of the preceding assignment to
String_Loc_2 (5th statement of "Proc_0") out of the measurement loop. (This
happened with another language and compiler.) The assignment to Int_Loc_2
prevents value propagation for Int_Loc_2, and the assignment to Int_Glob
makes the value of Int_Glob possibly dependent on the value of Run_Index.
o In the three arithmetic computations at the end of the measurement loop in
"Proc_0", the role of some variables has been exchanged, to prevent the
division from just cancelling out the multiplication as it was in [1]. A
very smart compiler might have recognized this and suppressed code
generation for the division.
o For Proc_2, no code has been changed, but the value of the actual parameter
has changed due to changes in "Proc_0".
o In Proc_4, the second assignment has been changed from
Bool_Loc := Bool_Loc or Bool_Glob;
to
Bool_Glob := Bool_Loc or Bool_Glob;
It now assigns a value to a global variable instead of a local variable
(Bool_Loc); Bool_Loc would be a "dead variable" which is not used
afterwards.
o In Func_1, the statement
Pack_1.Char_Glob_1 := Char_Loc_1;
was added in the non-executed "else" part of the "if" statement, to prevent
the suppression of code generation for the assignment to Char_Loc_1.
o In Func_2, the second character comparison statement has been changed to
if Char_Loc = 'R'
('R' instead of 'X') because a comparison with 'X' is implied in the
preceding "if" statement.
Also in Func_2, the statement
Pack_1.Int_Glob := Int_Loc;
has been added in the non-executed part of the last "if" statement, in order
to prevent Int_Loc from becoming a dead variable.
o In Func_3, a non-executed "else" part has been added to the "if" statement.
While the program would not be incorrect without this "else" part, it is
considered bad programming practice if a function can be left without a
return value. Also, Ada requires that leaving a function without a return
value raises an exception, and even though this exception is never raised,
the presence of an exception handler may impact execution time.
To compensate for this change, the (non-executed) "else" part in the "if"
statement of Proc_3 was removed.
The distribution statistics have been changed only by the addition of the
measurement loop iteration (1 additional statement, 4 additional local integer
operands) and by the change in Proc_4 (one operand changed from local to
global). The distribution statistics in the comment headers have been updated
accordingly.
4. String Operations
The string operations (string assignment and string comparison) have not been
changed, to keep the program consistent with the original version.
There has been some concern, mostly from users of the C version, that string
operations are over-represented in the program, and that execution time is
dominated by these operations. This was true in particular when optimizing
compilers removed too much code in the main part of the program; this should
have been mitigated in version 2.
It should be noted that this is a language-dependent issue: Dhrystone was
first published in Ada, and with Ada or Pascal semantics, the time spent in
the string operations is, at least in all implementations known to me,
considerably smaller than in C. In Ada and Pascal, assignment and comparison
of strings are operators defined in the language, and the upper bounds of the
strings occurring in Dhrystone are part of the type information known at
compilation time. The compilers can therefore generate efficient inline code
whereas in C, the string operations must be expressed in terms of the C
library functions "strcpy" and "strcmp". (This is probably the main reason
why on most systems known to me, the Ada and Pascal versions are faster than
the C version.)
I admit that the string comparison in Dhrystone terminates later (after
scanning 20 characters) than most string comparisons in real programs. For
consistency with the original benchmark, I didn't change the program despite
this weakness.
5. Intended Use of Dhrystone
When Dhrystone is used, the following "ground rules" apply:
o Separate compilation (Ada and C versions)
As mentioned in [1], Dhrystone was written to reflect actual programming
practice in systems programming. The division into several compilation
units (5 in the Ada version, 2 in the C version) is intended, as is the
distribution of inter-module and intra-module subprogram calls. Although on
many systems there will be no difference in execution time compared with a
Dhrystone version where all compilation units are merged into one file, the
rule is that separate compilation should be used. The intention is that real
programming practice, where programs consist of several independently
compiled units, should be reflected. This also implies that the
compiler, while compiling one unit, has no information about the use of
variables, register allocation etc. occurring in other compilation units.
Although in real life compilation units will probably be larger, the
intention is that these effects of separate compilation are modeled in
Dhrystone.
A few language systems have post-linkage optimization available (e.g., final
register allocation is performed after linkage). This is a borderline case:
Post-linkage optimization involves additional program preparation time
(although not as much as compilation in one unit) which may prevent its
general use in practical programming. I think that since it defeats the
intentions given above, it should not be used for Dhrystone.
Unfortunately, ISO/ANSI Pascal does not contain language features for
separate compilation. Although most commercial Pascal compilers provide
separate compilation in some way, we cannot use it for Dhrystone since such
a version would not be portable. Therefore, no attempt has been made to
provide a Pascal version with several compilation units. When Pascal
results are compared with Ada or C results, it should be kept in mind that
this difference can influence execution times.
o Results with and without runtime checks should be reported; default results
are those with runtime checks suppressed (Ada version)
It is customary in benchmarking to publish only the fastest results possible
for the particular hardware/compiler combination, and therefore runtime
checks are almost always disabled. This is contrary to the Ada philosophy
that the default case is the case "runtime checks enabled". Since Dhrystone
is often used for cross-language comparisons, and since other languages have
either no concept of runtime checks at all (C) or have runtime checks as an
optional, non-standardized feature only (Pascal), default results should be
results with all runtime checks suppressed. However, Ada results should be
reported for the case "all runtime checks enabled" also; a comparison
between the two values shows how much thought the compiler implementers
have given to the idea that runtime checks should be implemented as efficiently
as possible. Dhrystone intentionally contains several statements where the
compiler can recognize that a particular constraint is always satisfied, and
where the corresponding constraint checks can be suppressed.
o No procedure merging (no pragma "inline")
Although Dhrystone contains some very short procedures where execution would
benefit from procedure merging (inlining, macro expansion of procedures),
procedure merging is not to be used. The reason is that the percentage of
procedure and function calls is part of the "Dhrystone distribution" of
statements contained in [1]. This restriction does not hold for the string
functions of the C version since ANSI C allows an implementation to use
inline code for these functions.
o Other optimizations are allowed, but they should be indicated
It is often hard to draw an exact line between "normal code generation" and
"optimization" in compilers: Some compilers perform operations by default
that are invoked in other compilers only when optimization is explicitly
requested. Also, we cannot prevent benchmarkers from trying to achieve
results that look as good as possible. Therefore, optimizations performed
by compilers - other than those listed above - are not forbidden when
Dhrystone execution times are measured. Dhrystone is not intended to be
non-optimizable but is intended to be about as optimizable as normal
programs. For example, there are several places in Dhrystone where
performance benefits from optimizations like common subexpression
elimination, value propagation etc., but normal programs usually also
benefit from these optimizations. Therefore, no effort was made to
artificially prevent such optimizations. However, measurement reports
should indicate which compiler optimization levels have been used, and
reporting results with different levels of compiler optimization for the
same hardware is encouraged.
Of course, for experimental purposes, post-linkage optimization, procedure
merging and/or compilation in one unit can be done to determine their effects.
However, Dhrystone numbers obtained under these conditions should be
explicitly marked as such; "normal" Dhrystone results should be understood as
results obtained following the ground rules listed above.
In any case, for serious performance evaluation, users are advised to ask for
code listings and to check them carefully. In this way, when results for
different systems are compared, the reader can get a feeling for how much
performance difference is due to compiler optimization and how much is due to
hardware speed.
6. Acknowledgements
This Ada version 2.1 of Dhrystone follows closely the C version 2.1. The C
version has been developed in cooperation with Rick Richardson (Tinton Falls,
NJ); it incorporates many ideas from the "Version 1.1" distributed previously
by him over the UNIX network Usenet. I also thank Chaim Benedelac (National
Semiconductor), David Ditzel (SUN), Earl Killian and John Mashey (MIPS), Alan
Smith and Rafael Saavedra-Barrera (UC at Berkeley) for their help with
comments on earlier versions of the benchmark.
7. Bibliography
[1]
Reinhold P. Weicker: Dhrystone: A Synthetic Systems Programming Benchmark.
Communications of the ACM 27, 10 (Oct. 1984), 1013-1030
[2]
Rick Richardson: Dhrystone Benchmark Summary (and Program Text)
Informal Distribution via "Usenet", Last Versions Known to me: Sept. 21,
1987 (Version 1.1) and December 4, 1988 (Version 2)
[3]
Reinhold P. Weicker: Dhrystone Benchmark: Rationale for Version 2 and
Measurement Rules; Program Text (C Version 2.0)
SIGPLAN Notices 23,8 (Aug. 1988), 49-62