gautron@corto.inria.fr (Philippe Gautron) (04/27/88)
Dhrystone is a benchmark program which measures processor+compiler
efficiency in executing a 'typical' program. My purpose is NOT to
judge whether this program is good, good enough, or bad for such
measurements, but to compare the following compilers on the same site
(and under the same conditions):
- pcc as reference,
- gcc, the GNU C compiler
- C++ (ATT version 1.1), cfront translator and pcc: C++.pcc
- C++ (ATT version 1.1), cfront translator and gcc: C++.gcc
[cfront compiled by C++.gcc itself]
- G++, the GNU C++ compiler (version 1.18)
Machine: SUN 3/260, STANDALONE, 8M RAM
All compilations with -O. All compiles include the standard Sun
libraries, not gnulib.
Two tests: with and without register declarations.
dry.c: (version C/1.1, 12/01/84)
* Date: PROGRAM updated 01/06/86, RESULTS updated 03/31/86
* Compile: cc -O dry.c -o drynr : No registers
* cc -O -DREG=register dry.c -o dryr : Registers
1) First, a bug in dry.c: procedure Proc3
struct Record
{
struct Record *PtrComp;
...
};
typedef struct Record RecordType;
typedef RecordType * RecordPtr;
Proc1(PtrParIn)
REG RecordPtr PtrParIn;
{
#define NextRecord (*(PtrParIn->PtrComp))
...
Proc3(NextRecord.PtrComp); <== call with struct Record*
...
}
Proc3(PtrParOut)
RecordPtr *PtrParOut; <== called with struct Record**
This bug does not abort the execution. I have translated dry.c into C++ syntax,
and the first compilation aborts on the Proc3 declaration.
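The mismatch described above can be reproduced in a few lines. This is only a minimal sketch, not the dry.c code itself: the Record type is reduced to two fields, Proc3 takes its source pointer as a second parameter instead of reading the global PtrGlb, and the helper proc3_updates_field is hypothetical. Passing the address of the pointer field is one plausible correction; the post itself does not say how the call should be fixed.

```cpp
#include <cassert>

// Reduced version of the record used in the listing (assumption: two fields only).
struct Record {
    Record* PtrComp;
    int     IntComp;
};

// Proc3 writes through its argument, so it needs the address of a pointer.
// Simplification: the source pointer is a parameter here, not the global PtrGlb.
void Proc3(Record** PtrParOut, Record* source) {
    assert(PtrParOut != nullptr);
    *PtrParOut = source;
}

// Hypothetical helper: returns true when Proc3 updated next.PtrComp.
bool proc3_updates_field() {
    Record target{nullptr, 0};
    Record next{nullptr, 0};
    // Proc3(next.PtrComp, &target);  // rejected by a C++ compiler: Record* is not Record**
    Proc3(&next.PtrComp, &target);    // passes the address of the pointer field
    return next.PtrComp == &target;
}
```

A K&R C compiler accepts the commented-out form silently because old-style declarations carry no parameter types, which is why the problem only surfaced after the translation to C++.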
2) Second, the results (average):
Uptime
9:36am up 2 mins, 0 user, load average: 0.08, 0.00, 0.00
- pcc, drynr
Dhrystone(1.1) time for 500000 passes = 84
This machine benchmarks at 5919 dhrystones/second
- pcc, dryr
Dhrystone(1.1) time for 500000 passes = 78
This machine benchmarks at 6378 dhrystones/second
- gcc, drynr
Dhrystone(1.1) time for 500000 passes = 73
This machine benchmarks at 6815 dhrystones/second
- gcc, dryr
Dhrystone(1.1) time for 500000 passes = 73
This machine benchmarks at 6811 dhrystones/second
- C++ 1.1 with pcc, drynr
Dhrystone(1.1 (C++ syntax)) time for 500000 passes = 85
This machine benchmarks at 5850 dhrystones/second
- C++ 1.1 with pcc, dryr
Dhrystone(1.1 (C++ syntax)) time for 500000 passes = 86
This machine benchmarks at 5759 dhrystones/second
- C++ 1.1 with gcc, drynr
Dhrystone(1.1 (C++ syntax)) time for 500000 passes = 73
This machine benchmarks at 6761 dhrystones/second
- C++ 1.1 with gcc, dryr
Dhrystone(1.1 (C++ syntax)) time for 500000 passes = 73
This machine benchmarks at 6765 dhrystones/second
- G++ 1.18, drynr
Dhrystone(1.1 (C++ syntax)) time for 500000 passes = 71
This machine benchmarks at 7038 dhrystones/second
- G++ 1.18, dryr
Dhrystone(1.1 (C++ syntax)) time for 500000 passes = 71
This machine benchmarks at 7040 dhrystones/second
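A note on reading these figures: dividing 500000 passes by the printed 84 seconds gives 5952, not 5919. The two printed values are both derived, with integer arithmetic, from a tick count returned by times(2) (see the TIMES branch of Proc0 in the source further down). The sketch below mirrors that arithmetic. The function names are hypothetical, and the tick count 5068 is inferred from the printed pcc/drynr figures with HZ = 60; it is not stated anywhere in the post.

```cpp
#include <cassert>

// Elapsed whole seconds, as printed by "time for N passes" (truncating division).
long elapsed_seconds(long ticks, long hz) {
    assert(hz > 0);
    return ticks / hz;
}

// Rate as printed by "benchmarks at N dhrystones/second":
// computed from the raw tick count, not from the truncated seconds.
long dhrystones_per_second(long loops, long ticks, long hz) {
    assert(ticks > 0);
    return loops * hz / ticks;
}
```

With loops = 500000, ticks = 5068 and hz = 60, the first function yields 84 and the second 5919, matching the pcc/drynr entry. The rate therefore carries more precision than the rounded-down time shown next to it.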
3) Conclusion:
Register declarations pay off with pcc, much less so with gcc,
but pcc is good for loops.
G++ > gcc > C++.gcc > pcc > C++.pcc
There is an interesting gap between C++.pcc and C++.gcc.
If you have any comments...
P. Gautron
------
Here is my C++ source
------
/* EVERYBODY: Please read "APOLOGY" below. -rick 01/06/85
*
* "DHRYSTONE" Benchmark Program
*
* Version: C/1.1, 12/01/84
*
* Date: PROGRAM updated 01/06/86, RESULTS updated 03/31/86
*
* Author: Reinhold P. Weicker, CACM Vol 27, No 10, 10/84 pg. 1013
* Translated from ADA by Rick Richardson
* Every method to preserve ADA-likeness has been used,
* at the expense of C-ness.
*
* Compile: cc -O dry.c -o drynr : No registers
* cc -O -DREG=register dry.c -o dryr : Registers
*
* Defines: Defines are provided for old C compilers
* which don't have enums, and can't assign structures.
* The time(2) function is library dependent; most
* return the time in seconds, but beware of some, like
* Aztec C, which return other units.
* The LOOPS define is initially set for 50000 loops.
* If you have a machine with large integers and is
* very fast, please change this number to 500000 to
* get better accuracy. Please select the way to
* measure the execution time using the TIME define.
* For single user machines, time(2) is adequate. For
* multi-user machines where you cannot get single-user
* access, use the times(2) function. If you have
* neither, use a stopwatch in the dead of night.
* Use a "printf" at the point marked "start timer"
* to begin your timings. DO NOT use the UNIX "time(1)"
* command, as this will measure the total time to
* run this program, which will (erroneously) include
* the time to malloc(3) storage and to compute the
* time it takes to do nothing.
*
* Run: drynr; dryr
*
* Results: If you get any new machine/OS results, please send to:
*
* ihnp4!castor!pcrat!rick
*
* and thanks to all that do.
*
* Note: I order the list in increasing performance of the
* "with registers" benchmark. If the compiler doesn't
* provide register variables, then the benchmark
* is the same for both REG and NOREG.
*
* PLEASE: Send complete information about the machine type,
* clock speed, OS and C manufacturer/version. If
* the machine is modified, tell me what was done.
* On UNIX, execute uname -a and cc -V to get this info.
*
* 80x8x NOTE: 80x8x benchers: please try to do all memory models
* for a particular compiler.
*
* APOLOGY (1/30/86):
* Well, I goofed things up! As pointed out by Haakon Bugge,
* the line of code marked "GOOF" below was missing from the
* Dhrystone distribution for the last several months. It
* *WAS* in a backup copy I made last winter, so no doubt it
* was victimized by sleepy fingers operating vi!
*
* The effect of the line missing is that the reported benchmarks
* are 15% too fast (at least on a 80286). Now, this creates
* a dilemma - do I throw out ALL the data so far collected
* and use only results from this (corrected) version, or
* do I just keep collecting data for the old version?
*
* Since the data collected so far *is* valid as long as it
* is compared with like data, I have decided to keep
* TWO lists- one for the old benchmark, and one for the
* new. This also gives me an opportunity to correct one
* other error I made in the instructions for this benchmark.
* My experience with C compilers has been mostly with
* UNIX 'pcc' derived compilers, where the 'optimizer' simply
* fixes sloppy code generation (peephole optimization).
* But today, there exist C compiler optimizers that will actually
* perform optimization in the Computer Science sense of the word,
* by removing, for example, assignments to a variable whose
* value is never used. Dhrystone, unfortunately, provides
* lots of opportunities for this sort of optimization.
*
* I request that benchmarkers re-run this new, corrected
* version of Dhrystone, turning off or bypassing optimizers
* which perform more than peephole optimization. Please
* indicate the version of Dhrystone used when reporting the
* results to me.
*
*
* The following program contains statements of a high-level programming
* language (C) in a distribution considered representative:
*
* assignments 53%
* control statements 32%
* procedure, function calls 15%
*
* 100 statements are dynamically executed. The program is balanced with
* respect to the three aspects:
* - statement type
* - operand type (for simple data types)
* - operand access
* operand global, local, parameter, or constant.
*
* The combination of these three aspects is balanced only approximately.
*
* The program does not compute anything meaningful, but it is
* syntactically and semantically correct.
*
*/
/* Accuracy of timings and human fatigue controlled by next two lines */
//const LOOPS = 50000; /* Use this for slow or 16 bit machines */
const LOOPS = 500000; /* Use this for faster machines */
/* Compiler dependent options */
#undef NOENUM /* Define if compiler has no enum's */
#undef NOSTRUCTASSIGN /* Define if compiler can't assign structures */
/* define only one of the next two defines */
#define TIMES /* Use times(2) time function */
/*#define TIME /* Use time(2) time function */
/* define the granularity of your times(2) function (when used) */
const HZ = 60; /* times(2) returns 1/60 second (most) */
//const HZ = 100; /* times(2) returns 1/100 second (WECo) */
/* for compatibility with goofed up version */
/*#undef GOOF /* Define if you want the goofed up version */
#ifdef GOOF
char Version[] = "1.0";
#else
char Version[] = "1.1 (C++ syntax)";
#endif
#ifdef NOSTRUCTASSIGN
#define structassign(d, s) memcpy(&(d), &(s), sizeof(d))
#else
#define structassign(d, s) d = s
#endif
#ifdef NOENUM
const Ident1 = 1;
const Ident2 = 2;
const Ident3 = 3;
const Ident4 = 4;
const Ident5 = 5;
#else
enum Enumeration {Ident1, Ident2, Ident3, Ident4, Ident5};
#endif
typedef int OneToThirty;
typedef int OneToFifty;
typedef char CapitalLetter;
typedef char String30[31];
typedef int Array1Dim[51];
typedef int Array2Dim[51][51];
struct Record
{
Record *PtrComp;
Enumeration Discr;
Enumeration EnumComp;
OneToFifty IntComp;
String30 StringComp;
};
typedef int boolean;
const NULL = 0;
const TRUE = 1;
const FALSE = 0;
#ifndef REG
#define REG
#endif
extern Enumeration Func1( CapitalLetter, CapitalLetter);
extern boolean Func2( String30, String30 );
// C++ declarations
extern boolean Func3( Enumeration );
extern void Proc0(), Proc1( Record* ), Proc2( OneToFifty* ),
Proc3( Record** ), Proc4(), Proc5(),
Proc6( Enumeration, Enumeration* ),
Proc7( OneToFifty, OneToFifty, OneToFifty* ),
Proc8( Array1Dim, Array2Dim, OneToFifty, OneToFifty );
extern int exit( int ),
printf( const char*, ... ),
strcmp( const char*, const char* ); // strcmp returns int, not char
extern char* strcpy( char*, const char* ); // destination must be writable
#ifdef TIMES
#include <sys/types.h>
#include <sys/times.h>
#endif
main()
{
Proc0();
exit(0);
}
/*
* Package 1
*/
int IntGlob;
boolean BoolGlob;
char Char1Glob;
char Char2Glob;
Array1Dim Array1Glob;
Array2Dim Array2Glob;
Record* PtrGlb;
Record* PtrGlbNext;
void Proc0()
{
OneToFifty IntLoc1;
REG OneToFifty IntLoc2;
OneToFifty IntLoc3;
REG char CharLoc;
REG char CharIndex;
Enumeration EnumLoc;
String30 String1Loc;
String30 String2Loc;
// extern char *malloc();
register unsigned int i;
#ifdef TIME
long time( long* );
long starttime;
long benchtime;
long nulltime;
starttime = time(0);
for (i = 0; i < LOOPS; ++i);
nulltime = time(0) - starttime; /* Computes o'head of loop */
#endif
#ifdef TIMES
time_t starttime;
time_t benchtime;
time_t nulltime;
struct tms tms;
times(&tms); starttime = tms.tms_utime;
for (i = 0; i < LOOPS; ++i);
times(&tms);
nulltime = tms.tms_utime - starttime; /* Computes overhead of looping */
#endif
PtrGlbNext = new Record;
PtrGlb = new Record;
PtrGlb->PtrComp = PtrGlbNext;
PtrGlb->Discr = Ident1;
PtrGlb->EnumComp = Ident3;
PtrGlb->IntComp = 40;
strcpy(PtrGlb->StringComp, "DHRYSTONE PROGRAM, SOME STRING");
#ifndef GOOF
strcpy(String1Loc, "DHRYSTONE PROGRAM, 1'ST STRING"); /*GOOF*/
#endif
Array2Glob[8][7] = 10; /* Was missing in published program */
/*****************
-- Start Timer --
*****************/
#ifdef TIME
starttime = time(0);
#endif
#ifdef TIMES
times(&tms); starttime = tms.tms_utime;
#endif
for (i = 0; i < LOOPS; ++i)
{
Proc5();
Proc4();
IntLoc1 = 2;
IntLoc2 = 3;
strcpy(String2Loc, "DHRYSTONE PROGRAM, 2'ND STRING");
EnumLoc = Ident2;
BoolGlob = ! Func2(String1Loc, String2Loc);
while (IntLoc1 < IntLoc2)
{
IntLoc3 = 5 * IntLoc1 - IntLoc2;
Proc7(IntLoc1, IntLoc2, &IntLoc3);
++IntLoc1;
}
Proc8(Array1Glob, Array2Glob, IntLoc1, IntLoc3);
Proc1(PtrGlb);
for (CharIndex = 'A'; CharIndex <= Char2Glob; ++CharIndex)
if (EnumLoc == Func1(CharIndex, 'C'))
Proc6(Ident1, &EnumLoc);
IntLoc3 = IntLoc2 * IntLoc1;
IntLoc2 = IntLoc3 / IntLoc1;
IntLoc2 = 7 * (IntLoc3 - IntLoc2) - IntLoc1;
Proc2(&IntLoc1);
}
/*****************
-- Stop Timer --
*****************/
#ifdef TIME
benchtime = time(0) - starttime - nulltime;
printf("Dhrystone(%s) time for %ld passes = %ld\n",
Version,
(long) LOOPS, benchtime);
printf("This machine benchmarks at %ld dhrystones/second\n",
((long) LOOPS) / benchtime);
#endif
#ifdef TIMES
times(&tms);
benchtime = tms.tms_utime - starttime - nulltime;
printf("Dhrystone(%s) time for %ld passes = %ld\n",
Version,
(long) LOOPS, benchtime/HZ);
printf("This machine benchmarks at %ld dhrystones/second\n",
((long) LOOPS) * HZ / benchtime);
#endif
}
void Proc1 (REG Record* PtrParIn)
{
#define NextRecord (*(PtrParIn->PtrComp))
structassign(NextRecord, *PtrGlb);
PtrParIn->IntComp = 5;
NextRecord.IntComp = PtrParIn->IntComp;
NextRecord.PtrComp = PtrParIn->PtrComp;
Proc3((Record**)NextRecord.PtrComp); // cast keeps the original dry.c call (see bug 1 above)
if (NextRecord.Discr == Ident1)
{
NextRecord.IntComp = 6;
Proc6(PtrParIn->EnumComp, &NextRecord.EnumComp);
NextRecord.PtrComp = PtrGlb->PtrComp;
Proc7(NextRecord.IntComp, 10, &NextRecord.IntComp);
}
else
structassign(*PtrParIn, NextRecord);
#undef NextRecord
}
void Proc2 (OneToFifty* IntParIO)
{
REG OneToFifty IntLoc;
REG Enumeration EnumLoc;
IntLoc = *IntParIO + 10;
for(;;)
{
if (Char1Glob == 'A')
{
--IntLoc;
*IntParIO = IntLoc - IntGlob;
EnumLoc = Ident1;
}
if (EnumLoc == Ident1)
break;
}
}
void Proc3 (Record** PtrParOut)
{
if (PtrGlb != 0)
*PtrParOut = PtrGlb->PtrComp;
else
IntGlob = 100;
Proc7(10, IntGlob, &PtrGlb->IntComp);
}
void Proc4()
{
REG boolean BoolLoc;
BoolLoc = Char1Glob == 'A';
BoolLoc |= BoolGlob;
Char2Glob = 'B';
}
void Proc5()
{
Char1Glob = 'A';
BoolGlob = FALSE;
}
// extern boolean Func3();
void Proc6( REG Enumeration EnumParIn, REG Enumeration* EnumParOut )
{
*EnumParOut = EnumParIn;
if (! Func3(EnumParIn) )
*EnumParOut = Ident4;
switch (EnumParIn)
{
case Ident1: *EnumParOut = Ident1; break;
case Ident2: if (IntGlob > 100) *EnumParOut = Ident1;
else *EnumParOut = Ident4;
break;
case Ident3: *EnumParOut = Ident2; break;
case Ident4: break;
case Ident5: *EnumParOut = Ident3;
}
}
void Proc7 (OneToFifty IntParI1, OneToFifty IntParI2, OneToFifty* IntParOut )
{
REG OneToFifty IntLoc;
IntLoc = IntParI1 + 2;
*IntParOut = IntParI2 + IntLoc;
}
void Proc8 (Array1Dim Array1Par, Array2Dim Array2Par,
OneToFifty IntParI1, OneToFifty IntParI2)
{
REG OneToFifty IntLoc;
REG OneToFifty IntIndex;
IntLoc = IntParI1 + 5;
Array1Par[IntLoc] = IntParI2;
Array1Par[IntLoc+1] = Array1Par[IntLoc];
Array1Par[IntLoc+30] = IntLoc;
for (IntIndex = IntLoc; IntIndex <= (IntLoc+1); ++IntIndex)
Array2Par[IntLoc][IntIndex] = IntLoc;
++Array2Par[IntLoc][IntLoc-1];
Array2Par[IntLoc+20][IntLoc] = Array1Par[IntLoc];
IntGlob = 5;
}
Enumeration Func1 (CapitalLetter CharPar1, CapitalLetter CharPar2)
{
REG CapitalLetter CharLoc1;
REG CapitalLetter CharLoc2;
CharLoc1 = CharPar1;
CharLoc2 = CharLoc1;
if (CharLoc2 != CharPar2)
return (Ident1);
else
return (Ident2);
}
boolean Func2 (String30 StrParI1, String30 StrParI2)
{
REG OneToThirty IntLoc;
REG CapitalLetter CharLoc;
IntLoc = 1;
while (IntLoc <= 1)
if (Func1(StrParI1[IntLoc], StrParI2[IntLoc+1]) == Ident1)
{
CharLoc = 'A';
++IntLoc;
}
if (CharLoc >= 'W' && CharLoc <= 'Z')
IntLoc = 7;
if (CharLoc == 'X')
return(TRUE);
else
{
if (strcmp(StrParI1, StrParI2) > 0)
{
IntLoc += 7;
return (TRUE);
}
else
return (FALSE);
}
}
boolean Func3( REG Enumeration EnumParIn )
{
REG Enumeration EnumLoc;
EnumLoc = EnumParIn;
if (EnumLoc == Ident3) return (TRUE);
return (FALSE);
}
#ifdef NOSTRUCTASSIGN
memcpy(d, s, l)
register char *d;
register char *s;
register int l;
{
while (l--) *d++ = *s++;
}
#endif

jima@hplsla.HP.COM ( Jim Adcock) (04/30/88)
Here are some results from running this "C++"-compatible version of the
Dhrystone program on various HP machines I have around me.

Other than on my 320 system, the "C++" results were generated by
transferring the ..c intermediate file to the target machine, changing
_new to malloc and removing _main, then compiling with that machine's
available "cc" compilers. The definition of "HZ" was changed to match
the respective HP machines -- 50Hz for HP series 300, 100Hz for HP
series 800.

Caveat: I do not believe these results are particularly useful for
comparing the relative merits of cc compilers, C++ compilers, or various
host machines if the results are at all similar in speed. If the results
are radically different [>2X], then maybe these kinds of tests CAN be
considered somewhat meaningful in comparing compilers and machines.

The HP320 is a mid-range 68020 based engineering workstation. The HP350
is a high-range 68020 based engineering workstation. The 840 and 850 are
engineering minicomputers based on HP's new proprietary "RISC"
architecture. The HP320 is the machine I have presently, which explains
why more benchmarks are available for it.

These results were generated "casually", just for my own edification and
amusement, and shouldn't be considered gospel. In my opinion, this test
program is not doing a good job of exercising the power/lack-of-power
available in the various compilers' optimization routines. Compile and
test some "real-world" C++ routines, like oopslib, to "seriously"
compare C++ compilers. Better yet, run comparisons using the kind of
programs you really write.
CPU    Compiler[s]                                              "Dhrystones"
HP320  CC 1.11 / HP-UX 6.0 cc                                    2083
HP320  CC 1.11 / HP-UX 6.0 cc -O                                 2181
HP320  g++ 1.18                                                  2194
HP320  g++ 1.18 -O                                               2194
HP320  g++ 1.18 -O -fomit-frame-pointer -finline-functions       2203
HP320  CC 1.11 / Microtec 3.0 [options unknown]                  2278
HP350  CC 1.11 / HP-UX 6.0 cc -O                                 4752
HP350  g++ 1.18 -O -fomit-frame-pointer -finline-functions       4902
HP350  CC 1.11 / Microtec 3.0 [options unknown]                  5122
HP840  CC 1.11 / HP-UX cc -O                                    10940
HP850  CC 1.11 / HP-UX cc -O                                    20408

Conclusions:

On THIS benchmark, choice of compilers doesn't matter too much. Choice
of computer system makes a much bigger difference.

The HP320 might be a little on the low side for doing much C++
compiling -- at least until incremental compilation/link of C++ programs
becomes a reality. The HP350 isn't a bad match for C++ compilation work.
Access to a lightly loaded HP850 system would make C++ development work
a gas.
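
The ">2X" rule of thumb from the caveat can be applied mechanically to the figures in this post. The following is a rough sketch only: the function names are hypothetical, and the 2.0 threshold is simply the ">2X" figure quoted above, not a statistically derived limit.

```cpp
#include <cassert>

// Ratio of two Dhrystone scores, always faster over slower (so >= 1.0).
double speed_ratio(long a, long b) {
    assert(a > 0 && b > 0);
    return a > b ? static_cast<double>(a) / b : static_cast<double>(b) / a;
}

// True when two scores differ by more than the ">2X" threshold from the caveat.
bool radically_different(long a, long b) {
    return speed_ratio(a, b) > 2.0;
}
```

For instance, radically_different(2194, 2181) is false for g++ -O against cc -O on the HP320, while radically_different(20408, 2181) is true for the HP850 against the HP320, which lines up with the conclusion that the machine matters far more than the compiler here.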