[comp.lang.c++] D

gautron@corto.inria.fr (Philippe Gautron) (04/27/88)

    Dhrystone is a benchmark program which measures processor+compiler
    efficiency in executing a 'typical' program. My purpose is NOT to
    judge whether this program is a good, adequate or bad measure,
    but to compare the following compilers on the same site (and under
    the same conditions):
    - pcc as reference,
    - gcc, the GNU C compiler
    - C++ (ATT version 1.1), cfront translator and pcc: C++.pcc
    - C++ (ATT version 1.1), cfront translator and gcc: C++.gcc
                      [cfront compiled by C++.gcc itself]
    - G++, the GNU C++ compiler (version 1.18)

Machine: SUN 3/260, STANDALONE, 8M RAM
All compilations with -O.  All compiles include the standard Sun
libraries, not gnulib.
Two tests: with and without register declarations.

dry.c: (version C/1.1, 12/01/84)
 *	Date:		PROGRAM updated 01/06/86, RESULTS updated 03/31/86
 *	Compile:	cc -O dry.c -o drynr			: No registers
 *			cc -O -DREG=register dry.c -o dryr	: Registers

1) First, a bug in dry.c, in the call to Proc3 from Proc1:

struct	Record
{
	struct Record		*PtrComp;
	...
};

typedef struct Record 	RecordType;
typedef RecordType *	RecordPtr;

Proc1(PtrParIn)
REG RecordPtr	PtrParIn;
{
#define	NextRecord	(*(PtrParIn->PtrComp))
	...
	Proc3(NextRecord.PtrComp);	<== call with struct Record*
	...
}

Proc3(PtrParOut)
RecordPtr	*PtrParOut;		<== declared as struct Record**

In C this bug goes unnoticed and does not abort the execution, since the call
is not checked against the definition. I have translated dry.c into C++ syntax,
and the first compilation fails on the Proc3 declaration.
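
The mismatch can be reproduced in a few lines. The sketch below is only an
illustration in present-day C++, not an excerpt from dry.c: the record is
reduced to two members and call_proc3_on is a hypothetical helper. Once the
prototype is visible, passing a Record* where a Record** is expected is
rejected at compile time; the corrected call passes the address of the pointer.

```cpp
#include <cassert>

// Reduced version of the benchmark's record; only the fields needed here.
struct Record {
    Record* PtrComp;
    int     IntComp;
};

Record* PtrGlb = nullptr;

// With a prototype, the parameter type is checked at every call site.
void Proc3(Record** PtrParOut) {
    if (PtrGlb != nullptr)
        *PtrParOut = PtrGlb->PtrComp;
}

// Hypothetical helper: performs the corrected call on one record and
// returns the value its PtrComp member holds afterwards.
Record* call_proc3_on(Record& rec) {
    // Proc3(rec.PtrComp);   // rejected: Record* is not convertible to Record**
    Proc3(&rec.PtrComp);     // corrected: pass the address of the pointer
    return rec.PtrComp;
}
```

Whether the original author intended the extra level of indirection is not
stated in dry.c; the sketch only shows why a prototype-checking compiler
refuses the original call.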

2) Second, the results (average):
  9:36am  up 2 mins,  0 user,  load average: 0.08, 0.00, 0.00

- pcc, drynr
Dhrystone(1.1) time for 500000 passes = 84
This machine benchmarks at 5919 dhrystones/second

- pcc, dryr
Dhrystone(1.1) time for 500000 passes = 78
This machine benchmarks at 6378 dhrystones/second

- gcc, drynr
Dhrystone(1.1) time for 500000 passes = 73
This machine benchmarks at 6815 dhrystones/second

- gcc, dryr
Dhrystone(1.1) time for 500000 passes = 73
This machine benchmarks at 6811 dhrystones/second

- C++ 1.1 with pcc, drynr
Dhrystone(1.1 (C++ syntax)) time for 500000 passes = 85
This machine benchmarks at 5850 dhrystones/second

- C++ 1.1 with pcc, dryr
Dhrystone(1.1 (C++ syntax)) time for 500000 passes = 86
This machine benchmarks at 5759 dhrystones/second

- C++ 1.1 with gcc, drynr
Dhrystone(1.1 (C++ syntax)) time for 500000 passes = 73
This machine benchmarks at 6761 dhrystones/second

- C++ 1.1 with gcc, dryr
Dhrystone(1.1 (C++ syntax)) time for 500000 passes = 73
This machine benchmarks at 6765 dhrystones/second

- G++ 1.18, drynr
Dhrystone(1.1 (C++ syntax)) time for 500000 passes = 71
This machine benchmarks at 7038 dhrystones/second

- G++ 1.18, dryr
Dhrystone(1.1 (C++ syntax)) time for 500000 passes = 71
This machine benchmarks at 7040 dhrystones/second

3) Conclusion: 
   register declarations make a difference with pcc, much less with gcc,
   but pcc is good for loops.

   G++ > gcc > C++.gcc > pcc > C++.pcc

   There is an interesting gap between C++.pcc and C++.gcc.
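
   As a cross-check, the ranking can be recomputed from the no-register
   (drynr) figures listed under 2). This is a small sketch; the names Result,
   drynr_results, ranked and percent_of are made up for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Result {
    std::string compiler;
    long        dhrystones;  // dhrystones/second, no-register runs
};

// Figures copied from the drynr runs listed under 2).
std::vector<Result> drynr_results() {
    return {
        {"pcc", 5919}, {"gcc", 6815}, {"C++.pcc", 5850},
        {"C++.gcc", 6761}, {"G++", 7038},
    };
}

// Sort a copy of the results, fastest first.
std::vector<Result> ranked(std::vector<Result> r) {
    std::sort(r.begin(), r.end(),
              [](const Result& a, const Result& b) {
                  return a.dhrystones > b.dhrystones;
              });
    return r;
}

// Speed of a relative to b, in percent (truncating integer arithmetic).
long percent_of(long a, long b) { return a * 100 / b; }
```

   With these figures, percent_of(6761, 5850) gives 115, i.e. the same cfront
   output runs roughly 15% faster when compiled by gcc than by pcc.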

If you have any comments...

P. Gautron

Here is my C++ source

/*	EVERYBODY:	Please read "APOLOGY" below. -rick 01/06/85
 *	"DHRYSTONE" Benchmark Program
 *	Version:	C/1.1, 12/01/84
 *	Date:		PROGRAM updated 01/06/86, RESULTS updated 03/31/86
 *	Author:		Reinhold P. Weicker,  CACM Vol 27, No 10, 10/84 pg. 1013
 *			Translated from ADA by Rick Richardson
 *			Every method to preserve ADA-likeness has been used,
 *			at the expense of C-ness.
 *	Compile:	cc -O dry.c -o drynr			: No registers
 *			cc -O -DREG=register dry.c -o dryr	: Registers
 *	Defines:	Defines are provided for old C compilers
 *			which don't have enums, and can't assign structures.
 *			The time(2) function is library dependent; most
 *			return the time in seconds, but beware of some, like
 *			Aztec C, which return other units.
 *			The LOOPS define is initially set for 50000 loops.
 *			If your machine has large integers and is
 *			very fast, please change this number to 500000 to
 *			get better accuracy.  Please select the way to
 *			measure the execution time using the TIME define.
 *			For single user machines, time(2) is adequate. For
 *			multi-user machines where you cannot get single-user
 *			access, use the times(2) function.  If you have
 *			neither, use a stopwatch in the dead of night.
 *			Use a "printf" at the point marked "start timer"
 *			to begin your timings. DO NOT use the UNIX "time(1)"
 *			command, as this will measure the total time to
 *			run this program, which will (erroneously) include
 *			the time to malloc(3) storage and to compute the
 *			time it takes to do nothing.
 *	Run:		drynr; dryr
 *	Results:	If you get any new machine/OS results, please send to:
 *				ihnp4!castor!pcrat!rick
 *			and thanks to all that do.
 *	Note:		I order the list in increasing performance of the
 *			"with registers" benchmark.  If the compiler doesn't
 *			provide register variables, then the benchmark
 *			is the same for both REG and NOREG.
 *	PLEASE:		Send complete information about the machine type,
 *			clock speed, OS and C manufacturer/version.  If
 *			the machine is modified, tell me what was done.
 *			On UNIX, execute uname -a and cc -V to get this info.
 *	80x8x NOTE:	80x8x benchers: please try to do all memory models
 *			for a particular compiler.
 *	APOLOGY (1/30/86):
 *		Well, I goofed things up!  As pointed out by Haakon Bugge,
 *		the line of code marked "GOOF" below was missing from the
 *		Dhrystone distribution for the last several months.  It
 *		*WAS* in a backup copy I made last winter, so no doubt it
 *		was victimized by sleepy fingers operating vi!
 *		The effect of the line missing is that the reported benchmarks
 *		are 15% too fast (at least on a 80286).  Now, this creates
 *		a dilemma - do I throw out ALL the data so far collected
 *		and use only results from this (corrected) version, or
 *		do I just keep collecting data for the old version?
 *		Since the data collected so far *is* valid as long as it
 *		is compared with like data, I have decided to keep
 *		TWO lists- one for the old benchmark, and one for the
 *		new.  This also gives me an opportunity to correct one
 *		other error I made in the instructions for this benchmark.
 *		My experience with C compilers has been mostly with
 *		UNIX 'pcc' derived compilers, where the 'optimizer' simply
 *		fixes sloppy code generation (peephole optimization).
 *		But today, there exist C compiler optimizers that will actually
 *		perform optimization in the Computer Science sense of the word,
 *		by removing, for example, assignments to a variable whose
 *		value is never used.  Dhrystone, unfortunately, provides
 *		lots of opportunities for this sort of optimization.
 *		I request that benchmarkers re-run this new, corrected
 *		version of Dhrystone, turning off or bypassing optimizers
 *		which perform more than peephole optimization.  Please
 *		indicate the version of Dhrystone used when reporting the
 *		results to me.
 *	The following program contains statements of a high-level programming
 *	language (C) in a distribution considered representative:
 *	assignments			53%
 *	control statements		32%
 *	procedure, function calls	15%
 *	100 statements are dynamically executed.  The program is balanced with
 *	respect to the three aspects:
 *		- statement type
 *		- operand type (for simple data types)
 *		- operand access
 *			operand global, local, parameter, or constant.
 *	The combination of these three aspects is balanced only approximately.
 *	The program does not compute anything meaningful, but it is
 *	syntactically and semantically correct.
 */

/* Accuracy of timings and human fatigue controlled by next two lines */
//const LOOPS =	50000;		/* Use this for slow or 16 bit machines */
const LOOPS =	500000;		/* Use this for faster machines */

/* Compiler dependent options */
#undef	NOENUM			/* Define if compiler has no enum's */
#undef	NOSTRUCTASSIGN		/* Define if compiler can't assign structures */

/* define only one of the next two defines */
#define TIMES			/* Use times(2) time function */
/*#define TIME			/* Use time(2) time function */

/* define the granularity of your times(2) function (when used) */
const HZ =	60;		/* times(2) returns 1/60 second (most) */
//const HZ =	100;		/* times(2) returns 1/100 second (WECo) */

/* for compatibility with goofed up version */
/*#undef GOOF			/* Define if you want the goofed up version */

#ifdef GOOF
char	Version[] = "1.0";
#else
char	Version[] = "1.1 (C++ syntax)";
#endif

#ifdef	NOSTRUCTASSIGN
#define	structassign(d, s)	memcpy(&(d), &(s), sizeof(d))
#else
#define	structassign(d, s)	d = s
#endif

#ifdef	NOENUM
const	Ident1 =	1;
const	Ident2 =	2;
const	Ident3 =	3;
const	Ident4 =	4;
const	Ident5 =	5;
typedef int	Enumeration;
#else
enum Enumeration {Ident1, Ident2, Ident3, Ident4, Ident5};
#endif

typedef int	OneToThirty;
typedef int	OneToFifty;
typedef char	CapitalLetter;
typedef char	String30[31];
typedef int	Array1Dim[51];
typedef int	Array2Dim[51][51];

struct	Record
{
	Record			*PtrComp;
	Enumeration		Discr;
	Enumeration		EnumComp;
	OneToFifty		IntComp;
	String30		StringComp;
};

typedef int		boolean;

const	NULL =		0;
const	TRUE =		1;
const	FALSE =		0;

#ifndef REG
#define	REG
#endif

extern Enumeration	Func1( CapitalLetter, CapitalLetter);
extern boolean		Func2( String30, String30 );

// C++ declarations
extern boolean		Func3( Enumeration );
extern void		Proc0(), Proc1( Record* ), Proc2( OneToFifty* ),
			Proc3( Record** ), Proc4(), Proc5(),
			Proc6( Enumeration, Enumeration* ),
			Proc7( OneToFifty, OneToFifty, OneToFifty* ),
			Proc8( Array1Dim, Array2Dim, OneToFifty, OneToFifty );
extern int		exit( int ),
			printf( const char*, ... ),
			strcmp( const char*, const char* );
extern char*		strcpy( char*, const char* );

#ifdef TIMES
#include <sys/types.h>
#include <sys/times.h>
#endif

main()
{
	Proc0();
	exit(0);
}


/*
 * Package 1
 */
int		IntGlob;
boolean		BoolGlob;
char		Char1Glob;
char		Char2Glob;
Array1Dim	Array1Glob;
Array2Dim	Array2Glob;
Record*	PtrGlb;
Record*	PtrGlbNext;

void Proc0()
{
	OneToFifty		IntLoc1;
	REG OneToFifty		IntLoc2;
	OneToFifty		IntLoc3;
	REG char		CharLoc;
	REG char		CharIndex;
	Enumeration	 	EnumLoc;
	String30		String1Loc;
	String30		String2Loc;
//	extern char		*malloc();
	register unsigned int	i;

#ifdef TIME
	long			time( long* );
	long			starttime;
	long			benchtime;
	long			nulltime;

	starttime = time(0);
	for (i = 0; i < LOOPS; ++i);
	nulltime = time(0) - starttime; /* Computes o'head of loop */
#endif
#ifdef TIMES
	time_t			starttime;
	time_t			benchtime;
	time_t			nulltime;
	struct tms		tms;

	times(&tms); starttime = tms.tms_utime;
	for (i = 0; i < LOOPS; ++i);
	times(&tms);
	nulltime = tms.tms_utime - starttime; /* Computes overhead of looping */
#endif

	PtrGlbNext = new Record;
	PtrGlb = new Record;
	PtrGlb->PtrComp = PtrGlbNext;
	PtrGlb->Discr = Ident1;
	PtrGlb->EnumComp = Ident3;
	PtrGlb->IntComp = 40;
	strcpy(PtrGlb->StringComp, "DHRYSTONE PROGRAM, SOME STRING");
#ifndef	GOOF
	strcpy(String1Loc, "DHRYSTONE PROGRAM, 1'ST STRING");	/*GOOF*/
#endif
	Array2Glob[8][7] = 10;	/* Was missing in published program */

/*****************
-- Start Timer --
*****************/
#ifdef TIME
	starttime = time(0);
#endif
#ifdef TIMES
	times(&tms); starttime = tms.tms_utime;
#endif
	for (i = 0; i < LOOPS; ++i)
	{
		Proc5();
		Proc4();
		IntLoc1 = 2;
		IntLoc2 = 3;
		strcpy(String2Loc, "DHRYSTONE PROGRAM, 2'ND STRING");
		EnumLoc = Ident2;
		BoolGlob = ! Func2(String1Loc, String2Loc);
		while (IntLoc1 < IntLoc2)
		{
			IntLoc3 = 5 * IntLoc1 - IntLoc2;
			Proc7(IntLoc1, IntLoc2, &IntLoc3);
			++IntLoc1;
		}
		Proc8(Array1Glob, Array2Glob, IntLoc1, IntLoc3);
		Proc1(PtrGlb);
		for (CharIndex = 'A'; CharIndex <= Char2Glob; ++CharIndex)
			if (EnumLoc == Func1(CharIndex, 'C'))
				Proc6(Ident1, &EnumLoc);
		IntLoc3 = IntLoc2 * IntLoc1;
		IntLoc2 = IntLoc3 / IntLoc1;
		IntLoc2 = 7 * (IntLoc3 - IntLoc2) - IntLoc1;
		Proc2(&IntLoc1);
	}

/*****************
-- Stop Timer --
*****************/

#ifdef TIME
	benchtime = time(0) - starttime - nulltime;
	printf("Dhrystone(%s) time for %ld passes = %ld\n",
		Version,
		(long) LOOPS, benchtime);
	printf("This machine benchmarks at %ld dhrystones/second\n",
		((long) LOOPS) / benchtime);
#endif
#ifdef TIMES
	times(&tms);
	benchtime = tms.tms_utime - starttime - nulltime;
	printf("Dhrystone(%s) time for %ld passes = %ld\n",
		Version,
		(long) LOOPS, benchtime/HZ);
	printf("This machine benchmarks at %ld dhrystones/second\n",
		((long) LOOPS) * HZ / benchtime);
#endif

}


void Proc1 (REG Record* PtrParIn)
{
#define	NextRecord	(*(PtrParIn->PtrComp))

	structassign(NextRecord, *PtrGlb);
	PtrParIn->IntComp = 5;
	NextRecord.IntComp = PtrParIn->IntComp;
	NextRecord.PtrComp = PtrParIn->PtrComp;
	Proc3(&NextRecord.PtrComp);
	if (NextRecord.Discr == Ident1)
	{
		NextRecord.IntComp = 6;
		Proc6(PtrParIn->EnumComp, &NextRecord.EnumComp);
		NextRecord.PtrComp = PtrGlb->PtrComp;
		Proc7(NextRecord.IntComp, 10, &NextRecord.IntComp);
	}
	else
		structassign(*PtrParIn, NextRecord);

#undef	NextRecord
}

void Proc2 (OneToFifty* IntParIO)
{
	REG OneToFifty		IntLoc;
	REG Enumeration		EnumLoc;

	IntLoc = *IntParIO + 10;
	for(;;)
	{
		if (Char1Glob == 'A')
		{
			--IntLoc;
			*IntParIO = IntLoc - IntGlob;
			EnumLoc = Ident1;
		}
		if (EnumLoc == Ident1)
			break;
	}
}

void Proc3 (Record** PtrParOut)
{
	if (PtrGlb != 0)
		*PtrParOut = PtrGlb->PtrComp;
	else
		IntGlob = 100;
	Proc7(10, IntGlob, &PtrGlb->IntComp);
}

void Proc4()
{
	REG boolean	BoolLoc;

	BoolLoc = Char1Glob == 'A';
	BoolLoc |= BoolGlob;
	Char2Glob = 'B';
}

void Proc5()
{
	Char1Glob = 'A';
	BoolGlob = FALSE;
}

// extern boolean Func3();

void Proc6( REG Enumeration	EnumParIn, REG Enumeration* EnumParOut )
{
	*EnumParOut = EnumParIn;
	if (! Func3(EnumParIn) )
		*EnumParOut = Ident4;
	switch (EnumParIn)
	{
	case Ident1:	*EnumParOut = Ident1; break;
	case Ident2:	if (IntGlob > 100) *EnumParOut = Ident1;
			else *EnumParOut = Ident4;
			break;
	case Ident3:	*EnumParOut = Ident2; break;
	case Ident4:	break;
	case Ident5:	*EnumParOut = Ident3;
	}
}

void Proc7 (OneToFifty IntParI1, OneToFifty IntParI2, OneToFifty* IntParOut )
{
	REG OneToFifty	IntLoc;

	IntLoc = IntParI1 + 2;
	*IntParOut = IntParI2 + IntLoc;
}

void Proc8 (Array1Dim Array1Par, Array2Dim Array2Par,
                                   OneToFifty IntParI1, OneToFifty IntParI2)
{
	REG OneToFifty	IntLoc;
	REG OneToFifty	IntIndex;

	IntLoc = IntParI1 + 5;
	Array1Par[IntLoc] = IntParI2;
	Array1Par[IntLoc+1] = Array1Par[IntLoc];
	Array1Par[IntLoc+30] = IntLoc;
	for (IntIndex = IntLoc; IntIndex <= (IntLoc+1); ++IntIndex)
		Array2Par[IntLoc][IntIndex] = IntLoc;
	++Array2Par[IntLoc][IntLoc-1];
	Array2Par[IntLoc+20][IntLoc] = Array1Par[IntLoc];
	IntGlob = 5;
}

Enumeration Func1 (CapitalLetter CharPar1, CapitalLetter CharPar2)
{
	REG CapitalLetter	CharLoc1;
	REG CapitalLetter	CharLoc2;

	CharLoc1 = CharPar1;
	CharLoc2 = CharLoc1;
	if (CharLoc2 != CharPar2)
		return (Ident1);
	else
		return (Ident2);
}

boolean Func2 (String30	StrParI1, String30 StrParI2)
{
	REG OneToThirty		IntLoc;
	REG CapitalLetter	CharLoc;

	IntLoc = 1;
	while (IntLoc <= 1)
		if (Func1(StrParI1[IntLoc], StrParI2[IntLoc+1]) == Ident1)
		{
			CharLoc = 'A';
			++IntLoc;
		}
	if (CharLoc >= 'W' && CharLoc <= 'Z')
		IntLoc = 7;
	if (CharLoc == 'X')
		return(TRUE);
	else
	{
		if (strcmp(StrParI1, StrParI2) > 0)
		{
			IntLoc += 7;
			return (TRUE);
		}
		else
			return (FALSE);
	}
}

boolean Func3( REG Enumeration EnumParIn )
{
	REG Enumeration	EnumLoc;

	EnumLoc = EnumParIn;
	if (EnumLoc == Ident3) return (TRUE);
	return (FALSE);
}

#ifdef	NOSTRUCTASSIGN
memcpy(d, s, l)
register char	*d;
register char	*s;
register int	l;
{
	while (l--) *d++ = *s++;
}
#endif
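
The timing scheme of the TIMES branch (time an empty loop, time the benchmark
loop, subtract, then scale by the tick rate) can be sketched separately from
the benchmark. The code below is an illustration only: it assumes a POSIX
system, asks sysconf for the tick rate where the 1988 source hard-codes HZ,
and the names user_ticks, ticks_per_second, dhrystones_per_second and
net_ticks are made up for the example.

```cpp
#include <cassert>
#include <sys/times.h>
#include <unistd.h>

// User-mode CPU time consumed by this process so far, in clock ticks.
long user_ticks() {
    struct tms t;
    times(&t);
    return static_cast<long>(t.tms_utime);
}

// Ticks per second as reported by the system (assumption: POSIX sysconf
// is available; the listing above uses a fixed HZ constant instead).
long ticks_per_second() { return sysconf(_SC_CLK_TCK); }

// Score = passes * hz / ticks, the same formula as the TIMES branch.
// Returns 0 when the run was too short to register a single tick.
long dhrystones_per_second(long passes, long hz, long bench_ticks) {
    if (bench_ticks <= 0) return 0;
    return passes * hz / bench_ticks;
}

// Ticks spent running body() passes times, minus the ticks of a loop
// that does nothing but the bookkeeping.
template <class Body>
long net_ticks(long passes, Body body) {
    volatile long guard = 0;   // keeps the empty loop from being optimized away
    long start = user_ticks();
    for (long i = 0; i < passes; ++i) guard = i;
    long null_ticks = user_ticks() - start;

    start = user_ticks();
    for (long i = 0; i < passes; ++i) { guard = i; body(); }
    return user_ticks() - start - null_ticks;
}
```

The score is proportional to the tick rate passed in, so a wrong HZ value
scales every result by the same factor: 500000 passes in 5000 ticks gives
6000 at 60 ticks per second but 5000 at 50.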

jima@hplsla.HP.COM ( Jim Adcock) (04/30/88)

Here are some results from running this "C++" compatible version
of the Dhrystone program on various HP machines I have around
me.  Other than on my 320 system the "C++" results were
generated by transferring the ..c intermediate file to
the target machine, changing _new to malloc and removing
_main, then compiling with that machine's available "cc"
compiler.  The definition of "HZ" was changed to match the
respective HP machines -- 50Hz for HP series 300, 100Hz for
HP series 800.  Caveat: I do not believe these results are
particularly useful for comparing the relative merits of
cc compilers, C++ compilers, or various host machines if
the results are at all similar in speed.  If the results 
are radically different [>2X], then maybe these kinds of tests
CAN be considered somewhat meaningful in comparing compilers
and machines.  The HP320 is a mid-range 68020 based engineering
workstation.  The HP350 is a high-range 68020 based engineering
workstation.  The 840 and 850 are engineering minicomputers
based on HP's new proprietary "RISC" architecture.  The HP320
is the machine I have presently, which explains why more 
benchmarks are available for it.  These results were generated
"casually", just for my own edification and amusement, and shouldn't
be considered gospel.  In my opinion, this test program is not
doing a good job of exercising the power/lack-of-power available
in the various compilers' optimization routines.  Compile and
test some "real-world" C++ routines, like oopslib, to "seriously"
compare C++ compilers.  Better yet, run comparisons using the
kind of programs you really write.

CPU	Compiler[s]				     "Dhrystones"

HP320	CC 1.11 / HP-UX 6.0 cc				 2083
HP320	CC 1.11 / HP-UX 6.0 cc -O			 2181
HP320	g++ 1.18					 2194
HP320	g++ 1.18 -O					 2194
HP320	g++ 1.18 -O -fomit-frame-pointer 
		    -finline-functions 	 		 2203
HP320	CC 1.11 / Microtec 3.0 [options unknown]	 2278
HP350	CC 1.11 / HP-UX 6.0 cc -O	 		 4752
HP350	g++ 1.18 -O -fomit-frame-pointer 
		    -finline-functions 	 		 4902
HP350	CC 1.11 / Microtec 3.0 [options unknown]	 5122
HP840   CC 1.11 / HP-UX cc -O				10940
HP850   CC 1.11 / HP-UX cc -O				20408

Conclusions: On THIS benchmark, choice of compilers doesn't
matter too much.  Choice of computer system makes a much bigger
difference.  The HP320 might be a little on the low side
for doing much C++ compiling -- at least until incremental 
compilation/link of C++ programs becomes a reality.  The HP350
isn't a bad match for C++ compilation work.  Access to a lightly
loaded HP850 system would make C++ development work a gas.
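
The ">2X" rule of thumb can be applied mechanically to the table. A minimal
sketch, where differs_radically is a made-up name and the threshold is simply
the factor of two mentioned in the caveat:

```cpp
#include <cassert>

// True when the faster score is more than twice the slower one.
bool differs_radically(long a, long b) {
    long hi = a > b ? a : b;
    long lo = a > b ? b : a;
    return hi > 2 * lo;
}
```

Fed with the figures from the table, the slowest and fastest compiler results
on the HP320 (2083 and 2278) do not pass the test, while HP320 against HP350
with the same "cc -O" (2181 and 4752) does.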