[comp.arch] NSIEVE C Source File

aburto@marlin.NOSC.MIL (Alfred A. Aburto) (08/24/88)

/****************** Start NSIEVE C Source Code *************************/

/****************************************************************/
/*                             NSIEVE                           */
/*                        C Program Source                      */
/*             Sieve benchmark for variable sized arrays        */
/*                            Al Aburto                         */
/*                            22 Aug 88                         */
/*                   aburto@marlin.nosc.mil.UUCP                */
/*                          'ala' on BIX                        */
/*                                                              */
/* The program uses the VAX and SUN UNIX C 'times()' routine to */
/* determine the elapsed run time in seconds.  The compiler     */
/* option 'UNIX' must be defined for proper operation on VAX or */
/* SUN UNIX systems.  Also 'HZ' must be defined.  HZ will be 60 */
/* for most systems and this is the default set in the program. */
/* Please check the system 'times()' definition so that HZ is   */
/* properly defined for the right measure of time.  HZ may need */ 
/* to be set to 50 or 100 on some systems.                      */
/*                                                              */
/* A timing routine for the Amiga is also included. The compiler*/
/* option 'Amiga' must be defined ('-DAmiga') for proper timing */
/* on Amiga systems.  HZ is defined as 50 with the routines     */
/* used on this system.                                         */
/*                                                              */
/* A timing routine for the IBM PC/PC-AT type systems with      */
/* TURBO C is also included.  The Turbo C compiler option       */
/* '-DTURBO_C' must be defined for this timing routine to work. */
/* HZ is defined as 100 for this system.                        */
/*                                                              */
/* The program uses a maximum array size of 320,000 bytes.  You */
/* can increase or lower this value by changing the 'limit'     */
/* define from 6 to a higher or lower value.  Some systems may  */
/* be unable to work beyond 'limit = 4' which corresponds to an */
/* array size of 40,000 bytes.  The '#define limit 6' in the    */
/* program below corresponds to a max SIEVE array size of       */
/* 320,000 bytes.                                               */
/* The maximum array size is given by:                          */
/*              size = 5000 * ( 2 ** limit ).                   */
/*                                                              */
/* The program uses register variables.  Some compilers may not */
/* accept this option.  If this is the case then 'register'     */
/* will need to be removed from the SIEVE() subroutine.         */
/*                                                              */
/* In re-running the program I am  getting about a 5% variation */
/* in the 'Average BenchTime' output.                           */
/*                                               	        */
/* Program outputs to check for proper operation:               */
/* Array Size       Primes Found      Last Prime                */
/*  (Bytes)                                                     */
/*    8191              1899              16381                 */
/*   10000              2261              19997                 */
/*   20000              4202              39989                 */
/*   40000              7836              79999                 */
/*   80000             14683             160001                 */
/*  160000             27607             319993                 */
/*  320000             52073             639997                 */
/****************************************************************/

/****************************************************/  
/* Example Compilation:                             */
/* (1) VAX and SUN UNIX C:                          */
/*     cc -O -DUNIX nsieve.c -o nsieve              */
/*     cc -DUNIX nsieve.c -o nsieve                 */
/*                                                  */
/* (2) Turbo-Amiga (68020):                         */
/*     cc +2 +L +ff -DAmiga nsieve.c                */
/*     ln nsieve.o -lm32 -lc32                      */
/*                                                  */
/* (3) Amiga:                                       */
/*     cc +L +ff -DAmiga nsieve.c                   */
/*     ln nsieve.o -lm32 -lc32                      */
/*                                                  */
/* (4) IBM PC/PC-AT types with TURBO C              */
/*     Compile with '-DTURBO_C' or uncomment the    */
/*     the 'TURBO_C' define in the main program.    */
/*     Use 'huge' data option. Compile for 'speed'. */
/****************************************************/

/************************************************************************/
/* Some Results:                                                        */
/* Array Size   ------------------ BenchTime (sec) -------------------- */
/*  (Bytes)     SUN 3/280   VAX 8600     Turbo-Amiga                    */
/*            68020 @25 MHz           68020 @ 14.32 MHz                 */
/*              ( cc -O )   ( cc -O )  ( cc +2 +L +ff )                 */ 
/*      8191        0.267      0.250         0.461  ( 0.264 )           */
/*     10000        0.300      0.283         0.578  ( 0.331 )           */
/*     20000        0.650      0.783         1.195  ( 0.684 )           */
/*     40000        1.333      1.767         2.383  ( 1.365 )           */
/*     80000        2.917      3.750         4.820  ( 2.761 )           */
/*    160000        7.833      8.033         9.758  ( 5.589 )           */
/*    320000       17.600     17.717        ------     |                */
/*                                                     |                */
/*                Scaled to 25 MHz --------------------+                */
/*                                                                      */
/* Average Run Time relative to 10 Iterations and 8191 array size:      */
/*                  0.315      0.345         0.484                      */
/*                                                                      */
/* The VAX 8600 results are different from those that appeared in the   */
/* Jun Byte Article because I thought I could not define 'register'     */
/* variables at all, but I was in error.  'register long' variables     */
/* seem to work fine on our VAX 8600 UNIX 4.3 C compiler.  It is the    */
/* register 'short' types that (apparently) cause problems.             */
/*                                                                      */
/************************************************************************/

			  /**********************************************/
			  /* You may just want to uncomment one of      */
			  /* these for your system.                     */
			  /**********************************************/
/*  #define Amiga    */
/*  #define UNIX     */
/*  #define TURBO_C  */

#ifdef UNIX
#include <sys/types.h>
#include <sys/times.h>
#endif

#ifdef Amiga
#include <exec/types.h>
#include <ctype.h>
#include <stdio.h>
#include <math.h>
#endif

#ifdef TURBO_C
#include <ctype.h>
#include <dos.h>
#include <stdio.h>
#include <time.h>
#endif

#define false 0
#define true  1

#define limit 6

#ifdef UNIX
#define HZ  60
struct tms tms;
#endif

#ifdef Amiga
#define HZ  50
#endif

#ifdef TURBO_C
#define HZ  100
struct time now;
#endif

float TimeArray[3];
float nulltime,benchtime;

main()
{

long   i,j;
float sumtime;

printf("\n   Sieve of Eratosthenes (scaled to 10 Iterations)\n\n");
printf("   Array Size   Primes   Last Prime   BenchTime\n");
printf("    (Bytes)     Found                   (Sec)\n");

dtime(TimeArray);
dtime(TimeArray);
nulltime = TimeArray[1];

sumtime = 0.0;

j = 8191;
SIEVE(j);
sumtime = sumtime + benchtime;

j = 5000;

for( i=1 ; i<= limit ; i++)
{
	j = 2 * j;
	SIEVE(j);
	sumtime = sumtime + benchtime * ( 8191.0 / (float)j );
}

sumtime = sumtime / (float)(limit + 1);

printf("\n   Relative to 10 Iterations and the 8191 Array Size:\n");
printf("   Average BenchTime = %8.3f (sec)\n\n",sumtime);

}


/**************************************/
/*  Sieve of Erathosthenes Benchmark  */
/*    Version By William L. Hembree   */
/*       Las Cruces, New Mexico       */
/**************************************/

SIEVE(n)
long n;
{

register char *flags;
register long i,prime,k;
register long count,size;

long j,m,iter;

char *ptr = 0L;
char *malloc();
size  = n - 1;
ptr   = malloc(n);
flags = ptr;

	dtime(TimeArray);
	dtime(TimeArray);
	nulltime = TimeArray[1];

	j = 0;

	dtime(TimeArray);

	for(iter=1 ; iter<=10 ; iter++)
	{

	count = 0;

	for(i=0 ; i<=size ; i++)
	{
	   *(flags+i) = true;
        }

		for(i=0 ; i<=size ; i++)
		{
			if(*(flags+i))
			{
			   prime = i + i + 3;
				for(k = i + prime ; k<=size ; k+=prime)
				{
				*(flags+k)=false;
				}
                           count++;
                        }
                }
	   j = j + count;
	}

	dtime(TimeArray);
	      
	benchtime = TimeArray[1] - nulltime;

	j = j / 10; 
	m = prime;

	printf("  %8ld    %8ld     %8ld   %9.3f\n",n,j,m,benchtime);

}


#ifdef UNIX
/***********************************************/
/*  dtime function for the VAX UNIX C Compiler */
/*   dtime() returns elapsed time in seconds.  */
/***********************************************/
dtime(p)
float p[];
{

	float q;

	q = p[2];
	times(&tms);
	p[2] = (double)(tms.tms_utime) / (float)HZ;
	p[1] = p[2] - q;
}
#endif


#ifdef Amiga
/**********************************/
/*  dtime function for the Amiga  */
/**********************************/
dtime(p)
float p[];
{
	float q;

	struct tt {
	   long days;
	   long minutes;
	   long ticks;
        } tt;

	q = p[2];

	DateStamp(&tt);

	p[2] = ((float)(tt.ticks+(tt.minutes*60L*(long)HZ))) / (float)HZ;
	p[1] = p[2] - q;
}
#endif


#ifdef TURBO_C
/*************************************************/
/*  dtime for IBM PC/PC-AT systems with TURBO C  */
/*************************************************/
dtime(p)
float p[];
{
   float q,v;

   q = p[2];

   gettime(&now);

   v = 60.0*(float)(now.ti_min);
   v = v +  (float)(now.ti_sec);
   v = v +  (float)(now.ti_hund) / (float)HZ;

   p[2] = v;
   p[1] = v - q;
}
#endif


/************************ End Of NSIEVE C Source Code **********************/

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (08/25/88)

  Although I agree with the aims of NSIEVE to overcome weaknesses of
the original, there are some problems in this version as well, which
relate to the accuracy of the timing. I posted some commentary on time
measurement last year, perhaps it's time to repeat.

  The clock timing value, HZ, is not set at 60 by some default, that's
just the value on a VAX. In a number of system, such as SunOS, Xenix,
UNICOS, and all SysV systems, the value is found in <sys/param.h>. The
suggested way to get this is to use code like:
	#ifndef HZ
	#include <sys/param.h>
	#endif
and to force the user to define it on the cc command line if it's not in
the include file. Since popular values are 20, 50, 100, 4096, and
243902439 (the last on a Cray2), the assumption of of 60 will yield
incorrect results.

  In UNIX versions the CPU time should be used, rather than real time,
since the real time may vary wildly. The realtime will excede the CPU by
some factor based on the load average on uniprocessor machines, and will
be smaller than the CPU by about N on a good multiprocessor machine. On
MS-DOS, CP/M and other small control programs the assumption of
CPU=clock is usually valid.

  I hope that these comments will allow the author to improve this very
useful program, such that it will produce consistent results. I have the
UNIX routines for realtime and CPU in a portable manner if that would be
of use.

-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

mash@mips.COM (John Mashey) (08/25/88)

In article <1073@marlin.NOSC.MIL> aburto@marlin.nosc.mil.UUCP (Alfred A. Aburto) writes:

>/****************** Start NSIEVE C Source Code *************************/

It is a good thing to publish benchmark sources so people can check themr,
and Alfred should commended for doing so.
Note, however, that this is not the most meaningful benchmark.
See gory details below, including architectural analysis.

>/************************************************************************/
>/* Some Results:                                                        */
>/* Array Size   ------------------ BenchTime (sec) -------------------- */
>/*  (Bytes)     SUN 3/280   VAX 8600     Turbo-Amiga                    */
>/*            68020 @25 MHz           68020 @ 14.32 MHz                 */
>/*              ( cc -O )   ( cc -O )  ( cc +2 +L +ff )                 */ 
>/*      8191        0.267      0.250         0.461  ( 0.264 )           */
>/*     10000        0.300      0.283         0.578  ( 0.331 )           */
>/*     20000        0.650      0.783         1.195  ( 0.684 )           */
>/*     40000        1.333      1.767         2.383  ( 1.365 )           */
>/*     80000        2.917      3.750         4.820  ( 2.761 )           */
>/*    160000        7.833      8.033         9.758  ( 5.589 )           */
>/*    320000       17.600     17.717        ------     |                */
>/*                                                     |                */
>/*                Scaled to 25 MHz --------------------+                */

>/*                                                                      */
>/* Average Run Time relative to 10 Iterations and 8191 array size:      */
>/*                  0.315      0.345         0.484                      */

A. Output from MIPS R2000
	(in M/120: 16.7MHz, 128K cache, low-latency memory system)

   Sieve of Eratosthenes (scaled to 10 Iterations)

   Array Size   Primes   Last Prime   BenchTime
    (Bytes)     Found                   (Sec)
      8191        1899        16381       0.130
     10000        2261        19997       0.150
     20000        4202        39989       0.320
     40000        7836        79999       0.630
     80000       14683       160001       1.270
    160000       27607       319993       2.580
    320000       52073       639997       5.570

   Relative to 10 Iterations and the 8191 Array Size:
   Average BenchTime =    0.131 (sec)

Now, some more data on this, or WHY YOU MUST BE CAREFUL WITH SMALL BENCHMARKS!

B. First, architectural data:
One can see (look for the (#) codes in the data below.
	(1) The instructions fit in a 32-word I-cache.
	(2) Essentially 100% of loads and stores are bytes.
		A more typical number might be 10%.
	(3) 75% of the loads+stores are stores.  A more typical number
		is 30-35%.
	(4) 100% of the loads had NO useful work schedulable in the
	load-delay slot, i.e., in this case, a load is actually costing
	2 cycles (+ any cache miss time).  A more typical number if 30%
	(and this is very germane to using this benchmark on RISC machines
	with pipelining).
	(5) ~100% of the loads/stores have zero-offsets; a more typical
	number is 10-15%.  [relevant to 29K, for example]


pixstats nsieve:
 159336572 (1.000) instructions
   6392573 (0.040) loads
  20105454 (0.126) stores
  26498027 (0.166) loads+stores		(2)
  26498114 (0.166) data bus use
  26484254 (0.166) partial word references (2)
  33974425 (0.213) branches

0.999 load nops per load (4)
0.759 stores per memory reference	(3)
0.999 partial word references per reference (2)

Instruction concentration:
         1   8.6%
         2  17.2%
         4  34.4%
         8  55.1%
        16  87.1%
        32 100.0%   (1)

Load/store displacement   16.6% (5)  (the size says how many bits used)
size                0-extend                1-extend sign-extend
   0  26482960 100.0% 100.0%        29   0.0%   0.0%
   1        36   0.0% 100.0%         0   0.0%   0.0%  100.0%
   2         0   0.0% 100.0%        29   0.0%   0.0%  100.0%
   3      1912   0.0% 100.0%         8   0.0%   0.0%  100.0%
   4      1654   0.0% 100.0%         0   0.0%   0.0%  100.0%
   9         2   0.0% 100.0%         0   0.0%   0.0%  100.0%
  10        18   0.0% 100.0%         0   0.0%   0.0%  100.0%
  11        92   0.0% 100.0%         0   0.0%   0.0%  100.0%
  12       602   0.0% 100.0%         0   0.0%   0.0%  100.0%
  13        60   0.0% 100.0%         0   0.0%   0.0%  100.0%

C.  Now, what the program was doing:

Profile listing generated Wed Aug 24 18:11:22 1988 with:
----------------------------------------------------------------------------
*  -p[rocedures] using basic-block counts;                                 *
*  sorted in descending order by the number of cycles executed in each     *
*  procedure; unexecuted procedures are excluded                           *
----------------------------------------------------------------------------
159336572 cycles
    cycles %cycles  cum %     cycles  bytes procedure (file)
                               /call  /line
 159286753   99.97  99.97   22755251     14 SIEVE (nsieve.c)
     24351    0.02  99.98         43     29 _flsbuf (flsbuf.c)
     16695    0.01  99.99       1392     18 _doprnt (doprnt.c)
	...rest are in the noise, obviously.....

----------------------------------------------------------------------------
*  -h[eavy] using basic-block counts;                                      *
*  sorted in descending order by the number of cycles executed in each     *
*  line; unexecuted lines are excluded                                     *
----------------------------------------------------------------------------
(source code with line numbering is shown following)

procedure (file)                           line bytes     cycles      %  cum %

SIEVE (nsieve.c)                            241    12   41148840  25.83  25.83
SIEVE (nsieve.c)                            240     8   27432560  17.22  43.04
SIEVE (nsieve.c)                            231    20   25527710  16.02  59.06
SIEVE (nsieve.c)                            235    16   25527640  16.02  75.08
SIEVE (nsieve.c)                            244    16   25527640  16.02  91.11
SIEVE (nsieve.c)                            230     4    6381910   4.01  95.11
SIEVE (nsieve.c)                            238    16    4422440   2.78  97.89
SIEVE (nsieve.c)                            237     8    2211220   1.39  99.27
SIEVE (nsieve.c)                            242     4    1105610   0.69  99.97
228		for(i=0 ; i<=size ; i++)
229		{
230		   *(flags+i) = true;
231	        }
233			for(i=0 ; i<=size ; i++)
234			{
235				if(*(flags+i))
236				{
237				   prime = i + i + 3;
238					for(k = i + prime ; k<=size ; k+=prime)
239					{
240					*(flags+k)=false;
241					}
242	                           count++;
243	                        }
244	                }


D. Summary
	The statistics of this benchmark bear little resemblance to
	most substantial programs.  This doesn't mean it's a bad benchmark,
	just that extreme care must be taken with it or any other small
	benchmark.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

aburto@marlin.NOSC.MIL (Alfred A. Aburto) (08/26/88)

In article <11961@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>
>  I hope that these comments will allow the author to improve this very
>useful program, such that it will produce consistent results. I have the
>UNIX routines for realtime and CPU in a portable manner if that would be
>of use.
>
>-- 
>	bill davidsen		(wedu@ge-crd.arpa)
>  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen

Yes, I would very much appreciate the Unix realtime and CPU routines.

I knew that the measure of time was not uniform across the various Unix
system but I didn't know how to handle it properly so I just went with what
I knew.    

Thanks

Al Aburto
aburto@marlin.nosc.mil.UUCP

aburto@marlin.NOSC.MIL (Alfred A. Aburto) (09/02/88)

--------------

There is of course no denying the the fact that 'nsieve' (really sieve)
and so many other so-called 'benchmark' programs 'bear little resemblance
to most substantial' programs.

The Sieve (Jim Gilbreath, 'A High-Level Language Benchmark', BYTE, Sep 1981,
page 180) is not really a 'benchmark' program.  Gilbreath indicated this
in his article, but none the less he continued to call the program a           
and coded in many languages for the purpose of comparing relative            
performance.  It is certainly not a 'typical' program in terms of what
it does (generate some prime numbers), and it is not typical with respect
to the types of instructions used.

I think alot of confusion happens because (and I'm guilty of this) we keep
calling the Sieve a benchmark when it really isn't anything more than a
simple program to generate some prime numbers.  There is even more confusion
because there is apparently no clear cut set of definitions or specifications
or requirements for a computer benchmark program.  What is a benchmark
program?  What kinds of things should it do?  What kind of outputs should it
provide?  What minimum requirements should be met?  None of these things
are defined and as a result we find ourselves with alot of programs that
claim to be benchmarks of some sort.  The Whetstone and Dhrystone are good
but still I don't think of them as 'benchmarks' or standards --- mostly
because they don't provide much information relative to the performance of
the system under test and because they can be optimized to various degrees.
'nsieve' is I think relativey immune (I'm not positive of this) to high
degrees of optimization, but it provides no weighting for the frequency
of instructions used and it has other biases as your analysis shows.

Still, in my opinion, these programs when run on many systems using a variety
of different languages do provide valuable information with respect to the
capability of these systems (although somewhat limited information).

I don't have alot of information at this time but some nsieve results show
'performance gains' very close to what we see from the Dhrystone. I can't
'prove' that the nsieve and dhrystone provide comparable performance gains
in general as I lack a sufficient amount of data, but some of the results
indicate that they do.

For example, the Amdahl 5890 Dhyrstone/sec is I believe about 30,000. Well,
on the Amiga with 68020 at 14.32 Mhz and Manx Aztec C (with no 'optimizer')
the best Dhrystones/sec figure I've seen is 3034.  So the Amdahl 5890 has a
performance gain of 30000/3034 = 9.9 (roughly) over the Amiga/020. 

Thanks to Chuck Simmons of Amdahl, I also have the nsieve results.  The
Amdahl 5890 ran nsieve (for the 8191 sized array) in 0.05 seconds.  An
average of 5 runs on the Amiga/020 yielded a run time of 0.48+/-0.02 seconds.
These results indicate the Amdahl 5890 has a performance gain of 
0.48 / 0.05 = 9.6 over the Amiga/020.  That is (in this one isolated case)
both the Dhrystone and the nsieve are yielding pretty much the same 
information within about a 4% error.  I would like to investigate these
relationships in more detail, but (right now) there is not sufficient data.

Anyway, thanks again for the nsieve analysis.   I'm not a computer systems
person really --- just a physicist working as an engineer who has somehow
'fallen into' the benchmarking arena.                  

Thanks to Bill Davidson too!  For the Unix timer information.

B A Wichmann of the National Physical Laboratory in England has also sent
me a recent (Mar 88) version of the Whetstone in ISO Pascal --- I posted
it to BIX 'supermicros/listings' as 'whet.arc'.  The main feature of the
new Whetstone is that it provides error checking routines.

Al Aburto
aburto@marlin.nosc.mil.UUCP

chuck@amdahl.uts.amdahl.com (Charles Simmons) (09/02/88)

In article <1076@marlin.NOSC.MIL> aburto@marlin.nosc.mil.UUCP (Alfred A. Aburto) writes:
>For example, the Amdahl 5890 Dhyrstone/sec is I believe about 30,000. Well,
>on the Amiga with 68020 at 14.32 Mhz and Manx Aztec C (with no 'optimizer')
>the best Dhrystones/sec figure I've seen is 3034.  So the Amdahl 5890 has a
>performance gain of 30000/3034 = 9.9 (roughly) over the Amiga/020. 

45,000 would be a better approxamation for the dhrystone/sec
figure on a 5890.  45000/3034 = 14.8 (roughly).  [30,000 is in the
ballpark of a 5880.]

>Thanks to Chuck Simmons of Amdahl, I also have the nsieve results.  The
>Amdahl 5890 ran nsieve (for the 8191 sized array) in 0.05 seconds.  An
>average of 5 runs on the Amiga/020 yielded a run time of 0.48+/-0.02 seconds.
>These results indicate the Amdahl 5890 has a performance gain of 
>0.48 / 0.05 = 9.6 over the Amiga/020.  That is (in this one isolated case)
>both the Dhrystone and the nsieve are yielding pretty much the same 
>information within about a 4% error.  I would like to investigate these
>relationships in more detail, but (right now) there is not sufficient data.

For all you GNU fans out there, the current version of GCC
produces a binary that runs in 0.033 seconds for the 8191
sized array.

>Al Aburto
>aburto@marlin.nosc.mil.UUCP

-- Chuck