aburto@marlin.NOSC.MIL (Alfred A. Aburto) (08/24/88)
/****************** Start NSIEVE C Source Code *************************/ /****************************************************************/ /* NSIEVE */ /* C Program Source */ /* Sieve benchmark for variable sized arrays */ /* Al Aburto */ /* 22 Aug 88 */ /* aburto@marlin.nosc.mil.UUCP */ /* 'ala' on BIX */ /* */ /* The program uses the VAX and SUN UNIX C 'times()' routine to */ /* determine the elapsed run time in seconds. The compiler */ /* option 'UNIX' must be defined for proper operation on VAX or */ /* SUN UNIX systems. Also 'HZ' must be defined. HZ will be 60 */ /* for most systems and this is the default set in the program. */ /* Please check the system 'times()' definition so that HZ is */ /* properly defined for the right measure of time. HZ may need */ /* to be set to 50 or 100 on some systems. */ /* */ /* A timing routine for the Amiga is also included. The compiler*/ /* option 'Amiga' must be defined ('-DAmiga') for proper timing */ /* on Amiga systems. HZ is defined as 50 with the routines */ /* used on this system. */ /* */ /* A timing routine for the IBM PC/PC-AT type systems with */ /* TURBO C is also included. The Turbo C compiler option */ /* '-DTURBO_C' must be defined for this timing routine to work. */ /* HZ is defined as 100 for this system. */ /* */ /* The program uses a maximum array size of 320,000 bytes. You */ /* can increase or lower this value by changing the 'limit' */ /* define from 6 to a higher or lower value. Some systems may */ /* be unable to work beyond 'limit = 4' which corresponds to an */ /* array size of 40,000 bytes. The '#define limit 6' in the */ /* program below corresponds to a max SIEVE array size of */ /* 320,000 bytes. */ /* The maximum array size is given by: */ /* size = 5000 * ( 2 ** limit ). */ /* */ /* The program uses register variables. Some compilers may not */ /* accept this option. If this is the case then 'register' */ /* will need to be removed from the SIEVE() subroutine. */ /* */ /* In re-running the program I am getting about a 5% variation */ /* in the 'Average BenchTime' output. */ /* */ /* Program outputs to check for proper operation: */ /* Array Size Primes Found Last Prime */ /* (Bytes) */ /* 8191 1899 16381 */ /* 10000 2261 19997 */ /* 20000 4202 39989 */ /* 40000 7836 79999 */ /* 80000 14683 160001 */ /* 160000 27607 319993 */ /* 320000 52073 639997 */ /****************************************************************/ /****************************************************/ /* Example Compilation: */ /* (1) VAX and SUN UNIX C: */ /* cc -O -DUNIX nsieve.c -o nsieve */ /* cc -DUNIX nsieve.c -o nsieve */ /* */ /* (2) Turbo-Amiga (68020): */ /* cc +2 +L +ff -DAmiga nsieve.c */ /* ln nsieve.o -lm32 -lc32 */ /* */ /* (3) Amiga: */ /* cc +L +ff -DAmiga nsieve.c */ /* ln nsieve.o -lm32 -lc32 */ /* */ /* (4) IBM PC/PC-AT types with TURBO C */ /* Compile with '-DTURBO_C' or uncomment the */ /* the 'TURBO_C' define in the main program. */ /* Use 'huge' data option. Compile for 'speed'. */ /****************************************************/ /************************************************************************/ /* Some Results: */ /* Array Size ------------------ BenchTime (sec) -------------------- */ /* (Bytes) SUN 3/280 VAX 8600 Turbo-Amiga */ /* 68020 @25 MHz 68020 @ 14.32 MHz */ /* ( cc -O ) ( cc -O ) ( cc +2 +L +ff ) */ /* 8191 0.267 0.250 0.461 ( 0.264 ) */ /* 10000 0.300 0.283 0.578 ( 0.331 ) */ /* 20000 0.650 0.783 1.195 ( 0.684 ) */ /* 40000 1.333 1.767 2.383 ( 1.365 ) */ /* 80000 2.917 3.750 4.820 ( 2.761 ) */ /* 160000 7.833 8.033 9.758 ( 5.589 ) */ /* 320000 17.600 17.717 ------ | */ /* | */ /* Scaled to 25 MHz --------------------+ */ /* */ /* Average Run Time relative to 10 Iterations and 8191 array size: */ /* 0.315 0.345 0.484 */ /* */ /* The VAX 8600 results are different from those that appeared in the */ /* Jun Byte Article because I thought I could not define 'register' */ /* variables at all, but I was in error. 'register long' variables */ /* seem to work fine on our VAX 8600 UNIX 4.3 C compiler. It is the */ /* register 'short' types that (apparently) cause problems. */ /* */ /************************************************************************/ /**********************************************/ /* You may just want to uncomment one of */ /* these for your system. */ /**********************************************/ /* #define Amiga */ /* #define UNIX */ /* #define TURBO_C */ #ifdef UNIX #include <sys/types.h> #include <sys/times.h> #endif #ifdef Amiga #include <exec/types.h> #include <ctype.h> #include <stdio.h> #include <math.h> #endif #ifdef TURBO_C #include <ctype.h> #include <dos.h> #include <stdio.h> #include <time.h> #endif #define false 0 #define true 1 #define limit 6 #ifdef UNIX #define HZ 60 struct tms tms; #endif #ifdef Amiga #define HZ 50 #endif #ifdef TURBO_C #define HZ 100 struct time now; #endif float TimeArray[3]; float nulltime,benchtime; main() { long i,j; float sumtime; printf("\n Sieve of Eratosthenes (scaled to 10 Iterations)\n\n"); printf(" Array Size Primes Last Prime BenchTime\n"); printf(" (Bytes) Found (Sec)\n"); dtime(TimeArray); dtime(TimeArray); nulltime = TimeArray[1]; sumtime = 0.0; j = 8191; SIEVE(j); sumtime = sumtime + benchtime; j = 5000; for( i=1 ; i<= limit ; i++) { j = 2 * j; SIEVE(j); sumtime = sumtime + benchtime * ( 8191.0 / (float)j ); } sumtime = sumtime / (float)(limit + 1); printf("\n Relative to 10 Iterations and the 8191 Array Size:\n"); printf(" Average BenchTime = %8.3f (sec)\n\n",sumtime); } /**************************************/ /* Sieve of Erathosthenes Benchmark */ /* Version By William L. Hembree */ /* Las Cruces, New Mexico */ /**************************************/ SIEVE(n) long n; { register char *flags; register long i,prime,k; register long count,size; long j,m,iter; char *ptr = 0L; char *malloc(); size = n - 1; ptr = malloc(n); flags = ptr; dtime(TimeArray); dtime(TimeArray); nulltime = TimeArray[1]; j = 0; dtime(TimeArray); for(iter=1 ; iter<=10 ; iter++) { count = 0; for(i=0 ; i<=size ; i++) { *(flags+i) = true; } for(i=0 ; i<=size ; i++) { if(*(flags+i)) { prime = i + i + 3; for(k = i + prime ; k<=size ; k+=prime) { *(flags+k)=false; } count++; } } j = j + count; } dtime(TimeArray); benchtime = TimeArray[1] - nulltime; j = j / 10; m = prime; printf(" %8ld %8ld %8ld %9.3f\n",n,j,m,benchtime); } #ifdef UNIX /***********************************************/ /* dtime function for the VAX UNIX C Compiler */ /* dtime() returns elapsed time in seconds. */ /***********************************************/ dtime(p) float p[]; { float q; q = p[2]; times(&tms); p[2] = (double)(tms.tms_utime) / (float)HZ; p[1] = p[2] - q; } #endif #ifdef Amiga /**********************************/ /* dtime function for the Amiga */ /**********************************/ dtime(p) float p[]; { float q; struct tt { long days; long minutes; long ticks; } tt; q = p[2]; DateStamp(&tt); p[2] = ((float)(tt.ticks+(tt.minutes*60L*(long)HZ))) / (float)HZ; p[1] = p[2] - q; } #endif #ifdef TURBO_C /*************************************************/ /* dtime for IBM PC/PC-AT systems with TURBO C */ /*************************************************/ dtime(p) float p[]; { float q,v; q = p[2]; gettime(&now); v = 60.0*(float)(now.ti_min); v = v + (float)(now.ti_sec); v = v + (float)(now.ti_hund) / (float)HZ; p[2] = v; p[1] = v - q; } #endif /************************ End Of NSIEVE C Source Code **********************/
davidsen@steinmetz.ge.com (William E. Davidsen Jr) (08/25/88)
Although I agree with the aims of NSIEVE to overcome weaknesses of the original, there are some problems in this version as well, which relate to the accuracy of the timing. I posted some commentary on time measurement last year, perhaps it's time to repeat. The clock timing value, HZ, is not set at 60 by some default, that's just the value on a VAX. In a number of system, such as SunOS, Xenix, UNICOS, and all SysV systems, the value is found in <sys/param.h>. The suggested way to get this is to use code like: #ifndef HZ #include <sys/param.h> #endif and to force the user to define it on the cc command line if it's not in the include file. Since popular values are 20, 50, 100, 4096, and 243902439 (the last on a Cray2), the assumption of of 60 will yield incorrect results. In UNIX versions the CPU time should be used, rather than real time, since the real time may vary wildly. The realtime will excede the CPU by some factor based on the load average on uniprocessor machines, and will be smaller than the CPU by about N on a good multiprocessor machine. On MS-DOS, CP/M and other small control programs the assumption of CPU=clock is usually valid. I hope that these comments will allow the author to improve this very useful program, such that it will produce consistent results. I have the UNIX routines for realtime and CPU in a portable manner if that would be of use. -- bill davidsen (wedu@ge-crd.arpa) {uunet | philabs | seismo}!steinmetz!crdos1!davidsen "Stupidity, like virtue, is its own reward" -me
mash@mips.COM (John Mashey) (08/25/88)
In article <1073@marlin.NOSC.MIL> aburto@marlin.nosc.mil.UUCP (Alfred A. Aburto) writes: >/****************** Start NSIEVE C Source Code *************************/ It is a good thing to publish benchmark sources so people can check themr, and Alfred should commended for doing so. Note, however, that this is not the most meaningful benchmark. See gory details below, including architectural analysis. >/************************************************************************/ >/* Some Results: */ >/* Array Size ------------------ BenchTime (sec) -------------------- */ >/* (Bytes) SUN 3/280 VAX 8600 Turbo-Amiga */ >/* 68020 @25 MHz 68020 @ 14.32 MHz */ >/* ( cc -O ) ( cc -O ) ( cc +2 +L +ff ) */ >/* 8191 0.267 0.250 0.461 ( 0.264 ) */ >/* 10000 0.300 0.283 0.578 ( 0.331 ) */ >/* 20000 0.650 0.783 1.195 ( 0.684 ) */ >/* 40000 1.333 1.767 2.383 ( 1.365 ) */ >/* 80000 2.917 3.750 4.820 ( 2.761 ) */ >/* 160000 7.833 8.033 9.758 ( 5.589 ) */ >/* 320000 17.600 17.717 ------ | */ >/* | */ >/* Scaled to 25 MHz --------------------+ */ >/* */ >/* Average Run Time relative to 10 Iterations and 8191 array size: */ >/* 0.315 0.345 0.484 */ A. Output from MIPS R2000 (in M/120: 16.7MHz, 128K cache, low-latency memory system) Sieve of Eratosthenes (scaled to 10 Iterations) Array Size Primes Last Prime BenchTime (Bytes) Found (Sec) 8191 1899 16381 0.130 10000 2261 19997 0.150 20000 4202 39989 0.320 40000 7836 79999 0.630 80000 14683 160001 1.270 160000 27607 319993 2.580 320000 52073 639997 5.570 Relative to 10 Iterations and the 8191 Array Size: Average BenchTime = 0.131 (sec) Now, some more data on this, or WHY YOU MUST BE CAREFUL WITH SMALL BENCHMARKS! B. First, architectural data: One can see (look for the (#) codes in the data below. (1) The instructions fit in a 32-word I-cache. (2) Essentially 100% of loads and stores are bytes. A more typical number might be 10%. (3) 75% of the loads+stores are stores. A more typical number is 30-35%. (4) 100% of the loads had NO useful work schedulable in the load-delay slot, i.e., in this case, a load is actually costing 2 cycles (+ any cache miss time). A more typical number if 30% (and this is very germane to using this benchmark on RISC machines with pipelining). (5) ~100% of the loads/stores have zero-offsets; a more typical number is 10-15%. [relevant to 29K, for example] pixstats nsieve: 159336572 (1.000) instructions 6392573 (0.040) loads 20105454 (0.126) stores 26498027 (0.166) loads+stores (2) 26498114 (0.166) data bus use 26484254 (0.166) partial word references (2) 33974425 (0.213) branches 0.999 load nops per load (4) 0.759 stores per memory reference (3) 0.999 partial word references per reference (2) Instruction concentration: 1 8.6% 2 17.2% 4 34.4% 8 55.1% 16 87.1% 32 100.0% (1) Load/store displacement 16.6% (5) (the size says how many bits used) size 0-extend 1-extend sign-extend 0 26482960 100.0% 100.0% 29 0.0% 0.0% 1 36 0.0% 100.0% 0 0.0% 0.0% 100.0% 2 0 0.0% 100.0% 29 0.0% 0.0% 100.0% 3 1912 0.0% 100.0% 8 0.0% 0.0% 100.0% 4 1654 0.0% 100.0% 0 0.0% 0.0% 100.0% 9 2 0.0% 100.0% 0 0.0% 0.0% 100.0% 10 18 0.0% 100.0% 0 0.0% 0.0% 100.0% 11 92 0.0% 100.0% 0 0.0% 0.0% 100.0% 12 602 0.0% 100.0% 0 0.0% 0.0% 100.0% 13 60 0.0% 100.0% 0 0.0% 0.0% 100.0% C. Now, what the program was doing: Profile listing generated Wed Aug 24 18:11:22 1988 with: ---------------------------------------------------------------------------- * -p[rocedures] using basic-block counts; * * sorted in descending order by the number of cycles executed in each * * procedure; unexecuted procedures are excluded * ---------------------------------------------------------------------------- 159336572 cycles cycles %cycles cum % cycles bytes procedure (file) /call /line 159286753 99.97 99.97 22755251 14 SIEVE (nsieve.c) 24351 0.02 99.98 43 29 _flsbuf (flsbuf.c) 16695 0.01 99.99 1392 18 _doprnt (doprnt.c) ...rest are in the noise, obviously..... ---------------------------------------------------------------------------- * -h[eavy] using basic-block counts; * * sorted in descending order by the number of cycles executed in each * * line; unexecuted lines are excluded * ---------------------------------------------------------------------------- (source code with line numbering is shown following) procedure (file) line bytes cycles % cum % SIEVE (nsieve.c) 241 12 41148840 25.83 25.83 SIEVE (nsieve.c) 240 8 27432560 17.22 43.04 SIEVE (nsieve.c) 231 20 25527710 16.02 59.06 SIEVE (nsieve.c) 235 16 25527640 16.02 75.08 SIEVE (nsieve.c) 244 16 25527640 16.02 91.11 SIEVE (nsieve.c) 230 4 6381910 4.01 95.11 SIEVE (nsieve.c) 238 16 4422440 2.78 97.89 SIEVE (nsieve.c) 237 8 2211220 1.39 99.27 SIEVE (nsieve.c) 242 4 1105610 0.69 99.97 228 for(i=0 ; i<=size ; i++) 229 { 230 *(flags+i) = true; 231 } 233 for(i=0 ; i<=size ; i++) 234 { 235 if(*(flags+i)) 236 { 237 prime = i + i + 3; 238 for(k = i + prime ; k<=size ; k+=prime) 239 { 240 *(flags+k)=false; 241 } 242 count++; 243 } 244 } D. Summary The statistics of this benchmark bear little resemblance to most substantial programs. This doesn't mean it's a bad benchmark, just that extreme care must be taken with it or any other small benchmark. -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com DDD: 408-991-0253 or 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
aburto@marlin.NOSC.MIL (Alfred A. Aburto) (08/26/88)
In article <11961@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes: > > I hope that these comments will allow the author to improve this very >useful program, such that it will produce consistent results. I have the >UNIX routines for realtime and CPU in a portable manner if that would be >of use. > >-- > bill davidsen (wedu@ge-crd.arpa) > {uunet | philabs | seismo}!steinmetz!crdos1!davidsen Yes, I would very much appreciate the Unix realtime and CPU routines. I knew that the measure of time was not uniform across the various Unix system but I didn't know how to handle it properly so I just went with what I knew. Thanks Al Aburto aburto@marlin.nosc.mil.UUCP
aburto@marlin.NOSC.MIL (Alfred A. Aburto) (09/02/88)
-------------- There is of course no denying the the fact that 'nsieve' (really sieve) and so many other so-called 'benchmark' programs 'bear little resemblance to most substantial' programs. The Sieve (Jim Gilbreath, 'A High-Level Language Benchmark', BYTE, Sep 1981, page 180) is not really a 'benchmark' program. Gilbreath indicated this in his article, but none the less he continued to call the program a and coded in many languages for the purpose of comparing relative performance. It is certainly not a 'typical' program in terms of what it does (generate some prime numbers), and it is not typical with respect to the types of instructions used. I think alot of confusion happens because (and I'm guilty of this) we keep calling the Sieve a benchmark when it really isn't anything more than a simple program to generate some prime numbers. There is even more confusion because there is apparently no clear cut set of definitions or specifications or requirements for a computer benchmark program. What is a benchmark program? What kinds of things should it do? What kind of outputs should it provide? What minimum requirements should be met? None of these things are defined and as a result we find ourselves with alot of programs that claim to be benchmarks of some sort. The Whetstone and Dhrystone are good but still I don't think of them as 'benchmarks' or standards --- mostly because they don't provide much information relative to the performance of the system under test and because they can be optimized to various degrees. 'nsieve' is I think relativey immune (I'm not positive of this) to high degrees of optimization, but it provides no weighting for the frequency of instructions used and it has other biases as your analysis shows. Still, in my opinion, these programs when run on many systems using a variety of different languages do provide valuable information with respect to the capability of these systems (although somewhat limited information). I don't have alot of information at this time but some nsieve results show 'performance gains' very close to what we see from the Dhrystone. I can't 'prove' that the nsieve and dhrystone provide comparable performance gains in general as I lack a sufficient amount of data, but some of the results indicate that they do. For example, the Amdahl 5890 Dhyrstone/sec is I believe about 30,000. Well, on the Amiga with 68020 at 14.32 Mhz and Manx Aztec C (with no 'optimizer') the best Dhrystones/sec figure I've seen is 3034. So the Amdahl 5890 has a performance gain of 30000/3034 = 9.9 (roughly) over the Amiga/020. Thanks to Chuck Simmons of Amdahl, I also have the nsieve results. The Amdahl 5890 ran nsieve (for the 8191 sized array) in 0.05 seconds. An average of 5 runs on the Amiga/020 yielded a run time of 0.48+/-0.02 seconds. These results indicate the Amdahl 5890 has a performance gain of 0.48 / 0.05 = 9.6 over the Amiga/020. That is (in this one isolated case) both the Dhrystone and the nsieve are yielding pretty much the same information within about a 4% error. I would like to investigate these relationships in more detail, but (right now) there is not sufficient data. Anyway, thanks again for the nsieve analysis. I'm not a computer systems person really --- just a physicist working as an engineer who has somehow 'fallen into' the benchmarking arena. Thanks to Bill Davidson too! For the Unix timer information. B A Wichmann of the National Physical Laboratory in England has also sent me a recent (Mar 88) version of the Whetstone in ISO Pascal --- I posted it to BIX 'supermicros/listings' as 'whet.arc'. The main feature of the new Whetstone is that it provides error checking routines. Al Aburto aburto@marlin.nosc.mil.UUCP
chuck@amdahl.uts.amdahl.com (Charles Simmons) (09/02/88)
In article <1076@marlin.NOSC.MIL> aburto@marlin.nosc.mil.UUCP (Alfred A. Aburto) writes: >For example, the Amdahl 5890 Dhyrstone/sec is I believe about 30,000. Well, >on the Amiga with 68020 at 14.32 Mhz and Manx Aztec C (with no 'optimizer') >the best Dhrystones/sec figure I've seen is 3034. So the Amdahl 5890 has a >performance gain of 30000/3034 = 9.9 (roughly) over the Amiga/020. 45,000 would be a better approxamation for the dhrystone/sec figure on a 5890. 45000/3034 = 14.8 (roughly). [30,000 is in the ballpark of a 5880.] >Thanks to Chuck Simmons of Amdahl, I also have the nsieve results. The >Amdahl 5890 ran nsieve (for the 8191 sized array) in 0.05 seconds. An >average of 5 runs on the Amiga/020 yielded a run time of 0.48+/-0.02 seconds. >These results indicate the Amdahl 5890 has a performance gain of >0.48 / 0.05 = 9.6 over the Amiga/020. That is (in this one isolated case) >both the Dhrystone and the nsieve are yielding pretty much the same >information within about a 4% error. I would like to investigate these >relationships in more detail, but (right now) there is not sufficient data. For all you GNU fans out there, the current version of GCC produces a binary that runs in 0.033 seconds for the 8191 sized array. >Al Aburto >aburto@marlin.nosc.mil.UUCP -- Chuck