COBB@BRANDEIS.BITNET (04/23/87)
Date: Wed, 22 Apr 87 23:06 EDT
From: <COBB@BRANDEIS.BITNET> (wes cobb [ cobb@brandeis.bitnet ])
Subject: benchmark battles: part 2.
To: info-atari16@score.stanford.edu
X-Original-To: atari16, COBB

dear benchfolk,

someone ( sorry -- i've lost the reference ) made a remark which in essence claimed that one should use BASIC if one wished to do fp work on the ST!!!!! well, i strongly disagree... let's digress for a moment and look at the code he posted. supposedly the basic program the writer had tested used...

    x = tan(atan(exp(log(sqr(x*x)))))+1.

...as what he was looping over. now, in every basic *I* have ever seen, the arc tangent function is called ATN, _not_ atan.... i ran the code he listed under atari basic - there were no error messages - and the program literally blew fortran and c away. of course, since atan was undefined, the program was really just calculating tan(0.) 2500 times, so it SHOULD have been pretty fast! when i substituted atn for the arc tangent function i got about 26 seconds for the single precision execution time. not bad - but a good 50% slower than fortran.

also, in ATARI basic there is ( as far as i can tell ) no way at all to even _PERFORM_ a double precision savage benchmark - i.e. there don't seem to BE double precision forms of the math library functions. obviously doing the loops in single precision and then typecasting the results into double precision is NOT legal -- so where on earth are these double precision atari basic results coming from?

if one wants to use benchmark results to choose what language to work in, one must be careful to choose reasonable benchmark programs, and one has to compare apples with apples and oranges with oranges -- it is silly to compare 48 bit floating point benchmark results with 64 bit benchmark results, etc.... at any rate it's NUTS to base one's choice of programming language on ( of all tests ) Savage because:

THE SAVAGE BENCHMARK IS *N*O*T* A GOOD TEST OF FP PERFORMANCE.
the savage tests ONLY trig library functions. period. it isn't even a very good test of THEM. period. since the overwhelming preponderance of all fp calls, operations, and usage do NOT involve trig library calls, it just doesn't make sense to base anything important on it. if you MUST base everything on a single test, then at least make it the whetstone benchmark, since that at least is a reasonable model of a real applications program. ( atari basic, by the way, is about 100x slower than fortran on the whetstone. )

i've included another benchmark program which you may want to try... it tests the speed of floating *,-,/,+ in c, fortran, ratfor, and basic. even THIS is surely a more realistic test of floating point behavior than the savage benchmark is: fp *,-,/,+ occur overwhelmingly more often than trig lib calls in real applications code.

wes

############################################################################

THE FLOPS BENCHMARK
results as of 22 april 1987

machine    language     os    rating   cpu-time  mant  size  % error  notes
---------- ------------ ----  -------  --------  ----  ----  -------  -----
atari/st   absoft f77   tos   11,453     153.70    24    32  8.4e-2   1,2
atari/st   absoft f77   tos    6,223     282.80    53    64  1.6e-10  1,2
atari/st   megamax c    tos    3,659     480.95    24    32  2.7e-3   1,3
atari/st   megamax c    tos    1,352    1301.34    53    64  3.4e-10  1,4
atari/st   lattice c    tos    1,227    1433.99    54    64  3.5e-12  1,5
atari/st   basic        tos      607    2899.19    24  (32)  1.3e-3   1,6
---------- ------------ ----  -------  --------  ----  ----  -------  -----

rating = weighted floating operations per second. cpu-time = seconds, with
loop overhead subtracted. mant = mantissa bits measured by the program.
size = word length in bits. % error = percent error of the final sum.

1. 68000 at 8 MHz. 1 meg ram. no fpa. no desk accs. med rez.
2. absoft fortran v2.2 ( dynamically linked )
3. megamax c v1.00 ( fmath.o )
4. megamax c v1.00 ( double.o )
5. lattice c v3.02 ( doesn't support single precision )
6. atari basic ( i ASSUME 32 bits here... manual doesn't say )

the flops benchmark - as its name suggests - purports to estimate how many floating operations per second a machine/compiler combination can perform.
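The heart of every version below is the same inner loop of 11 operations: 4 multiplies, 3 subtracts, 2 adds and 2 divides, close to the 35/26/22/17 mix described next. Here is a minimal portable sketch in ansi c, assuming the standard clock() from <time.h> in place of the ST's 200 hz tick counter. The names flops_kernel(), flops_rating(), flops_percent_error() and time_kernel() are hypothetical, and unlike the original no loop-overhead run is subtracted. The rating is 11 operations times the 20*20*20*20 = 160000 inner passes, divided by the cpu time.

```c
#include <math.h>
#include <time.h>

#define N      20                   /* each loop runs 1..20, as in the post */
#define ANSWER 2.061914565513972e7  /* expected magnitude of p */

/* the benchmark loop nest; returns p, whose magnitude should equal ANSWER */
double flops_kernel(void)
{
    double p = 1.0, q, r, s, t, v, w, x, y, a, b, c, d, f, g;
    int i, j, k, m;

    for (i = 1; i <= N; i++) {
        q = (double)i;
        for (j = 1; j <= N; j++) {
            r = (double)j;
            for (k = 1; k <= N; k++) {
                s = (double)k;
                for (m = 1; m <= N; m++) {
                    t = (double)m;
                    v = 1.0 / t;    /* 11 operations: 4 *, 3 -, 2 +, 2 / */
                    w = s * v;
                    x = r + w;
                    y = q - x;
                    a = y * v;
                    b = a * t;
                    c = b - w;
                    d = c / q;
                    f = d - r;
                    g = f * s;
                    p = p + g;
                }
            }
        }
    }
    return p;
}

/* weighted operations per second: 11 operations per inner pass */
double flops_rating(double seconds)
{
    return 11.0 * (double)N * N * N * N / seconds;
}

/* percent error of |p| against ANSWER, as the benchmark prints it */
double flops_percent_error(double p)
{
    return fabs(100.0 * (fabs(p) - ANSWER) / ANSWER);
}

/* cpu seconds for one kernel run; stores the kernel's p in *p_out */
double time_kernel(double *p_out)
{
    clock_t start = clock();

    *p_out = flops_kernel();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

The error is measured against |p| because p itself comes out negative (about -2.06e7), which is also why the listings take abs(p).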
this program is primarily intended to be run on microcomputers, so no effort has been made to allow for machines more than about 10x faster than a typical supermini. in the context of this program a floating operation is merely one of the 4 basic operations { *,/,-,+ }, weighted so as to reflect what would seem to be their frequency of appearance in scientific applications code: 35% for *, 26% for -, 22% for +, and 17% for /. these numbers were derived from a study of IMSL source code.

the program does use several nested loops and typecasts but -- never fear -- the overhead these require is subtracted out of the total, since we are only really interested here in flops. note that the overhead is typically about 50% of the flops cputime total.

the program makes calls to system dependent timing routines. for the atari st ( well, at least for a monotasking atari st running TOS ) these routines are provided in the code. if you are running this on another system you will have to improvise.

please send results to:

    electronic mail: cobb@brandeis.bitnet
                     ci$ [ 72155, 1422 ]
    snail mail:      wes cobb
                     dept. of physics
                     brandeis university
                     waltham, mass 02254
                     usa

############################################################################
/*
 * c version of flops
 * tested with
 * lattice && megamax
 */
#include <stdio.h>
#include <osbind.h>

#define OUTERMOST 20
#define INNERMOST 20
#define INNER     20
#define OUTER     20
#define ANSWER    2.061914565513972e7

main()
{
    long sbits,dbits,smant(),dmant();

    title();
    sbits = smant();
    dbits = dmant();
    if( sbits < 23 || sbits > 25 ){
        printf("\n this single precision is NOT ieee standard...");
    }
    if( dbits < 52 || dbits > 54 ){
        printf("\n this double precision is NOT ieee standard...");
    }
    if( sbits == dbits ){
        printf("\n ...there is evidently no difference between `float`");
        printf("\n and `double` in this c implementation.");
        printf("\n ( only going to run d-flops ... )");
        dflops(dbits);
        exit(0);
    }
    sflops(sbits);
    dflops(dbits);
    printf("\n ");
}

title()
{
    printf("\n\033E");          /* ESC E clears the vt52 screen */
    printf("\n flops benchmark v1.0");
    printf("\n 22 april 1987");
    printf("\n ");
}

/*
 * count the mantissa bits of a float by halving s until s+1. == 1.
 * the sum is stored in a float first: in k&r c the bare expression
 * (s+1.) is evaluated in double, which would measure double instead.
 */
long smant()
{
    float s,sum;
    long i;

    s = 1.;
    i = 0;
    sum = s + 1.;
    while( sum != 1. ){
        i++;
        s = s / 2.;
        sum = s + 1.;
    }
    return(i);
}

long dmant()
{
    long i;
    double d;

    d = 1.;
    i = 0;
    while( (d+1.) != 1. ){
        i++;
        d = d / 2.;
    }
    return(i);
}

sflops(sbits)
long sbits;
{
    float a,b,c,d,e,f,g,h,p,q,r,s,t,u,v,w,x,y,z,rating,perror,error;
    double dt,secnds(),dtover,fabs();
    long i,j,k,l,m;

    /* time the bare loops and conversions, to be subtracted later */
    dtover = secnds(0.);
    for( i = 1; i <= OUTERMOST; i++ ){
        q = (float)(i);
        for( j = 1; j <= OUTER; j++ ){
            r = (float)(j);
            for( k = 1; k <= INNER; k++ ){
                s = (float)(k);
                for( m = 1; m <= INNERMOST; m++ ){
                    t = (float)(m);
                }
            }
        }
    }
    dtover = secnds(dtover);

    /* the timed loop: 11 floating operations per inner pass */
    p = 1.;
    dt = secnds(0.);
    for( i = 1; i <= OUTERMOST; i++ ){
        q = (float)(i);
        for( j = 1; j <= OUTER; j++ ){
            r = (float)(j);
            for( k = 1; k <= INNER; k++ ){
                s = (float)(k);
                for( m = 1; m <= INNERMOST; m++ ){
                    t = (float)(m);
                    v = 1./t;
                    w = s * v;
                    x = r + w;
                    y = q - x;
                    a = y * v;
                    b = a * t;
                    c = b - w;
                    d = c / q;
                    f = d - r;
                    g = f * s;
                    p = p + g;
                }
            }
        }
    }
    dt = secnds(dt) - dtover;
    /* fabs, not abs: abs() is the integer function */
    error = fabs(p) - ANSWER;
    perror = fabs(100. * error / ANSWER );
    /* (long) keeps 20*20*20*20 = 160000 from overflowing a 16-bit int */
    rating = 11. * (float)((long)OUTERMOST*OUTER*INNER*INNERMOST) / dt;
    printf("\n total execution time = %7.2f ",dt+2.*dtover);
    printf("\n overhead = %7.2f ",2.*dtover);
    printf("\n <*> s-flops rating = %ld ",(long)(rating+.5));
    printf("\n <*> s-flops cpu time = %7.2f ",dt);
    printf("\n <*> float mantissa = %ld ",sbits);
    printf("\n <*> percent error = %12.5e ",perror);
}

dflops(dbits)
long dbits;
{
    double a,b,c,d,e,f,g,h,p,q,r,s,t,u,v,w,x,y,z,rating,perror;
    double error;
    double dt,secnds(),dtover,fabs();
    long i,j,k,l,m;

    dtover = secnds(0.);
    for( i = 1; i <= OUTERMOST; i++ ){
        q = (double)(i);
        for( j = 1; j <= OUTER; j++ ){
            r = (double)(j);
            for( k = 1; k <= INNER; k++ ){
                s = (double)(k);
                for( m = 1; m <= INNERMOST; m++ ){
                    t = (double)(m);
                }
            }
        }
    }
    dtover = secnds(dtover);

    p = 1.;
    dt = secnds(0.);
    for( i = 1; i <= OUTERMOST; i++ ){
        q = (double)(i);
        for( j = 1; j <= OUTER; j++ ){
            r = (double)(j);
            for( k = 1; k <= INNER; k++ ){
                s = (double)(k);
                for( m = 1; m <= INNERMOST; m++ ){
                    t = (double)(m);
                    v = 1./t;
                    w = s * v;
                    x = r + w;
                    y = q - x;
                    a = y * v;
                    b = a * t;
                    c = b - w;
                    d = c / q;
                    f = d - r;
                    g = f * s;
                    p = p + g;
                }
            }
        }
    }
    dt = secnds(dt) - dtover;
    error = fabs(p) - ANSWER;
    perror = fabs(100. * error / ANSWER );
    rating = 11. * (double)((long)OUTERMOST*OUTER*INNER*INNERMOST) / (double)(dt);
    printf("\n total execution time = %7.2f ",dt+2.*dtover);
    printf("\n overhead = %7.2f ",2.*dtover);
    printf("\n <*> d-flops rating = %ld ",(long)(rating+.5));
    printf("\n <*> d-flops cpu time = %7.2f ",dt);
    printf("\n <*> double mantissa = %ld ",dbits);
    printf("\n <*> percent error = %12.5e ",perror);
}

/*
 * returns the system time in seconds, minus offset. xbios 38 (Supexec)
 * runs peek() in supervisor mode; 0x4ba holds the 200 hz system tick
 * count, so each tick is .005 seconds.
 */
double secnds(offset)
double offset;
{
    long peek();
    double temp;

    temp = .005 * (double)xbios( 38, &peek ) - offset;
    return(temp);
}

long peek()
{
    long temp2;

    temp2 = *(long *)0x4ba;
    return(temp2);
}
############################################################################
/*
 * absoft fortran 77
 * version of the flops benchmark
 * this is what the ratfor source ends up as...
 */
      program flops
      implicit none
      integer*4 sbits,dbits

      call title
      call smant(sbits)
      call dmant(dbits)
      if( sbits .LT. 23 .OR. sbits .GT. 25 )then
        write(9,*)' note: this single precision is NOT ieee standard'
      endif
      if( dbits .LT. 52 .OR. dbits .GT. 54 )then
        write(9,*)' note: this double precision is NOT ieee standard'
      endif
      if( sbits .EQ. dbits )then
        write(9,*)' evidently single and double are the same in ...'
        write(9,*)' ...this language. we will only run d-flops'
        write(9,*)' '
        write(9,*)' *** now running the d-flops test ***'
        call dflops(dbits)
      else
        write(9,*)' '
        write(9,*)' *** now running the s-flops test ***'
        call sflops(sbits)
        write(9,*)' '
        write(9,*)' *** now running the d-flops test ***'
        call dflops(dbits)
      endif
      write(9,*)' '
      end

      subroutine title
c     char(27)//'E' is ESC E, which clears the vt52 screen
      write(9,*) char(27)//'E'
      write(9,*)' flops benchmark v1.0'
      write(9,*)' 22 april 1987'
      write(9,*)' '
      return
      end

      subroutine smant(i)
      implicit none
      integer*4 i
      real*4 s
      s = 1.
      i = 0
      while( (s+1.) .NE. 1. )
        i = i + 1
        s = s / 2.
      repeat
      return
      end

      subroutine dmant(i)
      implicit none
      integer*4 i
      real*8 d
      d = 1.
      i = 0
      while( (d+1.) .NE. 1. )
        i = i + 1
        d = d / 2.
      repeat
      return
      end

      subroutine sflops(sbits)
      implicit none
      integer*4 sbits
      real*4 a,b,c,d,e,f,g,h,p,q,r,s,t,u,v,w,x,y,z,rating,perror
      real*4 dt,secnds,dtover,error
      integer*4 i,j,k,l,m

      dtover = secnds(0.)
      do( i = 1, 20 )
        q = float(i)
        do( j = 1, 20 )
          r = float(j)
          do( k = 1, 20 )
            s = float(k)
            do( m = 1, 20 )
              t = float(m)
            enddo
          enddo
        enddo
      enddo
      dtover = secnds(dtover)

      p = 1.0
      dt = secnds(0.)
      do( i = 1, 20 )
        q = float(i)
        do( j = 1, 20 )
          r = float(j)
          do( k = 1, 20 )
            s = float(k)
            do( m = 1, 20 )
              t = float(m)
              v = 1./t
              w = s * v
              x = r + w
              y = q - x
              a = y * v
              b = a * t
              c = b - w
              d = c / q
              f = d - r
              g = f * s
              p = p + g
            enddo
          enddo
        enddo
      enddo
      dt = secnds(dt) - dtover
      error = abs(p) - 2.061914565513972d7
      perror = abs(100. * error / 2.061914565513972d7 )
      rating = 11. * float(20*20*20*20) / dt
      write(9,*)' total execution time = ',dt+2.*dtover
      write(9,*)' overhead = ',2.*dtover
      write(9,*)' <*> s-flops rating = ',nint(rating)
      write(9,*)' <*> s-flops cpu time = ',dt
      write(9,*)' <*> real*4 mantissa = ',sbits
      write(9,*)' <*> real*4 length = ',32
      write(9,*)' <*> percent error = ',perror
      return
      end

      subroutine dflops(dbits)
      implicit none
      integer*4 dbits
      real*8 a,b,c,d,e,f,g,h,p,q,r,s,t,u,v,w,x,y,z,rating,perror
      real*8 error
      real*4 dt,secnds,dtover
      integer*4 i,j,k,l,m

      dtover = secnds(0.)
      do( i = 1, 20 )
        q = dble(i)
        do( j = 1, 20 )
          r = dble(j)
          do( k = 1, 20 )
            s = dble(k)
            do( m = 1, 20 )
              t = dble(m)
            enddo
          enddo
        enddo
      enddo
      dtover = secnds(dtover)

      p = 1.d0
      dt = secnds(0.)
      do( i = 1, 20 )
        q = dble(i)
        do( j = 1, 20 )
          r = dble(j)
          do( k = 1, 20 )
            s = dble(k)
            do( m = 1, 20 )
              t = dble(m)
              v = 1./t
              w = s * v
              x = r + w
              y = q - x
              a = y * v
              b = a * t
              c = b - w
              d = c / q
              f = d - r
              g = f * s
              p = p + g
            enddo
          enddo
        enddo
      enddo
      dt = secnds(dt)-dtover
      error = abs(p) - 2.061914565513972d7
      perror = abs(100. * error / 2.061914565513972d7 )
      rating = 11. * dble(20*20*20*20)/dble(dt)
      write(9,*)' total execution time = ',dt+2.*dtover
      write(9,*)' overhead = ',2.*dtover
      write(9,*)' <*> d-flops rating = ',nint(rating)
      write(9,*)' <*> d-flops cpu time = ',dt
      write(9,*)' <*> real*8 mantissa = ',dbits
      write(9,*)' <*> real*8 length = ',64
      write(9,*)' <*> percent error = ',perror
      return
      end

      real*4 function secnds(offset)
      integer*4 Super
      parameter (Super = z'00000902')
      real*4 offset
      integer*4 atari,dummy,stack,systimer,oldstack
      real*4 mspt
c     seconds per tick of the 200 hz system timer
      parameter ( mspt = 5.0e-3 )
c     enter supervisor mode, read the tick count at 0x4ba, switch back
      oldstack = atari( Super, 0 )
      systimer = long(z'4BA')
      dummy = atari( Super, oldstack )
      secnds = -offset + mspt * float(systimer)
      return
      end
############################################################################
/*
 * ratfor version of flops benchmark.
 */
#include <lib\fortran.h>    /* this just defines stdout,stdin and the very */
                            /* few quirky things which vary between vaxf77 */
                            /* and absoft...
                            */
#define OUTERMOST 20
#define INNERMOST 20
#define INNER     20
#define OUTER     20
#define ANSWER    2.061914565513972d7   /* the right answer to 16 figs */

program flops
implicit none
integer*4 sbits,dbits

call title
call smant(sbits)
call dmant(dbits)
if( sbits < 23 || sbits > 25 )then
    write(stdout,*)' note: this single precision is NOT ieee standard'
endif
if( dbits < 52 || dbits > 54 )then
    write(stdout,*)' note: this double precision is NOT ieee standard'
endif
if( sbits == dbits )then
    write(stdout,*)' evidently single and double are the same in ...'
    write(stdout,*)' ...this language. we will only run d-flops'
    write(stdout,*)' '
    write(stdout,*)' *** now running the d-flops test ***'
    call dflops(dbits)
else
    write(stdout,*)' '
    write(stdout,*)' *** now running the s-flops test ***'
    call sflops(sbits)
    write(stdout,*)' '
    write(stdout,*)' *** now running the d-flops test ***'
    call dflops(dbits)
endif
write(stdout,*)' ';
end

/*
 * this just clears the screen ( ESC E ) and then draws the title message.
 */
subroutine title
write(stdout,*) char(27)//'E'
write(stdout,*)' flops benchmark v1.0'
write(stdout,*)' 22 april 1987'
write(stdout,*)' '
return
end

/*
 * this measures how many significant bits are carried along in the
 * mantissa of single precision floating point numbers... does this by
 * repeated division by 2 until stasis is reached.
 */
subroutine smant(i)
implicit none;
integer*4 i;
real*4 s;
s = 1.;
i = 0;
while( (s+1.) != 1. )
    i = i + 1;
    s = s / 2.;
repeat
return
end

/*
 * exactly like smant except for double precision numbers...
 */
subroutine dmant(i)
implicit none;
integer*4 i;
real*8 d;
d = 1.;
i = 0;
while( (d+1.) != 1. )
    i = i + 1;
    d = d / 2.;
repeat
return
end

/*
 * the single precision benchmark.
 *
 */
subroutine sflops(sbits)
implicit none;
integer*4 sbits;
real*4 a,b,c,d,e,f,g,h,p,q,r,s,t,u,v,w,x,y,z,rating,perror;
real*4 dt,secnds,dtover,error;
integer*4 i,j,k,l,m;
/*
 * first figure out what the loop and type conversion overhead is...
 */
dtover = secnds(0.);
do( i = 1, OUTERMOST )
    q = float(i);
    do( j = 1, OUTER )
        r = float(j);
        do( k = 1, INNER )
            s = float(k);
            do( m = 1, INNERMOST )
                t = float(m);
            enddo
        enddo
    enddo
enddo
dtover = secnds(dtover);
/*
 * now do the actual benchmark loop..
 */
p = 1.0;
dt = secnds(0.);
do( i = 1, OUTERMOST )
    q = float(i);
    do( j = 1, OUTER )
        r = float(j);
        do( k = 1, INNER )
            s = float(k);
            do( m = 1, INNERMOST )
                t = float(m);
                v = 1./t;
                w = s * v;
                x = r + w;
                y = q - x;
                a = y * v;
                b = a * t;
                c = b - w;
                d = c / q;
                f = d - r;
                g = f * s;
                p = p + g;
            enddo
        enddo
    enddo
enddo
dt = secnds(dt) - dtover;
error = abs(p) - ANSWER;
perror = abs(100. * error / ANSWER);
rating = 11. * float(OUTERMOST*OUTER*INNER*INNERMOST) / dt;
write(stdout,*)' total execution time = ',dt+2.*dtover;
write(stdout,*)' overhead = ',2.*dtover;
write(stdout,*)' <*> s-flops rating = ',nint(rating);
write(stdout,*)' <*> s-flops cpu time = ',dt;
write(stdout,*)' <*> real*4 mantissa = ',sbits;
write(stdout,*)' <*> real*4 length = ',32;
write(stdout,*)' <*> percent error = ',perror;
end

/*
 * exactly like the single precision test except in double precision
 */
subroutine dflops(dbits)
implicit none;
integer*4 dbits;
real*8 a,b,c,d,e,f,g,h,p,q,r,s,t,u,v,w,x,y,z,rating,perror;
real*8 error;
real*4 dt,secnds,dtover;
integer*4 i,j,k,l,m;
dtover = secnds(0.);
do( i = 1, OUTERMOST )
    q = dble(i);
    do( j = 1, OUTER )
        r = dble(j);
        do( k = 1, INNER )
            s = dble(k);
            do( m = 1, INNERMOST )
                t = dble(m);
            enddo
        enddo
    enddo
enddo
dtover = secnds(dtover);
p = 1.d0;
dt = secnds(0.);
do( i = 1, OUTERMOST )
    q = dble(i);
    do( j = 1, OUTER )
        r = dble(j);
        do( k = 1, INNER )
            s = dble(k);
            do( m = 1, INNERMOST )
                t = dble(m);
                v = 1./t;
                w = s * v;
                x = r + w;
                y = q - x;
                a = y * v;
                b = a * t;
                c = b - w;
                d = c / q;
                f = d - r;
                g = f * s;
                p = p + g;
            enddo
        enddo
    enddo
enddo
dt = secnds(dt)-dtover;
error = abs(p) - ANSWER;
perror = abs(100. * error / ANSWER);
rating = 11. * dble(OUTERMOST*OUTER*INNER*INNERMOST)/dble(dt);
write(stdout,*)' total execution time = ',dt+2.*dtover;
write(stdout,*)' overhead = ',2.*dtover;
write(stdout,*)' <*> d-flops rating = ',nint(rating);
write(stdout,*)' <*> d-flops cpu time = ',dt;
write(stdout,*)' <*> real*8 mantissa = ',dbits;
write(stdout,*)' <*> real*8 length = ',64;
write(stdout,*)' <*> percent error = ',perror;
end

real*4 function secnds(offset)
#include <lib\gemdos.inc>
real*4 offset;
integer*4 atari,dummy,stack,systimer,oldstack;
real*4 mspt;
parameter ( mspt = 5.0e-3 );                 /* seconds per 200 hz tick */
oldstack = atari( Super, 0 );                /* enter supervisor mode, save stack */
systimer = long(z'4BA');                     /* read the system tick counter */
dummy = atari( Super, oldstack );            /* back to user mode, restore stack */
secnds = -offset + mspt * float(systimer);   /* convert ticks to seconds */
return
end
########################################################################
/*
 * atari basic
 * version of the flops benchmark
 * single precision only
 * ( double looked too slow to wait for )
 */
1000 rem:---------------------------------------------
1010 rem:
1020 rem: flops benchmark, atari basic version.
1030 rem:
1040 rem:---------------------------------------------
9920 defsng a-h,p-z
10100 defint i-n
10130 def seg = 0
10135 loc# = 1210
10137 rem: 1210 = $4ba, the 200 hz tick counter; fntimer returns time - z
10140 def fntimer(z) = .005 * peek( loc# ) - z
10145 ANSWER = 2.061914565513972e7
10150 rem:----------------------------------:
10170 rem: mantissa of single precision     :
10190 rem:----------------------------------:
10200 s = 1.
10210 i# = 0
10220 while (s+1.) <> 1.
10230 i# = i# + 1#
10240 s = s / 2.
10250 wend
10255 rem: keep the count; i# is reused as a loop index below
10260 nbits# = i#
13000 rem:-------------------------------------------------
13010 rem:
13020 rem: single precision test...
13030 rem:
13040 rem:-------------------------------------------------
13100 dtover = fntimer(0.)
13200 for i# = 1# to 20
13300 q = float(i#)
13400 for j# = 1 to 20
13500 r = float(j#)
13600 for k# = 1 to 20
13700 s = float(k#)
13800 for m# = 1 to 20
13900 t = float(m#)
14000 next m#
14010 next k#
14020 next j#
14030 next i#
14040 dtover = fntimer(dtover)
15000 p = 1.
15100 dt = fntimer(0.)
15200 for i# = 1 to 20
15300 q = float(i#)
15400 for j# = 1 to 20
15500 r = float(j#)
15600 for k# = 1 to 20
15700 s = float(k#)
15800 for m# = 1 to 20
15900 t = float(m#)
15990 v = 1./t
15991 w = s * v
15992 x = r + w
15993 y = q - x
15994 a = y * v
15995 b = a * t
15996 c = b - w
15997 d = c / q
15998 f = d - r
15999 g = f * s
16000 p = p + g
16001 next m#
16010 next k#
16020 next j#
16030 next i#
16040 dt = fntimer(dt) - dtover
16050 erratum = abs(p) - ANSWER
16060 perror = abs(100.*erratum/ANSWER)
16070 rating = 11. * float(20.*20.*20.*20.)/dt
16073 print " <*> s-flops rating = ";int(rating+.5)
16080 print " <*> s-flops cputime = ";dt
16090 print " <*> single mantissa = ";nbits#
16100 print " <*> percent error = ";perror
16300 end
############################################################################
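All four listings count mantissa bits the same way: halve s until s + 1 no longer differs from 1. In c there is a catch: k&r c always evaluates float arithmetic in double, and even an ansi compiler may use a wider format, so the comparison has to be made on a value that has been stored back into a float. A minimal ansi c sketch, assuming ieee 754 arithmetic; float_mantissa_bits() and double_mantissa_bits() are hypothetical names, and volatile is used here to force the rounding store.

```c
/* count mantissa bits (hidden bit included) by repeated halving;
   volatile forces each sum to be rounded to the declared type */
int float_mantissa_bits(void)
{
    volatile float s = 1.0f;
    volatile float sum = s + 1.0f;
    int bits = 0;

    while (sum != 1.0f) {
        bits++;
        s = s / 2.0f;
        sum = s + 1.0f;   /* 1 + 2^-24 rounds back to 1 in ieee single */
    }
    return bits;          /* 24 for ieee single */
}

int double_mantissa_bits(void)
{
    volatile double d = 1.0;
    volatile double sum = d + 1.0;
    int bits = 0;

    while (sum != 1.0) {
        bits++;
        d = d / 2.0;
        sum = d + 1.0;    /* 1 + 2^-53 rounds back to 1 in ieee double */
    }
    return bits;          /* 53 for ieee double */
}
```

These are the counts that the benchmark's "ieee standard" checks ( 23..25 for single, 52..54 for double ) are centred on.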