COBB@BRANDEIS.BITNET (04/23/87)
Date: Wed, 22 Apr 87 23:06 EDT
From: <COBB@BRANDEIS.BITNET> (wes cobb [ cobb@brandeis.bitnet ])
Subject: benchmark battles: part 2.
To: info-atari16@score.stanford.edu
X-Original-To: atari16, COBB
dear benchfolk,
someone ( sorry -- i've lost the reference ) made a remark which in
essence claimed that one should use BASIC if one wished to do fp work
on the ST!!!!!
well i strongly disagree...let's digress for a moment and look at
the code he posted: supposedly the basic program the writer had
tested used...
x = tan(atan(exp(log(sqr(x*x)))))+1.
...as what he was looping over. now in every basic *I* have ever seen
the arc tangent function is called ATN _not_ atan.... i ran the code
he listed under atari basic - there were no error messages - and the
program literally blew fortran and c away. of course since atan
was undefined the program was really just calculating tan(0.) 2500
times so it SHOULD have been pretty fast! when i substituted atn
for the arc tangent function i got about 26 seconds for the single
precision execution time. not bad - but a good 50% slower than
fortran.
also in ATARI basic there is ( as far as i can tell ) no way
at all to even _PERFORM_ a double precision savage benchmark - i.e.
there don't seem to BE double precision forms of the math library
functions. obviously doing the loops in single precision
and then typecasting the results into double precision is NOT legal
-- so where on earth are these double precision atari basic results
coming from?
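to see why the typecast does not count, here is a minimal sketch in
modern c ( not part of the benchmark; the names widened_third,
native_third and show_gap are made up for this illustration ). it
assumes ieee style float and double. a value computed in single
precision and then widened keeps its single precision rounding error:

```c
#include <stdio.h>

/* compute 1/3 in single precision, then widen the stored result */
double widened_third(void)
{
    volatile float s = 1.0f / 3.0f;   /* rounded to a float mantissa here */
    return (double)s;                 /* the cast adds bits, not information */
}

/* compute 1/3 in double precision from the start */
double native_third(void)
{
    return 1.0 / 3.0;
}

/* print both values and the gap between them */
void show_gap(void)
{
    double gap = widened_third() - native_third();

    printf("widened float : %.17g\n", widened_third());
    printf("native double : %.17g\n", native_third());
    printf("difference    : %.3e\n", gap);
}
```

calling show_gap() from main() should show the two values parting
company around the 8th significant digit, which is single precision
accuracy no matter what type the result is stored in.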
if one wants to use benchmark results to choose what language to work
in one must be careful to choose reasonable benchmark programs, and
one has to compare apples with apples and oranges with oranges -- it is
silly to compare 48 bit floating point number benchmark results with 64
bit benchmark results etc.... at any rate it's NUTS to base one's choice
of programming language on ( of all tests ) Savage because:
THE SAVAGE BENCHMARK IS *N*O*T* A GOOD TEST OF FP PERFORMANCE.
the savage tests ONLY math library functions ( tan, atan, exp, log, sqrt ).
period. it isn't even a very good test of THEM. period. since the
overwhelming preponderance of all fp calls, operations, and usage do NOT
involve math library calls it just doesn't make sense to base anything
important on it.
if you MUST base everything on a single test then at least make it
the whetstone benchmark since that at least is a reasonable model of a
real applications program. ( atari basic by the way is about 100x slower
than fortran on the whetstone )
i've included another benchmark program which you may want to try...it
tests the speed of floating *,-,/,+ in c, fortran, ratfor, and basic. even
THIS is surely a more realistic test of floating point behavior than the
savage benchmark is: fp *,-,/,+ occur overwhelmingly more often than
math library calls in real applications code.
wes
############################################################################
THE FLOPS BENCHMARK
results as of
22 april 1987
machine     language    os   rating  cpu-time  mant  size  % error  notes
----------  ----------  ---  ------  --------  ----  ----  -------  -----
atari/st    absoft f77  tos  11,453    153.70    24    32  8.4e-2   1,2
atari/st    absoft f77  tos   6,223    282.80    53    32  1.6e-10  1,2
atari/st    megamax c   tos   3,659    480.95    24    32  2.7e-3   1,3
atari/st    megamax c   tos   1,352   1301.34    53    64  3.4e-10  1,4
atari/st    lattice c   tos   1,227   1433.99    54    64  3.5e-12  1,5
atari/st    basic       tos     607   2899.19    24  (32)  1.3e-3   1,6
----------  ----------  ---  ------  --------  ----  ----  -------  -----
( rating in flops per second, cpu-time in seconds, mant and size in bits )
1. 68000 at 8 MHz. 1 meg ram. no fpa. no desk accs. med rez.
2. absoft fortran v2.2 ( dynamically linked )
3. megamax c v1.00 ( fmath.o )
4. megamax c v1.00 ( double.o )
5. lattice c v3.02 ( doesn't support single precision )
6. atari basic ( i ASSUME 32 bits here... manual doesn't say )
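the "mant" column comes from halving a number until adding it to 1
no longer changes the result. the following is a minimal sketch of
that measurement in modern c, not the listing itself: it assumes an
ansi compiler, and it stores each sum in a volatile variable ( my
addition ) so that the comparison is really made at the precision
being measured.

```c
/* count the halvings of s needed before 1 + s rounds back to 1,
   with the sum held in single precision */
long smant(void)
{
    volatile float sum;
    float s = 1.0f;
    long i = 0;

    sum = s + 1.0f;
    while (sum != 1.0f) {
        i++;
        s = s / 2.0f;
        sum = s + 1.0f;
    }
    return i;
}

/* the same measurement with the sum held in double precision */
long dmant(void)
{
    volatile double sum;
    double d = 1.0;
    long i = 0;

    sum = d + 1.0;
    while (sum != 1.0) {
        i++;
        d = d / 2.0;
        sum = d + 1.0;
    }
    return i;
}
```

on a machine with ieee arithmetic and round to nearest one would
expect 24 and 53 here, the same figures the absoft and megamax rows
show above.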
the flops benchmark - as its name suggests - purports to estimate how
many floating operations per second a machine/compiler combination can
perform. this program is primarily intended to be run on microcomputers
so no effort has been made to allow for machines more than about 10x
faster than a typical supermini.
in the context of this program a floating operation is merely one
of the 4 basic operations { *,/,-,+ } weighted so as to reflect what
would seem to be their frequency of appearance in scientific applications
code: 35% for *, 26% for -, 22% for +, and 17% for /. these numbers
were derived from a study of IMSL source code.
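as a check on those weights, here is a small sketch that counts the
operators in the 11 statement inner loop used by all of the listings.
the string kernel_ops and the function names are made up for this
illustration; the operator order is copied from the loop body.

```c
#include <stdio.h>
#include <string.h>

/* the 11 operators of the inner loop, in source order:
   v=1/t w=s*v x=r+w y=q-x a=y*v b=a*t c=b-w d=c/q f=d-r g=f*s p=p+g */
static const char kernel_ops[] = "/*+-**-/-*+";

/* number of times operator op appears in the kernel */
int count_op(char op)
{
    int n = 0;
    size_t i;

    for (i = 0; i < strlen(kernel_ops); i++)
        if (kernel_ops[i] == op)
            n++;
    return n;
}

/* share of the kernel taken by operator op, in percent */
double percent_op(char op)
{
    return 100.0 * count_op(op) / (double)strlen(kernel_ops);
}

/* print the mix in the same order the text lists it */
void print_mix(void)
{
    const char *ops = "*-+/";
    size_t i;

    for (i = 0; i < strlen(ops); i++)
        printf("%c : %d of %d ( %.1f%% )\n", ops[i], count_op(ops[i]),
               (int)strlen(kernel_ops), percent_op(ops[i]));
}
```

the counts come out as 4 multiplies, 3 subtracts, 2 adds and 2
divides, i.e. roughly 36%, 27%, 18% and 18%. with only 11 operations
to hand out, that is about as close to 35/26/22/17 as the loop can get.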
the program does use several nested loops and typecasts but -- never
fear -- the overhead these require is subtracted out of the total
since we are only really interested here in flops. note that the
overhead is typically about 50% of the flops cputime total.
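the arithmetic behind the rating can be sketched as follows. this is
an illustration in modern c rather than a piece of the benchmark; the
function and parameter names are assumptions of mine, and the factor
of 11 is the number of floating operations in one pass of the
innermost loop.

```c
/* rating as the listings compute it: 11 floating operations per pass,
   divided by the loop time with the overhead removed.
   loop_seconds     = measured time of the full benchmark loops
   overhead_seconds = measured time of the same loops with an empty body */
double flops_rating(long passes, double loop_seconds, double overhead_seconds)
{
    double net = loop_seconds - overhead_seconds;

    return 11.0 * (double)passes / net;
}

/* overhead expressed as a fraction of the net flops time */
double overhead_fraction(double loop_seconds, double overhead_seconds)
{
    return overhead_seconds / (loop_seconds - overhead_seconds);
}
```

with 20*20*20*20 = 160000 passes and the 153.70 second net time from
the first row of the table this gives 11 * 160000 / 153.70, or about
11451, close to the 11,453 listed there.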
the program makes calls to system dependent timing routines. for the
atari st ( well at least for a monotasking atari st running TOS ) these
routines are provided in the code. if you are running this on another
system you will have to improvise.
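one way to improvise, assuming a compiler that supplies the standard
c clock() routine, is sketched below. it keeps the calling convention
of the secnds() routine in the listings ( pass 0. to read the clock,
pass an earlier reading to get the elapsed time ) but it is a
stand-in of mine, not the code that produced the table.

```c
#include <time.h>

/* stand-in for the listings' secnds(): seconds of cpu time used so
   far, minus offset. relies only on the standard c clock() call. */
double secnds(double offset)
{
    return (double)clock() / (double)CLOCKS_PER_SEC - offset;
}
```

usage follows the listings: start = secnds(0.); run the loops; then
elapsed = secnds(start);. note that clock() reports cpu time while the
st routine reads a 200 ticks per second system timer; on a
monotasking machine the two should amount to the same thing, on a
multitasking one they may not.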
please send results to:
electronic mail: cobb@brandeis.bitnet
ci$ [ 72155, 1422 ]
snail mail: wes cobb
dept. of physics
brandeis university
waltham, mass 02254
usa
############################################################################
/*
* c version of flops
* tested with
* lattice && megamax
*/
#include <stdio.h>
#include <osbind.h>
#define OUTERMOST 20
#define INNERMOST 20
#define INNER 20
#define OUTER 20
#define ANSWER 2.061914565513972e7
main()
{
long sbits,dbits,smant(),dmant();
title();
sbits = smant();
dbits = dmant();
if( sbits < 23 || sbits > 25 ){
printf("\n this single precision is NOT ieee standard...");
}
if( dbits < 52 || dbits > 54 ){
printf("\n this double precision is NOT ieee standard...");
}
if( sbits == dbits ){
printf("\n ...there is evidently no difference between `float`");
printf("\n and `double` in this c implementation.");
printf("\n ( only going to run d-flops ... )");
dflops(dbits);
exit(0);
}
sflops(sbits);
dflops(dbits);
printf("\n ");
}
title(){
printf("\n \033E"); /* ESC E: clear the screen ( vt52 ) */
printf("\n flops benchmark v1.0");
printf("\n 22 april 1987");
printf("\n ");
}
long smant()
{
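float s,sum;
long i;
s = 1.;
i = 0;
sum = s + 1.; /* store the sum in a float so the test is made in single precision */
while( sum != 1. ){
i++;
s = s / 2.;
sum = s + 1.;
}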
return(i);
}
long dmant()
{
long i;
double d;
d = 1.;
i = 0;
while( (d+1.) != 1. ){
i++;
d = d / 2.;
}
return(i);
}
sflops(sbits)
long sbits;
{
float a,b,c,d,e,f,g,h,p,q,r,s,t,u,v,w,x,y,z,rating,perror,error;
double dt,secnds(),dtover;
long i,j,k,l,m;
dtover = secnds(0.);
for( i = 1; i <= OUTERMOST ;i++){
q = (float)(i);
for( j = 1; j<= OUTER ;j++){
r = (float)(j);
for( k = 1;k <= INNER ;k++){
s = (float)(k);
for( m = 1; m <= INNERMOST ;m++){
t = (float)(m);
}
}
}
}
dtover = secnds(dtover);
p = 1.;
dt = secnds(0.);
for( i = 1; i <= OUTERMOST ;i++){
q = (float)(i);
for( j = 1;j <= OUTER ;j++){
r = (float)(j);
for( k = 1;k <= INNER ;k++){
s = (float)(k);
for( m = 1;m <= INNERMOST ;m++){
t = (float)(m);
v = 1./t;
w = s * v;
x = r + w;
y = q - x;
a = y * v;
b = a * t;
c = b - w;
d = c / q;
f = d - r;
g = f * s;
p = p + g;
}
}
}
}
dt = secnds(dt) - dtover;
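error = ( p < 0. ? -p : p ) - ANSWER; /* abs() is an integer routine in c */
perror = 100. * error / ANSWER;
if( perror < 0. ) perror = -perror;
rating = 11. * OUTERMOST * OUTER * INNER * INNERMOST / dt; /* floating product: 160000 overflows a 16 bit int */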
printf("\n total execution time = %7.2f ",dt+2.*dtover);
printf("\n overhead = %7.2f ",2.*dtover);
printf("\n <*> s-flops rating = %ld ",(long)(rating+.5));
printf("\n <*> s-flops cpu time = %7.2f ",dt);
printf("\n <*> float mantissa = %ld ",sbits);
printf("\n <*> percent error = %12.5e ",perror);
}
dflops(dbits)
long dbits;
{
double a,b,c,d,e,f,g,h,p,q,r,s,t,u,v,w,x,y,z,rating,perror;
double error;
double dt,secnds(),dtover;
long i,j,k,l,m;
dtover = secnds(0.);
for( i = 1; i <= OUTERMOST ;i++){
q = (double)(i);
for( j = 1;j <= OUTER ;j++){
r = (double)(j);
for( k = 1;k <= INNER ;k++){
s = (double)(k);
for( m = 1;m <= INNERMOST;m++ ){
t = (double)(m);
}
}
}
}
dtover = secnds(dtover);
p = 1.;
dt = secnds(0.);
for( i = 1; i <= OUTERMOST; i++ ){
q = (double)(i);
for( j = 1;j <= OUTER; j++){
r = (double)(j);
for( k = 1;k <= INNER;k++ ){
s = (double)(k);
for( m = 1; m<= INNERMOST;m++ ){
t = (double)(m);
v = 1./t;
w = s * v;
x = r + w;
y = q - x;
a = y * v;
b = a * t;
c = b - w;
d = c / q;
f = d - r;
g = f * s;
p = p + g;
}
}
}
}
dt = secnds(dt)-dtover;
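error = ( p < 0. ? -p : p ) - ANSWER; /* abs() is an integer routine in c */
perror = 100. * error / ANSWER;
if( perror < 0. ) perror = -perror;
rating = 11. * OUTERMOST * OUTER * INNER * INNERMOST / dt; /* floating product: 160000 overflows a 16 bit int */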
printf("\n total execution time = %7.2f ",dt+2.*dtover);
printf("\n overhead = %7.2f ",2.*dtover);
printf("\n <*> d-flops rating = %ld ",(long)(rating+.5));
printf("\n <*> d-flops cpu time = %7.2f ",dt);
printf("\n <*> double mantissa = %ld ",dbits);
printf("\n <*> percent error = %12.5e ",perror);
}
double secnds(offset)
double offset;
{
long peek();
double temp;
temp = .005 * (double)xbios( 38, &peek ) - offset ;
return(temp);
}
long peek()
{
long temp2;
temp2 = *(long *)0x4ba;
return(temp2);
}
############################################################################
/*
* absoft fortran 77
* version of the flops benchmark
* this is what the ratfor source ends up as...
*/
program flops
implicit none
integer*4 sbits,dbits
call title
call smant(sbits)
call dmant(dbits)
if( sbits .LT. 23 .OR. sbits .GT. 25 )then
write(9,*)' note: this single precision is NOT ieee standard'
endif
if( dbits .LT. 52 .OR. dbits .GT. 54 )then
write(9,*)' note: this double precision is NOT ieee standard'
endif
if( sbits .EQ. dbits )then
write(9,*)' evidently single and double are the same in ...'
write(9,*)' ...this language. we will only run d-flops'
write(9,*)' '
write(9,*)' *** now running the d-flops test ***'
call dflops(dbits)
else
write(9,*)' '
write(9,*)' *** now running the s-flops test ***'
call sflops(sbits)
write(9,*)' '
write(9,*)' *** now running the d-flops test ***'
call dflops(dbits)
endif
write(9,*)' '
end
subroutine title
write(9,*)char(27)//'E'
write(9,*)' flops benchmark v1.0'
write(9,*)' 22 april 1987'
write(9,*)' '
return
end
subroutine smant(i)
implicit none
integer*4 i
real*4 s
s = 1.
i = 0
while( (s+1.) .NE. 1. )
i = i + 1
s = s / 2.
repeat
return
end
subroutine dmant(i)
implicit none
integer*4 i
real*8 d
d = 1.
i = 0
while( (d+1.) .NE. 1. )
i = i + 1
d = d / 2.
repeat
return
end
subroutine sflops(sbits)
implicit none
integer*4 sbits
real*4 a,b,c,d,e,f,g,h,p,q,r,s,t,u,v,w,x,y,z,rating,perror
real*4 dt,secnds,dtover,error
integer*4 i,j,k,l,m
dtover = secnds(0.)
do( i = 1, 20 )
q = float(i)
do( j = 1, 20 )
r = float(j)
do( k = 1, 20 )
s = float(k)
do( m = 1, 20 )
t = float(m)
enddo
enddo
enddo
enddo
dtover = secnds(dtover)
p = 1.0
dt = secnds(0.)
do( i = 1, 20 )
q = float(i)
do( j = 1, 20 )
r = float(j)
do( k = 1, 20 )
s = float(k)
do( m = 1, 20 )
t = float(m)
v = 1./t
w = s * v
x = r + w
y = q - x
a = y * v
b = a * t
c = b - w
d = c / q
f = d - r
g = f * s
p = p + g
enddo
enddo
enddo
enddo
dt = secnds(dt) - dtover
error = abs(p) - 2.061914565513972d7
perror = abs(100. * error / 2.061914565513972d7 )
rating = 11. * float(20*20*20*20) / dt
write(9,*)' total execution time = ',dt+2.*dtover
write(9,*)' overhead = ',2.*dtover
write(9,*)' <*> s-flops rating = ',nint(rating)
write(9,*)' <*> s-flops cpu time = ',dt
write(9,*)' <*> real*4 mantissa = ',sbits
write(9,*)' <*> real*4 length = ',32
write(9,*)' <*> percent error = ',perror
return
end
subroutine dflops(dbits)
implicit none
integer*4 dbits
real*8 a,b,c,d,e,f,g,h,p,q,r,s,t,u,v,w,x,y,z,rating,perror
real*8 error
real*4 dt,secnds,dtover
integer*4 i,j,k,l,m
dtover = secnds(0.)
do( i = 1, 20 )
q = dble(i)
do( j = 1, 20 )
r = dble(j)
do( k = 1, 20 )
s = dble(k)
do( m = 1, 20 )
t = dble(m)
enddo
enddo
enddo
enddo
dtover = secnds(dtover)
p = 1.d0
dt = secnds(0.)
do( i = 1, 20 )
q = dble(i)
do( j = 1, 20 )
r = dble(j)
do( k = 1, 20 )
s = dble(k)
do( m = 1, 20 )
t = dble(m)
v = 1./t
w = s * v
x = r + w
y = q - x
a = y * v
b = a * t
c = b - w
d = c / q
f = d - r
g = f * s
p = p + g
enddo
enddo
enddo
enddo
dt = secnds(dt)-dtover
error = abs(p) - 2.061914565513972d7
perror = abs(100. * error / 2.061914565513972d7 )
rating = 11. * dble(20*20*20*20)/dble(dt)
write(9,*)' total execution time = ',dt+2.*dtover
write(9,*)' overhead = ',2.*dtover
write(9,*)' <*> d-flops rating = ',nint(rating)
write(9,*)' <*> d-flops cpu time = ',dt
write(9,*)' <*> real*8 mantissa = ',dbits
write(9,*)' <*> real*8 length = ',64
write(9,*)' <*> percent error = ',perror
return
end
real*4 function secnds(offset)
integer*4 Super
parameter (Super = z'00000902')
real*4 offset
integer*4 atari,dummy,stack,systimer,oldstack
real*4 mspt
parameter ( mspt = 5.0e-3 )
oldstack = atari( Super, 0 )
systimer = long(z'4BA')
dummy = atari( Super, oldstack )
secnds = -offset + mspt * float(systimer)
return
end
############################################################################
/*
* ratfor version of flops benchmark.
*/
#include <lib\fortran.h> /* this just defines stdout,stdin and the very */
/* few quirky things which vary between vaxf77 */
/* and absoft... */
#define OUTERMOST 20
#define INNERMOST 20
#define INNER 20
#define OUTER 20
#define ANSWER 2.061914565513972d7 /* the right answer to 16 figs */
program flops
implicit none
integer*4 sbits,dbits
call title
call smant(sbits)
call dmant(dbits)
if( sbits < 23 || sbits > 25 )then
write(stdout,*)' note: this single precision is NOT ieee standard'
endif
if( dbits < 52 || dbits > 54 )then
write(stdout,*)' note: this double precision is NOT ieee standard'
endif
if( sbits == dbits )then
write(stdout,*)' evidently single and double are the same in ...'
write(stdout,*)' ...this language. we will only run d-flops'
write(stdout,*)' '
write(stdout,*)' *** now running the d-flops test ***'
call dflops(dbits)
else
write(stdout,*)' '
write(stdout,*)' *** now running the s-flops test ***'
call sflops(sbits)
write(stdout,*)' '
write(stdout,*)' *** now running the d-flops test ***'
call dflops(dbits)
endif
write(stdout,*)' ';
end
/*
* this just clears the screen and then draws the title message.
*/
subroutine title
write(stdout,*)char(27)//'E'
write(stdout,*)' flops benchmark v1.0'
write(stdout,*)' 22 april 1987'
write(stdout,*)' '
return
end
/*
* this measures how many significant bits are carried along in the
* mantissa of single precision floating point numbers...does this by
* repeated division by 2 until stasis is reached.
*/
subroutine smant(i)
implicit none;
integer*4 i;
real*4 s;
s = 1.;
i = 0;
while( (s+1.) != 1. )
i = i + 1;
s = s / 2.;
repeat
return
end
/*
* exactly like smant except for double precision numbers...
*/
subroutine dmant(i)
implicit none;
integer*4 i;
real*8 d;
d = 1.;
i = 0;
while( (d+1.) != 1. )
i = i + 1;
d = d / 2.;
repeat
return
end
/*
* the single precision benchmark.
*
*/
subroutine sflops(sbits)
implicit none;
integer*4 sbits;
real*4 a,b,c,d,e,f,g,h,p,q,r,s,t,u,v,w,x,y,z,rating,perror;
real*4 dt,secnds,dtover,error;
integer*4 i,j,k,l,m;
/*
* first figure out what the loop and type conversion overhead is...
*/
dtover = secnds(0.);
do( i = 1, OUTERMOST )
q = float(i);
do( j = 1, OUTER )
r = float(j);
do( k = 1, INNER )
s = float(k);
do( m = 1, INNERMOST )
t = float(m);
enddo
enddo
enddo
enddo
dtover = secnds(dtover);
/*
* now do the actual benchmark loop..
*/
p = 1.0;
dt = secnds(0.);
do( i = 1, OUTERMOST )
q = float(i);
do( j = 1, OUTER )
r = float(j);
do( k = 1, INNER )
s = float(k);
do( m = 1, INNERMOST )
t = float(m);
v = 1./t;
w = s * v;
x = r + w;
y = q - x;
a = y * v;
b = a * t;
c = b - w;
d = c / q;
f = d - r;
g = f * s;
p = p + g;
enddo
enddo
enddo
enddo
dt = secnds(dt) - dtover;
error = abs(p) - ANSWER;
perror = abs(100. * error / ANSWER);
rating = 11. * float(OUTERMOST*OUTER*INNER*INNERMOST) / dt;
write(stdout,*)' total execution time = ',dt+2.*dtover;
write(stdout,*)' overhead = ',2.*dtover;
write(stdout,*)' <*> s-flops rating = ',nint(rating);
write(stdout,*)' <*> s-flops cpu time = ',dt;
write(stdout,*)' <*> real*4 mantissa = ',sbits;
write(stdout,*)' <*> real*4 length = ',32;
write(stdout,*)' <*> percent error = ',perror;
end
/*
* exactly like the single precision test except in double precision
*/
subroutine dflops(dbits)
implicit none;
integer*4 dbits;
real*8 a,b,c,d,e,f,g,h,p,q,r,s,t,u,v,w,x,y,z,rating,perror;
real*8 error;
real*4 dt,secnds,dtover;
integer*4 i,j,k,l,m;
dtover = secnds(0.);
do( i = 1, OUTERMOST )
q = dble(i);
do( j = 1, OUTER )
r = dble(j);
do( k = 1, INNER )
s = dble(k);
do( m = 1, INNERMOST )
t = dble(m);
enddo
enddo
enddo
enddo
dtover = secnds(dtover);
p = 1.d0;
dt = secnds(0.);
do( i = 1, OUTERMOST )
q = dble(i);
do( j = 1, OUTER )
r = dble(j);
do( k = 1, INNER )
s = dble(k);
do( m = 1, INNERMOST )
t = dble(m);
v = 1./t;
w = s * v;
x = r + w;
y = q - x;
a = y * v;
b = a * t;
c = b - w;
d = c / q;
f = d - r;
g = f * s;
p = p + g;
enddo
enddo
enddo
enddo
dt = secnds(dt)-dtover;
error = abs(p) - ANSWER;
perror = abs(100. * error / ANSWER);
rating = 11. * dble(OUTERMOST*OUTER*INNER*INNERMOST)/dble(dt);
write(stdout,*)' total execution time = ',dt+2.*dtover;
write(stdout,*)' overhead = ',2.*dtover;
write(stdout,*)' <*> d-flops rating = ',nint(rating);
write(stdout,*)' <*> d-flops cpu time = ',dt;
write(stdout,*)' <*> real*8 mantissa = ',dbits;
write(stdout,*)' <*> real*8 length = ',64;
write(stdout,*)' <*> percent error = ',perror;
end
real*4 function secnds(offset)
#include <lib\gemdos.inc>
real*4 offset;
integer*4 atari,dummy,stack,systimer,oldstack;
real*4 mspt;
parameter ( mspt = 5.0e-3 ); /* milli seconds per tick */
oldstack = atari( Super, 0 ); /* save stack */
systimer = long(z'4BA'); /* change mode and read */
dummy = atari( Super, oldstack ); /* timer, and restore stack */
secnds = -offset + mspt * float(systimer); /* convert ticks to seconds */
return
end
########################################################################
/*
* atari basic
* version of the flops benchmark
* single precision only
* ( double looked too slow to wait for )
*/
1000 rem:---------------------------------------------
1010 rem:
1020 rem: flops benchmark, atari basic version.
1030 rem:
1040 rem:---------------------------------------------
9920 defsng a-h,p-z
10100 defint i-n
10130 def seg = 0
10135 loc# = 1210
10140 def fntimer(z) = .005 * peek( loc# ) - z
10145 ANSWER = 2.061914565513972e7
10150 rem:----------------------------------:
10170 rem: mantissa of single precision :
10190 rem:----------------------------------:
10200 s = 1.
10210 i# = 0
10220 while (s+1.) <> 1.
10230 i# = i# + 1#
10240 s = s / 2.
10250 wend
13000 rem:-------------------------------------------------
13010 rem:
13020 rem: single precision test...
13030 rem:
13040 rem:-------------------------------------------------
13100 dtover = fntimer(0.)
13200 for i# = 1# to 20
13300 q = float(i#)
13400 for j# = 1 to 20
13500 r = float(j#)
13600 for k# = 1 to 20
13700 s = float(k#)
13800 for m# = 1 to 20
13900 t = float(m#)
14000 next m#
14010 next k#
14020 next j#
14030 next i#
14040 dtover = fntimer(dtover)
15000 p = 1.
15100 dt = fntimer(0.)
15200 for i# = 1 to 20
15300 q = float(i#)
15400 for j# = 1 to 20
15500 r = float(j#)
15600 for k# = 1 to 20
15700 s = float(k#)
15800 for m# = 1 to 20
15900 t = float(m#)
15990 v = 1./t
15991 w = s * v
15992 x = r + w
15993 y = q - x
15994 a = y * v
15995 b = a * t
15996 c = b - w
15997 d = c / q
15998 f = d - r
15999 g = f * s
16000 p = p + g
16001 next m#
16010 next k#
16020 next j#
16030 next i#
16040 dt = fntimer(dt) - dtover
16050 erratum = abs(p) - ANSWER
16060 perror = abs(100.*erratum/ANSWER)
16070 rating = 11. * float(20.*20.*20.*20.)/dt
16073 print " <*> s-flops rating = ";int(rating+.5)
16080 print " <*> s-flops cputime = ";dt
16090 print " <*> single mantissa = ";i#
16100 print " <*> percent error = ";perror
16300 end
############################################################################