COBB@BRANDEIS.BITNET.UUCP (04/13/87)
Date: Mon, 13 Apr 87 00:36 EDT
From: <COBB@BRANDEIS.BITNET> (wes cobb [ cobb@brandeis.bitnet ])
Subject: benchmark battles - round 1.
To: info-atari16@score.stanford.edu
X-Original-To: atari16, COBB
dear benchmarkers,
first of all, here is yet another savage benchmark result. actually the only
reason i am posting this is that it disagrees by more than 10% with a recent
posting which appeared in volume #157 ....
##############################################################################
Savage Benchmark
----------------
                                                      float        int
                                                      size   mant  size
computer     cpu-MHz   fpu-MHz   OS   compiler        bits   bits  bits  accuracy  time
-----------  --------  --------  ---  --------------  -----  ----  ----  --------  -----
atari/st(1)  68000-8   none      tos  absoft fortran  32     24    32    3.92e2    20.67
atari/st(1)  68000-8   none      tos  absoft fortran  64     53    32    1.76e-7   67.41
Notes:
1. atari 520 st with 1 Meg memory upgrade.
##############################################################################
i totally agree with moshe braner's remarks about the savage benchmark -
the silly thing is ONLY really a test for the trig library supplied with
a compiler - it is NOT a reasonable benchmark program to test realistic
floating point performance. it also certainly makes the 68881 look much
much better than it really is - i have spoken to Absoft ( they have done
several compiler versions for different machines which support 68881s )
they tell me that typically one sees about a 5-10x improvement in floating
*,-,+,/ operations with a 68881 and up to a 50x improvement in sin,exp,log
and atan. don't expect to do several Mflops on your ST with a 68881
board...( you may get at most a few hundred Kflops )
a much better test of floating point performance is the whetstone
benchmark. whetstone was based on a study of real applications programs -
the authors studied how often sin, atan, log, exp, *, -, +, /, array
indexing, subroutine calls, and integer arithmetic show up in typical
scientific and engineering oriented programs. i think i have both the
`double` and `single` precision versions of the program running around
here somewhere -- if anyone is interested i guess i could post c and
fortran source to whetstone ... i suppose one can make an argument for
just doing the double precision test ( otherwise virtually no c people
get to take part in testing ) even though one rarely uses double
precision in real fp applications.
i've gotten rather frustrated by the plethora of benchmark results flying
around the nets lately ( yes i played my part in it too! ) - and i would
like to make a couple of suggestions and/or pleas to all of you benchers
out there.
1. what do int and long mean to your compiler?
------------------------------------------
if you are going to run a 'standard' benchmark program on your
favorite compiler there is at least one utterly obvious - and
usually overlooked - rule to follow: you must be sure that
you are using the same size integer and floating point numbers
as everyone else is. now obviously if the benchmark program
was sloppily written - as most unfortunately were - you aren't
going to easily be able to do this ( example: in the Sieve,
is the `int` type used in the loops supposed to be 16 bits
or 32 bits? running the code AS IS will kill your results
in Lattice C just because Lattice uses a 32 bit int size, and
will INCORRECTLY lead you to assume that Lattice is much
slower than it is ). since you CAN'T usually know what was
intended, it is best to explicitly STATE what your int size
is.
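
a minimal sketch of how to find out, assuming your compiler has the ansi `<limits.h>` header ( many older compilers do not, in which case substitute 8 for `CHAR_BIT` ). the function names are hypothetical. print these alongside every result you post.

```c
#include <limits.h>
#include <stdio.h>

/* widths in bits of the integer types as this compiler sees them */
int bits_in_short(void) { return (int)(sizeof(short) * CHAR_BIT); }
int bits_in_int(void)   { return (int)(sizeof(int)   * CHAR_BIT); }
int bits_in_long(void)  { return (int)(sizeof(long)  * CHAR_BIT); }

/* one line suitable for pasting into a benchmark report */
void report_int_sizes(void)
{
    printf("\n short = %d bits, int = %d bits, long = %d bits\n",
           bits_in_short(), bits_in_int(), bits_in_long());
}
```

if the benchmark was written with 16 bit loop counters in mind and `bits_in_int()` says 32, either say so or change the counters to `short` and say that instead.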
2. what do float and double mean to your compiler?
----------------------------------------------
the same problem holds for floating point numbers in an even
more extreme fashion: with floating point numbers not only do
you need to know whether you have 4,6,8, or 16 byte floating
point numbers, it is also crucial to know HOW those bytes
are distributed as mantissas, exponents, and sign bits. It
just doesn't make any sense to compare Whetstone results for
Absoft F77 in single precision ( real*4 with a 24 bit mantissa )
to Lattice C in double precision ( real*8 with a 53 bit
mantissa ) to GFA Basic in middle precision ( real*6 with a
32 bit mantissa ).
c and f77 programs for testing the mantissa size of single and
double precision numbers are appended to the end of this letter.
it should be easy to adapt one of these to any other language
you might want to use.
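
if your compiler happens to ship the ansi `<float.h>` header ( an assumption, plenty of compilers do not ), the same information can be read off at compile time. this is only a sketch and the function names are invented. note that `FLT_MANT_DIG` and `DBL_MANT_DIG` count digits in base `FLT_RADIX`, so they are bit counts only when the radix is 2.

```c
#include <float.h>
#include <limits.h>
#include <stdio.h>

/* mantissa digits in base FLT_RADIX ( bits when the radix is 2 ) */
int float_mantissa_digits(void)  { return FLT_MANT_DIG; }
int double_mantissa_digits(void) { return DBL_MANT_DIG; }

/* total storage size of each type in bits */
int float_size_bits(void)  { return (int)(sizeof(float)  * CHAR_BIT); }
int double_size_bits(void) { return (int)(sizeof(double) * CHAR_BIT); }

void report_float_formats(void)
{
    printf("\n radix = %d", FLT_RADIX);
    printf("\n float : %d bits total, %d digit mantissa",
           float_size_bits(), float_mantissa_digits());
    printf("\n double: %d bits total, %d digit mantissa\n",
           double_size_bits(), double_mantissa_digits());
}
```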
3. always use checksums.
---------------------
if you are going to write or create your own benchmark program
ALWAYS provide some sort of checksum as a means of checking the
accuracy of your answers. there are 2 reasons for this - first
of all some compiler optimizers are clever enough to simply skip
code which is never going to be used for anything outside of a
loop. second, it is all well and good that your compiler has
smeared the world at the BRUTUS benchmark - but if the answer
you ended up with is utter nonsense then what good will it do
you? case in point: megamax-c has an _apparently_ functional
-- albeit slow -- log(x) function which works for x > .5 but
gives wildly inaccurate answers for x approaching 0....why?
the stupid thing apparently uses the WRONG SERIES EXPANSION
for x < .5 !!!
( moral: fast but WRONG is not interesting - supply a checksum )
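
a minimal sketch of the idea, using the familiar 8190 flag sieve loop ( reproduced from memory, so treat the details as an assumption; the function names are made up ). the prime count is returned to the caller so the optimizer cannot throw the loop away, and it is compared with the known answer: there are 1899 odd primes from 3 through 16383.

```c
#include <stdio.h>

#define SIZE 8190
#define EXPECTED_PRIMES 1899      /* odd primes from 3 through 16383 */

static char flags[SIZE + 1];      /* flags[i] stands for the number 2*i + 3 */

/* one pass of the sieve; the count doubles as the checksum */
int sieve_count(void)
{
    int i, k, prime, count = 0;

    for (i = 0; i <= SIZE; i++)
        flags[i] = 1;
    for (i = 0; i <= SIZE; i++) {
        if (flags[i]) {
            prime = i + i + 3;
            for (k = i + prime; k <= SIZE; k += prime)
                flags[k] = 0;     /* strike the odd multiples of prime */
            count++;
        }
    }
    return count;
}

/* returns 1 if the run produced the right answer, 0 otherwise */
int sieve_ok(void)
{
    int count = sieve_count();

    printf("\n %d primes ( expected %d )\n", count, EXPECTED_PRIMES);
    return count == EXPECTED_PRIMES;
}
```

if `sieve_ok()` returns 0, the time is not worth reporting.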
4. timer routines
--------------
a lot of people have been using the xbios gettime() routine for
reporting benchmark times. this is okay IF AND ONLY IF the execution
time for the program was so great that +/- 2 seconds ( the accuracy
of the gettime routine ) doesn't significantly affect the results -
i would argue that this would require execution times of at least
several hundred seconds to give reasonable accuracy. in any event it
is silly to quote something as short as 16 seconds as a benchmark time
using gettime() - ( it could be 14, it could be 18, it could be just
about anything in between )
c and fortran source code for timer routines accurate to +/- .005
second are in the appendix to this letter.
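
the arithmetic behind this is simple enough to sketch ( the function names are hypothetical ): the worst case error of a timing, as a fraction of the measured time, is just the timer resolution divided by the elapsed time. +/- 2 seconds on a 16 second run is +/- 12.5%, and holding the error to 1% with the same timer takes a 200 second run. with a .005 second tick the same 16 second run is good to about 0.03%.

```c
#include <stdio.h>

/* worst case error of one timing, as a percentage of the measured time */
double timing_error_pct(double elapsed_s, double resolution_s)
{
    return 100.0 * resolution_s / elapsed_s;
}

/* shortest run that keeps the error at or below target_pct */
double min_runtime_s(double resolution_s, double target_pct)
{
    return 100.0 * resolution_s / target_pct;
}

void report_timer_error(void)
{
    printf("\n 16 s run, 2 s timer    : +/- %.2f %%",
           timing_error_pct(16.0, 2.0));
    printf("\n 16 s run, .005 s timer : +/- %.3f %%",
           timing_error_pct(16.0, 0.005));
    printf("\n 2 s timer, 1 %% target  : run at least %.0f s\n",
           min_runtime_s(2.0, 1.0));
}
```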
5. system software configuration
-----------------------------
it MATTERS what desk accessories and \auto folder programs you have
installed on your system. in particular things like screensavers,
control panels, foreign operating systems, etc can EASILY make a
10-15% difference in performance - since it isn't practical to keep
vast lists of qualifications explaining exactly what was resident
on benchers' systems during the tests - DON'T RUN BENCHMARKS IF YOU
HAVE DESK ACCESSORIES OR \AUTO\ PROGRAMS loaded. unload them.
THEN run the benchmarks. if you are using MINIX, or OS9, or MTC
then SAY SO - AND BE SURE TO USE ELAPSED CPU-TIME *not* REAL-TIME
in your time reporting.
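
on a multitasking system with a standard c library, a rough sketch of the difference looks like this. it assumes `clock()` and `time()` from `<time.h>` are available and that `clock()` reports processor time charged to your program, which is what the c standard asks for but which you should check on your own system. `cpu_seconds`, `wall_seconds` and `sample_work` are invented names.

```c
#include <stdio.h>
#include <time.h>

static volatile double sink;      /* keeps the optimizer from dropping the work */

/* some arbitrary work to time: a partial harmonic sum */
void sample_work(void)
{
    double s = 0.0;
    long i;

    for (i = 1; i <= 1000000L; i++)
        s += 1.0 / (double)i;
    sink = s;
}

/* processor time used by work(), in seconds */
double cpu_seconds(void (*work)(void))
{
    clock_t start = clock();

    work();
    return (double)(clock() - start) / (double)CLOCKS_PER_SEC;
}

/* real ( wall clock ) time taken by work(); only 1 second resolution */
double wall_seconds(void (*work)(void))
{
    time_t start = time(NULL);

    work();
    return difftime(time(NULL), start);
}
```

on a loaded system `wall_seconds()` grows with whatever else is running, while `cpu_seconds()` should stay roughly put. the second is the number to report.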
6. system hardware configuration
-----------------------------
it MAY matter whether or not you have a 520st, or a 520st + 1meg
upgrade, or a 1040ST!! - for example if your upgrade memory uses
significantly faster or slower RAM than original RAM the system
still has, then depending on what your ramdisk setup is, you may
find that sometimes your program may be executing in fast ram,
and sometimes ( with a different ramdisk size ) it may be executing
in slow ram. this could make a 5-10% difference in benchmark
performance too. it CERTAINLY matters if you have popped a
68010 into your machine. also - if you have a 68881 board on your
system you should say what speed IT is running at since unless you
have a 68020 based system you are likely running in an asynchronous
mode with a different clock speed from the main processor.
( moral: when reporting a benchmark result, if you
have modified the hardware then by all means say so! )
wes cobb ( cobb@brandeis.bitnet )
department of physics
brandeis university
waltham, mass 02254
appendix.( source code mentioned in the body of the letter. )
--------
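/*
 *  mntss.c - tests to see how many bits are in the mantissae of
 *            floats and doubles.
 */
#include <stdio.h>

int main()
{
    long i,j;
    float x;
    double y;
    /*
     * the sums are stored before they are compared. c is free to
     * evaluate 1.+x in double ( or wider ), which would report the
     * double mantissa for floats. on a compiler without `volatile`
     * drop the keyword; the store to a variable is what matters.
     */
    volatile float xsum;
    volatile double ysum;

    i = 0;
    x = 1.;
    do{
        ++i;
        x /= 2.;
        xsum = 1. + x;
    }while( xsum != 1. );
    printf("\n floats  have %ld bit mantissae",i);

    j = 0;
    y = 1.;
    do{
        ++j;
        y /= 2.;
        ysum = 1. + y;
    }while( ysum != 1. );
    printf("\n doubles have %ld bit mantissae\n",j);
    return 0;
}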
*
*  here is fortran code for the same thing...
*  stdout - is a system dependent number.
*           absoft f77  has stdout = 9
*           vax fortran has stdout = 6
*
      program mntss
      integer*4 i,j,stdout
      parameter ( stdout = 9 )
      real*4 x
      real*8 y
      i = 0
      x = 1.
      dowhile( (1.+x) .ne. 1. )
         i = i + 1
         x = x / 2.
      enddo
      write(stdout,*)' floats  have ',i,' bit mantissae '
      j = 0
      y = 1.
      dowhile( (1.+y) .ne. 1. )
         j = j + 1
         y = y / 2.
      enddo
      write(stdout,*)' doubles have ',j,' bit mantissae '
      end
/*
* secnds.c - a timer routine for c
* ( tested with Megamax, Lattice )
*
* usage:
* main()
* {
* double dt,secnds();
* ...
* ...
* dt = secnds(0.);
* ...
* ... whatever is to be timed goes here
* ...
* dt = secnds(dt);
* ...
* printf("\n elapsed time = %7.2f seconds",dt);
* }
*/
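#include <osbind.h>
#define SECONDS_PER_TICK .005

/* returns the 200 Hz tick count in seconds, minus offset */
double secnds(offset)
double offset;
{
    long peek_timer();
    double temp;    /* must be double: a long would truncate to whole seconds */

    /* xbios 38 runs peek_timer in supervisor mode */
    temp = SECONDS_PER_TICK * (double)xbios( 38, &peek_timer ) - offset;
    return(temp);
}

/* reads the system tick counter at 0x4BA ( supervisor mode only ) */
long peek_timer()
{
    long temp2;

    temp2 = *(long *)0x4BA;
    return(temp2);
}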
*
*  fortran timer routine for
*  the atari-st - absoft fortran
*
*  usage:   program test
*           real*8 systimer,dt
*           ...
*           dt = systimer(0.d0)
*           ...
*           ...what you want to time..
*           ...
*           dt = systimer(dt)
*           ...
*           write(9,'('' elapsed time = '',f7.2,'' seconds '')')dt
*           end
*
      real*8 function systimer(offset)
      implicit none
      include lib\gemdos.inc
      integer*4 atari,dummy,systix,oldstack
      real*8 mspt,offset
      parameter ( mspt = 5.0d-3 )              ! seconds per tick ( 5 ms )
      oldstack = atari( Super, 0 )             ! change mode, save stack
      systix   = long(z'4BA')                  ! read the tick counter
      dummy    = atari( Super, oldstack )      ! restore stack
      systimer = -offset + mspt * dble(systix) ! convert ticks to seconds
      return
      end