COBB@BRANDEIS.BITNET.UUCP (04/13/87)
Date: Mon, 13 Apr 87 00:36 EDT
From: <COBB@BRANDEIS.BITNET> (wes cobb [ cobb@brandeis.bitnet ])
Subject: benchmark battles - round 1.
To: info-atari16@score.stanford.edu
X-Original-To: atari16, COBB

dear benchmarkers,

first of all, here is yet another savage benchmark result. actually the
only reason i am posting this is that it disagrees by more than 10% with
a recent posting which appeared in volume #157 ....

##############################################################################
Savage Benchmark
----------------
                                                 float     int
                                                 size mant size
computer    cpu-MHz  fpu-MHz  OS  compiler       bits bits bits accuracy time
----------- -------- -------- --- -------------- ---- ---- ---- -------- -----
atari/st(1) 68000-8  none     tos absoft fortran 32   24   32   3.92e2   20.67
atari/st(1) 68000-8  none     tos absoft fortran 64   53   32   1.76e-7  67.41

Notes:
  1. atari 520 st with 1 Meg memory upgrade.
##############################################################################

i totally agree with moshe braner's remarks about the savage benchmark -
the silly thing is ONLY really a test for the trig library supplied with
a compiler - it is NOT a reasonable benchmark program to test realistic
floating point performance. it also certainly makes the 68881 look much
much better than it really is - i have spoken to Absoft ( they have done
several compiler versions for different machines which support 68881s )
and they tell me that typically one sees about a 5-10x improvement in
floating *,-,+,/ operations with a 68881 and up to a 50x improvement in
sin,exp,log and atan. don't expect to do several Mflops on your ST with
a 68881 board...( you may get at most a few hundred Kflops )

a much better test of floating point performance is the whetstone
benchmark. whetstone was based on a study of real applications programs -
the authors studied how often sin, atan, log, exp, *, -, +, /, array
indexing, subroutine calls, and integer arithmetic show up in typical
scientific and engineering oriented programs.
i think i have both the `double` and `single` precision versions of the
program running around here somewhere -- if anyone is interested i guess
i could post c and fortran source to whetstone ... i suppose one can make
an argument for just doing the double precision test ( otherwise
virtually no c people get to take part in testing ) even though one
rarely uses double precision in real fp applications.

i've gotten rather frustrated by the plethora of benchmark results flying
around the nets lately ( yes i played my part in it too! ) - and i would
like to make a couple of suggestions and/or pleas to all of you benchers
out there.

1. what do int and long mean to your compiler?
   -------------------------------------------
if you are going to run a 'standard' benchmark program on your favorite
compiler there is at least one utterly obvious - and usually overlooked -
rule to follow: you must be sure that you are using the same size integer
and floating point numbers as everyone else is. now obviously if the
benchmark program was sloppily written - as most unfortunately were - you
aren't going to easily be able to do this ( example: in the Sieve, is the
`int` type used in the loops supposed to be 16 bits or 32 bits? running
the code AS IS will kill your results in Lattice C just because Lattice
uses a 32 bit int size, and will INCORRECTLY lead you to assume that
Lattice is much slower than it is ). since you CAN'T usually know what
was intended, it is best to explicitly STATE what your int size is.

2. what do float and double mean to your compiler?
   -----------------------------------------------
the same problem holds for floating point numbers in an even more extreme
fashion: with floating point numbers not only do you need to know whether
you have 4, 6, 8, or 16 byte floating point numbers, it is also crucial
to know HOW those bytes are distributed as mantissas, exponents, and sign
bits.
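
a minimal sketch of how one might collect the numbers asked for in points
1 and 2, so that they can be quoted next to a benchmark time. it assumes
an ANSI-style c compiler that supplies <limits.h> and <float.h> ( older
compilers may not ); type_bits and report_types are illustrative names,
not part of any library.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <stdio.h>

/* number of bits in an object that occupies nbytes bytes */
int type_bits(size_t nbytes)
{
    return (int)(nbytes * CHAR_BIT);
}

/* print the type sizes that should accompany a benchmark result */
void report_types(FILE *out)
{
    /* FLT_MANT_DIG and DBL_MANT_DIG count digits in base FLT_RADIX,
       so they are bit counts only when the radix is 2 */
    assert(FLT_RADIX == 2);

    fprintf(out, "int    : %d bits\n", type_bits(sizeof(int)));
    fprintf(out, "long   : %d bits\n", type_bits(sizeof(long)));
    fprintf(out, "float  : %d bits, %d bit mantissa\n",
            type_bits(sizeof(float)), FLT_MANT_DIG);
    fprintf(out, "double : %d bits, %d bit mantissa\n",
            type_bits(sizeof(double)), DBL_MANT_DIG);
}
```

calling report_types(stdout) from main() prints one line per type. on a
machine with IEEE floating point the mantissa figures come out as 24 and
53 bits, the same sizes listed in the savage table above.
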
it just doesn't make any sense to compare Whetstone results for Absoft
F77 in single precision ( real*4 with a 24 bit mantissa ) to Lattice C in
double precision ( real*8 with a 53 bit mantissa ) to GFA Basic in middle
precision ( real*6 with a 32 bit mantissa ).

c and f77 programs for testing the mantissa size of single and double
precision numbers are appended to the end of this letter. it should be
easy to adapt one of these to any other language you might want to use.

3. always use checksums.
   ---------------------
if you are going to write or create your own benchmark program ALWAYS
provide some sort of checksum as a means of checking the accuracy of your
answers. there are 2 reasons for this - first of all some compiler
optimizers are clever enough to simply skip code whose results are never
used for anything outside of a loop. second, it is all well and good that
your compiler has smeared the world at the BRUTUS benchmark - but if the
answer you ended up with is utter nonsense then what good will it do you?

case in point: megamax-c has an _apparently_ functional -- albeit slow --
log(x) function which works for x > .5 but gives wildly inaccurate
answers for x approaching 0....why? the stupid thing apparently uses the
WRONG SERIES EXPANSION for x < .5 !!!

( moral: fast but WRONG is not interesting - supply a checksum )

4. timer routines
   --------------
a lot of people have been using the xbios gettime() routine for reporting
benchmark times. this is okay IF AND ONLY IF the execution time for the
program was so great that +/- 2 seconds ( the accuracy of the gettime
routine ) doesn't significantly affect the results - i would argue that
this would require execution times of at least several hundred seconds to
give reasonable accuracy.
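
to put numbers on the +/- 2 second figure: the worst-case relative error
of a timing is just the clock resolution divided by the elapsed time. the
two helpers below are a small sketch of that arithmetic; the names
timing_error_pct and min_run_seconds are made up for illustration.

```c
#include <assert.h>

/* worst-case error, in percent, of a run lasting `elapsed` seconds
   when the clock is only good to +/- `resolution` seconds */
double timing_error_pct(double elapsed, double resolution)
{
    assert(elapsed > 0.0);
    return 100.0 * resolution / elapsed;
}

/* shortest run, in seconds, that keeps the worst-case error
   at or below `pct` percent for the given clock resolution */
double min_run_seconds(double resolution, double pct)
{
    assert(pct > 0.0);
    return 100.0 * resolution / pct;
}
```

timing_error_pct(16., 2.) gives 12.5, i.e. a 16 second run timed to +/- 2
seconds is only good to 12.5%. min_run_seconds(2., 1.) gives 200, so
holding the error to 1% with such a clock takes a run of at least 200
seconds. with a clock good to +/- .005 second the same 16 second run is
good to roughly 0.03%.
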
in any event it is silly to quote something as short as 16 seconds as a
benchmark time using gettime() - ( it could be 14, it could be 18, it
could be just about anything in between )

c and fortran source code for timer routines accurate to +/- .005 second
are in the appendix to this letter.

5. system software configuration
   -----------------------------
it MATTERS what desk accessories and \auto folder programs you have
installed on your system. in particular things like screensavers, control
panels, foreign operating systems, etc can EASILY make a 10-15%
difference in performance - since it isn't practical to keep vast lists
of qualifications explaining exactly what was resident on benchers'
systems during the tests - DON'T RUN BENCHMARKS IF YOU HAVE DESK
ACCESSORIES OR \AUTO\ PROGRAMS loaded. unload them. THEN run the
benchmarks.

if you are using MINIX, or OS9, or MTC then SAY SO - AND BE SURE TO USE
ELAPSED CPU-TIME *not* REAL-TIME in your time reporting.

6. system hardware configuration
   -----------------------------
it MAY matter whether or not you have a 520st, or a 520st + 1meg upgrade,
or a 1040ST!! - for example if your upgrade memory uses significantly
faster or slower RAM than the original RAM the system still has, then
depending on what your ramdisk setup is, you may find that sometimes your
program is executing in fast ram, and sometimes ( with a different
ramdisk size ) it is executing in slow ram. this could make a 5-10%
difference in benchmark performance too.

it CERTAINLY matters if you have popped a 68010 into your machine.

also - if you have a 68881 board on your system you should say what speed
IT is running at since unless you have a 68020 based system you are
likely running in an asynchronous mode with a different clock speed from
the main processor.

( moral: when reporting a benchmark result, if you have modified the
hardware then by all means say so! )

wes cobb ( cobb@brandeis.bitnet )
department of physics
brandeis university
waltham, mass 02254

appendix.( source code mentioned in the body of the letter. )
--------

/*
 * mntss.c - tests to see how many bits are in the mantissae of
 *           floats and doubles.
 */
#include <stdio.h>

main()
{
    long i,j;
    float x,xs;
    double y,ys;

    i = 0;
    x = 1.;
    do{
        ++i;
        x /= 2.;
        xs = 1. + x;    /* store the sum so it is rounded to float; */
    }while( xs != 1. ); /* c does the arithmetic itself in double    */
    printf("\n floats have %ld bit mantissae",i);

    j = 0;
    y = 1.;
    do{
        ++j;
        y /= 2.;
        ys = 1. + y;    /* store the sum so it is rounded to double */
    }while( ys != 1. );
    printf("\n doubles have %ld bit mantissae",j);
}

*
* here is fortran code for the same thing...
* stdout - is a system dependent number.
*          absoft f77 has stdout = 9
*          vax fortran has stdout = 6
*
      program mntss
      integer*4 i,j,stdout
      parameter ( stdout = 9 )
      real*4 x
      real*8 y
      i = 0
      x = 1.
      dowhile( (1.+x) .ne. 1. )
        i = i + 1
        x = x / 2.
      enddo
      write(stdout,*)' floats have ',i,' bit mantissae '
      j = 0
      y = 1.
      dowhile( (1.+y) .ne. 1. )
        j = j + 1
        y = y / 2.
      enddo
      write(stdout,*)' doubles have ',j,' bit mantissae '
      end

/*
 * secnds.c - a timer routine for c
 *            ( tested with Megamax, Lattice )
 *
 * usage:
 *      main()
 *      {
 *          double dt,secnds();
 *          ...
 *          dt = secnds(0.);
 *          ...
 *          ... whatever is to be timed goes here
 *          ...
 *          dt = secnds(dt);
 *          ...
 *          printf("\n elapsed time = %7.2f seconds",dt);
 *      }
 */
#include <osbind.h>
#define SECONDS_PER_TICK .005

double secnds(offset)
double offset;
{
    long peek_timer();
    double temp;    /* must be double: a long would drop the fraction */

    /* xbios 38 runs peek_timer in supervisor mode */
    temp = SECONDS_PER_TICK * (double)xbios( 38, &peek_timer ) - offset;
    return(temp);
}

long peek_timer()
{
    long temp2;

    temp2 = *(long *)0x4BA;     /* system tick counter */
    return(temp2);
}

*
* fortran timer routine for
* the atari-st - absoft fortran
*
* usage:  program test
*         real*8 secnds,dt
*         ...
*         dt = secnds(0.d0)
*         ...
*         ...what you want to time..
*         ...
*         dt = secnds(dt)
*         ...
*         write(9,'('' elapsed time = '',f7.2,'' seconds '')')dt
*         end
*
      real*8 function secnds(offset)
      implicit none
      include lib\gemdos.inc
      integer*4 atari,dummy,systix,oldstack
      real*8 mspt,offset
      parameter ( mspt = 5.0e-3 )            ! seconds per tick ( 5 ms )
      oldstack = atari( Super, 0 )           ! save stack
      systix = long(z'4BA')                  ! change mode and read
      dummy = atari( Super, oldstack )       ! timer, and restore stack
      secnds = -offset + mspt * dble(systix) ! convert ticks to seconds
      return
      end
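
one more sketch, this time for point 3 of the letter ( always use
checksums ). the kernel below is NOT one of the standard benchmarks; it
is a made-up loop, chosen only because its exact answer is known in
closed form: the sum of 1/(k*(k+1)) for k = 1..n is n/(n+1). returning
the sum means the result is actually used, so an optimizer cannot quietly
drop the loop, and comparing it with n/(n+1) catches a fast but wrong
answer. telescoping_sum and checksum_ok are illustrative names.

```c
/* made-up benchmark kernel: sum of 1/(k*(k+1)) for k = 1..n.
   the exact value is n/(n+1), which serves as the checksum. */
double telescoping_sum(long n)
{
    double sum = 0.0;
    long k;

    for (k = 1; k <= n; k++)
        sum += 1.0 / ((double)k * (double)(k + 1));
    return sum;     /* the caller must use this value */
}

/* nonzero if `got` agrees with `want` to within +/- tol */
int checksum_ok(double got, double want, double tol)
{
    double err = got - want;

    if (err < 0.0)
        err = -err;
    return err <= tol;
}
```

a benchmark built this way would time the call to telescoping_sum(n),
then report the time only if checksum_ok(result, n/(n+1.), tol) holds,
quoting the tolerance used along with the float size.
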