sandra@utah-cs.UUCP (Sandra J Loosemore) (03/19/87)
I think I must have been asleep when I ran my earlier floating point
benchmarks, because I took a more careful look at it and it turns out I
wrote my numbers down backwards. Here are the correct numbers for
primitive arithmetic operations. These are in units of 200 Hz clock ticks
for 1000 repetitions of the operation, with no attempt made to account
for the overhead of the loop. There was no significant difference between
IEEE single and double precision here.
        IEEE    FFP
   +     15      13
   -     15      23
   *     22      20
   /     58      19
Ali Ozer <ali@rocky.stanford.edu> recently sent me a floating point
benchmark program called the "Savage" benchmark, which primarily tests
the double-precision floating point math library. I've tacked on his
original message to the end. Here's my C version:
#include <stdio.h>
#include <math.h>

long gettime ();   /* timing routine; its definition is not shown here */

main ()
{   int i, iloop;
    double a;
    long start, end;
    start = gettime ();
    a = 1.0;
    iloop = 2499;
    for (i=0; i<iloop; i++)
        a = tan(atan(exp(log(sqrt(a*a))))) + 1.0;
    end = gettime ();
    printf ("%e\n", (float)(a-2500.0));     /* error term */
    printf ("%ld\n", (long)(end - start));  /* elapsed time */
}
And the results for Alcyon C V4.14:
IEEE (libm): 1.763e-7, 72.6 seconds
FFP (libf): 2.269e+2, 7.4 seconds
So the FFP library is much faster, but loses on accuracy as it is only
single precision.
-Sandra (sandra@cs.utah.edu)
-----------------------------------
***************************************************************
* Savage Benchmark Results *
* 16 DEC 1986 *
* Al Aburto/Lew Wolfgang/Larry Phillips/John Gilmore/Ali Ozer *
* Glenn Miller/Mike Howard/And Others........................ *
***************************************************************
123456789012345678901234567890123456789012345678901234567890123456789012345
System        CPU / FPP     CLOCK  LANGUAGE                 TIME  ERROR
                            (MHz)                          (Sec)  Abs(a-2500)
Turbo-Amiga   (68020/68881) 14.32  Absoft F77 V2.2B         0.39  2.7 E-12
Sun-3/160     (68020/68881) 16.67  Sun 3.0 F77               0.4  2.0 E-12
Turbo-Amiga   (68020/68881) 14.32  Lattice C/68881 Assem    0.46  9.2 E-13
HP 9000/320   (68020/68881)        Fortran 77                0.7  3.2 E-09
HP 9000/320   (68020/68881)        Pascal                    0.7  2.8 E-07
Amiga         (68020/68881)  7.16  Absoft F77 V2.2B         0.78  2.0 E-12
VAX-8600                           Fortran 77                0.9  1.8 E-08
Amiga         (68020/68881)  7.16  Lattice C/68881 Assem    0.92  5.9 E-12
HP 9000/320   (68020/68881)        C                         1.0  2.5 E-08
DEC 2060                                                     1.6  2.0 E-12
VAX-11/750                         Fortran 77                1.9  6.6 E-10
Masscomp      (68010/ FPP)                                   2.1  3.2 E-07
VAX-11/780                         UNIX 4.3BSD F77-O         2.7  1.8 E-12
Turbo-Amiga   (68020/68881) 14.32  MetaComCo ABasiC V1.0     3.2  2.3 E+01
DMS           ( 8086/ 8087)        Turbo Pascal              3.8  1.1 E-09
Zenith Z-248  (80286/80287)  8.00  MS Fortran77 V3.20        4.5  1.2 E-09
IBM PC-AT     (80286/80287)  6.00  ProFor F77                4.9  8.7 E-11
IBM PC-AT     (80286/80287)  6.00  MS Fortran77              7.2  1.2 E-09
IBM PC-AT     (80286/80287)  6.00  Turbo Pascal              7.4  1.2 E-09
IBM PC        ( 8088/ 8087)  4.77  Microsoft C               8.0  1.2 E-09
Amiga         (68020/68881)  7.16  Metacomco ABasiC V1.0     8.6  2.3 E+01
Turbo-Amiga   (68020/-----) 14.32  Metacomco ABasiC V1.0    13.3  2.7 E+02
Turbo-Amiga   (68020/-----) 14.32  ABasiC V1.0(Cache Off)   14.7  2.7 E+02
Sun-3/160     (68020/-----) 16.67  Sun 3.0 F77              21.5  3.1 E-07
Turbo-Amiga   (68020/-----) 14.32  Absoft F77 V2.2B         21.9  1.8 E-07
Amiga         (68020/-----)  7.16  Metacomco ABasiC V1.0    37.0  2.7 E+02
Amiga         (68000/-----)  7.16  Metacomco ABasiC V1.0    39.7  2.7 E+02
Amiga         (68020/-----)  7.16  ABasiC V1.0(Cache Off)   42.2  2.7 E+02
HP 9826       (68000/-----)  8.00  HP Basic V2.0            44.5  3.2 E-07
Turbo-Amiga   (68020/-----) 14.32  Lattice C V3.03          55.4  3.2 E-07
IBM PC-XT     ( 8088/ 8087)  4.77  Gauss                    58.0  1.2 E-09
Amiga         (68020/-----)  7.16  Absoft F77 V2.2B         59.7  1.8 E-07
HP Integral   (68000/-----)        Basic Interpreter        60.9  3.2 E-07
HP Integral   (68000/-----)        C                        63.0  3.2 E-07
Amiga         (68000/-----)  7.16  True Basic (Compiler)    65.2  3.0 E-03
Amiga         (68020/-----)  7.16  MS AmigaBASIC V1.0       67.0  3.2 E-07
Amiga         (68000/-----)  7.16  MS AmigaBASIC V1.0       73.0  3.2 E-07
Amiga         (68000/-----)  7.16  Absoft F77 V2.2B         77.2  1.8 E-07
HP Integral   (68000/-----)        Absoft F77              100.0  1.8 E-07
Amiga         (68020/-----)  7.16  Lattice C V3.03         139.0  3.2 E-07
Macintosh     (68000/-----)  7.83  MAC C                   221.0  (?)
Amiga         (68000/-----)  7.16  Lattice C V3.03         234.0  3.2 E-07
Macintosh     (68000/-----)  7.83  DeSmet C                244.0  (?)
Commodore 128 ( 8502/-----)  2.00  Basic Interpreter       256.0  9.0 E-04
Macintosh     (68000/-----)  7.83  Manx Aztec C            353.0  (?)
IBM PC-XT     ( 8088/-----)  4.77  BASICA                  895.0  3.0 E-08
Tandy PC-5                         Basic Interpreter       961.0  2.7 E-03
****************************************************************************
Notes:
(1) The Savage Benchmark, by Bill Savage, first appeared in Dr. Dobb's
Journal, Sept 1983, page 120.
(2) The Macintosh results are from Byte, The Small Systems Journal,
Aug 1986, page 254. There appears to be a 'typo' in the
published accuracy results. Exact result should be 2500.0 .
(3) The Savage Benchmark requires use of IEEE double precision
to obtain a reasonably small error. The error is unacceptably
large for IEEE single precision. All the above results were
obtained with double precision except for the MetaComCo ABasiC
where double precision variables were used but the math functions
were calculated only to single precision. As can be seen ABasiC
is fast but the error is too large for a meaningful result.
-----------------------------------
c     Here is the Savage Benchmark Program:
c     **************************************
c     *             Fortran 77             *
c     **************************************
      Program Savage
      implicit double precision (a-h,o-z)
      write(*,1000)
      a = 1.0
      iloop = 2499
      do 100 i=1,iloop
         a = dtan(datan(dexp(dlog(dsqrt(a*a))))) + 1.0
  100 continue
      write(*,1010)
      write(*,1020) a
 1000 format(5x,'Start')
 1010 format(5x,'Stop ')
 1020 format(5x,'a = ',f22.15)
      stop
      end
-----------------------------------
braner@batcomputer.UUCP (03/21/87)
[]

Thanks to Sandra Loosemore for posting the interesting benchmarks.
Here are results of the Savage benchmark for Megamax C on the Atari ST
(8 MHz 68000):

                                  time    error

  Single precision:               146     4.3E+01
  Double precision:               496     8.5E-07
  Double precision, with 32081:   119     2.2E-08

The Megamax math library (written in C, using sloppy algorithms) is even
slower than the (in)SANE numeric package on the Apple Macintosh, as
exemplified by Aztec C (353 seconds). In comparison, Absoft FORTRAN
on the Amiga did it in 77 seconds (could someone post the Absoft time
on the ST?), Alcyon C v4.14 (libm) clocked in at 73 seconds, and HP BASIC
(also on an 8 MHz 68000) managed 45 seconds. (Any data for Mark Williams C?)

The 32081 case needs explanation: This is _still_ using the Megamax
library, but doing the +-*/ primitives on a 32081 FPU mounted as a
peripheral and running at 4 MHz. This speeded it up by a factor of 4.
(Why the error is smaller I don't know.)

That is _not_ the best the 32081 can do. I have tested, on my ST, an
optimized log() function written in assembler language for the
68000/32081 pair by Hal Hardenbergh of Digital Acoustics. It took 520
microseconds. Extrapolating from there, assuming the other functions
will be as fast, predicts that the Savage benchmark time would be
7 seconds, or as fast as an IBM AT! Alas, Hal will not disclose his
code for the other functions, and I do not have the time right now to
write my own, nor to replace the 32081 with a 68881 (anybody done that?).

What can be done to improve the performance of your ST in number-crunching?

- Use Absoft FORTRAN
- Use the recent version of Atari/DRI/Alcyon C
- Pressure your favorite C compiler vendor to get it together
- Hack a 32081 onto your ST and write your own (optimized in AL)
  math library
- Hack a 68881 onto your ST
- Get a MegaST and a 68881 card (Fall 1987?)
- Wait for the Atari TT (_Supposedly_ Winter 1988)
- Get the MSDOS add-on box for the ST (when?) and add an 8087
- Give up and get a Mac II or a "Turbo Amiga" (big $$$)
- Get an Atari PC (8 MHz 8086) and add an 8087
  (Turbo C is here 8-) - Atari PC not yet...)

The 68881 is now about $140, similar in price to the 8087. (Finally!)
It has the transcendental functions built-in (the 32081 does not).
It is designed as a coprocessor for the 68020, although it _can_ be
connected to the 68000 as a peripheral (a lot slower). Is the 68881 card
for the MegaST (rumored) going to have a 68020 too? Is Atari _ever_
going to build a machine suitable for number-crunching? Keep tuned for
the responses from Atari...

- Moshe Braner

Quiz: what computer comes standard with a mouse but no keyboard?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PS: here is the C code I used:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#include <stdio.h>
#include <math.h>
#include <osbind.h>

long sysclk;

gettime()
{
    sysclk = *((long *)0x4BA);  /* System variable: 200 Hz counter */
}

main()
{
    int i, iloop;
    double a;
    long start, end;

    Supexec(&gettime);
    start = sysclk;
    a = 1.0;
    iloop = 2499;
    for (i=0; i<iloop; i++)
        a = tan(atan(exp(log(sqrt(a*a))))) + 1.0;
    Supexec(&gettime);
    end = sysclk;
    printf("\007time = %f\n", (double)(end-start)/200.0);
    printf("error = %e\n", a-2500.0);
}
sandra@utah-cs.UUCP (03/25/87)
Here are the Savage benchmark results for Absoft Fortran on the ST
(courtesy of Wes Cobb):

System        CPU / FPP     MHz   LANGUAGE           Seconds   Error
------------------------------------------------------------------------------
Atari ST      68000/-----   8.0   Absoft F77 v2.2       67.6   1.7 E-07

-Sandra
daemon@watmath.UUCP (03/27/87)
> Thanks to Sandra Loosemore for posting the interesting benchmarks.
> Here are results of the Savage benchmark for Megamax C on the Atari ST
> (8 MHz 68000):
>                                   time    error
>
>   Single precision:               146     4.3E+01
>   Double precision:               496     8.5E-07
>   Double precision, with 32081:   119     2.2E-08
>
>
> - Moshe Braner
>

I tried out your code with the Mark Williams Version 2 C compiler
(their latest update) and I got some surprising results:

                        time    error

  Single precision:     82.6    -2.56E-01
  Double precision:     82.7    -1.19E-07

(I even checked to see that I had the right program :-)

It seems that Mark Williams has done their homework with floating
point algorithms!

The new version has some great features, such as real if statements and
while loops in the shell, alias, pages and pages of GEM documentation,
etc. I'm really pleased with what I've seen so far.

Mike Berkley, University of Waterloo
UUCP:   {allegra,ihnp4,utcsri,utzoo}!watmath!watsup!mberkley
Bitnet: mberkley%watsup%waterloo@csnet-relay.ARPA
hakanson@orstcs.UUCP (03/27/87)
<yum!>

And here are the results I get on my 1040ST, using Moshe Braner's
code compiled under Mark Williams C v1.1 (should be double precision):

    sec: 83.695000
    error: -1.188916e-7

Note that I just typed these in verbatim from the output of Moshe's C
version of the Savage benchmark posted recently. If I remember, I'll
try it again & post the results when I get v2.0 of the compiler.

Marion Hakanson
CSnet: hakanson%oregon-state@csnet-relay
UUCP : {hp-pcd,tektronix}!orstcs!hakanson
vic@bobkat.UUCP (03/30/87)
In article <470@batcomputer.tn.cornell.edu> braner@batcomputer.UUCP (braner) writes:
>[]
>
>Thanks to Sandra Loosemore for posting the interesting benchmarks.
>Here are results of the Savage benchmark for Megamax C on the Atari ST
>(8 MHz 68000):
>                                  time    error
>
>  Single precision:               146     4.3E+01
>  Double precision:               496     8.5E-07
>  Double precision, with 32081:   119     2.2E-08
>
>The Megamax math library (written in C, using sloppy algorithms) is even
>slower than the (in)SANE numeric package on the Apple Macintosh, as
>exemplified by Aztec C (353 seconds). In comparison, Absoft FORTRAN
>on the Amiga did it in 77 seconds (could someone post the Absoft time
>on the ST?), Alcyon C v4.14 (libm) clocked in at 73 seconds, and HP BASIC
>(also on an 8 MHz 68000) managed 45 seconds. (Any data for Mark Williams C?)
>

This is a response to Moshe Braner's posting.

My brother Mike Bunnell wrote the floating point math library for the
Megamax C compiler about a year ago. He also wrote the C compiler (by
the way). He wrote the floating point routines in 2 days because
Megamax was anxious to get the compiler out the door. They were
supposed to replace the routines a long time ago. It looks like they
will do so this month.

The reason for this posting is I have some benchmark results that I
think you will find interesting.

The results for the Savage Benchmark for a 68020 (16.67 MHz) with a
68881 (12.5 MHz) (compiler PCC):

                        time (in seconds)    error
  Double precision:     0.63                 1.177341e-09

The results for the Savage Benchmark for a 68010 (12.5 MHz) with a
68881 (12.5 MHz) (Megamax C):

                        time (in seconds)    error
  Double precision:     1.25                 1.177341e-09

Note that in the case of the 68010 the floating point processor was
hooked up as a peripheral just as it would be on a 68000. Also the
68010 computer is a multi-tasking machine so the floating point
processor was accessed through a trap routine. With a single tasking
system (like the ST) there would be less overhead because the processor
could be accessed in-line. The 68020 was, of course, co-processing with
its 68881.

It seems to me that adding a 68881 card to the ROM port on the ST would
give you a reasonable number crunching machine. You would not even need
the added expense of a 68020. According to the schematics there is no
read/write line going to the ROM port. If that is true you would have
to sneak that line from the DMA (hard disk) port. With such a system
you would blow away an 8086+8087 computer.

Mitch Bunnell
braner@batcomputer.UUCP (04/01/87)
[]

I do agree that a 68881 would be wonderful to have, even on a 68000
machine. But don't let that benchmark trick you into thinking that
the penalty for running the 68881 as a peripheral is less than 2:1.
The Savage benchmark tests _only_ transcendental functions, where the
calculation time (inside the 68881) dominates. In most real-life
programs there will be lots of lowly add/sub/mul/div FP ops, where
the overhead of communicating with the FP chip is very important
(especially when you don't have an optimizing compiler that would
keep everything inside the 68881 registers as far as possible).

I am happy to hear that Megamax is finally about to upgrade its FP
package (or what passed for one). If the complaints on the net and the
comparative benchmarks gave the necessary push, then it proves the
net's value... (As things currently are, Megamax's FP lib is an order
of magnitude slower _and_ buggier than _either_ MWC or Alcyon!) I hope
that upgrade is really coming, and that Megamax C owners will be
notified and given upgrades.

- Moshe Braner
XBR1DA29@DDATHD21.BITNET.UUCP (04/08/87)
Received: from BR1.THD.DA.D.EUROPE by DDATHD21.BITNET via GNET with RJE ; 07 Apr 87 20:52:48
Date: Tue, 7 Apr 87 20:50:39 +0200 (Central European Summer Time)
From: XBR1DA29@DDATHD21.BITNET (Martin Costabel)
Subject: Re: Floating Point Benchmarks
To: info-atari16@score.stanford.edu
X-VMS-To: ATARIINFO,DA29

[]

Here are some more Savage benchmark results for the ST (forgive me if
they were already on the net):

System    CPU / FPP    MHz  LANGUAGE                      Seconds  Error
------------------------------------------------------------------------------
Atari ST  68000/-----  8.0  ProFortran (single prec.)        16    2.7 E+02
Atari ST  68000/-----  8.0  ProFortran (double prec.)        52    3.1 E-07
Atari ST  68000/-----  8.0  GfA-Basic (Interpreter)          15.7  3.7 E-05
Atari ST  68000/-----  8.0  GfA-Basic (Compiler)             13.9  3.7 E-05
Atari ST  68000/-----  8.0  Omikron-Basic (single prec.)     11    0.6 E 00
Atari ST  68000/-----  8.0  Omikron-Basic (double prec.)     76    1.1 E-09 (!)

Conclusion: If you want to do number-crunching on the Atari ST, try BASIC!

Here is the GfA-Basic program that was used:

Startingtime=Timer
A=1
For I=1 To 2499
  A=Tan(Atn(Exp(Log(Sqr(A*A)))))+1
Next I
Print (Timer-Startingtime)/200'"seconds","Error :"'A-2500

Martin Costabel
Technical Univ. Darmstadt
Germany
xbr1da29@ddathd21.BITNET
dickey@cwruecmp.UUCP (04/08/87)
Last evening, we tried the Savage Benchmark with APL.68000 on the
AtariST. We did two tests. The first test, F1, is given by: (*)

        i IS IOTA 2500
        +/ABS i - TAN ATAN EXP LN ( i TIMES i ) * .5

and the second test, F2, is given by:

        i IS 0
        a IS 1
    LP: a IS 1 + TAN ATAN EXP LN ( a TIMES a ) * .5
        GO (2499 < i IS i+1) /LP
        a-2500

The results are:

    Function    Time       Value
    F1          119.480     5.867098224E-7
    F2          181.700    -5.646261343E-7

Comment: Function F2 is the "Savage benchmark", in which there is a
loop in the program, and in each pass through the loop, the value of A
is found by the same sequence of steps given by Bill Savage in his
Dr. Dobb's article. Function F1 is similar, but it creates the vector
of integers from 1 to 2500, and then does vector operations, exploiting
the internal APL compiled loops. To preserve the spirit of the
benchmark, the error was accumulated, by adding the absolute values of
all the deviations. This executed in about two thirds the time.

(*) Keywords: Here we use keywords to describe the APL symbols that
actually appear in the programs. A transfer form is available for those
who wish to receive a copy.

CSNet: dickey@case.csnet
ARPA:  dickey%case@csnet-relay.arpa
UUCP:  ...!{decvax,cbosgd,cbatt,sun}!cwruecmp!dickey
braner@batcomputer.UUCP (04/09/87)
[]

Interpreters can give good results on the Savage benchmark since most
of the time is spent on the tan(), exp(), etc. To judge the suitability
of a language system for number crunching you need to check integer and
simple-FP-ops performance too!

From what I've gathered by now, if you want to crunch numbers you
should get a FP chip. On the ST, you should use Absoft Fortran.
(Alcyon and MWC are not that far behind, though, and Alcyon (like
Fortran) allows single-precision when you need the speed and don't
need that much accuracy. Does MWC?)

I suggest we gather here some benchmarks about the speed of typical
+-*/ FP operations (a complete statement of the form: "a=b+c;" -
that's the only way to benchmark!!!).

- Moshe Braner