[comp.sys.atari.st] Floating Point Benchmarks

sandra@utah-cs.UUCP (Sandra J Loosemore) (03/19/87)

I think I must have been asleep when I ran my earlier floating point 
benchmarks, because when I took a more careful look at them, it turned out
I had written my numbers down backwards.  Here are the correct numbers for 
primitive arithmetic operations.  These are in 200-hz clock tick units 
for 1000 repetitions of the operation, with no attempt made to account 
for the overhead of the loop.  There was no significant difference between
IEEE single and double precision here.

           IEEE        FFP
    +       15          13
    -       15          23
    *       22          20
    /       58          19
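
As a reading aid for the table above, the tick counts can be turned into
an average time per operation.  This is only a sketch: the function and
macro names are made up for illustration, and the result still includes
the loop overhead, since the measurements do.

```c
#include <stdio.h>

/* The timer runs at 200 Hz, so one tick is 5000 microseconds. */
#define USEC_PER_TICK 5000.0

/* Average microseconds per operation, loop overhead included. */
double usec_per_op(long ticks, long reps)
{
    return (double)ticks * USEC_PER_TICK / (double)reps;
}

/* Print the IEEE column of the table as microseconds per operation. */
void print_ieee_column(void)
{
    static const char ops[] = "+-*/";
    static const long ticks[] = { 15, 15, 22, 58 };
    int i;

    for (i = 0; i < 4; i++)
        printf("%c  %6.1f us\n", ops[i], usec_per_op(ticks[i], 1000L));
}
```

For example, 15 ticks per 1000 additions works out to 75 microseconds
per addition.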


Ali Ozer <ali@rocky.stanford.edu> recently sent me a floating point
benchmark program called the "Savage" benchmark, which primarily tests
the double-precision floating point math library.  I've tacked on his
original message to the end.  Here's my C version:

#include <stdio.h>
#include <math.h>

long gettime ();    /* timer routine (not shown); declared so it returns long */

main ()
{   int i, iloop;
    double a;
    long start, end;

    start = gettime ();
    a = 1.0;
    iloop = 2499;
    for (i=0; i<iloop; i++)
        a = tan(atan(exp(log(sqrt(a*a))))) + 1.0;
    end = gettime ();
    printf ("%e\n", (float)(a-2500.0));         /* error term */
    printf ("%ld\n", (long)(end - start));      /* elapsed time */
    }

And the results for Alcyon C V4.14:

IEEE (libm):  1.763e-7, 72.6 seconds
FFP  (libf):  2.269e+2, 7.4 seconds

So the FFP library is much faster, but loses on accuracy as it is only
single precision.

-Sandra (sandra@cs.utah.edu)

-----------------------------------

***************************************************************
*                     Savage Benchmark Results                *
*                           16 DEC 1986                       *
* Al Aburto/Lew Wolfgang/Larry Phillips/John Gilmore/Ali Ozer *
* Glenn Miller/Mike Howard/And Others........................ *
***************************************************************
System         CPU / FPP    CLOCK        LANGUAGE          TIME     ERROR
                            (MHz)                          (Sec)  Abs(a-2500)
Turbo-Amiga  (68020/68881)  14.32   Absoft F77 V2.2B         0.39   2.7 E-12
Sun-3/160    (68020/68881)  16.67   Sun 3.0 F77              0.4    2.0 E-12
Turbo-Amiga  (68020/68881)  14.32   Lattice C/68881 Assem    0.46   9.2 E-13
HP 9000/320  (68020/68881)          Fortran 77               0.7    3.2 E-09
HP 9000/320  (68020/68881)          Pascal                   0.7    2.8 E-07
Amiga        (68020/68881)   7.16   Absoft F77 V2.2B         0.78   2.0 E-12
VAX-8600                            Fortran 77               0.9    1.8 E-08
Amiga        (68020/68881)   7.16   Lattice C/68881 Assem    0.92   5.9 E-12
HP 9000/320  (68020/68881)          C                        1.0    2.5 E-08
DEC 2060                                                     1.6    2.0 E-12
VAX-11/750                          Fortran 77               1.9    6.6 E-10
Masscomp     (68010/  FPP)                                   2.1    3.2 E-07
VAX-11/780                          UNIX 4.3BSD F77-O        2.7    1.8 E-12
Turbo-Amiga  (68020/68881)  14.32   MetaComCo ABasiC V1.0    3.2    2.3 E+01
DMS          ( 8086/ 8087)          Turbo Pascal             3.8    1.1 E-09
Zenith Z-248 (80286/80287)   8.00   MS Fortran77 V3.20       4.5    1.2 E-09
IBM PC-AT    (80286/80287)   6.00   ProFor F77               4.9    8.7 E-11
IBM PC-AT    (80286/80287)   6.00   MS Fortran77             7.2    1.2 E-09
IBM PC-AT    (80286/80287)   6.00   Turbo Pascal             7.4    1.2 E-09
IBM PC       ( 8088/ 8087)   4.77   Microsoft C              8.0    1.2 E-09
Amiga        (68020/68881)   7.16   Metacomco ABasiC V1.0    8.6    2.3 E+01
Turbo-Amiga  (68020/-----)  14.32   Metacomco ABasiC V1.0   13.3    2.7 E+02
Turbo-Amiga  (68020/-----)  14.32   ABasiC V1.0(Cache Off)  14.7    2.7 E+02
Sun-3/160    (68020/-----)  16.67   Sun 3.0 F77             21.5    3.1 E-07
Turbo-Amiga  (68020/-----)  14.32   Absoft F77 V2.2B        21.9    1.8 E-07
Amiga        (68020/-----)   7.16   Metacomco ABasiC V1.0   37.0    2.7 E+02
Amiga        (68000/-----)   7.16   Metacomco ABasiC V1.0   39.7    2.7 E+02
Amiga        (68020/-----)   7.16   ABasiC V1.0(Cache Off)  42.2    2.7 E+02
HP 9826      (68000/-----)   8.00   HP Basic V2.0           44.5    3.2 E-07
Turbo-Amiga  (68020/-----)  14.32   Lattice C V3.03         55.4    3.2 E-07
IBM PC-XT    ( 8088/ 8087)   4.77   Gauss                   58.0    1.2 E-09
Amiga        (68020/-----)   7.16   Absoft F77 V2.2B        59.7    1.8 E-07
HP Integral  (68000/-----)          Basic Interpreter       60.9    3.2 E-07
HP Integral  (68000/-----)          C                       63.0    3.2 E-07
Amiga        (68000/-----)   7.16   True Basic (Compiler)   65.2    3.0 E-03
Amiga        (68020/-----)   7.16   MS AmigaBASIC V1.0      67.0    3.2 E-07
Amiga        (68000/-----)   7.16   MS AmigaBASIC V1.0      73.0    3.2 E-07
Amiga        (68000/-----)   7.16   Absoft F77 V2.2B        77.2    1.8 E-07
HP Integral  (68000/-----)          Absoft F77             100.0    1.8 E-07
Amiga        (68020/-----)   7.16   Lattice C V3.03        139.0    3.2 E-07
Macintosh    (68000/-----)   7.83   MAC C                  221.0    (?)
Amiga        (68000/-----)   7.16   Lattice C V3.03        234.0    3.2 E-07
Macintosh    (68000/-----)   7.83   DeSmet C               244.0    (?)
Commodore 128( 8502/-----)   2.00   Basic Interpreter      256.0    9.0 E-04
Macintosh    (68000/-----)   7.83   Manx Aztec C           353.0    (?)
IBM PC-XT    ( 8088/-----)   4.77   BASICA                 895.0    3.0 E-08
Tandy PC-5                          Basic Interpreter      961.0    2.7 E-03
****************************************************************************
Notes:
	(1) The Savage Benchmark, by Bill Savage, first appeared in Dr. Dobb's
	    Journal, Sept 1983, page 120.
        (2) The Macintosh results are from Byte, The Small Systems Journal,
	    Aug 1986, page 254.  There appears to be a 'typo' in the
	    published accuracy results.  The exact result should be 2500.0.
        (3) The Savage Benchmark requires use of IEEE double precision
	    to obtain a reasonably small error. The error is unacceptably 
	    large for IEEE single precision.  All the above results were
	    obtained with double precision except for the MetaComCo ABasiC
	    where double precision variables were used but the math functions
	    were calculated only to single precision.  As can be seen ABasiC
	    is fast but the error is too large for a meaningful result.



-----------------------------------

c Here is the Savage Benchmark Program:
c **************************************
c *           Fortran 77               *
c **************************************
	Program Savage
	implicit double precision (a-h,o-z)

	write(*,1000)

	a = 1.0
	iloop = 2499

	do 100 i=1,iloop
	 a = dtan(datan(dexp(dlog(dsqrt(a*a))))) + 1.0
 100     continue

       write(*,1010)
       write(*,1020) a
 1000  format(5x,'Start')
 1010  format(5x,'Stop ')
 1020  format(5x,'a = ',f22.15)
       stop
       end



-----------------------------------

braner@batcomputer.UUCP (03/21/87)

[]

Thanks to Sandra Loosemore for posting the interesting benchmarks.
Here are results of the Savage benchmark for Megamax C on the Atari ST
(8 MHz 68000):
					time	error

	Single precision:		146	4.3E+01
	Double precision:		496	8.5E-07
	Double precision, with 32081:	119	2.2E-08

The Megamax math library (written in C, using sloppy algorithms) is even
slower than the (in)SANE numeric package on the Apple Macintosh, as
exemplified by Aztec C (353 seconds).  In comparison, Absoft FORTRAN
on the Amiga did it in 77 seconds (could someone post the Absoft time
on the ST?), Alcyon C v4.14 (libm) clocked in at 73 seconds, and HP BASIC
(also on an 8 MHz 68000) managed 45 seconds.  (Any data for Mark Williams C?)

The 32081 case needs explanation:  This is _still_ using the Megamax library,
but doing the +-*/ primitives on a 32081 FPU mounted as a peripheral and
running at 4 MHz.  This sped it up by a factor of 4. (Why the error is
smaller I don't know.)  That is _not_ the best the 32081 can do.  I have
tested, on my ST, an optimized log() function written in assembler language
for the 68000/32081 pair by Hal Hardenbergh of Digital Acoustics.  It took
520 microseconds.  Extrapolating from there, assuming the other functions
will be as fast, predicts that the Savage benchmark time would be 7 seconds,
or as fast as an IBM AT!  Alas, Hal will not disclose his code for the other
functions, and I do not have the time right now to write my own, nor to
replace the 32081 with a 68881 (anybody done that?).
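
The extrapolation can be written out as a short calculation.  The sketch
assumes, as stated above, that each of the five library calls per pass
(sqrt, log, exp, atan, tan) takes the same 520 microseconds measured for
log(), and it ignores the multiply, the add and the loop itself.  The
function name is made up for this example.

```c
/* Estimated benchmark time in seconds from a per-call cost. */
double savage_estimate_seconds(long iterations, int calls_per_iteration,
                               double usec_per_call)
{
    return (double)iterations * (double)calls_per_iteration
           * usec_per_call / 1.0e6;
}
```

savage_estimate_seconds(2499L, 5, 520.0) gives about 6.5 seconds, which
rounds to the 7 seconds quoted.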

What can be done to improve the performance of your ST in number-crunching?

	- Use Absoft FORTRAN
	- Use the recent version of Atari/DRI/Alcyon C
	- Pressure your favorite C compiler vendor to get it together
	- Hack a 32081 onto your ST and
		write your own (optimized in AL) math library
	- Hack a 68881 onto your ST
	- Get a MegaST and a 68881 card (Fall 1987?)
	- Wait for the Atari TT (_Supposedly_ Winter 1988)
	- Get the MSDOS add-on box for the ST (when?) and add an 8087
	- Give up and get a Mac II or a "Turbo Amiga" (big $$$)
	- Get an Atari PC (8 MHz 8086) and add an 8087
		(Turbo C is here 8-)  - Atari PC not yet...)

The 68881 is now about $140, similar in price to the 8087.  (Finally!)
It has the transcendental functions built-in (the 32081 does not).
It is designed as a coprocessor for the 68020, although it _can_ be
connected to the 68000 as a peripheral (a lot slower).  Is the 68881 card
for the MegaST (rumored) going to have a 68020 too?  Is Atari _ever_ going
to build a machine suitable for number-crunching?  Keep tuned for the
responses from Atari...

- Moshe Braner

Quiz: what computer comes standard with a mouse but no keyboard?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PS: here is the C code I used:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <stdio.h>
#include <math.h>
#include <osbind.h>
long	sysclk;
gettime() {
	sysclk = *((long *)0x4BA);	/* System variable: 200 Hz counter */
}
main() {
	int	i, iloop;
	double	a;
	long	start, end;
	Supexec(&gettime);	start = sysclk;
	a = 1.0;	iloop = 2499;
	for (i=0; i<iloop; i++)
		a = tan(atan(exp(log(sqrt(a*a))))) + 1.0;
	Supexec(&gettime);	end = sysclk;
	printf("\007time = %f\n", (double)(end-start)/200.0);
	printf("error = %e\n", a-2500.0);
}

sandra@utah-cs.UUCP (03/25/87)

Here are the Savage benchmark results for Absoft Fortran on the ST (courtesy
of Wes Cobb):


System        CPU / FPP      MHz       LANGUAGE          Seconds    Error
------------------------------------------------------------------------------
Atari ST      68000/-----    8.0      Absoft F77 v2.2      67.6    1.7 E-07


-Sandra

daemon@watmath.UUCP (03/27/87)

> Thanks to Sandra Loosemore for posting the interesting benchmarks.
> Here are results of the Savage benchmark for Megamax C on the Atari ST
> (8 MHz 68000):
> 					time	error
> 
> 	Single precision:		146	4.3E+01
> 	Double precision:		496	8.5E-07
> 	Double precision, with 32081:	119	2.2E-08
> 
> 
> - Moshe Braner
> 

I tried out your code with the Mark Williams Version 2 C compiler
(their latest update) and I got some surprising results:


					time	error

	Single precision:		82.6	-2.56E-01
	Double precision:		82.7	-1.19E-07

(I even checked to see that I had the right program :-)

It seems that Mark Williams has done their homework with floating
point algorithms!


The new version has some great features, such as real if statements
and while loops in the shell, alias, pages and pages of GEM
documentation, etc.  I'm really pleased with what I've seen so far.


Mike Berkley, University of Waterloo

UUCP:		{allegra,ihnp4,utcsri,utzoo}!watmath!watsup!mberkley
Bitnet:		mberkley%watsup%waterloo@csnet-relay.ARPA

hakanson@orstcs.UUCP (03/27/87)

<yum!>

And here are the results I get on my 1040ST, using Moshe Braner's code
compiled under Mark Williams C v1.1 (should be double precision):

	sec: 83.695000	error: -1.188916e-7

Note that I just typed these in verbatim from the output of Moshe's
C version of the Savage benchmark posted recently.  If I remember, I'll
try it again & post the results when I get v2.0 of the compiler.

Marion Hakanson         CSnet:  hakanson%oregon-state@csnet-relay
                        UUCP :  {hp-pcd,tektronix}!orstcs!hakanson

vic@bobkat.UUCP (03/30/87)

In article <470@batcomputer.tn.cornell.edu> braner@batcomputer.UUCP (braner) writes:
>[]
>
>Thanks to Sandra Loosemore for posting the interesting benchmarks.
>Here are results of the Savage benchmark for Megamax C on the Atari ST
>(8 MHz 68000):
>					time	error
>
>	Single precision:		146	4.3E+01
>	Double precision:		496	8.5E-07
>	Double precision, with 32081:	119	2.2E-08
>
>The Megamax math library (written in C, using sloppy algorithms) is even
>slower than the (in)SANE numeric package on the Apple Macintosh, as
>exemplified by Aztec C (353 seconds).  In comparison, Absoft FORTRAN
>on the Amiga did it in 77 seconds (could someone post the Absoft time
>on the ST?), Alcyon C v4.14 (libm) clocked in at 73 seconds, and HP BASIC
>(also on an 8 MHz 68000) managed 45 seconds.  (Any data for Mark Williams C?)
>

This is a response to Moshe Braner's posting. My brother Mike Bunnell wrote
the floating point math library for the Megamax C compiler about a year ago.
He also wrote the C compiler (by the way). He wrote the floating point routines
in 2 days because Megamax was anxious to get the compiler out the door. They
were supposed to replace the routines a long time ago. It looks like they will
do so this month.

The reason for this posting is I have some benchmark results that I think you
will find interesting.

The results for the Savage Benchmark for a 68020 (16.67 MHZ)
with a 68881 (12.5 MHZ) (compiler PCC):

                        time (in seconds)       error
    Double precision:   0.63                    1.177341e-09

The results for the Savage Benchmark for a 68010 (12.5 MHZ)
with a 68881 (12.5 MHZ) (Megamax C):

                        time (in seconds)       error
    Double precision:   1.25                    1.177341e-09

Note that in the case of the 68010 the floating point processor was hooked
up as a peripheral just as it would be on a 68000. Also the 68010 computer
is a multi-tasking machine so the floating point processor was accessed through
a trap routine. With a single tasking system (like the ST) there would be
less overhead because the processor could be accessed in-line.

The 68020 was, of course, co-processing with its 68881.

It seems to me that adding a 68881 card to the ROM port on the ST would give
you a reasonable number crunching machine.  You would not even need the
added expense of a 68020. According to the schematics there is no read/write
line going to the ROM port.  If that is true you would have to sneak that line
from the DMA (hard disk) port.

With such a system you would blow away an 8086+8087 computer.

Mitch Bunnell

braner@batcomputer.UUCP (04/01/87)

[]

I do agree that a 68881 would be wonderful to have, even on a 68000
machine.  But don't let that benchmark trick you into thinking that
the penalty for running the 68881 as a peripheral is less than 2:1.
The Savage benchmark tests _only_ transcendental functions, where the
calculation time (inside the 68881) dominates.  In most real-life programs
there will be lots of lowly add/sub/mul/div FP ops, where the overhead
of communicating with the FP chip is very important (especially when
you don't have an optimizing compiler that would keep everything inside
the 68881 registers as far as possible).
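
The argument can be put as a simple model: each FP operation costs some
compute time inside the chip plus a fixed communication overhead, and
only the overhead differs between coprocessor and peripheral hookups.
The sketch below is that model and nothing more.  The function name is
invented, and any numbers fed to it are hypothetical, not measurements
of the 68881.

```c
/*
 * Ratio of peripheral-mode time to coprocessor-mode time for one
 * operation, given the compute time and the two overheads (all in the
 * same unit, e.g. microseconds).
 */
double peripheral_penalty(double compute_us, double coproc_overhead_us,
                          double periph_overhead_us)
{
    return (compute_us + periph_overhead_us)
         / (compute_us + coproc_overhead_us);
}
```

With made-up overheads of 2 and 20 microseconds, an operation that takes
100 microseconds inside the chip pays a penalty of under 1.2, while one
that takes 5 microseconds pays more than 3.5.  The shorter the
operation, the more the hookup matters.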

I am happy to hear that Megamax is finally about to upgrade its FP
package (or what passed for one).  If the complaints on the net
and the comparative benchmarks gave the necessary push, then it proves
the net's value...  (As things currently are, Megamax's FP lib is an order
of magnitude slower _and_ buggier than _either_ MWC or Alcyon!)
I hope that upgrade is really coming, and that Megamax C owners will be
notified and given upgrades.

- Moshe Braner

XBR1DA29@DDATHD21.BITNET.UUCP (04/08/87)

Date:     Tue,  7 Apr 87 20:50:39 +0200 (Central European Sommer Time)
From:     XBR1DA29@DDATHD21.BITNET (Martin Costabel)
Subject:  Re: Floating Point Benchmarks
To:       info-atari16@score.stanford.edu
X-VMS-To: ATARIINFO,DA29

[]

Here are some more Savage benchmark results for the ST (forgive me if they were
already on the net):

System        CPU / FPP      MHz   LANGUAGE                      Seconds   Error
--------------------------------------------------------------------------------
Atari ST      68000/-----    8.0   ProFortran (single prec.)       16      2.7 E+02
Atari ST      68000/-----    8.0   ProFortran (double prec.)       52      3.1 E-07
Atari ST      68000/-----    8.0   GfA-Basic (Interpreter)         15.7    3.7 E-05
Atari ST      68000/-----    8.0   GfA-Basic (Compiler)            13.9    3.7 E-05
Atari ST      68000/-----    8.0   Omikron-Basic (single prec.)    11      0.6 E 00
Atari ST      68000/-----    8.0   Omikron-Basic (double prec.)    76      1.1 E-09 (!)

Conclusion: If you want to do number-crunching on the Atari ST, try BASIC !
Here is the GfA-Basic program that was used:

Startingtime=Timer
A=1
For I=1 To 2499
  A=Tan(Atn(Exp(Log(Sqr(A*A)))))+1
Next I
Print (Timer-Startingtime)/200'"seconds","Error :"'A-2500

 Martin Costabel
 Technical Univ.
 Darmstadt
 Germany
 xbr1da29@ddathd21.BITNET

dickey@cwruecmp.UUCP (04/08/87)

Last evening, we tried the Savage Benchmark with APL.68000 on the
AtariST.  We did two tests.  The first test, F1, is given by:  (*)

	i IS IOTA 2500
	+/ABS i - TAN ATAN EXP LN ( i TIMES i ) * .5

and the second test, F2, is given by:

	i IS 0                    
	a IS 1                    
	LP: a IS 1 + TAN ATAN EXP LN ( a TIMES a ) * .5
	GO (2499 > i IS i+1) /LP
	a-2500                 

The results are:
	Function  	Time  		Value

	F1		119.480		 5.867098224E-7         
	F2		181.700		-5.646261343E-7        

Comment:
Function F2 is the "Savage benchmark", in which there is a loop in the
program, and in each pass through the loop, the value of A is found by
the same sequence of steps given by Bill Savage in his Dr. Dobb's
article.  Function F1 is similar, but it creates the vector of integers
from 1 to 2500, and then does vector operations, exploiting the
internal APL compiled loops.  To preserve the spirit of the benchmark,
the error was accumulated, by adding the absolute values of all the
deviations.  This executed in about two thirds the time.

(*) Keywords: 
Here we use keywords to describe the APL symbols that actually
appear in the programs.  A transfer form is available for those
who wish to receive a copy.

 CSNet: dickey@case.csnet
 ARPA:	dickey%case@csnet-relay.arpa
 UUCP:	...!{decvax,cbosgd,cbatt,sun}!cwruecmp!dickey

braner@batcomputer.UUCP (04/09/87)

[]

Interpreters can give good results on the Savage benchmark since most
of the time is spent on the tan(), exp(), etc.  To judge the suitability
of a language system for number crunching you need to check integer
and simple-FP-ops performance too!

From what I've gathered by now, if you want to crunch numbers you should get
a FP chip.  On the ST, you should use Absoft Fortran.  (Alcyon and MWC are
not that far behind, though, and Alcyon (like Fortran) allows single-precision
when you need the speed and don't need that much accuracy.  Does MWC?)

I suggest we gather here some benchmarks about the speed of typical +-*/
FP operations (a complete statement of the form: "a=b+c;" - that's the
only way to benchmark!!!).
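
One possible shape for such a test is sketched below in portable C.  It
times a complete "a=b+c;" statement and subtracts the time of an empty
loop of the same length.  The names are invented for this example, and
clock() stands in for whatever timer the system offers (on the ST that
would be the 200 Hz counter).  The variables are volatile so that an
optimizing compiler cannot remove the statement.

```c
#include <time.h>

/* volatile: every load and store in the timed statement really happens */
static volatile double fp_a, fp_b = 1.5, fp_c = 2.25;
static volatile long loop_sink;

/* Average seconds per "fp_a = fp_b + fp_c;" with loop overhead removed. */
double seconds_per_add_statement(long reps)
{
    clock_t t0, t1, t2;
    long i;

    t0 = clock();
    for (i = 0; i < reps; i++)
        loop_sink = i;              /* empty loop: overhead only */
    t1 = clock();
    for (i = 0; i < reps; i++) {
        loop_sink = i;
        fp_a = fp_b + fp_c;         /* the complete statement being timed */
    }
    t2 = clock();

    return ((double)(t2 - t1) - (double)(t1 - t0))
           / (double)CLOCKS_PER_SEC / (double)reps;
}
```

With a coarse timer the result is noisy and can even come out slightly
negative for small repetition counts, so reps should be large enough
that both loops run for many timer ticks.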

- Moshe Braner