aburto@marlin.NOSC.MIL (Alfred A. Aburto) (01/20/88)
------------
A while back I posted some Savage Benchmark comparison results here. The Savage results were useful but limited in that only (some of) the transcendental and trigonometric functions were tested. I needed a simple floating-point add, sub, mul, and divide test. I tried the C FLOAT program (and others), but they had problems with some of the optimizing compilers (e.g., zero run times).

Anyway, I finally settled on a reasonable test program that seems to hold its own against most of the optimizing compilers I have tested. The program calculates PI using the series expansion of 4 * atan(1). Pretty simple, but it gives reasonably accurate comparisons of system performance with double precision floating-point operations.

The main output is expressed in thousands of double precision floating-point operations per second (KFLOPS). KFLOPS is based on the average time it takes to do the double precision +, -, *, and / operations. This is a bit unrealistic because the '/' operation usually dominates since it takes the most time. So the program also outputs a maximum KFLOPS based on the '+' operation. With the 68881 and the Weitek 1167 this results in a measure of the time to do an FADD.D register-to-register (about 600 nanoseconds for the Weitek and 2.6 microseconds for the 68881 at 20 MHz).

If there is any interest I'll post the results I have accumulated so far. Most of the results are for the Amiga and Turbo-Amiga, but there are some Sun, VAX, and PC results as well. The table of results is about 55 lines long. I will also post the Fortran and C FLOPS programs if others would like to check them out.

Thanks,
Al Aburto
aburto@marlin.nosc.mil.UUCP
nosc!marlin!aburto
aburto@NOSC.MIL
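For readers who want a feel for the kind of test described in the post above, here is a minimal sketch. It is not the FLOPS program itself (that was not posted here). It assumes the simplest series for 4 * atan(1), namely 1 - 1/3 + 1/5 - ..., and an illustrative operation count of four floating-point operations per term plus one final multiply. The names pi_series and kflops are made up for this sketch. It is written in Python only to show the arithmetic; interpreter overhead dominates the timing, so the number it prints is not comparable to compiled C or Fortran results.

```python
import time

def pi_series(terms):
    """Approximate pi as 4*atan(1) = 4*(1 - 1/3 + 1/5 - 1/7 + ...)."""
    total = 0.0
    sign = 1.0
    denom = 1.0
    for _ in range(terms):
        total += sign / denom  # one divide and one add per term
        sign = -sign           # flip the sign for the next term
        denom += 2.0           # step to the next odd denominator
    return 4.0 * total

def kflops(op_count, seconds):
    """Thousands of floating-point operations per second."""
    return op_count / seconds / 1000.0

if __name__ == "__main__":
    terms = 200_000
    start = time.perf_counter()
    approx = pi_series(terms)
    elapsed = time.perf_counter() - start
    # Assumed count: divide, add, negate, add per term, plus the final multiply.
    ops = 4 * terms + 1
    print("pi is approximately", approx)
    if elapsed > 0.0:
        print("KFLOPS (Python overhead included):", kflops(ops, elapsed))
```

The series converges slowly (the error after N terms is roughly 1/N), which is convenient for a benchmark: the loop does a lot of real arithmetic that an optimizer cannot easily discard as long as the result is printed.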
aburto@marlin.NOSC.MIL (Alfred A. Aburto) (01/21/88)
----------
The following are some double precision floating-point test results, mostly for the Amiga and Turbo-Amiga. The KFLOPS results (thousands of floating-point operations per second) are based on the average time it takes to do a set of '+', '-', '*', and '/' operations. As such, the results are somewhat biased by the divide operation, which normally takes the most time to execute. In any event, it was the relative comparisons in performance that I wanted to examine, and I thought you might find the results interesting as well.

The W1167 FPU is the Weitek 1167 floating-point processor chip set, which can do a floating-point add in 600 nanoseconds (1600 KFLOPS Max). The divide operation for the Weitek 1167 seems to take about 3 microseconds at 20 MHz (333 KFLOPS Max). I don't have documentation on the Weitek so I can't confirm these results, but it is a very fast (from my viewpoint) floating-point processor.

If a compiler had an optimize flag ('-O') then I ran the FLOPS program with and without the flag set. The 'R' option with the Manx Aztec C is my notation to indicate I ran with 'register double' variables. Designating some variables to 'live' in 68881 registers makes quite a bit of difference in the results, as shown by the Manx Aztec C results. A huge difference really.

The use of 'register double' variables causes some problems though, because there is no 'memory address' for these variables. Absoft Fortran 77 really gets confused here: it forces 68881 register variables to have an address (for a subroutine call) by pushing them onto the stack before the call. The variables are retrieved from the stack after the call and put back into the original 68881 registers. All the while, of course, the data within the 68881 registers was totally valid and unchanged. A very inefficient and unnecessary procedure.
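The "Max" figures quoted above follow directly from the operation times: a peak rate is just the reciprocal of the time for one operation. The sketch below redoes that arithmetic. The function names are made up for illustration, and the averaged case assumes, purely for the sake of example, that subtract and multiply cost the same as an add; the posts do not give those times.

```python
def peak_kflops(latency_seconds):
    """Peak rate in KFLOPS if every operation takes latency_seconds."""
    return 1.0 / latency_seconds / 1000.0

def average_kflops(latencies):
    """Rate in KFLOPS based on the mean time of a set of operations."""
    mean = sum(latencies) / len(latencies)
    return peak_kflops(mean)

if __name__ == "__main__":
    print(peak_kflops(600e-9))  # Weitek 1167 add: about 1667
    print(peak_kflops(3e-6))    # Weitek 1167 divide: about 333
    print(peak_kflops(2.6e-6))  # 68881 FADD.D at 20 MHz: about 385
    # Hypothetical mix: add, sub, mul at 600 ns each, divide at 3 us.
    print(average_kflops([600e-9, 600e-9, 600e-9, 3e-6]))  # about 833
```

The 600 ns add works out to roughly 1667 KFLOPS, so the 1600 quoted above is presumably a round figure. The hypothetical mix shows why the averaged KFLOPS number is pulled well below the add-only maximum: a single slow divide accounts for most of the mean time.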
The Lattice C V4.0 results are really a *vast* improvement over earlier compiler versions (good ol' V3.03).

It looks to me that the 68000/68881 combo just doesn't do justice to the 68881's inherent processing capability (see the StarBD II results from Langeveld (BIX)). The 68000/68881 provides a significant improvement over the 68000 with software floating-point, but its performance is well below what can be achieved with the 68020/68881 at the same clock speeds with 16-bit memory. The 68000/68882 pair may improve things quite a bit because the 68882 is 2 times faster with FMOVE instructions than the 68881. Further improvements can also be achieved with compilers that can reduce or eliminate the library or subroutine call overhead delays. Lattice C V4.0 has this capability, but I haven't seen any results and I don't have a StarBoard II to test.

Also, it appears that the 68000/68881 just can't measure up to the more tightly coupled 80286/80287 systems like the Zenith Z-248 or other PC-AT type systems (with respect to the floating-point results here). To step ahead of these systems (from the floating-point viewpoint) the 68020/68881 or 68020/68882 or 68030/68882 is needed (apparently).

 #  System           Language                  CPU/FPU      CPU/FPU MHz  KFLOPS
 1  Sun 3/280        Sun F77 V3.4 (f77-O)      68020/W1167  25.0/20.0     652.5
 2  Compaq DeskPro   High C 386 V?.?           80386/W1167  16.0/16.0     602.4
 3  VAX 8600         4.3 BSD UNIX (f77-O)                                 464.9
 4  VAX 8600         4.3 BSD UNIX (f77)                                   436.8
 5  Sun 3/280        Sun F77 V3.4 (f77)        68020/W1167  25.0/20.0     330.0
 6  DSI-785          SVS C V2.6                68020/68881  30.0/30.0     306.8
 7  Sun 3/280        Sun F77 V3.4 (f77-O)      68020/68881  25.0/20.0     238.1
 8  Compaq DeskPro   High C 386 V?.?           80386/80387  16.0/16.0     212.8
 9  Sun 3/160        Sun F77 V3.4 (f77-O)      68020/68881  16.7/16.7     199.6
10  Turbo-Amiga      Aztec C V3.4B (m8.lib,R)  68020/68881  14.3/14.3     185.7
11  Turbo-Amiga      Aztec C V3.4B (m8.lib,R)  68020/68881   7.2/14.3     185.3
12  Turbo-Amiga      Absoft F77 V2.2C          68020/68881  14.3/14.3     135.5
13  Turbo-Amiga      Absoft F77 V2.2C          68020/68881   7.2/14.3     135.5
14  Sun 3/280        Sun F77 V3.4 (f77)        68020/68881  25.0/20.0     109.4
15  Sun 3/160        Sun F77 V3.4 (f77)        68020/68881  16.7/16.7      93.1
16  Turbo-Amiga      Aztec C V3.4B (m8.lib)    68020/68881  14.3/14.3      71.2
17  Turbo-Amiga      Aztec C V3.4B (m8.lib)    68020/68881   7.2/14.3      57.1
18  PC's Limited286  Ryan-MacFarland F77       80286/80287  12.0/12.0      48.9
19  Zenith Z-248     Ryan-MacFarland F77       80286/80287   8.0/ 8.0      27.0
20  Sun 3/280        Sun F77 V3.4 (f77)        68020/-----  25.0/----      25.2
21  Sun 3/280        Sun F77 V3.4 (f77-O)      68020/-----  25.0/----      24.9
22  Turbo-Amiga      Aztec C V3.4B (ma.lib,R)  68020/68881  14.3/14.3      22.0
23  Turbo-Amiga      Aztec C V3.4B (ma.lib)    68020/68881  14.3/14.3      21.0
24  Tandy 4000       QuickBASIC V4.0           80386/80287  16.0/ 8.0      18.1
25  Turbo-Amiga      Lattice C V4.0 (m.lib,R)  68020/-----  14.3/----
26  Turbo-Amiga      Lattice C V4.0 (m.lib)    68020/-----  14.3/----      15.3
27  Sun 3/160        Sun F77 V3.4 (f77)        68020/-----  16.7/----      14.2
28  Sun 3/160        Sun F77 V3.4 (f77-O)      68020/-----  16.7/----      13.9
29  Turbo-Amiga      Absoft F77 V2.2C          68020/-----  14.3/----      12.5
30  Turbo-Amiga      Aztec C V3.4B (ma.lib,R)  68020/68881   7.2/14.3      12.7
31  Amiga/StarBD II  Aztec C V3.4B (mi.lib)    68000/68881   7.2/12.5      11.7
32  Turbo-Amiga      Aztec C V3.4B (ma.lib)    68020/68881   7.2/14.3      11.9
33  Amiga/StarBD II  Aztec C V3.4B (ma.lib)    68000/68881   7.2/12.5      10.5
34  Turbo-Amiga      Aztec C V3.4B (mx.lib,R)  68020/-----  14.3/----       8.4
35  Turbo-Amiga      Aztec C V3.4B (mx.lib)    68020/-----  14.3/----       8.3
36  Turbo-Amiga      Absoft F77 V2.2C          68020/-----   7.2/----       6.0
37  Turbo-Amiga      Lattice C V4.0 (m.lib,R)  68020/-----   7.2/----
38  Turbo-Amiga      Lattice C V4.0 (m.lib)    68020/-----   7.2/----       5.7
39  Amiga            Lattice C V4.0 (m.lib)    68000/-----   7.2/----       5.2
40  Turbo-Amiga      Lattice C V3.03           68020/-----  14.3/----       4.7
41  Turbo-Amiga      Aztec C V3.4B (mx.lib,R)  68020/-----   7.2/----       4.4
42  Turbo-Amiga      Aztec C V3.4B (mx.lib)    68020/-----   7.2/----       4.3
43  Amiga            Aztec C V3.4B (mx.lib,R)  68000/-----   7.2/----       4.1
44  Amiga            Aztec C V3.4B (mx.lib)    68000/-----   7.2/----       4.0
45  Amiga            Absoft F77 V2.2C          68000/-----   7.2/----       3.2
46  Turbo-Amiga      Lattice C V3.03           68020/-----   7.2/----       2.5
47  Amiga            Aztec C V3.4B (mx.lib,R)  68000/-----   7.2/----       2.3
48  Amiga            Aztec C V3.4B (mx.lib)    68000/-----   7.2/----       2.3
49  Tandy 3000       QuickBASIC V4.0           80386/-----  16.0/----       2.2
50  Turbo-Amiga      AmigaBASIC V1.2           68020/68881  14.3/----       1.9
51  Turbo-Amiga      AmigaBASIC V1.2           68020/-----  14.3/----       1.5
52  Amiga            AmigaBASIC V1.2           68000/-----   7.2/----       1.4
53  Amiga            Lattice C V3.03           68000/-----   7.2/----       1.1
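
Since the point of the table is relative comparison, it can help to turn a few pairs of rows into ratios. The sketch below does that for three pairs taken from the table above; the speedup helper is a made-up name, and which rows to pair is my choice for illustration, not something the post prescribes.

```python
def speedup(faster_kflops, slower_kflops):
    """How many times faster one KFLOPS result is than another."""
    return faster_kflops / slower_kflops

if __name__ == "__main__":
    # Aztec C m8.lib with and without 'register double' (rows 10 and 16).
    print(round(speedup(185.7, 71.2), 2))
    # Sun 3/280 f77 -O, Weitek 1167 versus 68881 (rows 1 and 7).
    print(round(speedup(652.5, 238.1), 2))
    # Plain Amiga, Lattice C V4.0 versus V3.03 (rows 39 and 53).
    print(round(speedup(5.2, 1.1), 2))
```

On these pairs, register variables are worth a bit more than 2.5x with Aztec C, the Weitek is roughly 2.7x the 68881 on the same Sun, and Lattice C V4.0 is close to 5x V3.03 on a stock Amiga.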