guppy@henry.mit.edu (Harold Youngren) (03/10/91)
Here is yet another benchmark for those speed daemons :-) -- the single-precision Linpack.
This note compares timings and Mflops for the NeXT and MIPS machines
on the standard single-precision Linpack benchmark. The Linpack code
was compiled with both the f2c/cc route and the native f77 compilers.
The latest Absoft Ver. 3 compiler was used for the f77 on the NeXT.
Note that Absoft says its full 040 port is still several months off.
The Linpack is a matrix factorization and back-substitution benchmark that
measures a machine's performance on pure linear-system solution -- no
transcendentals. The Mflop rating it reports is obtained by estimating the
operation count for the matrix operations and dividing by the total time,
the sum of user + system time. This makes it much less dependent on
high-resolution timing routines than the Livermore Loops benchmark.
The results are summarized below, and sample runs are given for each
compiler/processor combination. As in my previous benchmarking note,
the f2c/cc route outperforms the current Absoft compiler in both Mflops
and total time. The native compilers for the MIPS processors were 50%-80% faster than the f2c/cc route.
The comparison of the Livermore Loops and Linpack performance illustrates the pitfalls in benchmarking. The NeXT is cast in a much more favorable light by the Linpack. The current (mostly 030) port of the Absoft compiler suffers in this benchmark, showing poorer performance than the translated C code. I find this surprising, as the Linpack is all linear algebra -- its performance rests only on multiply/add/divide throughput -- and so should not depend heavily on the currently (un)optimized math libraries. Perhaps this does indicate that the GNU C compiler is pretty good at optimization.
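For reference, nearly all of the Linpack's time goes into a SAXPY-style inner loop (single-precision y = a*x + y), which is why it exercises little beyond multiply/add throughput. A minimal C sketch of that kernel (unit-stride case; the real BLAS routine also takes stride arguments):

```c
/* SAXPY: y <- a*x + y, single precision, unit stride.
 * This multiply/add loop is where the Linpack factorization
 * spends almost all of its time. */
void saxpy(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}
```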
In my experience the Linpack is a relatively poor indicator of overall machine floating-point performance. Timings for ALL of my programs (fluid dynamics, structural, and large matrix applications) have fallen closer to the Livermore Loops comparisons than to the Linpack speed comparisons. This is, of course, application dependent to some extent, but it offers some guidelines for evaluating compiler and workstation effectiveness for crunching.
*****************************
Single prec. Linpack results
*****************************
Summary:
                f2c/cc                       f77
  NeXT 040    1.78 Mflops  24.1 s u+s     1.59 Mflops  26.9 s u+s
  DEC 3100    1.46 Mflops  29.5 s u+s     2.30 Mflops  18.6 s u+s
  DEC 5000    2.41 Mflops  17.9 s u+s     4.43 Mflops   9.7 s u+s

Assumptions:  NeXT 040 slab,  8 Mb memory
              DEC 3100,      16 Mb
              DEC 5000,      32 Mb
The Linpack benchmark was set up for 400x400 matrices for all runs.
Same code for all runs (FloatConvert conversion of .c timing call with Absoft)
f2c translated code compiled for each system with full cc optimization
f77 set for full optimization
For reference the DEC 3100 runs roughly 10x a microVAX with VMS F77
=======================================================================
Sample cases:
NeXT 040 f2c/cc
norm. resid resid machep x(1) x(n)
6.21577024E+00 5.92894154E-04 1.19209290E-07 9.99943674E-01 1.00004315E+00
times are reported for matrices of order 400
factor solve total mflops unit ratio
2.395E+01 1.833E-01 2.413E+01 1.781E+00 1.123E+00 4.310E+02
26.575u 0.378s 0:27.32 98% 0+0k 9+0io 0pf+0w
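For readers unfamiliar with the columns: "norm. resid" is the solve residual scaled by the problem size, the matrix and solution norms, and machine epsilon ("machep"); values of order 1-10 indicate a sound solve. A sketch of the standard Linpack check (argument names are mine):

```c
/* Normalized residual as the Linpack driver reports it:
 * ||A*x - b|| scaled by n, ||A||, ||x||, and machine epsilon.
 * A result of order 1-10 means the factorization/solve is sound. */
float normalized_residual(int n, float resid, float anorm,
                          float xnorm, float machep)
{
    return resid / ((float)n * anorm * xnorm * machep);
}
```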
NeXT 040 Absoft F77
norm. resid resid machep x(1) x(n)
6.21577020E+00 5.92894200E-04 1.19209300E-07 9.99943700E-01 1.00004310E+00
times are reported for matrices of order 400
factor solve total mflops unit ratio
2.668E+01 2.350E-01 2.691E+01 1.597E+00 1.252E+00 4.806E+02
26.924u 1.825s 0:29.12 98% 0+0k 0+0io 0pf+0w
DEC 3100 f2c/cc
norm. resid resid machep x(1) x(n)
5.78647947E+00 5.51885576E-04 1.19209290E-07 9.99983013E-01 1.00003326E+00
times are reported for matrices of order 400
factor solve total mflops unit ratio
2.925E+01 2.833E-01 2.953E+01 1.456E+00 1.374E+00 5.274E+02
31.6u 0.3s 0:32 98% 103+1663k 0+0io 12pf+0w
DEC 3100 f77
norm. resid resid machep x(1) x(n)
6.19695282E+00 5.91099262E-04 1.19209290E-07 9.99943674E-01 1.00004315E+00
times are reported for matrices of order 400
factor solve total mflops unit ratio
1.848E+01 1.719E-01 1.865E+01 2.305E+00 8.676E-01 3.330E+02
20.4u 0.1s 0:21 97% 205+1688k 1+0io 14pf+0w
DEC 5000 f2c/cc
norm. resid resid machep x(1) x(n)
5.78647947E+00 5.51885576E-04 1.19209290E-07 9.99983013E-01 1.00003326E+00
times are reported for matrices of order 400
factor solve total mflops unit ratio
1.777E+01 1.333E-01 1.790E+01 2.401E+00 8.328E-01 3.196E+02
19.3u 0.1s 0:20 96% 30+415k 0+0io 8pf+0w
DEC 5000 f77
norm. resid resid machep x(1) x(n)
6.19695282E+00 5.91099262E-04 1.19209290E-07 9.99943674E-01 1.00004315E+00
times are reported for matrices of order 400
factor solve total mflops unit ratio
9.632E+00 7.421E-02 9.706E+00 4.429E+00 4.516E-01 1.733E+02
10.7u 0.0s 0:11 97% 49+420k 1+0io 12pf+0w
For further information, mail me at guppy@henry.mit.edu
Hal Youngren