[comp.sys.next] Benchmarking Linpack on the NeXT

guppy@henry.mit.edu (Harold Youngren) (03/10/91)

Here is yet another benchmark for the speed demons out there :-) -> the
single precision Linpack.

This note compares timings and Mflops for the NeXT and MIPS machines
for the standard single precision Linpack benchmark.  The Linpack code
was built and run with both the f2c/cc route and the native f77 compilers.
The latest Absoft Ver. 3 compiler was used for the f77 on the NeXT.
Note that Absoft says its full 040 port is still several months off.

The Linpack is a matrix factorization and back-substitution benchmark
that measures a machine's performance on pure linear system solution;
no transcendentals are involved.  The Mflop rating it gives is computed
by estimating the operation count for the matrix operations and dividing
by the total time, the sum of user+system time.  This makes it much less
dependent on high resolution timing routines than the Livermore Loops
benchmark.
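
For reference, the operation count the benchmark uses for an n x n
factor+solve is the usual 2/3 n^3 + 2 n^2 flops.  A little C sketch of
the rating calculation (numbers plugged in from the NeXT 040 f2c/cc run
below; this is my paraphrase, not the benchmark source):

/* Linpack Mflop rating: estimated flop count divided by total time. */
#include <stdio.h>

int main(void)
{
    double n      = 400.0;      /* matrix order used in these runs      */
    double total  = 24.13;      /* total (factor+solve) time in seconds */
    double ops    = (2.0 * n * n * n) / 3.0 + 2.0 * n * n;
    double mflops = ops / (total * 1.0e6);

    printf("ops = %.3e  Mflops = %.2f\n", ops, mflops);   /* ~1.78 */
    return 0;
}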

The results are summarized below and sample runs are given for each
compiler/processor combination.  As with my previous benchmarking note,
the f2c/cc route beats the current Absoft compiler in both Mflops and
total time.  The native compilers for the MIPS machines were 50%-80%
faster than the f2c/cc route.

The comparison of the Livermore Loops and Linpack results illustrates
the pitfalls in benchmarking.  The NeXT is cast in a much more favorable
light by the Linpack.  The current (mostly 030) port of the Absoft
compiler suffers in this benchmark, showing poorer performance than the
translated C code.  I find this surprising, as the Linpack is all linear
algebra, depends only on multiply/add/divide performance, and should not
lean heavily on the currently (un)optimized math libraries.  Perhaps
this does indicate that the GNU C compiler is pretty good at
optimization.
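
One more note on what the benchmark actually exercises: essentially all
of the factorization time is spent in a saxpy-style inner loop
(y = y + a*x), so the compilers are really being compared on how well
they handle a loop like the following.  This is a rough C equivalent
for illustration, not the actual f2c output:

/* Single precision axpy kernel that dominates the Linpack factor time. */
void saxpy(int n, float a, float *x, float *y)
{
    int i;

    if (a == 0.0f)              /* nothing to do for a zero multiplier */
        return;
    for (i = 0; i < n; i++)     /* y <- y + a*x, one mult and one add  */
        y[i] += a * x[i];
}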
 
In my experience the Linpack is a relatively poor indicator of overall
machine floating point performance.  Timings for ALL of my programs
(fluid dynamics, structural, and large matrix applications) have fallen
closer to the Livermore Loops comparisons than to the Linpack speed
comparisons.  This is, of course, application dependent to some extent,
but it gives some guidelines for evaluating compiler and workstation
effectiveness for crunching.


*****************************
Single prec. Linpack results
*****************************
Summary:
 
                          f2c/cc                         f77
 NeXT 040         1.78Mflops   24.1sec u+s      1.59Mflops  26.9sec u+s
 
 DEC 3100         1.46Mflops   29.5sec u+s       2.3Mflops  18.6sec u+s
 
 DEC 5000         2.41Mflops   17.9sec u+s      4.43Mflops   9.7sec u+s

Configurations:  NeXT 040 slab   8Mb memory
                 DEC 3100       16Mb
                 DEC 5000       32Mb

The Linpack benchmark was set up for 400x400 matrices for all runs.
The same code was used for all runs (with a FloatConvert conversion of
the .c timing call for the Absoft runs; a sketch of such a timing
routine is given below).
The f2c translated code was compiled for each system with full cc
optimization; f77 was also set for full optimization.
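
The timing call itself is nothing exotic: the Fortran source just wants
a SECOND()-style routine returning user + system CPU time.  A sketch of
one way to write it in C (my guess at the idea, not the exact routine
used for these runs):

/* Return user + system CPU time in seconds.  The Fortran side calls a
   SECOND()-style function; the exact name mangling depends on the
   compiler (f2c, Absoft, etc.).  Sketch only. */
#include <sys/time.h>
#include <sys/resource.h>

float second(void)
{
    struct rusage ru;

    getrusage(RUSAGE_SELF, &ru);
    return (float)(ru.ru_utime.tv_sec  + ru.ru_stime.tv_sec)
         + (float)(ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) * 1.0e-6f;
}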

For reference, the DEC 3100 runs roughly 10x faster than a microVAX
with VMS F77.

=======================================================================
Sample cases:

NeXT 040 f2c/cc

     norm. resid      resid           machep         x(1)          x(n)
  6.21577024E+00  5.92894154E-04  1.19209290E-07  9.99943674E-01 1.00004315E+00

    times are reported for matrices of order   400
      factor     solve      total     mflops       unit      ratio
  2.395E+01  1.833E-01  2.413E+01  1.781E+00  1.123E+00  4.310E+02

26.575u 0.378s 0:27.32 98% 0+0k 9+0io 0pf+0w


NeXT 040 Absoft F77

     norm. resid      resid           machep         x(1)          x(n)
  6.21577020E+00  5.92894200E-04  1.19209300E-07  9.99943700E-01 1.00004310E+00

    times are reported for matrices of order   400
      factor     solve      total     mflops       unit      ratio
  2.668E+01  2.350E-01  2.691E+01  1.597E+00  1.252E+00  4.806E+02

26.924u 1.825s 0:29.12 98% 0+0k 0+0io 0pf+0w



DEC 3100  f2c/cc

     norm. resid      resid           machep         x(1)          x(n)
  5.78647947E+00  5.51885576E-04  1.19209290E-07  9.99983013E-01 1.00003326E+00

    times are reported for matrices of order   400
      factor     solve      total     mflops       unit      ratio
  2.925E+01  2.833E-01  2.953E+01  1.456E+00  1.374E+00  5.274E+02

31.6u 0.3s 0:32 98% 103+1663k 0+0io 12pf+0w


DEC 3100  f77

     norm. resid      resid           machep         x(1)          x(n)
  6.19695282E+00  5.91099262E-04  1.19209290E-07  9.99943674E-01 1.00004315E+00

    times are reported for matrices of order   400
      factor     solve      total     mflops       unit      ratio
  1.848E+01  1.719E-01  1.865E+01  2.305E+00  8.676E-01  3.330E+02

20.4u 0.1s 0:21 97% 205+1688k 1+0io 14pf+0w



DEC 5000  f2c/cc

     norm. resid      resid           machep         x(1)          x(n)
  5.78647947E+00  5.51885576E-04  1.19209290E-07  9.99983013E-01 1.00003326E+00

    times are reported for matrices of order   400
      factor     solve      total     mflops       unit      ratio
  1.777E+01  1.333E-01  1.790E+01  2.401E+00  8.328E-01  3.196E+02

19.3u 0.1s 0:20 96% 30+415k 0+0io 8pf+0w


DEC 5000  f77

     norm. resid      resid           machep         x(1)          x(n)
  6.19695282E+00  5.91099262E-04  1.19209290E-07  9.99943674E-01 1.00004315E+00


    times are reported for matrices of order   400
      factor     solve      total     mflops       unit      ratio
  9.632E+00  7.421E-02  9.706E+00  4.429E+00  4.516E-01  1.733E+02

10.7u 0.0s 0:11 97% 49+420k 1+0io 12pf+0w



For further information, mail me at guppy@henry.mit.edu

					Hal Youngren