[comp.sys.next] Benchmarking of NeXT - Livermore Loops

guppy@henry.mit.edu (Harold Youngren) (03/10/91)

Enclosed are benchmarking results comparing the NeXT and the DEC MIPS machines
on the standard Livermore Loops benchmark. For comparison, both the f2c/cc
translator and the native f77 compiler were used on each machine.
The latest Ver. 3 Absoft compiler was used for the f77 on the NeXT
(they tell me the full 040 port is still several months off).

The Livermore Loops benchmark times computations on 14 different loops, each
representing a particular class of frequently encountered numerical problem.
Each loop is timed separately to generate its Mflops estimate.
Initialization is done outside the timing loops. The idea is that the
composite Mflops rating is a good estimate of machine performance for
general number crunching.  Specialized applications, e.g. the Linpack
benchmark, may run faster. The difficulty with the tests is in timing the
loops: a fast clock is necessary to eliminate "granularity" in the times.
Normally several runs are averaged to give more consistent results.
None of the loops involves transcendental computations; these would be
considerably slower on the NeXT with the current math libraries.
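
As a rough illustration of the timing scheme described above (this is not the
benchmark's own code, which is FORTRAN), here is a minimal Python sketch. The
names mflops and time_kernel are made up for the example. It repeats a kernel
many times so that the total time is long compared with the clock resolution,
then divides the operation count by the elapsed microseconds.

```python
import time

def mflops(flops, time_usec):
    """MegaFLOP rate: floating point operations per microsecond."""
    return flops / time_usec

def time_kernel(kernel, repeats=100):
    """Total wall-clock time in microseconds for `repeats` calls of kernel().

    Repeating the kernel stretches the measured interval so that a coarse
    clock contributes a smaller relative error.
    """
    t0 = time.perf_counter()
    for _ in range(repeats):
        kernel()
    return (time.perf_counter() - t0) * 1e6

# Loop 1 of the NeXT f2c/cc sample run: 200000 FLOPS in 104481.35 usec.
print(f"{mflops(200000, 104481.35):.2f}")  # prints 1.91
```

The printed value reproduces the 1.91 entry for loop 1 in the NeXT f2c/cc
sample run.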

The results are summarized below and sample runs are given for each
compiler/processor combination. Note in particular that on the NeXT the
f2c/cc combination performs as well as the current Absoft compiler in Mflops
and better in terms of total time.  The Absoft compiler also performs
particularly badly on loops 13 and 14 of the benchmark (see below); this has
been brought to their attention.  These results closely match my experience
with porting numeric codes to the NeXT: I have found that f2c/cc consistently
outperforms the current native f77 compiler, in extreme cases by as much as 30%.
Note that for the MIPS machines the native compiler is roughly 50% faster
than the f2c/cc translated code in terms of raw Mflops.


*************************
Livermore Loops results
*************************
Summary:
 
                          f2c/cc                         f77
 NeXT 040         1.27Mflops  23.2sec u+s      1.25Mflops  26.6sec u+s
 
 DEC 3100         2.41Mflops  17.6sec u+s      3.76Mflops  15.7sec u+s
 
 DEC 5000         3.63Mflops   9.1sec u+s      5.61Mflops   8.1sec u+s

Assumptions:  NeXT  040 slab  8Mb memory
              DEC 3100       16Mb
              DEC 5000       32Mb

Times and Mflops are averaged results for 5 runs.
Same code for all runs (FloatConvert conversion of .c timing call with Absoft)
f2c compiled for each system with full cc optimization
f77 set for full optimization.

The latest version of the AT&T f2c code was used, not the old archive version.
The Loops benchmark is not a huge storage hog, only 40k R*4 storage is used.

For reference, the DEC 3100 runs roughly 10x as fast as a microVAX with VMS FORTRAN.
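
The compiler comparisons quoted in the text can be checked directly from the
summary table. The short Python sketch below does the arithmetic; the
dictionary layout and the function name compare are just choices made for this
example.

```python
# Summary table: machine -> ((f2c Mflops, f2c u+s sec), (f77 Mflops, f77 u+s sec))
summary = {
    "NeXT 040": ((1.27, 23.2), (1.25, 26.6)),
    "DEC 3100": ((2.41, 17.6), (3.76, 15.7)),
    "DEC 5000": ((3.63, 9.1), (5.61, 8.1)),
}

def compare(table):
    """Return machine -> (f77/f2c Mflops ratio, f77/f2c total-time ratio)."""
    result = {}
    for machine, ((f2c_mf, f2c_sec), (f77_mf, f77_sec)) in table.items():
        result[machine] = (f77_mf / f2c_mf, f77_sec / f2c_sec)
    return result

for machine, (mf_ratio, time_ratio) in compare(summary).items():
    print(f"{machine}: f77/f2c Mflops = {mf_ratio:.2f}, "
          f"f77/f2c time = {time_ratio:.2f}")
```

On these numbers the native f77 is about 1.56x (DEC 3100) and 1.55x (DEC 5000)
the f2c/cc Mflops rate, while on the NeXT the two are within 2% in Mflops and
the Absoft run takes about 15% longer in total u+s time.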

=======================================================================
Sample cases:


NeXT 040 Slab f2c/cc 
 MFLOPS - Livermore loops Timing Tests.

 All times are in microseconds = seconds/1,000,000
 Clock overhead =    39.8300     usec
 Each loop was repeated    100 times for accuracy.

 Each loop represents a common type of calculation.
 The checksums are for debugging only.
 The FLOPS column counts the number of floating point
 operations carried out in the loop.
 The TIME column counts the total time in the loop.
 MFLOPS counts the corresponding MegaFLOP rate.

 Loop         Checksum        FLOPS       TIME     MegaFLOPS

   1      .811987050000E+07    200000 104481.35      1.91
   2      .356310577393E+03    200000 124524.37      1.61
   3      .356312805176E+03    200000 157142.17      1.27
   4     -.402412578125E+05    102000 103119.39       .99
   5      .136579046875E+06    200000 153934.02      1.30
   6      .419716531250E+06    200000 153476.73      1.30
   7      .429449800000E+07    192000  76445.12      2.51
   8      .314064437500E+06    144000  77781.22      1.85
   9      .182709000000E+07    170000  92209.35      1.84
  10     -.140415344000E+09     90000 116250.53       .77
  11      .374892352000E+09    100000 129634.40       .77
  12      .000000000000E+00    100000 123457.45       .81
  13      .171449062500E+06     89600 239869.61       .37
  14     -.508451750000E+07    165000 204294.70       .81

 Average MegaFLOPS=    1.29
16.911u 6.368s 0:24.61 94% 0+0k 11+0io 0pf+0w


NeXT 040 Slab Absoft F77
 MFLOPS - Livermore loops Timing Tests.

 All times are in microseconds = seconds/1,000,000
 Clock overhead =    40.2475     usec
 Each loop was repeated    100 times for accuracy.

 Each loop represents a common type of calculation.
 The checksums are for debugging only.
 The FLOPS column counts the number of floating point
 operations carried out in the loop.
 The TIME column counts the total time in the loop.
 MFLOPS counts the corresponding MegaFLOP rate.

 Loop         Checksum        FLOPS       TIME     MegaFLOPS

   1     0.811987000000E+07    200000  89263.05      2.24
   2     0.356309930000E+03    200000 110413.40      1.81
   3     0.356310100000E+03    200000 116882.42      1.71
   4    -0.402411910000E+05    102000 110168.92       .93
   5     0.136579030000E+06    200000 171979.90      1.16
   6     0.419716530000E+06    200000 173243.80      1.15
   7     0.429449800000E+07    192000  80450.76      2.39
   8     0.314064400000E+06    144000  80094.56      1.80
   9     0.182709000000E+07    170000  87512.72      1.94
  10    -0.140415200000E+09     90000  91792.80       .98
  11     0.374892200000E+09    100000 118482.33       .84
  12     0.000000000000E+00    100000 102852.60       .97
  13     0.171449000000E+06     89600 4690539.00       .02
  14    -0.508451700000E+07    165000 2839298.00       .06

 Average MegaFLOPS=    1.29
13.459u 13.210s 0:28.08 94% 0+0k 0+0io 0pf+0w
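
The "Average MegaFLOPS" line appears to be a plain arithmetic mean of the 14
per-loop rates, which is why the Absoft figure matches f2c/cc even though
loops 13 and 14 are far slower. The Python sketch below recomputes it from the
Absoft table, taking loops 13 and 14 as 89600 FLOPS in 4690539.00 usec and
165000 FLOPS in 2839298.00 usec, and compares it with an aggregate rate (total
FLOPS over total time). The aggregate rate is my own addition for
illustration; the benchmark does not print it.

```python
# (FLOPS, TIME in usec) for loops 1..14, NeXT 040 Absoft F77 sample run.
absoft = [
    (200000, 89263.05), (200000, 110413.40), (200000, 116882.42),
    (102000, 110168.92), (200000, 171979.90), (200000, 173243.80),
    (192000, 80450.76), (144000, 80094.56), (170000, 87512.72),
    (90000, 91792.80), (100000, 118482.33), (100000, 102852.60),
    (89600, 4690539.00), (165000, 2839298.00),
]

def mean_rate(loops):
    """Arithmetic mean of per-loop MFLOPS: every loop weighted equally."""
    return sum(f / t for f, t in loops) / len(loops)

def aggregate_rate(loops):
    """Total FLOPS over total time: slow loops weighted by the time they take."""
    return sum(f for f, _ in loops) / sum(t for _, t in loops)

print(f"mean of per-loop rates : {mean_rate(absoft):.2f}")
print(f"aggregate rate         : {aggregate_rate(absoft):.2f}")
print(f"aggregate, loops 1-12  : {aggregate_rate(absoft[:12]):.2f}")
```

The mean comes out at about 1.29, matching the printed average, while the
aggregate rate over all 14 loops is only about 0.24 because loops 13 and 14
account for most of the timed interval.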


DEC 3100  f2c/cc
 MFLOPS - Livermore loops Timing Tests.

 All times are in microseconds = seconds/1,000,000
 Clock overhead =    53.7075     usec
 Each loop was repeated    100 times for accuracy.

 Each loop represents a common type of calculation.
 The checksums are for debugging only.
 The FLOPS column counts the number of floating point
 operations carried out in the loop.
 The TIME column counts the total time in the loop.
 MFLOPS counts the corresponding MegaFLOP rate.

 Loop         Checksum        FLOPS       TIME     MegaFLOPS

   1      .811987100000E+07    200000  72749.35      2.75
   2      .356310577393E+03    200000  57125.32      3.50
   3      .356312805176E+03    200000  72749.24      2.75
   4     -.402412617188E+05    102000  53219.19      1.92
   5      .136579031250E+06    200000  33689.37      5.94
   6      .419716562500E+06    200000  96185.08      2.08
   7      .429449850000E+07    192000  61031.69      3.15
   8      .314064437500E+06    144000  41500.91      3.47
   9      .182709000000E+07    170000  53220.62      3.19
  10     -.140415344000E+09     90000  49314.37      1.83
  11      .374892384000E+09    100000  76656.21      1.30
  12      .000000000000E+00    100000  68843.71      1.45
  13      .171449062500E+06     89600 100090.37       .90
  14     -.508451800000E+07    165000 119619.23      1.38

 Average MegaFLOPS=    2.54
15.2u 2.4s 0:17 99% 110+184k 0+0io 3pf+0w
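
The clock "granularity" mentioned earlier seems to be visible in the DEC 3100
f2c/cc times above: the differences between loops fall very close to whole
multiples of about 3906 usec. The Python sketch below checks this against a
tick of 1/256 second. That tick length is an assumption inferred from the
printed numbers, not something stated by the benchmark or by DEC
documentation I have checked.

```python
# TIME column (usec) from the DEC 3100 f2c/cc sample run, loops 1..14.
times_usec = [
    72749.35, 57125.32, 72749.24, 53219.19, 33689.37, 96185.08, 61031.69,
    41500.91, 53220.62, 49314.37, 76656.21, 68843.71, 100090.37, 119619.23,
]

# Hypothetical clock tick of 1/256 s, inferred from the data (an assumption).
TICK_USEC = 1e6 / 256

def ticks_from_fastest(times, tick):
    """Each time, measured in ticks above the fastest loop."""
    base = min(times)
    return [(t - base) / tick for t in times]

def max_deviation(times, tick):
    """Largest distance of any tick count from the nearest whole number."""
    return max(abs(x - round(x)) for x in ticks_from_fastest(times, tick))

for loop, x in enumerate(ticks_from_fastest(times_usec, TICK_USEC), start=1):
    print(f"loop {loop:2d}: {x:7.3f} ticks above the fastest loop")
print(f"max deviation from a whole tick: {max_deviation(times_usec, TICK_USEC):.4f}")
```

If the tick really is about 3.9 msec, a one-tick error on the fastest loop
(about 33.7 msec) would be roughly 12%, which is one reason to average
several runs.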


DEC 3100  f77
 MFLOPS - Livermore loops Timing Tests.

 All times are in microseconds = seconds/1,000,000
 Clock overhead =    40.5860     usec
 Each loop was repeated    100 times for accuracy.

 Each loop represents a common type of calculation.
 The checksums are for debugging only.
 The FLOPS column counts the number of floating point
 operations carried out in the loop.
 The TIME column counts the total time in the loop.
 MFLOPS counts the corresponding MegaFLOP rate.

 Loop         Checksum        FLOPS       TIME     MegaFLOPS

   1     0.811987050000E+07    200000  46719.43      4.28
   2     0.356312805176E+03    200000  54531.45      3.67
   3     0.356312805176E+03    200000  54531.45      3.67
   4    -0.402412578125E+05    102000  46719.55      2.18
   5     0.136579046875E+06    200000  38907.53      5.14
   6     0.419716531250E+06    200000  74061.39      2.70
   7     0.429449900000E+07    192000  31095.50      6.17
   8     0.314064437500E+06    144000  15471.22      9.31
   9     0.182709000000E+07    170000  27189.02      6.25
  10    -0.140415344000E+09     90000  38908.72      2.31
  11     0.374892352000E+09    100000  35001.99      2.86
  12     0.000000000000E+00    100000  46718.84      2.14
  13     0.171449078125E+06     89600  74060.68      1.21
  14    -0.508451850000E+07    165000  77965.98      2.12

 Average MegaFLOPS=    3.86
11.4u 4.2s 0:15 98% 131+214k 0+0io 3pf+0w


DEC 5000 f2c/cc
 MFLOPS - Livermore loops Timing Tests.

 All times are in microseconds = seconds/1,000,000
 Clock overhead =    20.2331     usec
 Each loop was repeated    100 times for accuracy.

 Each loop represents a common type of calculation.
 The checksums are for debugging only.
 The FLOPS column counts the number of floating point
 operations carried out in the loop.
 The TIME column counts the total time in the loop.
 MFLOPS counts the corresponding MegaFLOP rate.

 Loop         Checksum        FLOPS       TIME     MegaFLOPS

   1      .811987100000E+07    200000  37036.66      5.40
   2      .356310577393E+03    200000  37036.72      5.40
   3      .356312805176E+03    200000  40942.76      4.88
   4     -.402412617188E+05    102000  33130.68      3.08
   5      .136579031250E+06    200000  60472.64      3.31
   6      .419716562500E+06    200000  52660.86      3.80
   7      .429449850000E+07    192000  40942.58      4.69
   8      .314064437500E+06    144000  29224.55      4.93
   9      .182709000000E+07    170000  40942.58      4.15
  10     -.140415344000E+09     90000  17506.75      5.14
  11      .374892384000E+09    100000  48754.61      2.05
  12      .000000000000E+00    100000  48755.32      2.05
  13      .171449062500E+06     89600  76096.45      1.18
  14     -.508451800000E+07    165000  83908.47      1.97

 Average MegaFLOPS=    3.72
5.7u 3.4s 0:09 97% 33+40k 0+0io 0pf+0w

DEC 5000 f77
 MFLOPS - Livermore loops Timing Tests.

 All times are in microseconds = seconds/1,000,000
 Clock overhead =    20.7421     usec
 Each loop was repeated    100 times for accuracy.

 Each loop represents a common type of calculation.
 The checksums are for debugging only.
 The FLOPS column counts the number of floating point
 operations carried out in the loop.
 The TIME column counts the total time in the loop.
 MFLOPS counts the corresponding MegaFLOP rate.

 Loop         Checksum        FLOPS       TIME     MegaFLOPS

   1     0.811987050000E+07    200000  33079.86      6.05
   2     0.356312805176E+03    200000  36985.84      5.41
   3     0.356312805176E+03    200000  21361.74      9.36
   4    -0.402412578125E+05    102000  21361.86      4.77
   5     0.136579046875E+06    200000  33079.65      6.05
   6     0.419716531250E+06    200000  33079.89      6.05
   7     0.429449900000E+07    192000  29173.64      6.58
   8     0.314064437500E+06    144000  21361.86      6.74
   9     0.182709000000E+07    170000  29173.88      5.83
  10    -0.140415344000E+09     90000   9643.82      9.33
  11     0.374892352000E+09    100000  25267.63      3.96
  12     0.000000000000E+00    100000  52609.95      1.90
  13     0.171449078125E+06     89600  52609.95      1.70
  14    -0.508451850000E+07    165000  44797.93      3.68

 Average MegaFLOPS=    5.53
5.2u 2.9s 0:08 97% 32+53k 0+0io 3pf+0w

For further information or a copy of the benchmark, mail to me at
guppy@henry.mit.edu

                      Hal Youngren 
                      MIT Aero/Astro CFD Lab