guppy@henry.mit.edu (Harold Youngren) (03/10/91)
Enclosed are benchmarking results comparing the NeXT and the DEC MIPS machines on the standard Livermore Loops benchmark. For comparison purposes, both the f2c/cc translator and the native f77 compiler were used on each machine. The latest Ver. 3 Absoft compiler was used for f77 on the NeXT (they tell me the full 040 port is still several months off).

The Livermore Loops benchmark times computations on 14 different loops, each representing a particular class of frequently encountered numerical problem. Each loop is timed separately to generate its Mflops estimate. Initialization is done outside the timing loops. The idea is that the composite Mflops rating is a good estimate of machine performance for general number crunching. Specialized applications, e.g. the Linpack benchmark, may run faster.

The problem with the tests is in the timing of the loops: a fast clock is necessary to eliminate "granularity" in the times. Normally several runs are averaged to give more consistent results. None of the loops involves transcendental computations; these would be considerably slower on the NeXT with the current math libraries.

The results are summarized below and sample runs are given for each compiler/processor combination. Note in particular that f2c/cc performs as well as the current Absoft compiler in Mflops and better in terms of total time. The Absoft compiler also performs particularly badly on loops 13 and 14 of the benchmark (see below); this has been brought to their attention. These results closely match my experience with porting numeric codes to the NeXT: I have found that f2c/cc consistently outperforms the current native f77 compiler, in extreme cases by as much as 30%. Note that on the MIPS machines the native compiler is roughly 50% faster than the f2c/cc translated code in terms of raw Mflops.
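As a rough illustration of how the per-loop and composite ratings relate, here is a small Python sketch. It is not taken from the benchmark source: the function names are made up for this example, and the formulas (FLOPS divided by TIME in microseconds, then an unweighted mean over the 14 loops) are inferred from the printed numbers. The input data are the FLOPS and TIME columns of the NeXT 040 f2c/cc sample run listed in the sample cases.

```python
# Sketch only: names and formulas are assumptions inferred from the
# printed output, not copied from the Livermore Loops source.

def mflops(flops, time_usec):
    """Operations per microsecond is numerically the MFLOPS rate."""
    return flops / time_usec

def composite(rates):
    """Unweighted arithmetic mean of the per-loop rates."""
    return sum(rates) / len(rates)

# (FLOPS, TIME in microseconds) for loops 1-14, NeXT 040 f2c/cc sample run.
next_f2c = [
    (200000, 104481.35), (200000, 124524.37), (200000, 157142.17),
    (102000, 103119.39), (200000, 153934.02), (200000, 153476.73),
    (192000, 76445.12), (144000, 77781.22), (170000, 92209.35),
    (90000, 116250.53), (100000, 129634.40), (100000, 123457.45),
    (89600, 239869.61), (165000, 204294.70),
]

rates = [mflops(f, t) for f, t in next_f2c]
for i, r in enumerate(rates, start=1):
    print(f"loop {i:2d}: {r:5.2f} MFLOPS")
print(f"average: {composite(rates):.2f} MFLOPS")
```

Under these assumptions the sketch gives 1.91 for loop 1 and 1.29 for the average, the same figures the benchmark prints for that run. Because the mean is unweighted, one or two very slow loops lower the composite only slightly.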
************************* Livermore Loops results *************************

Summary:
                     f2c/cc                       f77
NeXT 040    1.27Mflops  23.2sec u+s    1.25Mflops  26.6sec u+s
DEC 3100    2.41Mflops  17.6sec u+s    3.76Mflops  15.7sec u+s
DEC 5000    3.63Mflops   9.1sec u+s    5.61Mflops   8.1sec u+s

Assumptions:
  NeXT 040 slab   8Mb memory
  DEC 3100       16Mb
  DEC 5000       32Mb

  Times and Mflops are averaged results for 5 runs.
  Same code for all runs (FloatConvert conversion of .c timing call with Absoft)
  f2c compiled for each system with full cc optimization
  f77 set for full optimization.
  The latest version of the att f2c code was used, not the old archive version.
  The Loops benchmark is not a huge storage hog, only 40k R*4 storage is used.
  For reference the DEC 3100 runs roughly 10x a microVAX with VMS FORTRAN

=======================================================================
Sample cases:

NeXT 040 Slab f2c/cc

MFLOPS - Livermore loops Timing Tests.
All times are in microseconds = seconds/1,000,000
Clock overhead = 39.8300 usec
Each loop was repeated 100 times for accuracy.
Each loop represents a common type of calculation.
The checksums are for debugging only.
The FLOPS column counts the number of floating point operations carried out in the loop.
The TIME column counts the total time in the loop.
MFLOPS counts the corresponding MegaFLOP rate.
Loop             Checksum   FLOPS        TIME  MegaFLOPS
   1    .811987050000E+07  200000   104481.35       1.91
   2    .356310577393E+03  200000   124524.37       1.61
   3    .356312805176E+03  200000   157142.17       1.27
   4   -.402412578125E+05  102000   103119.39        .99
   5    .136579046875E+06  200000   153934.02       1.30
   6    .419716531250E+06  200000   153476.73       1.30
   7    .429449800000E+07  192000    76445.12       2.51
   8    .314064437500E+06  144000    77781.22       1.85
   9    .182709000000E+07  170000    92209.35       1.84
  10   -.140415344000E+09   90000   116250.53        .77
  11    .374892352000E+09  100000   129634.40        .77
  12    .000000000000E+00  100000   123457.45        .81
  13    .171449062500E+06   89600   239869.61        .37
  14   -.508451750000E+07  165000   204294.70        .81
Average MegaFLOPS= 1.29

16.911u 6.368s 0:24.61 94% 0+0k 11+0io 0pf+0w

NeXT 040 Slab Absoft F77

MFLOPS - Livermore loops Timing Tests.
All times are in microseconds = seconds/1,000,000
Clock overhead = 40.2475 usec
Each loop was repeated 100 times for accuracy.
Each loop represents a common type of calculation.
The checksums are for debugging only.
The FLOPS column counts the number of floating point operations carried out in the loop.
The TIME column counts the total time in the loop.
MFLOPS counts the corresponding MegaFLOP rate.

Loop             Checksum   FLOPS        TIME  MegaFLOPS
   1   0.811987000000E+07  200000    89263.05       2.24
   2   0.356309930000E+03  200000   110413.40       1.81
   3   0.356310100000E+03  200000   116882.42       1.71
   4  -0.402411910000E+05  102000   110168.92        .93
   5   0.136579030000E+06  200000   171979.90       1.16
   6   0.419716530000E+06  200000   173243.80       1.15
   7   0.429449800000E+07  192000    80450.76       2.39
   8   0.314064400000E+06  144000    80094.56       1.80
   9   0.182709000000E+07  170000    87512.72       1.94
  10  -0.140415200000E+09   90000    91792.80        .98
  11   0.374892200000E+09  100000   118482.33        .84
  12   0.000000000000E+00  100000   102852.60        .97
  13   0.171449000000E+06   89600  4690539.00        .02
  14  -0.508451700000E+07  165000  2839298.00        .06
Average MegaFLOPS= 1.29

13.459u 13.210s 0:28.08 94% 0+0k 0+0io 0pf+0w

DEC 3100 f2c/cc

MFLOPS - Livermore loops Timing Tests.
All times are in microseconds = seconds/1,000,000
Clock overhead = 53.7075 usec
Each loop was repeated 100 times for accuracy.
Each loop represents a common type of calculation.
The checksums are for debugging only.
The FLOPS column counts the number of floating point operations carried out in the loop.
The TIME column counts the total time in the loop.
MFLOPS counts the corresponding MegaFLOP rate.

Loop             Checksum   FLOPS        TIME  MegaFLOPS
   1    .811987100000E+07  200000    72749.35       2.75
   2    .356310577393E+03  200000    57125.32       3.50
   3    .356312805176E+03  200000    72749.24       2.75
   4   -.402412617188E+05  102000    53219.19       1.92
   5    .136579031250E+06  200000    33689.37       5.94
   6    .419716562500E+06  200000    96185.08       2.08
   7    .429449850000E+07  192000    61031.69       3.15
   8    .314064437500E+06  144000    41500.91       3.47
   9    .182709000000E+07  170000    53220.62       3.19
  10   -.140415344000E+09   90000    49314.37       1.83
  11    .374892384000E+09  100000    76656.21       1.30
  12    .000000000000E+00  100000    68843.71       1.45
  13    .171449062500E+06   89600   100090.37        .90
  14   -.508451800000E+07  165000   119619.23       1.38
Average MegaFLOPS= 2.54

15.2u 2.4s 0:17 99% 110+184k 0+0io 3pf+0w

DEC 3100 f77

MFLOPS - Livermore loops Timing Tests.
All times are in microseconds = seconds/1,000,000
Clock overhead = 40.5860 usec
Each loop was repeated 100 times for accuracy.
Each loop represents a common type of calculation.
The checksums are for debugging only.
The FLOPS column counts the number of floating point operations carried out in the loop.
The TIME column counts the total time in the loop.
MFLOPS counts the corresponding MegaFLOP rate.
Loop             Checksum   FLOPS        TIME  MegaFLOPS
   1   0.811987050000E+07  200000    46719.43       4.28
   2   0.356312805176E+03  200000    54531.45       3.67
   3   0.356312805176E+03  200000    54531.45       3.67
   4  -0.402412578125E+05  102000    46719.55       2.18
   5   0.136579046875E+06  200000    38907.53       5.14
   6   0.419716531250E+06  200000    74061.39       2.70
   7   0.429449900000E+07  192000    31095.50       6.17
   8   0.314064437500E+06  144000    15471.22       9.31
   9   0.182709000000E+07  170000    27189.02       6.25
  10  -0.140415344000E+09   90000    38908.72       2.31
  11   0.374892352000E+09  100000    35001.99       2.86
  12   0.000000000000E+00  100000    46718.84       2.14
  13   0.171449078125E+06   89600    74060.68       1.21
  14  -0.508451850000E+07  165000    77965.98       2.12
Average MegaFLOPS= 3.86

11.4u 4.2s 0:15 98% 131+214k 0+0io 3pf+0w

DEC 5000 f2c/cc

MFLOPS - Livermore loops Timing Tests.
All times are in microseconds = seconds/1,000,000
Clock overhead = 20.2331 usec
Each loop was repeated 100 times for accuracy.
Each loop represents a common type of calculation.
The checksums are for debugging only.
The FLOPS column counts the number of floating point operations carried out in the loop.
The TIME column counts the total time in the loop.
MFLOPS counts the corresponding MegaFLOP rate.

Loop             Checksum   FLOPS        TIME  MegaFLOPS
   1    .811987100000E+07  200000    37036.66       5.40
   2    .356310577393E+03  200000    37036.72       5.40
   3    .356312805176E+03  200000    40942.76       4.88
   4   -.402412617188E+05  102000    33130.68       3.08
   5    .136579031250E+06  200000    60472.64       3.31
   6    .419716562500E+06  200000    52660.86       3.80
   7    .429449850000E+07  192000    40942.58       4.69
   8    .314064437500E+06  144000    29224.55       4.93
   9    .182709000000E+07  170000    40942.58       4.15
  10   -.140415344000E+09   90000    17506.75       5.14
  11    .374892384000E+09  100000    48754.61       2.05
  12    .000000000000E+00  100000    48755.32       2.05
  13    .171449062500E+06   89600    76096.45       1.18
  14   -.508451800000E+07  165000    83908.47       1.97
Average MegaFLOPS= 3.72

5.7u 3.4s 0:09 97% 33+40k 0+0io 0pf+0w

DEC 5000 f77

MFLOPS - Livermore loops Timing Tests.
All times are in microseconds = seconds/1,000,000
Clock overhead = 20.7421 usec
Each loop was repeated 100 times for accuracy.
Each loop represents a common type of calculation.
The checksums are for debugging only.
The FLOPS column counts the number of floating point operations carried out in the loop.
The TIME column counts the total time in the loop.
MFLOPS counts the corresponding MegaFLOP rate.

Loop             Checksum   FLOPS        TIME  MegaFLOPS
   1   0.811987050000E+07  200000    33079.86       6.05
   2   0.356312805176E+03  200000    36985.84       5.41
   3   0.356312805176E+03  200000    21361.74       9.36
   4  -0.402412578125E+05  102000    21361.86       4.77
   5   0.136579046875E+06  200000    33079.65       6.05
   6   0.419716531250E+06  200000    33079.89       6.05
   7   0.429449900000E+07  192000    29173.64       6.58
   8   0.314064437500E+06  144000    21361.86       6.74
   9   0.182709000000E+07  170000    29173.88       5.83
  10  -0.140415344000E+09   90000     9643.82       9.33
  11   0.374892352000E+09  100000    25267.63       3.96
  12   0.000000000000E+00  100000    52609.95       1.90
  13   0.171449078125E+06   89600    52609.95       1.70
  14  -0.508451850000E+07  165000    44797.93       3.68
Average MegaFLOPS= 5.53

5.2u 2.9s 0:08 97% 32+53k 0+0io 3pf+0w

For further information or a copy of the benchmark, mail to me at
guppy@henry.mit.edu

Hal Youngren
MIT Aero/Astro CFD Lab