mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (03/19/91)
Here is a table with some info I have derived from Jack Dongarra's latest LINPACK report. The important columns are the last two, giving the performance/price ratio in MFLOPS per million dollars for two different MFLOPS estimators:

(1) the optimized LINPACK 1000x1000 case; and
(2) the memory-bandwidth-limited MFLOPS rate for long dyadic vector operations.

The former number gives the best-case results for each machine, while the latter is (in my experience) a good estimator of the performance of real, unoptimized (but vectorizable) codes.

I was surprised at how well the traditional supercomputers and minisupers are holding up in price/performance. Given that these numbers are only accurate to within +/- 25% (at best?), the bottom line is that only the IBM 320 noticeably exceeds the Y/MP in "Streaming MFLOPS" per million dollars. But since the difference in performance is a factor of 200 (6 vs. 1200 Streaming MFLOPS for the IBM 320 and the Y/MP-8), it is not really appropriate to make a one-for-one comparison between these two machines.

For "cache-friendly" applications, on the other hand, several machines are noticeably more cost-effective than the Crays, notably the IBM and Stardent machines, with pretty good performance turned in by the SGI 4D/380.

By the way, the prices used are for the best available University discounts and include the cost of 3rd-party memory and disk drives.

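For anyone who wants to redo the arithmetic, the last two columns of the table are simply MFLOPS divided by the price in millions of dollars. Here is a minimal Python sketch of that calculation for a handful of rows. The figures are copied from the table in this post; the names `ROWS`, `mflops_per_million`, and `summarize` are made up for illustration, and the "<" and "~" qualifiers on the prices are dropped, so the results are only as good as the +/- 25% noted above.

```python
# Each entry: (system, MFLOPS Max, MFLOPS Stream, price in millions of dollars).
# Values are copied from the table in this post; the "<" and "~" price
# qualifiers are ignored here (an assumption made for this sketch).
ROWS = [
    ("IBM 550",       62,   12,  0.13),
    ("MIPS RC6280",   16,    8,  0.20),
    ("IBM 320",       29,    6,  0.02),
    ("Cray Y/MP-1",  324,  150,  3.0),
    ("Cray Y/MP-8", 2144, 1200, 16.0),
]


def mflops_per_million(mflops, price_millions):
    """Performance/price ratio in MFLOPS per million dollars."""
    return mflops / price_millions


def summarize(rows):
    """Return (system, Max ratio, Stream ratio), rounded to whole numbers."""
    return [
        (name,
         round(mflops_per_million(mflops_max, price)),
         round(mflops_per_million(mflops_stream, price)))
        for name, mflops_max, mflops_stream, price in rows
    ]


if __name__ == "__main__":
    for name, max_ratio, stream_ratio in summarize(ROWS):
        print(f"{name:<12} {max_ratio:>6} {stream_ratio:>6}")
```

The same function can be applied to the rows without a listed price once a price is known; those rows are left out here rather than guessed.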
-----------------------------------
Performance Summary Table - LINPACK
-----------------------------------
                 MFLOPS  MFLOPS  MFLOPS  MFLOPS   Price MFLOPS/Million$
System             Peak     Max    Lnpk  Stream  $10**6     Max  Stream
-----------------------------------------------------------------------
IBM 550              82      62      27      12    0.13     477      92
MIPS RC6280          24      16      10       8    0.20      80      40
IBM 320              40      29       9       6   <0.02    1450     300
-----------------------------------------------------------------------
Convex C-210         50      44      17       9    ~0.5      88      18
Convex C-240        200     166      26      36    ~1.6     104      23
-----------------------------------------------------------------------
Cray Y/MP-1         333     324      25     150    ~3.0     108      50
Cray Y/MP-8        2664    2144     275    1200   ~16.0     134      75
-----------------------------------------------------------------------
1xIBM 3090E VF      116      71      13      11    ~3.0      24       4
2xIBM 3090E VF      232     141      26*     22    ~5.0      28       4
3xIBM 3090E VF      348     210      39*     33    ~7.0      30       5
-----------------------------------------------------------------------
SGI 4D/310           10       8       6       3
SGI 4D/380           80      52      48*      3   ~0.20     260      15
-----------------------------------------------------------------------
Stardent 3010        32      25      10       6
Stardent 3040       128      77      12      11   ~0.25     308      44
-----------------------------------------------------------------------
IBM 320              40      29       9       6   <0.02    1450     300
8x IBM 320          320     232*     72*     48    0.11    1450*    300
16x IBM 320         640     464*    144*     96    0.21    1450*    300
-----------------------------------------------------------------------
(*) indicates extrapolated figures.

Definitions:
------------
"MFLOPS Peak" is the hardware-limited peak performance. This is the performance which the hardware is "guaranteed not to exceed".

"MFLOPS Max" is the observed performance for highly optimized code in the solution of a 1000x1000 dense system of equations. All numbers are observed, unless marked by '*', in which case they are extrapolated.

"MFLOPS Lnpk" is the observed performance on the LINPACK 100x100 system of equations using standard Fortran.

"MFLOPS Stream" is the bandwidth-limited speed for 64-bit dyadic vector operations, and is (usually) the best estimate for the speed of *unoptimized* 64-bit floating-point codes on each machine.

--
John D. McCalpin                     mccalpin@perelandra.cms.udel.edu
Assistant Professor                  mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.   J.MCCALPIN/OMNET
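
As a rough illustration of what "bandwidth-limited speed for 64-bit dyadic vector operations" means, here is a small Python sketch. It rests on an assumption that the post does not spell out: a dyadic operation such as a(i) = b(i) + c(i) on long vectors moves three 64-bit words to or from memory (two loads and one store) for every floating-point operation, with no reuse from cache. Under that assumption, the sustainable memory bandwidth divided by 24 bytes per flop bounds the MFLOPS rate. The function names and the 240 MB/s figure are hypothetical and do not describe any machine in the table.

```python
BYTES_PER_WORD = 8   # 64-bit operands
WORDS_PER_FLOP = 3   # assumed: two loads and one store per result, no cache reuse


def dyadic_add(b, c):
    """The kind of kernel being modeled: a[i] = b[i] + c[i]."""
    return [x + y for x, y in zip(b, c)]


def streaming_mflops(bandwidth_mbytes_per_sec):
    """Bandwidth-limited MFLOPS bound for a long dyadic vector operation.

    bandwidth_mbytes_per_sec is the sustained memory bandwidth in
    millions of bytes per second.
    """
    return bandwidth_mbytes_per_sec / (BYTES_PER_WORD * WORDS_PER_FLOP)


if __name__ == "__main__":
    # Hypothetical machine sustaining 240 MB/s from main memory.
    print(streaming_mflops(240.0))
```

This estimates a ceiling from a bandwidth figure; it does not measure anything, and timing `dyadic_add` in an interpreter would mostly measure interpreter overhead rather than memory bandwidth.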