mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (03/19/91)
Here is a table with some info I have derived from Jack Dongarra's
latest LINPACK report.
The important columns are the last two, giving the performance/price
ratio in MFLOPS per Million dollars for two different MFLOPS
estimators:
(1) The optimized LINPACK 1000x1000 case; and
(2) The memory-bandwidth-limited MFLOPS rate for long dyadic vector
operations.
The former number gives the best-case results for each machine, while
the latter number is (in my experience) a good estimator of the
performance of real, un-optimized (but vectorizable) codes.
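
As a quick illustration of how the last two columns are formed, here is a
minimal sketch in Python. The function name is made up for this example; the
input figures are the IBM 550 row of the table in this post.

```python
def mflops_per_million_dollars(mflops, price_millions):
    """Return performance/price in MFLOPS per million dollars."""
    if price_millions <= 0:
        raise ValueError("price must be positive")
    return mflops / price_millions


# IBM 550 row: Max = 62 MFLOPS, Stream = 12 MFLOPS, Price = $0.13 million.
max_ratio = mflops_per_million_dollars(62, 0.13)
stream_ratio = mflops_per_million_dollars(12, 0.13)
print(round(max_ratio), round(stream_ratio))  # prints: 477 92
```

The rounded results match the 477 and 92 listed for that machine.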
I was surprised at how well the traditional supercomputers and minisupers
are holding up in price/performance.
Given that these numbers are accurate only to within +/- 25% (at best?),
the bottom line is that only the IBM 320 noticeably exceeds the Y/MP in
"Streaming MFLOPS" per Million dollars. But since the two machines differ
in absolute performance by a factor of 200 (6 vs. 1200 Streaming MFLOPS
for the Y/MP-8), it is not really appropriate to make a one-for-one sort
of comparison between them.
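
The comparison can be checked mechanically. The sketch below assumes that
"noticeably exceeds" means beating the Y/MP-8 figure by more than the 25%
error bar; that threshold is one reading of the text, not a stated rule. The
ratios are copied from the Stream column of the table in this post, and the
function name is made up for the example.

```python
# Stream MFLOPS per million dollars, copied from the table in this post.
stream_per_million = {
    "IBM 550": 92, "MIPS RC6280": 40, "IBM 320": 300,
    "Convex C-210": 18, "Convex C-240": 23,
    "Cray Y/MP-1": 50, "Cray Y/MP-8": 75,
    "1xIBM 3090E VF": 4, "2xIBM 3090E VF": 4, "3xIBM 3090E VF": 5,
    "SGI 4D/380": 15, "Stardent 3040": 44,
}

TOLERANCE = 0.25  # assumed reading of the +/- 25% accuracy


def noticeably_better(ratios, reference, tolerance=TOLERANCE):
    """List machines whose ratio beats the reference by more than tolerance."""
    threshold = ratios[reference] * (1.0 + tolerance)
    return sorted(name for name, r in ratios.items() if r > threshold)


winners = noticeably_better(stream_per_million, "Cray Y/MP-8")
print(winners)   # ['IBM 320']
print(1200 / 6)  # raw Stream MFLOPS, Y/MP-8 vs. IBM 320: 200.0
```

The IBM 550, at 92, falls just inside the error bar relative to the Y/MP-8
at 75, so only the IBM 320 clears it.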
For "cache-friendly" applications, on the other hand, several machines
are noticeably more cost-effective than the Crays, notably the IBM
and Stardent machines, with a pretty good performance turned in by
the SGI 4D/380.
By the way, the prices used are for the best available University
discounts and include the cost of 3rd-party memory and disk drives.
-----------------------------------
Performance Summary Table - LINPACK
-----------------------------------
                MFLOPS  MFLOPS  MFLOPS  MFLOPS   Price MFLOPS/Million$
System            Peak     Max    Lnpk  Stream  $10**6     Max  Stream
-----------------------------------------------------------------------
IBM 550             82      62      27      12    0.13     477      92
MIPS RC6280         24      16      10       8    0.20      80      40
IBM 320             40      29       9       6   <0.02    1450     300
-----------------------------------------------------------------------
Convex C-210        50      44      17       9    ~0.5      88      18
Convex C-240       200     166      26      36    ~1.6     104      23
-----------------------------------------------------------------------
Cray Y/MP-1        333     324      25     150    ~3.0     108      50
Cray Y/MP-8       2664    2144     275    1200   ~16.0     134      75
-----------------------------------------------------------------------
1xIBM 3090E VF     116      71      13      11    ~3.0      24       4
2xIBM 3090E VF     232     141     26*      22    ~5.0      28       4
3xIBM 3090E VF     348     210     39*      33    ~7.0      30       5
-----------------------------------------------------------------------
SGI 4D/310          10       8       6       3
SGI 4D/380          80      52     48*       3   ~0.20     260      15
-----------------------------------------------------------------------
Stardent 3010       32      25      10       6
Stardent 3040      128      77      12      11   ~0.25     308      44
-----------------------------------------------------------------------
IBM 320             40      29       9       6   <0.02    1450     300
8x IBM 320         320    232*     72*      48    0.11   1450*     300
16x IBM 320        640    464*    144*      96    0.21   1450*     300
-----------------------------------------------------------------------
(*) indicates extrapolated figures.
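
The post does not say how the starred figures were extrapolated. The
multi-machine IBM 320 rows are at least consistent with simple linear scaling
of the single-machine numbers, which the following sketch reproduces. Linear
scaling is an assumption here, and the helper name is made up.

```python
def extrapolate_linear(single_mflops, n_units):
    """Scale a single-unit MFLOPS figure linearly with the unit count."""
    return single_mflops * n_units


# Single IBM 320 figures from the table.
ibm320 = {"Max": 29, "Lnpk": 9, "Stream": 6}

scaled = {
    n: {name: extrapolate_linear(v, n) for name, v in ibm320.items()}
    for n in (8, 16)
}
print(scaled[8])   # {'Max': 232, 'Lnpk': 72, 'Stream': 48}
print(scaled[16])  # {'Max': 464, 'Lnpk': 144, 'Stream': 96}
```

Real clusters would lose something to communication, so figures obtained
this way are best read as upper bounds.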
Definitions:
------------
"MFLOPS Peak" is the hardware-limited peak performance. This
is the performance which the hardware is "guaranteed
not to exceed".
"MFLOPS Max" is the observed performance for highly optimized
code in the solution of a 1000x1000 dense system of
equations. All numbers are observed, unless marked
by '*', in which case they are extrapolated.
"MFLOPS Lnpk" is the observed performance on the LINPACK 100x100
system of equations using standard Fortran.
"MFLOPS Stream" is the bandwidth-limited speed for 64-bit dyadic
vector operations, and is (usually) the best estimate
for the speed of *unoptimized* 64-bit floating-point
codes on each machine.
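
One way to read the "MFLOPS Stream" definition is as a simple bandwidth
model: a dyadic operation such as a(i) = b(i) + c(i) moves two 64-bit
operands in and one result out for each floating-point operation. The sketch
below works through that arithmetic. The traffic count (2 loads, 1 store) and
the 240 MB/s bandwidth are assumptions chosen for illustration, not figures
from the table or a statement of how the Stream column was obtained.

```python
BYTES_PER_WORD = 8  # 64-bit operands


def stream_mflops(bandwidth_mbytes_per_s, loads=2, stores=1):
    """Bandwidth-limited MFLOPS for one flop per (loads + stores) words."""
    bytes_per_flop = (loads + stores) * BYTES_PER_WORD
    return bandwidth_mbytes_per_s / bytes_per_flop


# Hypothetical machine sustaining 240 MB/s to and from main memory.
print(stream_mflops(240.0))  # 10.0
```

Under this model a machine's peak arithmetic rate does not enter at all,
which is why the Stream column can sit so far below the Peak column.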
--
John D. McCalpin mccalpin@perelandra.cms.udel.edu
Assistant Professor mccalpin@brahms.udel.edu
College of Marine Studies, U. Del. J.MCCALPIN/OMNET