[comp.arch] Price/Performance figures for Number-Crunching

mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (03/19/91)

Here is a table with some info I have derived from Jack Dongarra's
latest LINPACK report.  

The important columns are the last two, giving the performance/price
ratio in MFLOPS per Million dollars for two different MFLOPS
estimators: 

(1) The optimized LINPACK 1000x1000 case; and
(2) The memory-bandwidth-limited MFLOPS rate for long dyadic vector
    operations.  

The former number gives the best-case results for each machine, while
the latter number is (in my experience) a good estimator of the
performance of real, unoptimized (but vectorizable) codes.
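For anyone who wants to check the last two columns, here is a minimal sketch in Python of the ratio computation (MFLOPS divided by price in millions of dollars). The function and variable names are my own illustration, and the three rows are copied from the table in this post; rounding to whole numbers is an assumption about how the table entries were produced.

```python
def mflops_per_million(mflops, price_millions):
    """Performance/price ratio: MFLOPS per million dollars."""
    if price_millions <= 0:
        raise ValueError("price must be positive")
    return mflops / price_millions


# (system, MFLOPS Max, MFLOPS Stream, price in $10**6), copied from the table
ROWS = [
    ("IBM 550", 62, 12, 0.13),
    ("Cray Y/MP-1", 324, 150, 3.0),
    ("Cray Y/MP-8", 2144, 1200, 16.0),
]


def ratios(rows):
    """Return {system: (Max ratio, Stream ratio)}, rounded to whole numbers."""
    return {
        name: (
            round(mflops_per_million(mx, price)),
            round(mflops_per_million(stream, price)),
        )
        for name, mx, stream, price in rows
    }


if __name__ == "__main__":
    for name, (r_max, r_stream) in ratios(ROWS).items():
        print(f"{name:12s} Max: {r_max:5d}  Stream: {r_stream:5d}")
```

Prices marked "~" or "<" in the table are approximate, so the ratios inherit that uncertainty.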

I was surprised at how well the traditional supercomputers and minisupers
are holding up in price/performance.

Given that these numbers are only accurate to within +/- 25% (at best?),
the bottom line is that only the IBM 320 noticeably exceeds the Y/MP in 
"Streaming MFLOPS" per Million dollars.   But since the difference 
in absolute performance is a factor of 200 (6 vs. 1200 Streaming MFLOPS
for the Y/MP-8), it is not really appropriate to make a one-for-one
sort of comparison between these two machines.

For "cache-friendly" applications, on the other hand, several machines
are noticeably more cost-effective than the Crays, notably the IBM
and Stardent machines, with a pretty good performance turned in by
the SGI 4D/380.

By the way, the prices used are for the best available University 
discounts and include the cost of 3rd-party memory and disk drives.

		-----------------------------------
		Performance Summary Table - LINPACK
		-----------------------------------

		MFLOPS	MFLOPS	MFLOPS	MFLOPS	Price	MFLOPS/Million$
System		Peak	Max	Lnpk	Stream	$10**6   Max	Stream
-----------------------------------------------------------------------
IBM 550		 82	 62	  27	 12	 0.13	 477	  92
MIPS RC6280	 24	 16	  10	  8	 0.20	  80	  40
IBM 320		 40	 29	   9	  6	<0.02	1450	 300
-----------------------------------------------------------------------
Convex C-210	 50	 44	  17	  9	~0.5	  88	  18
Convex C-240	200	166	  26	 36	~1.6	 104	  23
-----------------------------------------------------------------------
Cray Y/MP-1	333	324	  25	150	~3.0	 108	  50
Cray Y/MP-8    2664    2144	 275   1200    ~16.0	 134	  75
-----------------------------------------------------------------------
1xIBM 3090E VF	116	 71	  13	 11	~3.0	  24	   4
2xIBM 3090E VF	232	141	  26*	 22	~5.0	  28	   4
3xIBM 3090E VF	348	210	  39*	 33	~7.0	  30	   5
-----------------------------------------------------------------------
SGI 4D/310	 10	  8	   6	  3
SGI 4D/380	 80	 52	  48*	  3	~0.20	 260	  15
-----------------------------------------------------------------------
Stardent 3010	 32	 25	  10	  6
Stardent 3040	128	 77	  12	 11	~0.25	 308	  44
-----------------------------------------------------------------------
IBM 320		 40	 29	   9	  6	<0.02	1450	 300
8x  IBM 320	320	232*	  72*	 48	 0.11	1450*	 300
16x IBM 320	640	464*	 144*	 96	 0.21	1450*	 300
-----------------------------------------------------------------------
(*) indicates extrapolated figures.
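
The post does not say how the extrapolated figures were obtained. The multi-machine IBM 320 rows are consistent with simple linear scaling of the single-machine figures, so here is a sketch under that assumption (ideal scaling, no communication or synchronization overhead). The function name and dictionary keys are my own illustration.

```python
def scale_linearly(single, copies):
    """Scale per-machine MFLOPS figures by the machine count.

    Assumes ideal (linear) scaling, which is an upper bound in practice.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return {name: value * copies for name, value in single.items()}


# Single IBM 320 figures, copied from the table
IBM_320 = {"peak": 40, "max": 29, "lnpk": 9, "stream": 6}

if __name__ == "__main__":
    for n in (8, 16):
        print(n, scale_linearly(IBM_320, n))
```

Real parallel codes will fall short of these numbers, which is why the starred entries should be read as optimistic.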

Definitions:
------------
"MFLOPS Peak" is the hardware-limited peak performance.  This
	is the performance which the hardware is "guaranteed
	not to exceed".
"MFLOPS Max" is the observed performance for highly optimized
	code in the solution of a 1000x1000 dense system of
	equations.  All numbers are observed, unless marked
	by '*', in which case they are extrapolated.
"MFLOPS Lnpk" is the observed performance on the LINPACK 100x100
	system of equations using standard Fortran.
"MFLOPS Stream" is the bandwidth-limited speed for 64-bit dyadic 
	vector operations, and is (usually) the best estimate 
	for the speed of *unoptimized* 64-bit floating-point 
	codes on each machine.
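
To make the definitions a little more concrete, here is a rough sketch of the arithmetic behind two of them. The LINPACK rate uses the standard nominal operation count for solving an n x n dense system, 2/3 n^3 + 2 n^2. The "Stream" model is my assumption of how a bandwidth-limited rate can be estimated, not necessarily the exact method used for the table: a dyadic operation a(i) = b(i) op c(i) needs two loads and one store per floating-point operation, so 24 bytes of memory traffic per flop for 64-bit data. All names and the example bandwidth are hypothetical.

```python
def linpack_flops(n):
    """Nominal operation count for an n x n dense solve: 2/3 n^3 + 2 n^2."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def linpack_mflops(n, seconds):
    """MFLOPS rate implied by solving an n x n system in the given time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return linpack_flops(n) / seconds / 1.0e6


def dyadic_stream_mflops(bandwidth_mbytes_per_s, word_bytes=8):
    """Bandwidth-limited MFLOPS for a(i) = b(i) op c(i).

    Assumes no cache reuse: each flop moves three words
    (two loads and one store) through memory.
    """
    return bandwidth_mbytes_per_s / (3 * word_bytes)


if __name__ == "__main__":
    print(linpack_flops(100))           # operation count for the 100x100 case
    print(dyadic_stream_mflops(240.0))  # hypothetical 240 MB/s memory system
```

With this model a hypothetical machine sustaining 240 MB/s from memory would be limited to 10 MFLOPS on long dyadic vector operations, whatever its peak rate.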
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET