[comp.arch] Block-mode Linear Algebra

mccalpin@vax1.acs.udel.EDU (John D Mccalpin) (04/13/90)

I have been looking into the performance of block-mode
algorithms for linear algebra, and came up with some interesting
results.

The test is the LINPACK 1000x1000 dense system of linear equations.
I have been interested in block-mode algorithms for scalar machines
because they improve cache hit rates markedly, giving a factor of two
improvement in performance on my machine (Silicon Graphics 4D/25,
with a 20 MHz MIPS R-3000/3010 CPU and a 32 kB data cache).

As a first guess, I assumed that the block-mode algorithms would
not help much on the Cray X/MP and Y/MP, because they have enough
memory bandwidth to run the SAXPY operations in streaming mode, 
with loads and stores completely overlapped with calculations.

I was wrong.

Here are the results from 3 test cases on the Cray Y/MP4-432 at
Florida State.

Case (1) is the plain vanilla FORTRAN LINPACK.  The BLAS (including
	 SAXPY) are unrolled.  This is expected to cause a bit of
	 trouble for the vectorizer, but it still vectorizes.
Case (2) is simply using the SGEFA routine from the hand-coded
	 Cray SCI library.  Optimizations done by Cray on this
	 code include inlining of the BLAS.
Case (3) is an all-FORTRAN block-mode version of SGEFA, kindly
	 provided by earl@mips.com.  Thanks Earl!  It operates
	 on 8 columns of the matrix at a time, and does as much
	 work as possible on those 8 columns before moving to the
	 next 8.  This allows things to sit in vector registers
	 longer (or in cache on machines that have it).  It does
	 not use the BLAS routines -- all the SAXPY operations
	 are specified in-line using standard FORTRAN.
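
To make the idea in case (3) concrete, here is a minimal sketch in C of
the inner update only. It is not Earl's code and not SGEFA: there is no
pivoting and no factorization, just the step where 8 source columns are
applied to one target column. The names NB, update_by_saxpy and
update_blocked are made up for illustration, and the matrix block x is
assumed to be stored column-major as in FORTRAN.

```c
#include <assert.h>
#include <stddef.h>

#define NB 8  /* block width; 8 matches the post, the name NB is hypothetical */

/* y <- y + a*x: one SAXPY, which makes one full pass over y
   (every element is loaded and stored once). */
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Column-at-a-time version: NB separate SAXPY calls, so each y[i]
   is loaded and stored NB times.  x holds NB columns of length n,
   one after another (column-major). */
void update_by_saxpy(size_t n, const float *x, const float a[NB], float *y)
{
    for (int k = 0; k < NB; k++)
        saxpy(n, a[k], x + (size_t)k * n, y);
}

/* Block-mode version: the same multiplies and adds in the same order,
   but each y[i] is loaded once, updated NB times in a local variable
   (a register, or a vector register after vectorization), and stored once. */
void update_blocked(size_t n, const float *x, const float a[NB], float *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        float acc = y[i];
        for (int k = 0; k < NB; k++)
            acc += a[k] * x[(size_t)k * n + i];
        y[i] = acc;
    }
}
```

Both routines do 2*NB floating-point operations per element of y. The
first moves 3*NB words per element (load x, load y, store y on every
pass), while the second moves NB+2, which is the sense in which the
data "sits in registers longer".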

		LINPACK 1000x1000 Performance
----------------------------------------------------------
(1)  all Fortran, unrolled loops              57 MFLOPS
(2)  Cray SCI library, inlined CAL BLAS      145 MFLOPS
(3)  all Fortran, block-mode, inline BLAS    245 MFLOPS
----------------------------------------------------------
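
For anyone who wants to turn the rates above into ratios or rough run
times, here is a small sketch in C. It assumes the usual nominal LINPACK
operation count of 2n^3/3 + 2n^2 for an n x n system, which may not be
exactly what was counted for these runs, so the implied times are
estimates and not measurements. The function names are made up for
illustration.

```c
#include <assert.h>

/* Nominal LINPACK operation count for an n x n system (an assumption here):
   2n^3/3 for the factorization plus 2n^2 for the solve. */
double linpack_flops(int n)
{
    assert(n > 0);
    double x = (double)n;
    return 2.0 * x * x * x / 3.0 + 2.0 * x * x;
}

/* Run time in seconds implied by a quoted MFLOPS rate, under the
   operation count above. */
double implied_seconds(int n, double mflops)
{
    assert(mflops > 0.0);
    return linpack_flops(n) / (mflops * 1.0e6);
}

/* Ratio of one quoted rate to another. */
double speedup(double faster_mflops, double slower_mflops)
{
    assert(slower_mflops > 0.0);
    return faster_mflops / slower_mflops;
}
```

With the table's numbers, speedup(245.0, 145.0) is about 1.7 and
speedup(245.0, 57.0) is about 4.3.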

It is very interesting that the all-FORTRAN block-mode algorithm
beats the pants off the Cray hand-coded version!
-- 
John D. McCalpin                               mccalpin@vax1.acs.udel.edu
Assistant Professor                            mccalpin@delocn.udel.edu
College of Marine Studies, U. Del.             mccalpin@scri1.scri.fsu.edu