[comp.sys.sgi] Basic Linear Algebra Subroutines

tff@na.toronto.edu (Tom Fairgrieve) (07/13/90)

Does SGI have an optimized version of the BLAS (Basic Linear Algebra 
Subroutines) available for the 4d/240?  If so, how does the performance
of this version compare to a version produced by the f77 compiler with
-O3 optimization level set?  I'm interested in all 3 levels of the BLAS.
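(For concreteness, by the three levels I mean the usual vector-vector,
matrix-vector and matrix-matrix routines.  A minimal double precision
sketch of one call from each level is below; the routine names are the
standard reference-BLAS ones and the sizes are just an example.)

      PROGRAM BLVLS
      INTEGER N, I, J
      PARAMETER (N = 100)
      DOUBLE PRECISION X(N), Y(N), A(N,N), B(N,N), C(N,N)
C     Fill the operands with something simple so the calls have defined input
      DO 20 J = 1, N
         X(J) = 1.0D0
         Y(J) = 0.0D0
         DO 10 I = 1, N
            A(I,J) = 1.0D0
            B(I,J) = 1.0D0
   10    CONTINUE
   20 CONTINUE
C     Level 1 (vector-vector):   y := 2*x + y
      CALL DAXPY(N, 2.0D0, X, 1, Y, 1)
C     Level 2 (matrix-vector):   y := A*x + y
      CALL DGEMV('N', N, N, 1.0D0, A, N, X, 1, 1.0D0, Y, 1)
C     Level 3 (matrix-matrix):   C := A*B
      CALL DGEMM('N', 'N', N, N, N, 1.0D0, A, N, B, N, 0.0D0, C, N)
      WRITE (*,*) 'Y(1) =', Y(1), '  C(1,1) =', C(1,1)
      END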

Thanks for any information,
  Tom Fairgrieve
  tff@na.utoronto.ca

jpp@pipo.corp.sgi.com (Jean-Pierre Panziera) (07/14/90)

In article <90Jul13.100737edt.8304@ephemeral.ai.toronto.edu>,
tff@na.toronto.edu (Tom Fairgrieve) writes:
> From: tff@na.toronto.edu (Tom Fairgrieve)
> Subject: Basic Linear Algebra Subroutines (BLAS)
> Date: 13 Jul 90 14:08:02 GMT
> Organization: Department of Computer Science, University of Toronto
> 
> Does SGI have an optimized version of the BLAS (Basic Linear Algebra 
> Subroutines) available for the 4d/240?  If so, how does the performance
> of this version compare to a version produced by the f77 compiler with
> -O3 optimization level set?  I'm interested in all 3 levels of the BLAS.
> 
> Thanks for any information,
>   Tom Fairgrieve
>   tff@na.utoronto.ca


As far as I know, SGI does not have an official version of BLAS 3,
but I may be wrong.

However, I have optimized and parallelized a Fortran version of
the matrix multiplication routines from BLAS 3.

I get pretty good results on a 220-GTX:

dgemm 5-11 Mflops
zgemm 10-14 Mflops
sgemm 10-16 Mflops
cgemm 12-17 Mflops

The lowest rates are for A * trans(B), the highest for trans(A) * B.

I am sure it can be improved, and I do not warrant that it is bug-free.
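
In case someone wants to time these cases against their own BLAS, here
is how the two variants above map onto the standard Level 3 DGEMM
calling sequence.  This is a sketch only: square matrices for
simplicity, and you would link it against whichever dgemm you want to
measure.

      PROGRAM GEMMTR
      INTEGER N, I, J
      PARAMETER (N = 300)
      DOUBLE PRECISION A(N,N), B(N,N), C(N,N)
C     Square operands, so both transpose variants use the same storage
      DO 20 J = 1, N
         DO 10 I = 1, N
            A(I,J) = 1.0D0
            B(I,J) = 1.0D0
   10    CONTINUE
   20 CONTINUE
C     C := A * trans(B)   (the lower-rate case on the 220-GTX above)
      CALL DGEMM('N', 'T', N, N, N, 1.0D0, A, N, B, N, 0.0D0, C, N)
C     C := trans(A) * B   (the higher-rate case)
      CALL DGEMM('T', 'N', N, N, N, 1.0D0, A, N, B, N, 0.0D0, C, N)
      WRITE (*,*) 'C(1,1) =', C(1,1)
      END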

bron@bronze.wpd.sgi.com (Bron Campbell Nelson) (07/17/90)

In article <90Jul13.100737edt.8304@ephemeral.ai.toronto.edu>, tff@na.toronto.edu (Tom Fairgrieve) writes:
> Does SGI have an optimized version of the BLAS (Basic Linear Algebra 
> Subroutines) available for the 4d/240?  If so, how does the performance
> of this version compare to a version produced by the f77 compiler with
> -O3 optimization level set?  I'm interested in all 3 levels of the BLAS.

As far as I know, SGI does not ship its own versions of the BLAS libraries.
However, Kuck and Associates, Inc. (KAI) in Illinois does sell math
libraries that are tuned to run on SGI multiprocessors.  If I remember
correctly (always a dangerous assumption), one customer was able to
hit over 50 MFLOPS on an 8-CPU machine using the KAI software.

My *personal* opinion is that the KAI library is very good and very fast.

Contact KAI directly for more info.  I believe Debbie Carr is still their
marketing person: try  dcarr@kai.com

Standard disclaimer:
This is provided for information only.  Neither I nor SGI makes any
warranties, either express or implied.  And so on, blah blah blah, etc.

--
Bron Campbell Nelson
bron@sgi.com  or possibly  ..!ames!sgi!bron
These statements are my own, not those of Silicon Graphics.