[comp.sys.sgi] FFT's on 4D/2XX systems

treed%tom.dallas@SGI.COM (Thomas E Reed) (02/13/90)

Hi:


I'm looking for any FFT software that is available and runs on the
4D/2XX products. The faster the better especially if it is parallel code or
has been parallelized.


If there is interest, let me know and I'll post any and all info.


                                           Thanks
--

					Tom Reed
					SGI - Dallas

				     email: treed@sgidal.dallas.sgi.com
				     vmail: 8705
				     phone: 214-788-4122

goss@SNOW-WHITE.MERIT-TECH.COM (Mike Goss) (02/13/90)

In reply to the message from Tom Reed:

> Date: Mon, 12 Feb 90 11:24:01 CST
> From: Thomas E Reed <treed%tom.dallas@sgi.com>
> Message-Id: <9002121724.AA17444@tom.dallas.sgi.com>
> To: info@tom.dallas.sgi.com
> Subject: FFT's on 4D/2XX systems
> .
> .
> .
> I'm looking for any FFT software that is available and runs on the
> 4D/2XX products. The faster the better especially if it is parallel code or
> has been parallelized.
> .
> .
> .

The book "Numerical Recipes in C" (also available in FORTRAN and Pascal
versions) has several good FFT routines, although not in a parallelized form.
I'd recommend this book for any numerical work.  It has lots of useful
code, ready to run, and good explanations of the algorithms involved.


------------------------------
Mike Goss
Merit Technology Inc.
(214)733-7018
goss@snow-white.merit-tech.com

bron@bronze.wpd.sgi.com (Bron Campbell Nelson) (02/13/90)

In article <9002121724.AA17444@tom.dallas.sgi.com>, treed%tom.dallas@SGI.COM (Thomas E Reed) writes:
> I'm looking for any FFT software that is available and runs on the
> 4D/2XX products. The faster the better especially if it is parallel code or
> has been parallelized.
> 

Kuck and Associates, Inc. (KAI) has several numerical libraries that are
parallelized and optimized for SGI's multiprocessor machines.  I do not know
for *sure* that they include FFTs, but I believe they do.

They can be reached at 1906 Fox Drive, Champaign, IL 61820, (217)356-2288.
I believe Ms. Davida Bluhm is their marketing person.  They are also on the
net: I believe [d]bluhm@kai.com works, but I won't guarantee it.  I have no
benchmarks or pricing information.

All possible disclaimers apply; this is posted purely for informational
purposes.

--
Bron Campbell Nelson
bron@sgi.com  or possibly  ..!ames!sgi!bron
These statements are my own, not those of Silicon Graphics.

sgf@cs.brown.edu (Sam Fulcomer) (02/15/90)

In article <9002122052.AA21651@snow-white.merit-tech.com> goss@SNOW-WHITE.MERIT-TECH.COM (Mike Goss) writes:
>In reply to the message from Tom Reed:
>> I'm looking for any FFT software that is available and runs on the
>> 4D/2XX products. The faster the better especially if it is parallel code or
>
>The book "Numerical Recipes in C" (also available in FORTRAN and Pascal
>versions) has several good FFT routines, although not in a parallelized form.

Well, _Numerical_Recipes_ is OK. I haven't tried to parallelize the f77
codes yet (I haven't poked at them much), though it might be worthwhile.
It's quite possible that PFA won't like them much. Many numerical packages
(IMSL in particular) aren't very adaptable to parallel architectures.

Another problem with all current numerical packages (although NAG is working
on it, as others may be) is that they are not optimized for big-memory
problems on cache machines: as matrix size goes up, data cache hits go down,
and so does performance. Algorithms that process the data in blocks, one
address region at a time, are the solution to this problem (monster data
caches are another).
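
To make the blocking idea concrete, here is a rough sketch in C of a tiled
matrix multiply. The function name matmul_blocked and the tile-size parameter
bs are made up for the example, and the right value of bs is an assumption
you would have to tune against the data cache of the machine at hand. It is
not taken from NAG or any other package.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Illustrative blocked multiply: C += A * B for n x n row-major matrices.
 * bs is a hypothetical tile edge length; choose it so that three bs x bs
 * tiles of doubles fit comfortably in the data cache. */
void matmul_blocked(size_t n, size_t bs,
                    const double *a, const double *b, double *c)
{
    assert(bs > 0);   /* a zero tile size would never advance the loops */

    for (size_t ii = 0; ii < n; ii += bs) {
        size_t imax = min_sz(ii + bs, n);
        for (size_t kk = 0; kk < n; kk += bs) {
            size_t kmax = min_sz(kk + bs, n);
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t jmax = min_sz(jj + bs, n);
                /* Work on one tile of each matrix while it is cached. */
                for (size_t i = ii; i < imax; i++)
                    for (size_t k = kk; k < kmax; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < jmax; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
        }
    }
}
```

The arithmetic is the same as in the plain triple loop; only the order in
which memory is touched changes, so each tile gets reused before it is
pushed out of the cache.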

The important thing when trying to get performance out of a multi-proc SGI
is to characterize exactly how the machine is being used at the time you
want the performance. Parallelized code will run well (on a 4-proc system)
if it is the only (or nearly the only) thing running on the system. If you've
got 2 such jobs running you _may_ still get better than single-proc
performance, but don't bet on it. Don't even bother running a parallel job
if you don't have (effectively) 2 idle processors.

I haven't bothered using the PFA since we typically have 2 or 3 things going
on at any given time on our 4D/240GTX (64MB) with someone running 4Sight.
My experience with it has been limited to bitching at people who've run
multi-proc jobs on a busy system (and helping them PFA their code).

I am very pleased with the thing's performance on single-proc jobs, though. On
an idle system the machine will run 4 copies of the same computation in the
same wall-clock time that one copy takes. A one-processor job (heavy FPU)
seems to take about 2-3 times as much CPU time as on a 3090 with vector
processor (the program was vectorized on the 3090).