treed%tom.dallas@SGI.COM (Thomas E Reed) (02/13/90)
Hi:

I'm looking for any FFT software that is available and runs on the 4D/2XX products. The faster the better especially if it is parallel code or has been parallelized.

If there is interest let me know and I'll post any and all info.

Thanks

--
Tom Reed
SGI - Dallas
email: treed@sgidal.dallas.sgi.com
vmail: 8705
phone: 214-788-4122
goss@SNOW-WHITE.MERIT-TECH.COM (Mike Goss) (02/13/90)
In reply to the message from Tom Reed:

> Date: Mon, 12 Feb 90 11:24:01 CST
> From: Thomas E Reed <treed%tom.dallas@sgi.com>
> Message-Id: <9002121724.AA17444@tom.dallas.sgi.com>
> To: info@tom.dallas.sgi.com
> Subject: FFT's on 4D/2XX systems
> .
> .
> .
> I'm looking for any FFT software that is available and runs on the
> 4D/2XX products. The faster the better especially if it is parallel code or
> has been parallelized.
> .
> .
> .

The book "Numerical Recipes in C" (also available in FORTRAN and Pascal versions) has several good FFT routines, although not in a parallelized form. I'd recommend this book for any numerical work. It has lots of useful code, ready to run, and good explanations of the algorithms involved.

------------------------------
Mike Goss
Merit Technology Inc.
(214)733-7018
goss@snow-white.merit-tech.com
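For readers who want to see the shape of the algorithm being asked about, here is a minimal recursive radix-2 Cooley-Tukey FFT sketch in Python. It is not the Numerical Recipes routine and it is not parallelized; the function name `fft_radix2` is hypothetical, and the sketch assumes the input length is a power of two.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT (illustrative sketch).

    Assumes len(x) is a power of two; returns a list of complex values.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    # Transform the even-indexed and odd-indexed samples separately.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the two half-length transforms.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

The two half-length transforms do not depend on each other, which is one place where such a routine could in principle be split across processors.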
bron@bronze.wpd.sgi.com (Bron Campbell Nelson) (02/13/90)
In article <9002121724.AA17444@tom.dallas.sgi.com>, treed%tom.dallas@SGI.COM (Thomas E Reed) writes:
> I'm looking for any FFT software that is available and runs on the
> 4D/2XX products. The faster the better especially if it is parallel code or
> has been parallelized.
>

Kuck and Associates, Inc (KAI) has several numerical libraries that are optimized for the SGI multi-processor. I do not know for *sure* that they include FFT's, but I believe they do. The libraries are parallelized to take advantage of SGI's multiprocessor machines.

They can be reached at 1906 Fox Drive; Champaign, IL 61820 (217)356-2288. I believe Ms. Davida Bluhm is their marketing person. They are also on the net: I believe [d]bluhm@kai.com works, but I won't guarantee it.

I have no benchmarks or pricing information. All possible disclaimers apply; this is posted purely for informational purposes.

--
Bron Campbell Nelson
bron@sgi.com or possibly ..!ames!sgi!bron
These statements are my own, not those of Silicon Graphics.
sgf@cs.brown.edu (Sam Fulcomer) (02/15/90)
In article <9002122052.AA21651@snow-white.merit-tech.com> goss@SNOW-WHITE.MERIT-TECH.COM (Mike Goss) writes:
>In reply to the message from Tom Reed:
>> I'm looking for any FFT software that is available and runs on the
>> 4D/2XX products. The faster the better especially if it is parallel code or
>
>The book "Numerical Recipes in C" (also available in FORTRAN and Pascal
>versions) has several good FFT routines, although not in a parallelized form.

Well, _Numerical_Recipes_ is ok. I haven't bothered to try to parallelize the f77 codes yet, though it might be worthwhile (I haven't poked at them much). It's quite possible that PFA won't like them much. Many numerical packages (IMSL in particular) aren't very adaptable to parallel architectures.

Another problem with all current numerical packages (although NAG is working on it, as may be others) is that they are not optimized for big-memory problems on cache machines (i.e., as matrix size goes up, data cache hits go down, as does performance). Algorithms that process address regions of data in blocks are the solution to this problem (although monster data caches are another).

The important thing when trying to get performance out of a multi-proc SGI is to characterize exactly the load the machine is seeing at the time you want the performance. Parallelized code will run well (on a 4-proc system) if it is the only (or nearly only) thing running on the system. If you've got 2 of the beasts running you _may_ still be getting better than single-proc performance, but don't bet on it. Don't even bother running if you don't have (effectively) 2 idle processors.

I haven't bothered using the PFA since we typically have 2 or 3 things going on at any given time on our 4D/240GTX (64MB) with someone running 4Sight. My experience with it has been limited to bitching at people who've run multi-proc jobs on a busy system (and helping them PFA their code). I am very pleased with the thing's performance on single-proc jobs, though.
On an idle system the machine will run 4 copies of the same computation in the same time that only one takes (wall clock). A one-processor job (heavy FPU) seems to take about 2-3 times as much CPU time as on a 3090 with vector proc (the program vectorized on the 3090).
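The point earlier in this post about processing data in blocks can be sketched with a tiled matrix transpose. This is only an illustration of the loop structure: the name `transpose_blocked` and the default tile size of 32 are assumptions, and in Python the cache benefit will not be visible. In C or Fortran the same tiling is what keeps each block of the source and destination resident in the data cache while it is being worked on.

```python
def transpose_blocked(a, block=32):
    """Transpose a list-of-lists matrix one tile at a time (sketch).

    `block` is the tile edge length; a real code would tune it so that
    a source tile and a destination tile both fit in the data cache.
    """
    rows = len(a)
    cols = len(a[0]) if rows else 0
    out = [[0] * rows for _ in range(cols)]
    # Outer loops walk over tiles; inner loops stay inside one tile.
    for i0 in range(0, rows, block):
        for j0 in range(0, cols, block):
            for i in range(i0, min(i0 + block, rows)):
                row = a[i]
                for j in range(j0, min(j0 + block, cols)):
                    out[j][i] = row[j]
    return out
```

A call such as `transpose_blocked(m, block=64)` gives the same result as a plain transpose; only the order in which elements are visited changes.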