ballen@convex.csd.uwm.edu (Bruce Allen) (05/24/91)
I'm an infrequent visitor to this newsgroup, but it seems like a reasonable place to post my request. I'm looking for a hand-coded FFT routine for a SparcStation 2. I posted a similar note on comp.sys.sun and got no useful replies. I am looking for something that was initially coded high-level, for instance, and then recoded by hand to wring out every last iota of performance. Please reply directly, thanks Bruce Allen (ballen@dirac.phys.uwm.edu)
nather@ut-emx.uucp (Ed Nather) (05/29/91)
In article <12392@uwm.edu>, ballen@convex.csd.uwm.edu (Bruce Allen) writes: > > I am looking for > something that was initially coded high-level, for instance, and > then recoded by hand to wring out every last iota of performance. Then you are looking for a procedure that is probably far less efficient than it could be. When a programmer codes a known algorithm, he thinks in terms of the operations available to him -- for example, in Fortran or C there are a set of operations to choose from, and he chooses from them. They are then translated into machine code by a compiler/assembler/linker. But when he codes directly in machine code (or in assembler) he thinks in terms of the basic machine operations available, and chooses from them. He can often find quicker ways to get the same result using this richer set of operations, than when he is limited by those available at a higher level. This effect was probably minimal in C when the target was a PDP-11, but can by much greater for other target machines. Wringing out the last iota of performance, when the whole coding approach is not optimal, can give the illusion of economy, but there may be greater economies to be found by starting with a different set of available operations. As an example, several years ago I ran a published version of the Sieve of Eratosthenes written in Fortran and timed it on a machine not noted for its Fortran compiler efficiency, then recoded the same algorithm in assembler and ran it. The assembler version ran 22 times faster. It made use of an obscure operation in the inner loop, where the combination "i + i + 1" was accomplished in a single instruction -- not something a compiler would be likely to generate. I believe in compilers, use them most of the time, but can still write faster code in assembler when performance is otherwise unacceptable. I keep waiting for the day when CPU speed is so fast I don't have to do this, but the day hasn't arrived yet -- at least not for computers I can afford. -- Ed Nather Astronomy Dept, U of Texas @ Austin