kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (09/15/90)
Initial report on break-even in discrete FFTs for XP multiply ------------------------------------------------------------- The thy regarding FFT and extended-precision multiply can be found in any reasonable book on computer algorithms and complexity (see, e.g. DEK, Hwang&Briggs, Wilf, etc). The main result, which can be applied to a class of algorithms including polynomial multiplication (encompassing fixed-point multiplication), show these can be performed in ``approximately'' O(n log n) time where n is the size of the operands. With careful selection of parameters (and exact details will remain vague until I verify whether any of this is worth publishing) the FFT ``constant of proportionality'' can be made quite reasonable. The question is -- where is the break-even point wrt naive ``shift-and-add'' multiplication? From an initial benchmark, that does not purport to be of a _correctly_ working fft multiply (regard it as ``synthetic'' at this point), I have obtained the following data (times in ms, figures in parens with optimization): ------------------------------------------------------- fft naive VAX 6440 4020 2590 (3840) (2590) VAX 8530 6617 6516 (6400) (6483) Sun 3/60 9434 9750 (7017) (5717) Sun Sparc 1 8233 19600 (5200) (6533) ------------------------------------------------------- The size of the operands is approximately 160 bits (giving a 160-bit ======== result: high-order being discarded). [Remember, I'm talking about XP calcs implemented in s/w here -- this figure is not directly applicable to h/w implementations of, say, extended precision FP multiplication]. As far as I can determine, these figures represent a true break-even point on all the above architectures except the 6440. Note also that optimization tends to shift the break-even (e.g. Sun 3/60). A more detailed analysis may determine _why_ this is in each case; meanwhile anyone else is free to jump the gun. Since FFTs have an extremely unusual ``constant of proportionality'', that depends on the canonical factorization of the size of the operands the exact characterization of a break-even point is quite difficult. Hence the ``preliminary-only'' nature of this 160-bit figure. [For those that may like to dispute this initial figure out of hand please consider: we wish to compare _apples_ with _apples_. Although it may be possible to hand-code a naive XP mult to give extremely good performance on a given architecture (and, naturally, _none_ of my code is in assembler), the same would be true for the FFT algorithm.] I will post a more detailed report, for those interested, inaculadayz. -Kym Horsell --- PS If anyone can save me some work by pointing at >1 authoritative reference ON THIS SAME WORK, please post or email.