flatau@handel.CS.ColoState.Edu (flatau) (01/11/91)
Archive-name: math/num-analysis/cray-fft/1991-01-10 Archive: perelandra.cms.udel.edu:/pub/fft/cray.tar [128.175.74.1] Original-posting-by: flatau@handel.CS.ColoState.Edu (flatau) Original-subject: Info on 3D FFT codes (FORTRAN) Reposted-by: emv@ox.com (Edward Vielmetti) Here are some notes on Public Domain FORTRAN codes for the 3D FFT. The text is written in LATEX. Run it twice and print. Peter F. ============ cut \documentstyle[11pt,apalike]{article} \topmargin -.42in \textheight 9.25in \textwidth 15.cm \hoffset=-.7in \voffset=-.3in \baselineskip 12pt \parindent 0in \begin{document} %\bibliographystyle{/usr/cirrus/lib/tex/inputs/apa99} \title{Public Domain FORTRAN Codes for Multidimensional Fast Fourier Transforms -- an Incomplete Summary} \author{P. J. Flatau and $^+$J. McCalpin\\ \\ \\ {\em Department of Atmospheric Science, Colorado State University}\\ {\em $^+$College of Marine Studies, University of Delware} \\ } \maketitle %---------------------------------------------------------------------- \section{Scalar FORTRAN codes} \begin{enumerate} \item ``Numerical recipes'' \cite{Pre86a,Pre88a} discusses FFT's in two or more dimensions. They provide FORTRAN code \verb|FOURN| which is a descendant of one written by N. M. Brenner of Lincoln Laboratories. The code is limited to $N$ being a power of 2. It provides \verb|NDIM|-dimensional transform. Runs fast on scalar computers and doesn't require work storage. The ``Computer physics'' \cite{Pre88a} paper contains small, specialized FORTRAN interface to \verb|FOURN|. \item The more complete routine is that by M. N. Brenner \cite{Bre69a} from which the ``Numerical Recipes'' code descended. The \verb|FOURT| routine works for arbitrary length of data and arbitrary-dimensional transform. It seems to be faster than the ``Numerical recipes'' routine \verb|FOURN|. It does not vectorize on CRAY-Y/MP. \item The code by \cite{Fra79a} provides multidimensional (up to 4) transform which can be done with external (disk) storage. \item A code reported in the recent literature is that by \cite{Nob88a}. We don't have access to it. It is available from the Computer Physics Communications library. \item The public domain library put together by the Naval Surface Warfare Center \cite{Mor90a} contains two routines \verb|MFFT| and \verb|MFFT1| for multivariate FFT's. These codes are described by \cite{Sin69a}. \end{enumerate} %------------------------------------------------------------------------- \section{Vector/Parallel FORTRAN codes} \begin{enumerate} \item In the CRAY Channels journal impressive 3-dimensional FFT performance has been reported \cite{Edw87a}. The paper reports that a $1024 \times 1024 \times 1024$ array was transformed using an optimized version of Clive Temperton's code \cite{Tem89a,Tem83a,Tem83b}. Temperton's code \verb|FFT99| was written while he was at European Centre for Medium Range Weather Forecasts. These codes allow for easy construction of multidimensional routines using parameters \verb|INC| (increment within a transform), \verb|JUMP| (increment between transforms), and \verb|LOT| (number of simultaneous transforms). An example of a 2-D complex-to-complex transform is given in the comments. Following Temperton's advice we wrote the simple 3D interface code. The code vectorize on CRAY-YMP and was 8-9 times faster in comparison with \verb|FOURN| from ``Numerical recipes''. It takes more memory. \item For real sequences ``netlib'' (netlib@research.att.com) offers \verb|VFFTPK| package by R. A. Sweet, L. L. Lindgren, and R. F. Boisvert \cite{Swe90a}. This package is suitable for 2D transforms but 3D case would require transposing indicies (Roland Sweet, private communication). This package was developed for fast elliptic solvers. This package is especially useful because it implements (in 1-D only) not only the ``ordinary'' complex-to-complex and complex-to-real/real-to-complex transforms, but also cosine and sine transforms and their inverses. Of course, in multiple dimensions the user has to do a fair bit of work to figure out how to do the unpacking of all of the variables for the second (and following) dimensions, since the partially transformed fields are no longer real. \item There is a set of FFT routines available from the Numerical Aerodynamic Simulation (NAS) Systems Division of NASA Ames. After considerable effort I was able to get hold of these routines. There are 2D and 3D routines in the package, all-FORTRAN and CRAY-specific. There is some documentation available. The contact at NASA Ames is Subhash Saini (saini@prandtl.nas.nasa.gov). The closest reference is \cite{Bai90a}. We did not test these routines. \end{enumerate} %-------------------------------------------------------------------------- \section{Proprietary Library Routines} Specialized implementations of multidimensional FFT are available from vendors. E.g. IBM has one \cite{Aga86a} on its IBM3090. We don't know if the source is available for customers. Cray Research provides a set of vectorized multiple-FFT routines in its standard libraries under UNICOS. Driver routines for parallel 2-D FFT's running at speeds up to almost 1500 MFLOPS on an 8-cpu Y/MP are available by anonymous ftp on {\sl perelandra.cms.udel.edu} in the file pub/fft/cray.tar. These entry points (or something very much like them) should be included in the Cray library beginning with UNICOS 6.0. %--------------------------------------------------------------------------- \section{Helpful Literature References} \begin{enumerate} \item An outstanding paper discussing applications and complexity of FFT (including multidimensional) is that by \cite{Hen79a}. \item Paper by \cite{Wel90a} discusses interesting application of 3D FFT convolution and offers the source code. \item Journal IEEE Transactions on Acoustics Speech and Signal Processing has special section on multidimensional transforms in almost every issue. \item Book by \cite{Bur85a} contains listing of several algorithms. \end{enumerate} \section{Availability} \newcommand{\vfftpk}{\vtop{ \hbox{Send mail to} \hbox{netlib@research.att.com} \hbox{send vfftpk} }} \newcommand{\fft}{\vtop{ \hbox{Clive Temperton} \hbox{European Centre for Medium Range Weather Forecasts} \hbox{Shinfield Park, Reading, Berkshire RG2 9AX} \hbox{England} \hbox{FAX 869450} \hbox{The code is available at NCAR (e.g.)} \hbox{3D driver is available from one of the authors (PJF)} }} \newcommand{\fourn}{\vtop{ \hbox{``Numerical recipes''} \hbox{FORTRAN, PASCAL, C versions} \hbox{Source code available with the purchase of the book} }} \newcommand{\nas}{\vtop{ \hbox{Numerical Aerodynamic Simulation Systems Division} \hbox{S. Saini} \hbox{(415) 604 4343} \hbox{saini@prandtl.nas.nasa.gov} \hbox{Non NAS users will need permission from Jim Craw or Lola Anacker} }} \newcommand{\fraser}{\vtop{ \hbox{Optimized mass storage FFT program} \hbox{Code available in printed form in:} \hbox{Programs for digital signal processing} \hbox{IEEE Press} \hbox{1979} \hbox{See also Singleton's reference} \hbox{New York} }} \newcommand{\mfft}{\vtop{ \hbox{Alfred H. Morris} \hbox{NSWC Library of mathematical subroutines} \hbox{Naval Surface Warfare Center} \hbox{Dahlgren, Virginia 22448} \hbox{1990} }} \newcommand{\fourt}{\vtop{ \hbox{Brenner (1969) indicates that the code is distributed through IBM} }} \vspace{0.3cm} \hbox{\hbox to 2.5cm{VFFTPK} \vfftpk} \vspace{0.3cm} \hbox{\hbox to 2.5cm{CFFT99}\fft} \vspace{0.3cm} \hbox{\hbox to 2.5cm{FOURN} \fourn} \vspace{0.3cm} \hbox{\hbox to 2.5cm{FOURT} \fourt} \vspace{0.3cm} \hbox{\hbox to 2.5cm{NAS} \nas} \vspace{0.3cm} \hbox{ \hbox to 2.5cm{Fraser} \fraser} \vspace{0.3cm} \hbox{\hbox to 2.5cm{MFFT} \mfft} \vspace{0.3cm} \section*{Acknowledgments} Clive Temperton provided insight on the use of his routines. Michael Hale send us Fraser's code. D. Bailey and S. Saini helped with the NAS FFT library. %\bibliography{fft} \begin{thebibliography}{} \bibitem[Agarwal and Cooley, 1986]{Aga86a} Agarwal, R.~C. and Cooley, J.~W. (1986). \newblock Fourier transform and convolution subroutines for the {IBM} 3090 vector facility. \newblock {\em IBM Journal of research and development}, 30:145--162. \bibitem[Bailey, 1990]{Bai90a} Bailey, D.~H. (1990). \newblock {FFT}s in external or hierarchical memory. \newblock {\em Journal of Supercomputing}, 4:23--25. \bibitem[Brenner, 1969]{Bre69a} Brenner, N.~M. (1969). \newblock Transform of externally stored data. \newblock {\em IEEE Trans. Audio and Electroacoust.}, AU-17:128--132. \bibitem[Burrus and Parks, 1985]{Bur85a} Burrus, C.~S. and Parks, T.~W. (1985). \newblock {\em {DFT}, {FFT} and convolution algorithms - {T}heory and implementation}. \newblock Wiley, NY. \bibitem[Edwards, 1987]{Edw87a} Edwards, M. (1987). \newblock Computation of fast {F}ourier transforms. multidimensional transforms come of age. \newblock {\em CRAY Channels}, 9:22--25. \bibitem[Fraser, 1979]{Fra79a} Fraser, D. (1979). \newblock Optimized mass storage {FFT} program. \newblock In {\em Programs for digital signal processing}, chapter 1.5-1. IEEE Press, New York. \bibitem[Henrici, 1979]{Hen79a} Henrici, P. (1979). \newblock Fast {F}ourier methods in computational comlex analysis. \newblock {\em SIAM Rev.}, 21:481. \bibitem[Morris, 1990]{Mor90a} Morris, A.~H. (1990). \newblock {NSWC} library of mathematics subroutines. \newblock Technical Report NSWC TR 90-21, Naval Surface Warfare Center, Dahlgren, Virginia. \bibitem[Nobile et~al., 1988]{Nob88a} Nobile, A., Roberto, V., and Saitta, F. (1988). \newblock {MFFT4}: Four dimensional vectorized fast {F}ourier transforms. \newblock {\em Comp. Phys. Comm.}, 48:313--318. \bibitem[Press et~al., 1986]{Pre86a} Press, W.~H., Flannery, B.~P., Teukolsky, S.~A., and Vettering, T. (1986). \newblock {\em Numerical recipes: the art of scientific computing}. \newblock Cambridge University Press, Cambridge. \bibitem[Press and Teukolsky, 1988]{Pre88a} Press, W.~H. and Teukolsky (1988). \newblock Fourier transforms of real data in two and three dimensions. \newblock {\em Computers in physics}, 3:84--86. \bibitem[Singleton, 1969]{Sin69a} Singleton, R.~C. (1969). \newblock An algorithm for computing the mixed radix {F}ast {F}ourier {T}ransform. \newblock {\em IEEE Trans. Audio and Electroacustics}, AU-17(2):93--103. \newblock Special issue on FFT. \bibitem[Sweet and Lindgren, 1990]{Swe90a} Sweet, R.~A. and Lindgren, L.~L. (1990). \newblock {VFFTPK}. \newblock Available from NETLIB (netlib . \bibitem[Temperton, 1983a]{Tem83b} Temperton, C. (1983a). \newblock Fast mixed-radix real {F}ourier transform. \newblock {\em J. Comput. Phys}, 52:340--350. \bibitem[Temperton, 1983b]{Tem83a} Temperton, C. (1983b). \newblock Self-sorting mixed-radix {F}ast {F}ourier {T}ransform. \newblock {\em J. Comput. Phys.}, 52:1--23. \bibitem[Temperton, 1989]{Tem89a} Temperton, C. (1989). \newblock Nesting strategies for prime factor {FFT} algorithms. \newblock {\em J. Comput. Phys.}, 82:247--268. \bibitem[Wells et~al., 1990]{Wel90a} Wells, N.~H., Burrus, C.~S., Desobry, G.~E., and Boyer, A.~L. (1990). \newblock Three-dimensional {F}ourier convolution with an array processor. \newblock {\em Computers in Physics}, 4:507--513. \end{thebibliography} \end{document} -- John D. McCalpin mccalpin@perelandra.cms.udel.edu Assistant Professor mccalpin@brahms.udel.edu College of Marine Studies, U. Del. J.MCCALPIN/OMNET