[comp.archives] [sci.math.num-analysis] Info on 3D FFT codes

flatau@handel.CS.ColoState.Edu (flatau) (01/11/91)
Archive-name: math/num-analysis/cray-fft/1991-01-10
Archive: perelandra.cms.udel.edu:/pub/fft/cray.tar [128.175.74.1]
Original-posting-by: flatau@handel.CS.ColoState.Edu (flatau)
Original-subject: Info on 3D FFT codes (FORTRAN)
Reposted-by: emv@ox.com (Edward Vielmetti)


Here are some notes on Public Domain FORTRAN codes
for the 3D FFT. The text is written in LATEX. Run it twice
and print. 

Peter F.
============ cut 


\documentstyle[11pt,apalike]{article}
\topmargin -.42in                
\textheight 9.25in
\textwidth 15.cm
\hoffset=-.7in                         
\voffset=-.3in 
\baselineskip 12pt
\parindent 0in

\begin{document}
%\bibliographystyle{/usr/cirrus/lib/tex/inputs/apa99}
\title{Public Domain FORTRAN Codes for Multidimensional
Fast Fourier Transforms
-- an Incomplete Summary}



\author{P. J. Flatau and $^+$J. McCalpin\\
\\ \\
{\em Department of Atmospheric Science, Colorado State University}\\
{\em $^+$College of Marine Studies, University of Delware} \\
}


\maketitle
%----------------------------------------------------------------------
\section{Scalar FORTRAN codes}

\begin{enumerate}

\item ``Numerical recipes'' \cite{Pre86a,Pre88a}  discusses FFT's in two
or more dimensions. They provide FORTRAN code \verb|FOURN| which is a
descendant of one written by N. M. Brenner of Lincoln Laboratories.
The code is limited to $N$ being a power of 2.  It provides
\verb|NDIM|-dimensional transform. Runs fast on scalar computers and
doesn't require work storage.  The ``Computer physics'' \cite{Pre88a}
paper contains small, specialized FORTRAN interface to \verb|FOURN|.

\item The more complete routine is that by M. N. Brenner
\cite{Bre69a} from which the ``Numerical Recipes'' code descended.  
The \verb|FOURT| routine works for arbitrary length of data and
arbitrary-dimensional transform. It seems to be faster than the
``Numerical recipes'' routine \verb|FOURN|.  It does not vectorize on
CRAY-Y/MP.  

\item The code by \cite{Fra79a} provides multidimensional (up to 4)
transform which can be done with external (disk) storage. 

\item A code reported in the recent literature is that by
\cite{Nob88a}. We don't have access to it. It is available from
the Computer Physics Communications library.

\item The public domain library put together by the Naval Surface Warfare
Center \cite{Mor90a} contains two routines \verb|MFFT| and
\verb|MFFT1| for multivariate FFT's.
These codes are described by  \cite{Sin69a}.
\end{enumerate}

%-------------------------------------------------------------------------
\section{Vector/Parallel FORTRAN codes}

\begin{enumerate}

\item In the CRAY Channels journal impressive 3-dimensional FFT
performance has been reported \cite{Edw87a}. The paper reports that a
$1024 \times 1024 \times 1024$ array was transformed using an optimized
version of Clive Temperton's code \cite{Tem89a,Tem83a,Tem83b}.
Temperton's code \verb|FFT99| was
written while he was at European Centre for Medium Range Weather
Forecasts.  These codes allow for easy construction of multidimensional
routines using parameters \verb|INC| (increment within a transform),
\verb|JUMP| (increment between transforms), and \verb|LOT| (number of
simultaneous transforms). An example of a 2-D complex-to-complex
transform is given in the comments. Following Temperton's advice we
wrote the simple 3D interface code.  The code vectorize on CRAY-YMP and
was 8-9 times faster in comparison with \verb|FOURN| from ``Numerical
recipes''.  It takes more memory.


\item For real sequences ``netlib'' (netlib@research.att.com) offers
\verb|VFFTPK| package by R. A. Sweet, L. L. Lindgren, and R. F.
Boisvert \cite{Swe90a}.  This package is suitable for 2D transforms but
3D case would require transposing indicies (Roland Sweet, private
communication). This package was developed for fast elliptic solvers.
This package is especially useful because it implements (in 1-D only)
not only the ``ordinary'' complex-to-complex and
complex-to-real/real-to-complex transforms, but also cosine and sine
transforms and their inverses.  Of course, in multiple dimensions the
user has to do a fair bit of work to figure out how to do the unpacking
of all of the variables for the second (and following) dimensions,
since the partially transformed fields are no longer real.

\item There is a set of FFT routines available from the  Numerical 
Aerodynamic Simulation (NAS) Systems Division of NASA Ames.  After
considerable effort I was able to get hold of these routines.  There
are 2D and 3D routines in the package, all-FORTRAN and CRAY-specific.
There is some documentation available.  The contact at NASA Ames is
Subhash Saini (saini@prandtl.nas.nasa.gov).  The closest reference is
\cite{Bai90a}.  We did not test these routines.

\end{enumerate}


%--------------------------------------------------------------------------
\section{Proprietary Library Routines}

Specialized implementations of multidimensional FFT are available
from vendors. E.g. IBM has one \cite{Aga86a} on its IBM3090.  We don't
know if the source is available for customers.

Cray Research provides a set of vectorized multiple-FFT routines in
its standard libraries under UNICOS.  Driver routines for parallel 2-D
FFT's running at speeds up to almost 1500 MFLOPS on an 8-cpu Y/MP are
available by anonymous ftp on {\sl perelandra.cms.udel.edu} in the file
pub/fft/cray.tar.  These entry points (or something very much like
them) should be included in the Cray library beginning with UNICOS 6.0.


%---------------------------------------------------------------------------
\section{Helpful Literature References}

\begin{enumerate}

\item An outstanding paper discussing applications and 
complexity of FFT (including multidimensional) 
is that by \cite{Hen79a}.

\item Paper by \cite{Wel90a} discusses interesting application
of 3D FFT convolution and offers the source code.

\item Journal IEEE Transactions on Acoustics Speech and Signal
Processing has special section on multidimensional transforms
in almost every issue.

\item Book by \cite{Bur85a} contains listing of several
algorithms.


\end{enumerate}


\section{Availability}



\newcommand{\vfftpk}{\vtop{
\hbox{Send mail to}
\hbox{netlib@research.att.com}
\hbox{send vfftpk}
}} 

\newcommand{\fft}{\vtop{
\hbox{Clive Temperton}
\hbox{European Centre for Medium Range Weather Forecasts}
\hbox{Shinfield Park, Reading, Berkshire RG2 9AX}
\hbox{England}
\hbox{FAX 869450}
\hbox{The code is available at NCAR (e.g.)}
\hbox{3D driver is available from one of the authors (PJF)}
}} 

\newcommand{\fourn}{\vtop{
\hbox{``Numerical recipes''}
\hbox{FORTRAN, PASCAL, C versions}
\hbox{Source code available with the purchase of the book}
}} 

\newcommand{\nas}{\vtop{
\hbox{Numerical Aerodynamic Simulation Systems Division}
\hbox{S. Saini}
\hbox{(415) 604 4343}
\hbox{saini@prandtl.nas.nasa.gov}
\hbox{Non NAS users will need permission from Jim Craw or Lola Anacker} 
}} 

\newcommand{\fraser}{\vtop{
\hbox{Optimized mass storage FFT program}
\hbox{Code available in printed form in:}
\hbox{Programs for digital signal processing}
\hbox{IEEE Press}
\hbox{1979}
\hbox{See also Singleton's reference}

\hbox{New York} 
}} 

\newcommand{\mfft}{\vtop{
\hbox{Alfred H. Morris}
\hbox{NSWC Library of mathematical subroutines}
\hbox{Naval Surface Warfare Center}
\hbox{Dahlgren, Virginia 22448}
\hbox{1990}
}} 

\newcommand{\fourt}{\vtop{
\hbox{Brenner (1969) indicates that the code is distributed through IBM}
}} 


\vspace{0.3cm}

\hbox{\hbox to 2.5cm{VFFTPK} \vfftpk} 

\vspace{0.3cm}

\hbox{\hbox to 2.5cm{CFFT99}\fft}  

\vspace{0.3cm}

\hbox{\hbox to 2.5cm{FOURN} \fourn}

\vspace{0.3cm}

\hbox{\hbox to 2.5cm{FOURT} \fourt}

\vspace{0.3cm}

\hbox{\hbox to 2.5cm{NAS} \nas}

\vspace{0.3cm}

\hbox{ \hbox to 2.5cm{Fraser} \fraser}

\vspace{0.3cm}

\hbox{\hbox to 2.5cm{MFFT} \mfft}

\vspace{0.3cm}

\section*{Acknowledgments}
Clive Temperton provided insight on the use of his routines.
Michael Hale send us Fraser's code.
D. Bailey and S. Saini helped with the NAS FFT library.




%\bibliography{fft}           
\begin{thebibliography}{}

\bibitem[Agarwal and Cooley, 1986]{Aga86a}
Agarwal, R.~C. and Cooley, J.~W. (1986).
\newblock Fourier transform and convolution subroutines for the {IBM} 3090
  vector facility.
\newblock {\em IBM Journal of research and development}, 30:145--162.

\bibitem[Bailey, 1990]{Bai90a}
Bailey, D.~H. (1990).
\newblock {FFT}s in external or hierarchical memory.
\newblock {\em Journal of Supercomputing}, 4:23--25.

\bibitem[Brenner, 1969]{Bre69a}
Brenner, N.~M. (1969).
\newblock Transform of externally stored data.
\newblock {\em IEEE Trans. Audio and Electroacoust.}, AU-17:128--132.

\bibitem[Burrus and Parks, 1985]{Bur85a}
Burrus, C.~S. and Parks, T.~W. (1985).
\newblock {\em {DFT}, {FFT} and convolution algorithms - {T}heory and
  implementation}.
\newblock Wiley, NY.

\bibitem[Edwards, 1987]{Edw87a}
Edwards, M. (1987).
\newblock Computation of fast {F}ourier transforms. multidimensional transforms
  come of age.
\newblock {\em CRAY Channels}, 9:22--25.

\bibitem[Fraser, 1979]{Fra79a}
Fraser, D. (1979).
\newblock Optimized mass storage {FFT} program.
\newblock In {\em Programs for digital signal processing}, chapter 1.5-1. IEEE
  Press, New York.

\bibitem[Henrici, 1979]{Hen79a}
Henrici, P. (1979).
\newblock Fast {F}ourier methods in computational comlex analysis.
\newblock {\em SIAM Rev.}, 21:481.

\bibitem[Morris, 1990]{Mor90a}
Morris, A.~H. (1990).
\newblock {NSWC} library of mathematics subroutines.
\newblock Technical Report NSWC TR 90-21, Naval Surface Warfare Center,
  Dahlgren, Virginia.

\bibitem[Nobile et~al., 1988]{Nob88a}
Nobile, A., Roberto, V., and Saitta, F. (1988).
\newblock {MFFT4}: Four dimensional vectorized fast {F}ourier transforms.
\newblock {\em Comp. Phys. Comm.}, 48:313--318.

\bibitem[Press et~al., 1986]{Pre86a}
Press, W.~H., Flannery, B.~P., Teukolsky, S.~A., and Vettering, T. (1986).
\newblock {\em Numerical recipes: the art of scientific computing}.
\newblock Cambridge University Press, Cambridge.

\bibitem[Press and Teukolsky, 1988]{Pre88a}
Press, W.~H. and Teukolsky (1988).
\newblock Fourier transforms of real data in two and three dimensions.
\newblock {\em Computers in physics}, 3:84--86.

\bibitem[Singleton, 1969]{Sin69a}
Singleton, R.~C. (1969).
\newblock An algorithm for computing the mixed radix {F}ast {F}ourier
  {T}ransform.
\newblock {\em IEEE Trans. Audio and Electroacustics}, AU-17(2):93--103.
\newblock Special issue on FFT.

\bibitem[Sweet and Lindgren, 1990]{Swe90a}
Sweet, R.~A. and Lindgren, L.~L. (1990).
\newblock {VFFTPK}.
\newblock Available from NETLIB (netlib .

\bibitem[Temperton, 1983a]{Tem83b}
Temperton, C. (1983a).
\newblock Fast mixed-radix real {F}ourier transform.
\newblock {\em J. Comput. Phys}, 52:340--350.

\bibitem[Temperton, 1983b]{Tem83a}
Temperton, C. (1983b).
\newblock Self-sorting mixed-radix {F}ast {F}ourier {T}ransform.
\newblock {\em J. Comput. Phys.}, 52:1--23.

\bibitem[Temperton, 1989]{Tem89a}
Temperton, C. (1989).
\newblock Nesting strategies for prime factor {FFT} algorithms.
\newblock {\em J. Comput. Phys.}, 82:247--268.

\bibitem[Wells et~al., 1990]{Wel90a}
Wells, N.~H., Burrus, C.~S., Desobry, G.~E., and Boyer, A.~L. (1990).
\newblock Three-dimensional {F}ourier convolution with an array processor.
\newblock {\em Computers in Physics}, 4:507--513.

\end{thebibliography}
\end{document}

--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET