[comp.sources.misc] v05i015: fft generating routines

dan@srs.UUCP (Dan Kegel) (10/28/88)
Posting-number: Volume 5, Issue 15
Submitted-by: "Dan Kegel" <dan@srs.UUCP>
Archive-name: dsp-fft

Here are some routines I slapped together to generate inline code
to perform FFT's of any power-of-2 size; this was done for a TMS32020
project, but it may be useful for any DSP chip.
It hasn't been tested, but it looks correct, and should be helpful
to anybody who was about to do the same thing :-)

#!/bin/sh
#
# shar archiver, delete everything above the #!/bin/sh line
# and run through sh (not csh)
#
echo 'shar: extracting "README" (735 characters)'
# 'README' has a checksum of 18124 on BSD and 468 on System V.
sed 's/^X//' > README << 'XXX_EOF_XXX'
XHere's a little package I wrote out of frustration with the example
XFFT routines given by TI in their online application notes.
XAlthough they gave examples for a couple sizes, they didn't show how to 
Xgenerate FFT routines for other sizes.
XWell... a little lurking around in Oppenheim & Schafer and in Blahut,
Xand voila, two programs to generate inline decimation-in-time fft routines.
X
XTo try it out, unpack the archive, then type "make demo_fft.asm demo_bit.asm",
Xwhich will generate fft and bit reversal routines for a 256-point fft
Xthat are very similar to those in the TI application note mentioned in
Xgen_fft.doc.
X
XI hope others find them helpful.
X
X- Dan Kegel
X  S.R.Systems, Rochester, NY
X  rochester!srs!dan
X  GEnie: d.r.kegel
XXX_EOF_XXX
if test 735 -ne "`wc -c < README`"
then
    echo 'shar: transmission error on "README"'
fi
chk=`sum README | awk '{print $1}'`
if test 18124 -ne $chk -a 468 -ne $chk
then
    echo 'shar: checksum error on "README"'
fi
echo 'shar: extracting "gen_fft.doc" (3115 characters)'
# 'gen_fft.doc' has a checksum of 22225 on BSD and 63097 on System V.
sed 's/^X//' > gen_fft.doc << 'XXX_EOF_XXX'
Xgen_fft - generate inline code to perform a given size FFT
X
XUSAGE
X    gen_fft log_fft_size < instruction_formats
X
XDESCRIPTION
X    Reads the log base two of the fft size from the command line,
X    then reads seven template lines from stdin;
X    builds a section of inline code from the given templates
X    that implements a decimation-in-time FFT of the given size.
X    Can generate FFT's of length 8, 16, 32... up to some arbitrary limit.
X
X    See Oppenheim & Schafer, "Digital Signal Processing", 1975, p. 318; 
X    the particular dataflow graph implemented is that of figure 6.10, 
X    which uses bit-reversed input, and accesses coefficients in normal order.
X
X    See the application note "Implementation of the Fast Fourier Transform 
X    Algorithms with the TMS32020," in
X    "Digital Signal Processing Applications with the TMS320 Family",
X    Texas Instruments Inc., for macros to use with this program.
X    The routines in that note are available online from the Texas Instruments
X    DSP Bulletin Board, 300-2400 baud, telephone 713-274-2323,
X    in the archive FFT32020.ARC.
X
X    This program was tested by generating a 256-point FFT and comparing
X    with the routine FFT256, "A 256-POINT RADIX-2 DIT COMPLEX FFT FOR THE 
X    TMS32020", given in that application note.
X
X    The twiddle factors are represented as fixed-point numbers with
X    12 bits of fraction, where unity is 4096, to accomodate the TMS320's
X    MPYK instruction.  Of course, 4096 can't be represented in a 13-bit
X    field, but that's okay, because only the special-case butterflies
X    need to use unity, and they don't use MPYK to do it.
X
X    Although this was written to generate code for the TMS320x0,
X    the fact that it is template based should make it very easy to
X    modify to generate code for other architectures.
X
X    The resulting code expects the input array to be in bit-reversed
X    order; see the file 'gen_brev.doc' for info on generating bit-reversal
X    routines.
X
XINPUT
X    Reads seven lines from stdin giving the format of six macros:
X    1. radix-2 butterfly for any twiddle factor
X    2. radix-2 butterfly for twiddle factor of 1              (k / N = 0)
X    3. radix-2 butterfly for twiddle factor of (i+1)/sqrt(2)  (k / N = 1/8)
X    4. radix-2 butterfly for twiddle factor of i              (k / N = 1/4)
X    5. radix-2 butterfly for twiddle factor of (i-1)/sqrt(2)  (k / N = 3/8)
X    6. radix-4 butterfly for stages 1 and 2; twiddle factors are all trivial
X    7. comment macro used to explain the purpose of the following code
X
X    In all butterfly templates, the first 'radix' %d's are expanded to the 
X    index of the locations the butterfly is to operate upon.
X
X    In the radix-2 butterflies, the third and fourth %d's are expanded to
X    the real and imaginary componants of the twiddle factor, expressed
X    as 12-bit fixed-point fractions for use with the TMS320's MPYK instruction.
X
X    In the comment template, %s is expanded to the text of a comment.
X
X    The expansion of %d's is done by the C function printf(), so you can
X    use fancier formatting (for example, %03d) if you like.
X
XOUTPUT
X
XXX_EOF_XXX
if test 3115 -ne "`wc -c < gen_fft.doc`"
then
    echo 'shar: transmission error on "gen_fft.doc"'
fi
chk=`sum gen_fft.doc | awk '{print $1}'`
if test 22225 -ne $chk -a 63097 -ne $chk
then
    echo 'shar: checksum error on "gen_fft.doc"'
fi
echo 'shar: extracting "gen_brev.doc" (1021 characters)'
# 'gen_brev.doc' has a checksum of 30043 on BSD and 22604 on System V.
sed 's/^X//' > gen_brev.doc << 'XXX_EOF_XXX'
Xgen_brev - generate inline in-place bit-reversal code for use with FFT
X
XUSAGE
X    gen_brev log_fft_size < instruction_template_file
X
XDESCRIPTION
X    Reads log base two of the fft size from the commandline, then
X    outputs inline code to implement bit reversal for 2^log_fft_size-point fft.
X
XINPUT
X    Input is two swap instruction templates, for example:
X	SWAP	X%03d,X%03d
X	* (swap %d and %d, so do nothing)
X    Second line used when operands are identical; it is there in case
X    you wish to use autoincrement indexing for the swap macro,
X    and need to increment a pointer regardless of performing a swap.
X
X    The first %d in each template is expanded to the bit-reversed index,
X    the second %d (if any) is expanded to the normal index.
X
X    Each %d is expanded via the C function printf(), so you can use fancier
X    formats if you like.
X    
XOUTPUT
X    Output is fft_size/2 lines which scan over all fft_size elements of
X    the input array in order, swapping any location that hasn't already
X    been swapped.
XXX_EOF_XXX
if test 1021 -ne "`wc -c < gen_brev.doc`"
then
    echo 'shar: transmission error on "gen_brev.doc"'
fi
chk=`sum gen_brev.doc | awk '{print $1}'`
if test 30043 -ne $chk -a 22604 -ne $chk
then
    echo 'shar: checksum error on "gen_brev.doc"'
fi
echo 'shar: extracting "Makefile" (745 characters)'
# 'Makefile' has a checksum of 32750 on BSD and 64324 on System V.
sed 's/^X//' > Makefile << 'XXX_EOF_XXX'
X# Makefile for fft code generating routines
X# Could be used for any processor, but will need to change FRAC_BITS in
X# gen_fft.c to use with anything but the TMS320 family.
X# Tested on a Sun workstation; Makefile will need changes for MS-DOS machines.
X
Xgen_fft: gen_fft.c
X	cc gen_fft.c -lm -o gen_fft
Xgen_brev: gen_brev.c
X	cc gen_brev.c -o gen_brev
X
X# Perform example run to generate code for a 256-point FFT.
Xdemo_fft.asm: gen_fft demo_fft.dat
X	gen_fft 8 < demo_fft.dat > demo_fft.asm
Xdemo_bit.asm: gen_brev demo_bit.dat
X	gen_brev 8 < demo_bit.dat > demo_bit.asm
X
X# Build distribution archive.
XARCFILES = README gen_fft.doc gen_brev.doc Makefile \
X	gen_fft.c gen_brev.c demo_fft.dat demo_bit.dat
Xgen.shar: $(ARCFILES)
X	shar gen.shar $(ARCFILES)
XXX_EOF_XXX
if test 745 -ne "`wc -c < Makefile`"
then
    echo 'shar: transmission error on "Makefile"'
fi
chk=`sum Makefile | awk '{print $1}'`
if test 32750 -ne $chk -a 64324 -ne $chk
then
    echo 'shar: checksum error on "Makefile"'
fi
echo 'shar: extracting "gen_fft.c" (6510 characters)'
# 'gen_fft.c' has a checksum of 20944 on BSD and 28831 on System V.
sed 's/^X//' > gen_fft.c << 'XXX_EOF_XXX'
X/*--------------------------------------------------------------------------
X Generate inline fft code for a N-point radix-2 DIT FFT.
X Reads six lines from stdin giving the format of six macros:
X 1. radix-2 butterfly for any twiddle factor
X 2. radix-2 butterfly for twiddle factor of 1              (k / N = 0)
X 3. radix-2 butterfly for twiddle factor of (i+1)/sqrt(2)  (k / N = 1/8)
X 4. radix-2 butterfly for twiddle factor of i              (k / N = 1/4)
X 5. radix-2 butterfly for twiddle factor of (i-1)/sqrt(2)  (k / N = 3/8)
X 6. radix-4 butterfly for stages 1 and 2; twiddle factors are all trivial
X
X In all cases, the first 'radix' %d's are expanded to the index
X of the locations the butterfly is to operate upon.
X
X In the radix-2 butterflies, the third and fourth %d's are expanded to
X the real and imaginary componants of the twiddle factor, expressed
X as 13-bit fixed-point fractions.
X
X See Oppenheim & Schafer, "Digital Signal Processing", 1975, p. 318; 
X the particular dataflow graph implemented is that of figure 6.10, 
X which uses bit-reversed input, and accesses coefficients in normal order.
X
X Dan Kegel, rochester!srs!dan, S.R. Systems, Oct '88
X This code is hereby placed in the public domain without any promise
X that it functions as intended, or at all.
X--------------------------------------------------------------------------*/
X#include <stdio.h>
X#include <math.h>
X
X#ifndef PI
X#define PI 3.1415926
X#endif
X
X/* Symbolic constants for the indices of the format array, read in from stdin.
X * Each entry handles a different special case, except BTRFLY2_ANY,
X * which is general.
X */
X#define BTRFLY2_ANY      0
X#define BTRFLY2_0_4TH_PI 1
X#define BTRFLY2_1_4TH_PI 2
X#define BTRFLY2_2_4TH_PI 3
X#define BTRFLY2_3_4TH_PI 4
X#define BTRFLY4_STAGE0   5
X#define COMMENT 6
X#define N_FORMATS 7
X
X#define FRAC_BITS 12		/* for TMS320 inline multiply */
X
X/* The following global should be set to (1 << FRAC_BITS) if
X * multiplies by unity are optimized away, and to
X * (1 << FRAC_BITS) - 1 if multiplies by unity are done in the same
X * manner as other multiplies.
X * I think.
X */
Xstatic float i_unity = (float) (1 << FRAC_BITS);
X
X/*--------------------------------------------------------------------------
X Routine to convert floating point number to integer format.
X Result is a fixed-point number with FRAC_BITS of fraction.
X
X Constructed so that ftoi(x) = -ftoi(-x), for whatever that's worth.
X--------------------------------------------------------------------------*/
Xint
Xftoi(f)
X    float f;
X{
X    int i;
X    if (f < 0.0)
X	i = - (int) (0.5 - f * i_unity);
X    else
X	i = (int) (0.5 + f * i_unity);
X    return i;
X}
X
X/*--------------------------------------------------------------------------
X Routine to calculate real part of j'th twiddle factor for an n-point DIT fft.
X--------------------------------------------------------------------------*/
Xint
Xicos(j, n)
X    int j, n;
X{
X    return ftoi(cos((2 * PI * j) / (float) n));
X}
X
X/*--------------------------------------------------------------------------
X Routine to calculate imaginary part of j'th twiddle factor for an 
X n-point DIT fft.
X--------------------------------------------------------------------------*/
Xint
Xisin(j, n)
X    int j, n;
X{
X    return ftoi(sin((2 * PI * j) / (float) n));
X}
X
Xmain(argc, argv)
X    int argc;
X    char **argv;
X{
X    int log_n;
X    int i, j, k, n;
X    int stage;
X    char format[N_FORMATS][256];
X    char comment[256];
X
X    if (argc < 2) {
X	(void) fprintf(stderr, "usage: %s log_fft_size < instruction_formats\n\
XOutputs inline code to implement given size DIT fft.\n\
XInput is six butterfly template lines, in which %%d is expanded to array \n\
Xindices and real and imaginary parts of the twiddle factors; for instance,\n\
X	BFLY2	X%%03d,X%%03d,%%d,%%d\n\
X	BFLY2_0PI4	X%%03d,X%%03d\n\
X	BFLY2_1PI4	X%%03d,X%%03d\n\
X	BFLY2_2PI4	X%%03d,X%%03d\n\
X	BFLY2_3PI4	X%%03d,X%%03d\n\
X	BFLY4_SPECIAL	X%%03d,X%%03d,X%%03d,X%%03d\n\
Xfollowed by a comment template line, in which %%s is expanded to a remark\n\
Xabout the following code, for example\n\
X	* %%s\n\
X", argv[0]);
X	exit(1);
X    }
X
X    /* Get arguments. */
X    log_n = atoi(argv[1]);
X    if (log_n < 2 || log_n > 13) {
X	(void) fprintf(stderr, 
X	    "log_fft_size %s out of range, must be 2 to 13\n", argv[1]);
X	exit(1);
X    }
X    n = 1 << log_n;
X
X    /* Read templates. */
X    for (i=0; i<N_FORMATS; i++) {
X	if (gets(format[i]) == NULL) {
X	    (void) fprintf(stderr, 
X		"%s: couldn't read all %d formats from stdin\n",
X		argv[0], N_FORMATS);
X	    exit(1);
X	}
X    }
X
X    (void) sprintf(comment, "Inline code for %d-point FFT.", n);
X    (void) printf(format[COMMENT], comment);
X    (void) putchar('\n');
X
X    (void) printf(format[COMMENT], "Radix-4 butterflies for stages 1 and 2.");
X    (void) putchar('\n');
X    /* For each 4-point DFT making up the combined first two stages: */
X    for (j=0; j < n/4; j++) {
X	(void) printf(format[BTRFLY4_STAGE0], 4*j, 4*j+1, 4*j+2, 4*j+3);
X	(void) putchar('\n');
X    }
X
X    /* For each following stage of the Cooley-Tukey FFT */
X    for (stage=3; stage<=log_n; stage++) {
X	int n_dft;
X	int n_butterfly;
X	int dft_size;
X
X	(void) sprintf(comment, "Stage %d.", stage);
X	(void) printf(format[COMMENT], comment);
X	(void) putchar('\n');
X
X	n_dft = 1 << (log_n - stage);		/* # of dft's */
X	n_butterfly = 1 << (stage-1);		/* # of butterflies per dft */
X	dft_size = 1 << stage;			/* size of each dft */
X
X	/* For each (dft_size)-point dft making up this stage: */
X	for (j=0; j < n_dft; j++) {
X
X	    (void) sprintf(comment, 
X		"Radix-2 butterflies for %d-point dft %d of stage %d.", 
X		    dft_size, j, stage);
X	    (void) printf(format[COMMENT], comment);
X	    (void) putchar('\n');
X
X	    /* For each butterfly making up this dft: */
X	    for (k=0; k<n_butterfly; k++) {
X		int f;
X		int index1, index2;
X
X		/* Calculate the indices of the two locations this butterfly
X		 * operates upon.
X		 */
X		index1 = j * 2 * n_butterfly + k;
X		index2 = index1 + n_butterfly;
X
X		/* Decide which butterfly to use.
X		 * Special cases that can be optimized are those where
X		 *    ln (twiddle factor)
X		 * is a multiple of sqrt(-1) * pi/4.
X		 */
X		if ( ((8 * k) % dft_size) != 0) {
X		    /* Must use general case. */
X		    f = BTRFLY2_ANY;
X		} else {
X		    /* Use one of four optimized cases. */
X		    int pi_quarter = (8 * k) / dft_size;
X		    f = pi_quarter + BTRFLY2_0_4TH_PI;
X		}
X
X		/* Output the butterfly. */
X		(void) printf(format[f],
X		    index1, index2, icos(k, dft_size), isin(k, dft_size));
X		(void) putchar('\n');
X	    }
X	}
X    }
X
X
X    exit(0);
X}
XXX_EOF_XXX
if test 6510 -ne "`wc -c < gen_fft.c`"
then
    echo 'shar: transmission error on "gen_fft.c"'
fi
chk=`sum gen_fft.c | awk '{print $1}'`
if test 20944 -ne $chk -a 28831 -ne $chk
then
    echo 'shar: checksum error on "gen_fft.c"'
fi
echo 'shar: extracting "gen_brev.c" (2617 characters)'
# 'gen_brev.c' has a checksum of 17463 on BSD and 52233 on System V.
sed 's/^X//' > gen_brev.c << 'XXX_EOF_XXX'
X/*--------------------------------------------------------------------------
X Generate bit-reversing code for a N-point radix-2 FFT.
X Reads two lines from stdin giving the format of the macro
X that swaps two complex numbers, for instance
X    SWAP_COMPLEX FFT_BUF+%d FFT_BUF+%d
X    *SWAP_COMPLEX FFT_BUF+%d FFT_BUF+%d
X
X The first %d is expanded to the bit-reversed number; the second %d is
X expanded to the non-bit-reversed number.
X If desired, the second %d may be omitted.
X The second line is used when the two arguments would be identical.
X
X Dan Kegel, rochester!srs!dan, S.R. Systems, Oct '88
X This code is hereby placed in the public domain without any promise
X that it functions as intended, or at all.
X--------------------------------------------------------------------------*/
X#include <stdio.h>
X
X/*--------------------------------------------------------------------------
X Stupid bitreversal routine.  Works for n <= 8192.
X--------------------------------------------------------------------------*/
Xint
Xbitrev(i, n)
X    register int i;
X    register int n;
X{
X    n >>= 1;
X    return
X	((i & 1<<0) ? (n>>0) : 0) +
X	((i & 1<<1) ? (n>>1) : 0) +
X	((i & 1<<2) ? (n>>2) : 0) +
X	((i & 1<<3) ? (n>>3) : 0) +
X	((i & 1<<4) ? (n>>4) : 0) +
X	((i & 1<<5) ? (n>>5) : 0) +
X	((i & 1<<6) ? (n>>6) : 0) +
X	((i & 1<<7) ? (n>>7) : 0) +
X	((i & 1<<8) ? (n>>8) : 0) +
X	((i & 1<<9) ? (n>>9) : 0) +
X	((i & 1<<10) ? (n>>10) : 0) +
X	((i & 1<<11) ? (n>>11) : 0) +
X	((i & 1<<12) ? (n>>12) : 0) +
X	((i & 1<<13) ? (n>>13) : 0);
X}
X
Xmain(argc, argv)
X    int argc;
X    char **argv;
X{
X    int log_n;
X    int n;
X    int i;
X    char format[2][256];	/* [0] is normal swap, [1] is no-op swap */
X
X    if (argc < 2) {
X	(void) fprintf(stderr, "usage: %s log_fft_size < instruction_format\n\
XOutputs inline code to implement bit reversal for 2^log_fft_size-point fft.\n\
XInput is two swap instruction templates, for example:\n\
X    SWAP	X%%03d,X%%03d\n\
X    * (swap %%d and %%d, so do nothing)\n\
XSecond line used when operands are identical.\n\
X", argv[0]);
X	exit(1);
X    }
X
X    /* Get arguments */
X    log_n = atoi(argv[1]);
X    if (log_n < 2 || log_n > 13) {
X	(void) fprintf(stderr, "log_fft_size %s out of range 2..13\n", 
X	    argv[1]);
X	exit(1);
X    }
X    n = 1 << log_n;
X
X    /* Get formats for normal and no-op swaps. */
X    for (i=0; i<2; i++)
X	(void) gets(format[i]);
X
X    /* Output bitreversal code. */
X    for (i=0; i<n; i++) {
X	int b = bitrev(i,n);
X	if (b > i) {
X	    (void) printf(format[0], b, i);
X	    (void) putchar('\n');
X	} else if (b == i) {
X	    (void) printf(format[1], b, i);
X	    (void) putchar('\n');
X	}
X    }
X
X    exit(0);
X}
XXX_EOF_XXX
if test 2617 -ne "`wc -c < gen_brev.c`"
then
    echo 'shar: transmission error on "gen_brev.c"'
fi
chk=`sum gen_brev.c | awk '{print $1}'`
if test 17463 -ne $chk -a 52233 -ne $chk
then
    echo 'shar: checksum error on "gen_brev.c"'
fi
echo 'shar: extracting "demo_fft.dat" (159 characters)'
# 'demo_fft.dat' has a checksum of 55690 on BSD and 8952 on System V.
sed 's/^X//' > demo_fft.dat << 'XXX_EOF_XXX'
X	BTRFLI	X%03d,X%03d,%d,%d,512
X	ZEROI	X%03d,X%03d,512
X	PBY4I	X%03d,X%03d,512
X	PBY2I	X%03d,X%03d,512
X	P3BY4I	X%03d,X%03d,512
X	COMBO	X%03d,X%03d,X%03d,X%03d
X*	%s
XXX_EOF_XXX
if test 159 -ne "`wc -c < demo_fft.dat`"
then
    echo 'shar: transmission error on "demo_fft.dat"'
fi
chk=`sum demo_fft.dat | awk '{print $1}'`
if test 55690 -ne $chk -a 8952 -ne $chk
then
    echo 'shar: checksum error on "demo_fft.dat"'
fi
echo 'shar: extracting "demo_bit.dat" (58 characters)'
# 'demo_bit.dat' has a checksum of 04304 on BSD and 3136 on System V.
sed 's/^X//' > demo_bit.dat << 'XXX_EOF_XXX'
X	  BITRVI    X%03d,X%03d,512
X	  * BITRVI  X%03d,X%03d,512
XXX_EOF_XXX
if test 58 -ne "`wc -c < demo_bit.dat`"
then
    echo 'shar: transmission error on "demo_bit.dat"'
fi
chk=`sum demo_bit.dat | awk '{print $1}'`
if test 04304 -ne $chk -a 3136 -ne $chk
then
    echo 'shar: checksum error on "demo_bit.dat"'
fi