[comp.arch] fft routine

chris@mimsy.UUCP (Chris Torek) (07/03/88)

(N.B.: this is a language (code design) issue, not an architectural
issue, so I have redirected followups to comp.lang.misc.)

The context of the following quote is benchmarking; in this case one
is, of course, concerned with running existing code quickly.  Yet the
code below is all too typical: there are no comments; the variables
have names that tell nothing.  In short, the code is unmaintainable.

>fft(xr,xi,n,m,wr,wi)
>	float xr[],xi[],wr[],wi[];
>	unsigned int n,m;
>{
>
>	unsigned int n1,n2,ia,ie,i,j,k,l;
>	float c,s,xrt,xit;
>
>	n2 = n;
>	ie = 1;
>	for (k=0 ; k<m ; k++)
>	{
>		n1 = n2;
>		n2 = n2>>1;
>		ia = 0;
>		for (j=0 ; j<n2 ; j++)
>		{
>			c = wr[ia];
>			s = wi[ia];
>			ia = ia+ie;
>
>			/* BEGIN INNER LOOP */
>
>			for (i=0 ; i<n ; i+=n1)
>			{
>				l = i+n2;
>				xrt = xr[i] - xr[l];
>				xr[i] = xr[i] + xr[l];
>				xit = xi[i] - xi[l];
>				xi[i] = xi[i] + xi[l];
>				xr[l] = c*xrt - s*xit;
>				xi[l] = c*xit + s*xrt;
>			}
>
>			/* END INNER LOOP */
>
>		}
>		ie = ie<<1;
>	}
>}

Can you tell what is going on?  If you are familiar with FFTs, I
suppose it becomes clear enough after a while.  If not, it is a rather
painful process to discover that the inner loop is really just

	complex x[n], w[n], diff, half;

	for i from 0 to n-1 step n/2^k:
		half = i + n/2^(k+1);
		diff = x[i] - x[half];
		x[i] += x[half];
		x[half] = diff * w[j * 2^k];
	rof

Unfortunately, C does not have a `complex' datatype.  The task is
better suited to C++, or some other language with both shifts (which
are lacking in FORTRAN) and complex expressions.  But I think one can
do better than the code quoted with `>' above, without losing
efficiency.  Here is one such attempt:

	typedef struct complex {
		float	re;
		float	im;
	} complex;

	/* could really use C++ here */
	#define	C_MUL3(res, a, b) \
		((res).re = (a).re * (b).re - (a).im * (b).im, \
		 (res).im = (a).im * (b).re + (a).re * (b).im)
	#define	C_ADD2(res, a) \
		((res).re += (a).re, (res).im += (a).im)
	#define	C_SUB3(res, a, b) \
		((res).re = (a).re - (b).re, (res).im = (a).im - (b).im)

	/*
	 * Perform m-step (m = log2(n)) FFT on x[0..n)
	 */
	void
	fft(x, n, m, w)
		struct complex *x;
		unsigned int n, m;
		struct complex *w;
	{
		unsigned int fullstep, halfstep, i, j, k;
		complex *xp, *xh, wj, diff;

		/*
		 * as k goes from to 0 n-1,
		 * fullstep goes from n to n/2 to n/4 to ... to 2,
		 * and halfstep goes from n/2 to n/4 to ... to 1.
		 */
		for (k = 0; k < m; k++) {
			fullstep = n >> k;		/* n/2^k */
			halfstep = fullstep >> 1;	/* n/2^(k+1) */
			for (j = 0; j < halfstep; j++) {
				i = 1 << k;		/* 2^k */
				wj = w[j * i];
				/*
				 * inner loop always runs at least once,
				 * so test is at bottom to save time.
				 *
				 * xp ranges from &x[0] to &x[n-1], moving
				 * by fullstep steps; xh is always halfstep
				 * ahead of xp.
				 */
				for (xp = x;; xp += fullstep) {
					xh = &xp[halfstep];
					C_SUB3(diff, *xp, *xh);
					C_ADD2(*xp, *xh);
					C_MUL3(*xh, wj, diff);
					if (--i == 0)
						break;
				}
			}
		}
	}

With the comments removed, this version is actually one line shorter
than the original.  My original version of the innermost loop read

		for (i = 0; i < n; i += fullstep) {
			half = i + halfstep;
			C_SUB3(diff, x[i], x[half]);
			C_ADD2(x[i], x[half]);
			C_MUL3(x[half], wj, diff);
		}

which is more obvious but (probably) slower.  Certainly any good
compiler should make `xh' unnecessary, so I suppose

		for (xp = 0;; xp += fullstep) {
			C_SUB3(diff, *xp, xp[halfstep]);
			C_ADD2(*xp, xp[halfstep]);
			C_MUL3(xp[halfstep], wj, diff);
			if (--i == 0)
				break;
		}

should be as good, but I distrust optimisation in C compilers (being
stuck with PCC myself).  Adding some strategic `register' declarations
may also help.

Curiously enough, both versions result in almost the same number of
instructions; the `readable' one uses two more, but also uses simpler
addressing modes.  Overall, I imagine they run about the same speed,
yet I would far rather deal with one of my versions (of course! :-) ).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris