[comp.lang.misc] fft routine

chris@mimsy.UUCP (Chris Torek) (07/03/88)

(N.B.: this is a language (code design) issue, not an architectural
issue, so I have redirected followups to comp.lang.misc.)

The context of the following quote is benchmarking; in this case one
is, of course, concerned with running existing code quickly.  Yet the
code below is all too typical: there are no comments; the variables
have names that tell nothing.  In short, the code is unmaintainable.

>fft(xr,xi,n,m,wr,wi)
>	float xr[],xi[],wr[],wi[];
>	unsigned int n,m;
>{
>
>	unsigned int n1,n2,ia,ie,i,j,k,l;
>	float c,s,xrt,xit;
>
>	n2 = n;
>	ie = 1;
>	for (k=0 ; k<m ; k++)
>	{
>		n1 = n2;
>		n2 = n2>>1;
>		ia = 0;
>		for (j=0 ; j<n2 ; j++)
>		{
>			c = wr[ia];
>			s = wi[ia];
>			ia = ia+ie;
>
>			/* BEGIN INNER LOOP */
>
>			for (i=0 ; i<n ; i+=n1)
>			{
>				l = i+n2;
>				xrt = xr[i] - xr[l];
>				xr[i] = xr[i] + xr[l];
>				xit = xi[i] - xi[l];
>				xi[i] = xi[i] + xi[l];
>				xr[l] = c*xrt - s*xit;
>				xi[l] = c*xit + s*xrt;
>			}
>
>			/* END INNER LOOP */
>
>		}
>		ie = ie<<1;
>	}
>}

Can you tell what is going on?  If you are familiar with FFTs, I
suppose it becomes clear enough after a while.  If not, it is a rather
painful process to discover that the inner loop is really just

	complex x[n], w[n], diff, half;

	for i from 0 to n-1 step n/2^k:
		half = i + n/2^(k+1);
		diff = x[i] - x[half];
		x[i] += x[half];
		x[half] = diff * w[j * 2^k];
	rof

Unfortunately, C does not have a `complex' datatype.  The task is
better suited to C++, or some other language with both shifts (which
are lacking in FORTRAN) and complex expressions.  But I think one can
do better than the code quoted with `>' above, without losing
efficiency.  Here is one such attempt:

	typedef struct complex {
		float	re;
		float	im;
	} complex;

	/* could really use C++ here */
	#define	C_MUL3(res, a, b) \
		((res).re = (a).re * (b).re - (a).im * (b).im, \
		 (res).im = (a).im * (b).re + (a).re * (b).im)
	#define	C_ADD2(res, a) \
		((res).re += (a).re, (res).im += (a).im)
	#define	C_SUB3(res, a, b) \
		((res).re = (a).re - (b).re, (res).im = (a).im - (b).im)

	/*
	 * Perform m-step (m = log2(n)) FFT on x[0..n)
	 */
	void
	fft(x, n, m, w)
		struct complex *x;
		unsigned int n, m;
		struct complex *w;
	{
		unsigned int fullstep, halfstep, i, j, k;
		complex *xp, *xh, wj, diff;

		/*
		 * as k goes from to 0 n-1,
		 * fullstep goes from n to n/2 to n/4 to ... to 2,
		 * and halfstep goes from n/2 to n/4 to ... to 1.
		 */
		for (k = 0; k < m; k++) {
			fullstep = n >> k;		/* n/2^k */
			halfstep = fullstep >> 1;	/* n/2^(k+1) */
			for (j = 0; j < halfstep; j++) {
				i = 1 << k;		/* 2^k */
				wj = w[j * i];
				/*
				 * inner loop always runs at least once,
				 * so test is at bottom to save time.
				 *
				 * xp ranges from &x[0] to &x[n-1], moving
				 * by fullstep steps; xh is always halfstep
				 * ahead of xp.
				 */
				for (xp = x;; xp += fullstep) {
					xh = &xp[halfstep];
					C_SUB3(diff, *xp, *xh);
					C_ADD2(*xp, *xh);
					C_MUL3(*xh, wj, diff);
					if (--i == 0)
						break;
				}
			}
		}
	}

With the comments removed, this version is actually one line shorter
than the original.  My original version of the innermost loop read

		for (i = 0; i < n; i += fullstep) {
			half = i + halfstep;
			C_SUB3(diff, x[i], x[half]);
			C_ADD2(x[i], x[half]);
			C_MUL3(x[half], wj, diff);
		}

which is more obvious but (probably) slower.  Certainly any good
compiler should make `xh' unnecessary, so I suppose

		for (xp = 0;; xp += fullstep) {
			C_SUB3(diff, *xp, xp[halfstep]);
			C_ADD2(*xp, xp[halfstep]);
			C_MUL3(xp[halfstep], wj, diff);
			if (--i == 0)
				break;
		}

should be as good, but I distrust optimisation in C compilers (being
stuck with PCC myself).  Adding some strategic `register' declarations
may also help.

Curiously enough, both versions result in almost the same number of
instructions; the `readable' one uses two more, but also uses simpler
addressing modes.  Overall, I imagine they run about the same speed,
yet I would far rather deal with one of my versions (of course! :-) ).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

g-rh@cca.CCA.COM (Richard Harter) (07/03/88)

In article <12295@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:

>The context of the following quote is benchmarking; in this case one
>is, of course, concerned with running existing code quickly.  Yet the
>code below is all too typical: there are no comments; the variables
>have names that tell nothing.  In short, the code is unmaintainable.

	.... fft routine deleted ....

The deleted code looks like a straight transliteration of a fortran
fft routine.  It is not as bad as Chris says -- if you are used to
the nomenclature conventions commonly used in fortran coding and know
how an fft works.  Thus xr and xi are *obviously* the real and imaginary
parts of the data, c and s are *obviously* cosine and sine, and wr and wi
are *obviously* roots of unity. 

>Can you tell what is going on?  If you are familiar with FFTs, I
>suppose it becomes clear enough after a while.  If not, it is a rather
>painful process to discover that the inner loop is really just
>
>	complex x[n], w[n], diff, half;

>	for i from 0 to n-1 step n/2^k:
>		half = i + n/2^(k+1);
>		diff = x[i] - x[half];
>		x[i] += x[half];
>		x[half] = diff * w[j * 2^k];
>	rof

>Unfortunately, C does not have a `complex' datatype.  The task is
>better suited to C++, or some other language with both shifts (which
>are lacking in FORTRAN) and complex expressions.  But I think one can
>do better than the code quoted with `>' above, without losing
>efficiency.  Here is one such attempt:

>	typedef struct complex {
>		float	re;
>		float	im;
>	} complex;

and Chris goes on to define macros for complex multiplies and adds.
He then reworks the code to use complex as a type with macro calls
to do the algebra.

Clearly this is a matter of style and what one is used to and so on.
However I find the original (which he dislikes) better than the proposed
substitute.

First a technical point.  In the original the real and imaginary parts are
passed separately; Chris replaces them with a structure array.  There are
reasons for having separate arrays.  The reasons don't matter -- what
does matter is that you don't change interfaces for strictly internal
reasons.

My reaction is that the algebra in the original is clear, immediately
recognizable, and easy to read; for me the macro calls obscure the code
rather than making it clearer.  I suspect, perhaps wrongly, that this will
regularly be the case with people who work with that kind of math.  This
says to me that the notion of clarity is not absolute.  

The major flaw in the original, as I see it, is simply that it lacked any
documentation.
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

karsh@trifolium.SGI.COM (Bruce Karsh) (07/04/88)

In article <12295@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>(N.B.: this is a language (code design) issue, not an architectural
>issue, so I have redirected followups to comp.lang.misc.)
>
>The context of the following quote is benchmarking; in this case one
>is, of course, concerned with running existing code quickly.  Yet the
>code below is all too typical: there are no comments; the variables
>have names that tell nothing.  In short, the code is unmaintainable.

 [ An fft procedure follows ]

  There is an extensive literature on the fft algorithm.  This fft procedure
is pretty simple, if you are familiar with what has been written about the
fft algorithm.  If not, then I can understand why this would look confusing.
This fft procedure seems exceptionally clear to me.  Most fft procedures that
I have seen are much more confusing than this one.

  Why should somebody who has not read about how fft's work be expected to
be able to "maintain" one?

  If I were to criticize this fft, I would point out:

     That it is lacking a bit reversal step,

     That n is a function of m.  In fact, n = 1<<m.  So n should not be
     passed in, it should be computed.  (Does anyone know of a good reason
     why anybody would ever want to run this with n != 1<<m)?

     That C compilers will perform the "float" arithmetic as "double"
     arithmetic by default.  Many compilers have an option switch to allow
     float arithmetic to be used.  You should compile this procedure with
     tha switch turned on.


>		/*
>		 * as k goes from to 0 n-1,
>		 * fullstep goes from n to n/2 to n/4 to ... to 2,
>		 * and halfstep goes from n/2 to n/4 to ... to 1.
>		 */
>		for (k = 0; k < m; k++) {
>			fullstep = n >> k;		/* n/2^k */
>			halfstep = fullstep >> 1;	/* n/2^(k+1) */

     This comment has at least two errors in it.  First, k goes from
     0 to m-1, not n-1.  Secondly, halfstep starts at n/4, not n/2.
     Secondly, both comments are gratuitous in that they give no more
     information than what is immediately visible in the source code.
     It's kind of like saying:

        pi=7.3232;    /* set pi to 7.3232 */

     The comment tells you something that you already know, but doesn't
     answer the question that you have!  What you don't know, and what
     you want the comment to tell you, is why are you changing the value
     of pi, and where did 7.3232 come from anyways?

--
Every time I bite my tongue, "Mr. Palomar concludes mentally,
"I must think not only of what I am about to say or not say, but
also of everything that, whether I say it or do not say it, will be
said or not said by me or by others."

chris@mimsy.UUCP (Chris Torek) (07/05/88)

>In article <12295@mimsy.UUCP> I griped about an fft routine:
>>... there are no comments; the variables
>>have names that tell nothing.  In short, the code is unmaintainable.

(which was not entirely true, but there were variables `ia', `n1', `n2',
`m', `n' which had various conditions asserted that were never mentioned.
For instance, m had to be log2(n); this was up to the caller to get right.
There is nothing inherently wrong with that, but it should certainly
be noted in the code!)

In article <16920@sgi.SGI.COM> karsh@trifolium.SGI.COM (Bruce Karsh) writes:
>  There is an extensive literature on the fft algorithm.

True enough, and one could be expected to read some of it before working
with that routine, but:

>  Why should somebody who has not read about how fft's work be expected to
>be able to "maintain" one?

This sort of thing always happens, even though it should not.  It helps
to put a pointer to some reference in a comment.

>  ... I would point out:
>     That n is a function of m.  In fact, n = 1<<m.  So n should not be
>     passed in, it should be computed.  (Does anyone know of a good reason
>     why anybody would ever want to run this with n != 1<<m)?

I have no idea, but since it was a possibility, I left it in.  (Of
course I changed the interface from pairs of arrays to a structure...!)

>     That C compilers will perform the "float" arithmetic as "double"
>     ....  Many compilers have an option switch to allow
>     float ... should compile ... with that switch turned on.

Actually, I did, when I was checking the code generated by the various
versions.  But since that option is not standard it is hard to turn it
on from code.  (4.3BSD uses `cc -f'; SunOS uses `cc -fsingle'.)

>>		/*
>>		 * as k goes from to 0 n-1,
>>		 * fullstep goes from n to n/2 to n/4 to ... to 2,
>>		 * and halfstep goes from n/2 to n/4 to ... to 1.
>>		 */

Ack!

>     This comment has at least two errors in it.  First, k goes from
>     0 to m-1, not n-1.  Secondly, halfstep starts at n/4, not n/2.

Sloppy editing on my part (including the `from to 0' phrasing).

>     Secondly, both comments are gratuitous in that they give no more
>     information than what is immediately visible in the source code.

(Two `secondly's?)  The point is that this is not *immediately*
visible, especially in the original version, where `fullstep' and
`halfstep' were n1 and n2.  (The comments went in before I changed the
code to shift by k.)

>     It's kind of like saying:
>
>        pi=7.3232;    /* set pi to 7.3232 */

Actually, it is a bit more like

	pi = 7.3232;	/* 2.3311 * 3.1415 */

:-)

>     The comment tells you something that you already know, but doesn't
>     answer the question that you have!  What you don't know, and what
>     you want the comment to tell you, is why are you changing the value
>     of pi, and where did 7.3232 come from anyways?

Well, once you know that w[t] *really* stands for omega sub theta (sine
and cosine of the angle), and the complex arithmetic is visible, then
the mathematical means by which an FFT transforms becomes immediately
obvious from the source :-) ....

More seriously, the whole system should probably be redone.  As Richard
Harter pointed out, a complete FFT package should have code to set up
w[], calcuate m=log2(n), and all that sort of stuff.  But all I wanted
to do was make the inner loop obvious without losing efficiency.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris