chris@mimsy.UUCP (Chris Torek) (07/03/88)
(N.B.: this is a language (code design) issue, not an architectural issue, so I have redirected followups to comp.lang.misc.) The context of the following quote is benchmarking; in this case one is, of course, concerned with running existing code quickly. Yet the code below is all too typical: there are no comments; the variables have names that tell nothing. In short, the code is unmaintainable. >fft(xr,xi,n,m,wr,wi) > float xr[],xi[],wr[],wi[]; > unsigned int n,m; >{ > > unsigned int n1,n2,ia,ie,i,j,k,l; > float c,s,xrt,xit; > > n2 = n; > ie = 1; > for (k=0 ; k<m ; k++) > { > n1 = n2; > n2 = n2>>1; > ia = 0; > for (j=0 ; j<n2 ; j++) > { > c = wr[ia]; > s = wi[ia]; > ia = ia+ie; > > /* BEGIN INNER LOOP */ > > for (i=0 ; i<n ; i+=n1) > { > l = i+n2; > xrt = xr[i] - xr[l]; > xr[i] = xr[i] + xr[l]; > xit = xi[i] - xi[l]; > xi[i] = xi[i] + xi[l]; > xr[l] = c*xrt - s*xit; > xi[l] = c*xit + s*xrt; > } > > /* END INNER LOOP */ > > } > ie = ie<<1; > } >} Can you tell what is going on? If you are familiar with FFTs, I suppose it becomes clear enough after a while. If not, it is a rather painful process to discover that the inner loop is really just complex x[n], w[n], diff, half; for i from 0 to n-1 step n/2^k: half = i + n/2^(k+1); diff = x[i] - x[half]; x[i] += x[half]; x[half] = diff * w[j * 2^k]; rof Unfortunately, C does not have a `complex' datatype. The task is better suited to C++, or some other language with both shifts (which are lacking in FORTRAN) and complex expressions. But I think one can do better than the code quoted with `>' above, without losing efficiency. Here is one such attempt: typedef struct complex { float re; float im; } complex; /* could really use C++ here */ #define C_MUL3(res, a, b) \ ((res).re = (a).re * (b).re - (a).im * (b).im, \ (res).im = (a).im * (b).re + (a).re * (b).im) #define C_ADD2(res, a) \ ((res).re += (a).re, (res).im += (a).im) #define C_SUB3(res, a, b) \ ((res).re = (a).re - (b).re, (res).im = (a).im - (b).im) /* * Perform m-step (m = log2(n)) FFT on x[0..n) */ void fft(x, n, m, w) struct complex *x; unsigned int n, m; struct complex *w; { unsigned int fullstep, halfstep, i, j, k; complex *xp, *xh, wj, diff; /* * as k goes from to 0 n-1, * fullstep goes from n to n/2 to n/4 to ... to 2, * and halfstep goes from n/2 to n/4 to ... to 1. */ for (k = 0; k < m; k++) { fullstep = n >> k; /* n/2^k */ halfstep = fullstep >> 1; /* n/2^(k+1) */ for (j = 0; j < halfstep; j++) { i = 1 << k; /* 2^k */ wj = w[j * i]; /* * inner loop always runs at least once, * so test is at bottom to save time. * * xp ranges from &x[0] to &x[n-1], moving * by fullstep steps; xh is always halfstep * ahead of xp. */ for (xp = x;; xp += fullstep) { xh = &xp[halfstep]; C_SUB3(diff, *xp, *xh); C_ADD2(*xp, *xh); C_MUL3(*xh, wj, diff); if (--i == 0) break; } } } } With the comments removed, this version is actually one line shorter than the original. My original version of the innermost loop read for (i = 0; i < n; i += fullstep) { half = i + halfstep; C_SUB3(diff, x[i], x[half]); C_ADD2(x[i], x[half]); C_MUL3(x[half], wj, diff); } which is more obvious but (probably) slower. Certainly any good compiler should make `xh' unnecessary, so I suppose for (xp = 0;; xp += fullstep) { C_SUB3(diff, *xp, xp[halfstep]); C_ADD2(*xp, xp[halfstep]); C_MUL3(xp[halfstep], wj, diff); if (--i == 0) break; } should be as good, but I distrust optimisation in C compilers (being stuck with PCC myself). Adding some strategic `register' declarations may also help. Curiously enough, both versions result in almost the same number of instructions; the `readable' one uses two more, but also uses simpler addressing modes. Overall, I imagine they run about the same speed, yet I would far rather deal with one of my versions (of course! :-) ). -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris