chris@mimsy.UUCP (Chris Torek) (07/03/88)
(N.B.: this is a language (code design) issue, not an architectural issue, so I have redirected followups to comp.lang.misc.) The context of the following quote is benchmarking; in this case one is, of course, concerned with running existing code quickly. Yet the code below is all too typical: there are no comments; the variables have names that tell nothing. In short, the code is unmaintainable. >fft(xr,xi,n,m,wr,wi) > float xr[],xi[],wr[],wi[]; > unsigned int n,m; >{ > > unsigned int n1,n2,ia,ie,i,j,k,l; > float c,s,xrt,xit; > > n2 = n; > ie = 1; > for (k=0 ; k<m ; k++) > { > n1 = n2; > n2 = n2>>1; > ia = 0; > for (j=0 ; j<n2 ; j++) > { > c = wr[ia]; > s = wi[ia]; > ia = ia+ie; > > /* BEGIN INNER LOOP */ > > for (i=0 ; i<n ; i+=n1) > { > l = i+n2; > xrt = xr[i] - xr[l]; > xr[i] = xr[i] + xr[l]; > xit = xi[i] - xi[l]; > xi[i] = xi[i] + xi[l]; > xr[l] = c*xrt - s*xit; > xi[l] = c*xit + s*xrt; > } > > /* END INNER LOOP */ > > } > ie = ie<<1; > } >} Can you tell what is going on? If you are familiar with FFTs, I suppose it becomes clear enough after a while. If not, it is a rather painful process to discover that the inner loop is really just complex x[n], w[n], diff, half; for i from 0 to n-1 step n/2^k: half = i + n/2^(k+1); diff = x[i] - x[half]; x[i] += x[half]; x[half] = diff * w[j * 2^k]; rof Unfortunately, C does not have a `complex' datatype. The task is better suited to C++, or some other language with both shifts (which are lacking in FORTRAN) and complex expressions. But I think one can do better than the code quoted with `>' above, without losing efficiency. Here is one such attempt: typedef struct complex { float re; float im; } complex; /* could really use C++ here */ #define C_MUL3(res, a, b) \ ((res).re = (a).re * (b).re - (a).im * (b).im, \ (res).im = (a).im * (b).re + (a).re * (b).im) #define C_ADD2(res, a) \ ((res).re += (a).re, (res).im += (a).im) #define C_SUB3(res, a, b) \ ((res).re = (a).re - (b).re, (res).im = (a).im - (b).im) /* * Perform m-step (m = log2(n)) FFT on x[0..n) */ void fft(x, n, m, w) struct complex *x; unsigned int n, m; struct complex *w; { unsigned int fullstep, halfstep, i, j, k; complex *xp, *xh, wj, diff; /* * as k goes from to 0 n-1, * fullstep goes from n to n/2 to n/4 to ... to 2, * and halfstep goes from n/2 to n/4 to ... to 1. */ for (k = 0; k < m; k++) { fullstep = n >> k; /* n/2^k */ halfstep = fullstep >> 1; /* n/2^(k+1) */ for (j = 0; j < halfstep; j++) { i = 1 << k; /* 2^k */ wj = w[j * i]; /* * inner loop always runs at least once, * so test is at bottom to save time. * * xp ranges from &x[0] to &x[n-1], moving * by fullstep steps; xh is always halfstep * ahead of xp. */ for (xp = x;; xp += fullstep) { xh = &xp[halfstep]; C_SUB3(diff, *xp, *xh); C_ADD2(*xp, *xh); C_MUL3(*xh, wj, diff); if (--i == 0) break; } } } } With the comments removed, this version is actually one line shorter than the original. My original version of the innermost loop read for (i = 0; i < n; i += fullstep) { half = i + halfstep; C_SUB3(diff, x[i], x[half]); C_ADD2(x[i], x[half]); C_MUL3(x[half], wj, diff); } which is more obvious but (probably) slower. Certainly any good compiler should make `xh' unnecessary, so I suppose for (xp = 0;; xp += fullstep) { C_SUB3(diff, *xp, xp[halfstep]); C_ADD2(*xp, xp[halfstep]); C_MUL3(xp[halfstep], wj, diff); if (--i == 0) break; } should be as good, but I distrust optimisation in C compilers (being stuck with PCC myself). Adding some strategic `register' declarations may also help. Curiously enough, both versions result in almost the same number of instructions; the `readable' one uses two more, but also uses simpler addressing modes. Overall, I imagine they run about the same speed, yet I would far rather deal with one of my versions (of course! :-) ). -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
g-rh@cca.CCA.COM (Richard Harter) (07/03/88)
In article <12295@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes: >The context of the following quote is benchmarking; in this case one >is, of course, concerned with running existing code quickly. Yet the >code below is all too typical: there are no comments; the variables >have names that tell nothing. In short, the code is unmaintainable. .... fft routine deleted .... The deleted code looks like a straight transliteration of a fortran fft routine. It is not as bad as Chris says -- if you are used to the nomenclature conventions commonly used in fortran coding and know how an fft works. Thus xr and xi are *obviously* the real and imaginary parts of the data, c and s are *obviously* cosine and sine, and wr and wi are *obviously* roots of unity. >Can you tell what is going on? If you are familiar with FFTs, I >suppose it becomes clear enough after a while. If not, it is a rather >painful process to discover that the inner loop is really just > > complex x[n], w[n], diff, half; > for i from 0 to n-1 step n/2^k: > half = i + n/2^(k+1); > diff = x[i] - x[half]; > x[i] += x[half]; > x[half] = diff * w[j * 2^k]; > rof >Unfortunately, C does not have a `complex' datatype. The task is >better suited to C++, or some other language with both shifts (which >are lacking in FORTRAN) and complex expressions. But I think one can >do better than the code quoted with `>' above, without losing >efficiency. Here is one such attempt: > typedef struct complex { > float re; > float im; > } complex; and Chris goes on to define macros for complex multiplies and adds. He then reworks the code to use complex as a type with macro calls to do the algebra. Clearly this is a matter of style and what one is used to and so on. However I find the original (which he dislikes) better than the proposed substitute. First a technical point. In the original the real and imaginary parts are passed separately; Chris replaces them with a structure array. There are reasons for having separate arrays. The reasons don't matter -- what does matter is that you don't change interfaces for strictly internal reasons. My reaction is that the algebra in the original is clear, immediately recognizable, and easy to read; for me the macro calls obscure the code rather than making it clearer. I suspect, perhaps wrongly, that this will regularly be the case with people who work with that kind of math. This says to me that the notion of clarity is not absolute. The major flaw in the original, as I see it, is simply that it lacked any documentation. -- In the fields of Hell where the grass grows high Are the graves of dreams allowed to die. Richard Harter, SMDS Inc.
karsh@trifolium.SGI.COM (Bruce Karsh) (07/04/88)
In article <12295@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes: >(N.B.: this is a language (code design) issue, not an architectural >issue, so I have redirected followups to comp.lang.misc.) > >The context of the following quote is benchmarking; in this case one >is, of course, concerned with running existing code quickly. Yet the >code below is all too typical: there are no comments; the variables >have names that tell nothing. In short, the code is unmaintainable. [ An fft procedure follows ] There is an extensive literature on the fft algorithm. This fft procedure is pretty simple, if you are familiar with what has been written about the fft algorithm. If not, then I can understand why this would look confusing. This fft procedure seems exceptionally clear to me. Most fft procedures that I have seen are much more confusing than this one. Why should somebody who has not read about how fft's work be expected to be able to "maintain" one? If I were to criticize this fft, I would point out: That it is lacking a bit reversal step, That n is a function of m. In fact, n = 1<<m. So n should not be passed in, it should be computed. (Does anyone know of a good reason why anybody would ever want to run this with n != 1<<m)? That C compilers will perform the "float" arithmetic as "double" arithmetic by default. Many compilers have an option switch to allow float arithmetic to be used. You should compile this procedure with tha switch turned on. > /* > * as k goes from to 0 n-1, > * fullstep goes from n to n/2 to n/4 to ... to 2, > * and halfstep goes from n/2 to n/4 to ... to 1. > */ > for (k = 0; k < m; k++) { > fullstep = n >> k; /* n/2^k */ > halfstep = fullstep >> 1; /* n/2^(k+1) */ This comment has at least two errors in it. First, k goes from 0 to m-1, not n-1. Secondly, halfstep starts at n/4, not n/2. Secondly, both comments are gratuitous in that they give no more information than what is immediately visible in the source code. It's kind of like saying: pi=7.3232; /* set pi to 7.3232 */ The comment tells you something that you already know, but doesn't answer the question that you have! What you don't know, and what you want the comment to tell you, is why are you changing the value of pi, and where did 7.3232 come from anyways? -- Every time I bite my tongue, "Mr. Palomar concludes mentally, "I must think not only of what I am about to say or not say, but also of everything that, whether I say it or do not say it, will be said or not said by me or by others."
chris@mimsy.UUCP (Chris Torek) (07/05/88)
>In article <12295@mimsy.UUCP> I griped about an fft routine: >>... there are no comments; the variables >>have names that tell nothing. In short, the code is unmaintainable. (which was not entirely true, but there were variables `ia', `n1', `n2', `m', `n' which had various conditions asserted that were never mentioned. For instance, m had to be log2(n); this was up to the caller to get right. There is nothing inherently wrong with that, but it should certainly be noted in the code!) In article <16920@sgi.SGI.COM> karsh@trifolium.SGI.COM (Bruce Karsh) writes: > There is an extensive literature on the fft algorithm. True enough, and one could be expected to read some of it before working with that routine, but: > Why should somebody who has not read about how fft's work be expected to >be able to "maintain" one? This sort of thing always happens, even though it should not. It helps to put a pointer to some reference in a comment. > ... I would point out: > That n is a function of m. In fact, n = 1<<m. So n should not be > passed in, it should be computed. (Does anyone know of a good reason > why anybody would ever want to run this with n != 1<<m)? I have no idea, but since it was a possibility, I left it in. (Of course I changed the interface from pairs of arrays to a structure...!) > That C compilers will perform the "float" arithmetic as "double" > .... Many compilers have an option switch to allow > float ... should compile ... with that switch turned on. Actually, I did, when I was checking the code generated by the various versions. But since that option is not standard it is hard to turn it on from code. (4.3BSD uses `cc -f'; SunOS uses `cc -fsingle'.) >> /* >> * as k goes from to 0 n-1, >> * fullstep goes from n to n/2 to n/4 to ... to 2, >> * and halfstep goes from n/2 to n/4 to ... to 1. >> */ Ack! > This comment has at least two errors in it. First, k goes from > 0 to m-1, not n-1. Secondly, halfstep starts at n/4, not n/2. Sloppy editing on my part (including the `from to 0' phrasing). > Secondly, both comments are gratuitous in that they give no more > information than what is immediately visible in the source code. (Two `secondly's?) The point is that this is not *immediately* visible, especially in the original version, where `fullstep' and `halfstep' were n1 and n2. (The comments went in before I changed the code to shift by k.) > It's kind of like saying: > > pi=7.3232; /* set pi to 7.3232 */ Actually, it is a bit more like pi = 7.3232; /* 2.3311 * 3.1415 */ :-) > The comment tells you something that you already know, but doesn't > answer the question that you have! What you don't know, and what > you want the comment to tell you, is why are you changing the value > of pi, and where did 7.3232 come from anyways? Well, once you know that w[t] *really* stands for omega sub theta (sine and cosine of the angle), and the complex arithmetic is visible, then the mathematical means by which an FFT transforms becomes immediately obvious from the source :-) .... More seriously, the whole system should probably be redone. As Richard Harter pointed out, a complete FFT package should have code to set up w[], calcuate m=log2(n), and all that sort of stuff. But all I wanted to do was make the inner loop obvious without losing efficiency. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris