g-rh@XAIT.XEROX.COM (Richard Harter) (10/01/88)
In article <836@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes:
>At one time, I spent a lot of my spare time improving other
>people's code.  One of the things I discovered is that almost all
>the code I was dealing with had better than 25% execution time
>fat.  What I mean is that when I merely changed all of the code
>to follow my coding practices (which are not anything special,
>just obvious things that every coder ought to know), it got that
>much better; I am not talking about mere tweaks, nor am I talking
>about optimizing particular sections of code.  I am talking about
>reading each line of code and making obvious changes (things like
>using temporaries when needed, adding appropriate register
>declarations, eliminating obviously unnecessary function calls,
>etc.)

I would say that we all know what Bill is talking about here, except
that "we" all obviously don't.  Basically this is using hand
optimization as a default in coding style.  One can take the view that
a good optimizer will do many of these things, e.g. use of temporaries,
moving static expressions outside loops, placing the appropriate
variables in registers, etc.  One can also take the view that the
capabilities and realities of optimizing compilers are less than
claimed.  Hand optimizing is safer across a variety of environments.

In my experience hand optimizing in original code is less efficient, in
the sense of code written per unit time, than writing code initially in
a "clear" style first, and then hand optimizing afterwards.  In short,
write it in the simplest way first, get it working, and then improve
performance.

... Discussion of practical superiority of pointers deleted.

	Let me give a simple example.  Ignoring library routines,
inline routines, etc., suppose that we want to copy n bytes from one
place to another, say array src to array dst.  We might write

	for(i=0;i<n;i++) dst[i]=src[i];

Let's hand optimize this.  Instead of using indices (presumably less
efficient) we will use pointers.  Since we don't want to destroy dst
and src as pointers we need temporaries.  Thus we might write

	tmp1 = dst;
	tmp2 = src;
	for (i=0;i<n;i++) *tmp1++ = *tmp2++;

[Leave us not worry about whether there is a more efficient way to
handle the loop control.]  The "optimized" version requires more
statements; it requires more variables.
-- 
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS Inc.
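For readers who want to try the two fragments above for themselves,
here is a minimal compilable sketch.  The function names copy_index()
and copy_ptr() are illustrative only (they do not appear in the post),
and both assume char arrays and a non-negative byte count n.

	/* index version: the "clear" form */
	void copy_index(char *dst, char *src, int n)
	{
		int i;

		for (i = 0; i < n; i++)
			dst[i] = src[i];
	}

	/* pointer version: temporaries keep dst and src intact */
	void copy_ptr(char *dst, char *src, int n)
	{
		char *tmp1 = dst;
		char *tmp2 = src;
		int i;

		for (i = 0; i < n; i++)
			*tmp1++ = *tmp2++;
	}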
francis@proxftl.UUCP (Francis H. Yu) (10/03/88)
In article <34112@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
!
!	Let me give a simple example.  Ignoring library routines,
!inline routines, etc., suppose that we want to copy n bytes from one
!place to another, say array src to array dst.  We might write
!
!	for(i=0;i<n;i++) dst[i]=src[i];
!
!Let's hand optimize this.  Instead of using indices (presumably less
!efficient) we will use pointers.  Since we don't want to destroy dst
!and src as pointers we need temporaries.  Thus we might write
!
!	tmp1 = dst;
!	tmp2 = src;
!	for (i=0;i<n;i++) *tmp1++ = *tmp2++;
!

The better code is

	tmp1 = dst;
	tmp2 = src;
	tmp3 = dst + n;
	while (tmp1 != tmp3) {
		*tmp1++ = *tmp2++;
	}
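Packaged as a self-contained function, the end-pointer version above
might look like the sketch below; the name copy_endptr() is made up
for illustration, and char arrays with a non-negative count n are
assumed.

	void copy_endptr(char *dst, char *src, int n)
	{
		char *tmp1 = dst;
		char *tmp2 = src;
		char *tmp3 = dst + n;	/* one past the last byte written */

		while (tmp1 != tmp3)
			*tmp1++ = *tmp2++;
	}

Comparing the running pointer tmp1 against the precomputed end pointer
tmp3 removes the separate loop counter, at the cost of one more
temporary.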
g-rh@XAIT.Xerox.COM (Richard Harter) (10/03/88)
In article <846@proxftl.UUCP> francis@proxftl.UUCP (Francis H. Yu) writes:
>In article <34112@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:

>!Let's hand optimize this.  Instead of using indices (presumably less
>!efficient) we will use pointers.  Since we don't want to destroy dst
>!and src as pointers we need temporaries.  Thus we might write

>!	tmp1 = dst;
>!	tmp2 = src;
>!	for (i=0;i<n;i++) *tmp1++ = *tmp2++;

>The better code is

>	tmp1 = dst;
>	tmp2 = src;
>	tmp3 = dst + n;
>	while (tmp1 != tmp3) {
>		*tmp1++ = *tmp2++;
>	}

	The point of this little exercise was to illustrate that using
pointers quite often means extra lines of code and temporaries.
However your "better" code is not necessarily better.  In many machines
it is more efficient to control loops by counting a register down to
zero and escaping on zero than it is to exit on a compare.  A good
compiler will do exactly that with the sample code.  If we are hand
optimizing, we write

	register int i;
	...
	tmp1 = dst;
	tmp2 = src;
	for (i=n;i;--i) *tmp1++ = *tmp2++;

Your while loop, in pseudo machine language, runs something like this

	mv	dst,r1		# load register 1 with dst
	mv	src,r2		# load register 2 with src
	mv	n,r3		# load register 3 with n
	ble	a2		# n already done, skip loop
	add	r1,r3		# form dst+n, the termination address
	br	a1		# loop test at bottom, go there
a0:	mv	*r2++,*r1++	# move src to dst, incrementing
a1:	compare	r1,r3		# compare dst ptr with termination
	bne	a0		# not equal, go to loop start
a2:

The corresponding code for a countdown loop is

	mv	dst,r1		# load register 1 with dst
	mv	src,r2		# load register 2 with src
	mv	n,r3		# load register 3 with n
	ble	a2		# n already done, skip loop
	br	a1		# loop test at bottom, go there
a0:	mv	*r2++,*r1++	# move src to dst, incrementing
a1:	dec	r3		# decrement countdown index
	bge	a0		# index not negative, do more
a2:

The loop bodies are the same except that the compare is replaced with a
decrement, which may be faster (it is on most machines) and is never
slower (as far as I know).  Moreover many machines have a decrement and
test instruction.  For example, the PDP-11 has an SOB instruction which
combines the two.

Note:  It is more efficient to put the loop test at the bottom of the
loop and branch there to initiate the loop; it saves a branch
instruction.

Lesson:  If efficiency is a concern, countdown loops are faster than
compare value loops.

Lesson:  If one is optimizing code one has to think about what the
machine will have to do in different implementations when comparing
them.
-- 
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS Inc.
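For completeness, the countdown idiom described above can be packaged
the same way; copy_countdown() is an illustrative name, the register
declarations are only hints a compiler may ignore, and char arrays with
a non-negative count n are assumed.

	void copy_countdown(char *dst, char *src, int n)
	{
		register char *tmp1 = dst;
		register char *tmp2 = src;
		register int i;

		/* count i down; the loop test is against zero,
		   not against a computed limit */
		for (i = n; i; --i)
			*tmp1++ = *tmp2++;
	}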
chris@mimsy.UUCP (Chris Torek) (10/04/88)
>In article <34112@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>Ignoring library routines, inline routines, etc, suppose that we want
>>to copy n bytes from one place to another, say array src to array dst.
>>We might write
>>	for(i=0;i<n;i++) dst[i]=src[i];
>>Let's hand optimize this.

[This was part of an argument *against*, but let that stand:]

>>	tmp1 = dst;
>>	tmp2 = src;
>>	for (i=0;i<n;i++) *tmp1++ = *tmp2++;

In article <846@proxftl.UUCP> francis@proxftl.UUCP (Francis H. Yu) writes:
>The better code is
>	tmp1 = dst;
>	tmp2 = src;
>	tmp3 = dst + n;
>	while (tmp1 != tmp3) {
>		*tmp1++ = *tmp2++;
>	}

Better for whom?  On a 68010, the following is *much* better:

	register short n1;

	if ((n1 = n - 1) >= 0)
		do
			*tmp1++ = *tmp2++;
		while (--n1 != -1);

because it can compile into a `dbra' loop and take advantage of the
68010 loop mode.  But this is much less efficient on a VAX than the
`movc3' instruction that the compiler might generate for the original
loop.  But the second way is better for the Foobar CPU, which has a
`count-up' loop mode; but the third is better for the BazWoRKS chip.

This is micro-efficiency at its finest: you cannot characterise it
outside of its environment.  Which loop is `best' is heavily machine
dependent.  If that loop takes much time, go ahead and optimise it, but
if not, you might as well not bother, since everyone else will just
have to re-optimise it differently anyway.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
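One way to act on that advice, though the post above does not spell it
out, is to keep the copy behind a single routine so that any
per-machine tuning lives in one place.  In the sketch below the macro
HAVE_FAST_MEMCPY is a hypothetical stand-in for whatever configuration
test a given project actually uses.

	#include <string.h>

	void copy_bytes(char *dst, char *src, int n)
	{
	#ifdef HAVE_FAST_MEMCPY
		/* let the library supply the machine-tuned copy */
		memcpy(dst, src, (size_t)n);
	#else
		/* portable fallback: copies n bytes for n >= 0 */
		while (--n >= 0)
			*dst++ = *src++;
	#endif
	}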
ok@quintus.uucp (Richard A. O'Keefe) (10/04/88)
In article <34196@XAIT.Xerox.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>!	tmp1 = dst;
>>!	tmp2 = src;
>>!	for (i=0;i<n;i++) *tmp1++ = *tmp2++;

>In many machines it is more
>efficient to control loops by counting a register down to zero and escaping
>on zero than it is to exit on a compare.  A good compiler will do exactly
>that with the sample code.

This is only true if the machine's loop control instructions fit the
loop in question *very* closely.  For example, at least some MC680x0 C
compilers will generate DBcc loops for neither of these forms:

	for (i = 0; i < n; i++) <stmt>
	for (i = n; --i >= 0; ) <stmt>

Instead, you have to write

	for (i = n; --i != -1; ) <stmt>

>Lesson:  If efficiency is a concern, countdown loops are faster than
>compare value loops.

>Lesson:  If one is optimizing code one has to think about what the machine
>will have to do in different implementations when comparing them.

Lesson: the 2nd lesson above renders the 1st moot.  Countdown loops may
be faster than others.  Or they may be slower.  Or it might depend on
the size of the count variable.  And it depends on the compiler as well
as the machine.

Lesson: looking for a way to avoid the loop entirely is going to pay
off on most machines; tweaking it for one machine may make it very bad
for another.
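One illustration of that last lesson, not taken from the post: if the
copy only exists to make a freshly built buffer the current one,
swapping two pointers avoids the loop entirely.  The function below is
a made-up example of that pattern.

	void install_buffer(char **current, char **scratch)
	{
		char *t = *current;	/* swap pointers instead of copying n bytes */

		*current = *scratch;
		*scratch = t;
	}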
johnl@ima.ima.isc.com (John R. Levine) (10/04/88)
In article <34196@XAIT.Xerox.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>!	[ first allegedly optimal code ]
>>!	tmp1 = dst;
>>!	tmp2 = src;
>>!	for (i=0;i<n;i++) *tmp1++ = *tmp2++;
>
>>	[ second allegedly optimal code ]
>>	tmp1 = dst;
>>	tmp2 = src;
>>	tmp3 = dst + n;
>>	while (tmp1 != tmp3) {
>>		*tmp1++ = *tmp2++;
>>	}
>
>	[ third allegedly optimal code ]
>	register int i;
>	...
>	tmp1 = dst;
>	tmp2 = src;
>	for (i=n;i;--i) *tmp1++ = *tmp2++;

On an Intel 386, assuming your compiler isn't smart enough to recognize
such loops and generate string move instructions, and assuming the two
blocks don't overlap, your best bet probably is:

	register int i;
	register char *rdst = dst, *rsrc = src;

	for (i = n; --i >= 0; )
		rdst[i] = rsrc[i];

This uses the 386's scaled index modes and loop control instructions
and generates a loop two instructions long.  On non-Vax machines *p++
does not generate particularly good code, after all.

The message here is that unless you have a specific performance problem
in a specific environment, such micro-optimization is a waste of time,
since the "best" code depends heavily on the particular instruction set
and addressing model in use.
-- 
John R. Levine, IECC, PO Box 349, Cambridge MA 02238-0349, +1 617 492 3869
{ bbn | think | decvax | harvard | yale }!ima!johnl, Levine@YALE.something
Rome fell, Babylon fell, Scarsdale will have its turn.  -G. B. Shaw
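The indexed form above, packaged as a self-contained function for
comparison with the earlier sketches; copy_indexed() is an illustrative
name, and, as in the post, the source and destination are assumed not
to overlap.  Counting i from n-1 down to 0 lets the compiler pair a
scaled-index address with a decrement-and-branch for the loop control.

	void copy_indexed(char *dst, char *src, int n)
	{
		register char *rdst = dst;
		register char *rsrc = src;
		register int i;

		for (i = n; --i >= 0; )
			rdst[i] = rsrc[i];
	}

If the two regions might overlap, the portable answer is memmove()
where the library provides it, rather than any of the loops above.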
space@sns.UUCP (Lars Soltau) (10/05/88)
In article <34196@XAIT.Xerox.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>Lesson:  If one is optimizing code one has to think about what the machine
>will have to do in different implementations when comparing them.

Lesson: never optimize C code if you have not written the compiler
yourself; it's far easier and safer to optimize the assembler code.
-- 
Lars Soltau	UUCP: uunet!unido!sns!spcnet!space	BIX: -- no bucks --
Here's looking at you, kid!	-- the Medusa
g-rh@XAIT.Xerox.COM (Richard Harter) (10/08/88)
In article <222@sns.UUCP> space@sns.UUCP (Lars Soltau) writes:
>In article <34196@XAIT.Xerox.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>Lesson:  If one is optimizing code one has to think about what the machine
>>will have to do in different implementations when comparing them.

>Lesson: never optimize C code if you have not written the compiler
>yourself; it's far easier and safer to optimize the assembler code.

If only life were so simple.  If you are maintaining programs across a
host of machines and operating systems, assembler code is a strong
minus.  For many of us the issue of interest is optimization within the
constraints of portability.
-- 
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS Inc.