dfh@scirtp.UUCP (David F. Hinnant) (10/03/85)
I had several requests to post the responses I received on what the C optimizer really does. So.... Here they are. I also uncovered form my net.sources archives a series of postings a year or so ago from mi-cec!dvk concerning how the UNIX C optimiser doesn't optimise, it neatens. If anyone would like a copy of these postings, let me know. David Hinnant SCI Systems Inc ...akgua!mcnc!rti-sel!scirtp!dfh =============================================================================== From: Arnold Robbins <rti-sel!vrdxhq!seismo!gatech!arnold> <arnold@seism.UUCP> Subject: Re: What does the optimiser optimise? An interesting thing that c2 does (under BSD, anyway) is the following: switch (x) { case 1: foo ("x"); break; case 2: foo("y"); break; .... } where all it does is call foo() with different arguments for different values of x. The code will determine the argument based on the value of x, and then call foo(). In other words, the procedure call code is only generated once! I found this out when trying to optomize such a routine by just assigning a character pointer and then calling the routine. The code got bigger! ============================ From: rti-sel!mcnc!clyde!ulysses!sfmag!mjs <mjs@c.UUCP> Subject: Re: What does the optimiser optimise? Most optimizers (actually `assembly language improvers') do common tail merging, unreachable code removal, fixup branches to branches, and a host of other machine independant improvements. Depending on the code generated, many will also reuse registers known to have constant values (instead of using an immediate operand), collapse some sequences of instructions into shorter or faster sequences (often both, but typically only by accident), using both different instructions and/or different addressing modes. The arithmetic example you posted is unlikely to be optimizable because the compiler tries to reuse scratch registers too quickly, so the "(a+b)" in your example is unlikely to stay in a scratch register long enough (i.e., across enough instructions) to be reused. ================================ From: rti-sel!mcnc!bellcore!hammond (Rich A. Hammond) <hammond@bellcor.UUCP> Subject: Re: What does the optimiser optimise? 1) Common sub-expressions are tricky in C, for example : x = (a+b); (receive signal, change state of a or b) y = (a+b) +c; or even if you don't worry about signals, longjumps, ... then you must be sure that: a) None of the operands in the CSE could have their address taken, b) No indirection is done, c) No function calls (unless you do complete flow analysis). For these reasons, most optimisers for C don't play with CSE's much. Besides, in C using pointer arithmetic gets rid of the frequent and hidden to the programmer CSE's from array indexing arithmetic. I wrote an optimiser for the IBM Series/1 C compiler we used at U of Delaware and it only found a few (1 or 2) CSE's in an awful lot of code run through it. 2) For, while loops: On the PDP-11, which had a decrement and branch instruction, register int i; for (i = n; --i>0; ) would be turned into the dbr instruction. I suspect that machines with add one and branch and subtract one and branch with loop counters in registers might try and take advantage of those instructions, but this is not guaranteed. 3) Common code sequences: (another place besides IF ELSE is in SWITCH CASE) Yes, the PDP-11 optimiser found them and merged them, note that this is a space optimization and not a time optimization, since one sequence will be replaced by an unconditional branch plus the sequence. 4) Code that is never reached is often removed, note that the way the PDP-11 optimiser handled the branch-tail merging was to insert the unconditional branch ahead of the sequence and leave the sequence to be removed on the next pass by the unreachable code remover. In general, the quality of optimisers varies tremendously, and no two do the same set of "optimisations." I tried the trivial program register int i; register double a, b, c; for (i = 0; i < 1000000; i++) { a = b + c; /* 100 copies of this line */ } and only one optimiser noticed the CSE, even though this was the trivial case with all the variables in registers, no indirection, ... It was smart enough to just do the statement once per loop. Made the floating point unit look nice compared to the other machines. =================================== From: rti-sel!mcnc!decvax!ucbvax!ucdavis!deneb!ccrrick (Rick Heli) <ccrrick@.UUCP> Subject: Re: What does the optimiser optimise? Code that can never reached should be flagged as an error! --rick heli (... ucbvax!ucdavis!groucho!ccrrick) =================================== From: Vincent P. Broman <rti-sel!mcnc!decvax!sdcsvax!noscvax!broman> <broman@.UUCP> Subject: C optimizer David Hinnant, One way to investigate such questions is to run /lib/c2 with the -d option standalone. It prints out how many of what kind of improvements it did, at least on 4.xBSD UNIX. =================================== From: rti-sel!mcnc!decvax!astrovax!tl-vaxa!harbison (Samuel P. Harbison) <harbison@.UUCP> Subject: Optimization examples Dear Mr. Hinnant, Regarding your info-c post about compiler optimizations, I put your example programs--and some variants of my own--through our Tartan C compiler for Vax/UNIX, and I have included the generated code below. Most of the major optimizations we do are shown. I am not a disinterested observer insofar as I am the development manager for this product. I am quite proud of our compiler, which I think is the best optimizing C compiler available under UNIX. The compiler used in the examples is our latest update, release 2.1, which will be shipped to current and new customers within a month. Sam Harbison Tartan Laboratories Incorporated int x,y,z,a,b,c,d; /* Here's the first example, slightly modified from the original. We eliminate the common subexpressions (a+b) and (a+b)+c. */ int f1a() { x = (a+b); y = (a+b) +c; z = ((a+b) +c) / d; return 0; } /* Generated code: _f1a: .word 0 addl3 _b,_a,r0 movl r0,_x addl2 _c,r0 movl r0,_y divl3 _d,r0,_z clrl r0 ret */ /* Here's the previous example in its original form. Because the same variable is being overwritten, only the last assignment is kept. */ int f1b() { x = (a+b); x = (a+b) +c; x = (a+b) +c / d; return 0; } /* Generated code: _f1b: .word 0 addl3 _b,_a,r0 divl3 _d,_c,r1 addl3 r1,r0,_x clrl r0 ret */ /* Here are some looping examples. Note that the local variables go into registers even though they are not declared "register". */ int ary[10][10]; int f2() { int i, sum=0; /* The compiler can't know if the following loop will be executed even once, so it must perform the test first: */ for (i=1; i < x; i++) printf("First\n"); /* Generated code: movl $1,r6 cmpl _x,$1 bleq L45 L34: pushab Def001 calls $1,_printf aoblss _x,r6,L34 L45: ... .data Def001: .ascii "First" .byte 10, 0 */ /* The compiler knows this loop will be done at least once, so eliminates the top-of-loop test: */ for (i=1; i< 10; i++) {printf("Second\n");} /* Generated code: movl $1,r6 L36: pushab Def002 calls $1,_printf aobleq $9,r6,L36 ... .data Def002: .ascii "Second" .byte 10, 0 */ /* The compiler knows this loop will never be executed, and so generates no code: */ for (i=10; i<10; i++) {printf("Third\n");} /* Generated code: */ /* This one is more interesting; it is an example of "strength reduction". The multiplication i*40 in the address computation of ary[i][i] has been converted to an addition in the loop ("addl2 $40,r3"). */ for (i=0; i<10; i++) sum += ary[i][i]; return sum; } /* Generated code: clrl r7 # r7 is sum ... clrl r6 # r6 is i movab _ary,r3 # r3 is &ary[i][0] L43: addl2 (r3)[r6],r7 addl2 $40,r3 aobleq $9,r6,L43 movl r7,r0 ret */ /* Here's an example of hoisting an invariant expression out of a loop */ int f2b() { int i,sum=0; for(i=0;i<10;i++) sum += x*(a+b); /* x*(a+b) is invariant */ return sum; } /* Generated code: _f2b: .word 0 clrl r4 # r4 is sum clrl r3 # r3 is i addl3 _b,_a,r0 mull2 _x,r0 # r0 is x*(a+b) L45: addl2 r0,r4 aobleq $9,r3,L45 movl r4,r0 ret */ /* Here's your example of cross jumping. (The resulting code has a superfluous test; you can't win them all...) */ int f3() { if (x) { x = 1; y = 1; z = 1; } else { x = 1; y = 1; z = 1; } return 0; } /* Generated code: _f3: .word 0 tstl _x movl $1,_x movl $1,_y movl $1,_z clrl r0 ret */ /* Dead code elimination (and constant propagation): */ int f4() { int t=1; if (t) x=1; else x=2; return 0; } /* Generated code: _f4: .word 0 movl $1,_x clrl r0 ret */ /* Finally, here's one I don't think you'll find in any other C compiler: inline expansion of functions */ int abs(i) int i; { return i<0 ? -i : i; } int f5(x) int x; { return abs(x); } /* Generated code for f5(): [ Code for abs() is also generated but not shown.] We should have used r0 instead of r3, but... _f5: .word 0 movl 4(ap),r3 tstl r3 bgeq L55 mnegl r3,r3 L55: movl r3,r0 ret */ ============================== -- David Hinnant SCI Systems, Inc. {decvax, akgua}!mcnc!rti-sel!scirtp!dfh