mikpe@amnesix.liu.se (Mikael Pettersson) (04/23/88)
While hacking a piece of code the other day, I came across what
looks like a case of bad optimization in PCC-derived compilers.

Relevant C code
---------------
(np, ep and envp are register char *, val is a parameter char *)

(1)	while(*np && (*np++ == *ep++))
		/* empty */ ;
(2)	if(!*np)
		*envp = val;

Notice that the value of the test at line (2) is known if the '*np'
part of line (1) evaluates to 0.

Actual code generated (compiling flags: -O -S):

Sun CC, SunOS3.2                GNU CC, ver 1.18 for Sun3
-----------------------------------------------------------
L18:    tstb    a3@             L5:     moveb   a1@,d0
    (+)jeq L19  <-----THIS----->    (+)jeq L6
        cmpmb   a4@+,a3@+               addql   #1,a1
        jeq     L18                     cmpb    a0@+,d0
L19:    tstb    a3@                     jeq     L5
    (*)jne L20                  L6:     tstb    a1@
        movl    a6@(12),a5@         (*)jne L7
L20:                                    movel   d1,a2@
                                L7:

(Goulds UTX/32 cc produced very similar code)

It seems to me that a *good* peep-hole optimizer should be able to
recognize that if the jump at line (+) is taken then the jump at line
(*) will not be taken.  So why didn't they generate something like:

L18:    tstb    a3@
    (+)jeq L19
        cmpmb   a4@+,a3@+
        jeq     L18
        tstb    a3@
        jne     L20
L19:    movl    a6@(12),a5@
L20:

which would save the unnecessary test+jump if the jump at (+) is
taken?  Any PCC (or GCC) guru out there who has an answer?

Yours
-- 
Mikael Pettersson              ! Internet: mpe@ida.liu.se
Dept of Comp & Info Science    ! UUCP:     mpe@liuida.uucp -or-
University of Linkoping        !           {mcvax,munnari,uunet}!enea!liuida!mpe
Sweden                         ! ARPA:     mpe%ida.liu.se@uunet.uu.net
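For readers without the surrounding source, the fragment above is the inner scan of a setenv-style environment routine. A self-contained sketch might look like the following; the function name and everything outside lines (1) and (2) are invented here for illustration, only those two lines come from the post:

```c
/* Hypothetical reconstruction of the routine around lines (1)/(2):
 * scan the environment slot `*envpp` against `name`; if the scan
 * runs off the end of `name`, install `val` in the slot. */
static void replace_if_match(char **envpp, const char *name, char *val)
{
    register const char *np = name;
    register const char *ep = *envpp;

    /* (1) advance while the name continues and the characters agree */
    while (*np && (*np++ == *ep++))
        /* empty */ ;
    /* (2) name exhausted: the slot matched, install the new value */
    if (!*np)
        *envpp = val;
}
```

The redundancy the post complains about is visible here: when the `*np` operand of line (1) evaluates to 0 the loop exits, and line (2) immediately re-tests the very same byte.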
guy@gorodish.Sun.COM (Guy Harris) (04/25/88)
> While hacking a piece of code the other day, I came across what
> looks like a case of bad optimization in PCC-derived compilers.

The peephole optimizer isn't part of PCC, so this doesn't have
anything to do with PCC per se.

> It seems to me that a *good* peep-hole optimizer should be able to
> recognize that if the jump at line (+) is taken then the jump at line
> (*) will not be taken.

Nope.  What happens if "*np" is asynchronously changed between the two
tests?
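Guy's objection can be made concrete. In today's C (the `volatile` keyword was only just entering the language when this thread was written), the case an optimizer must not break looks roughly like this; the variable and function names are invented for the demonstration:

```c
#include <signal.h>

/* A flag set asynchronously by a signal handler.  `volatile
 * sig_atomic_t` tells the compiler that every read of `done` must
 * really touch memory; without it, a pass that "knows" the result
 * of the first test could legally delete the second -- exactly the
 * transformation Mikael asked for, and exactly what breaks here. */
static volatile sig_atomic_t done = 0;

static void handler(int sig)
{
    (void)sig;
    done = 1;           /* the value changes "between the two tests" */
}

int wait_for_signal_demo(void)
{
    signal(SIGINT, handler);
    raise(SIGINT);      /* deliver the signal synchronously for the demo */
    while (!done)
        /* spin: each iteration must re-read `done` */ ;
    return (int)done;
}
```
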
chris@mimsy.UUCP (Chris Torek) (04/25/88)
In article <793@amnesix.liu.se> mikpe@amnesix.liu.se (Mikael
Pettersson) writes:
>While hacking a piece of code the other day, I came across what
>looks like a case of bad optimization in PCC-derived compilers.

[Note that Mikael means `poor', not `incorrect', optimisation]

[Sun version, trimmed:]
>L18:    tstb    a3@
>    (+)jeq L19
	...
>L19:    tstb    a3@
>    (*)jne L20
>        movl    a6@(12),a5@
>L20:
>
>It seems to me that a *good* peep-hole optimizer should be able to
>recognize that if the jump at line (+) is taken then the jump at line
>(*) will not be taken.

As long as a3@ is not `volatile', or local equivalent.

>So why didn't they generate something like:
>L18:    tstb    a3@
>    (+)jeq L19
	...
>L19:    movl    a6@(12),a5@
>L20:

The peephole optimisers we have are simply not that ambitious.  The
Vax c2 comes close; it carries ccodes around (sort of) and notices
branches to tests whose result is already known (see my recent bug
fix for branches to tests whose result is discarded).  But it is a
bit scared of tests on memory locations, lest it break code that
depends on variables being volatile:

        register char *cp;
        ...
        while (*cp == 0)
                /*void*/;       /* wait for signal */

compiles to (Vax)

L17:    tstb    (r11)
        jeql    L17

The result is `obvious', so c2 `ought' to change this to

L17:    tstb    (r11)
L2000000:jeql   L2000000

which is *really* an infinite loop.  Instead it just leaves memory
references alone.  I do not know why GCC, which *does* have volatile,
does not catch that one.

Incidentally, if you rewrite the original code

>	while(*np && (*np++ == *ep++))
>		/* empty */ ;
>	if(!*np)
>		*envp = val;

as

	do {
		if (*np == 0) {
			*envp = val;
			break;
		}
	} while (*np++ == *ep++);

it compiles to (again, Vax)

L18:    tstb    (r11)
        jneq    L17
        movl    -4(fp),(r9)
        jbr     L16
L17:    cmpb    (r11)+,(r10)+
        jeql    L18
L16:

(although why you would care that much about a microsecond in
`setenv' I am not sure...).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
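A quick way to examine Chris's rewrite is to wrap both loops in small functions (the names below are invented) and compare them. One caveat worth noting: the two are not identical on every input, because the original re-tests `*np` after a failed comparison has already advanced the pointer, so a mismatch on the last character of the name still satisfies `!*np`:

```c
/* Original formulation from the post.  Returns 1 when the final
 * test of *np sees 0, i.e. when the store to *envp would happen. */
static int match_original(const char *np, const char *ep)
{
    while (*np && (*np++ == *ep++))
        /* empty */ ;
    return !*np;
}

/* Chris Torek's rewrite: the "*np == 0" exit and the store (here,
 * the return) fall on the same path, so only one test of *np is
 * needed per character. */
static int match_rewrite(const char *np, const char *ep)
{
    do {
        if (*np == 0)
            return 1;
    } while (*np++ == *ep++);
    return 0;           /* mismatch before the name ran out */
}
```

The two agree whenever the scan either completes or fails before the last character of `np`; on a mismatch at the very last character the original reports a (spurious) match and the rewrite does not, which is arguably a bug fix as much as an optimization aid.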
gnu@hoptoad.uucp (John Gilmore) (04/26/88)
Mikael Pettersson complained that neither pcc nor gcc optimized away a
second test of *np which occurs at the target label of another test of
*np (where np is char *, not declared volatile).

Chris Torek did a good job of explaining that pcc doesn't optimize
this because its optimizer transforms assembler into assembler, and
doesn't know whether np was volatile or not (or even that this
register contains your variable "np", or even that the volatile
keyword exists).

While I'm not an expert at gcc's internals, I think that gcc does not
catch this because the common subexpression optimizer does not
propagate the values of variables across basic blocks.  In other
words, after a successful branch it does not know that the value of
"*np" is in a register already.  I'll let Richard's comments (in
cse.c) tell it:

/* The basic idea of common subexpression elimination is to go
   through the code, keeping a record of expressions that would have
   the same value at the current scan point, and replacing
   expressions encountered with the cheapest equivalent expression.

   It is too complicated to keep track of the different possibilities
   when control paths merge; so, at each label, we forget all that is
   known and start fresh.  This can be described as processing each
   basic block separately.  Note, however, that these are not quite
   the same as the basic blocks found by a later pass and used for
   data flow analysis and register packing.  We do not need to start
   fresh after a conditional jump instruction if there is no label
   there.  */

Richard Stallman, gcc's author, constantly tries to balance the
compile time and memory requirements versus how much optimization can
be gained.  The Microsoft Cmerge compiler provides a good example
where serious optimization is done but the compiler is too slow as a
result.  This is one optimization I'd like to see too, and someday I
may figure out (and implement) a cheap way to do it.
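The policy the cse.c comment describes is easy to exhibit at the source level. In the invented example below, the two loads in the first function sit in a single basic block, so the common subexpression optimizer merges them; the final load in the second function sits just after a point where control paths merge (a label in the generated code), so a per-block CSE pass "forgets all that is known" and re-reads memory:

```c
/* Both loads of *p are in one basic block, no label between them:
 * per-block CSE can reuse the first load for the second. */
int within_block(int *p)
{
    int a = *p;
    int b = *p;         /* redundant load, eliminated within the block */
    return a + b;
}

/* Here the second read of *p follows a merge point.  A CSE pass that
 * starts fresh at each label cannot know *p is already in a register,
 * even though neither path stored through p. */
int across_blocks(int *p, int cond)
{
    int a = *p;
    if (cond)
        a++;
    /* control paths merge here: a label in the generated code */
    return a + *p;      /* reloaded by a per-basic-block optimizer */
}
```
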
-- 
John Gilmore   {sun,pacbell,uunet,pyramid,ihnp4}!hoptoad!gnu   gnu@toad.com
/* No comment */
firth@sei.cmu.edu (Robert Firth) (04/27/88)
In article <793@amnesix.liu.se> mikpe@amnesix.liu.se (Mikael
Pettersson) writes:

[a missed optimisation in some C compilers: the code reads as shown]

>(1)	while(*np && (*np++ == *ep++))
>		/* empty */ ;
>(2)	if(!*np)
>		*envp = val;
>
>Notice that the value of the test at line (2) is known if the '*np'
>part of line (1) evaluates to 0.
>
>Actual code generated (compiling flags: -O -S):
>
>Sun CC, SunOS3.2                GNU CC, ver 1.18 for Sun3
>-----------------------------------------------------------
>L18:    tstb    a3@             L5:     moveb   a1@,d0
>    (+)jeq L19  <-----THIS----->    (+)jeq L6
>        cmpmb   a4@+,a3@+               addql   #1,a1
>        jeq     L18                     cmpb    a0@+,d0
>L19:    tstb    a3@                     jeq     L5
>    (*)jne L20                  L6:     tstb    a1@
>        movl    a6@(12),a5@         (*)jne L7
>L20:                                    movel   d1,a2@
>                                L7:

Just to add my own comments.  The optimisation requested should
happen as part of a set of transformations that might be called
"branch chaining".  This kind of thing is pretty easy:

	goto L1
	...
L1:	goto L2

The optimiser looks at the target of the jump, sees it is another
jump, and simply changes the first jump to read 'goto L2', naturally
also decrementing the reference count of L1.  (Incidentally, it is
sometimes advantageous to make the reverse transformation, but that's
another story.)

As a next step, observe that the first jump can easily be
conditional; the transformation is still valid:

	if B goto L1
	...
L1:	goto L2
		=>	if B goto L2

The third step is where BOTH jumps are conditional.  If the truth of
the first condition implies the truth of the second, we can still
chain:

	if B1 goto L1
	...
L1:	if B2 goto L2	/* B1 implies B2 */
		=>	if B1 goto L2

Finally, if the truth of the first condition implies the FALSEHOOD of
the second condition, we can retarget the first branch, giving:

	if B1 goto L1a
	...
L1:	if B2 goto L2	/* B1 implies NOT B2 */
L1a:

This is the required transformation.
Note, though, that it is a lot easier to do on the attributed parse
tree: optimising the assembler code also involves understanding, for
example, why the second test instruction can be bypassed.
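The first, unconditional form of branch chaining is simple enough to sketch directly. Below is a toy peephole pass over a flat instruction array; the representation and names are invented for illustration. `target[i] >= 0` marks instruction `i` as a jump to the instruction at that index, `target[i] < 0` marks a non-jump, and each chain of jump-to-jump is followed to its end:

```c
/* Toy rendition of "goto L1 ... L1: goto L2  =>  goto L2".
 * A real optimiser would also decrement L1's reference count,
 * handle the conditional cases Firth describes, and track which
 * instructions actually carry labels. */
void chain_branches(int target[], int n)
{
    for (int i = 0; i < n; i++) {
        int hops = 0;
        /* While instruction i is a jump whose destination is itself
         * a jump, retarget i past the intermediate jump.  The hop
         * counter guards against cycles like "L1: goto L1". */
        while (target[i] >= 0 && target[i] < n &&
               target[target[i]] >= 0 && hops++ < 64)
            target[i] = target[target[i]];
    }
}
```

Running it on the array `{1, 3, -1, 4, -1}` (instruction 0 jumps to 1, which jumps to 3, which jumps to 4, a non-jump) retargets both chained jumps straight to index 4 in one pass.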