ok@edai.UUCP (06/02/83)
/* Some questions about "optimised" C code. */

struct {unsigned a:24; unsigned b:8;} x;

main()
    {
        int s, t;
/* */
        t = x.a;        /* 1. opt&un    extzv   $0,$24,_x,-8(fp) */
        x.a = t;        /* 2. opt&un    insv    -8(fp),$0,$24,_x */
        t = x.b;        /* 3. un        extzv   $24,$8,_x,-8(fp)
                              opt       movzbl  3+_x,-8(fp) */
        x.b = t;        /* 4. opt&un    insv    -8(fp),$24,$8,_x
                              why not   movb    -8(fp),_x+3 ?? */
        t = s&0xffffff; /* 5. un        bicl3   $-16777216,-4(fp),r0
                              un        movl    r0,-8(fp)
                              opt       extzv   $0,$24,-4(fp),-8(fp)
                              why not   bicl3   $-16777216,-4(fp),-8(fp) ?? */
        t = s>>24;      /* 6. opt&un    ashl    $-24,-4(fp),r0
                              opt&un    movl    r0,-8(fp)
                              why not   ashl    $-24,-4(fp),-8(fp) ?? */
    }

/*  The questions are

    1:  The use of byte addressing in (3) is a significant improvement,
        about 3 micro-seconds as compared with 5 on a VAX 750.
        Why is it not used in (4)?

    2:  bicl3 $(-1<<N),<source>,<dest> is a significant improvement
        over extzv $0,$N,<source>,<dest> in speed (4 usec vs 5).
        Why is it not used in (5)?

    3:  The C compiler has a very strong tendency to generate
                <calculate answer in>,r0
                movl r0,<dest>
        even when <dest> doesn't appear in the calculation. How come
        the peep-hole -Optimiser misses the opportunity of eliminating
        the movl?

    4:  This is just a guess, because I don't know how -O works, but
        the improvements referred to in q1 and q2 involve replacing
        one instruction by one instruction, so should be fairly easy
        to add to the -Optimiser. How may it be done? q3's cure is
        probably very much harder.

    If these changes aren't improvements on the 11/780, perhaps there
    could be a #define for selecting the machine?

    This isn't a frivolous problem constructed to "beat the optimiser".
    We have a program which does these operations an awful lot, and a
    10-20% speedup obtained by making a few minor (and universally
    beneficial on 750s) tweaks to the compiler seems like a good bargain.
*/
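Editorial sketch: questions 1 and 2 rest on pairs of instructions computing the same result, so that one could replace the other. The C below models that claim only; it says nothing about timing. The function names mirror the VAX mnemonics but are hypothetical helpers written for this illustration, and the byte-3 variants assume the VAX's little-endian layout of a 32-bit longword.

```c
#include <stdint.h>

/* Model of "extzv $pos,$size,src,dst": zero-extended field extract.
   Assumes 0 < size < 32 and pos + size <= 32. */
uint32_t extzv(unsigned pos, unsigned size, uint32_t src)
{
    return (src >> pos) & ((UINT32_C(1) << size) - 1);
}

/* Model of "bicl3 mask,src,dst": dst = src with the mask bits cleared. */
uint32_t bicl3(uint32_t mask, uint32_t src)
{
    return src & ~mask;
}

/* Model of "movzbl 3+_x,dst": read byte 3 of a little-endian longword. */
uint32_t movzbl_byte3(uint32_t x)
{
    unsigned char mem[4];
    for (int i = 0; i < 4; i++)
        mem[i] = (unsigned char)(x >> (8 * i));   /* store little-endian */
    return mem[3];
}

/* Model of "insv src,$pos,$size,_x": replace one field, keep the rest. */
uint32_t insv(uint32_t src, unsigned pos, unsigned size, uint32_t x)
{
    uint32_t field = ((UINT32_C(1) << size) - 1) << pos;
    return (x & ~field) | ((src << pos) & field);
}

/* Model of "movb src,_x+3": overwrite byte 3 of a little-endian longword. */
uint32_t movb_byte3(uint32_t src, uint32_t x)
{
    return (x & UINT32_C(0x00FFFFFF)) | ((src & 0xFF) << 24);
}
```

Under these models, extzv(0, 24, s) agrees with bicl3(0xFF000000, s), extzv(24, 8, x) agrees with movzbl_byte3(x), and insv(t, 24, 8, x) agrees with movb_byte3(t, x), which is the equivalence the proposed one-for-one replacements depend on.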
mckusick@ucbvax.UUCP (06/07/83)
The reason that the optimizer frequently "misses" opportunities to get rid of

        someopt src,r0
        movl r0,dst

is that r0 is used as the return value register. If all of these were uniformly replaced by

        someopt src,dst

then statements such as

        return (a = b + 1);

that generate code as

        addl3 $1,_b,r0
        movl r0,_a

would get optimized to

        addl3 $1,_b,_a

and would never set r0, thus returning garbage. The optimizer takes a very pessimistic view and never optimizes away assignments to r0 if the assignment is at the end of a basic block. The appropriate fix would be to have the C compiler avoid using r0 as a temporary unless it is either forced to or means to set the return value.

        Kirk McKusick
        ucbvax!mckusick
        mckusick@berkeley
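Editorial sketch: the pessimistic rule described above can be modelled as a small peephole pass. This is not the code of the real optimizer; the struct insn layout, the ends_block flag, and both function names are assumptions made for the illustration. The pass folds "op src,r0" followed by "movl r0,dst" into "op src,dst" only when a later instruction in the same basic block overwrites r0 before anything reads it. If the end of the block is reached first, r0 is assumed to be live (it may hold the return value) and the pair is left alone.

```c
#include <string.h>

/* Hypothetical instruction record; operands are kept as plain text. */
struct insn {
    const char *op;    /* mnemonic, e.g. "addl3" */
    const char *src;   /* source operand(s) */
    const char *dst;   /* destination operand */
    int ends_block;    /* nonzero if last instruction of its basic block */
};

/* Is r0 provably dead after instruction i? Pessimistic: answer "no"
   unless r0 is overwritten before the end of the basic block. */
static int r0_dead_after(const struct insn *code, int n, int i)
{
    if (code[i].ends_block)
        return 0;
    for (int j = i + 1; j < n; j++) {
        if (strstr(code[j].src, "r0"))
            return 0;                       /* r0 is read */
        if (strcmp(code[j].dst, "r0") == 0)
            return 1;                       /* r0 is overwritten */
        if (strstr(code[j].dst, "r0"))
            return 0;                       /* e.g. (r0): address use */
        if (code[j].ends_block)
            return 0;                       /* might be the return value */
    }
    return 0;
}

/* Fold "op src,r0 ; movl r0,dst" into "op src,dst" where that is safe.
   Rewrites code[] in place and returns the new instruction count. */
int fold_r0_moves(struct insn *code, int n)
{
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (i + 1 < n
            && strcmp(code[i].dst, "r0") == 0
            && !code[i].ends_block
            && strcmp(code[i + 1].op, "movl") == 0
            && strcmp(code[i + 1].src, "r0") == 0
            && r0_dead_after(code, n, i + 1)) {
            code[out] = code[i];
            code[out].dst = code[i + 1].dst;   /* write result directly */
            out++;
            i++;                               /* skip the movl */
        } else {
            code[out++] = code[i];
        }
    }
    return out;
}
```

With this model, the pair "addl3 $1,_b,r0 / movl r0,_a" at the end of a block is kept, while "ashl $-24,-4(fp),r0 / movl r0,-8(fp)" followed by something that overwrites r0 collapses to a single ashl.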
chris@umcp-cs.UUCP (06/09/83)
(This is a reply to mckusic's (sp?) answer about not optimizing r0 references.)

Suggestion: change the optimizer to go ahead and optimize if the r0 instructions are followed by anything except a 'ret'. The C compiler will only generate an r0 instruction that also includes the return value before a return(e), correct?

Another question: why does the optimizer only put sobgeq's around ``small'' loops? Is this just running into the limit on lines? I don't remember the specific example but it was something of the form

                if (expr) {
                        while (--foo >= 0) {
                                some;
                                stmts;
                        }
                }
                else {
                        other;
                        code;
                }
        }

The --foo can be a sobgeq but the jbr to the ret below stumped the optimizer (is that as hard to read as it sounds?). I got it to optimize by adding a return after the while statement.

        - Chris (seismo!umcp-cs!chris)
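Editorial sketch: the workaround described above, written out as two complete functions. The original example was only remembered as "something of the form", so the function names, the work variable, and the loop body here are all hypothetical. The two versions return the same values; the only difference is the explicit return after the while, which means the loop exit is no longer followed by a jump to a shared function exit. Whether a given optimizer then emits sobgeq is the poster's observation, not something this code can show.

```c
/* Loop followed by a jump over the else-arm to the common exit. */
int run_plain(int expr, int foo)
{
    int work = 0;

    if (expr) {
        while (--foo >= 0)
            work += 2;          /* stands in for "some; stmts;" */
    } else {
        work = -1;              /* stands in for "other; code;" */
    }
    return work;
}

/* Same logic with a return added directly after the while. */
int run_with_return(int expr, int foo)
{
    int work = 0;

    if (expr) {
        while (--foo >= 0)
            work += 2;
        return work;            /* the added return */
    } else {
        work = -1;
    }
    return work;
}
```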
martin@vax135.UUCP (06/10/83)
It would be quite a problem to look for operations that would optimise out the usage of a register and check for the next statement to be a return. The better solution would be to do the calculation in another register if the result will not be passed to a return statement. Then the optimiser could check if the register was r0 and, if so, not do the optimisation. This splits the work between the compiler and the optimiser.

On another note, consider shell scripts to do edits on the asm file after it has been compiled. Why not have a flag to cc to allow the passing of a program to be placed in the pipeline between c2 and as? How about:-

        $ cc main.c -O -SH :rofix -o main

martin levy.
bj@yale-com.UUCP (06/13/83)
        (This is a reply to mckusic's (sp?) answer about not optimizing r0 references.)

        Suggestion: change the optimizer to go ahead and optimize if the r0 instructions are followed by anything except a 'ret'. The C compiler will only generate an r0 instruction that also includes the return value before a return(e), correct?

This could cause problems in cases like

        return ( e ? a = b+1 : c = d+1 )

This problem is caused because the optimizer only has the code; it does not know the intent of statements. This is also the reason that device driver code cannot be optimized.

        B.J.
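Editorial sketch: the counterexample above as a compilable function. The function name pick and the global variables are hypothetical. Standard C requires parentheses around the second assignment, since without them the expression groups as (e ? a = b+1 : c) = d+1. The point being illustrated is that each arm both stores to a variable and produces the function's return value, and a compiler would typically follow each arm with a jump to a common exit rather than with the return instruction itself, so a "followed by ret" test would presumably miss it.

```c
int a, b, c, d;

/* Each arm assigns to a variable AND supplies the return value. */
int pick(int e)
{
    return e ? (a = b + 1) : (c = d + 1);
}
```

If the store in either arm were folded so that only the variable is written, the value returned by pick() would be undefined, which is the failure mode described in the earlier reply about r0.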