ok@edai.UUCP (06/02/83)
/* Some questions about "optimised" C code.
*/
struct {unsigned a:24; unsigned b:8;} x;
main()
{
int s, t; /*
*/ t = x.a; /* 1.
opt&un extzv $0,$24,_x,-8(fp)
*/ x.a = t; /* 2.
opt&un insv -8(fp),$0,$24,_x
*/ t = x.b; /* 3.
un extzv $24,$8,_x,-8(fp)
opt movzbl 3+_x,-8(fp)
*/ x.b = t; /* 4.
opt&un insv -8(fp),$24,$8,_x
why not movb -8(fp),_x+3 ??
*/ t = s&0xffffff; /* 5.
un bicl3 $-16777216,-4(fp),r0
un movl r0,-8(fp)
opt extzv $0,$24,-4(fp),-8(fp)
why not bicl3 $-16777216,-4(fp),-8(fp) ??
*/ t = s>>24; /* 6.
opt&un ashl $-24,-4(fp),r0
opt&un movl r0,-8(fp)
why not ashl $-24,-4(fp),-8(fp) ??
*/ } /*
The questions are
1: The use of byte addressing in (3) is a significant improvement,
about 3 micro-seconds as compared with 5 on a VAX 750. Why is
it not used in (4)?
2: bicl3 $(-1<<N),<source>,<dest> is a significant improvement over
extzv $0,$N,<source>,<dest> in speed (4 usec vs 5). Why is it not
used in (5)?
3: The C compiler has a very strong tendency to generate
<calculate answer in>,r0
movl r0,<dest>
even when <dest> doesn't appear in the calculation. How come the
peep-hole -Optimiser misses the opportunity of eliminating the
movl?
4: This is just a guess, because I don't know how -O works, but the
improvements referred to in q1 and q2 involve replacing one
instruction by one instruction, so should be fairly easy to add
to the -Optimiser. How may it be done? q3's cure is probably
very much harder. If these changes aren't improvements on the
11/780, perhaps there could be a #define for selecting the machine?
This isn't a frivolous problem constructed to "beat the optimiser".
We have a program which does these operations an awful lot, and a 10-20%
speedup obtained by making a few minor (and universally beneficial on
750s) tweaks to the compiler seems like a good bargain.
*/
mckusick@ucbvax.UUCP (06/07/83)
The reason that the optimizer frequently "misses" opportunities to get
rid of
	someopt src,r0
	movl r0,dst
is that r0 is used as the return value register. If all of these
were uniformly replaced by
	someopt src,dst
then statements such as
	return (a = b + 1);
that generate code as
	addl3 $1,_b,r0
	movl r0,_a
would get optimized to
	addl3 $1,_b,_a
and would never set r0, thus returning garbage. The optimizer takes a
very pessimistic view and never optimizes away assignments to r0 if
the assignment is at the end of a basic block. The appropriate fix
would be to have the C compiler avoid using r0 as a temporary unless
it is either forced to, or meant to set the return value.
	Kirk McKusick
	ucbvax!mckusick
	mckusick@berkeley
chris@umcp-cs.UUCP (06/09/83)
(This is a reply to mckusic's (sp?) answer about not optimizing r0
references.) Suggestion: change the optimizer to go ahead and
optimize if the r0 instructions are followed by anything except a
'ret'. The C compiler will only generate an r0 instruction that also
includes the return value before a return(e), correct?
Another question: why does the optimizer only put sobgeq's around
``small'' loops? Is this just running into the limit on lines? I
don't remember the specific example but it was something of the form
	if (expr) {
		while (--foo >= 0) {
			some; stmts;
		}
	}
	else {
		other; code;
	}
}
The --foo can be a sobgeq but the jbr to the ret below stumped the
optimizer (is that as hard to read as it sounds?). I got it to
optimize by adding a return after the while statement.
- Chris (seismo!umcp-cs!chris)
martin@vax135.UUCP (06/10/83)
It would be quite a problem to look for operations that would optimise
out the usage of a register and then check for the next statement to
be a return. The better solution would be to do the calculation in
another register if the result will not be passed to a return
statement. Then the optimiser could check if the register was r0 and,
if so, not do the optimisation. This splits the work between the
compiler and the optimiser.
On another note, consider shell scripts to do edits on the asm file
after it has been compiled. Why not have a flag to cc to allow the
passing of a program to be placed in the pipeline between c2 and as?
How about:
	$ cc main.c -O -SH :rofix -o main
martin levy.
bj@yale-com.UUCP (06/13/83)
(This is a reply to mckusic's (sp?) answer about not optimizing r0
references.) Suggestion: change the optimizer to go ahead and
optimize if the r0 instructions are followed by anything except a
'ret'. The C compiler will only generate an r0 instruction that also
includes the return value before a return(e), correct?
This could cause problems in cases like
	return ( e ? (a = b+1) : (c = d+1) );
This problem is caused because the optimizer only has the code, it does
not know the intent of statements. This is also the reason that device
driver code can not be optimized.
B.J.