[net.unix-wizards] Some questions about the C -Optimiser

ok@edai.UUCP (06/02/83)

/*  Some questions about "optimised" C code.
*/
struct {unsigned a:24; unsigned b:8} x;

main()
    {
	int s, t;	/*

*/	t = x.a;	/*				1.
opt&un		extzv	$0,$24,_x,-8(fp)

*/	x.a = t;	/*				2.
opt&un		insv	-8(fp),$0,$24,_x

*/	t = x.b;	/*				3.
un		extzv	$24,$8,_x,-8(fp)
opt		movzbl	3+_x,-8(fp)

*/	x.b = t;	/*				4.
opt&un		insv	-8(fp),$24,$8,_x
why not		movb	-8(fp),_x+3			??

*/	t = s&0xffffff;	/*				5.
un		bicl3	$-16777216,-4(fp),r0
un		movl	r0,-8(fp)
opt		extzv	$0,$24,-4(fp),-8(fp)
why not 	bicl3	$-16777216,-4(fp),-8(fp)	??

*/	t = s>>24;	/*				6.
opt&un		ashl	$-24,-4(fp),r0
opt&un		movl	r0,-8(fp)
why not		ashl	$-24,-4(fp),-8(fp)		??
*/  }			/*

The questions are

1:  The use of byte addressing in (3) is a significant improvement,
    about 3 micro-seconds as compared with 5 on a VAX 750.  Why is
    it not used in (4)?

2:  bicl3 $(-1<<N),<source>,<dest> is a significant improvement over
    extzv $0,$N,<source>,<dest> in speed (4 usec vs 5).  Why is it not
    used in (5)?

3:  The C compiler has a very strong tendency to generate
	<calculate answer in>,r0
	movl r0,<dest>
    even when <dest> doesn't appear in the calculation.  How come the
    peep-hole -Optimiser misses the opportunity of eliminating the
    movl?

4:  This is just a guess, because I don't know how -O works, but the
    improvements referred to in q1 and q2 involve replacing one
    instruction by one instruction, so should be fairly easy to add
    to the -Optimiser.  How may it be done?  q3's cure is probably
    very much harder.  If these changes aren't improvements on the
    11/780, perhaps there could be a #define for selecting the machine?

This isn't a frivolous problem constructed to "beat the optimiser".
We have a program which does these operations an awful lot, and a 10-20%
speedup obtained by making a few minor (and universally beneficial on
750s) tweaks to the compiler seems like a good bargain.
*/
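
For readers without a VAX to hand, the equivalence behind q2 can be
checked in C.  This is only a sketch: the two helper names are made up
for illustration, unsigned is assumed to be 32 bits wide, and the
shift pair merely models what extzv $0,$N does rather than
reproducing it.

```c
#include <assert.h>

/* Hypothetical helper: keep the low n bits by clearing the high ones,
   as bicl3 $(-1<<n),src,dst would.  Assumes 32-bit unsigned. */
unsigned low_bits_by_clear(unsigned s, int n)
{
    assert(n > 0 && n < 32);
    return s & ~(~0u << n);
}

/* Hypothetical helper: keep the low n bits by extracting a
   zero-extended field of width n at bit 0, modelled here as a
   shift up followed by a shift down.  Assumes 32-bit unsigned. */
unsigned low_bits_by_extract(unsigned s, int n)
{
    assert(n > 0 && n < 32);
    return (s << (32 - n)) >> (32 - n);
}
```

With n = 24 both helpers compute s&0xffffff, which is why either
instruction would be a correct translation of (5).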

mckusick@ucbvax.UUCP (06/07/83)

The reason that the optimizer frequently "misses" opportunities
to get rid of

	someopt	src,r0
	movl	r0,dst

is because r0 is used as the return value register. If all of
these were uniformly replaced by
	
	someopt	src,dst

then statements such as

	return (a = b + 1);

that generate code as

	addl3	$1,_b,r0
	movl	r0,_a

would get optimized to

	addl3	$1,_b,_a

and would never set r0, thus returning garbage. The optimizer
takes a very pessimistic view and never optimizes away assignments
to r0 if the assignment is at the end of a basic block. The appropriate
fix would be to have the C compiler avoid using r0 as a temporary
unless it is either forced to, or meant to set the return value.

	Kirk McKusick
	ucbvax!mckusick
	mckusick@berkeley
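
Seen from the C side, the value of the assignment is needed twice:
once stored into a, and once handed back to the caller (in r0 on the
VAX).  A minimal sketch of that; the function name set_and_return is
made up for illustration, and a and b stand in for the _a and _b of
the assembly above.

```c
int a, b;   /* stand-ins for _a and _b */

/* The assignment's value is both stored in a and returned, so code
   that only stores into a leaves the return value unset. */
int set_and_return(void)
{
    return (a = b + 1);
}
```

A caller that uses the result depends on both effects, which is what
folding the pair into a single addl3 into _a would break.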

chris@umcp-cs.UUCP (06/09/83)

(This  is  a  reply  to mckusic's (sp?)  answer about not optimizing r0
references.)  Suggestion:    change  the  optimizer  to  go  ahead  and
optimize  if  the  r0  instructions  are  followed by anything except a
'ret'.  The C compiler will only generate an r0 instruction  that  also
includes the return value before a return(e), correct?

Another  question:    why  does  the optimizer only put sobgeq's around
``small'' loops?  Is this just running into the  limit  on  lines?    I
don't remember the specific example but it was something of the form

	if (expr) {
		while (--foo >= 0) {
			some; stmts;
		}
	}
	else {
		other; code;
	}
}

The  --foo  can  be  a  sobgeq but the jbr to the ret below stumped the
optimizer (is that as hard to read  as  it  sounds?).    I  got  it  to
optimize by adding a return after the while statement.

			- Chris (seismo!umcp-cs!chris)
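
For reference, the loop shape in question decrements first and then
tests for >= 0, which is the same decrement-and-branch that sobgeq
performs.  A small sketch of the C semantics only (count_passes is a
made-up name, and nothing here says anything about what c2 emits):

```c
/* Counts how many times the body of  while (--foo >= 0)  runs.
   For a starting value foo >= 0 that is exactly foo times. */
int count_passes(int foo)
{
    int passes = 0;

    while (--foo >= 0)
        passes++;
    return passes;
}
```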

martin@vax135.UUCP (06/10/83)

It would be quite a problem to look for operations that would
optimise out the usage of a register and check for the next
statement to be a return.

The better solution would be to do the calculation in another
register if the result will not be passed to a return statement.
Then the optimiser could check if the register was r0 and  if  so
then  not  do  the optimisation. This splits the work between the
compiler and the optimiser.

On another note, consider shell scripts to do edits on the asm
file after it has been compiled.  Why not have a flag to cc to
allow the passing of a program to be placed in the pipeline
between c2 and as?

How about:-
	$ cc main.c -O -SH :rofix -o main

martin levy.

bj@yale-com.UUCP (06/13/83)

    (This  is  a  reply  to mckusic's (sp?)  answer about not optimizing r0
    references.)  Suggestion:    change  the  optimizer  to  go  ahead  and
    optimize  if  the  r0  instructions  are  followed by anything except a
    'ret'.  The C compiler will only generate an r0 instruction  that  also
    includes the return value before a return(e), correct?

This could cause problems in cases like
	return ( e ? (a = b+1) : (c = d+1) );

This problem arises because the optimizer only has the code; it does
not know the intent of statements.  This is also the reason that device
driver code can not be optimized.
						B.J.
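
A compilable form of that case, as a sketch: each arm is
parenthesised so that it parses as an assignment, and the names
(pick_and_return and the four globals) are made up for illustration.
Whichever arm runs, its value is the function's result, so the
result is computed some way before the return itself.

```c
int a, b, c, d;

/* Either arm both stores a value and supplies the return value;
   only one of a or c is written on any given call. */
int pick_and_return(int e)
{
    return e ? (a = b + 1) : (c = d + 1);
}
```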