[comp.lang.c] Bad optimization in PCC?

mikpe@amnesix.liu.se (Mikael Pettersson) (04/23/88)

[]
While hacking a piece of code the other day, I came across what
looks like a case of bad optimization in PCC-derived compilers.

Relevant C code
---------------
(np and ep are register char *, envp is a register char **, and val is a parameter char *)

(1) while(*np && (*np++ == *ep++))
        /* empty */ ;
(2) if(!*np)
        *envp = val;

Notice that the value of the test at line (2) is known if the '*np'
part of line (1) evaluates to 0.

Actual code generated (compiling flags: -O -S):

Sun CC, SunOS3.2                  GNU CC, ver 1.18 for Sun3
-----------------------------------------------------------
L18:    tstb    a3@               L5:     moveb a1@,d0
     (+)jeq     L19   <-----THIS-----> (+)jeq L6
        cmpmb   a4@+,a3@+                 addql #1,a1
        jeq     L18                       cmpb a0@+,d0
L19:    tstb    a3@                       jeq L5
     (*)jne     L20               L6:     tstb a1@
        movl    a6@(12),a5@            (*)jne L7
L20:                                      movel d1,a2@
                                  L7:

(Gould's UTX/32 cc produced very similar code)

It seems to me that a *good* peep-hole optimizer should be able to
recognize that if the jump at line (+) is taken then the jump at line
(*) will not be taken. So why didn't they generate something like:

L18:    tstb   a3@
     (+)jeq    L19
        cmpmb  a4@+,a3@+
        jeq    L18
        tstb   a3@
        jne    L20
L19:    movl   a6@(12),a5@
L20:

which would save the unnecessary test+jump if the jump at (+) is taken?

Any PCC (or GCC) guru out there who has an answer?

Yours
-- 
Mikael Pettersson           ! Internet:mpe@ida.liu.se
Dept of Comp & Info Science ! UUCP:    mpe@liuida.uucp  -or-
University of Linkoping     !          {mcvax,munnari,uunet}!enea!liuida!mpe
Sweden                      ! ARPA:    mpe%ida.liu.se@uunet.uu.net

guy@gorodish.Sun.COM (Guy Harris) (04/25/88)

> While hacking a piece of code the other day, I came across what
> looks like a case of bad optimization in PCC-derived compilers.

The peephole optimizer isn't part of PCC, so this doesn't have anything to do
with PCC per se.

> It seems to me that a *good* peep-hole optimizer should be able to
> recognize that if the jump at line (+) is taken then the jump at line
> (*) will not be taken.

Nope.  What happens if "*np" is asynchronously changed between the two tests?
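
For instance (a contrived sketch; the names, the handler, and the use of
SIGINT are all invented for illustration), suppose np points at a byte
that a signal handler can write:

        #include <signal.h>

        char shared = 0;                        /* written by the handler */

        void poke(int sig) { shared = 1; }

        void example(char **envp, char *val)
        {
                register char *np = &shared;
                register char *ep = "";

                signal(SIGINT, poke);
                while (*np && (*np++ == *ep++))
                        /* empty */ ;
                /* even if the loop exited because *np was 0, a SIGINT   */
                /* delivered here can make it nonzero again, so the      */
                /* second test below cannot safely be folded away        */
                if (!*np)
                        *envp = val;
        }

(With ANSI C one would declare shared volatile; the point is that a
peephole pass working on the assembler output cannot tell whether that
was intended.)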

chris@mimsy.UUCP (Chris Torek) (04/25/88)

In article <793@amnesix.liu.se> mikpe@amnesix.liu.se (Mikael Pettersson)
writes:
>While hacking a piece of code the other day, I came across what
>looks like a case of bad optimization in PCC-derived compilers.

[Note that Mikael means `poor', not `incorrect', optimisation]

[Sun version, trimmed:]
>L18:	tstb	a3@
>    (+)jeq	L19
	...
>L19:	tstb	a3@
>    (*)jne	L20
>	movl	a6@(12),a5@
>L20:
>
>It seems to me that a *good* peep-hole optimizer should be able to
>recognize that if the jump at line (+) is taken then the jump at line
>(*) will not be taken.

As long as a3@ is not `volatile', or the local equivalent.

>So why didn't they generate something like:
>L18:	tstb	a3@
>    (+)jeq	L19
	...
>L19:	movl	a6@(12),a5@
>L20:

The peephole optimisers we have are simply not that ambitious.  The Vax
c2 comes close; it carries condition codes around (sort of) and notices branches
to tests whose result is already known (see my recent bug fix for
branches to tests whose result is discarded).  But it is a bit scared
of tests on memory locations, lest it break code that depends on
variables being volatile:

	register char *cp;
	...
	while (*cp == 0) /*void*/;	/* wait for signal */

compiles to (Vax)

	L17:	tstb	(r11)
		jeql	Ln

The result is `obvious', so c2 `ought' to change this to

	L17:	tstb	(r11)
	L2000000:jeql	L2000000

which is *really* an infinite loop.  Instead it just leaves memory
references alone.

I do not know why GCC, which *does* have volatile, does not catch
that one.
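
(The idea, presumably, is that once the programmer can mark the
asynchronous case explicitly, as in

        volatile char *cp;
        ...
        while (*cp == 0) /*void*/;      /* compiler must re-fetch *cp */

an optimiser that understands the keyword is free to fold away repeated
tests of locations that are *not* volatile.)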

Incidentally, if you rewrite the original code

>	while(*np && (*np++ == *ep++))
>		/* empty */ ;
>	if(!*np)
>		*envp = val;

as

	do {
		if (*np == 0) {
			*envp = val;
			break;
		}
	} while (*np++ == *ep++);

it compiles to (again, Vax)

	L18:	tstb	(r11)
		jneq	L17
		movl	-4(fp),(r9)
		jbr	L16
	L17:	cmpb	(r11)+,(r10)+
		jeql	L18
	L16:

(although why you would care that much about a microsecond in `setenv' I
am not sure...).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gnu@hoptoad.uucp (John Gilmore) (04/26/88)

Mikael Pettersson complained that neither pcc nor gcc optimized away
a second test of *np which occurs at the target label of another test of *np
(where np is char *, not declared volatile).  Chris Torek did a good job of
explaining that pcc doesn't optimize this because its optimizer transforms
assembler into assembler, and doesn't know whether np was volatile or not
(or even that this register contains your variable "np", or even that the
volatile keyword exists).

While I'm not an expert at gcc's internals, I think that gcc does not
catch this because the common subexpression optimizer does not
propagate the values of variables across basic blocks.  In other words,
after a successful branch it does not know that the value of "*np" is in a
register already.  I'll let Richard's comments (in cse.c) tell it:

/* The basic idea of common subexpression elimination is to go
   through the code, keeping a record of expressions that would
   have the same value at the current scan point, and replacing
   expressions encountered with the cheapest equivalent expression.
 
   It is too complicated to keep track of the different possibilities
   when control paths merge; so, at each label, we forget all that is
   known and start fresh.  This can be described as processing each
   basic block separately.  Note, however, that these are not quite
   the same as the basic blocks found by a later pass and used for
   data flow analysis and register packing.  We do not need to start fresh
   after a conditional jump instruction if there is no label there.
 */
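
To see where the boundary falls in Mikael's example, here is the loop
with its control flow written out as gotos (the function and label names
are mine, not gcc's internal form):

        void fixup(register char *np, register char *ep,
                   char **envp, char *val)
        {
        loop:
                if (*np == 0)           /* the (+) test; *np is 0 if taken  */
                        goto out;
                if (*np++ == *ep++)
                        goto loop;
        out:                            /* label: cse starts fresh here, so */
                if (*np == 0)           /* the (*) test is emitted again,   */
                        *envp = val;    /* even on the path where its       */
        }                               /* result is already known          */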

Richard Stallman, gcc's author, constantly tries to balance compile
time and memory requirements against how much optimization can be
gained.  The Microsoft Cmerge compiler is a good example of a compiler
that does serious optimization but is too slow as a result.

This is one optimization I'd like to see too, and someday I may figure
out (and implement) a cheap way to do it.
-- 
John Gilmore  {sun,pacbell,uunet,pyramid,ihnp4}!hoptoad!gnu        gnu@toad.com
/* No comment */

firth@sei.cmu.edu (Robert Firth) (04/27/88)

In article <793@amnesix.liu.se> mikpe@amnesix.liu.se (Mikael Pettersson) writes:

[a missed optimisation in some C compilers: the code reads as shown]

>(1) while(*np && (*np++ == *ep++))
>        /* empty */ ;
>(2) if(!*np)
>        *envp = val;
>
>Notice that the value of the test at line (2) is known if the '*np'
>part of line (1) evaluates to 0.
>
>Actual code generated (compiling flags: -O -S):
>
>Sun CC, SunOS3.2                  GNU CC, ver 1.18 for Sun3
>-----------------------------------------------------------
>L18:    tstb    a3@               L5:     moveb a1@,d0
>     (+)jeq     L19   <-----THIS-----> (+)jeq L6
>        cmpmb   a4@+,a3@+                 addql #1,a1
>        jeq     L18                       cmpb a0@+,d0
>L19:    tstb    a3@                       jeq L5
>     (*)jne     L20               L6:     tstb a1@
>        movl    a6@(12),a5@            (*)jne L7
>L20:                                      movel d1,a2@
>                                  L7:

Just to add my own comments.  The optimisation requested should happen
as part of a set of transformations that might be called "branch chaining".

This kind of thing is pretty easy:

	goto L1
	...
    L1: goto L2

The optimiser looks at the target of the jump, sees it is another jump,
and simply changes the first jump to read 'goto L2', naturally also
decrementing the reference count of L1.  (Incidentally, it is sometimes
advantageous to make the reverse transformation, but that's another
story.)

As a next step, observe that the first jump can easily be conditional;
the transformation is still valid:

	if B goto L1
	...
    L1: goto L2

=>	if B goto L2

The third step is where BOTH jumps are conditional.  If the truth
of the first condition implies the truth of the second, we can still
chain:
	if B1 goto L1
	...
    L1: if B2 goto L2	/* B1 implies B2 */

=>	if B1 goto L2

Finally, if the truth of the first condition implies the FALSEHOOD of
the second condition, we can retarget the first branch, giving:

	if B1 goto L1a
	...
    L1: if B2 goto L2	/* B1 implies NOT B2 */
    L1a:

This is the required transformation.  Note, though, that it is a lot
easier to do on the attributed parse tree: optimising the assembler
code also involves understanding why the second test instruction can
be bypassed, for example.
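
For the curious, here is a toy sketch in C of the first two
transformations above, run over a made-up instruction list.  The IR,
the names, and the hop limit are all invented for illustration; a real
optimiser would also maintain label reference counts, delete dead
labels, and handle the condition-implication cases.

        #include <stddef.h>

        enum op { OTHER, LABEL, GOTO, IFGOTO };

        struct insn {
                enum op op;
                int lab;        /* label defined (LABEL) or targeted (GOTO/IFGOTO) */
        };

        /* Find the instruction following the definition of label `lab'. */
        static struct insn *after_label(struct insn *code, size_t n, int lab)
        {
                size_t i;

                for (i = 0; i < n; i++)
                        if (code[i].op == LABEL && code[i].lab == lab)
                                return i + 1 < n ? &code[i + 1] : NULL;
                return NULL;
        }

        /* Follow a chain of unconditional gotos starting at `lab' and
           return the label at the end of the chain.  The hop limit stops
           us from spinning on a goto cycle (itself an infinite loop). */
        static int resolve(struct insn *code, size_t n, int lab)
        {
                size_t hops;
                struct insn *t;

                for (hops = 0; hops < n; hops++) {
                        t = after_label(code, n, lab);
                        if (t == NULL || t->op != GOTO || t->lab == lab)
                                break;
                        lab = t->lab;   /* goto L1 ... L1: goto L2  =>  use L2 */
                }
                return lab;
        }

        /* Retarget every (conditional or unconditional) jump whose
           destination is itself an unconditional goto. */
        void chain_branches(struct insn *code, size_t n)
        {
                size_t i;

                for (i = 0; i < n; i++)
                        if (code[i].op == GOTO || code[i].op == IFGOTO)
                                code[i].lab = resolve(code, n, code[i].lab);
        }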