4bsd-f77@utah-cs.UUCP (4.2 BSD f77 bug reports) (08/01/84)
From: Donn Seeley <donn@utah-cs.arpa> Subject: f77 seriously breaks certain expressions with exponentiations Index: usr.bin/f77/src/f77pass1/optcse.c 4.2BSD Description: If an expression or series of expressions contain subexpressions of the form 'x ** N(i)' where N(i) are integer constants, f77 can generate code that turns all such expressions into 'x ** M' where M is one such N(i). This bug was brought to my attention by Sean Byrne. Repeat-By: Create the following program and compile it with the optimizer on: ---------------------------------------------------------------- program xmpl integer i i = 5 i = i ** 3 + i ** 5 print *, i stop end ---------------------------------------------------------------- When run, the program prints '250' when it should print '3250'. Fix: Here's the relevant material from the assembly code that's generated, prettied up slightly: ---------------------------------------------------------------- movl $5,{i} movl {i},-4(fp) mull3 -4(fp),-4(fp),r0 # r0 <- i ** 2 mull3 -4(fp),r0,-8(fp) # -8(fp) <- i ** 3 movl {i},-12(fp) mull2 -12(fp),-12(fp) # -12(fp) <- i ** 2 addl3 -8(fp),-8(fp),{i} # i <- i ** 3 + i ** 3 ---------------------------------------------------------------- Notice how the value in -12(fp) is developed but never gets used for anything. If you turn off common subexpression elimination and copy propagation in the compiler by supplying the flag '-d7,13' as an argument you get the following correct code instead (you need to turn off copy propagation if you haven't installed my earlier copy propagation bug fix, otherwise turning off CSE isn't enough!): ---------------------------------------------------------------- movl $5,{i} movl {i},-4(fp) movl -4(fp),-8(fp) + mull3 -4(fp),-4(fp),r0 # r0 <- i ** 2 + mull3 -8(fp),r0,-12(fp) # -12(fp) <- i ** 3 movl {i},-16(fp) movl -16(fp),-20(fp) - mull2 -16(fp),-16(fp) # -16(fp) <- i ** 2 * mull3 -16(fp),-16(fp),r0 # r0 <- i ** 4 * mull3 -20(fp),r0,-24(fp) # -24(fp) <- i ** 5 addl3 -24(fp),-12(fp),{i} # i <- i ** 3 + i ** 5 ---------------------------------------------------------------- What is happening in the former example is that the common subexpression elimination code has noticed that the two computations marked '+' and '*' in the latter example are identical; hence it eliminates the second subexpression in favor of the first. Unfortunately the two subexpressions are NOT computing the same value, because the assignment which results in the instruction marked '-' in the latter example changes the value of the temporary. The problem is that the CSE code doesn't realize that the '*=' operator is different from the '=' operator. The '*=' is generated by the code which handles the inline expansion of exponentiations with constant integer exponents, and it is a '*=' expression which produces the instruction marked '-' above. If it were an '=' expression instead of a '*=', the CSE would be correct. The fix is to make the CSE code understand that the lvalue of a '*=' operator does not receive the value of the right hand side (in function scantree() in optcse.c): ---------------------------------------------------------------- *** /tmp/,RCSt1028888 Thu Jul 19 19:21:54 1984 --- optcse.c Thu Jul 19 12:03:55 1984 *************** *** 595,602 switch (p->exprblock.opcode) { case OPASSIGN : - case OPPLUSEQ : - case OPSTAREQ : { Addrp ap; --- 604,609 ----- switch (p->exprblock.opcode) { case OPASSIGN : { Addrp ap; *************** *** 609,614 } new = newnode(p,idp->initval,NULL,NULL); appendnode(new); new->rs = new; return(new->rs); } --- 616,638 ----- } new = newnode(p,idp->initval,NULL,NULL); appendnode(new); + new->rs = new; + return(new->rs); + } + + /* + * Don't optimize these... they're a real hassle. + */ + case OPPLUSEQ : + case OPSTAREQ : + { + Addrp ap; + + ap = (Addrp) p->exprblock.leftp; + idp = findid(ap); + killdepnodes(idp); + idp->assgnval = NULL; + new = newnode(p,lnode,rnode,NULL); new->rs = new; return(new->rs); } ---------------------------------------------------------------- As you can see by the comment, the effect of this fix is to largely prevent CSE between exponentiations. But a substantial amount of time can be wasted doing extra multiplies in the computation of (say) a polynomial if you don't take advantage of the fact that (say) 'x ** 8' can be computed from 'x ** 4 * x ** 4', where 'x ** 4' was computed for an earlier term. I have two different attacks on this problem. Both attacks involve the removal of '*=' operators so that CSE can proceed. The first line of attack is to produce a new version of the code that does inline expansion of exponentiations, which follows the same strategy as the old code but uses simple multiplies instead of '*='. This version has the problem that CSE can't remove some of the computations, because it doesn't realize that some of the temporaries are only going to be used to compute the exponentiation and can be thrown away if their value isn't needed. The second line of attack is to avoid creating any temporaries at all in the exponentation code, and let CSE do all the work of pruning the expression tree. This has the advantage of drastically simplifying the inline expansion code, but there is a cost. The problem is that the expression tree can get very large (on the order of the size of the exponent) before CSE gets a chance to prune it (to something closer to the log of the exponent in size). What I did is simply pick a largest exponent beyond which exponentiation is not done inline. This is not perhaps as reasonable as it sounds, because the worst case exponent (2 ** 31 - 1) only requires 60-odd multiplies under the previous system, whereas the new system requires a subroutine call. Since most constant integer exponents are small, though, this probably won't hurt very much. To install the latter fix, replace the routine buffpower() in optim.c with the following two routines: ---------------------------------------------------------------- /* * Called by expand() to eliminate exponentiations to integer constants. */ LOCAL expptr buffpower( p ) expptr p; { expptr base; Addrp newtemp; expptr storetemp = ENULL; expptr powtree(); expptr result; ftnint exp; if ( ! ISICON( p->exprblock.rightp ) ) fatal( "buffpower: bad non-integer exponent" ); base = p->exprblock.leftp; exp = p->exprblock.rightp->constblock.const.ci; if ( exp < 2 ) fatal( "buffpower: bad exponent less than 2" ); if ( exp > 64 ) { /* * Let's be reasonable, here... * * We need to fox the constant exponent to get mkpower() to * do its job properly. */ newtemp = mktemp( TYINT, ENULL ); storetemp = mkexpr( OPASSIGN, cpexpr( (expptr) newtemp ), mkintcon( exp ) ); result = mkexpr( OPPOWER, cpexpr( base ), cpexpr( newtemp ) ); result = fixexpr( result ); result = mkexpr( OPCOMMA, storetemp, result ); return ( result ); } /* * If the base is not a simple variable, evaluate it and copy the * result into a temporary. */ if ( ! (base->tag == TADDR && ISCONST( base->addrblock.memoffset )) ) { newtemp = mktemp( base->headblock.vtype, PNULL ); storetemp = mkexpr( OPASSIGN, cpexpr( (expptr) newtemp ), cpexpr( base ) ); base = (expptr) newtemp; } result = powtree( base, exp ); if ( storetemp != ENULL ) result = mkexpr( OPCOMMA, storetemp, result ); frexpr( p ); return ( result ); } /* * powtree( base, exp ) -- Create a tree of multiplications which computes * base ** exp. The tree is built so that CSE will compact it if * possible. The routine works by creating subtrees that compute * exponents which are powers of two, then multiplying these * together to get the result; this gives a log2( exp ) tree depth * and lots of subexpressions which can be eliminated. */ LOCAL expptr powtree( base, exp ) expptr base; register ftnint exp; { register expptr r = ENULL, r1; register int i; for ( i = 0; exp; ++i, exp >>= 1 ) if ( exp & 1 ) if ( i == 0 ) r = (expptr) cpexpr( base ); else { r1 = powtree( base, 1 << (i - 1) ); r1 = mkexpr( OPSTAR, r1, cpexpr( r1 ) ); r = (r ? mkexpr( OPSTAR, r1, r ) : r1); } return ( r ); } ---------------------------------------------------------------- I made a couple other aesthetic changes to the rest of optim.c as well, in expand() and buffmnmx(): ---------------------------------------------------------------- *** /tmp/,RCSt1029088 Thu Jul 19 20:00:10 1984 --- /tmp/,RCSt2029088 Thu Jul 19 20:00:13 1984 *** 476,482 { Addrp t; expptr q; ! Addrp buffmnmx(), buffpower(); if (!p) return (ENULL); --- 487,493 ----- { Addrp t; expptr q; ! expptr buffmnmx(), buffpower(); if (!p) return (ENULL); *************** *** 592,598 * local version of routine putmnmx in putpcc.c, called by expand */ ! LOCAL Addrp buffmnmx(p) register expptr p; { int op, type; --- 603,609 ----- * local version of routine putmnmx in putpcc.c, called by expand */ ! LOCAL expptr buffmnmx(p) register expptr p; { int op, type; *************** *** 636,642 frtemp(newtemp); frchain( &p0 ); ! return ( (Addrp) result); } --- 647,653 ----- frtemp(newtemp); frchain( &p0 ); ! return (result); } ---------------------------------------------------------------- The code that this generates looks like this (for the same expressions which were displayed above): ---------------------------------------------------------------- movl $5,{i} mull3 {i},{i},-4(fp) mull3 {i},-4(fp),r0 mull3 -4(fp),-4(fp),r1 mull2 {i},r1 addl3 r1,r0,{i} ---------------------------------------------------------------- Even this is not optimal -- a multiply could have been saved if the tree had been structured so that 'x ** 3' was a subexpression in the tree for 'x ** 5'. (The algorithm favors exponents of the form '2 ** n' or '2 ** (n-1)', which you can see if you spend a little while looking at powtree().) But it's simple, and it's also the best I have for now. Donn Seeley University of Utah CS Dept donn@utah-cs.arpa 40 46' 6"N 111 50' 34"W (801) 581-5668 decvax!utah-cs!donn