4bsd-f77@utah-cs.UUCP (4.2 BSD f77 bug reports) (08/24/84)
From: Donn Seeley <donn@utah-cs.arpa> Subject: F77 sometimes can't handle mixed REAL and REAL*8 expressions Index: f77/src/f1/order.c 4.2BSD Description: When generating code for certain complicated expressions which mix REAL and REAL*8 values, f77 will bomb out because it underestimates the number of registers that the expression will require. I found this bug in a long-lost report from Kirk Thege at UC Berkeley. Repeat-By: Clip out the following program (based on a program by Kirk Thege) and try to compile it. It doesn't matter whether the optimizer is on or off, but the examples of code below assume that it is on: ---------------------------------------------------------------- program xmpl integer i real a, b, c, d real*8 x(3), y, z i = 1 a = 1.5 b = 2.0 c = 2.5 d = 3.0 x(1) = 3.5 y = 4.0 z = (a + 1.0) * ((x(i) / b) + ((d / y) * c)) print *, z stop end ---------------------------------------------------------------- When you try to compile, the compiler will print something like this: ---------------------------------------------------------------- xmpl.f: MAIN xmpl: xmpl.f, line 15: compiler error: no table entry for op REG compiler error. ---------------------------------------------------------------- Fix: The problem is that the code generator has painted itself into a corner. If you run the code generator manually, you can see what it's trying to do (code has been prettied up): ---------------------------------------------------------------- addf3 {0x4080},{a},r0 movl {i},r1 cvtfd {b},r2 divd3 r2,&{x}-8[r1],r2 cvtfd {d},r4 divd2 {y},r4 <bombs> ---------------------------------------------------------------- When it crashes, the code generator has just finished emitting the code for '(d / y)', and now needs to multiply the value of 'c' into r4. Unfortunately, 'c' is a REAL and the expression is of type REAL*8, so 'c' needs to be converted; but there are no more registers available for the conversion, since we have used up r0-r5 to hold temporary results of subexpressions. The code generator frantically attempts to rewrite the expression tree, and when it can't find anything appropriate it finally runs over one of its many 'never happens' bugs and croaks. Why didn't the code generator arrange to compute the value of one or more subexpressions on the stack? The problem is that the Sethi-Ullman algorithm which the code generator uses to estimate its needs for registers is not getting the right information; it is led to believe that it can get away with fewer registers than are really required. Looking at the intermediate code, we see the following (more prettied up and ASCIIfied output): ---------------------------------------------------------------- oreg (double) {z} oreg (float) {a} name (float) L21 + (float) reg (double *) r11 int (long) &{x}-8 + (double *) oreg (long) {i} int (long) 3 << (long) + (double *) * (indir) (double) oreg (float) {b} / (double) oreg (float) {d} oreg (double) {y} / (double) oreg (float) {c} * (double) + (double) * (double) = (double) ---------------------------------------------------------------- The parser pass of the compiler does not generate conversion operations from REAL to REAL*8! There are a few lines in fixexpr() in f77/src/f77pass1/expr.c which show that this is intentional. Apparently the code generator can handle certain conversions of REAL to REAL*8 without recourse to explicit conversion operators, since there are entries in the code table for these contingencies. However the Sethi-Ullman algorithm doesn't know about these implicit conversions and as a result it thinks that a REAL operand requires the same space regardless of whether it needs to be converted to REAL*8. There are two solutions -- either force conversion operators to be emitted or fix the Sethi-Ullman algorithm in the code generator so that it knows about implicit conversions. I will present a fix for the code generator, since both sets of changes do the job, and the code generator fixes produce better code. There seems to be a bug in the code generator which the parser pass is taking advantage of -- if an implicit REAL to REAL*8 conversion occurs to the left operand of an expression, then the Sethi-Ullman algorithm allocates fewer registers for this expression than for the corresponding expression with an explicit conversion operator forcing the left operand to be type REAL*8. There are special code table entries for REAL*8 expressions with REAL operands which can conserve registers. The problem crops up because registers are really only conserved if the REAL operand is on the left, where it can be converted into REAL*8 in situ. For operands on the right an extra register pair is required for a conversion. The obvious change to the code generator is to force the Sethi-Ullman algorithm to pretend that there is an explicit conversion operation if an implicit conversion must apply to the right operand but not the left. (A somewhat cleaner but more extensive change would be to recognize explicit conversions and handle them this way anyway...) The changes are in the routine sucomp() in order.c: ---------------------------------------------------------------- *** /tmp/,RCSt1001501 Tue Jul 31 17:05:42 1984 --- order.c Tue Jul 31 15:44:47 1984 *************** *** 75,80 number, or local equivalent */ register o, ty, sul, sur, r; o = p->in.op; ty = optype( o ); --- 75,82 ----- number, or local equivalent */ register o, ty, sul, sur, r; + register NODE *temp; + int szr; o = p->in.op; ty = optype( o ); *************** *** 117,122 sul = p->in.left->in.su; sur = p->in.right->in.su; if( o == ASSIGN ){ /* computed by doing right, then left (if not in mem), then doing it */ --- 119,130 ----- sul = p->in.left->in.su; sur = p->in.right->in.su; + szr = szty( p->in.right->in.type ); + if( szty( p->in.type ) > szr && szr >= 1 ) { + /* implicit conversion in rhs */ + szr = szty( p->in.type ); + sur = max( szr, sur ); + } if( o == ASSIGN ){ /* computed by doing right, then left (if not in mem), then doing it */ *************** *** 154,159 p->in.su = max( max(sul,sur), 1); return; case PLUS: case OR: case ER: --- 162,168 ----- p->in.su = max( max(sul,sur), 1); return; + case MUL: case PLUS: case OR: case ER: *************** *** 159,165 case ER: /* commutative ops; put harder on left */ if( p->in.right->in.su > p->in.left->in.su && !istnode(p->in.left) ){ - register NODE *temp; temp = p->in.left; p->in.left = p->in.right; p->in.right = temp; --- 168,173 ----- case ER: /* commutative ops; put harder on left */ if( p->in.right->in.su > p->in.left->in.su && !istnode(p->in.left) ){ temp = p->in.left; p->in.left = p->in.right; p->in.right = temp; *************** *** 163,168 temp = p->in.left; p->in.left = p->in.right; p->in.right = temp; } break; } --- 171,184 ----- temp = p->in.left; p->in.left = p->in.right; p->in.right = temp; + sul = p->in.left->in.su; + sur = p->in.right->in.su; + szr = szty( p->in.right->in.type ); + if( szty( p->in.type ) > szr && szr >= 1 ) { + /* implicit conversion in rhs */ + szr = szty( p->in.type ); + sur = max( szr, sur ); + } } break; } *************** *** 168,174 } /* binary op, computed by left, then right, then do op */ ! p->in.su = max(sul,szty(p->in.right->in.type)+sur); /* if( o==MUL||o==DIV||o==MOD) p->in.su = max(p->in.su,fregs); */ --- 184,190 ----- } /* binary op, computed by left, then right, then do op */ ! p->in.su = max(sul,szr+sur); /* if( o==MUL||o==DIV||o==MOD) p->in.su = max(p->in.su,fregs); */ ---------------------------------------------------------------- After installing these fixes, the code generator succeeds in processing the output of the parser pass, producing this: ---------------------------------------------------------------- movl {i},r0 cvtfd {b},r2 divd3 r2,&{x}-8[r0],r2 cvtfd {d},r0 divd2 {y},r0 cvtfd {c},r4 muld2 r0,r4 addd2 r4,r2 addf3 {0x4080},{a},r0 cvtfd r0,r0 muld3 r2,r0,{z} ---------------------------------------------------------------- Notice that the new code does not require stack temporaries to store subexpression results... This is because the algorithm is now smart enough to postpone evaluating the subexpression (a + 1.0), which is a result of adding '*' to the list of commutative operators. Without the latter change, correct code is still produced but a stack temporary is required (I won't present any more gory details). This is a pretty sensitive change, I hope it doesn't have any nasty side effects... Donn Seeley University of Utah CS Dept donn@utah-cs.arpa 40 46' 6"N 111 50' 34"W (801) 581-5668 decvax!utah-cs!donn