donn@sdchema.UUCP (11/23/83)
Subject: Bug in f77 loop optimizer generates incorrect code (serious!)
Index: /usr/src/usr.bin/f77/src/f77pass1/exec.c 4.2BSD

Description:

This problem occurs in the f77 compiler supplied on a tape made on
8/23/83. In f77 DO loops, a variable loop limit is squirreled away in
a local variable so that it cannot be altered during the course of the
loop. (This is because the standard says that DO loop initializations,
limits and increments are only evaluated once, when the loop is first
entered.) Unfortunately the loop limit is saved in a temporary
variable which may be reallocated when subroutine arguments are
evaluated in the loop. (Since f77 requires that arguments be passed to
subroutines by reference rather than by value, these temporaries are
used to give an address to the output of an expression.) This leads to
loops which are executed an unpredictable number of times, clearly a
major error. The problem is mitigated slightly by the fact that unless
the loop is complicated, the loop limit quantity will migrate into
register from its place on the stack, and after this it is safe from
being clobbered.

Repeat-By:

Copy the following f77 program into the file bug1.f:

--------------------------------------------------------------
      program bug1
      integer i, j, k, l, m, n, o
      j = 2
      k = 3
      l = 4
      m = 5
      n = 6
      o = 7
      do 20 i=1,k
      j = j + l
      l = i + j
      j = l * j
      l = j - l
      m = l * i
      n = l * m
      o = m - l
      m = o + 3
      n = o / m
      o = j + n
      call dummy( i+1, j+2, l+3 )
      write(unit=6,fmt=10) i
   10 format('Loop pass ', i3)
   20 continue
      stop
      end

      subroutine dummy( a, b, c )
      integer a, b, c
      return
      end
--------------------------------------------------------------

Notice that the expected output is:

Loop pass 1
Loop pass 2
Loop pass 3

In fact this program goes into an infinite loop, counting up to
infinity. To see why, compile again using the command:

f77 -d14 -S -O bug1.f

F77pass1 will print the debugging comment that offset -4 is being
reused. If you look at the assembler output file bug1.s, you should
see that the loop limit is stored at -4(fp), and that the call to
dummy() clobbers -4(fp) with the value of i+1... hence the infinite
loop.

Fix:

The best fix I can find is to cause the loop limit to be put in a
TADDR-type temporary rather than a TTEMP temporary. TTEMP stack
temporaries can be re-used quickly, while TADDR temporaries are not
recycled until the end of a routine. The loop increment is also a
TTEMP temporary, so for safety I have also made that a TADDR
temporary. The changes to src/f77pass1/exec.c are:

***************
*** 444,450
  if( CONSTLIMIT )
      ctlstack->domax = DOLIMIT;
  else
!     ctlstack->domax = (expptr) mktemp(dotype, PNULL);
  if( CONSTINCR )
      {
--- 461,467 -----
  if( CONSTLIMIT )
      ctlstack->domax = DOLIMIT;
  else
!     ctlstack->domax = (expptr) mkaltemp(dotype, PNULL);
  if( CONSTINCR )
      {
***************
*** 455,461
      }
  else
      {
!     ctlstack->dostep = (expptr) mktemp(dotype, PNULL);
      ctlstack->dostepsign = VARSTEP;
      }
--- 472,478 -----
      }
  else
      {
!     ctlstack->dostep = (expptr) mkaltemp(dotype, PNULL);
      ctlstack->dostepsign = VARSTEP;
      }

Donn Seeley  UCSD Chemistry Dept. RRCF  ucbvax!sdcsvax!sdchema!donn
32 52' 30"N 117 14' 25"W  (619) 452-4016  sdcsvax!sdchema!donn@noscvax
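As an illustration only, the slot-reuse failure described in this report can be modelled in a few lines of C. This is not the f77pass1 source: `struct frame`, `temp_alloc`, `temp_free` and `run_loop` are hypothetical names, and the early release of the limit slot is simply assumed in order to reproduce the effect (the report does not say why the compiler considers the slot free). The pass cap exists only so that the model terminates.

```c
#include <assert.h>

/* Hypothetical model of a stack frame with a few temporary slots. */
enum { NSLOTS = 4 };

struct frame {
    int value[NSLOTS];   /* contents of each slot */
    int busy[NSLOTS];    /* nonzero while the slot is reserved */
};

/* Quickly recycled temporary: hand out the lowest free slot. */
int temp_alloc(struct frame *f)
{
    int i;
    for (i = 0; i < NSLOTS; i++) {
        if (!f->busy[i]) {
            f->busy[i] = 1;
            return i;
        }
    }
    return -1;           /* no free slot */
}

void temp_free(struct frame *f, int slot)
{
    assert(slot >= 0 && slot < NSLOTS);
    f->busy[slot] = 0;
}

/*
 * Model of "do i = 1, limit" whose body passes i+1 by reference.
 * If release_limit_early is nonzero, the slot holding the limit is
 * released before the body runs (assumed, to mimic the bug), so the
 * argument temporary lands on top of it. Returns the number of
 * passes made, capped at max_passes.
 */
int run_loop(int limit, int release_limit_early, int max_passes)
{
    struct frame f = {{0}, {0}};
    int limit_slot = temp_alloc(&f);
    int passes = 0;
    int i;

    assert(limit_slot >= 0);
    f.value[limit_slot] = limit;
    if (release_limit_early)
        temp_free(&f, limit_slot);

    for (i = 1; i <= f.value[limit_slot] && passes < max_passes; i++) {
        int arg_slot = temp_alloc(&f);   /* address for the value i+1 */
        assert(arg_slot >= 0);
        f.value[arg_slot] = i + 1;       /* may overwrite the limit */
        temp_free(&f, arg_slot);
        passes++;
    }
    return passes;
}
```

Under this model `run_loop(3, 0, 100)` makes 3 passes, while `run_loop(3, 1, 100)` stops only at the 100-pass cap, because each pass rewrites the limit to i+1. Keeping the limit slot reserved for the whole routine corresponds to the TADDR behaviour the fix relies on.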
donn@sdchema.UUCP (11/23/83)
Subject: Computed GOTOs can cause f77 to dump core
Index: /usr/src/usr.bin/f77/src/f77pass1/regalloc.c 4.2BSD

Description:

This problem occurs in the f77 compiler supplied on a tape made on
8/23/83. While compiling a program with 'f77 -O ...' that contains a
computed GOTO, f77 stops with a 'Termination code 11' or 'Termination
code 132' and a core dump of f77pass1 is left behind. This only occurs
with a certain very large program which I have that is generated with
a ratfor-like preprocessor and contains zillions of gotos and DO
loops.

Repeat-By:

The program that causes this is much too large to send by mail or
news, but it is available on request. I have tried replicating the
problem with smaller programs, to no avail. The program contains the
line:

      GOTO(16,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15),COMBO

Fix:

Fortunately the error is fairly obvious. Running dbx on f77pass1 shows
that it dies with the PC in the routine alreg() of regalloc.c. At that
point it is in code that handles computed GOTOs; it is running through
a list of all the labels used by the computed GOTO statement. If it
finds that a label is outside of the current DO loop, it flags it and
breaks the loop. If it runs through the list and finds no such
branches, it should just fall out. Unfortunately the test for the last
label at the top of the for loop has an off-by-one error and skips off
the end into never-never land. This test IS handled correctly in very
similar code earlier in the same routine, so it is clear what the
correct code should be (in src/f77pass1/regalloc.c):

***************
*** 835,841
  case SKCMGOTO:
      lp = (struct Labelblock **) sp->ctlinfo;
!     for (i = 0; i <= sp->label; i++, lp++)
          if (!locallabel((*lp)->labelno))
              {
              gensetall(sp);
--- 861,867 -----
  case SKCMGOTO:
      lp = (struct Labelblock **) sp->ctlinfo;
!     for (i = 0; i < sp->label; i++, lp++)
          if (!locallabel((*lp)->labelno))
              {
              gensetall(sp);

Donn Seeley  UCSD Chemistry Dept. RRCF  ucbvax!sdcsvax!sdchema!donn
32 52' 30"N 117 14' 25"W  (619) 452-4016  sdcsvax!sdchema!donn@noscvax
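The off-by-one in the loop bound can be sketched in isolation. The following is a simplified model, not the alreg() code: `count_visits` and `first_nonlocal` are hypothetical helpers, and the "label is local" test is reduced to an assumed numeric range check rather than the real locallabel() lookup. It shows that a `<=` bound visits one entry more than the list holds, which only matters when no earlier entry ends the scan.

```c
/*
 * Count how many list entries a scan would visit for a list of
 * nlabels entries, using either the inclusive (buggy) bound or the
 * exclusive (fixed) bound. No memory is touched; only the trip count
 * is modelled.
 */
int count_visits(int nlabels, int inclusive)
{
    int i, visits = 0;
    for (i = 0; inclusive ? i <= nlabels : i < nlabels; i++)
        visits++;
    return visits;
}

/*
 * Scan with the corrected bound: return the index of the first label
 * outside [lo, hi] (an assumed stand-in for "not local to the loop"),
 * or -1 if every label is local and the scan simply falls out.
 */
int first_nonlocal(const int *labelno, int nlabels, int lo, int hi)
{
    int i;
    for (i = 0; i < nlabels; i++)
        if (labelno[i] < lo || labelno[i] > hi)
            return i;
    return -1;
}
```

For the 16-label GOTO quoted in the report, `count_visits(16, 0)` is 16 and `count_visits(16, 1)` is 17, so the inclusive form would dereference a seventeenth pointer that was never stored. That is consistent with the failure appearing only when the scan runs all the way to the end of the list.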
donn@sdchema.UUCP (11/28/83)
Subject: f77 won't put REAL variables in register
Index: usr.bin/f77/src/f77pass1/regalloc.c 4.2BSD

Description:

This problem occurs in the f77 compiler supplied on a tape made on
8/23/83. The new f77 compiler will put INTEGERs in register but not
REALs even though the VAX allows REAL (4-byte floating point) values
to appear in general registers. This adds unnecessary overhead to
programs which do lots of computations with REAL values (that is to
say, virtually all typical f77 programs).

Repeat-By:

Clip out the following f77 program and put it in a file named bug4.f:

--------------------------------------------------------------------
      program bug4
      integer i
      real a, b, c
      a = 2.0
      b = 1.0
      c = 0.999
      do 100 i = 1, 1000000
      a = (a + b) * c
  100 continue
      stop
      end
--------------------------------------------------------------------

Compile this program with the command 'f77 -S -O -c bug4.f'. The
assembler output shows that REAL variables are not put in register
while integer values are. The following is a pretty-printed version of
the assembler file, where variables of the form 'v.4-v.1(r11)' are
written '{variable}' and addresses of constants of the form 'L25' are
written '{constant}' (you can get the pretty-printer by sending mail
to me asking for it):

--------------------------------------------------------------------
        .globl  _MAIN_
        .set    LF1,0
_MAIN_:
        .word   LWM1
        subl2   $LF1,sp
        jmp     L12
L13:
        movl    {0x4100},{a}
        movl    {0x4080},{b}
        movl    {0xbe77407f},{c}
        movl    {i},r10
        movl    $1,r10
L17:
        addf3   {b},{a},r0
        mulf3   {c},r0,{a}
        aobleq  $1000000,r10,L17
        movl    r10,{i}
        pushl   $0
        pushal  {00,00}
        calls   $2,_s_stop
        ret
        .align  1
_bug4_:
        .word   LWM1
L12:
        moval   v.1,r11
        jmp     L13
--------------------------------------------------------------------

Notice that the INTEGER variable 'i' is put in register 10 but the
only time that a register is used for a REAL is when it is necessary
to hold the intermediate result of an expression computation; ordinary
REAL variables are not assigned registers when DO loops are optimized.

Fix:

A simple change can be made to the compiler to cause it to assign REAL
variables to registers -- in fact the change is so simple it is
suspicious; why didn't anyone do this before? But I have been unable
to find any evidence that this change is harmful, and all of the
programs I have tested the new version of the compiler on have worked
correctly. The change is in f77/src/f77pass1/regalloc.c:

--------------------------------------------------------------------
***************
*** 31,36
  #define VARTABSIZE 1009
  #define TABLELIMIT 12
  #define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG)
  #define ISREGTYPE(x) ONEOF(x, MSKREGTYPES)
--- 34,42 -----
  #define VARTABSIZE 1009
  #define TABLELIMIT 12
+ #if HERE==VAX
+ #define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG) | M(TYREAL)
+ #else
  #define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG)
  #endif
***************
*** 32,37
  #define TABLELIMIT 12
  #define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG)
  #define ISREGTYPE(x) ONEOF(x, MSKREGTYPES)
--- 38,44 -----
  #define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG) | M(TYREAL)
  #else
  #define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG)
+ #endif
  #define ISREGTYPE(x) ONEOF(x, MSKREGTYPES)
--------------------------------------------------------------------

Notice that the change does not affect DOUBLE PRECISION variables (it
would probably take a lot more work to get them in register).
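The effect of adding M(TYREAL) to the mask can be sketched as follows. The macro names M and ONEOF and the TY* type names appear in the diff, but their definitions here are assumptions made for illustration: the numeric type codes are invented, `MSK_OLD`, `MSK_VAX`, `isregtype_old` and `isregtype_vax` are hypothetical names, and the masks are parenthesized in this sketch so that the `&` applies to the whole mask.

```c
/* Type codes: values are invented for this sketch, not taken from f77. */
enum { TYADDR = 1, TYSHORT, TYLONG, TYREAL, TYDREAL, TYLOGICAL };

/* Assumed definitions: one bit per type code, membership by AND. */
#define M(x)        (1 << (x))
#define ONEOF(x, y) (M(x) & (y))

/* Register-eligible types before and after the change. */
#define MSK_OLD (M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG))
#define MSK_VAX (MSK_OLD | M(TYREAL))

/* Return 1 if a variable of type t may be assigned to a register. */
int isregtype_old(int t) { return ONEOF(t, MSK_OLD) != 0; }
int isregtype_vax(int t) { return ONEOF(t, MSK_VAX) != 0; }
```

With these assumed definitions, TYREAL fails the old test and passes the new one, while TYDREAL fails both, which matches the remark that DOUBLE PRECISION variables are unaffected.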
After changing the compiler in this way, the code that is generated
for 'bug4.f' changes to this:

--------------------------------------------------------------------
        .globl  _MAIN_
        .set    LF1,0
_MAIN_:
        .word   LWM1
        subl2   $LF1,sp
        jmp     L12
L13:
        movl    {0x4100},{a}
        movl    {0x4080},{b}
        movl    {0xbe77407f},{c}
        movl    {a},r10
        movl    {b},r9
        movl    {c},r8
        movl    {i},r7
        movl    $1,r7
L17:
        addf3   r9,r10,r0
        mulf3   r8,r0,r10
        aobleq  $1000000,r7,L17
        movl    r7,{i}
        movl    r10,{a}
        pushl   $0
        pushal  {00,00}
        calls   $2,_s_stop
        ret
        .align  1
_bug4_:
        .word   LWM1
L12:
        moval   v.1,r11
        jmp     L13
--------------------------------------------------------------------

I timed the old and new versions of 'bug4' and observed the following
values ('time' is the 'user' time returned by the C-shell 'time'
command):

--------------------------------------------------------------------
Version    Time (sec)    Type of System
Old        32.2          VAX11/750, no FPA
New        29.0          VAX11/750, no FPA
Old        11.7          VAX11/750 with FPA
New         8.7          VAX11/750 with FPA
--------------------------------------------------------------------

On systems with no FPA, the operand fetch time is much smaller than
the actual computation time for floating point operations -- notice
that the improvement is independent of using an FPA, being approx. 3
seconds in both cases. Still the improvement is 10% even without an
FPA (closer to 25% if you have one).

One oddity -- I notice that the compiler invariably translates
floating point assignment operations to 'movl' instructions instead of
'movf' instructions. I think this is because 'movl' is faster than
'movf' and the compiler guarantees that nothing depends on side
effects of assignments, but I haven't pinned this down for sure yet.

Donn Seeley  UCSD Chemistry Dept. RRCF  ucbvax!sdcsvax!sdchema!donn
32 52' 30"N 117 14' 25"W  (619) 452-4016  sdcsvax!sdchema!donn@noscvax
sanders@menlo70.UUCP (Rex Sanders) (12/05/83)
References: <960@sdchema.UUCP>

I ran your example program (that modifies the values of constants
passed to subroutines) on a Honeywell Multics system:

----------
d*fortran con
New Fortran 10.1
d*con
Error: no_write_permission condition by >user_dir_dir>Pascalx>RSanders>con|65 (line 36)
referencing >user_dir_dir>Pascalx>RSanders>con|0
level 2,10
d*
----------

On Multics, programs are split up into many areas, and constants are
in the same read-only segment as the executable code. Unix f77 is a
step back to IBM 360 Fortran G&H in this respect. The old IBM compiler
had many interesting "features", which I will bore no-one with unless
asked - preferably in net.lang.f77.

-- Rex, the ancient Fortran hacker