donn@sdchema.UUCP (11/23/83)
Subject: Bug in f77 loop optimizer generates incorrect code (serious!)
Index: /usr/src/usr.bin/f77/src/f77pass1/exec.c 4.2BSD
Description:
This problem occurs in the f77 compiler supplied on a tape
made on 8/23/83.
In f77 DO loops, a variable loop limit is squirreled away in a
local variable so that it cannot be altered during the course
of the loop. (This is because the standard says that DO loop
initializations, limits and increments are only evaluated once,
when the loop is first entered.) Unfortunately the loop limit
is saved in a temporary variable which may be reallocated when
subroutine arguments are evaluated in the loop. (Since f77
requires that arguments be passed to subroutines by reference
rather than by value, these temporaries are used to give an
address to the output of an expression.) This leads to loops
which are executed an unpredictable number of times, clearly a
major error. The problem is mitigated slightly by the fact
that unless the loop is complicated, the loop limit quantity
will migrate into register from its place on the stack, and
after this it is safe from being clobbered.
Repeat-By:
Copy the following f77 program into the file bug1.f:
--------------------------------------------------------------
      program bug1
      integer i, j, k, l, m, n, o
      j = 2
      k = 3
      l = 4
      m = 5
      n = 6
      o = 7
      do 20 i=1,k
         j = j + l
         l = i + j
         j = l * j
         l = j - l
         m = l * i
         n = l * m
         o = m - l
         m = o + 3
         n = o / m
         o = j + n
         call dummy( i+1, j+2, l+3 )
         write(unit=6,fmt=10) i
 10      format('Loop pass ', i3)
 20   continue
      stop
      end

      subroutine dummy( a, b, c )
      integer a, b, c
      return
      end
--------------------------------------------------------------
Notice that the expected output is:
Loop pass 1
Loop pass 2
Loop pass 3
In fact this program never terminates; i just keeps counting
up. To see why, compile again using the command:
f77 -d14 -S -O bug1.f
F77pass1 will print the debugging comment that offset -4 is being
reused. If you look at the assembler output file bug1.s, you
should see that the loop limit is stored at -4(fp), and that the
call to dummy() clobbers -4(fp) with the value of i+1... hence
the infinite loop.
Fix:
The best fix I can find is to cause the loop limit to be put in a
TADDR-type temporary rather than a TTEMP temporary. TTEMP stack
temporaries can be re-used quickly, while TADDR temporaries are
not recycled until the end of a routine. The loop increment is
also a TTEMP temporary, so for safety I have also made that a
TADDR temporary. The changes to src/f77pass1/exec.c are:
***************
*** 444,450
if( CONSTLIMIT )
ctlstack->domax = DOLIMIT;
else
! ctlstack->domax = (expptr) mktemp(dotype, PNULL);
if( CONSTINCR )
{
--- 461,467 -----
if( CONSTLIMIT )
ctlstack->domax = DOLIMIT;
else
! ctlstack->domax = (expptr) mkaltemp(dotype, PNULL);
if( CONSTINCR )
{
***************
*** 455,461
}
else
{
! ctlstack->dostep = (expptr) mktemp(dotype, PNULL);
ctlstack->dostepsign = VARSTEP;
}
--- 472,478 -----
}
else
{
! ctlstack->dostep = (expptr) mkaltemp(dotype, PNULL);
ctlstack->dostepsign = VARSTEP;
}
Donn Seeley UCSD Chemistry Dept. RRCF ucbvax!sdcsvax!sdchema!donn
32 52' 30"N 117 14' 25"W (619) 452-4016 sdcsvax!sdchema!donn@noscvax

donn@sdchema.UUCP (11/23/83)
Subject: Computed GOTOs can cause f77 to dump core
Index: /usr/src/usr.bin/f77/src/f77pass1/regalloc.c 4.2BSD
Description:
This problem occurs in the f77 compiler supplied on a tape
made on 8/23/83.
While compiling a program with 'f77 -O ...' that contains
a computed GOTO, f77 stops with a 'Termination code 11' or
'Termination code 132' and a core dump of f77pass1 is left
behind. This only occurs with a certain very large program
which I have that is generated with a ratfor-like pre-
processor and contains zillions of gotos and DO loops.
Repeat-By:
The program that causes this is much too large to send by
mail or news, but it is available on request. I have tried
replicating the problem with smaller programs, to no avail.
The program contains the line:
GOTO(16,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15),COMBO
Fix:
Fortunately the error is fairly obvious. Running dbx on
f77pass1 shows that it dies with the PC in the routine alreg()
of regalloc.c. At that point it is in code that handles
computed GOTOs; it is running through a list of all the labels
used by the computed GOTO statement. If it finds that a label
is outside of the current DO loop, it flags it and breaks the
loop. If it runs through the list and finds no such branches,
it should just fall out. Unfortunately the test for the last
label at the top of the for loop has an off-by-one error and
skips off the end into never-never land. This test IS handled
correctly in very similar code earlier in the same routine, so
it is clear what the correct code should be (in
src/f77pass1/regalloc.c):
***************
*** 835,841
case SKCMGOTO:
lp = (struct Labelblock **) sp->ctlinfo;
! for (i = 0; i <= sp->label; i++, lp++)
if (!locallabel((*lp)->labelno))
{
gensetall(sp);
--- 861,867 -----
case SKCMGOTO:
lp = (struct Labelblock **) sp->ctlinfo;
! for (i = 0; i < sp->label; i++, lp++)
if (!locallabel((*lp)->labelno))
{
gensetall(sp);
Donn Seeley UCSD Chemistry Dept. RRCF ucbvax!sdcsvax!sdchema!donn
32 52' 30"N 117 14' 25"W (619) 452-4016 sdcsvax!sdchema!donn@noscvax

donn@sdchema.UUCP (11/28/83)
Subject: f77 won't put REAL variables in register
Index: usr.bin/f77/src/f77pass1/regalloc.c 4.2BSD
Description:
This problem occurs in the f77 compiler supplied on a tape
made on 8/23/83.
The new f77 compiler will put INTEGERs in register but not
REALs even though the VAX allows REAL (4-byte floating point)
values to appear in general registers. This adds unnecessary
overhead to programs which do lots of computations with REAL
values (that is to say, virtually all typical f77 programs).
Repeat-By:
Clip out the following f77 program and put it in a file named
bug4.f:
--------------------------------------------------------------------
      program bug4
      integer i
      real a, b, c
      a = 2.0
      b = 1.0
      c = 0.999
      do 100 i = 1, 1000000
         a = (a + b) * c
 100  continue
      stop
      end
--------------------------------------------------------------------
Compile this program with the command 'f77 -S -O -c bug4.f'.
The assembler output shows that REAL variables are not put in
register while integer values are. The following is a
pretty-printed version of the assembler file, where variables
of the form 'v.4-v.1(r11)' are written '{variable}' and
addresses of constants of the form 'L25' are written
'{constant}' (you can get the pretty-printer by sending mail to
me asking for it):
--------------------------------------------------------------------
.globl _MAIN_
.set LF1,0
_MAIN_:
.word LWM1
subl2 $LF1,sp
jmp L12
L13:
movl {0x4100},{a}
movl {0x4080},{b}
movl {0xbe77407f},{c}
movl {i},r10
movl $1,r10
L17:
addf3 {b},{a},r0
mulf3 {c},r0,{a}
aobleq $1000000,r10,L17
movl r10,{i}
pushl $0
pushal {00,00}
calls $2,_s_stop
ret
.align 1
_bug4_:
.word LWM1
L12:
moval v.1,r11
jmp L13
--------------------------------------------------------------------
Notice that the INTEGER variable 'i' is put in register 10 but
the only time that a register is used for a REAL is when it is
necessary to hold the intermediate result of an expression
computation; ordinary REAL variables are not assigned registers
when DO loops are optimized.
Fix:
A simple change can be made to the compiler to cause it to
assign REAL variables to registers -- in fact the change is so
simple it is suspicious; why didn't anyone do this before? But
I have been unable to find any evidence that this change is
harmful, and all of the programs I have tested the new version
of the compiler on have worked correctly. The change is in
f77/src/f77pass1/regalloc.c:
--------------------------------------------------------------------
***************
*** 31,36
#define VARTABSIZE 1009
#define TABLELIMIT 12
#define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG)
#define ISREGTYPE(x) ONEOF(x, MSKREGTYPES)
--- 34,42 -----
#define VARTABSIZE 1009
#define TABLELIMIT 12
+ #if HERE==VAX
+ #define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG) | M(TYREAL)
+ #else
#define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG)
#endif
***************
*** 32,37
#define TABLELIMIT 12
#define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG)
#define ISREGTYPE(x) ONEOF(x, MSKREGTYPES)
--- 38,44 -----
#define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG) | M(TYREAL)
#else
#define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG)
+ #endif
#define ISREGTYPE(x) ONEOF(x, MSKREGTYPES)
--------------------------------------------------------------------
Notice that the change does not affect DOUBLE PRECISION
variables (it would probably take a lot more work to get them
in register).
After changing the compiler in this way, the code that is
generated for 'bug4.f' changes to this:
--------------------------------------------------------------------
.globl _MAIN_
.set LF1,0
_MAIN_:
.word LWM1
subl2 $LF1,sp
jmp L12
L13:
movl {0x4100},{a}
movl {0x4080},{b}
movl {0xbe77407f},{c}
movl {a},r10
movl {b},r9
movl {c},r8
movl {i},r7
movl $1,r7
L17:
addf3 r9,r10,r0
mulf3 r8,r0,r10
aobleq $1000000,r7,L17
movl r7,{i}
movl r10,{a}
pushl $0
pushal {00,00}
calls $2,_s_stop
ret
.align 1
_bug4_:
.word LWM1
L12:
moval v.1,r11
jmp L13
--------------------------------------------------------------------
I timed the old and new versions of 'bug4' and observed
the following values ('time' is the 'user' time returned
by the C-shell 'time' command):
--------------------------------------------------------------------
Version   Time (sec)   Type of System
Old       32.2         VAX11/750, no FPA
New       29.0         VAX11/750, no FPA
Old       11.7         VAX11/750 with FPA
New        8.7         VAX11/750 with FPA
--------------------------------------------------------------------
On systems with no FPA, the operand fetch time is much smaller
than the actual computation time for floating point operations
-- notice that the absolute saving is nearly independent of the
FPA, being approx. 3 seconds in both cases (3.2 without, 3.0
with). Still the improvement is 10% even without an FPA (closer
to 25% if you have one).
One oddity -- I notice that the compiler invariably translates
floating point assignment operations to 'movl' instructions
instead of 'movf' instructions. I think this is because 'movl'
is faster than 'movf' and the compiler guarantees that nothing
depends on side effects of assignments, but I haven't pinned
this down for sure yet.
Donn Seeley UCSD Chemistry Dept. RRCF ucbvax!sdcsvax!sdchema!donn
32 52' 30"N 117 14' 25"W (619) 452-4016 sdcsvax!sdchema!donn@noscvax

sanders@menlo70.UUCP (Rex Sanders) (12/05/83)
References: <960@sdchema.UUCP>
I ran your example program (that modifies the values of constants
passed to subroutines) on a Honeywell Multics system:
----------
d*fortran con
New Fortran 10.1
d*con
Error: no_write_permission condition by >user_dir_dir>Pascalx>RSanders>con|65
(line 36)
referencing >user_dir_dir>Pascalx>RSanders>con|0
level 2,10d*
----------
On Multics, programs are split up into many areas, and constants are
in the same read-only segment as the executable code. Unix f77 is a
step back to IBM 360 Fortran G&H in this respect.
The old IBM compiler had many interesting "features", which I will
bore no-one with unless asked - preferably in net.lang.f77.
-- Rex, the ancient Fortran hacker