[net.bugs.4bsd] 4.2 BSD f77 compiler bug

donn@sdchema.UUCP (11/23/83)

Subject: Bug in f77 loop optimizer generates incorrect code (serious!)
Index:	/usr/src/usr.bin/f77/src/f77pass1/exec.c 4.2BSD

Description:
	This problem occurs in the f77 compiler supplied on a tape
	made on 8/23/83.

	In f77 DO loops, a variable loop limit is squirreled away in a
	local variable so that it cannot be altered during the course
	of the loop.  (This is because the standard says that DO loop
	initializations, limits and increments are only evaluated once,
	when the loop is first entered.) Unfortunately the loop limit
	is saved in a temporary variable which may be reallocated when
	subroutine arguments are evaluated in the loop.  (Since f77
	requires that arguments be passed to subroutines by reference
	rather than by value, these temporaries are used to give an
	address to the output of an expression.) This leads to loops
	which are executed an unpredictable number of times, clearly a
	major error.  The problem is mitigated slightly by the fact
	that unless the loop is complicated, the loop limit quantity
	will migrate into register from its place on the stack, and
	after this it is safe from being clobbered.
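
	As a rough illustration of the once-only rule, here is a minimal
	sketch in C.  It is not code the compiler emits, and the helper
	name do_loop_passes is made up for this example.  The limit is
	copied into a private variable on entry, so changing the original
	variable inside the body does not change the number of passes.
	The bug described above amounts to something else overwriting
	that private copy.

```c
/* Hypothetical helper, not compiler code: models "do 20 i = 1, k"
 * with a step of 1 and returns how many passes the loop makes. */
int do_loop_passes(int *k)
{
    int limit = *k;   /* the limit is captured once, on entry */
    int passes = 0;
    int i;

    for (i = 1; i <= limit; i++) {
        *k += 10;     /* the body changes k; the saved limit is unaffected */
        passes++;
    }
    return passes;
}
```

	Called with k = 3, the helper makes three passes even though k
	itself has grown to 33 by the time it returns.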

Repeat-By:
	Copy the following f77 program into the file bug1.f:

	--------------------------------------------------------------
	program bug1
	integer i, j, k, l, m, n, o

	j = 2
	k = 3
	l = 4
	m = 5
	n = 6
	o = 7

	do 20 i=1,k

		j = j + l
		l = i + j
		j = l * j
		l = j - l
		m = l * i
		n = l * m
		o = m - l
		m = o + 3
		n = o / m
		o = j + n
		call dummy( i+1, j+2, l+3 )

		write(unit=6,fmt=10) i
10		format('Loop pass ', i3)

20	continue

	stop
	end

	subroutine dummy( a, b, c )
	integer a, b, c

	return
	end
	--------------------------------------------------------------

	Notice that the expected output is:

		Loop pass   1
		Loop pass   2
		Loop pass   3

	In fact this program never terminates; the pass count just keeps
	climbing.  To see why, compile again using the command:

		f77 -d14 -S -O bug1.f

	F77pass1 will print the debugging comment that offset -4 is being
	reused.  If you look at the assembler output file bug1.s, you
	should see that the loop limit is stored at -4(fp), and that the
	call to dummy() clobbers -4(fp) with the value of i+1...  hence
	the infinite loop.

Fix:
	The best fix I can find is to cause the loop limit to be put in a
	TADDR-type temporary rather than a TTEMP temporary.  TTEMP stack
	temporaries can be re-used quickly, while TADDR temporaries are
	not recycled until the end of a routine.  The loop increment is
	also a TTEMP temporary, so for safety I have also made that a
	TADDR temporary.  The changes to src/f77pass1/exec.c are:

	***************
	*** 444,450
	    if( CONSTLIMIT )
	      ctlstack->domax = DOLIMIT;
	    else
	!     ctlstack->domax = (expptr) mktemp(dotype, PNULL);

	    if( CONSTINCR )
	      {

	--- 461,467 -----
	    if( CONSTLIMIT )
	      ctlstack->domax = DOLIMIT;
	    else
	!     ctlstack->domax = (expptr) mkaltemp(dotype, PNULL);

	    if( CONSTINCR )
	      {
	***************
	*** 455,461
	      }
	    else
	      {
	!       ctlstack->dostep = (expptr) mktemp(dotype, PNULL);
		ctlstack->dostepsign = VARSTEP;
	      }


	--- 472,478 -----
	      }
	    else
	      {
	!       ctlstack->dostep = (expptr) mkaltemp(dotype, PNULL);
		ctlstack->dostepsign = VARSTEP;
	      }
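
	The difference between the two kinds of temporary can be shown
	with a toy model in C.  This is a sketch under stated assumptions,
	not the allocator in f77pass1: the names QUICK_REUSE,
	WHOLE_ROUTINE, alloc_slot, end_statement and run_loop are all
	invented here, and the real compiler's bookkeeping is certainly
	more involved.  The model hands out the lowest free stack slot
	and releases quick-reuse slots at the end of each statement.

```c
#define NSLOTS 4

/* Stand-ins for the two temporary classes named in the fix.  These are
 * illustrative labels, not the compiler's own definitions. */
enum kind { UNUSED, QUICK_REUSE /* like TTEMP */, WHOLE_ROUTINE /* like TADDR */ };

struct frame {
    enum kind kind[NSLOTS];
    int value[NSLOTS];
};

/* Hand out the lowest-numbered unused slot, or -1 if the frame is full. */
static int alloc_slot(struct frame *f, enum kind k)
{
    int s;
    for (s = 0; s < NSLOTS; s++)
        if (f->kind[s] == UNUSED) {
            f->kind[s] = k;
            return s;
        }
    return -1;
}

/* End of a statement: quick-reuse slots go back to the pool. */
static void end_statement(struct frame *f)
{
    int s;
    for (s = 0; s < NSLOTS; s++)
        if (f->kind[s] == QUICK_REUSE)
            f->kind[s] = UNUSED;
}

/* Models "do 20 i = 1, k" whose body is "call dummy(i+1)".
 * Returns the number of passes, giving up after cap passes. */
int run_loop(enum kind limit_kind, int k, int cap)
{
    struct frame f;
    int s, i, limit, arg, passes = 0;

    for (s = 0; s < NSLOTS; s++) {
        f.kind[s] = UNUSED;
        f.value[s] = 0;
    }

    limit = alloc_slot(&f, limit_kind);
    f.value[limit] = k;          /* loop limit saved in a temporary */
    end_statement(&f);           /* a quick-reuse limit slot is released here */

    for (i = 1; i <= f.value[limit] && passes < cap; i++) {
        arg = alloc_slot(&f, QUICK_REUSE);
        f.value[arg] = i + 1;    /* by-reference argument needs an address */
        end_statement(&f);
        passes++;
    }
    return passes;
}
```

	In this model run_loop(QUICK_REUSE, 3, 1000) runs until the cap
	because the argument temporary lands on the limit's slot, while
	run_loop(WHOLE_ROUTINE, 3, 1000) makes the expected three passes.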

Donn Seeley    UCSD Chemistry Dept. RRCF    ucbvax!sdcsvax!sdchema!donn
32 52' 30"N 117 14' 25"W  (619) 452-4016    sdcsvax!sdchema!donn@noscvax

donn@sdchema.UUCP (11/23/83)

Subject: Computed GOTOs can cause f77 to dump core
Index:	/usr/src/usr.bin/f77/src/f77pass1/regalloc.c 4.2BSD

Description:
	This problem occurs in the f77 compiler supplied on a tape
	made on 8/23/83.

	While compiling a program with 'f77 -O ...' that contains
	a computed GOTO, f77 stops with a 'Termination code 11' or
	'Termination code 132' and a core dump of f77pass1 is left
	behind.  This only occurs with a certain very large program
	which I have that is generated with a ratfor-like pre-
	processor and contains zillions of gotos and DO loops.

Repeat-By:
	The program that causes this is much too large to send by
	mail or news, but it is available on request.  I have tried
	replicating the problem with smaller programs, to no avail.
	The program contains the line:

		GOTO(16,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15),COMBO

Fix:
	Fortunately the error is fairly obvious.  Running dbx on
	f77pass1 shows that it dies with the PC in the routine alreg()
	of regalloc.c.  At that point it is in code that handles
	computed GOTOs; it is running through a list of all the labels
	used by the computed GOTO statement.  If it finds that a label
	is outside of the current DO loop, it flags it and breaks out
	of the scan.  If it runs through the list and finds no such branches,
	it should just fall out.  Unfortunately the test for the last
	label at the top of the for loop has an off-by-one error and
	skips off the end into never-never land.  This test IS handled
	correctly in very similar code earlier in the same routine, so
	it is clear what the correct code should be (in
	src/f77pass1/regalloc.c):

	***************
	*** 835,841
	  
		    case SKCMGOTO:
		      lp = (struct Labelblock **) sp->ctlinfo;
	! 	      for (i = 0; i <= sp->label; i++, lp++)
			if (!locallabel((*lp)->labelno))
			  {
			    gensetall(sp);

	--- 861,867 -----
	  
		    case SKCMGOTO:
		      lp = (struct Labelblock **) sp->ctlinfo;
	! 	      for (i = 0; i < sp->label; i++, lp++)
			if (!locallabel((*lp)->labelno))
			  {
			    gensetall(sp);
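
	The effect of the two loop bounds can be seen in a small C
	sketch.  It assumes, as the fix implies, that the count field
	holds the number of labels in the list; the function name and
	the guard value are made up, and the array carries one extra
	element so that the over-read stays inside the array and can be
	observed safely.

```c
#define NLABELS 16   /* the GOTO quoted above has 16 labels */

/* Returns 1 if the scan looked at the guard element that sits one
 * past the real list, 0 if it stopped at the last real label. */
int scan_touches_guard(int inclusive_bound)
{
    int labels[NLABELS + 1];
    int i, touched_guard = 0;

    for (i = 0; i < NLABELS; i++)
        labels[i] = i + 1;       /* real label numbers */
    labels[NLABELS] = -1;        /* guard, not a real label */

    for (i = 0; inclusive_bound ? i <= NLABELS : i < NLABELS; i++)
        if (labels[i] == -1)
            touched_guard = 1;   /* the "<=" form reaches this element */

    return touched_guard;
}
```

	With the "<=" bound the scan reads a seventeenth element; in the
	compiler there is no guard there, only whatever happens to follow
	the list, which is then dereferenced as a label pointer.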


donn@sdchema.UUCP (11/28/83)

Subject: f77 won't put REAL variables in register
Index:	/usr/src/usr.bin/f77/src/f77pass1/regalloc.c 4.2BSD

Description:
	This problem occurs in the f77 compiler supplied on a tape
	made on 8/23/83.

	The new f77 compiler will put INTEGERs in register but not
	REALs even though the VAX allows REAL (4-byte floating point)
	values to appear in general registers.  This adds unnecessary
	overhead to programs which do lots of computations with REAL
	values (that is to say, virtually all typical f77 programs).

Repeat-By:
	Clip out the following f77 program and put it in a file named
	bug4.f:

	--------------------------------------------------------------------
		program bug4

		integer i
		real a, b, c

		a = 2.0
		b = 1.0
		c = 0.999

		do 100 i = 1, 1000000

			a = (a + b) * c

	100	continue

		stop
		end
	--------------------------------------------------------------------

	Compile this program with the command 'f77 -S -O -c bug4.f'.
	The assembler output shows that REAL variables are not put in
	register while integer values are.  The following is a
	pretty-printed version of the assembler file, where variables
	of the form 'v.4-v.1(r11)' are written '{variable}' and
	addresses of constants of the form 'L25' are written
	'{constant}' (you can get the pretty-printer by sending mail to
	me asking for it):

	--------------------------------------------------------------------
		.globl	_MAIN_
		.set	LF1,0
	_MAIN_:
		.word	LWM1
		subl2	$LF1,sp
		jmp	L12
	L13:
		movl	{0x4100},{a}
		movl	{0x4080},{b}
		movl	{0xbe77407f},{c}
		movl	{i},r10
		movl	$1,r10
	L17:
		addf3	{b},{a},r0
		mulf3	{c},r0,{a}
		aobleq	$1000000,r10,L17
		movl	r10,{i}
		pushl	$0
		pushal	{00,00}
		calls	$2,_s_stop
		ret
		.align	1
	_bug4_:
		.word	LWM1
	L12:
		moval	v.1,r11
		jmp	L13
	--------------------------------------------------------------------

	Notice that the INTEGER variable 'i' is put in register 10 but
	the only time that a register is used for a REAL is when it is
	necessary to hold the intermediate result of an expression
	computation; ordinary REAL variables are not assigned registers
	when DO loops are optimized.

Fix:
	A simple change can be made to the compiler to cause it to
	assign REAL variables to registers -- in fact the change is so
	simple it is suspicious; why didn't anyone do this before?  But
	I have been unable to find any evidence that this change is
	harmful, and all of the programs I have tested the new version
	of the compiler on have worked correctly.  The change is in
	f77/src/f77pass1/regalloc.c:

	--------------------------------------------------------------------
	***************
	*** 31,36
	  #define VARTABSIZE 1009
	  #define TABLELIMIT 12
	  
	  #define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG)
	  
	  #define ISREGTYPE(x) ONEOF(x, MSKREGTYPES)

	--- 34,42 -----
	  #define VARTABSIZE 1009
	  #define TABLELIMIT 12
	  
	+ #if HERE==VAX
	+ #define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG) | M(TYREAL)
	+ #else
	  #define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG)
	  #endif
	  
	***************
	*** 32,37
	  #define TABLELIMIT 12
	  
	  #define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG)
	  
	  #define ISREGTYPE(x) ONEOF(x, MSKREGTYPES)
	  

	--- 38,44 -----
	  #define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG) | M(TYREAL)
	  #else
	  #define MSKREGTYPES M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG)
	+ #endif
	  
	  #define ISREGTYPE(x) ONEOF(x, MSKREGTYPES)
	  
	--------------------------------------------------------------------

	Notice that the change does not affect DOUBLE PRECISION
	variables (it would probably take a lot more work to get them
	in register).
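
	How a one-bit change to the mask alters the register test can be
	sketched in C.  The definitions of M and ONEOF below are
	assumptions about how such macros are usually written, and the
	numeric type codes (including the name TYDREAL for DOUBLE
	PRECISION) are invented for the example; the compiler's own
	headers are the authority on the real values.

```c
/* Assumed, illustrative type codes; the real values may differ. */
enum { TYADDR = 1, TYSHORT, TYLONG, TYREAL, TYDREAL, TYLOGICAL };

/* Assumed macro shapes: one bit per type code, tested with a mask. */
#define M(x)          (1 << (x))
#define ONEOF(x, set) (M(x) & (set))

#define OLD_MSKREGTYPES (M(TYLOGICAL) | M(TYADDR) | M(TYSHORT) | M(TYLONG))
#define NEW_MSKREGTYPES (OLD_MSKREGTYPES | M(TYREAL))

/* Register eligibility before and after the change. */
int old_isregtype(int t) { return ONEOF(t, OLD_MSKREGTYPES) != 0; }
int new_isregtype(int t) { return ONEOF(t, NEW_MSKREGTYPES) != 0; }
```

	In this sketch TYREAL is rejected by the old test and accepted
	by the new one, while TYDREAL is rejected by both.  The masks
	here are parenthesized, unlike the one-line definition in the
	diff, so that they remain safe inside larger expressions.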

	After changing the compiler in this way, the code that is
	generated for 'bug4.f' changes to this:

	--------------------------------------------------------------------
		.globl	_MAIN_
		.set	LF1,0
	_MAIN_:
		.word	LWM1
		subl2	$LF1,sp
		jmp	L12
	L13:
		movl	{0x4100},{a}
		movl	{0x4080},{b}
		movl	{0xbe77407f},{c}
		movl	{a},r10
		movl	{b},r9
		movl	{c},r8
		movl	{i},r7
		movl	$1,r7
	L17:
		addf3	r9,r10,r0
		mulf3	r8,r0,r10
		aobleq	$1000000,r7,L17
		movl	r7,{i}
		movl	r10,{a}
		pushl	$0
		pushal	{00,00}
		calls	$2,_s_stop
		ret
		.align	1
	_bug4_:
		.word	LWM1
	L12:
		moval	v.1,r11
		jmp	L13
	--------------------------------------------------------------------

	I timed the old and new versions of 'bug4' and observed
	the following values ('time' is the 'user' time returned
	by the C-shell 'time' command):

	--------------------------------------------------------------------
	Version		Time (sec)	 Type of System

	  Old		   32.2		VAX11/750, no FPA
	  New		   29.0		VAX11/750, no FPA

	  Old		   11.7		VAX11/750 with FPA
	  New		    8.7		VAX11/750 with FPA
	--------------------------------------------------------------------

	On systems with no FPA, the operand fetch time is much smaller
	than the actual computation time for floating point operations.
	Notice that the absolute improvement is nearly independent of
	the FPA: 3.2 seconds without one and 3.0 seconds with one.
	Still, the improvement is about 10% even without an FPA (closer
	to 25% if you have one).
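
	The arithmetic behind those percentages is simple enough to
	check with two small C helpers (the function names are made up
	for this example):

```c
/* Seconds saved going from the old compiler to the new one. */
double seconds_saved(double old_s, double new_s)
{
    return old_s - new_s;
}

/* The same saving as a percentage of the old running time. */
double percent_saved(double old_s, double new_s)
{
    return 100.0 * (old_s - new_s) / old_s;
}
```

	Fed the table's figures, these give 3.2 seconds or about 9.9%
	for the machine without an FPA (32.2 to 29.0), and 3.0 seconds
	or about 25.6% for the machine with one (11.7 to 8.7).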

	One oddity -- I notice that the compiler invariably translates
	floating point assignment operations to 'movl' instructions
	instead of 'movf' instructions.  I think this is because 'movl'
	is faster than 'movf' and the compiler guarantees that nothing
	depends on side effects of assignments, but I haven't pinned
	this down for sure yet.


sanders@menlo70.UUCP (Rex Sanders) (12/05/83)

References: <960@sdchema.UUCP>

  I ran your example program (that modifies the values of 
constants passed to subroutines) on a Honeywell Multics system:

----------

d*fortran con
New Fortran 10.1  
d*con

Error:  no_write_permission condition by >user_dir_dir>Pascalx>RSanders>con|65 (line 36)
referencing >user_dir_dir>Pascalx>RSanders>con|0

 level 2,10d*

----------

  On Multics, programs are split up into many areas, and constants
are in the same read-only segment as the executable code.  Unix f77
is a step back to IBM 360 Fortran G&H in this respect.
The old IBM compiler had many interesting "features", which I will
bore no-one with unless asked - preferably in net.lang.f77.
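
  The behaviour Multics traps can be modelled in C.  This is only a
sketch with invented names (literal_one, bump, demo); it assumes what
the thread implies, namely that a Fortran literal argument is passed
as the address of the compiler's stored copy of that constant, and
that under Unix f77 the copy sits in writable storage.

```c
/* Hypothetical model: one stored copy of the literal 1, whose address
 * is passed wherever "1" appears as an actual argument. */
static int literal_one = 1;     /* writable in this model */

/* The callee assigns to its dummy argument. */
void bump(int *arg)
{
    *arg = *arg + 1;
}

/* Models "call bump(1)" followed by a later use of the literal 1. */
int demo(void)
{
    bump(&literal_one);
    return literal_one;         /* the "constant" has changed */
}
```

  In this model demo() returns 2.  With the constant in a read-only
segment, the store inside bump would fault instead, which is the
condition shown in the Multics transcript above.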

-- Rex, the ancient Fortran hacker