[net.bugs.4bsd] C compiler bug

ado@elsie.UUCP (02/12/84)

Subject: 4.?bsd C compiler error (better fix)
Index:	.../pcc/local2.c in 4.?BSD

Description:
	The C compiler generates incorrect code in some cases.
	A fix posted earlier fixed the problem while degrading code in some
	cases; this fix avoids code degradation.
Repeat-By:
	Use the command
		cc -S test.c
	where test.c contains:
		test()
		{
			register struct {
				short	i;
				short	j;
			} * sp;
			while ((sp++)->i != 0);
		}
	and then look at the "test.s" file produced.  You'll see that the
	"while" loop generates this code:
		 L16:
			tstw	(r11)+
			jeql	L17
			jbr 	L16
	which only increments r11 by two (rather than four) each time through.
Fix:	This is the fix to the version distributed by Berkeley.

	ed - .../pcc/local2.c
	/ISPTR(p->in.left->in.type)/c
	#ifdef OLDVERSION
					if ( ISPTR(p->in.left->in.type) ) {
	#else
	/*
	** We want to look for a pointer to a pointer, rather than a pointer.
	*/
					if ( ISPTR(p->in.left->in.type) &&
					   ISPTR(DECREF(p->in.left->in.type))) {
	#endif OLDVERSION
	.
	w
	q
-- 
UUCP:	decvax!harpo!seismo!rlgvax!cvl!elsie!ado
DDD:	(301) 496-5688

chris@umcp-cs.UUCP (08/25/84)

It's obviously a compiler bug.  More interesting is the code that
comes out of the (4.2BSD) compiler:

	% /lib/ccom
	int one = (unsigned) 2 >> 1;
[compiler output]
		.data
		.align	2
		.globl	_one
	_one:
		movl	$1,r0
		movl	r0,-4(fp)
		movl	-4(fp),r0
		movl	r0,-8(fp)
		movl	-8(fp),r0
		.
		.
		.

It only happens with the typecasts; ``int one = 2 >> 1;'' works.
``int one = (char) 2 >> 1;'' generates a similar endless stream, but
it begins differently:

		.data
		.align	2
		.globl	_one
	_one:
		movl	$2,r0
		movl	r0,-4(fp)
		mnegl	$1,r0
		movl	r0,-8(fp)
		movl	-8(fp),r0
		movl	r0,-12(fp)
		.
		.
		.

Obviously what's happening is this: during expression tree
optimization, the compiler is trying to compute the value at RUNtime.
However, as there are no free registers, it keeps sticking values into
r0, finding out it can't get another free register, pushing r0 for
safekeeping, getting the next number into r0, finding out it doesn't
have a free register for the ``ashl'' operation, . . . .  Things work
if the operation is done inside {}s because there are some registers
free.

The bug is probably simply that the compiler is missing optimizations
for expressions involving types other than `int' (and `long') for the
">>" and "<<" operators when both LHS and RHS are integer constants.
(Both << and >> generate the endless stream of junk).

(I wonder why the register allocation code doesn't scream loudly when
asked for registers while code generation is off?)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

chris@umcp-cs.UUCP (Chris Torek) (08/25/84)

Index: lib/pcc 4.2BSD

Description:
	The C compiler aborts with a compiler error if function arguments
	are declared `static'.

	(I happened to notice that the code which does arguments doesn't
	check for `static's among its declared arguments; probably that
	is all that is required to `fix' it.)

Repeat-By:
	Compile this:

		foo (i)
		static i;
		{
		}

donn@utah-cs.UUCP (08/25/84)

To recap:  The following program blows up the 4.2 (== System III?)
C compiler:

------------------------------------------------------------------------
int one = (unsigned) 2 >> 1;
------------------------------------------------------------------------

The precise message you get is:

------------------------------------------------------------------------
"one.c", line 1: compiler error: expression causes compiler loop: try simplifying
------------------------------------------------------------------------

Chris Torek's observation is close to the mark (if not on it), but
perhaps it's not as clear as it could be.  The problem is that the
initializer is not being reduced to a constant before the code
generator receives it.  If the code generator gets an expression, it
goes ahead and generates the instructions needed to compute it;
apparently it doesn't mind if it ends up generating instructions into
data space, nor does it seem to care if any of the startup code for
processing text has been run.  The code generator can handle constant
initializers (with a fake opcode of INIT) and that's it.

More directly, what's happening is that when certain casts are
transmuted into conversions, the compiler neglects to canonicalize the
resulting trees.  (Some casts get a second chance -- try substituting
'+' for '<<' in the example and see what happens.) It's a fairly simple
matter to ensure that makety(), the routine which does the
transmutations, always canonicalizes the trees it returns.  Here are
the changes to /usr/src/lib/mip/trees.c:

------------------------------------------------------------------------
*** /tmp/,RCSt1008238	Sat Aug 25 06:30:16 1984
--- trees.c	Sat Aug 25 04:37:26 1984
***************
*** 1061,1067
  
  	if( t & TMASK ){
  		/* non-simple type */
! 		return( block( PCONV, p, NIL, t, d, s ) );
  		}
  
  	if( p->in.op == ICON ){

--- 1061,1067 -----
  
  	if( t & TMASK ){
  		/* non-simple type */
! 		return( clocal( block( PCONV, p, NIL, t, d, s ) ) );
  		}
  
  	if( p->in.op == ICON ){
***************
*** 1079,1085
  			}
  		}
  
! 	return( block( SCONV, p, NIL, t, d, s ) );
  
  	}
  

--- 1079,1085 -----
  			}
  		}
  
! 	return( clocal( block( SCONV, p, NIL, t, d, s ) ) );
  
  	}
  
------------------------------------------------------------------------

I'm not sure that the first change is strictly necessary, but better
safe than sorry.  At any rate the sample program now compiles
correctly.  I don't think these changes have any nasty side effects --
I recompiled the compiler with the changes installed and it produced
exactly the same binary as the old compiler did, for what it's worth.

If the System V C compiler fixes this, I don't want to hear about it,

Donn Seeley    University of Utah CS Dept    donn@utah-cs.arpa
40 46' 6"N 111 50' 34"W    (801) 581-5668    decvax!utah-cs!donn

chris@umcp-cs.UUCP (Chris Torek) (08/26/84)

As long as we have people fixing bugs, how about someone patching up the
pcc/ code that converts RS into LS and negates the shift?  It fixes up
constants but it doesn't remove double negations.  Example:

	foo () { register int i,j,k; k = i >> -j; }

produces (prettified)

	.align	1
	.globl	_foo
_foo:	.word	0xe000
	mnegl	r10,r0
	mnegl	r0,r0
	ashl	r0,r11,r0
	movl	r0,r9
	ret

What's the idea of the double mnegl?  I got disgusted enough with this
(and other similar but tougher-to-optimize) stuff in the output stage of
my Versatec TeX filter that I stuck in asm()s (under #ifdef vax though).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

root@ur-valhalla.UUCP (08/27/84)

Well, I've installed Donn Seeley's fix for the C compiler screwing up on
initializing shifted, unsigned variables, and found out a nasty side effect:
The (unsigned) cast is IGNORED!  This is my program:

	main()                          
	{       int big_number = ((~((unsigned) 0)) >> 1);
		printf("this is a big number: %d\n",big_number);
	}                               

Before installing the fix the program produced this output:

	this is a big number: 2147483647

(To remind the original problem: should big_number be declared as static or
as a global variable, C compiler would blow up and go into a loop).
After installing the fix, I got the following output:

	this is a big number: -1

Sigh.... 
-- 
Krzysztof Kozminski (truly sorry for not being able to fix the problem myself)
{seismo,allegra,decvax}!rochester!ur-valhalla!kris