[comp.unix.i386] Is this

imdave@cbnewsc.att.com (david.e.bodenstab) (07/24/90)

Here is a little program derived from a problem we tracked down in a
driver:

	main()
	{
	    unsigned y = 0x7ffff0ff;
	    unsigned x = 0x7ffff000;

	    while (1) {
		if ( ((int)(y - x)) < 0 )
		    printf("WRONG x=%x y=%x (y - x)=%x\n",x,y,y-x);
		x++;
		y++;
	    }
	}

This program `fails' on both a 386 and another cpu in-house -- the
range of values for the `failure' are:

	0x80000000 <= y <= 0x800000fe
	0x7fffff01 <= x <= 0x7fffffff

The assembly instructions generated for both cpu's were:

	sub	x from y
	bge	skip		# branch greater than or equal
	push	args for printf...
	call	printf
skip:

At first glance this seems inexplicable; (y-x) will always be
0xff, and 0xff is *never* < 0.  So the next thing looked at was the
condition code bits (this was not done on the 386).  The condition
code bits after 0x80000000 - 0x7fffff01 were VNCZ=1100 (V=overflow,
N=negative, C=carry, Z=zero)

The BGE instruction takes the branch if (Z==1 || N==0).  The program
does NOT take the branch when the values of x & y have the above values.

I don't understand this.  Admittedly, I'm digging the following up
from my 20 year old memory of two's compliment arithmetic.

As I recall, the subtraction y-x in two's compliment arithmetic consists of:

	take the two's compliment of x
	add to y
	if there is a carry out of the sign bit, set C
	if the carry into the sign bit does not match the carry out
		of the sign bit, set V
	if the sign bit == 1, set N
	if all bits are zero, set Z

So,	x = 0x7fffff01 = 0111 1111 1111 1111 1111 1111 0000 0001
			 1000 0000 0000 0000 0000 0000 1111 1111  (2's of x)
	y = 0x80000000 = 1000 0000 0000 0000 0000 0000 0000 0000
		       1 0000 0000 0000 0000 0000 0000 1111 1111  (sum)

		       | |
		carry--+ |
		sign bit-+

	there was a carry from the high order bit, so set C=1
	there was no carry into the sign bit and there was a carry from
		the high order bit, so V=1
	the sign bit is zero, so N=0
	the result is not zero, so Z=0

The BGE instruction takes the branch if (Z==1 || N==0), so the branch
should be taken.  *BUT*, when I modify the program to save and print
the condition code bits, I find C=0, V=1, N=1, Z=0 immediately after
the subtraction!

Here's my analysis of the subtraction for the preceeding iteration:

	x = 0x7fffff00 = 0111 1111 1111 1111 1111 1111 0000 0000
			 1000 0000 0000 0000 0000 0001 0000 0000  (2's of x)
	y = 0x7fffffff = 0111 1111 1111 1111 1111 1111 1111 1111
		       1 0000 0000 0000 0000 0000 0000 1111 1111  (sum)

		       | |
		carry--+ |
		sign bit-+

	there was a carry from the high order bit, so set C=1
	there was a carry into the sign bit and there was a carry from
		the high order bit, so V=0
	the sign bit is zero, so N=0
	the result is not zero, so Z=0

In this case the branch is taken because (Z==1 || N==0).


What really concerns me is that I (and probably anyone else) would, if
given the task of writing an assembler routine to subtract two unsigned
values and compare for a result >= 0, write the *exact* same sequence of
instructions!  This idiosyncracy of SUB/BGE would probably not be uncovered by
testing unless the values happened to be such that the program fails --
and then tracking the problem down would probably be very time-consuming.
We "solved" our problem with `if ( (int)((y - x) & 0xffffffff) < 0 )' --
this reset the condition codes correctly, and the program behaves as
expected.

*I'm* not going to take out an ad and claim the 386 is broken! :-)

But what's going on here?  The above program *works* on a SUN, a 80286
[msdos and unix, ints and longs], and also a MIPS cpu.  Is it the compiler?
Is it the chip?  Does the program work on *your* cpu?

Thoughtful suggestions will be appreciated -- flames will be ignored.
Thanks!

Dave Bodenstab
...att!iwtmx!imdave

andrew@ramona.Cary.NC.US (Andrew Ernest) (07/30/90)

In article <1990Jul23.223341.26431@cbnewsc.att.com> imdave@cbnewsc.att.com (david.e.bodenstab) writes:
>code bits after 0x80000000 - 0x7fffff01 were VNCZ=1100 (V=overflow,
>N=negative, C=carry, Z=zero)

>The BGE instruction takes the branch if (Z==1 || N==0).  The program
>does NOT take the branch when the values of x & y have the above values.

First of all, the 386 ref manual says bge takes the jump when
SF xor OF == 0 (see page D-2).  Secondly, I traced the program by
hand and found that SF is always clear (since the result is always
0xff, which doesn't have the 32-bit sign flag set).  What happens
when y == 0x80000000 and x == 0x7fffff01 is that the OF flag is set.
Now SF xor OF == 1 so the branch is not taken and the error message
gets printed out.

All this *does* make sense (as far as the chip is concerned, that is...
I don't know whether or not the C compiler is generating the correct
code for "((int) (y - x)) < 0").  0x7fffffff - 0x7fffff00 is the
difference of two positive numbers.  Add one to each and you have
0x80000000 - 0x7fffff01 which is the subtraction of a very large positive
number from the most negative 32-bit number.  Clearly, this is an overflow
condition (the result is too negative to represent in 32 bits).  And it
makes sense that bge doesn't take the jump because 0x80000000 is very
negative and is most certainly not greater than or equal to the very
positive 0x7fffff01.  Note that bge is a signed comparison.

The compiler may or may not be broken (depending upon how
"((int) (y - x)) < 0" should be interpreted) but the 386 definitely isn't.
-- 
Andrew Ernest <andrew@ramona.Cary.NC.US>

scjones@thor.UUCP (Larry Jones) (07/30/90)

In article <1990Jul23.223341.26431@cbnewsc.att.com>, imdave@cbnewsc.att.com (david.e.bodenstab) writes:
< 	main()
< 	{
< 	    unsigned y = 0x7ffff0ff;
< 	    unsigned x = 0x7ffff000;
< 
< 	    while (1) {
< 		if ( ((int)(y - x)) < 0 )
< 		    printf("WRONG x=%x y=%x (y - x)=%x\n",x,y,y-x);
< 		x++;
< 		y++;
< 	    }
< 	}
< 
< This program `fails' on both a 386 and another cpu in-house -- the
< range of values for the `failure' are:
< 	0x80000000 <= y <= 0x800000fe
< 	0x7fffff01 <= x <= 0x7fffffff
< The assembly instructions generated for both cpu's were:
< 	sub	x from y
< 	bge	skip		# branch greater than or equal
< 	push	args for printf...
< 	call	printf
< skip:

That's a code generation error, although not an obvious one.
Those instructions corresponds to  "if ((int)y - (int)x < 0)"
which is not, of course, what you wanted.  bge fails because
you have taken a large negative number and subtracted a large
positive number which results in a very large negative number
which overflows.  The correct instruction would be bns (branch
not sign) since you really just want to test the sign of the
final result.  I suspect that using a temporary variable to
hold the intermediate result rather than counting on the cast
the compiler will do the right thing.
----
Larry Jones                         UUCP: uunet!sdrc!thor!scjones
SDRC                                      scjones@thor.UUCP
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150-2789             AT&T: (513) 576-2070
"This probably just goes to show something, but I sure don't know what."
-Calvin

darryl@ism780c.isc.com (Darryl Richman) (07/30/90)

In article <1990Jul23.223341.26431@cbnewsc.att.com> imdave@cbnewsc.att.com (david.e.bodenstab) writes:
"	main()
"	{
"	    unsigned y = 0x7ffff0ff;
"	    unsigned x = 0x7ffff000;
"
"	    while (1) {
"		if ( ((int)(y - x)) < 0 )
"		    printf("WRONG x=%x y=%x (y - x)=%x\n",x,y,y-x);
"		x++;
"		y++;
"	    }
"	}
"
"This program `fails' on both a 386 and another cpu in-house -- the
"range of values for the `failure' are:
"
"	0x80000000 <= y <= 0x800000fe
"	0x7fffff01 <= x <= 0x7fffffff

You have uncovered a very subtle bug in your compilers.  As the
assembly language shows, you do a subtraction and then a conditional
branch based on the result as if the subtraction were signed.  Your C
code asks the compiler to do an unsigned subtraction, and then test the
result in a signed fashion.  This would require the compiler to
generate code approximately like:

	movl	y, %eax
	subl	x, %eax
	cmpl	$0, %eax
	bge	skip
	  ...
skip:

In your cases, the compiler (or perhaps the optimizer) has not done the
separate compare (on line 3), knowing that the subtract produced
condition codes, but it failed to note that they weren't correct for
the cast you requested.  So, when y becomes "negative", and until x goes
"negative", you will get incorrect results.

I'd be very interested to know the rationale for this code.  If y > x,
the result of the subtraction will have a potential range of
[1..max_unsigned], and if y < x, it will also have a range of
[1..max_unsigned], but with a borrow.  But you are asking to cast
only the result, not including the borrow, so the ranges are
indistinguishable.  What do you *really* want to do?

		--Darryl Richman

-- 
Copyright (c) 1990 Darryl Richman    The views expressed are the author's alone
darryl@ism780c.isc.com 		      INTERACTIVE Systems Corp.-A Kodak Company
 "For every problem, there is a solution that is simple, elegant, and wrong."
	-- H. L. Mencken

small@quiche.cs.mcgill.ca (Alain PETIT) (07/30/90)

	Hi!  I try this bug.c with Turbo C++ (New Version, C and C++
	in the same compiler!).  So I got the error, but when I
	run it under TD (Last version that come with the upgrade!)
	I got no error at all...  I already experience something
	like that.  Nota: The code generated assume the unsigned
	as a WORD (no problem with this assumtion), so I (long) this
	unsigned and I get this bug...  Nota2:  The code generated
	is little different from the one posted, I got a JNE.  You
	know Borland!  I keep you posted if I have other bits of
	infos!

	Bye!
-- 
------------------------------------------------------------------
small@quiche.cs.mcgill.ca   | 
McGill University           | Life is the primary cause for Death.
Montreal, Quebec            |

md@sco.COM (Michael Davidson) (07/31/90)

imdave@cbnewsc.att.com (david.e.bodenstab) writes:

>                                                     The condition
>code bits after 0x80000000 - 0x7fffff01 were VNCZ=1100 (V=overflow,
>N=negative, C=carry, Z=zero)

>The BGE instruction takes the branch if (Z==1 || N==0).  The program
>does NOT take the branch when the values of x & y have the above values.

>I don't understand this.  Admittedly, I'm digging the following up
>from my 20 year old memory of two's compliment arithmetic.

Most 2's complement CPUs do not distinguish between signed and unsigned
values when they perform addition and subtraction - what matters is the
way in which you interpret the condition codes after the operation.

The definition that you give for the BGE (branch if greater than or
equal) is the normal one for a SIGNED conditional branch - ie BGE is
interpreting the condition codes in a manner appropriate for
signed operands. What you need here is an unsigned conditional branch
which on the 386 would be BHS (branch if higher than or same) which
takes the branch if (C==0).

Summary: sounds like your compiler is broken (although I will defer
to more experienced C gurus if they think that your usage of
unsigned variables and (int) type casts was the cause of the
problem - looked OK to me ...)

jon@hitachi.uucp (Jon Ryshpan) (07/31/90)

In article <1990Jul23.223341.26431@cbnewsc.att.com> imdave@cbnewsc.att.com (david.e.bodenstab) writes:

>Here is a little program derived from a problem we tracked down in a
>driver:
>
>	main()
>	{
>	    unsigned y = 0x7ffff0ff;
>	    unsigned x = 0x7ffff000;
>
>	    while (1) {
>		if ( ((int)(y - x)) < 0 )
>		    printf("WRONG x=%x y=%x (y - x)=%x\n",x,y,y-x);
>		x++;
>		y++;
>	    }
>	}
>
>This program `fails' on both a 386 and another cpu in-house -- the
>range of values for the `failure' are:
>
>	0x80000000 <= y <= 0x800000fe
>	0x7fffff01 <= x <= 0x7fffffff
>

It's a compiler bug.  On the 68020 (Sun 3) I get the following code from
the standard Sun compiler:

	L14:
		movl	a6@(-0x4),d0
		subl	a6@(-0x8),d0
		tstl	d0
		jge	L16
		...
		jbsr	_printf
		lea	sp@(0x10),sp
	L16:

This is correct.  Note the key instruction "tstl d0", which is the
result of the cast from unsigned to int.

Under Interactive 386/ix, using the Interactive port of the ATT C
compiler (or whatever is the standard C compiler for 386/ix ver 2.02) I
get this code:

	.L16:
		movl    -4(%ebp),%eax
		subl    -8(%ebp),%eax
		jge     .L17
		...
		call    printf
		addl    $16,%esp
	.L17:

This code is wrong since there is no cast from unsigned to int.  It's
not possible to fix this by using an unsigned rather than a signed
branch, since this would just substitute one range of values which
produce an incorrect branch for another.

Jonathan Ryshpan		<...!uunet!hitachi!jon>

M/S 420				(415) 244-7369  	
Hitachi America Ltd.
2000 Sierra Pt. Pkwy.
Brisbane CA 94005-1819

scjones@thor.UUCP (Larry Jones) (07/31/90)

In article <463@hitachi.uucp>, jon@hitachi.uucp (Jon Ryshpan) writes:
> This code is wrong since there is no cast from unsigned to int.  It's
> not possible to fix this by using an unsigned rather than a signed
> branch, since this would just substitute one range of values which
> produce an incorrect branch for another.

The 386 does, however, have branch instrctions which just test the
sign bit (js and jns in Intel mnemonics) which do exactly the right
thing in this case.  Now, if the compiler could just be persuaded
to generate the darn things...
----
Larry Jones                         UUCP: uunet!sdrc!thor!scjones
SDRC                                      scjones@thor.UUCP
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150-2789             AT&T: (513) 576-2070
"This probably just goes to show something, but I sure don't know what."
-Calvin

pwilcox@paldn.UUCP (Peter McLeod Wilcox) (08/02/90)

The cc shipped with Microport SysV 386/3.0 has an amusing response to the
problem.  It looked at the test ( "if ( (y-x) < 0 ) printf..." ) and decided
that the operation should be perform in unsigned math which would never
produce a negative value and thus unconditionally jumped around the true
clause.  When the optimizer was turned on the code generated for:

main() {
	unsigned long y = 0x7FFFF0FF;
	unsigned long x = 0x7FFFF000;

	while ( 1 ) {
		if ( (y-x) < 0 ) ....
		y++;
		x++;
	}
}

Reduced to:

main:
	pushl	%ebp
	movl	%esp,%ebp
	subl	$0x8,%esp
	movl	$0x7ffff00f,0xfc(%ebp)
	movl	$0x7ffff000,0xf8(%ebp)
loop:
	incl	0xfc(%ebp)
	incl	0xf8(%ebp)
	jmp	loop

This code, of course, generates the proper result (no result), but probably for
the wrong reasons.  The problem seems to be one that would break many compilers
since the compiler is unable to promote the operands to signed integers.  The
moral may be that arithmatic operations that can generate a negative value
may not be used on unsigned longs.  It should be noted that such an operation
can generate a value impossible to represent, i.e. a negative number larger in
magnitude than 0x80000000 ( 1-0xffffff00 will break anything ).
-- 
Pete Wilcox		...gatech!nanovx!techwood!paldn!pwilcox