[comp.lang.c] assigning an integer to the negation of ...

chris@mimsy.UUCP (Chris Torek) (12/21/88)

In article <1911@pembina.UUCP> lake@alberta.UUCP (Robert Lake) writes:
>		i = -(unsigned short)j;
[where i is an int and j is 1---j's type is irrelevant]

To figure out what one `should' get, follow the rules:

	op		value		type
	--		-----		----
1a.	j		1		lvalue, int
1b.	expansion	1		rvalue, int
2a.	(u_short)	1		temporary, u_short	[note 1]
2b.	expansion	1		rvalue, int or u_int	[note 2]
3.	-		-1|-(u_int)1	rvalue, int or u_int
4a.	i=		-1		temporary, int
4b.	expansion	-1		rvalue, int

	Notes:
	1: u_T is a shorthand for `unsigned T'
	2: u_int under `unsigned-preserving' rules; int or u_int
	   under dpANS `value-preserving' rules, depending on whether
	   sizeof(int) > sizeof(short); on Suns it would be int

So the correct answer is -1, not 65535, under either set of rules.

>If I run this program on a VAX 11/780 using 4.3 BSD, I obtain -1 as the
>answer.  However, if I run this on a SUN using SUN OS 3.5, I obtain 65535
>as the answer.  Who is right?

4.3BSD got it right; SunOS got it wrong (in the name of optimisation :-) ).

So what is going on in the table above?

Every time C uses a value for some operation, the value should be an
`rvalue'.  It might be an lvalue or a `temporary'---the result of an
assignment, including casts, is a `temporary'; I made up the notion
just now, to provide a placeholder for the expressions that are neither
lvalues nor properly-expanded rvalues.  If it is not already a properly
expanded rvalue, it is expanded, either according to unsigned-preserving
rules (in the table below) or value-preserving rules (which cannot be
listed except for specific compiler systems, since they depend on the
number of bits in each type).

	original	expansion
    (lvalue or temp)
	--------	---------
	signed char	int
	u_char		u_int
	short		int
	u_short		u_int
	int		int		(already proper)
	u_int		u_int		(already proper)
	long		long		(already proper)
	u_long		u_long		(already proper)

Each expansion does the `obvious' thing: if the expansion is from
signed, any new high-order bits in the expanded signed version come
about by sign extension; if from unsigned, new high-order bits are
zeroes.  (This holds true in both expansion systems.)

So why does the Sun compiler produce 65535?

All the expansions above are expensive on some machines---including
680x0s, where it takes up to two instructions per expansion, and
possibly a temporary register.  If the compiler can prove to itself
that the expansion has no effect, it can suppress it.  For instance,
if the assignment were:

	u_short j;
	j = -(u_short)(1);

we would have the sequence (unsigned-preserving rules)

	1		(1, int, rvalue)
	(u_short)	(1, u_short, temp)
	expand		(1, u_int, rvalue)
	-		(0xffffffff, u_int, rvalue)
	j=		(0xffff, u_short, temp)

which puts 65535 in j.  The expansion had no effect on the answer:
without it, we have

	1		(1, int, rvalue)
	(u_short)	(1, u_short, temp => fake rvalue)
	-		(0xffff, u_short, fake rvalue)
	j=		(0xffff, u_short, temp)

The SunOS 3.5 compiler incorrectly deduces that the expansion had no
effect (it forgets to look at the LHS of the assignment), so it drops
it from the expression tree and gets the wrong answer.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

desnoyer@Apple.COM (Peter Desnoyers) (12/22/88)

In article <15090@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <1911@pembina.UUCP> lake@alberta.UUCP (Robert Lake) writes:
>>		i = -(unsigned short)j;
>[where i is an int and j is 1---j's type is irrelevant]
>

It was pointed out that the context in which this problem arose was
TCP software - I would bet it was the window size calculations.
Quoting V. Jacobson and R. Braden from RFC 1072 (TCP Extensions for
Long-Delay Paths):

  The TCP header uses a 16 bit field to report the receive window size
  to the sender. Therefore, the largest window that can be used is 2**16
  = 65K bytes. (In practice, some TCP implementations will "break" for
  windows exceeding 2**15, because of their failure to do unsigned
  arithmetic.) 

I would also guess that the broken TCP implementations actually try to
do unsigned arithmetic, but don't get it right, as in the original,
subtly flawed example.

				Peter Desnoyers

stevea@laidbak.UUCP (Steve Alexander) (12/23/88)

In article <22651@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers) writes:
[ About i = -(unsigned short)j producing different results on BSD and SunOS ]
>
>It was pointed out that the context in which this problem arose was
>TCP software - I would bet it was the window size calculations.
>				Peter Desnoyers

Err, umm, (sorry, Guy) actually, this code is in IP, not TCP.  It occurs 
in netinet/ip_input.c at the point where IP is deciding how big the 
datagram is.  It negates the data length in the IP header 
(i = -(unsigned short)ip->ip_len) and then adds the lengths of all mbufs
in the chain to the negated length.  If the result is negative, the 
datagram is thrown away because it is too short.  Otherwise, m_adj is 
called to trim the excess garbage from the end of the chain.
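A sketch of the check Steve describes (the struct and names below are
simplified stand-ins of my own, not the actual netinet/ip_input.c code;
the real declarations live in netinet/ip.h and sys/mbuf.h):

```c
/* Simplified stand-in for the BSD mbuf chain. */
struct mbuf {
	int m_len;		/* bytes of data in this mbuf */
	struct mbuf *m_next;	/* next mbuf in the chain */
};

/* Returns < 0 if the chain holds less data than the IP header
 * claims (datagram too short, drop it), otherwise the number of
 * excess trailing bytes to trim with m_adj. */
int excess_bytes(struct mbuf *m, unsigned short ip_len)
{
	int i = -(int)ip_len;	/* the negation must happen at int
				   width; truncating it to 16 bits,
				   as the SunOS compiler did, breaks
				   the sign of the running total */

	for (; m != 0; m = m->m_next)
		i += m->m_len;
	return i;
}
```

With the truncating (buggy) negation, a short datagram can produce a
large positive total instead of a negative one, so the too-short check
silently passes.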

-- 
Steve Alexander, TCP/IP Development | stevea%laidbak@sun.com
Lachman Associates, Inc.            | ...!sun!laidbak!stevea