chris@mimsy.UUCP (Chris Torek) (12/21/88)
In article <1911@pembina.UUCP> lake@alberta.UUCP (Robert Lake) writes:
>	i = -(unsigned short)j;

[where i is an int and j is 1---j's type is irrelevant]

To figure out what one `should' get, follow the rules:

	op		value		type
	--		-----		----
	1a. j		1		lvalue, int
	1b. expansion	1		rvalue, int
	2a. (u_short)	1		temporary, u_short [note 1]
	2b. expansion	1		rvalue, int or u_int [note 2]
	3.  -		-1|-(u_int)1	rvalue, int or u_int
	4a. i=		-1		temporary, int
	4b. expansion	-1		rvalue, int

Notes:
  1: u_T is a shorthand for `unsigned T'
  2: u_int under `sign-preserving' rules; either under dpANS
     `value-preserving' rules, depending on whether sizeof(int) >
     sizeof(short); on Suns it would be int

So the correct answer is -1, not 65535, under either set of rules.

>If I run this program on a VAX 11/780 using 4.3 BSD, I obtain -1 as the
>answer.  However, if I run this on a SUN using SUN OS 3.5, I obtain 65535
>as the answer.  Who is right?

4.3BSD got it right; SunOS got it wrong (in the name of optimisation :-) ).

So what is going on in the table above?  Every time C uses a value for
some operation, the value should be an `rvalue'.  It might be an lvalue
or a `temporary'---the result of an assignment, including casts, is a
`temporary'; I made up the notion just now, to provide a placeholder
for the expressions that are neither lvalues nor properly-expanded
rvalues.  If it is not already a properly expanded rvalue, it is
expanded, either according to unsigned-preserving rules (in the table
below) or value-preserving rules (which cannot be listed except for
specific compiler systems, since they depend on the number of bits in
each type).
	original	expansion (lvalue or temp)
	--------	---------
	signed char	int
	u_char		u_int
	short		int
	u_short		u_int
	int		int (already proper)
	u_int		u_int (already proper)
	long		long (already proper)
	u_long		u_long (already proper)

Each expansion does the `obvious' thing: if the expansion is from
signed, any new high-order bits in the expanded signed version come
about by sign extension; if from unsigned, new high-order bits are
zeroes.  (This holds true in both expansion systems.)

So why does the Sun compiler produce 65535?  All the expansions above
are expensive on some machines---including 680x0s, where it takes up
to two instructions per expansion, and possibly a temporary register.
If the compiler can prove to itself that the expansion has no effect,
it can suppress it.  For instance, if the assignment were:

	u_short j;
	j = -(u_short)(1);

we would have the sequence (unsigned-preserving rules)

	1		(1, int, rvalue)
	(u_short)	(1, u_short, temp)
	expand		(1, u_int, rvalue)
	-		(0xffffffff, u_int, rvalue)
	j=		(0xffff, u_short, temp)

which puts 65535 in j.  The expansion had no effect on the answer:
without it, we have

	1		(1, int, rvalue)
	(u_short)	(1, u_short, temp => fake rvalue)
	-		(0xffff, u_short, fake rvalue)
	j=		(0xffff, u_short, temp)

The SunOS 3.5 compiler incorrectly deduces that the expansion had no
effect (it forgets to look at the LHS of the assignment), so it drops
it from the expression tree and gets the wrong answer.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
desnoyer@Apple.COM (Peter Desnoyers) (12/22/88)
In article <15090@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <1911@pembina.UUCP> lake@alberta.UUCP (Robert Lake) writes:
>> i = -(unsigned short)j;
>[where i is an int and j is 1---j's type is irrelevant]

It was pointed out that the context in which this problem arose was
TCP software - I would bet it was the window size calculations.
Quoting V. Jacobson and R. Braden from RFC 1072 (TCP Extensions for
long-delay paths):

    The TCP header uses a 16 bit field to report the receive window
    size to the sender.  Therefore, the largest window that can be
    used is 2**16 = 65K bytes.  (In practice, some TCP implementations
    will "break" for windows exceeding 2**15, because of their failure
    to do unsigned arithmetic.)

I would also guess that the broken TCP implementations actually try to
do unsigned arithmetic, but don't get it right, as in the original,
subtly flawed example.

				Peter Desnoyers
stevea@laidbak.UUCP (Steve Alexander) (12/23/88)
In article <22651@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers) writes:
[ About i = -(unsigned short)j producing different results on BSD and SunOS ]
>
>It was pointed out that the context in which this problem arose was
>TCP software - I would bet it was the window size calculations.
>
>				Peter Desnoyers

Err, umm, (sorry, Guy) actually, this code is in IP, not TCP.  It
occurs in netinet/ip_input.c at the point where IP is deciding how big
the datagram is.  It negates the data length in the IP header
(i = -(unsigned short)ip->ip_len) and then adds the lengths of all
mbufs in the chain to the negated length.  If the result is negative,
the datagram is thrown away because it is too short.  Otherwise, m_adj
is called to trim the excess garbage from the end of the chain.
-- 
Steve Alexander, TCP/IP Development  |  stevea%laidbak@sun.com
Lachman Associates, Inc.             |  ...!sun!laidbak!stevea