chris@mimsy.UUCP (Chris Torek) (04/10/88)
Several people have expressed confusion over the difference between
`sign preserving' rules and `value preserving' rules. These rules
control the result when the compiler has to expand an unsigned char,
unsigned short, or unsigned int value to a larger type. (From here on
the unsigned prefix will be abbreviated |u_|.)

The first kind of expansion happens whenever an object of type |u_char|
or |u_short| appears in an expression. The object must be widened to
|int| or |u_int|. The second occurs when |u_int| values (possibly
produced by the former expansion) are mixed with |long| or |u_long|
values in any arithmetic expression.

The `sign preserving' rules can be stated in four words: the result is
unsigned. The table below shows the result of each conversion
(u_int:long means u_int in long context):

        SIGN PRESERVING RULES
        input type      output type
        ----------      -----------
        u_char          u_int
        u_short         u_int
        u_int           u_int
        u_int:long      u_long

The `value preserving' rule table looks like this:

        VALUE PRESERVING RULES
        input type      output type
        ----------      -----------
        u_char          int or u_int
        u_short         int or u_int
        u_int           u_int
        u_int:long      long or u_long

Whether |int| or |u_int| (|long| or |u_long|) is chosen depends on
whether |int| (|long|) can hold all the values of the input type.
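The `can hold all the values' test can be sketched in C. This is only an
illustration, not text from the draft standard: the helper names below
are hypothetical, and the sketch assumes that a signed type can hold
every value of an unsigned type exactly when the unsigned maximum does
not exceed the signed maximum.

```c
#include <limits.h>

/*
 * Hypothetical helper (not from the draft standard): returns 1 if a
 * signed type whose maximum is signed_max can hold every value of an
 * unsigned type whose maximum is unsigned_max, and 0 otherwise.
 * Under value preserving rules, 1 means "widen to the signed type"
 * and 0 means "widen to the unsigned type".
 * Assumes signed_max is positive, as any type maximum is.
 */
int widens_to_signed(unsigned long unsigned_max, long signed_max)
{
    return unsigned_max <= (unsigned long)signed_max;
}

/* The same test applied to the host machine's own u_short and int. */
int host_ushort_widens_to_int(void)
{
    return widens_to_signed(USHRT_MAX, INT_MAX);
}
```

With 16-bit |int|, widens_to_signed(65535UL, 32767L) is 0, so |u_short|
becomes |u_int|; with 32-bit |int|, widens_to_signed(65535UL,
2147483647L) is 1, so it becomes |int|.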
More specifically, on a machine with 16-bit |int|s and 32-bit |long|s
(e.g., IBM PC, PDP-11, some 68000 systems), the table looks like this:

        VALUE PRESERVING RULES FOR PDP-11/IBM-PC
        input type      output type
        ----------      -----------
        u_char          int
        u_short         u_int
        u_int           u_int
        u_int:long      long

whereas on a 32-bit |int| and |long| machine (e.g., VAX, IBM PS/2 in
386 mode, most 68000 systems), it appears instead as

        VALUE PRESERVING RULES FOR VAX/SUN/IBM PS/2
        input type      output type
        ----------      -----------
        u_char          int
        u_short         int
        u_int           u_int
        u_int:long      u_long

The Rationale provides the following, er, rationale:

    The unsigned preserving rules greatly increase the number of
    situations where |unsigned int| confronts |signed int| [in an
    expression] to yield a questionably signed result [where a
    negative number suddenly becomes a large positive number, a
    possibly unintended result], whereas the value preserving rules
    minimize such confrontations. Thus, the value preserving rules
    were considered to be safer for the novice, or unwary, programmer.
    After much discussion, the Committee decided in favor of value
    preserving rules, despite the fact that the UNIX C compilers had
    evolved in the direction of unsigned preserving.

    QUIET CHANGE
        A program that depended upon unsigned preserving arithmetic
        conversions will behave differently, probably without
        complaint. This is considered the most serious semantic
        change made by the Committee to a widespread current practice.

I claim that the value-preserving rules are no easier for novices,
particularly because the expansion of |u_short| is so terribly
context-dependent. One might note that the following prints
"conformant" twice on every existing conformant implementation:

        unsigned char uc = -1;
        unsigned int ui = -1;

        if (-uc < 0) printf("conformant\n");
        if (-ui > 0) printf("conformant\n");

We are supposed to believe that this is somehow less confusing than
the alternative (-uc > 0, -ui > 0).
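For anyone who wants to try this, the two tests above can be wrapped in
functions, along with the |u_short| case. A minimal sketch, assuming a
compiler that follows the value preserving rules and a machine where
|int| can hold every |u_char| value; the function names are made up for
illustration.

```c
#include <limits.h>

/* Hypothetical names; each returns the truth value of one comparison. */

/* u_char widens to int, so -uc is -UCHAR_MAX: negative. */
int neg_uchar_is_negative(void)
{
    unsigned char uc = -1;      /* wraps to UCHAR_MAX */
    return -uc < 0;
}

/* u_int stays u_int, so -ui wraps around to 1: positive. */
int neg_uint_is_positive(void)
{
    unsigned int ui = -1;       /* wraps to UINT_MAX */
    return -ui > 0;
}

/* u_short: negative only where int can hold every u_short value. */
int neg_ushort_is_negative(void)
{
    unsigned short us = -1;     /* wraps to USHRT_MAX */
    return -us < 0;
}

/* What the value preserving rules predict for the u_short case here. */
int ushort_fits_in_int(void)
{
    return USHRT_MAX <= INT_MAX;
}
```

On a machine with 32-bit |int| all three comparisons should come out
true; with 16-bit |int| the |u_short| one should come out false, which
is the context dependence complained about above.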
The Rationale notes that the behaviour of expressions such as

        if (-(unsigned short)-1 < 0)

is machine-dependent, without going so far as to give examples like
those above. It also notes that all the ambiguity (along with the
default rules) can be eliminated with judicious use of casts. Why not,
then, ask novices always to write those casts, and/or to remember the
rule `unsigned widens to unsigned'?

I find it significant that the unsigned preserving rules can be stated
in four words, while the value preserving rules require a paragraph
full of conditional wording. How can something that is that hard to
say be `safer'?

As for the argument that the value-preserving rules minimise the
presence of mixed signed and unsigned operations, I submit that a
majority of these will occur between |u_int| and |long| objects, and I
note that in this case, on most modern systems (counting the 80386 as
modern, but not the 286), the value preserving rules help not at all.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu     Path: uunet!mimsy!chris
gwyn@brl-smoke.ARPA (Doug Gwyn ) (04/11/88)
In article <11000@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>I find it significant that the unsigned preserving rules can be stated
>in four words, while the value preserving rules require a paragraph
>full of conditional wording.

Although I did not support the adoption of value-preserving rules (and
AT&T's representatives vociferously opposed it), I feel obliged to note
that you have carefully arranged the argument to put the best light on
your preference. It takes more than 4 words to express the complete
signedness-preserving rules! In fact the complete conversion rules are
approximately as difficult to express completely for either approach.

The following example shows what X3J11 seems to have had in mind when
adopting value-preserving rules (which were based on some subset of
existing practice, by the way):

        unsigned short us = 1;
        int i = us - 2;
        printf("%d\n", i);

which one would think should print "-1", which it does under the
value-preserving rules but not under the signedness-preserving rules.

It has been reported that when AT&T recompiled all the UNIX system
sources with an experimental value-preserving compiler, practically
nothing broke.

My experience has been that mixed-signedness expressions have tended to
be buggy under signedness-preserving rules and that a change to
value-preserving rules would straighten out some of the bugs I have
encountered.

The bottom line is that this is a change we can probably live with.
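The example can be tried both ways on an ANSI compiler by forcing the
unsigned widening with a cast. A sketch only: the function names are
hypothetical, and it assumes |int| is wider than |unsigned short| so
that the uncast version really is value preserving.

```c
/* Value preserving: us widens to int, so 1 - 2 is an ordinary -1. */
int difference_value_preserving(void)
{
    unsigned short us = 1;
    return us - 2;
}

/*
 * Signedness preserving, emulated with a cast: 1u - 2 wraps around
 * to the largest unsigned int value.
 */
unsigned int difference_unsigned_preserving(void)
{
    unsigned short us = 1;
    return (unsigned int)us - 2;
}
```

Storing the wrapped value back into an |int| is implementation-defined,
and many two's complement machines quietly turn it back into -1, so the
difference is easiest to see when the result is compared against zero
or converted to a wider type before it is stored.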
chris@mimsy.UUCP (Chris Torek) (04/12/88)
>In article <11000@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>I find it significant that the unsigned preserving rules can be stated
>>in four words, while the value preserving rules require a paragraph
>>full of conditional wording.

In article <7651@brl-smoke.ARPA> gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
>Although I did not support the adoption of value-preserving rules
>(and AT&T's representatives vociferously opposed it),

(Oh dear: I find myself in the embarrassing position of agreeing with
AT&T :-) ...)

>I feel obliged to note that you have carefully arranged the argument
>to put the best light on your preference.

Well, naturally. (My mother was a lawyer. I chose the rather more
honorable profession of Unix Beach Bum, but retain a few tricks.)

>It takes more than 4 words to express the complete signedness-
>preserving rules!

To express them? No. To put them in a standard, yes, but to express
them, and hence to remember them, just remember that unsigned widens to
unsigned. To remember the value preserving rules, the best I can come
up with is `unsigned widens to signed if it fits, otherwise unsigned;
it fits if there are more bits in the signed' (19 words).

[example of value-preserving being `unsurprising' deleted]

>It has been reported that when AT&T recompiled all the UNIX system
>sources with an experimental value-preserving compiler, practically
>nothing broke.

I expect the same. In fact, since signedness is, in C, in the eye of
the evaluator, even the most serious type incompatibility---exemplified
by the (currently correct)

        % cat file1.c
        main()
        {
                f((unsigned char)0);
                exit(0);
        }
        % cat file2.c
        f(x)
                unsigned int x;
        {
                x = x;
        }
        % lint -h file?.c
        %

---will not in fact cause any harm, even for functions without
prototypes.

>My experience has been that mixed-signedness expressions have tended
>to be buggy under signedness-preserving rules and that a change to
>value-preserving rules would straighten out some of the bugs I have
>encountered.
I expect the same number of latent bugs with either scheme, but I find
sign-preserving rules simpler. In fairness, I might say that the 4BSD
compiler used to get some of them wrong, and apparently no one noticed.

>The bottom line is that this is a change we can probably live with.

Agreed. I still think it is a botch, and I shall continue to call it a
botch.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu     Path: uunet!mimsy!chris