[comp.lang.c] ANSIfication: value preserving rules

chris@mimsy.UUCP (Chris Torek) (04/10/88)

Several people have expressed confusion over the difference between
`sign preserving' rules and `value preserving' rules.  These rules
control the result when the compiler has to expand an unsigned char,
unsigned short, or unsigned int value to a larger type.  (From here on
the unsigned prefix will be abbreviated |u_|.)

The first kind of expansion happens whenever an object of type |u_char|
or |u_short| appears in an expression.  The object must be widened to
|int| or |u_int|.  The second occurs when |u_int| values (possibly
produced by the former expansion) are mixed with |long| or |u_long|
values in any arithmetic expression.

The `sign preserving' rules can be stated in four words: the result is
unsigned.  The table below shows the result of each conversion
(u_int:long means u_int in long context):

	SIGN PRESERVING RULES
	input type	output type
	----------	-----------
	u_char		u_int
	u_short		u_int
	u_int		u_int
	u_int:long	u_long

The `value preserving' rule table looks like this:

	VALUE PRESERVING RULES
	input type	output type
	----------	-----------
	u_char		int or u_int
	u_short		int or u_int
	u_int		u_int
	u_int:long	long or u_long

Whether |int| or |u_int| (|long| or |u_long|) is chosen depends on
whether |int| (|long|) can hold all the values of the input type.
More specifically, on a machine with 16-bit |int|s and 32-bit
|long|s (e.g., IBM PC, PDP-11, some 68000 systems), the table
looks like this:

	VALUE PRESERVING RULES FOR PDP-11/IBM-PC
	input type	output type
	----------	-----------
	u_char		int
	u_short		u_int
	u_int		u_int
	u_int:long	long

whereas on a 32-bit |int| and |long| machine (e.g., VAX, IBM PS/2
in 386 mode, most 68000 systems), it appears instead as

	VALUE PRESERVING RULES FOR VAX/SUN/IBM PS/2
	input type	output type
	----------	-----------
	u_char		int
	u_short		int
	u_int		u_int
	u_int:long	u_long

The Rationale provides the following, er, rationale:

    The unsigned preserving rules greatly increase the number of
    situations where |unsigned int| confronts |signed int| [in an
    expression] to yield a questionably signed result [where a negative
    number suddenly becomes a large positive number, a possibly
    unintended result], whereas the value preserving rules minimize
    such confrontations.  Thus, the value preserving rules were
    considered to be safer for the novice, or unwary, programmer.
    After much discussion, the Committee decided in favor of value
    preserving rules, despite the fact that the UNIX C compilers had
    evolved in the direction of unsigned preserving.

			QUIET CHANGE
	A program that depended upon unsigned preserving arithmetic
	conversions will behave differently, probably without
	complaint.  This is considered the most serious semantic
	change made by the Committee to a widespread current practice.

I claim that the value-preserving rules are no easier for novices,
particularly because the expansion of |u_short| is so terribly
context-dependent.  One might note that the following prints
"conformant" twice on every existing conformant implementation:

	unsigned char uc = -1;
	unsigned int ui = -1;

	if (-uc < 0)
		printf("conformant\n");
	if (-ui > 0)
		printf("conformant\n");

We are supposed to believe that this is somehow less confusing than the
alternative (-uc > 0, -ui > 0).  The Rationale notes that the behaviour
of expressions such as

	if (-(unsigned short)-1 < 0)

is machine-dependent, without going so far as to give examples like
those above.  It also notes that all the ambiguity (along with the
default rules) can be eliminated with judicious use of casts.  Why
not, then, ask novices always to write those casts, and/or to remember
the rule `unsigned widens to unsigned'?

I find it significant that the unsigned preserving rules can be stated
in four words, while the value preserving rules require a paragraph
full of conditional wording.  How can something that is that hard to
say be `safer'?  As for the argument that the value-preserving rules
minimise the presence of mixed signed and unsigned operations, I submit
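that a majority of these will occur between |u_int| and |long| objects,
and I note that in this case, on most modern systems (counting the
80386 as modern, but not the 286), the value preserving rules help
not at all: there |long| is no wider than |int|, so it cannot hold
every |u_int| value, and the result is |u_long| under either rule.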
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gwyn@brl-smoke.ARPA (Doug Gwyn ) (04/11/88)

In article <11000@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>I find it significant that the unsigned preserving rules can be stated
>in four words, while the value preserving rules require a paragraph
>full of conditional wording.

Although I did not support the adoption of value-preserving rules
(and AT&T's representatives vociferously opposed it), I feel obliged
to note that you have carefully arranged the argument to put the
best light on your preference.  It takes more than 4 words to express
the complete signedness-preserving rules!  In fact, the complete
conversion rules are about equally hard to state in full under either
approach.

The following example shows what X3J11 seems to have had in mind when
adopting value-preserving rules (which were based on some subset of
existing practice, by the way):

	unsigned short us = 1;
	int i = us - 2;
	printf("%d\n",i);

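which one would think should print "-1".  Under the value-preserving
rules it does (on machines where int is wider than short, us widens to
int); under the signedness-preserving rules us - 2 is a huge unsigned
value, and what storing it in i produces is implementation-defined.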

It has been reported that when AT&T recompiled all the UNIX system
sources with an experimental value-preserving compiler, practically
nothing broke.  My experience has been that mixed-signedness
expressions have tended to be buggy under signedness-preserving rules
and that a change to value-preserving rules would straighten out some
of the bugs I have encountered.

The bottom line is that this is a change we can probably live with.

chris@mimsy.UUCP (Chris Torek) (04/12/88)

>In article <11000@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>I find it significant that the unsigned preserving rules can be stated
>>in four words, while the value preserving rules require a paragraph
>>full of conditional wording.

In article <7651@brl-smoke.ARPA> gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
>Although I did not support the adoption of value-preserving rules
>(and AT&T's representatives vociferously opposed it),

(Oh dear: I find myself in the embarrassing position of agreeing with
AT&T :-) ...)

>I feel obliged to note that you have carefully arranged the argument
>to put the best light on your preference.

Well, naturally.  (My mother was a lawyer.  I chose the rather more
honorable profession of Unix Beach Bum, but retain a few tricks.)

>It takes more than 4 words to express the complete signedness-
>preserving rules!

To express them?  No.  To put them in a standard, yes, but to express
them, and hence to remember them, one need only recall that `unsigned
widens to unsigned'.  To remember the value preserving rules, the best I
can come up with is `unsigned widens to signed if it fits, otherwise
unsigned; it fits if there are more bits in the signed' (19 words).

[example of value-preserving being `unsurprising' deleted]

>It has been reported that when AT&T recompiled all the UNIX system
>sources with an experimental value-preserving compiler, practically
>nothing broke.

I expect the same.  In fact, since signedness is, in C, in the eye of
the evaluator, even the most serious type incompatibility---exemplified
by the (currently correct)

	% cat file1.c
	main() { f((unsigned char)0); exit(0); }
	% cat file2.c
	f(x) unsigned int x; { x = x; }
	% lint -h file?.c
	%

---will not actually cause any harm, even for functions without prototypes.

>My experience has been that mixed-signedness expressions have tended
>to be buggy under signedness-preserving rules and that a change to
>value-preserving rules would straighten out some of the bugs I have
>encountered.

I expect the same number of latent bugs with either scheme, but I find
sign-preserving rules simpler.  In fairness, I should add that the 4BSD
compiler used to get some of the sign-preserving conversions wrong, and
apparently no one noticed.

>The bottom line is that this is a change we can probably live with.

Agreed.  I still think it is a botch, and I shall continue to call it
a botch.