[comp.lang.c] Puzzle on unsigned promotions

friedl@vsi.UUCP (Stephen J. Friedl) (06/30/88)

Hi net.C.wizards,

     Can any of you help with a puzzle?  I've been trying to
understand the unsigned- vs value-perserving rules of various C
compilers, and I'm afraid I've run into a case I don't
understand.  I have pored over the dpANS plus Chris Torek's notes
on this but am still confused.  Note that this question applies
to all compilers, not just dpANS ones, so this discussion is not
in comp.std.c.

	/*
	 * what is printed?
	 */
	#define		MINUS_ONE	0xffff

	main()
	{
	unsigned short	sval;
	long		lval1, lval2;

		sval = MINUS_ONE;

		lval1 = - sval;
		lval2 = - (unsigned short) MINUS_ONE;

		printf("lval #1 = %ld    #2 = %ld\n", lval1, lval2);
	}

The normal answer I get is:

	lval #1 = -65535    #2 = -65535

but on the HP9000 I see:

	lval #1 = 1    #2 = -65535

For what it's worth, the HP9000 has sizeof(short) = 2 and
sizeof(int) = 4, and I get the same results when I #define
MINUS_ONE to be (-1) or 0xffff.

My specific questions:

	(1) is this a case of questionable unsigned-ness?

	(2) if I have a vendor who asserts "this is a value-preserving
	    compiler", what is the necessary value of "lval"?

	(3) if I have a vendor who asserts "this is an unsigned-preserving
	    compiler", what is the necessary value of "lval"?

	(4) if I have a vendor who asserts "this is a dpANS-conformant
	    compiler", what is the necessary value of "lval"? [OK, OK,
	    they're not supposed to assert that yet, but you know what
	    I mean here]

	(5) how about them Lakers? :-)


     Thanks for your help.  Please email, I'll summarize and post.

     Steve
-- 
Steve Friedl     V-Systems, Inc. (714) 545-6442     3B2-kind-of-guy
friedl@vsi.com     {backbones}!vsi.com!friedl    attmail!vsi!friedl

Nancy Reagan on John DeLorean: "Just say snow"

chris@mimsy.UUCP (Chris Torek) (07/01/88)

In article <736@vsi.UUCP> friedl@vsi.UUCP (Stephen J. Friedl) writes:
>... unsigned- vs value-perserving rules ...

[and the following code, reproduced without > to fool inews; slightly
compressed]
	main()
	{
		unsigned short	sval;
		long		lval1, lval2;

		sval = 0xffff;

		lval1 = -sval;
		lval2 = -(unsigned short)0xffff;

		printf("lval #1 = %ld    #2 = %ld\n", lval1, lval2);
	}

Well, first, no matter what, you should get the same value in lval1
and lval2.  A cast is equivalent to assignment to an unnamed temporary,
so the second assignment is like writing

		{ unsigned short tmp; tmp = 0xffff; lval2 = -tmp; }

which is semantically identical to `lval1 = -sval'.

Now, to answer `what should I get', you need to know the relative
sizes of short, int, and long (and their unsigned variants).

    sval = 0xffff; sizeof(short) = 2; sizeof(int) = sizeof(long) = 4:

	lval = -sval:

	Old rule: unsigned stays unsigned
		First turn sval into an rvalue: (u_short)0xffff is
		extended to (u_int), result is (u_int)0x0000ffff
	
		Next negate: result is (u_int)0xffff0001

		Next convert to (signed long) by `bits-as-is' due
		to assignment: (long)0xffff0001 or -65535.

	New rule: unsigned stays unsigned if the new type does not
	have more bits, otherwise unsigned becomes signed:
		First turn sval into an rvalue: (u_short)0xffff is
		extended to (signed int), result is (int)0x0000ffff.

[note that the bit pattern produced by both rules is always identical;
the difference is in the interpretation of that bit pattern---signed vs
unsigned]

		Next negate: result is (int)0xffff0001

		Convert to long by `bits-as-is': (long)0xffff0001 or
		-65535.

>The normal answer I get is:
>
>	lval #1 = -65535    #2 = -65535

which is correct.

>but on the HP9000 I see:
>
>	lval #1 = 1    #2 = -65535

Your HP9000 compiler is broken.

>Please email, I'll summarize and post.

Oops :-)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

jeff@unh.UUCP (Jeffrey E. F. Friedl) (07/01/88)

In article <12251@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> In article <736@vsi.UUCP> friedl@vsi.UUCP (Stephen J. Friedl) writes:
> >... unsigned- vs value-perserving rules ...
			  ^^^^^^^^^^

(personal note to Steve: you misspelled this word.
 Did you mean "preserving", or "perversing" here? [(-8])

> []
> 
> Now, to answer `what should I get', you need to know the relative
> sizes of short, int, and long (and their unsigned variants).
[]
>     sval = 0xffff; sizeof(short) = 2; sizeof(int) = sizeof(long) = 4:
> 
> 	lval = -sval:
> 
> 	Old rule: unsigned stays unsigned
> 		First turn sval into an rvalue: (u_short)0xffff is
> 		extended to (u_int), result is (u_int)0x0000ffff

Here's (I think) the problem (I'm being a devil's advocate here.. keep
the flame level low while you read)....  I would *love* to be proven
wrong on this, but nowhere in K&R-I can I find where it specifically
stipulates that the conversion from unsigned short to a larger unsigned
int is done by zero padding and not sign (sign with an unsigned?) extending.

The main problem is K&R's use of terms and fonts.  In some places, when
they mean to say INT (i.e. 'int' in compu-font), they say 'integral'
and vice versa.  I can find quotes (taken from context) that support
both the thought that the sign should be extended and that it shouldn't
be.

Of course, it makes sense that it shouldn't be, but the language definition
(K&R-I) is wishy-washy about it.....

Specifically, see K&R-I, section 6.1 (page 183):
  "A ... short integer may be used wherever an integer may be used.....
   Conversion of a shorter integer to a longer always involves sign
   extension; integers are signed quantities"

No, they're not. (I hear flames being set to "cinderize"....)
"UNSIGNED SHORT INT" is a "shorter integer" but is not signed.
If they had meant to exclude unsigned quantities, they should have said it.
But then, if they excluded unsigned numbers, that would mean to say that
unsigned shorts may NOT be used wherever an integer may be used.

So, it seems that in the above quote, the first two uses of the word "integer"
referred to both signed and unsigned quantities, and the later uses of the
word refer to signed quantities only.

K&R-I, section 6.5 (page 184), 2nd paragraph:
    "When an unsigned integer is converted to LONG, the value of the
     result is the same numerically as that of the unsigned integer"

My problem with the above quotes is that in the first they say "integer"
rather than "INT", "SHORT", or "LONG".

They say "LONG", not something like "unsigned integer" which I think they
should.

So, when converting from a 2 byte u_int to a 4 byte int, do you
get u_int 0xffffffff or 0x0000ffff (the latter being the most natural)?

-----------------------------------------------------------------
How about this: when sizeof(short, int, long) == (2,2,4) under rules
given in both K&R-I and K&R-II,
	(long)-(unsigned short)0xffff
is
	signed long 0x00000001

(I think.  I worked through it quite carefully -- If I made a mistake,
I'm sure a dozen kind souls will let me know [(-:] )

> >but on the HP9000 I see:
> >
> >	lval #1 = 1    #2 = -65535
> 
> Your HP9000 compiler is broken.

But it's only half broken.  "Two (different) answers are better than one!"
Well.... Maybe not.

> >Please email, I'll summarize and post.
> 
> Oops :-)
Oops :-)

"Scotty, we've got to *flame*. We *need* more energy"
"Ay, I'm trying as hard as I can, cap'an"

	*jeff*

-------------------------------------------------------------------------------
Jeffrey Eric Francis Friedl, Box 2146 Babcock House, Durham New Hampshire 03824
..!{uunet,decvax}!unh!jeff   j_friedl@unhh.bitnet  ..!ucbvax!kentvax!jfriedl

I hope I'm not around Jan 18, 2038 at 10:14:08PM

Nancy Reagan on Steven Friedl: just say "hiho"

chris@mimsy.UUCP (Chris Torek) (07/03/88)

In article <565@unh.UUCP> jeff@unh.UUCP (Jeffrey E. F. Friedl) writes:
>... nowhere in K&R-I can I find where it specifically
>stipulates that the conversion from unsigned short to a larger unsigned
>int is done by zero padding and not sign (sign with an unsigned?) extending.
>... The main problem is K&R's use of terms and fonts.

No, actually, the main problem is that K&R C has only one unsigned
type, namely unsigned int.  Unsigned short, unsigned char, and unsigned
long do not exist in K&R 1st ed.

jeff@unh.UUCP (Jeffrey E. F. Friedl) (07/04/88)

In article <12291@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> In article <565@unh.UUCP> I wrote:
> >... nowhere in K&R-I can I find where it specifically
> >stipulates that the conversion from unsigned short to a larger unsigned
> >int is done by zero padding and not sign (sign with an unsigned?) extending.
> >... The main problem is K&R's use of terms and fonts.
> 
> No, actually, the main problem is that K&R C has only one unsigned
> type, namely unsigned int.  Unsigned short, unsigned char, and unsigned
> long do not exist in K&R 1st ed.

Ahhhh.  I always knew I should try to get a copy of page 193.
Years ago, my... uhh... dog (ya, that's it) ate it and I just sort
of forgot about it.....

Seeing my mistake for blaming K&R ("forgive me, for I have strayed"),
I ask if the answer to the original poster (friedl@vsi) about what
is "correct" with "unsigned short", etc, is to say
    "whatever the compiler writer wanted to do, because it's not
     standard C anyway"
(where "standard" refers to K&R-I, not what X3J11 is doing)?

Perhaps the best thing to say about "unsigned short" is
"don't do it" (ANSI C coming our way notwithstanding)?

	*jeff*

-------------------------------------------------------------------------------
Jeffrey Eric Francis Friedl, Box 2146 Babcock House, Durham New Hampshire 03824
..!{uunet,decvax}!unh!jeff   BITNET%"j_friedl@unhh"  ..!ucbvax!kentvax!jfriedl

peter@ficc.UUCP (Peter da Silva) (07/05/88)

In article <565@unh.UUCP>, jeff@unh.UUCP (Jeffrey E. F. Friedl) writes:
> Here's (I think) the problem (I'm being a devil's advocate here.. keep
> the flame level low while you read)....  I would *love* to be proven
> wrong on this, but nowhere in K&R-I can I find where it specifically
> stipulates that the conversion from unsigned short to a larger unsigned
> int is done by zero padding and not sign (sign with an unsigned?) extending.

Last I looked, K&R never refer to "unsigned long" or "unsigned short".
In fact I don't think that unsigned was a modifier back then... it was
a type.

I guess that's why K&R never brings that up...
-- 
-- `-_-' Peter (have you hugged your wolf today) da Silva.
--   U   Ferranti International Controls Corporation.
-- Phone: 713-274-5180. CI$: 70216,1076. ICBM: 29 37 N / 95 36 W.
-- UUCP: {uunet,academ!uhnix1,bellcore!tness1}!sugar!ficc!peter.

iiit-sh@cybaswan.UUCP (s.hosgood) (07/09/88)

Here at Swansea, we used to have an ancient C-compiler in the days
of V6 Un*x - in those days there was no 'unsigned' type, many people used to
use 'char *' as a kludge.

When K&R first came out it was in the 'documents with unix' distribution, but
on V7 it wasn't distributed as the authors decided to sell the book by then. I
presume they missed points like the sign-extension properties of 'unsigned'
during revision...

I found a similar problem to the above-mentioned occurs with assigning
constants. K&R don't really seem to specify if they consider a constant to
be signed or unsigned. However, they *do* seem to say that constants behave
like 'int's unless they are too long to be 'int', in which case they magically
become 'long'. I once actually had to write a compiler from scratch, and the
assignment:

	long	temp;

	temp = 0xFFFF;

..gave rise to the question of what to do? - treat the constant as an 'int'
of -1 and sign-extend, or treat it as an unsigned bit-pattern and just
zero-fill it? I chose the latter, but I never found a satisfactory answer
from K&R!

bill@proxftl.UUCP (T. William Wells) (07/13/88)

In article <28@cybaswan.UUCP>, iiit-sh@cybaswan.UUCP (s.hosgood) writes:
> I found a similar problem to the above-mentioned occurs with assigning
> constants. K&R don't really seem to specify if they consider a constant to
> be signed or unsigned. However, they *do* seem to say that constants behave
> like 'int's unless they are too long to be 'int', in which case they magically
> become 'long'. I once actually had to write a compiler from scratch, and the
> assignment:
>
>       long    temp;
>
>       temp = 0xFFFF;
>
> ..gave rise to the question of what to do? - treat the constant as an 'int'
> of -1 and sign-extend, or treat it as an unsigned bit-pattern and just
> zero-fill it? I chose the latter, but I never found a satisfactory answer
> from K&R!

Quoting K&R from 2.4.1, p.180:

"An integer constant is...

...A decimal constant whose value exceeds the largest signed
machine integer is taken to be long; an octal or hex constant
which exceeds the largest unsigned machine integer is likewise
taken to be long."

As far as I know, K&R does not speak of unsigned constants; that
version of C simply does not have them.

Assuming two's complement and two byte integers, 0xFFFF
represents the integer constant whose bit pattern is 0xFFFF,
i.e., the value -1. So, counter-intuitive or not, that temp =
0xFFFF; ought to assign -1 to temp. I'm not saying that I think
that is the proper language definition; only that a reading of
K&R that does not include one's own wishing does not support
unsigned constants or the assignment of 0xFFFF to temp.