[comp.lang.c] 0xFF != '\xFF' ?

daveh@marob.uucp (Dave Hammond) (04/08/91)

In the following code examples, the '\nnn' representation causes the
compiler to sign-extend 8-bit values, causing comparisons to fail (255
!= -1).  This occurred on every machine I tested (Generic 386, Altos
386, DEC uVax).  Is this because the '\nnn' token is seen as a char
value and chars are signed on the machines I tested?  Or must I
expect this result from all C compilers?


/* c1.c */
#include <stdio.h>

int main(void)
{
	int a = '\xFF';	/* character constant: goes through type char first */
	printf("a=%o\n",a);
	return 0;
}

/* c2.c */
#include <stdio.h>

int main(void)
{
	int a = 0xFF;	/* integer constant: plain int 255 */
	printf("a=%o\n",a);
	return 0;
}


output from diff c1.s c2.s:
4c4
< 	TITLE	$c1
---
> 	TITLE	$c2
33c33
< 	mov  	DWORD PTR [ebp-4], -1
---
> 	mov  	DWORD PTR [ebp-4], 255

--
Dave Hammond
daveh@marob.uucp
uunet!rutgers!phri!marob!daveh

grogers@convex.com (Geoffrey Rogers) (04/10/91)

In article <28007837.35A9@marob.uucp> daveh@marob.uucp (Dave Hammond) writes:
>In the following code examples, the '\nnn' representation causes the
>compiler to sign-extend 8 bit values, causing comparisons to fail (255
>!= -1).  This occurred on every machine I tested (Generic 386, Altos
>386, DEC uVax).  Is this because the '\nnn' token being seen as a char
>value and chars are signed on the machines I tested on?  

Yes, this is correct. This is because char's can be either signed or
unsigned. On the machines you mention, char is signed.

>Or must I expect this result from all C compilers?

No.

+------------------------------------+---------------------------------+
| Geoffrey C. Rogers   		     | "Whose brain did you get?"      |
| grogers@convex.com                 | "Abbie Normal!"                 |
| {sun,uunet,uiucdcs}!convex!grogers |                                 |
+------------------------------------+---------------------------------+

gwyn@smoke.brl.mil (Doug Gwyn) (04/10/91)

In article <28007837.35A9@marob.uucp> daveh@marob.uucp (Dave Hammond) writes:
>Is this because the '\nnn' token being seen as a char value and chars are
>signed on the machines I tested on?

Essentially, yes.  The value 255 gets stuffed into a char then converted
to type int, which propagates the sign bit when char acts as a signed type.
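
The two outcomes can be reproduced on one machine by naming the signedness explicitly instead of relying on the plain-char default. This is only a sketch: the helper names are made up for illustration, and the -1 result assumes 8-bit chars on a two's-complement machine, like the ones in the original post.

```c
#include <assert.h>

/* Hypothetical helpers, not from the thread. */

/* Widen a byte the way a signed-char machine does: the sign bit propagates. */
int widen_signed(int byte)
{
    signed char c = (signed char)byte;  /* 0xFF becomes -1 (assumes 8-bit, two's complement) */
    return c;                           /* char -> int keeps the value -1 */
}

/* Widen a byte the way an unsigned-char machine does: zero-extended. */
int widen_unsigned(int byte)
{
    unsigned char c = (unsigned char)byte;  /* 0xFF stays 255 */
    return c;
}

/* Self-check of both cases for the byte 0xFF. */
void check_widening(void)
{
    assert(widen_signed(0xFF) == -1);
    assert(widen_unsigned(0xFF) == 255);
}
```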

steve@taumet.com (Stephen Clamage) (04/10/91)

grogers@convex.com (Geoffrey Rogers) writes:

|In article <28007837.35A9@marob.uucp> daveh@marob.uucp (Dave Hammond) writes:
|>....  Is this because the '\nnn' token being seen as a char
|>value and chars are signed on the machines I tested on?  

|Yes, this is correct. This is because char's can be either signed or
|unsigned. On the machines you mention, char is signed.

|>Or must I expect this result from all C compilers?

|No.

Let me also point out that you must be careful of code which depends on
whether the default char type is signed or unsigned; sooner or later
it will be ported to an environment where it will no longer work.

When possible, use explicit declarations of 'signed char' or 'unsigned
char'.  (Unfortunately, not all compilers accept the 'signed' keyword.)

In the posted example, you could do something like:
	int i = (unsigned char) '\xFF';
although it is unclear to me why this is better than using a non-char value:
	int i = 0xFF;
-- 

Steve Clamage, TauMetric Corp, steve@taumet.com

wolfram@cip-s08.informatik.rwth-aachen.de (Wolfram Roesler) (04/12/91)

It's best to compare in the following way:
	char x = -1;
	char y = 0xff;
	if ((unsigned char)x == (unsigned char)y)
	  ...

... so before comparing, cast both to unsigned char. This is because you
do not know whether plain char is signed or unsigned (the language
definition leaves that to the implementation).
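
A sketch of that comparison wrapped in a function; same_byte is a name invented for this example. The test has to use ==, since a single = would be an attempted assignment rather than a comparison.

```c
#include <assert.h>

/* Hypothetical helper: compare two chars by their byte values,
   independent of whether plain char is signed or unsigned. */
int same_byte(char a, char b)
{
    return (unsigned char)a == (unsigned char)b;
}

/* Self-check using the values from the post. */
void check_same_byte(void)
{
    char x = -1;
    char y = (char)0xff;    /* assumes 8-bit chars; the conversion is implementation-defined where char is signed */

    assert(same_byte(x, y));
    assert(!same_byte('a', 'b'));
}
```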

Sepp@ppcger.ppc.sub.org (Josef Wolf) (04/15/91)

wolfram@cip-s08.informatik.rwth-aachen.de (Wolfram Roesler) writes:
] It's best to compare in the following way:
] 	char x = -1;
] 	char y = 0xff;
] 	if ((unsigned char)x == (unsigned char)y)
] 	  ...
] ... so before comparing, cast both to unsigned char. This because you
] do not know (because it's undefined by the language definition) if char
] is unsigned or not.

You don't even know the number of bits in a char (or is it defined?),
so you can get into trouble with this.

Greetings
    Sepp

| Josef Wolf, Germersheim, Germany | +49 7274 8047  -24 Hours- (call me :-) |
|     sepp@ppcger.ppc.sub.org      | +49 7274 8048  -24 Hours-              |
| ...!ira.uka.de!smurf!ppcger!sepp | +49 7274 8967  18:00-8:00, Sa + Su 24h |
| ----=> kommt Zeit => kommt Frau  | all lines  300/1200/2400 bps 8n1       |
|  kommt Frau => geht Zeit, geht Zeit => geht Frau, geht Frau => kommt Zeit |

wolfram@cip-s02.informatik.rwth-aachen.de (Wolfram Roesler) (04/18/91)

Sepp@ppcger.ppc.sub.org (Josef Wolf) writes:

>wolfram@cip-s08.informatik.rwth-aachen.de (Wolfram Roesler) writes:
>] It's best to compare in the following way:
>] 	char x = -1;
>] 	char y = 0xff;
>] 	if ((unsigned char)x == (unsigned char)y)
>] 	  ...
>] ... so before comparing, cast both to unsigned char. This because you
>] do not know (because it's undefined by the language definition) if char
>] is unsigned or not.

>You even don't know about the number of bits of a char (or is it defined?)
>So you can get into trouble with this.

A char is defined to contain a single character, so this is usually 8 bits.
However, I don't think my program will cause trouble, since
	char x = -1
includes an implicit conversion. Assume a char is 8 bits and an int is 16; then
this line will not simply copy the low 8 bits of -1 into x, but will
preserve the sign (if chars are signed, which we assume here).
Your compiler might warn about assigning a signed value to an unsigned
variable, but that's all.

bhoughto@bishop.intel.com (Blair P. Houghton) (04/19/91)

In article <wolfram.671964442@cip-s02> wolfram@cip-s02.informatik.rwth-aachen.de (Wolfram Roesler) writes:
>Sepp@ppcger.ppc.sub.org (Josef Wolf) writes:
>>You even don't know about the number of bits of a char (or is it defined?)
>>So you can get into trouble with this.
>
>A char is defined to contain a single character, so this is usually 8 bits.

Look in <limits.h> (or grep through your compiler's include-files)
for `CHAR_BIT'.  This is the definition of the number of bits
in a `char'.
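
The same header also reveals whether plain char is signed, through CHAR_MIN. A minimal sketch; the function names are invented for this example, while the macros are the standard ones from <limits.h>.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: nonzero when plain char is a signed type here. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;    /* CHAR_MIN is 0 when plain char is unsigned */
}

/* Print what <limits.h> says about char on this implementation. */
void report_char_limits(void)
{
    assert(CHAR_BIT >= 8);  /* the standard requires at least 8 bits */
    printf("CHAR_BIT=%d CHAR_MIN=%d CHAR_MAX=%d, plain char is %s\n",
           CHAR_BIT, CHAR_MIN, CHAR_MAX,
           plain_char_is_signed() ? "signed" : "unsigned");
}
```

Calling report_char_limits() from main() prints the values for the compiler in use.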

				--Blair
				  "Give me egrep and a place to
				   stand and I will move the earth."

wolfram@cip-s01.informatik.rwth-aachen.de (Wolfram Roesler) (05/08/91)

BTW, isn't a char defined to contain at least 8 bits in ANSI C?
I read something about that in a (rather imprecise, however) book.

worley@compass.com (Dale Worley) (05/08/91)

In article <wolfram.673701563@cip-s01> wolfram@cip-s01.informatik.rwth-aachen.de (Wolfram Roesler) writes:
   Subject: Re: 0xFF != '\xFF' ?

   BTW, isnt a char defined to contain at least 8 bits in Ansi-C?
   I read something about that in a (however rather unprecise) book.

Yes, chars are required to have at least 8 bits.  See section 2.2.4.2
of the standard.

However beware that "character constants" in C really have type int.
The standard states: "If [a] character constant contains a single
character or escape sequence, its value is the one that results when
an object with type char whose value is that of the single character
or escape sequence is converted to type int." (3.1.3.4, page 30, line
33, Dec 88 draft)  This means that if your chars are *signed*,
character constants with the high bit on will be *sign-extended* into
ints.  If you want to avoid this, you will probably have to say

	(unsigned char)'\xFF'

Then the conversion to int is done by zero-extending rather than
sign-extending.
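
Both points can be checked with a few assertions. This is only a sketch: the function names are invented for illustration, and the value of the bare constant assumes 8-bit chars on a two's-complement machine.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: the constant with the cast applied. */
int ff_with_cast(void)
{
    return (unsigned char)'\xFF';   /* 255 whether plain char is signed or not (8-bit chars) */
}

/* Hypothetical helper: the bare constant, as in the original c1.c. */
int ff_bare(void)
{
    return '\xFF';  /* -1 where plain char is signed, 255 where it is unsigned */
}

void check_constants(void)
{
    assert(sizeof 'a' == sizeof(int));  /* character constants have type int in C */
    assert(ff_with_cast() == 0xFF);
    assert(ff_bare() == (CHAR_MIN < 0 ? -1 : 255));
}
```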

In general, signed char's are losing, especially if you have
characters with values >127.

Dale

Dale Worley		Compass, Inc.			worley@compass.com
--
You have joined the Legion to die.  We will send you where you can die.
-- Plaque reputedly posted in mess halls of the French Foreign Legion.