[comp.lang.c] Comparing chars to constants

david@elroy.Jpl.Nasa.Gov (David Robinson) (10/18/87)

I have run into a difference between the SunOS 3.4 C compiler
and the Masscomp C compiler for the following segment
of code:

foo(p)
	char p;
{
	if (p == 0x80)
		return (1);
	return (0);
}


On a Sun, it generates the code that is commonly expected, but on a Masscomp
it gives a warning "Comparison always false" and optimizes away the
if statement.  In reading K&R I was unsure of what is to happen;
under 7.6 and 7.7 it claims that the "usual arithmetic conversions are
performed".  From 6.6 both sides are converted into ints; 0x80 stays the
same, but if p is equal to 0x80 it may always be sign extended on
some machines depending on its character sign conventions.  Does this
imply that the above comparison is truly non-portable? Both machines
give the same result if it is rewritten as:

	if (p == (char)0x80)


-- 
	David Robinson		elroy!david@csvax.caltech.edu     ARPA
				david@elroy.jpl.nasa.gov (new)
				seismo!cit-vax!elroy!david UUCP
Disclaimer: No one listens to me anyway!

ark@alice.UUCP (10/18/87)

In article <4663@elroy.Jpl.Nasa.Gov>, david@elroy.UUCP writes:
> From 6.6 both sides are converted into ints, 0x80 stays the
> same but if p is equal to 0x80 it may always be sign extended on
> some machines depending on its character sign conventions.

(the original question had to do with the expression  c==0x80  where
 c is a character variable)

Chars are indeed extended to ints for comparison, and that
extension may or may not involve sign extension.  Saying

	c == (char) 0x80

may indeed be more portable, as one would hope 0x80 will be
converted to a char and then undergo the same sign extension
as c, but I wouldn't want to bet on the compiler getting it
right.  Instead, consider making c an unsigned char or casting it:

	((unsigned char) c) == 0x80

(the outer parentheses are there for clarity)
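
A minimal sketch of the unsigned char cast, for anyone who wants to try it.
The function name is made up for illustration.  The conversion to unsigned
char happens before the promotion to int, so the promoted value can never
be negative, whatever the signedness of plain char:

```c
/* Hypothetical helper, not from the original post. */
int is_byte_0x80(char c)
{
	/* (unsigned char)c lies in 0..UCHAR_MAX, so promoting it to int
	 * cannot sign extend it. */
	return (unsigned char)c == 0x80;
}
```

Called as is_byte_0x80(p), this should give the same answer whether char
is signed or unsigned.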

guy%gorodish@Sun.COM (Guy Harris) (10/19/87)

> On a Sun, it generates the code that is commonly expected, but on a Masscomp
> it gives a warning "Comparison always false" and optimizes away the
> if statement.

Umm, on a Sun running 3.4, what it does is say

"foo.c", line 4: warning: constant 128 is out of range of char comparison 
"foo.c", line 4: warning: value coerced to -128 for bug compatibility
"foo.c", line 4: warning: do not expect this coercion in release 4.0

and then generate the code that is commonly, but incorrectly, expected.  When
it says "do not expect this coercion", it MEANS it; this WILL go away, and the
behavior WILL be the same as it is on the Masscomp compiler.  (You will get a
warning from the compiler when it does this.)

> In reading K&R I was unsure of what is to happen; under 7.6 and 7.7 it
> claims that the "usual arithmetic conversions are performed".  From 6.6
> both sides are converted into ints; 0x80 stays the same, but if p is equal
> to 0x80 it may always be sign extended on some machines depending on its
> character sign conventions.  Does this imply that the above comparison
> is truly non-portable?

You bet!  It will work just fine on a 3B[2-20], since "char" is unsigned on
those machines' C implementations (which means you'd better not do something
such as:

	char c;

	if ((c = getchar()) == EOF)

since on those machines it won't do what you might expect), but it won't work
on machines on whose C implementations "char" is signed, unless they don't
correctly implement the "usual arithmetic conversions" (that's why it says "bug
compatibility"; it's providing bug-for-bug compatibility with compilers in
earlier 3.x releases).
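
For the getchar() case, a sketch of the usual remedy: keep the result in an
int until it has been compared with EOF.  The helper name count_bytes is
hypothetical, and it reads from a FILE * with getc() rather than from stdin
only so that it is easy to exercise.  With an unsigned char variable the
test against EOF can never succeed; with a signed char, a data byte such as
0xFF can be mistaken for EOF on implementations where EOF is -1.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining on fp. */
long count_bytes(FILE *fp)
{
	int c;		/* int, not char: holds every byte value plus EOF */
	long n = 0;

	while ((c = getc(fp)) != EOF)
		n++;
	return n;
}
```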

> Both machines give the same result if it is rewritten as:
> 
> 	if (p == (char)0x80)

Which is what you should write here.  Alternatively, you can write

	if (p == '\200')

	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

ron@topaz.rutgers.edu (Ron Natalie) (10/19/87)

Correct.  The character comparison "c == 0x80" is non-portable since
chars may or may not be unsigned innately.  On machines where chars
are signed, a char holding 0x80 would likely become -0x80 (e.g. with
32-bit ints, 0xFFFFFF80) when converted to int (usual arithmetic
conversions).  The compiler in question is being smart in noting that no
conversion of a char to an int yields a value outside the range -0x80
to +0x7F.

Using the (char) cast on the constant narrows it to char first, so the
char-to-int conversion is then applied to the constant as well, yielding
the same sign extension.  This doesn't happen otherwise because 0x80 is
already of type int, so no conversion is called for.
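
The two comparisons can be put side by side in a small sketch.  All three
function names are invented for illustration, and the constant is passed
as an int parameter so that the compiler does not fold the test away.
Note that narrowing 0x80 to a signed char is implementation-defined; the
comment below describes what common two's complement compilers do.

```c
#include <limits.h>

/* Hypothetical helpers, not from the thread. */

/* p is promoted to int; k is already an int and is left alone. */
int equals_int(char p, int k)
{
	return p == k;
}

/* k is narrowed to char first (typically to -128 for 0x80 when char
 * is signed), then promoted exactly as p is. */
int equals_as_char(char p, int k)
{
	return p == (char)k;
}

/* True if some char value can promote to k at all. */
int reachable_from_char(int k)
{
	return k >= CHAR_MIN && k <= CHAR_MAX;
}
```

With signed 8-bit chars, reachable_from_char(0x80) is false, which is the
fact behind the "Comparison always false" warning.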

-Ron

henry@utzoo.UUCP (Henry Spencer) (10/19/87)

> ... Alternatively, you can write
> 
> 	if (p == '\200')

That's what I would do.  I find that such problems are largely avoided,
and (in my opinion) the code is clearer, if one treats char as a distinct
type and intermixes it with ints only in certain situations (e.g. using
an int to hold a char value or EOF).  Following that rule, when one wants
to compare a char to a constant, one uses a char constant, not an int
constant.  (Yes, I look for a string terminator as '\0', not 0.)

For those barbarous folk who prefer hexadecimal to octal (God clearly meant
man to use octal, the thumbs are parity bits), X3J11 has added hex string
escapes.
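
For reference, a sketch of both spellings.  In ANSI C the hex escape is
written \x followed by hex digits, so '\x80' names the same character as
'\200'.  The function names are hypothetical.  A character constant takes
the value that a char holding that code has after promotion to int, so it
matches p under either signedness convention.

```c
/* Hypothetical helpers, not from the thread. */
int is_char_0200(char p)
{
	/* '\200' is promoted the same way p is, sign extension included. */
	return p == '\200';
}

int same_constant(void)
{
	/* Octal and hex escapes for the same character code. */
	return '\200' == '\x80';
}
```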
-- 
"Mir" means "peace", as in           |  Henry Spencer @ U of Toronto Zoology
"the war is over; we've won".        | {allegra,ihnp4,decvax,utai}!utzoo!henry

jwhitnel@csib.UUCP (10/21/87)

In article <7373@alice.UUCP> ark@alice.UUCP writes:
|In article <4663@elroy.Jpl.Nasa.Gov>, david@elroy.UUCP writes:
|> From 6.6 both sides are converted into ints, 0x80 stays the
|> same but if p is equal to 0x80 it may always be sign extended on
|> some machines depending on its character sign conventions.
|
|(the original question had to do with the expression  c==0x80  where
| c is a character variable)
|
|Chars are indeed extended to ints for comparison, and that
|extension may or may not involve sign extension.  Saying
|
|	c == (char) 0x80
|
|may indeed be more portable, as one would hope 0x80 will be
|converted to a char and then undergo the same sign extension
|as c, but I wouldn't want to bet on the compiler getting it
|right.  Instead, consider making c an unsigned char or casting it:
|
|	((unsigned char) c) == 0x80
|
|(the outer parentheses are there for clarity)

If portability is of concern, some compilers don't support unsigned chars.
The preferred technique is 

    ( ( (int) c ) & 0xff ) == 0x80

The (int) cast can be left off.
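
A sketch of the masking approach with the (int) cast left off.  The name
low8_equals is made up, and the code assumes 8-bit chars and two's
complement arithmetic:

```c
/* Hypothetical helper; assumes 8-bit chars and two's complement. */
int low8_equals(char c, int k)
{
	/* c is promoted to int, then the mask discards any sign-extension
	 * bits the promotion may have added. */
	return (c & 0xff) == k;
}
```

So low8_equals(c, 0x80) uses no unsigned types at all.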

Jerry Whitnell                           It's a damn poor mind that can only
Communication Solutions, Inc.            think of one way to spell a word.
						-- Andrew Jackson

dant@tekla.UUCP (10/22/87)

Henry Spencer writes:
>
>For those barbarous folk who prefer hexadecimal to octal (God clearly meant
>man to use octal, the thumbs are parity bits), X3J11 has added hex string
>escapes.

What's the format of these hex string escapes?  [If God wanted people to use
octal, why are computers made with 16 or 32 bit words?  (Yes, I know some
computers have 36 bit words, but they are clearly meant for space aliens
from Arcturus.)]

---
Dan Tilque
dant@tekla.tek.com  or dant@tekla.UUCP

"Some people at Bell Labs, of course, will suggest that in order to define
UNIX accurately, you have to know what Dennis Ritchie is running on his
workstation at the time."
				-- Rob Kolstad

franka@mmintl.UUCP (Frank Adams) (10/23/87)

In article <4663@elroy.Jpl.Nasa.Gov> david@elroy.Jpl.Nasa.Gov (David Robinson) writes:
>	if (p == 0x80)

As several people have noted, this is indeed not portable.  I agree that the
best way to write this is:

	if (p == '\200')

or whatever the ANSI syntax is for hex constants in strings/characters.  I
will note that one other way to write this, with the new ANSI standard,
would be:

	if (p == 0x80u)

This will coerce p to unsigned.
-- 

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

karl@haddock.ISC.COM (Karl Heuer) (10/23/87)

In article <1261@csib.UUCP> jwhitnel@csib.UUCP (Jerry Whitnell) writes:
>The preferred technique is    ( ( (int) c ) & 0xff ) == 0x80

This assumes that char is 8 bits.  (Though the use of "0x80" may already be
making that assumption.)  I'd go with "c == (char)0x80", and if the compiler
can't handle it, send in a bug report and/or change vendors.
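
If the 8-bit assumption is the worry, one possible sketch derives the
constant from CHAR_BIT in <limits.h> instead of writing 0x80.  The function
name is hypothetical, and this assumes the intent of the original test was
"only the top bit of the char is set":

```c
#include <limits.h>

/* Hypothetical helper: true if c has only its most significant bit set. */
int is_top_bit_only(char c)
{
	unsigned top = 1u << (CHAR_BIT - 1);	/* 0x80 when CHAR_BIT is 8 */

	/* Go through unsigned char so no sign extension can occur. */
	return (unsigned)(unsigned char)c == top;
}
```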

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

chris@mimsy.UUCP (Chris Torek) (10/26/87)

In article <2515@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>... I will note that one other way to write this, with the new ANSI
>standard, would be:
>
>	if (p == 0x80u)
>
>This will coerce p to unsigned.

 . . . which means that if p is a signed character, it will extend
from 0x80 == -128 to 0xff80 (assuming 16 bit int), and the comparison
will still always be false.

Incidentally, there is a bug in certain Sun compilers that causes

	char c = 0x80;
	if (c == (int)(char)0x80) ...

to fail.  Changing the code to

	if (c == (int)(char)(int)0x80) ...

makes the test succeed.  Curiously,

	char c = 0x80, d = 0x80;
	if (c == (int)(char)d) ...

succeeds without the extra (int) cast.

Chris
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

karl@haddock.ISC.COM (Karl Heuer) (10/27/87)

In article <2515@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>In article <4663@elroy.Jpl.Nasa.Gov> david@elroy.Jpl.Nasa.Gov (David Robinson) writes:
>	if (p == 0x80u)
>This will coerce p to unsigned.

Yes it will, but that doesn't give you the right answer.  If p is a (signed)
char containing 0x80, the left side is (unsigned)(int)p, which is 0xffffff80U
(with 32-bit ints), not 0x00000080U.
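
A sketch that spells the conversion out.  The function names are invented
for illustration; the casts only make explicit what the usual arithmetic
conversions do implicitly in p == 0x80u.

```c
#include <limits.h>

/* Hypothetical helpers, not from the thread. */

/* The value p takes on the left of p == 0x80u: char -> int -> unsigned. */
unsigned widened(char p)
{
	return (unsigned)(int)p;
}

/* Same comparison as p == 0x80u, with the conversion written out. */
int equals_0x80u(char p)
{
	return widened(p) == 0x80u;
}
```

Where char is signed, widened((char)0x80) is UINT_MAX - 0x7F, so
equals_0x80u() returns 0; where char is unsigned it is 0x80 and the test
succeeds.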

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint