david@elroy.Jpl.Nasa.Gov (David Robinson) (10/18/87)
I have run into a difference between the SunOS 3.4 C compiler
and the Masscomp C compiler for the following segment
of code:
foo(p)
char p;
{
    if (p == 0x80)
        return (1);
    return (0);
}
On a Sun, it generates the code that is commonly expected, but on a Masscomp
it gives a warning "Comparison always false" and optimizes away the
if statement. In reading K&R I was unsure of what is supposed to happen;
under 7.6 and 7.7 it claims that the "usual arithmetic conversions are
performed". From 6.6 both sides are converted into ints; 0x80 stays the
same, but if p is equal to 0x80 it may always be sign extended on
some machines depending on their character sign conventions. Does this
imply that the above comparison is truly non-portable? Both machines
give the same result if it is rewritten as:
    if (p == (char)0x80)
--
David Robinson elroy!david@csvax.caltech.edu ARPA
david@elroy.jpl.nasa.gov (new)
seismo!cit-vax!elroy!david UUCP
Disclaimer: No one listens to me anyway!
ark@alice.UUCP (10/18/87)
In article <4663@elroy.Jpl.Nasa.Gov>, david@elroy.UUCP writes:
> From 6.6 both sides are converted into ints, 0x80 stays the
> same but if p is equal to 0x80 it may always be sign extended on
> some machines depending on its character sign conventions.
(the original question had to do with the expression c==0x80 where
c is a character variable)
Chars are indeed extended to ints for comparison, and that
extension may or may not involve sign extension. Saying
    c == (char) 0x80
may indeed be more portable, as one would hope 0x80 will be
converted to a char and then undergo the same sign extension
as c, but I wouldn't want to bet on the compiler getting it
right. Instead, consider making c an unsigned char or casting it:
    ((unsigned char) c) == 0x80
(the outer parentheses are there for clarity)
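A minimal sketch of the cast suggested above, assuming 8-bit chars and two's complement. The names is_0x80 and check_is_0x80 are hypothetical and do not come from the thread.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 when c holds the byte value 0x80,
   whether plain char is signed or unsigned on this implementation. */
int is_0x80(char c)
{
    /* Converting to unsigned char first yields a value in 0..UCHAR_MAX,
       so the promotion to int cannot sign extend it. */
    return (unsigned char)c == 0x80;
}

/* Self-check; assumes 8-bit chars and two's complement. */
void check_is_0x80(void)
{
    signed char s = -128;     /* bit pattern 0x80 as a signed char */
    unsigned char u = 0x80;   /* the same bit pattern as an unsigned char */

    assert(is_0x80((char)s));
    assert(is_0x80((char)u));
    assert(!is_0x80('A'));
}
```

Calling check_is_0x80() from main exercises both signedness cases without depending on how plain char behaves.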
guy%gorodish@Sun.COM (Guy Harris) (10/19/87)
> On a Sun, it generates the code that is commonly expected, but on a Masscomp
> it gives a warning "Comparison always false" and optimizes away the
> if statement.
Umm, on a Sun running 3.4, what it does is say
    "foo.c", line 4: warning: constant 128 is out of range of char comparison
    "foo.c", line 4: warning: value coerced to -128 for bug compatibility
    "foo.c", line 4: warning: do not expect this coercion in release 4.0
and then generate the code that is commonly, but incorrectly, expected. When
it says "do not expect this coercion", it MEANS it; this WILL go away, and the
behavior WILL be the same as it is on the Masscomp compiler. (You will get a
warning from the compiler when it does this.)
> In reading K&R I was unsure of what is to happen, under 7.6 and 7.7 it
> claims that the "usual arithmetic conversions are performed". From 6.6
> both sides are converted into ints, 0x80 stays the same but if p is equal
> to 0x80 it may always be sign extended on some machines depending on its
> character sign conventions. Does this imply that the above comparison
> is truly non-portable?
You bet! It will work just fine on a 3B[2-20], since "char" is unsigned on
those machines' C implementations (which means you'd better not do something
such as:
    char c;
    if ((c = getchar()) == EOF)
since on those machines it won't do what you might expect), but it won't work
on machines on whose C implementations "char" is signed, unless they don't
correctly implement the "usual arithmetic conversions" (that's why it says
"bug compatibility"; it's providing bug-for-bug compatibility with compilers
in earlier 3.x releases).
> Both machines give the same result if it is rewritten as:
>
> if (p == (char)0x80)
Which is what you should write here. Alternatively, you can write
    if (p == '\200')
Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
guy@sun.com
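The getchar() pitfall mentioned above can be sketched without doing any I/O. Here r stands for a value getchar() might return (a byte in 0..UCHAR_MAX, or EOF), and the three function names are hypothetical. The sketch assumes 8-bit chars.

```c
#include <assert.h>
#include <stdio.h>   /* EOF */

/* What the test computes when the result is first stored in an
   unsigned char, as plain char behaves on unsigned-char machines. */
int eof_test_unsigned(int r)
{
    unsigned char c = (unsigned char)r;
    return c == EOF;   /* c promotes to a nonnegative int; EOF is negative */
}

/* What the test computes when the result is first stored in a signed char. */
int eof_test_signed(int r)
{
    signed char c = (signed char)r;
    return c == EOF;   /* a data byte may now collide with EOF */
}

/* Keeping the result in an int preserves the distinction. */
int eof_test_int(int r)
{
    return r == EOF;
}

void check_eof_tests(void)
{
    assert(!eof_test_unsigned(EOF));   /* end of file is never detected */
    assert(eof_test_int(EOF));
    assert(!eof_test_int(0xFF));       /* byte 0xFF is ordinary data */
}
```

On implementations where EOF is -1 and conversion to signed char wraps, eof_test_signed(0xFF) also returns 1, so a data byte is mistaken for end of file. Declaring the variable as int avoids both failures.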
ron@topaz.rutgers.edu (Ron Natalie) (10/19/87)
Correct. The character comparison "c == 0x80" is non-portable since chars may
or may not be unsigned innately. On machines where chars are signed, a char
holding 0x80 would likely become -0x80 (e.g. with 32-bit ints, 0xFFFFFF80)
when converted to int (usual arithmetic conversions). The compiler in question
is being smart in noting that no conversion of a char to an int yields a value
outside the range -0x80 to +0x7F. Using the (char) cast on the constant causes
the conversion of char to int to occur on the constant, yielding the sign
extension. This doesn't happen otherwise because 0x80 is already of type int,
so no conversion is called for.
-Ron
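The range argument above can be checked by brute force. This is only a sketch, assuming 8-bit chars; the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if some signed char value compares equal to k once it has
   been promoted to int. */
int signed_char_can_equal(int k)
{
    int v;

    for (v = SCHAR_MIN; v <= SCHAR_MAX; v++) {
        signed char c = (signed char)v;   /* v is in range, value preserved */
        if (c == k)                       /* c is promoted to int here */
            return 1;
    }
    return 0;
}

/* The same search over every unsigned char value. */
int unsigned_char_can_equal(int k)
{
    int v;

    for (v = 0; v <= UCHAR_MAX; v++) {
        unsigned char c = (unsigned char)v;
        if (c == k)
            return 1;
    }
    return 0;
}

void check_ranges(void)
{
    assert(!signed_char_can_equal(0x80));   /* "comparison always false" */
    assert(signed_char_can_equal(0x7F));
    assert(unsigned_char_can_equal(0x80));
}
```

With signed chars no value ever reaches 0x80, which is what justifies the warning; with unsigned chars exactly one value does.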
henry@utzoo.UUCP (Henry Spencer) (10/19/87)
> ... Alternatively, you can write
>
> if (p == '\200')
That's what I would do. I find that such problems are largely avoided, and
(in my opinion) the code is clearer, if one treats char as a distinct type
and intermixes it with ints only in certain situations (e.g. using an int to
hold a char value or EOF). Following that rule, when one wants to compare a
char to a constant, one uses a char constant, not an int constant. (Yes, I
look for a string terminator as '\0', not 0.)
For those barbarous folk who prefer hexadecimal to octal (God clearly meant
man to use octal, the thumbs are parity bits), X3J11 has added hex string
escapes.
--
"Mir" means "peace", as in       | Henry Spencer @ U of Toronto Zoology
"the war is over; we've won".    | {allegra,ihnp4,decvax,utai}!utzoo!henry
jwhitnel@csib.UUCP (10/21/87)
In article <7373@alice.UUCP> ark@alice.UUCP writes:
|In article <4663@elroy.Jpl.Nasa.Gov>, david@elroy.UUCP writes:
|> From 6.6 both sides are converted into ints, 0x80 stays the
|> same but if p is equal to 0x80 it may always be sign extended on
|> some machines depending on its character sign conventions.
|
|(the original question had to do with the expression c==0x80 where
| c is a character variable)
|
|Chars are indeed extended to ints for comparison, and that
|extension may or may not involve sign extension. Saying
|
|    c == (char) 0x80
|
|may indeed be more portable, as one would hope 0x80 will be
|converted to a char and then undergo the same sign extension
|as c, but I wouldn't want to bet on the compiler getting it
|right. Instead, consider making c an unsigned char or casting it:
|
|    ((unsigned char) c) == 0x80
|
|(the outer parentheses are there for clarity)
If portability is of concern, some compilers don't support unsigned chars.
The preferred technique is
    ( ( (int) c ) & 0xff ) == 0x80
The (int) cast can be left off.
Jerry Whitnell                   It's a damn poor mind that can only
Communication Solutions, Inc.    think of one way to spell a word.
                                     -- Andrew Jackson
dant@tekla.UUCP (10/22/87)
Henry Spencer writes:
>
>For those barbarous folk who prefer hexadecimal to octal (God clearly meant
>man to use octal, the thumbs are parity bits), X3J11 has added hex string
>escapes.
What's the format of these hex string escapes?
[If God wanted people to use octal, why are computers made with 16 or 32 bit
words? (Yes, I know some computers have 36 bit words, but they are clearly
meant for space aliens from Arcturus.)]
---
Dan Tilque
dant@tekla.tek.com or dant@tekla.UUCP
"Some people at Bell Labs, of course, will suggest that in order to define
UNIX accurately, you have to know what Dennis Ritchie is running on his
workstation at the time." -- Rob Kolstad
franka@mmintl.UUCP (Frank Adams) (10/23/87)
In article <4663@elroy.Jpl.Nasa.Gov> david@elroy.Jpl.Nasa.Gov (David Robinson) writes:
> if (p == 0x80)
As several people have noted, this is indeed not portable. I agree that the
best way to write this is:
    if (p == '\200')
or whatever the ANSI syntax is for hex constants in strings/characters.
I will note that one other way to write this, with the new ANSI standard,
would be:
    if (p == 0x80u)
This will coerce p to unsigned.
--
Frank Adams ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate 52 Oakland Ave North E. Hartford, CT 06108
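For reference, the ANSI C hexadecimal escape is written \x followed by hex digits, so '\x80' denotes the same character constant as '\200'. A short sketch, with hypothetical function names:

```c
#include <assert.h>

/* Comparison against the octal escape, as suggested in the thread. */
int matches_octal(char p)
{
    return p == '\200';
}

/* Comparison against the ANSI C hexadecimal escape. */
int matches_hex(char p)
{
    return p == '\x80';
}

void check_escapes(void)
{
    char p = '\200';

    assert('\200' == '\x80');     /* two spellings of one constant */
    assert(matches_octal(p));
    assert(matches_hex(p));
    assert(!matches_octal('A'));
}
```

Because the constant undergoes the same conversion to int as p does, the comparison holds whether plain char is signed or unsigned.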
karl@haddock.ISC.COM (Karl Heuer) (10/23/87)
In article <1261@csib.UUCP> jwhitnel@csib.UUCP (Jerry Whitnell) writes:
>The preferred technique is ( ( (int) c ) & 0xff ) == 0x80
This assumes that char is 8 bits. (Though the use of "0x80" may already be
making that assumption.) I'd go with "c == (char)0x80", and if the compiler
can't handle it, send in a bug report and/or change vendors.
Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
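If the masking approach is wanted anyway, one way to avoid hard-coding the 8-bit assumption into the mask is to use UCHAR_MAX from <limits.h>. This is a sketch only, assuming two's complement; byte_value is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: the value of c as an unsigned byte, without
   writing the width of char into the mask. */
int byte_value(char c)
{
    return c & UCHAR_MAX;   /* c is promoted to int before the AND */
}

void check_byte_value(void)
{
    char c = '\200';

    assert(byte_value(c) == 0200);    /* same result for signed or unsigned char */
    assert(byte_value('A') == 'A');
    assert(byte_value('\0') == 0);
}
```

The comparison from the thread then reads byte_value(c) == 0x80, and the mask follows whatever width the implementation gives char.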
chris@mimsy.UUCP (Chris Torek) (10/26/87)
In article <2515@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>... I will note that one other way to write this, with the new ANSI
>standard, would be:
>
> if (p == 0x80u)
>
>This will coerce p to unsigned.
. . . which means that if p is a signed character, it will extend from
0x80 == -128 to 0xff80 (assuming 16 bit int), and the comparison will
still always be false.
Incidentally, there is a bug in certain Sun compilers that causes
    char c = 0x80;
    if (c == (int)(char)0x80) ...
to fail. Changing the code to
    if (c == (int)(char)(int)0x80) ...
makes the test succeed. Curiously,
    char c = 0x80, d = 0x80;
    if (c == (int)(char)d) ...
succeeds without the extra (int) cast.
Chris
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
karl@haddock.ISC.COM (Karl Heuer) (10/27/87)
In article <2515@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>In article <4663@elroy.Jpl.Nasa.Gov> david@elroy.Jpl.Nasa.Gov (David Robinson) writes:
> if (p == 0x80u)
>This will coerce p to unsigned.
Yes it will, but that doesn't give you the right answer. If p is a (signed)
char containing 0x80, the left side is (unsigned)(int)p which is 0xffffff80U,
not 0x00000080U.
Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
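The conversion described above can be checked directly. This is a sketch assuming 8-bit two's-complement chars; the function names are hypothetical, and the expected value is written in terms of UINT_MAX so that it holds for both 16-bit and 32-bit ints.

```c
#include <assert.h>
#include <limits.h>

/* The comparison proposed with the ANSI "u" suffix. */
int eq_0x80u(signed char p)
{
    /* p is promoted to int, then converted to unsigned int to match 0x80u. */
    return p == 0x80u;
}

/* The value the left side takes after those conversions. */
unsigned int as_unsigned(signed char p)
{
    return (unsigned int)p;
}

void check_unsigned_coercion(void)
{
    signed char p = -128;   /* bit pattern 0x80 */

    assert(!eq_0x80u(p));                        /* still always false */
    assert(as_unsigned(p) == UINT_MAX - 127u);   /* 0xffffff80 with 32-bit int */
    assert(as_unsigned(p) != 0x80u);
}
```

Converting to unsigned happens after sign extension, so the high bits are already set by the time the comparison is made.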