[comp.lang.c] Difference between char and unsigned char

ycy@walt.cc.utexas.edu (Joseph Yip) (07/19/90)

Hi,

The char versus unsigned char problem has been with me for a long time. I
know char represents 7 bits ASCII and unsigned char works with 8-bit.
Most UNIX string functions (strcpy, strcmp, ...) take char * arguments.
malloc() also returns char *, not unsigned char *.

Some systems have char default to 8 bits. Others require you to declare
it explicitly as unsigned char.

If I pass an unsigned char pointer to a function that expects a char
pointer, e.g. 

int foo( char *p);
...
unsigned char *buf;
a = foo(buf);

will there be a problem? Will foo() mask off my 7th-bit?

You know I hate writing the same system library functions where the
only difference is the 7th-bit. 

If I am using ANSI C, the compiler will give me warnings or errors because
of the type mismatch!

Thank you

- Joseph Yip

Email: joseph@zeus.ee.utexas.edu

karl@haddock.ima.isc.com (Karl Heuer) (07/19/90)

In article <34292@ut-emx.UUCP> ycy@walt.cc.utexas.edu (Joseph Yip) writes:
>I know char represents 7 bits ASCII and unsigned char works with 8-bit.

Not quite.  `char' is an arithmetic type which is at least eight bits wide,
but it's implementation-defined whether it's signed or unsigned.  For normal
use in text processing, you shouldn't need to know the integer value of a
character, so `char' is sufficient.
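
To see which choice a given implementation made, <limits.h> can be
consulted. The following is only a minimal sketch; the function names
plain_char_is_signed() and bits_per_char() are made up for illustration
and do not come from the thread.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if plain char is signed on this
 * implementation, 0 if it is unsigned.  CHAR_MIN is 0 exactly when
 * plain char is unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: number of bits in a char.  The Standard
 * requires CHAR_BIT to be at least 8. */
int bits_per_char(void)
{
    return CHAR_BIT;
}
```

Portable text-handling code should not need to branch on either value;
the helpers are mainly useful for satisfying one's curiosity about a
particular compiler.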

The unfortunate exceptions are that the return value of `getc()' and the
argument to a <ctype.h> function are a bastard type: instead of the logically
correct `char', they use the union of `unsigned char' and { EOF }.
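
In practice this means the result of getc() should be kept in an int
until it has been compared against EOF. A small sketch follows, assuming
a stream opened elsewhere; count_alpha() is a hypothetical name used
only for this example.

```c
#include <stdio.h>
#include <ctype.h>

/* Hypothetical example: count the alphabetic characters on a stream.
 * c is an int, not a char, so that EOF stays distinct from every
 * byte value, including 0xFF. */
long count_alpha(FILE *fp)
{
    int c;
    long n = 0;

    while ((c = getc(fp)) != EOF) {
        /* c is already in the range of unsigned char here, so it
         * can be handed to a <ctype.h> function without a cast. */
        if (isalpha(c))
            n++;
    }
    return n;
}
```

Had c been declared as plain char, a 0xFF byte could compare equal to
EOF where char is signed, and EOF could never be detected where char is
unsigned.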

Now, since all normal% characters are contained within the intersection of
`char' and `unsigned char', you can safely ignore this botch if you *know*
you're dealing with the most restrictive kind of text.

>If I pass an unsigned char pointer to a function that expects a char
>pointer ... will there be a problem?  Will [it] mask off my 7th-bit?

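No.  At worst you'll need to use an explicit cast, since `char *' and
`unsigned char *' are distinct pointer types in ANSI C.  The cast does not
alter the bytes pointed to, and I believe the Standard contains a clause to
guarantee that the behavior is as you expect.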
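
A sketch of what the cast looks like. The body given to foo() here is
purely an assumption for illustration (the original post only shows its
prototype), and call_foo() is likewise a made-up wrapper.

```c
/* Hypothetical stand-in for the poster's foo(): counts the bytes
 * that have their top bit set, up to the terminating NUL. */
int foo(char *p)
{
    int n = 0;

    for (; *p != '\0'; p++) {
        /* Convert to unsigned char before testing the bit so the
         * result does not depend on the signedness of plain char. */
        if ((unsigned char)*p & 0x80)
            n++;
    }
    return n;
}

/* Hypothetical caller holding an unsigned char buffer.  The cast
 * changes only the pointer's type; the bytes are untouched. */
int call_foo(unsigned char *buf)
{
    return foo((char *)buf);
}
```

Nothing is masked off on the way in: foo() sees the same eight bits per
byte that the caller stored.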

My recommendation is to always use `char *' for text, and do conversions to
`unsigned char' only in the context of <ctype.h> functions.
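
As a rough illustration of that recommendation, here is a sketch of an
in-place uppercase routine. The name upcase() is hypothetical; only
toupper() comes from the library.

```c
#include <ctype.h>

/* Hypothetical example: uppercase a NUL-terminated string in place.
 * The text is kept as char *; each character is converted to
 * unsigned char only for the call to toupper(), because passing a
 * negative value other than EOF to a <ctype.h> function is
 * undefined. */
void upcase(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```

The cast matters only where plain char is signed and the text contains
characters above 127, but writing it everywhere costs nothing.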

>You know I hate writing the same system library functions where the
>only difference is the 7th-bit.

I don't see any need.  Save your energy for a *real* problem, like wchar_t.

Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint
________
% Besides being true of all ASCII characters, this guarantee is also extended
  to every member of the required C source character set on non-ASCII
  implementations.  Basically this forbids an EBCDIC implementation from
  making `char' a signed 8-bit type.

evil@arcturus.uucp (Wade Guthrie) (07/24/90)

> My recommendation is to always use `char *' for text, and do 
> conversions to `unsigned char' only in the context of <ctype.h> 
> functions.

Unless, of course, you need to do byte-style things rather than
character-style things.  In that case, unsigned char can be real
neat!
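
One example of a byte-style job is a simple additive checksum. This is
only a sketch, and checksum8() is a hypothetical name, not anything
mentioned in the thread.

```c
#include <stddef.h>

/* Hypothetical example: sum of all bytes in a buffer, modulo 256.
 * With unsigned char every byte contributes a value in 0..255; with
 * a signed plain char, a byte such as 0xFF would contribute -1
 * instead. */
unsigned char checksum8(const unsigned char *buf, size_t len)
{
    unsigned int sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum += buf[i];

    return (unsigned char)(sum & 0xFFu);
}
```

For the three bytes 0xFF, 0x01, 0x80 the sum is 0x180, so the function
returns 0x80.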
 
-- 
Wade Guthrie (evil@arcturus.UUCP)    | "He gasped in terror at what sounded
Rockwell International; Anaheim, CA  | like a man trying to gargle while
My opinions, not my employer's.      | fighting off a pack of wolves"
                                     |                Hitchhiker's Guide