[comp.lang.c] unsigned & char's

jcvw@cs.vu.nl (Winkel van Jan Christiaan) (01/07/88)

I have the following question about typecasts and auto character conversion:

Suppose I want to convert a character using a table, for example from
ebcdic to ascii. I could do this in a way like this:

extern char ebc2asc[];       /* this contains the ascii values for ebcdic
				characters. example: ebc2asc[0xf1]='1' */
	    
char buf[100];  /* contains the characters to convert */
char *p; /* a pointer */

for (p = buf; p < buf + sizeof buf; p++) {
  *p = ebc2asc[*p];   /* *p is used directly as the table index */
}

The problem is that characters like 0xf1 (ebcdic for '1') are negative 
when using signed char's (the default in most c's). The compiler will then
generate code to sign extend my char to an int, causing a negative index 
into my array (i.e. 0xfff1 or 0xfffffff1).
Therefore I use a typecast to an unsigned. The compiler will however still
generate the same code. The code does not change until I define p to be an
unsigned char *. I thought that the typecast made my wish clear to the
compiler that I do not want sign extension, but just an unsigned extension
(with zeroes filled on the left of the old char) so that 0xf1 becomes 0x00f1 
instead of 0xfff1. It seems that there is an extra intermediate type:
int. The path from char to unsigned is therefore not char --> unsigned but
char --> int --> unsigned.
I have tried to see what K&R say about it, but I could not really find it. 
Is there anybody out there who knows how typecasts are defined in this context?
Many thanks.
Jan Christiaan van Winkel
jcvw@cs.vu.nl

chris@mimsy.UUCP (Chris Torek) (01/09/88)

In article <1167@ark.cs.vu.nl> jcvw@cs.vu.nl (Winkel van Jan Christiaan) writes:
>... characters like 0xf1 (ebcdic for '1') are negative when using
>signed char's (the default in most c's). ... Therefore I use a
>typecast to an unsigned. The compiler will however still generate
>the same code. The code does not change until I define p to be an
>unsigned char *. I thought that the typecast made my wish clear to
>the compiler that I do not want sign extension, but just an
>unsigned extension ...

A type cast is semantically equivalent to an assignment to an unnamed
variable with the type of the cast.  Hence

	char *p; ...
	*p = ebc2asc[*p];

and

	*p = ebc2asc[(unsigned)*p];

both ask to evaluate *p, then convert to type `int' (`char' extends
to `int').  The second asks further that this value be converted to
type `unsigned int'.  The problem is getting *p to extend without
sign extension.  There are alternatives:

	*p & 0xff;		/* extend to int, then throw out the sign */
	(unsigned char) *p;	/* extend to int, truncate to u_char,
				   extend without sign extension to u_int */
	*(unsigned char *)p;	/* extend without sign extension to u_int */
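
The three alternatives can be compared side by side. What follows is only
a minimal sketch, not code from the thread: the helper names index_mask,
index_cast and index_deref are hypothetical, `signed char' is spelled out
so the result does not depend on whether plain `char' is signed, and 8-bit
two's-complement chars are assumed.

```c
/* Hypothetical helpers: each returns the table index that one of the
   three alternatives produces for the byte p points at. */

/* Promote to int (sign-extending), then mask off the high-order bits. */
static unsigned index_mask(const signed char *p)
{
    return (unsigned)(*p & 0xff);
}

/* Convert the value to unsigned char; it then widens with zero fill. */
static unsigned index_cast(const signed char *p)
{
    return (unsigned char)*p;
}

/* Reinterpret the pointer so the byte is fetched as unsigned char. */
static unsigned index_deref(const signed char *p)
{
    return *(const unsigned char *)p;
}
```

For a byte holding the bit pattern 0xf1, all three helpers should yield
0xf1 under those assumptions, so any of them can serve as the subscript
into ebc2asc[].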
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/09/88)

In article <1167@ark.cs.vu.nl> jcvw@cs.vu.nl (Winkel van Jan Christiaan) writes:
>... Therefore I use a typecast to an unsigned. ...
>... I thought that the typecast made my wish clear to the compiler ...

Typecasts are not wish indicators but rather actual type conversion
operators.  In an expression, frequently there is some "normalization"
of operand types before they are operated on, and in your case this
involves applying the "default integer promotions", which among
other things widen the char to an int before the cast is applied.

You can use an (unsigned char) cast; or better, simply use unsigned
char data in the first place.
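
A minimal sketch of the second suggestion, keeping the data unsigned from
the start, might look like the following. The one-entry table contents,
init_table() and convert() are illustrative assumptions; only the
ebc2asc[0xf1]='1' mapping comes from the original post, and 8-bit chars
are assumed.

```c
#include <stddef.h>

/* Hypothetical stand-in for the poster's table; only the one entry
   given in the original post is filled in. */
static unsigned char ebc2asc[256];

static void init_table(void)
{
    ebc2asc[0xf1] = '1';   /* EBCDIC 0xf1 is the digit '1' */
}

/* Translate n bytes in place.  Because buf is unsigned char, *p can
   never be negative, so it is always a valid index into a 256-entry
   table (assuming 8-bit chars). */
static void convert(unsigned char *buf, size_t n)
{
    unsigned char *p;

    for (p = buf; p < buf + n; p++)
        *p = ebc2asc[*p];
}
```

With both the buffer and the table declared as unsigned char, no cast or
mask is needed in the loop at all.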

lied@ihuxy.ATT.COM (Bob Lied) (01/11/88)

In article <1167@ark.cs.vu.nl>, jcvw@cs.vu.nl (Winkel van Jan Christiaan) writes:
> I thought that the typecast made my wish clear to the
> compiler that I do not want sign extension, but just an unsigned extension
> ... It seems that there is an extra intermediate type:
> int. The path from char to unsigned is therefore not char --> unsigned but
> char --> int --> unsigned.

Think of casting as an operator applied to the result of an
expression, not an instruction to the compiler. In this case,
the expression starts as type char, which by the "usual conversion"
rules, is converted to an int -- a signed int with extension
on your machine.  After the equivalent int is computed,
the cast operator is applied, but it's too late for you --
the high order bits have already changed to ones.
By declaring the type as unsigned char, a different conversion
rule applies, and you get an unsigned int even without a cast.
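
The two conversion paths can be written out step by step. This is a
sketch only; widen_signed() and widen_unsigned() are hypothetical names,
and 8-bit two's-complement chars are assumed. Note that an ISO C compiler
will typically promote an unsigned char to int rather than unsigned int,
but the value is zero-filled on the left either way.

```c
/* Hypothetical helpers showing the two conversion paths. */

/* signed char -> int (sign-extended) -> unsigned int */
static unsigned widen_signed(signed char c)
{
    int promoted = c;            /* bit pattern 0xf1 becomes -15 */
    return (unsigned)promoted;   /* the cast sees -15, not 0xf1 */
}

/* unsigned char -> wider type, zero-filled on the left */
static unsigned widen_unsigned(unsigned char c)
{
    return c;                    /* 0xf1 stays 0xf1 */
}
```

The low eight bits are the same in both results; only the high-order bits
differ, which is why the late cast cannot undo the sign extension.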

The conversion rules are described in K&R appendix A,
section 6.6, but that section doesn't talk about unsigned *characters*
explicitly.  There is also a discussion of character conversions in
section 5.1.3 and chapter 6 of Harbison & Steele[1].  The
"usual conversions" are described (pretty clearly, I think)
in section 6.3 of H&S.

	Bob Lied	ihnp4!ihuxy!lied

---------
[1] Harbison, S.P. and Steele, G.L. Jr.,
"C:  A Reference Manual," Prentice-Hall, Inc,
1987, ISBN 0-13-109810-1.