[comp.lang.c] Character types in ANSI C

cg@myrias.UUCP (02/14/87)

If this is a duplicate, I apologize; our poster was broken.

What exactly are the compatibility rules for character types in ANSI C?
I.e. which of the following are legal:

    char *p1;
    unsigned char *p2;
    signed char *p3;

    p1 = p2;	    /* case 1 */
    p1 = p3;	    /* case 2 */
    p2 = p3;	    /* case 3 */

I see three choices:

a) they are all legal

b) they are all illegal

c) case 3 and one of cases 1 and 2 are illegal, depending on whether
    the implementation treats 'char' as signed or unsigned.

I've currently implemented choice c), but am concerned that this will
result in programs that compile on some implementations and not others (and
vice versa). Several of us have argued about it here, but we can't find
anything solid in the draft.

Any definitive answers would be appreciated. Thanks in advance.

		Chris Gray {ubc-vision,sask,ihnp4}!alberta!myrias!cg

drw@cullvax.UUCP (02/19/87)

cg@myrias.UUCP (Chris Gray) writes:
> I.e. which of the following are legal:
> 
>     char *p1;
>     unsigned char *p2;
>     signed char *p3;
> 
>     p1 = p2;	    /* case 1 */
>     p1 = p3;	    /* case 2 */
>     p2 = p3;	    /* case 3 */

Well, the char's are all widened into the 'appropriate' int types.
(These are called integral promotions, or some such.)  Then the
appropriate comparisons of int's and/or unsigned int's are performed.

I think that the rule for widening char's to int's is "a character
type is promoted to unsigned int if all possible values of the
character type can be represented by unsigned int, otherwise it is
promoted to int".

Thus, you get:
	p2 = p2		<->	(unsigned int)p2 = (unsigned int)p2
	p3 = p3		<->	(int)p3 = (int)p3
	p2 = p3		<->	(unsigned int)p2 = (unsigned int)(int)p3
(p1 acts like p2 or p3, depending on whether chars are signed)

Dale
-- 
Dale Worley		Cullinet Software
UUCP: ...!seismo!harvard!mit-eddie!cullvax!drw
ARPA: cullvax!drw@eddie.mit.edu

msb@sq.UUCP (02/19/87)

> What exactly are the compatibility rules for character types in ANSI C?
> I.e. which of the following [pointer assignments] are legal:
> 
>     char *p1; unsigned char *p2; signed char *p3;
>     p1 = p2; p1 = p3; p2 = p3;

They are all illegal.  The draft specifies three different "char" types,
though in any particular implementation two of them are treated similarly.

To avoid confusion, let me add that the *character* assignments

	*p1 = *p2; *p1 = *p3; *p2 = *p3;

are all legal, and the *explicit* pointer conversions

	p1 = (char *) p2; p2 = (unsigned char *) p3;

are also legal.

Furthermore, the treatment of "signed" in conjunction with "char" is
different from its treatment in conjunction with "int" or "long".
In the latter cases, "signed" is a noise word.  Thus if "char" in
the original example were changed to "int", then p1=p3; would be legal.

In my formal submission, which was too long to post to this group,
I suggested that most of #3.1.2.5 needed editorial improvements, and
provided the following suggested text, which I believe to convey the
same facts as the existing draft is supposed to, but more understandably.
This is based on a close reading of the draft and mail conversations
with Larry Rosler.  Any errors are mine.

				---

   The following are always *signed integral types*:  "signed char",
   "short int", "int", and "long int".  For the last three types listed,
   the set of values of each type is a superset of the set of values of
   the preceding listed type.
   
   An object declared as "signed char" is large enough to store any
   member of the execution character set, and if any member of the
   required source character set enumerated in #2.2.1 is stored in the
   object, its value is guaranteed to be positive.  The size of an object
   declared "int" is a natural size suggested by the architecture of the
   execution environment.
   
   Corresponding respectively to the above four types are the *unsigned
   integral types*:  "unsigned char", "unsigned short int", "unsigned
   int", and "unsigned long int".  In each case an object of unsigned
   integral type utilizes the same amount of storage as does an object of
   the corresponding signed integral type, including its sign.  The set
   of nonnegative values of a signed integral type is a subset of that of
   the corresponding unsigned integral type, and the representation of
   the same value in each type is the same.  A computation in an unsigned
   integral type can never overflow, because a result that cannot be
   represented in the type is reduced modulo the largest number that can
   be represented in the type plus one.
   
   The type "char" is either a signed integral type with the same set of
   values as "signed char", or an unsigned integral type with the same
   set of values as "unsigned char"; which of the two applies is
   implementation-dependent.
   
   Even if the implementation defines two or more types of integers to
   have the same set of values, they are nevertheless different types.**

 **Thus even if "char" is a signed integral type, "signed char" is a
   different type.  On the other hand, as explained in #3.5.2, "signed
   int" is merely an alternate way of specifying the type "int".

				---
The reference to #3.5.2 is to the following text, which I would put there:
				---

   The keyword "signed" has no effect when specified in conjunction with
   "int" or in a construction where "int" is implied.**

 **Thus "signed" alone is equivalent to "int" alone.

				---
Mark Brader, utzoo!sq!msb
#define	MSB(type)	(~(((unsigned type)-1)>>1))

drw@cullvax.UUCP (02/20/87)

drw@cullvax.UUCP (Dale Worley) writes:
> Well, the char's are all widened into the 'appropriate' int types.
> (These are called integral promotions, or some such.)  Then the
> appropriate comparisons of int's and/or unsigned int's are performed.
> 
> I think that the rule for widening char's to int's is "a character
> type is promoted to unsigned int if all possible values of the
> character type can be represented by unsigned int, otherwise it is
> promoted to int".

Wrong!  The rule is "char is promoted to int if all possible values of
the char type can be represented by int, otherwise it's unsigned int".
(Personally I can't imagine when a (signed or unsigned) char couldn't
be represented as int.)

> Thus, you get:
> 	p2 = p2		<->	(unsigned int)p2 = (unsigned int)p2
> 	p3 = p3		<->	(int)p3 = (int)p3
> 	p2 = p3		<->	(unsigned int)p2 = (unsigned int)(int)p3
> (p1 acts like p2 or p3, depending on whether chars are signed)

Wrong!  All of these get cast directly to int and compared.

Dale

john@viper.UUCP (02/21/87)

In article <816@cullvax.UUCP> drw@cullvax.UUCP (Dale Worley) writes:
 >cg@myrias.UUCP (Chris Gray) writes:
 >> I.e. which of the following are legal:
 >> 
 >>     char *p1;
 >>     unsigned char *p2;
 >>     signed char *p3;
 >> 
 >>     p1 = p2;	    /* case 1 */
 >>     p1 = p3;	    /* case 2 */
 >>     p2 = p3;	    /* case 3 */
 >
 >Well, the char's are all widened into the 'appropriate' int types.
 >(These are called integral promotions, or some such.)  Then the
 >appropriate comparisons of int's and/or unsigned int's are performed.
 >

  Wrong...  Not chars Dale... Pointers.

  I suspect you just misread the declarations.  All three cases assign pointers
to pointer variables.  One of the cases (case 1 or 2, depending on the
implementation) is legal.  Case 3 is always illegal, but some compilers will
only flag it with a warning and do an implicit cast to the appropriate
pointer type...
  Also, there are no "comparisons" being done here at all.  The "=" operator
is an assignment.  The "==" operator is a comparison for equality...

---
John Stanley (john@viper.UUCP)
Software Consultant - DynaSoft Systems
UUCP: ...{amdahl,ihnp4,rutgers}!{meccts,dayton}!viper!john

meissner@dg_rtp.UUCP (02/27/87)

In article <472@myrias.UUCP> cg@myrias.UUCP (Chris Gray) writes:
> 
> What exactly are the compatibility rules for character types in ANSI C?
> I.e. which of the following are legal:
> 
>     char *p1;
>     unsigned char *p2;
>     signed char *p3;
> 
>     p1 = p2;	    /* case 1 */
>     p1 = p3;	    /* case 2 */
>     p2 = p3;	    /* case 3 */
> 
These are illegal.  Even though the underlying bit representation may be
the same, they are different types.  The same holds for pointers to int
and pointers to long on machines where sizeof(int) == sizeof(long).
-- 
	Michael Meissner, Data General	Uucp: ...mcnc!rti-sel!dg_rtp!meissner

It is 11pm, do you know what your sendmail and uucico are doing?

meissner@dg_rtp.UUCP (02/28/87)

In article <816@cullvax.UUCP> drw@cullvax.UUCP (Dale Worley) writes:
> cg@myrias.UUCP (Chris Gray) writes:
> > I.e. which of the following are legal:
> > 
> >     char *p1;
> >     unsigned char *p2;
> >     signed char *p3;
> > 
> >     p1 = p2;	    /* case 1 */
> >     p1 = p3;	    /* case 2 */
> >     p2 = p3;	    /* case 3 */
> 
> Well, the char's are all widened into the 'appropriate' int types.
> (These are called integral promotions, or some such.)  Then the
> appropriate comparisons of int's and/or unsigned int's are performed.

Ughh, the above example is assigning pointers, not the items pointed
to.  Assigning pointers to different types (modulo const/volatile) is
illegal in ANSI.  Widening has nothing to do with it.