[comp.std.c] integer value of multi-char constants

lai@mips.COM (David Lai) (10/17/89)

Below is an example of a possible bug in various C compilers; it has to
do with converting multi-character constants to integer values:

on a mips, vax, and sun the following is true:

	'\001\377' == '\000\377';

however on the same machines:

	'\001\177' != '\000\177';

The question is: does the above behaviour conform to ANSI C?

In section 3.1.3.4, it states that the value of an integer character constant
containing more than one character, ..., is implementation-defined.  So seemingly
it is legal to map all multi-char constants to arbitrary values, and the above
cases indicate conforming behaviour.  But is it intuitive that the above two
expressions can both hold at once?

I believe the bug (if it is a bug) is inherent in all pcc
implementations.  I would like to know if there are some C implementations
out there in which the above is not true.

If you want to try this on a sun(3), you will have to use:
	'\377\000' and '\377\001'
for unknown reasons.

-- 
        "What is a DJ if he can't scratch?"  - Uncle Jamms Army
     David Lai (lai@mips.com || {ames,prls,pyramid,decwrl}!mips!lai)

chris@mimsy.umd.edu (Chris Torek) (10/17/89)

In article <29588@gumby.mips.COM> lai@mips.COM (David Lai) writes:
>on a mips, vax, and sun the following is true:
>	'\001\377' == '\000\377';
>however on the same machines:
>	'\001\177' != '\000\177';
>The question is: does the above behaviour conform to ANSI C?

Certainly.  The more important question is `why would anyone expect
otherwise?'  (Remember that character constants are constant expressions
with type int---NOT type char!)

The machines listed above all form two-character constants by computing
(more or less) c0*256+c1 (or c1*256+c0), where c0 and c1 are the first
and second characters in the constant.  Hence '\001\177' is 0x17f and
'\000\177' is 0x07f.  It would be very strange for these to be equal.
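
A minimal sketch of that arithmetic, for anyone who wants to check it by
hand.  The helper name pack_two is made up for illustration and is not
taken from any compiler; it assumes each character is treated as an
unsigned 8-bit value and that the first character lands in the high byte.

```c
/* Hypothetical helper: combine two character values as c0*256 + c1.
   Each value is masked to 8 bits, i.e. treated as unsigned. */
int pack_two(int c0, int c1)
{
    unsigned hi = (unsigned)c0 & 0xffu;   /* first character  */
    unsigned lo = (unsigned)c1 & 0xffu;   /* second character */

    return (int)((hi << 8) | lo);         /* same as hi*256 + lo */
}
```

Under these assumptions pack_two(1, 0177) is 0x17f and pack_two(0, 0177)
is 0x07f, so the two constants differ.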
-- 
`They were supposed to be green.'
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

minow@mountn.dec.com (Martin Minow) (10/17/89)

In article <29588@gumby.mips.COM> lai@mips.COM (David Lai) writes:
>on a mips, vax, and sun the following is true:
>	'\001\377' == '\000\377';
>however on the same machines:
>	'\001\177' != '\000\177';
>The question is: does the above behaviour conform to ANSI C?

Don't know if it conforms, but both pairs compare unequal on a Vax/VMS
system running Vax-C.

Martin Minow
minow@thundr.enet.dec.com

cpcahil@virtech.UUCP (Conor P. Cahill) (10/17/89)

In article <20205@mimsy.umd.edu>, chris@mimsy.umd.edu (Chris Torek) writes:
> The machines listed above all form two-character constants by computing
> (more or less) c0*256+c1 (or c1*256+c0), where c0 and c1 are the first
> and second characters in the constant.  Hence '\001\177' is 0x17f and
> '\000\177' is 0x07f.  It would be very strange for these to be equal.

Maybe I am just being real dense, but how does this explain the author's
question, namely that on the same machine

	'\001\377' == '\000\377';

By your explanation this should be 

	0x1ff      ==  0x0ff

Which seems wrong to me.
-- 
+-----------------------------------------------------------------------+
| Conor P. Cahill     uunet!virtech!cpcahil      	703-430-9247	!
| Virtual Technologies Inc.,    P. O. Box 876,   Sterling, VA 22170     |
+-----------------------------------------------------------------------+

chris@mimsy.umd.edu (Chris Torek) (10/17/89)

>In article <29588@gumby.mips.COM> lai@mips.COM (David Lai) writes:
>>on a mips, vax, and sun the following is true:
>>	'\001\377' == '\000\377';
>>however on the same machines:
>>	'\001\177' != '\000\177';
>>The question is: does the above behaviour conform to ANSI C?

In article <20205@mimsy.umd.edu> I wrote:
>Certainly.  The more important question is `why would anyone expect
>otherwise?'

Oops, for whatever reason I read the first line as

	'\000\377' == '\000\377'

However, the results are still easily ( :-) ) explained.  '\377' is
shorthand for -1, and the compiler expands multicharacter constant
values as follows (simplified: \ processing hidden):

	case '\'':
		/* the first character seeds the value */
		if ((value = nextc()) == STOP)
			error("no characters in character constant");
		/* shift in each further character; c may be sign-extended */
		while ((c = nextc()) != STOP)
			value = (value << 8) | c;

So '\001\377' computes as

		value = 1;	/* \001 */
		c = -1;		/* \377 */
		value = (1 << 8) | -1;
		c = STOP;	/* ' */
		/* value = -1 */

while '\000\377' computes as

		value = 0;	/* \000 */
		c = -1;		/* \377 */
		value = (0 << 8) | -1;
		c = STOP;	/* ' */
		/* value = -1 */

If the compiler added the values, rather than ORing them, the results
would be different (and very peculiar).
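
To make the OR-versus-add comparison concrete, here is a small sketch.
The names combine_or and combine_add are invented for this example and
do not come from pcc; both assume the characters have already been
sign-extended to int, so '\377' arrives as -1.

```c
/* Hypothetical helpers, not taken from any compiler source.
   c0 and c1 are character values already sign-extended to int. */

/* Combine by ORing, as in the loop above.  Unsigned arithmetic is
   used so that no negative int is ever shifted. */
unsigned combine_or(int c0, int c1)
{
    return ((unsigned)c0 << 8) | (unsigned)c1;
}

/* Combine by adding instead of ORing. */
int combine_add(int c0, int c1)
{
    return c0 * 256 + c1;
}
```

With ORing, the pairs (1, -1) and (0, -1) both come out as all one bits,
which is why the two constants compare equal.  With addition, (1, -1)
gives 255 while (0, -1) gives -1.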

Probably the compiler should not sign extend unless the character constant
contains only a single character.
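
One way that might look, as a sketch only: mask every character to 8
bits before folding it in, so a sign-extended '\377' cannot smear ones
into the higher bytes.  The function name fold_masked and its array
interface are assumptions made for this example; a real compiler would
pull characters from its lexer and would handle the single-character
case separately.

```c
/* Hypothetical sketch: fold n character values into one constant,
   masking each to 8 bits.  chars[] holds values that may already be
   sign-extended (for example '\377' stored as -1). */
unsigned fold_masked(const int *chars, int n)
{
    unsigned value = 0;
    int i;

    for (i = 0; i < n; i++)
        value = (value << 8) | ((unsigned)chars[i] & 0xffu);
    return value;
}
```

Folding { 1, -1 } this way gives 0x1ff and folding { 0, -1 } gives
0x0ff, so '\001\377' and '\000\377' would no longer compare equal.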
-- 
`They were supposed to be green.'
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris