[comp.std.c] Multibyte character constants????

arnej@solan15.solan.unit.no (Arne Henrik Juul) (06/29/90)

We have stumbled across the subject of multibyte character
constants.  Is this defined anywhere?  For example, if we
say

main(){printf("%d\n",'AB');}

what should the output be?

We have used different compilers on different machines,
and all but one gave the same answer: 'A'*256+'B' = 16706.
The odd one out was 'vcc', an ANSI compliant compiler for
ULTRIX. It gave 'B'*256+'A' = 16961.

Just askin' - 

-- arnej@solan.unit.no -- juul@norunit.bitnet -- Arne.H.Juul@sintef.no --  
--                This disclaimer intentionally left blank             --

diamond@tkou02.enet.dec.com (diamond@tkovoa) (06/29/90)

In article <1990Jun28.221927.6823@idt.unit.no> arnej@solan1.solan.unit.no writes:

>We have stumbled across the subject of multibyte character
>constants.  Is this defined anywhere?

Exactly the opposite.  The standard did not even just leave it undefined
by saying nothing.  The standard explicitly says that it's undefined.

>For example, if we say
>main(){printf("%d\n",'AB');}
>what should the output be?

It does not have to compile.  It does not have to execute.  If it does,
it can print anything, or exec rogue.

As a quality-of-implementation issue, a lot of vendors (as extensions)
define a meaning for it, and some of them even tell you what they have
defined.  You found some of them:
>We have used different compilers on different machines,
>and all but one gave the same answer: 'A'*256+'B' = 16706.
>The odd one out was 'vcc', an ANSI compliant compiler for
>ULTRIX. It gave 'B'*256+'A' = 16961.
But it would not be a good idea to depend on this behavior unless you
find a definition in the vendor's manual.  And you would not use it
at all in a portable program.
-- 
Norman Diamond, Nihon DEC     diamond@tkou02.enet.dec.com
This is me speaking.  If you want to hear the company speak, you need DECtalk.

gwyn@smoke.BRL.MIL (Doug Gwyn) (06/29/90)

In article <1990Jun28.221927.6823@idt.unit.no> arnej@solan1.solan.unit.no writes:
>We have stumbled across the subject of multibyte character
>constants.  Is this defined anywhere?  For example, if we
>say
>main(){printf("%d\n",'AB');}
>what should the output be?

This isn't what the standard refers to as "multibyte characters", but
rather is a very old feature of C, probably dating all the way back to
the first C compiler.  The encoding of such a character constant is
allowed to depend on the specific implementation, precisely to allow
for such natural packing variations as you reported.  Note that 'AB'
has the same size as 'A'; in both cases the type of the constant is int.
The use of such multiple-character constants is nonportable and thus
not recommended for general use.

steve@taumet.com (Stephen Clamage) (06/29/90)

In article <1990Jun28.221927.6823@idt.unit.no> arnej@solan1.solan.unit.no writes:
>
>We have stumbled across the subject of multibyte character
>constants.  Is this defined anywhere?  For example, if we
>say
>main(){printf("%d\n",'AB');}
>what should the output be?

The standard says (section 3.1.3.4):
"The value of an integer character constant containing more than one
character, or containing a character or escape sequence not represented
in the basic execution character set, is implementation-defined."

This means that the implementation can do anything it likes, but it
must tell you what it does.
-- 

Steve Clamage, TauMetric Corp, steve@taumet.com

diamond@tkou02.enet.dec.com (diamond@tkovoa) (06/30/90)

In article <1990Jun28.221927.6823@idt.unit.no> arnej@solan1.solan.unit.no writes:
>>We have stumbled across the subject of multibyte character
>>constants.  Is this defined anywhere?
>>For example, if we say
>>main(){printf("%d\n",'AB');}
>>what should the output be?

In article <1824@tkou02.enet.dec.com> I wrote:
>Exactly the opposite.  The standard did not even just leave it undefined
>by saying nothing.  The standard explicitly says that it's undefined.

Sorry, the standard actually says it's implementation-defined.
So your vendor's manual does have to say what the compiler does with them.
(I happened to read something else in TFM that day which was in fact
undefined, and then mixed them up in my memory.  Sorry for misleading
anyone.)

>It does not have to compile.  It does not have to execute.  If it does,
>it can print anything, or exec rogue.

This is still true.  Only, your vendor DOES have to tell you what it will do.
And you still would not use it at all in a portable program.
-- 
Norman Diamond, Nihon DEC     diamond@tkou02.enet.dec.com
This is me speaking.  If you want to hear the company speak, you need DECtalk.

karl@haddock.ima.isc.com (Karl Heuer) (07/01/90)

In article <1990Jun28.221927.6823@idt.unit.no> arnej@solan1.solan.unit.no writes:
>We have stumbled across the subject of multibyte character constants...

As has been noted, X3J11 has already claimed the word `multibyte' for
something else (and also the term `wide character constant', which might have
been my second choice).  I have coined the term `siamese character constant'
to denote the thing we're talking about here.

>[What is the value of a constant 'AB'?]

Nothing is guaranteed about its value except that the implementation must
document it.  It's extremely likely that, if SCCs are supported at all, then
'AB' will be either 'A'<<CHAR_BIT|'B' or 'A'|'B'<<CHAR_BIT, but I wouldn't be
too surprised by an implementation that punted by making them all have value
zero (with a warning).

Trivium: an old version of stdio supported  putchar('AB');  which would output
both characters.  I had to rewrite the algorithm when we upgraded to V7.

Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/01/90)

In article <1824@tkou02.enet.dec.com> diamond@tkou02.enet.dec.com (diamond@tkovoa) writes:
>In article <1990Jun28.221927.6823@idt.unit.no> arnej@solan1.solan.unit.no writes:
>>We have stumbled across the subject of multibyte character
>>constants.  Is this defined anywhere?
>Exactly the opposite.  The standard did not even just leave it undefined
>by saying nothing.  The standard explicitly says that it's undefined.

Please do not provide incorrect information like that.

What the C standard actually says is:
	The value of an integer character constant containing more
	than one character, or containing a character or escape
	sequence not represented in the basic execution character
	set, is implementation-defined.

If you don't know the difference in meaning between
"implementation-defined" and "explicitly ... undefined", then you
should not be trying to interpret the standard for others.

>>For example, if we say
>>main(){printf("%d\n",'AB');}
>>what should the output be?
>It does not have to compile.  It does not have to execute.  If it does,
>it can print anything, or exec rogue.

Apart from the nonconformance introduced by using printf() without having
#included <stdio.h>, and by failing to return a value from the main()
function, the program would be a correct, conforming program that a
conforming implementation would be obliged to compile and to do something
useful with when the program is executed.  The specific output obtained
from executing the program would depend on factors that a conforming
implementation is obliged to document.