arnej@solan15.solan.unit.no (Arne Henrik Juul) (06/29/90)
We have stumbled across the subject of multibyte character
constants. Is this defined anywhere? For example, if we say

main(){printf("%d\n",'AB');}

what should the output be?

We have used different compilers on different machines,
and all but one gave the same answer: 'A'*256+'B' = 16706
The odd one out was 'vcc', an ANSI compliant compiler for
ULTRIX. It gave 'B'*256+'A' = 16961.

Just askin'
--
arnej@solan.unit.no -- juul@norunit.bitnet -- Arne.H.Juul@sintef.no
-- This disclaimer intentionally left blank --
diamond@tkou02.enet.dec.com (diamond@tkovoa) (06/29/90)
In article <1990Jun28.221927.6823@idt.unit.no> arnej@solan1.solan.unit.no writes:
>We have stumbled across the subject of multibyte character
>constants. Is this defined anywhere?

Exactly the opposite. The standard did not even just leave it undefined
by saying nothing. The standard explicitly says that it's undefined.

>For example, if we say
>main(){printf("%d\n",'AB');}
>what should the output be?

It does not have to compile. It does not have to execute. If it does,
it can print anything, or exec rogue.

As a quality-of-implementation issue, a lot of vendors (as extensions)
define a meaning for it, and some of them even tell you what they have
defined. You found some of them:

>We have used different compilers on different machines,
>and all but one gave the same answer: 'A'*256+'B'= 16706
>The odd one out was 'vcc', an ANSI compliant compiler for
>ULTRIX. It gave 'B'*256+'A' = 16961.

But it would not be a good idea to depend on this behavior unless you
find a definition in the vendor's manual. And you would not use it at
all in a portable program.
--
Norman Diamond, Nihon DEC diamond@tkou02.enet.dec.com
This is me speaking. If you want to hear the company speak, you need DECtalk.
gwyn@smoke.BRL.MIL (Doug Gwyn) (06/29/90)
In article <1990Jun28.221927.6823@idt.unit.no> arnej@solan1.solan.unit.no writes:
>We have stumbled across the subject of multibyte character
>constants. Is this defined anywhere? For example, if we
>say
>main(){printf("%d\n",'AB');}
>what should the output be?

This isn't what the standard refers to as "multibyte characters",
but rather is a very old feature of C, probably dating all the way
back to the first C compiler. The encoding of such a character
constant is allowed to depend on the specific implementation,
precisely to allow for such natural packing variations as you
reported. Note that 'AB' has the same size as 'A'; in both cases
the type of the constant is int.

The use of such multiple-character constants is nonportable and
thus not recommended for general use.
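A minimal sketch of the size point above, assuming a C compiler (not C++, where 'A' has type char) that accepts two-character constants, possibly with a warning. The function names are hypothetical and do not come from the thread:

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only.
   In C, every integer character constant has type int. */

/* Size in bytes of a one-character constant. */
size_t size_of_single(void)
{
    return sizeof 'A';
}

/* Size in bytes of a two-character constant. Its value is
   implementation-defined, but its type is still int. */
size_t size_of_double(void)
{
    return sizeof 'AB';
}
```

If the compiler accepts 'AB' at all, both functions should return sizeof(int); only the value of 'AB' is expected to differ between implementations.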
steve@taumet.com (Stephen Clamage) (06/29/90)
In article <1990Jun28.221927.6823@idt.unit.no> arnej@solan1.solan.unit.no writes:
>
>We have stumbled across the subject of multibyte character
>constants. Is this defined anywhere? For example, if we
>say
>main(){printf("%d\n",'AB');}
>what should the output be?

The standard says (section 3.1.3.4):
"The value of an integer character constant containing more than
one character, or containing a character or escape sequence not
represented in the basic execution character set, is
implementation-defined."

This means that the implementation can do anything it likes, but
it must tell you what it does.
--
Steve Clamage, TauMetric Corp, steve@taumet.com
diamond@tkou02.enet.dec.com (diamond@tkovoa) (06/30/90)
In article <1990Jun28.221927.6823@idt.unit.no> arnej@solan1.solan.unit.no writes:
>>We have stumbled across the subject of multibyte character
>>constants. Is this defined anywhere?
>>For example, if we say
>>main(){printf("%d\n",'AB');}
>>what should the output be?

In article <1824@tkou02.enet.dec.com> I wrote:
>Exactly the opposite. The standard did not even just leave it undefined
>by saying nothing. The standard explicitly says that it's undefined.

Sorry, the standard actually says it's implementation-defined. So your
vendor's manual does have to say what the compiler does with them.
(I happened to read something else in TFM that day which was in fact
undefined, and then mixed them up in my memory. Sorry for misleading
anyone.)

>It does not have to compile. It does not have to execute. If it does,
>it can print anything, or exec rogue.

This is still true. Only, your vendor DOES have to tell you what it
will do. And you still would not use it at all in a portable program.
--
Norman Diamond, Nihon DEC diamond@tkou02.enet.dec.com
This is me speaking. If you want to hear the company speak, you need DECtalk.
karl@haddock.ima.isc.com (Karl Heuer) (07/01/90)
In article <1990Jun28.221927.6823@idt.unit.no> arnej@solan1.solan.unit.no writes:
>We have stumbled across the subject of multibyte character constants...

As has been noted, X3J11 has already claimed the word `multibyte' for
something else (and also the term `wide character constant', which might
have been my second choice). I have coined the term `siamese character
constant' to denote the thing we're talking about here.

>[What is the value of a constant 'AB'?]

Nothing is guaranteed about its value except that the implementation must
document it. It's extremely likely that, if SCCs are supported at all, then
'AB' will be either 'A'<<CHAR_BIT|'B' or 'A'|'B'<<CHAR_BIT, but I wouldn't
be too surprised by an implementation that punted by making them all have
value zero (with a warning).

Trivium: an old version of stdio supported
	putchar('AB');
which would output both characters. I had to rewrite the algorithm when we
upgraded to V7.

Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint
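The two likely packings named above can be sketched as ordinary functions and compared against whatever the local compiler does with 'AB'. This is only an illustration: the function names are hypothetical, and the code assumes small positive character values (as in ASCII) and an int wide enough to hold two characters.

```c
#include <limits.h>

/* Hypothetical names; a sketch of the two packings, not any
   particular compiler's documented rule. */

/* First character in the high-order position: 'A'<<CHAR_BIT|'B'. */
int pack_first_high(int first, int second)
{
    return (first << CHAR_BIT) | second;
}

/* First character in the low-order position: 'A'|'B'<<CHAR_BIT. */
int pack_first_low(int first, int second)
{
    return first | (second << CHAR_BIT);
}

/* Report which packing this implementation uses for 'AB':
   1 = first character high, 2 = first character low,
   0 = neither (for example, an implementation that yields zero). */
int classify_ab(void)
{
    int value = 'AB';  /* implementation-defined value */

    if (value == pack_first_high('A', 'B'))
        return 1;
    if (value == pack_first_low('A', 'B'))
        return 2;
    return 0;
}
```

With ASCII and CHAR_BIT equal to 8, pack_first_high('A', 'B') gives 16706 and pack_first_low('A', 'B') gives 16961, the two values reported at the top of the thread. The result of classify_ab() is still no substitute for the vendor's documentation.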
gwyn@smoke.BRL.MIL (Doug Gwyn) (07/01/90)
In article <1824@tkou02.enet.dec.com> diamond@tkou02.enet.dec.com (diamond@tkovoa) writes:
>In article <1990Jun28.221927.6823@idt.unit.no> arnej@solan1.solan.unit.no writes:
>>We have stumbled across the subject of multibyte character
>>constants. Is this defined anywhere?
>Exactly the opposite. The standard did not even just leave it undefined
>by saying nothing. The standard explicitly says that it's undefined.

Please do not provide incorrect information like that. What the C
standard actually says is:

	The value of an integer character constant containing more
	than one character, or containing a character or escape
	sequence not represented in the basic execution character
	set, is implementation-defined.

If you don't know the difference in meaning between
"implementation-defined" and "explicitly ... undefined", then you
should not be trying to interpret the standard for others.

>>For example, if we say
>>main(){printf("%d\n",'AB');}
>>what should the output be?
>It does not have to compile. It does not have to execute. If it does,
>it can print anything, or exec rogue.

Apart from nonconformance introduced by using printf() without having
#included <stdio.h>, and by failing to return a value from the main()
function, the program would be a correct, conforming program that a
conforming implementation would be obliged to compile and to do
something useful with when the program is executed. The specific
output obtained from executing the program would depend on factors
that a conforming implementation is obliged to document.