mikeb@inset.UUCP (Mike Banahan) (10/03/85)
Subject: netnews comments

The ANSI draft on C is clear on the subject of what characters are allowed in strings. It draws a distinction between the source character set (the one you compile in) and the destination character set (the one you execute in). It says explicitly that every character in the source character set except newline may be present in a string literal (though some need escapes, e.g. \").

Interestingly, it doesn't explain what happens to them. For example, if you use ASCII as your source character set, but execute in EBCDIC, and use a non-EBCDIC character in a literal: is that an error? If it is an error, then the ``legality'' of a program depends on the target environment. I am not sure that the committee has given a lot of thought to this problem.

I remember some conversations on the character set subject and drew the conclusion that, although some committee members thought they understood it properly, there were a lot who didn't. I found the whole thing pretty confusing at the time, but have since had time to think hard about it. It does seem clear that the compiler is EXPECTED to transform the source character set into the target character set, not just pass byte streams through.

The problems with the BSD compiler are presumably that someone thought they were entitled to use the top bit for some internal purpose. They made the assumption that the source character set was strictly seven-bit. If you think the C compiler has problems here, there are a hell of a lot of other things that are worse! Of course, they weren't looking at the X3J11 proposals when they wrote it.

It's really an educational issue; too many people think that ASCII was written on the back of the ten commandments and that its word is law. I found it took a conscious effort to realise that the repertoire and the encoding are unrelated; but then I'm probably just dim.

Mike Banahan.
--
Mike Banahan, Technical Director, The Instruction Set Ltd.
mcvax!ukc!inset!mikeb
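
A postscript sketch of the two technical points above, offered as an illustration only and not as anything the draft or the BSD compiler actually contains. It assumes an ASCII source character set and an EBCDIC target, with a deliberately tiny translation table (values as in IBM code page 037). The names to_execution_charset, translate_literal and strip_top_bit are hypothetical. The first two functions show a compiler translating a literal rather than passing bytes through, and stopping at a character with no target equivalent; the third shows one way borrowing the top bit could damage an eight-bit source character.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, partial ASCII -> EBCDIC mapping (code page 037 values).
 * Returns the target code, or -1 when this sketch has no mapping,
 * standing in for "character not in the target repertoire".
 */
int to_execution_charset(unsigned char src)
{
    switch (src) {
    case 0x20: return 0x40;   /* space */
    case 0x30: return 0xF0;   /* '0'   */
    case 0x41: return 0xC1;   /* 'A'   */
    case 0x42: return 0xC2;   /* 'B'   */
    case 0x61: return 0x81;   /* 'a'   */
    default:   return -1;     /* no entry in this tiny table */
    }
}

/*
 * Translate n source bytes of a string literal into dst.
 * Returns how many characters were translated; a result smaller
 * than n means the literal held a character with no mapping, which
 * is exactly the case the draft leaves unexplained.
 */
size_t translate_literal(const unsigned char *src, size_t n,
                         unsigned char *dst)
{
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        int c = to_execution_charset(src[i]);
        if (c < 0)
            break;
        dst[i] = (unsigned char)c;
    }
    return i;
}

/*
 * Assumed failure mode: a compiler that reserves the top bit for
 * itself effectively sees only the low seven bits of each byte.
 */
unsigned char strip_top_bit(unsigned char c)
{
    return (unsigned char)(c & 0x7F);
}
```

Calling translate_literal on the three ASCII bytes 0x41 0x20 0x42 ("A B") fills dst with 0xC1 0x40 0xC2, while any byte missing from the table stops the translation early. Whether a real compiler should reject such a literal or substitute something is the open question raised above.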