[net.bugs.4bsd] Problem with the C pre-processor

mikeb@inset.UUCP (Mike Banahan) (10/03/85)
Subject: netnews comments

The ANSI draft on C is clear on the subject of what characters
are allowed in strings. It draws a distinction between the source
character set (the one you compile in) and the destination character
set (the one you execute in).

It says explicitly that every character in the source character set
except newline may be present in a string literal (though some need
escapes, e.g. \").
Interestingly, it doesn't say what happens to them on the way into the
destination character set. For example, suppose you use ASCII as your
source character set but execute in EBCDIC, and you put a character with
no EBCDIC equivalent in a literal: is that an error? If it is, then the
``legality'' of a program depends on the target environment. I am not
sure that the committee has given a lot of thought to this problem.
From the conversations I remember on the character set subject, I
concluded that, although some committee members thought they understood
it properly, a lot of them didn't. I found the whole thing pretty
confusing at the time, but I have since had time to think hard about it.
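To make the escape rule concrete, here is a small C fragment (the names `demo` and `literal_demo_length` are made up for illustration). A double quote or a backslash inside a literal has to be escaped, and because a raw newline may not appear in the literal, it is written as \n. Each escape still stands for a single character.

```c
#include <string.h>

/* \" is a double quote, \\ is a backslash, and \n stands in for the
 * newline, which may not appear raw inside the literal. */
static const char demo[] = "say \"hi\"\\\n";

/* Number of characters in demo, excluding the terminating null.
 * Each two-character escape sequence counts as one character. */
size_t literal_demo_length(void)
{
    return strlen(demo);
}
```

Here `literal_demo_length()` gives 10: the three escapes `\"`, `\\` and `\n` each contribute one character.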

It does seem clear that the compiler is EXPECTED to transform the
source character set into the target character set, not just
pass byte streams through.
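A rough sketch of what such a transformation might look like, assuming an ASCII source and an EBCDIC (code page 037) target. The function `ascii_to_ebcdic` is hypothetical, and its table deliberately covers only upper-case letters, digits and space. Anything else is reported as having no mapping. Whether a compiler should treat that as an error is exactly the open question raised earlier, so the sketch only reports the position and leaves the decision to the caller.

```c
#include <stddef.h>

/* Hypothetical sketch: translate n ASCII bytes from src into dst as
 * EBCDIC (code page 037). Only A-Z, 0-9 and space are in this table.
 * The EBCDIC letters sit in three separate runs, hence three ranges.
 * Returns the index of the first byte with no mapping (dst is only
 * filled up to that point), or -1 if every byte was translated. */
int ascii_to_ebcdic(const char *src, unsigned char *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)src[i];
        if (c >= 0x41 && c <= 0x49)        /* 'A'..'I' */
            dst[i] = (unsigned char)(0xC1 + (c - 0x41));
        else if (c >= 0x4A && c <= 0x52)   /* 'J'..'R' */
            dst[i] = (unsigned char)(0xD1 + (c - 0x4A));
        else if (c >= 0x53 && c <= 0x5A)   /* 'S'..'Z' */
            dst[i] = (unsigned char)(0xE2 + (c - 0x53));
        else if (c >= 0x30 && c <= 0x39)   /* '0'..'9' */
            dst[i] = (unsigned char)(0xF0 + (c - 0x30));
        else if (c == 0x20)                /* space */
            dst[i] = 0x40;
        else
            return (int)i;                 /* no mapping in this table */
    }
    return -1;
}
```

For instance, "HI 42" becomes the bytes C8 C9 40 F4 F2, while "A[B" stops at the '[' and returns 1.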

The problems with the BSD compiler presumably arise because someone
thought they were entitled to use the top bit of each character for some
internal purpose; that is, they assumed the source character set was
strictly seven-bit.
If you think the C compiler has problems here, there are a hell of a lot
of other things that are worse! Of course, they weren't looking at the
X3J11 proposals when they wrote it.
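A hypothetical illustration of that kind of shortcut follows. The actual BSD code may well differ, and `FLAG_BIT`, `set_flag`, `has_flag` and `strip_flag` are invented names. Keeping a private flag in bit 7 is harmless as long as every source character fits in seven bits. As soon as an eight-bit character such as 0xC1 turns up, it looks flagged, and stripping the "flag" silently turns it into 0x41.

```c
/* Hypothetical sketch: a tool that stores a private marker in bit 7
 * of each character. This only works if every real character is
 * seven-bit, i.e. bit 7 is never set in the input. */
#define FLAG_BIT 0x80u

static unsigned char set_flag(unsigned char c)
{
    return (unsigned char)(c | FLAG_BIT);   /* mark the character */
}

static int has_flag(unsigned char c)
{
    return (c & FLAG_BIT) != 0;             /* is bit 7 set? */
}

static unsigned char strip_flag(unsigned char c)
{
    return (unsigned char)(c & ~FLAG_BIT);  /* clear bit 7 */
}
```

With seven-bit input the round trip is lossless. With an eight-bit byte, `has_flag(0xC1)` is true even though nothing set it, and `strip_flag(0xC1)` yields 0x41, so the original character is lost.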

It's really an educational issue; too many people think that ASCII
was written on the back of the Ten Commandments and that its word is law.
I found it took a conscious effort to realise that the repertoire (which
characters you have) and the encoding (which bit patterns represent them)
are unrelated; but then I'm probably just dim.
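One small way to see that repertoire and encoding are separate: the letters 'A', 'I' and 'J' are the same repertoire members in ASCII and in EBCDIC (code page 037), but their codes differ, and so do the gaps between them. The struct, variable and function names are invented for illustration.

```c
/* Codes for three letters of the same repertoire under two
 * encodings: ASCII and EBCDIC (code page 037). */
struct encoding {
    const char *name;
    unsigned char letter_A, letter_I, letter_J;
};

static const struct encoding ascii_enc  = { "ASCII",  0x41, 0x49, 0x4A };
static const struct encoding ebcdic_enc = { "EBCDIC", 0xC1, 0xC9, 0xD1 };

/* Distance from 'A' to 'J' in the given encoding. Code that assumes
 * this is always 9 depends on the encoding, not on the repertoire. */
static int distance_A_to_J(const struct encoding *e)
{
    return e->letter_J - e->letter_A;
}
```

In ASCII the distance is 9; in EBCDIC it is 16, because the EBCDIC letters are not contiguous ('I' is 0xC9 and 'J' is 0xD1). Code that computes `c - 'A'` is relying on the encoding, not the repertoire.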

Mike Banahan.
-- 
Mike Banahan, Technical Director, The Instruction Set Ltd.
mcvax!ukc!inset!mikeb