[comp.std.c++] National character representation of C++

dag@control.lth.se (Dag Bruck) (11/27/90)

In article <1990Nov25.161506.9659@tsa.co.uk> domo@tsa.co.uk (Dominic Dunlop) writes:
>In article <1990Nov23.211727.2802@zoo.toronto.edu> henry@zoo.toronto.edu
>(Henry Spencer) writes:
>> I would have hoped that X3J16 would not be re-hashing all the dumb ideas...
>> The right answer to national character sets is ISO Latin 1 or equivalent,
>
>equipment which talks using an 8-bit character set such as ISO Latin 1
>is an obvious (minimum) requirement for program development.

Do you have an ISO Latin 1 keyboard?  Do you suggest I could use one?

ISO Latin 1 solves some of the output problems (ever considered why
it's called ISO Latin *1*?), but it does not solve the input problem.

I believe the current proposal by Bjarne Stroustrup has important
merits by combining readability and writability, compared to
trigraphs. I also know several people who prefer keywords like
'or' instead of '|' even though they have a US keyboard.

Dag Michael Br\"uck (who has his own problems, as you can see)
--
Department of Automatic Control		E-mail: dag@control.lth.se
Lund Institute of Technology
P. O. Box 118				Phone:	+46 46-108779
S-221 00 Lund, SWEDEN			Fax:    +46 46-138118

henry@zoo.toronto.edu (Henry Spencer) (11/30/90)

In article <1990Nov27.143307.8086@lth.se> dag@control.lth.se (Dag Bruck) writes:
>Do you have an ISO Latin 1 keyboard?  Do you suggest I could use one?
>ISO Latin 1 solves some of the output problems (ever considered why
>it's called ISO Latin *1*?), but it does not solve the input problem.

Of course not.  Neither does ASCII or any other character set.  In all
cases, there has to be some sort of mapping between the 60-odd keys of
your keyboard and the rather larger internal character set.  Furthermore,
this mapping has to be locale-specific to some degree, since the set of
most-commonly-needed characters will vary.

I don't understand why you want your programming language to try to
solve the problem of keyboard mappings.

>I believe the current proposal by Bjarne Stroustrup has important
>merits by combining readability and writability, compared to
>trigraphs...

Nobody advocates using trigraphs as a way of writing programs; they are
strictly a data-interchange format.  Please find a real argument, not
this strawman.
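As an illustration of "strictly a data-interchange format": a trigraph is just a three-character spelling that the compiler collapses mechanically before doing anything else. The following is a minimal C++ sketch of that replacement for the nine sequences ISO C defines (??= #, ??( [, ??/ \, ??) ], ??' ^, ??< {, ??! |, ??> }, ??- ~). The function names `trigraph_target` and `replace_trigraphs` are hypothetical, and this does not claim to be how any particular compiler does it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: maps the third character of a trigraph to the
// character it stands for, or returns '\0' if it does not form a trigraph.
char trigraph_target(char c) {
    switch (c) {
        case '=':  return '#';
        case '(':  return '[';
        case '/':  return '\\';
        case ')':  return ']';
        case '\'': return '^';
        case '<':  return '{';
        case '!':  return '|';
        case '>':  return '}';
        case '-':  return '~';
        default:   return '\0';  // not one of the nine trigraphs
    }
}

// Hypothetical helper: scans left to right and collapses every trigraph
// into its single-character equivalent; all other characters are copied.
std::string replace_trigraphs(const std::string& src) {
    std::string out;
    out.reserve(src.size());
    std::size_t i = 0;
    while (i < src.size()) {
        if (i + 2 < src.size() && src[i] == '?' && src[i + 1] == '?') {
            char t = trigraph_target(src[i + 2]);
            if (t != '\0') {
                out += t;  // three input characters become one
                i += 3;
                continue;
            }
        }
        out += src[i];  // ordinary character, copied unchanged
        ++i;
    }
    return out;
}
```

For example, `replace_trigraphs("?\?=define X 1")` yields `#define X 1`. The `\?` escape in the argument keeps a compiler that still honors trigraphs from converting the string literal itself before the function ever sees it.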

> I also know several people who prefer keywords like
>'or' instead of '|' even though they have a US keyboard.

sed 's/ or /|/g'

does it without any changes to the language, if you really care so little
about the readability of the text to your successors (who will be expecting
programs written in C).
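For readers who want the same transformation without sed, here is a minimal C++ sketch of what `sed 's/ or /|/g'` does on a line of text: every non-overlapping occurrence of " or " (spaces included), found left to right, becomes "|". The name `or_to_bar` is hypothetical. Like the sed command, it is purely textual, so it would also rewrite " or " inside comments and string literals.

```cpp
#include <string>

// Hypothetical helper mirroring sed 's/ or /|/g': replaces each
// non-overlapping " or " with "|", scanning left to right.
std::string or_to_bar(const std::string& text) {
    const std::string from = " or ";
    const std::string to = "|";
    std::string out = text;
    std::string::size_type pos = 0;
    while ((pos = out.find(from, pos)) != std::string::npos) {
        out.replace(pos, from.size(), to);
        pos += to.size();  // resume searching just after the inserted "|"
    }
    return out;
}
```

So `or_to_bar("if (a or b)")` gives `if (a|b)`. Because the pattern requires a space on each side, `a or(b)` or a tab-separated `or` is left alone, exactly as with the sed command as written.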
-- 
"The average pointer, statistically,    |Henry Spencer at U of Toronto Zoology
points somewhere in X." -Hugh Redelmeier| henry@zoo.toronto.edu   utzoo!henry

frose@synoptics.COM (Flavio Rose) (11/30/90)

This is in response to the questions "why is there a 1 in ISO
Latin 1" and "does ISO Latin 1 address oriental languages like
Kanji". 

There's a 1 in ISO Latin 1 because there's also ISO Latin 2
through 4. The reason there were four is that when the designers
looked at all the letter-accent combinations used by
European languages in the Latin alphabet, there were too
many to fit in the 190 or so positions of an eight-bit
character set, so they had to define four.

ISO Latin 1 is not a solution e.g. for Turkey, or for
Hungary.  People in those countries need a different ISO
Latin n to mix text in their own language with material
that uses full ASCII.
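To make the Hungarian case concrete: the same byte means different letters under different ISO Latin parts. For instance, 0xF5 is o-tilde in ISO 8859-1 (Latin 1) but o-double-acute, which Hungarian needs, in ISO 8859-2 (Latin 2). The sketch below is a minimal illustration, not a real decoder; the names `decode_latin1` and `decode_latin2` are hypothetical, and the Latin 2 table is deliberately partial.

```cpp
// Hypothetical sketch: decode a single byte to a Unicode code point.
char32_t decode_latin1(unsigned char b) {
    // Latin 1 occupies the first 256 Unicode code points, so this is identity.
    return static_cast<char32_t>(b);
}

char32_t decode_latin2(unsigned char b) {
    // Deliberately partial: only the four Hungarian letters absent from
    // Latin 1. A real decoder needs the whole upper half (0xA0-0xFF).
    switch (b) {
        case 0xD5: return U'\u0150';  // capital O with double acute
        case 0xDB: return U'\u0170';  // capital U with double acute
        case 0xF5: return U'\u0151';  // small o with double acute
        case 0xFB: return U'\u0171';  // small u with double acute
        default:
            // Below 0xA0, Latin 2 agrees with Latin 1 (ASCII plus controls).
            if (b < 0xA0) return static_cast<char32_t>(b);
            return U'\uFFFD';  // outside this sketch's partial table
    }
}
```

A file of Hungarian text written under Latin 2 and read as Latin 1 would therefore show o-tilde and u-circumflex where o-double-acute and u-double-acute were meant.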

Re "oriental languages like Kanji" -- for starters I don't
quite like that phrase.  I still think of kanji as denoting
certain characters while the name of the language is
Japanese. Calling kanji a language is like calling Cyrillic
a language. Anyway...

In essence, Japanese programmers don't suffer from the
problem that Europeans who use ISO 646 have; they are able
to type ASCII and see it on their screens the way
English-speakers do, while simultaneously typing their own
language. Since the Japanese don't have the problem, they
don't need ISO Latin 1 as a solution.

There is a slightly different issue: While Japanese C
programmers can usually use Japanese in comments and
character strings, compilers often won't let them write
identifier names in normal Japanese script (they
can still use Japanese in identifiers by spelling it out
phonetically in Latin letters, e.g. Kyoto Common Lisp's
sys:nani, but obviously that's not so nice).  I suspect
Europeans would have similar problems with many
compilers if they tried to put ISO Latin 1-encoded accented
letters into identifiers. But this is a different problem
from the one that trigraphs try to address.

Yours truly,
Flavio Rose
SynOptics Communications, Inc.