[comp.std.c++] ISO Latin 1?

davidm@uunet.UU.NET (David S. Masterson) (11/27/90)

>>>>> On 23 Nov 90 21:17:27 GMT, henry@zoo.toronto.edu (Henry Spencer) said:

Henry> The right answer to national character sets is ISO Latin 1 or
Henry> equivalent, not ridiculous contortions in language syntax that *every*
Henry> compiler *everywhere* then has to be able to parse.  Trigraphs were a
Henry> mistake.

Simple question (I think):

Does ISO Latin 1 address oriental languages like Kanji?  (Come to think of it,
do trigraphs?)
--
====================================================================
David Masterson					Consilium, Inc.
(415) 691-6311					640 Clyde Ct.
uunet!cimshop!davidm				Mtn. View, CA  94043
====================================================================
"If someone thinks they know what I said, then I didn't say it!"

henry@zoo.toronto.edu (Henry Spencer) (11/29/90)

In article <CIMSHOP!DAVIDM.90Nov26181052@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
>Henry> The right answer to national character sets is ISO Latin 1 or
>Henry> equivalent...
>
>Does ISO Latin 1 address oriental languages like Kanji?  (Come to think of it,
>do trigraphs?)

Neither addresses the issue at all.  ISO Latin 1 and its friends deal fairly
well with the small-alphabet languages -- ISO Latin 1 in particular gets very
nearly all the Roman-alphabet languages, although it has to punt to its
siblings for Greek or Cyrillic alphabets -- but they're 8-bit character sets
that haven't a prayer of coping with the large-alphabet languages.  Trigraphs
are just a minimal way of writing C entirely in the ISO 7-bit character set
(of which ASCII and most other 7-bit codes are supersets).
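To make Henry's point concrete, here is a minimal sketch of the replacement a C compiler performs on trigraphs before anything else happens (translation phase 1 in the ANSI C standard). A trigraph is two question marks followed by one of = ( / ) ' < ! > -, and each maps to a single character that is missing from some national ISO 646 variants. The function name `replace_trigraphs` is hypothetical. Note that the string literals in the example write `?\?` rather than two adjacent question marks: a compiler running in strict ISO mode would otherwise replace the trigraphs inside the literals themselves.

```c
#include <stddef.h>

/* Given the third character of a possible trigraph, return the character
   it stands for, or 0 if the sequence is not a trigraph. */
static char trigraph_replacement(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Copy src to dst, replacing every trigraph with its single character.
   Scanning is left to right, so three question marks followed by '='
   yield one '?' and then '#'. dst needs room for strlen(src) + 1 bytes;
   the output is never longer than the input. Returns the output length. */
size_t replace_trigraphs(const char *src, char *dst)
{
    size_t out = 0;
    while (*src) {
        char r;
        if (src[0] == '?' && src[1] == '?' &&
            (r = trigraph_replacement(src[2])) != 0) {
            dst[out++] = r;      /* emit the replacement character */
            src += 3;            /* skip the whole trigraph */
        } else {
            dst[out++] = *src++; /* ordinary character: copy it */
        }
    }
    dst[out] = '\0';
    return out;
}
```

For example, the input `"?\?=define X(a) a?\?(0?\?)"` comes out as `#define X(a) a[0]`.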

The syntactic perversions that the Danes are pushing don't deal with the
matter either, by the way.  All they do is make trigraphs a bit less ugly.
X3J11 didn't think this was worth the trouble, and neither do I.
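For comparison, and as far as I know, the Danish proposal eventually became the digraphs (`<:` `:>` `<%` `%>` `%:`) and the `<iso646.h>` macros (`and`, `bitand`, `not_eq`, ...) of ISO C Amendment 1 (1995). The hypothetical function below is written in that style to show what the "less ugly" spelling looks like; it counts bytes with the high bit set, i.e. bytes outside 7-bit ISO 646 (such as Latin 1 accented letters). It assumes a compiler that accepts digraphs (C95 or later).

```c
%:include <stddef.h>
%:include <iso646.h>   /* defines and, or, not, bitand, not_eq, ... as macros */

/* %: stands for #, <% %> for braces, and <: :> for brackets.
   Count the bytes in s[0..n) whose high bit is set. */
size_t count_8bit_bytes(const unsigned char *s, size_t n)
<%
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if ((s<:i:> bitand 0x80) not_eq 0)
            count++;
    return count;
%>
```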
-- 
"The average pointer, statistically,    |Henry Spencer at U of Toronto Zoology
points somewhere in X." -Hugh Redelmeier| henry@zoo.toronto.edu   utzoo!henry

birger@eik.ii.uib.no (Birger A. Wathne) (11/29/90)

Perhaps one should also support ISO 2022 for switching between the ISO 8859 (ISO Latin) character sets?
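Birger doesn't say how the switching would work. As a minimal sketch, assuming the usual ISO 2022 mechanism: a 96-character set such as the upper half of an ISO 8859 part is designated into the G1 slot with the escape sequence ESC 2/13 F, where F is the final byte registered for that set. The function name and the selection of parts below are hypothetical; the code only emits the designation sequence and does not decode anything.

```c
#include <stddef.h>

/* Registered final bytes for the 96-character upper halves of some
   ISO 8859 parts; 0 means "not in this table". */
static char iso8859_final_byte(int part)
{
    switch (part) {
    case 1:  return 'A';  /* Latin-1 */
    case 2:  return 'B';  /* Latin-2 */
    case 3:  return 'C';  /* Latin-3 */
    case 4:  return 'D';  /* Latin-4 */
    case 5:  return 'L';  /* Latin/Cyrillic */
    case 7:  return 'F';  /* Latin/Greek */
    default: return 0;
    }
}

/* Write ESC 2/13 F (designate a 96-character set into G1) to out.
   Returns the number of bytes written (3), or 0 for an unknown part. */
size_t designate_8859_into_g1(int part, unsigned char out[3])
{
    char f = iso8859_final_byte(part);
    if (f == 0)
        return 0;
    out[0] = 0x1B;             /* ESC */
    out[1] = 0x2D;             /* 2/13: G1, 96-character set */
    out[2] = (unsigned char)f; /* final byte naming the set */
    return 3;
}
```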

-- 
     ____       ____    ____    ____   ____   birger@ii.uib.no
    /   /   /  /   /   /   /   /      /   /   
   /   /   /  /   /   /   /   /      /   /    Birger A. Wathne
  /---    /  /---    /  __   /---   /---      Blekenberg 14
 /    /  /  /    /  /    /  /      /    /     N-5037 SOLHEIMSVIK
/____/  /  /    /  /____/  /____  /    /      tlf: +47-5-20 00 62

doconnor@titania.srg.UUCP (Dennis O'Connor x4982 room 6-230N) (12/06/90)

With respect to Japanese character sets :

There are four character sets used in Japan :

	Kanji : 2000+ common characters, plus more uncommon ones. Most adults
	can read and write Kanji. Every "character" stands for a complete
	word or concept.

	Hiragana (sp?) : about 150 characters, I think. Used to phonetically
	spell out words that are native to the Japanese language.

	Katakana : again, about 150 characters. Used to phonetically spell
	out words that have been imported into Japanese from foreign languages.

	Most young children know the two kana character sets, as do most
	adults of course. Once you've learned to write in katakana and hiragana
	you can write like a third grader. Few adult Japanese would wish to.

	Kanji symbols are derived from Chinese characters. Katakana and Hiragana
	are simplifications of Kanji symbols.

	"Romaji" : the Roman alphabet. I'm not sure when, who, et cetera.

keld@login.dkuug.dk (Keld Jørn Simonsen) (12/06/90)

doconnor@titania.srg.UUCP (Dennis O'Connor x4982 room 6-230N) writes:


>With respect to Japanese character sets :

>There are four character sets used in Japan :

>	Kanji : 2000+ common characters, plus more uncommon ones. Most adults
>	can read and write Kanji. Every "character" stands for a complete
>	word or concept.

>	Hiragana (sp?) : about 150 characters, I think. Used to phonetically
>	spell out words that are native to the Japanese language.

>	Katakana : again, about 150 characters. Used to phonetically spell
>	out words that have been imported into Japanese from foreign languages.

The character sets I have seen have 86 katakana and 83 hiragana characters.
But then I am talking about "encoded character sets" like 8859-1 etc,
not the "character set repertoire" which is an abstract kind of guy.
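Keld's figures match the layout of JIS X 0208 as I understand it: hiragana occupy row 4, cells 1 to 83, and katakana row 5, cells 1 to 86, with small and voiced forms counted as separate characters. The hypothetical helpers below just encode those two ranges (code points written as 0xRRCC, row byte then cell byte, each in 0x21..0x7E), so counting over each row should reproduce 83 and 86.

```c
/* First and last occupied code points of rows 4 and 5 of JIS X 0208. */
#define HIRAGANA_FIRST 0x2421u  /* small hiragana a */
#define HIRAGANA_LAST  0x2473u  /* hiragana n */
#define KATAKANA_FIRST 0x2521u  /* small katakana a */
#define KATAKANA_LAST  0x2576u  /* small katakana ke */

int is_jis0208_hiragana(unsigned code)
{
    return code >= HIRAGANA_FIRST && code <= HIRAGANA_LAST;
}

int is_jis0208_katakana(unsigned code)
{
    return code >= KATAKANA_FIRST && code <= KATAKANA_LAST;
}

/* Count the cells 0x21..0x7E of one row that satisfy pred. */
unsigned count_in_row(unsigned row_byte, int (*pred)(unsigned))
{
    unsigned n = 0;
    for (unsigned cell = 0x21; cell <= 0x7E; cell++)
        if (pred((row_byte << 8) | cell))
            n++;
    return n;
}
```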

There are, to my knowledge, several encoded character sets in use in Japan today, including:

    X0201: an 8 bit character set including almost all of ASCII
           (except backslash which is a Yen sign) and the katakana
           characters.
    X0208: a 16 (14) bit character set consisting of mathematical
           characters, Latin, hiragana, katakana, Cyrillic and Greek
           and some box drawing characters, then the large section
           of kanji characters, which are ordered in two parts:
           the more common ones in pronunciation order (Japanese
           order!) and then the less frequent ones in radical/stroke
           order. Some 6350 kanji characters are included.

    X0212: a new 16 (14) bit character set with a lot of the Latin,
           Cyrillic, Greek and kanji characters not in X0208.

    X0201, X0208 and X0212 are published as JIS (Japanese Industrial
    Standards), but it seems the 16-bit sets are not widely
    implemented, perhaps because they demand quite a big
    character set.

    Shift-JIS: an 8/16-bit character set which is used on PCs.
           Normal ASCII characters take just 8 bits, and some
           katakana characters also take just 8 bits. Certain byte
           values then act as lead bytes of a two-byte code,
           providing about 3000 kanji characters.

    EUC:   Extended UNIX Code: the above character sets and others,
           encoded with ISO 2022 techniques.
           So it is possible to shift in and out between several
           character sets.
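Both PC and UNIX encodings can be derived from a JIS X 0208 row/cell pair by simple arithmetic. The sketch below is a minimal illustration with hypothetical function names; it handles only plain JIS X 0208 and the 8-bit JIS X 0201 katakana, and ignores vendor extensions as well as the SS3 (0x8F) prefix that EUC-JP uses for X0212. The arithmetic itself covers the whole 94x94 grid, whatever subset a particular PC actually has fonts for.

```c
#include <stdbool.h>

/* Row byte j1 and cell byte j2 must both lie in 0x21..0x7E. */
static bool jis_valid(unsigned char j1, unsigned char j2)
{
    return j1 >= 0x21 && j1 <= 0x7E && j2 >= 0x21 && j2 <= 0x7E;
}

/* EUC-JP code set 1: set the high bit of both bytes. */
bool jis0208_to_eucjp(unsigned char j1, unsigned char j2, unsigned char out[2])
{
    if (!jis_valid(j1, j2))
        return false;
    out[0] = (unsigned char)(j1 | 0x80);
    out[1] = (unsigned char)(j2 | 0x80);
    return true;
}

/* EUC-JP code set 2: an 8-bit JIS X 0201 katakana byte (0xA1..0xDF)
   is preceded by SS2 (0x8E). In Shift-JIS the same byte stands alone. */
bool jis0201_kana_to_eucjp(unsigned char k, unsigned char out[2])
{
    if (k < 0xA1 || k > 0xDF)
        return false;
    out[0] = 0x8E;
    out[1] = k;
    return true;
}

/* Shift-JIS: each pair of rows shares one lead byte (0x81..0x9F or
   0xE0..0xEF); the odd row takes trail bytes 0x40..0x9E (skipping 0x7F)
   and the even row takes 0x9F..0xFC. */
bool jis0208_to_sjis(unsigned char j1, unsigned char j2, unsigned char out[2])
{
    if (!jis_valid(j1, j2))
        return false;
    out[0] = (unsigned char)(((j1 + 1) >> 1) + (j1 <= 0x5E ? 0x70 : 0xB0));
    if (j1 & 1)
        out[1] = (unsigned char)(j2 + (j2 >= 0x60 ? 0x20 : 0x1F));
    else
        out[1] = (unsigned char)(j2 + 0x7E);
    return true;
}
```

For example, hiragana "a" (row 4, cell 2, i.e. bytes 0x24 0x22) comes out as 0x82 0xA0 in Shift-JIS and 0xA4 0xA2 in EUC-JP.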

Keld Simonsen

frose@synoptics.COM (Flavio Rose) (12/06/90)

Supplementing quickly a previous posting:

There are 51 katakana and 51 hiragana, but there are also a
few wrinkles to counting them: You can add accent-like
objects called "nigori" to about half of them. Two of the
hiragana and three of the katakana are no longer really
used (since the 1948 spelling reform). Three of the
hiragana and at least three of the katakana come in both
normal and small versions.

There are said to be only 2,000 "commonly used" kanji. The
1948 spelling reform singled out a specific set of about
2,000 as being "preferred" in some sense. However, the
computer character set standard for Japanese
(JIS X 0208-1983) gives numbers to 2965 "first level" kanji
and 3388 "second level" kanji. That standard also includes
the two kanas, the Latin, Cyrillic and Greek alphabets, and
a large assortment of special symbols.
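The two figures follow from the row layout of JIS X 0208-1983, as I understand it: first-level kanji fill rows 16 to 46 plus the first 51 cells of row 47, and second-level kanji fill rows 48 to 83 plus 4 cells of row 84. The hypothetical function below classifies a row/cell pair that way, so counting over the whole grid should reproduce 2965 and 3388. Characters added to row 84 in later revisions are deliberately not covered.

```c
/* Kanji level of a JIS X 0208-1983 code point given as row byte j1 and
   cell byte j2 (0x21..0x7E): 1 = first level, 2 = second level, 0 = neither.
   Row number = j1 - 0x20, cell number = j2 - 0x20. */
int jis0208_kanji_level(unsigned char j1, unsigned char j2)
{
    if (j2 < 0x21 || j2 > 0x7E)
        return 0;
    if (j1 >= 0x30 && j1 <= 0x4E)   /* rows 16..46: fully used */
        return 1;
    if (j1 == 0x4F)                 /* row 47: cells 1..51 only */
        return j2 <= 0x53 ? 1 : 0;
    if (j1 >= 0x50 && j1 <= 0x73)   /* rows 48..83: fully used */
        return 2;
    if (j1 == 0x74)                 /* row 84: cells 1..4 in the 1983 edition */
        return j2 <= 0x24 ? 2 : 0;
    return 0;
}
```

That is 31 x 94 + 51 = 2965 and 36 x 94 + 4 = 3388.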

Yours truly,
Flavio Rose
SynOptics Communications, Inc.

frose@synoptics.COM (Flavio Rose) (12/06/90)

Oops, there are at least four hiragana (not three) used in
small versions (ya, yo, yu and tsu). Sorry for this
forgetfulness.

frose@synoptics.COM (Flavio Rose) (12/06/90)

Aaaargh. The number 51 in my previous posting (base number
of katakana and hiragana before you count variants like
small and with nigori) should be 49. I'm very sorry about
this confusion.

Yours truly,
Flavio