davidm@uunet.UU.NET (David S. Masterson) (11/27/90)
>>>>> On 23 Nov 90 21:17:27 GMT, henry@zoo.toronto.edu (Henry Spencer) said:
Henry> The right answer to national character sets is ISO Latin 1 or
Henry> equivalent, not ridiculous contortions in language syntax that *every*
Henry> compiler *everywhere* then has to be able to parse.  Trigraphs were a
Henry> mistake.
Simple question (I think):
Does ISO Latin 1 address oriental languages like Kanji?  (Come to think of it,
does trigraphs?)
--
====================================================================
David Masterson					Consilium, Inc.
(415) 691-6311					640 Clyde Ct.
uunet!cimshop!davidm				Mtn. View, CA  94043
====================================================================
"If someone thinks they know what I said, then I didn't say it!"
henry@zoo.toronto.edu (Henry Spencer) (11/29/90)
In article <CIMSHOP!DAVIDM.90Nov26181052@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
>Henry> The right answer to national character sets is ISO Latin 1 or
>Henry> equivalent...
>
>Does ISO Latin 1 address oriental languages like Kanji?  (Come to think of it,
>does trigraphs?)

Neither addresses the issue at all.  ISO Latin 1 and its friends deal fairly well with the small-alphabet languages -- ISO Latin 1 in particular gets very nearly all the Roman-alphabet languages, although it has to punt to its siblings for Greek or Cyrillic alphabets -- but they're 8-bit character sets that haven't a prayer of coping with the large-alphabet languages.  Trigraphs are just a minimal way of writing C entirely in the ISO 7-bit character set (of which ASCII and most other 7-bit codes are supersets).

The syntactic perversions that the Danes are pushing don't deal with the matter either, by the way.  All they do is make trigraphs a bit less ugly.  X3J11 didn't think this was worth the trouble, and neither do I.
-- 
"The average pointer, statistically,    |Henry Spencer at U of Toronto Zoology
points somewhere in X." -Hugh Redelmeier| henry@zoo.toronto.edu   utzoo!henry
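The two points in the post above can be illustrated with a minimal Python sketch. It is not from the thread: the helper names replace_trigraphs and fits_in_latin1 are hypothetical, the table is the set of nine trigraphs defined by ANSI C, and the second helper relies on Python's standard "latin-1" codec to show that an 8-bit set has no code for a kanji.

```python
# The nine ANSI C trigraphs and the characters they stand for.
TRIGRAPHS = {
    "??=": "#", "??(": "[", "??/": "\\",
    "??)": "]", "??'": "^", "??<": "{",
    "??!": "|", "??>": "}", "??-": "~",
}


def replace_trigraphs(source: str) -> str:
    """Scan left to right, replacing each trigraph with its character.

    Hypothetical helper: a sketch of the replacement step only, not a
    full C preprocessor.
    """
    out = []
    i = 0
    while i < len(source):
        candidate = source[i:i + 3]
        if candidate in TRIGRAPHS:
            out.append(TRIGRAPHS[candidate])
            i += 3
        else:
            out.append(source[i])
            i += 1
    return "".join(out)


def fits_in_latin1(text: str) -> bool:
    """Return True if every character has a code in ISO 8859-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # C source written with only the ISO 7-bit invariant characters.
    print(replace_trigraphs("??=define FIRST(a) a??(0??)"))
    # A Roman-alphabet word fits in Latin 1; a kanji word does not.
    print(fits_in_latin1("Sk\u00e5l"), fits_in_latin1("\u6f22\u5b57"))
```

The scan is deliberately left to right with no backtracking, so a run such as "???=" yields a literal "?" followed by "#".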
birger@eik.ii.uib.no (Birger A. Wathne) (11/29/90)
Perhaps one should support ISO 2022 for switching between the 8859 (ISO latin) character sets as well?
-- 
     ____       ____    ____    ____   ____   birger@ii.uib.no
    /   /   /  /   /   /   /   /      /   /   
   /   /   /  /   /   /   /   /      /   /    Birger A. Wathne
  /---    /  /---    /  __   /---   /---      Blekenberg 14
 /    /  /  /    /  /    /  /      /    /     N-5037 SOLHEIMSVIK
/____/  /  /    /  /____/  /____  /    /      tlf: +47-5-20 00 62
doconnor@titania.srg.UUCP (Dennis O'Connor x4982 room 6-230N) (12/06/90)
With respect to Japanese character sets :

There are four character sets used in Japan :

  Kanji : 2000+ common characters, plus more uncommon ones.  Most adults
    can read and write Kanji.  Every "character" stands for a complete
    word or concept.

  Hiragana (sp?) : about 150 characters, I think.  Used to phonetically
    spell out words that are native to the Japanese language.

  Katakana : again, about 150 characters.  Used to phonetically spell
    out words that have been imported into Japanese from foreign languages.

Most young children know the two kana character sets, as do most adults of course.  Once you've learned to write in katakana and hiragana you can write like a third grader.  Few adult Japanese would wish to.

Kanji symbols are derived from Chinese characters.  Katakana and Hiragana are simplifications of Kanji symbols.

  "Romaji" : the Roman alphabet.  I'm not sure when, who, et cetera.
keld@login.dkuug.dk (Keld J|rn Simonsen) (12/06/90)
doconnor@titania.srg.UUCP (Dennis O'Connor x4982 room 6-230N) writes:
>With respect to Japanese character sets :
>There are four character sets used in Japan :
>  Kanji : 2000+ common characters, plus more uncommon ones. Most adults
>    can read and write Kanji. Every "character" stands for a complete
>    word or concept.
>  Hiragana (sp?) : about 150 characters, I think. Used to phonetically
>    spell out words that are native to the Japanese language.
>  Katakana : again, about 150 characters. Used to phonetically spell
>    out words that have been imported into Japanese from foreign languages.

The character sets I have seen have 86 katakana and 83 hiragana characters.  But then I am talking about "encoded character sets" like 8859-1 etc., not the "character set repertoire", which is an abstract kind of guy.

There are to my knowledge several encoded character sets in use in Japan today, including:

X0201: an 8-bit character set including almost all of ASCII (except backslash, which is replaced by a Yen sign) and the katakana characters.

X0208: a 16 (14) bit character set consisting of mathematical characters, Latin, hiragana, katakana, Cyrillic and Greek, and some box-drawing characters, then the large section of kanji characters, which are ordered in two parts: the more common ones in pronunciation order (Japanese order!) and then the less frequent ones in radical/stroke order.  Some 6500 kanji characters are included.

X0212: a new 16 (14) bit character set with a lot of the Latin, Cyrillic, Greek and kanji characters not in X0208.

X0201, X0208 and X0212 are published by JIS - the Japanese standards institute, but it seems that the 16-bit sets are not widely implemented, maybe because that demands quite a big character set.

Shift-JIS: an 8/16-bit character set which is used on PCs.  Normal ASCII characters take just 8 bits, and some katakana characters also take just 8 bits.  Then some of the characters are used as escape sequences to provide about 3000 kanji characters.

EUC: Extended UNIX Code: these are the above character sets and others encoded with ISO 2022 techniques, so it is possible to shift in and out between several character sets.

Keld Simonsen
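The difference between these encodings is easier to see on actual bytes. The following is a minimal sketch using the "shift_jis", "euc_jp" and "iso2022_jp" codecs from Python's standard library. These are later implementations and may differ in detail from the systems described above, and the variable names are illustrative only.

```python
# Sample characters (written as escapes so the file itself stays ASCII).
ASCII_LETTER = "A"
HALFWIDTH_KATAKANA_A = "\uff71"   # single-byte katakana, as in X0201
KANJI = "\u6f22"                  # a kanji from the X0208 repertoire

# Shift-JIS: ASCII and the single-byte katakana take one byte each,
# while a kanji takes two bytes introduced by a lead byte.
sjis_ascii = ASCII_LETTER.encode("shift_jis")
sjis_kana = HALFWIDTH_KATAKANA_A.encode("shift_jis")
sjis_kanji = KANJI.encode("shift_jis")

# EUC: the kanji is two bytes, both with the high bit set.
euc_kanji = KANJI.encode("euc_jp")

# ISO 2022 style switching: an escape sequence selects the two-byte
# set, and every byte of the result stays within 7 bits.
iso2022_kanji = KANJI.encode("iso2022_jp")

if __name__ == "__main__":
    for label, data in [
        ("shift_jis ASCII", sjis_ascii),
        ("shift_jis katakana", sjis_kana),
        ("shift_jis kanji", sjis_kanji),
        ("euc_jp kanji", euc_kanji),
        ("iso2022_jp kanji", iso2022_kanji),
    ]:
        print(f"{label:20} {len(data)} byte(s): {data.hex(' ')}")
```

In the ISO 2022 form the kanji itself is still two bytes; the extra bytes in the output are the escape sequences that switch into and back out of the two-byte set.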
frose@synoptics.COM (Flavio Rose) (12/06/90)
Supplementing quickly a previous posting:

There are 51 katakana and 51 hiragana, but there are also a few wrinkles to counting them:

  You can add accent-like objects called "nigori" to about half of them.

  Two of the hiragana and three of the katakana are no longer really used (since the 1948 spelling reform).

  Three of the hiragana and at least three of the katakana come in both normal and small versions.

There are said to be only 2,000 "commonly used" kanji.  The 1948 spelling reform singled out a specific set of about 2,000 as being "preferred" in some sense.  However, the computer character set standard for Japanese (JISX0208-1983) gives numbers to 2965 "first level" kanji and 3388 "second level" kanji.  That standard also includes the two kanas, the Latin, Cyrillic and Greek alphabets, and a large assortment of special symbols.

Yours truly,
Flavio Rose
SynOptics Communications, Inc.
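The kanji counts quoted above can be checked with a little arithmetic. This sketch rests on one fact not stated in the thread: a two-byte set of this kind is laid out as a 94 x 94 grid, because each 7-bit byte has 94 graphic positions. The variable names are illustrative.

```python
# Kanji counts quoted above for JISX0208-1983.
FIRST_LEVEL_KANJI = 2965
SECOND_LEVEL_KANJI = 3388
total_kanji = FIRST_LEVEL_KANJI + SECOND_LEVEL_KANJI

# Assumption: 94 graphic positions per byte, two bytes per character.
POSITIONS_PER_BYTE = 94
two_byte_capacity = POSITIONS_PER_BYTE ** 2

# An 8-bit set such as ISO Latin 1 has at most 256 code values.
single_byte_capacity = 2 ** 8

if __name__ == "__main__":
    print("kanji in both levels:", total_kanji)
    print("positions in a two-byte set:", two_byte_capacity)
    print("fits in two bytes:", total_kanji <= two_byte_capacity)
    print("fits in one byte:", total_kanji <= single_byte_capacity)
```

The total of 6353 lines up with the "some 6500" kanji mentioned earlier in the thread, and leaves room in the grid for the kana, alphabets and symbols.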
frose@synoptics.COM (Flavio Rose) (12/06/90)
Oops, there are at least four hiragana (not three) used in small versions (ya, yo, yu and tsu). Sorry for this forgetfulness.
frose@synoptics.COM (Flavio Rose) (12/06/90)
Aaaargh.  The number 51 in my previous posting (base number of katakana and hiragana before you count variants like small and with nigori) should be 49.  I'm very sorry about this confusion.

Yours truly,
Flavio