Beebe@SCIENCE (Nelson H.F. Beebe) (03/31/88)
This message is going out to comp.text, TeXhax, and TeXmag, all of which have recently carried discussions about typesetting support for the Japanese language. Recent work in the January 1988 draft ANSI C standard, and at AT&T in Unix System V development, has been directed at enhancing support of large character sets in programming languages, tools, and operating systems, and programmers may therefore find it useful to enhance their background knowledge in this area.

I've just finished reading a fascinating book: J. Marshall Unger, ``The Fifth Generation Fallacy--Why Japan is Betting Its Future on Artificial Intelligence'', Oxford University Press (1987). ISBN 0-19-504939-X, ca. $US25. Its author is Professor of East Asian Languages and Literatures at the University of Hawaii.

Although the main thesis of the book is a well-grounded defense of the author's view of the possibilities for success of the Japanese Fifth Generation Project, and of the stumbling block that kanji input presents, the book is a goldmine of information, with up-to-date references and solid technical discussions of the problem of computer support for input, and to a lesser extent output, of the Japanese language. There are passing remarks as well about computer handling of Chinese and Korean.

For readers unfamiliar with it, Japanese has no significant linguistic relation to Chinese, yet it is conventionally written in a combination of characters from two phonetic syllabaries (katakana and hiragana) with Chinese characters (kanji) sprinkled in. According to a report by A.V. Hershey, about 5500 kanji are in common use, but there are only 326 different pronunciations, resulting in an average of about 17 characters per pronunciation. This high number of homonyms poses a severe problem for input, and makes indexing and alphabetizing of kanji documents a nightmare.
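To make the homonym arithmetic concrete, here is a small Python sketch. It is my own illustration, not anything taken from Unger's book or Hershey's report. The two counts are the ones quoted above; the function names and the tiny sample table are hypothetical, and the romanized readings "kou" and "shi" with a handful of kanji are only there to show how one typed reading fans out into several candidate characters.

```python
from collections import defaultdict

# Counts quoted in the post (attributed there to A.V. Hershey's report).
KANJI_IN_COMMON_USE = 5500
DISTINCT_READINGS = 326


def average_homophones(characters: int, readings: int) -> float:
    """Average number of characters sharing one pronunciation."""
    return characters / readings


def group_by_reading(pairs):
    """Build a reading -> list-of-kanji table from (reading, kanji) pairs."""
    table = defaultdict(list)
    for reading, kanji in pairs:
        table[reading].append(kanji)
    return dict(table)


# Hypothetical, hand-picked sample; a real input dictionary is far larger.
SAMPLE = [
    ("kou", "高"), ("kou", "校"), ("kou", "工"), ("kou", "口"),
    ("shi", "四"), ("shi", "市"), ("shi", "詩"),
]

avg = average_homophones(KANJI_IN_COMMON_USE, DISTINCT_READINGS)
candidates = group_by_reading(SAMPLE)

print(f"average characters per reading: {avg:.1f}")
print("candidates for 'kou':", candidates["kou"])
```

The quoted counts give a ratio of roughly 16.9. An input method keyed on pronunciation alone would therefore have to ask the user to choose among many candidates for a typical reading, and a sort keyed on pronunciation alone cannot order those characters relative to one another.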