[comp.text] Japanese Kanji input problem--a review

Beebe@SCIENCE (Nelson H.F. Beebe) (03/31/88)

This message is going out to comp.text, TeXhax, and TeXmag,
all of which have carried discussions recently about
typesetting support for the Japanese language.

Recent work on the January 1988 draft ANSI C standard, and
at AT&T on Unix System V development, has been directed at
enhancing support for large character sets in programming
languages, tools, and operating systems.  Programmers may
therefore find it useful to fill in their background
knowledge in this area.

I've just finished reading a fascinating book:

	J. Marshall Unger, ``The Fifth Generation
	Fallacy--Why Japan is Betting Its Future on
	Artificial Intelligence'', Oxford University Press
	(1987).  ISBN 0-19-504939-X, ca. US$25.

Its author is Professor of East Asian Languages and
Literatures at the University of Hawaii.  

The main thesis of the book is a well-grounded argument for
the author's view of the Japanese Fifth Generation Project's
prospects for success, and of the stumbling block that kanji
input presents.  Beyond that, the book is a goldmine of
information, with up-to-date references and solid technical
discussions of computer support for input, and to a lesser
extent output, of the Japanese language.  It also makes
passing remarks about computer handling of Chinese and
Korean.

For readers unfamiliar with it: Japanese has no significant
linguistic relation to Chinese, yet it is conventionally
written in a combination of characters from two phonetic
syllabaries (katakana and hiragana) with Chinese characters
(kanji) sprinkled in.
According to a report by A.V. Hershey, about 5500 kanji are
in common use, but there are only 326 different
pronunciations, giving an average of about 17 characters
per pronunciation (5500/326).  This high homonym count is a
severe problem for input, and a nightmare for indexing and
alphabetizing kanji documents.
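
To make the homonym problem concrete, here is a minimal
Python sketch.  The two totals are the Hershey figures
quoted above; the tiny reading table, the romanized keys,
and the function names are my own illustrative assumptions,
not data from the report or from Unger's book.

```python
# Figures quoted above from A.V. Hershey's report.
KANJI_IN_COMMON_USE = 5500
DISTINCT_PRONUNCIATIONS = 326


def average_homonyms(kanji_count, pronunciation_count):
    """Mean number of kanji that share a single pronunciation."""
    return kanji_count / pronunciation_count


# Hypothetical, hand-picked sample (NOT from the report): a few
# kanji that share the readings "kou" and "shi".  A real input
# system would need a table covering thousands of characters.
SAMPLE_READINGS = {
    "kou": ["高", "校", "工", "公", "光", "港"],
    "shi": ["四", "市", "死", "私", "氏", "史"],
}


def candidates(reading, table=SAMPLE_READINGS):
    """Return every kanji the typist must choose among for a reading."""
    return table.get(reading, [])


if __name__ == "__main__":
    avg = average_homonyms(KANJI_IN_COMMON_USE, DISTINCT_PRONUNCIATIONS)
    print(round(avg, 1))        # 16.9
    print(candidates("kou"))    # six candidates for one typed reading
```

Typing a pronunciation therefore does not select a
character; it only narrows the choice to a list from which
the typist (or the software) must still pick.  The same
ambiguity is why sorting by pronunciation alone gives no
unique order for kanji text.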
-------