[sci.lang.japan] What is a byte

lambert@mcvax.UUCP (08/16/87)

[I have removed comp.std.c from the Newsgroups line and added sci.lang.japan]

In article <479@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes:
) This, of course, makes it even more amazing that they have been so successful
) in the world community. It seems likely to me, though, that at some point
) they're going to have to break down and drop Kanji for professional use.

There seems to be a good reason not to do this: after romanization, words
written differently in Kanji may become identical.  Ambiguities caused by
homonymy occur in all languages (English "drill", for instance, can mean
1. [the use of] a tool for boring holes, metaphorically also boring
exercise; 2. [[a tool for] sowing in] a furrow; 3. twilled cotton; 4. a
baboon), but these seem negligible compared to what the Japanese would face.

For example, the word "kanji" itself can mean: 1. feeling, sensation,
impression; 2. Kanji (Chinese character); 3. manager, secretary;
4. inspector, superintendent; 5. smilingly.  In current writing, each of
these is spelled with different characters.

A particularly bad example: "ko^ka" = 1. Faculty of Engineering;
2. consideration of services; 3. a high price; 4. an official price;
5. overhead, elevated; 6. merits and demerits; 7. effect, efficiency;
8. descent, fall; 9. the marriage of an Imperial princess to a subject;
10. mineralization; 11. colloid degeneration, gelatination; 12. hardening,
cementation, vulcanization, stiffening; 13. hard money, cash; 14. a leave
of absence; 15. taxes; 16. an evil effect; 17. the Yellow Peril; 18. an
unfortunate slip of the tongue; 19. amalgamation; 20. a school song; 21. a
high-rise building.

This high degree of ambiguity is the combined result of two characteristics
of Japanese.  One is that there are, say, 1850 Kanji characters in common
use, each having an independent semantic content and usually a one-syllable
"reading", the so-called On reading, derived from the Chinese
pronunciation.  There may be more than one On reading, and there are some
bisyllabic On readings.  There is also a Kun (original Japanese) reading,
which is completely unrelated (for example, On = "chu^", Kun = "hiru"), and
which is more often than not polysyllabic, although most single syllables
do occur as the Kun reading of some character.  I haven't counted them, but
if there are about ninety distinct syllables among the readings of these
1850 characters, then on average a single syllable is the reading of some
20 different characters (1850 / 90 is about 20).  The second characteristic
that matters here is the ease with which compound words are formed in
Japanese, often by stringing On readings together.  Thus, all the different
"ko^ka"s above are the result of combining a highly ambiguous "ko^" with a
highly ambiguous "ka", and there are hundreds more potential meanings for
this compound than the few given above (culled from a dictionary).  Written
in Kanji, none of these words is ambiguous.
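The back-of-the-envelope arithmetic in the paragraph above can be made
explicit.  What follows is a minimal Python sketch using only the rough
figures given here (about 1850 characters, about 90 syllables).  It also
assumes, as a simplification not made in the text, that readings are spread
evenly over the syllables.  The function names are hypothetical and serve
only as illustration.

```python
# Rough homophone arithmetic using the ballpark figures from the post
# (about 1850 common Kanji, about 90 distinct syllable readings).
# Both numbers are estimates, not measured data.

COMMON_KANJI = 1850
SYLLABLE_READINGS = 90


def chars_per_syllable(kanji: int, syllables: int) -> float:
    """Average number of characters sharing one syllable reading,
    assuming (hypothetically) an even spread over the syllables."""
    return kanji / syllables


def two_char_compounds(per_syllable: float) -> float:
    """Upper bound on distinct two-character spellings of one
    two-syllable romanization such as "ko^ka": any character read
    "ko^" paired with any character read "ka"."""
    return per_syllable ** 2


avg = chars_per_syllable(COMMON_KANJI, SYLLABLE_READINGS)
print(f"characters per syllable: {avg:.1f}")
print(f"possible spellings of a two-syllable compound: {two_char_compounds(avg):.0f}")
```

With these figures it prints about 20.6 characters per syllable and an upper
bound of about 423 two-character spellings, which is in line with the
"hundreds" of potential meanings mentioned above.  Most of those
combinations are not actual words, so this only bounds the ambiguity.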

Not all words are that ambiguous when spelled in Romaji, but glancing through
my dictionary I estimate that one third to one half of the entries share
their romanization with another entry, and that there may be as many as a
thousand clusters of four or five homonymous entries (I find one on almost
every one of its 1000 pages, sometimes two or three).

It may be that I am overestimating the problem, and that context would
disambiguate romanized Japanese well enough to make it acceptable for
professional use.  Perhaps a Japanese reader of this article would care to
comment.

-- 

Lambert Meertens, CWI, Amsterdam; lambert@cwi.nl