[comp.std.internat] Chinese words

mike@turing.unm.edu (Michael I. Bushnell) (08/12/87)

I think the discussion about Chinese words would benefit from some
interesting knowledge I gleaned from a book on writing systems and the
alphabet.  

In Chinese, every word is one syllable.  Needless to say, there is
lots of overloading here, but the multiple meanings of a word are
usually quite different and can be easily distinguished from context.
There is not one character per word(== syllable), rather, there is one
character per word meaning.  

Representation of Chinese in data processing is always assumed to
mirror written Chinese, with one code per character.  In that case,
you need enough bits to hold the large lexicon.  But would it not be
possible to represent the syllables instead?  There may be problems
(many computer "things" have little context), but it might be workable.  

As for the size of the lexicon, a recent article here said that the
OED had about 1,000,000 words, and English slightly more than that.
From this, the poster derived a figure of 1,000,000 for the size of
the Chinese lexicon.  But English is a remarkable language.  For most
types of things, we have TWO words, one Latinate, one Germanic.  For
example: teeth/dental, dead/mortal, car/automobile.  The list of such
pairs is huge.  In no other language to my knowledge is there such a
phenomenon.  My estimate, from this and other reasons for the large
English lexicon, is about 500,000 words in Chinese.  Unfortunately,
this means that each character would not fit in 16 bits, which can
distinguish only 65,536 values.  But the number of syllables is MUCH
smaller.  That could probably fit.
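
The bit-width arithmetic above can be checked with a short sketch.  The
lexicon sizes are the figures quoted in this post; the syllable count
is NOT from the post and is only a hypothetical placeholder, and the
helper names are made up for illustration.

```python
def bits_needed(n_codes: int) -> int:
    """Smallest number of bits giving each of n_codes items a distinct code."""
    if n_codes < 1:
        raise ValueError("need at least one item to encode")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, without float error.
    return max(1, (n_codes - 1).bit_length())


def fits_in(n_codes: int, width_bits: int) -> bool:
    """True if n_codes distinct items can be coded in width_bits bits."""
    return n_codes <= 2 ** width_bits


# Figures quoted in the post.
ENGLISH_ESTIMATE = 1_000_000
CHINESE_ESTIMATE = 500_000

# Assumption for illustration only: the post gives no syllable count.
HYPOTHETICAL_SYLLABLES = 2_000

for label, n in [
    ("1,000,000-entry lexicon", ENGLISH_ESTIMATE),
    ("500,000-entry lexicon", CHINESE_ESTIMATE),
    ("hypothetical syllable inventory", HYPOTHETICAL_SYLLABLES),
]:
    print(f"{label}: {bits_needed(n)} bits, fits in 16: {fits_in(n, 16)}")
```

Under these figures, a 500,000-entry lexicon would need 19 bits per
code, so it overflows a 16-bit field, while any inventory of up to
65,536 syllables would fit.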


					Michael I. Bushnell
					a/k/a Bach II
					mike@turing.UNM.EDU
---
Where do your SOCKS go when you lose them in th' WASHER?
				-- Zippy the Pinhead

stanwyck@drutx.UUCP (08/17/87)

in article <615@unmvax.UNM.EDU>, mike@turing.unm.edu (Michael I. Bushnell) says:
> In Chinese, every word is one syllable.  Needless to say, there is
> lots of overloading here, but the multiple meanings of a word are
> usually quite different and can be easily distinguished from context.
> There is not one character per word(== syllable), rather, there is one
> character per word meaning.  
> 
> 					mike@turing.UNM.EDU

Au contraire!  As a former Chinese translator, I can categorically state
that the above statement is false.  There are two related errors in the
above:
  1.  In Chinese, many words are multi-syllable (e.g., computer is 2
      syllables - dien nyau [ dien - electric, nyau - brain])

  2.  In Chinese, virtually every character is one syllable and has only
      one pronunciation.  This is the primary advantage of learning
      Chinese rather than Japanese (which I also speak).  Japanese kanji
      normally have at least two and as many as 23 different
      pronunciations (e.g., nama of nama tamago), some of which are single
      syllable and others of which are multiple syllables.

The result is the opposite of the above statement:  Chinese characters
generally map one-to-one with a syllable, and while many words are
monosyllabic, there are many compound words (see above example) that are
polysyllabic.  Most of the latter are of recent introduction.  
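
The mapping described above can be sketched as a small data structure.
This is a toy illustration only: the table and function names are
hypothetical, the romanization follows the spelling used in this post,
and the kanji readings listed are a few examples, not a complete list.

```python
# Chinese: each character maps to one syllable (romanized as in this post).
CHAR_TO_SYLLABLE = {
    "電": "dien",  # electric
    "腦": "nyau",  # brain
}

# A word is a sequence of characters, so it may span several syllables.
WORDS = {
    "computer": ["電", "腦"],
}

# Japanese: one kanji can carry several readings (partial list only).
KANJI_READINGS = {
    "生": ["nama", "sei", "shou"],
}


def syllables(word: str) -> list[str]:
    """Syllables of a word, one per character."""
    return [CHAR_TO_SYLLABLE[ch] for ch in WORDS[word]]


def is_polysyllabic(word: str) -> bool:
    """True if the word is written with more than one character/syllable."""
    return len(syllables(word)) > 1


print(syllables("computer"), is_polysyllabic("computer"))
print(len(KANJI_READINGS["生"]), "readings listed for one kanji")
```

The point of the sketch is the shape of the tables: character to
syllable is one-to-one on the Chinese side, word to character is
one-to-many, and kanji to reading is one-to-many on the Japanese side.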
-- 
AT&T 				o  o			303-538-5004
Don Stanwyck		         ||   			ihnp4!drutx!stanwyck
Denver, CO USA			\__/			Telecom Standards