[comp.editors] Chinese character input scheme -- call for references

thomson@wasatch.UUCP (Rich Thomson) (12/13/88)

[ Please excuse the large newsgroup list, but also note that follow-ups are
  directed to comp.graphics. ]

I'm interested in a scheme for entering Chinese characters via a keyboard.
I've come up with the idea on my own, but the scheme seems obvious.  So
obvious that I imagine someone has already implemented it.

The basic problem is to design a user interface for input of Chinese
characters in a fashion that is analogous to the writing of the character
as a sequence of strokes.  There are 24 different basic strokes that I
know of for Chinese calligraphy, although there may be more.

When someone writes a Chinese character, the basic strokes are always written
in accordance with a set of rules (left to right, top to bottom, etc).  The
sequence of basic strokes comprising a character is consistent from person
to person.  Similarly, when printing the letter 'h', we are always taught to
draw the stem '|' first, and then the tail to complete the letter.

The user interface for input of the character should use the stroke
information (encoded on a key, for instance) in combination with the order
of the strokes to uniquely identify a given Chinese character, or perhaps
learn a new character.  The Roman alphabet equivalent is already
implemented in real-time spelling checker/completion programs that
currently run on many machines.
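
A minimal sketch of that lookup, in Python: each keypress narrows the set
of characters whose stroke sequence begins with what has been typed so
far, much like word completion.  The one-letter stroke codes (h for a
horizontal stroke, v for a vertical one) and the four-entry table are
assumptions made up for illustration; a real table would cover all of the
basic strokes and thousands of characters.

```python
# Hypothetical stroke codes: "h" = horizontal stroke, "v" = vertical stroke.
# The table is a tiny illustrative sample, not a real input dictionary.
STROKE_TABLE = {
    "hv": "shi2 (ten)",
    "hh": "er4 (two)",
    "hhh": "san1 (three)",
    "hhvh": "wang2 (king)",
}


def candidates(typed):
    """Return characters whose stroke sequence starts with the keys typed so far."""
    matches = [(seq, name) for seq, name in STROKE_TABLE.items()
               if seq.startswith(typed)]
    # Shorter sequences first, so an exact match leads the list.
    matches.sort(key=lambda pair: (len(pair[0]), pair[0]))
    return [name for _, name in matches]


if __name__ == "__main__":
    for typed in ("h", "hh", "hhv"):
        print(typed, "->", candidates(typed))
```

Typing "hh" in this sketch still leaves three candidates, so a scheme like
this needs either more keystrokes or an explicit selection step before it
settles on one character.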

I believe that this is a most natural scheme for entering the characters as
it mimics the act of writing the character calligraphically.  This means
the user need only adapt their current method of writing characters for
machine input, much as one learns to type English words by pressing
sequences of letter keys in conjunction with the SHIFT key.

There is also the subtle issue of size in conjunction with the stroke type
and sequence.  The same stroke appears in many different characters but of
different sizes, so the user must be given some way of adjusting the
size of the stroke to fit the character; perhaps an ALT, SHIFT or META key
can serve to identify this modifier to the stroke.

Given this type of a scheme, does anyone know of any implementations of
similar character entry systems, possibly for Japanese or other oriental
character sets?  Are there any journals (again, possibly Japanese) devoted
to the problem of oriental native language I/O?  Any references to
articles, journals, books, programs, etc., would be greatly appreciated.

					Thanks in advance,
						-- Rich
-- 
Rich Thomson	thomson@cs.utah.edu  {bellcore,hplabs}!utah-cs!thomson
"Tyranny, like hell, is not easily conquered; yet we have this consolation with
us, that the harder the conflict, the more glorious the triumph. What we obtain
too cheap, we esteem too lightly." Thomas Paine, _The Crisis_, Dec. 23rd, 1776

sun@venus.ycc.yale.edu (12/14/88)

In article <789@wasatch.UUCP>, thomson@wasatch.UUCP (Rich Thomson) writes...
> 
>The user interface for input of the character should use the stroke
>information (encoded on a key, for instance) in combination with the order
>of the strokes to uniquely identify a given Chinese character, or perhaps
                   ^^^^^^^^
>learn a new character.

	This scheme doesn't solve the problem of ambiguity, which is one of 
the major obstacles in Chinese character coding systems. For example, the 
character Jia3 (as in Jia3, Yi3, Bing3, Ding1, i.e., 1, 2, 3, 4, if you know 
what I mean) and the character Shen1 (a family name) have the same number
and sequence of strokes, and the same size of strokes. The only difference 
is the relative position of the last vertical stroke.
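
	The collision can be shown with a small Python sketch that groups 
characters by stroke sequence. The stroke classes (h = horizontal, 
v = vertical, t = turning stroke) and the sequences in the table are an 
assumed, coarse transcription for illustration only; they are not taken 
from any published coding scheme.

```python
from collections import defaultdict

# Assumed coarse stroke classes: h = horizontal, v = vertical, t = turning.
# Sequences are an illustrative transcription, not an official encoding.
STROKES = {
    "jia3": "vthhv",
    "shen1": "vthhv",
    "tian2 (field)": "vthvh",
}


def collisions(table):
    """Group characters by stroke sequence and keep only the ambiguous groups."""
    groups = defaultdict(list)
    for name, seq in table.items():
        groups[seq].append(name)
    return {seq: sorted(names) for seq, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    print(collisions(STROKES))
```

	Any sequence that maps to more than one character needs extra 
information (position, or a choice by the user) to be resolved.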

	Besides, the number of keys pressed could be very large. Hence, 
even if such an implementation exists, it is a very inefficient one.

>er sets?  Are there any journals (again, possibly Japanese) devoted
>to the problem of oriental native language I/O?  Any references to
>articles, journals, books, programs, etc., would be greatly appreciated.
	I remember reading somewhere that there was a conference dedicated to 
Chinese word processing, but I forget where. Maybe you can look for it.

geoff@lloyd.camex.uucp (Geoffrey Knauth) (12/14/88)

In article <45616@yale-celray.yale.UUCP> sun@venus.ycc.yale.edu writes:
>	Besides, the number of keys pressed could be very large. Hence, 
>even if such an implementation exists, it is a very inefficient one.
>
>>er sets?  Are there any journals (again, possibly Japanese) devoted
>>to the problem of oriental native language I/O?  Any references to
>>articles, journals, books, programs, etc., would be greatly appreciated.
>	I remember reading somewhere that there was a conference dedicated to 
>Chinese word processing, but I forget where. Maybe you can look for it.

I suggest you contact IBM, which has done a lot of work in China.  You
should also read the 11/21/88 edition of the Seybold Report on
Publishing Systems, Vol. 18, No. 5, "IPEX, Part III: Non-Roman
Languages Take Center Stage."  An excerpt from that article reads,
"HTS [High Technology Systems, an industry leader] uses the so-called
'Dr. Zhi' method of typing Chinese, whereby four basic elements (out
of a set of 180) are used to construct a character.  Some common
characters can be entered with a single keystroke."
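
For a sense of scale, here is a short Python sketch of the arithmetic.
With 180 elements and four positions there are 180^4 possible codes.
This is only an upper bound: it assumes every combination is allowed and
says nothing about how the method actually assigns codes.  The 24-stroke
figure comes from the original posting, and the 10,000-character target
is an assumption chosen purely for comparison.

```python
def code_space(symbols, length):
    """Number of distinct fixed-length codes over an alphabet of `symbols` keys."""
    return symbols ** length


def min_length(symbols, target):
    """Smallest code length whose code space reaches at least `target` codes."""
    length = 1
    while code_space(symbols, length) < target:
        length += 1
    return length


if __name__ == "__main__":
    print(code_space(180, 4))      # four elements out of a set of 180
    print(min_length(24, 10000))   # 24 basic strokes, assumed 10,000 characters
    print(min_length(180, 10000))  # 180 elements, same assumed target
```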
-- 
Geoffrey S. Knauth               ARPA: geoff%lloyd@hcsfvax.harvard.edu
Camex, Inc.                      UUCP: geoff@lloyd.uucp or hcsfvax!lloyd!geoff
75 Kneeland St., Boston, MA 02111
Tel: (617)426-3577  Fax: 426-9285            I do not speak for Camex.

curtc@pogo.GPID.TEK.COM (Curtis Charles) (12/15/88)

In article <789@wasatch.UUCP>, thomson@wasatch.UUCP (Rich Thomson) writes...
>The user interface for input of the character should use the stroke
>information (encoded on a key, for instance) in combination with the order
>of the strokes to uniquely identify a given Chinese character, or perhaps

Several years ago I saw a prototype for a keyboard well suited to
Chinese.  (I know very little about Chinese, so take this with a grain
of salt...)   The keyboard was flat, and lacked the tactile feeling
we've come to enjoy, and was much like a membrane keyboard.  The reason
that it was flat was that the glyphs were projected from behind onto
the keyboard.  Apparently, the Chinese alphabet can be thought of as
tree structured, so getting a character (glyph?) on the screen became a
process of menu selection.  Several thousand characters were programmed
in, and it took 3 to 5 (?) "menu picks" to get to a glyph on the screen.
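
A rough Python sketch of what such menu selection might look like.  The
tree below is entirely hypothetical (I don't know how the prototype
grouped its glyphs); it assumes a first pick by component, a second by
the number of remaining strokes, and a final pick from a short list.

```python
# Hypothetical menu tree; the grouping and the entries are assumptions.
MENU = {
    "water": {
        "3 more strokes": ["jiang1 (river)", "chi2 (pool)"],
    },
    "tree": {
        "3 more strokes": ["cun1 (village)"],
        "4 more strokes": ["lin2 (forest)"],
    },
}


def select(menu, picks):
    """Follow a sequence of menu picks down the tree and return what is reached."""
    node = menu
    for pick in picks:
        # A string picks a submenu by name; an integer picks from a final list.
        node = node[pick]
    return node


if __name__ == "__main__":
    print(select(MENU, ["water", "3 more strokes", 1]))
```

Here three picks reach a glyph; a deeper tree would trade more picks for
shorter menus at each level.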

Have you thought about a graphics tablet with recognition software?
(Probably tougher than recognition for English...)
------------------------------------------------------------------------
Curt Charles              | "Let our swords run red with the blood of
curtc@pogo.GPID.TEK.COM   | infidels..."    Sean Connery

wu@sunybcs.uucp (Wan-Chung Wu) (12/15/88)

In article <45616@yale-celray.yale.UUCP> sun@venus.ycc.yale.edu writes:

>	I remember reading somewhere that there was a conference dedicated to 
>Chinese word processing, but I forget where. Maybe you can look for it.

I know of at least one annual conference that covers all aspects of Chinese
processing.  It is called the "International Conference on Chinese
Computing".

The proceedings of that conference should give you some ideas
about Chinese input methods.

The one I attended was held June 14-17, 1987, in Chicago, IL. If anybody
wants to know where to get the proceedings, please let me know and I
will try my best to give you a pointer.

=========================================================================

	wu@cs.buffalo.edu

        Graphics Group
	University Computing Service
	State University of New York at Buffalo

========================================================================

jdm@h.cs.wvu.wvnet.edu (James D Mooney,205K,7,2913548) (12/15/88)

From article <283@lloyd.camex.uucp>, by geoff@lloyd.camex.uucp (Geoffrey Knauth):
> In article <45616@yale-celray.yale.UUCP> sun@venus.ycc.yale.edu writes:
>>>er sets?  Are there any journals (again, possibly Japanese) devoted
>>>to the problem of oriental native language I/O?  Any references to
>>>articles, journals, books, programs, etc., would be greatly appreciated.
>>	I remember reading somewhere that there was a conference dedicated to 
>>Chinese word processing, but I forget where. Maybe you can look for it.
> 
> I suggest you contact IBM, which has done a lot of work in China.  You
> should also read the 11/21/88 edition of the Seybold Report on
> Publishing Systems, Vol. 18, No. 5, "IPEX, Part III: Non-Roman
> Languages Take Center Stage."  An excerpt from that article reads,
> "HTS [High Technology Systems, an industry leader] uses the so-called
> 'Dr. Zhi' method of typing Chinese, whereby four basic elements (out
> of a set of 180) are used to construct a character.  Some common
> characters can be entered with a single keystroke."

Another place this subject is discussed is at the annual PROTEXT
conferences organized by Professor J. Miller of Trinity College,
Dublin, Ireland.  PROTEXT IV, held October 1987 in Boston,
included some relevant papers, among them:

	Text Processing in Ideographic Languages, by Loh
		Shiu-Chang and Kong Luan

	Key Problems in Developing an Advanced Chinese Text
		Processing and Typesetting System, by
		Wang Xuan

Proceedings of all PROTEXT Conferences are available from

	Boole Press Limited
	P.O. Box 5
	Dun Laoghaire, Co. Dublin, Ireland

Jim Mooney				Dept. of Stat. & Computer Science
(304) 293-3607				West Virginia University
					Morgantown, WV 26506
USENET:  {allegra,bellcore,cadre,idis,psuvax1}!pitt!wvucsb!wvucsa!jdm

scottg@hpiacla.HP.COM (Scott Gulland) (12/16/88)

/ hpiacla:comp.editors / thomson@wasatch.UUCP (Rich Thomson) / 12:58 am  Dec 13, 1988 /
> I'm interested in a scheme for entering Chinese characters via a keyboard.
> I've come up with the idea on my own, but the scheme seems obvious.  So
> obvious that I imagine someone has already implemented it.

> The basic problem is to design a user interface for input of Chinese
> characters in a fashion that is analogous to the writing of the character
> as a sequence of strokes.  There are 24 different basic strokes that I
> know of for Chinese calligraphy, although there may be more.
> 
> When someone writes a Chinese character, the basic strokes are always written
> in accordance with a set of rules (left to right, top to bottom, etc).  The
> sequence of basic strokes comprising a character is consistent from person
> to person.  Similarly, when printing the letter 'h', we are always taught to
> draw the stem '|' first, and then the tail to complete the letter.
> 
> The user interface for input of the character should use the stroke
> information (encoded on a key, for instance) in combination with the order
> of the strokes to uniquely identify a given Chinese character, or perhaps
> learn a new character.  The Roman alphabet equivalent is already
> implemented in real-time spelling checker/completion programs that
> currently run on many machines.
> 
> I believe that this is a most natural scheme for entering the characters as
> it mimics the act of writing the character calligraphically.  This means
> the user need only adapt their current method of writing characters for
> machine input, much as one learns to type English words by pressing
> sequences of letter keys in conjunction with the SHIFT key.
> 
> There is also the subtle issue of size in conjunction with the stroke type
> and sequence.  The same stroke appears in many different characters but of
> different sizes, so the user must be given some way of adjusting the
> size of the stroke to fit the character; perhaps an ALT, SHIFT or META key
> can serve to identify this modifier to the stroke.
> 
> Given this type of a scheme, does anyone know of any implementations of
> similar character entry systems, possibly for Japanese or other oriental
> character sets?  Are there any journals (again, possibly Japanese) devoted
> to the problem of oriental native language I/O?  Any references to
> articles, journals, books, programs, etc., would be greatly appreciated.

HP has offered full KANJI support for quite a number of years.  KANJI is used
in Japan and consists of approximately 30,000-50,000 ideograms.   Rather than
using a keystroke approach as given above (highly impractical), a very special
terminal is employed.  This terminal allows any of the 30K-50K ideograms to 
be entered with a single keystroke.  Note that each character in the KANJI 
language is represented by 16 bits.
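
To illustrate the 16-bit point only, here is a minimal Python sketch that
packs a character index into two bytes.  The big-endian layout and the use
of a plain index are assumptions for illustration; this is not the actual
encoding used by any HP terminal.

```python
import struct

# 16 bits give 2**16 distinct values, enough for an index up to 65,535.
MAX_CODES = 2 ** 16


def encode(index):
    """Pack a character index into two bytes (big-endian, an assumed layout)."""
    if not 0 <= index < MAX_CODES:
        raise ValueError("index does not fit in 16 bits")
    return struct.pack(">H", index)


def decode(data):
    """Recover the character index from its two-byte form."""
    return struct.unpack(">H", data)[0]


if __name__ == "__main__":
    packed = encode(49999)
    print(len(packed), decode(packed))
```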

wu@sunybcs.uucp (Wan-Chung Wu) (12/17/88)

To those who are interested in Chinese input schemes,

	As I promised to "try my best to give you a pointer" to the
   proceedings of the International Conference on Chinese Computing, here
   are the people you should contact:

    (Because so many people have requested the information,
     I am posting it here to save my tight schedule :-)  )

	Prof. Shi-Kuo Chang
	Department of Computer Science
	University of Pittsburgh

	Dr. Patrick S.P. Wang
	Department of Computer Science
	Northeastern University
	Boston, Massachusetts

	Dr. An-Chi Liu
	Department of Electrical and Computer Engineering
	Illinois Institute of Technology
	Chicago, Illinois


	The following is a list of papers in the proceedings of ICCC'87 
    that relate to Chinese input:


	-------------------------------------------------------------------

	1. W.C.P. Yu, "Some New Advancement in High Speed Two-Stroke Chinese
	   Input System".

	2. H.L. Soo, "A Generic Chinese Input System".

	3. W.H. Wu, "Chinese Characters Encoded in Stroke-Sequences".

        4. J. Zhu and X. Liu, "A New Input System for Chinese Language
	   Processing".

	5. K.Y. Cheng and F.K. Yu, "On Disambiguous Chinese Phonetic Input"

	6. A. Mathur and F. Fowler, "Design of a Dynamically Reconfigurable 
	   Keyboard".

	7. A. MacDonald and Y.H. Ng, "Sequence Prediction for Chinese 
	   Language Input".

	8. H.C. Tien, "PINXXIEE: The Chinese Computer Input Language".

	9. V.C. Yeh, "The Phonetic Chinese Language Computer System".

       10. T.Y. Kiang and T.H. Cheng, "Survey on the Establishment of 
	   Indexing System for Composed Chinese Characters".

       11. T. Huang, "The Dai-E Chinese Encoding Method".

       ----------------------------------------------------------------

	I am sure there are more interesting papers
     in the proceedings of ICCC'88 and of earlier years.

	If you still have questions, send me mail again.

	Sorry to respond to you all so late!


==========================================================================

	wu@cs.buffalo.edu

	Graphics Group
	University Computing Service
	State University of New York at Buffalo

==========================================================================

huangt@psu-cs.UUCP (Techung Huang) (12/21/88)

From article <3316@cs.Buffalo.EDU> by wu@sunybcs.UUCP
>To those who are interested in the Chinese Input schemes,
>
>	The following is a list of papers in the proceedings of ICCC'87 
>   that relate to Chinese input:
>
>[deleted]

For those who want the complete picture of Chinese computing,
an in-depth discussion of Chinese, Japanese and Korean computing can
be found in the book
"AN INTRODUCTION TO CHINESE, JAPANESE AND KOREAN COMPUTING",
written by J K T Huang (Rep. of China) & T D Huang (USA).

----------------------------------------------------------------
Brief information about the book follows.
----------------------------------------------------------
This first book of its kind gives a comprehensive introduction to
Chinese, Japanese and Korean (CJK) Computing.  Every possible
related issue is covered, and Chinese, Japanese and Korean computing
problems and environments in particular are examined in depth.
Besides being of interest to Oriental Language computing professionals,
it also provides a clear overview of the subject to individuals learning
CJK Computing and computer companies working on CJK systems.

Contents : Introduction; About the Chinese Language; Input Methods; Output
Methods - Chinese Character Generation; Internal Codes; Chinese Character
Code for Information Interchange; Chinese Software; A Real Implementation
Example.

400pp (approx.)           9971-50-664-5             
Book Code : ZB0657RB

For America only :
World Scientific Publishing Co., Inc.
687 Hartwell Street, Teaneck, NJ 07666, USA
Toll-free : 1-800-227-7562, Telefax : (201)837-8859
Tel : (201)837-8858, (201)837-1567

For other countries :
World Scientific Publishing Co. Pte Ltd.
Farrer Road, P. O. Box 128, Singapore 9128
Cable Address : "COSPUB", Telex : RS 28561 WSPC
Telefax : 2737298, Tel: 2786188

=============================================================
UUCP:   {ucbvax,uunet,ihnp4,gatech}!tektronix!psu-cs!huangt
CSNET:  huangt@cs.pdx.edu
ARPANET:huangd%cs.pdx.edu@relay.cs.net

stafford@ti-csl.CSNET (Ron Stafford) (12/27/88)

>
>HP has offered full KANJI support for quite a number of years.  KANJI is used
>in Japan and consists of approximately 30,000-50,000 ideograms.   Rather than
>using a keystroke approach as given above (highly impractical), a very special
>terminal is employed.  This terminal allows any of the 30K-50K ideograms to 
>be entered with a single keystroke.  Note that each character in the KANJI 
>language is represented by 16 bits.

WOW!!!   HP has a keyboard with 50,000 keys.

	How long does it take to learn touch typing on this monster?

kinmonthprep@deneb.ucdavis.edu (Earl H. Kinmonth) (12/28/88)

In article <66384@ti-csl.CSNET> stafford@tilde.UUCP (Ron Stafford) writes:
>>
>>HP has offered full KANJI support for quite a number of years.  KANJI is used
>>in Japan and consists of approximately 30,000-50,000 ideograms.   Rather than

While over the course of all of Japanese and Chinese history there may
be a total of 50,000 characters and variations, only a small subset
have general use. Minimum literacy in Japan involves roughly 1800
characters. A college graduate might be able to recognize twice or
three times this number depending on field.

My NEC word processor offers around 7000 characters, many of which are
used only for place or family names. Many low end word processors offer
only half this number.

>>using a keystroke approach as given above (highly impractical), a very special
>>terminal is employed.  This terminal allows any of the 30K-50K ideograms to 

Sounds fishy to me. Most Japanese terminals have an ordinary
American-European keyboard. Entry is phonetic either through the
Japanese kana or romanization. The only larger devices I've seen have
used tablets with the characters in a grid. You punch them with a
stylus.
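
Phonetic entry of this kind amounts to a dictionary lookup followed by a
choice among homophones. A minimal Python sketch, assuming a made-up
two-entry dictionary keyed by romanized reading, with English glosses
standing in for the kanji candidates (the names here are hypothetical and
not taken from any real word processor):

```python
# Hypothetical reading-to-candidates table; glosses stand in for kanji.
CANDIDATES = {
    "kami": ["paper", "god", "hair"],
    "hashi": ["bridge", "chopsticks", "edge"],
}


def convert(reading, choice=0):
    """Return the chosen candidate for a reading, cycling through homophones."""
    options = CANDIDATES.get(reading)
    if options is None:
        # Unknown reading: leave the phonetic text unconverted.
        return reading
    return options[choice % len(options)]


if __name__ == "__main__":
    print(convert("kami"))      # first candidate
    print(convert("kami", 1))   # user asks for the next homophone
```

The user types the sound on an ordinary keyboard and then steps through
the candidates until the intended character appears.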

>>be entered with a single keystroke.  Note that each character in the KANJI 
>>language is represented by 16 bits.
>
>WOW!!!   HP has a keyboard with 50,000 keys.
>
>	How long does it take to learn touch typing on this monster?

You don't because it doesn't exist.

Earl H. Kinmonth, History Department, University of California,
Davis, California, 95616

916-752-1636 (voice, fax) /0776

Disclaimer: This is AmeriKa!  Who needs a disclaimer!

Internet:   ucdked!cck@ucdavis.edu

UUCP:       {ucbvax, lll-crg}!ucdavis!ucdked!cck