[comp.graphics] Chinese character input scheme -- call for references

thomson@wasatch.UUCP (Rich Thomson) (12/13/88)

[ Please excuse the large newsgroup list, but also note that follow-ups are
  directed to comp.graphics. ]

I'm interested in a scheme for entering Chinese characters via a keyboard.
I've come up with the idea on my own, but the scheme seems obvious.  So
obvious that I imagine someone has already implemented it.

The basic problem is to design a user interface for input of Chinese
characters in a fashion that is analogous to the writing of the character
as a sequence of strokes.  There are 24 different basic strokes that I
know of for Chinese calligraphy, although there may be more.

When someone writes a Chinese character, the basic strokes are always written
in accordance with a set of rules (left to right, top to bottom, etc).  The
sequence of basic strokes comprising a character is consistent from person
to person.  Similarly, when printing the letter 'h', we are always taught to
draw the stem '|' first, and then the tail to complete the letter.

The user interface for input of the character should use the stroke
information (encoded on a key, for instance) in combination with the order
of the strokes to uniquely identify a given Chinese character, or perhaps
learn a new character.  The Roman alphabet equivalent is already
implemented in real-time spelling checker/completion programs that
currently run on many machines.
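
A minimal sketch of the completion idea, assuming each stroke is reduced
to one of five hypothetical classes (1 = horizontal, 2 = vertical,
3 = left-falling, 4 = dot or right-falling, 5 = turning) rather than the
24 basic strokes mentioned above.  The table and the function names are
illustrative only:

```python
from collections import defaultdict

# Hypothetical stroke-class codes: 1=horizontal, 2=vertical, 3=left-falling,
# 4=dot/right-falling, 5=turning.  Each character maps to its strokes in
# writing order.  This tiny table is for illustration only.
STROKE_TABLE = {
    "一": "1",
    "十": "12",
    "人": "34",
    "大": "134",
    "木": "1234",
    "田": "25121",
}

def build_index(table):
    """Map every prefix of a stroke sequence to the characters starting with it."""
    index = defaultdict(list)
    for char, strokes in table.items():
        for end in range(1, len(strokes) + 1):
            index[strokes[:end]].append(char)
    return index

def candidates(index, typed):
    """Return the characters whose stroke sequence begins with the keys typed so far."""
    return index.get(typed, [])

INDEX = build_index(STROKE_TABLE)
print(candidates(INDEX, "1"))     # everything that starts with a horizontal stroke
print(candidates(INDEX, "1234"))  # narrowed down by further strokes
```

Each extra keystroke narrows the candidate list, much as a completion
program narrows a word list letter by letter.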

I believe that this is a most natural scheme for entering the characters as
it mimics the act of writing the character calligraphically.  This means
the user need only adapt their current method of writing characters for
machine input.  This is similar to learning to type English words by
pressing sequences of letter keys in conjunction with the SHIFT key.

There is also the subtle issue of size in conjunction with the stroke type
and sequence.  The same stroke appears in many different characters but of
different sizes, so the user must be given some way of adjusting the
size of the stroke to fit the character; perhaps an ALT, SHIFT or META key
can serve to identify this modifier to the stroke.

Given this type of a scheme, does anyone know of any implementations of
similar character entry systems, possibly for Japanese or other oriental
character sets?  Are there any journals (again, possibly Japanese) devoted
to the problem of oriental native language I/O?  Any references to
articles, journals, books, programs, etc., would be greatly appreciated.

					Thanks in advance,
						-- Rich
-- 
Rich Thomson	thomson@cs.utah.edu  {bellcore,hplabs}!utah-cs!thomson
"Tyranny, like hell, is not easily conquered; yet we have this consolation with
us, that the harder the conflict, the more glorious the triumph. What we obtain
too cheap, we esteem too lightly." Thomas Paine, _The Crisis_, Dec. 23rd, 1776

bph@buengc.BU.EDU (Blair P. Houghton) (12/14/88)

In article <789@wasatch.UUCP> thomson@wasatch.utah.edu.UUCP (Rich Thomson) writes:
>[ Please excuse the large newsgroup list, but also note that follow-ups are
>  directed to comp.graphics. ]
>
>I'm interested in a scheme for entering Chinese characters via a keyboard.
>I've come up with the idea on my own, but the scheme seems obvious.  So
>obvious that I imagine someone has already implemented it.
>
>The basic problem is to design a user interface for input of Chinese
>characters in a fashion that is analogous to the writing of the character
>as a sequence of strokes.  There are 24 different basic strokes that I
>know of for Chinese calligraphy, although there may be more.

Sounds simple enough, but you might try a digitizing pad and some sort of
character-recognition software; the numerous configurations of those
strokes in the thousands of Chinese symbols might be a source of error
in typing.

I've actually seen a photo of a Chinese keyboard:  it had about a hundred
alphabetic keys, and a pad of nine (that's nine, one less than ten) shift
keys.

				--Blair
				  "Sounds perfect for Emacs."

kinmonthprep@deneb.ucdavis.edu (Earl H. Kinmonth) (12/14/88)

In article <789@wasatch.UUCP> thomson@wasatch.utah.edu.UUCP (Rich Thomson) writes:
>[ Please excuse the large newsgroup list, but also note that follow-ups are
>  directed to comp.graphics. ]
>
>I'm interested in a scheme for entering Chinese characters via a keyboard.

First, before you invent a wheel that has already been
invented, why not look at some of the commercial word
processors that are available for Chinese and Japanese?

Even before you do that, think about your terminology.
Chinese characters for Chinese are one thing, characters of
(largely) Chinese origin used in Japanese are another.

>The basic problem is to design a user interface for input of Chinese
>characters in a fashion that is analogous to the writing of the character

If you had done a little research you would know that there
are a variety of methods already in use for "Chinese"
characters ranging from entering raw HEX codes to fairly
sophisticated context analysis schemes using rudimentary AI
techniques.

Japanese vendors have experimented with a variety of
techniques including pressure sensitive tablets, stroke
classification schemes, etc.

A few of these are discussed in J. Marshall Unger, The
Fifth Generation Fallacy: Why Japan is Betting Its Future
on Artificial Intelligence (Oxford University Press, 1987).
Overall, this is a shallow book, but it does describe some
of the techniques used to handle characters in ENGLISH. To
learn more, pick up the technical manuals for commercial
Japanese word processors.

Overall, Japanese seems best handled by table lookup from
romanized input. Of course, characters are only a fraction
of the symbols needed for writing Japanese. I make this
generalization based on experimentation with a number of
input techniques, but the best argument for it is that it
is what people are buying in Japan. Every Japanese
manufacturer seems to have tried a proprietary input
scheme, but the one that users seem to prefer is
translation from romaji.

[much cut]

sun@venus.ycc.yale.edu (12/14/88)

In article <789@wasatch.UUCP>, thomson@wasatch.UUCP (Rich Thomson) writes...
> 
>The user interface for input of the character should use the stroke
>information (encoded on a key, for instance) in combination with the order
>of the strokes to uniquely identify a given Chinese character, or perhaps
                   ^^^^^^^^
>learn a new character.

	This scheme doesn't solve the problem of ambiguity, which is one of 
the major obstacles in Chinese character coding systems. For example, the 
character Jia3 (as in Jia3, Yi3, Bing3, Ding1, i.e., 1, 2, 3, 4, you know
what I mean) and the character Shen1 (a family name) have the same number
and sequence of strokes, and the same size of strokes. The only difference 
is the relative position of the last vertical stroke.

	Besides, the number of keys pressed could be very large. Hence, 
even if such an implementation exists, it is a very inefficient one.
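
The collision is easy to demonstrate.  The sketch below assumes the same
hypothetical five-class stroke coding as a simplification (1 = horizontal,
2 = vertical, 3 = left-falling, 4 = dot or right-falling, 5 = turning) and
groups characters by their stroke sequence; any group with more than one
member cannot be told apart by stroke order alone:

```python
from collections import defaultdict

# Stroke sequences under a hypothetical five-class coding.  Jia3 and Shen1
# from the example above are the first two entries.
STROKES = {
    "甲": "25112",  # Jia3
    "申": "25112",  # Shen1
    "田": "25121",
    "由": "25121",
    "木": "1234",
}

def collisions(table):
    """Return only the stroke sequences that are shared by several characters."""
    groups = defaultdict(list)
    for char, sequence in table.items():
        groups[sequence].append(char)
    return {seq: chars for seq, chars in groups.items() if len(chars) > 1}

AMBIGUOUS = collisions(STROKES)
for sequence, chars in AMBIGUOUS.items():
    print(sequence, chars)
```

A real system would need some extra information (stroke position, or a
final selection step) to separate such groups.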

>character sets?  Are there any journals (again, possibly Japanese) devoted
>to the problem of oriental native language I/O?  Any references to
>articles, journals, books, programs, etc., would be greatly appreciated.
	I remember I read somewhere that there was a conference dedicated to
Chinese Word Processing. But I forgot where. Maybe you can look for it.

tex@wucc.waseda.JUNET (Kamiya Fumiaki) (12/14/88)

I don't know how it is done in other oriental countries, but at
least, I can tell you how it is usually done in Japan.

The main idea is to deploy what is called a kana-to-kanji
converter.  Given a string of kana representing the sound of
the kanji the user wants, it displays a list of candidate
kanji, and the user selects the one he/she wants.  That's
all.  Real kana-to-kanji converters in public use implement
other features as well, but the fundamental part is just
what I have described.
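
The fundamental part can be sketched in a few lines.  The dictionary
below is a toy, and the names KANA_TO_KANJI, list_candidates and select
are hypothetical; a real converter uses a much larger dictionary and
the additional features mentioned above:

```python
# Toy reading-to-kanji dictionary (illustrative entries only).
KANA_TO_KANJI = {
    "かみ": ["紙", "神", "髪"],   # kami: paper, god, hair
    "はし": ["橋", "箸", "端"],   # hashi: bridge, chopsticks, edge
    "あめ": ["雨", "飴"],         # ame: rain, candy
}

def list_candidates(reading):
    """Return the kanji candidates for a kana reading (the kana itself if unknown)."""
    return KANA_TO_KANJI.get(reading, [reading])

def select(reading, choice):
    """Return the candidate the user picked, by its position in the list."""
    return list_candidates(reading)[choice]

# Show the numbered menu the user would choose from.
for number, kanji in enumerate(list_candidates("かみ"), start=1):
    print(number, kanji)
```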

Of course, since there are about 50 kana characters, we
can't enter a kana in a single keystroke from an ASCII
keyboard.  But fortunately, there is the so-called 'roma-ji'
convention, which assigns a string of letters, usually two,
to every kana.  So if this convention is known to the
kana-to-kanji converter, one can produce kanji documents
from an ASCII keyboard.  (We also have the so-called 'JIS
keyboard', on which one can enter a kana in a single
keystroke.)
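
A rough sketch of the roma-ji step, assuming a small hand-written table
and a greedy longest-match rule (both are illustrative choices, not a
description of any particular product):

```python
# Partial romaji-to-kana table, for illustration only.
ROMAJI_TO_KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "na": "な", "ni": "に", "mi": "み", "me": "め", "ha": "は",
    "n": "ん",
}

def romaji_to_kana(text):
    """Convert romanized input to kana, trying the longest spelling first."""
    out = []
    i = 0
    while i < len(text):
        for length in (3, 2, 1):
            chunk = text[i:i + length]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            # No table entry matches at this position.
            raise ValueError(f"cannot convert {text[i:]!r}")
    return "".join(out)

print(romaji_to_kana("kami"))
print(romaji_to_kana("hashi"))
```

The resulting kana string is what would then be handed to the
kana-to-kanji converter.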

Kamiya Fumiaki
Department of Mathematics, Waseda University

NOTE: Please don't reply by mail, it will be rejected at the gateway.

geoff@lloyd.camex.uucp (Geoffrey Knauth) (12/14/88)

In article <45616@yale-celray.yale.UUCP> sun@venus.ycc.yale.edu writes:
>	Besides, the number of keys pressed could be very large. Hence, 
>even if such an implementation exists, it is a very inefficient one.
>
>>character sets?  Are there any journals (again, possibly Japanese) devoted
>>to the problem of oriental native language I/O?  Any references to
>>articles, journals, books, programs, etc., would be greatly appreciated.
>	I remember I read somewhere that there was a conference dedicated to
>Chinese Word Processing. But I forgot where. Maybe you can look for it.

I suggest you contact IBM, which has done a lot of work in China.  You
should also read the 11/21/88 edition of the Seybold Report on
Publishing Systems, Vol. 18, No. 5, "IPEX, Part III: Non-Roman
Languages Take Center Stage."  An excerpt from that article reads,
"HTS [High Technology Systems, an industry leader] uses the so-called
'Dr. Zhi' method of typing Chinese, whereby four basic elements (out
of a set of 180) are used to construct a character.  Some common
characters can be entered with a single keystroke."
-- 
Geoffrey S. Knauth               ARPA: geoff%lloyd@hcsfvax.harvard.edu
Camex, Inc.                      UUCP: geoff@lloyd.uucp or hcsfvax!lloyd!geoff
75 Kneeland St., Boston, MA 02111
Tel: (617)426-3577  Fax: 426-9285            I do not speak for Camex.

curtc@pogo.GPID.TEK.COM (Curtis Charles) (12/15/88)

In article <789@wasatch.UUCP>, thomson@wasatch.UUCP (Rich Thomson) writes...
>The user interface for input of the character should use the stroke
>information (encoded on a key, for instance) in combination with the order
>of the strokes to uniquely identify a given Chinese character, or perhaps

Several years ago I saw a prototype for a keyboard well suited to
Chinese.  (I know very little about Chinese, so take this with a grain
of salt...)   The keyboard was flat, and lacked the tactile feeling
we've come to enjoy, and was much like a membrane keyboard.  The reason
that it was flat was that the glyphs were projected from behind onto
the keyboard.  Apparently, the Chinese alphabet can be thought of as
tree structured, so getting a character (glyph?) on the screen became a
process of menu selection.  Several thousand characters were programmed
in, and it took 3 to 5 (?) "menu picks" to get to a glyph on the screen.
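
As a back-of-the-envelope check on the number of menu picks: if every
menu offers k choices, d picks can reach at most k**d glyphs.  The
sketch below computes the smallest d for a given character count.  The
figure of 5000 characters and the menu sizes are assumptions chosen for
illustration, not details of the prototype:

```python
def picks_needed(num_characters, choices_per_menu):
    """Smallest depth d such that choices_per_menu ** d >= num_characters."""
    if choices_per_menu < 2:
        raise ValueError("each menu needs at least two choices")
    depth = 0
    reachable = 1  # glyphs reachable with `depth` picks
    while reachable < num_characters:
        reachable *= choices_per_menu
        depth += 1
    return depth

# Hypothetical numbers: 5000 characters, menus of 8 or 20 choices.
print(picks_needed(5000, 8))
print(picks_needed(5000, 20))
```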

Thought about a graphic tablet with recognition software?  (Probably
tougher than recognition for English...)
------------------------------------------------------------------------
Curt Charles              | "Let our swords run red with the blood of
curtc@pogo.GPID.TEK.COM   | infidels..."    Sean Connery

wu@sunybcs.uucp (Wan-Chung Wu) (12/15/88)

In article <45616@yale-celray.yale.UUCP> sun@venus.ycc.yale.edu writes:

>	I remember I read somewhere that there was a conference dedicated to
>Chinese Word Processing. But I forgot where. Maybe you can look for it.

I know of at least one annual conference that covers all aspects of Chinese
processing.  The name of the conference is "International Conference on
Chinese Computing".

The proceedings of that conference should give you some ideas about
Chinese input methods.

The one I attended was held June 14-17, 1987, in Chicago, IL. If anybody
wants to know where to get the proceedings, please let me know and I
will try my best to give you a pointer.

=========================================================================

	wu@cs.buffalo.edu

        Graphics Group
	University Computing Service
	State University of New York at Buffalo

========================================================================

jdm@h.cs.wvu.wvnet.edu (James D Mooney,205K,7,2913548) (12/15/88)

From article <283@lloyd.camex.uucp>, by geoff@lloyd.camex.uucp (Geoffrey Knauth):
> In article <45616@yale-celray.yale.UUCP> sun@venus.ycc.yale.edu writes:
>>>character sets?  Are there any journals (again, possibly Japanese) devoted
>>>to the problem of oriental native language I/O?  Any references to
>>>articles, journals, books, programs, etc., would be greatly appreciated.
>>	I remember I read somewhere that there was a conference dedicated to
>>Chinese Word Processing. But I forgot where. Maybe you can look for it.
> 
> I suggest you contact IBM, which has done a lot of work in China.  You
> should also read the 11/21/88 edition of the Seybold Report on
> Publishing Systems, Vol. 18, No. 5, "IPEX, Part III: Non-Roman
> Languages Take Center Stage."  An excerpt from that article reads,
> "HTS [High Technology Systems, an industry leader] uses the so-called
> 'Dr. Zhi' method of typing Chinese, whereby four basic elements (out
> of a set of 180) are used to construct a character.  Some common
> characters can be entered with a single keystroke."

Another place this subject is discussed is at the annual PROTEXT
conferences organized by Professor J. Miller of Trinity College,
Dublin, Ireland.  PROTEXT IV, held October 1987 in Boston,
included some relevant papers including:

	Text Processing in Ideographic Languages, by Loh
		Shiu-Chang and Kong Luan

	Key Problems in Developing an Advanced Chinese Text
		Processing and Typesetting System, by
		Wang Xuan

Proceedings of all PROTEXT Conferences are available from

	Boole Press Limited
	P.O. Box 5
	Dun Laoghaire, Co. Dublin, Ireland

Jim Mooney				Dept. of Stat. & Computer Science
(304) 293-3607				West Virginia University
					Morgantown, WV 26506
USENET:  {allegra,bellcore,cadre,idis,psuvax1}!pitt!wvucsb!wvucsa!jdm

asp@puck.UUCP (Andy Puchrik) (12/16/88)

In article <391@wucc.waseda.JUNET>, tex@wucc.waseda.JUNET (Kamiya Fumiaki) writes:
> I don't know how it is done in other oriental countries, but at
> least, I can tell you how it is usually done in Japan.

I've seen the NEC msdos micro and some of the laptop Japanese word
processors.  They all have the JIS character set in ROM.  I suppose
the terminals have hardware assist also.  What kind of software is
available for workstations?  Surely there must be terminal emulators
and word processors for SUN and 386-class systems.  Much of the spread
of computers in the States and Europe was due to public domain editors
and terminal emulators.  Is there such a thing as public domain
Japanese software?  Anything that would run on the larger systems?
-- 
Internet: asp@puck.UUCP				Andy Puchrik
uucp: decvax!necntc!necis!puck!asp		Moonlight Systems
ARPA: puchrik@tops20.dec.com			Concord, MA 01742

wu@sunybcs.uucp (Wan-Chung Wu) (12/17/88)

To those who are interested in the Chinese Input schemes,

	As I promised to "try my best to give you a pointer" to the
   proceedings of the International Conference on Chinese Computing, here
   are the people you should contact:

    (Because so many people have requested the information,
     I am posting it here to save my tight schedule :-)  )

	Prof. Shi-Kuo Chang
	Department of Computer Science
	University of Pittsburgh

	Dr. Patrick S.P. Wang
	Department of Computer Science
	Northeastern University
	Boston, Massachusetts

	Dr. An-Chi Liu
	Department of Electrical and Computer Engineering
	Illinois Institute of Technology
	Chicago, Illinois


	The following is a list of papers in the proceedings of ICCC'87
    that relate to Chinese input:


	-------------------------------------------------------------------

	1. W.C.P. Yu, "Some New Advancement in High Speed Two-Stroke Chinese
	   Input System".

	2. H.L. Soo, "A Generic Chinese Input System".

	3. W.H. Wu, "Chinese Characters Encoded in Stroke-Sequences".

        4. J. Zhu and X. Liu, "A New Input System for Chinese Language
	   Processing".

	5. K.Y. Cheng and F.K. Yu, "On Disambiguous Chinese Phonetic Input".

	6. A. Mathur and F. Fowler, "Design of a Dynamically Reconfigurable 
	   Keyboard".

	7. A. MacDonald and Y.H. Ng, "Sequence Prediction for Chinese 
	   Language Input".

	8. H.C. Tien, "PINXXIEE: The Chinese Computer Input Language".

	9. V.C. Yeh, "The Phonetic Chinese Language Computer System".

       10. T.Y. Kiang and T.H. Cheng, "Survey on the Establishment of 
	   Indexing System for Composed Chinese Characters".

       11. T. Huang, "The Dai-E Chinese Encoding Method".

       ----------------------------------------------------------------

	I am sure that there must be more interesting papers
     in the proceedings of ICCC'88 or earlier ones.

	If you still have questions, send me mail again.

	Sorry for responding so late!


==========================================================================

	wu@cs.buffalo.edu

	Graphics Group
	University Computing Service
	State University of New York at Buffalo

==========================================================================

charette@edsews.EDS.COM (Mark A. Charette) (12/17/88)

In article <351@puck.UUCP>, asp@puck.UUCP (Andy Puchrik) writes:
> In article <391@wucc.waseda.JUNET>, tex@wucc.waseda.JUNET (Kamiya Fumiaki) writes:
> > I don't know how it is done in other oriental countries, but at
> > least, I can tell you how it is usually done in Japan.
> and terminal emulators.  Is there such a thing as public domain
> Japanese software?  Anything that would run  on the larger systems?

If you're really ambitious you might want to take the X based kterm program
and modify it to become an editor. All the X systems I've seen based on the
distributed X tape have kterm and the kana and kanji fonts (14x14).

If anyone is interested, I can send the 24x24 fonts to them. The file is a
bit big (~2 MB) and is in pseudo-bdf format (I got them to compile into
snf format with the X font compiler - but some work is necessary to put
them in the proper JIS 1 & 2 positions). I will mail if that's the only
way, but I would prefer it if you sent a Dec, Sun, Apollo, or HP tape, or
if you sent either a high density PC floppy or enough low density ones to
fit the data.

-----

Mark Charette             "People only like me when I'm dumb!", he said. 
Electronic Data Systems   "I like you a lot." was the reply.
750 Tower Drive           Voice: (313)265-7006        FAX: (313)265-5770
Troy, MI 48007-7019       charette@edsews.eds.com     uunet!edsews!charette 

tex@wucc.waseda.JUNET (Kamiya Fumiaki) (12/19/88)

In article <351@puck.UUCP>, asp@puck.UUCP (Andy Puchrik) writes:
> I've seen the NEC msdos micro and some of the laptop Japanese word
> processors.  They all have the JIS character set in ROM.  I suppose
> the terminals have hardware assist also.  What kind of software is
> available for workstations?  Surely there must be terminal emulators
> and word processors for SUN and 386-class systems.  Much of the spread
> of computers in the States and Europe was due to public domain editors
> and terminal emulators.  Is there such a thing as public domain
> Japanese software?  Anything that would run  on the larger systems?

Yes, as far as I know, there are a few kana-to-kanji systems
for UNIX machines.  Wnn is one such system and is said
to be the most powerful tool for this purpose.  It was
developed by Kyoto University, Tateishi Electronics and
ASTEC.  Since I'm not sure how it is actually
distributed, anyone wishing to obtain a copy or more
information should contact ASTEC directly.  Their address
is:
	ASTEC, Inc.
	Nagashima-Daiichi Building
	1-22-12, Dougenzaka, Shibuya, Tokyo 150.

---
Kamiya Fumiaki
Department of Mathematics, Waseda University