[comp.editors] Programming and international character sets.

guy@auspex.UUCP (Guy Harris) (11/02/88)

In article <4002@homxc.UUCP> gauss@homxc.UUCP (E.GAUSS) writes:

(And misattributes both quotes - that's why I don't like the "In article
..., ... writes:" lines)

>An author friend that I work with, Eb Colville, has been trying for a
>number of years to find a VI editor that will handle the German characters
>available in the extended ASCII characters on his MS-DOS PC.  He used those
>in his novel, THE LAST ZEPPELIN, which is trying to find a publisher.  Whatever
>the talk, it does not seem to be possible to do this.  Extended ASCII
>requires the full eight bits to be available, and all VI's that we have
>seen simply toss away the lead bit folding umlauted characters into
>control characters.

The "vi" in System V Release 3.1 handles 8-bit characters. 
Unfortunately, I don't know if anybody's ported it to MS-DOS....

Also, some version of Unipress EMACS can be configured to support 8-bit
characters as well (I don't know if that version has been released yet
or not).
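
To make the complaint in the quoted text concrete, here is a minimal Python
sketch of what stripping the high bit does to German characters.  It assumes
the PC uses IBM code page 437 (the usual MS-DOS "extended ASCII"; the original
post does not name the code page), and strip_high_bit and damaged are made-up
names standing in for whatever a 7-bit-only editor does internally.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as a 7-bit-only editor effectively does."""
    return bytes(b & 0x7F for b in data)


def is_control(byte: int) -> bool:
    """True for ASCII control codes (0x00-0x1F and DEL)."""
    return byte < 0x20 or byte == 0x7F


def damaged(text: str, codepage: str = "cp437"):
    """Return (original byte, stripped byte) pairs for each character.

    The cp437 default is an assumption about the PC's character set.
    """
    raw = text.encode(codepage)
    return list(zip(raw, strip_high_bit(raw)))
```

With these definitions, damaged("ü") gives [(0x81, 0x01)]: the umlaut comes
out as Control-A.  The other umlauted letters in that code page all sit
between 0x80 and 0x9F, so they land in the control range as well.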

>There are methods for doing Japanese where the keyboardist types in
>"Romaji" and the computer makes a guess at the Kanji.

The ones I've seen convert Romaji to Kana as you type (this is, as I
understand it, a straightforward translation) and then permit you to
request that the computer translate the Kana you typed since the last
checkpoint (switching mode into Kanji mode, or asking for a
Kana-to-Kanji translation) into Kanji.  It gives you a list of the
possible translations, and lets you choose which one you want.
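
The two stages described above can be sketched roughly as follows.  This is
only an illustration, not how any particular product does it: the tables are
tiny hand-picked samples, the greedy longest-match rule is an assumed
strategy, and the names romaji_to_kana and kanji_candidates are invented
for the example.

```python
# Stage 1 table: a tiny sample of Romaji syllables and their Hiragana.
ROMAJI_TO_KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "na": "な", "ni": "に", "ho": "ほ", "go": "ご", "ji": "じ",
    "n": "ん",
}

# Stage 2 dictionary: one Kana reading can map to several Kanji spellings.
KANA_TO_KANJI = {
    "かんじ": ["漢字", "感じ", "幹事"],
    "にほんご": ["日本語"],
}

MAX_SYLLABLE = max(len(key) for key in ROMAJI_TO_KANA)


def romaji_to_kana(text: str) -> str:
    """Convert Romaji to Kana by greedy longest match against the table."""
    out = []
    i = 0
    while i < len(text):
        for size in range(MAX_SYLLABLE, 0, -1):
            chunk = text[i:i + size]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            # No table entry starts here: pass the character through as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


def kanji_candidates(kana: str) -> list:
    """List possible translations; the plain Kana is always the last choice."""
    return KANA_TO_KANJI.get(kana, []) + [kana]
```

Typing "kanji" yields the Kana かんじ, and asking for a translation of that
offers 漢字, 感じ, 幹事 and the untranslated Kana; the user picks one.  The
first stage is mechanical, while the second needs both a dictionary and a
human choice.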

Of course, now you'd need an editor that handles *16*-bit characters; I
think AT&T has a "vi" that will handle them, and I don't know about
EMACS (although I remember an #ifdef in the aforementioned Unipress
version for Kanji).

mark@jhereg.Jhereg.MN.ORG (Mark H. Colburn) (11/03/88)

In article <380@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
>The ones I've seen convert Romaji to Kana as you type (this is, as I
>understand it, a straightforward translation) and then permit you to
>request that the computer translate the Kana you typed since the last
>checkpoint (switching mode into Kanji mode, or asking for a
>Kana-to-Kanji translation) into Kanji.  It gives you a list of the
>possible translations, and lets you choose which one you want.
>
>Of course, now you'd need an editor that handles *16*-bit characters; I
>think AT&T has a "vi" that will handle them [...]

There was a demonstration of a version of vi which supported Japanese
character sets at the October POSIX meeting in Hawaii.  The implementation
that they demonstrated allowed the user to type in Romaji, then convert
to Kana, and then convert again to Katakana or Kanji.  In order to
provide the correct translation, they had a dictionary on-line which
provided glyph lookup based on the word being translated.

The interesting part of the whole thing was that the translation was all
done at the TTY device driver level, rather than within the editor itself.
Of course, vi still needs to handle 16-bit characters...
-- 
Mark H. Colburn                  "They didn't understand a different kind of 
NAPS International                smack was needed, than the back of a hand, 
mark@jhereg.mn.org                something else was always needed."

guy@auspex.UUCP (Guy Harris) (11/05/88)

>The interesting part of the whole thing was that the translation was all
>done at the TTY device driver level, rather than within the editor itself.

No, I doubt it was *all* done at the TTY driver level - at least I hope
it wasn't.  If nothing else, the dictionary management doesn't belong in
the kernel (which, unfortunately, is where UNIX tty drivers tend to be). 
They may have stuck some of the user-interface part there as well, which
I also doubt is necessary; all you really need in the kernel is a hook
to let a "user-interface daemon" take temporary control of the terminal. 

Of course, in a lot of cases you can do it in the terminal (e.g., a PC
used as a terminal, or a "terminal" such as "shelltool" or "xterm"), and
have it send Kanji over the wire.

Doing it in some centralized place is nice, but that doesn't necessarily
mean doing it in the kernel.
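
As an illustration of the point above, here is a rough Python sketch of a
conversion stage that sits between the keyboard and the editor, so that the
editor only ever sees finished text.  Everything in it is hypothetical: the
choice of TAB as the "convert" key, the function names, and the one-entry
lookup table in the usage note.  It shows the layering only, not any real
tty or daemon interface.

```python
CONVERT_KEY = "\t"  # hypothetical key that requests a conversion


def input_method_filter(keystrokes, convert):
    """Buffer keystrokes and yield text chunks for the application.

    `convert` is any callable mapping the buffered string to its
    translation; dictionary handling lives there, outside the editor.
    """
    pending = []
    for key in keystrokes:
        if key == CONVERT_KEY:
            if pending:
                yield convert("".join(pending))
                pending.clear()
        else:
            pending.append(key)
    if pending:
        # Input ended without a convert request: pass it through unchanged.
        yield "".join(pending)


def editor_insert(buffer, chunks):
    """Stand-in for the editor: it just appends whatever text arrives."""
    for chunk in chunks:
        buffer.append(chunk)
    return buffer
```

For example, with table = {"kanji": "漢字"} and
convert = lambda s: table.get(s, s), feeding the keystrokes "kanji\tok"
through the filter hands the editor "漢字" followed by "ok".  The filter
could run in the terminal, in a daemon, or in a library; the editor code is
the same in each case, though it still has to store the wide characters.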

dret@dgp.toronto.edu (George Drettakis) (11/10/88)

The Institute of Computer Science in Iraklion Crete, Greece has developed
a system for the support of Greek text in a 4.xBSD environment.
During the time I was there I was involved in the modification of
vi to support 8-bit characters. Three points are relevant to the discussion:
(a) The tty driver needs to be changed to allow 8-bit characters to pass
through. In 4.3 the change is trivial, but extending it to 16 bits could
be hairy. The System V driver can be left unmodified for 8 bits.
(b) The changes to vi actually support 16-bit chars, as the modifications
included changing the screen buffer to a short array instead of a char array.
(c) Supporting large character sets is a rather complicated business, as it
includes supporting the char set in several different places such as
terminal emulators, keyboard mapping, window systems, etc.  Some code was
added to our vi to support Greek letters for commands.
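
Point (b) can be illustrated with a small Python sketch using the standard
array module, where typecode 'B' plays the role of a C unsigned char buffer
and 'H' the role of an unsigned short buffer.  This is an analogy only, not
the actual vi change; make_screen_row and put_cell are invented names, and
EUC-JP is picked arbitrarily as an example of a two-byte encoding.

```python
from array import array


def make_screen_row(width: int, typecode: str) -> array:
    """Create one screen row filled with spaces.

    'B' cells hold 0..255 (like unsigned char); 'H' cells are at least
    two bytes wide (like unsigned short).
    """
    return array(typecode, [ord(" ")] * width)


def put_cell(row: array, col: int, code: int) -> bool:
    """Store a character code; report False if the cell is too narrow."""
    try:
        row[col] = code
        return True
    except OverflowError:
        return False


def two_byte_code(ch: str, encoding: str = "euc_jp") -> int:
    """Pack a character's encoded bytes into one integer (assumed encoding)."""
    return int.from_bytes(ch.encode(encoding), "big")
```

A row built with 'B' accepts an 8-bit code such as 0xE1 but rejects
two_byte_code("漢"), while a row built with 'H' stores both.  Widening the
cell type is what lets the same screen code carry 16-bit characters.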

If anybody is interested write to:
Institute of Computer Science,
P.O. Box 1385
Iraklion, Crete, GR-711 10   GREECE,

and ask for a copy of the Tech Report "Supporting 8-bit characters in Unix"
by George Drettakis and Vassilis Prevelakis (I don't have the number handy).

As far as I know SCO XENIX also supports 8-bit vi.

George Drettakis, University of Toronto, Canada.
UUCP:	dret@dgp.toronto.edu or dret@dgp.toronto.ca
BITNET:	dret@dgp.utoronto