[comp.emacs] non-ASCII support

arndt@zyx.SE (Arndt Jonasson) (06/21/88)

Suggestions for how Gnu Emacs can be made to handle non-ASCII.

["non-ASCII characters" below refers to those characters in the set of
8-bit characters (of which set ASCII is a subset) that have codes >
127. Thus, they don't include EBCDIC or 16-bit characters.]

With the advent of the ISO Latin-1 standard, non-ASCII characters in
text files are going to be increasingly common, and already there are
manufacturers who support non-ASCII characters in their operating
systems.  Here, therefore, are a few suggestions on how support for
them can be achieved in Gnu Emacs with only minor effort.


1) Display.

With fairly minor changes to the C code, Emacs can be made to display
characters with codes > 127 not in the usual way (e.g. \314), but as
themselves, assuming that the virtual terminal can handle such
characters. The changes involve a half dozen tests in xdisp.c and
indent.c, including a Lisp flag to toggle the new functionality on and
off.
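
As a rough illustration of the kind of test involved, here is a minimal
sketch in C of a column-width calculation. It is not the actual Emacs
code: the function name char_columns and the display_8bit parameter
are hypothetical, and ctl_arrow merely stands in for the existing
ctl-arrow variable. It assumes that a control character takes two
columns when shown as ^X and four when shown as an octal escape such as
\314, and it leaves tab and newline to the caller.

```c
/* Sketch only: how many screen columns one character occupies.
   ctl_arrow    - nonzero if control characters are shown as ^X
   display_8bit - hypothetical new flag: nonzero if codes > 127 are
                  sent to the terminal as themselves
   Tab and newline need special treatment and are not handled here. */
int char_columns(unsigned char c, int ctl_arrow, int display_8bit)
{
    if (c >= 0200)                     /* non-ASCII, code > 127 */
        return display_8bit ? 1 : 4;   /* itself, or \ooo */
    if (c < 040 || c == 0177)          /* control character or DEL */
        return ctl_arrow ? 2 : 4;      /* ^X, or \ooo */
    return 1;                          /* printable ASCII */
}
```

With the flag off, char_columns(0314, 1, 0) gives 4, matching the \314
form; with the flag on it gives 1. Any code that counts columns would
presumably have to agree with whatever routine actually draws the
characters.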


2) Input.

Assuming that the virtual terminal possesses the capability to let the
user enter non-ASCII characters from the keyboard, support for easy
input of them in Emacs (easy = without needing C-Q) can be implemented
in Lisp alone, with no C changes.


3) Character syntax.

This is affected by the Lisp function 'modify-syntax-entry' and
presents no problems.


4) Upper/lower-case conversion.

This is not available as a user-settable table. I suggest that it be
made user-settable, either by making the tables available as strings,
or through Lisp functions.
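
To make the idea concrete, here is a minimal sketch in C of what
settable case tables might look like. All names (downcase_table,
upcase_table, set_case_pair and so on) are hypothetical and are not
taken from the Emacs sources. The Latin-1 pairs rely on the fact that
in ISO Latin-1 the upper-case letters 0xC0-0xDE (except 0xD7, the
multiplication sign) have their lower-case forms 0x20 higher.

```c
/* Sketch only: two 256-entry tables that could be altered at run time. */
static unsigned char downcase_table[256];
static unsigned char upcase_table[256];

/* Start from the identity mapping, then add the ASCII letters
   (assumes an ASCII-compatible character set). */
void init_ascii_case_tables(void)
{
    int c;
    for (c = 0; c < 256; c++) {
        downcase_table[c] = (unsigned char) c;
        upcase_table[c] = (unsigned char) c;
    }
    for (c = 'A'; c <= 'Z'; c++) {
        downcase_table[c] = (unsigned char) (c + ('a' - 'A'));
        upcase_table[c + ('a' - 'A')] = (unsigned char) c;
    }
}

/* The user-settable part: declare that UPPER and LOWER are a case pair. */
void set_case_pair(unsigned char upper, unsigned char lower)
{
    downcase_table[upper] = lower;
    upcase_table[lower] = upper;
}

/* Example use: install the ISO Latin-1 letter pairs. */
void add_latin1_case_pairs(void)
{
    int c;
    for (c = 0xC0; c <= 0xDE; c++) {
        if (c == 0xD7)              /* multiplication sign, not a letter */
            continue;
        set_case_pair((unsigned char) c, (unsigned char) (c + 0x20));
    }
}
```

After init_ascii_case_tables() and add_latin1_case_pairs(), code 0xC4
(A with diaeresis) downcases to 0xE4, while 0xD7 and 0xDF (sharp s,
which has no upper-case form in Latin-1) are left unchanged. A Lisp
function wrapping something like set_case_pair would be one way to
expose this to the user.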


These are the areas that have come to my mind; are there any that I
have forgotten? I am using Gnu Emacs 18.49.


If there is interest among the Gnu Emacs developers in implementing
the above suggestions, I will gladly supply the code that I have
(which implements 1 and 2).
-- 
Arndt Jonasson, ZYX Sweden AB, Styrmansgatan 6, 114 54 Stockholm, Sweden
email address:	 arndt@zyx.SE	or	<backbone>!mcvax!enea!zyx!arndt

janssen@titan.SW.MCC.COM (Bill Janssen) (06/23/88)

In article <2641@zyx.SE>, arndt@zyx.SE (Arndt Jonasson) writes:
> Suggestions for how Gnu Emacs can be made to handle non-ASCII.
...
> 1) Display.
...
> characters. The changes involve a half dozen tests in xdisp.c and
> indent.c, including a Lisp flag to toggle the new functionality on and
> off.

It isn't quite this easy.  A lot of the code that figures out "what line
is where" in the window uses the knowledge that certain character codes
take up 2 or 4 character positions.  This knowledge seems to be scattered
through the code, and might require some rooting to eliminate cleanly.

Bill

karl@haddock.ISC.COM (Karl Heuer) (06/25/88)

In article <807@titan.SW.MCC.COM> janssen@titan.SW.MCC.COM (Bill Janssen) writes:
>In article <2641@zyx.SE>, arndt@zyx.SE (Arndt Jonasson) writes:
>> Suggestions for how Gnu Emacs can be made to handle non-ASCII. ...
>> The [display] changes involve a half dozen tests in xdisp.c and indent.c,

>It isn't quite this easy.  A lot of the code that figures out "what line
>is where" in the window uses the knowledge that certain character codes
>take up 2 or 4 character positions.  This knowledge seems to be scattered
>through the code, and might require some rooting to eliminate cleanly.

It seems to me that any such knowledge, if it correctly handles control
characters, must test the ctl-arrow variable.  A grep on the 18.41 sources
revealed five places where it's being used in this way.  (As Arndt said, it's
confined to xdisp.c and indent.c.)  Can you give a specific example of
something else that would need to be changed?

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint