[comp.std.internat] Re : International emacs

gunnar@hafro.UUCP (02/16/87)

A while ago I asked the net about versions of emacs that could handle
international character sets. Thanks to all those who responded
(quite a few, in fact). Here is a summary of the findings.

The problem : To get a version of emacs which can handle international
character sets. In our case this was intended simply to mean that the
editor should leave all the bits in a byte alone, i.e. be 8-bit
transparent : Typing a character at the keyboard must result in the
same code being inserted into the file and also sent to the screen. Thus
there must be no stripping of the 8th bit, nor interpreting it as a
"meta"-bit.  Other people may want some conversion to be done, e.g. if
their terminal can't handle the character set that they want to use
internally in the computer, but that is not our basic problem. We just
want to leave the bytes in peace.
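
To make the three behaviours concrete, here is a minimal sketch in C.
All function names are made up for illustration and are not taken from
any of the editors discussed below; the byte 0xE6 is just an arbitrary
example of a code with the 8th bit set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none come from a real editor. */

/* Policy 1: strip the 8th bit. 0xE6 becomes 0x66 ('f'), so data is lost. */
unsigned char strip_high_bit(unsigned char c)
{
    return (unsigned char)(c & 0x7F);
}

/* Policy 2: treat the 8th bit as Meta. Returns 1 and stores the base
 * character in *base when the byte would be taken as a command. */
int is_meta_command(unsigned char c, unsigned char *base)
{
    assert(base != NULL);
    if (c & 0x80) {
        *base = (unsigned char)(c & 0x7F);
        return 1;
    }
    *base = c;
    return 0;
}

/* Policy 3: 8-bit transparent. The byte goes into the buffer unchanged.
 * The caller must make sure buf has room for one more byte. */
size_t insert_transparent(unsigned char *buf, size_t len, unsigned char c)
{
    assert(buf != NULL);
    buf[len] = c;
    return len + 1;
}
```

Only the third policy is what we are asking for: whatever code the
keyboard sends is the code that ends up in the buffer.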

Our primary interest is with emacs-related editors, so although there
are some word-processing systems and enhanced vi's, I won't mention
those any further.  Other things like nroff etc. can be handled fairly
easily with pre- and post-filters, so that's no problem -- we just
want to be able to edit a Western European language of our choice and
see the language we are writing.  I think the English-speaking
population would object to writing "[is" for "this", "b|i" for "be"
etc., which is essentially what a lot of others put up with.  So let's
not have an argument about purpose.

The following lists all the versions of emacs (under Unix) that I
have ever heard of :  GNU Emacs, Unipress Emacs, MicroGnuEmacs,
MicroEmacs, Jove and Scame.  I believe I have found out what each one
does to the 8th bit.  Please let me know if I am making any errors or
omissions.

(1) GNU EMACS.  GNU Emacs seems to assume that a set 8th bit
means the user hit the Meta key, so the character is treated as a
command.  If quoted (with ^Q), the character does get inserted into
the text, but it is displayed as an octal value.  A quick fix is to hack
GNU Emacs so that the character self-inserts and displays correctly;
the in-line cursor positioning must then of course be fixed as well.  The
trick is not to lose the possibility of invoking the corresponding
meta-command with the escape-prefix.  Such a fix has been made
but is not currently available to the net (the site that made the fix
isn't on the net).  At another site, a much more drastic modification
has been made, where GNU Emacs does all the conversions necessary
to display any character set on any terminal.
The plan is to have this mod distributed with a future version of GNU Emacs.
The only problem of course with GNU Emacs is that it only runs on
fairly large machines and not even all of those (I have some DEC Pro's
w/Venix, an AT w/Venix, and an HP 9000/550 with HP-UX, and GNU
Emacs won't run on any of those).
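
The idea behind the quick fix (8-bit characters self-insert, while
meta-commands stay reachable through the escape-prefix) can be sketched
roughly as follows. This is not GNU Emacs code; the names classify_key,
key_state and the ACT_ constants are invented for the illustration, and
control characters are not looked up in any keymap here.

```c
#include <assert.h>
#include <stddef.h>

#define ESC 0x1B

enum action { ACT_NONE, ACT_SELF_INSERT, ACT_META_COMMAND };

struct key_state {
    int esc_pending;    /* 1 if the previous key was ESC */
};

/* Classify one input byte (hypothetical helper).
 * On ACT_SELF_INSERT, *key is the byte to insert, 8th bit untouched.
 * On ACT_META_COMMAND, *key is the key that followed the ESC prefix.
 * On ACT_NONE, *key is left alone. */
enum action classify_key(struct key_state *st, unsigned char c,
                         unsigned char *key)
{
    assert(st != NULL && key != NULL);
    if (st->esc_pending) {
        /* ESC x is the only way to reach meta-x in this scheme. */
        st->esc_pending = 0;
        *key = c;
        return ACT_META_COMMAND;
    }
    if (c == ESC) {
        st->esc_pending = 1;
        return ACT_NONE;
    }
    /* The 8th bit is never read as Meta, so 0x80..0xFF insert as-is.
     * A real editor would dispatch control characters before this. */
    *key = c;
    return ACT_SELF_INSERT;
}
```

With this arrangement a byte with the 8th bit set is inserted as
itself, while ESC followed by f still selects the meta-f command.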

(2) Unipress Emacs.  Another big one.  This one simply strips the 8th
bit on input (even when quoted).  Unipress says it's working on the
international character set problem, but that's all we've heard from
them.

(3) Scame currently displays the 8th bit with a prefix (as in ~A),
but international support is being provided by the author.  It's not
clear to me how easy it is currently to insert a character with the
8th bit set into a file.
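
For illustration, a prefix display of the kind described (as in ~A)
could look something like the sketch below. This is my own guess at
the scheme, not Scame's source; show_with_prefix is a made-up name,
and control characters are not treated specially.

```c
#include <assert.h>
#include <stddef.h>

/* Write the display form of c into out (room for at least 3 chars)
 * and return the number of screen columns it occupies.
 * Hypothetical helper, not taken from any editor. */
size_t show_with_prefix(unsigned char c, char *out)
{
    size_t n = 0;

    assert(out != NULL);
    if (c & 0x80) {
        out[n++] = '~';                 /* marks "8th bit was set" */
        c = (unsigned char)(c & 0x7F);  /* show the 7-bit remainder */
    }
    out[n++] = (char)c;
    out[n] = '\0';
    return n;
}
```

Under this scheme the byte 0xC1 shows up as the two columns ~A, so
the text stays unambiguous but is not the language being written.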

(4) MicroEmacs. There are several variants of this editor. Two of
them go by this same name. I believe both strip the 8th bit
on input (this is what I hear from users -- I haven't tried it myself).

(5) MicroGnuEmacs, as just recently posted on mod.sources.  This
editor is 8-bit transparent if you compile it with the DO_METAKEY
option disabled and change 0x7F to 0xFF in the only place it occurs
in a for statement in symbol.c.  Even better, modify that
for-loop to become :
	/* upper bound raised from 0x7F to 0xFF */
	for (i=0x20; i<0xFF; ++i) {
		if (binding[i] == NULL)	/* leave existing bindings alone */
			binding[i] = sp;
	}
(yes, I've tried this one out, and there is only one such for-loop).
MicroGnuEmacs is also very quick in startup - a nice change as far as
Emacses are concerned.
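
For anyone who wants to play with the loop above outside the editor,
here is a stand-alone sketch. The table type and the two stub commands
are stand-ins of my own, not MicroGnuEmacs definitions; only the loop
bounds and the NULL test follow the fragment above.

```c
#include <assert.h>
#include <stddef.h>

#define NKEYS 256

typedef void (*command_fn)(void);

/* Stub commands, present only so the table has something to hold. */
void cmd_self_insert(void) { }
void cmd_delete_backward(void) { }

/* Give every key in 0x20..0xFE that has no binding yet the
 * self-insert command. Keys that already have a binding keep it,
 * and 0xFF itself is not touched, exactly as in the loop above. */
void bind_self_insert(command_fn binding[NKEYS], command_fn self_insert)
{
    int i;

    assert(binding != NULL && self_insert != NULL);
    for (i = 0x20; i < 0xFF; ++i) {
        if (binding[i] == NULL)
            binding[i] = self_insert;
    }
}
```

The NULL test is what makes this variant safer than just changing the
constant: assuming a key inside the range (0x7F, say) was bound before
the loop runs, that binding survives.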

(6) Jove. Very nice editor. Seems to have more capabilities than
MicroGnuEmacs. Unfortunately, it strips the 8th bit.

So if you need to edit international text (i.e. the full 8-bit
character table) and see what you are doing, the only currently
available emacs editor is MicroGnuEmacs, but there seem to be several
others on the way.
-- 

-----------------------------------------------------------------------------
Gunnar Stefansson                       {mcvax,enea}!hafro!gunnar 
Marine Research Institute, Reykjavik    gunnar@hafro.UUCP