[comp.soft-sys.andrew] 8-bit characters, how to use ?

henke@qt.ipa.fhg.de (Juergen Henke) (11/08/90)

Since patchlevel 7 there is support for 8-bit characters (ISO 8859) in
the ATK (they told...:-)).

But how can I get my a-umlaut, u-umlaut and so on in ez (for example) ?

Thanks in advance,

Juergen.

_________________________________________________________________________
Juergen Henke, e-mail juh@qt.IPA.FhG.de, PSI-mail PSI%4571109306::JUH_IPA
Fraunhofer-Institut f. Produktionstechnik u. Automatisierung
Eierstrasse 46, D-7000 Stuttgart 1

bernerus@CS.CHALMERS.SE (Christer Bernerus) (11/08/90)

Excerpts from info-andrew: 7-Nov-90 8-bit characters, how to use ?
Juergen Henke@qt.ipa.fhg (446+0)

> Since patchlevel 7 there is support for 8-bit characters (ISO 8859) in
> the ATK (they told...:-)).

> But how can i get my a-umlaut, u-umlaut and so on in ez (for example) ?

> Thanks in advance,

> Juergen.

\Warning{\German{

    J|rgen.

    Du kannst mit  "help compchar" hilfe finden. Ich brauche 8-bits
    um Schwedisch zu schreiben, und es funktzioniert ganz fein. 

    Um d zu schreiben, versuch mit  ^X-v a:<ret>. Es scheint etwa
    kompliziert, aber es gibt mvglichkeiten um ein mehr einfach
    verwendergrenzschnitt zu spezifizieren.

    Bitte entschuldigen Sie mir f|r meinen schlecten Deutschen
    Grammatik, aber es war unwiederstdndlich zu beweisen das es
    wirklich funkzioniert. D.h. wenn Du dieses Brief mit "messages"
    lest.

}
}

Please excuse me for writing the above in German, it's probably bad
grammar, but I couldn't resist the temptation of showing some of the
possibilities.

Those of you who read this on Usenet or as unformatted mail probably saw
strange characters instead of u-umlaut, a-umlaut etc. This is because
unformatting ATK mail just strips the eighth bit, so u-umlaut becomes |,
a-umlaut becomes d and o-umlaut becomes v. There are routines in ATK that
make saner conversions, but I have never figured out where mail is
unformatted or how the compchar routines could be used there. Maybe some
of the ATK gurus could tell me.
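
The effect of stripping the eighth bit can be reproduced in a few lines
of Python. This is only a sketch of the symptom, assuming the text is
encoded as ISO 8859-1; it is not the code ATK uses, and strip_high_bit is
a made-up name.

```python
def strip_high_bit(text: str) -> str:
    """Clear the eighth bit of each ISO 8859-1 byte and return the result."""
    raw = text.encode("latin-1")                # one byte per character
    stripped = bytes(b & 0x7F for b in raw)     # drop bit 7 (0x80)
    return stripped.decode("ascii")


# u-umlaut (0xFC), a-umlaut (0xE4), o-umlaut (0xF6), a-ring (0xE5)
for code in (0xFC, 0xE4, 0xF6, 0xE5):
    print(hex(code), "->", strip_high_bit(chr(code)))

print(strip_high_bit("J\u00fcrgen"))            # J|rgen
```

Note that a genuine vertical bar and a stripped u-umlaut come out
identical, so the original text cannot be recovered afterwards.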

I think an 8-bit RFC822 would help a bit.



Chris.
-------------------------------------------------------
Christer Bernerus 			! E-mail: bernerus@cs.chalmers.se
Chalmers University of Technology		! Phone: +46 31 721000
Department of Computer Science		! Ham radio: SM6FBQ	144.3 MHz
S-412 96 Gothenburg, SWEDEN	

tpn+@ANDREW.CMU.EDU (Tom Neuendorffer) (11/09/90)

Many people reading Christer's reply may wonder why they didn't see the
advertised umlauts. As he mentioned, we don't currently have a good
solution for those reading unformatted mail. But if you normally get
formatted mail and are having problems, the following should help you
fix things up.

If the second letter in J|rgen appears as a u-umlaut, you are in good
shape and can ignore this message.

If it appears as a vertical bar (J|rgen), then you are running a fairly
old version of ATK. If you expose styles, you will note an undefined
style named '@' around the bar (i.e. \@{|}). This is how we specified
that the high bit should be turned on. It is backward compatible: if the
old ATK rewrites the file, it preserves the information. Patches are
available to upgrade you to a more recent version; see the recent post
by Susan (Re: What is andrew (CMU, I know that!) ?).
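
As an illustration of the '@' style convention, here is a hedged Python
sketch that turns \@{c} back into the ISO 8859-1 character with the high
bit set. It assumes the style always wraps exactly one character; the
real datastream syntax may allow more, and decode_at_style is a
hypothetical helper, not an ATK routine.

```python
import re

# Matches a one-character '@' style such as \@{|}. The single-character
# assumption is ours; the thread does not confirm it.
AT_STYLE = re.compile(r"\\@\{(.)\}")


def decode_at_style(datastream: str) -> str:
    """Turn each \\@{c} into the ISO 8859-1 character c with bit 7 set."""
    return AT_STYLE.sub(lambda m: chr(ord(m.group(1)) | 0x80), datastream)


# '|' is 0x7C; setting the high bit gives 0xFC, the u-umlaut.
print(ascii(decode_at_style(r"J\@{|}rgen")))    # 'J\xfcrgen'
```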

If it appears as an octal escape (J\374rgen), this indicates that you
have the right version of ATK, but are using the wrong fonts. If you have
all of the fonts that came with the X11R4 distribution from MIT, this can
be fixed by installing the non-andrew font alias file, either as the
standard font alias file for your system or for yourself on an
individual basis.

To make it standard for your system, just copy
<ANDREW_SOURCE_DIR>/xmkfontd/non-andrew.fonts.alias to
$ANDREWDIR/X11fonts/fonts.alias, and either restart X or run 'xset fp
rehash'.  Once it is installed and working, you will be able to delete
the cou*, hel* and tim* font files from $ANDREWDIR/X11fonts, since they
will be replaced by their ISO counterparts from the R4 distribution. I
would recommend that all sites that expect to use this feature install
this alias file. We will try to get something put in the next patch that
will set this up automatically according to a site.mcr file variable.

To install it on an individual basis, do something like the following,
with ANDREW_SOURCE_DIR replaced by the path to your Andrew source tree:
    mkdir ~/myxfonts
    mkfontdir ~/myxfonts
    cp ANDREW_SOURCE_DIR/xmkfontd/non-andrew.fonts.alias ~/myxfonts/fonts.alias
    xset +fp ~/myxfonts
    xset fp rehash
The xset +fp call can be added to your .xinitrc file to add the
directory when you start up X.

Note: Once this alias file is installed, users will note greater space
between lines of text in ez, messages, etc. This is not a bug, it simply
reflects the fact that the height of these fonts has to be greater, in
order to allow room for accents over capital letters.

At this point, you should be able to view files containing ISO
characters. For help in entering these characters, see the help file on
cpchar (run help cpchar).  Other information is given in my 'ATK + 8859
= Multi-lingual Text and Mail' paper in the recent EUUG (now EurOpen)
proceedings.

While I am at it, I would like to thank Rob Ryan for all of his work in
getting the ISO stuff together. Thanks Rob!

If you have more problems or questions, please let us know.

	Regards,

		Tom N.
---------------------------
Tom Neuendorffer 	(tpn@andrew.cmu.edu)
Manager-ATK Group
Information Technology Center
Carnegie Mellon University
4910 Forbes Ave.
Pittsburgh, Pa. 15213-3890

bernerus@CS.CHALMERS.SE (Christer Bernerus) (11/09/90)

Excerpts from mail: 8-Nov-90 Re: 8-bit characters, how t..
Craig_Everhart@transarc. (389)

> Mail unformatting is done by the andrew/overhead/mail/lib/unscribe.c
> module.  Is it always obvious how to turn accented characters into
> non-accented ones?  I know some of the rules in German (what turns into
> (e.g.) oe, ae, ue, ss), but what rules apply to other languages? 
> Swedish, for instance?

> Certainly the unscribe.c module pre-dates any consideration of 8-bit
> characters.

> 		Craig

Thanks for pointing out unscribe.c for me. I had a look at it but it
doesn't seem trivial to enhance it the way I wanted.

What I had in mind was to use the compchar character table, which allows
for "customary local replacements", preferably through the ATKToASCII
function in textaux/compchar.c. But unscribe.c doesn't seem to be part of
the object-oriented stuff in ATK, so I'm very unsure how to do it in a
proper way. It can of course be done as a "hack", but I feel that's a bit
dangerous if e.g. the lib/compchar/comps format changes.

Regarding the way conversions should be done, there are usually many
ways of doing this, even within a country, institution, group etc. So
the problem isn't trivial, especially not for a mail gateway which does
the unformatting. E.g. in mail from Sweden, å, ä, ö and even ü should
probably be replaced with }, {, | and u, but if the letter came from
Germany, maybe the replacements for ä, ö and ü should be ae, oe and ue
respectively (there's no å in German). Converting the other way round is
definitely non-trivial, especially if the latter replacements are used.
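
To make the point about competing conventions concrete, here is a small
Python sketch. The two tables contain only the replacements mentioned in
this thread, and the names SWEDISH_7BIT, GERMAN_ASCII, replace_local and
naive_inverse are invented for the example. It has nothing to do with
how compchar or unscribe.c are actually written.

```python
# Replacement tables built only from the examples given in this thread.
# Keys: a-ring, a-umlaut, o-umlaut, u-umlaut (and sharp s for German).
SWEDISH_7BIT = {"\u00e5": "}", "\u00e4": "{", "\u00f6": "|", "\u00fc": "u"}
GERMAN_ASCII = {"\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue", "\u00df": "ss"}


def replace_local(text, table):
    """Replace each character found in `table`; leave the rest untouched."""
    return "".join(table.get(ch, ch) for ch in text)


def naive_inverse(text, table):
    """Blindly map every replacement string back to its accented character."""
    for accented, plain in table.items():
        text = text.replace(plain, accented)
    return text


name = "J\u00fcrgen"                            # J + u-umlaut + rgen
print(replace_local(name, SWEDISH_7BIT))        # Jurgen
print(replace_local(name, GERMAN_ASCII))        # Juergen

# "Mauer" never contained an umlaut, yet the naive inverse invents one.
print(ascii(naive_inverse("Mauer", GERMAN_ASCII)))  # 'Ma\xfcr'
```

The same input gives different output depending on which table the
gateway picks, and the last line shows why going back the other way
cannot be done by simple substitution.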

In my opinion, the only thing that really helps for the (nearest) future
is an 8-bit extension to RFC822 which would make it "legal" to write
mailers which support 8 bit mail transparently. It doesn't solve the
whole world's problems though.

Chris.

Craig_Everhart@TRANSARC.COM (11/09/90)

Indeed, unscribe.c is not part of ATK at all, but has been a wart on the
side.  It is used by several non-ATK programs (the AMS message server,
AMDS, CUI, VUI) that don't need the overhead (distribution-time,
build-time, and execution-time) of getting involved in dynamic loading.

Fortunately, unscribe makes no pretensions at being able to invert its
transformations.  It does enough interpretations that doing so would be
impossible.  Thus, a run through UnScribe is a known way to lose
information.

As you suggest, the issue for a mail gateway is non-trivial, and the
reason is the same reason that the ``customary local replacements'' are
important.

Does 8-bit RFC822 mail really solve any problems?  What are recipients
in Germany supposed to do with Swedish å, since their displays can't
handle it?  What are they supposed to do with upside-down question marks
(¿)?  We would always have the problem of the ``local extensions,'' no?

I could imagine doing worse things than reading the local-extensions
table in unscribe.

		Craig

henke@qt.ipa.fhg.de (Juergen Henke) (11/10/90)

Excerpts from mail: 9-Nov-90 Re: 8-bit characters, how t..
Craig_Everhart@transarc. (1041)

> Does 8-bit RFC822 mail really solve any problems?  What are recipients
> in Germany supposed to do with Swedish e, since their displays can't
> handle it?  What are they supposed to do with upside-down question marks
> (?)?  We would always have the problem of the ``local extensions,'' no?

> I could imagine doing worse things than reading the local-extensions
> table in unscribe.

> 		Craig

Craig, there's of course a problem with the Swedish e (e| ?), but most
of the special (country-specific) characters are in ISO 8859. So an 8-bit
RFC 822 would help those outside the (native) English-speaking world a
lot...

	J|rgen

P.S.: You notice the u-umlaut in my name ? :-) or :-( ?
