[comp.text.tex] Problem with TeX character codes

knuutila@tucos.utu.fi (Timo Knuutila) (02/07/91)

Dear TeXackers,

I encountered the following problem when trying to marry TeX, PostScript fonts
and foreign language hyphenation in my TeX-system (SB-TeX, emTeX drivers and
font libraries, dvips54, 386 PC).

I have a Scandinavian keyboard with special keys for the accented Scandinavian
letters (\"a, \"o, \"A, \"O, \aa, \AA) and I'm trying to map these to the
corresponding PostScript character codes. In what follows, suppose that
x, y, X and Y stand for the characters of a-, o-, A- and O-umlaut,
respectively.

The keyboard codes for the characters, i.e. the codes TeX sees at first
glance, are as follows:

    x: 0x84   y: 0x94   X: 0x8E   Y: 0x99

The easiest way to map these characters to something TeX knows about would be

    \catcode`\x=\active    \catcode`\y=\active
    \catcode`\X=\active    \catcode`\Y=\active

    \defx{\"a}  \defy{\"o}  \defX{\"A}  \defY{\"O}

but then it would be impossible to hyphenate words containing these characters
(and thus accents). They should be mapped directly to character codes (the
same ones that are used in the hyphenation patterns) in order to keep
hyphenation working. The corresponding codes in the PostScript font tables
are given below:

    x: 0x99    y: 0x89   X: 0xD2   Y: 0xBE

I made the following macro definitions (after x etc. had been declared to be
active characters):

    \defx{^^99}    \defy{^^89}    \defX{^^d2}    \defY{^^be}

The problem is that the `keyboard code' of Y is the same as the PS character
code of x (0x99). Thus, TeX maps x first to Y and then to ^^be, which is
obviously not what I want. However, even if I change \defx to the following
(11 is the category code of letters)

    \defx{\catcode`^^99=11 ^^99\catcode`^^99=\active}

the result is just the same: x is mapped to ^^be. Where have I gone wrong?


        Timo Knuutila
        knuutila@cs.utu.fi
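Why the catcode change inside \defx has no effect: TeX assigns a category code to a character once, when it first reads it from the file and turns it into a token. The ^^99 in the replacement text of \defx was read while ^^99 was already active, so it is stored as an active-character token; the \catcode assignments executed later, when the macro runs, only affect characters that have not been read yet. A minimal sketch of one way around this in the same \def style, assuming plain TeX and writing the keyboard codes in ^^ notation instead of the placeholders x, y, X, Y: do the catcode change around the definition rather than inside it.

```tex
% Sketch (plain TeX assumed). 8-bit codes are written in ^^ notation,
% which needs lower-case hex digits.
\catcode`^^84=\active \catcode`^^94=\active   % a-, o-umlaut keys
\catcode`^^8e=\active \catcode`^^99=\active   % A-, O-umlaut keys
% Timo's attempt, \def^^84{\catcode`^^99=11 ^^99\catcode`^^99=\active},
% fails because its ^^99 was stored as an active token at \def time.
% Instead, read the body while ^^99 is temporarily a letter; the stored
% token keeps catcode 11 after the group ends.
{\catcode`^^99=11 \gdef^^84{^^99}}
\def^^94{^^89}   % ^^89, ^^d2 and ^^be are not active,
\def^^8e{^^d2}   % so these need no special care
\def^^99{^^be}
```

With these definitions, typing the a-umlaut key yields the letter ^^99 and nothing more, while typing the O-umlaut key yields ^^be.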

eijkhout@s41.csrd.uiuc.edu (Victor Eijkhout) (02/08/91)

knuutila@tucos.utu.fi (Timo Knuutila) writes:

>Dear TeXackers,

>I encountered the following problem when trying to marry TeX, PostScript fonts
>and foreign language hyphenation in my TeX-system (SB-TeX, emTeX drivers and
>font libraries, dvips54, 386 PC).

You could try adding virtual fonts to this...

>I have a Scandinavian keyboard with special keys for the accented Scandinavian
>letters (\"a, \"o, \"A, \"O, \aa, \AA) and I'm trying to map these to the
>corresponding PostScript character codes. In what follows, suppose that
>x, y, X and Y stand for the characters of a-, o-, A- and O-umlaut,
>respectively.

>The easiest way to map these characters to something TeX knows about would be

>    \catcode`\x=\active  [misguided attempt deleted]

>They should be mapped directly to character codes (the

>    x: 0x99    [...]

>I made the following macro definitions (after x etc had been declared to be
>active characters):

>    \defx{^^99}    [...]

This is too low level. Have you tried 

  \catcode`x\active \def x{\char"99 }

? That addresses the font position directly, without going through
any earlier translation table.

Victor.
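Victor's suggestion, extended to all four keys, might look like the sketch below (plain TeX assumed; the numbers are the PostScript positions from Timo's table). Because \char names a font position by number, no character token with a catcode is produced, so the 0x99 collision cannot trigger a second expansion. The \lccode line is an addition, not part of Victor's post: TeX only treats a character as a letter for hyphenation if its \lccode is nonzero.

```tex
% Sketch: \char-based keys for all four letters (plain TeX assumed).
\catcode`^^84=\active \def^^84{\char"99 }  % a-umlaut -> PS 0x99
\catcode`^^94=\active \def^^94{\char"89 }  % o-umlaut -> PS 0x89
\catcode`^^8e=\active \def^^8e{\char"D2 }  % A-umlaut -> PS 0xD2
\catcode`^^99=\active \def^^99{\char"BE }  % O-umlaut -> PS 0xBE
% The space after each hex number ends it; without it, a following
% digit or capital A-F would be taken as one more hex digit.
% Nonzero \lccode values (an addition) let hyphenation treat these
% positions as letters; upper case maps to lower case, as plain TeX
% does for A-Z.
\lccode"99="99 \lccode"89="89 \lccode"D2="99 \lccode"BE="89
```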

phil@cs.mcgill.ca (Philip LOCONG) (02/08/91)

In article <1991Feb7.175735.12642@csrd.uiuc.edu> eijkhout@s41.csrd.uiuc.edu (Victor Eijkhout) writes:
>knuutila@tucos.utu.fi (Timo Knuutila) writes:
>
>>Dear TeXackers,
>
>>I encountered the following problem when trying to marry TeX, PostScript fonts
>>and foreign language hyphenation in my TeX-system (SB-TeX, emTeX drivers and
>>font libraries, dvips54, 386 PC).
>
>You could try adding virtual fonts to this...
>
>...
>
>This is too low level. Have you tried 
>
>  \catcode`x\active \def x{\char"99 }
>
>? That addresses the font position directly, without going through
>any earlier translation table.
>
>Victor.

Try emTeX; it has an option (TeX code page) just for this :-)

Phil

eijkhout@s41.csrd.uiuc.edu (Victor Eijkhout) (02/08/91)

phil@cs.mcgill.ca (Philip LOCONG) writes:

>In article <1991Feb7.175735.12642@csrd.uiuc.edu> eijkhout@s41.csrd.uiuc.edu (Victor Eijkhout) writes:

>>knuutila@tucos.utu.fi (Timo Knuutila) writes:

[...]

>Try emTeX; it has an option (TeX code page) just for this :-)

Great. Where do I get emTeX for the Amiga? The ST? The Mac?
Or rather, for Unix, which covers 99% of all computers
and 99.999% of all the ones worth mentioning...

:-), of course.

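But seriously, hard-coding the character codes (an input
character's code is used directly as its font position) is
one of the main drawbacks of TeX. The inaccessibility of the
kerning and ligature programs, which live in the TFM file
out of reach of macros, is another.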

Victor.

mattes@azu.informatik.uni-stuttgart.de (Eberhard Mattes) (02/08/91)

Timo Knuutila wrote:

> I have a Scandinavian keyboard with special keys for the accented Scandinavian
> letters (\"a, \"o, \"A, \"O, \aa, \AA) and I'm trying to map these to the
> corresponding PostScript character codes. In what follows, suppose that
> x, y, X and Y stand for the characters of a-, o-, A- and O-umlaut,
> respectively.

Here's a file which I used for testing. It tries to work correctly even
with circular references like
  A -> B
  B -> A
which would cause an infinite loop, or, for example, both A and B to be
replaced with B, if a simpler method were used. The method below should
work correctly with \hyphenation and \write.

----------------------------- test.tex --------------------------------------
{\catcode`^^e3\active \global\let^^e3=^^e4}
{\catcode`^^e4\active \global\let^^e4=^^e3}
% more redefinitions

\catcode`^^e3\active
\catcode`^^e4\active
% more active characters


% for \hyphenation: the active ^^e3 is \let to the character ^^e4,
% which therefore needs a nonzero \lccode
\lccode`^^e4=`^^e4

% this should print ^^e3^^e4
\message{^^e3^^e4}

% this shouldn't give an error message
\hyphenation{^^e3-a}

% this should write ^^e4 to test.aux
\openout1 test.aux
\write1{^^e4}
\closeout1

% this should write ^^e4^^e3 to test.dvi
\font\tendmr=dmr10
\tendmr
^^e3^^e4

\bye
--------------------------end of test.tex ------------------------------------
--
    Eberhard Mattes (mattes@azu.informatik.uni-stuttgart.de)
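Eberhard's \let method applied to Timo's actual codes might look as follows (a sketch, assuming plain TeX, where codes 128 to 255 start out with \catcode 12). Each \let is executed while its target is still an ordinary character, so the active key receives the meaning of that plain character and is never expanded further; this is also why \message and \write show the key itself rather than its replacement. The order matters: all four \let's come before any key becomes active, because ^^99 is both the a-umlaut target and the O-umlaut key. The \lccode line is an addition in the spirit of Eberhard's \hyphenation test.

```tex
% Sketch: Eberhard's \let method with Timo's codes (plain TeX assumed).
% Every target is read while it is NOT active, so each key becomes a
% plain (unexpandable) character; ^^99 is only activated at the end.
{\catcode`^^84=\active \global\let^^84=^^99}  % a-umlaut key -> PS 0x99
{\catcode`^^94=\active \global\let^^94=^^89}  % o-umlaut key -> PS 0x89
{\catcode`^^8e=\active \global\let^^8e=^^d2}  % A-umlaut key -> PS 0xD2
{\catcode`^^99=\active \global\let^^99=^^be}  % O-umlaut key -> PS 0xBE
% Only now make the keyboard codes active.
\catcode`^^84=\active \catcode`^^94=\active
\catcode`^^8e=\active \catcode`^^99=\active
% Nonzero \lccode values so the PS positions count as letters for
% hyphenation; upper case maps to lower case.
\lccode"99="99 \lccode"89="89 \lccode"D2="99 \lccode"BE="89
```

Presumably the hyphenation patterns must also be read before the keys are activated, since \patterns would otherwise read the active keys with their new meanings.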