[comp.std.internat] Latin-2 character set

sandi@apollo.COM (Sandra Martin) (01/28/89)

I'd appreciate some help in locating the following:

    1.  A listing of the high half (decimal positions
        128-255) of the ISO 8859/2 character set. The
        set usually is called Latin-2.

    2.  A list of the languages Latin-2 supports.

    3.  A list of compose sequences (if one exists)
        for the Latin-2 characters. By compose sequences,
        I mean keystrokes that allow users to create
        any Latin-2 character with an ASCII-only keyboard.
        I know that such sequences exist for Latin-1, but
        I don't know whether anyone has defined them for
        Latin-2.

Thanks for your help.

   Sandra Martin, Apollo Computer
   sandi@apollo.com
   {decvax,mit-eddie,umix}!apollo!sandi
   apollo!sandi@eddie.mit.edu

bas+@andrew.cmu.edu (Bruce Sherwood) (01/30/89)

Sandra Martin writes that there already exist defined keystroke sequences "that
allow users to create any Latin-2 character with an ASCII-only keyboard."  I'd
love to hear the details -- can anyone tell me what these are?  Where do such
keystroke conventions come from?

Latin-2 characters are intended to handle Albanian, Czech, German, Hungarian,
Polish, Rumanian, Serbocroatian (the Latin-alphabet form, of course), Slovak,
and Slovene.  So basically East Europe.

I don't have a machine-readable listing of the 96 upper-half characters, so I'm
reluctant to do all that typing from the ISO document!
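
For readers with a current Python installation, the listing can be generated rather than typed. The following is a minimal sketch, not taken from the ISO document: it assumes Python's built-in `iso8859_2` codec and the Unicode character names from the `unicodedata` module, which are worded differently from the names printed in the standard. The function name `latin2_high_half` is made up for this illustration.

```python
import unicodedata


def latin2_high_half():
    """Return (code, character, name) tuples for ISO 8859-2 positions 160-255.

    Positions 128-159 are control codes rather than graphic characters,
    so only the 96 printable upper-half positions are listed.
    """
    rows = []
    for code in range(0xA0, 0x100):
        # Decode the single byte using Python's ISO 8859-2 codec.
        ch = bytes([code]).decode("iso8859_2")
        # Names come from the Unicode database (an assumption of this
        # sketch), not from the ISO 8859/2 document itself.
        rows.append((code, ch, unicodedata.name(ch)))
    return rows


if __name__ == "__main__":
    for code, ch, name in latin2_high_half():
        print(f"{code:3d}  0x{code:02X}  {name}")
```

Printing names instead of glyphs sidesteps the problem that many terminals cannot display Latin-2 text.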

Bruce Sherwood
bas@andrew.cmu.edu  or  bas@andrew.bitnet

sandi@apollo.COM (Sandra Martin) (01/30/89)

Bruce Sherwood @ Carnegie Mellon, Pittsburgh, PA writes:
>  Sandra Martin writes that there already exist defined keystroke sequences "that
>  allow users to create any Latin-2 character with an ASCII-only keyboard."  I'd
>  love to hear the details -- can anyone tell me what these are?  Where do such
>  keystroke conventions come from?

There's been a misunderstanding here. Actually, I asked *whether*
such sequences exist for Latin-2. Here's the quote from my
original mail:

|    3.  A list of compose sequences (if one exists)
|        for the Latin-2 characters. By compose sequences,
|        I mean keystrokes that allow users to create
|        any Latin-2 character with an ASCII-only keyboard.
|        I know that such sequences exist for Latin-1, but
|        I don't know whether anyone has defined them for
|        Latin-2.

I can provide the Latin-1 compose sequences if anyone wants them.
We use DEC's de facto standard for these sequences.

Bruce, thanks for the list of languages Latin-2 supports.

   Sandra Martin, Apollo Computer
   sandi@apollo.com
   {decvax,mit-eddie,umix}!apollo!sandi
   apollo!sandi@eddie.mit.edu

bas+@andrew.cmu.edu (Bruce Sherwood) (01/30/89)

Oops.  I meant to ask about the supposedly already-existing Latin-1 keystroke
sequences, not the Latin-2 keystroke sequences (though I'd be happy to learn of
them, too).

Bruce Sherwood
bas@andrew.cmu.edu  or  bas@andrew.bitnet

guy@auspex.UUCP (Guy Harris) (01/31/89)

>Sandra Martin writes that there already exist defined keystroke
>sequences "that allow users to create any Latin-2 character with
>an ASCII-only keyboard."

No, she doesn't.  What she writes is

        I know that such sequences exist for Latin-1, but
        I don't know whether anyone has defined them for
        Latin-2.

The sequences exist for Latin-1, but they may or may not exist for
Latin-2.

>I'd love to hear the details -- can anyone tell me what these are?

The way these "compose sequences" work - at least on the systems with
which I'm familiar - is that there is some key that introduces the
compose sequence.  You type that key (e.g., on the Sun Type 4 keyboard
it's labelled "compose", on the DEC keyboards it's labelled "Compose
Character") and then you type two (or possibly more) keys.

Some level of {hard|firm|soft}ware recognizes this sequence and
generates an ISO Latin #1, #2, etc. code from it.

A couple of examples from the SunOS keyboard driver (at least on the
'386i; support for Sun-3 and Sun-4 will come in a later release):

	<Compose> r o -> "registered trademark character"
	<Compose> a " -> "a with an umlaut"

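The mechanism described above amounts to a table lookup keyed on the two keys typed after the compose key. Here is a minimal sketch of that idea, assuming a Latin-1 target; the names `COMPOSE_TABLE` and `compose` are hypothetical and do not reflect the SunOS driver's actual data structures. Only the two entries quoted above are included.

```python
# Hypothetical two-key compose table. The two entries are the examples
# quoted from the SunOS keyboard driver; the layout is an assumption.
COMPOSE_TABLE = {
    ("r", "o"): "\u00ae",  # registered trademark sign
    ("a", '"'): "\u00e4",  # a with umlaut (diaeresis)
}


def compose(first, second, table=COMPOSE_TABLE):
    """Return the character for the keys typed after <Compose>.

    Returns None when the pair is not defined in the table.
    """
    return table.get((first, second))


if __name__ == "__main__":
    ch = compose("a", '"')
    # Each composed character maps to a single byte in ISO Latin-1.
    print(ch.encode("latin-1"))
```

A Latin-2 version would need only a different table and a different target encoding, which is why the question of who defines the sequences matters more than the mechanism.
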
>Where do such keystroke conventions come from?

I don't know.  Is there a standard (*de jure* or *de facto*) for them?

>Latin-2 characters are intended to handle Albanian, Czech, German, Hungarian,
>Polish, Rumanian, Serbocroatian (the Latin-alphabet form, of course), Slovak,
>and Slovene.  So basically East Europe.

For what languages are Latin-3 and Latin-4 intended?

>I don't have a machine-readable listing of the 96 upper-half
>characters, so I'm reluctant to do all that typing from the ISO document!

Besides, many systems don't have a way of displaying text in Latin-2, so
there's no guarantee that a machine-readable listing would work on her
machine anyway....  She might want to post a paper-mail address so that
printed copies can be sent.

dik@cwi.nl (Dik T. Winter) (01/31/89)

In article <918@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
 > For what languages are Latin-3 and Latin-4 intended?
 > 
Latin-3: South Eastern Europe
Latin-4: Northern Europe
As far as I know they are still draft.
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax

bas+@andrew.cmu.edu (Bruce Sherwood) (01/31/89)

I'm only too well aware that the Latin-2 character shapes aren't going to show
up in a mail message on most computers!  But there are listings in the ISO
documents of the names of the characters.  E.g., "small letter s with cedilla."
Here is the full listing of languages given in the ISO documents.

Latin-1:  Danish, Dutch, English, Faeroese, Finnish, French, German, Icelandic,
Irish, Italian, Norwegian, Portuguese, Spanish, Swedish.  Note that while
Latin-1 is sometimes referred to as handling West European languages, it doesn't
handle Catalan (see Latin-3) or Welsh (as I understand it, not handled by any
ISO-8859 set), and maybe other non-national but existing languages of West
Europe.

Latin-2:  Albanian, Czech, English, German, Hungarian, Polish, Rumanian,
Serbocroatian, Slovak, Slovene.

Latin-3:  Afrikaans, Catalan, Dutch, English, Esperanto, German, Italian,
Maltese, Spanish, Turkish.  Note that Dutch, German, Italian, and Spanish are
also covered by Latin-1.

Latin-4:  Danish, Estonian, English, Finnish, German, Greenlandic, Lappish,
Latvian, Lithuanian, Swedish, Norwegian.  Again, note the many overlaps with
Latin-1.

ISO 8859/5:  Cyrillic.

ISO 8859/6:  Arabic.

ISO 8859/7:  Greek.

ISO 8859/8:  Hebrew.

Bruce Sherwood
bas@andrew.cmu.edu  or  bas@andrew.bitnet