[net.nlang] Extended character sets -- tabulation

jhenry@randvax.UUCP (Jim Henry) (02/22/86)

What follows are tabulations of characters that are not part of the standard
ASCII character set but might be needed to properly support a particular
language.  The master table from which these are drawn has been posted
to net.internat.  I am cross-posting to net.nlang to solicit comments and
corrections.  Please mail these to me.  I will post a corrected version of
this information if there is enough interest.  If you can authoritatively
state that some part of this is correct, I'd like to hear that too!  (I speak
only English so this hasn't been easy to do.)  Followups have been directed
to net.internat.

Each character is described by its PostScript name or a PostScript-like
name which starts with a :.  Usually a two-character ASCII sequence is
given which suggests what the character looks like.  (Visualize the two
characters as an overstrike.)  Following this are the hexadecimal codes for
a number of popular systems which have an extended character set.  The key
for these devices is as follows:

		      ASCII as
		ISO Latin-1 l1
		 PostScript ps
		     IBM-PC pc
		  Macintosh mc
		 DEC VT-220 vt
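
As a rough sketch (not part of the master table), a row of the tables below
can be split mechanically in Python.  It assumes that every entry is exactly
eight whitespace-separated fields (name, two-character sequence, then the six
codes in the key order above) and that -- means no value is given.  The names
parse_row and SYSTEMS are invented for this example.

```python
# Sketch only: SYSTEMS and parse_row are illustrative names, not from the post.
SYSTEMS = ("as", "l1", "ps", "pc", "mc", "vt")

def parse_row(line):
    """Split one table line into (name, sequence, {system: code}) entries.

    A line may hold one entry or two side by side; each entry is assumed
    to be exactly eight whitespace-separated fields.
    """
    tokens = line.split()
    if not tokens or len(tokens) % 8 != 0:
        raise ValueError("expected 8 fields per entry: %r" % line)
    entries = []
    for i in range(0, len(tokens), 8):
        name, seq = tokens[i], tokens[i + 1]
        codes = {}
        for system, field in zip(SYSTEMS, tokens[i + 2:i + 8]):
            if field != "--":          # "--" is read as "no code given"
                codes[system] = int(field, 16)
        entries.append((name, None if seq == "--" else seq, codes))
    return entries
```

For example, parse_row("sterling L- -- a3 a3 9c a3 a3") yields a single entry
for sterling.  Header lines have only seven fields and are rejected with a
ValueError, so a caller would skip them.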

This is a tabulation of the countries that are covered by the ISO Latin-1
proposal and the languages used.  Help in completing this would be
appreciated.

      SP Argentina            FI Finland              SP Panama
      BR Australia            FR France               SP Paraguay
      GE Austria              GE Germany              SP Peru
	 Belgium              SP Guatemala            PR Portugal
	 Belize                  Guyana               SP El Salvador
      SP Bolivia              SP Honduras             SP Spain
      PR Brazil               IC Iceland                 Surinam
   BR,FR Canada               BR Ireland              SW Sweden
      SP Chile                IT Italy                   Switzerland
      SP Colombia                Liechtenstein        DU The Netherlands
      SP Costa Rica              Luxemburg            BR United Kingdom
      SP Cuba                 SP Mexico               AM United States
      DA Denmark              BR New Zealand          SP Uruguay
      SP Ecuador              SP Nicaragua            SP Venezuela
	 Faroe Islands        NO Norway

Language codes
--------------
  American AM               French FR               Polish PL
   British BR               German GE           Portuguese PR
    Danish DA            Hungarian HU              Spanish SP
     Dutch DU            Icelandic IC              Swedish SW
 Esperanto ES              Italian IT              Turkish TU
   Finnish FI            Norwegian NO

I have tabulated seventeen languages using Latin alphabets.  Are there
others that someone could tabulate?

American

American extensions are basically all symbols.  I think the ligatures fi and
fl are essentially typographic in nature and shouldn't be considered in a
standard character set.  The typographic quotes can probably be dispensed
with in favor of the existing ASCII characters.  dagger and double dagger
do not seem important.  There are additional typographic symbols such as
copyright which are in the master table, but I didn't pick up all the
symbols when I extracted this.

    PostScript    as l1 ps pc mc vt       PostScript    as l1 ps pc mc vt
	  cent c/ -- a2 a2 9b a2 a2      quotesingle -- -- 27 a9 -- 27 --
       section S* -- a7 a7 15 a4 a7        quoteleft `  -- -- 60 -- d4 --
     paragraph P| -- b6 b6 14 a6 b6       quoteright '  -- -- 27 -- d5 --
	dagger -- -- -- b2 -- a0 --     quotedblleft -- -- -- aa -- d2 --
     daggerdbl -- -- -- b3 -- -- --    quotedblright -- -- -- ba -- d3 --
	  ring ** -- b0 ca f8 a1 b0               fi -- -- -- ae -- -- --
						  fl -- -- -- af -- -- --

British

The only British extension that is not common with American is the pound
sterling symbol.

    PostScript    as l1 ps pc mc vt
      sterling L- -- a3 a3 9c a3 a3

Danish

    PostScript    as l1 ps pc mc vt
	    ae ae -- e6 f1 91 be e6
	    AE AE -- c6 e1 92 ae c6
	 aring ao -- e5 -- 86 8c e5
	 Aring Ao -- c5 -- 8f 81 c5
	oslash o/ -- f8 f9 -- bf f8
	Oslash O/ -- d8 e9 ed af d8
	  ring ** -- b0 ca f8 a1 b0

Dutch

Is the Dutch ligature ij typographic in nature, like English fi, or should
it be part of an extended set?  What is the use of y dieresis as a
substitute for ij?  Is it good practice or a workaround?

    PostScript    as l1 ps pc mc vt
	    ij ij -- -- -- -- -- --
	    IJ IJ -- -- -- -- -- --
     ydieresis y" -- ff -- 98 d8 fd
     Ydieresis Y" -- -- -- -- -- dd
      dieresis "  -- a8 c8 -- ac --

Esperanto

Are all these still used in Esperanto or are some archaic?

    PostScript    as l1 ps pc mc vt       PostScript    as l1 ps pc mc vt
  :ccircumflex c^ -- -- -- -- -- --     :jcircumflex j^ -- -- -- -- -- --
  :Ccircumflex C^ -- -- -- -- -- --     :Jcircumflex J^ -- -- -- -- -- --
  :gcircumflex g^ -- -- -- -- -- --     :scircumflex s^ -- -- -- -- -- --
  :Gcircumflex G^ -- -- -- -- -- --     :Scircumflex S^ -- -- -- -- -- --
  :hcircumflex h^ -- -- -- -- -- --          :ubreve uu -- -- -- -- -- --
  :Hcircumflex H^ -- -- -- -- -- --          :Ubreve Uu -- -- -- -- -- --

Finnish

    PostScript    as l1 ps pc mc vt
     adieresis a" -- e4 -- 84 8a e4
     Adieresis A" -- c4 -- 8e 80 c4
     odieresis o" -- f6 -- 94 9a f6
     Odieresis O" -- d6 -- 99 85 d6
      dieresis "  -- a8 c8 -- ac --
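
The two-character sequences are meant to be pictured as an overstrike.  A
minimal sketch of that idea, assuming a Python with Unicode support and
assuming each ASCII accent maps to one Unicode combining mark (the COMBINING
table and the overstrike function are hypothetical names for illustration):

```python
import unicodedata

# Assumption: each ASCII accent stands for one Unicode combining mark.
# The tables also use "," for the Polish ogonek and "*" for the Hungarian
# double acute, so this mapping is deliberately incomplete.
COMBINING = {
    '"': "\u0308",  # dieresis
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
    ",": "\u0327",  # cedilla
}

def overstrike(seq):
    """Compose a base letter and an accent mark into a single character."""
    base, accent = seq[0], seq[1]
    # NFC normalization merges base + combining mark when a precomposed
    # character exists; otherwise the two code points are left as they are.
    return unicodedata.normalize("NFC", base + COMBINING[accent])
```

With this mapping, overstrike('a"') gives the same character as decoding the
ISO Latin-1 byte e4 from the adieresis row above.  Sequences such as ao or ss
are pictures rather than letter-plus-accent pairs and are not handled.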

French

How important is the oe ligature?  The ISO Latin-1 set does not have oe.
Is a single guillemet needed?

    PostScript    as l1 ps pc mc vt       PostScript    as l1 ps pc mc vt
   acircumflex a^ -- e2 -- 83 89 e2      ocircumflex o^ -- f4 -- 93 99 f4
   Acircumflex A^ -- c2 -- -- -- c2      Ocircumflex O^ -- d4 -- -- -- d4
	agrave a` -- e0 -- 85 88 e0      ucircumflex u^ -- fb -- 96 9e fb
	Agrave A` -- c0 -- -- cb c0      Ucircumflex U^ -- db -- -- -- db
      ccedilla c, -- e7 -- 87 8d e7        udieresis u" -- fc -- 81 9f fc
      Ccedilla C, -- c7 -- 80 82 c7        Udieresis U" -- dc -- 9a 86 dc
	eacute e' -- e9 -- 82 8e e9           ugrave u` -- f9 -- 97 9d f9
	Eacute E' -- c9 -- 90 83 c9           Ugrave U` -- d9 -- -- -- d9
   ecircumflex e^ -- ea -- 88 90 ea          section S* -- a7 a7 15 a4 a7
   Ecircumflex E^ -- ca -- -- -- ca   guillemotright >> -- bb bb af c8 bb
     edieresis e" -- eb -- 89 91 eb    guillemotleft << -- ab ab ae c7 ab
     Edieresis E" -- cb -- -- -- cb    guilsinglleft -- -- -- ac -- -- --
	egrave e` -- e8 -- 8a 8f e8   guilsinglright -- -- -- ad -- -- --
	Egrave E` -- c8 -- -- -- c8            grave `  60 -- c1 -- -- --
   icircumflex i^ -- ee -- 8c 94 ee            acute '  27 b4 c2 -- ab --
   Icircumflex I^ -- ce -- -- -- ce       circumflex ^  5e -- c3 -- -- --
     idieresis i" -- ef -- 8b 95 ef         dieresis "  -- a8 c8 -- ac --
     Idieresis I" -- cf -- -- -- cf          cedilla ,  -- b8 cb -- -- --
	    oe oe -- -- fa -- cf f7
	    OE OE -- -- ea -- ce d7

German

Please remember that the double s is not the same as Greek beta!  I don't
read German but that error still offends my eyes.

    PostScript    as l1 ps pc mc vt
     adieresis a" -- e4 -- 84 8a e4
     Adieresis A" -- c4 -- 8e 80 c4
     odieresis o" -- f6 -- 94 9a f6
     Odieresis O" -- d6 -- 99 85 d6
    germandbls ss -- df fb e1 a7 df
     udieresis u" -- fc -- 81 9f fc
     Udieresis U" -- dc -- 9a 86 dc
      dieresis "  -- a8 c8 -- ac --
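
One hedged way to check that the germandbls codes really name the sharp s and
not beta is to decode them with Python's bundled codecs.  This assumes that
the latin-1, cp437 and mac_roman codecs correspond to the l1, pc and mc
columns; the ps and vt columns are left out, and CODECS, GERMANDBLS and
decode_code are names made up for the example.

```python
import unicodedata

# Assumption: these Python codec names correspond to three of the columns.
CODECS = {"l1": "latin-1", "pc": "cp437", "mc": "mac_roman"}

# germandbls row from the table above (l1, pc and mc columns only).
GERMANDBLS = {"l1": 0xDF, "pc": 0xE1, "mc": 0xA7}

def decode_code(system, code):
    """Decode one hexadecimal table entry with the codec assumed for its column."""
    return bytes([code]).decode(CODECS[system])

# Decode each column's code and collect the Unicode names of the results.
decoded = {system: decode_code(system, code) for system, code in GERMANDBLS.items()}
names = {unicodedata.name(ch) for ch in decoded.values()}
```

Under these assumptions all three codes decode to the same character, whose
Unicode name is LATIN SMALL LETTER SHARP S; Greek beta is a separate character
with its own name.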

Hungarian

    PostScript    as l1 ps pc mc vt       PostScript    as l1 ps pc mc vt
	aacute a' -- e1 -- a0 87 e1           uacute u' -- fa -- a3 9c fa
	Aacute A' -- c1 -- -- -- c1           Uacute U' -- da -- -- -- da
	eacute e' -- e9 -- 82 8e e9        udieresis u" -- fc -- 81 9f fc
	Eacute E' -- c9 -- 90 83 c9        Udieresis U" -- dc -- 9a 86 dc
	iacute i' -- ed -- a1 92 ed   :uhungarumlaut u* -- -- -- -- -- --
	Iacute I' -- cd -- -- -- cd   :Uhungarumlaut U* -- -- -- -- -- --
	oacute o' -- f3 -- a2 97 f3            acute '  27 b4 c2 -- ab --
	Oacute O' -- d3 -- -- -- d3         dieresis "  -- a8 c8 -- ac --
     odieresis o" -- f6 -- 94 9a f6     hungarumlaut '' -- -- cd -- -- --
     Odieresis O" -- d6 -- 99 85 d6
:ohungarumlaut o* -- -- -- -- -- --
:Ohungarumlaut O* -- -- -- -- -- --

Icelandic

I know nothing about Icelandic!  Any other extended characters?

     PostScript    as l1 ps pc mc vt
  :icelandiceth -- -- f0 -- -- -- --
  :icelandicETH -- -- d0 -- -- -- --
:icelandicthorn -- -- fe -- -- -- --
:icelandicTHORN -- -- de -- -- -- --

Italian

Does Italian use Igrave and Ograve?  I have information that says they are
used but rarely.

    PostScript    as l1 ps pc mc vt
	agrave a` -- e0 -- 85 88 e0
	egrave e` -- e8 -- 8a 8f e8
	igrave i` -- ec -- 8d 93 ec
	ograve o` -- f2 -- 95 98 f2
	ugrave u` -- f9 -- 97 9d f9
	florin -- -- -- a6 9f c4 --
	 grave `  60 -- c1 -- -- --

Norwegian

    PostScript    as l1 ps pc mc vt
	    ae ae -- e6 f1 91 be e6
	    AE AE -- c6 e1 92 ae c6
	 aring ao -- e5 -- 86 8c e5
	 Aring Ao -- c5 -- 8f 81 c5
	oslash o/ -- f8 f9 -- bf f8
	Oslash O/ -- d8 e9 ed af d8
	  ring ** -- b0 ca f8 a1 b0

Polish

An ogonek is a mirror image of a cedilla.

    PostScript    as l1 ps pc mc vt       PostScript    as l1 ps pc mc vt
      :aogonek a, -- -- -- -- -- --          :sacute s' -- -- -- -- -- --
      :Aogonek A, -- -- -- -- -- --          :Sacute S' -- -- -- -- -- --
       :cacute c' -- -- -- -- -- --          :zacute z' -- -- -- -- -- --
       :Cacute C' -- -- -- -- -- --          :Zacute Z' -- -- -- -- -- --
      :eogonek e, -- -- -- -- -- --            :zdot z. -- -- -- -- -- --
      :Eogonek E, -- -- -- -- -- --            :Zdot Z. -- -- -- -- -- --
	lslash l/ -- -- f8 -- -- --            acute '  27 b4 c2 -- ab --
	Lslash L/ -- -- e8 -- -- --        dotaccent -- -- -- c7 -- -- --
	oacute o' -- f3 -- a2 97 f3         dieresis "  -- a8 c8 -- ac --
	Oacute O' -- d3 -- -- -- d3           ogonek -- -- -- ce -- -- --

Portuguese

Are e tilde, i tilde, or u tilde used?  Is there a special use of tilde in
contractions that needs to be considered?

    PostScript    as l1 ps pc mc vt       PostScript    as l1 ps pc mc vt
   ordfeminine a- -- aa e3 a6 bb aa      ocircumflex o^ -- f4 -- 93 99 f4
	atilde a~ -- e3 -- -- 8b e3      Ocircumflex O^ -- d4 -- -- -- d4
	Atilde A~ -- c3 -- -- cc c3           otilde o~ -- f5 -- -- 9b f5
      ccedilla c, -- e7 -- 87 8d e7           Otilde O~ -- d5 -- -- cd d5
      Ccedilla C, -- c7 -- 80 82 c7        udieresis u" -- fc -- 81 9f fc
	eacute e' -- e9 -- 82 8e e9        Udieresis U" -- dc -- 9a 86 dc
	Eacute E' -- c9 -- 90 83 c9            acute '  27 b4 c2 -- ab --
   ecircumflex e^ -- ea -- 88 90 ea       circumflex ^  5e -- c3 -- -- --
   Ecircumflex E^ -- ca -- -- -- ca            tilde ~  7e -- c4 -- -- --
  ordmasculine o- -- ba eb a7 bc ba         dieresis "  -- a8 c8 -- ac --
	oacute o' -- f3 -- a2 97 f3          cedilla ,  -- b8 cb -- -- --
	Oacute O' -- d3 -- -- -- d3

Spanish

Is the Pt ligature for peseta needed?

    PostScript    as l1 ps pc mc vt       PostScript    as l1 ps pc mc vt
	aacute a' -- e1 -- a0 87 e1           uacute u' -- fa -- a3 9c fa
	Aacute A' -- c1 -- -- -- c1           Uacute U' -- da -- -- -- da
	eacute e' -- e9 -- 82 8e e9        udieresis u" -- fc -- 81 9f fc
	Eacute E' -- c9 -- 90 83 c9        Udieresis U" -- dc -- 9a 86 dc
	iacute i' -- ed -- a1 92 ed       exclamdown !! -- a1 a1 ad c1 a1
	Iacute I' -- cd -- -- -- cd     questiondown ?? -- bf bf a8 c0 bf
	ntilde n~ -- f1 -- a4 96 f1          :peseta Pt -- -- -- 9e -- --
	Ntilde N~ -- d1 -- a5 84 d1            acute '  27 b4 c2 -- ab --
	oacute o' -- f3 -- a2 97 f3            tilde ~  7e -- c4 -- -- --
	Oacute O' -- d3 -- -- -- d3         dieresis "  -- a8 c8 -- ac --

Swedish

    PostScript    as l1 ps pc mc vt
     adieresis a" -- e4 -- 84 8a e4
     Adieresis A" -- c4 -- 8e 80 c4
	 aring ao -- e5 -- 86 8c e5
	 Aring Ao -- c5 -- 8f 81 c5
     odieresis o" -- f6 -- 94 9a f6
     Odieresis O" -- d6 -- 99 85 d6
      dieresis "  -- a8 c8 -- ac --
	  ring ** -- b0 ca f8 a1 b0

Turkish

The breve is a rounded mark like a squashed U.  A caron is more angular
like a V.  Is the accent on a G a breve or a caron?

    PostScript    as l1 ps pc mc vt       PostScript    as l1 ps pc mc vt
   acircumflex a^ -- e2 -- 83 89 e2      ucircumflex u^ -- fb -- 96 9e fb
   Acircumflex A^ -- c2 -- -- -- c2      Ucircumflex U^ -- db -- -- -- db
      ccedilla c, -- e7 -- 87 8d e7        udieresis u" -- fc -- 81 9f fc
      Ccedilla C, -- c7 -- 80 82 c7        Udieresis U" -- dc -- 9a 86 dc
       :gbreve gu -- -- -- -- -- --       circumflex ^  5e -- c3 -- -- --
       :Gbreve Gu -- -- -- -- -- --            breve -- -- -- c6 -- -- --
      dotlessi -- -- -- f5 -- -- --        dotaccent -- -- -- c7 -- -- --
	 :Idot I. -- -- -- -- -- --         dieresis "  -- a8 c8 -- ac --
     odieresis o" -- f6 -- 94 9a f6          cedilla ,  -- b8 cb -- -- --
     Odieresis O" -- d6 -- 99 85 d6
     :scedilla s, -- -- -- -- -- --
     :Scedilla S, -- -- -- -- -- --