jhenry@randvax.UUCP (Jim Henry) (02/22/86)
What follows are tabulations of the characters outside the standard ASCII
character set that might be needed to properly support a particular
language. The master table from which these are drawn has been posted
to net.internat. I am cross-posting to net.nlang to solicit comments and
corrections. Please mail these to me. I will post a corrected version of
this information if there is enough interest. If you can authoritatively
state that some part of this is correct, I'd like to hear that too! (I speak
only English, so this hasn't been easy to do.) Followups have been directed
to net.internat.
Each character is described by its PostScript name or by a PostScript-like
name which starts with a colon (:). Usually a sequence of two ASCII
characters is given which suggests what the character looks like.
(Visualize the two characters as an overstrike.) Following this are the
hexadecimal codes for the character on a number of popular systems which
have an extended character set. An entry of -- means that no code is
given. The key for these systems is as follows:
ASCII        as
ISO Latin-1  l1
PostScript   ps
IBM-PC       pc
Macintosh    mc
DEC VT-220   vt
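The overstrike idea can be sketched in Python, under some loud assumptions: Unicode and its combining marks are not part of these tables, and the COMBINING mapping and compose function below are hypothetical names made up for illustration, covering only a few of the accents used here. The sketch places a combining mark on the base letter and lets NFC normalization pick the single precomposed character.

```python
import unicodedata

# Hypothetical mapping from the second character of a mnemonic to a
# Unicode combining mark. Only a few of the marks in the tables are covered.
COMBINING = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # dieresis
    "~": "\u0303",  # tilde
    ",": "\u0327",  # cedilla (the tables also use "," for ogonek)
}

def compose(mnemonic):
    """Turn a two-character overstrike mnemonic such as e' into one character."""
    base, mark = mnemonic
    return unicodedata.normalize("NFC", base + COMBINING[mark])

print(compose("e'"), compose('u"'), compose("c,"), compose("n~"))
```

The tables use a comma for both cedilla and ogonek, and an asterisk for several unrelated marks, so a real mapping would need more than the second character to go on.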
This is a tabulation of the countries that are covered by the ISO Latin-1
proposal and the languages used. Help in completing this would be
appreciated.
SP    Argentina       FI    Finland         SP    Panama
BR    Australia       FR    France          SP    Paraguay
GE    Austria         GE    Germany         SP    Peru
--    Belgium         SP    Guatemala       PR    Portugal
--    Belize          --    Guyana          SP    El Salvador
SP    Bolivia         SP    Honduras        SP    Spain
PR    Brazil          IC    Iceland         --    Surinam
BR,FR Canada          BR    Ireland         SW    Sweden
SP    Chile           IT    Italy           --    Switzerland
SP    Colombia        --    Liechtenstein   DU    The Netherlands
SP    Costa Rica      --    Luxemburg       BR    United Kingdom
SP    Cuba            SP    Mexico          AM    United States
DA    Denmark         BR    New Zealand     SP    Uruguay
SP    Ecuador         SP    Nicaragua       SP    Venezuela
--    Faroe Islands   NO    Norway
Language codes
--------------
American   AM    French     FR    Polish      PL
British    BR    German     GE    Portuguese  PR
Danish     DA    Hungarian  HU    Spanish     SP
Dutch      DU    Icelandic  IC    Swedish     SW
Esperanto  ES    Italian    IT    Turkish     TU
Finnish    FI    Norwegian  NO
I have tabulated seventeen languages using Latin alphabets. Are there
others that someone could tabulate?
American
American extensions are basically all symbols. I think the ligatures fi and
fl are essentially typographic in nature and shouldn't be considered in a
standard character set. The typographic quotes can probably be dispensed
with in favor of the existing ASCII characters. Dagger and double dagger
do not seem important. There are additional typographic symbols, such as
copyright, which are in the master table, but I didn't pick up all the
symbols when I extracted this.
PostScript         as l1 ps pc mc vt  PostScript         as l1 ps pc mc vt
cent            c/ -- a2 a2 9b a2 a2  quotesingle     -- -- 27 a9 -- 27 --
section         S* -- a7 a7 15 a4 a7  quoteleft       `  -- -- 60 -- d4 --
paragraph       P| -- b6 b6 14 a6 b6  quoteright      '  -- -- 27 -- d5 --
dagger          -- -- -- b2 -- a0 --  quotedblleft    -- -- -- aa -- d2 --
daggerdbl       -- -- -- b3 -- -- --  quotedblright   -- -- -- ba -- d3 --
ring            ** -- b0 ca f8 a1 b0  fi              -- -- -- ae -- -- --
                                      fl              -- -- -- af -- -- --
British
The only British extension that is not common with American is the pound
sterling symbol.
PostScript as l1 ps pc mc vt
sterling L- -- a3 a3 9c a3 a3
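For anyone who wants to process these tables by program, a single-column row can be split into a record. This is only a sketch: parse_row and COLUMNS are hypothetical names, and it assumes whitespace-separated fields in the order name, two-character mnemonic, then the six codes from the key, with -- meaning no entry.

```python
# Column abbreviations taken from the key at the top of the post.
COLUMNS = ("as", "l1", "ps", "pc", "mc", "vt")

def parse_row(line):
    """Split one single-column table row into name, mnemonic, and codes."""
    fields = line.split()
    name, mnemonic, codes = fields[0], fields[1], fields[2:]
    if len(codes) != len(COLUMNS):
        raise ValueError("expected %d codes, got %d" % (len(COLUMNS), len(codes)))
    return {
        "name": name,
        "mnemonic": None if mnemonic == "--" else mnemonic,
        # Each code is a hexadecimal byte value, or None where the row has --.
        "codes": {col: (None if code == "--" else int(code, 16))
                  for col, code in zip(COLUMNS, codes)},
    }

row = parse_row("sterling L- -- a3 a3 9c a3 a3")
print(row["name"], row["mnemonic"], row["codes"])
```

The length check only catches rows with a missing or extra field. A row whose codes have slipped sideways into the wrong columns still parses without complaint.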
Danish
PostScript as l1 ps pc mc vt
ae ae -- e6 f1 91 be e6
AE AE -- c6 e1 92 ae c6
aring ao -- e5 -- 86 8c e5
Aring Ao -- c5 -- 8f 81 c5
oslash o/ -- f8 f9 -- bf f8
Oslash O/ -- d8 e9 ed af d8
ring ** -- b0 ca f8 a1 b0
Dutch
Is the Dutch ligature ij typographic in nature, like English fi, or should
it be part of an extended set? What is the use of y dieresis as a
substitute for ij? Is it good practice or a workaround?
PostScript as l1 ps pc mc vt
ij ij -- -- -- -- -- --
IJ IJ -- -- -- -- -- --
ydieresis y" -- ff -- 98 d8 fd
Ydieresis Y" -- -- -- -- -- dd
dieresis " -- a8 c8 -- ac --
Esperanto
Are all these still used in Esperanto or are some archaic?
PostScript         as l1 ps pc mc vt  PostScript         as l1 ps pc mc vt
:ccircumflex    c^ -- -- -- -- -- --  :jcircumflex    j^ -- -- -- -- -- --
:Ccircumflex    C^ -- -- -- -- -- --  :Jcircumflex    J^ -- -- -- -- -- --
:gcircumflex    g^ -- -- -- -- -- --  :scircumflex    s^ -- -- -- -- -- --
:Gcircumflex    G^ -- -- -- -- -- --  :Scircumflex    S^ -- -- -- -- -- --
:hcircumflex    h^ -- -- -- -- -- --  :ubreve         uu -- -- -- -- -- --
:Hcircumflex    H^ -- -- -- -- -- --  :Ubreve         Uu -- -- -- -- -- --
Finnish
PostScript as l1 ps pc mc vt
adieresis a" -- e4 -- 84 8a e4
Adieresis A" -- c4 -- 8e 80 c4
odieresis o" -- f6 -- 94 9a f6
Odieresis O" -- d6 -- 99 85 d6
dieresis " -- a8 c8 -- ac --
French
How important is the oe ligature? The ISO Latin-1 set does not have oe.
Is a single guillemet needed?
PostScript         as l1 ps pc mc vt  PostScript         as l1 ps pc mc vt
acircumflex     a^ -- e2 -- 83 89 e2  ocircumflex     o^ -- f4 -- 93 99 f4
Acircumflex     A^ -- c2 -- -- -- c2  Ocircumflex     O^ -- d4 -- -- -- d4
agrave          a` -- e0 -- 85 88 e0  ucircumflex     u^ -- fb -- 96 9e fb
Agrave          A` -- c0 -- -- cb c0  Ucircumflex     U^ -- db -- -- -- db
ccedilla        c, -- e7 -- 87 8d e7  udieresis       u" -- fc -- 81 9f fc
Ccedilla        C, -- c7 -- 80 82 c7  Udieresis       U" -- dc -- 9a 86 dc
eacute          e' -- e9 -- 82 8e e9  ugrave          u` -- f9 -- 97 9d f9
Eacute          E' -- c9 -- 90 83 c9  Ugrave          U` -- d9 -- -- -- d9
ecircumflex     e^ -- ea -- 88 90 ea  section         S* -- a7 a7 15 a4 a7
Ecircumflex     E^ -- ca -- -- -- ca  guillemotright  >> -- bb bb af c8 bb
edieresis       e" -- eb -- 89 91 eb  guillemotleft   << -- ab ab ae c7 ab
Edieresis       E" -- cb -- -- -- cb  guilsinglleft   -- -- -- ac -- -- --
egrave          e` -- e8 -- 8a 8f e8  guilsinglright  -- -- -- ad -- -- --
Egrave          E` -- c8 -- -- -- c8  grave           `  60 -- c1 -- -- --
icircumflex     i^ -- ee -- 8c 94 ee  acute           '  27 b4 c2 -- ab --
Icircumflex     I^ -- ce -- -- -- ce  circumflex      ^  5e -- c3 -- -- --
idieresis       i" -- ef -- 8b 95 ef  dieresis        "  -- a8 c8 -- ac --
Idieresis       I" -- cf -- -- -- cf  cedilla         ,  -- b8 cb -- -- --
oe              oe -- -- fa -- cf f7
OE              OE -- -- ea -- ce d7
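One way to cross-check a row is to decode each hex code with a matching codec and see whether the same character comes out. The sketch below assumes that Python's latin-1, cp437 and mac_roman codecs correspond to the l1, pc and mc columns, which may not hold for every code since those code pages have been revised over time. ROW and decode_row are hypothetical names. As far as I know, Python's standard library has no codec for the PostScript or VT-220 columns, so those are left out.

```python
# Hex codes copied from the eacute row: l1 e9, pc 82, mc 8e.
ROW = {"latin-1": 0xE9, "cp437": 0x82, "mac_roman": 0x8E}

def decode_row(row):
    """Decode each single-byte code with the codec assumed for its column."""
    return {codec: bytes([code]).decode(codec) for codec, code in row.items()}

decoded = decode_row(ROW)
print(decoded)

# If the row is right, every column decodes to the same character.
print(len(set(decoded.values())) == 1)
```

A row where the columns decode to different characters is a candidate for correction, though the disagreement could equally come from a codec that has changed since these systems were documented.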
German
Please remember that the double s is not the same as Greek beta! I don't
read German but that error still offends my eyes.
PostScript as l1 ps pc mc vt
adieresis a" -- e4 -- 84 8a e4
Adieresis A" -- c4 -- 8e 80 c4
odieresis o" -- f6 -- 94 9a f6
Odieresis O" -- d6 -- 99 85 d6
germandbls ss -- df fb e1 a7 df
udieresis u" -- fc -- 81 9f fc
Udieresis U" -- dc -- 9a 86 dc
dieresis " -- a8 c8 -- ac --
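The distinction between the double s and beta can be made concrete by looking the two characters up by name. This is a minimal sketch, assuming Python's latin-1 and cp437 codecs match the l1 and pc columns, and using Unicode character names, which are not part of these tables.

```python
import unicodedata

SHARP_S = bytes([0xDF]).decode("latin-1")  # germandbls, l1 column
PC_CHAR = bytes([0xE1]).decode("cp437")    # germandbls, pc column
BETA = "\u03b2"                            # Greek small letter beta

print(unicodedata.name(SHARP_S))
print(unicodedata.name(BETA))

# Python's cp437 codec treats the PC code as the German letter, not beta.
print(PC_CHAR == SHARP_S)
print(SHARP_S == BETA)
```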
Hungarian
PostScript         as l1 ps pc mc vt  PostScript         as l1 ps pc mc vt
aacute          a' -- e1 -- a0 87 e1  uacute          u' -- fa -- a3 9c fa
Aacute          A' -- c1 -- -- -- c1  Uacute          U' -- da -- -- -- da
eacute          e' -- e9 -- 82 8e e9  udieresis       u" -- fc -- 81 9f fc
Eacute          E' -- c9 -- 90 83 c9  Udieresis       U" -- dc -- 9a 86 dc
iacute          i' -- ed -- a1 92 ed  :uhungarumlaut  u* -- -- -- -- -- --
Iacute          I' -- cd -- -- -- cd  :Uhungarumlaut  U* -- -- -- -- -- --
oacute          o' -- f3 -- a2 97 f3  acute           '  27 b4 c2 -- ab --
Oacute          O' -- d3 -- -- -- d3  dieresis        "  -- a8 c8 -- ac --
odieresis       o" -- f6 -- 94 9a f6  hungarumlaut    '' -- -- cd -- -- --
Odieresis       O" -- d6 -- 99 85 d6
:ohungarumlaut  o* -- -- -- -- -- --
:Ohungarumlaut  O* -- -- -- -- -- --
Icelandic
I know nothing about Icelandic! Any other extended characters?
PostScript as l1 ps pc mc vt
:icelandiceth -- -- f0 -- -- -- --
:icelandicETH -- -- d0 -- -- -- --
:icelandicthorn -- -- fe -- -- -- --
:icelandicTHORN -- -- de -- -- -- --
Italian
Does Italian use Igrave and Ograve? I have information that says they are
used but rarely.
PostScript as l1 ps pc mc vt
agrave a` -- e0 -- 85 88 e0
egrave e` -- e8 -- 8a 8f e8
igrave i` -- ec -- 8d 93 ec
ograve o` -- f2 -- 95 98 f2
ugrave u` -- f9 -- 97 9d f9
florin -- -- -- a6 9f c4 --
grave ` 60 -- c1 -- -- --
Norwegian
PostScript as l1 ps pc mc vt
ae ae -- e6 f1 91 be e6
AE AE -- c6 e1 92 ae c6
aring ao -- e5 -- 86 8c e5
Aring Ao -- c5 -- 8f 81 c5
oslash o/ -- f8 f9 -- bf f8
Oslash O/ -- d8 e9 ed af d8
ring ** -- b0 ca f8 a1 b0
Polish
An ogonek is a mirror image of a cedilla.
PostScript         as l1 ps pc mc vt  PostScript         as l1 ps pc mc vt
:aogonek        a, -- -- -- -- -- --  :sacute         s' -- -- -- -- -- --
:Aogonek        A, -- -- -- -- -- --  :Sacute         S' -- -- -- -- -- --
:cacute         c' -- -- -- -- -- --  :zacute         z' -- -- -- -- -- --
:Cacute         C' -- -- -- -- -- --  :Zacute         Z' -- -- -- -- -- --
:eogonek        e, -- -- -- -- -- --  :zdot           z. -- -- -- -- -- --
:Eogonek        E, -- -- -- -- -- --  :Zdot           Z. -- -- -- -- -- --
lslash          l/ -- -- f8 -- -- --  acute           '  27 b4 c2 -- ab --
Lslash          L/ -- -- e8 -- -- --  dotaccent       -- -- -- c7 -- -- --
oacute          o' -- f3 -- a2 97 f3  dieresis        "  -- a8 c8 -- ac --
Oacute          O' -- d3 -- -- -- d3  ogonek          -- -- -- ce -- -- --
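The many -- entries in the Polish table amount to a coverage gap, which can be checked mechanically. The sketch below is an illustration only: missing_from and POLISH are hypothetical names, POLISH holds just the letters tabulated above, and Python's latin-1 and cp437 codecs are assumed to stand in for the l1 and pc columns.

```python
def missing_from(text, codec):
    """Return the characters of text that the codec cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing

# The sixteen letters listed in the Polish table.
POLISH = "ąĄćĆęĘłŁóÓśŚźŹżŻ"

for codec in ("latin-1", "cp437"):
    gaps = missing_from(POLISH, codec)
    print(codec, "lacks", len(gaps), "of", len(POLISH), "letters")
```

The same function could be run over the letters of any other language in these tables to see which sets cover it completely.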
Portuguese
Are e tilde, i tilde, or u tilde used? Is there a special use of tilde in
contractions that needs to be considered?
PostScript         as l1 ps pc mc vt  PostScript         as l1 ps pc mc vt
ordfeminine     a- -- aa e3 a6 bb aa  ocircumflex     o^ -- f4 -- 93 99 f4
atilde          a~ -- e3 -- -- 8b e3  Ocircumflex     O^ -- d4 -- -- -- d4
Atilde          A~ -- c3 -- -- cc c3  otilde          o~ -- f5 -- -- 9b f5
ccedilla        c, -- e7 -- 87 8d e7  Otilde          O~ -- d5 -- -- cd d5
Ccedilla        C, -- c7 -- 80 82 c7  udieresis       u" -- fc -- 81 9f fc
eacute          e' -- e9 -- 82 8e e9  Udieresis       U" -- dc -- 9a 86 dc
Eacute          E' -- c9 -- 90 83 c9  acute           '  27 b4 c2 -- ab --
ecircumflex     e^ -- ea -- 88 90 ea  circumflex      ^  5e -- c3 -- -- --
Ecircumflex     E^ -- ca -- -- -- ca  tilde           ~  7e -- c4 -- -- --
ordmasculine    o- -- ba eb a7 bc ba  dieresis        "  -- a8 c8 -- ac --
oacute          o' -- f3 -- a2 97 f3  cedilla         ,  -- b8 cb -- -- --
Oacute          O' -- d3 -- -- -- d3
Spanish
Is the Pt ligature for peseta needed?
PostScript         as l1 ps pc mc vt  PostScript         as l1 ps pc mc vt
aacute          a' -- e1 -- a0 87 e1  uacute          u' -- fa -- a3 9c fa
Aacute          A' -- c1 -- -- -- c1  Uacute          U' -- da -- -- -- da
eacute          e' -- e9 -- 82 8e e9  udieresis       u" -- fc -- 81 9f fc
Eacute          E' -- c9 -- 90 83 c9  Udieresis       U" -- dc -- 9a 86 dc
iacute          i' -- ed -- a1 92 ed  exclamdown      !! -- a1 a1 ad c1 a1
Iacute          I' -- cd -- -- -- cd  questiondown    ?? -- bf bf a8 c0 bf
ntilde          n~ -- f1 -- a4 96 f1  :peseta         Pt -- -- -- 9e -- --
Ntilde          N~ -- d1 -- a5 84 d1  acute           '  27 b4 c2 -- ab --
oacute          o' -- f3 -- a2 97 f3  tilde           ~  7e -- c4 -- -- --
Oacute          O' -- d3 -- -- -- d3  dieresis        "  -- a8 c8 -- ac --
Swedish
PostScript as l1 ps pc mc vt
adieresis a" -- e4 -- 84 8a e4
Adieresis A" -- c4 -- 8e 80 c4
aring ao -- e5 -- 86 8c e5
Aring Ao -- c5 -- 8f 81 c5
odieresis o" -- f6 -- 94 9a f6
Odieresis O" -- d6 -- 99 85 d6
dieresis " -- a8 c8 -- ac --
ring ** -- b0 ca f8 a1 b0
Turkish
The breve is a rounded mark like a squashed U. A caron is more angular,
like a V. Is the accent on a G a breve or a caron?
PostScript         as l1 ps pc mc vt  PostScript         as l1 ps pc mc vt
acircumflex     a^ -- e2 -- 83 89 e2  ucircumflex     u^ -- fb -- 96 9e fb
Acircumflex     A^ -- c2 -- -- -- c2  Ucircumflex     U^ -- db -- -- -- db
ccedilla        c, -- e7 -- 87 8d e7  udieresis       u" -- fc -- 81 9f fc
Ccedilla        C, -- c7 -- 80 82 c7  Udieresis       U" -- dc -- 9a 86 dc
:gbreve         gu -- -- -- -- -- --  circumflex      ^  5e -- c3 -- -- --
:Gbreve         Gu -- -- -- -- -- --  breve           -- -- -- c6 -- -- --
dotlessi        -- -- -- f5 -- -- --  dotaccent       -- -- -- c7 -- -- --
:Idot           I. -- -- -- -- -- --  dieresis        "  -- a8 c8 -- ac --
odieresis       o" -- f6 -- 94 9a f6  cedilla         ,  -- b8 cb -- -- --
Odieresis       O" -- d6 -- 99 85 d6
:scedilla       s, -- -- -- -- -- --
:Scedilla       S, -- -- -- -- -- --