jhenry@randvax.UUCP (Jim Henry) (02/22/86)

What follows are tabulations of the characters, not part of the standard ASCII character set, that might be needed to properly support a particular language. The master table from which these are drawn has been posted to net.internat. I am cross-posting to net.nlang to solicit comments and corrections. Please mail these to me. I will post a corrected version of this information if there is enough interest. If you can authoritatively state that some part of this is correct I'd like to hear that too! (I speak only English so this hasn't been easy to do.) Followups have been directed to net.internat.

Each character is described by its PostScript name, or by a PostScript-like name which starts with a ":". Usually a two-character ASCII sequence is given which suggests what the character looks like. (Visualize the two characters as an overstrike.) Following this are the hexadecimal codes for a number of popular systems which have an extended character set. The key for these systems is as follows:

    ASCII        as
    ISO Latin-1  l1
    PostScript   ps
    IBM-PC       pc
    Macintosh    mc
    DEC VT-220   vt

This is a tabulation of the countries that are covered by the ISO Latin-1 proposal and the languages used. Help in completing this would be appreciated.

SP     Argentina        FI  Finland          SP  Panama
BR     Australia        FR  France           SP  Paraguay
GE     Austria          GE  Germany          SP  Peru
       Belgium          SP  Guatemala        PR  Portugal
       Belize               Guyana           SP  El Salvador
SP     Bolivia          SP  Honduras         SP  Spain
PR     Brazil           IC  Iceland              Surinam
BR,FR  Canada           BR  Ireland          SW  Sweden
SP     Chile            IT  Italy                Switzerland
SP     Colombia             Liechtenstein    DU  The Netherlands
SP     Costa Rica           Luxemburg        BR  United Kingdom
SP     Cuba             SP  Mexico           AM  United States
DA     Denmark          BR  New Zealand      SP  Uruguay
SP     Ecuador          SP  Nicaragua        SP  Venezuela
       Faroe Islands    NO  Norway

Language codes
--------------
American   AM    French     FR    Polish      PL
British    BR    German     GE    Portuguese  PR
Danish     DA    Hungarian  HU    Spanish     SP
Dutch      DU    Icelandic  IC    Swedish     SW
Esperanto  ES    Italian    IT    Turkish     TU
Finnish    FI    Norwegian  NO

I have tabulated seventeen languages using Latin alphabets. Are there others that someone could tabulate?

American

American extensions are basically all symbols. I think the ligatures fi and fl are essentially typographic in nature and shouldn't be considered in a standard character set. The typographic quotes can probably be dispensed with in favor of the existing ASCII characters. Dagger and double dagger do not seem important. There are additional typographic symbols such as copyright which are in the master table, but I didn't pick up all the symbols when I extracted this.

PostScript           as  l1  ps  pc  mc  vt
cent             c/  --  a2  a2  9b  a2  a2
section          S*  --  a7  a7  15  a4  a7
paragraph        P|  --  b6  b6  14  a6  b6
dagger           --  --  --  b2  --  a0  --
daggerdbl        --  --  --  b3  --  --  --
ring             **  --  b0  ca  f8  a1  b0
quotesingle      --  --  27  a9  --  27  --
quoteleft        `   --  --  60  --  d4  --
quoteright       '   --  --  27  --  d5  --
quotedblleft     --  --  --  aa  --  d2  --
quotedblright    --  --  --  ba  --  d3  --
fi               --  --  --  ae  --  --  --
fl               --  --  --  af  --  --  --

British

The only British extension that is not in common with American is the pound sterling symbol.

PostScript           as  l1  ps  pc  mc  vt
sterling         L-  --  a3  a3  9c  a3  a3

Danish

PostScript           as  l1  ps  pc  mc  vt
ae               ae  --  e6  f1  91  be  e6
AE               AE  --  c6  e1  92  ae  c6
aring            ao  --  e5  --  86  8c  e5
Aring            Ao  --  c5  --  8f  81  c5
oslash           o/  --  f8  f9  --  bf  f8
Oslash           O/  --  d8  e9  ed  af  d8
ring             **  --  b0  ca  f8  a1  b0

Dutch

Is the Dutch ligature ij typographic in nature, like English fi, or should it be part of an extended set? What is the use of y dieresis as a substitute for ij? Is it good practice or a workaround?

PostScript           as  l1  ps  pc  mc  vt
ij               ij  --  --  --  --  --  --
IJ               IJ  --  --  --  --  --  --
ydieresis        y"  --  ff  --  98  d8  fd
Ydieresis        Y"  --  --  --  --  --  dd
dieresis         "   --  a8  c8  --  ac  --

Esperanto

Are all these still used in Esperanto or are some archaic?

PostScript           as  l1  ps  pc  mc  vt
:ccircumflex     c^  --  --  --  --  --  --
:Ccircumflex     C^  --  --  --  --  --  --
:gcircumflex     g^  --  --  --  --  --  --
:Gcircumflex     G^  --  --  --  --  --  --
:hcircumflex     h^  --  --  --  --  --  --
:Hcircumflex     H^  --  --  --  --  --  --
:jcircumflex     j^  --  --  --  --  --  --
:Jcircumflex     J^  --  --  --  --  --  --
:scircumflex     s^  --  --  --  --  --  --
:Scircumflex     S^  --  --  --  --  --  --
:ubreve          uu  --  --  --  --  --  --
:Ubreve          Uu  --  --  --  --  --  --

Finnish

PostScript           as  l1  ps  pc  mc  vt
adieresis        a"  --  e4  --  84  8a  e4
Adieresis        A"  --  c4  --  8e  80  c4
odieresis        o"  --  f6  --  94  9a  f6
Odieresis        O"  --  d6  --  99  85  d6
dieresis         "   --  a8  c8  --  ac  --

French

How important is the oe ligature? The ISO Latin-1 set does not have oe. Is a single guillemot needed?

PostScript           as  l1  ps  pc  mc  vt
acircumflex      a^  --  e2  --  83  89  e2
Acircumflex      A^  --  c2  --  --  --  c2
agrave           a`  --  e0  --  85  88  e0
Agrave           A`  --  c0  --  --  cb  c0
ccedilla         c,  --  e7  --  87  8d  e7
Ccedilla         C,  --  c7  --  80  82  c7
eacute           e'  --  e9  --  82  8e  e9
Eacute           E'  --  c9  --  90  83  c9
ecircumflex      e^  --  ea  --  88  90  ea
Ecircumflex      E^  --  ca  --  --  --  ca
edieresis        e"  --  eb  --  89  91  eb
Edieresis        E"  --  cb  --  --  --  cb
egrave           e`  --  e8  --  8a  8f  e8
Egrave           E`  --  c8  --  --  --  c8
icircumflex      i^  --  ee  --  8c  94  ee
Icircumflex      I^  --  ce  --  --  --  ce
idieresis        i"  --  ef  --  8b  95  ef
Idieresis        I"  --  cf  --  --  --  cf
oe               oe  --  --  fa  --  cf  f7
OE               OE  --  --  ea  --  ce  d7
ocircumflex      o^  --  f4  --  93  99  f4
Ocircumflex      O^  --  d4  --  --  --  d4
ucircumflex      u^  --  fb  --  96  9e  fb
Ucircumflex      U^  --  db  --  --  --  db
udieresis        u"  --  fc  --  81  9f  fc
Udieresis        U"  --  dc  --  9a  86  dc
ugrave           u`  --  f9  --  97  9d  f9
Ugrave           U`  --  d9  --  --  --  d9
section          S*  --  a7  a7  15  a4  a7
guillemotright   >>  --  bb  bb  af  c8  bb
guillemotleft    <<  --  ab  ab  ae  c7  ab
guilsinglleft    --  --  --  ac  --  --  --
guilsinglright   --  --  --  ad  --  --  --
grave            `   60  --  c1  --  --  --
acute            '   27  b4  c2  --  ab  --
circumflex       ^   5e  --  c3  --  --  --
dieresis         "   --  a8  c8  --  ac  --
cedilla          ,   --  b8  cb  --  --  --

German

Please remember that the double s is not the same as Greek beta! I don't read German but that error still offends my eyes.

PostScript           as  l1  ps  pc  mc  vt
adieresis        a"  --  e4  --  84  8a  e4
Adieresis        A"  --  c4  --  8e  80  c4
odieresis        o"  --  f6  --  94  9a  f6
Odieresis        O"  --  d6  --  99  85  d6
germandbls       ss  --  df  fb  e1  a7  df
udieresis        u"  --  fc  --  81  9f  fc
Udieresis        U"  --  dc  --  9a  86  dc
dieresis         "   --  a8  c8  --  ac  --

Hungarian

PostScript           as  l1  ps  pc  mc  vt
aacute           a'  --  e1  --  a0  87  e1
Aacute           A'  --  c1  --  --  --  c1
eacute           e'  --  e9  --  82  8e  e9
Eacute           E'  --  c9  --  90  83  c9
iacute           i'  --  ed  --  a1  92  ed
Iacute           I'  --  cd  --  --  --  cd
oacute           o'  --  f3  --  a2  97  f3
Oacute           O'  --  d3  --  --  --  d3
odieresis        o"  --  f6  --  94  9a  f6
Odieresis        O"  --  d6  --  99  85  d6
:ohungarumlaut   o*  --  --  --  --  --  --
:Ohungarumlaut   O*  --  --  --  --  --  --
uacute           u'  --  fa  --  a3  9c  fa
Uacute           U'  --  da  --  --  --  da
udieresis        u"  --  fc  --  81  9f  fc
Udieresis        U"  --  dc  --  9a  86  dc
:uhungarumlaut   u*  --  --  --  --  --  --
:Uhungarumlaut   U*  --  --  --  --  --  --
acute            '   27  b4  c2  --  ab  --
dieresis         "   --  a8  c8  --  ac  --
hungarumlaut     ''  --  --  cd  --  --  --

Icelandic

I know nothing about Icelandic! Are there any other extended characters?

PostScript           as  l1  ps  pc  mc  vt
:icelandiceth    --  --  f0  --  --  --  --
:icelandicETH    --  --  d0  --  --  --  --
:icelandicthorn  --  --  fe  --  --  --  --
:icelandicTHORN  --  --  de  --  --  --  --

Italian

Does Italian use Igrave and Ograve? I have information that says they are used but rarely.

PostScript           as  l1  ps  pc  mc  vt
agrave           a`  --  e0  --  85  88  e0
egrave           e`  --  e8  --  8a  8f  e8
igrave           i`  --  ec  --  8d  93  ec
ograve           o`  --  f2  --  95  98  f2
ugrave           u`  --  f9  --  97  9d  f9
florin           --  --  --  a6  9f  c4  --
grave            `   60  --  c1  --  --  --

Norwegian

PostScript           as  l1  ps  pc  mc  vt
ae               ae  --  e6  f1  91  be  e6
AE               AE  --  c6  e1  92  ae  c6
aring            ao  --  e5  --  86  8c  e5
Aring            Ao  --  c5  --  8f  81  c5
oslash           o/  --  f8  f9  --  bf  f8
Oslash           O/  --  d8  e9  ed  af  d8
ring             **  --  b0  ca  f8  a1  b0

Polish

An ogonek is a mirror image of a cedilla.

PostScript           as  l1  ps  pc  mc  vt
:aogonek         a,  --  --  --  --  --  --
:Aogonek         A,  --  --  --  --  --  --
:cacute          c'  --  --  --  --  --  --
:Cacute          C'  --  --  --  --  --  --
:eogonek         e,  --  --  --  --  --  --
:Eogonek         E,  --  --  --  --  --  --
lslash           l/  --  --  f8  --  --  --
Lslash           L/  --  --  e8  --  --  --
oacute           o'  --  f3  --  a2  97  f3
Oacute           O'  --  d3  --  --  --  d3
:sacute          s'  --  --  --  --  --  --
:Sacute          S'  --  --  --  --  --  --
:zacute          z'  --  --  --  --  --  --
:Zacute          Z'  --  --  --  --  --  --
:zdot            z.  --  --  --  --  --  --
:Zdot            Z.  --  --  --  --  --  --
acute            '   27  b4  c2  --  ab  --
dotaccent        --  --  --  c7  --  --  --
dieresis         "   --  a8  c8  --  ac  --
ogonek           --  --  --  ce  --  --  --

Portuguese

Are e tilde, i tilde, or u tilde used? Is there a special use of tilde in contractions that needs to be considered?

PostScript           as  l1  ps  pc  mc  vt
ordfeminine      a-  --  aa  e3  a6  bb  aa
atilde           a~  --  e3  --  --  8b  e3
Atilde           A~  --  c3  --  --  cc  c3
ccedilla         c,  --  e7  --  87  8d  e7
Ccedilla         C,  --  c7  --  80  82  c7
eacute           e'  --  e9  --  82  8e  e9
Eacute           E'  --  c9  --  90  83  c9
ecircumflex      e^  --  ea  --  88  90  ea
Ecircumflex      E^  --  ca  --  --  --  ca
ordmasculine     o-  --  ba  eb  a7  bc  ba
oacute           o'  --  f3  --  a2  97  f3
Oacute           O'  --  d3  --  --  --  d3
ocircumflex      o^  --  f4  --  93  99  f4
Ocircumflex      O^  --  d4  --  --  --  d4
otilde           o~  --  f5  --  --  9b  f5
Otilde           O~  --  d5  --  --  cd  d5
udieresis        u"  --  fc  --  81  9f  fc
Udieresis        U"  --  dc  --  9a  86  dc
acute            '   27  b4  c2  --  ab  --
circumflex       ^   5e  --  c3  --  --  --
tilde            ~   7e  --  c4  --  --  --
dieresis         "   --  a8  c8  --  ac  --
cedilla          ,   --  b8  cb  --  --  --

Spanish

Is the Pt ligature for peseta needed?

PostScript           as  l1  ps  pc  mc  vt
aacute           a'  --  e1  --  a0  87  e1
Aacute           A'  --  c1  --  --  --  c1
eacute           e'  --  e9  --  82  8e  e9
Eacute           E'  --  c9  --  90  83  c9
iacute           i'  --  ed  --  a1  92  ed
Iacute           I'  --  cd  --  --  --  cd
ntilde           n~  --  f1  --  a4  96  f1
Ntilde           N~  --  d1  --  a5  84  d1
oacute           o'  --  f3  --  a2  97  f3
Oacute           O'  --  d3  --  --  --  d3
uacute           u'  --  fa  --  a3  9c  fa
Uacute           U'  --  da  --  --  --  da
udieresis        u"  --  fc  --  81  9f  fc
Udieresis        U"  --  dc  --  9a  86  dc
exclamdown       !!  --  a1  a1  ad  c1  a1
questiondown     ??  --  bf  bf  a8  c0  bf
:peseta          Pt  --  --  --  9e  --  --
acute            '   27  b4  c2  --  ab  --
tilde            ~   7e  --  c4  --  --  --
dieresis         "   --  a8  c8  --  ac  --

Swedish

PostScript           as  l1  ps  pc  mc  vt
adieresis        a"  --  e4  --  84  8a  e4
Adieresis        A"  --  c4  --  8e  80  c4
aring            ao  --  e5  --  86  8c  e5
Aring            Ao  --  c5  --  8f  81  c5
odieresis        o"  --  f6  --  94  9a  f6
Odieresis        O"  --  d6  --  99  85  d6
dieresis         "   --  a8  c8  --  ac  --
ring             **  --  b0  ca  f8  a1  b0

Turkish

The breve is a rounded mark like a squashed U. A caron is more angular, like a V. Is the accent on a G a breve or a caron?

PostScript           as  l1  ps  pc  mc  vt
acircumflex      a^  --  e2  --  83  89  e2
Acircumflex      A^  --  c2  --  --  --  c2
ccedilla         c,  --  e7  --  87  8d  e7
Ccedilla         C,  --  c7  --  80  82  c7
:gbreve          gu  --  --  --  --  --  --
:Gbreve          Gu  --  --  --  --  --  --
dotlessi         --  --  --  f5  --  --  --
:Idot            I.  --  --  --  --  --  --
odieresis        o"  --  f6  --  94  9a  f6
Odieresis        O"  --  d6  --  99  85  d6
:scedilla        s,  --  --  --  --  --  --
:Scedilla        S,  --  --  --  --  --  --
ucircumflex      u^  --  fb  --  96  9e  fb
Ucircumflex      U^  --  db  --  --  --  db
udieresis        u"  --  fc  --  81  9f  fc
Udieresis        U"  --  dc  --  9a  86  dc
circumflex       ^   5e  --  c3  --  --  --
breve            --  --  --  c6  --  --  --
dotaccent        --  --  --  c7  --  --  --
dieresis         "   --  a8  c8  --  ac  --
cedilla          ,   --  b8  cb  --  --  --
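
As an illustration of how rows like these might be put to use, here is a minimal Python sketch that parses a few rows and translates bytes from one system's codes to another's. It is not part of the master table. The names parse_rows and transcode are hypothetical. The sketch assumes that "--" means the system has no code for the character, that every row has a name, a two-character suggestion, and six codes in the order as, l1, ps, pc, mc, vt, and that any byte not listed in the rows is plain ASCII and can pass through unchanged. The sample rows are the German ones.

```python
# Rows copied from the German table: PostScript name, two-character ASCII
# suggestion, then hex codes in the column order as, l1, ps, pc, mc, vt.
ROWS = '''
adieresis        a"  --  e4  --  84  8a  e4
Adieresis        A"  --  c4  --  8e  80  c4
odieresis        o"  --  f6  --  94  9a  f6
Odieresis        O"  --  d6  --  99  85  d6
germandbls       ss  --  df  fb  e1  a7  df
udieresis        u"  --  fc  --  81  9f  fc
Udieresis        U"  --  dc  --  9a  86  dc
'''

SYSTEMS = ("as", "l1", "ps", "pc", "mc", "vt")


def parse_rows(text):
    """Return {name: (suggestion, {system: code or None})} for each row."""
    table = {}
    for line in text.strip().splitlines():
        name, suggestion, *codes = line.split()
        table[name] = (
            None if suggestion == "--" else suggestion,
            # Assumption: "--" means the system lacks this character.
            {s: (None if c == "--" else int(c, 16))
             for s, c in zip(SYSTEMS, codes)},
        )
    return table


def transcode(data, table, src, dst):
    """Translate bytes from system src to system dst using the table.

    A character that dst lacks is replaced by its two-character ASCII
    suggestion (or "?" if the row gives none). Bytes not found in the
    table are copied through unchanged (assumed to be plain ASCII).
    """
    by_src = {codes[src]: (suggestion, codes[dst])
              for suggestion, codes in table.values()
              if codes[src] is not None}
    out = bytearray()
    for byte in data:
        if byte not in by_src:
            out.append(byte)
            continue
        suggestion, code = by_src[byte]
        if code is not None:
            out.append(code)
        elif suggestion is not None:
            out += suggestion.encode("ascii")
        else:
            out += b"?"
    return bytes(out)


TABLE = parse_rows(ROWS)
# IBM-PC bytes 81 (udieresis) and e1 (germandbls) become Latin-1 fc and df.
latin1 = transcode(b"Gr\x81\xe1e", TABLE, "pc", "l1")
```

Translating the same IBM-PC text to PostScript instead would turn the udieresis into the overstrike suggestion u", since the table gives no ps code for it, while germandbls would become fb.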