lasko@video.dec.com (Tim Lasko[TBU Architecture]DTN 223-2186) (08/06/87)
First, I'd like to apologize to those who are waiting for answers: unfortunately, I've been unable to read netnews postings for some time for assorted reasons, and I compounded the problem by being out of the office for a good portion of the last month and by giving an incorrect net address.

I regret that I cannot send copies (hard or electronic) of the standard to the literally dozens of people who have asked me for one. ISO 8859-1 can be ordered in the U.S., like all ISO standards, from the American National Standards Institute, 1430 Broadway, New York, NY 10018 [(212) 354-3300; $18.00 for ISO 8859, Part 1; $16 for Part 2]. Those of you outside of the U.S. should contact your nation's standards body (e.g. BSI, DIN, NNI, etc.). I have, however, included the list of characters of ISO Latin-1 (pending 8-bit ASCII) at the end of this message.

Let me first update the situation, as of the ISO working group meeting (ISO TC97/SC2/WG2) held last month in Geneva:

If you recall, ISO Latin-1 (IS 8859-1) is actually one of an entire family of eight-bit one-byte character sets, all having ASCII on the left hand side, and with varying repertoires on the right hand side:

IS 8859-1         Latin Alphabet No 1      Western Europe, released 2/87
IS 8859-2         Latin Alphabet No 2      Eastern Europe, released 2/87
ISO DIS 8859-3    Latin Alphabet No 3      SE Europe, etc., ballot ends 10/87
ISO DIS 8859-4    Latin Alphabet No 4      Northern Europe, ballot ends 10/87 *
ISO DIS 8859-5.2  Latin-Cyrillic Alphabet  new ballot to be sent by 9/87 **
IS 8859-6         Latin-Arabic Alphabet    released 5/87 ***
IS 8859-7         Latin-Greek Alphabet     approved 7/87 ****
ISO DIS 8859-8    Latin-Hebrew Alphabet    ballot to be sent by 9/87 *****

* There is some sentiment among the Scandinavian countries to propose a new code table to include a number of minority languages spoken in that region which are not accommodated by the current '-4 draft.
This could mean that the draft will be re-issued if there is sufficient consensus at the end of the year.

** At the working group meeting, the Czechoslovakian member brought a new USSR (GOST) code page that differs in several places from the current '-5 draft; the members agreed that we wanted to be as close as possible to the GOST standard, so a new draft ballot for '-5 is being issued. (The most significant difference, for information, is that the GOST standard does not include a $ at 24 hex.)

*** According to the ANSI sales department, this part is not yet available. I suggest waiting a few months before ordering.

**** All of the comments on the Latin-Greek ('-7) draft were handled at the meeting, and publication should follow in a few months.

***** The Latin-Hebrew draft corresponds to the approved ECMA-118, and the Israeli SII standard (not sure of number).

To answer a number of questions:

1) "How can I get a hold of XXX?"

As I mentioned above, the approved IS 8859-1 and '-2 may be ordered from ANSI (or your nation's standards body). IS 8859-6 and '-7 should be available through the same sources shortly. ISO DIS 8859-3, '-4, and probably the OLD '-5, are also available through ANSI; call for details. ISO DIS 8859-8, of course, is not available yet.

The following ECMA standards correspond to the indicated ISO standards:

ECMA-94    IS 8859-1, '-2, ISO DIS 8859-3, '-4
ECMA-114   IS 8859-6
ECMA-117   IS 8859-7
ECMA-118   ISO DIS 8859-8

and the ECMA standards would be reasonable interim substitutes, as the actual characters do not (or, in the case of drafts, are not likely to) differ from the ISO standard. ECMA standards can be ordered (free) from the European Computer Manufacturers Association, 114 Rue du Rhone, 1204 Geneva, Switzerland.
(The o in Rhone should have a circumflex [^] over it [it would be great if the net understood Latin-1.])

(Note: ECMA-113 is the Latin-Cyrillic alphabet of the PREVIOUS Latin-Cyrillic ISO DIS 8859-5 draft; as mentioned above, a new ISO draft is going to be issued, and I understand that the ECMA standard will change.)

2) Why doesn't Latin-1 have an OE or oe ligature?

Early drafts of ISO 8859-1 did include the OE and oe ligature characters. Many, including the U.S., felt that they were important for the French language. However, in 1985, AFNOR, the French member body to the ISO committee, stated that they could be removed, since they are technically not part of the French alphabet, being only a very commonly used presentation ligature. Since the purpose of the standard was to be efficient for data processing, and since OE and oe are always processed as the separate letters O and E, there was no need to have a separate code for the ligature.

This resulted in the next draft having the two code positions (D7 and F7 hex) blank, and then for the DIS ballot they were filled with multiplication sign and division sign. The filling was needed since it was clear that leaving the two positions blank would lead vendors to include private characters in those positions, leading to small but difficult incompatibilities. The actual characters chosen were a compromise from several candidates.

In late 1986, while under DIS ballot, the French did vote against '8859-1, asking that OE and oe ligature be re-included. After the fact, it was learned that the French PTT had not until 1986 been participating in the AFNOR coding committee; once they were sitting on the committee, the consensus within AFNOR changed to again ask for those ligatures. Unfortunately, by this time it was too late to make such a technical change to the standard. (Further, CEN/CENELEC coding committees do not require the OE and oe ligatures, either.)
Therefore the OE and oe have never been restored to any of the '8859 code tables.

3) Were the special requirements of the Welsh language considered?

We (the U.S., ASC X3L2) realized a bit too late that certain characters needed to properly represent the Welsh language (w and y with circumflex) weren't conveniently available in any of the '8859 sets, and tried to change Part 4 to include them. However, there was neither room nor consensus within the ISO committee to include these, so these too do not exist in any of the '8859 code tables. (Arguably, the BSI should have been looking out for the requirements of Welsh, but for a number of reasons that I choose not to go into here, they did not.)

Attached is the repertoire of ISO Latin Alphabet Nr 1 (IS 8859-1). I have indicated an alternate name where there might be confusion in the U.S.

R/C - row/column of code table
Dec - Decimal
Oct - Octal

R/C    Dec  Oct  Symbol  Name
02/00  032  040  SP      SPACE
02/01  033  041  !       EXCLAMATION POINT
02/02  034  042  "       QUOTATION MARK
02/03  035  043  #       NUMBER SIGN
02/04  036  044  $       DOLLAR SIGN
02/05  037  045  %       PERCENT SIGN
02/06  038  046  &       AMPERSAND
02/07  039  047  '       APOSTROPHE
02/08  040  050  (       LEFT PARENTHESIS
02/09  041  051  )       RIGHT PARENTHESIS
02/10  042  052  *       ASTERISK
02/11  043  053  +       PLUS SIGN
02/12  044  054  ,       COMMA
02/13  045  055  -       HYPHEN, MINUS SIGN
02/14  046  056  .       FULL STOP (U.S.: PERIOD, DECIMAL POINT)
02/15  047  057  /       SOLIDUS (U.S.: SLASH)
03/00  048  060  0       DIGIT ZERO
03/01  049  061  1       DIGIT ONE
03/02  050  062  2       DIGIT TWO
03/03  051  063  3       DIGIT THREE
03/04  052  064  4       DIGIT FOUR
03/05  053  065  5       DIGIT FIVE
03/06  054  066  6       DIGIT SIX
03/07  055  067  7       DIGIT SEVEN
03/08  056  070  8       DIGIT EIGHT
03/09  057  071  9       DIGIT NINE
03/10  058  072  :       COLON
03/11  059  073  ;       SEMICOLON
03/12  060  074  <       LESS-THAN SIGN
03/13  061  075  =       EQUALS SIGN
03/14  062  076  >       GREATER-THAN SIGN
03/15  063  077  ?       QUESTION MARK
04/00  064  100  @       COMMERCIAL AT
04/01  065  101  A       LATIN CAPITAL LETTER A
04/02  066  102  B       LATIN CAPITAL LETTER B
04/03  067  103  C       LATIN CAPITAL LETTER C
04/04  068  104  D       LATIN CAPITAL LETTER D
04/05  069  105  E       LATIN CAPITAL LETTER E
04/06  070  106  F       LATIN CAPITAL LETTER F
04/07  071  107  G       LATIN CAPITAL LETTER G
04/08  072  110  H       LATIN CAPITAL LETTER H
04/09  073  111  I       LATIN CAPITAL LETTER I
04/10  074  112  J       LATIN CAPITAL LETTER J
04/11  075  113  K       LATIN CAPITAL LETTER K
04/12  076  114  L       LATIN CAPITAL LETTER L
04/13  077  115  M       LATIN CAPITAL LETTER M
04/14  078  116  N       LATIN CAPITAL LETTER N
04/15  079  117  O       LATIN CAPITAL LETTER O
05/00  080  120  P       LATIN CAPITAL LETTER P
05/01  081  121  Q       LATIN CAPITAL LETTER Q
05/02  082  122  R       LATIN CAPITAL LETTER R
05/03  083  123  S       LATIN CAPITAL LETTER S
05/04  084  124  T       LATIN CAPITAL LETTER T
05/05  085  125  U       LATIN CAPITAL LETTER U
05/06  086  126  V       LATIN CAPITAL LETTER V
05/07  087  127  W       LATIN CAPITAL LETTER W
05/08  088  130  X       LATIN CAPITAL LETTER X
05/09  089  131  Y       LATIN CAPITAL LETTER Y
05/10  090  132  Z       LATIN CAPITAL LETTER Z
05/11  091  133  [       LEFT SQUARE BRACKET
05/12  092  134  \       REVERSE SOLIDUS (U.S.: BACK SLASH)
05/13  093  135  ]       RIGHT SQUARE BRACKET
05/14  094  136  ^       CIRCUMFLEX ACCENT
05/15  095  137  _       LOW LINE
06/00  096  140  `       GRAVE ACCENT
06/01  097  141  a       LATIN SMALL LETTER a
06/02  098  142  b       LATIN SMALL LETTER b
06/03  099  143  c       LATIN SMALL LETTER c
06/04  100  144  d       LATIN SMALL LETTER d
06/05  101  145  e       LATIN SMALL LETTER e
06/06  102  146  f       LATIN SMALL LETTER f
06/07  103  147  g       LATIN SMALL LETTER g
06/08  104  150  h       LATIN SMALL LETTER h
06/09  105  151  i       LATIN SMALL LETTER i
06/10  106  152  j       LATIN SMALL LETTER j
06/11  107  153  k       LATIN SMALL LETTER k
06/12  108  154  l       LATIN SMALL LETTER l
06/13  109  155  m       LATIN SMALL LETTER m
06/14  110  156  n       LATIN SMALL LETTER n
06/15  111  157  o       LATIN SMALL LETTER o
07/00  112  160  p       LATIN SMALL LETTER p
07/01  113  161  q       LATIN SMALL LETTER q
07/02  114  162  r       LATIN SMALL LETTER r
07/03  115  163  s       LATIN SMALL LETTER s
07/04  116  164  t       LATIN SMALL LETTER t
07/05  117  165  u       LATIN SMALL LETTER u
07/06  118  166  v       LATIN SMALL LETTER v
07/07  119  167  w       LATIN SMALL LETTER w
07/08  120  170  x       LATIN SMALL LETTER x
07/09  121  171  y       LATIN SMALL LETTER y
07/10  122  172  z       LATIN SMALL LETTER z
07/11  123  173  {       LEFT CURLY BRACKET
07/12  124  174  |       VERTICAL LINE
07/13  125  175  }       RIGHT CURLY BRACKET
07/14  126  176  ~       TILDE
10/00  160  240  NBSP    NO-BREAK SPACE
10/01  161  241          INVERTED EXCLAMATION MARK
10/02  162  242          CENT SIGN
10/03  163  243          POUND SIGN
10/04  164  244          CURRENCY SIGN
10/05  165  245          YEN SIGN
10/06  166  246          BROKEN BAR
10/07  167  247          PARAGRAPH SIGN (U.S.: SECTION SIGN)
10/08  168  250          DIAERESIS
10/09  169  251          COPYRIGHT SIGN
10/10  170  252          FEMININE ORDINAL INDICATOR
10/11  171  253          LEFT ANGLE QUOTATION MARK
10/12  172  254          NOT SIGN
10/13  173  255  SHY     SOFT HYPHEN
10/14  174  256          REGISTERED TRADEMARK SIGN
10/15  175  257          MACRON
11/00  176  260          RING ABOVE, DEGREE SIGN
11/01  177  261          PLUS-MINUS SIGN
11/02  178  262          SUPERSCRIPT TWO
11/03  179  263          SUPERSCRIPT THREE
11/04  180  264          ACUTE ACCENT
11/05  181  265          MICRO SIGN
11/06  182  266          PILCROW SIGN (U.S.: PARAGRAPH)
11/07  183  267          MIDDLE DOT
11/08  184  270          CEDILLA
11/09  185  271          SUPERSCRIPT ONE
11/10  186  272          MASCULINE ORDINAL INDICATOR
11/11  187  273          RIGHT ANGLE QUOTATION MARK
11/12  188  274          VULGAR FRACTION ONE QUARTER
11/13  189  275          VULGAR FRACTION ONE HALF
11/14  190  276          VULGAR FRACTION THREE QUARTERS
11/15  191  277          INVERTED QUESTION MARK
12/00  192  300          LATIN CAPITAL LETTER A WITH GRAVE ACCENT
12/01  193  301          LATIN CAPITAL LETTER A WITH ACUTE ACCENT
12/02  194  302          LATIN CAPITAL LETTER A WITH CIRCUMFLEX ACCENT
12/03  195  303          LATIN CAPITAL LETTER A WITH TILDE
12/04  196  304          LATIN CAPITAL LETTER A WITH DIAERESIS
12/05  197  305          LATIN CAPITAL LETTER A WITH RING ABOVE
12/06  198  306          CAPITAL DIPHTHONG AE
12/07  199  307          LATIN CAPITAL LETTER C WITH CEDILLA
12/08  200  310          LATIN CAPITAL LETTER E WITH GRAVE ACCENT
12/09  201  311          LATIN CAPITAL LETTER E WITH ACUTE ACCENT
12/10  202  312          LATIN CAPITAL LETTER E WITH CIRCUMFLEX ACCENT
12/11  203  313          LATIN CAPITAL LETTER E WITH DIAERESIS
12/12  204  314          LATIN CAPITAL LETTER I WITH GRAVE ACCENT
12/13  205  315          LATIN CAPITAL LETTER I WITH ACUTE ACCENT
12/14  206  316          LATIN CAPITAL LETTER I WITH CIRCUMFLEX ACCENT
12/15  207  317          LATIN CAPITAL LETTER I WITH DIAERESIS
13/00  208  320          CAPITAL ICELANDIC LETTER ETH
13/01  209  321          LATIN CAPITAL LETTER N WITH TILDE
13/02  210  322          LATIN CAPITAL LETTER O WITH GRAVE ACCENT
13/03  211  323          LATIN CAPITAL LETTER O WITH ACUTE ACCENT
13/04  212  324          LATIN CAPITAL LETTER O WITH CIRCUMFLEX ACCENT
13/05  213  325          LATIN CAPITAL LETTER O WITH TILDE
13/06  214  326          LATIN CAPITAL LETTER O WITH DIAERESIS
13/07  215  327          MULTIPLICATION SIGN
13/08  216  330          LATIN CAPITAL LETTER O WITH OBLIQUE STROKE
13/09  217  331          LATIN CAPITAL LETTER U WITH GRAVE ACCENT
13/10  218  332          LATIN CAPITAL LETTER U WITH ACUTE ACCENT
13/11  219  333          LATIN CAPITAL LETTER U WITH CIRCUMFLEX ACCENT
13/12  220  334          LATIN CAPITAL LETTER U WITH DIAERESIS
13/13  221  335          LATIN CAPITAL LETTER Y WITH ACUTE ACCENT
13/14  222  336          CAPITAL ICELANDIC LETTER THORN
13/15  223  337          SMALL GERMAN LETTER SHARP s
14/00  224  340          LATIN SMALL LETTER a WITH GRAVE ACCENT
14/01  225  341          LATIN SMALL LETTER a WITH ACUTE ACCENT
14/02  226  342          LATIN SMALL LETTER a WITH CIRCUMFLEX ACCENT
14/03  227  343          LATIN SMALL LETTER a WITH TILDE
14/04  228  344          LATIN SMALL LETTER a WITH DIAERESIS
14/05  229  345          LATIN SMALL LETTER a WITH RING ABOVE
14/06  230  346          SMALL DIPHTHONG ae
14/07  231  347          LATIN SMALL LETTER c WITH CEDILLA
14/08  232  350          LATIN SMALL LETTER e WITH GRAVE ACCENT
14/09  233  351          LATIN SMALL LETTER e WITH ACUTE ACCENT
14/10  234  352          LATIN SMALL LETTER e WITH CIRCUMFLEX ACCENT
14/11  235  353          LATIN SMALL LETTER e WITH DIAERESIS
14/12  236  354          LATIN SMALL LETTER i WITH GRAVE ACCENT
14/13  237  355          LATIN SMALL LETTER i WITH ACUTE ACCENT
14/14  238  356          LATIN SMALL LETTER i WITH CIRCUMFLEX ACCENT
14/15  239  357          LATIN SMALL LETTER i WITH DIAERESIS
15/00  240  360          SMALL ICELANDIC LETTER ETH
15/01  241  361          LATIN SMALL LETTER n WITH TILDE
15/02  242  362          LATIN SMALL LETTER o WITH GRAVE ACCENT
15/03  243  363          LATIN SMALL LETTER o WITH ACUTE ACCENT
15/04  244  364          LATIN SMALL LETTER o WITH CIRCUMFLEX ACCENT
15/05  245  365          LATIN SMALL LETTER o WITH TILDE
15/06  246  366          LATIN SMALL LETTER o WITH DIAERESIS
15/07  247  367          DIVISION SIGN
15/08  248  370          LATIN SMALL LETTER o WITH OBLIQUE STROKE
15/09  249  371          LATIN SMALL LETTER u WITH GRAVE ACCENT
15/10  250  372          LATIN SMALL LETTER u WITH ACUTE ACCENT
15/11  251  373          LATIN SMALL LETTER u WITH CIRCUMFLEX ACCENT
15/12  252  374          LATIN SMALL LETTER u WITH DIAERESIS
15/13  253  375          LATIN SMALL LETTER y WITH ACUTE ACCENT
15/14  254  376          SMALL ICELANDIC LETTER THORN
15/15  255  377          LATIN SMALL LETTER y WITH DIAERESIS

===================================
Tim Lasko
Digital Equipment Corporation
Maynard, MA
(decvax!video.dec.com!lasko, lasko%video.dec@decwrl, lasko@video.dec.com)
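
A footnote on reading the code table: the R/C, Dec and Oct columns are three spellings of the same value, with the first number of the R/C pair supplying the high four bits and the second the low four bits. The following is only a rough sketch of that arithmetic in Python, not anything taken from the standard. The function names `code_position` and `table_columns` are made up for illustration, and the codec names `latin-1` and `iso8859_2` are Python's own labels for Parts 1 and 2, assumed here to match the published tables.

```python
import unicodedata


def code_position(rc: str) -> int:
    """Turn a table position such as '13/07' into its code value (0-255)."""
    high, low = (int(part) for part in rc.split("/"))
    if not (0 <= high <= 15 and 0 <= low <= 15):
        raise ValueError(f"position out of range: {rc!r}")
    # The first number is the high four bits, the second the low four bits.
    return high * 16 + low


def table_columns(rc: str) -> str:
    """Rebuild the R/C, Dec and Oct columns of the table for one position."""
    value = code_position(rc)
    return f"{rc}  {value:03d}  {value:03o}"


if __name__ == "__main__":
    # The two positions (D7 and F7 hex) once reserved for the OE/oe ligatures.
    for rc in ("13/07", "15/07"):
        value = code_position(rc)
        char = bytes([value]).decode("latin-1")
        print(table_columns(rc), f"{value:02X} hex", unicodedata.name(char))

    # Same byte, different member of the family: the right-hand side varies
    # (assumes Python's iso8859_2 codec reflects Part 2 of the standard).
    byte = bytes([code_position("14/06")])
    for codec in ("latin-1", "iso8859_2"):
        print(codec, unicodedata.name(byte.decode(codec)))

    # The left-hand (ASCII) side is shared by both parts.
    assert b"A".decode("latin-1") == b"A".decode("iso8859_2")
```

Running it prints the table columns for the two former ligature positions along with the names of the characters that now occupy them, then shows one right-hand byte decoding to different letters under Parts 1 and 2.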