lasko@video.dec.com (Tim Lasko[TBU Architecture]DTN 223-2186) (08/06/87)
First, I'd like to apologize to those who are waiting for answers:
unfortunately, I've been unable to read netnews postings for some time for
assorted reasons, and I further compounded the problem by having been out of
the office for a good portion of the last month, and giving an incorrect
net address.
I regret that I cannot send copies (hard or electronic) of the
standard to the literally dozens of people who have asked me for
one. ISO 8859-1 can be ordered in the U.S., like all ISO standards,
from the American National Standards Institute, 1430 Broadway, New
York, NY 10018 [(212) 354-3300; $18.00 for ISO 8859, Part 1; $16 for
Part 2]. Those of you outside of the U.S. should contact your
nation's standards body (e.g. BSI, DIN, NNI, etc.).
I have, however, included the list of characters of ISO Latin-1
(pending 8-bit ASCII) at the end of this message.
Let me first update the situation, as of the ISO working group meeting (ISO
TC97/SC2/WG2) held last month in Geneva: If you recall, ISO Latin-1 (IS 8859-1)
is actually one of an entire family of eight-bit one-byte character sets, all
having ASCII on the left hand side, and with varying repertoires on the right
hand side:
IS 8859-1         Latin Alphabet No 1      Western Europe, released 2/87
IS 8859-2         Latin Alphabet No 2      Eastern Europe, released 2/87
ISO DIS 8859-3    Latin Alphabet No 3      SE Europe, etc., ballot ends 10/87
ISO DIS 8859-4    Latin Alphabet No 4      Northern Europe, ballot ends 10/87 *
ISO DIS 8859-5.2  Latin-Cyrillic Alphabet  new ballot to be sent by 9/87 **
IS 8859-6         Latin-Arabic Alphabet    released 5/87 ***
IS 8859-7         Latin-Greek Alphabet     approved 7/87 ****
ISO DIS 8859-8    Latin-Hebrew Alphabet    ballot to be sent by 9/87 *****
* There is some sentiment among the Scandinavian countries to propose a
new code table to include a number of minority languages spoken in
that region which are not accommodated by the current '-4 draft. This
could mean that the draft will be re-issued if there is sufficient
consensus at the end of the year.
** At the working group meeting, a new USSR (GOST) code page was brought
by the Czechoslovakian member which made several changes from the
current '-5 draft; the members agreed that we wanted to be as close
as possible to the GOST standard, so a new draft ballot for '-5
is being issued. (The most significant difference, for information,
is that the GOST standard does not include a $ at 24 hex.)
*** According to the ANSI sales department, this part is not yet available.
I suggest waiting a few months before ordering.
**** All of the comments on the Latin-Greek ('-7) draft were handled at the
meeting, and publication should follow in a few months.
***** The Latin-Hebrew draft corresponds to the approved ECMA-118, and
the Israeli SII standard (not sure of number).
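To illustrate the "ASCII on the left, varying repertoire on the right"
layout described above, here is a small sketch. It assumes a modern Python
interpreter, whose bundled iso8859_N codecs reflect the standards as
eventually published and not necessarily the drafts discussed here; the
function name is my own invention.

```python
# Sketch: decode the same byte under each of the eight parts listed above.
# Codec names are the ones bundled with Python (an assumption about the
# reader's environment); they follow the published standards, not the drafts.
PARTS = ["iso8859_%d" % n for n in range(1, 9)]


def decode_in_all_parts(byte_value):
    """Return {codec name: character or None} for one byte value."""
    result = {}
    for codec in PARTS:
        try:
            result[codec] = bytes([byte_value]).decode(codec)
        except UnicodeDecodeError:
            # The position is unassigned in this part.
            result[codec] = None
    return result


# Left-hand side: the same character in every part.
left = decode_in_all_parts(0x41)
# Right-hand side: the character depends on the part.
right = decode_in_all_parts(0xA1)

print(left)
print(right)
```

The first dictionary should show the same letter for every part, while the
second should differ from part to part (and be empty where a part leaves
the position unassigned).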
To answer a number of questions:
1) "How can I get a hold of XXX?"
As I mentioned above, the approved IS 8859-1 and '-2 may be ordered
from ANSI (or your nation's standards body). IS 8859-6 and '-7
should be available through the same sources shortly. ISO DIS
8859-3, '-4, and probably the OLD '-5, are also available through
ANSI; call for details. ISO DIS 8859-8, of course, is not available
yet.
The following ECMA standards correspond to the indicated ISO
standards:
ECMA-94     IS 8859-1, '-2, ISO DIS 8859-3, '-4
ECMA-114    IS 8859-6
ECMA-117    IS 8859-7
ECMA-118    ISO DIS 8859-8
and the ECMA standards would be reasonable interim substitutes, as
the actual characters do not (or, in the case of drafts, are not
likely to) differ from the ISO standard. ECMA standards can be ordered
(free) from the European Computer Manufacturers Association, 114 Rue
du Rhone, 1204 Geneva, Switzerland. (The o in Rhone should have a
circumflex [^] over it [it would be great if the net understood
Latin-1.]) (Note: ECMA-113 is the Latin-Cyrillic alphabet of the
PREVIOUS Latin-Cyrillic ISO DIS 8859-5 draft; as mentioned above, a
new ISO draft is going to be issued, and I understand that the ECMA
standard will change.)
2) Why doesn't Latin-1 have an OE or oe ligature?
Early drafts of ISO 8859-1 did include the OE and oe ligature
characters. Many, including the U.S., felt that they were important
for the French language. However, in 1985, AFNOR, the French
member body to the ISO committee, stated that they could be removed,
since they are technically not part of the French alphabet, being
only a very commonly used presentation ligature. Since the purpose
of the standard was to be efficient for data processing, and
since OE and oe are always processed as the separate letters O
and E, there was no need to have a separate code for the ligature.
This resulted in the next draft having the two code positions (D7
and F7 hex) blank, and then for the DIS ballot they were filled with
multiplication sign and division sign. The filling was needed since
it was clear that leaving the two positions blank would lead vendors
to include private characters in those positions, leading to small
but difficult incompatibilities. The actual characters chosen were
a compromise from several candidates.
In late 1986, while under DIS ballot, the French did vote against
'8859-1, asking that OE and oe ligature be re-included. After the
fact, it was learned that the French PTT had not until 1986 been
participating in the AFNOR coding committee; once they were sitting
on the committee, the consensus within AFNOR changed to again ask
for those ligatures. Unfortunately, by this time it was too late to
make such a technical change to the standard. (Further, CEN/CENELEC
coding committees do not require the OE and oe ligatures, either.)
Therefore the OE and oe have never been restored to any of the
'8859 code tables.
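As a rough illustration of the point about processing OE and oe as
separate letters, here is a minimal sketch. It assumes a modern Python
with its bundled "latin-1" codec; the helper name and the mapping table
are hypothetical, not taken from any standard.

```python
# Positions D7 and F7 hex hold the multiplication and division signs.
times_sign = bytes([0xD7]).decode("latin-1")   # MULTIPLICATION SIGN
divide_sign = bytes([0xF7]).decode("latin-1")  # DIVISION SIGN

# Hypothetical mapping: spell each ligature out as two separate letters.
LIGATURES = {
    "\u0152": "OE",  # capital OE ligature
    "\u0153": "oe",  # small oe ligature
}


def to_latin1(text):
    """Encode text as Latin-1 after spelling out the OE/oe ligatures."""
    for ligature, letters in LIGATURES.items():
        text = text.replace(ligature, letters)
    return text.encode("latin-1")


# French "coeur" written with the ligature, then encoded without it.
encoded = to_latin1("c\u0153ur")
print(encoded)
```

Encoding the ligature directly, without the substitution step, fails
because Latin-1 has no code position for it.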
3) Were the special requirements of the Welsh language considered?
We (the U.S., ASC X3L2) realized a bit too late that certain
characters needed to properly represent the Welsh language (w and y
with circumflex) weren't conveniently available in any of the '8859
sets, and tried to change Part 4 to include them. However, there
was neither room nor consensus within the ISO committee to include
these, so these too do not exist in any of the '8859 code tables.
(Arguably, the BSI should have been looking out for the requirements
of Welsh, but for a number of reasons that I choose not to go into
here, they did not.)
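For anyone who wants to check this for themselves, the following sketch
looks for w and y with circumflex in each of the eight parts. It assumes
the iso8859_N codecs bundled with a modern Python, which follow the
standards as eventually published; the function name is my own.

```python
# Codec names bundled with Python for Parts 1 through 8 (an assumption
# about the reader's environment, not part of the standard itself).
PARTS = ["iso8859_%d" % n for n in range(1, 9)]


def parts_containing(char):
    """Return the list of parts in which char has a code position."""
    found = []
    for codec in PARTS:
        try:
            char.encode(codec)
        except UnicodeEncodeError:
            continue  # no code position for char in this part
        found.append(codec)
    return found


# W, w, Y, y with circumflex, written as escapes to stay 7-bit clean.
WELSH = ("\u0174", "\u0175", "\u0176", "\u0177")

for ch in WELSH:
    print("U+%04X" % ord(ch), parts_containing(ch))
```

Each of the four letters should come back with an empty list.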
Attached is the repertoire of ISO Latin Alphabet Nr 1 (IS 8859-1). I have
indicated an alternate name where there might be confusion in the U.S.
R/C - row/column of code table
Dec - Decimal
Oct - Octal
R/C Dec Oct Symbol Name
02/00 032 040 SP SPACE
02/01 033 041 ! EXCLAMATION POINT
02/02 034 042 " QUOTATION MARK
02/03 035 043 # NUMBER SIGN
02/04 036 044 $ DOLLAR SIGN
02/05 037 045 % PERCENT SIGN
02/06 038 046 & AMPERSAND
02/07 039 047 ' APOSTROPHE
02/08 040 050 ( LEFT PARENTHESIS
02/09 041 051 ) RIGHT PARENTHESIS
02/10 042 052 * ASTERISK
02/11 043 053 + PLUS SIGN
02/12 044 054 , COMMA
02/13 045 055 - HYPHEN, MINUS SIGN
02/14 046 056 . FULL STOP (U.S.: PERIOD, DECIMAL POINT)
02/15 047 057 / SOLIDUS (U.S.: SLASH)
03/00 048 060 0 DIGIT ZERO
03/01 049 061 1 DIGIT ONE
03/02 050 062 2 DIGIT TWO
03/03 051 063 3 DIGIT THREE
03/04 052 064 4 DIGIT FOUR
03/05 053 065 5 DIGIT FIVE
03/06 054 066 6 DIGIT SIX
03/07 055 067 7 DIGIT SEVEN
03/08 056 070 8 DIGIT EIGHT
03/09 057 071 9 DIGIT NINE
03/10 058 072 : COLON
03/11 059 073 ; SEMICOLON
03/12 060 074 < LESS-THAN SIGN
03/13 061 075 = EQUALS SIGN
03/14 062 076 > GREATER-THAN SIGN
03/15 063 077 ? QUESTION MARK
04/00 064 100 @ COMMERCIAL AT
04/01 065 101 A LATIN CAPITAL LETTER A
04/02 066 102 B LATIN CAPITAL LETTER B
04/03 067 103 C LATIN CAPITAL LETTER C
04/04 068 104 D LATIN CAPITAL LETTER D
04/05 069 105 E LATIN CAPITAL LETTER E
04/06 070 106 F LATIN CAPITAL LETTER F
04/07 071 107 G LATIN CAPITAL LETTER G
04/08 072 110 H LATIN CAPITAL LETTER H
04/09 073 111 I LATIN CAPITAL LETTER I
04/10 074 112 J LATIN CAPITAL LETTER J
04/11 075 113 K LATIN CAPITAL LETTER K
04/12 076 114 L LATIN CAPITAL LETTER L
04/13 077 115 M LATIN CAPITAL LETTER M
04/14 078 116 N LATIN CAPITAL LETTER N
04/15 079 117 O LATIN CAPITAL LETTER O
05/00 080 120 P LATIN CAPITAL LETTER P
05/01 081 121 Q LATIN CAPITAL LETTER Q
05/02 082 122 R LATIN CAPITAL LETTER R
05/03 083 123 S LATIN CAPITAL LETTER S
05/04 084 124 T LATIN CAPITAL LETTER T
05/05 085 125 U LATIN CAPITAL LETTER U
05/06 086 126 V LATIN CAPITAL LETTER V
05/07 087 127 W LATIN CAPITAL LETTER W
05/08 088 130 X LATIN CAPITAL LETTER X
05/09 089 131 Y LATIN CAPITAL LETTER Y
05/10 090 132 Z LATIN CAPITAL LETTER Z
05/11 091 133 [ LEFT SQUARE BRACKET
05/12 092 134 \ REVERSE SOLIDUS (U.S.: BACK SLASH)
05/13 093 135 ] RIGHT SQUARE BRACKET
05/14 094 136 ^ CIRCUMFLEX ACCENT
05/15 095 137 _ LOW LINE
06/00 096 140 ` GRAVE ACCENT
06/01 097 141 a LATIN SMALL LETTER a
06/02 098 142 b LATIN SMALL LETTER b
06/03 099 143 c LATIN SMALL LETTER c
06/04 100 144 d LATIN SMALL LETTER d
06/05 101 145 e LATIN SMALL LETTER e
06/06 102 146 f LATIN SMALL LETTER f
06/07 103 147 g LATIN SMALL LETTER g
06/08 104 150 h LATIN SMALL LETTER h
06/09 105 151 i LATIN SMALL LETTER i
06/10 106 152 j LATIN SMALL LETTER j
06/11 107 153 k LATIN SMALL LETTER k
06/12 108 154 l LATIN SMALL LETTER l
06/13 109 155 m LATIN SMALL LETTER m
06/14 110 156 n LATIN SMALL LETTER n
06/15 111 157 o LATIN SMALL LETTER o
07/00 112 160 p LATIN SMALL LETTER p
07/01 113 161 q LATIN SMALL LETTER q
07/02 114 162 r LATIN SMALL LETTER r
07/03 115 163 s LATIN SMALL LETTER s
07/04 116 164 t LATIN SMALL LETTER t
07/05 117 165 u LATIN SMALL LETTER u
07/06 118 166 v LATIN SMALL LETTER v
07/07 119 167 w LATIN SMALL LETTER w
07/08 120 170 x LATIN SMALL LETTER x
07/09 121 171 y LATIN SMALL LETTER y
07/10 122 172 z LATIN SMALL LETTER z
07/11 123 173 { LEFT CURLY BRACKET
07/12 124 174 | VERTICAL LINE
07/13 125 175 } RIGHT CURLY BRACKET
07/14 126 176 ~ TILDE
10/00 160 240 NBSP NO-BREAK SPACE
10/01 161 241 INVERTED EXCLAMATION MARK
10/02 162 242 CENT SIGN
10/03 163 243 POUND SIGN
10/04 164 244 CURRENCY SIGN
10/05 165 245 YEN SIGN
10/06 166 246 BROKEN BAR
10/07 167 247 PARAGRAPH SIGN, (U.S.) SECTION SIGN
10/08 168 250 DIAERESIS
10/09 169 251 COPYRIGHT SIGN
10/10 170 252 FEMININE ORDINAL INDICATOR
10/11 171 253 LEFT ANGLE QUOTATION MARK
10/12 172 254 NOT SIGN
10/13 173 255 SHY SOFT HYPHEN
10/14 174 256 REGISTERED TRADEMARK SIGN
10/15 175 257 MACRON
11/00 176 260 RING ABOVE, DEGREE SIGN
11/01 177 261 PLUS-MINUS SIGN
11/02 178 262 SUPERSCRIPT TWO
11/03 179 263 SUPERSCRIPT THREE
11/04 180 264 ACUTE ACCENT
11/05 181 265 MICRO SIGN
11/06 182 266 PILCROW SIGN, (U.S.) PARAGRAPH
11/07 183 267 MIDDLE DOT
11/08 184 270 CEDILLA
11/09 185 271 SUPERSCRIPT ONE
11/10 186 272 MASCULINE ORDINAL INDICATOR
11/11 187 273 RIGHT ANGLE QUOTATION MARK
11/12 188 274 VULGAR FRACTION ONE QUARTER
11/13 189 275 VULGAR FRACTION ONE HALF
11/14 190 276 VULGAR FRACTION THREE QUARTERS
11/15 191 277 INVERTED QUESTION MARK
12/00 192 300 LATIN CAPITAL LETTER A WITH GRAVE ACCENT
12/01 193 301 LATIN CAPITAL LETTER A WITH ACUTE ACCENT
12/02 194 302 LATIN CAPITAL LETTER A WITH CIRCUMFLEX ACCENT
12/03 195 303 LATIN CAPITAL LETTER A WITH TILDE
12/04 196 304 LATIN CAPITAL LETTER A WITH DIAERESIS
12/05 197 305 LATIN CAPITAL LETTER A WITH RING ABOVE
12/06 198 306 CAPITAL DIPHTHONG AE
12/07 199 307 LATIN CAPITAL LETTER C WITH CEDILLA
12/08 200 310 LATIN CAPITAL LETTER E WITH GRAVE ACCENT
12/09 201 311 LATIN CAPITAL LETTER E WITH ACUTE ACCENT
12/10 202 312 LATIN CAPITAL LETTER E WITH CIRCUMFLEX ACCENT
12/11 203 313 LATIN CAPITAL LETTER E WITH DIAERESIS
12/12 204 314 LATIN CAPITAL LETTER I WITH GRAVE ACCENT
12/13 205 315 LATIN CAPITAL LETTER I WITH ACUTE ACCENT
12/14 206 316 LATIN CAPITAL LETTER I WITH CIRCUMFLEX ACCENT
12/15 207 317 LATIN CAPITAL LETTER I WITH DIAERESIS
13/00 208 320 CAPITAL ICELANDIC LETTER ETH
13/01 209 321 LATIN CAPITAL LETTER N WITH TILDE
13/02 210 322 LATIN CAPITAL LETTER O WITH GRAVE ACCENT
13/03 211 323 LATIN CAPITAL LETTER O WITH ACUTE ACCENT
13/04 212 324 LATIN CAPITAL LETTER O WITH CIRCUMFLEX ACCENT
13/05 213 325 LATIN CAPITAL LETTER O WITH TILDE
13/06 214 326 LATIN CAPITAL LETTER O WITH DIAERESIS
13/07 215 327 MULTIPLICATION SIGN
13/08 216 330 LATIN CAPITAL LETTER O WITH OBLIQUE STROKE
13/09 217 331 LATIN CAPITAL LETTER U WITH GRAVE ACCENT
13/10 218 332 LATIN CAPITAL LETTER U WITH ACUTE ACCENT
13/11 219 333 LATIN CAPITAL LETTER U WITH CIRCUMFLEX ACCENT
13/12 220 334 LATIN CAPITAL LETTER U WITH DIAERESIS
13/13 221 335 LATIN CAPITAL LETTER Y WITH ACUTE ACCENT
13/14 222 336 CAPITAL ICELANDIC LETTER THORN
13/15 223 337 SMALL GERMAN LETTER SHARP s
14/00 224 340 LATIN SMALL LETTER a WITH GRAVE ACCENT
14/01 225 341 LATIN SMALL LETTER a WITH ACUTE ACCENT
14/02 226 342 LATIN SMALL LETTER a WITH CIRCUMFLEX ACCENT
14/03 227 343 LATIN SMALL LETTER a WITH TILDE
14/04 228 344 LATIN SMALL LETTER a WITH DIAERESIS
14/05 229 345 LATIN SMALL LETTER a WITH RING ABOVE
14/06 230 346 SMALL DIPHTHONG ae
14/07 231 347 LATIN SMALL LETTER c WITH CEDILLA
14/08 232 350 LATIN SMALL LETTER e WITH GRAVE ACCENT
14/09 233 351 LATIN SMALL LETTER e WITH ACUTE ACCENT
14/10 234 352 LATIN SMALL LETTER e WITH CIRCUMFLEX ACCENT
14/11 235 353 LATIN SMALL LETTER e WITH DIAERESIS
14/12 236 354 LATIN SMALL LETTER i WITH GRAVE ACCENT
14/13 237 355 LATIN SMALL LETTER i WITH ACUTE ACCENT
14/14 238 356 LATIN SMALL LETTER i WITH CIRCUMFLEX ACCENT
14/15 239 357 LATIN SMALL LETTER i WITH DIAERESIS
15/00 240 360 SMALL ICELANDIC LETTER ETH
15/01 241 361 LATIN SMALL LETTER n WITH TILDE
15/02 242 362 LATIN SMALL LETTER o WITH GRAVE ACCENT
15/03 243 363 LATIN SMALL LETTER o WITH ACUTE ACCENT
15/04 244 364 LATIN SMALL LETTER o WITH CIRCUMFLEX ACCENT
15/05 245 365 LATIN SMALL LETTER o WITH TILDE
15/06 246 366 LATIN SMALL LETTER o WITH DIAERESIS
15/07 247 367 DIVISION SIGN
15/08 248 370 LATIN SMALL LETTER o WITH OBLIQUE STROKE
15/09 249 371 LATIN SMALL LETTER u WITH GRAVE ACCENT
15/10 250 372 LATIN SMALL LETTER u WITH ACUTE ACCENT
15/11 251 373 LATIN SMALL LETTER u WITH CIRCUMFLEX ACCENT
15/12 252 374 LATIN SMALL LETTER u WITH DIAERESIS
15/13 253 375 LATIN SMALL LETTER y WITH ACUTE ACCENT
15/14 254 376 SMALL ICELANDIC LETTER THORN
15/15 255 377 LATIN SMALL LETTER y WITH DIAERESIS
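The R/C, Dec and Oct columns above are three spellings of the same
number: the first R/C figure is the high four bits and the second is the
low four bits, so the code is 16 times the first plus the second. A
minimal sketch of the conversion follows; the function names are my own,
and the hex column is an addition not present in the table.

```python
def parse_rc(rc):
    """Convert an 'NN/NN' table position into its integer code value."""
    high_text, low_text = rc.split("/")
    high, low = int(high_text), int(low_text)
    if not (0 <= high <= 15 and 0 <= low <= 15):
        raise ValueError("both parts of %r must be in 00..15" % rc)
    return high * 16 + low


def table_row(rc):
    """Format position, decimal, octal and (added here) hex for one entry."""
    code = parse_rc(rc)
    return "%s %03d %03o %02X" % (rc, code, code, code)


# The multiplication sign sits at position 13/07.
print(table_row("13/07"))   # prints "13/07 215 327 D7"
```

The same arithmetic run in reverse (divide by 16, keep the remainder)
recovers the table position from a byte value.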
===================================
Tim Lasko Digital Equipment Corporation Maynard, MA
(decvax!video.dec.com!lasko, lasko%video.dec@decwrl, lasko@video.dec.com)