[comp.std.internat] Follow-up on Re: ISO-Latin/1

lasko@video.dec.com (Tim Lasko[TBU Architecture]DTN 223-2186) (08/06/87)

First, I'd like to apologize to those who are waiting for answers.
Unfortunately, I've been unable to read netnews postings for some time, for
assorted reasons. I compounded the problem by being out of the office for a
good portion of the last month and by giving an incorrect net address.
                                                     
    I regret that I cannot send copies (hard or electronic) of the
    standard to the literally dozens of people who have asked me for
    one.  ISO 8859-1 can be ordered in the U.S., like all ISO standards,
    from the American National Standards Institute, 1430 Broadway, New
    York, NY 10018 [(212) 354-3300; $18.00 for ISO 8859, Part 1; $16 for
    Part 2]. Those of you outside of the U.S. should contact your
    nation's standards body (e.g. BSI, DIN, or NNI).
    
    I have, however, included the list of characters of ISO Latin-1
    (pending 8-bit ASCII) at the end of this message. 
            
Let me first update the situation, as of the ISO working group meeting (ISO
TC97/SC2/WG2) held last month in Geneva:  If you recall, ISO Latin-1 (IS 8859-1)
is actually one of an entire family of eight-bit one-byte character sets, all
having ASCII on the left hand side, and with varying repertoires on the right
hand side: 
     
IS 8859-1        Latin Alphabet No 1     Western Europe, released 2/87
IS 8859-2        Latin Alphabet No 2     Eastern Europe, released 2/87
ISO DIS 8859-3   Latin Alphabet No 3     SE Europe, etc., ballot ends 10/87
ISO DIS 8859-4   Latin Alphabet No 4     Northern Europe, ballot ends 10/87 *
ISO DIS 8859-5.2 Latin-Cyrillic Alphabet new ballot to be sent by 9/87     **
IS 8859-6        Latin-Arabic Alphabet   released 5/87                     ***
IS 8859-7        Latin-Greek Alphabet    approved 7/87                     ****
ISO DIS 8859-8   Latin-Hebrew Alphabet   ballot to be sent by 9/87         *****

*     There is some sentiment among the Scandinavian countries to propose a
      new code table to include a number of minority languages spoken in 
      that region which are not accommodated by the current '-4 draft.  This
      could mean that the draft will be re-issued if there is sufficient
      consensus at the end of the year.
                                                               
**    At the working group meeting, a new USSR (GOST) code page was brought 
      by the Czechoslovakian member which made several changes from the
      current '-5 draft; the members agreed that we wanted to be as close
      as possible to the GOST standard, so a new draft ballot for '-5
      is being issued.  (The most significant difference, for information,
      is that the GOST standard does not include a $ at 24 hex.)
                                                            
***   According to the ANSI sales department, this part is not yet available.
      I suggest waiting a few months before ordering.
                                                            
****  All of the comments on the Latin-Greek ('-7) draft were handled at the
      meeting, and publication should follow in a few months.
               
***** The Latin-Hebrew draft corresponds to the approved ECMA-118, and 
      the Israeli SII standard (not sure of number).  
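
The shared layout described above (ASCII on the left hand side, a different
repertoire on the right) can be checked mechanically. The following is only a
sketch: it assumes Python's standard iso8859_1 .. iso8859_8 codecs as
stand-ins for the eight parts, and those codecs reflect the standards as
eventually published, not the 1987 drafts discussed here. The function names
are made up for illustration.

```python
# Sketch only: Python's iso8859_N codecs stand in for the eight parts.
# They follow the published standards, not the 1987 drafts.

ASCII_GRAPHICS = bytes(range(0x20, 0x7F))   # positions 02/00 through 07/14
RIGHT_HAND_SIDE = bytes(range(0xA0, 0x100))  # positions 10/00 through 15/15


def left_half_matches_ascii(part: int) -> bool:
    """True if part N decodes the left-hand side exactly as ASCII does."""
    codec = f"iso8859_{part}"
    return ASCII_GRAPHICS.decode(codec) == ASCII_GRAPHICS.decode("ascii")


def right_half(part: int) -> str:
    """Decode the right-hand side; unassigned positions become U+FFFD."""
    return RIGHT_HAND_SIDE.decode(f"iso8859_{part}", errors="replace")


for part in range(1, 9):
    print(part, left_half_matches_ascii(part))

distinct = {right_half(part) for part in range(1, 9)}
print(len(distinct), "distinct right-hand sides")
```

Only the left-hand comparison is really being tested here; the right-hand
decode is included to show where the parts diverge.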
                                                               
To answer a number of questions:

1) "How can I get a hold of XXX?"

    As I mentioned above, the approved IS 8859-1 and '-2 may be ordered
    from ANSI (or your nation's standards body).  IS 8859-6 and '-7
    should be available through the same sources shortly.  ISO DIS
    8859-3, '-4, and probably the OLD '-5 are also available through
    ANSI; call for details.  ISO DIS 8859-8, of course, is not available
    yet. 
    
    The following ECMA standards correspond to the indicated ISO
    standards: 

     ECMA-94    IS 8859-1, '-2, ISO DIS 8859-3, '-4
     ECMA-114   IS 8859-6
     ECMA-117   IS 8859-7
     ECMA-118   ISO DIS 8859-8                        

    and the ECMA standards would be reasonable interim substitutes, as
    the actual characters do not differ (or, in the case of drafts, are
    not likely to differ) from the ISO standard. ECMA standards can be ordered
    (free) from the European Computer Manufacturers Association, 114 Rue
    du Rhone, 1204 Geneva, Switzerland.  (The o in Rhone should have a
    circumflex [^] over it [it would be great if the net understood
    Latin-1.]) (Note: ECMA-113 is the Latin-Cyrillic alphabet of the
    PREVIOUS Latin-Cyrillic ISO DIS 8859-5 draft; as mentioned above, a
    new ISO draft is going to be issued, and I understand that the ECMA
    standard will change.) 
                                                                       
2) Why doesn't Latin-1 have an OE or oe ligature?
                                
    Early drafts of ISO 8859-1 did include the OE and oe ligature
    characters.  Many, including the U.S., felt that they were important
    for the French language.  However, in 1985, AFNOR, the French
    member body to the ISO committee, stated that they could be removed,
    since they are technically not part of the French alphabet, being
    only a very commonly used presentation ligature. Since the purpose
    of the standard was to be efficient for data processing, and
    since OE and oe are always processed as the separate letters O
    and E, there was no need to have a separate code for the ligature.
    
    This resulted in the next draft having the two code positions (D7
    and F7 hex) blank, and then for the DIS ballot they were filled with
    multiplication sign and division sign.  The filling was needed since
    it was clear that leaving the two positions blank would lead vendors
    to include private characters in those positions, leading to small
    but difficult incompatibilities.  The actual characters chosen were
    a compromise from several candidates. 
                                                                      
    In late 1986, while under DIS ballot, the French did vote against
    '8859-1, asking that OE and oe ligature be re-included.  After the
    fact, it was learned that the French PTT had not until 1986 been
    participating in the AFNOR coding committee; once they were sitting
    on the committee, the consensus within AFNOR changed to again ask
    for those ligatures.  Unfortunately, by this time it was too late to
    make such a technical change to the standard. (Further, the CEN/CENELEC
    coding committees do not require the OE and oe ligatures, either.) 
    Therefore the OE and oe ligatures have never been restored to any of
    the '8859 code tables.
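
To make the outcome concrete, here is a small sketch of what sits at D7 and
F7 hex and of what happens to text containing the ligature. It assumes
Python's "latin-1" codec as a stand-in for IS 8859-1; the helper names are
invented for this example, and the ligatures are written as escapes so the
source stays plain ASCII.

```python
import unicodedata

# Sketch only: Python's "latin-1" codec stands in for IS 8859-1.
OE_CAPITAL = "\u0152"  # OE ligature (not in Latin-1)
OE_SMALL = "\u0153"    # oe ligature (not in Latin-1)


def latin1_symbol(code: int) -> str:
    """Return the character that Latin-1 assigns to one byte value."""
    return bytes([code]).decode("latin-1")


def fits_latin1(text: str) -> bool:
    """True if every character of text has a Latin-1 code position."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def split_ligatures(text: str) -> str:
    """Store the ligatures as the separate letters O and E."""
    return text.replace(OE_CAPITAL, "OE").replace(OE_SMALL, "oe")


for code in (0xD7, 0xF7):
    print(f"{code:02X}", unicodedata.name(latin1_symbol(code)))

word = "c" + OE_SMALL + "ur"
print(fits_latin1(word), fits_latin1(split_ligatures(word)))
```

Splitting the ligature into two letters is the data-processing view described
above; rendering it as a ligature again would be left to the presentation
layer.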
                        
3) Were the special requirements of the Welsh language considered?
                                                            
    We (the U.S., ASC X3L2) realized a bit too late that certain
    characters needed to properly represent the Welsh language (w and y
    with circumflex) weren't conveniently available in any of the '8859
    sets, and tried to change Part 4 to include them.  However, there
    was neither room nor consensus within the ISO committee to include
    these, so these too do not exist in any of the '8859 code tables.
    (Arguably, the BSI should have been looking out for the requirements
    of Welsh, but for a number of reasons that I choose not to go into
    here, they did not.) 
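
The gap can be demonstrated with a short sketch that asks which parts have a
code position for a given character. It assumes Python's iso8859_1 ..
iso8859_8 codecs as stand-ins for the eight parts listed in this message
(they follow the published standards, not the drafts), and the function name
is hypothetical.

```python
# Sketch only: Python's iso8859_N codecs stand in for parts 1 through 8.
WELSH_LETTERS = {
    "W with circumflex": "\u0174",
    "w with circumflex": "\u0175",
    "Y with circumflex": "\u0176",
    "y with circumflex": "\u0177",
}


def parts_containing(ch: str, parts=range(1, 9)) -> list:
    """List the part numbers whose code table can encode ch."""
    found = []
    for part in parts:
        try:
            ch.encode(f"iso8859_{part}")
        except UnicodeEncodeError:
            continue  # no code position for ch in this part
        found.append(part)
    return found


for name, ch in WELSH_LETTERS.items():
    print(name, parts_containing(ch))

# For comparison, a with circumflex is available in several parts.
print("a with circumflex", parts_containing("\u00e2"))
```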

Attached is the repertoire of ISO Latin Alphabet Nr 1 (IS 8859-1). I have
indicated an alternate name where there might be confusion in the U.S.

R/C - position in the code table, written column/row (02/00 is column 2, row 0)
Dec - Decimal
Oct - Octal
    
 R/C  Dec Oct Symbol Name 
                     
02/00 032 040   SP   SPACE
02/01 033 041   !    EXCLAMATION POINT
02/02 034 042   "    QUOTATION MARK
02/03 035 043   #    NUMBER SIGN
02/04 036 044   $    DOLLAR SIGN
02/05 037 045   %    PERCENT SIGN
02/06 038 046   &    AMPERSAND
02/07 039 047   '    APOSTROPHE
02/08 040 050   (    LEFT PARENTHESIS                     
02/09 041 051   )    RIGHT PARENTHESIS                          
02/10 042 052   *    ASTERISK
02/11 043 053   +    PLUS SIGN
02/12 044 054   ,    COMMA
02/13 045 055   -    HYPHEN, MINUS SIGN                                   
02/14 046 056   .    FULL STOP   (U.S.: PERIOD, DECIMAL POINT)
02/15 047 057   /    SOLIDUS     (U.S.: SLASH)
                     
03/00 048 060   0    DIGIT ZERO                                   
03/01 049 061   1    DIGIT ONE                                    
03/02 050 062   2    DIGIT TWO                                    
03/03 051 063   3    DIGIT THREE                                  
03/04 052 064   4    DIGIT FOUR                                   
03/05 053 065   5    DIGIT FIVE                                   
03/06 054 066   6    DIGIT SIX                                    
03/07 055 067   7    DIGIT SEVEN                                  
03/08 056 070   8    DIGIT EIGHT                                  
03/09 057 071   9    DIGIT NINE                                   
03/10 058 072   :    COLON
03/11 059 073   ;    SEMICOLON
03/12 060 074   <    LESS-THAN SIGN                               
03/13 061 075   =    EQUALS SIGN
03/14 062 076   >    GREATER-THAN SIGN                            
03/15 063 077   ?    QUESTION MARK
                                           
04/00 064 100   @    COMMERCIAL AT
04/01 065 101   A    LATIN CAPITAL LETTER A
04/02 066 102   B    LATIN CAPITAL LETTER B
04/03 067 103   C    LATIN CAPITAL LETTER C
04/04 068 104   D    LATIN CAPITAL LETTER D
04/05 069 105   E    LATIN CAPITAL LETTER E
04/06 070 106   F    LATIN CAPITAL LETTER F
04/07 071 107   G    LATIN CAPITAL LETTER G
04/08 072 110   H    LATIN CAPITAL LETTER H
04/09 073 111   I    LATIN CAPITAL LETTER I
04/10 074 112   J    LATIN CAPITAL LETTER J
04/11 075 113   K    LATIN CAPITAL LETTER K
04/12 076 114   L    LATIN CAPITAL LETTER L
04/13 077 115   M    LATIN CAPITAL LETTER M
04/14 078 116   N    LATIN CAPITAL LETTER N
04/15 079 117   O    LATIN CAPITAL LETTER O
                     
05/00 080 120   P    LATIN CAPITAL LETTER P
05/01 081 121   Q    LATIN CAPITAL LETTER Q
05/02 082 122   R    LATIN CAPITAL LETTER R
05/03 083 123   S    LATIN CAPITAL LETTER S
05/04 084 124   T    LATIN CAPITAL LETTER T
05/05 085 125   U    LATIN CAPITAL LETTER U
05/06 086 126   V    LATIN CAPITAL LETTER V
05/07 087 127   W    LATIN CAPITAL LETTER W
05/08 088 130   X    LATIN CAPITAL LETTER X
05/09 089 131   Y    LATIN CAPITAL LETTER Y
05/10 090 132   Z    LATIN CAPITAL LETTER Z
05/11 091 133   [    LEFT SQUARE BRACKET                        
05/12 092 134   \    REVERSE SOLIDUS    (U.S.: BACK SLASH)
05/13 093 135   ]    RIGHT SQUARE BRACKET                        
05/14 094 136   ^    CIRCUMFLEX ACCENT
05/15 095 137   _    LOW LINE           
                     
06/00 096 140   `    GRAVE ACCENT       
06/01 097 141   a    LATIN SMALL LETTER a
06/02 098 142   b    LATIN SMALL LETTER b
06/03 099 143   c    LATIN SMALL LETTER c
06/04 100 144   d    LATIN SMALL LETTER d
06/05 101 145   e    LATIN SMALL LETTER e
06/06 102 146   f    LATIN SMALL LETTER f
06/07 103 147   g    LATIN SMALL LETTER g
06/08 104 150   h    LATIN SMALL LETTER h
06/09 105 151   i    LATIN SMALL LETTER i
06/10 106 152   j    LATIN SMALL LETTER j
06/11 107 153   k    LATIN SMALL LETTER k
06/12 108 154   l    LATIN SMALL LETTER l
06/13 109 155   m    LATIN SMALL LETTER m
06/14 110 156   n    LATIN SMALL LETTER n
06/15 111 157   o    LATIN SMALL LETTER o
                     
07/00 112 160   p    LATIN SMALL LETTER p
07/01 113 161   q    LATIN SMALL LETTER q
07/02 114 162   r    LATIN SMALL LETTER r
07/03 115 163   s    LATIN SMALL LETTER s
07/04 116 164   t    LATIN SMALL LETTER t
07/05 117 165   u    LATIN SMALL LETTER u
07/06 118 166   v    LATIN SMALL LETTER v
07/07 119 167   w    LATIN SMALL LETTER w
07/08 120 170   x    LATIN SMALL LETTER x
07/09 121 171   y    LATIN SMALL LETTER y
07/10 122 172   z    LATIN SMALL LETTER z
07/11 123 173   {    LEFT CURLY BRACKET                         
07/12 124 174   |    VERTICAL LINE
07/13 125 175   }    RIGHT CURLY BRACKET                         
07/14 126 176   ~    TILDE
                     
10/00 160 240  NBSP  NO-BREAK SPACE 
10/01 161 241        INVERTED EXCLAMATION MARK
10/02 162 242        CENT SIGN
10/03 163 243        POUND SIGN
10/04 164 244        CURRENCY SIGN                                
10/05 165 245        YEN SIGN
10/06 166 246        BROKEN BAR                                   
10/07 167 247        PARAGRAPH SIGN, (U.S.) SECTION SIGN 
10/08 168 250        DIAERESIS
10/09 169 251        COPYRIGHT SIGN
10/10 170 252        FEMININE ORDINAL INDICATOR
10/11 171 253        LEFT ANGLE QUOTATION MARK
10/12 172 254        NOT SIGN                                     
10/13 173 255   SHY  SOFT HYPHEN                               
10/14 174 256        REGISTERED TRADEMARK SIGN                   
10/15 175 257        MACRON                                       
                     
11/00 176 260        RING ABOVE, DEGREE SIGN
11/01 177 261        PLUS-MINUS SIGN
11/02 178 262        SUPERSCRIPT TWO
11/03 179 263        SUPERSCRIPT THREE
11/04 180 264        ACUTE ACCENT                                 
11/05 181 265        MICRO SIGN
11/06 182 266        PILCROW SIGN, (U.S.) PARAGRAPH
11/07 183 267        MIDDLE DOT                      
11/08 184 270        CEDILLA
11/09 185 271        SUPERSCRIPT ONE
11/10 186 272        MASCULINE ORDINAL INDICATOR
11/11 187 273        RIGHT ANGLE QUOTATION MARK
11/12 188 274        VULGAR FRACTION ONE QUARTER
11/13 189 275        VULGAR FRACTION ONE HALF
11/14 190 276        VULGAR FRACTION THREE QUARTERS               
11/15 191 277        INVERTED QUESTION MARK
                     
12/00 192 300        LATIN CAPITAL LETTER A WITH GRAVE ACCENT
12/01 193 301        LATIN CAPITAL LETTER A WITH ACUTE ACCENT
12/02 194 302        LATIN CAPITAL LETTER A WITH CIRCUMFLEX ACCENT
12/03 195 303        LATIN CAPITAL LETTER A WITH TILDE
12/04 196 304        LATIN CAPITAL LETTER A WITH DIAERESIS
12/05 197 305        LATIN CAPITAL LETTER A WITH RING ABOVE
12/06 198 306        CAPITAL DIPHTHONG AE
12/07 199 307        LATIN CAPITAL LETTER C WITH CEDILLA
12/08 200 310        LATIN CAPITAL LETTER E WITH GRAVE ACCENT 
12/09 201 311        LATIN CAPITAL LETTER E WITH ACUTE ACCENT 
12/10 202 312        LATIN CAPITAL LETTER E WITH CIRCUMFLEX ACCENT
12/11 203 313        LATIN CAPITAL LETTER E WITH DIAERESIS
12/12 204 314        LATIN CAPITAL LETTER I WITH GRAVE ACCENT 
12/13 205 315        LATIN CAPITAL LETTER I WITH ACUTE ACCENT 
12/14 206 316        LATIN CAPITAL LETTER I WITH CIRCUMFLEX ACCENT
12/15 207 317        LATIN CAPITAL LETTER I WITH DIAERESIS
                     
13/00 208 320        CAPITAL ICELANDIC LETTER ETH                 
13/01 209 321        LATIN CAPITAL LETTER N WITH TILDE
13/02 210 322        LATIN CAPITAL LETTER O WITH GRAVE ACCENT 
13/03 211 323        LATIN CAPITAL LETTER O WITH ACUTE ACCENT 
13/04 212 324        LATIN CAPITAL LETTER O WITH CIRCUMFLEX ACCENT
13/05 213 325        LATIN CAPITAL LETTER O WITH TILDE
13/06 214 326        LATIN CAPITAL LETTER O WITH DIAERESIS
13/07 215 327        MULTIPLICATION SIGN                          
13/08 216 330        LATIN CAPITAL LETTER O WITH OBLIQUE STROKE
13/09 217 331        LATIN CAPITAL LETTER U WITH GRAVE ACCENT 
13/10 218 332        LATIN CAPITAL LETTER U WITH ACUTE ACCENT 
13/11 219 333        LATIN CAPITAL LETTER U WITH CIRCUMFLEX ACCENT
13/12 220 334        LATIN CAPITAL LETTER U WITH DIAERESIS
13/13 221 335        LATIN CAPITAL LETTER Y WITH ACUTE ACCENT  
13/14 222 336        CAPITAL ICELANDIC LETTER THORN               
13/15 223 337        SMALL GERMAN LETTER SHARP s
                     
14/00 224 340        LATIN SMALL LETTER a WITH GRAVE ACCENT
14/01 225 341        LATIN SMALL LETTER a WITH ACUTE ACCENT
14/02 226 342        LATIN SMALL LETTER a WITH CIRCUMFLEX ACCENT
14/03 227 343        LATIN SMALL LETTER a WITH TILDE
14/04 228 344        LATIN SMALL LETTER a WITH DIAERESIS
14/05 229 345        LATIN SMALL LETTER a WITH RING ABOVE
14/06 230 346        SMALL DIPHTHONG ae
14/07 231 347        LATIN SMALL LETTER c WITH CEDILLA
14/08 232 350        LATIN SMALL LETTER e WITH GRAVE ACCENT
14/09 233 351        LATIN SMALL LETTER e WITH ACUTE ACCENT
14/10 234 352        LATIN SMALL LETTER e WITH CIRCUMFLEX ACCENT
14/11 235 353        LATIN SMALL LETTER e WITH DIAERESIS
14/12 236 354        LATIN SMALL LETTER i WITH GRAVE ACCENT
14/13 237 355        LATIN SMALL LETTER i WITH ACUTE ACCENT
14/14 238 356        LATIN SMALL LETTER i WITH CIRCUMFLEX ACCENT
14/15 239 357        LATIN SMALL LETTER i WITH DIAERESIS
                     
15/00 240 360        SMALL ICELANDIC LETTER ETH                   
15/01 241 361        LATIN SMALL LETTER n WITH TILDE
15/02 242 362        LATIN SMALL LETTER o WITH GRAVE ACCENT
15/03 243 363        LATIN SMALL LETTER o WITH ACUTE ACCENT
15/04 244 364        LATIN SMALL LETTER o WITH CIRCUMFLEX ACCENT
15/05 245 365        LATIN SMALL LETTER o WITH TILDE
15/06 246 366        LATIN SMALL LETTER o WITH DIAERESIS
15/07 247 367        DIVISION SIGN                                
15/08 248 370        LATIN SMALL LETTER o WITH OBLIQUE STROKE
15/09 249 371        LATIN SMALL LETTER u WITH GRAVE ACCENT
15/10 250 372        LATIN SMALL LETTER u WITH ACUTE ACCENT
15/11 251 373        LATIN SMALL LETTER u WITH CIRCUMFLEX ACCENT
15/12 252 374        LATIN SMALL LETTER u WITH DIAERESIS
15/13 253 375        LATIN SMALL LETTER y WITH ACUTE ACCENT       
15/14 254 376        SMALL ICELANDIC LETTER THORN                 
15/15 255 377        LATIN SMALL LETTER y WITH DIAERESIS          
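
The three numeric columns in the table are the same value written three ways:
the first field of the position times 16, plus the second field, gives the
decimal code. A minimal sketch of that arithmetic follows; the function names
are illustrative only.

```python
def parse_position(position: str) -> int:
    """Convert a table position such as "13/07" to its code value."""
    high, low = (int(field) for field in position.split("/"))
    if not (0 <= high <= 15 and 0 <= low <= 15):
        raise ValueError(f"position out of range: {position!r}")
    return high * 16 + low


def table_row(position: str) -> str:
    """Format position, decimal and octal the way the table does."""
    code = parse_position(position)
    return f"{position} {code:03d} {code:03o}"


def hex_code(position: str) -> str:
    """Two-digit hex form, as used for D7 and F7 earlier in the message."""
    return f"{parse_position(position):02X}"


for position in ("02/00", "13/07", "15/07", "15/15"):
    print(table_row(position), hex_code(position))
```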
                     
===================================
Tim Lasko  Digital Equipment Corporation  Maynard, MA                    
(decvax!video.dec.com!lasko, lasko%video.dec@decwrl, lasko@video.dec.com)