[comp.lang.c] ISO Latin-1 Character Set

minow@decvax.UUCP (Martin Minow) (01/12/87)

In light of the recent discussion of character sets (as they impact
the Draft Ansi C Standard), you may find this listing of the ISO
Latin-1 character set useful.

The ISO Latin-1 character set (ISO DIS 8859/1) is identical to the Draft
Ansi standard (ANSI BSR X3.134.2) which is likely to be approved early
this year.  ISO Latin-1 contains characters for the following languages: 

    Danish, Dutch, English, Faeroese, Finnish, French, German, Icelandic,
    Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish 

as used in the following countries:

    Argentina, Australia, Austria, Belgium, Belize, Bolivia, Brazil, Canada,
    Chile, Colombia, Costa Rica, Cuba, Denmark, Ecuador, Faroe Islands,
    Finland, Germany, Guatemala, Guyana, Honduras, Iceland, Ireland, Italy,
    Liechtenstein, Luxembourg, The Netherlands, Mexico, New Zealand,
    Nicaragua, Norway, Panama, Paraguay, Peru, Portugal, El Salvador, Spain,
    Surinam, France, Sweden, Switzerland, United Kingdom, United States,
    Uruguay, Venezuela. 

The coding of each character is represented by one octet. The value of
an octet is given in decimal notation and in a standard "column/row"
notation. 

The column/row notation is a decimalized hex notation and follows the
notation of ANSI and ISO coding standards for 8-bit coding wherein coded
characters are allocated to a 16 column code table with 16 rows each.
The first number is the column number of the 16 x 16 code table and
corresponds to bits 7 to 4 expressed as a decimal number from 0 to 15.
The second number is the row number of the 16 x 16 code table and
corresponds to bits 3 to 0 expressed as a decimal number from 0 to 15.
The values of this column/row notation run from 00/00 to 15/15,
corresponding to the decimal notation 000 to 255. 

In the following tables, the first column indicates the decimal coding
of the 8-bit octet, the second column indicates column/row notation used
in ANSI and ISO coding standards, the third column indicates one of the
COMPOSE sequence key pairs used to input characters that may not be
present directly on the keyboard, the fourth column is the graphic
symbol for the character, the fifth column is the full name in upper
case, as is the convention in ANSI and ISO standards, and the last
column indicates notes. (The COMPOSE sequence is implemented on DEC
video and hardcopy terminals. It allows a terminal to generate all ISO
Latin-1 characters, even if a character does not appear directly on the
keyboard.)

Note that the fourth column contains the graphic symbol in the DEC
Multinational (eight-bit) encoding and may display as garbage on
some systems.

Dec- Column        Character Name
imal  /row

032  02/00         SPACE
033  02/01      !  EXCLAMATION POINT
034  02/02      "  QUOTATION MARK
035  02/03      #  NUMBER SIGN
036  02/04      $  DOLLAR SIGN
037  02/05      %  PERCENT SIGN
038  02/06      &  AMPERSAND
039  02/07      '  APOSTROPHE, RIGHT SINGLE-QUOTATION MARK
040  02/08      (  OPENING PARENTHESIS
041  02/09      )  CLOSING PARENTHESIS
042  02/10      *  ASTERISK
043  02/11      +  PLUS SIGN
044  02/12      ,  COMMA
045  02/13      -  HYPHEN, MINUS SIGN
046  02/14      .  PERIOD, DECIMAL POINT
047  02/15      /  SLASH

048  03/00      0  DIGIT ZERO
049  03/01      1  DIGIT ONE
050  03/02      2  DIGIT TWO
051  03/03      3  DIGIT THREE
052  03/04      4  DIGIT FOUR
053  03/05      5  DIGIT FIVE
054  03/06      6  DIGIT SIX
055  03/07      7  DIGIT SEVEN
056  03/08      8  DIGIT EIGHT
057  03/09      9  DIGIT NINE
058  03/10      :  COLON
059  03/11      ;  SEMICOLON
060  03/12      <  LESS-THAN SIGN
061  03/13      =  EQUALS SIGN
062  03/14      >  GREATER-THAN SIGN
063  03/15      ?  QUESTION MARK

064  04/00      @  COMMERCIAL AT
065  04/01      A  LATIN CAPITAL LETTER A-Z (through 090 05/10)
091  05/11      [  OPENING SQUARE BRACKET
092  05/12      \  BACK SLASH
093  05/13      ]  CLOSING SQUARE BRACKET
094  05/14      ^  CIRCUMFLEX ACCENT, UPWARD ARROW HEAD
095  05/15      _  UNDERLINE

096  06/00      `  GRAVE ACCENT, LEFT SINGLE QUOTATION MARK
097  06/01      a  LATIN SMALL LETTER a-z (through 122 07/10)
123  07/11      {  OPENING CURLY BRACKET
124  07/12      |  VERTICAL LINE
125  07/13      }  CLOSING CURLY BRACKET
126  07/14      ~  TILDE

160  10/00 <SP><SP>   NO-BREAK SPACE (NBSP)
161  10/01  !!  !  INVERTED EXCLAMATION MARK
162  10/02  c/  "  CENT SIGN
163  10/03  L-  #  POUND SIGN
164  10/04  XO  (  CURRENCY SIGN
165  10/05  Y-  %  YEN SIGN
166  10/06  ||     BROKEN BAR
167  10/07  SO  '  SECTION SIGN
168  10/08  ""     DIAERESIS
169  10/09  co  )  COPYRIGHT SIGN
170  10/10  a_  *  FEMININE ORDINAL INDICATOR
171  10/11  <<  +  LEFT ANGLE QUOTATION MARK
172  10/12  -,     NOT SIGN
173  10/13  --     SOFT HYPHEN
174  10/14  RO     REGISTERED TRADE MARK SIGN
175  10/15  -^     MACRON

176  11/00  0^  0  RING ABOVE, DEGREE SIGN
177  11/01  +-  1  PLUS-MINUS SIGN
178  11/02  2^  2  SUPERSCRIPT TWO
179  11/03  3^  3  SUPERSCRIPT THREE
180  11/04  ''     ACUTE ACCENT
181  11/05  /u  5  MICRO SIGN
182  11/06  P!  6  PARAGRAPH SIGN, PILCROW SIGN
183  11/07  .^  7  MIDDLE DOT
184  11/08  ,,     CEDILLA
185  11/09  1^  9  SUPERSCRIPT ONE
186  11/10  o_  :  MASCULINE ORDINAL INDICATOR
187  11/11  >>  ;  RIGHT ANGLE QUOTATION MARK
188  11/12  14  <  VULGAR FRACTION ONE QUARTER
189  11/13  12  =  VULGAR FRACTION ONE HALF
190  11/14  34     VULGAR FRACTION THREE QUARTERS
191  11/15  ??  ?  INVERTED QUESTION MARK

192  12/00  A`  @  LATIN CAPITAL LETTER A WITH GRAVE ACCENT
193  12/01  A'  A  LATIN CAPITAL LETTER A WITH ACUTE ACCENT
194  12/02  A^  B  LATIN CAPITAL LETTER A WITH CIRCUMFLEX ACCENT
195  12/03  A~  C  LATIN CAPITAL LETTER A WITH TILDE
196  12/04  A"  D  LATIN CAPITAL LETTER A WITH DIAERESIS
197  12/05  A*  E  LATIN CAPITAL LETTER A WITH RING ABOVE
198  12/06  AE  F  CAPITAL DIPHTHONG AE
199  12/07  C,  G  LATIN CAPITAL LETTER C WITH CEDILLA
200  12/08  E`  H  LATIN CAPITAL LETTER E WITH GRAVE ACCENT
201  12/09  E'  I  LATIN CAPITAL LETTER E WITH ACUTE ACCENT
202  12/10  E^  J  LATIN CAPITAL LETTER E WITH CIRCUMFLEX ACCENT
203  12/11  E"  K  LATIN CAPITAL LETTER E WITH DIAERESIS
204  12/12  I`  L  LATIN CAPITAL LETTER I WITH GRAVE ACCENT
205  12/13  I'  M  LATIN CAPITAL LETTER I WITH ACUTE ACCENT
206  12/14  I^  N  LATIN CAPITAL LETTER I WITH CIRCUMFLEX ACCENT
207  12/15  I"  O  LATIN CAPITAL LETTER I WITH DIAERESIS

208  13/00  D-     CAPITAL ICELANDIC LETTER ETH
209  13/01  N~  Q  LATIN CAPITAL LETTER N WITH TILDE
210  13/02  O`  R  LATIN CAPITAL LETTER O WITH GRAVE ACCENT
211  13/03  O'  S  LATIN CAPITAL LETTER O WITH ACUTE ACCENT
212  13/04  O^  T  LATIN CAPITAL LETTER O WITH CIRCUMFLEX ACCENT
213  13/05  O~  U  LATIN CAPITAL LETTER O WITH TILDE
214  13/06  O"  V  LATIN CAPITAL LETTER O WITH DIAERESIS
215  13/07  xx     MULTIPLICATION SIGN
216  13/08  O/  X  LATIN CAPITAL LETTER O WITH OBLIQUE STROKE
217  13/09  U`  Y  LATIN CAPITAL LETTER U WITH GRAVE ACCENT
218  13/10  U'  Z  LATIN CAPITAL LETTER U WITH ACUTE ACCENT
219  13/11  U^  [  LATIN CAPITAL LETTER U WITH CIRCUMFLEX ACCENT
220  13/12  U"  \  LATIN CAPITAL LETTER U WITH DIAERESIS
221  13/13  Y'     LATIN CAPITAL LETTER Y WITH ACUTE ACCENT
222  13/14  TH     CAPITAL ICELANDIC LETTER THORN
223  13/15  ss  _  SMALL GERMAN LETTER SHARP s

224  14/00  a`  `  LATIN SMALL LETTER a WITH GRAVE ACCENT
225  14/01  a'  a  LATIN SMALL LETTER a WITH ACUTE ACCENT
226  14/02  a^  b  LATIN SMALL LETTER a WITH CIRCUMFLEX ACCENT
227  14/03  a~  c  LATIN SMALL LETTER a WITH TILDE
228  14/04  a"  d  LATIN SMALL LETTER a WITH DIAERESIS
229  14/05  a*  e  LATIN SMALL LETTER a WITH RING ABOVE
230  14/06  ae  f  SMALL DIPHTHONG ae
231  14/07  c,  g  LATIN SMALL LETTER c WITH CEDILLA
232  14/08  e`  h  LATIN SMALL LETTER e WITH GRAVE ACCENT
233  14/09  e'  i  LATIN SMALL LETTER e WITH ACUTE ACCENT
234  14/10  e^  j  LATIN SMALL LETTER e WITH CIRCUMFLEX ACCENT
235  14/11  e"  k  LATIN SMALL LETTER e WITH DIAERESIS
236  14/12  i`  l  LATIN SMALL LETTER i WITH GRAVE ACCENT
237  14/13  i'  m  LATIN SMALL LETTER i WITH ACUTE ACCENT
238  14/14  i^  n  LATIN SMALL LETTER i WITH CIRCUMFLEX ACCENT
239  14/15  i"  o  LATIN SMALL LETTER i WITH DIAERESIS

240  15/00  d-     SMALL ICELANDIC LETTER ETH
241  15/01  n~  q  LATIN SMALL LETTER n WITH TILDE
242  15/02  o`  r  LATIN SMALL LETTER o WITH GRAVE ACCENT
243  15/03  o'  s  LATIN SMALL LETTER o WITH ACUTE ACCENT
244  15/04  o^  t  LATIN SMALL LETTER o WITH CIRCUMFLEX ACCENT
245  15/05  o~  u  LATIN SMALL LETTER o WITH TILDE
246  15/06  o"  v  LATIN SMALL LETTER o WITH DIAERESIS
247  15/07  -:     DIVISION SIGN
248  15/08  o/  x  LATIN SMALL LETTER o WITH OBLIQUE STROKE
249  15/09  u`  y  LATIN SMALL LETTER u WITH GRAVE ACCENT
250  15/10  u'  z  LATIN SMALL LETTER u WITH ACUTE ACCENT
251  15/11  u^  {  LATIN SMALL LETTER u WITH CIRCUMFLEX ACCENT
252  15/12  u"  |  LATIN SMALL LETTER u WITH DIAERESIS
253  15/13  y'     LATIN SMALL LETTER y WITH ACUTE ACCENT
254  15/14  th     SMALL ICELANDIC LETTER THORN
255  15/15  y"     LATIN SMALL LETTER y WITH DIAERESIS

For the record, this note does not represent the position of
Digital Equipment Corporation.

Martin Minow
decvax!minow

rbutterworth@watmath.UUCP (01/14/87)

According to old ASCII tables, the 096 character is defined as a
grave accent.  Some terminals and printers have begun displaying it
as a left-quote (sometimes like the figure 6 and sometimes like
an upside down 6 both with the tail pointing to the right).  Judging
by many of the net articles, this practice seems to be becoming quite
popular.  (On a standard terminal, the text appears to look more like
~~quote'' than what was intended though.  It is quite ugly anyway.)
I admit the left-quote usage is much more useful, but it would be
nice if manufacturers adhered to the official standards.

In article <6@decvax.UUCP>, minow@decvax.UUCP (Martin Minow) writes:
> The ISO Latin-1 character set (ISO DIS 8859/1) is identical to the Draft
> Ansi standard (ANSI BSR X3.134.2) which is likely to be approved early
> this year.
>
> 039  02/07      '  APOSTROPHE, RIGHT SINGLE-QUOTATION MARK
> 096  06/00      `  GRAVE ACCENT, LEFT SINGLE QUOTATION MARK
> 180  11/04  ''     ACUTE ACCENT

Are these going to be the official definitions?  I haven't heard any
of the discussions that went into designing this standard, but what
is here only seems to complicate the current mess.

039 as both apostrophe and right-quote is fine, since the two do look
the same (at least in English).  (Or 039 could even be a symmetric
vertical stroke like the double-quote.)  But what on earth is 096
supposed to look like?  The existence of 180, the acute accent, implies
that it should look like its mirror image, the grave accent, yet the
existence of 039, the right-quote, implies that it should look like
its mirror image, the left-quote.

If the standard is going to provide acute accents, it should also
provide grave accents.  If the standard is going to provide
right-quotes, it should also provide left-quotes.  How do they
expect the same character to fill the two jobs?  On any standard
printer, we'll either get very ugly mismatched quotes, or get even
uglier accents.