jtkohl@athena.mit.edu (John T Kohl) (06/21/89)
Can someone point me to the ISO reference defining ISO-LATIN-1?

Thanks.

John Kohl <jtkohl@ATHENA.MIT.EDU> or <jtkohl@Kolvir.Brookline.MA.US>
Digital Equipment Corporation/Project Athena
(The above opinions are MINE. Don't put my words in somebody else's mouth!)
jch@apollo.COM (Jan Hardenbergh) (06/27/89)
> Message-ID: <12108@bloom-beacon.MIT.EDU>
> Reply-To: jtkohl@athena.mit.edu (John T Kohl)
> Can someone point me to the ISO reference defining ISO-LATIN-1?

Here are some old articles that say quite a bit about ISO-LATIN-1, alias ISO 8859/1. Note that this info is derived from an X include file. Then comes an enumeration of the other 8859/? codes & characters.

If you are stuck trying to get ISO documents you might try ANSI in New York, (212)-642-4900.

-Jan (Yon) Hardenbergh - jch@apollo.com - (508)-256-6600
Apollo, a subsidiary of HP

----------------------------------------------------------------

Article 169 of comp.std.unix:
Path: apollo!ulowell!masscomp!uunet!longway!std-unix
From: guy@Sun.COM (Guy Harris)
Newsgroups: comp.std.unix
Subject: Re: 8-Bit ASCII Standard on UNIX-POSIX
Message-ID: <161@longway.TIC.COM>
Date: 8 Apr 88 05:38 GMT
Sender: std-unix@longway.TIC.COM
Reply-To: guy@Sun.COM (Guy Harris)
Lines: 136
Approved: jsq@longway.tic.com (Moderator, John S. Quarterman)

From: guy@Sun.COM (Guy Harris)

> To possibly add to the list, this sounds like the character set
> Microsoft Windows uses and terms (by no standard I know of) "ANSI".
> It has the vowels in acute, grave, circumflex, tilde, and umlaut.
> The high bit characters also include cent, pound, yen, and universal
> currency symbols, circle-R trademark and circle-C copyright symbols,
> inverted ? and !, section and paragraph symbols, << guillemets >>,
> several accents, 1/4, 1/2, and 3/4 characters, and superscripted 1, 2,
> and 3. The last sound like a bad idea to me, so I actually hope this
> is something they threw together themselves.
> Sound like ISO 8859?

Yes. The superscripted digits *do* come from ISO 8859 (see below).

> What I would also like to see is the ASCII 0..1F (31 dec.) graphic
> representations on new machines conform to the ANSI standard. They
> might look impractical, but after setting up a font using them on my
> micro, it's amazing how much sense they make to me.
What "graphic representations" are you referring to? The only ANSI standard I know of for characters in the range 0x00 to 0x1f is ASCII, which says they're *control* characters, not *printable* characters.

For your collective amusement, here is a chart of ISO 8859/1 or "ISO Latin Alphabet #1". This was derived by some quick hacking on the X11 include file "keysymdef.h" - yes, X11 uses the ISO character sets as well.

non-breaking space            0xa0
inverted exclamation point    0xa1
cent sign                     0xa2
pounds sterling               0xa3
"currency symbol"             0xa4
yen                           0xa5
broken bar                    0xa6
section mark                  0xa7
diaeresis                     0xa8
copyright                     0xa9
feminine ordinal              0xaa  (this is a superscripted lower-case "a", underlined)
left guillemet                0xab  (French left quote, looks like small "<<")
not sign                      0xac
hyphen                        0xad
registered trademark          0xae
macron                        0xaf  (an elevated small horizontal bar)
degree symbol                 0xb0
plus/minus                    0xb1
superscript 2                 0xb2
superscript 3                 0xb3
acute accent                  0xb4
mu                            0xb5
paragraph symbol              0xb6
small centered dot            0xb7
cedilla                       0xb8
superscript 1                 0xb9
masculine ordinal             0xba  (this is a superscripted lower-case "o", underlined)
right guillemet               0xbb  (French right quote, looks like small ">>")
1/4                           0xbc
1/2                           0xbd
3/4                           0xbe
inverted question mark        0xbf
A with grave accent           0xc0
A with acute accent           0xc1
A with circumflex accent      0xc2
A with tilde                  0xc3
A with diaeresis              0xc4
A with ring                   0xc5  (as in "Angstrom")
AE diphthong                  0xc6
C with cedilla                0xc7
E with grave accent           0xc8
E with acute accent           0xc9
E with circumflex accent      0xca
E with diaeresis              0xcb
I with grave accent           0xcc
I with acute accent           0xcd
I with circumflex accent      0xce
I with diaeresis              0xcf
upper-case eth                0xd0  (eth is an Icelandic letter)
N with tilde                  0xd1
O with grave accent           0xd2
O with acute accent           0xd3
O with circumflex accent      0xd4
O with tilde                  0xd5
O with diaeresis              0xd6
multiply sign                 0xd7
O with slash                  0xd8
U with grave accent           0xd9
U with acute accent           0xda
U with circumflex accent      0xdb
U with diaeresis              0xdc
Y with acute accent           0xdd
upper-case thorn              0xde  (thorn is an Icelandic letter)
German double-s               0xdf
a with grave accent           0xe0
a with acute accent           0xe1
a with circumflex accent      0xe2
a with tilde                  0xe3
a with diaeresis              0xe4
a with ring                   0xe5  (lower-case "A with ring")
ae diphthong                  0xe6
c with cedilla                0xe7
e with grave accent           0xe8
e with acute accent           0xe9
e with circumflex accent      0xea
e with diaeresis              0xeb
i with grave accent           0xec
i with acute accent           0xed
i with circumflex accent      0xee
i with diaeresis              0xef
lower-case eth                0xf0
n with tilde                  0xf1
o with grave accent           0xf2
o with acute accent           0xf3
o with circumflex accent      0xf4
o with tilde                  0xf5
o with diaeresis              0xf6
division sign                 0xf7
o with slash                  0xf8
u with grave accent           0xf9
u with acute accent           0xfa
u with circumflex accent      0xfb
u with diaeresis              0xfc
y with acute accent           0xfd
lower-case thorn              0xfe
y with diaeresis              0xff

Volume-Number: Volume 13, Number 49

Article 294 of comp.std.internat:
Path: apollo!ulowell!m2c!husc6!purdue!decwrl!video.dec.com!lasko
From: lasko@video.dec.com (Tim - DSG Terminals Architecture - 223-2186)
Newsgroups: comp.std.internat
Subject: RE: ISO 8859
Message-ID: <8805122042.AA21976@decwrl.dec.com>
Date: 12 May 88 23:19 GMT
Organization: Digital Equipment Corporation
Lines: 55

In response to: Richard Lee (rlee@ads.com)

ISO 8859 consists of several parts, each part specifying a set of up to 191 graphic characters and the coded representations thereof by means of a single 8-bit byte. The use of control functions for the coded representation of composite characters (like o [backspace] /) is prohibited. Another major feature of each of the parts is that the "left hand" part, or the lower 94 graphic characters, is exactly the same as ASCII. The "right hand" part, or the higher-order 96 characters, are a mix of characters and symbols (paragraph sign, fractions, special punctuation, etc.) useful for the region covered by the part of the standard.
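
[Editorial illustration, not part of the original article.] The layout described above can be checked with a few lines of Python. This is a minimal sketch: the function name `classify` and the region labels are made up for illustration, and the treatment of 0x80 to 0x9F as a non-graphic area is an assumption based on the general structure of 8-bit codes rather than something the article states. It shows how 1 space + 94 left-hand graphics + 96 right-hand graphics gives the figure of 191.

```python
def classify(byte: int) -> str:
    """Return the region of the 8-bit code table that `byte` falls in.

    Region names are illustrative labels, not terms from the standard.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("not an 8-bit value")
    if byte <= 0x1F or byte == 0x7F:
        return "control"              # ASCII control characters and DEL
    if byte == 0x20:
        return "space"
    if byte <= 0x7E:
        return "left-hand graphic"    # 0x21..0x7E, same as ASCII
    if byte <= 0x9F:
        return "not a graphic"        # assumption: 0x80..0x9F hold no graphics
    return "right-hand graphic"       # 0xA0..0xFF, differs per part of 8859


# Count how many byte values land in each region.
counts = {}
for b in range(256):
    region = classify(b)
    counts[region] = counts.get(region, 0) + 1

graphic_total = (
    counts["space"] + counts["left-hand graphic"] + counts["right-hand graphic"]
)
print(counts)
print(graphic_total)
```

Counting the space separately is what reconciles the "94" in the article with the total of 191.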
Note that bit combinations A0 hex and FF hex constitute valid graphic characters (not control characters) in this character code.

The parts of 8859 are simply character sets, they don't define keyboards, code extension mechanisms, or anything else. The character codes are based on the July 1986 version of ISO 4873 which specifies rules for eight-bit character codes, similar to the ISO 646 standard, which specified basic rules for seven-bit character codes.

Each part of ISO 8859 is intended for use with a different set of languages or scripts:

Part  Name                     Status*    Language/Region/Script
1     Latin Alphabet No 1      IS Feb 87  "Western European"
2     Latin Alphabet No 2      IS Feb 87  "Eastern European"
3     Latin Alphabet No 3      IS Mar 88  "Southern European" + S. Africa
4     Latin Alphabet No 4      IS Mar 88  Majority Scandinavian
5     Latin-Cyrillic Alphabet  tbp IS 88  ASCII + Cyrillic characters
6     Latin-Arabic Alphabet    IS Aug 87  ASCII + Arabic characters*
7     Latin-Greek Alphabet     IS Nov 87  ASCII + Greek characters
8     Latin-Hebrew Alphabet    tbp IS 88  ASCII + Hebrew characters
9     Latin Alphabet No 5      proposed   modification of pt. 3 by Turkey

* Status key:
    IS       - approved international standard published at indicated date
    tbp IS   - standard is approved, but not yet published
    proposed - draft text hasn't yet entered ISO ballot cycle

The repertoires of parts 5 through 8 have been worked out with relevant experts in the affected countries, and in many cases form national standards as well.

ISO 8859/1, Latin Alphabet No 1, is probably the one most U.S. manufacturers will want to be concerned with, since it covers the repertoires for the following languages: Danish, Dutch, English, Faeroese, Finnish, French, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish [Flemish, too if you consider it a separate language; it doesn't include Welsh].

That's probably more than you needed...let me know if you'd like more details.
==================
Tim Lasko, Digital Equipment Corporation, Maynard, MA
"There are no temporary workarounds..."
lasko@video.dec.com    lasko%video.dec@decwrl    decwrl!video.dec.com!lasko
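
[Editorial illustration, not part of the original thread.] For readers who want to check entries in the ISO 8859/1 chart quoted earlier in the thread, here is a small sketch using Python's built-in "latin-1" codec and the standard `unicodedata` module. The helper name `decode_latin1` and the choice of sample rows are made up for illustration; the descriptions are copied from the chart, and the names printed next to them are the Unicode character names, which will not match the chart's wording exactly.

```python
import unicodedata


def decode_latin1(code: int) -> str:
    """Decode a single ISO 8859/1 byte value to a one-character string."""
    return bytes([code]).decode("latin-1")


# A few (byte value, description) pairs copied from the chart in the thread.
samples = [
    (0xA0, "non-breaking space"),
    (0xA3, "pounds sterling"),
    (0xB2, "superscript 2"),
    (0xBD, "1/2"),
    (0xC5, "A with ring"),
    (0xDF, "German double-s"),
    (0xFE, "lower-case thorn"),
    (0xFF, "y with diaeresis"),
]

for code, description in samples:
    ch = decode_latin1(code)
    # With this codec, each byte value maps to the code point of the same number.
    assert ord(ch) == code
    print(f"0x{code:02x}  {description:20s} {unicodedata.name(ch)}")
```

Extending `samples` to cover every row from 0xa0 to 0xff is a quick way to spot transcription slips in a hand-typed chart.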