ath@linkoping.telesoft.se (Anders Thulin) (02/03/91)
It was recently remarked in comp.text that ISO 8859-1 (Latin-1) does not cover the major Western languages. As an example, it was noted that the French letter <oe> (ligature of o and e) was not included in any of the Latin-n tables.

I am trying to find out the reason for this apparent oversight. Is <oe> an indispensable character in French?

If anyone out there has any authoritative info about the curious letter lower case y with dieresis - what language? why no upper-case form in the Latin tables? - I would be very interested.

--
Anders Thulin ath@linkoping.telesoft.se
Telesoft Europe AB, Teknikringen 2B, S-583 30 Linkoping, Sweden
enag@ifi.uio.no (Erik Naggum) (02/04/91)
In article <728@castor.linkoping.telesoft.se> ath@linkoping.telesoft.se (Anders Thulin) writes:
> It was recently remarked in comp.text that ISO 8859-1 (Latin-1) does
> not cover the major Western languages. As an example, it was noted
> that the French letter <oe> (ligature of o and e) was not included
> in any of the Latin-n tables.

Neither are the ligatures fi, fl, ffi, and ffl. These are truly indispensable to typographers. ISO (DIS) 10646 has these as well as the oe ligature.

> I am trying to find out the reason for this apparent oversight.

While you're at it, can you try to find out what the hell the multiplication and division signs are doing in the middle of the accented characters, too?

> Is <oe> an indispensable character in French?

The Frenchmen I've talked to recognize it only as a ligature, unlike, as I mentioned in comp.text, the Danish, Icelandic and Norwegian character <ae>. The <ae> is not a typographic convention, it's a special character. It's relevant for collation order, and other things. The French <oe> is supposed to be collated as the string "oe".

> If anyone out there has any authoritative info about the curious
> letter lower case y with dieresis - what language? why no upper-case
> form in the Latin tables? - I would be very interested.

Sorry, can't help here. At least not yet, until I find the list of languages in which it is used. Maybe later today.

--
What's your favorite amphibian? French girls.
--
[Erik Naggum] Snail: Naggum Software / BOX 1570 VIKA / 0118 OSLO / NORWAY
Mail: <erik@naggum.uu.no>, <enag@ifi.uio.no>
My opinions. Wail: +47-2-836-863
Another int'l standards dude.
sandee@sun16.scri.fsu.edu (Daan Sandee) (02/04/91)
In article <728@castor.linkoping.telesoft.se> ath@linkoping.telesoft.se (Anders Thulin) writes:
>If anyone out there has any authoritative info about the curious
>letter lower case y with dieresis - what language? why no upper-case
>form in the Latin tables? - I would be very interested.
>
>--
>Anders Thulin ath@linkoping.telesoft.se
>Telesoft Europe AB, Teknikringen 2B, S-583 30 Linkoping, Sweden

Dutch has a lower case y with a dieresis, spelled as ij when the character is not available to the printer. For instance, Dijkstra (of structured programming fame) has seven letters in his surname. Really.

There is no special capital letter; printers use IJ. But NOTE: when capitalized at the beginning of a word or sentence, it must be spelled IJ: Ij is *wrong*. (All non-Dutch atlases show the lake of Ijsselmeer and the city of Ijmuiden, while the real names are IJsselmeer and IJmuiden.) For computerized typesetting it would therefore be easier to use a code for capital IJ as well.

In the dictionaries, the character is collated as if spelled i-j; i.e., *bijl* comes between *big* and *bikken*. But in phone books it is usually lumped with y; there are too many people called Meijer as well as Meyer.

Daan Sandee sandee@sun16.scri.fsu.edu
Supercomputer Computations Research Institute
Florida State University, Tallahassee, FL 32306-4052 (904) 644-7045
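[Editorial sketch, not part of the original post.] The capitalization and collation rules described in the post above could be expressed roughly as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, the digraph is handled as the two ASCII letters "ij", and the phone-book rule is a naive string substitution that does not check whether an "ij" sequence really is the Dutch digraph.

```python
def dutch_capitalize(word: str) -> str:
    """Capitalize a word, treating a leading 'ij' as a single letter.

    Per the post: the result must begin with 'IJ', never 'Ij'.
    """
    if word[:2].lower() == "ij":
        return "IJ" + word[2:]
    return word[:1].upper() + word[1:]


def dictionary_key(word: str) -> str:
    """Dictionary order per the post: 'ij' sorts as the letters i, j."""
    return word.lower()


def phone_book_key(word: str) -> str:
    """Phone-book order per the post: 'ij' is lumped with 'y'.

    Assumption: every 'ij' in the word is the digraph (naive replace).
    """
    return word.lower().replace("ij", "y")


if __name__ == "__main__":
    print(dutch_capitalize("ijsselmeer"))
    print(sorted(["bikken", "bijl", "big"], key=dictionary_key))
    print(phone_book_key("Meijer") == phone_book_key("Meyer"))
```

With these keys, Meijer and Meyer end up adjacent in a phone-book sort, while a dictionary sort keeps *bijl* between *big* and *bikken*.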
lasko@regent.dec.com (Tim Lasko, Digital Equipment Corp., Westford, MA) (02/04/91)
Why no "oe" in ISO 8859-1:

It was the opinion of a majority of the members of the ISO working group that developed ISO 8859-1, supported by a majority of the voting members of the parent subcommittee, including the French national body, that "oe" and "OE" were not characters but ligatures only of interest in typography. Similarly, other ligatures are also not included.

Capital Y with dieresis was removed from the list because of its rarity.

This left two holes in the code table that were later filled with the multiplication and division signs--only a compromise from the dozen-or-so characters that had been considered--to avoid vendor-specific implementations of ISO 8859-1.

Of course, expert opinion can change. The French member body changed its mind less than a year after publication of ISO 8859-1 and among the consequences is one new proposed code table tentatively titled ISO Latin Alphabet No 7, based on an AFNOR draft--possibly approved by now--standard covering the "Languages of the EEC written using the Latin script". And so it goes.

[I have had the privilege of sitting on the U.S. and ISO committees that developed ISO 8859-1, although I joined late in its development. It is an interesting balance of compromise and technical effort. The discussion on comp.text has filtered into a number of other lists and while I did not see that discussion, I can only point out that ISO 8859-1 was not intended to cover *all* of the Western European languages. You just simply cannot do that in 191 character positions and include all of the lesser-used and minority languages. Welsh is an oft-cited oversight.]

Tim Lasko, Digital Equipment Corp., Westford MA (lasko@regent.enet.dec.com)
Disclaimer: My opinions are my own; the facts can speak for themselves.
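[Editorial sketch, not part of the original post.] The table layout discussed above can be checked against the ISO 8859-1 code table. The Python sketch below assumes a modern interpreter whose built-in "latin-1" codec implements ISO 8859-1; the helper name `in_latin1` is hypothetical. It shows that the multiplication and division signs sit at 0xD7 and 0xF7, among the accented O letters, and that lower-case y with dieresis is present while its capital and the oe/OE ligatures are not.

```python
def in_latin1(ch: str) -> bool:
    """Return True if the character has a position in ISO 8859-1."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Positions 0xD7 and 0xF7 fall between O-dieresis and O-slash
# in the upper-case and lower-case rows respectively.
times_sign = bytes([0xD7]).decode("latin-1")     # multiplication sign
division_sign = bytes([0xF7]).decode("latin-1")  # division sign

if __name__ == "__main__":
    for name, ch in [
        ("y with dieresis", "\u00ff"),
        ("Y with dieresis", "\u0178"),
        ("oe ligature", "\u0153"),
        ("OE ligature", "\u0152"),
    ]:
        print(name, "in Latin-1:", in_latin1(ch))
```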
keld@login.dkuug.dk (Keld Jørn Simonsen) (02/05/91)
enag@ifi.uio.no (Erik Naggum) writes:
>In article <728@castor.linkoping.telesoft.se> ath@linkoping.telesoft.se (Anders Thulin) writes:
>> It was recently remarked in comp.text that ISO 8859-1 (Latin-1) does
>> not cover the major Western languages. As an example, it was noted
>> that the French letter <oe> (ligature of o and e) was not included
>> in any of the Latin-n tables.

The story as I know it is that the <oe> was not deemed necessary for the French language by AFNOR when ISO 8859-1 was in the works and accepted. Later AFNOR changed its opinion, and has proposed that ISO 8859-1 be changed to include the <oe> and other interesting stuff, at the expense of the Icelandic letters eth and thorn. This was voted down in SC2. Now AFNOR is proposing a new ISO 8859 part covering "EEC" - with the <oe> - we will see what happens to that.

>> I am trying to find out the reason for this apparent oversight.
>While you're at it, can you try to find out what the hell the
>multiplication and division signs are doing in the middle of the
>accented characters, too?

The multiplication and division signs were put there because the space would otherwise be empty. To avoid all kinds of incompatibilities from vendors and the like assigning different characters to these positions, SC2 placed these symbols there.

>> Is <oe> an indispensable character in French?

Obviously the French have different opinions about this. As I learnt it in school however, oeuf and boeuf were always spelled with the <oe> letter/ligature. I am no Frenchman though.

Keld Simonsen
egr@contact.uucp (Gordan Palameta) (02/07/91)
In <ENAG.91Feb4001847@holmenkollen.ifi.uio.no> enag@ifi.uio.no (Erik Naggum) writes:
>In article <728@castor.linkoping.telesoft.se> ath@linkoping.telesoft.se (Anders Thulin) writes:
>> It was recently remarked in comp.text that ISO 8859-1 (Latin-1) does
>> not cover the major Western languages. As an example, it was noted
>> that the French letter <oe> (ligature of o and e) was not included
>> in any of the Latin-n tables.
>> I am trying to find out the reason for this apparent oversight.
>While you're at it, can you try to find out what the hell the
>multiplication and division signs are doing in the middle of the
>accented characters, too?

These two things are directly related: OE and oe were dropped from the original Latin-1 proposal (at the request of the French representative, no less, on the grounds that this is a ligature and not a separate letter).

Since the two empty slots had to be filled, the multiplication and division signs were finally chosen out of a number of other possible replacements...
henry@zoo.toronto.edu (Henry Spencer) (02/07/91)
In article <2078@sun13.scri.fsu.edu> sandee@sun16.scri.fsu.edu (Daan Sandee) writes:
>Ij is *wrong*. (All non-Dutch atlases show the lake of Ijsselmeer and the
>city of Ijmuiden, while the real names are IJsselmeer and IJmuiden.) ...

Let us not be too dogmatic about this. The Times Atlas of the World gets it right, and I could have sworn the Times isn't Dutch... :-)

--
"Maybe we should tell the truth?" | Henry Spencer at U of Toronto Zoology
"Surely we aren't that desperate yet." | henry@zoo.toronto.edu utzoo!henry
enag@ifi.uio.no (Erik Naggum) (02/08/91)
In article <1991Feb7.015202.29053@contact.uucp>, Gordan Palameta writes:
>These two things are directly related: OE and oe were dropped from
>the original Latin-1 proposal (at the request of the French
>representative, no less, on the grounds that this is a ligature and
>not a separate letter).

Sigh! If the French attempt to boycott ISO 8859-1 as the one-octet default for ISO 10646, and want their own ISO 8859-n (for some large n) why can't we just "update" ISO 8859-1 by re-inserting those OE and oe ligatures right in the middle of the other "O with random squiggle" series?

I'm not impressed by this counter-productivity and random politicking.

--
[Erik Naggum] <enag@ifi.uio.no>
Naggum Software, Oslo, Norway <erik@naggum.uu.no>
Philippe.Deschamp@Seti.INRIA.Fr (Philippe Deschamp) (02/15/91)
>>>>> AT == ath@linkoping.telesoft.se (Anders Thulin)
>>>>> EN == enag@ifi.uio.no (Erik Naggum)
>>>>> GP == egr@contact.uucp (Gordan Palameta)

AT> It was recently remarked in comp.text that ISO 8859-1 (Latin-1) does not
AT> cover the major Western languages. As an example, it was noted that the
AT> French letter <oe> (ligature of o and e) was not included in any of the
AT> Latin-n tables.
AT> I am trying to find out the reason for this apparent oversight.

GP> OE and oe were dropped from the original Latin-1 proposal (at the request
GP> of the French representative, no less, on the grounds that this is a
GP> ligature and not a separate letter).

Never believe what experts say :-). This is a sad story!

AT> Is <oe> an indispensable character in French?

Yes (I should add, IMHO, but somehow cannot :-). Some words use "oe" (two separate letters), others <oe> (the [in]famous so-called ligature). Examples: oeil (eye), oeuf (egg), boeuf (ox), oeuvre (work, opus), coeur (heart) all use the ligature <oe>, and must be written <oe>il, <oe>uf, b<oe>uf, <oe>uvre, c<oe>ur, while coefficient, coercition, coexister (self-explanatory) or boette (a kind of bait) do not use it.

Thus this ``ligature'' is different from the "ff", "fi", "ffi" ligatures, which are imposed by typographers as soon as the characters occur together: I write "coefficient", and I want it to appear on paper as "coe<ffi>cient".

EN> Sigh! If the French attempt to boycott ISO 8859-1 as the one-octet default
EN> for ISO 10646, and want their own ISO 8859-n (for some large n) why can't
EN> we just "update" ISO 8859-1 by re-inserting those OE and oe ligatures right
EN> in the middle of the other "O with random squiggle" series?

I would second this kind of proposition, but I am afraid it is too late.

EN> I'm not impressed by this counter-productivity and random politicking.

I do not want to comment on that.

The only thing I have to say is that I would like to be able to use ISO 8859 to write texts in the French language, and at the moment this is not possible with only ISO 8859-1.

--
Philippe Deschamp. Tlx: 697033F Fax: +33 (1) 39-63-53-30 Tel: +33 (1) 39-63-58-58
Email: Philippe.Deschamp@Nuri.INRIA.Fr || ...!inria!deschamp
Smail: INRIA, Rocquencourt, BP 105, 78153 Le Chesnay Cedex, France
huitema@jerry.inria.fr (Christian Huitema) (02/22/91)
In article <1941@seti.inria.fr>, Philippe.Deschamp@Seti.INRIA.Fr (Philippe Deschamp) writes:
> Yes (I should add, IMHO, but somehow cannot :-). Some words will use "oe"
> (two separate letters), some others <oe> (the [in]famous so-called ligature).
> Examples: oeil (eye), oeuf (egg), boeuf (ox), oeuvre (work, opus), coeur
> (heart) all use the ligature <oe>, and must be written <oe>il, <oe>uf,
> b<oe>uf, <oe>uvre, c<oe>ur, while coefficient, coercition, coexister
> (self-explanatory) or boette (a kind of bait) do not use it.
>
> Thus this ``ligature'' is different from the "ff", "fi", "ffi" ligatures,
> which are imposed by typographers as soon as the characters occur together:
> I write "coefficient", and I want it to appear on paper as "coe<ffi>cient".

Three comments:

1- the <oe> group is really a ligature. Traditional directory sorting requires that <oe> be sorted as the group <o> <e>. Representing it by a single character would not help much.

2- there is a general rule on "when to apply the ligature", and that is "when the <e> is mute". The ligature shall not be applied if the e is accented, or marked by a dieresis, or is necessary to "sound" the next letter. That could easily be programmed -- without the help of a dictionary.

3- moreover, the absence of the ligature has absolutely no impact on pronunciation and/or comprehension. Like many specificities of the French written form, this ligature is much more a scholastic mark of elegance than an improvement in readability. Leaving it as two characters is, in my opinion, a good idea...

Christian Huitema
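[Editorial sketch, not part of the original post.] The sorting convention in comment 1 above, where <oe> is collated as the two letters o and e, might look like this in Python. It is only an illustration: the names `LIGATURES`, `expand_ligatures` and `sort_key` are hypothetical, the ligature is assumed to be stored as a single code point (U+0153 / U+0152), and the key ignores accents and every other aspect of French collation. It does not attempt the "mute e" rule from comment 2.

```python
# Single-character ligatures and the letter pairs they collate as.
LIGATURES = {
    "\u0153": "oe",  # lower-case oe ligature
    "\u0152": "OE",  # upper-case OE ligature
}


def expand_ligatures(text: str) -> str:
    """Replace each oe/OE ligature by its two separate letters."""
    return "".join(LIGATURES.get(ch, ch) for ch in text)


def sort_key(word: str) -> str:
    """Case-insensitive key that sorts the ligature as 'o' then 'e'."""
    return expand_ligatures(word).lower()


if __name__ == "__main__":
    words = ["coexister", "c\u0153ur", "coefficient"]
    print(sorted(words, key=sort_key))
```

Under this key, a word written with the ligature sorts exactly where its two-letter spelling would, so both spellings can coexist in one list.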