[comp.std.internat] Latin-1 and the French language

ath@linkoping.telesoft.se (Anders Thulin) (02/03/91)

It was recently remarked in comp.text that ISO 8859-1 (Latin-1) does
not cover the major Western languages. As an example, it was noted
that the French letter <oe> (ligature of o and e) was not included
in any of the Latin-n tables.

I am trying to find out the reason for this apparent oversight.
Is <oe> an indispensable character in French? 

If anyone out there has any authoritative info about the curious
letter lower case y with dieresis - what language? why no upper-case
form in the Latin tables? - I would be very interested.

-- 
Anders Thulin       ath@linkoping.telesoft.se 
Telesoft Europe AB, Teknikringen 2B, S-583 30 Linkoping, Sweden

enag@ifi.uio.no (Erik Naggum) (02/04/91)

In article <728@castor.linkoping.telesoft.se> ath@linkoping.telesoft.se (Anders Thulin) writes:

> It was recently remarked in comp.text that ISO 8859-1 (Latin-1) does
> not cover the major Western languages. As an example, it was noted
> that the French letter <oe> (ligature of o and e) was not included
> in any of the Latin-n tables.

Neither are the ligatures fi, fl, ffi, and ffl.  These are truly
indispensable to typographers.  ISO (DIS) 10646 has these as well as
the oe ligature.

> I am trying to find out the reason for this apparent oversight.

While you're at it, can you try to find out what the hell the
multiplication and division signs are doing in the middle of the
accented characters, too?

> Is <oe> an indispensable character in French? 

The Frenchmen I've talked to regard it as a ligature only.  That is
unlike the Danish, Icelandic and Norwegian character <ae>, as I
mentioned in comp.text: <ae> is not a typographic convention, it's a
special character.  It's relevant for collation order, and other things.  The
French <oe> is supposed to be collated as the string "oe".
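
As a rough illustration of collating <oe> as the string "oe", here is a
minimal Python sketch.  It assumes Unicode strings, in which the ligature
is U+0153 (capital U+0152); collation_key is a hypothetical helper written
for this example, not a library function, and real French collation has
more rules than this.

```python
# Sketch only: expand the oe ligature to the two letters "oe" before
# comparing, so that it collates as the string "oe".
def collation_key(word: str) -> str:
    # lower() also maps capital OE (U+0152) to small oe (U+0153)
    return word.lower().replace("\u0153", "oe")

words = ["c\u0153ur", "coefficient", "code", "cog"]

# Plain code-point order puts the ligature word after every "co..." word.
print(sorted(words))
# With the key, it falls between "coefficient" and "cog".
print(sorted(words, key=collation_key))
```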

> If anyone out there has any authoritative info about the curious
> letter lower case y with dieresis - what language? why no upper-case
> form in the Latin tables? - I would be very interested.

Sorry, can't help here.  At least not yet, until I find the list of
languages in which it is used.  Maybe later today.

--
What's your favorite amphibian?  French girls.
--
[Erik Naggum]	Snail: Naggum Software / BOX 1570 VIKA / 0118 OSLO / NORWAY
		Mail: <erik@naggum.uu.no>, <enag@ifi.uio.no>
My opinions.	Wail: +47-2-836-863	Another int'l standards dude.

sandee@sun16.scri.fsu.edu (Daan Sandee) (02/04/91)

In article <728@castor.linkoping.telesoft.se> ath@linkoping.telesoft.se (Anders Thulin) writes:

>If anyone out there has any authoritative info about the curious
>letter lower case y with dieresis - what language? why no upper-case
>form in the Latin tables? - I would be very interested.
>
>-- 
>Anders Thulin       ath@linkoping.telesoft.se 
>Telesoft Europe AB, Teknikringen 2B, S-583 30 Linkoping, Sweden

Dutch has a lower case y with a dieresis ; spelled as ij when the character
is not available to the printer. For instance, Dijkstra (of structured
programming fame) has seven letters in his surname. Really.
There is no special capital letter ; printers use IJ. But NOTE: when
capitalized at the beginning of a word or sentence, it must be spelled IJ :
Ij is *wrong*. (All non-Dutch atlases show the lake of Ijsselmeer and the
city of Ijmuiden, while the real names are IJsselmeer and IJmuiden.) For
computerized typesetting it would therefore be easier to use a code for
capital IJ as well.
In the dictionaries, the character is collated as if spelled i-j ; i.e.,
*bijl* comes between *big* and *bikken*. But in phone books it is usually
lumped with y ; there are too many people called Meijer as well as Meyer.
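
The capitalization and collation rules above can be sketched in Python
roughly as follows.  Both function names are hypothetical, and the naive
substring test assumes that every "ij" in the input is the digraph, which
will not hold for all words.

```python
def dutch_capitalize(word: str) -> str:
    # A leading "ij" is capitalized as a unit: IJ, never Ij.
    if word[:2].lower() == "ij":
        return "IJ" + word[2:]
    return word[:1].upper() + word[1:]

def phonebook_key(name: str) -> str:
    # Phone-book style: lump ij together with y (Meijer next to Meyer).
    return name.lower().replace("ij", "y")

print(dutch_capitalize("ijsselmeer"))
print(dutch_capitalize("dijkstra"))

# Dictionary style needs no special key: ij sorts as i followed by j.
print(sorted(["bikken", "bijl", "big"]))
```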

Daan Sandee                                           sandee@sun16.scri.fsu.edu
Supercomputer Computations Research Institute
Florida State University, Tallahassee, FL 32306-4052  (904) 644-7045

lasko@regent.dec.com (Tim Lasko, Digital Equipment Corp., Westford, MA) (02/04/91)

Why no "oe" in ISO 8859-1:

It was the opinion of a majority of the members of the ISO working group that
developed ISO 8859-1, supported by a majority of the voting members of the
parent subcommittee, including the French national body, that "oe" and "OE"
were not characters but ligatures only of interest in typography.  Similarly,
other ligatures are also not included.  Capital Y with dieresis was removed
from the list because of its rarity. This left two holes in the code table that
were later filled with the multiplication and division sign--only a compromise
from the dozen-or-so characters that had been considered--to avoid
vendor-specific implementations of ISO 8859-1.
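
For readers who want to check the code table themselves, a small Python
sketch follows.  It assumes Python's built-in "latin-1" codec as a
stand-in for ISO 8859-1 and uses Unicode code points to name the
characters; in_latin1 is a hypothetical helper.

```python
def in_latin1(ch: str) -> bool:
    # True if the character has a position in ISO 8859-1.
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

# The multiplication and division signs sit among the accented letters.
for name, ch in [("multiplication sign", "\u00d7"),
                 ("division sign", "\u00f7"),
                 ("small y with dieresis", "\u00ff")]:
    print(name, hex(ch.encode("latin-1")[0]))

# The ligatures and capital Y with dieresis have no position at all.
for name, ch in [("small oe", "\u0153"),
                 ("capital OE", "\u0152"),
                 ("capital Y with dieresis", "\u0178")]:
    print(name, in_latin1(ch))
```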

Of course, expert opinion can change. The French member body changed its mind
less than a year after publication of ISO 8859-1 and among the consequences is
one new proposed code table tentatively titled ISO Latin Alphabet No 7, based
on an AFNOR draft--possibly approved by now--standard covering the "Languages
of the EEC written using the Latin script".  And so it goes.

[I have had the privilege of sitting on the U.S. and ISO committees that
developed ISO 8859-1, although I joined late in its development.  It is an
interesting balance of compromise and technical effort.  The discussion on
comp.text has filtered into a number of other lists and while I did not see
that discussion, I can only point out that ISO 8859-1 was not intended to cover
*all* of the Western European languages. You just simply cannot do that in 191
character positions and include all of the lesser-used and minority languages. 
Welsh is an oft-cited oversight.]

Tim Lasko, Digital Equipment Corp., Westford MA  (lasko@regent.enet.dec.com)
Disclaimer: My opinions are my own; the facts can speak for themselves.

keld@login.dkuug.dk (Keld Jørn Simonsen) (02/05/91)

enag@ifi.uio.no (Erik Naggum) writes:

>In article <728@castor.linkoping.telesoft.se> ath@linkoping.telesoft.se (Anders Thulin) writes:

>> It was recently remarked in comp.text that ISO 8859-1 (Latin-1) does
>> not cover the major Western languages. As an example, it was noted
>> that the French letter <oe> (ligature of o and e) was not included
>> in any of the Latin-n tables.

The story as I know it is that the <oe> was not deemed necessary
for the French language by AFNOR when ISO 8859-1 was in the works
and accepted. Later AFNOR changed its opinion, and has proposed
that ISO 8859-1 be changed to include the <oe> and other interesting
stuff, at the expense of the Icelandic letters eth and thorn.
This was voted down in SC2. Now AFNOR is proposing a new ISO 8859
part covering "EEC" - with the <oe> - we will see what happens to that.

>> I am trying to find out the reason for this apparent oversight.

>While you're at it, can you try to find out what the hell the
>multiplication and division signs are doing in the middle of the
>accented characters, too?

The multiplication and division signs were put there as the space would
otherwise be empty, and to avoid all kinds of incompatibilities with
vendors and the like assigning different characters to these positions,
SC2 placed these symbols there.

>> Is <oe> an indispensable character in French? 

Obviously the French have different opinions about this.
As I learnt it in school however, oeuf and boeuf were always spelled
with the <oe> letter/ligature. I am no Frenchman though.

Keld Simonsen

egr@contact.uucp (Gordan Palameta) (02/07/91)

In <ENAG.91Feb4001847@holmenkollen.ifi.uio.no> enag@ifi.uio.no (Erik Naggum) writes:

>In article <728@castor.linkoping.telesoft.se> ath@linkoping.telesoft.se (Anders Thulin) writes:

>> It was recently remarked in comp.text that ISO 8859-1 (Latin-1) does
>> not cover the major Western languages. As an example, it was noted
>> that the French letter <oe> (ligature of o and e) was not included
>> in any of the Latin-n tables.

>> I am trying to find out the reason for this apparent oversight.

>While you're at it, can you try to find out what the hell the
>multiplication and division signs are doing in the middle of the
>accented characters, too?


These two things are directly related:  OE and oe were dropped from the
original Latin-1 proposal (at the request of the French representative,
no less, on the grounds that this is a ligature and not a separate letter).

Since the two empty slots had to be filled, the multiplication and division signs
were finally chosen out of a number of other possible replacements...

henry@zoo.toronto.edu (Henry Spencer) (02/07/91)

In article <2078@sun13.scri.fsu.edu> sandee@sun16.scri.fsu.edu (Daan Sandee) writes:
>Ij is *wrong*. (All non-Dutch atlases show the lake of Ijsselmeer and the
>city of Ijmuiden, while the real names are IJsselmeer and IJmuiden.) ...

Let us not be too dogmatic about this.  The Times Atlas of the World gets
it right, and I could have sworn the Times isn't Dutch... :-)
-- 
"Maybe we should tell the truth?"      | Henry Spencer at U of Toronto Zoology
"Surely we aren't that desperate yet." |  henry@zoo.toronto.edu   utzoo!henry

enag@ifi.uio.no (Erik Naggum) (02/08/91)

In article <1991Feb7.015202.29053@contact.uucp>, Gordan Palameta writes:
>These two things are directly related: OE and oe were dropped from
>the original Latin-1 proposal (at the request of the French
>representative, no less, on the grounds that this is a ligature and
>not a separate letter).

Sigh!  If the French attempt to boycott ISO 8859-1 as the one-octet
default for ISO 10646, and want their own ISO 8859-n (for some large
n) why can't we just "update" ISO 8859-1 by re-inserting those OE and
oe ligatures right in the middle of the other "O with random squiggle"
series?

I'm not impressed by this counter-productivity and random politicking.

--
[Erik Naggum]					     <enag@ifi.uio.no>
Naggum Software, Oslo, Norway			   <erik@naggum.uu.no>

Philippe.Deschamp@Seti.INRIA.Fr (Philippe Deschamp) (02/15/91)

>>>>> AT == ath@linkoping.telesoft.se (Anders Thulin)
>>>>> EN == enag@ifi.uio.no (Erik Naggum)
>>>>> GP == egr@contact.uucp (Gordan Palameta)

AT> It was recently remarked in comp.text that ISO 8859-1 (Latin-1) does not
AT> cover the major Western languages. As an example, it was noted that the
AT> French letter <oe> (ligature of o and e) was not included in any of the
AT> Latin-n tables.

AT> I am trying to find out the reason for this apparent oversight.

GP> OE and oe were dropped from the original Latin-1 proposal (at the request
GP> of the French representative, no less, on the grounds that this is a
GP> ligature and not a separate letter).

   Never believe what experts say :-).  This is a sad story!

AT> Is <oe> an indispensable character in French? 

   Yes (I should add, IMHO, but somehow cannot :-).  Some words will use "oe"
(two separate letters), some others <oe> (the [in]famous so-called ligature).
Examples: oeil (eye), oeuf (egg), boeuf (ox), oeuvre (work, opus), coeur
(heart) all use the ligature <oe>, and must be written <oe>il, <oe>uf, b<oe>uf,
<oe>uvre, c<oe>ur, while coefficient, coercition, coexister (self-explanatory)
or boette (a kind of bait) do not use it.

   Thus this ``ligature'' is different from the "ff", "fi", "ffi" ligatures,
which are imposed by typographers as soon as the characters occur together: I
write "coefficient", and I want it to appear on paper as "coe<ffi>cient".
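
   Because the choice is made word by word, one way to picture it is a
lookup table.  The Python sketch below is only an illustration: it uses
just the example words given above, LIGATURE_WORDS and apply_oe_ligature
are hypothetical names, and a real list would be far longer.

```python
# Words (in two-letter spelling) that take the ligature, from the
# examples above only.
LIGATURE_WORDS = {"oeil", "oeuf", "boeuf", "oeuvre", "coeur"}

def apply_oe_ligature(word: str) -> str:
    # Substitute the ligature only in words known to take it.
    if word.lower() in LIGATURE_WORDS:
        return (word.replace("OE", "\u0152")
                    .replace("Oe", "\u0152")
                    .replace("oe", "\u0153"))
    return word

for w in ["boeuf", "Oeuvre", "coefficient", "boette"]:
    print(w, "->", apply_oe_ligature(w))
```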

EN> Sigh!  If the French attempt to boycott ISO 8859-1 as the one-octet default
EN> for ISO 10646, and want their own ISO 8859-n (for some large n) why can't
EN> we just "update" ISO 8859-1 by re-inserting those OE and oe ligatures right
EN> in the middle of the other "O with random squiggle" series?

   I would second this kind of proposition, but I am afraid it is too late.

EN> I'm not impressed by this counter-productivity and random politicking.

   I do not want to comment on that.  The only thing I have to say is that I
would like to be able to use ISO 8859 to write texts in the French language,
and at the moment this is not possible with only ISO 8859-1.
-- 
					Philippe Deschamp.
Tlx: 697033F   Fax: +33 (1) 39-63-53-30   Tel: +33 (1) 39-63-58-58
Email: Philippe.Deschamp@Nuri.INRIA.Fr   ||   ...!inria!deschamp
Smail: INRIA, Rocquencourt, BP 105, 78153 Le Chesnay Cedex, France

huitema@jerry.inria.fr (Christian Huitema) (02/22/91)

In article <1941@seti.inria.fr>, Philippe.Deschamp@Seti.INRIA.Fr (Philippe
Deschamp) writes:
>    Yes (I should add, IMHO, but somehow cannot :-).  Some words will use "oe"
> (two separate letters), some others <oe> (the [in]famous so-called ligature).
> Examples: oeil (eye), oeuf (egg), boeuf (ox), oeuvre (work, opus), coeur
> (heart) all use the ligature <oe>, and must be written <oe>il, <oe>uf, b<oe>uf,
> <oe>uvre, c<oe>ur, while coefficient, coercition, coexister (self-explanatory)
> or boette (a kind of bait) do not use it.
> 
>    Thus this ``ligature'' is different from the "ff", "fi", "ffi" ligatures,
> which are imposed by typographers as soon as the characters occur together: I
> write "coefficient", and I want it to appear on paper as "coe<ffi>cient".

Three comments:

1- the <oe> group is really a ligature. Traditional directory sorting requires
   that <oe> be sorted as the group <o> <e>. Representing it by a single
   character would not help much.

2- there is a general rule on "when to apply the ligature", and that is "when
   the <e> is mute". The ligature shall not be applied if the e is accented, or
   marked by a dieresis, or is necessary to "sound" the next letter. That could
   easily be programmed -- without the help of a dictionary.

3- moreover, the absence of the ligature has absolutely no impact on
   pronunciation and/or comprehension.

Like many peculiarities of the French written form, this ligature is much more
a scholastic mark of elegance than an improvement in readability.  Leaving it
as two characters is, in my opinion, a good idea...

Christian Huitema