[net.internat] ISO Latin 1 alphabet

minow@decvax.UUCP (Martin Minow) (01/18/86)

"ISO Latin 1 8-bit alphabet, what is it?" -- these notes are mostly
from memory, and I apologize in advance for any errors.

Latin-1 is intended to replace the current mess of National Replacement
Character Sets (the ones that use any or all of #@[\]^`{|} for letters
that aren't in the US national alphabet that we usually call ASCII).

The alphabet is currently a draft international standard, being developed
by ISO, ANSI, and ECMA (European Computer Manufacturers Association).
It is very similar to the "Dec-Multinational" alphabet available
with the VT200-series terminals, and Dec's personal computers.
It suits the needs of the majority of Western European Latin-letter
languages, and there are proposals for "Latin-2" and "Latin-3" to
suit the needs of Polish, Lithuanian, etc.

Latin-1 adds accented variants to upper- and lower-case vowels,
as well as a number of other language-specific letters.  There
are also a number of additional symbols.

AEIOU and aeiou are provided in grave, acute, circumflex, and umlaut
variants.  The following letters are also provided:

  A-ring and a-ring (Swedish, Danish, Finnish, Norwegian)
  AE and ae ligatures (Danish)
  A-tilde and a-tilde
  C-cedilla and c-cedilla (French)
  N-tilde and n-tilde (Spanish)
  O-tilde and o-tilde
  O-slash and o-slash (Danish, Norwegian)
  OE and oe ligatures (French)
  ss (German sharp-s)
  Y-umlaut and y-umlaut (French, also used for the ij ligature in Dutch)

The above refers only to Dec-Multinational.  Latin-1 adds a few more
letters -- I believe these include Icelandic th and dh (thorn and eth),
but not Turkish undotted-i and dotted-I, which are left for "Latin-3".

While upper- and lower-case variants of the letters are related in
the same way as "standard" ASCII, the rules to convert between
cases are language-dependent.  For example, lower-case accented letters
generally lose their accents in French, but not in Swedish.
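A minimal sketch of what "language-dependent" case conversion can mean, in Python for brevity. It assumes text decoded from Latin-1 bytes (each byte becomes one character, U+0000 to U+00FF), and the names upper_plain and upper_french are hypothetical. The first keeps accents when upper-casing, as Swedish does; the second also drops them, as French capitals traditionally often do. Sharp-s and y-umlaut are left alone because Latin-1 has no capital form for either.

```python
import unicodedata


def upper_plain(text: str) -> str:
    """Upper-case letter by letter, keeping accents (the Swedish behaviour).

    Latin-1 has no capital for sharp-s (0xDF) or y-umlaut (0xFF), and
    Python would map them to "SS" and U+0178, so any result that is not
    a single Latin-1 character is rejected and the original kept.
    """
    out = []
    for ch in text:
        up = ch.upper()
        if len(up) == 1 and ord(up) <= 0xFF:
            out.append(up)
        else:
            out.append(ch)
    return "".join(out)


def upper_french(text: str) -> str:
    """Upper-case, then drop accents, as French capitals often are."""
    decomposed = unicodedata.normalize("NFD", upper_plain(text))
    # Combining marks (category Mn) are the separated accents and cedilla.
    bare = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return unicodedata.normalize("NFC", bare)


print(upper_plain("Åsa är här"))     # ÅSA ÄR HÄR
print(upper_french("Noël à Paris"))  # NOEL A PARIS
```

The same input string can therefore have two "correct" upper-case forms, which is why a single toupper() table cannot serve every language.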

In preparing for Latin-1, you should carefully go over your programs
to remove any instance of "high-bit used for a flag".  Also,
programs such as grep that let you search for "any alphabetic"
or -- worse -- "upper-case" are going to need rethinking.
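To illustrate both warnings, a small Python sketch under stated assumptions: mark and unmark stand in for a hypothetical 7-bit program that borrows bit 7 as a flag, and the UPPER/LOWER tables are one plausible set of Latin-1 character classes for a grep-like tool. They count only A-Z, a-z and the letters at 0xC0-0xFF, leaving out the ordinal indicators and the micro sign.

```python
FLAG = 0x80  # hypothetical "marked" bit that a 7-bit program might borrow


def mark(byte: int) -> int:
    """Old 7-bit trick: set the top bit to flag a character."""
    return byte | FLAG


def unmark(byte: int) -> int:
    """Strip the flag again; this silently corrupts Latin-1 letters."""
    return byte & 0x7F


# Upper case: A-Z, 0xC0-0xD6, 0xD8-0xDE (0xD7 is the multiplication sign).
UPPER = set(range(0x41, 0x5B)) | set(range(0xC0, 0xD7)) | set(range(0xD8, 0xDF))
# Lower case: a-z, 0xDF-0xF6, 0xF8-0xFF (0xF7 is the division sign;
# sharp-s 0xDF and y-umlaut 0xFF have no capital in Latin-1).
LOWER = set(range(0x61, 0x7B)) | set(range(0xDF, 0xF7)) | set(range(0xF8, 0x100))
ALPHA = UPPER | LOWER


def is_alpha(byte: int) -> bool:
    return byte in ALPHA


def is_upper(byte: int) -> bool:
    return byte in UPPER


e_acute = "é".encode("latin-1")[0]       # 0xE9
print(hex(unmark(mark(e_acute))))        # 0x69, a plain 'i'
print(is_alpha(e_acute), is_upper(e_acute))
```

Flagging and unflagging e-acute hands back a plain 'i', and a flagged plain 'e' (0x65 | 0x80 = 0xE5) is indistinguishable from a-ring. Meanwhile is_alpha(0xE9) is true where a 7-bit range test would say no.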

Hoping the above hasn't been too incorrect,

Martin Minow
decvax!minow

kay@warwick.UUCP (Kay Dekker) (02/05/86)

In article <163@decvax.UUCP> minow@decvax.UUCP (Martin Minow) writes,
apologising in advance for inexactitudes, as he is doing it from memory,
and I (little smartass!) step in for a couple of corrections and a swipe
at my antepenultimately loathed editor:


>  Y-umlaut and y-umlaut (French, also used for the ij ligature in Dutch)

Since when has French used umlaute?

>For example, lower-case accented letters
>generally lose their accents in French

*Upper*-case letters in French usually lose their accents.

>In preparing for Latin-1, you should carefully go over your programs
>to remove any instance of "high-bit used for a flag".

Ho Boy!  isn't vi going to need rewriting...

						Kay.
-- 
Virtue is its own punishment.
			... mcvax!ukc!warwick!kay

mikeb@inset.UUCP (Mike Banahan) (02/05/86)

In article <402@snow.warwick.UUCP> kay@warwick.UUCP (Kay Dekker) writes:
> .. Isn't vi going to need rewriting
(to remove use of the eighth bit)
>						Kay.

Doesn't it just! But the work is under way as we speak. You should see
the Japanese vi that UNIX Pacific have done (though they call it jvi).
-- 
Mike Banahan, Technical Director, The Instruction Set Ltd.
mcvax!ukc!inset!mikeb

goudreau@dg_rtp.UUCP (Bob Goudreau) (02/07/86)

>Since when has French used umlaute?

For quite a long time.  For example, "Citro\:en", "Saint-Sa\:ens", "No\:el",
where "\:e" stands for umlaut-e.

Bob Goudreau

urban@spp2.UUCP (Mike Urban) (02/08/86)

In article <402@snow.warwick.UUCP> kay@warwick.UUCP (Kay Dekker) writes:
>
>>In preparing for Latin-1, you should carefully go over your programs
>>to remove any instance of "high-bit used for a flag".
>
>Ho Boy!  isn't vi going to need rewriting...
>
And everything else.  But is Latin-1 really suitable as a replacement
for any particular national character set?  In particular, it doesn't
collate correctly for any single country's alphabetization scheme,
except of course the English-speakers.  And just
think of the rewrites for "isalpha" and all that stuff... 

I think we have a mess on our hands.
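Mike's collation point can be seen with a short Python sketch. The sort keys are hypothetical and deliberately simplified: Swedish is assumed to place å, ä, ö after z in that order, German (dictionary style) is assumed to file ä, ö, ü with a, o, u and ß as ss, and only lower-case words are handled.

```python
SWEDISH_ORDER = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ORDER)}

GERMAN_FOLD = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}


def byte_key(word: str) -> bytes:
    """What strcmp() on Latin-1 bytes effectively compares."""
    return word.encode("latin-1")


def swedish_key(word: str) -> list:
    # Letters outside the table sort after every known letter (crude fallback).
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ORDER)) for ch in word]


def german_key(word: str) -> tuple:
    folded = "".join(GERMAN_FOLD.get(ch, ch) for ch in word)
    # Tie-break on the original spelling so "apfel" and "äpfel" stay distinct.
    return (folded, word)


print(sorted(["ärlig", "åsna", "zebra"], key=byte_key))
print(sorted(["ärlig", "åsna", "zebra"], key=swedish_key))
print(sorted(["zucker", "äpfel", "apfel"], key=byte_key))
print(sorted(["zucker", "äpfel", "apfel"], key=german_key))
```

Byte order gives zebra, ärlig, åsna (0xE4 sorts before 0xE5), whereas the Swedish key gives zebra, åsna, ärlig. For the German words, byte order pushes äpfel after zucker; the German key files it right after apfel.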

-- 

   Mike Urban
	...!trwrb!trwspp!spp2!urban 

"You're in a maze of twisty UUCP connections, all alike"

minow@decvax.UUCP (Martin Minow) (02/10/86)

(sigh)

Having started this mess, let me state that the two dots over
vowels (and perhaps y) fulfill different roles (and have different
names).   They define totally distinct vowels in the Scandinavian
and Finnish languages, a vowel modification in German, and a
syllable boundary in English and French.

The technical term for the two dots is "dieresis".

Since I can't spell dieresis (had to look it up) and assumed
the gentle reader would understand (or not care), I used a
more familiar term.  My apologies.

Dieresis is used in English and French to indicate
a syllable break.  Proper journals, such as the New York
Times and the New Yorker, add a dieresis to the second 'o'
of "cooperate" and most readers should be familiar with
"Noel" (Christmas) spelled with dieresis over the 'e'.

Hope this clears things up.

Martin Minow
decvax!minow

taylor@glasgow.glasgow.UUCP (Jem Taylor) (02/10/86)

In article <133@dg_rtp.UUCP> goudreau@dg_rtp.UUCP (Bob Goudreau) writes:
>>Since when has French used umlaute?
>For quite a long time.  For example, "Citro\:en", "Saint-Sa\:ens", "No\:el",
>where "\:e" stands for umlaut-e.

The point is that 'umlaut' is the German for a mark placed on a vowel to
indicate a vowel+letter-e combination - as in Go:ring/Goering.

In French the symbol 'trema' (visually identical to umlaut) is used on the
letters i and e to indicate that the sound is broken in two (Noe:l) rather
than flowing (Noel, pronounced as per knoll).
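The distinction matters whenever Latin-1 text must be squeezed back into 7-bit ASCII. A minimal Python sketch, assuming a hypothetical function fallback_spelling and a tiny German table: the umlaut expands to vowel+e as described above, while the French trema (like other accents) is simply dropped.

```python
import unicodedata

# Hypothetical table: German umlauts spell out as vowel+e; sharp-s as "ss".
GERMAN_FALLBACK = {"ä": "ae", "ö": "oe", "ü": "ue",
                   "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def fallback_spelling(text: str, lang: str) -> str:
    """Spell Latin-1 text with fewer diacritics; lang is "de" or anything else."""
    out = []
    for ch in text:
        if lang == "de" and ch in GERMAN_FALLBACK:
            # German umlaut: the two dots stand for a following e.
            out.append(GERMAN_FALLBACK[ch])
            continue
        # Otherwise (e.g. French trema, acute, grave, cedilla) drop the
        # combining marks left after canonical decomposition. Letters with
        # no decomposition, such as o-slash or ae, pass through unchanged.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if unicodedata.category(c) != "Mn"))
    return "".join(out)


print(fallback_spelling("Göring", "de"))  # Goering
print(fallback_spelling("Noël", "fr"))    # Noel
```

The same two dots thus need opposite treatment: expanding the French trema to "Noeel" would be as wrong as flattening Göring to "Goring".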

"Vive l'Alsace libre!"

-Jem

kay@warwick.UUCP (Kay Dekker) (02/12/86)

I asked (following a posting about the ISO Latin 1 alphabet):

>>Since when has French used umlaute?

and Bob Goudreau replied, saying:

>For quite a long time.  For example, "Citro\:en", "Saint-Sa\:ens", "No\:el",
>where "\:e" stands for umlaut-e.

Err, those aren't umlaute (well, at least not in my book); they're
diaereses: marks to indicate that adjacent vowels should be pronounced
separately.  I believe my question still stands.

							Kay.



-- 
Virtue is its own punishment.
			... mcvax!ukc!warwick!kay

goudreau@dg_rtp.UUCP (Bob Goudreau) (02/17/86)

In article <360@glasgow.glasgow.UUCP> taylor@glasgow.UUCP (Jem Taylor) writes:
>>>Since when has French used umlaute?
>>For quite a long time.  For example, "Citro\:en", "Saint-Sa\:ens", "No\:el",
>>where "\:e" stands for umlaut-e.
>
>The point is that 'umlaut' is the German for a mark placed on a vowel to
>indicate a vowel+letter-e combination - as in Go:ring/Goering.
>
>In French the symbol 'trema' (visually identical to umlaut) is used on the
>letters i and e to indicate that the sound is broken in two (Noe:l) rather
>than flowing ( Noel, pronounced as per knoll ).
>
>"Vive l'Alsace libre!"
>
>-Jem

Actually, my point is that *any* information system's implementation of a French
character set *should* include a way of generating this character.  Whether
you want to call it "e avec trema", "umlaut - e" or even "yo" (as in Russian)
makes no difference.  The important issue is its distinction from plain "e"
or even from similar (but not identical) looking accents like the Hungarian
dieresis.
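In Latin-1 terms, e-with-trema does get a byte of its own, distinct from plain e, and the same two-dot mark serves the German o-umlaut; the Hungarian form Michael describes below (usually called the double acute) has no Latin-1 code at all. A small Python sketch to check this; latin1_code is a hypothetical helper.

```python
import unicodedata
from typing import Optional


def latin1_code(ch: str) -> Optional[int]:
    """Return the Latin-1 byte for a single character, or None if it has none."""
    try:
        return ch.encode("latin-1")[0]
    except UnicodeEncodeError:
        return None


for ch in ["e", "ë", "ö", "ő"]:
    code = latin1_code(ch)
    where = f"0x{code:02X}" if code is not None else "not in Latin-1"
    # unicodedata.name gives the standard character name, e.g. "... WITH DIAERESIS".
    print(ch, unicodedata.name(ch), where)
```

Here latin1_code("ë") is 0xEB and latin1_code("ö") is 0xF6, while latin1_code("ő") (LATIN SMALL LETTER O WITH DOUBLE ACUTE) is None, so a Latin-1 system can keep the French and German letters apart from plain vowels but cannot represent the Hungarian long vowel at all.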

Bob Goudreau

wagner@utcs.uucp (Michael Wagner) (02/23/86)

In article <163@dg_rtp.UUCP> goudreau@dg_rtp.UUCP (Bob Goudreau) writes:
> (...)   The important issue is its distinction from plain "e"
>or even from similar (but not identical) looking accents like the Hungarian
>dieresis.
>
>Bob Goudreau

Well, my Hungarian dictionary, having been written to enable Hungarians
to learn English rather than to enable me to understand Hungarian, doesn't
give the proper name for these symbols.  But there are two of them.
One which looks like an umlaut (although it has a different name), and
one where the two dots are stretched into lines that slope up and to the
right.  The second form lengthens the vowel but otherwise keeps it sounding
like the umlaut form.

Michael

goudreau@dg_rtp.UUCP (Bob Goudreau) (02/26/86)

In article <1118@utcs.uucp> wagner@utcs.UUCP (Michael Wagner) writes:
>In article <163@dg_rtp.UUCP> goudreau@dg_rtp.UUCP (Bob Goudreau) writes:
>> (...)   The important issue is its distinction from plain "e"
>>or even from similar (but not identical) looking accents like the Hungarian
>>dieresis.
>
>Well, my Hungarian dictionary, having been written to enable Hungarians
>to learn English rather than to enable me to understand Hungarian, doesn't
>give the proper name for these symbols.  But there are two of them.
>One which looks like an umlaut (although it has a different name), and
>one where the two dots are stretched into lines that slope up and to the
>right.  The second form lengthens the vowel but otherwise keeps it sounding
>like the umlaut form.
>
>Michael

That's what I meant.  The distinction between these accents is sometimes
lost by non-Hungarian readers.

Bob Goudreau