[comp.std.internat] ASCII for national characters

sommar@enea.se (Erland Sommarskog) (11/19/89)

(This is hardly news for comp.std.internat readers, but the
subject belongs to that group.)

Salmela Jarmo (js@kaarne.tut.fi) writes:
>PS. The ASCII standard that supports national characters is really
>needed.

Well, ASCII supports all national characters it can think of.
I.e., American.

But seriously, it exists. The standard you want is ISO 8859,
which is a family of eight-bit standards, all with good all
ASCII in the 0-127 slots, new control characters in 128-159,
non-break space in 160 and "soft hyphen" in ord('-') + 128.
The rest differs among the various standards: there
are five standards with Latin characters, and one each with
Cyrillic, Arabic, Hebrew and Greek characters. I don't know if
all of them are settled, but at least Latin-1 and Latin-2 are.
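
The layout described above can be checked with a short Python sketch. It
assumes that Python's built-in "latin-1" codec is a fair stand-in for
ISO 8859-1; the helper name latin1_char is made up for this illustration.
Note that the codec assigns the C1 controls to 128-159, whereas the standard
itself leaves those positions to other standards.

```python
import unicodedata


def latin1_char(code: int) -> str:
    """Decode a single byte value (0-255) as ISO 8859-1 (hypothetical helper)."""
    return bytes([code]).decode("latin-1")


# Slots 0-127 are plain ASCII in every part of ISO 8859.
ascii_part = bytes(range(128))
same_as_ascii = ascii_part.decode("latin-1") == ascii_part.decode("ascii")

# Slots 128-159 decode to control characters (Unicode category "Cc").
c1_controls = all(
    unicodedata.category(latin1_char(code)) == "Cc" for code in range(128, 160)
)

# Non-break space sits in 160, soft hyphen in ord('-') + 128.
nbsp_name = unicodedata.name(latin1_char(160))
shy_code = ord("-") + 128
shy_name = unicodedata.name(latin1_char(shy_code))

print(same_as_ascii, c1_controls)
print(160, nbsp_name)
print(shy_code, shy_name)
```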

One can predict that for the next few years Latin-1 will be the
most important since it covers all major Western European languages
except Welsh and Catalan, I think. Latin-2 covers Eastern European
languages.

Then of course there is the problem of posting Usenet articles
from your VT320 using Latin-1. People with seven-bit terminals,
of which there probably are a few, will get the new characters
folded into old ones, making your text quite incomprehensible, even
worse than those brackets and braces you get using the national
seven-bit conventions for dotted "a":s and "o":s.
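
To see what "folded into old ones" looks like, here is a minimal Python
sketch. It assumes the seven-bit path simply clears the eighth bit of every
byte; the function name fold_to_seven_bits is invented for the example.

```python
def fold_to_seven_bits(data: bytes) -> bytes:
    """Clear the eighth bit of every byte, as a seven-bit channel might."""
    return bytes(b & 0x7F for b in data)


original = "smörgåsbord"
latin1_bytes = original.encode("latin-1")
folded = fold_to_seven_bits(latin1_bytes).decode("ascii")

# o-umlaut (0xF6) lands on 'v' (0x76) and a-ring (0xE5) lands on 'e' (0x65).
print(original, "->", folded)
```

The graphic characters in 160-255 land on 32-127, so nearly all of them
become ordinary printable ASCII characters, just the wrong ones.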
-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se

heimir@rhi.hi.is (Heimir Thor Sverrisson) (11/20/89)

sommar@enea.se (Erland Sommarskog) writes:

... deleted description of the eight bit character set standard, ISO 8859
(especially ISO 8859/1 or Latin-1).

>Then of course there is the problem of posting Usenet articles
>from your VT320 using Latin-1. People with seven-bit terminals,
>of which there probably are a few, will get the new characters
>folded into old ones, making your text quite incomprehensible, even
>worse than those brackets and braces you get using the national
>seven-bit conventions for dotted "a":s and "o":s.

People with seven bit terminals can put filters on their news readers
so they get something meaningful out of the eight bit characters. They
could for example translate the upper case Icelandic thorn into 'Th'
and 'o acute' into 'o'. Then I would be able to use my middle name
SPELLED CORRECTLY in my signature. I could also send you direct mail
in Danish and you could answer me in Swedish.
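
Such a filter could look roughly like the Python sketch below. The table
FALLBACKS and the function filter_for_seven_bit are hypothetical names, the
table covers only a handful of letters, and the input is assumed to be
ISO 8859-1 bytes. Anything without an entry is shown as '?'.

```python
# Seven-bit stand-ins for a few Latin-1 letters (illustrative, not complete).
FALLBACKS = {
    "Þ": "Th", "þ": "th",
    "Ó": "O", "ó": "o",
    "Æ": "AE", "æ": "ae",
    "Ø": "OE", "ø": "oe",
    "Å": "AA", "å": "aa",
}


def filter_for_seven_bit(raw: bytes) -> str:
    """Turn Latin-1 bytes into text that a seven-bit terminal can show."""
    out = []
    for ch in raw.decode("latin-1"):
        if ord(ch) < 128:
            out.append(ch)                      # plain ASCII passes through
        else:
            out.append(FALLBACKS.get(ch, "?"))  # known letters get a stand-in
    return "".join(out)


article = "Heimir Þór Sverrisson".encode("latin-1")
print(filter_for_seven_bit(article))
```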

We have been using the ISO set here in Iceland for some years now and
I'm very surprised at how far behind the Scandinavian countries are in
this sense, they all seem to be using (their own special version of)
seven bit modified ASCII sets.
--
Heimir Thor Sverrisson
heimir@rhi.hi.is

minow@mountn.dec.com (Martin Minow) (11/20/89)

In article <472@enea.se> sommar@enea.se (Erland Sommarskog) writes:
>
>Salmela Jarmo (js@kaarne.tut.fi) writes:
>>PS. The ASCII standard that supports national characters is really
>>needed.
>
>Well, ASCII supports all national characters it can think of.
>I.e, American.

ASCII is, strictly speaking, the "national character set" for
the United States.  It's one of a family of "national character
sets" standardized under ISO-646.  National standardization
authorities are empowered to define 12 (if I remember correctly)
of the character positions to suit their country's needs.  For
example, the United Kingdom replaces the "number sign" by "Pound
Sterling", while the Scandinavian countries define the character
positions past 'Z' and 'z' to support their national letters.
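
As an illustration of such a national replacement set, here is a small
Python sketch of the six letter positions in the Swedish seven-bit
convention. The assignment below is written from memory and should be
checked against the national standard; the names SWEDISH_646,
to_swedish_646 and as_seen_on_swedish_terminal are made up for the example.

```python
# National letter -> ASCII character occupying the same seven-bit position
# (assumed Swedish assignment: [ \ ] after 'Z' and { | } after 'z').
SWEDISH_646 = {"Ä": "[", "Ö": "\\", "Å": "]", "ä": "{", "ö": "|", "å": "}"}

TO_SEVEN_BIT = str.maketrans(SWEDISH_646)
FROM_SEVEN_BIT = str.maketrans({v: k for k, v in SWEDISH_646.items()})


def to_swedish_646(text: str) -> str:
    """What Swedish text looks like when read on a US-ASCII terminal."""
    return text.translate(TO_SEVEN_BIT)


def as_seen_on_swedish_terminal(ascii_text: str) -> str:
    """What US-ASCII text (C source, say) looks like on a Swedish terminal."""
    return ascii_text.translate(FROM_SEVEN_BIT)


print(to_swedish_646("smörgåsbord"))
print(as_seen_on_swedish_terminal("a[i] = {0};"))
```

The second call shows why programmers dislike the scheme: the same code
position cannot be both a bracket and a letter.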

VT200/VT300 compatible terminals generally support about a dozen
different national replacement sets.

The standardization bodies realized in the early 1980's that ISO-646
was not a satisfactory solution, and built on the Dec Multinational
character set to form ISO Latin-1, along with a structure that will
define a family of 96-character supplemental sets.  (ISO/ECMA has
standardized about a dozen sets for Slavic, Lappish, Greek, and Hebrew,
among others.)

There are long-term plans to develop a 32-bit "universal" character set
that can be used to communicate among all written languages.  Much of
that space will be used for the Asian ideographic languages (China,
Taiwan, Korea, and Japan).  ISO 10646 is the working title of that
standard.  (No, you won't have to buy more memory: there will be
control sequences to let you select a slice of the character set space.)

Hope this clarifies matters.  This note does not represent the position
of Digital Equipment Corporation.

Martin Minow
minow@thundr.enet.dec.com

torkil@psivax.UUCP (Torkil Hammer) (11/21/89)

In article <472@enea.se> sommar@enea.se (Erland Sommarskog) writes:
#(This is hardly news for comp.std.internat readers, but the
#subject belongs to that group.)
#
#Salmela Jarmo (js@kaarne.tut.fi) writes:
#>PS. The ASCII standard that supports national characters is really
#>needed.
#
#Well, ASCII supports all national characters it can think of.
#I.e, American.
#
#But seriously, it exists. The standard you want is ISO 8859,
#which is a family of eight-bit standards, all with good all
#ASCII in the 0-127 slots, new control characters in 128-159,
#non-break space in 160 and "soft hyphen" in ord('-') + 128.
#The rest differs among the various standards: there
#are five standards with Latin characters, and one each with
#Cyrillic, Arabic, Hebrew and Greek characters. I don't know if
#all of them are settled, but at least Latin-1 and Latin-2 are.

What I read is that the ISO got botched.  Some national letters
were overlooked, including the slashed o used in Danish and Norwegian
for the umlaut o, written o: in other European languages.
It does not help that the upper case variety of that letter is rather
close to the slashed zero used in USA to tell it from the letter O.
Danes are not likely to tolerate the o: as a substitute, and I doubt
Norwegians are.  WW2 and 1905 and such.

Can anybody confirm?

Torkil Hammer

rlee@weaver.ads.com (Richard Lee) (11/21/89)

Erland Sommarskog writes:

  The standard you want is ISO 8859, which is a family of eight-bit
  standards, all with good all [sic] ASCII in the 0-127 slots, new
  control characters in 128-159, non-break space in 160 and "soft
  hyphen" in ord('-') + 128.

As far as control characters are concerned, 8859 just states that 00/00
through 01/15 (0-31) and 07/15 through 09/15 (127-159) are non-graphic
characters not defined by the standard.  To quote from ISO 8859-1: 1987
(E):  "Their use is outside the scope of ISO 8859; it is specified in
other International Standards, for example ISO 646 or ISO 6429."  Or has
that changed since 1987?
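
The column/row notation used in the standard (e.g. 07/15) converts to the
familiar decimal code as column * 16 + row. A minimal Python sketch, with
position_to_code as a made-up helper name:

```python
def position_to_code(position: str) -> int:
    """Convert an ISO-style 'column/row' position such as '09/15' to 0-255."""
    column, row = (int(part) for part in position.split("/"))
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError(f"column and row must be in 0..15: {position!r}")
    return column * 16 + row


for pos in ("00/00", "01/15", "07/15", "09/15"):
    print(pos, "=", position_to_code(pos))
```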

--
 RICHARD LEE     rlee@ads.com  or  ...!{sri-spam | ames}!zodiac!rlee
 415-960-7300    ADS, 1500 Plymouth St., Mtn. View CA 94043-1230

psv@nada.kth.se (Peter Svanberg) (11/21/89)

In article <1353@krafla.rhi.hi.is>
heimir@rhi.hi.is (Heimir Thor Sverrisson) writes:
>
>People with seven bit terminals can put filters on their news readers
>so they get something meaningful out of the eight bit characters. They
>could for example translate the upper case Icelandic thorn into 'Th'
>and 'o acute' into 'o'. Then I would be able to use my middle name
>SPELLED CORRECTLY in my signature. I could also send you direct mail
>in Danish and you could answer me in Swedish.
>

As usual, when you change fundamental things like this, you must make
it as invisible as possible for everybody who hasn't got the equipment
for or isn't interested in the improvements you can get as a
consequence of the change. So, those who want the improvements are the
ones who must make an effort to GET them, not everybody else to AVOID
them (at least not when "everybody else" is in the great majority).

>We have been using the ISO set here in Iceland for some years now and
>I'm very surprised at how far behind the Scandinavian countries are in
>this sense, they all seem to be using (their own special version of)
>seven bit modified ASCII sets.

There are a number of problems with converting to use an eight bit
character set. A large one is that most of the software and hardware
we use doesn't know anything about it. (Yes, this is slowly changing
now, but it isn't good yet, and certainly was not several years ago!)

What did you use before? Have you really converted to ISO 8859-1
everywhere in Iceland? On which operating systems?

Other differences between us and you are that you have more non-ASCII
characters than we have and that you - being a small isolated country
- are very caring of your language etc. (For us it's rather the
opposite on the latter point.)

But, as I said, things are changing. I predict some character set
confusion (of another kind than the current) in Europe in the next few
years, followed by - comparatively - calm, in perhaps five years.
---
psv@nada.kth.se			(should work!)	       Peter Svanberg
uunet!nada.kth.se!psv		(for lazy nodes...)    Dept of Num An & CS
psv%nada.kth.se@uunet.uu.net	(ARPA nodes)	       Royal Institute of Tech
						       Stockholm, SWEDEN

ok@mudla.cs.mu.OZ.AU (Richard O'Keefe) (11/21/89)

In article <2942@psivax.UUCP>, torkil@psivax.UUCP (Torkil Hammer) writes:
> What I read is that the ISO got botched.  Some national letters
> were overlooked, including the slashed o used in Danish and Norwegian
> for the umlaut o, written o: in other European languages.

In ISO 8859/1,
	D8 (hex) = 216 (decimal) = upper-case-O-with-a-slash-through-it
	F8 (hex) = 248 (decimal) = lower-case-o-with-a-slash-through-it

> Danes are not likely to tolerate the o: as a substitute, and I doubt
> Norwegians are.

If they use ISO 8859/1, they don't have to use o: as a substitute.
(Now if only 8859/1 had included 66--99 and 6--9 quotation marks...)
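
Both points are easy to check with Python's "latin-1" codec, assuming it
matches ISO 8859/1 for the graphic characters. The helper in_latin1 is an
invented name.

```python
def in_latin1(ch: str) -> bool:
    """True if the character has a code position in Latin-1."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# The slashed O's are present at hex D8 and F8.
slashed = {code: bytes([code]).decode("latin-1") for code in (0xD8, 0xF8)}
print(slashed)

# The 66/99 and 6/9 quotation marks are not.
curly_quotes = ["\u201c", "\u201d", "\u2018", "\u2019"]
print([in_latin1(q) for q in curly_quotes])
```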

minow@mountn.dec.com (Martin Minow) (11/22/89)

In article <2942@psivax.UUCP> torkil@psivax.UUCP (Torkil Hammer) states
that slashed-O was omitted from ISO 8859-1.  Actually, upper- and
lower-case variants are in Latin-1 at hex D8 and F8 respectively
(assuming Latin-1 is in the "right half" of the code space).

However, late in the development of Latin-1, the OE and oe ligature
characters were removed, and were replaced by the "multiply" and
"division" signs.  (I will not defend this decision.)

Another missing character is the Dutch ij ligature, which is imperfectly
represented by y-dieresis.  Otherwise, it seems to me that most Western
European languages are well-supported by Latin-1.  Exceptions include
Turkish and the Slavic languages written in Roman letters.

Martin Minow
minow@thundr.enet.dec.com
The above does not represent the position of Digital Equipment Corporation

minow@mountn.dec.com (Martin Minow) (11/22/89)

I received a mail request for the differences between Latin-1 and
Dec Multinational (as implemented on the VT200 and VT300 series terminals).
The following might be of interest to others.

ISO Latin-1 is almost identical to Multinational.  The "blank spots"
in Multinational were filled in, and one or two characters were changed,
possibly so Dec wouldn't have a competitive advantage.  We released
our first products with multinational in around 1983-84, during the
standardization process for Latin-1.  Both tables are in the VT300
documentation.  Here is how to convert Multinational to Latin-1,
(assuming that Latin-1/Multinational is in the right-half of the 8-bit
code space):

  A4	add currency symbol
  A6	add broken vertical bar
  A8	remove currency symbol, add dieresis
  AC	add logical not symbol
  AD	add small dash (soft hyphen)
  AE	add "registered" symbol (R inside a circle)
  AF	add macron (raised horizontal line)
  B4	add acute accent
  B8	add cedilla (comma, centered in the display area)
  BE	add 3/4
  D0	add Icelandic capital D-
  D7	remove OE, add multiplication sign
  DD	remove Y-dieresis, add Y with acute accent
  DE	add Icelandic capital Thorn (looks like Greek theta)
  F0	add Icelandic lower-case d-
  F7	remove oe, add division sign
  FD	remove y-dieresis, add y with acute accent
  FE	add Icelandic lower-case thorn
  FF	add y-dieresis (exists in lower-case only)

This is, of course, not an official list; and I apologize for any errors.
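
For comparison, the Latin-1 side of the list can be printed from Python's
character database. This is only a sketch: it assumes the "latin-1" codec
matches the published standard, and it says nothing about what Multinational
had in those positions. CHANGED_POSITIONS and latin1_name are names invented
for the example.

```python
import unicodedata

# The code positions mentioned in the list above.
CHANGED_POSITIONS = [
    0xA4, 0xA6, 0xA8, 0xAC, 0xAD, 0xAE, 0xAF, 0xB4, 0xB8, 0xBE,
    0xD0, 0xD7, 0xDD, 0xDE, 0xF0, 0xF7, 0xFD, 0xFE, 0xFF,
]


def latin1_name(code: int) -> str:
    """Official character name for a Latin-1 code position."""
    return unicodedata.name(bytes([code]).decode("latin-1"))


latin1_names = {code: latin1_name(code) for code in CHANGED_POSITIONS}
for code, name in latin1_names.items():
    print(f"{code:02X}  {name}")
```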

Martin Minow
minow@thundr.enet.dec.com
The above does not represent the position of Digital Equipment Corporation

dolf@idca.tds.PHILIPS.nl (Dolf Grunbauer) (11/22/89)

In article <1083@mountn.dec.com> minow@mountn.dec.com (Martin Minow) writes:
>Another missing character is the Dutch ij ligature, which is imperfectly
>represented by y-dieresis.

The 'ij' is *not* a special character in the Dutch language. It is only a
very common sequence of two characters in our language. We have the
normal (:-) 26 letter alphabet. You can find words containing 'ij' in our
dictionaries in between ..ii.. and ..ik.., so not after the 'z' (like the
special Scandinavian character in their dictionaries) or near the 'y'.
Only our telephone company (called PTT) does not know its own alphabet
and mixes the 'ij' with 'y', which is of course *very* confusing :-(, e.g.
    Meijer ... 123456
    Meyer .... 234567
    Meijers .. 345678
    Meyers ... 456789
I am aware of the fact that due to the common use of 'ij' some typewriters
and keyboards have a special key for this string of characters.
I think that the Dutch version of WordPerfect has a special character for
'ij' and this looks really nice on the output because it is (almost) as
wide as one 'm' or 'w'.

-- 
Dolf Grunbauer          Tel: +31 55 433233  Internet dolf@idca.tds.philips.nl
Philips Telecommunication and Data Systems  UUCP ....!mcvax!philapd!dolf
Dept. SSP, P.O. Box 245, 7300 AE Apeldoorn, The Netherlands
           --> Holland is only 1/6 of the Netherlands <--

magnus@rhi.hi.is (Magnus Gislason) (11/22/89)

minow@mountn.dec.com (Martin Minow) writes:

>ISO Latin-1 is almost identical to Multinational.  The "blank spots"
>in Multinational were filled in, and one or two characters were changed,
>possibly so Dec wouldn't have a competitive advantage.  We released

I think the reason why ISO did not just adopt DEC Multinational as Latin-1
is because Multinational does not include all Icelandic national characters
(Iceland is a part of Western Europe).  There wasn't room for all of them
in the "blank spots" in the upper quarter of Multinational (C0-FF).  When
DEC came up with Multinational we couldn't use it here in Iceland, so DEC
made an Icelandic version of Multinational, and of course they couldn't
put the missing Icelandic characters in the same places as in ISO Latin-1.

>  D0	add Icelandic capital D-
>  D7	remove OE, add multiplication sign
>  DD	remove Y-dieresis, add Y with acute accent
>  DE	add Icelandic capital Thorn (looks like Greek theta)
                                     ^^^^^
This should be "sounds", they look different.

>  F0	add Icelandic lower-case d-
>  F7	remove oe, add division sign
>  FD	remove y-dieresis, add y with acute accent
>  FE	add Icelandic lower-case thorn
>  FF	add y-dieresis (exists in lower-case only)

As you can see from this list the changes are mainly concerning the
Icelandic characters D-, Thorn and Y with acute accent (the Y-acute is
not used in any other Western European language, as far as I know).

	Magnus

finn@mojo.UUCP (Finn Markmanrud) (11/22/89)

Please be kind to us poor beginners! I have no ideas on how to convert ^ to Th
or anything similar. Being the only Norwegian in the company (I think), I am
pretty sure I cannot get a request through to include this on our system. Some
day I might be able to make my own conversion in my own directory, but until
then, I would appreciate being able to read mail & news from my Scandinavian
friends. Most of them use oe, ae, and aa as substitutes, and it works very 
well. We use 7-bits, and from what I hear, this is no longer any good. Am I
about to lose touch with my old country / continent?
Maybe it's not as bad as it sounds, but I thought I'd remind all you whizzes out
there that there are a few people who call themselves "users," and do just 
that - use the facilities provided. Please be gentle!

                                     


-- 
+=====================+========================+=============================+
|   Finn Markmanrud   |   finn@mojo.nec.com    |   "It can't happen here."   | 
|   (508) 264 8668    |      Boxboro, MA       |                     F.Z.    |
+=====================+========================+=============================+

donn@hpfcdc.HP.COM (Donn Terry) (11/23/89)

There are actually a bunch of candidate character sets.

ISO646:	7-bit, kinda like ASCII, one country at a time.  Each country that
	uses it has its own national variant in the "changeable" characters.

ISO8859: 8-bit, using two 96-character (or 95, depending on what you do with
	DEL) planes.
	Suitable for English plus choose 1 of Western Europe
					      Eastern (Latin) Europe
					      Cyrillic
					      Arabic
					      (Others; all "small" phonetic
					      alphabets)
	I don't remember if Eastern Europe includes Turkish or whether it's
	another case.

ISO2022: Lies on top of 646 or 8859 (or others) and defines language shifts.
	Blows away any presumption that length of string in characters ==
	length in bytes == space used in displaying text.

Various Asian national standards for the "Han" ("Chinese") character set
plus national character sets for Japan and Korea.  No unification of
these sets.

ISO10646: 32-bit everything code.  Treats the various Han character sets
	as distinct character sets for each national usage, but unifies the
	Latin characters into a single set.  Variable length coding possible
	to reduce space.  Can degenerate to (something close to) 8859.

UNICODE:  this isn't a standard but is proposed.  Unifies the Han
	character sets in the same way as the Latin ones (but with
	obviously a much bigger payback because of the size).  Fixed
	length 16 bits.  This fixes the length in characters vs.  length
	in bytes issue.  (The issue of length in display space is
	inherently harder because characters do vary in width in natural
	usage in many phonetic alphabets, as well as in the ideographic
	ones.  See Arabic and Hindi where the constant-width usage is
	considered "pretty awful", albeit readable.  (Even in English,
	good typesetting is not constant width.))

CCITT T2xx (I don't have the exact number).  Another player that I just
	recently found out about and don't know anything about in detail.
	This is "teletext", I'm told.

There are certainly more.

Donn Terry
HP Ft. Collins

heimir@rhi.hi.is (Heimir Thor Sverrisson) (11/23/89)

psv@nada.kth.se (Peter Svanberg) writes:

>>People with seven bit terminals can put filters on their news readers
>>so they get something meaningful out of the eight bit characters.

>As usual, when you change fundamental things like this, you must make
>it as invisible as possible for everybody who hasn't got the equipment
>for or isn't interested in the improvements you can get as a
>consequence of the change. So, those who want the improvements are the
>ones who must make an effort to GET them, not everybody else to AVOID
>them (at least not when "everybody else" is in the great majority).

Because of the structure of ISO 8859, the eight-bit characters will
fold into 'printable' seven-bit characters anyhow. If someone does
not change his old system to interpret the eight-bit characters, so what?
He's not interested anyway!

>>We have been using the ISO set here in Iceland for some years now and
>>I'm very surprised at how far behind the Scandinavian countries are in
>>this sense, they all seem to be using (their own special version of)
>>seven bit modified ASCII sets.

>There are a number of problems with converting to use an eight bit
>character set. A large one is that most of the software and hardware
>we use doesn't know anything about it. (Yes, this is slowly changing
>now, but it isn't good yet, and certainly was not several years ago!)

You will be surprised if you really try to use eight bit data :-)
Most systems are at least 'eight-bit transparent', i.e.  they don't
'scrub' the data to seven-bit. Unix systems that I've used that do
better than that are for example HP-UX, IBM's AIX (both RT and PS/2)
and all the Unixes for the Intel 80386 that I've tested. The worst
experience I've had recently was with a Sun 4 csh that logs you out if
you enter a character with the eighth bit set! Many software packages
now allow eight-bit data. I was just testing the Informix RDBMS on this
same Sun 4 and found out that I could really enter eight bit data into
forms, which I could not do two years ago.  We've also got some public
domain software that has been *corrected* to be able to use eight-bit
characters such as mailers, editors and news readers.

>What did you use before? Have you really converted to ISO 8859-1
>everywhere in Iceland? On which operating systems?

We did have a national version of ISO-646 that could not cover all
the accented characters we've got. The Unix systems are generally using
ISO, which is the only official Icelandic standard for eight-bit character
sets. On PC's people are using a national version of the American PC-set
(yuk) and very few have adopted Code Page 850 that came from IBM when
they introduced the PS/2 line. On the IBM-360/370 and 3X and AS400 they
are using some (different) versions of EBCDIC :-(

>Other differences between us and you are that you have more non-ASCII
>characters than we have and that you - being a small isolated country
>- are very caring of your language etc. (For us it's rather the
>opposite on the latter point.)

The first point is certainly true, our alphabet has 36 characters, which
means that we need 20 characters (uc+lc) that are not in ASCII. I would
certainly not tolerate a letter from the authorities that did not have
my name spelled correctly!
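
The arithmetic can be checked with a few lines of Python. The 36-letter
string below is an assumption for the example (the Icelandic letters plus
c, q, w and z), and it also confirms that every one of the letters has a
position in Latin-1.

```python
# 36 lower-case letters: the Icelandic alphabet plus c, q, w and z (assumed).
ALPHABET = "aábcdðeéfghiíjklmnoópqrstuúvwxyýzþæö"

# Letters outside ASCII, counted in both lower and upper case.
non_ascii = [
    form
    for letter in ALPHABET
    if ord(letter) > 127
    for form in (letter, letter.upper())
]

# Every letter, in both cases, should encode as a single Latin-1 byte.
all_in_latin1 = all(
    len(form.encode("latin-1")) == 1
    for letter in ALPHABET
    for form in (letter, letter.upper())
)

print(len(ALPHABET), len(non_ascii), all_in_latin1)
```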

>But, as I said, things are changing. I predict some character set
>confusion (of another kind than the current) in Europe in the next few
>years, followed by - comparatively - calm, in perhaps five years.

I don't think it will even take that long. All major hardware manufacturers
have made most of their terminal equipment independent of the character
set by moving functions into software that were previously done in
hardware. The European market is also the fastest growing for many
software houses and is in many cases already bigger than the US market. If
these people really want to make it over here they can solve many of
their problems by using ONE character set that covers the US, Europe and
South America!
--
Heimir Thor Sverrisson
heimir@rhi.hi.is

psv@nada.kth.se (Peter Svanberg) (11/23/89)

In article <1083@mountn.dec.com> minow@mountn.dec.com (Martin Minow) writes:
>  :
>  :
>However, late in the development of Latin-1, the OE and oe ligature
>characters were removed, and were replaced by the "multiply" and
>"division" signs.  (I will not defend this decision.)
>

Are you stating that the document I have - "International Standard
ISO 8859-1, First edition 1987-02-15" - isn't valid any more? Are there
other changes than the characters you name? (It seems strange to change
a published standard so seriously.)
---
psv@nada.kth.se			(should work!)	       Peter Svanberg
uunet!nada.kth.se!psv		(for lazy nodes...)    Dept of Num An & CS
psv%nada.kth.se@uunet.uu.net	(ARPA nodes)	       Royal Institute of Tech
						       Stockholm, SWEDEN

pedersen@philmtl.philips.ca (Paul Pedersen) (11/23/89)

In article <2942@psivax.UUCP> torkil@psivax.UUCP (Torkil Hammer) writes:
>What I read is that the ISO got botched.  Some national letters
>were overlooked, including the slashed o used in Danish and Norwegian
>for the umlaut o, written o: in other European languages.
>It does not help that the upper case variety of that letter is rather
>close to the slashed zero used in USA to tell it from the letter O.
>Danes are not likely to tolerate the o: as a substitute, and I doubt
>Norwegians are.  WW2 and 1905 and such.
>
>Can anybody confirm?
>
>Torkil Hammer

I've got ISO 8859-1:1987(E) "Latin alphabet No.1" in front of me and see
both characters you say are missing:

    pos     char
    F6      small o umlaut
    D6      big o umlaut
    F8      small o slashed
    D8      big o slashed

Did I misunderstand your question?
Paul

tml@hemuli.atk.vtt.fi (Tor Lillqvist) (11/23/89)

In article <540@ssp11.idca.tds.philips.nl> dolf@idca.tds.PHILIPS.nl (Dolf Grunbauer) writes:
>In article <1083@mountn.dec.com> minow@mountn.dec.com (Martin Minow) writes:
>>Another missing character is the Dutch ij ligature, 
>
>The 'ij' is *not* a special character in the Dutch language. It is only a
>very common sequence of two characters in our language. We have the

Well, as Martin Minow said, it is a _ligature_, which means that it is
perfectly OK to print it as "i" followed by "j", but in quality
typesetting you should use a specially designed character for the
combination.  I don't think it is necessary to include the ij ligature
in Latin-1 or similar character sets.  They don't contain the fi or
ffi ligatures, either (not to mention kerning information).

The Chicago Manual of Style says that ij should be capitalized as IJ
(for example: IJsland).  How well is this adhered to by the Dutch?
-- 
Tor Lillqvist, VTT/ATK

kkim@sparky.UUCP (kyongsok kim) (11/24/89)

In article <9300002@hpfcdc.HP.COM> donn@hpfcdc.HP.COM (Donn Terry) writes:
:ISO2022: Lies on top of 646 or 8859 (or others) and defines language shifts.
:	Blows away any presumption that length of string in characters ==
:	length in bytes == space used in displaying text.

Could you please elaborate on "on top of" and "language shifts" (possibly
using examples)?

Thanks in advance.

Kyongsok Kim
Dept. of Comp. Sci., North Dakota State University

e-mail: nukim@plains.nodak.edu; nukim@ndsuvax.bitnet; uunet!ndsuvax!nukim

dolf@idca.tds.PHILIPS.nl (Dolf Grunbauer) (11/24/89)

In article <4318@hemuli.atk.vtt.fi> tml@hemuli.atk.vtt.fi (Tor Lillqvist) writes:
-In article <540@ssp11.idca.tds.philips.nl> dolf@idca.tds.PHILIPS.nl (Dolf Grunbauer) writes:
->In article <1083@mountn.dec.com> minow@mountn.dec.com (Martin Minow) writes:
->>Another missing character is the Dutch ij ligature, 
->The 'ij' is *not* a special character in the Dutch language. It is only a
->very common sequence of two characters in our language. We have the
-Well, as Martin Minow said, it is a _ligature_, which means that it is
-perfectly OK to print it as "i" followed by "j", but in quality
-typesetting you should use a specially designed character for the
-combination.  I don't think it is necessary to include the ij ligature
-in Latin-1 or similar character sets.  They don't contain the fi or
-ffi ligatures, either (not to mention kerning information).
I think I misinterpreted the meaning of ligature :-). Quality printing will
most of the time use proportional character width so it will automatically
position the "i" and "j" very close to each other. Maybe even "fi" & "ffi"
will be printed quite nicely this way.

-The Chicago Manual of Style says that ij should be capitalized as IJ
-(for example: IJsland).  How well is this adhered to by the Dutch?
Completely, this is the way we do it. The funny thing is that the Germans
do it wrong when they talk about our city IJmuiden or lake IJsselmeer,
as they write: Ijmuiden and Ijsselmeer.
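
A capitalization routine that follows this rule is short. The Python sketch
below uses a made-up function name, dutch_capitalize, and only handles a
leading "ij"; it is an illustration, not a complete treatment of Dutch
orthography.

```python
def dutch_capitalize(word: str) -> str:
    """Capitalize a word, treating a leading 'ij' as one unit (IJ)."""
    if word[:2].lower() == "ij":
        return "IJ" + word[2:]
    return word[:1].upper() + word[1:]


for name in ("ijmuiden", "ijsselmeer", "meijer"):
    print(name.capitalize(), "->", dutch_capitalize(name))
```

A generic routine such as str.capitalize() produces the "Ijmuiden" form
described above.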
-- 
Dolf Grunbauer          Tel: +31 55 433233  Internet dolf@idca.tds.philips.nl
Philips Telecommunication and Data Systems  UUCP ....!mcvax!philapd!dolf
Dept. SSP, P.O. Box 245, 7300 AE Apeldoorn, The Netherlands
           --> Holland is only 1/6 of the Netherlands <--

rlee@weaver.ads.com (Richard Lee) (11/25/89)

In article <2382@draken.nada.kth.se> psv@nada.kth.se (Peter Svanberg) writes:

  In article <1083@mountn.dec.com> minow@mountn.dec.com (Martin Minow) writes:
  >However, late in the development of Latin-1, the OE and oe ligature
  >characters were removed, and were replaced by the "multiply" and
  >"division" signs.  (I will not defend this decision.)

  Are you stating that the document I have - "International Standard
  ISO 8859-1, First edition 1987-02-15" - isn't valid any more? Are there
  other changes than the characters you name? (It seems strange to change
  a published standard so seriously.)

Now _I'm_ confused!  My copy of that _same_ document (ISO 8859-1 First
Edition 1987-02-15; Reference number ISO 8859-1: 1987 (E)) _does_ have
the multiplication and division signs exactly as Martin described.
Quoting from Table 1, page 4: "13/07  MULTIPLICATION SIGN" and "15/07
DIVISION SIGN".

--
 RICHARD LEE     rlee@ads.com  or  ...!{sri-spam | ames}!zodiac!rlee
 415-960-7300    ADS, 1500 Plymouth St., Mtn. View CA 94043-1230

magnus@rhi.hi.is (Magnus Gislason) (11/25/89)

heimir@rhi.hi.is (Heimir Thor Sverrisson) writes:

[Talking about the Icelandic alphabet]

>The first point is certainly true, our alphabet has 36 characters, which
>means that we need 20 characters (uc+lc) that are not in ASCII. I would

You should know that the Icelandic alphabet does not include C, Q, W and Z,
and thus only contains 32 characters. :-)

einari@rhi.hi.is (Einar Indridason) (11/26/89)

In article <1383@krafla.rhi.hi.is> magnus@rhi.hi.is (Magnus Gislason) writes:
>heimir@rhi.hi.is (Heimir Thor Sverrisson) writes:
>
>[Talking about the Icelandic alphabet]
>
>>The first point is certainly true, our alphabet has 36 characters, which
>>means that we need 20 characters (uc+lc) that are not in ASCII. I would
>
>You should know that the Icelandic alphabet does not include C, Q, W and Z,
>and thus only contains 32 characters. :-)

I will most definitely not write 'pizza' as 'pissa' :-)

(Besides 'pissa' has another meaning in Icelandic as well)

But I'm really pissed off (no 'pizza' here :-) about 'Americanized' software which
does not allow us here in Iceland to use our full national character set.
For example, DBase-III does not accept the big 'thorn', but instead treats
it as an end-of-file, meaning that whatever comes after the big thorn is
ignored.

Some editors choke or perform unwanted commands, like 'kill-file',
'save-and-quit' and other nasties like that, whenever the special
Icelandic characters are used.

If there are any software-writers out there, please consider us Icelanders
(and others) who must use an 8-bit character set.

While you are doing that, could you consider adding some 'sorting tables' so
that we can sort our applications in the Icelandic way?
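
A 'sorting table' of this kind can be sketched in Python as a rank lookup.
The alphabet order below (including c, q, w and z for foreign words) and the
names ICELANDIC_ORDER, RANK and icelandic_key are assumptions for the
example; a real collation table would handle more cases.

```python
# Assumed alphabet order; accented letters are separate letters, not variants.
ICELANDIC_ORDER = "aábcdðeéfghiíjklmnoópqrstuúvwxyýzþæö"
RANK = {letter: index for index, letter in enumerate(ICELANDIC_ORDER)}


def icelandic_key(word: str):
    """Sort key: rank of each letter; unknown characters sort after the alphabet."""
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word.lower()]


words = ["þak", "zeta", "ör", "api", "æska", "bók", "ár"]

by_code_point = sorted(words)                    # what naive software does
by_alphabet = sorted(words, key=icelandic_key)   # table-driven order

print(by_code_point)
print(by_alphabet)
```

Sorting by raw code value puts every accented letter after 'z', while the
table keeps "ár" next to "api" and puts thorn before ae and o-umlaut.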

-- 
To quote Alfred E. Neuman: "What! Me worry????"

Internet:	einari@rhi.hi.is
UUCP:		..!mcvax!hafro!rhi!einari

sommar@enea.se (Erland Sommarskog) (11/26/89)

Martin Minow (minow@mountn.UUCP) writes:
>The standardization bodies realized in the early 1980's that ISO-646
>was not a satisfactory solution, and built on the Dec Multinational
>character set to form ISO Latin-1,

I'm a little surprised by this. My impression is that it was the
other way round. Digital took the drafts of Latin-1 and made
DEC Multinational. But I have no sources that confirm that.
-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se

stefan@svax.cs.cornell.edu (Kjartan Stefansson) (11/26/89)

In article <1386@krafla.rhi.hi.is> einari@rhi.hi.is (Einar Indridason) writes:
>In article <1383@krafla.rhi.hi.is> magnus@rhi.hi.is (Magnus Gislason) writes:
>>heimir@rhi.hi.is (Heimir Thor Sverrisson) writes:
>>
>>[Talking about the Icelandic alphabet]
>>
>>>The first point is certainly true, our alphabet has 36 characters, which
>>>means that we need 20 characters (uc+lc) that are not in ASCII. I would
>>
>>You should know that the Icelandic alphabet does not include C, Q, W and Z,
>>and thus only contains 32 characters. :-)

We can argue about this, but the main point is of course, that for
all practical purposes, Icelanders need to deal with those 36
characters.  For instance, every character you mention appears in
the phone directory  -- names of Icelandic people.   (although the
roots of their names are typically foreign, or poor foreign imitations :-)

>But I'm really pissed off (no 'pizza' here :-) about 'americaned' software which
>does not allow us here in Iceland to use our full national character set.
...[examples deleted]
>If there are any software writers out there, please consider us Icelanders
>(and others) who must use an 8-bit character set.

Reminds me of this fantastic software called X11.  It has several
nice fonts, including the full ISO 8859-1 set.  But applications
typically strip the most significant bit of the data, so they can
only display the English set :-(

Of course there is always a way around it, and I know Icelanders
have managed to hack their way through in several cases.  But that
simply illustrates how stupid the design was, not to make this an
option in the first place.
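
A small sketch of what stripping the most significant bit does to
Latin-1 text.  The function name strip_high_bit is made up for this
illustration; it only mimics the masking, not any particular X11
application:

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as a 7-bit-only program would."""
    return bytes(b & 0x7F for b in data)


# "það" in ISO 8859-1: thorn is 0xFE, eth is 0xF0, 'a' is plain ASCII.
original = "það".encode("latin-1")

stripped = strip_high_bit(original)

# 0xFE & 0x7F = 0x7E ('~') and 0xF0 & 0x7F = 0x70 ('p').
print(stripped.decode("ascii"))  # prints "~ap"

# Pure ASCII text is unaffected, which is why the problem goes unnoticed
# by people who only ever type English.
print(strip_high_bit(b"English") == b"English")  # prints "True"
```

Each Icelandic letter silently turns into an unrelated ASCII character,
so the damage cannot be undone once the bit is gone.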

Kjartan.

minow@mountn.dec.com (Martin Minow) (11/27/89)

In article <500@enea.se> sommar@enea.se (Erland Sommarskog) notes
that, in contrast to what I had written, Dec actually took an
early draft of Latin-1 when it came time to produce the first products
using Multinational.

My apologies for a poorly thought-out posting.

Martin.
minow@thundr.enet.dec.com

matsc@sics.se (Mats Carlsson) (11/27/89)

In article <1383@krafla.rhi.hi.is> magnus@rhi.hi.is (Magnus Gislason) writes:
   You should know that the Icelandic alphabet does not include C, Q, W and Z,
   and thus only contains 32 characters. :-)

Really?  Wasn't it quite recently that a spelling reform said words
like "yzt" should be spelled with an s instead of a z, reverting an
earlier law which banned writing s instead of z?  Didn't Halldor
Laxness even spend some time in prison for this "crime"?
--
Mats Carlsson
SICS, PO Box 1263, S-164 28  KISTA, Sweden    Internet: matsc@sics.se
Tel: +46 8 7521543      Ttx: 812 61 54 SICS S      Fax: +46 8 7517230

psv@nada.kth.se (Peter Svanberg) (11/27/89)

I wrote:
>
>  Are you stating that the document I have - "International Standard
>  ISO 8859-1, First edition 1987-02-15" - isn't valid any more?
>
rlee@weaver.ads.com (Richard Lee) answered:
>  Now _I'm_ confused!  My copy of that _same_ document (ISO 8859-1 First
>  Edition 1987-02-15; Reference number ISO 8859-1: 1987 (E)) _does_ have
>  the multiplication and division signs exactly as Martin described.
>  Quoting from Table 1, page 4: "13/07  MULTIPLICATION SIGN" and "15/07
>  DIVISION SIGN".

Sorry, I mixed it up with the discussion about slashed O.

So as it's a ligature, the same applies as in the discussion
about the ij ligature, I suppose.
---
psv@nada.kth.se					       Peter Svanberg
uunet!nada.kth.se!psv		(for lazy nodes...)    Dept of Num An & CS
psv%nada.kth.se@uunet.uu.net	(ARPA nodes)	       Royal Institute of Tech
						       Stockholm, SWEDEN

stefan@svax.cs.cornell.edu (Kjartan Stefansson) (11/27/89)

In article <MATSC.89Nov27092541@vishnu.sics.se> matsc@sics.se (Mats Carlsson) writes:
>In article <1383@krafla.rhi.hi.is> magnus@rhi.hi.is (Magnus Gislason) writes:
>   You should know that the Icelandic alphabet does not include C, Q, W and Z,
>   and thus only contains 32 characters. :-)
>
>Really?  Wasn't it quite recently that a spelling reform said words
>like "yzt" should be spelled with an s instead of a z, reverting an
>earlier law which banned writing s instead of z?

Yes, this is correct.  'z' used to be a perfectly valid Icelandic
letter.  But it is pronounced as 's' in modern Icelandic.  The only
way to distinguish between 's' and 'z' in spelling was to know the
root of the word.  A few years ago, a spelling reform was made to
replace the 'z' with an 's'.

>  Didn't Halldor
>Laxness even spend some time in prison for this "crime"?

Halldor Laxness has been known for his style of spelling, which in
general is closer to the spoken language than the official spelling.
In his early work he was criticized a lot for this, but I don't
believe he was ever imprisoned for it!

Kjartan.

eru@tnvsu1.tele.nokia.fi (Erkki Ruohtula) (11/28/89)

I have long wondered why the ISO 8-bit character set introduces 32 more
control characters, while the ANSI system of terminal controls demonstrates
that we could in principle get along with just one control character. Using
these 32 positions for printable characters would have made possible
a single set for all (or nearly all) languages that use a Latin-derived
alphabet.
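
A short sketch of the point about getting along with one control
character, assuming the usual ANSI/ISO 6429 convention that each 8-bit
control in 128-159 has a 7-bit equivalent of ESC followed by the byte
minus 0x40.  The helper name c1_to_7bit is made up for this illustration:

```python
ESC = 0x1B  # the single 7-bit control character that introduces a sequence

# The 32 positions that the 8-bit sets reserve for extra control characters.
C1_RANGE = range(0x80, 0xA0)


def c1_to_7bit(byte: int) -> bytes:
    """Return the two-byte ESC sequence equivalent to an 8-bit control."""
    if byte not in C1_RANGE:
        raise ValueError("not a control character in 128-159")
    return bytes([ESC, byte - 0x40])


print(len(C1_RANGE))      # prints 32
print(c1_to_7bit(0x9B))   # CSI becomes ESC '[', printed as b'\x1b['
```

Every one of the 32 positions can be reached this way through ESC alone,
at the cost of one extra byte per control.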
Erkki Ruohtula     / Nokia Telecommunications             !
eru@tele.nokia.fi /  P.O. Box 33 SF-02601 Espoo, Finland  !
Huomautus : Esitt{m{ni mielipiteet ovat vain omiani.      !
Disclaimer: The opinions I have presented are just my own.!