[soc.culture.nordic] ASCII for national characters

sommar@enea.se (Erland Sommarskog) (11/19/89)

(This is hardly news for comp.std.internat readers, but the
subject belongs to that group.)

Salmela Jarmo (js@kaarne.tut.fi) writes:
>PS. The ASCII standard that supports national characters is really
>needed.

Well, ASCII supports all national characters it can think of.
I.e., American.

But seriously, it exists. The standard you want is ISO 8859,
which is a family of eight-bit standards, all with good old
ASCII in the 0-127 slots, new control characters in 128-159,
non-break space in 160 and "soft hyphen" in ord('-') + 128.
The rest differs between the various standards: there are
five with Latin characters, and one each with Cyrillic,
Arabic, Hebrew and Greek characters. I don't know if all of
them are settled, but at least Latin-1 and Latin-2 are.
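
A minimal sketch of that layout, using the ISO 8859 codecs that ship
with Python (the codec names are Python's, not the standard's). It
checks that each part shares ASCII in 0-127 and the same non-break
space and soft hyphen, then decodes one upper-half byte under several
parts to show where they diverge. The sample byte 0xE6 is an arbitrary
choice made for illustration.

```python
# Python codec names for six parts of ISO 8859 (Latin-1, Latin-2,
# Cyrillic, Arabic, Greek, Hebrew).
PARTS = ["iso8859_1", "iso8859_2", "iso8859_5",
         "iso8859_6", "iso8859_7", "iso8859_8"]

ascii_half = bytes(range(128))
nbsp = bytes([160])
soft_hyphen = bytes([ord("-") + 128])   # 45 + 128 = 173 = 0xAD

for part in PARTS:
    # Slots 0-127 are plain ASCII in every part.
    assert ascii_half.decode(part) == ascii_half.decode("ascii")
    # Slot 160 is the non-break space, slot 173 the soft hyphen.
    assert nbsp.decode(part) == "\u00a0"
    assert soft_hyphen.decode(part) == "\u00ad"

# The same upper-half byte stands for a different letter in each part.
sample = bytes([0xE6])
decoded = {part: sample.decode(part, errors="replace") for part in PARTS}
```

In Latin-1 the sample byte is the ligature "ae"; the other parts put a
letter of their own script in the same slot.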

One can predict that for the next few years Latin-1 will be the
most important since it covers all major Western European languages
except Welsh and Catalan, I think. Latin-2 covers Eastern European
languages.

Then of course there is a problem if you start posting Usenet
articles from your VT320 using Latin-1. People with seven-bit
terminals, of which there probably are a few, will get the new
characters folded into old ones, making your text quite
incomprehensible, even worse than those brackets and braces you
get using the national seven-bit conventions for dotted "a":s
and "o":s.
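
To make the folding concrete, here is a small sketch that clears the
eighth bit of Latin-1 text, which is roughly what a seven-bit path
does. The function name and the sample words are made up for the
illustration.

```python
def fold_to_seven_bits(data: bytes) -> bytes:
    """Clear the eighth bit of every byte, as a seven-bit path would."""
    return bytes(b & 0x7F for b in data)

# Latin-1 puts o-umlaut at 0xF6 and a-ring at 0xE5; without the top
# bit they become 0x76 ("v") and 0x65 ("e").
swedish = "smörgås".encode("latin-1")
folded_swedish = fold_to_seven_bits(swedish).decode("ascii")

# Capital thorn is 0xDE and o-acute is 0xF3; they fold to "^" and "s".
icelandic = "Þór".encode("latin-1")
folded_icelandic = fold_to_seven_bits(icelandic).decode("ascii")
```

The first word comes out as "smvrges" and the second as "^sr". The
result is still printable, but the national letters have turned into
ordinary-looking letters with nothing to mark them as substitutes.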
-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se

heimir@rhi.hi.is (Heimir Thor Sverrisson) (11/20/89)

sommar@enea.se (Erland Sommarskog) writes:

... deleted description of the eight bit character set standard, ISO 8859
(especially ISO 8859/1 or Latin-1).

>Then of course there is a problem if you start posting Usenet
>articles from your VT320 using Latin-1. People with seven-bit
>terminals, of which there probably are a few, will get the new
>characters folded into old ones, making your text quite
>incomprehensible, even worse than those brackets and braces you
>get using the national seven-bit conventions for dotted "a":s
>and "o":s.

People with seven-bit terminals can put filters on their news readers
so they get something meaningful out of the eight-bit characters. They
could for example translate the upper case Icelandic thorn into 'Th'
and 'o acute' into 'o'. Then I would be able to use my middle name
SPELLED CORRECTLY in my signature. I could also send you direct mail
in Danish and you could answer me in Swedish.
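
Such a filter could look something like the sketch below. The
replacement table is hypothetical and far from complete; only the
thorn and o-acute entries come from the suggestion above, and the
rest are one plausible set of choices.

```python
# Hypothetical replacement table; a real filter would cover all of Latin-1.
REPLACEMENTS = {
    "Þ": "Th", "þ": "th", "Ð": "D", "ð": "d",
    "Æ": "Ae", "æ": "ae", "Ø": "Oe", "ø": "oe",
    "Å": "Aa", "å": "aa", "Ä": "Ae", "ä": "ae", "Ö": "Oe", "ö": "oe",
    "Á": "A", "á": "a", "É": "E", "é": "e", "Í": "I", "í": "i",
    "Ó": "O", "ó": "o", "Ú": "U", "ú": "u", "Ý": "Y", "ý": "y",
}
TABLE = str.maketrans(REPLACEMENTS)

def seven_bit_filter(article: bytes) -> str:
    """Turn a Latin-1 article into something a seven-bit terminal can show."""
    text = article.decode("latin-1")
    text = text.translate(TABLE)
    # Anything still outside ASCII becomes "?".
    return text.encode("ascii", errors="replace").decode("ascii")
```

Feeding it the Latin-1 bytes of "Heimir Þór Sverrisson" gives "Heimir
Thor Sverrisson", and characters missing from the table show up as
"?" instead of being silently folded into some other letter.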

We have been using the ISO set here in Iceland for some years now and
I'm very surprised at how far behind the Scandinavian countries are in
this sense; they all seem to be using (their own special version of)
seven-bit modified ASCII sets.
--
Heimir Thor Sverrisson
heimir@rhi.hi.is

psv@nada.kth.se (Peter Svanberg) (11/21/89)

In article <1353@krafla.rhi.hi.is>
heimir@rhi.hi.is (Heimir Thor Sverrisson) writes:
>
>People with seven-bit terminals can put filters on their news readers
>so they get something meaningful out of the eight-bit characters. They
>could for example translate the upper case Icelandic thorn into 'Th'
>and 'o acute' into 'o'. Then I would be able to use my middle name
>SPELLED CORRECTLY in my signature. I could also send you direct mail
>in Danish and you could answer me in Swedish.
>

As usual, when you change fundamental things like this, you must make
it as invisible as possible for everybody who hasn't got the equipment
for or isn't interested in the improvements you can get as a
consequence of the change. So, those who want the improvements are the
ones who must make an effort to GET them, not everybody else to AVOID
them (at least not when "everybody else" is in great majority).

>We have been using the ISO set here in Iceland for some years now and
>I'm very surprised at how far behind the Scandinavian countries are in
>this sense; they all seem to be using (their own special version of)
>seven-bit modified ASCII sets.

There are a number of problems with converting to use an eight bit
character set. A large one is that most of the software and hardware
we use doesn't know anything about it. (Yes, this is slowly changing
now, but it isn't good yet, and certainly was not several years ago!)

What did you use before? Have you really converted to ISO 8859-1
everywhere in Iceland? On which operating systems?

Other differences between us and you are that you have more non-ASCII
characters than we have and that you - being a small isolated country
- are very caring of your language etc. (For us it's rather the
opposite on the latter point.)

But, as I said, things are changing. I predict some character set
confusion (of another kind than the current) in Europe in the next few
years, followed by - comparatively - calm, in perhaps five years.
---
psv@nada.kth.se			(should work!)	       Peter Svanberg
uunet!nada.kth.se!psv		(for lazy nodes...)    Dept of Num An & CS
psv%nada.kth.se@uunet.uu.net	(ARPA nodes)	       Royal Institute of Tech
						       Stockholm, SWEDEN

finn@mojo.UUCP (Finn Markmanrud) (11/22/89)

Please be kind to us poor beginners! I have no idea how to convert ^ to Th
or anything similar. Being the only Norwegian in the company (I think), I am
pretty sure I cannot get a request through to include this on our system. Some
day I might be able to make my own conversion in my own directory, but until
then, I would appreciate being able to read mail & news from my Scandinavian
friends. Most of them use oe, ae, and aa as substitutes, and it works very
well. We use 7 bits, and from what I hear, this is no longer any good. Am I
about to lose touch with my old country / continent?
Maybe it's not as bad as it sounds, but I thought I'd remind all you whizzes out
there that there are a few people who call themselves "users," and do just
that - use the facilities provided. Please be gentle!

                                     


-- 
+=====================+========================+=============================+
|   Finn Markmanrud   |   finn@mojo.nec.com    |   "It can't happen here."   | 
|   (508) 264 8668    |      Boxboro, MA       |                     F.Z.    |
+=====================+========================+=============================+

heimir@rhi.hi.is (Heimir Thor Sverrisson) (11/23/89)

psv@nada.kth.se (Peter Svanberg) writes:

>>People with seven-bit terminals can put filters on their news readers
>>so they get something meaningful out of the eight-bit characters.

>As usual, when you change fundamental things like this, you must make
>it as invisible as possible for everybody who hasn't got the equipment
>for or isn't interested in the improvements you can get as a
>consequence of the change. So, those who want the improvements are the
>ones who must make an effort to GET them, not everybody else to AVOID
>them (at least not when "everybody else" is in great majority).

Because of the structure of ISO 8859, the eight-bit characters will
fold into 'printable' seven-bit characters anyhow. If someone does
not change his old system to interpret the eight-bit characters, so what?
He's not interested anyway!

>>We have been using the ISO set here in Iceland for some years now and
>>I'm very surprised at how far behind the Scandinavian countries are in
>>this sense; they all seem to be using (their own special version of)
>>seven-bit modified ASCII sets.

>There are a number of problems with converting to use an eight bit
>character set. A large one is that most of the software and hardware
>we use doesn't know anything about it. (Yes, this is slowly changing
>now, but it isn't good yet, and certainly was not several years ago!)

You will be surprised if you really try to use eight bit data :-)
Most systems are at least 'eight-bit transparent', i.e.  they don't
'scrub' the data to seven-bit. Unix systems that I've used that do
better than that are for example HP-UX, IBM's AIX (both RT and PS/2)
and all the Unixes for Intel 80386 I've tested. The worst experience I've had
recently was with a Sun 4 csh that logs you out if you enter a character
with the eighth bit set! Many software packages now allow eight-bit
data. I was just testing Informix RDBS on this same Sun 4 and found
out that I could really enter eight-bit data into forms, which I could
not do two years ago.  We've also got some public domain software that
has been *corrected* to be able to use eight-bit characters such as
mailers, editors and news readers.
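
One way to check whether a piece of software is 'eight-bit
transparent' is to push every byte value through it and see what
comes back. The sketch below does that for any bytes-to-bytes
function; the two stand-in functions are invented for the example
and do not model any particular system.

```python
def altered_bytes(transform):
    """List the byte values that `transform` does not pass through unchanged."""
    return [b for b in range(256) if transform(bytes([b])) != bytes([b])]

def transparent(data: bytes) -> bytes:
    # Stand-in for a program that leaves its input alone.
    return data

def scrubbing(data: bytes) -> bytes:
    # Stand-in for a program that clears the eighth bit.
    return bytes(b & 0x7F for b in data)

clean_report = altered_bytes(transparent)   # nothing altered
scrub_report = altered_bytes(scrubbing)     # every value from 128 up
```

A real test would wrap the program under test (a mailer, an editor, a
pipe through a shell) in place of the stand-ins.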

>What did you use before? Have you really converted to ISO 8859-1
>everywhere in Iceland? On which operating systems?

We did have a national version of ISO-646 that could not cover all
the accented characters we've got. The Unix systems are generally using
ISO, which is the only official Icelandic standard for eight-bit character
sets. On PC's people are using a national version of the American PC-set
(yuk) and very few have adopted Code Page 850 that came from IBM when
they introduced the PS/2 line. On the IBM-360/370 and 3X and AS400 they
are using some (different) versions of EBCDIC :-(
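
The trouble with having several sets in use at once is that the same
letter gets a different byte value in each. A rough sketch using
Python's bundled codecs: cp861 is the Icelandic MS-DOS code page and
cp037 is one EBCDIC variant. Treating them as the sets meant above is
an assumption, and the helper name is made up.

```python
# Python codec names; which exact national and EBCDIC variants were in
# use is an assumption made for this illustration.
CODECS = ["latin-1", "cp850", "cp861", "cp037"]

thorn = "Þ"
encoded = {codec: thorn.encode(codec) for codec in CODECS}

def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode with one character set and decode with another."""
    return text.encode(written_as).decode(read_as, errors="replace")

# Text written on a Latin-1 system and displayed on a Code Page 850 PC.
garbled = misread("Þór", "latin-1", "cp850")
```

Each codec stores the thorn in one byte, but not the same byte, so a
file moved between systems without conversion comes out garbled.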

>Other differences between us and you are that you have more non-ASCII
>characters than we have and that you - being a small isolated country
>- are very caring of your language etc. (For us it's rather the
>opposite on the latter point.)

The first point is certainly true, our alphabet has 36 characters, which
means that we need 20 characters (uc+lc) that are not in ASCII. I would
certainly not tolerate a letter from the authorities that would not have
my name spelled correctly!
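
The arithmetic can be spelled out as below. The list of ten extra
letters is supplied here for illustration and is not taken from the
post itself.

```python
import string

# The ten Icelandic letters with no ASCII equivalent (lower case).
EXTRA_LOWER = "áðéíóúýþæö"
EXTRA = EXTRA_LOWER + EXTRA_LOWER.upper()

# 26 ASCII letters plus 10 extra ones.
alphabet_size = len(string.ascii_lowercase) + len(EXTRA_LOWER)
# Upper and lower case of each extra letter.
non_ascii_needed = len(set(EXTRA))

# Every one of them has a single-byte slot in Latin-1.
latin1_bytes = EXTRA.encode("latin-1")
```

This gives 36 letters and 20 non-ASCII characters, and all 20 encode
to one byte each in ISO 8859-1.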

>But, as I said, things are changing. I predict some character set
>confusion (of another kind than the current) in Europe in the next few
>years, followed by - comparatively - calm, in perhaps five years.

I don't think it will even take so long. All major hardware manufacturers
have made most of their terminal equipment independent of the character
set by moving functions into software that were previously done in
hardware. The European market is also the fastest growing for many
software houses and is in many cases already bigger than the US market. If
these people really want to make it over here they can solve many of
their problems by using ONE character set that covers the US, Europe and
South America!
--
Heimir Thor Sverrisson
heimir@rhi.hi.is

magnus@rhi.hi.is (Magnus Gislason) (11/25/89)

heimir@rhi.hi.is (Heimir Thor Sverrisson) writes:

[Talking about the Icelandic alphabet]

>The first point is certainly true, our alphabet has 36 characters, which
>means that we need 20 characters (uc+lc) that are not in ASCII. I would

You should know that the Icelandic alphabet does not include C, Q, W and Z,
and thus only contains 32 characters. :-)

einari@rhi.hi.is (Einar Indridason) (11/26/89)

In article <1383@krafla.rhi.hi.is> magnus@rhi.hi.is (Magnus Gislason) writes:
>heimir@rhi.hi.is (Heimir Thor Sverrisson) writes:
>
>[Talking about the Icelandic alphabet]
>
>>The first point is certainly true, our alphabet has 36 characters, which
>>means that we need 20 characters (uc+lc) that are not in ASCII. I would
>
>You should know that the Icelandic alphabet does not include C, Q, W and Z,
>and thus only contains 32 characters. :-)

I will most definitely not write 'pizza' as 'pissa' :-)

(Besides, 'pissa' has another meaning in Icelandic as well.)

But I'm really pissed off (no 'pizza' here :-) about 'americaned' software which
does not allow us here in Iceland to use our full national character set.
For example, DBase-III does not allow the big 'thorn', but instead treats
it as an end-of-file.  Meaning that whatever comes after the big thorn is
ignored.

Some editors choke or perform unwanted commands, like 'kill-file',
'save-and-quit' and other nasties like that, whenever the special
Icelandic characters are used.

If there are any software-writers out there, please consider us Icelanders
(and others) who must use an 8-bit character set.

While you are doing that, could you consider adding some 'sorting tables' so
that we can sort our applications in the Icelandic way. ????????????????????
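
A 'sorting table' can be as simple as a string that lists the letters
in the wanted order. The sketch below assumes the usual Icelandic
order with the four foreign letters slotted into their Latin
positions (z just before thorn); it ignores finer points such as
digits and punctuation, which simply sort after the letters.

```python
# Assumed alphabet order: 32 Icelandic letters plus c, q, w and z.
ALPHABET = "aábcdðeéfghiíjklmnoópqrstuúvwxyýzþæö"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

def icelandic_key(word: str):
    """Sort key that ranks each letter by its place in ALPHABET."""
    # Characters outside the table sort after all letters.
    return [RANK.get(ch, len(ALPHABET) + ord(ch)) for ch in word.lower()]

words = ["ör", "þak", "aska", "ás", "zeta", "æð"]
by_code_point = sorted(words)
by_alphabet = sorted(words, key=icelandic_key)
```

Sorting by raw character code puts "zeta" before "ás" and "þak" last;
with the table the order is aska, ás, zeta, þak, æð, ör.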

-- 
To quote Alfred E. Neuman: "What! Me worry????"

Internet:	einari@rhi.hi.is
UUCP:		..!mcvax!hafro!rhi!einari

stefan@svax.cs.cornell.edu (Kjartan Stefansson) (11/26/89)

In article <1386@krafla.rhi.hi.is> einari@rhi.hi.is (Einar Indridason) writes:
>In article <1383@krafla.rhi.hi.is> magnus@rhi.hi.is (Magnus Gislason) writes:
>>heimir@rhi.hi.is (Heimir Thor Sverrisson) writes:
>>
>>[Talking about the Icelandic alphabet]
>>
>>>The first point is certainly true, our alphabet has 36 characters, which
>>>means that we need 20 characters (uc+lc) that are not in ASCII. I would
>>
>>You should know that the Icelandic alphabet does not include C, Q, W and Z,
>>and thus only contains 32 characters. :-)

We can argue about this, but the main point is of course that, for
all practical purposes, Icelanders need to deal with those 36
characters.  For instance, every character you mention appears in
the phone directory  -- names of Icelandic people.   (although the
roots of their names are typically foreign, or poor foreign imitation :-)

>But I'm really pissed off (no 'pizza' here :-) about 'americaned' software which
>does not allow us here in Iceland to use our full national character set.
...[examples deleted]
>If there are any software-writers out there, please consider us Icelanders
>(and others) who must use an 8-bit character set.

Reminds me of this fantastic software called X11.  They have several
nice fonts, including the full ISO-8859-1 standard.  But typically
applications strip the most significant bit in the data, so they can
only display the English set :-(

Of course there is always a way to go around it, and I know Icelanders
have managed to hack their way through, in several cases.  But that
simply illustrates how stupid the design was, not to make this an
option in the first place.

Kjartan.

matsc@sics.se (Mats Carlsson) (11/27/89)

In article <1383@krafla.rhi.hi.is> magnus@rhi.hi.is (Magnus Gislason) writes:
   You should know that the Icelandic alphabet does not include C, Q, W and Z,
   and thus only contains 32 characters. :-)

Really?  Wasn't it quite recently that a spelling reform said words
like "yzt" should be spelled with an s instead of a z, reverting an
earlier law which banned writing s instead of z?  Didn't Halldor
Laxness even spend some time in prison for this "crime"?
--
Mats Carlsson
SICS, PO Box 1263, S-164 28  KISTA, Sweden    Internet: matsc@sics.se
Tel: +46 8 7521543      Ttx: 812 61 54 SICS S      Fax: +46 8 7517230

stefan@svax.cs.cornell.edu (Kjartan Stefansson) (11/27/89)

In article <MATSC.89Nov27092541@vishnu.sics.se> matsc@sics.se (Mats Carlsson) writes:
>In article <1383@krafla.rhi.hi.is> magnus@rhi.hi.is (Magnus Gislason) writes:
>   You should know that the Icelandic alphabet does not include C, Q, W and Z,
>   and thus only contains 32 characters. :-)
>
>Really?  Wasn't it quite recently that a spelling reform said words
>like "yzt" should be spelled with an s instead of a z, reverting an
>earlier law which banned writing s instead of z?

Yes, this is correct.  'z' used to be a perfectly valid Icelandic
letter.  But it is pronounced as 's' in modern Icelandic.  The only
way to distinguish between 's' and 'z' in spelling was to know the
root of the word.  A few years ago, a spelling reform was made to
replace the 'z' with an 's'.

>  Didn't Halldor
>Laxness even spend some time in prison for this "crime"?

Halldor Laxness has been known for his style of spelling, which in
general is closer to the spoken language than the official spelling.
In his early work he was criticized a lot for this, but I don't
believe he was ever imprisoned for it!

Kjartan.