[comp.std.unix] 8859 vs. 646

std-unix@longway.TIC.COM (Moderator, John S. Quarterman) (03/20/90)

From: Donn Terry <uunet!hpfcrn.fc.hp.com!donn>

>From: randall@uvaarpa.virginia.edu (Randall Atkinson)

>As one who is fairly active in the multilingual computing
>side of things, I'm fairly certain that it just isn't worth
>it to try to make ISO 646 the basis of *anything* for the
>practical reason that it wasn't well thought out to begin with
>and has already been superseded by the ISO 8859/* family of
>8-bit character sets.

Agreed.  I believe that the Danes and other Europeans will agree, too.

...

>I thought that trigraphs got excessive attention back when ANSI C
>was being developed and I fear that excessive attention will be
>devoted to ISO 646 when there are other areas of internationalisation
>that really deserve being thought about and solved cleanly.

Yup.... but it's also a real problem.

>Most of the vendors of hardware in Europe are supporting ISO 8859/1
>now, so it is the real long term solution to European needs anyway.
>Worrying about support for ISO 646 is a mistake; worrying about
>supporting ISO 8859/*, about full support for the larger character
>sets that Asia needs, and about ways of handling date formats and
>such isn't a mistake at all.

The problem is that reality impinges on the ideal world.  In particular
there are LOTS of 646 terminals out there.  And, as the European
participants note, they aren't going to get replaced with 8859 ones
for on the order of 10 years.  (646 also is still a lowest common
denominator: as I understand it, sendmail can't handle 8-bit (if
I'm wrong, I apologize, but you get my point)).

Thus, there is a real problem to be solved here.  I personally lean toward
some sort of many-to-one and one-to-many translation at the terminal
interface, but that doesn't always appear successful.  Add to it the
problem of not knowing whether the user is an expert or not.  (The
expert can handle | being slashed-o, but the ordinary terminal operator
probably can't.)
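
A minimal sketch of the translation idea, assuming a terminal that uses
the Danish national variant of ISO 646, where the code positions of
[ \ ] { | } display as the letters Æ Ø Å æ ø å. The table and function
names are hypothetical and not taken from any POSIX proposal; only the
six letter positions are modelled.

```python
# Hypothetical sketch: translating between ISO 8859-1 style text and a
# 7-bit terminal that uses the Danish variant of ISO 646.

# Code position on the terminal -> glyph the terminal actually displays.
DANISH_646 = {
    0x5B: "Æ", 0x5C: "Ø", 0x5D: "Å",
    0x7B: "æ", 0x7C: "ø", 0x7D: "å",
}
GLYPH_TO_CODE = {glyph: code for code, glyph in DANISH_646.items()}


def to_terminal(text: str) -> bytes:
    """Many-to-one: 'ø' and '|' both end up as byte 0x7C."""
    out = bytearray()
    for ch in text:
        code = GLYPH_TO_CODE.get(ch, ord(ch))
        if code > 0x7F:
            code = ord("?")  # no 7-bit representation on this terminal
        out.append(code)
    return bytes(out)


def from_terminal(data: bytes, national: bool) -> str:
    """One-to-many: the caller must say which reading of 0x7C is meant."""
    if national:
        return "".join(DANISH_646.get(b, chr(b)) for b in data)
    return data.decode("ascii")


print(to_terminal("øl"))                          # b'|l'
print(to_terminal("ls | wc"))                     # b'ls | wc'
print(from_terminal(b"|l", national=True))        # øl
print(from_terminal(b"ls | wc", national=False))  # ls | wc
```

The `national` flag is where the difficulty sits: nothing in the byte
stream itself says whether 0x7C was meant as a pipe or as a letter.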

Donn Terry
(No position is official, but as U.S. Rapporteur for SC22/WG15/IRG I'm
at least plugged in.)

Volume-Number: Volume 19, Number 10

randall@uvaarpa.Virginia.EDU (Randall Atkinson) (03/21/90)

From: randall@uvaarpa.Virginia.EDU (Randall Atkinson)

I understand that there are many sites that currently have
terminals supporting ISO 646, but by the same token, there
are a lot more terminals that support US ASCII and a lot of
other terminals out there that are vaguely derived from US ASCII
in a variety of incompatible ways.  My understanding is that
ISO 646 isn't a subset of all of the common 7-bit roman
character sets in use.  If that is indeed a correct understanding,
then the ISO 646 effort isn't going to provide a general solution
anyway.

These problems don't have a good general solution because of the
many conflicting extensions/modifications of what was ASCII.
Japanese and Chinese extensions are also a problem in this regard.

My own position is that the standard should not attempt to address
the ISO 646 problem but should instead leave the "workarounds" (which
is the best way to describe what I hear proposed) implementation-defined,
outside the scope of the standard.

The standard should use ISO 8859 as the base standard.

Volume-Number: Volume 19, Number 15

keld@diku.dk (Keld Jørn Simonsen) (03/23/90)

From: keld@diku.dk (Keld Jørn Simonsen)

I confess: I was the Dane attending the ISO POSIX Internationalization
meeting in Copenhagen. Yes, we drew attention to non-ASCII equipment
based on ISO 646, for which ISO has general guidelines.

I do share the other posters' concern about supporting 8-bit
and multibyte character sets, and bringing support to this
is more important to us (Danish Standards) than the 7-bit issue.

On the other hand, there is a lot of hardware, including terminals
and printers, which only supports national variants
of ISO 646. And that equipment will be around for a long time.

For Americans: try to imagine that all your 7-bit ASCII equipment
was not usable for running UNIX or C because it lacked, say, 6 to 10
essential characters. How long would it take before you had only
8-bit equipment and software running?
Well, this is the situation we have in quite a few parts of Europe.
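
To make the scale concrete, here is a rough sketch that lists which
characters of a small C fragment fall on the twelve code positions
that, as far as I understand the standard, ISO 646 leaves open to
national variants. The sample fragment and the names are illustrative
assumptions.

```python
# Characters whose ISO 646 code positions may be reassigned by national
# variants: # $ @ [ \ ] ^ ` { | } ~
NATIONAL_USE = set("#$@[\\]^`{|}~")

# Illustrative C fragment (an assumption, chosen to use common syntax).
C_SOURCE = r'''
#include <stdio.h>
int main(void) {
    char buf[4] = "a\n";
    return (buf[0] | 1) ^ ~0;
}
'''

# Collect the distinct characters of the fragment that a national
# variant might display as something else.
affected = sorted({ch for ch in C_SOURCE if ch in NATIONAL_USE})
print(len(affected), "affected characters:", " ".join(affected))
# 9 affected characters: # [ \ ] ^ { | } ~
```

Even this five-line fragment touches nine of the twelve positions,
which matches the "6 to 10" estimate above.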

ISO has rules for dealing with this. I think it would be worth
it to try out the ISO recommendations on a software
platform as important to the whole society as POSIX is.

Keld Simonsen

Volume-Number: Volume 19, Number 21

ruediger@ramz.uucp (Ruediger Helsch) (04/06/90)

From: uunet!relay.EU.net!ramz!ruediger (Ruediger Helsch)

In article <579@longway.TIC.COM> std-unix@uunet.uu.net writes:
>From: Donn Terry <uunet!hpfcrn.fc.hp.com!donn>
>The problem is that reality impinges on the ideal world.  In particular
>there are LOTS of 646 terminals out there.  And, as the European
>participants note, they aren't going to get replaced with 8859 ones
>for on the order of 10 years.  (646 also is still a lowest common
>denominator: as I understand it, sendmail can't handle 8-bit (if
>I'm wrong, I apologize, but you get my point)).

IMHO that's just not true any more. A large share of the common terminals in
Germany are of the VT220 style, and though they are not 8859 compatible,
they are close enough for many purposes. 8859 and the DEC multinational
character set differ mainly in the special characters section. For German
letters there is no difference between the two, and the same holds for most
European letters. When we are looking for terminals, we don't consider those
7-bit oldies.
For PCs under some Unix variants you can map characters on output to the
screen. E.g. under Xenix we work with 8859 internally and map the characters
to the IBM-PC character set on output. Works great!
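
The output mapping can be sketched roughly as follows. This is not the
Xenix mechanism itself, only an illustration that uses Python's
standard codecs, assuming "IBM-PC character set" means code page 437;
the function name is hypothetical.

```python
# Sketch of output mapping: text held internally as ISO 8859-1 is
# re-encoded for a display that uses the IBM PC character set
# (code page 437). Characters with no equivalent become '?'.

def to_ibm_pc(latin1_bytes: bytes) -> bytes:
    text = latin1_bytes.decode("iso-8859-1")
    return text.encode("cp437", errors="replace")


internal = "Größe".encode("iso-8859-1")   # b'Gr\xf6\xdfe'
print(to_ibm_pc(internal))                # b'Gr\x94\xe1e'
```

The German letters all have a slot in code page 437, so the mapping is
lossless for them; a character such as Ã, which 437 lacks, comes out
as '?'.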

More difficult is input of national characters. Most German keyboards lack
the braces and brackets that UNIX and C depend on, so we prefer using an
American keyboard and need the ALT key to input national letters. We would
certainly prefer to buy keyboards with four additional keys if they existed.

Most problems stem from uncooperative software: the Ultrix shell and C shell
are not 8-bit clean, many communications programs mask the eighth bit, and
standard TeX doesn't allow for input of eight-bit characters (our patched
version does). Hands up for System V, they are miles ahead of BSD with
respect to 8-bit handling.
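
What masking the eighth bit does to 8859 text can be shown in a few
lines. This is an illustration only; the function name is hypothetical
and no particular program's code is being quoted.

```python
# Illustration of software that is not 8-bit clean: it clears the high
# bit of every byte before passing the data on.

def strip_eighth_bit(data: bytes) -> bytes:
    return bytes(b & 0x7F for b in data)


original = "Tür".encode("iso-8859-1")   # b'T\xfcr'
damaged = strip_eighth_bit(original)
print(damaged)                           # b'T|r'  (0xFC & 0x7F == 0x7C)
```

Plain ASCII passes through unchanged, which is why such programs
appear to work until the first national letter arrives.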

Volume-Number: Volume 19, Number 58