[comp.std.unix] Report on WG15 Rapporteur Group

randall@uvaarpa.virginia.edu (Randall Atkinson) (03/15/90)

From: randall@uvaarpa.virginia.edu (Randall Atkinson)

As one who is fairly active in the multilingual computing
side of things, I'm fairly certain that it just isn't worth
it to try to make ISO 646 the basis of *anything* for the
practical reason that it wasn't well thought out to begin with
and has already been superseded by the ISO 8859/* family of
8-bit character sets.

The latter fully support European linguistic needs (yes, including
Danish and Icelandic and ...) and can be used quite nicely with
most UNIX shells that I'm familiar with.

I thought that trigraphs got excessive attention back when ANSI C
was being developed and I fear that excessive attention will be
devoted to ISO 646 when there are other areas of internationalisation
that really deserve being thought about and solved cleanly.

Most of the vendors of hardware in Europe are supporting ISO 8859/1
now, so it is the real long term solution to European needs anyway.
Worrying about support for ISO 646 is a mistake. Worrying about
support for ISO 8859/*, about the Asian need for larger character
sets, and about ways of handling date formats and such is not a
mistake at all.

Volume-Number: Volume 18, Number 73

marius@rhi.hi.is (Marius Olafsson) (03/17/90)

From: marius@rhi.hi.is (Marius Olafsson)

randall@uvaarpa.virginia.edu (Randall Atkinson) writes:

>                I'm fairly certain that it just isn't worth
>it to try to make ISO 646 the basis of *anything* for the
>practical reason that it wasn't well thought out to begin with
>and has already been superseded by the ISO 8859/* family of
>8-bit character sets.

I agree. The ISO 8859 series of character sets has the (in my opinion
necessary) quality that the *complete* set of ASCII characters can be
represented. If ISO 646 is taken into consideration, must we then
allow alternate syntax in the various shells and utilities that make
use of the characters {}[]@\| and `? I think that is a can of worms
best left unopened.
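
To make the can of worms concrete, here is a minimal sketch of how a
shell pipeline would look on a terminal using a national ISO 646
variant. The table is a partial, illustrative one for the Danish
variant (only the six bracket/brace/bar positions, stated from memory
rather than from the standard's text), and the names DANISH_646 and
show_as_danish_646 are hypothetical, not from any library.

```python
# Partial, illustrative table: code positions that a Danish ISO 646
# variant is assumed to reassign to national letters. Not a complete
# or authoritative rendering of the standard.
DANISH_646 = {
    0x5B: "Æ",  # ASCII [
    0x5C: "Ø",  # ASCII \
    0x5D: "Å",  # ASCII ]
    0x7B: "æ",  # ASCII {
    0x7C: "ø",  # ASCII |
    0x7D: "å",  # ASCII }
}


def show_as_danish_646(data: bytes) -> str:
    """Render 7-bit bytes the way a terminal using the table above would."""
    return "".join(DANISH_646.get(b, chr(b)) for b in data)


script = b"awk '{ print $1 }' | sort"
rendered = show_as_danish_646(script)
print(rendered)
```

The bytes are unchanged; only their meaning on screen differs, which
is why every utility that gives these positions special meaning would
need an alternate syntax.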

>The latter fully support European linguistic needs (yes, including
>Danish and Icelandic and ...) and can be used quite nicely with
>most UNIX shells that I'm familiar with.

And it seems that most major manufacturers already have (or have announced)
support for ISO 8859: at least HP-UX, Ultrix, AIX, SunOS, and
more I am sure. The X Window System now supports ISO 8859 fonts, the
latest Adobe release of PostScript supports ISO 8859 encoding of the fonts,
and the list goes on ... NONE provides any support for, or gives any
consideration to, ISO 646 (fortunately).


>                        I fear that excessive attention will be
>devoted to ISO 646 when there are other areas of internationalisation
>that really deserve being thought about and solved cleanly.

Definitely, and serious consideration should be given to the way X/Open
has defined some of these other areas. That system actually works pretty
well in practice.  It has been used here for about two years (on HP-UX).

--
Marius Olafsson 		internet: marius@rhi.hi.is
University of Iceland		UUCP:     {mcsun,sunic,uunet}!isgate!rhi!marius


Volume-Number: Volume 18, Number 77

wheeler@ida.org (David Wheeler) (03/17/90)

From: wheeler@ida.org (David Wheeler)

domo@tsa.co.uk (Dominic Dunlop):
= From: Dominic Dunlop <domo@tsa.co.uk>
= 
= 	   Report on ISO/IEC JTC1/SC22/WG15 Rapporteur Group on
= 	         Internationalization Meeting of 5th - 7th
= 	              March, 1990, Copenhagen, Denmark
= 
= 	            Dominic Dunlop   --  domo@tsa.co.uk
= 
= 	                  The Standard Answer Ltd.
= 

I enjoyed your posting, thank you!  You included a lot of "what this
phrase really means" that I appreciated.

= 
= 	 3. ISO 646[4], the earliest ISO standard for information
= 	    technology, is the international derivative of ASCII.
= 	    Its Danish variant replaces ASCII's } with aa.  Around
= 	    the world, #$@[\]^`{|}~, all of which have a special
= 	    meaning to the shell, are replaced by other characters
= 	    in standards derived from ISO 646.  See [5] for much
= 	    more information.
= 

Isn't there an 8-bit standard character set that defines the first 128
characters as a standard set (say as USASCII, provincial I'm afraid but it
would break no Unix tools), then includes all the international
characters as those with values > 127?   If this were used in the POSIX
standard, wouldn't this solve many problems for those using a
Latin-based alphabet? Or is this standard unused in the real world?
Admittedly this eliminates the non-Latin alphabet world, and that
is a weakness.
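
The property I am asking about can be checked mechanically. The sketch
below uses ISO 8859-1 as the candidate set, which is an assumption on
my part about which 8-bit standard fits the description; the variable
names are only for illustration.

```python
# Every 7-bit value should decode to the same character under ASCII
# and under the candidate 8-bit set (ISO 8859-1 assumed here).
ascii_bytes = bytes(range(128))
latin1_matches_ascii = (
    ascii_bytes.decode("iso8859-1") == ascii_bytes.decode("ascii")
)

# A sample of Danish and Icelandic letters; each should land above 127.
national = "æøåÆØÅþðÞÐ"
codes = list(national.encode("iso8859-1"))
all_high = all(code > 127 for code in codes)

print(latin1_matches_ascii, all_high)
```

If both checks hold, tools that only care about the ASCII half never
see a changed character, and the national letters occupy only the
upper half.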

= 	Apart from all this organizational stuff, we did review some
= 	existing documents.  For example, DTR (draft technical
= 	report) 10176, a product of SC14, discusses the treatment of
= 	characters appearing in language constructs, variable names,
= 	literals and comments, and turns out to have implications
= 	for sh, awk, yacc and the other ``little languages'' defined
= 	in DP 9945-2, the forthcoming international standard for the
= 	shell and tools.  And a document from SC22's study group on
= 	character sets suggests that source files should have some
= 	means of announcing the character set that they're using.
= 	Could this mean typed files or resource forks for POSIX6?
= 	Gee.  How would we hide that?
= 

Some C programs would have to be fixed to deal with signed characters,
but at least the rules would be simple: 128+ are ordinary characters and
can be used in identifiers, etc.
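
A rough illustration of the signed-character problem, simulated in
Python rather than written in C. It assumes an implementation where
plain char is a signed 8-bit type (whether it is signed is up to the
C implementation), and the helper names are hypothetical.

```python
def as_signed_char(byte: int) -> int:
    """Reinterpret an 8-bit value as a signed 8-bit char would hold it."""
    return int.from_bytes(bytes([byte]), "big", signed=True)


def as_unsigned_char(value: int) -> int:
    """The usual repair: the same effect as a cast to unsigned char in C."""
    return value & 0xFF


a_ring = "å".encode("iso8859-1")[0]       # the 8-bit code for a-ring
signed_view = as_signed_char(a_ring)      # negative on a signed-char system
repaired = as_unsigned_char(signed_view)  # back to the original code

print(a_ring, signed_view, repaired)
```

A program that uses the negative value as a table index, or compares
it against 127, misbehaves; masking or casting before use is the kind
of fix such programs would need.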

Source file tagging for language sounds like an abomination!

--- David A. Wheeler
    wheeler@ida.org


Volume-Number: Volume 18, Number 80