[comp.std.internat] International Language Support

rja@edison.GE.COM (rja) (03/18/88)

In recent postings in comp.unix.wizards, International language support
has been mentioned.  Dave Decot (spelling ?) @ HP indicated that there is
on-going standards development in these areas.  I'd really like to find
out more about what is going on.  Would folks involved with this care to
comment ??
  My particular area of interest is Asian Language Support.  I'm aware
of the JIS C6220 and JIS C6226 for Kana & Kanji support.  There was some
talk a few years back of developing an ISO standard for Kanji/Chinese Han Zi.
Has anything happened ? What coding standards do exist for characters other
than the JIS standards mentioned above ? 
  I'd also be interested in the European support, particularly X/OPEN 
standards that exist or are in development.  In fact, it would be nice
if X/OPEN could post (quarterly perhaps) a summary of its standards 
development.  This newsgroup seems under-utilised as it is.

  rja@edison.GE.COM         {preferred}
  uunet!virginia!edison!rja {if you must}

Any comments above are the author's and should NOT be construed as comments
of GE-Fanuc, GE, or Fanuc.

watanabe@hpycla.HP.COM (Sotoji Watanabe) (03/25/88)

>  My particular area of interest is Asian Language Support.  I'm aware
>of the JIS C6220 and JIS C6226 for Kana & Kanji support.  There was some
>talk a few years back of developing an ISO standard for Kanji/Chinese Han Zi.
>Has anything happened ? What coding standards do exist for characters other
>than the JIS standards mentioned above ? 

	For Japanese Katakana and Kanji: JIS X0201-1976 (former C6220) 
					 and JIS X0208-1983 (former C6226)
	For Chinese used in Mainland China: GB 5007.1 and 5007.2-1985
	For Korean Hanja: KS C5601-1987

	exist, as far as I know.
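JIS X0208 and KS C5601 share the same layout: each is a 94 x 94 table addressed by row and cell. A character can be sent as two 7-bit bytes (row + 0x20, cell + 0x20) between ISO 2022 escape sequences, or as two 8-bit bytes with the high bit set, as in EUC. The sketch below is a modern illustration, not something from the thread. It assumes Python's `euc_jp`, `euc_kr` and `iso2022_jp` codecs as stand-ins for those encodings, and `row_cell` and `jis_7bit` are hypothetical helper names.

```python
def row_cell(ch: str, codec: str) -> tuple:
    """Return the (row, cell) of a two-byte character, each in 1..94.

    Assumes an EUC-style codec, where each byte is the 7-bit code + 0x80,
    i.e. row + 0xA0 and cell + 0xA0.
    """
    b = ch.encode(codec)
    if len(b) != 2 or not all(0xA1 <= x <= 0xFE for x in b):
        raise ValueError(f"{ch!r} is not a two-byte character in {codec}")
    return b[0] - 0xA0, b[1] - 0xA0


def jis_7bit(ch: str) -> bytes:
    """The 7-bit JIS X0208 bytes of ch, cut out of its ISO-2022-JP form."""
    raw = ch.encode("iso2022_jp")
    # ESC $ B designates JIS X0208-1983; ESC ( B switches back to ASCII.
    start, end = b"\x1b$B", b"\x1b(B"
    if not (raw.startswith(start) and raw.endswith(end)):
        raise ValueError(f"unexpected ISO-2022-JP form for {ch!r}: {raw!r}")
    return raw[len(start):-len(end)]


if __name__ == "__main__":
    print(row_cell("\u65e5", "euc_jp"))   # Kanji for "sun/day"
    print(jis_7bit("\u65e5"))             # same position as 7-bit bytes
    print(row_cell("\ud55c", "euc_kr"))   # Hangul syllable HAN
```

The 7-bit and 8-bit forms carry the same row/cell numbers. The choice between them is a transmission question, not a question of which character set is used.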

Sotoji Watanabe

dml@rabbit1.UUCP (David Langdon) (03/29/88)

I am posting this for a friend who does not normally have net
access. Please email responses to me directly.

*********************
I am interested in standards used by printers, and
terminals for representing international characters.
This is of interest both in the IBM 3270 world and in general.

Particular items are: accented character handling,
dead key handling, 7- or 8-bit representation, International
EBCDIC and ASCII tables/standards.
**********************
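The Python sketch below ties three of these items together: dead keys, 7- vs 8-bit representation, and International EBCDIC. Several parts are assumptions:

- The dead-key table is hypothetical; real terminals and keyboard drivers each define their own.
- Composition uses Unicode NFC normalisation as a modern stand-in for a terminal's compose table.
- `cp500` is Python's codec for EBCDIC International.

```python
import unicodedata

# Hypothetical dead-key layout: each dead key maps to the Unicode
# combining mark it stands for.
DEAD_KEYS = {
    "\u00b4": "\u0301",  # acute
    "`": "\u0300",       # grave
    "^": "\u0302",       # circumflex
    "\u00a8": "\u0308",  # diaeresis
    "~": "\u0303",       # tilde
}


def compose(keystrokes: str) -> str:
    """Apply dead-key composition to a sequence of keystrokes."""
    out = []
    pending = None  # dead key waiting for the next keystroke
    for key in keystrokes:
        if pending is None:
            if key in DEAD_KEYS:
                pending = key
            else:
                out.append(key)
            continue
        dead, pending = pending, None
        if key == " ":
            out.append(dead)  # dead key + space yields the bare accent
            continue
        combined = unicodedata.normalize("NFC", key + DEAD_KEYS[dead])
        if len(combined) == 1:
            out.append(combined)     # a precomposed accented letter exists
        else:
            out.extend([dead, key])  # no precomposed form: emit both keys
    if pending is not None:
        out.append(pending)          # trailing dead key with nothing after it
    return "".join(out)


def encodings(text: str) -> dict:
    """The same text as 7-bit ASCII, 8-bit ISO 8859-1 and EBCDIC (cp500)."""
    result = {}
    for codec in ("ascii", "latin-1", "cp500"):
        try:
            result[codec] = text.encode(codec)
        except UnicodeEncodeError:
            result[codec] = None  # not representable in this code
    return result


if __name__ == "__main__":
    word = compose("caf\u00b4e")  # dead acute, then e
    print(word, encodings(word))
```

In this sketch, "café" cannot be represented in 7-bit ASCII at all. It takes one byte per character in ISO 8859-1, and the same characters get different byte values in EBCDIC.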

Send responses to the UUCP address listed below. Thanx .. David Langdon

David Langdon    Rabbit Software Corp.
(215) 647-0440   7 Great Valley Parkway East  Malvern PA 19355

...!ihnp4!{cbmvax,cuuxb}!hutch!dml        ...!psuvax1!burdvax!hutch!dml

irf@kuling.UUCP (Bo Thide) (04/03/88)

In article <1399@edison.GE.COM> rja@edison.GE.COM (rja) writes:
>In recent postings in comp.unix.wizards, International language support
>has been mentioned.  Dave Decot (spelling ?) @ HP indicated that there is
>on-going standards development in these areas.  I'd really like to find
>out more about what is going on.  Would folks involved with this care to
>comment ??
>  My particular area of interest is Asian Language Support.  I'm aware

   [deleted ...]

>  I'd also be interested in the European support, particularly X/OPEN 
>standards that exist or are in development.  In fact, it would be nice
>if X/OPEN could post (quarterly perhaps) a summary of its standards 
>development.  This newsgroup seems under-utilised as it is.

NLS is a great idea!  To give you some background, I take the liberty of quoting
from the EUUG Newsletter Vol7 No2 article "An Overview of the Native Language
System" by Michael J. C. Terry (mcjt@inset.co.uk):

"In January this year [1987 -bt], the X/OPEN group published the second edition
of its X/OPEN Portability Guide (XPG).  Section 3 of the guide included a
software internationalisation interface standard specification -- the Native
Language System (NLS).  Although many proprietary solutions to the
internationalisation problem have been attempted over the years, this is the
first time that a commercial standard has been specified for
internationalisation on UNIX (R) [or should it be (TM)? -bt] systems.

The X/OPEN NLS standard specification has arrived as a response to a pressure
that has been growing slowly but relentlessly from non-English-speaking UNIX
users as use of the system has filtered down from the ivory towers of
academe to the air-conditioned offices of modern commerce.  It is not
surprising that this internationalisation specification has emerged from
the X/OPEN group rather than from AT&T -- after all, despite the recent
addition of American companies to the X/OPEN roll call, X/OPEN started out 
as a purely European grouping, and is still predominantly European.  What
is perhaps surprising is that the NLS specification is based on an
internationalisation architecture developed in the USA by Hewlett-Packard.

...

Hewlett-Packard have a working version of NLS on their HP-UX operating
system.  The source code has been made available to the other members of
X/OPEN in order to expedite its implementation on currently available
versions of UNIX.

...

The eventual intention is that NLS will support multiple 8-bit character
sets.  The XPG states:

   This first issue of the X/OPEN NLS specification defines the major
   transmission codeset for Western European use as the standard
   IS8859/1, and also recommends its use as the corresponding internal
   codeset.  Other codesets will be identified in later issues.

The IS8859/1 codeset is capable of supporting most major Western European
languages.  In addition, it is compatible with ASCII functionality,
since it incorporates the ASCII codeset as the first 128 characters of
the codeset."
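The compatibility claim can be checked mechanically. The Python sketch below uses Python's `ascii` and `latin-1` codecs as stand-ins for the two standards. It also shows what happens to ISO 8859-1 text inside a program that assumes 7-bit data and clears the eighth bit of every byte; that program behaviour is an assumption for illustration.

```python
def ascii_is_prefix_of_latin1() -> bool:
    """True if bytes 0x00-0x7F decode identically as ASCII and ISO 8859-1."""
    low = bytes(range(128))
    return low.decode("ascii") == low.decode("latin-1")


def strip_eighth_bit(data: bytes) -> bytes:
    """What a 7-bit-only program does to 8-bit text: clear the top bit."""
    return bytes(b & 0x7F for b in data)


if __name__ == "__main__":
    print(ascii_is_prefix_of_latin1())
    print(strip_eighth_bit("caf\u00e9".encode("latin-1")))  # e-acute is 0xE9
```

Pure ASCII text passes through such a program unchanged. The accented letter (0xE9) comes out as 0x69, a plain "i".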

To describe the agreed-upon standard codeset, I quote from "International
Standard. Information processing -- 8-bit single byte coded graphic
character sets -- Part 1: Latin alphabet No. 1", ISO 8859-1, First
edition 1987-02-15:

"ISO 8859 [this is the correct name -bt] consists of several parts. Each
part specifies a set of up to 191 graphic characters and the coded
representation of each of these characters by means of a single 8-bit
byte.  The use of control functions for the coded representation of
composite characters is prohibited by ISO 8859.  Each set is
intended for use for a group of languages.

ISO 8859/2 specifies a set of 191 graphic characters identified as
Latin alphabet No. 2.

....

This set of graphic characters, the Latin alphabet No. 1, is intended
for use in data processing and text applications and may also be used
for information interchange.

The set contains graphic characters used for general purpose applications
in typical office environments in at least the following languages:

Danish, Dutch, English, Faroese, Finnish, French, German, Icelandic, Irish,
Italian, Norwegian, Portuguese, Spanish and Swedish."
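The figure of "up to 191 graphic characters" follows from the 8-bit code table:

- SPACE at 0x20.
- 94 positions from 0x21 to 0x7E.
- 96 positions from 0xA0 to 0xFF.

The remaining 65 positions (0x00-0x1F, 0x7F, 0x80-0x9F) are left to control functions. The Python sketch below does this arithmetic. It then cross-checks it against Python's Unicode data, relying on the fact that ISO 8859-1 byte N decodes to character U+00NN.

```python
import unicodedata

SPACE = range(0x20, 0x21)  # 1 position
G0 = range(0x21, 0x7F)     # 94 positions: the ASCII graphic characters
G1 = range(0xA0, 0x100)    # 96 positions in the upper half


def graphic_positions() -> int:
    """Count the code positions ISO 8859 leaves for graphic characters."""
    return len(SPACE) + len(G0) + len(G1)


def latin1_graphics() -> int:
    """Count ISO 8859-1 bytes whose character is not a control (Cc)."""
    # chr(b) equals bytes([b]).decode("latin-1") for every b in 0..255.
    return sum(1 for b in range(256) if unicodedata.category(chr(b)) != "Cc")


if __name__ == "__main__":
    print(graphic_positions(), latin1_graphics())
```

Part 1 fills all 191 positions. "Up to" covers parts that leave some of them unassigned.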

The ISO 8859/1 codeset contains things like soft hyphen, capital and small
letter A with acute accent, capital and small Icelandic letters ETH and
THORN, small German letter SHARP S (it has no capital form), capital and
small letter A with ring above, diaeresis characters, and much more.
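Each of these characters sits at a fixed byte value. Because ISO 8859-1 byte N is Unicode U+00NN, Python's `unicodedata` can name them. The byte list below is my own selection of the characters mentioned in the paragraph, and the script is an illustration only.

```python
import unicodedata

# ISO 8859-1 byte values of the characters listed above (my selection).
MENTIONED = [0xAD, 0xC1, 0xE1, 0xD0, 0xF0, 0xDE, 0xFE, 0xDF, 0xC5, 0xE5, 0xA8]


def describe(codes):
    """Pair each ISO 8859-1 byte with the Unicode name of its character."""
    return [(b, unicodedata.name(bytes([b]).decode("latin-1"))) for b in codes]


if __name__ == "__main__":
    for code, name in describe(MENTIONED):
        print(f"0x{code:02X}  {name}")
```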

ISO 8859/2 is useful for Albanian, Czech, English, German, Hungarian,
Polish, Rumanian, Serbocroatian, Slovak and Slovene, and contains, for
instance, characters with carons ("inverted circumflex accents") used in
some of these languages.
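Each part is a separate code, so a byte in the upper half means nothing until you know which part is in use. The Python sketch below uses Python's `iso8859_1` and `iso8859_2` codecs as stand-ins for the two standards. `fits` and `read_byte` are hypothetical helper names.

```python
def fits(text: str, codec: str) -> bool:
    """True if every character of text exists in the given 8-bit code."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def read_byte(b: int, codec: str) -> str:
    """The character a single byte stands for under one ISO 8859 part."""
    return bytes([b]).decode(codec)


if __name__ == "__main__":
    name = "Dvo\u0159\u00e1k"  # r with caron, a with acute
    print(fits(name, "iso8859_2"), fits(name, "iso8859_1"))
    print(read_byte(0xE8, "iso8859_1"), read_byte(0xE8, "iso8859_2"))
```

Byte 0xE8 is "e with grave" under Part 1 and "c with caron" under Part 2. A Czech name containing r-caron can be written in Part 2 but not in Part 1.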

The ISO 8859 codesets are very complete and are extremely cleverly
designed with the capitals coded as SHIFTed small characters.  This is NOT
true for other 8-bit character codesets! (Do you listen, HP???)
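In ISO 8859-1, the "shifted" layout means that for ASCII letters and for 0xC0-0xDE / 0xE0-0xFE, capital and small letters differ only in bit 5 (0x20), just as in ASCII. A few positions in those rows are exceptions and need special handling:

- 0xD7 (multiplication sign) and 0xF7 (division sign) are not letters.
- 0xDF (sharp s) and 0xFF (y with diaeresis) have no capital in ISO 8859-1.

The Python sketch below implements that rule. It is checked against Python's own case mapping, which is an assumption about tooling rather than anything from the standard.

```python
def latin1_toupper(b: int) -> int:
    """Upper-case one ISO 8859-1 byte by clearing bit 5 (0x20)."""
    # a..z, and a-grave..thorn except the division sign 0xF7.
    # 0xDF (sharp s) and 0xFF (y diaeresis) fall outside these ranges.
    if 0x61 <= b <= 0x7A or (0xE0 <= b <= 0xFE and b != 0xF7):
        return b & ~0x20
    return b


def latin1_tolower(b: int) -> int:
    """Lower-case one ISO 8859-1 byte by setting bit 5 (0x20)."""
    # A..Z, and A-grave..THORN except the multiplication sign 0xD7.
    if 0x41 <= b <= 0x5A or (0xC0 <= b <= 0xDE and b != 0xD7):
        return b | 0x20
    return b


if __name__ == "__main__":
    word = "\u00e5ngstr\u00f6m".encode("latin-1")  # a-ring, o-diaeresis
    print(bytes(latin1_toupper(b) for b in word).decode("latin-1"))
```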

-Bo

-- 
>>> Bo Thide', Swedish Institute of Space Physics, S-755 90 Uppsala, Sweden <<<  Phone (+46) 18-300020.  Telex: 76036 (IRFUPP S).  UUCP: ..enea!kuling!irfu!bt

bas+@andrew.cmu.edu (Bruce Sherwood) (04/06/88)

To repeat a major complaint I have about ISO 8859 (which I'm distressed to see
is a component of NLS):

This standard is based on nations rather than languages.  So the West European
version doesn't handle Welsh or Catalan or Esperanto (which don't have their
own nations).

The older standard, ISO 6937, was based on forty Latin-alphabet-using
languages, not on nations.  So it handled just about everything (except for
Vietnamese) including Welsh and Catalan and Esperanto.
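ISO 6937 can cover more languages because it does what ISO 8859 prohibits. It writes an accented letter as a non-spacing diacritic byte followed by the base letter, so any diacritic can combine with any letter. The Python sketch below decodes that scheme. Its table is a partial one based on my reading of the ISO 6937 diacritic block (0xC1 grave through 0xCF caron) and should be checked against the standard. Lower-half bytes are assumed to be plain ASCII, and `decode_6937` is a hypothetical name.

```python
import unicodedata

# Partial, assumed table: ISO 6937 non-spacing diacritic prefix bytes and
# the Unicode combining marks they correspond to.
DIACRITICS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC6: "\u0306",  # breve
    0xC8: "\u0308",  # diaeresis
    0xCA: "\u030a",  # ring above
    0xCB: "\u0327",  # cedilla
    0xCF: "\u030c",  # caron
}


def decode_6937(data: bytes) -> str:
    """Decode ASCII text with ISO 6937-style prefix diacritics (sketch)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in DIACRITICS and i + 1 < len(data) and data[i + 1] < 0x80:
            base = chr(data[i + 1])
            out.append(unicodedata.normalize("NFC", base + DIACRITICS[b]))
            i += 2
        elif b < 0x80:
            out.append(chr(b))  # assumed: lower half is plain ASCII
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} not handled by this sketch")
    return "".join(out)


if __name__ == "__main__":
    # Welsh w-circumflex, Esperanto c-circumflex and u-breve
    print(decode_6937(b"\xc3w \xc3c \xc6u"))
```

None of the three letters decoded here has a position in ISO 8859-1.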

ISO 8859 is a MAJOR step backward in terms of linguistic equality.

Bruce Sherwood