[net.internat] Is 8-bit ASCII enough?

kupfer@ucbvax.ARPA (Mike Kupfer) (10/09/85)

I think that 8 bits is still not enough if you want to include oriental
or other non-Roman character sets.  So using only 8 bits is reasonable
if you assume that a typical UNIX system will not be able to display
these characters (so why bother with them), but you should realize that
this assumption is being made.
-- 
Mike Kupfer
Xerox Corporation - SDD
kupfer.pa@xerox.ARPA
...!ucbvax!kupfer

david@ukma.UUCP (David Herron, NPR Lover) (10/10/85)

In article <10597@ucbvax.ARPA> kupfer@ucbvax.UUCP (Mike Kupfer) writes:
>I think that 8 bits is still not enough if you want to include oriental
>or other non-Roman character sets.  So using only 8 bits is reasonable
>if you assume that a typical UNIX system will not be able to display
>these characters (so why bother with them), but you should realize that
>this assumption is being made.

There's some work being done at Xerox, etc. on representing foreign
character sets and word-processing them -- look in Scientific American,
in an issue from a year or two ago.  I think that may even have been
the 'topic' for that month.

The method (as I recall) described in one article was to define one
code as an "escape" code.  You could follow the escape code with
commands to switch character sets or whatever.  So instead of an
absolute encoding, you had a context-sensitive encoding, which
gives you greater flexibility in the character sets you are storing.
(They are aiming for a system whereby ALL text, regardless of language,
may be word-processed, etc.)
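
[Editor's sketch: the escape-code idea described above can be illustrated
roughly as follows.  This is not the Xerox scheme itself, whose details
are not given in the post.  The escape value 0xFF, the one-byte set
identifier that follows it, and the tiny character tables are all
assumptions made up for the example.]

```python
# Hypothetical escape code; the post does not say which value was used.
ESC = 0xFF

# Hypothetical character-set tables: set id -> {byte value: character}.
CHARSETS = {
    0: {0x41: "A", 0x42: "B"},            # a "Roman" set
    1: {0x41: "\u0391", 0x42: "\u0392"},  # a "Greek" set (Alpha, Beta)
}


def decode(data, initial_set=0):
    """Decode bytes whose meaning depends on the most recent escape."""
    current = initial_set
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC:
            # The byte after the escape names the character set to switch to.
            if i + 1 >= len(data):
                raise ValueError("escape code at end of input")
            new_set = data[i + 1]
            if new_set not in CHARSETS:
                raise ValueError("unknown character set %d" % new_set)
            current = new_set
            i += 2
            continue
        # The same byte value maps to different characters in different sets.
        out.append(CHARSETS[current][b])
        i += 1
    return "".join(out)


sample = bytes([0x41, ESC, 1, 0x41, 0x42, ESC, 0, 0x42])
decoded = decode(sample)
```

[Here the byte 0x41 decodes to "A" before the first escape and to a Greek
Alpha after it, which is what makes the encoding context-sensitive rather
than absolute.]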

One of the most interesting things I remember is that some languages
have characters which *surround* other characters.  This was making
for an interesting typesetting problem.
-- 
David Herron, ukma!david@ANL-MCS.ARPA, cbosgd!ukma!david
(Soon -- david@UKMA.BITNET, and (hopefully) david@ukma.csnet)

Hackin's in me blood!  My mother was known as Miss Hacker before she married!

michaelm@bcsaic.UUCP (michael b maxwell) (10/10/85)

In article <10597@ucbvax.ARPA> kupfer@ucbvax.UUCP (Mike Kupfer) writes:
>I think that 8 bits is still not enough if you want to include oriental
>or other non-Roman character sets.  So using only 8 bits is reasonable
>if you assume that a typical UNIX system will not be able to display
>these characters...

Along these lines, readers of this newsgroup may be interested in the
following article:
	Anderson, Lloyd B. 1984. "Multilingual Text Processing in a
	Two-Byte Code."  10th Int'l. Conf. on Computational Linguistics,
	pp. 1-4.
Part of the abstract:
	...standards committees are now discussing a two-byte code for
	multilingual information processing... 65,536 separate character
	and control codes, enough to make permanent code assignments for
	all national alphabets of the world, and also to include
	Chinese/Japanese characters...  It is possible to arrange
	alphabet codes to provide transliteration equivalence...
He discusses the problems of diacritics, digraphs, alphabetization, etc.
The committee referred to is apparently the "ANSI X3L2" committee (at
least it's the only committee I can find reference to in the text).
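
[Editor's sketch: the 65,536 figure in the abstract is simply 256 * 256,
the number of distinct values two 8-bit bytes can hold.  The code below
shows that arithmetic and one possible way to lay out a two-byte code.
Treating the first byte as an "alphabet block" and the second as a
position within it is an assumption for illustration only; the abstract
does not say how Anderson or the committee assign codes.]

```python
# Two 8-bit bytes give 256 * 256 distinct codes.
TOTAL_CODES = 256 * 256


def pack(block, index):
    """Pack a hypothetical (alphabet block, position) pair into two bytes."""
    if not (0 <= block <= 0xFF and 0 <= index <= 0xFF):
        raise ValueError("block and index must each fit in one byte")
    return bytes([block, index])


def unpack(two_bytes):
    """Split a two-byte code back into its (block, position) pair."""
    if len(two_bytes) != 2:
        raise ValueError("expected exactly two bytes")
    return two_bytes[0], two_bytes[1]


code = pack(3, 7)
pair = unpack(code)
```

[Under a layout like this, letters given the same position in different
blocks would line up with one another, which is one way the
"transliteration equivalence" mentioned in the abstract could be
arranged.]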
-- 
Mike Maxwell
Boeing Artificial Intelligence Center
	..uw-beaver!{uw-june,ssc-vax}!bcsaic!michaelm