[net.internat] Character sets, sorting etc.

blarson@oberon.UUCP (Bob Larson) (10/28/85)

[Let's demonstrate the need by cross-posting to something other than
net.internat]

Some people seem to be under the mistaken impression that ASCII hasn't
changed.  Lower case letters were added (rather than the shift-in /
shift-out kludge), _ was changed from left arrow to underline, ^ was changed
from up arrow to caret, etc.  I don't think adding an eighth bit would
change it enough to consider it something other than ASCII.

Sorting order in ASCII really isn't correct either.  Do you like all of your
upper case words coming before your lower case ones?  The sorting order
problem is really one of replacing a case translator with a table lookup.
Hopefully the table could be made easy to change for working in different
languages.
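
A minimal sketch of the table-lookup idea, in Python.  The names
make_table and table_sort and both sample tables are hypothetical,
made up for illustration; they are not taken from any standard or
existing package.  The point is only that swapping the table changes
the order without touching the sort itself.

```python
import string

AE = "\u00e4"  # a-umlaut, written as an escape to keep the source ASCII

def make_table(groups):
    """Give every character in groups[i] the rank i."""
    return {ch: rank for rank, group in enumerate(groups) for ch in group}

def table_sort(words, table):
    """Sort words by looking each character up in the table."""
    unknown = len(table)  # larger than any rank, so unlisted characters sort last
    return sorted(words, key=lambda w: [table.get(ch, unknown) for ch in w])

# Both cases of a letter share one rank: "aA", "bB", ..., "zZ".
BASE = [lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)]

# Two illustrative tables (assumptions, not real national standards):
GERMAN_STYLE = make_table(["aA" + AE] + BASE[1:])  # a-umlaut files with a
SWEDISH_STYLE = make_table(BASE + [AE])            # a-umlaut files after z

words = ["Zebra", "apple", "Banana", AE + "nder", "zoo"]
by_code = sorted(words)                    # raw character-code order
german = table_sort(words, GERMAN_STYLE)
swedish = table_sort(words, SWEDISH_STYLE)
```

With raw character codes the capitalized words come first and the
accented word comes last; with either table the case difference
disappears, and only the placement of the accented word depends on
which table is in use.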

-- 
Bob Larson
Arpa: Blarson@Usc-Ecl.Arpa
Uucp: {the (mostly unknown) world}!ihnp4!sdcrdcf!oberon!blarson
                 {several select chunks}!sdcrdcf!oberon!blarson

guido@boring.UUCP (11/01/85)

In article <150@oberon.UUCP> blarson@oberon.UUCP (Bob Larson) writes:
>The sorting order
>problem is really one of replacing a case translator with a table lookup.
>Hopefully the table could be made easy to change for working in different
>languages.

YES!  Decent sorting should always be done by table lookup.  As an
example, the Macintosh international utilities package sorts strings
in this way, and the table can be customized to cope with national
variations in the desired dictionary order.  The Mac still uses the
character set's native ordering to determine an ordering for strings
that compare equal using the table (e.g., AA equals aa but precedes
it, while aa precedes AB), so the character set's ordering still
matters.
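
The two-level comparison described above can be sketched roughly as
follows.  This is not the Macintosh package's actual code or API; the
PRIMARY table and the function two_level_key are hypothetical, and the
table here simply gives both cases of a letter the same weight.

```python
import string

# Hypothetical primary table: both cases of a letter get the same weight.
PRIMARY = {}
for weight, (lo, up) in enumerate(zip(string.ascii_lowercase,
                                      string.ascii_uppercase)):
    PRIMARY[lo] = PRIMARY[up] = weight

def two_level_key(s):
    """Compare by table weights first, then by native character codes."""
    # Characters not in the table are placed after all listed ones.
    primary = [PRIMARY.get(ch, len(PRIMARY) + ord(ch)) for ch in s]
    native = [ord(ch) for ch in s]  # tie-breaker: the character set's own order
    return (primary, native)

ordered = sorted(["AB", "aa", "AA"], key=two_level_key)
# ordered is ['AA', 'aa', 'AB']
```

AA and aa get identical table weights, so only the native codes
decide between them; AB differs at the table level and therefore
sorts after both.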

I don't know whether the Macintosh character set (which is a superset
of ASCII and contains most accented or otherwise slightly modified
characters found in various Western European languages, but does not
support different alphabets) would be acceptable as a standard,
but at least it addresses the problems that are encountered most
frequently, it fits in 8 bits and is compatible with ASCII.

(I'm afraid that there is another standard extension of ASCII which
uses up the 8th bit for lots of control codes like cursor up.
However this does not seem to have caught on very much.)

	Guido van Rossum, CWI, Amsterdam (guido@mcvax.UUCP)

franka@mmintl.UUCP (Frank Adams) (11/04/85)

In article <6672@boring.UUCP> guido@mcvax.UUCP (Guido van Rossum) writes:
>I don't know whether the Macintosh character set (which is a superset
>of ASCII and contains most accented or otherwise slightly modified
>characters found in various Western European languages, but does not
>support different alphabets) would be acceptable as a standard,
>but at least it addresses the problems that are encountered most
>frequently, it fits in 8 bits and is compatible with ASCII.
>
>(I'm afraid that there is another standard extension of ASCII which
>uses up the 8th bit for lots of control codes like cursor up.
>However this does not seem to have caught on very much.)

There is another standard extension of ASCII which is used for the IBM
PC.  It has a fair number of modified characters; I don't know how it
compares with the Macintosh set.  (It does not have the eastern European
c's, s's, or z's with curlicues; it does have the vaguely similar French
c.)  It also has a fair selection of special characters.  I am not
actually recommending it, just putting it up for consideration.  Given
the source, I think it has to be taken into account.

Frank Adams                           ihpn4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108

jack@boring.UUCP (11/05/85)

In article <6672@boring.UUCP> guido@mcvax.UUCP (Guido van Rossum) writes:

>(I'm afraid that there is another standard extension of ASCII which
>uses up the 8th bit for lots of control codes like cursor up.
>However this does not seem to have caught on very much.)
>
>	Guido van Rossum, CWI, Amsterdam (guido@mcvax.UUCP)


As far as I remember, this 8 bit ASCII (which isn't called ASCII, by the
way, but ISO-something-or-other) uses codes 0200-0240 for extra control
functions, and 0241-0277 for extra characters.
I even think that if you take a letter in normal ASCII, and add bit 8, you
still have a letter (be it a different one, of course:-).
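
A quick way to check the "add bit 8 and you still have a letter" idea,
assuming ISO 8859-1 (Latin-1) as a stand-in for the code meant here;
the post does not say which code table it is, so that choice is an
assumption.  The helper with_high_bit is hypothetical.

```python
import string

def with_high_bit(ch):
    """Return the Latin-1 character whose code is ch's ASCII code with bit 8 set."""
    return bytes([ord(ch) | 0x80]).decode("latin-1")

# ASCII letters whose high-bit counterpart is NOT a letter in Latin-1.
exceptions = sorted(ch for ch in string.ascii_letters
                    if not with_high_bit(ch).isalpha())
```

Under that assumption the claim holds for every letter except W and w,
whose high-bit counterparts are the multiplication and division signs.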

Since this code seems to have been more-or-less accepted (I know of at least
two terminals that accept it, or part of it), I guess the Mac will probably
use the same code.

If there is interest, I'll type in the code-table (more-or-less, of course).
-- 
	Jack Jansen, jack@mcvax.UUCP
	The shell is my oyster.