blarson@oberon.UUCP (Bob Larson) (10/28/85)
[Let's demonstrate the need by cross-posting to something other than net.internat]

Some people seem to be under the mistaken impression that ASCII hasn't changed. Lower case letters were added (rather than the shift-in/shift-out kludge), _ was changed from left arrow to underline, ^ was changed from up arrow to caret, etc. I don't think adding an eighth bit would change it enough to consider it something other than ASCII.

Sorting order in ASCII really isn't correct either. Do you like all of your upper case words coming before your lower case ones? The sorting order problem is really one of replacing a case translator with a table lookup. Hopefully the table could be made easy to change for working in different languages.
--
Bob Larson
Arpa: Blarson@Usc-Ecl.Arpa
Uucp: {the (mostly unknown) world}!ihnp4!sdcrdcf!oberon!blarson
{several select chunks}!sdcrdcf!oberon!blarson
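Bob's suggestion of replacing a case translator with a table lookup can be sketched in a few lines of Python. The alphabet strings, the Swedish placement of å, ä, ö after z, and the names `build_table` and `sort_key` are illustrative assumptions, not any real system's collation table. The point is that adapting the sort to another language means swapping the table, not the code.

```python
# A minimal sketch of collation by table lookup instead of raw character
# codes. The alphabets below are illustrative assumptions, not a real
# system's tables.

ENGLISH = "abcdefghijklmnopqrstuvwxyz"
SWEDISH = ENGLISH + "åäö"  # Swedish places å, ä, ö after z, in that order


def build_table(alphabet: str) -> dict[str, int]:
    """Give each letter, in both cases, its position in the alphabet."""
    table: dict[str, int] = {}
    for weight, ch in enumerate(alphabet):
        table[ch] = weight
        table[ch.upper()] = weight  # 'A' and 'a' share one weight
    return table


def sort_key(word: str, table: dict[str, int]) -> tuple[int, ...]:
    # Characters missing from the table sort after every letter,
    # ordered among themselves by code point.
    return tuple(table.get(ch, len(table) + ord(ch)) for ch in word)


words = ["banana", "Zebra", "apple", "Cherry"]
print(sorted(words))  # ['Cherry', 'Zebra', 'apple', 'banana']
print(sorted(words, key=lambda w: sort_key(w, build_table(ENGLISH))))
# ['apple', 'banana', 'Cherry', 'Zebra']
print(sorted(["ärm", "åka"], key=lambda w: sort_key(w, build_table(SWEDISH))))
# ['åka', 'ärm']  (plain code-point order would put 'ärm' first)
```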
guido@boring.UUCP (11/01/85)
In article <150@oberon.UUCP> blarson@oberon.UUCP (Bob Larson) writes:
>The sorting order
>problem is really one of replacing a case translator with a table lookup.
>Hopefully the table could be made easy to change for working in different
>languages.

YES! Decent sorting should always be done by table lookup. As an example, the Macintosh international utilities package sorts strings in this way, and the table can be customized to cope with national variations in the desired dictionary order. The Mac still uses the character set's native ordering to break ties between strings that compare equal using the table (e.g., AA equals aa but precedes it, while aa precedes AB), so the character set's ordering still matters.

I don't know whether the Macintosh character set (which is a superset of ASCII and contains most accented or otherwise slightly modified characters found in various Western European languages, but does not support different alphabets) would be acceptable as a standard, but at least it addresses the problems that are encountered most frequently, it fits in 8 bits and is compatible with ASCII.

(I'm afraid that there is another standard extension of ASCII which uses up the 8th bit for lots of control codes like cursor up. However this does not seem to have caught on very much.)

Guido van Rossum, CWI, Amsterdam (guido@mcvax.UUCP)
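Guido's description of the Macintosh approach amounts to a two-level comparison: compare by table weights first, and consult the character set's native order only for strings the table considers equal. The sketch below models that with a Python sort key. The names `PRIMARY`, `primary_weight` and `collation_key` are hypothetical; this is not the International Utilities package's actual interface or table.

```python
# Sketch of a two-level comparison: primary weights from a case-insensitive
# table, ties broken by native (code point) order. Hypothetical names; not
# the Macintosh International Utilities interface.

PRIMARY = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}


def primary_weight(ch: str) -> int:
    # Fold case before the lookup so 'A' and 'a' get the same weight;
    # characters outside the table sort after all letters.
    return PRIMARY.get(ch.lower(), len(PRIMARY) + ord(ch))


def collation_key(word: str) -> tuple:
    primary = tuple(primary_weight(ch) for ch in word)
    # Secondary key: the string itself, compared by code point,
    # so 'AA' precedes 'aa' even though the table calls them equal.
    return (primary, word)


print(sorted(["aa", "AB", "AA"]))                     # ['AA', 'AB', 'aa']
print(sorted(["aa", "AB", "AA"], key=collation_key))  # ['AA', 'aa', 'AB']
```

Returning a tuple lets Python's tuple comparison do the tie-breaking: the second element is only looked at when the first elements are equal.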
franka@mmintl.UUCP (Frank Adams) (11/04/85)
In article <6672@boring.UUCP> guido@mcvax.UUCP (Guido van Rossum) writes:
>I don't know whether the Macintosh character set (which is a superset
>of ASCII and contains most accented or otherwise slightly modified
>characters found in various Western European languages, but does not
>support different alphabets) would be acceptable as a standard,
>but at least it addresses the problems that are encountered most
>frequently, it fits in 8 bits and is compatible with ASCII.
>
>(I'm afraid that there is another standard extension of ASCII which
>uses up the 8th bit for lots of control codes like cursor up.
>However this does not seem to have caught on very much.)

There is another standard extension of ASCII, which is used for the IBM PC. It has a fair number of modified characters; I don't know how it compares with the Macintosh set. (It does not have the Eastern European c's, s's, or z's with curlicues; it does have the vaguely similar French c.) It also has a fair selection of special characters.

I am not actually recommending it, just putting it up for consideration. Given the source, I think it has to be taken into account.

Frank Adams ihpn4!philabs!pwa-b!mmintl!franka
Multimate International 52 Oakland Ave North E. Hartford, CT 06108
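Frank's comparison can be checked roughly with Python's built-in `cp437` (IBM PC) and `mac_roman` (Macintosh) codecs. These are present-day mapping tables and may differ in a few code positions from the 1985 hardware, so treat the result as indicative only. The helper names `encodable` and `ascii_compatible` are illustrative.

```python
# Which of these letters can each 8-bit set represent? Uses Python's
# built-in 'cp437' (IBM PC) and 'mac_roman' (Macintosh) codecs.


def encodable(ch: str, codec: str) -> bool:
    """True if the character has a code in the given 8-bit set."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def ascii_compatible(codec: str) -> bool:
    # The low 128 byte values should decode exactly as they do in ASCII.
    low = bytes(range(128))
    return low.decode(codec) == low.decode("ascii")


for codec in ("cp437", "mac_roman"):
    # French c-cedilla versus Eastern European c, s, z with a caron.
    print(codec, "ASCII-compatible:", ascii_compatible(codec),
          {ch: encodable(ch, codec) for ch in "çčšž"})
```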
mikeb@inset.UUCP (Mike Banahan) (11/04/85)
In article <150@oberon.UUCP> blarson@oberon.UUCP (Bob Larson) writes:
>Sorting order in ASCII really isn't correct either. Do you like all of your
>upper case words coming before your lower case ones? The sorting order
>problem is really one of replacing a case translator with a table lookup.
>Hopefully the table could be made easy to change for working in different
>languages.

How right you are, Bob! There's a lot to it, too. The sorting problem is going to be a famous one: UNIX hackers have sort of got used (sorry about the pun) to making do with ASCII sorting order, but it's completely unacceptable in a number of environments. The current proposals for ISO 8859 mean that only English gets even a poor sorting order from the character encoding; for the other languages it is meant to support, such as French, the Scandinavian languages and so on, it's a non-starter. A whole bunch of accented and additional alphabetic characters are found in the ``top'' 128 character positions, with absolutely no correlation to their expected sorting positions.

Some languages compound this by not being very sure about just what their collating sequence is: see the item posted by Jaap Akkerhuis, which points out that in Dutch, depending on which of 3 more or less official alphabets you choose, there may or may not be a ``y''. If there is, it sorts the same as the character PAIR ``ij''. So the algorithms can't even work on a character-by-character basis. Also, my spies tell me that in French, when two words are compared, accents are ignored unless the words are identical without them, in which case further rules are used to separate the two.

Fun stuff, isn't it? It's going to take some fancy table-driven stuff to make sense of all this! As for ranges in regular expressions... I would love to hear how to make sense of them.
--
Mike Banahan, Technical Director, The Instruction Set Ltd.
mcvax!ukc!inset!mikeb
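Both of Mike's examples break the one-character, one-weight model. The sketch below assumes the Dutch convention in which the pair "ij" collates as "y", and models the French rule by ignoring accents at first and consulting the accented spelling only on a tie. The post does not spell out the French tie-breaking rules, so comparing the accented words by code point is a placeholder assumption, as are the function names.

```python
# Sketch of two collation rules that cannot be done character by character.
# The Dutch contraction and the French tie-break below are assumptions for
# illustration, not a complete implementation of either language's rules.
import unicodedata

WEIGHT = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}


def weight(ch: str) -> int:
    # Characters outside the table sort after all letters.
    return WEIGHT.get(ch, len(WEIGHT) + ord(ch))


def dutch_key(word: str) -> tuple[int, ...]:
    """Treat the pair 'ij' as one collating element weighted like 'y'."""
    w = word.lower()
    key, i = [], 0
    while i < len(w):
        if w.startswith("ij", i):
            key.append(WEIGHT["y"])  # two characters, one weight
            i += 2
        else:
            key.append(weight(w[i]))
            i += 1
    return tuple(key)


def strip_accents(word: str) -> str:
    # NFD splits 'é' into 'e' + a combining accent; drop the combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def french_key(word: str) -> tuple:
    # Primary: accents ignored. Secondary, reached only on a tie:
    # the accented word by code point (a placeholder for the real rules).
    base = strip_accents(word.lower())
    return (tuple(weight(c) for c in base), word)


print(sorted(["ijs", "yoghurt", "iets", "zee"], key=dutch_key))
print(sorted(["élan", "ezra", "etre"], key=french_key))
print(sorted(["côte", "cote", "côté", "coté"], key=french_key))
```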
jack@boring.UUCP (11/05/85)
In article <6672@boring.UUCP> guido@mcvax.UUCP (Guido van Rossum) writes:
>(I'm afraid that there is another standard extension of ASCII which
>uses up the 8th bit for lots of control codes like cursor up.
>However this does not seem to have caught on very much.)
>
> Guido van Rossum, CWI, Amsterdam (guido@mcvax.UUCP)

As far as I remember, this 8 bit ASCII (which isn't called ASCII, by the way, but ISO-something-or-other) uses codes 0200-0240 for extra control functions, and 0241-0277 for extra characters. I even think that if you take a letter in normal ASCII, and add bit 8, you still have a letter (be it a different one, of course :-).

Since this code seems to have been more or less accepted (I know of at least two terminals that accept it, or part of it), I guess the Mac will probably use the same code. If there is interest, I'll type in the code table (more or less, of course).
--
Jack Jansen, jack@mcvax.UUCP
The shell is my oyster.
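Jack's recollection can be compared with ISO 8859-1 as eventually published in 1987, using Python's `latin-1` codec, which maps each byte to the Unicode code point of the same value. The check below finds which upper-half codes decode to control characters, and which ASCII letters stay letters when bit 8 (0x80) is set. A 1985 draft may well have differed from the published standard, and the helper name `high_char` is illustrative.

```python
# Compare the recollection above with Python's 'latin-1' (ISO 8859-1) codec.
import string
import unicodedata


def high_char(code: int) -> str:
    """Decode one byte value as ISO 8859-1."""
    return bytes([code]).decode("latin-1")


# Upper-half codes (octal 0200-0377) that decode to control characters.
controls = [c for c in range(0o200, 0o400)
            if unicodedata.category(high_char(c)) == "Cc"]
print(f"controls: {min(controls):#o}-{max(controls):#o}")

# ASCII letters that are no longer letters once bit 8 is set.
not_letters = [ch for ch in string.ascii_letters
               if not high_char(ord(ch) | 0x80).isalpha()]
print(not_letters, [high_char(ord(ch) | 0x80) for ch in not_letters])
```

In this codec the control range ends at 0237 (0240 is the no-break space), and the only letters that lose letterhood are w and W, which land on the division and multiplication signs.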
herbie@polaris.UUCP (Herb Chong) (11/07/85)
In article <6681@boring.UUCP> jack@boring.UUCP (Jack Jansen) writes:
>>(I'm afraid that there is another standard extension of ASCII which
>>uses up the 8th bit for lots of control codes like cursor up.
>>However this does not seem to have caught on very much.)
>>
>> Guido van Rossum, CWI, Amsterdam (guido@mcvax.UUCP)
>
>
>As far as I remember, this 8 bit ASCII (which isn't called ASCII, by the
>way, but ISO-something-or-other) uses codes 0200-0240 for extra control
>functions, and 0241-0277 for extra characters.
>I even think that if you take a letter in normal ASCII, and add bit 8, you
>still have a letter (be it a different one, of course:-).

Is this the same 8-bit ASCII code, called US8-ASCII, that IBM pushed as the 8-bit standard character code when they announced the 360 (a long time ago)?

Herb Chong...

I'm still user-friendly -- I don't byte, I nybble....

New net address --
VNET,BITNET,NETNORTH,EARN: HERBIE AT YKTVMH
UUCP: {allegra|cbosgd|cmcl2|decvax|ihnp4|seismo}!philabs!polaris!herbie
CSNET: herbie.yktvmh@ibm-sj.csnet
ARPA: herbie.yktvmh.ibm-sj.csnet@csnet-relay.arpa
========================================================================
DISCLAIMER: what you just read was produced by pouring lukewarm tea for 42 seconds onto 9 people chained to 6 Ouija boards.