erc@pai.UUCP (Eric F. Johnson) (02/26/91)
InfoWorld (25 Feb 91) has an interesting article on internationalizing character sets. According to the article, a joint venture, Unicode, Inc., will replace ASCII with a 16-bit character set (with 27,000 characters used). IBM, Apple, Sun, Microsoft, Next, Go (a pen-based company) and Novell are members of Unicode.

The article was rather scant on technical details, which makes it hard to judge the merits of their approach. Apparently, the Unicode character set will include traditional ASCII for characters 0-127 and other national alphabets in the higher characters. It looks like one can use at least Arabic, French and English in the same document using Unicode. This seems to indicate that Unicode supports multi-lingual applications (which the X internationalization effort has apparently chosen to put off, according to statements at the 5th X Technical Conference).

I wonder what impact, if any, this will have on the internationalization (I18N) efforts for the X Window System, with IBM, Apple and Sun as members of Unicode. From the purely selfish attitude of an application developer, it would be nice if the X folks involved with internationalization would at least _talk_ to the Unicode folks.

Yes, I know that a character set is not all internationalization entails (especially if it has to justify I18N and other jargon terms :-). But, it looks like the industry is moving to standardize at least _part_ of this (a character set).

Have fun,
-Eric

--
Eric F. Johnson               phone: +1 612 894 0313    BTI: Industrial
Boulware Technologies, Inc.   fax:   +1 612 894 0316    automation systems
415 W. Travelers Trail        email: erc@pai.mn.org     and services
Burnsville, MN 55337 USA
harald.alvestrand@elab-runit.sintef.no (02/26/91)
UNICODE is (IMHO) another US loser.
It is (as far as I know) heartily disliked by the Japanese, Chinese, Korean and others whom IBM et al are trying to say that they make it for.
The reason is that they try to squeeze characters that look *almost* the same and mean *almost* the same thing into a single character position.
Kind of like writing French without the accents: Readable, but UGLY.

The ISO guys are gathering around ISO 10646, a *32-bit* (gasp) character set with compaction methods that make it compatible with ISO 8859-1 (Latin-1).

Harald Tveit Alvestrand
Harald.Alvestrand@elab-runit.sintef.no
C=no;PRMD=uninett;O=sintef;OU=elab-runit;S=alvestrand;G=harald
+47 7 59 70 94
mleisher@nmsu.edu (Mark Leisher) (02/27/91)
In article <1991Feb26.094326.3341@ugle.unit.no> harald.alvestrand@elab-runit.sintef.no writes:

>UNICODE is (IMHO) another US loser.
>It is (as far as I know) heartily disliked by the Japanese, Chinese, Korean
>and others whom IBM et al are trying to say that they make it for.
>The reason is that they try to squeeze characters that look *almost* the same
>and mean *almost* the same thing into a single character position.
>Kind of like writing French without the accents: Readable, but UGLY.
>

Yep. The argument, wrt both Unicode and ISO 10646, is over the Han unification.

>The ISO guys are gathering around ISO 10646, a *32-bit* (gasp) character set
>with compaction methods that make it compatible with ISO 8859-1 (Latin-1).

IMHO, if a Han unification is not agreed upon, it looks like it would be easier to fit potentially different Han sets in ISO 10646. Besides, 32 bits gives a lot of working room for future additions, of whatever sort, and the compaction methods available in ISO 10646 still allow some modicum of efficiency.

Another thing to keep in mind is that Japan, Korea, and PRC are now working on new standards with internationalization in mind. Once these standards are out, maybe the unification questions will be easier to resolve.

Perhaps delaying the internationalization of X is a good idea. Who wants potentially major modifications staring them in the face after choosing one international character set over another.

-----------------------------------------------------------------------------
mleisher@nmsu.edu                     "I laughed.
Mark Leisher                           I cried.
Computing Research Lab                 I fell down.
New Mexico State University            It changed my life."
Las Cruces, NM                         - Rich [Cowboy Feng's Space Bar and Grille]
mleisher@nmsu.edu (Mark Leisher) (02/27/91)
>Perhaps delaying the internationalization of X is a good idea. Who
>wants potentially major modifications staring them in the face after
>choosing one international character set over another.

Thanks to Bob Scheifler for alerting me to this (wrong) little gem from my last posting.

As he pointed out to me, internationalization efforts are primarily oriented towards programming interfaces that provide codeset independence as opposed to being dependent on a particular codeset.

Humblest apologies for my clumsy mis-info.

-----------------------------------------------------------------------------
mleisher@nmsu.edu                     "I laughed.
Mark Leisher                           I cried.
Computing Research Lab                 I fell down.
New Mexico State University            It changed my life."
Las Cruces, NM                         - Rich [Cowboy Feng's Space Bar and Grille]
harkcom@spinach.pa.yokogawa.co.jp (03/16/91)
In article <MLEISHER.91Feb27070453@thrinakia.nmsu.edu> mleisher@nmsu.edu (Mark Leisher) writes:

=}>Perhaps delaying the internationalization of X is a good idea. Who
=}>wants potentially major modifications staring them in the face after
=}>choosing one international character set over another.
=}
=}Thanks to Bob Scheifler for alerting me to this (wrong) little gem
=}from my last posting.
=}
=}As he pointed out to me, internationalization efforts are primarily
=}oriented towards programming interfaces that provide codeset
=}independence as opposed to being dependent on a particular codeset.

But your question was useful in that it prompted me to ask myself some others. And one of those questions seems to point to a potential pitfall.

The type wchar_t will be supported in X. X can be used over a network. wchar_t can be defined to have different sizes on two different machines as the only requirement is that it be large enough to support all locales on one machine (one machine has largest size of 2 bytes while another has 3 or 4, particularly 4).

Now suppose we have a machine, A, running the server and it has wchar_t defined as unsigned short. Now I remotely run a client on another machine, B, which has wchar_t defined as unsigned int. I use the display on A for the client. The client sends a text string which has only two bytes per character in the four byte format to the server. The server will draw a mess. And the reverse (server on B client on A), will make an even prettier mess.

Has there been an attempt to avoid this situation? If so, how?
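
The mismatch described above can be reproduced in a few lines. The following is a minimal Python 3 sketch, not anything taken from Xlib or the X protocol: it assumes a hypothetical client that writes its native wide characters straight onto the wire and a server that reads them back using its own wide-character size, with both machines little-endian. The helper names send_as_client and read_as_server are made up for illustration.

```python
import struct

# struct format codes standing in for the two wchar_t definitions in the post:
#   "H" = unsigned short (2 bytes), "I" = unsigned int (4 bytes)

def send_as_client(text, fmt):
    """Pack each character code using the client's native wide-char size."""
    codes = [ord(c) for c in text]
    return struct.pack("<%d%s" % (len(codes), fmt), *codes)

def read_as_server(data, fmt):
    """Reinterpret the raw bytes using the server's native wide-char size."""
    size = struct.calcsize("<" + fmt)
    count = len(data) // size
    return list(struct.unpack("<%d%s" % (count, fmt), data[:count * size]))

# Client on B (4-byte wide chars), server on A (2-byte wide chars):
# every character is followed by a spurious zero "character".
print(read_as_server(send_as_client("AB", "I"), "H"))   # [65, 0, 66, 0]

# Client on A (2-byte), server on B (4-byte):
# two characters are fused into one meaningless value.
print(read_as_server(send_as_client("AB", "H"), "I"))   # [4325441]
```

Either direction yields values that no longer correspond to the characters sent, which is the "mess" in question; the sketch says nothing about whether X actually transmits text this way.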
dan@ibm.COM (Walt Daniels) (03/17/91)
>From: harkcom@spinach.pa.yokogawa.co.jp
> Now suppose we have a machine, A, running the server and it has
>wchar_t defined as unsigned short. Now I remotely run a client on
>another machine, B, which has wchar_t defined as unsigned int. I use
>the display on A for the client. The client sends a text string which
>has only two bytes per character in the four byte format to the server.
>The server will draw a mess. And the reverse (server on B client on A),
>will make an even prettier mess.
>
> Has there been an attempt to avoid this situation? If so, how?

There are no problems - read your Xlib manual about the draw string functions. They come in two flavors, 8-bit and 16-bit. The codepoints used in the text strings of the applications get converted to the glyph indexes into fonts by the draw operations for transmission over the wire protocol to the server.

There are problems with cutting and pasting arbitrary strings between clients. Both sides must agree on a codeset. The usual thing is to use compound text but consenting clients can use other codesets.
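
The fixed-width conversion mentioned in this reply can be sketched briefly. Xlib's XDrawString16 takes an array of XChar2b, a structure of two unsigned bytes (byte1, byte2), so the size of wchar_t on either machine never reaches the wire. The Python 3 sketch below is only a stand-in for that conversion: to_char2b is a hypothetical helper, and it assumes the application's code points are already the font's 16-bit glyph indexes, which in practice depends on the font's encoding.

```python
def to_char2b(codepoints):
    """Encode each 16-bit glyph index as exactly two bytes, high byte first.

    Mirrors the (byte1, byte2) layout of Xlib's XChar2b; the identity
    mapping from code point to glyph index is an assumption of this sketch.
    """
    out = bytearray()
    for cp in codepoints:
        if not 0 <= cp <= 0xFFFF:
            raise ValueError("index does not fit in 16 bits: %#x" % cp)
        out.append(cp >> 8)      # byte1: high-order byte
        out.append(cp & 0xFF)    # byte2: low-order byte
    return bytes(out)

# Two characters always become four bytes, whatever the host's wide-char size.
wire = to_char2b([0x3042, 0x0041])
print(wire.hex())   # 30420041
```

Because the output is a plain byte sequence of known width and order, a server with a different wchar_t definition reads it the same way the client wrote it.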