kato%cs.titech.junet@utokyo-relay.CSNET (Akira Kato) (05/25/88)
About a couple of months ago, someone posted an article about xterm with 8-bit character handling. With 8-bit pass-through in the tty/pty, it is very convenient to use such a font, for example ISO 8859-1 for European users.

For Japanese text handling, we have to create three GCs: one for 7-bit ASCII, one for 7-bit Kana, and one for (7-bit)^2 Kanji. This makes the coding more complicated. If we could provide an 8-bit character set in one font, we could reduce the number of GCs to two. In a European version only one GC would be necessary.

Modifying xterm to use an 8-bit character set would be a quick hack; the real problem is whether such an 8-bit character set will be used by *any* client program as a standard. Obviously, it would help everyone in the world (except the U.S. and U.K. :-) if an appropriate font naming convention were adopted in a future release.

Is there any mailing list or organization for internationalizing of X Window System?

--
Akira Kato, Tokyo Inst. of Tech.
kato%cs.titech.junet@relay.cs.net
RWS@ZERMATT.LCS.MIT.EDU (Robert Scheifler) (05/27/88)
    Date: Wed, 25 May 88 02:50:30 V
    From: Akira Kato <kato%cs.titech.junet%utokyo-relay.csnet@relay.cs.net>

    Is there any mailing list or organization for internationalizing of X
    Window System ?

Not at present. I would prefer to see the discussions start on xtensions (or perhaps xpert), and move them to another list when it becomes obvious that it would help progress. Actually, I'm hoping that your X11 working group within the Japan Unix Society will help to lead the X community in the right direction on internationalization concerns.
diamant@hpfclp.SDE.HP.COM (John Diamant) (05/29/88)
> I would prefer to see the discussions start on
> xtensions (or perhaps xpert), and move them to another list when it
> becomes obvious that it would help progress.

OK -- I've got something to start the discussion off with. Why is String (defined by X Toolkit) defined to be char * instead of unsigned char *, and is there some reason it can't be changed? Use of char * is bad for the health of characters that need to retain 8-bit integrity.

John Diamant
Software Development Environments
Hewlett-Packard Co.        ARPA Internet: diamant@hpfclp.sde.hp.com
Fort Collins, CO           UUCP: {hplabs,hpfcla}!hpfclp!diamant
guido@cwi.nl (Guido van Rossum) (05/31/88)
In article <9740026@hpfclp.SDE.HP.COM> diamant@hpfclp.SDE.HP.COM (John Diamant) writes:
>OK -- I've got something to start the discussion off with. Why is
>String (defined by X Toolkit) defined to be char * instead of unsigned char *,
>and is there some reason it can't be changed? Use of char * is bad for
>the health of characters that need to retain 8-bit integrity.

I don't see why this is bad. How could you possibly lose the high bit if the character is signed? Char-to-char transfers copy all 8 bits, even if you use ints as intermediates. Indexing arrays must be done by masking off the sign bits: array[c & 0xff]. This is often recognized by the compiler as a special case. It is also a long-living tradition in C.

I have a reason for *not* using unsigned characters: you'll get lint complaints and compiler errors or warnings on the use of perfectly safe functions like strcpy, which have 'char *' as their argument and return type. (And I hate to type 'unsigned'. :-)

--
Guido van Rossum, Centre for Mathematics and Computer Science (CWI), Amsterdam
guido@piring.cwi.nl or mcvax!piring!guido or guido%piring.cwi.nl@uunet.uu.net
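The masking idiom mentioned in the post above, array[c & 0xff], can be sketched as follows. This is only an illustration: count_bytes is a hypothetical helper, not something from X or xterm, and the sketch assumes 8-bit chars as the post does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: tally how often each byte value occurs in s.
   counts must have room for 256 entries. Assumes 8-bit chars. */
void count_bytes(const char *s, unsigned counts[256])
{
    size_t i;

    assert(s != NULL && counts != NULL);
    for (i = 0; i < 256; i++)
        counts[i] = 0;
    for (; *s != '\0'; s++)
        counts[*s & 0xff]++;   /* mask: plain char may be negative here */
}
```

Without the mask, a byte such as 0xE9 stored in a signed char would be promoted to a negative int and index in front of the array.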
diamant@hpfclp.UUCP (06/01/88)
> I don't see why this is bad. How could you possibly lose the high bit if
> the character is signed?

I didn't mean that you lose the high bit, but you have to handle the bogus sign extension.

> Char-to-char transfers copy all 8 bits, even
> if you use ints as intermediates. Indexing arrays must be done by
> masking off the sign bits: array[c & 0xff]. This is often recognized by
> the compiler as a special case. It is also a long-living tradition in
> C.

Well, but char-to-int transfers also require masking off the sign bits, as do any expressions involving chars (which may be cast to ints or longs depending on the size of the other parts of the expression). This strikes me as pretty error-prone.

> I have a reason for *not* using unsigned characters: you'll get lint
> complaints and compiler errors or warnings on the use of perfectly safe
> functions like strcpy, which have 'char *' as their argument and return
> type.

I see your point. It seems to me that either way, you're screwed. The only hope of a simple solution would have been if char would be defined by the C language to be unsigned, but it isn't. So, we're stuck with having to do very careful casting no matter whether you use unsigned char * or char *. Now, I'm not really sure which way makes it easier overall.

John Diamant
Software Development Environments
Hewlett-Packard Co.        ARPA Internet: diamant@hpfclp.sde.hp.com
Fort Collins, CO           UUCP: {hplabs,hpfcla}!hpfclp!diamant
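The sign-extension pitfall in char-to-int transfers described above can be shown with a small sketch. The function names are hypothetical and chosen only for illustration; 0xE9 is used as a sample byte with the high bit set. Whether the naive version misbehaves depends on whether plain char is signed on the machine at hand.

```c
#include <limits.h>

/* Reports whether plain char is a signed type on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Naive comparison: the char is widened to int first, so where plain
   char is signed a byte of 0xE9 becomes negative and never equals 0xE9. */
int is_byte_e9_naive(char c)
{
    int widened = c;            /* sign extension happens here */

    return widened == 0xE9;
}

/* Converting to unsigned char before widening yields 0..UCHAR_MAX. */
int is_byte_e9(char c)
{
    int widened = (unsigned char)c;

    return widened == 0xE9;
}
```

On an implementation with signed plain char, is_byte_e9_naive returns 0 for the byte 0xE9 while is_byte_e9 returns 1; with unsigned plain char the two agree.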
diamant@hpfclp.SDE.HP.COM (John Diamant) (06/09/88)
> >OK -- I've got something to start the discussion off with. Why is String
> >(defined by X Toolkit) defined to be char * instead of unsigned char *,
> >and is there some reason it can't be changed?

I've finally tracked down the source of all this unsigned char *. The problem is that the conv and ctype macros (tolower, isupper, etc.) are typically called with a char; however, their legal range is -1 to 255. If a char with the high bit set is passed to them, it will be converted to an int as a negative number and be out of range (in fact, these macros are typically implemented as array indices, thus the problem you mention below). You can call them as toupper((unsigned char) *p), but if you forget the cast, the program will bomb in mysterious ways since it probably indexes into a part of memory not in the array.

One solution to that problem is to maintain your character data as unsigned char, but as you mentioned, this has other problems. By the way, I believe the behavior I am describing is the Native Language Support as adopted by X/OPEN.

> Indexing arrays must be done by masking off the sign bits: array[c & 0xff].
> This is often recognized by the compiler as a special case. It is also a
> long-living tradition in C.

Writing non-portable programs is also a long-living tradition in C. That doesn't make it a good idea. In fact, your suggestion of masking the sign bits is non-portable. You are assuming the size of your char data type, which might not be 8 bits on some machines. The portable way to do this is to cast the data from char to unsigned char. I realize most machines use 8-bit chars, but I don't think all of them do.

John Diamant
Software Development Environments
Hewlett-Packard Co.        ARPA Internet: diamant@hpfclp.sde.hp.com
Fort Collins, CO           UUCP: {hplabs,hpfcla}!hpfclp!diamant
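A minimal sketch of the cast discussed above, applied to a whole string. The helper name upcase is hypothetical; toupper is the standard <ctype.h> routine, whose argument must be EOF or a value representable as unsigned char.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: upper-case a string in place. Each char is cast
   to unsigned char so that bytes with the high bit set reach toupper
   as 0..UCHAR_MAX rather than as negative ints. */
void upcase(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```

In the default "C" locale, bytes above 0x7f are passed through unchanged; what happens to them in other locales depends on the locale's character tables.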
pho@dde.dk (Peter Holm) (10/24/89)
I added the following hack to xterm/charproc.c:

    392c392
    < 	switch(parsestate[c = doinput()]) {
    ---
    > 	switch(parsestate[(c = doinput()) & 0x7f]) {
    398c398
    < 	while(top > 0 && isprint(*cp)) {
    ---
    > 	while(top > 0 && isprint(*cp & 0x7f)) {

in order to use 8-bit character sets. It works fine, but my problem is the line graphics characters. They are not part of the ISO 8859 character set. So how do I display them? Any suggestions?

--
Peter Holm                   Tel:    int +45 42 84 50 11 (UTC + 1)
Dansk Data Elektronik A/S    Fax:    int +45 42 84 52 20
Herlev Hovedgade 199         Telex:  35258 dde dk
DK-2730 Herlev, Denmark      E-mail: pho@dde.dk
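The effect of the & 0x7f mask in the patch above can be sketched in isolation. This is not xterm code: the enum and the classify function are hypothetical stand-ins for the parse table lookup, and the printable range is hard-coded for ASCII. The point is only that the low 7 bits select the class while the caller keeps the full 8-bit value for display.

```c
#include <assert.h>

enum byte_class { CLASS_CONTROL, CLASS_PRINT };   /* hypothetical classes */

/* Classify an input byte using only its low 7 bits, in the manner of
   the patched table index. The caller still holds the full value of c. */
enum byte_class classify(int c)
{
    int low = c & 0x7f;

    assert(c >= 0 && c <= 0xff);
    return (low >= 0x20 && low != 0x7f) ? CLASS_PRINT : CLASS_CONTROL;
}
```

Under this scheme 0xE9 is classified like 0x69 and so counts as printable, while bytes 0x80 to 0x9F are classified like the control characters 0x00 to 0x1F.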