[comp.windows.x] 8 bit xterm.

kato%cs.titech.junet@utokyo-relay.CSNET (Akira Kato) (05/25/88)

About a couple of months ago, someone posted an article about xterm
with the 8bit character handling.

With 8-bit data passing through the tty/pty, it is very convenient to
use such a font, for example ISO 8859-1 for European users. For Japanese
text handling, we have to create three GCs: one for 7-bit ASCII, one for
7-bit Kana, and one for (7bit)^2 Kanji. This makes the coding more
complicated. If we can provide an 8-bit character set in one font, we can
reduce the number of GCs to two. In a European version, only one GC would
be necessary.

Modifying xterm to use an 8-bit character set would be a quick hack.
The real problem is whether client programs in general will adopt such
an 8-bit character set as a standard.

Obviously, it would help everyone in the world (except the U.S. and
U.K. :-) if an appropriate font naming convention were adopted in a
future release.

Is there any mailing list or organization for internationalizing of
X Window System ?

-- Akira Kato, Tokyo Inst. of Tech.
   kato%cs.titech.junet@relay.cs.net

RWS@ZERMATT.LCS.MIT.EDU (Robert Scheifler) (05/27/88)

    Date: Wed, 25 May 88 02:50:30 V
    From: Akira Kato <kato%cs.titech.junet%utokyo-relay.csnet@relay.cs.net>

    Is there any mailing list or organization for internationalizing of
    X Window System ?

Not at present.  I would prefer to see the discussions start on
xtensions (or perhaps xpert), and move them to another list when it
becomes obvious that it would help progress.

Actually, I'm hoping that your X11 working group within the Japan Unix
Society will help to lead the X community in the right direction on
internationalization concerns.

diamant@hpfclp.SDE.HP.COM (John Diamant) (05/29/88)

> I would prefer to see the discussions start on
> xtensions (or perhaps xpert), and move them to another list when it
> becomes obvious that it would help progress.

OK -- I've got something to start the discussion off with.  Why is
String (defined by X Toolkit) defined to be char * instead of unsigned char *,
and is there some reason it can't be changed?  Use of char * is bad for
the health of characters that need to retain 8-bit integrity.


John Diamant
Software Development Environments
Hewlett-Packard Co.		ARPA Internet: diamant@hpfclp.sde.hp.com
Fort Collins, CO		UUCP:  {hplabs,hpfcla}!hpfclp!diamant

guido@cwi.nl (Guido van Rossum) (05/31/88)

In article <9740026@hpfclp.SDE.HP.COM> diamant@hpfclp.SDE.HP.COM (John Diamant) writes:
>OK -- I've got something to start the discussion off with.  Why is
>String (defined by X Toolkit) defined to be char * instead of unsigned char *,
>and is there some reason it can't be changed?  Use of char * is bad for
>the health of characters that need to retain 8-bit integrity.

I don't see why this is bad. How could you possibly lose the high bit if
the character is signed?  Char-to-char transfers copy all 8 bits, even
if you use ints as intermediates.  Indexing arrays must be done by
masking off the sign bits: array[c & 0xff].  This is often recognized by
the compiler as a special case.  It is also a long-living tradition in
C.

I have a reason for *not* using unsigned characters: you'll get lint
complaints and compiler errors or warnings on the use of perfectly safe
functions like strcpy, which have 'char *' as their argument and return
type.

(And I hate to type 'unsigned'. :-)
--
Guido van Rossum, Centre for Mathematics and Computer Science (CWI), Amsterdam
guido@piring.cwi.nl or mcvax!piring!guido or guido%piring.cwi.nl@uunet.uu.net

diamant@hpfclp.UUCP (06/01/88)

> I don't see why this is bad. How could you possibly lose the high bit if
> the character is signed?

I didn't mean that you lose the high bit, but you have to handle the
bogus sign extension.

>  Char-to-char transfers copy all 8 bits, even
> if you use ints as intermediates.  Indexing arrays must be done by
> masking off the sign bits: array[c & 0xff].  This is often recognized by
> the compiler as a special case.  It is also a long-living tradition in
> C.

Well, but char-to-int transfers also require masking off the sign bits,
as do any expressions involving chars (which may be promoted to int or
long depending on the other operands in the expression).  This strikes
me as pretty error-prone.

> I have a reason for *not* using unsigned characters: you'll get lint
> complaints and compiler errors or warnings on the use of perfectly safe
> functions like strcpy, which have 'char *' as their argument and return
> type.

I see your point.  It seems to me that either way, you're screwed.  The
only hope of a simple solution would have been for the C language to
define char as unsigned, but it doesn't.  So, we're stuck with
having to do very careful casting whether you use unsigned char *
or char *.  Now, I'm not really sure which way makes it easier overall.

John Diamant
Software Development Environments
Hewlett-Packard Co.		ARPA Internet: diamant@hpfclp.sde.hp.com
Fort Collins, CO		UUCP:  {hplabs,hpfcla}!hpfclp!diamant

diamant@hpfclp.SDE.HP.COM (John Diamant) (06/09/88)

> >OK -- I've got something to start the discussion off with.  Why is String
> >(defined by X Toolkit) defined to be char * instead of unsigned char *,
> >and is there some reason it can't be changed? 

I've finally tracked down the source of all this unsigned char * usage.
The problem is that the conv and ctype macros (tolower, isupper, etc.)
are typically called with a char; however, their legal range is -1 to
255.  If a char with the high bit set is passed to them on a machine
where char is signed, it will be converted to an int as a negative
number and be out of range (in fact, these macros are typically
implemented as array lookups, hence the problem you mention below).
You can call them as toupper((unsigned char) *p), but if you forget
to cast, the program will bomb in mysterious ways, since it probably
indexes into a part of memory outside the array.  One solution to that
problem is to maintain your character data as unsigned char, but as
you mentioned, this has other problems.  By the way, I believe the
behavior I am describing is the Native Language Support as adopted
by X/OPEN.

> Indexing arrays must be done by masking off the sign bits: array[c & 0xff].
> This is often recognized by the compiler as a special case.  It is also a
> long-living tradition in C.

Writing non-portable programs is also a long-living tradition in C.  That
doesn't make it a good idea.  In fact, your suggestion of masking
the sign bits is non-portable.  You are assuming the size of your
char data type, which might not be 8 bits on some machines.   The
portable way to do this is to cast the data from char to unsigned char.
I realize most machines use 8-bit chars, but I don't think all of
them do.

John Diamant
Software Development Environments
Hewlett-Packard Co.		ARPA Internet: diamant@hpfclp.sde.hp.com
Fort Collins, CO		UUCP:  {hplabs,hpfcla}!hpfclp!diamant

pho@dde.dk (Peter Holm) (10/24/89)

I added the following hack to xterm/charproc.c :

392c392
< 		switch(parsestate[c = doinput()]) {
---
> 		switch(parsestate[(c = doinput()) & 0x7f]) {
398c398
< 			while(top > 0 && isprint(*cp)) {
---
> 			while(top > 0 && isprint(*cp & 0x7f)) {

in order to use 8-bit character sets. It works fine, but my problem
is the line graphics characters. They are not part of the ISO 8859
character set. So how do I display them? Any suggestions?
-- 
Peter Holm                        Tel: int +45 42 84 50 11 (UTC + 1)
Dansk Data Elektronik A/S         Fax: int +45 42 84 52 20
Herlev Hovedgade 199              Telex: 35258 dde dk
DK-2730 Herlev, Denmark           E-mail:  pho@dde.dk