[comp.binaries.ibm.pc.d] Internationalization of programs

rsargent@alias.com (Richard Sargent) (06/22/91)

In article <46770002@hpcndjdz.CND.HP.COM> jason@hpcndjdz.CND.HP.COM (Jason Zions) writes:
>8-bit localizeability is a damn complex subject. Some of the items that need
>to be done include:
>
>1) Leave all 8 bits alone. You can touch 0 through 31 (base-10) and 255, but
>   that's it.

That's not quite it. The way the 8-bit character sets I am familiar
with are defined is: C0, G0, C1, and G1, where these terms refer to
'columns' of the code set as follows: C0 is the 'standard' control
characters (columns 0 and 1, or the first 32 characters), G0 is the
first 'graphic' character set (columns 2 to 7), C1 is the second control 
character set (columns 8 and 9), and G1 is the second graphic set
(columns 10 to 15).
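
As a rough sketch of the layout just described (the enum and function
names are mine, invented for illustration, and not from any standard
header): the column of a code is simply its high four bits, so sorting
a byte into one of the four regions takes a single shift.

```c
#include <stdio.h>

/* Hypothetical names for the four regions of an 8-bit code set. */
enum region { REGION_C0, REGION_G0, REGION_C1, REGION_G1 };

/* The column of a code is its high nibble (0..15). */
static enum region classify(unsigned char c)
{
    unsigned col = c >> 4;

    if (col <= 1) return REGION_C0;   /* columns 0-1:   0x00..0x1F */
    if (col <= 7) return REGION_G0;   /* columns 2-7:   0x20..0x7F */
    if (col <= 9) return REGION_C1;   /* columns 8-9:   0x80..0x9F */
    return REGION_G1;                 /* columns 10-15: 0xA0..0xFF */
}

/* Printable label for a region, for diagnostics. */
static const char *region_name(enum region r)
{
    static const char *const names[] = { "C0", "G0", "C1", "G1" };
    return names[r];
}

/* Print the column and region of one byte. */
static void describe(unsigned char c)
{
    printf("0x%02X: column %u, %s\n", (unsigned)c, (unsigned)(c >> 4),
           region_name(classify(c)));
}
```

Calling describe(0x85), for instance, reports column 8 and region C1.
Note that the argument is an unsigned char; with a plain (possibly
signed) char the shift would misbehave for codes above 127.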

The term 'graphic' is used in the typographic sense: i.e. it displays
something visual. I believe the first and last entries in the Gn sets
are off-limits (i.e. the space and DEL character).

You need to leave the remainder of the Gn sets alone to ensure basic
compatibility with 8-bit character codes for all countries. This does
not attempt to cover the >8-bit codes from countries such as Japan.

>
>2) For each language supported, build tables indicating:
>   a) Upper-case/lower-case equivalents.
>   b) For word-processors, word-break characters and hyphenation rules
>   (which may depend upon syllabification rules.
>   c) Lexicographic ordering, case-sensitive and case-insensitive.
>   d) Ideally, defineable input-character mappings, in case the stupid
>   system doesn't do keyboard mapping correctly (or at all).

The C Standard defines 'locales' which are supposed to allow proper
lexicographic handling of international character sets. I don't
have my copy of the Standard handy, so I can't give you the necessary
details. P.J.Plauger has written quite extensively about the C
Standard in "The C User's Journal", and most recently his articles
have dealt with localization. If my memory serves me, this makes
item #2 above unnecessary, but that depends on the compiler
implementation.
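
For what it's worth, here is a minimal sketch of how the Standard's
locale facilities map onto items 2a and 2c. The helper names
(collate_cmp, sort_words, upcase) are made up for this example; the
library calls themselves (setlocale, strcoll, toupper, qsort) are
Standard C. Whether any locale other than "C" is actually available is
up to the implementation, and nothing here addresses hyphenation or
keyboard mapping.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: orders two strings by the current LC_COLLATE rules. */
static int collate_cmp(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcoll(*sa, *sb);
}

/* Hypothetical helper: sort an array of words in locale order (item 2c). */
static void sort_words(const char **words, size_t n)
{
    qsort(words, n, sizeof words[0], collate_cmp);
}

/* Hypothetical helper: upper-case a string in place using the current
 * LC_CTYPE tables (item 2a). The cast to unsigned char matters for
 * 8-bit codes, since toupper is not defined for negative values. */
static void upcase(char *s)
{
    for (; *s != '\0'; ++s)
        *s = (char)toupper((unsigned char)*s);
}
```

A program would call setlocale(LC_ALL, "") once at startup to pick up
the user's environment. If that call returns NULL the request could
not be honoured and the program stays in the "C" locale, where strcoll
orders strings the same way strcmp does.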



#ifdef SOAP_BOX
There have been a lot of flames lately on this whole subject.
Some of the flames have degenerated into name calling and other
insults. This is completely unnecessary.

The world is tending to greater internationalization with each
passing year. While Iceland, for example, may represent a small
market, the European market is considered to be the fastest
growing market in the world right now for computers and software.

Europe is not a tiny market, and it is a very hungry one for
good-quality software.


There has, in the past, been a lot of discussion of this topic
in comp.std.c (and perhaps comp.lang.c). Many of our European
correspondents have been participating in the discussions. I'm
not surprised that the "camel's back has broken". You would
probably feel the same way if you talked for years and no one
seemed to be listening.
#endif