rsargent@alias.com (Richard Sargent) (06/22/91)
In article <46770002@hpcndjdz.CND.HP.COM> jason@hpcndjdz.CND.HP.COM (Jason Zions) writes:
>8-bit localizeability is a damn complex subject. Some of the items that need
>to be done include:
>
>1) Leave all 8 bits alone. You can touch 0 through 31 (base-10) and 255, but
>   that's it.

That's not quite it. The 8-bit character sets I am familiar with are
divided into four regions: C0, G0, C1, and G1. These terms refer to
'columns' of the code set (16 characters per column) as follows:

   C0 is the 'standard' control character set (columns 0 and 1, or the
      first 32 characters),
   G0 is the first 'graphic' character set (columns 2 to 7),
   C1 is the second control character set (columns 8 and 9), and
   G1 is the second graphic character set (columns 10 to 15).

The term 'graphic' is used in the typographic sense: i.e. it displays
something visual. I believe the first and last entries in the Gn sets are
off-limits (i.e. the space and DEL characters). You need to leave the
remainder of the Gn sets alone to ensure basic compatibility with 8-bit
character codes for all countries. This does not attempt to cover the
larger-than-8-bit codes from countries such as Japan.

>
>2) For each language supported, build tables indicating:
>   a) Upper-case/lower-case equivalents.
>   b) For word-processors, word-break characters and hyphenation rules
>      (which may depend upon syllabification rules.
>   c) Lexicographic ordering, case-sensitive and case-insensitive.
>   d) Ideally, defineable input-character mappings, in case the stupid
>      system doesn't do keyboard mapping correctly (or at all).

The C Standard defines 'locales', which are supposed to allow proper
lexicographic handling of international character sets. I don't have my
copy of the Standard handy, so I can't give you the necessary details.
P.J. Plauger has written quite extensively about the C Standard in "The C
Users Journal", and most recently his articles have dealt with
localization. If my memory serves me, this makes much of item #2 above
unnecessary (certainly the case-mapping and collation tables), but that
depends on the compiler implementation.
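
For anyone who wants something concrete, here is a minimal sketch in C of the
two ideas above: splitting a byte into the C0/G0/C1/G1 regions by column, and
leaning on the Standard's locale machinery (setlocale, toupper, strcoll) rather
than hand-built case and ordering tables. The helper names (classify_byte,
upcase_in_locale, compare_in_locale, sort_in_locale) are made up for this
example; only the library calls come from the Standard. How well it works for
a given language depends entirely on the locales your implementation supplies.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Regions of an 8-bit code set, by column (the high 4 bits of the byte). */
enum region { REGION_C0, REGION_G0, REGION_C1, REGION_G1 };

/* Hypothetical helper: report which region a byte falls in. */
enum region classify_byte(unsigned char c)
{
    unsigned column = c >> 4;            /* columns 0..15 */

    if (column <= 1) return REGION_C0;   /* 0x00-0x1F: control characters */
    if (column <= 7) return REGION_G0;   /* 0x20-0x7F: first graphic set  */
    if (column <= 9) return REGION_C1;   /* 0x80-0x9F: second control set */
    return REGION_G1;                    /* 0xA0-0xFF: second graphic set */
}

/* Hypothetical helper: upper-case a string in place using the current
 * LC_CTYPE locale. The cast to unsigned char matters for 8-bit data:
 * passing a negative char value to toupper is undefined behaviour. */
void upcase_in_locale(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; ++s)
        *s = (char)toupper((unsigned char)*s);
}

/* qsort comparator: order two strings by the current LC_COLLATE rules. */
int compare_in_locale(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Hypothetical helper: sort an array of strings by the locale's ordering. */
void sort_in_locale(const char **words, size_t count)
{
    qsort(words, count, sizeof *words, compare_in_locale);
}
```

A program would normally call setlocale(LC_ALL, "") once at startup to pick up
the user's environment before using these. Locale names other than "C" and ""
are implementation-defined, so check the return value of setlocale and fall
back gracefully when it is NULL.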

#ifdef SOAP_BOX

There have been a lot of flames lately on this whole subject. Some of the
flames have degenerated into name calling and other insults. This is
completely unnecessary.

The world is tending toward greater internationalization with each passing
year. While Iceland, for example, may represent a small market, the
European market is considered to be the fastest growing market in the world
right now for computers and software. Europe is not a tiny market, and it is
a very hungry one for good-quality software.

There has, in the past, been a lot of discussion of this topic in
comp.std.c (and perhaps comp.lang.c). Many of our European correspondents
have been participating in the discussions. I'm not surprised that the
"camel's back has broken". You would probably feel the same way if you
talked for years and no one seemed to be listening.

#endif