walter@hpclwjm.HP.COM (Walter Murray) (02/22/90)
Tim McDaniel writes:

> If I want to write C programs to be more easily portable to Europe,
> say, what facilities exist in the Standard to help?

> K&R2 says almost nothing about the subject, except to briefly mention
> wchar_t in an appendix.  "The Waite Group's Essential Guide to ANSI C"
> mentions <locale.h>, localeconv, mb*, wc*, strxfrm, and strcoll, but
> doesn't go into any details.

One book that I like, which does go into details, is
_ANSI_C_A_Lexical_Guide_ by the Mark Williams Company.  Start with the
overview topic "localization".  There have also been some good articles
in recent issues of the _C_Users_Journal_.

Note that the standard does not require any locales other than the basic
"C" locale.  Check your vendor's documentation for information on what
other locales are supported.

Walter Murray
This is a personal opinion.  I have no connection with the
Mark Williams Company or the _Journal_.
----------
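Since only the "C" locale is guaranteed, a program that asks for another locale has to be ready for the request to fail. The following is a minimal sketch of that fallback, not a definitive implementation. The function names select_locale and current_decimal_point are hypothetical; setlocale and localeconv are the <locale.h> calls mentioned above, and which locale names actually succeed depends on the vendor.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try to install the named locale for all
   categories.  If the implementation does not support it, fall back
   to the "C" locale, the only one the standard guarantees.
   Returns 1 if the requested locale was installed, 0 on fallback. */
int select_locale(const char *name)
{
    assert(name != NULL);
    if (setlocale(LC_ALL, name) != NULL)
        return 1;
    setlocale(LC_ALL, "C");
    return 0;
}

/* Hypothetical helper: the decimal-point string of the current
   locale, as reported by localeconv(). */
const char *current_decimal_point(void)
{
    return localeconv()->decimal_point;
}
```

A caller would typically pass the empty string "" to select_locale to request the user's environment-defined locale, and use the return value to decide whether locale-specific formatting can be relied on.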
henry@utzoo.uucp (Henry Spencer) (02/22/90)
In article <MCDANIEL.90Feb20135828@orenda.amara.uucp> mcdaniel@amara.uucp (Tim McDaniel) writes:
>If I want to write C programs to be more easily portable to Europe,
>say, what facilities exist in the Standard to help?

The standard neither particularly helps nor particularly hinders this.
There are some minor oddments -- the "locale" stuff -- for customizing
things like date formats, but you have to do most of the work yourself.
Apart from the problems of error messages etc. (I think one of the POSIX
committees is looking at this), the big thing to watch is that you treat
characters as 8-bit objects, and don't get clever with using the top bit
for a flag.

>How about for Asian countries, like Japan?

That's a whole new can of worms.  The problem there is that 8 bits is
not enough for the character set.  There is a standard-defined type
wchar_t, which is a sufficiently wide character type, plus a few
functions that operate on it, plus a way to write wchar_t string
literals, but it's not what you would call an elaborate set of aids to
internationalization.
-- 
"The N in NFS stands for Not, |     Henry Spencer at U of Toronto Zoology
or Need, or perhaps Nightmare"| uunet!attcan!utzoo!henry henry@zoo.toronto.edu
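The two points above (keep characters 8-bit clean, and use wchar_t where 8 bits is not enough) can be sketched roughly as follows. This is an illustration under stated assumptions, not a recommended library: count_high_bit_chars, upcase, to_wide, and wide_hello are hypothetical names, while toupper and mbstowcs are standard library functions. How multibyte input is interpreted depends on the current locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Count characters whose top bit is set.  They are ordinary data,
   so the top bit is never treated as a private flag. */
size_t count_high_bit_chars(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if ((unsigned char)*s > 127)
            n++;
    return n;
}

/* Upper-case a string in place.  The cast to unsigned char matters:
   passing a negative char value to toupper() is undefined. */
void upcase(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}

/* Convert a multibyte string to wide characters under the current
   locale.  Returns the number of wide characters stored (not counting
   the terminator), or (size_t)-1 if the input is invalid or dst is
   too small to hold the result plus its terminator. */
size_t to_wide(wchar_t *dst, size_t dstlen, const char *src)
{
    size_t n;
    assert(dst != NULL && src != NULL && dstlen > 0);
    n = mbstowcs(dst, src, dstlen);
    if (n == (size_t)-1 || n == dstlen)
        return (size_t)-1;
    return n;
}

/* A wide string literal: each element is a wchar_t, not a char. */
const wchar_t wide_hello[] = L"hello";
```

Treating a full destination buffer as an error in to_wide is a design choice for the sketch; mbstowcs itself does not write a terminator in that case, so the caller would otherwise get an unterminated array.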
markha@microsoft.UUCP (Mark HAHN) (03/01/90)
there is no good answer yet.  I offer a few things to be aware of:

CHARACTER CODING
don't make assumptions about how "character" data is coded.
once upon a time, all characters were ASCII, that is, 0 to 127.
nowadays, the minimum is 8-bit characters, with much movement
in the direction of multi-byte characters or 16-bit chars.
as far as I can see, there are no definitive or complete standards.
(for instance: what is the format of a locale string, are wchar_t's
supposed to be portable, are MB chars or wchar_t's valid in file
names, etc.)

PRESENTATION PREFERENCES
you also should avoid assumptions about language and country-oriented
behavior like sort order, up/down casing, date/time/number/currency
formats.  to be truly virtuous, you can't even assume text
directionality!

ISOLATE MESSAGES FROM CODE
keep any strings in some separate file - if nothing else, just have
them in a big array somewhere, and refer to them using symbolic indices.
various OSs have better (?) or more elaborate support than this -
X/Open message catalogs, Mac/Windows/PM resources, OS/2 message files.

ON THE HORIZON
there are a number of promising directions.  UniCode is one of them:
a 16-bit character set that is able to represent everything uniformly.
I don't know of any promising ideas for managing messages, though.

Internationalization is not glamorous, hence the various Unix groups
estimate 1992 for shipping international support.  Just remember that
someone, probably not the original author, will be trying to translate
those messages.  Maybe the real benefit of iconic or direct-manipulation
user interfaces is the smaller number of messages...

regards, Mark Hahn
-- 
Mark Hahn   microsof!markha@uunet.uu.net   uunet!microsof!markha
I don't speak for Microsoft.
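The "big array with symbolic indices" suggestion might look something like the sketch below. All names here (msg_id, the MSG_* constants, messages_en, get_message) and the message texts are invented for illustration; a real program would likely load the table from a separate file or a platform message catalog rather than compile it in.

```c
#include <assert.h>
#include <stddef.h>

/* Symbolic indices: the rest of the program refers to messages only
   through these names, never through literal strings. */
enum msg_id {
    MSG_FILE_NOT_FOUND,
    MSG_OUT_OF_MEMORY,
    MSG_DONE,
    MSG_COUNT            /* number of messages; keep this last */
};

/* All translatable text lives in one table, in the same order as the
   enum above.  A translator supplies a parallel table for another
   language without touching any other code. */
const char *const messages_en[MSG_COUNT] = {
    "file not found",
    "out of memory",
    "done"
};

/* Look up the text for a message index. */
const char *get_message(enum msg_id id)
{
    assert((int)id >= 0 && (int)id < MSG_COUNT);
    return messages_en[id];
}
```

Code elsewhere would then write something like puts(get_message(MSG_DONE)) instead of embedding the English text, so that switching languages only means switching which table get_message consults.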