[comp.std.c] Internationalization

walter@hpclwjm.HP.COM (Walter Murray) (02/22/90)

Tim McDaniel writes:

> If I want to write C programs to be more easily portable to Europe,
> say, what facilities exist in the Standard to help?

> K&R2 says almost nothing about the subject, except to briefly mention
> wchar_t in an appendix.  "The Waite Group's Essential Guide to ANSI C"
> mentions <locale.h>, localeconv, mb*, wc*, strxfrm, and strcoll, but
> doesn't go into any details.

One book that I like, which does go into details, is _ANSI_C_A_Lexical_
Guide_ by the Mark Williams Company.  Start with the overview topic
"localization".  

There have also been some good articles in recent issues of the
_C_Users_Journal_.

Note that the standard does not require any locales other than
the basic "C" locale.  Check your vendor's documentation for
information on what other locales are supported.
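
As a rough illustration of the point above, here is a minimal sketch of asking for a locale and falling back to "C" when the implementation does not provide it.  The helper name select_locale is made up for this example; setlocale and LC_ALL are the standard <locale.h> facilities.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try the requested locale, fall back to "C".
 * Returns the string reported by setlocale() for the locale now in effect. */
const char *select_locale(const char *requested)
{
    const char *active;

    assert(requested != NULL);
    active = setlocale(LC_ALL, requested);
    if (active == NULL) {
        /* The "C" locale is the only one the standard guarantees. */
        active = setlocale(LC_ALL, "C");
    }
    return active;
}
```

Calling select_locale("") asks for the implementation's native environment; which other names are accepted is up to the vendor.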

Walter Murray
This is a personal opinion.  I have no connection with the Mark Williams
Company or the _Journal_.
----------

henry@utzoo.uucp (Henry Spencer) (02/22/90)

In article <MCDANIEL.90Feb20135828@orenda.amara.uucp> mcdaniel@amara.uucp (Tim McDaniel) writes:
>If I want to write C programs to be more easily portable to Europe,
>say, what facilities exist in the Standard to help?

The standard neither particularly helps nor particularly hinders this.
There are some minor oddments -- the "locale" stuff -- for customizing
things like date formats, but you have to do most of the work yourself.
Apart from the problems of error messages etc. (I think one of the POSIX
committees is looking at this), the big thing to watch is that you treat
characters as 8-bit objects, and don't get clever with using the top bit
for a flag.
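
A minimal sketch of what treating characters as 8-bit objects can look like in practice, assuming a single-byte character set such as ISO 8859-1.  The function names are invented for this example.  The casts matter because plain char may be signed, and the <ctype.h> functions expect a value representable as unsigned char (or EOF).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count characters that have the top bit set (e.g. accented letters
 * in an 8-bit character set).  Nothing is masked or stripped. */
size_t count_high_bit_chars(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s > 0x7F)   /* compare as unsigned */
            n++;
    }
    return n;
}

/* Upper-case a string in place according to the current locale.
 * The cast keeps negative char values out of toupper(). */
void upcase(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```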

>How about for Asian countries, like Japan?

That's a whole new can of worms.  The problem there is that 8 bits is
not enough for the character set.  There is a standard-defined type
wchar_t, which is a sufficiently wide character type, plus a few functions
that operate on it, plus a way to write wchar_t string literals, but it's
not what you would call an elaborate set of aids to internationalization.
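
For a rough idea of how small that toolkit is, here is a sketch using only wchar_t, an L"..." literal, and the <stdlib.h> multibyte functions.  The helper names wide_length and mb_char_count are made up for this example, and the results of the multibyte calls depend on the locale in effect.

```c
#include <assert.h>
#include <stddef.h>   /* wchar_t, size_t */
#include <stdlib.h>   /* mblen, mbstowcs, MB_CUR_MAX */

/* Length of a wide string in wchar_t elements, excluding the terminator. */
size_t wide_length(const wchar_t *ws)
{
    size_t n = 0;

    assert(ws != NULL);
    while (ws[n] != L'\0')
        n++;
    return n;
}

/* Count characters (not bytes) in a multibyte string under the
 * current locale.  Returns (size_t)-1 on an invalid sequence. */
size_t mb_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    (void)mblen(NULL, 0);               /* reset any shift state */
    while (*s != '\0') {
        int len = mblen(s, MB_CUR_MAX);
        if (len < 0)
            return (size_t)-1;
        s += len;
        n++;
    }
    return n;
}
```

A wide literal such as L"hello" can be passed to wide_length directly, and mbstowcs(buf, str, n) converts a multibyte string into a wchar_t array of n elements.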
-- 
"The N in NFS stands for Not, |     Henry Spencer at U of Toronto Zoology
or Need, or perhaps Nightmare"| uunet!attcan!utzoo!henry henry@zoo.toronto.edu

markha@microsoft.UUCP (Mark HAHN) (03/01/90)

there is no good answer yet.  I offer a few things to be aware of:
 
CHARACTER CODING
don't make assumptions about how "character" data is coded.
once upon a time, all characters were ASCII, that is, 0 to 127.
nowadays, the minimum is 8-bit characters, with much movement
in the direction of multi-byte characters or 16-bit chars.
as far as I can see, there are no definitive or complete standards.
(for instance: what is the format of a locale string?  are wchar_t's
supposed to be portable?  are MB chars or wchar_t's valid in file names?  etc.)

PRESENTATION PREFERENCES
you also should avoid assumptions about language and country-oriented
behavior like sort order, upper/lower casing, date/time/number/currency formats.
to be truly virtuous, you can't even assume text directionality!
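
a small sketch of leaving sort order and number formatting to the locale instead of hard-coding them, using the standard strcoll and localeconv.  the helper names sort_words and decimal_point are invented for this example; in the default "C" locale the results are the same as strcmp ordering and ".".

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparison that defers to the locale's collating order. */
static int collate_cmp(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcoll(*sa, *sb);
}

/* Sort an array of strings using the current locale's sort order. */
void sort_words(const char **words, size_t n)
{
    assert(words != NULL || n == 0);
    qsort(words, n, sizeof *words, collate_cmp);
}

/* The decimal-point string for the current locale, rather than
 * an assumed ".". */
const char *decimal_point(void)
{
    return localeconv()->decimal_point;
}
```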

ISOLATE MESSAGES FROM CODE
keep any strings in some separate file - if nothing else,
just have them in a big array somewhere, and refer to them
using symbolic indices.  various OSs have better (?) or more elaborate
support than this - X/Open message catalogs, Mac/Windows/PM resources,
OS/2 message files.
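
the "big array with symbolic indices" idea might look roughly like this.  the message texts and the names msg_id and get_message are made up for illustration; a translator would only need to touch the table.

```c
#include <assert.h>

/* Symbolic indices: code refers to messages by name, never by text. */
enum msg_id {
    MSG_CANNOT_OPEN,
    MSG_OUT_OF_MEMORY,
    MSG_DONE,
    MSG_COUNT           /* number of messages, not a message itself */
};

/* All user-visible strings live in one table, in enum order. */
static const char *const messages[MSG_COUNT] = {
    "cannot open file",
    "out of memory",
    "done"
};

/* Look up a message; an out-of-range index is a programming error. */
const char *get_message(enum msg_id id)
{
    assert((unsigned)id < MSG_COUNT);
    return messages[id];
}
```

a call site then reads something like fprintf(stderr, "%s\n", get_message(MSG_CANNOT_OPEN)), and the table could later be swapped for a message catalog or resource file without changing the callers.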

ON THE HORIZON
there are a number of promising directions.  Unicode is one of them:
a 16-bit character set that is able to represent everything uniformly.
I don't know of any promising ideas for managing messages, though.
Internationalization is not glamorous, hence the various Unix groups
estimate 1992 for shipping international support.  Just remember that someone,
probably not the original author, will be trying to translate those messages.

Maybe the real benefit of iconic or direct-manipulation user interfaces 
is the smaller number of messages...

regards,
Mark Hahn
-- 
Mark Hahn	microsof!markha@uunet.uu.net	uunet!microsof!markha
I don't speak for Microsoft.