[comp.lang.c] Internationalisation, setlocale

karl@haddock.ima.isc.com (Karl Heuer) (05/02/90)

In article <11071@cbmvax.commodore.com> valentin@cbmvax (Valentin Pepelea) writes: [paraphrased --kwzh]
>[How should locale information be organized?  The monetary information is
>usually specific to a country, while the collating information is specific to
>a language.  A country may have multiple languages, or a language may span
>multiple countries.]

Seems like the locale name ought to mention both the country and the language,
e.g. "usa-english".  There would be ample opportunity for the data to be
linked%: usa-english/LC_COLLATE could be the same as uk-english/LC_COLLATE and
can-english/LC_COLLATE, and likewise can-english/LC_MONETARY could be linked
to can-french/LC_MONETARY.

It would also be reasonable to support incompletely defined locales, e.g.
"english" could be a valid locale name when used in conjunction with LC_COLLATE
but invalid for LC_MONETARY (and hence invalid for LC_ALL).

Karl W. Z. Heuer (karl@ima.ima.isc.com or harvard!ima!karl), The Walking Lint
________
% The likely UNIX implementation is as a bunch of directories with cross-
  linked files.  An alternate scheme, less specific to quirks of UNIX, is to
  have a single index file where a key like "usa-english/LC_COLLATE" is paired
  with a file name containing the data.  I mention this to demonstrate that my
  use of the word "link" need not imply a property of the filesystem.
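
The index-file scheme from the footnote, together with the idea of incompletely
defined locales, can be sketched in C roughly as follows. This is only an
illustration: the table contents, the data file names, and the functions
locale_file() and locale_valid_for_all() are all hypothetical, and only two
categories are checked to keep the example short.

```c
#include <stddef.h>
#include <string.h>

/* One index record: a "locale/CATEGORY" key and the data file it names.
   All keys and file names below are made up for illustration. */
struct index_entry {
    const char *key;
    const char *file;
};

/* Two keys naming the same file play the role of a "link". */
static const struct index_entry locale_index[] = {
    { "usa-english/LC_COLLATE",  "collate.english" },
    { "uk-english/LC_COLLATE",   "collate.english" },
    { "can-english/LC_COLLATE",  "collate.english" },
    { "can-english/LC_MONETARY", "monetary.canada" },
    { "can-french/LC_MONETARY",  "monetary.canada" },
    { "english/LC_COLLATE",      "collate.english" }
};

#define INDEX_COUNT (sizeof locale_index / sizeof locale_index[0])

/* Return the data file for locale/category, or NULL if none is defined. */
const char *locale_file(const char *locale, const char *category)
{
    size_t llen = strlen(locale);
    size_t i;

    for (i = 0; i < INDEX_COUNT; i++) {
        const char *k = locale_index[i].key;
        if (strncmp(k, locale, llen) == 0 && k[llen] == '/'
            && strcmp(k + llen + 1, category) == 0)
            return locale_index[i].file;
    }
    return NULL;
}

/* A locale is usable for LC_ALL only if every category is defined.
   Assumption: just the two categories from the post are checked. */
int locale_valid_for_all(const char *locale)
{
    static const char *const categories[] = { "LC_COLLATE", "LC_MONETARY" };
    size_t i;

    for (i = 0; i < sizeof categories / sizeof categories[0]; i++)
        if (locale_file(locale, categories[i]) == NULL)
            return 0;
    return 1;
}
```

With this table, "english" resolves for LC_COLLATE but not for LC_MONETARY, so
locale_valid_for_all("english") reports failure, while "can-english" passes.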

news@OSF.ORG (USENET News System) (05/03/90)

From: martin@osf.osf.org (Sandra Martin)

You're right that the examples are confusing, and not entirely appropriate.
The problem is that there are no current standards for locale names or for
the way locale information should be organized. Most implementations that I
know of use some form of the X/Open naming recommendation which consists of
three parts:

	language_territory.codeset

At this point, however, there is no agreement about the contents of the
individual parts. For example, some implementations might use "long" country
names for the territory segment (e.g., canada, germany), while others use
abbreviations (can, ger). Still others use the nationality rather than the
country name (e.g., using "swiss" rather than "switzerland"). There are many,
many other examples of different approaches.
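
To make the three-part form concrete, here is a small C sketch that splits a
name of the shape language_territory.codeset into its parts. It assumes that
the territory and codeset may be omitted and says nothing about what values
the parts may hold, since that is exactly what is not agreed upon. The struct,
the function names, and the 32-byte limit are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define PART_MAX 32

struct locale_parts {
    char language[PART_MAX];
    char territory[PART_MAX];   /* empty string if absent */
    char codeset[PART_MAX];     /* empty string if absent */
};

/* Copy len characters of src into dst, truncating to PART_MAX - 1. */
static void copy_part(char *dst, const char *src, size_t len)
{
    if (len >= PART_MAX)
        len = PART_MAX - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Split "language[_territory][.codeset]".
   Returns 0 on success, -1 if the language part is empty. */
int parse_locale_name(const char *name, struct locale_parts *out)
{
    const char *dot = strchr(name, '.');
    const char *end = dot ? dot : name + strlen(name);
    /* Look for '_' only before the codeset separator. */
    const char *us = memchr(name, '_', (size_t)(end - name));
    const char *lang_end = us ? us : end;

    if (lang_end == name)
        return -1;
    copy_part(out->language, name, (size_t)(lang_end - name));
    if (us)
        copy_part(out->territory, us + 1, (size_t)(end - (us + 1)));
    else
        out->territory[0] = '\0';
    if (dot)
        copy_part(out->codeset, dot + 1, strlen(dot + 1));
    else
        out->codeset[0] = '\0';
    return 0;
}
```

Parsing "french_canada.8859" would give "french", "canada" and "8859"; whether
an implementation spells the territory "canada" or "can" is outside the scope
of the parser.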

As for your question about how the locales should be organized, again, it isn't
standardized, and so depends on the implementation. There are two fairly popular
approaches: flat and tiered. With the flat approach, information is stored
something like this:

	.../locale/<locale_name>/<locale_related_file(s)>

With the tiered approach, information is stored something like this:

	.../locale/<language>/<territory>/<codeset>/<locale_related_file(s)>

In the tiered approach, the territory and codeset directories are optional
and therefore might not exist. 
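
As a rough C sketch, the two layouts differ only in how the path is assembled.
The functions flat_path() and tiered_path() are hypothetical, the root
directory is passed in by the caller, and the sketch assumes that an absent
territory or codeset is simply left out of the tiered path.

```c
#include <stdio.h>

/* Flat layout: <root>/<locale_name>/<file>.
   Returns 0 on success, -1 if buf is too small. */
int flat_path(char *buf, size_t size, const char *root,
              const char *locale_name, const char *file)
{
    int n = snprintf(buf, size, "%s/%s/%s", root, locale_name, file);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Tiered layout: <root>/<language>[/<territory>][/<codeset>]/<file>.
   territory and codeset may be NULL or "" when absent (an assumption).
   Returns 0 on success, -1 if buf is too small. */
int tiered_path(char *buf, size_t size, const char *root,
                const char *language, const char *territory,
                const char *codeset, const char *file)
{
    int has_t = territory != NULL && territory[0] != '\0';
    int has_c = codeset != NULL && codeset[0] != '\0';
    int n = snprintf(buf, size, "%s/%s%s%s%s%s/%s", root, language,
                     has_t ? "/" : "", has_t ? territory : "",
                     has_c ? "/" : "", has_c ? codeset : "",
                     file);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Neither function knows whether a given file holds language-specific or
country-specific data, which mirrors the point that the layouts themselves do
not make that distinction.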

You noted that some locale-related info is language-specific, while other 
info is country-specific. Notice that neither the flat nor tiered approach
makes these kinds of distinctions. Some implementations do have separate
files for language- and country-specific info, but they store them together
in the same directory.

Confused? I wouldn't be surprised if you were. I've thought for a long time
that it would be a good idea to have some standards for locale names, but
have been voted down in a couple of different groups. However, lately 
there have been some rumblings about the confusion inherent in the current
chaotic system, so we may see some standards soon. Standards for the
organization of locale info also would be helpful.

Hope this helps.

		-- Sandra Martin
		Open Software Foundation
		email: martin@osf.org
		tel: (617) 621-8707

goudreau@larrybud.rtp.dg.com (Bob Goudreau) (05/04/90)

In article <7513@paperboy.OSF.ORG>, martin@osf.osf.org (Sandra Martin) writes:
> 
> The problem is that there are no current standards for locale names or for
> the way locale information should be organized. Most implementations that I
> know of use some form of the X/Open naming recommendation which consists of
> three parts:
> 
> 	language_territory.codeset
> 
> At this point, however, there is no agreement about the contents of the
> individual parts. For example, some implementations might use "long" country
> names for the territory segment (e.g., canada, germany), while others use
> abbreviations (can, ger). Still others use the nationality rather than the
> country name (e.g., using "swiss" rather than "switzerland"). There are many,
> many other examples of different approaches.
>
> ....
> 
> Confused? I wouldn't be surprised if you were. I've thought for a long time
> that it would be a good idea to have some standards for locale names, but
> have been voted down in a couple of different groups. However, lately 
> there have been some rumblings about the confusion inherent in the current
> chaotic system, so we may see some standards soon. Standards for the
> organization of locale info also would be helpful.

And to open a whole separate can of worms, what about the different
ways to name a country or a language?  E.g., "germany" vs. "deutschland",
or "English" vs. "Englisch" vs. "Anglais" vs. "Ingles", etc.  It would
appear to be necessary to introduce many separate standards of locale
names (each naming all locales), one for each language locale!  Of course,
the character set(s) used to form such names is yet another problem....

------------------------------------------------------------------------
Bob Goudreau				+1 919 248 6231
Data General Corporation
62 Alexander Drive			goudreau@dg-rtp.dg.com
Research Triangle Park, NC  27709	...!mcnc!rti!xyzzy!goudreau
USA

Bob.Stout@p6.f506.n106.z1.fidonet.org (Bob Stout) (05/04/90)

  Perhaps this is why ANSI uses the category as the first argument to
setlocale(). In my implementation, you could simulate a Quebec locale by
calling setlocale(LC_ALL, "USA") followed by setlocale(LC_TIME, "FRANCE").
Once set this way, retrieving the locale using localeconv() would fetch a
locale that looked like American English, but with the week days and months,
etc. in French. (Yeah, I know that Quebec is more complicated than that - I
merely used it as an example. I also support non-integer (second-specified)
time zones, and other oddball stuff folks requested from various parts of the
world.)
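
The two calls can be wrapped up as below. This is a minimal sketch in standard
C, not the poster's implementation: the names "USA" and "FRANCE" belong to his
library, and the standard itself only guarantees the names "C" and "". Note
also that in standard C localeconv() reports only numeric and monetary
conventions, while day and month names come out through strftime() under
LC_TIME, so that is what the sketch uses. The helper names are hypothetical.

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Set every category to `base`, then override LC_TIME alone.
   Returns 0 if both calls succeed, -1 otherwise (in which case the
   locale may have been only partly changed). */
int set_mixed_locale(const char *base, const char *time_locale)
{
    if (setlocale(LC_ALL, base) == NULL)
        return -1;
    if (setlocale(LC_TIME, time_locale) == NULL)
        return -1;
    return 0;
}

/* Format the weekday and month names of *t under the current LC_TIME.
   Returns the number of characters written, or 0 if buf is too small. */
size_t format_day_month(char *buf, size_t size, const struct tm *t)
{
    return strftime(buf, size, "%A %B", t);
}
```

On a system that provides such locales, set_mixed_locale("USA", "FRANCE")
would be the combination from the post. setlocale() returns NULL for a name it
does not recognise, which is why both results are checked.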

barr@frog.UUCP (Chris Barr) (05/05/90)

In article <11071@cbmvax.commodore.com>, valentin@cbmvax.commodore.com (Valentin Pepelea) writes:

> The ANSI C function setlocale() allows the programmer to set the locale to
> be used in localised functions. As examples we are given
> 
> /usr/lib/locale/german/LC_MESSAGES/		contains message catalogues
>                       /LC_COLLATE               collation (sorting) information
>                       /LC_TIME                  time & date information
>                       /LC_NUMERIC               number format information
>                       /LC_MONETARY              monetary symbol & format info
> 
> But this is rather confusing. While messages and collation information vary
> according to language, time format and monetary information are country-specific.
> So how are locale directories supposed to be organised?

Name directories for BOTH country and language.  
Files which are the same for different 'locales' might be linked, e.g. messages 
in switz_french & canada_french.
e.g.:
 /usr/lib/locale/switz_german/
 /usr/lib/locale/switz_french/
 /usr/lib/locale/canada_french/
 /usr/lib/locale/canada_english/

meissner@osf.org (Michael Meissner) (05/07/90)

In article <14535@frog.UUCP> barr@frog.UUCP (Chris Barr) writes:

| In article <11071@cbmvax.commodore.com>, valentin@cbmvax.commodore.com (Valentin Pepelea) writes:
| 
| > The ANSI C function setlocale() allows the programmer to set the locale to
| > be used in localised functions. As examples we are given
| > 
| > /usr/lib/locale/german/LC_MESSAGES/		contains message catalogues
| >                       /LC_COLLATE               collation (sorting) information
| >                       /LC_TIME                  time & date information
| >                       /LC_NUMERIC               number format information
| >                       /LC_MONETARY              monetary symbol & format info
| > 
| > But this is rather confusing. While messages and collation information vary
| > according to language, time format and monetary information are country-specific.
| > So how are locale directories supposed to be organised?
| 
| Name directories for BOTH country and language.  
| Files which are the same for different 'locales' might be linked, e.g. messages 
| in switz_french & canada_french.
| e.g.:
|  /usr/lib/locale/switz_german/
|  /usr/lib/locale/switz_french/
|  /usr/lib/locale/canada_french/
|  /usr/lib/locale/canada_english/

Nothing in the locale stuff mandates that a locale be a country,
place, or what have you (though that's how it mostly will be used).
For example, you could have a locale that is used for sorting things
in American Library Order (case insignificant, Mc and Mac at the
beginning of words are considered the same, insignificant words like
'the' not counting in collation), etc.
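
A rough C sketch of such an ordering follows. It is a simplification under
stated assumptions: only ASCII text, only a leading "the" is skipped, and the
function names and the 128-byte key limit are hypothetical. In a real
implementation these rules would belong to the locale's LC_COLLATE data and be
reached through strcoll() or strxfrm() rather than hand-written code.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define KEY_MAX 128

/* Case-insensitive test: does s begin with prefix (given in lower case)? */
static int starts_with(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != *prefix)
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Build a comparison key: drop a leading "the ", fold case, and
   rewrite "mc" at the start of a word as "mac". */
static void make_key(char *key, size_t size, const char *s)
{
    size_t n = 0;
    int word_start = 1;

    if (starts_with(s, "the "))
        s += 4;
    while (*s && n + 1 < size) {
        if (word_start && starts_with(s, "mc") && n + 3 < size) {
            memcpy(key + n, "mac", 3);
            n += 3;
            s += 2;
            word_start = 0;
        } else {
            key[n++] = (char)tolower((unsigned char)*s);
            word_start = (*s == ' ');
            s++;
        }
    }
    key[n] = '\0';
}

/* Compare two titles or names in the simplified library order.
   Returns <0, 0 or >0 like strcmp(). */
int library_compare(const char *a, const char *b)
{
    char ka[KEY_MAX], kb[KEY_MAX];

    make_key(ka, sizeof ka, a);
    make_key(kb, sizeof kb, b);
    return strcmp(ka, kb);
}
```

Under this sketch "McDonald" and "MacDonald" compare equal, and "The Hobbit"
sorts as if it were "hobbit".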

--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA

Catproof is an oxymoron, Childproof is nearly so

morten@modulex.dk (Morten Hastrup) (05/09/90)

barr@frog.UUCP (Chris Barr) writes:

>In article <11071@cbmvax.commodore.com>, valentin@cbmvax.commodore.com (Valentin Pepelea) writes:

>> The ANSI C function setlocale() allows the programmer to set the locale to
>> be used in localised functions. As examples we are given
>> 
>> /usr/lib/locale/german/LC_MESSAGES/		contains message catalogues
>>                       /LC_COLLATE               collation (sorting) information
>>                       /LC_TIME                  time & date information
>>                       /LC_NUMERIC               number format information
>>                       /LC_MONETARY              monetary symbol & format info
>> 
>> But this is rather confusing. While messages and collation information vary
>> according to language, time format and monetary information are country-specific.
>> So how are locale directories supposed to be organised?

>Name directories for BOTH country and language.  
>Files which are the same for different 'locales' might be linked, e.g. messages 
>in switz_french & canada_french.
>e.g.:
> /usr/lib/locale/switz_german/
> /usr/lib/locale/switz_french/
> /usr/lib/locale/canada_french/
> /usr/lib/locale/canada_english/

You might be right, but why organise it this way when there is no
recommendation in this area? I have tried to find one in the X/OPEN Portability
Guide. Besides, other companies/people (Digital, for example) organise the
locales this way:

	/usr/lib/intln/646/ENG_GB.646	/* English ISO646 */
	                  /GER_DE.646	/* German ISO646 */
	/usr/lib/intln/8859/ENG_GB.8859	/* English ISO8859-1 */
	                   /GER_DE.8859	/* German ISO8859-1 */

They also use the environment variable INTLINFO to specify this directory
structure (i.e. INTLINFO = /usr/lib/intln/%c/%L). The default path is
/usr/lib/intln.
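
A template of that kind could be expanded along the following lines. This is a
sketch in C and not Digital's code: judging only from the listing above, it
assumes that %c stands for the codeset directory and %L for the full locale
file name. The function name expand_template() is hypothetical, and any other
% sequence is copied through unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Expand "%c" to codeset and "%L" to locale_name while copying tmpl
   into buf. Returns 0 on success, -1 if buf is too small. */
int expand_template(char *buf, size_t size, const char *tmpl,
                    const char *locale_name, const char *codeset)
{
    size_t n = 0;

    if (size == 0)
        return -1;
    while (*tmpl) {
        const char *piece = NULL;
        size_t len = 1;

        if (tmpl[0] == '%' && tmpl[1] == 'c')
            piece = codeset;
        else if (tmpl[0] == '%' && tmpl[1] == 'L')
            piece = locale_name;

        if (piece != NULL) {
            /* Substitute the whole replacement string. */
            len = strlen(piece);
            if (n + len >= size)
                return -1;
            memcpy(buf + n, piece, len);
            tmpl += 2;
        } else {
            /* Ordinary character: copy it as is. */
            if (n + 1 >= size)
                return -1;
            buf[n] = *tmpl;
            tmpl += 1;
        }
        n += len;
    }
    buf[n] = '\0';
    return 0;
}
```

Expanding /usr/lib/intln/%c/%L with the locale name ENG_GB.8859 and the
codeset 8859 gives /usr/lib/intln/8859/ENG_GB.8859, which matches the third
entry of the listing.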

I could not find INTLINFO in X/OPEN, so I would like to hear from others about
similar variables (and of course YOUR opinion on this area overall).

How do you avoid conflicts between your own locales and locales that belong to
another application? (I know that ideally they should be the same, but you
never know.)

--
Morten Hastrup			<morten@modulex.dk>
A/S MODULEX			Phone:    +45 44 53 30 11
Lyskaer 15			Telefax:  +45 44 53 30 74
DK-2730 Herlev
Denmark