[comp.lang.c] locales

mark@cbosgd.ATT.COM (Mark Horton) (01/08/87)

I've been looking at the new ANSI C standard, and there's one
feature I just don't understand: the locale.  The feature seems
intended to help port programs to environments with different
character sets, such as Europe (with its umlauts, accents, etc.)
and Japan (with its 16-bit picture characters).

I don't, however, see how the feature, as defined, meets this goal.
It seems to be very nice if a program wants to explicitly say that
it conforms to the German rules (say), and that by default everything
conforms to the C (e.g. USA) rules.  But I don't see how it helps a
program conform to whatever country it happens to be running in,
without recompiling it.

If I'm misunderstanding the intent, can someone please set me straight?
Show me an example of how this is supposed to be used, and what it buys you.

While we're on the subject, I have another question.  In German, for
example, the lower case ess-zet letter has no single character upper
case equivalent, and is supposed to be mapped into "SS" in upper case.
(There are other languages with similar mappings.)  What is the toupper
function supposed to do when presented with an ess-zet?  Wouldn't a
string-to-string mapping function similar to strupr be more portable?

	Mark

gwyn@brl-smoke.ARPA (Doug Gwyn) (01/09/87)

The concept of locale in itself doesn't solve character set problems
such as Spanish ch, German ss, or Asian (16-bit) codes.  However, it
does provide the necessary hook to write applications that operate
either in an environment-independent manner (as system software
generally should) or in a locally-customized environment (as human
interface software normally should).
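
As an illustration of that hook: in the standard as it was eventually
published, a program calls setlocale(LC_ALL, "") to adopt whatever
locale the environment specifies at run time, with no recompilation;
without that call it stays in the "C" locale.  A minimal sketch, in
which decimal_point_for is a hypothetical helper name and not part of
the standard:

```c
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: switch the whole program to the named locale and
 * return that locale's decimal-point string, or NULL if the locale is
 * not available.  Passing "" asks for the environment's native locale;
 * passing "C" asks for the minimal default.
 */
const char *decimal_point_for(const char *locale_name)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return NULL;
    return localeconv()->decimal_point;
}
```

On a system configured for German, decimal_point_for("") would
typically yield ",", while decimal_point_for("C") yields ".".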

There is a string-to-string mapping strxfrm (formerly strcoll) for
the purpose of turning a "natural native" character string into
something amenable to collation.  This mechanism doesn't solve the
general multi-byte character problem; a presentation of the issues
involved there (and, I hope, a solution) is planned for the next
X3J11 meeting.

corwin@hope.UUCP (John Kempf) (01/11/87)

> While we're on the subject, I have another question.  In German, for
> example, the lower case ess-zet letter has no single character upper
> case equivalent, and is supposed to be mapped into "SS" in upper case.
> (There are other languages with similar mappings.)  What is the toupper
> function supposed to do when presented with an ess-zet?  Wouldn't a
> string-to-string mapping function similar to strupr be more portable?
> 

It has been a while since I last had a German class, but isn't the
ess-zet character equivalent to 'ss' (or was that 'sz')?  Wouldn't it
make more sense to leave the toupper 'function' as is, and create a
different function for local mapping?  Or possibly have toupper, when
faced with an ess-zet, return 'S'?

String-to-string mapping might be more portable, but it is often more
than is needed.  On this machine (VAX-11/750, 4.3BSD) toupper is
implemented as a macro.  If toupper were removed, a lot of code would
break, and a lot of excess overhead would be entailed (function vs.
macro).  A string conversion function might be useful in addition,
though.
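
A sketch of what such an additional string conversion function might
look like, leaving toupper untouched.  Everything here is an
assumption made for illustration: the name str_upper_de, the
snprintf-like return convention, and the ISO 8859-1 encoding in which
the ess-zet is the single byte 0xDF.  In the "C" locale toupper only
handles the ASCII letters, so other accented letters pass through
unchanged.

```c
#include <ctype.h>
#include <stddef.h>

#define LATIN1_SHARP_S 0xDF   /* assumption: input is ISO 8859-1 */

/* Store c at position *n if it fits, and count it either way. */
static void put(char *dst, size_t dstsize, size_t *n, char c)
{
    if (*n + 1 < dstsize)
        dst[*n] = c;
    (*n)++;
}

/*
 * Hypothetical string-to-string upper-casing: the ess-zet expands to
 * "SS", every other character goes through toupper.  Returns the
 * length the full result needs (excluding the NUL); output is
 * truncated, but still NUL-terminated, if dstsize is too small.
 */
size_t str_upper_de(char *dst, size_t dstsize, const char *src)
{
    size_t n = 0;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (c == LATIN1_SHARP_S) {
            put(dst, dstsize, &n, 'S');
            put(dst, dstsize, &n, 'S');
        } else {
            put(dst, dstsize, &n, (char)toupper(c));
        }
    }
    if (dstsize > 0)
        dst[n < dstsize ? n : dstsize - 1] = '\0';
    return n;
}
```

Because the output can be longer than the input, the caller has to
supply the buffer size; that is the main cost of a string-to-string
interface compared with a character-at-a-time toupper.
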
-- 
-cory

'My ancestors are sorry about yours'

UUCP:   ucbvax!ucdavis!ucrmath!hope!corwin
ARPA:   ucdavis!ucrmath!hope!corwin@lll-crg.ARPA

MRC%PANDA@sumex-aim.stanford.edu (Mark Crispin) (01/13/87)

     If by "Asian (16-bit) codes" you are referring to the Japanese
Industrial Standard (JIS) character set, this is a 14-bit character
set and not a 16-bit one.  Also, it only uses the code values which
have printable representations in ASCII.  That is, the lowest value
of either byte is 21h and the highest value is 7Eh.  The first JIS
character, a blank, is therefore 2121h.  The last JIS character is
717Eh and is a level 2 kanji.  ("Level 1" kanji are the commonly used
Chinese characters; "level 2" kanji are much rarer, and most native
Japanese know only a few of them.)  There are holes in the character
set as well.
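
The byte-range rule can be sketched in C as follows.  The function
names are hypothetical, and the check covers only the 21h to 7Eh
range for each byte; it does not know about the holes in the
character set.  The second function shows why 14 bits suffice: each
byte carries only 7 significant bits.

```c
/* Return 1 if both bytes of a two-byte code lie in 0x21..0x7E. */
int jis_in_range(unsigned int code)
{
    unsigned int hi = (code >> 8) & 0xFFu;
    unsigned int lo = code & 0xFFu;

    return code <= 0xFFFFu
        && hi >= 0x21u && hi <= 0x7Eu
        && lo >= 0x21u && lo <= 0x7Eu;
}

/* Pack the two 7-bit bytes into a single 14-bit value. */
unsigned int jis_pack14(unsigned int code)
{
    return (((code >> 8) & 0x7Fu) << 7) | (code & 0x7Fu);
}
```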

     How are shift-in and shift-out effected?  I'm aware of at least
5 ways this can be done!
-------