mark@cbosgd.ATT.COM (Mark Horton) (01/08/87)
I've been looking at the new ANSI C standard, and there's one feature I just don't understand: the locale. The feature seems intended to help port programs to environments with different character sets, such as Europe (with its umlauts, accents, etc.) and Japan (with its 16-bit picture characters). I don't, however, see how the feature, as defined, meets this goal. It seems very nice if a program wants to say explicitly that it conforms to the German rules (say), with everything conforming to the C (e.g. USA) rules by default. But I don't see how it helps a program conform to whatever country it happens to be running in, without recompiling it.

If I'm misunderstanding the intent, can someone please set me straight? Show me an example of how this is supposed to be used, and what it buys you.

While we're on the subject, I have another question. In German, for example, the lower case ess-zet letter has no single-character upper case equivalent, and is supposed to be mapped into "SS" in upper case. (There are other languages with similar mappings.) What is the toupper function supposed to do when presented with an ess-zet? Wouldn't a string-to-string mapping function similar to strupr be more portable?

Mark
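To make the last question concrete: toupper maps one character to one character, so it cannot produce "SS"; the standard has it return its argument unchanged when no single upper case equivalent exists. A string-to-string mapping along the lines suggested above might look like the sketch below. The name str_upper_latin1 is hypothetical and not part of any library, and the code assumes ISO 8859-1 input, where ess-zet is the byte 0xDF. Other accented letters are simply passed to toupper, whose result depends on the current locale.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of any standard library.
 * Writes an upper case copy of src into dst, whose capacity is dstsize
 * bytes including the terminating NUL.
 * ASSUMPTION: src is ISO 8859-1, so ess-zet is the single byte 0xDF.
 * Returns the length the full result needs, excluding the NUL, so the
 * caller can detect truncation (return value >= dstsize). */
size_t str_upper_latin1(char *dst, size_t dstsize, const char *src)
{
    size_t out = 0;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (c == 0xDF) {
            /* One input character expands to two output characters. */
            if (out + 1 < dstsize)
                dst[out] = 'S';
            out++;
            if (out + 1 < dstsize)
                dst[out] = 'S';
            out++;
        } else {
            /* Everything else maps one-to-one through toupper. */
            if (out + 1 < dstsize)
                dst[out] = (char)toupper(c);
            out++;
        }
    }

    /* Always terminate when there is any room at all. */
    if (dstsize > 0)
        dst[out < dstsize ? out : dstsize - 1] = '\0';

    return out;
}
```

Because the output can be longer than the input, the caller has to supply the buffer size, which is the main way such a function differs from an in-place strupr.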
gwyn@brl-smoke.ARPA (Doug Gwyn) (01/09/87)
The concept of locale in itself doesn't solve character set problems such as Spanish ch, German ss, or Asian (16-bit) codes. However, it does provide the necessary hook for writing applications that operate either in an environment-independent manner (as system software generally should) or in a locally customized environment (as human interface software normally should).

There is a string-to-string mapping, strxfrm (formerly strcoll), for the purpose of turning a "natural native" character string into something amenable to collation. This mechanism doesn't solve the general multi-byte character problem; a presentation of the issues involved there (and, one hopes, a solution) is planned for the next X3J11 meeting.
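As a rough illustration of the "hook" described above, here is a minimal sketch based on the interface as it appears in the published C standard, which may differ in detail from the draft under discussion. There, calling setlocale with an empty string selects the implementation-defined native environment at run time, which is how a program can follow its surroundings without being recompiled. The helper names use_native_locale and compare_native are hypothetical.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical start-up helper: adopt whatever locale the environment
 * specifies. Returns 1 on success, 0 if the request could not be honored
 * (in which case the program stays in its previous locale). */
int use_native_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* Hypothetical helper: compare a and b by the current locale's collation
 * rules. Each string is transformed with strxfrm into a key that plain
 * strcmp can order. Returns <0, 0 or >0 like strcmp; falls back to
 * strcmp on the untransformed strings if memory runs out. */
int compare_native(const char *a, const char *b)
{
    /* With a size of 0, strxfrm only reports the key length needed. */
    size_t na = strxfrm(NULL, a, 0) + 1;
    size_t nb = strxfrm(NULL, b, 0) + 1;
    char *ka = malloc(na);
    char *kb = malloc(nb);
    int result;

    if (ka == NULL || kb == NULL) {
        result = strcmp(a, b);
    } else {
        strxfrm(ka, a, na);
        strxfrm(kb, b, nb);
        result = strcmp(ka, kb);
    }

    free(ka);
    free(kb);
    return result;
}
```

A human interface program would call use_native_locale() once at start-up and then sort with compare_native; system software that must behave identically everywhere would skip the call and remain in the default "C" locale, where the comparison reduces to strcmp.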