mark@cbosgd.ATT.COM (Mark Horton) (01/08/87)
I've been looking at the new ANSI C standard, and there's one feature I just don't understand: the locale. The feature seems intended to help port programs to environments with different character sets, such as Europe (with its umlauts, accents, etc.) and Japan (with its 16-bit picture characters). I don't, however, see how the feature, as defined, meets this goal. It seems to work nicely if a program wants to say explicitly that it conforms to the German rules (say), with everything conforming to the C (i.e., USA) rules by default. But I don't see how it helps a program conform to whatever country it happens to be running in, without recompiling it.

If I'm misunderstanding the intent, can someone please set me straight? Show me an example of how this is supposed to be used, and what it buys you.

While we're on the subject, I have another question. In German, for example, the lower case ess-tset letter has no single character upper case equivalent, and is supposed to be mapped into "SS" in upper case. (There are other languages with similar mappings.) What is the toupper function supposed to do when presented with an ess-tset? Wouldn't a string-to-string mapping function similar to strupr be more portable?

	Mark
gwyn@brl-smoke.ARPA (Doug Gwyn) (01/09/87)
The concept of locale in itself doesn't solve character set problems such as Spanish ch, German ss, or Asian (16-bit) codes. However, it does provide the necessary hook to write applications that operate either in an environment-independent manner (as system software generally should) or in a locally-customized environment (as human interface software normally should).

There is a string-to-string mapping strxfrm (formerly strcoll) for the purpose of turning a "natural native" character string into something amenable to collation. This mechanism doesn't solve the general multi-byte character problem; a presentation of the issues involved there (and, one hopes, a solution) is planned for the next X3J11 meeting.
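
As an illustration of the "hook" described above, here is a minimal sketch in C, assuming the interfaces as they appear in the published ANSI C standard (setlocale and strxfrm in <locale.h> and <string.h>); the draft under discussion may have differed in detail. The helper names use_native_locale and collate_compare are hypothetical. Passing "" to setlocale asks for the environment's native locale at run time, so the same binary can follow local rules without being recompiled; a program that never calls setlocale stays in the "C" locale.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: adopt whatever locale the environment specifies.
 * Returns the locale name on success, or NULL if the request cannot be
 * honored (the program then simply stays in the default "C" locale). */
const char *use_native_locale(void)
{
    return setlocale(LC_ALL, "");
}

/* Hypothetical helper: compare two strings by their strxfrm keys.
 * strxfrm turns each string into a form that plain strcmp can order
 * according to the current locale's collation rules. */
int collate_compare(const char *a, const char *b)
{
    /* With a size of 0, strxfrm only reports the length it needs. */
    size_t na = strxfrm(NULL, a, 0) + 1;
    size_t nb = strxfrm(NULL, b, 0) + 1;
    char *ka = malloc(na);
    char *kb = malloc(nb);
    int result;

    if (ka != NULL && kb != NULL) {
        strxfrm(ka, a, na);
        strxfrm(kb, b, nb);
        result = strcmp(ka, kb);
    } else {
        /* Out of memory: fall back to the direct locale-aware compare. */
        result = strcoll(a, b);
    }
    free(ka);
    free(kb);
    return result;
}
```

A human-interface program would call use_native_locale() once at startup and then sort with collate_compare; system software that must behave identically everywhere would skip the call. Transforming once and comparing many times is the usual reason to prefer strxfrm keys over repeated direct comparisons.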
corwin@hope.UUCP (John Kempf) (01/11/87)
> While we're on the subject, I have another question. In German, for
> example, the lower case ess-tset letter has no single character upper
> case equivalent, and is supposed to be mapped into "SS" in upper case.
> (There are other languages with similar mappings.) What is the toupper
> function supposed to do when presented with an ess-tset? Wouldn't a
> string-to-string mapping function similar to strupr be more portable?

It has been a while since I last had a German class, but isn't the ess-tset character equivalent to 'ss' (or was that 'sz')? Wouldn't it make more sense to leave the toupper 'function' as is, and create a different function for local mapping? Or possibly have toupper, when faced with an ess-tset, return 'S'?

String-to-string might be more portable, but it is often more than is needed. On this machine (VAX-11/750, 4.3BSD) toupper is implemented as a macro. If toupper were removed, a lot of code would break, and a lot of excess overhead would be entailed (function vs. macro). A string conversion function might be useful in addition, though.

--
-cory
'My ancestors are sorry about yours'
UUCP: ucbvax!ucdavis!ucrmath!hope!corwin
ARPA: ucdavis!ucrmath!hope!corwin@lll-crg.ARPA
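
The string-to-string idea discussed in this post can be sketched as follows. This is only an illustration, not a function from any standard or library: the name str_to_upper is hypothetical, and the code assumes the ess-tset is stored as the single byte 0xDF (as in ISO 8859-1), which will not hold for every character set. toupper is left untouched and still handles the one-to-one cases; only the one-to-two mapping is done at the string level.

```c
#include <ctype.h>
#include <stddef.h>

/* ASSUMPTION: the ess-tset is encoded as byte 0xDF (ISO 8859-1). */
#define ESZETT 0xDF

/* Store one output character if it fits, always counting it. */
static void put_char(char *dst, size_t dstsize, size_t *pos, char c)
{
    if (*pos + 1 < dstsize)       /* leave room for the terminator */
        dst[*pos] = c;
    (*pos)++;
}

/* Hypothetical string-to-string upper-casing function.
 * Writes at most dstsize bytes (including the terminating NUL) to dst
 * and returns the length the full result needs, so a caller can detect
 * truncation by checking whether the return value is >= dstsize. */
size_t str_to_upper(char *dst, size_t dstsize, const char *src)
{
    const unsigned char *p;
    size_t pos = 0;

    for (p = (const unsigned char *)src; *p != '\0'; p++) {
        if (*p == ESZETT) {
            /* One input character becomes two output characters. */
            put_char(dst, dstsize, &pos, 'S');
            put_char(dst, dstsize, &pos, 'S');
        } else {
            /* Ordinary one-to-one case mapping, via the existing toupper. */
            put_char(dst, dstsize, &pos, (char)toupper(*p));
        }
    }
    if (dstsize > 0)
        dst[pos < dstsize ? pos : dstsize - 1] = '\0';
    return pos;
}
```

Because the output can be longer than the input, the function cannot work in place the way a character-at-a-time toupper loop can; that is the price of handling mappings like this one, and one reason to offer it alongside toupper rather than instead of it.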
MRC%PANDA@sumex-aim.stanford.edu (Mark Crispin) (01/13/87)
If by "Asian (16-bit) codes" you are referring to the Japanese Industrial Standard (JIS) character set, this is a 14-bit character set and not a 16-bit one. Also, it only uses the code values which have printable representations in ASCII. That is, the lowest value of either byte is 21h and the highest value is 7Eh.

The first JIS character, a blank, is therefore 2121h. The last JIS character is 717Eh and is a level 2 kanji ("level 1" are the commonly-used Chinese characters (kanji), "level 2" are much more rare and most native Japanese only know a few). There are holes in the character set as well.

How are shift-in and shift-out effected? I'm aware of at least 5 ways this can be done!
-------
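
The byte-range rule described above (each byte between 21h and 7Eh, hence 14 significant bits) can be sketched in C. The function names is_jis_code and jis_to_14bit are hypothetical, and the check is only a necessary condition: because the set has holes, a code that passes may still be unassigned. Nothing here addresses how shift-in and shift-out are signalled.

```c
/* Byte limits taken from the post: both bytes lie in 21h..7Eh. */
#define JIS_BYTE_MIN 0x21
#define JIS_BYTE_MAX 0x7E

static int is_jis_byte(unsigned int b)
{
    return b >= JIS_BYTE_MIN && b <= JIS_BYTE_MAX;
}

/* Hypothetical range check: returns nonzero if both bytes of a two-byte
 * code fall in the printable-ASCII range. Does NOT detect the holes in
 * the character set. */
int is_jis_code(unsigned long code)
{
    if (code > 0xFFFFUL)
        return 0;
    return is_jis_byte((unsigned int)(code >> 8)) &&
           is_jis_byte((unsigned int)(code & 0xFF));
}

/* Hypothetical packing: since each byte needs only 7 bits, the pair
 * fits in 14 bits (high byte in bits 13..7, low byte in bits 6..0). */
unsigned int jis_to_14bit(unsigned long code)
{
    return (unsigned int)((((code >> 8) & 0x7F) << 7) | (code & 0x7F));
}
```

For instance, 2121h (the blank) passes the check, while 2120h and 7F21h do not, since one of their bytes falls outside 21h..7Eh.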