ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (08/22/90)
In article <1881@jura.tcom.stc.co.uk>, rmj@tcom.stc.co.uk (Rhodri James) writes:
> In article <3585@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
> }For why? Internationalisation, _that's_ for why.

> I cringe when I see this (unwords like "internationalisation", I mean).

One uses language for the purpose of communication. In order to effect
that purpose, one uses words that other people know and use, not the
words one happens to like. Like it or not, "internationalise" and its
derivatives are *words* in 1990s computing jargon. Perhaps Rhodri James
has a better term that is a miracle of euphony and clarity; well, for
heaven's sake tell us what it is *now* and let's get pushing it, for
"internationalisation" bolted from its stable long ago. (By the way,
there is no such word as "unword". If there were such a term, it would
be "nonword". "dictcheck -pedantic")

> Also I fail to see your point. Surely such #ifdef switching
> as above is more efficient, simpler to maintain and more legible than
> the scrabbling about with resource files you prefer?

So now Cn James reads minds and knows what I prefer. Wonderful just.
No, it is *not* simpler to maintain. The point of the resource file
approach (not my invention by any means; no-hopers like IBM, DEC, HP,
X/Open, AT&T, Apple, ... have been using it for a while and I just
copied the idea and simplified it a bit for this newsgroup) is that you
have all the text in one place; you don't have to go "scrabbling about"
in the source files to find all the strings. You can give the resource
file to a human translator who knows nothing about the programming
language you are using. A minor addition to such a tool (have it
generate

	INTEGER MSGNO
	PARAMETER (MSGNO=......

instead of #defines) will let you use the *same* message file with a
Fortran program. Speaking as a no-hoper, I must admit that using a
technique that adapts to *all* the programming languages I use, not
just C, sounds like a saving.
But what do I know?

As for efficiency, the point is that we are talking about a scheme for
generating messages for display to humans. The cost of fishing the text
out of a file is (or was, every time I measured it) considerably less
than the cost of displaying it on the terminal.

The real schemes (such as the X/Open one) identify messages by numbers,
not by address in the text file. That has the disadvantage that finding
the right text is a wee bit more complex (but not very; one need merely
attach a directory at the end of the file), but it has the great
advantage that the program does not need to be recompiled. This means
that one customer can be running the program with messages coming from
the "English-speaking idiot" message file and another with messages
coming from the "Spanish-speaking wizard" message file, and both can be
sharing the same copy of the program without any recompilation at all.

That's the way it *is* in UNIX System V Release 4. We might as well get
used to thinking about messages in that way now.

> Demonstrate to me a negative impact on internationalisation (ugh) and I
> might believe you. Any negative impact will do, I'm not too choosy.

The schemes actually used by IBM (MVS, CMS, AIX), HP (HP-UX), DEC (VMS,
Ultrix), AT&T (SVR4) and others essentially add another couple of
layers of indirection above what I presented. Those systems all allow
you to switch languages at run time, without any recompilation. Those
systems all allow you to translate message files without having any
other access to the sources. They all allow many programs, and many
programming languages, to share the same message files. They all allow
a customer to substitute his own translation of a message file (perhaps
amplifying some messages, or getting the grammar right, or ...) without
access to the sources. There's four negative impacts of the #ifdef
approach, just for starters.
-- 
The taxonomy of Pleistocene equids is in a state of confusion.
cbp@icc.com (Chris Preston) (08/24/90)
In article <3603@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>In article <1881@jura.tcom.stc.co.uk>, rmj@tcom.stc.co.uk (Rhodri James) writes:
>> In article <3585@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>> }For why? Internationalisation, _that's_ for why.
>
>> I cringe when I see this (unwords like "internationalisation", I mean).
>
>One uses language for the purpose of communication.

etc., deleted.

>> Also I fail to see your point. Surely such #ifdef switching
>> as above is more efficient, simpler to maintain and more legible than
>> the scrabbling about with resource files you prefer?
>
>So now Cn James reads minds and knows what I prefer. Wonderful just.
>No, it is *not* simpler to maintain. The point of the resource file
>approach (not my invention by any means; no-hopers like IBM, DEC, HP,
>X/Open, AT&T, Apple, ... have been using it for a while and I just
>copied the idea and simplified it a bit for this newsgroup) is that
>you have all the text in one place; you don't have to go "scrabbling
>about" in the source files to find all the strings. You can give the
>resource file to a human translator who knows nothing about the
>programming language you are using. A minor addition to such a tool
>(have it generate
>	INTEGER MSGNO
>	PARAMETER (MSGNO=......
>instead of #defines) will let you use the *same* message file with a
>Fortran program. Speaking as a no-hoper, I must admit that using a
>technique that adapts to *all* the programming languages I use, not
>just C, sounds like a saving. But what do I know?

Indeed, an interesting proposition. There are two immediate ways (I am
sure the creative will have more still) to make labels work with
internationalization: both allow extraction tools to work, are simple
to implement, and prevent the repetitious use of literals and
constants.
Here goes:

If you reaaaaaly want the text in the source section (incidentally,
xscc on System V [your original example] does invoke the C
preprocessor, so text substitution is absolutely not broken under MNLS,
and any extractor that does not invoke the preprocessor should be
considered broken) -

#define DOS_DCOMM_MSG  1
#define UNIX_DCOMM_MSG 2
#define DEF_DCOMM_MSG  3

#if DOS
#define DCOM_ERR_MSG DOS_DCOMM_MSG
#elif UNIX
#define DCOM_ERR_MSG UNIX_DCOMM_MSG
#else
#define DCOM_ERR_MSG DEF_DCOMM_MSG
#endif

#define DCOM_ERR getmsg(DCOM_ERR_MSG)

/* tools.c */
char *
getmsg(ErrMsg)
int ErrMsg;
{
	switch (ErrMsg) {
	case DOS_DCOMM_MSG:
		return "Run dcom.exe";
	case UNIX_DCOMM_MSG:
		return "Datacomm not initialized, contact S/A";
	case DEF_DCOMM_MSG:
		return "Datacomm not running";
	default:
		return "Run for cover, they're commin' to get us";
	}
}

/* somefile.c */
int
CheckDatacomm()
{
	int RetVal;

	if ((RetVal = DataCommRunning()) != 0)
		(void) fprintf(stderr, "%s\n", DCOM_ERR);
	return RetVal;
}

/* Makefile */
LANG = de fr sw gr

neatunix: main.o somefile.o tools.o
	xscc -O main.o somefile.o tools.o -o neatunix
	@for i in $(LANG); do gencat $@.X $i.cat; done

neatdos: main.o somefile.o tools.o
	xscc -O main.o somefile.o tools.o -o neatdos
	@dosomethingelsealtogether

Another method would be to do something like the following (assuming
that you are invoking the C preprocessor):

#define DCOM_ERR 0
#define DRVR_ERR 1	/* etc. etc. */

char *ErrMsg[] = {
#if DOS
	"Run dcom.com",
	"Run driver.com",
#elif UNIX
	"Datacomm not initialized, contact S/A",
	"Driver error, contact S/A",
#else
	"Datacomm not running",
	"Driver not responding",
#endif
};

#define MSG_ERR_DCOM ErrMsg[DCOM_ERR]
#define MSG_ERR_DRVR ErrMsg[DRVR_ERR]

int foo()
{
	int Dcm, Dvr;
	...
	if (!Dcom())
		printf("%s", MSG_ERR_DCOM);
	if (SomeDriverCheck() == FAILURE)
		printf("%s", MSG_ERR_DRVR);
	...
	return somevalue_etc;
}

So, we have accomplished coding for purposes of internationalization
either way: we have separated string literals to a central place, and
we have made the code more maintainable, since changes in messages for
the environment can occur at one major juncture, and life is a cabaret.
(BTW, all the above was just retyped at max speed, so errors are surely
there and to be expected; the point remains.)

>As for efficiency, the point is that we are talking about a scheme for
>generating messages for display to humans. The cost of fishing the text
>out of a file is (or was every time I measured it) considerably less than
>the cost of displaying it on the terminal.

Considering that a program that pays no concern for
"internationalization" does not have to source anything external to its
data segment at any time other than normal operations, to say that the
additional overhead is equal to or less than existing overhead is a
non-sequitor. If you don't do it the cost ain't there.

>The real schemes (such as the X/Open one) identify messages by numbers,
>not by address in the text file. That has the disadvantage that finding
>the right text is a wee bit more complex (but not very; one need merely
>attach a directory at the end of the file), but it has the great
>advantage that the program does not need to be recompiled. This means
>that one customer can be running the program with messages coming from
>the "English-speaking idiot" message file and another with messages
>coming from the "Spanish-speaking wizard" message file, and both can be
>sharing the same copy of the program without any recompilation at all.

like MNLS, perhaps?

>That's the way it *is* in UNIX System V Release 4. We might as well get
>used to thinking about messages in that way now.

and it is not such a horrible thing.
Just think, we can pop streams modules for the simple stuff, and run
extractors and programs to modify the source for multibyte character
sets, and use different curses libraries for right-to-left output.
What a treasure.

It has been pointed out here by several that are in the know on these
things that arguing about string literals is moot in comparison to
other inherent difficulties presented by internationalization, and that
the necessary crusade on "C programming practices" is long a commin'.
For instance, I am told that the following is a problem in Kanji:

	char p[10];	/* xscc provides for allowing twenty bytes
			   as needed in Kanji */
	*(p+1) = 'x';	/* this is the next byte, and an error */
	p[n+1] = 'x';	/* this is the next _character_, and ok */

Given trivial differences like this, I am sure that there are many
things "broken" for internationalization, and we should all prepare to
cringe; however, substitution for string literals and constants is not
one of them.

>> Demonstrate to me a negative impact on internationalisation (ugh) and I
>> might believe you. Any negative impact will do, I'm not too choosy.
>
>The schemes actually used by IBM (MVS, CMS, AIX), HP (HP-UX), DEC (VMS,
>Ultrix), AT&T (SVR4) and others essentially add another couple of layers
>of indirection above what I presented. Those systems all allow you to
>switch languages at run time, without any recompilation. Those systems
>all allow you to translate message files without having any other access
>to the sources. They all allow many programs, and many programming
>languages, to share the same message files. They all allow a customer
>to substitute his own translation of a message file (perhaps amplifying
>some messages, or getting the grammar right, or ...) without access to
>the sources.

And still can. xscc in UNIX System V (your example) does all of this
for you. You need not make the resource catalogues. It is done for
you.
>There's four negative impacts of the #ifdef approach, just for starters.

Given the above examples, do you still feel this to be the case? I do
not think so. I also believe that this shows that it is an unsafe
practice to say that something cannot be done within the framework of C
and the C preprocessor.

 cbp
--------
Of course these are opinions.
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (08/26/90)
In article <1990Aug24.064203.20942@icc.com>, cbp@icc.com (Chris Preston) writes:
> If you reaaaaaly want the text in the source section (incidentally, xscc on
> System V [your original example] does invoke the C preprocessor

No, xscc was *not* my example, nor anyone else's in this thread before
this. I mentioned System V Release 4, to be sure, but I did not mention
xscc. How on earth is using xscc supposed to help me use the same
message file for C, Pascal, Fortran, and Lisp?

> so text substitution is absolutely not broken under MNLS

Whoever said it was?

> Another method would be to do something like the following (assuming that
> you are invoking the C preprocessor):
> #define DCOM_ERR 0
> #define DRVR_ERR 1	/* etc. etc. */
> char *ErrMsg[] = {
> #if DOS
>	"Run dcom.com",
>	"Run driver.com",
> #elif UNIX
>	"Datacomm not initialized, contact S/A",
>	"Driver error, contact S/A",
> #else
>	"Datacomm not running",
>	"Driver not responding",
> #endif
> };

Again, this technique means that you need the sources, and that to
change the messages you need access to the sources and to recompile.
That was an objection validly raised against the stripped-down message
file technique I posted, and it applies with greater force to this.

> So, we have accomplished coding for purposes of internationalization,
> either way, we have separated string literals to a central place,
> and we have made the code more maintainable, since changes in messages for
> the environment can occur at one major juncture, and life is a cabaret.

The point of a message file is that
 -- the "central place" is OUTSIDE THE PROGRAM
 -- a message file can be got at by someone with no (other) access to
    sources (this is a *big* deal for developers!)
 -- *one* version of the object file can be shared by people using
    *different* message files.

> >As for efficiency, the point is that we are talking about a scheme for
> >generating messages for display to humans.
> >The cost of fishing the text
> >out of a file is (or was every time I measured it) considerably less than
> >the cost of displaying it on the terminal.
>
> Considering the program that pays no concern for "internationalization"
> does not have to source anything external to it's data segment at any
> time other than normal operations, to say that the additional overhead is
> equal to or less than existing overhead is a non-sequitor. If you
> don't do it the cost ain't there.

That's non-sequitUr, and this "rebuttal" is badly flawed. What I
claimed was

	(cost of fetching message) << (cost of displaying message)

Someone with measurements to disprove this can refute me (for a
particular hardware/software combination) by displaying his figures.

Of course, what is *really* interesting about this "rebuttal" is that
in a virtual memory environment it simply isn't true. We're talking
about messages here, things which are displayed at relatively
infrequent (we hope!) intervals. Text, in short, which is paged OUT. In
a system which supports memory-mapped files (VMS, Aegis, SunOS 4.x,
AIX, ...) one could open the message file as a memory-mapped file, and
then the process of fetching a message from the message file would cost
no more than the process of fetching a message from a pre-initialised
character array, because the two would be exactly the same process.

> It has been pointed out here by several that are in the know on these
> things, that arguing about string literals is moot in comparison to other
> inherent difficulties presented by internationalization, and that the
> necessary crusade to "C programming practices" is long a commin'.

That is why, for example, ANSI C has

	wchar_t
	wcstombs()
	mbstowcs()
	mblen()

and so on, and why it is set up to allow multi-byte characters in
constants.

> >There's four negative impacts of the #ifdef approach, just for starters.

> Given the above examples, do you still feel this to be the case?

Of course.
Those four negative impacts still stand.

> I do not think so. I also believe that this shows that it is an unsafe
> practice to say that something cannot be done within the framework of C
> and the C preprocessor.

Again, who said _that_? Not me! That there are *better* ways to do some
things than using the C preprocessor, who can challenge that? The only
question is, _which_ tasks? Given that I said I would like to share
message files between several programming languages, using a facility
peculiar to one of them (there is no guarantee that /usr/lib/cpp will
be available, nor anything like it) would be rather silly, wouldn't it?

A serious problem concerned with "the need to make the texts we write
for the tools that count work with more than one tongue of men"
(otherwise known as "internationalisation" if you have no fear of words
that have more than one sound in them) is that C formats don't quite
work. One common problem is that different languages put phrases in
different orders. The X/Open answer to that is to have an extra piece
of information in %format controls, saying which argument to use. I
presume that the ANSI C committee considered that, and didn't include
it because it basically needs pointers and integers to be the same
size.

The following suggestion is not altogether serious. But bearing in mind
things like wanting to put phrases in different orders, and all sorts
of things one might like to let customers configure for themselves
(without having to give them *all* the sources), it might not be as
crazy as it sounds. How about using TCL (Tool Command Language) for
"messages"? TCL is a free "extension language" which somewhat resembles
the Unix shells, and is set up to be a *small* library that can be
linked into C code. When one wants to report an event, one could format
the arguments of that event into strings, fetch a TCL command from a
file, and execute that TCL command.
It was intended to customise input to things like the editor "mx", but
there's no reason it couldn't be used to customise *output*. As I say,
not altogether serious.
-- 
The taxonomy of Pleistocene equids is in a state of confusion.
cbp@icc.com (Chris Preston) (08/29/90)
In article <3617@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>In article <1990Aug24.064203.20942@icc.com>, cbp@icc.com (Chris Preston) writes:
>> If you reaaaaaly want the text in the source section (incidentally, xscc on
>> System V [your original example] does invoke the C preprocessor
>
>No, xscc was *not* my example nor anyone else's in this thread before this.

No one else in the thread before this talked about X/Open and ANSI's
handling of multi-language portability on System V R4. The standard
offering on System V Release 4 for doing "internationalization" _is_
MNLS. I have a fax of the price schedule from UNIX research
laboratories dated June 22 in front of me. MNLS has xscc to produce
message files (or catalogues) from the application as part of the
compilation process. gencat is then used to produce the message file in
the applicable language. Your example was Sys V R4. These are simply
the tools that anyone doing internationalization work is likely to use.
In particular, contractors working on products for vendors that
repackage System V R4 on their own boxes will stay with these tools for
the export systems, as will third-party vendors whose applications are
offered with the base system.

Rolling your own extractor is less applicable here when looking at
System V R4, since it has so many standards rolled into one that it is
best to stick with the tools offered. Otherwise, one is likely to find
that personalized extractors produce resource catalogues that do not
match the format produced for MNLS or conform with X/Open.

>I mentioned System V Release 4, to be sure, but I did not mention xscc.

IMHO this is like mentioning Unix software development and not
mentioning the C compiler and development system. There are certainly
basic compilers, ratfor and f77, but these would not be the assumed
development tools normally.
>How on earth is using xscc supposed to help me use the same message
>file for C, Pascal, Fortran, and Lisp?

Because it produces an external message file that can be translated
into multiple languages using gencat, can be modified at the customer
site, and is going to be about as portable to Pascal as is the "awk
extracted" version that was previously offered.

>> so text substitution is absolutely not broken under MNLS
>
>Whoever said it was?
>
>> Another method would be to do something like the following (assuming that
>> you are invoking the C preprocessor):

This example deleted.

>Again, this technique means that you need the sources, and that to
>change the messages you need access to the sources and to recompile.
>That was an objection validly raised against the stripped-down message
>file technique I posted, and it applies with greater force to this.

There were two examples, one of which was to use labels and to have the
string literals in a single place in functions like geterrmsg(),
getusermsg() and so forth. This was somehow deleted, but will work
without difficulty when extracting the messages from the start, and
under the proposed technique (awk extractor) would produce:

#define USER_MSG0 0
#define USER_MSG1 1

#if DOS
#define CONTINU_MSG 0
#elif UNIX
#define CONTINU_MSG 1
#endif
	etc.

#define CONTINUE getusermsg(CONTINU_MSG)

char *
getusermsg(UserMsgNo)
int UserMsgNo;
{
	switch (UserMsgNo) {
	case USER_MSG0:
		return ExternMsgGet(someval);
	...

where before the awk extractor the return value was

	return "type any key to continue";

and the application says

	printf("%s\n", CONTINUE);

This is, of course, an example and is subject to one's own methods and
style.

> My comments deleted.
>
>The point of a message file is that
> -- the "central place" is OUTSIDE THE PROGRAM
> -- a message file can be got at by someone with no (other) access to sources
>    (this is a *big* deal for developers!)
> -- *one* version of the object file can be shared by people using
>    *different* message files.
The point is that an intelligent use of labels can allow
 -- the "central place" is OUTSIDE THE PROGRAM
 -- a message file can be got at by someone with no (other) access to
    sources (which is certainly a big deal with our products)
 -- *one* version of the object file can be shared by people using
    *different* message files
 -- code can hide machine dependencies for string literals and
    constants in label form
and "internationalization" is _not_ broken.

>> >As for efficiency, the point is that we are talking about a scheme for
>> >generating messages for display to humans. The cost of fishing the text
>> >out of a file is (or was every time I measured it) considerably less than
>> >the cost of displaying it on the terminal.
>
> My comments deleted.

This is not a debate. As to using external message files: if the
application is in a native language, it would not hurt to compile a
straight version without messages being extracted, and use a standard
tool to do the extraction for a multi-language version, for example. In
such a case, the multi-language version will either spend no additional
time, or some additional time, "fishing" the text out of a file. Yes,
if some form of memory mapping is used, so that the messages are mapped
into the heap, then great: there is no difference in speed for the
multi-language version. That is the best-case scenario. The worst case
is that it will take additional time and slow the application down. The
multi-language version will not be any faster than the native version,
and at best no slower. That the additional delay is less than some
other delay, like display time, is not significant. The display of
messages, in English or Kanji, will occur at some point in most
applications, period.
Whether there is additional overhead or no overhead from fetching the
message to display is not a valid comparison to the guaranteed display
time of the message.

>What I claimed was
>	(cost of fetching message) << (cost of displaying message)

The actual point appears to be that the additional delay will only be
some or none, irrespective of what the choice for comparison is. This
is not an argument for or against having message files, but rather an
additional performance consideration.

>Someone with measurements to disprove this can refute me (for a particular

Various explanation about virtual mapping deleted.

My comments about additional concerns in programming practices for
internationalization deleted.

>That is why, for example, ANSI C has
>	wchar_t
>	wcstombs()
>	mbstowcs()
>	mblen()
>and so on, and why it is set up to allow multi-byte characters in
>constants.

And why much code must be rewritten using the ANSI standard. By the
same token, a great deal of development is done with non-ANSI-compliant
compilers, because that is what is native to the system and that is
what the prime contractor requires be used in the application
development (witness Open Desktop). It is, therefore, not just a matter
of "well, let's just use gcc or buy some ANSI-compliant compiler with
the appropriate libraries." It is like talking about using
POSIX-compliant system calls only on an earlier release of System V
that is not POSIX compliant.

>> >There's four negative impacts of the #ifdef approach, just for starters.
>
>> Given the above examples, do you still feel this to be the case?
>
>Of course. Those four negative impacts still stand.

The example that you deleted would negate all four of these negative
impacts.

> My comments deleted
>
>Again, who said _that_? Not me! That there are *better* ways to do some
>things than using the C preprocessor, who can challenge that? The only
>question is, _which_ tasks?
>Given that I said I would like to share
>message files between several programming languages, using a facility
>peculiar to one of them (there is no guarantee that /usr/lib/cpp will be

To anticipate a version of C that does not perform a preprocessing
stage is an interesting prospect. To anticipate using other languages
as a programming consideration when coding in C is probably beyond the
bounds of this newsgroup, and not likely to be the concern of those
whose applications are done completely in C. It is oftentimes chosen
(like here) _because_ of its intermachine portability. Your examples
and the drift of discussion indicate that portability to Pascal,
Fortran and Lisp is worthwhile. Perhaps this is true in some cases, but
it is just as applicable to rely on labeling for literals and constants
in order to port the same C code, which is what we do here.

>available nor anything like it) would be rather silly, wouldn't it?

For something other than C, yes. Given the newsgroup, no.

> Various deleted about ANSI and X/Open.
>
> Comment about TCL deleted.

In summary, using labels for string literals is a good thing, and can
be done without "breaking internationalization" as was previously
suggested.

 cbp
------
Recent conversation between Kurt Waldheim and Saddam Hussein:
"Saddam, I *knew* Hitler, and believe me, you're no Adolf Hitler."
cbp@icc.com (Chris Preston) (08/29/90)
In article <1990Aug29.043513.19715@icc.com> cbp@icc.com (Chris Preston) writes:

That's me. I wrote:
> from UNIX research laboratories dated June 22 in front of me.

That's actually UNIX System Laboratories, Inc.
(Small print: A Subsidiary of AT&T.)

 cbp
rmj@tcom.stc.co.uk (Rhodri James) (08/30/90)
In article <3603@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au
(Richard A. O'Keefe) writes about me writing about him writing:
>> }For why? Internationalisation, _that's_ for why.
>
>> I cringe when I see this (unwords like "internationalisation", I mean).
>
>One uses language for the purpose of communication.

My point exactly. "Internationalisation" communicated absolutely
nothing to me for several minutes, even given the example. Me, I'd
prefer to call it "language switching" or something a tad more obvious
like that, but the potential confusion *that* could cause is enormous.
So I guess I'll have to lump it.

>In order to effect that purpose, one uses words that other people know
>and use, not the words one happens to like.

True. See above. Although a counterexample has just sprung to mind -
"program".

>Like it or not,
>"internationalise" and its derivatives are *words* in 1990s computing
>jargon.

I had never seen or heard the word prior to this thread. Whether this
means I am not up on the jargon, or the jargon isn't nearly as
international as it would like to think, I don't know.

>(By the way, there is no such word as "unword". If
>there were such a term, it would be "nonword". "dictcheck -pedantic")

Oh good, my attempt to get into this style of linguistic evolution
worked. :-)

>> Also I fail to see your point. Surely such #ifdef switching
>> as above is more efficient, simpler to maintain and more legible than
>> the scrabbling about with resource files you prefer?
>
>So now Cn James reads minds and knows what I prefer. Wonderful just.

Cn? Oh, Citizen. Sorry, Pr O'Keefe. (Both the above lines are ad
hominem and ought to be ignored, but are much more fun this way.)

>[Sundry bits of info and arguments that are actually useful]

OK. You've convinced me. For programs requiring multi-linguistic output
and input of medium or greater complexity (or any requiring run-time
switching), the resource file approach wins.
Personally, it'll still take me a long time to give up #ifdeffing, as I
know I can maintain that, and I have an aversion to complicating
preprocessing (it just doesn't feel right), but that's just me.

Mind you, arguing that "this is the way System V does it, so get used
to it" nearly lost you my sympathy. How Unix of any sort has become the
dominant operating system is beyond me, it's not as if it's actually
very good or anything :-\
-- 
* Windsinger * "Nothing is forgotten..."          * rmj@islay.tcom.stc.co.uk * Mike Whitaker
* or (occasionally) * "...except sometimes the words" * rmj10@phx.cam.ac.uk * Phil Allcock
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (08/30/90)
In article <1911@islay.tcom.stc.co.uk>, rmj@tcom.stc.co.uk (Rhodri James) writes:
> Mind you, arguing that "this is the way System V does it, so get used to
> it" nearly lost you my sympathy.

It wasn't *supposed* to *keep* your sympathy. There are a lot of things
in System V Release 4 I don't particularly care for (having gone to the
trouble of learning how X/Open handles internationalisation, I didn't
really appreciate discovering that the new Official way of doing it was
different, and while the TLI routines may perhaps be a considerable
improvement on sockets, I have yet to find anything which explains them
clearly enough for me to use them). The point I was making is that
*customers* are going to expect SVR4 programs to behave in a particular
way. SVR4 has a convention for generating multi-line error messages
(SVR4 is an adventure; if you win you find you're playing VMS), and it
has lots of features for "locale" support (if you want the "C" word
rather than the "UNIX" word, which has been current for, oh, at least 3
years). In a couple of years' time, customers are going to expect
programs to follow the UNIX Way, just as Macintosh customers expect Mac
programs to follow the Mac Way. So we had better get used to it if we
want to produce programs that the next decade's UNIX customers will
continue to be willing to buy.

By the way, "language switching" would NOT be an appropriate
replacement for the word "internationalisation", because the latter
covers rather more. Wales and the USA both use English (Wales also uses
Welsh and the US also uses Spanish). But they don't represent dates the
same way, and they don't use the same symbols for currency.
Internationalisation refers to collating order, date and time
representation, currency representation, and a couple of other things I
forget, as well as the language that messages are displayed in.
A program portable between different locales will _not_, for example,
assume that everyone has a three-part name, a common US-ism.

> How Unix of any sort has become the
> dominant operating system is beyond me, it's not as if it's actually
> very good or anything :-\

"Democracy is the worst possible political system, except for all the
others." Unix hasn't succeeded by being particularly good, but by not
being excruciatingly bad (unlike xx-xxx, xx/xxx, xxx-xx, xxx, or xxx --
names changed to protect _me_).
-- 
You can lie with statistics ... but not to a statistician.
domo@tsa.co.uk (Dominic Dunlop) (08/30/90)
Ho hum. Been vaguely following this thread, basking in its heat and peering into its dim light. But I'm not about to fight those individual smoky flames. I'll just make these points:

1. While ``internationalization'' (abbreviated by the cognoscenti to the cute, opaque and finger-wear-saving ``i18n'' because it has 20 letters) is a nasty construction, it is an accepted term.

2. As well as being a nasty neologism, internationalization is a misnomer: you can use the facilities it confers even if your market is confined entirely to -- say -- Lake Wobegon. It allows you easily to have different sets of prompts and messages for Catholic and Lutheran users. Or kids and their parents. It lets you display and search for Swedish characters if you need to in order to satisfy the needs of some of your users. It deals with timezone issues when you want to catch that plane to Florida or meet that flight from south-east Asia... (Barely resisted temptation to cross-post to alt.wobegon...)

3. The international information technology standards community is rather belatedly taking an interest in internationalization. For example, ISO's working group on POSIX has established a ``Rapporteur Group on Internationalization''. (And note that ISO spells the word with a Z: the organization follows the guidelines of the Oxford English Dictionary, widely ignored in Britain, which favour ``ize'' over ``ise'' in all but a few cases.)

4. I say ``belatedly'' because it has taken the standardizers a long time to realise that their products should be of equal utility to all users, independent of the language that they use to communicate, or the means that they use to represent it. Efforts to sort out just the lowest level of this problem -- that of character sets -- have been continuing for years, and look set to drag on for years more. By the time you get to computer languages, you have, until very recently, pretty much been expected to be speaking English.
(POSIX is, for largely political reasons, treated as a computer language by ISO.)

5. To a first approximation, the internationalization work of two organizations forms the basis of moves towards international standards in the area of C and POSIX. These organizations are X/Open and UniForum (formerly /usr/group). The published POSIX and C standards (ANSI/IEEE Std 1003.1-1988 and ANSI Std X3.159-1989 respectively) currently embody fairly minimal internationalization features. Future revisions will have more to say on the topic. 1003.2, the forthcoming shell and tools standard, has a great deal to say on the issue. X3J16, the newly-formed C++ standards working group, has internationalization among its fundamental requirements.

6. There's little literature on the topic. Here's what I know of:

   - Volume 3 of the X/Open Portability Guide, issue 3 (XSI Supplementary Definitions, Prentice Hall, 1989, ISBN 0-13-685830-3) defines X/Open's proposals. If you purchase a system with the X/Open XPG3 brand, you should get an implementation of this stuff. (Internationalization is part of ``base'' brand requirements; you don't even need the more comprehensive ``plus'' brand.) The problem with the XPG is that it's a definition, not a user's guide: you have to figure out how on earth to hang all that stuff together and make it work for you.

   - UniForum has published a white paper on internationalization. It presents a good technical background to the topic, although it's inclined to rush off into neat details at the slightest provocation. For copies, contact UniForum at 2901 Tasman Drive, #201, Santa Clara CA 95054, U.S.A., phone +1 408 986 8840, fax +1 408 986 1645 (sorry, no email address to hand). If there's a UniForum affiliate in your country, they may have copies too. (If the document is NOT freely available, could somebody please post a correction!)
There's a fair degree of commonality between UniForum and X/Open, particularly in the area of regular expressions, where (simplifying somewhat) essentially the same people were involved on behalf of both organizations. Implementations of the remainder of the UniForum proposals are not widely distributed or easy to get hold of.

   - As part of the ISO POSIX watchdog work I do under the sponsorship of EUUG and USENIX, I have written two articles concerned with internationalization: ``Report on ISO/IEC JTC1/SC22/WG15 Rapporteur Group on Internationalization Meeting of 5th - 7th March, 1990, Copenhagen, Denmark'', and ``International Standardization -- An informal view of the formal structures as they apply to POSIX internationalization''. Both appeared in ;login 15:3 (May/June 1990). They were also published in the EUUG Newsletter (10:2 and 10:1 respectively -- summer and spring 1990). And the report was posted to comp.std.unix on 14th March. (Although I regret it's missing from my archive, so I can't quote a reference.) But, if you can't put your hands on the documents in those places, mail me and I'll send copies.

   - A forthcoming book, Open Systems: A Business Strategy for the Nineties, by Dr. Pamela Gray (McGraw Hill, late 1990) presents the business case for internationalization, along with technical background (written by yours truly).

7. A fundamental concept in internationalization is that it is part of a two-step process. An application which is internationalized is independent of any cultural bias. It's also useless. In order that anybody can use it, a cultural bias of their choice has to be added. This process is called localization. (The abbreviation ``l10n'' is not widely used.) The benefit of the two-step approach is that the first step needs only to be done once, and makes (or should make) the method of carrying out the second step reasonably obvious.
(I know from experience that replacing with another bias the cultural bias inherent in an uninternationalized application is a debilitating and expensive process. Worse, it has to be repeated essentially in full for each new bias (market) that is desired.)

8. The economics of the two-step approach merit some study, but I have yet to see any analysis of the subject. It seems to me that the cost of using internationalization features in an application, followed by localizing it for its first market, is likely to be higher than that of hardwiring support for a single market. This is particularly true now, when knowledge of the techniques is not widespread, and programmers have to be retrained before they can apply them. (Finding programmers able to take the mental step back needed to identify those aspects of an application dependent on cultural considerations is also likely to be a problem.) Only if and when second and subsequent localizations are carried out does the payback begin, both in terms of reduced conversion costs, and (probably) in support costs which are relatively lower than those for radically hacked versions of an initial non-internationalized application.

9. A further attraction of the technology under development (it is too early to describe it as mature, although it's approaching puberty) is that it should allow non-programmers to perform the localization step. Now and in the past, when adaptation of an application for a new market has typically involved heavy-duty hacking, those best qualified to describe the new culture -- natives educated in the humanities, and not working for the original developer -- have typically been barred from involvement, either because they are not programmers, or because the original software author is unwilling to relinquish any element of control of the source code, or because the licensing fee demanded for the source is greater than can be recouped from the new target market.
In theory, then, internationalization should make markets more open by reducing economic and technical barriers to the movement of software between cultures.

10. Too bad the concepts are not widely known or understood. But we're working on it...
-- 
Dominic Dunlop
gwyn@smoke.BRL.MIL (Doug Gwyn) (08/31/90)
In article <1990Aug30.115608.3729@tsa.co.uk> domo@tsa.co.uk (Dominic Dunlop) writes:
> 5. To a first approximation, the internationalization work of two
> organizations forms the basis of moves towards international
> standards in the area of C and POSIX. These organizations are
> X/Open and UniForum (formerly /usr/group).

Actually, while these organizations have influenced UNIX, the C hooks for internationalization were hammered out specially in unofficial working groups of interested parties and are for the most part not based on previously published specifications.

> The published POSIX and C standards (ANSI/IEEE Std 1003.1-1988 and
> ANSI Std X3.159-1989 respectively) currently embody fairly minimal
> internationalization features. Future revisions will have more to
> say on the topic.

Not necessarily true. The ISO C standard may eventually have an addendum that specifies additional internationalization-related features, however.
userAKDU@mts.ucs.UAlberta.CA (Al Dunbar) (08/31/90)
Regarding the arguments over whether or not "internationalization" is a valid word, I have concluded that there are two points of view, which could generally be classified as "internationalizationalism" and "anti-internationalizationalism".
-------------------+-------------------------------------------
Alastair Dunbar    | Edmonton: a great place, but...
Edmonton, Alberta  | before Gretzky trade: "City of Champions"
CANADA             | after Gretzky trade: "City of Champignons"
-------------------+-------------------------------------------
dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (09/01/90)
In <1302@mts.ucs.UAlberta.CA> userAKDU@mts.ucs.UAlberta.CA (Al Dunbar) writes:
> Regarding the arguments over whether or not "internationalization"
> is a valid word, I have concluded that there are two points of
> view, which could generally be classified as
> "internationalizationalism" and "anti-internationalizationalism".
Other analogous dichotomies
exist. For example, there is
"justification", "right justification",
and "anti - right justification".
--
Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP: oliveb!cirrusl!dhesi
userAKDU@mts.ucs.UAlberta.CA (Al Dunbar) (09/05/90)
In article <2349@cirrusl.UUCP>, dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
>In <1302@mts.ucs.UAlberta.CA> userAKDU@mts.ucs.UAlberta.CA (Al Dunbar) writes:
>
> Regarding the arguments over whether or not "internationalization"
> is a valid word, I have concluded that there are two points of
> view, which could generally be classified as
> "internationalizationalism" and "anti-internationalizationalism".
>
>Other analogous dichotomies
>exist. For example, there is
>"justification", "right justification",
>and "anti - right justification".
>--
>Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
>UUCP: oliveb!cirrusl!dhesi

I am left wondering how you justify your comments. I use an editor (to justify my comments) that _attempts_ to minimize whitespace where possible, but this is clearly not easy now that words of the length of "internationalizationalism" (sic) are becoming the norm (or normalizationalism). Faulty though it may be, it would never produce as anti-aesthetic a paragraph as yours above:

   Other analogous dichotomies
   exist. For example, there is
   "justification", "right justification",
   and "anti - right justification".

-------------------+-------------------------------------------
Alastair Dunbar    | Edmonton: a great place, but...
Edmonton, Alberta  | before Gretzky trade: "City of Champions"
CANADA             | after Gretzky trade: "City of Champignons"
-------------------+-------------------------------------------