[comp.lang.c] Internationalisation

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (08/22/90)

In article <1881@jura.tcom.stc.co.uk>, rmj@tcom.stc.co.uk (Rhodri James) writes:
> In article <3585@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
> }For why?  Internationalisation, _that's_ for why.

> I cringe when I see this (unwords like "internationalisation", I mean).

One uses language for the purpose of communication.
In order to effect that purpose, one uses words that other people know
and use, not the words one happens to like.  Like it or not,
"internationalise" and its derivatives are *words* in 1990s computing
jargon.  Perhaps Rhodri James may have a better term that is a miracle
of euphony and clarity; well for heaven's sake tell us what it is *now*
and let's get pushing it, for "internationalisation" bolted from its
stable long ago.  (By the way, there is no such word as "unword".  If
there were such a term, it would be "nonword".  "dictcheck -pedantic")

> Also I fail to see your point. Surely such #ifdef switching
> as above is more efficient, simpler to maintain and more legible than
> the scrabbling about with resource files you prefer?

So now Cn James reads minds and knows what I prefer.  Wonderful just.
No, it is *not* simpler to maintain.  The point of the resource file
approach (not my invention by any means; no-hopers like IBM, DEC, HP,
X/Open, AT&T, Apple, ... have been using it for a while and I just
copied the idea and simplified it a bit for this newsgroup) is that
you have all the text in one place; you don't have to go "scrabbling
about" in the source files to find all the strings.  You can give the
resource file to a human translator who knows nothing about the
programming language you are using.  A minor addition to such a tool
(have it generate
	INTEGER MSGNO
	PARAMETER (MSGNO=......
instead of #defines) will let you use the *same* message file with a
Fortran program.  Speaking as a no-hoper, I must admit that using a
technique that adapts to *all* the programming languages I use, not
just C, sounds like a saving.  But what do I know?

As for efficiency, the point is that we are talking about a scheme for
generating messages for display to humans.  The cost of fishing the text
out of a file is (or was every time I measured it) considerably less than
the cost of displaying it on the terminal.

The real schemes (such as the X/Open one) identify messages by numbers,
not by address in the text file.  That has the disadvantage that finding
the right text is a wee bit more complex (but not very; one need merely
attach a directory at the end of the file), but it has the great
advantage that the program does not need to be recompiled.  This means
that one customer can be running the program with messages coming from
the "English-speaking idiot" message file and another with messages
coming from the "Spanish-speaking wizard" message file, and both can be
sharing the same copy of the program without any recompilation at all.

That's the way it *is* in UNIX System V Release 4.  We might as well get
used to thinking about messages in that way now.

> Demonstrate to me a negative impact on internationalisation (ugh) and I
> might believe you.  Any negative impact will do, I'm not too choosy.

The schemes actually used by IBM (MVS, CMS, AIX) HP (HP-UX), DEC (VMS,
Ultrix), AT&T (SVR4) and others essentially add another couple of layers
of indirection above what I presented.  Those systems all allow you to
switch languages at run time, without any recompilation.  Those systems
all allow you to translate message files without having any other access
to the sources.  They all allow many programs, and many programming
languages, to share the same message files.  They all allow a customer
to substitute his own translation of a message file (perhaps amplifying
some messages, or getting the grammar right, or ...) without access to
the sources.

There's four negative impacts of the #ifdef approach, just for starters.
-- 
The taxonomy of Pleistocene equids is in a state of confusion.

cbp@icc.com (Chris Preston) (08/24/90)

In article <3603@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>In article <1881@jura.tcom.stc.co.uk>, rmj@tcom.stc.co.uk (Rhodri James) writes:
>> In article <3585@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>> }For why?  Internationalisation, _that's_ for why.
>
>> I cringe when I see this (unwords like "internationalisation", I mean).
>
>One uses language for the purpose of communication.

  etc., deleted.

>
>> Also I fail to see your point. Surely such #ifdef switching
>> as above is more efficient, simpler to maintain and more legible than
>> the scrabbling about with resource files you prefer?
>
>So now Cn James reads minds and knows what I prefer.  Wonderful just.
>No, it is *not* simpler to maintain.  The point of the resource file
>approach (not my invention by any means; no-hopers like IBM, DEC, HP,
>X/Open, AT&T, Apple, ... have been using it for a while and I just
>copied the idea and simplified it a bit for this newsgroup) is that
>you have all the text in one place; you don't have to go "scrabbling
>about" in the source files to find all the strings.  You can give the
>resource file to a human translator who knows nothing about the
>programming language you are using.  A minor addition to such a tool
>(have it generate
>	INTEGER MSGNO
>	PARAMETER (MSGNO=......
>instead of #defines) will let you use the *same* message file with a
>Fortran program.  Speaking as a no-hoper, I must admit that using a
>technique that adapts to *all* the programming languages I use, not
>just C, sounds like a saving.  But what do I know?

   Indeed, an interesting proposition.  There are two immediate ways (I am
   sure the creative will have more still) that will work with 
   internationalization while using labels, allow extraction tools 
   to work, are simple to implement, and prevent the repetitious use of 
   literals and constants.  Here goes:

If you reaaaaaly want the text in the source section (incidentally, xscc on
System V [your original example] does invoke the C preprocessor, so text
substitution is absolutely not broken under MNLS, and any extractor that
does not invoke the preprocessor should be considered broken) -

#define DOS_DCOMM_MSG	1
#define UNIX_DCOMM_MSG	2
#define DEF_DCOMM_MSG	3

#if DOS
#define DCOM_ERR_MSG	DOS_DCOMM_MSG
#elif UNIX
#define DCOM_ERR_MSG	UNIX_DCOMM_MSG
#else
#define DCOM_ERR_MSG	DEF_DCOMM_MSG
#endif

#define DCOM_ERR	getmsg(DCOM_ERR_MSG)

/* tools.c */

char * 
getmsg(ErrMsg)
int ErrMsg;
{
	switch (ErrMsg){
		case DOS_DCOMM_MSG:
			return "Run dcom.exe";
		case UNIX_DCOMM_MSG:
			return "Datacomm not initialized, contact S/A";
		case DEF_DCOMM_MSG:
			return "Datacomm not running";
		default:
			return "Run for cover, they're commin' to get us";
	}
}
/* somefile.c */

int 
CheckDatacomm()
{
	int RetVal;

	if ( (RetVal=DataCommRunning()) != 0)
		(void) fprintf(stderr,"%s\n",DCOM_ERR);

	return RetVal;
}

/* Makefile */

LANG = de fr sw gr

neatunix: main.o somefile.o tools.o 
	xscc -O main.o somefile.o tools.o -o neatunix 
	@for i in $(LANG); do gencat $@.X $$i.cat; done

neatdos: main.o somefile.o tools.o 
	xscc -O main.o somefile.o tools.o -o neatdos 
	@dosomethingelsealtogether


Another method would be to do something like the following (assuming that
you are invoking the C preprocessor):

#define DCOM_ERR	0
#define DRVR_ERR	1  /* etc. etc. */

char *ErrMsg[]={
#if DOS
				"Run dcom.com",
				"Run driver.com",
#elif UNIX
				"Datacomm not initialized, contact S/A",
				"Driver error, contact S/A",
#else
				"Datacomm not running",
				"Driver not responding",
#endif
};

#define MSG_ERR_DCOM	ErrMsg[DCOM_ERR]
#define MSG_ERR_DRVR		ErrMsg[DRVR_ERR]

int
foo()
{
	int Dcm, Dvr;
	.
	.
	.
	if (!Dcom())
		printf("%s",MSG_ERR_DCOM);

	if ( SomeDriverCheck() == FAILURE)
		printf("%s",MSG_ERR_DRVR);
	.
	.
	.
	return somevalue_etc;
}		


So, we have accomplished coding for purposes of internationalization
either way: we have separated string literals to a central place,
and we have made the code more maintainable, since changes in messages for
the environment can occur at one major juncture, and life is a cabaret.

(BTW, all the above just got retyped at max speed, so errors are surely
there and to be expected; the point remains.)

>
>As for efficiency, the point is that we are talking about a scheme for
>generating messages for display to humans.  The cost of fishing the text
>out of a file is (or was every time I measured it) considerably less than
>the cost of displaying it on the terminal.

   Considering that a program that pays no concern for "internationalization" 
   does not have to source anything external to its data segment at any 
   time other than normal operations, to say that the additional overhead is 
   equal to or less than existing overhead is a non-sequitor.  If you 
   don't do it the cost ain't there.

>
>The real schemes (such as the X/Open one) identify messages by numbers,
>not by address in the text file.  That has the disadvantage that finding
>the right text is a wee bit more complex (but not very; one need merely
>attach a directory at the end of the file), but it has the great
>advantage that the program does not need to be recompiled.  This means
>that one customer can be running the program with messages coming from
>the "English-speaking idiot" message file and another with messages
>coming from the "Spanish-speaking wizard" message file, and both can be
>sharing the same copy of the program without any recompilation at all.

   like MNLS, perhaps?

>
>That's the way it *is* in UNIX System V Release 4.  We might as well get
>used to thinking about messages in that way now.

  and it is not such a horrible thing.  Just think, we can pop streams
  modules for the simple stuff, and run extractors and programs to modify
  the source for multibyte character sets, and use different curses
  libraries for right to left output.  What a treasure.

  It has been pointed out here by several that are in the know on these
  things, that arguing about string literals is moot in comparison to other
  inherent difficulties presented by internationalization, and that the
  necessary crusade to "C programming practices" is long a commin'.

  For instance, I am told that the following is a problem in Kanji

  char p[10]; /* xscc provides for allowing twenty bytes as needed in Kanji */

  *(p+1)='x'; /* this is the next byte, and an error */ 
   p[n+1]='x'; /* this is the next _character_ and ok */

   Given trivial differences like this, I am sure that there are many
   things "broken" for internationalization, and we should all prepare to
   cringe; however, substitution for string literals and constants is not
   one of them.

>
>> Demonstrate to me a negative impact on internationalisation (ugh) and I
>> might believe you.  Any negative impact will do, I'm not too choosy.
>
>The schemes actually used by IBM (MVS, CMS, AIX) HP (HP-UX), DEC (VMS,
>Ultrix), AT&T (SVR4) and others essentially add another couple of layers
>of indirection above what I presented.  Those systems all allow you to
>switch languages at run time, without any recompilation.  Those systems
>all allow you to translate message files without having any other access
>to the sources.  They all allow many programs, and many programming
>languages, to share the same message files.  They all allow a customer
>to substitute his own translation of a message file (perhaps amplifying
>some messages, or getting the grammar right, or ...) without access to
>the sources.

   And still can.  xscc in Unix System V (your example) does all of this
   for you.  You need not make the resource catalogues.  It is done
   for you.  

>
>There's four negative impacts of the #ifdef approach, just for starters.

  Given the above examples, do you still feel this to be the case?

  I do not think so.  I also believe that this shows that it is an unsafe
  practice to say that something cannot be done within the framework of C
  and the C preprocessor. 

cbp
--------
Of course these are opinions.

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (08/26/90)

In article <1990Aug24.064203.20942@icc.com>, cbp@icc.com (Chris Preston) writes:
> If you reaaaaaly want the text in the source section (incidentally, xscc on
> System V [your original example] does invoke the C preprocessor

No, xscc was *not* my example nor anyone else's in this thread before this.
I mentioned System V Release 4, to be sure, but I did not mention xscc.
How on earth is using xscc supposed to help me use the same message
file for C, Pascal, Fortran, and Lisp?

> so text substitution is absolutely not broken under MNLS

Whoever said it was?

> Another method would be to do something like the following (assuming that
> you are invoking the C preprocessor):

> #define DCOM_ERR	0
> #define DRVR_ERR	1  /* etc. etc. */

> char *ErrMsg[]={
> #if DOS
> 				"Run dcom.com",
> 				"Run driver.com",
> #elif UNIX
> 				"Datacomm not initialized, contact S/A",
> 				"Driver error, contact S/A",
> #else
> 				"Datacomm not running",
> 				"Driver not responding",
> #endif
> };

Again, this technique means that you need the sources, and that to
change the messages you need access to the sources and to recompile.
That was an objection validly raised against the stripped-down message
file technique I posted, and it applies with greater force to this.

> So, we have accomplished coding for purposes of internationalization
> either way: we have separated string literals to a central place,
> and we have made the code more maintainable, since changes in messages for
> the environment can occur at one major juncture, and life is a cabaret.

The point of a message file is that
 -- the "central place" is OUTSIDE THE PROGRAM
 -- a message file can be got at by someone with no (other) access to sources
    (this is a *big* deal for developers!)
 -- *one* version of the object file can be shared by people using
    *different* message files.

> >As for efficiency, the point is that we are talking about a scheme for
> >generating messages for display to humans.  The cost of fishing the text
> >out of a file is (or was every time I measured it) considerably less than
> >the cost of displaying it on the terminal.
> 
>    Considering that a program that pays no concern for "internationalization" 
>    does not have to source anything external to its data segment at any 
>    time other than normal operations, to say that the additional overhead is 
>    equal to or less than existing overhead is a non-sequitor.  If you 
>    don't do it the cost ain't there.

That's non-sequitUr, and this "rebuttal" is badly flawed.
What I claimed was
	(cost of fetching message) << (cost of displaying message)
Someone with measurements to disprove this can refute me (for a particular
hardware/software combination) by displaying his figures.  Of course, what
is *really* interesting about this "rebuttal" is that in a virtual memory
environment it simply isn't true.  We're talking about messages here,
things which are displayed at relatively infrequent (we hope!) intervals.
Text, in short, which is paged OUT.  In a system which supports memory-
mapped files (VMS, Aegis, SunOS 4.x, AIX, ...) one could open the message
file as a memory-mapped file, and then the process of fetching a message
from the message file would cost no more than the process of fetching a
message from a pre-initialised character array, because the two would be
exactly the same process.

>   It has been pointed out here by several that are in the know on these
>   things, that arguing about string literals is moot in comparison to other
>   inherent difficulties presented by internationalization, and that the
>   necessary crusade to "C programming practices" is long a commin'.

That is why, for example, ANSI C has
	wchar_t
	wcstombs()
	mbstowcs()
	mblen()
and so on, and why it is set up to allow multi-byte characters in
constants.

> >There's four negative impacts of the #ifdef approach, just for starters.

>   Given the above examples, do you still feel this to be the case?

Of course.  Those four negative impacts still stand.

>   I do not think so.  I also believe that this shows that it is an unsafe
>   practice to say that something cannot be done within the framework of C
>   and the C preprocessor. 

Again, who said _that_?  Not me!  That there are *better* ways to do some
things than using the C preprocessor, who can challenge that?  The only
question is, _which_ tasks?  Given that I said I would like to share
message files between several programming languages, using a facility
peculiar to one of them (there is no guarantee that /usr/lib/cpp will be
available nor anything like it) would be rather silly, wouldn't it?

A serious problem concerned with "the need to make the texts we write for
the tools that count work with more than one tongue of men" (otherwise
known as "internationalisation" if you have no fear of words that have
more than one sound in them) is that C formats don't quite work.  One
common problem is that different languages put phrases in different orders.
The X/Open answer to that is to have an extra piece of information in
%format controls, saying which argument to use.  I presume that the ANSI C
committee considered that, and didn't include it because it basically needs
pointers and integers to be the same size.


The following suggestion is not altogether serious.  But bearing in mind
things like wanting to put phrases in different orders, and all sorts of
things one might like to let customers configure for themselves (without
having to give them *all* the sources), it might not be as crazy as it
sounds.
	How about using TCL (Tool Command Language) for "messages"?
TCL is a free "extension language" which somewhat resembles the Unix
shells, and is set up to be a *small* library that can be linked into C
code.  When one wants to report an event, one could format the arguments
of that event into strings, fetch a TCL command from a file, and execute
that TCL command.  It was intended to customise input to things like the
editor "mx", but there's no reason it couldn't be used to customise *output*.
As I say, not altogether serious.

-- 
The taxonomy of Pleistocene equids is in a state of confusion.

cbp@icc.com (Chris Preston) (08/29/90)

In article <3617@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>In article <1990Aug24.064203.20942@icc.com>, cbp@icc.com (Chris Preston) writes:
>> If you reaaaaaly want the text in the source section (incidentally, xscc on
>> System V [your original example] does invoke the C preprocessor
>
>No, xscc was *not* my example nor anyone else's in this thread before this.

  No one else in the thread before this talked about X/Open and Ansi's
  handling of multi-language portability on System V R4.

  The standard offering on System V Release 4 for doing
  "internationalization" _is_ MNLS.  I have a fax of the price schedule
  from UNIX research laboratories dated June 22 in front of me.

  MNLS has xscc to produce message files (or catalogues)
  from the application as part of the compilation process.  Gencat is then
  used to produce the message catalogue in the applicable language.

  Your example was Sys V R4.  This is simply part of the tools that anyone
  doing internationalization work is likely to do it with.  In particular,
  contractors producing products for vendors that repackage System V R4 on
  their own boxes will stay with these tools for the export systems, as will
  third party vendors whose applications are offered with the base system.

  Rolling your own extractor is less applicable here when looking at System
  V R4 since it has so many standards rolled into one that it is best to
  stick with the tools offered.  Otherwise, one is likely to find that
  personalized extractors produce resource catalogues that might not meet
  the same format as produced for MNLS or conform with X/Open.  

>I mentioned System V Release 4, to be sure, but I did not mention xscc.

  IMHO this is like mentioning Unix software development and not mentioning
  the C compiler and development system.  There are certainly basic
  compilers, ratfor and f77, but these would not be the assumed development
  tools normally.

>How on earth is using xscc supposed to help me use the same message
>file for C, Pascal, Fortran, and Lisp?

  Because it produces an external message file that can be translated into
  multiple languages using gencat, can be modified at the customer site and
  is going to be about as portable to Pascal as is the "awk extracted"
  version that was previously offered.

>
>> so text substitution is absolutely not broken under MNLS
>
>Whoever said it was?
>
>> Another method would be to do something like the following (assuming that
>> you are invoking the C preprocessor):

  This example deleted.

>Again, this technique means that you need the sources, and that to
>change the messages you need access to the sources and to recompile.
>That was an objection validly raised against the stripped-down message
>file technique I posted, and it applies with greater force to this.

  There were two examples, one of which was to use labels and to have the
  string literals in a single place in functions like geterrmsg(),
  getusermsg() and so forth.  This was somehow deleted, but will work
  without difficulty when extracting the messages from the start, as under
  the proposed technique (awk extractor) it would produce:

#define USER_MSG0 0
#define USER_MSG1 1

#if DOS
#define CONTINU_MSG 0
#elif UNIX
#define CONTINU_MSG 1
#endif

etc.

#define CONTINUE getusermsg(CONTINU_MSG)

  char *
  getusermsg(UserMsgNo)
  int UserMsgNo;
  {

    switch (UserMsgNo){
	  case USER_MSG0:
		return ExternMsgGet(someval);

where before the awk extractor the return value was

		 return "type any key to continue";

and the application says

    printf("%s\n",CONTINUE);

This is, of course, an example, and is subject to one's own methods and
style.

>
   My comments deleted.

>
>The point of a message file is that
> -- the "central place" is OUTSIDE THE PROGRAM
> -- a message file can be got at by someone with no (other) access to sources
>    (this is a *big* deal for developers!)
> -- *one* version of the object file can be shared by people using
>    *different* message files.

  The point is that an intelligent use of labels can allow 
	-- the "central place" is OUTSIDE THE PROGRAM
	-- a message file can be got at by someone with no (other) access to
	   sources (which is certainly a big deal with our products)
	-- *one* version of the object file can be shared by people using
	   *different* message files.
	-- code can hide machine dependencies for string literals and
	   constants in label form and "internationalization" is _not_ broken.

>
>> >As for efficiency, the point is that we are talking about a scheme for
>> >generating messages for display to humans.  The cost of fishing the text
>> >out of a file is (or was every time I measured it) considerably less than
>> >the cost of displaying it on the terminal.
>> 

  My comments deleted.

>
>That's non-sequitUr, and this "rebuttal" is badly flawed.
  
  This is not a debate.

  As to using external message files, if the application is in
  a native language it would not hurt to compile a straight version  
  without messages being extracted and use a standard tool 
  to do the extraction for a multi-language version, for example.

  In such a case, the multi-language version will either spend no
  additional time, or some additional time, "fishing" the text out of a
  file.  Yes, if some form of memory mapping is used so that the
  messages are mapped into the heap, then great, there is no difference in
  the speed for the multi-language version.  That is the best case
  scenario.  The worst case is that it will take additional time and slow
  the application down.

  The multi-language version will not be any faster than the native 
  version, and at worst slower.  That the additional delay is less than
  some other delay, like display time, is not significant.  The display of
  messages, in English or Kanji, will occur at some point in most
  applications, period.  Whether there is additional overhead or
  no overhead from fetching the message to display is not a valid
  comparison to the guaranteed display time of the message.


>What I claimed was
>	(cost of fetching message) << (cost of displaying message)

  The actual point appears to be that the additional delay will only be
  some or none, irrespective of what the choice for comparison is.

  This is not an argument for or against having message files but rather
  an additional performance consideration.

>Someone with measurements to disprove this can refute me (for a particular

  various explanation about virtual mapping deleted.

  my comments about additional concerns in programming practices for
  internationalization deleted.

>
>That is why, for example, ANSI C has
>	wchar_t
>	wcstombs()
>	mbstowcs()
>	mblen()
>and so on, and why it is set up to allow multi-byte characters in
>constants.

  And why much code must be rewritten using the Ansi standard.  By the same
  token, a great deal of development is done with non-Ansi compliant
  compilers because that is what is native to the system and that is what
  the prime contractor requires be used in the application development
  (witness open desktop).  It is, therefore, not just a matter of
  "well lets just use gcc or buy some Ansi compliant compiler with the
  appropriate libraries."  It is like talking about using posix compliant
  system calls only on an earlier release of System V that is not posix
  compliant.  

>
>> >There's four negative impacts of the #ifdef approach, just for starters.
>
>>   Given the above examples, do you still feel this to be the case?
>
>Of course.  Those four negative impacts still stand.

  The example that you deleted would negate all four of these negative
  impacts.

>

   My comments deleted
>
>Again, who said _that_?  Not me!  That there are *better* ways to do some
>things than using the C preprocessor, who can challenge that?  The only
>question is, _which_ tasks?  Given that I said I would like to share
>message files between several programming languages, using a facility
>peculiar to one of them (there is no guarantee that /usr/lib/cpp will be

  To anticipate a version of C that does not perform a preprocessing
  stage is an interesting prospect.  To anticipate using other languages as
  a programming consideration when coding in C is probably beyond the
  bounds of this newsgroup, and not likely to be the concern of those whose
  applications are done completely in C.  It is oftentimes chosen (like
  here) _because_ of its intermachine portability.  Your examples and the
  drift of discussion indicate that portability to Pascal, Fortran and
  Lisp is worthwhile.  Perhaps this is true in some cases, but it is just
  as applicable to rely on labeling for literals and constants in order to
  port the same C code, which is what we do here.

>available nor anything like it) would be rather silly, wouldn't it?

  For something other than C, yes.  Given the newsgroup, no.

>

  Various deleted about Ansi and X/Open.

>
>
   Comment about TCL deleted.


   In summary, using labels for string literals is a good thing, and can be
   done without "breaking internationalization" as was previously
   suggested.  


cbp
------
Recent conversation between Kurt Waldheim and Saddam Hussein:
  "Saddam, I *knew* Hitler, and believe me, you're no Adolf Hitler."

cbp@icc.com (Chris Preston) (08/29/90)

In article <1990Aug29.043513.19715@icc.com> cbp@icc.com (Chris Preston) writes:
  That's me wrote:

>  from UNIX research laboratories dated June 22 in front of me.

   That's actually,
   
   UNIX System Laboratories, Inc. 
	 (Small Print) A Subsidiary of AT&T

cbp 

rmj@tcom.stc.co.uk (Rhodri James) (08/30/90)

In article <3603@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes about me writing about him writing:
>> }For why?  Internationalisation, _that's_ for why.
>
>> I cringe when I see this (unwords like "internationalisation", I mean).
>
>One uses language for the purpose of communication.

My point exactly. "Internationalisation" communicated absolutely nothing
to me for several minutes, even given the example. Me, I'd prefer to
call it "language switching" or something a tad more obvious like that,
but the potential confusion *that* could cause is enormous. So I guess
I'll have to lump it.

>In order to effect that purpose, one uses words that other people know
>and use, not the words one happens to like.

True. See above. Although a counterexample has just sprung to mind -
"program".

>Like it or not,
>"internationalise" and its derivatives are *words* in 1990s computing
>jargon.

I had never seen or heard the word prior to this thread. Whether this
means I am not up on the jargon, or the jargon isn't nearly as
international as it would like to think, I don't know.

>(By the way, there is no such word as "unword".  If
>there were such a term, it would be "nonword".  "dictcheck -pedantic")

Oh good, my attempt to get into this style of linguistic evolution
worked. :-)

>> Also I fail to see your point. Surely such #ifdef switching
>> as above is more efficient, simpler to maintain and more legible than
>> the scrabbling about with resource files you prefer?
>
>So now Cn James reads minds and knows what I prefer.  Wonderful just.

Cn? Oh, Citizen. Sorry, Pr O'Keefe.
(Both the above lines are ad hominem and ought to be ignored, but are
much more fun this way).

>[Sundry bits of info and arguments that are actually useful]

OK. You've convinced me. For programs requiring multi-linguistic output
and input of medium or greater complexity (or any requiring run-time
switching), the resource file approach wins. Personally, it'll still
take me a long time to give up #ifdeffing, as I know I can maintain that
and I have an aversion to complicating preprocessing (it just doesn't
feel right), but that's just me.

Mind you, arguing that "this is the way System V does it, so get used to
it" nearly lost you my sympathy. How Unix of any sort has become the
dominant operating system is beyond me, it's not as if it's actually
very good or anything :-\
-- 
* Windsinger                 * "Nothing is forgotten..."
* rmj@islay.tcom.stc.co.uk   *                    Mike Whitaker
*     or (occasionally)      * "...except sometimes the words"
* rmj10@phx.cam.ac.uk        *                    Phil Allcock

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (08/30/90)

In article <1911@islay.tcom.stc.co.uk>, rmj@tcom.stc.co.uk (Rhodri James) writes:
> Mind you, arguing that "this is the way System V does it, so get used to
> it" nearly lost you my sympathy.

It wasn't *supposed* to *keep* your sympathy.
There are a lot of things in System V Release 4 I don't particularly
care for (having gone to the trouble of learning how X/Open handles
internationalisation, I didn't really appreciate discovering that the
new Official way of doing it was different, and while the TLI routines
may perhaps be a considerable improvement on sockets, I have yet to
find anything which explains them clearly enough for me to use them).

The point I was making is that *customers* are going to expect SVR4
programs to behave in a particular way.  SVR4 has a convention for
generating multi-line error messages (SVR4 is an adventure; if you
win you find you're playing VMS), and it has lots of features for "locale"
support (if you want the "C" word rather than the "UNIX" word, which
has been current for, oh, at least 3 years).  In a couple of years'
time, customers are going to expect programs to follow the UNIX Way,
just as Macintosh customers expect Mac programs to follow the Mac Way.
So we had better get used to it if we want to produce programs that
the next decade's UNIX customers will continue to be willing to buy.

By the way, "language switching" would NOT be an appropriate replacement
for the word "internationalisation" because the latter covers rather more.
Wales and the USA both use English (Wales also uses Welsh and the US also
uses Spanish).  But they don't represent dates the same way, and they
don't use the same symbols for currency.  Internationalisation refers to
collating order, date and time representation, currency representation,
and a couple of other things I forget as well as the language that messages
are displayed in.  A program portable between different locales will _not_,
for example, assume that everyone has a three-part name, a common US-ism.

> How Unix of any sort has become the
> dominant operating system is beyond me, it's not as if it's actually
> very good or anything :-\

"Democracy is the worst possible political system, except for all the others."
Unix hasn't succeeded by being particularly good, but by not being
excruciatingly bad (unlike xx-xxx, xx/xxx, xxx-xx, xxx, or xxx -- names
changed to protect _me_).

-- 
You can lie with statistics ... but not to a statistician.

domo@tsa.co.uk (Dominic Dunlop) (08/30/90)

Ho hum.  Been vaguely following this thread, basking in its heat and
peering into its dim light.  But I'm not about to fight those individual
smoky flames.  I'll just make these points:

 1. While ``internationalization'' (abbreviated by the cognoscenti to the
    cute, opaque and finger-wear saving ``i18n'' because it has 20
    letters) is a nasty construction, it is an accepted term.

 2. As well as being a nasty neologism, internationalization is a
    misnomer: you can use the facilities it confers, even if your
    market is confined entirely to -- say -- Lake Wobegon.  It allows
    you easily to have different sets of prompts and messages for
    Catholic and Lutheran users.  Or kids and their parents.  It lets
    you display and search for Swedish characters if you need to in
    order to satisfy the needs of some of your users.  It deals with
    timezone issues when you want to catch that plane to Florida or
    meet that flight from south-east Asia...  (Barely resisted
    temptation to cross-post to alt.wobegon...)

 3. The international information technology standards community is rather
    belatedly taking an interest in internationalization.  For example,
    ISO's working group on POSIX has established a ``Rapporteur Group on
    Internationalization''.  (And note that ISO spells the word with a
    Z: the organization follows the guidelines of the Oxford English
    Dictionary, widely ignored in Britain, which favour ``ize'' over
    ``ise'' in all but a few cases.)

 4. I say ``belatedly'' because it has taken the standardizers a long
    time to realise that their products should be of equal utility to
    all users, independent of the language that they use to
    communicate, or the means that they use to represent it.  Efforts
    to sort out just the lowest level of this problem -- that of
    character sets -- have been continuing for years, and look set to
    drag on for years more.  By the time you get to computer languages,
    you have, until very recently, pretty much been expected to be
    speaking English.  (POSIX is, for largely political reasons,
    treated as a computer language by ISO.)

 5. To a first approximation, the internationalization work of two
    organizations forms the basis of moves towards international
    standards in the area of C and POSIX.  These organizations are
    X/Open and UniForum (formerly /usr/group).  The published POSIX and
    C standards (ANSI/IEEE Std 1003.1-1988 and ANSI X3.159-1989
    respectively) currently embody fairly minimal internationalization
    features.  Future revisions will have more to say on the topic.
    1003.2, the forthcoming shell and tools standard, has a great deal
    to say on the issue.  X3J16, the newly-formed C++ standards working
    group, has internationalization among its fundamental requirements.

 6. There's little literature on the topic.  Here's what I know of:

    - Volume 3 of the X/Open Portability Guide, issue 3 (XSI
      Supplementary Definitions, Prentice Hall, 1989, ISBN
      0-13-685830-3) defines X/Open's proposals.  If you purchase a
      system with the X/Open XPG3 brand, you should get an
      implementation of this stuff.  (Internationalization is
      part of ``base'' brand requirements; you don't even need the more
      comprehensive ``plus'' brand.)  The problem with the XPG is that
      it's a definition, not a user's guide: you have to figure out how
      on earth to hang all that stuff together and make it work for
      you.

    - UniForum has published a white paper on internationalization.  It
      presents a good technical background to the topic, although it's
      inclined to rush off into neat details at the slightest
      provocation.  For copies, contact UniForum at 2901 Tasman Drive,
      #201, Santa Clara CA 95054, U.S.A., phone +1 408 986 8840, fax +1
      408 986 1645 (sorry, no email address to hand).  If there's a
      UniForum affiliate in your country, they may have copies too.
      (If the document IS NOT freely available, could somebody please
      post a correction!)  There's a fair degree of commonality between
      UniForum and X/Open, particularly in the area of regular
      expressions, where (simplifying somewhat) essentially the same
      people were involved on behalf of both organizations.
      Implementations of the remainder of the UniForum proposals are
      not widely distributed or easy to get hold of.

    - As part of the ISO POSIX watchdog work I do under the sponsorship
      of EUUG and USENIX, I have written two articles concerned with
      internationalization: ``Report on ISO/IEC JTC1/SC22/WG15
      Rapporteur Group on Internationalization Meeting of 5th - 7th
      March, 1990, Copenhagen, Denmark'', and ``International
      Standardization -- An informal view of the formal structures as
      they apply to POSIX internationalization''.  Both appeared in
      ;login 15:3 (May/June 1990).  They were also published in the EUUG
      Newsletter (10:2 and 10:1 respectively -- summer and spring 1990).
      And the report was posted to comp.std.unix on 14th March.
      (Although I regret it's missing from my archive, so I can't quote
      a reference.)  But, if you can't put your hands on the documents
      in those places, mail me and I'll send copies.

    - A forthcoming book, Open Systems: A Business Strategy for the
      Nineties, by Dr. Pamela Gray (McGraw Hill, late 1990) presents the
      business case for internationalization, along with technical
      background (written by yours truly).

 7. A fundamental concept in internationalization is that it is part of
    a two-step process.  An application which is internationalized is
    independent of any cultural bias.  It's also useless.  In order that
    anybody can use it, a cultural bias of their choice has to be added.
    This process is called localization.  (The abbreviation ``l10n'' is
    not widely used.)  The benefit of the two-step approach is that the
    first step needs only to be done once, and makes (or should make)
    the method of carrying out the second step reasonably obvious.  (I
    know from experience that replacing with another bias the cultural
    bias inherent in an uninternationalized application is a
    debilitating and expensive process.  Worse, it has to be repeated
    essentially in full for each new bias (market) that is desired.)

 8. The economics of the two-step approach merit some study, but I have
    yet to see any analysis of the subject.  It seems to me that the
    cost of using internationalization features in an application,
    followed by localizing to its first market, is likely to be higher
    than that of hardwiring support for a single market.  This is
    particularly true now, when knowledge of the techniques is not
    widespread, and programmers have to be retrained before they can
    apply them.  (Finding programmers able to take the mental step back
    needed to identify those aspects of an application dependent on
    cultural considerations is also likely to be a problem.)  Only if
    and when second and subsequent localizations are carried out does
    the payback begin, both in terms of reduced conversion costs, and
    (probably) in support costs which are relatively lower than those
    for radically hacked versions of an initial non-internationalized
    application.

 9. A further attraction of the technology under development (it is too
    early to describe it as mature, although it's approaching puberty)
    is that it should allow non-programmers to perform the localization
    step.  Now and in the past, when adaptation of an application for a
    new market has typically involved heavy-duty hacking, those best
    qualified to describe the new culture -- natives educated in the
    humanities, and not working for the original developer -- have
    typically been barred from involvement both because they are not
    programmers and because the original software author is either
    unwilling to relinquish any element of control of the source code,
    or because the licensing fee demanded for the source is greater than
    can be recouped from the new target market.  In theory then,
    internationalization should make markets more open by reducing
    economic and technical barriers to the movement of software between
    cultures.

10. Too bad the concepts are not widely known or understood.  But we're
    working on it...
-- 
Dominic Dunlop

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/31/90)

In article <1990Aug30.115608.3729@tsa.co.uk> domo@tsa.co.uk (Dominic Dunlop) writes:
> 5. To a first approximation, the internationalization work of two
>    organizations forms the basis of moves towards international
>    standards in the area of C and POSIX.  These organizations are
>    X/Open and UniForum (formerly /usr/group).

Actually, while these organizations have influenced UNIX, the C hooks
for internationalization were hammered out specially in unofficial
working groups of interested parties and are for the most part not
based on previously published specifications.

>    The published POSIX and C standards (ANSI/IEEE Std 1003.1-1988 and
>    ANSI X3.159-1989 respectively) currently embody fairly minimal
>    internationalization features.  Future revisions will have more to
>    say on the topic.

Not necessarily true.  The ISO C standard may eventually have an addendum
that specifies additional internationalization-related features, however.

userAKDU@mts.ucs.UAlberta.CA (Al Dunbar) (08/31/90)

Regarding the arguments over whether or not "internationalization"
is a valid word, I have concluded that there  are  two  points  of
view,     which     could     generally     be    classified    as
"internationalizationalism" and "anti-internationalizationalism".
-------------------+-------------------------------------------
Alastair Dunbar    | Edmonton: a great place, but...
Edmonton, Alberta  | before Gretzky trade: "City of Champions"
CANADA             | after Gretzky trade: "City of Champignons"
-------------------+-------------------------------------------

dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (09/01/90)

In <1302@mts.ucs.UAlberta.CA> userAKDU@mts.ucs.UAlberta.CA (Al Dunbar) writes:

     Regarding the arguments over whether or not "internationalization"
     is a valid word, I have concluded that there  are  two  points  of
     view,     which     could     generally     be    classified    as
     "internationalizationalism" and "anti-internationalizationalism".

Other            analogous            dichotomies
exist.       For      example,      there      is
"justification",      "right      justification",
and         "anti - right         justification".
--
Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi

userAKDU@mts.ucs.UAlberta.CA (Al Dunbar) (09/05/90)

In article <2349@cirrusl.UUCP>, dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
>In <1302@mts.ucs.UAlberta.CA> userAKDU@mts.ucs.UAlberta.CA (Al Dunbar) writes:
>
>     Regarding the arguments over whether or not "internationalization"
>     is a valid word, I have concluded that there  are  two  points  of
>     view,     which     could     generally     be    classified    as
>     "internationalizationalism" and "anti-internationalizationalism".
>
>Other            analogous            dichotomies
>exist.       For      example,      there      is
>"justification",      "right      justification",
>and         "anti - right         justification".
>--
>Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
>UUCP:  oliveb!cirrusl!dhesi
 
I  am  left  wondering  how  you  justify your comments. I use an
editor (to justify  my  comments)  that  _attempts_  to  minimize
whitespace  where possible, but this is clearly not easy now that
words of the  length  of  "internationalizationalism"  (sic)  are
becoming  the  norm (or normalizationalism). Faulty though it may
be, it would never produce as anti-aesthetic a paragraph as yours
above:
 
Other  analogous  dichotomies  exist. For example,
there is "justification",  "right  justification",
and "anti - right justification".
-------------------+-------------------------------------------
Alastair Dunbar    | Edmonton: a great place, but...
Edmonton, Alberta  | before Gretzky trade: "City of Champions"
CANADA             | after Gretzky trade: "City of Champignons"
-------------------+-------------------------------------------