[net.unix] The Internationalisation of Unix - A European View

leif@erisun.UUCP (Leif Samuelsson) (06/11/85)

In article <211@pyrltd.UUCP> bejc@pyrltd.UUCP (Brian Clark) writes:
> /usr/group/UK is proposing to establish an International Working Group to
> develop current ideas on the integration of European character sets into
> formal proposals.

I think we need to define our terms here. What is the difference
between saying "internationalising" and "nationalising"? To me,
they seem to be two radically different concepts.

Translating Unix commands to other languages and/or
incorporating other character sets should really be called
"nationalising", while the word "internationalising" should be
used to describe the act of making Unix less dependent on the
U.S. character set (and thereby making "nationalising" Unix an
easier task).

For everyone's info, the following eleven characters are to be
considered national, and should be avoided in software meant to
be "international":

	#$@[\]^{|}~
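
A minimal sketch of how one might scan source text for the eleven
characters listed above. The names NATIONAL_CHARS and
find_national_chars are hypothetical, chosen for illustration only;
the set contains exactly the characters in the list and nothing more.

```python
# The eleven characters listed above. The name is an assumption for this sketch.
NATIONAL_CHARS = frozenset("#$@[\\]^{|}~")


def find_national_chars(text):
    """Return (line, column, character) for each listed character found in text.

    Line and column numbers are 1-based.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in NATIONAL_CHARS:
                hits.append((line_no, col, ch))
    return hits


if __name__ == "__main__":
    sample = "a[0] = 1\nok"
    for line_no, col, ch in find_national_chars(sample):
        print(f"line {line_no}, column {col}: {ch!r}")
```

A tool like this could flag places where a program's syntax depends on
code positions that a national 7-bit character set may assign to
letters instead.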

----
Leif Samuelsson

Ericsson Information Systems AB			..mcvax!enea!erix!erisun!leif
Advanced Workstations Division
S-172 93  SUNDBYBERG				59 19 N / 17 57 E
SWEDEN

aeb@mcvax.UUCP (Andries Brouwer) (06/11/85)

In article <330@erisun.UUCP> leif@erisun.UUCP (Leif Samuelsson) writes:
>
>For everyone's info, the following eleven characters are to be
>considered national, and should be avoided in software meant to
>be "international":
>
>	#$@[\]^{|}~
>

No, one wishes to use the full national character set in identifiers,
command names etc. On the other hand, one also wishes to use the graphics
mentioned, both in texts and as syntax specifiers.
Finally, writing all European languages that use the Roman alphabet
requires a little more than eleven additional characters.
Conclusion: make the codes for Scandinavian aa,ae,oe, for Icelandic -d,th,
for German sz, for Dutch ij, for French c,, for Spanish n~, for Turkish
dotless i, for accented vowels in many languages and the various special
symbols in Polish, Czech and Romanian distinct from each other and from
the codes for the graphics mentioned above. Clearly this requires an
expansion of the ASCII space from 7-bit to 8-bit.

andersa@kuling.UUCP (Anders Andersson) (06/18/85)

In article <330@erisun.UUCP> leif@erisun.UUCP (Leif Samuelsson) writes:
>In article <211@pyrltd.UUCP> bejc@pyrltd.UUCP (Brian Clark) writes:
>> /usr/group/UK is proposing to establish an International Working Group to
>> develop current ideas on the integration of European character sets into
>> formal proposals.

>For everyone's info, the following eleven characters are to be
>considered national, and should be avoided in software meant to
>be "international":
>
>	#$@[\]^{|}~

Regardless of whether this is the proper way to go about the problem
or not (I don't think it is), shouldn't "`" be among those characters?

The subject line (and the fact that this discussion goes to net.unix,
not net.text) seems a little strange to me. Does the Working Group
focus on character sets in Unix specifically? I consider this a problem
of natural language text representation in general. I would appreciate
it if someone could clarify these things.