[comp.std.internat] What kinds of things would you want in the GNU OS?

domo@riddle.UUCP (Dominic Dunlop) (06/22/89)

In article <205@marvin.moncam.co.uk> paul@moncam.co.uk (Paul Hudson) writes:
>In article <20037@adm.BRL.MIL>, dsill@relay.nswc.navy.mil writes:
>> >From: Peter da Silva <peter@ficc.uu.net>
>> >... Any sort of fancy new command option scheme [will never fly in Japan].
>
>> Why not?  And why are we suddenly so concerned about internationalization?
>
>Because not all users of a future GNU OS will live in the US, or use English
>as their first language. Stop being so parochial.

Yes.  I guess a product of the Free Software Foundation's not going to do
too much to fix the trade imbalance between the US and Japan, even if it
can handle Japanese characters...
>
>Since we have a chance to remove some of the stupid restrictions in the
>existing stuff, let's do so. 8-bit characters everywhere. Locale-specific
>messages. Why not? Properly done it should have little impact on what the US
>version will be like, so you shouldn't mind.

Agreed, except that eight-bit cleanliness on its own is not likely to be
sufficient in the long term: ISO is, believe it or not, working on a
``multi-octet coded character set'' (draft international standard 10646),
which is 32 bits wide (although they've only worked out what to do with 24
of them so far).  Reading it, it seems that the cleanest way to implement
anything which processes data (as opposed to communicating or storing data)
is to have a module which delivers a constant-width 32-bit version of its
input.  This allows transmission and storage to use more compact forms
laced with nasty things like shift sequences, while hiding these details
from most programs.  Thirty-two bit cleanliness, anyone?
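
To make the idea concrete, here is a minimal sketch of such a module.  It
rests on assumptions that are not part of the draft standard: Python's
built-in iso2022_jp codec stands in for a compact form laced with shift
sequences, Unicode code points stand in for the constant-width 32-bit
form, and the function name decode_to_fixed_width is hypothetical.

```python
import codecs


def decode_to_fixed_width(chunks, encoding="iso2022_jp"):
    """Turn an iterable of byte chunks into a list of code values.

    Each value fits in 32 bits.  The incremental decoder remembers the
    current shift state and any partial character between chunks, so
    the caller never sees escape sequences or split multi-byte pairs.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    units = []
    for chunk in chunks:
        units.extend(ord(ch) for ch in decoder.decode(chunk))
    # Flush the decoder; raises if the input ended mid-character.
    units.extend(ord(ch) for ch in decoder.decode(b"", final=True))
    return units


# Compact form: ASCII, shift to a two-byte set, two characters, shift back.
packed = "A\u65e5\u672cB".encode("iso2022_jp")

# Feed it in two pieces, deliberately cutting a two-byte character in half.
units = decode_to_fixed_width([packed[:5], packed[5:]])
print([hex(u) for u in units])
```

The program that consumes the list sees one value per character and needs
no knowledge of how the data was stored or transmitted.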

Those still feeling parochial may care to reflect on the fact that serious
users of troff regularly exceed the bounds of an eight bit character set,
never mind seven, even when writing English, just by using \(bu, \(34, and
so on.  _Wouldn't it be nice_ to get away from all those illegible escape
sequences?  (Of course, we won't: troff is itself a de facto standard, and
so we're stuck with it the way it is -- although see _An Extension to the
troff Character Set for Europe_ by Keizer, Simonsen and Akkerhuis in the
Summer 1989 European UNIX systems User Group Newsletter.)

Seriously, though, multi-byte character sets are a hot and knotty topic
for both UNIX and C.  If you're in the least bit interested, watch
comp.std.internat.
-- 
Dominic Dunlop
The Standard Answer Ltd., using Sphinx' facilities (for which much thanks)
domo@sphinx.co.uk