[comp.std.internat] non-English characters + OS's supporting them.

aj@zyx.UUCP (Arndt Jonasson) (08/23/87)

Below is a table of non-English characters used in all European
languages that I was able to look up.

					a e i o u y  c d g l n r s t z

'  accent aigu				x x x x x x  x       x   x   x
`  accent grave				x x x x x
^  accent circonflexe			x x x x x
   bar above				x x x x x
   cedilla (hook below)			x x x   x    x   x x x x x x
   trema/diaeresis (two dots above)	x x x x x x
   circle above				x       x
   tilde (wave-form above)		x     x              x
'' double accent aigu			      x x
   half-circle above (curved downwards)	x                x
   hacek (^ up-side down)		  x          x x     x x x x x
   single dot above			  x
   no dot above ('anti-accent')		    x
   short bar through the middle		               x   x

					a e i o u y  c d g l n r s t z

Additionally, we have these characters:

   ae written together
   oe written together
   Danish o with a slash through it
   Icelandic thorn
   Icelandic curved d with a short bar through it
   German scharf-s/ess-zett

English alphabet	26
The above		69
			--
Sum			95 * 2 (for capitals) = 190

[To forestall possible nit-picking: I know that this calculation is
not 100% correct.]
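
As a cross-check on the tally, here is a minimal Python sketch that counts the marks in the table as printed and adds the six extra characters. The row labels and the names TABLE_ROWS, ADDITIONAL and tally are illustrative and not part of the original post; the mark strings are copied from the table rows.

```python
# Marks per diacritic row, copied from the table (labels shortened).
# Only the number of "x" marks matters here, not the column positions.
TABLE_ROWS = {
    "acute":         "x x x x x x  x       x   x   x",
    "grave":         "x x x x x",
    "circumflex":    "x x x x x",
    "bar above":     "x x x x x",
    "cedilla":       "x x x   x    x   x x x x x x",
    "diaeresis":     "x x x x x x",
    "circle above":  "x       x",
    "tilde":         "x     x              x",
    "double acute":  "x x",
    "half-circle":   "x                x",
    "hacek":         "x          x x     x x x x x",
    "single dot":    "x",
    "no dot":        "x",
    "short bar":     "x   x",
}

# The six characters listed separately: ae, oe, slashed o, thorn,
# the barred d (eth), and the German sharp s.
ADDITIONAL = ["ae", "oe", "o-slash", "thorn", "eth", "sharp-s"]

ENGLISH_LETTERS = 26


def tally():
    """Return (accented, non_english, lower_case_total, with_capitals)."""
    accented = sum(row.count("x") for row in TABLE_ROWS.values())
    non_english = accented + len(ADDITIONAL)
    lower_case_total = ENGLISH_LETTERS + non_english
    return accented, non_english, lower_case_total, 2 * lower_case_total


print(tally())  # (63, 69, 95, 190)
```

Doubling for capitals is the same simplification the post makes; it ignores letters that have no separate capital form.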

If someone finds some letter of his/her native language to be missing
in the above table, I would be glad to know about it.


I have seen no article mentioning actual implementations which address
these problems. In at least two operating systems, the effort has been
made to make it possible for programs to use the user's native
language. Are there more?

HP's Unix operating system HP-UX provides library routines that make
it possible to write programs that allow the user to receive messages
in his/her native tongue. This includes date, time and currency having
the correct format.

To the ordinary user, this amounts to setting the LANG environment
variable to, say, swedish, whereafter most system programs will give
their messages in Swedish.
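
The general idea of selecting messages and formats from LANG can be sketched in Python as below. This is not the HP-UX library interface, which the post does not describe; CATALOGS, catalog_for, message and format_date are hypothetical names, and the message texts and date formats are illustrative assumptions.

```python
import datetime

# Hypothetical per-language tables; names and contents are illustrative only.
CATALOGS = {
    "english": {"not_found": "File not found", "date_format": "%m/%d/%y"},
    "swedish": {"not_found": "Filen hittades inte", "date_format": "%Y-%m-%d"},
}


def catalog_for(environ):
    """Pick a catalog from LANG, falling back to English if unset or unknown."""
    return CATALOGS.get(environ.get("LANG", "english"), CATALOGS["english"])


def message(environ, key):
    """Look up a message text in the user's language."""
    return catalog_for(environ)[key]


def format_date(environ, date):
    """Format a date according to the conventions of the user's language."""
    return date.strftime(catalog_for(environ)["date_format"])


posted = datetime.date(1987, 8, 23)
print(message({"LANG": "swedish"}, "not_found"))  # Filen hittades inte
print(format_date({"LANG": "swedish"}, posted))   # 1987-08-23
print(format_date({}, posted))                    # 08/23/87
```

Passing the environment as a plain dictionary keeps the sketch testable; a real program would pass os.environ.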

The default font, called Roman8, to my knowledge supports all
languages of western Europe (except Icelandic; one single letter is
missing). I don't know if Roman8 is identical or similar to Latin1.

[Minor point of interest: the character [s hacek] is present in the
font. What use does it have?]

Other fonts are available, e.g. Katakana, Greek (I don't know about
Eastern European languages or the Cyrillic alphabet). Kanji is
supported as well, but we don't have it, so I can't say how, e.g.,
input and output are performed in Kanji.

The problems of sorting are dealt with by having strcmp8 and strcmp16
routines which use their first argument to determine what collating
sequence to use.
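
To illustrate what a selectable collating sequence buys, here is a minimal Python sketch of a comparison driven by a per-language ordering table. It is not the strcmp8/strcmp16 interface, whose exact signatures the post does not give; SEQUENCES, collation_key and compare are hypothetical names.

```python
# Hypothetical collating sequences. Swedish places its three extra letters
# after z, in the order a-ring, a-diaeresis, o-diaeresis.
SEQUENCES = {
    "english": "abcdefghijklmnopqrstuvwxyz",
    "swedish": "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6",
}


def collation_key(sequence_name, text):
    """Map text to a list of ranks in the named collating sequence."""
    order = {ch: rank for rank, ch in enumerate(SEQUENCES[sequence_name])}
    # Characters outside the sequence sort after all known ones, by code point.
    return [(0, order[ch]) if ch in order else (1, ord(ch))
            for ch in text.lower()]


def compare(sequence_name, a, b):
    """strcmp-style result: negative, zero or positive."""
    key_a = collation_key(sequence_name, a)
    key_b = collation_key(sequence_name, b)
    return (key_a > key_b) - (key_a < key_b)


words = ["\u00e4pple", "\u00e5r", "zon"]
print(sorted(words, key=lambda w: collation_key("swedish", w)))
# ['zon', 'år', 'äpple']
```

A plain code-point sort of the same list would put "äpple" before "år", which is the wrong order for Swedish; the table makes the language's own alphabet the authority.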

Still lacking is a good editor that knows about national characters
and handles them correctly.


The OS for the Apple Macintosh doesn't provide for Kanji (yet), but
supports about the same character set as HP-UX for the Western
European languages, as well as the correct date, time etc. format.

Editors for the Macintosh mostly handle non-English characters
splendidly.

-- 
Arndt Jonasson, ZYX Sweden AB, Styrmansgatan 6, 114 54 Stockholm, Sweden
Mail address:	 ...!seismo!mcvax!zyx!aj	=	aj@zyx.SE

john@frog.UUCP (08/28/87)

In article <1274@zyx.UUCP>, aj@zyx.UUCP (Arndt Jonasson) writes:
> 
> Below is a table of non-English characters used in all European
> languages that I was able to look up...
> If someone finds some letter of his/her native language to be missing
> in the above table, I would be glad to know about it.
> 
Just a minor point:  it isn't anyone's NATIVE language, but Esperanto adds:

> 					a e i o u y  c d g h j l n r s t z
> ^  accent circonflexe			x x x x x    X   X X X       X
> 

In the use I've seen, these sort immediately after their "vanilla" versions,
e.g., c then ^c then d ...
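
A minimal Python sketch of that ordering, using the same ASCII notation as above ("^" written before the letter it modifies). The function name esperanto_key is hypothetical, and the rule is only the one stated here: a circumflexed letter sorts right after its plain version.

```python
def esperanto_key(word):
    """Sort key for words written with '^' before a circumflexed letter.

    Each letter becomes a (base_letter, mark) pair, with mark 0 for a plain
    letter and 1 for a circumflexed one, so '^c' lands between 'c' and 'd'.
    """
    key = []
    marked = False
    for ch in word.lower():
        if ch == "^":
            marked = True      # applies to the next letter
            continue
        key.append((ch, 1 if marked else 0))
        marked = False
    return key


words = ["dek", "^car", "cent", "cirklo"]
print(sorted(words, key=esperanto_key))
# ['cent', 'cirklo', '^car', 'dek']
```

A plain ASCII sort would put "^car" first, since "^" precedes the lowercase letters.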

--
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, ...!mit-eddie!jfw, jfw@eddie.mit.edu

ROBOTS!!  Underpaid robots from hyper-space INSPECTED my DELICATE WASHABLES!!

bas+@andrew.cmu.edu (Bruce Sherwood) (09/04/87)

> Just a minor point:  it isn't anyone's NATIVE language, but Esperanto adds:

Just a minor point, but in fact Esperanto IS the native language of some
hundreds of people.  I've met some of them.  What happens is that a couple
meets thru Esperanto activities (conferences, pen pals, whatever), has no
language other than Esperanto in common, marries, continues to speak
Esperanto at home (since they have affectionate ties to the language and for
a time still have no other language in common), and the children learn
Esperanto as their first language.  Of course these children soon become
multilingual, learning the language of the community.

After this brief interruption, back to the diacritic issue.  ISO 6937 gives a
complete list of the combinations of letters plus diacritics, and of the
special characters, which are used in 41 roman-letter alphabets (including
Esperanto).
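
The "letter plus diacritic" view can be tried out with Python's unicodedata module, as sketched below. Note the assumption: this uses Unicode canonical decomposition, a later standard that is not mentioned in this thread, as a stand-in for the kind of list ISO 6937 gives. The function name decompose is hypothetical.

```python
import unicodedata


def decompose(char):
    """Split one character into (base letter, list of diacritic names).

    Uses Unicode canonical decomposition (NFD). Characters that are letters
    in their own right, such as the slashed o, come back with no diacritics.
    """
    parts = unicodedata.normalize("NFD", char)
    base = parts[0]
    marks = [unicodedata.name(mark) for mark in parts[1:]]
    return base, marks


print(decompose("\u0109"))  # ('c', ['COMBINING CIRCUMFLEX ACCENT'])
print(decompose("\u00e5"))  # ('a', ['COMBINING RING ABOVE'])
print(decompose("\u00f8"))  # ('ø', [])
```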

Bruce Sherwood