[net.followup] Use of national characters in names

chris@umcp-cs.UUCP (05/19/84)

Personally, I'd vote for accents \`a la \TeX.  (Forgive me if I've
got the accents wrong here, I don't really know French and I've just
spent 15 minutes trying to find the right ones....)  In \TeX, the
following accents are available:

	\`	grave
	\'	acute
	\^	circumflex
	\"	umlaut or dieresis
	\~	tilde
	\=	macron (bar)
	\.	dot
	\u	breve (the little ``u'')
	\v	h\'a\v cek (the little ``v'')
	\H	long Hungarian umlaut
	\t	tie-after accent (I've no idea what this one is for)

and these which go underneath:

	\c	cedilla (``Fran\c caise'')
	\d	dot, but underneath
	\b	bar, but underneath

Then there are some special letters:

	\oe	oe ligature (there is also \OE for uppercase)
	\ae	ae ligature (and \AE for uppercase)
	\aa	a with circle (``\AA ngstrom'')
	\o	o with slash (and \O for uppercase)
	\l	Polish suppressed-L (a small stroke through the middle;
		there is also \L for an uppercase version)
	\ss	German ``es-zet'' or sharp S

That covers everything I've ever seen, and a few I've never seen,
though I have no idea if it covers all the European characters in
languages using Roman alphabets.
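
A minimal sketch of how a news reader might expand the notation listed
above, assuming Python and Unicode combining marks (neither is part of the
original proposal).  The names ACCENTS, LETTERS and tex_to_unicode are
hypothetical.  Only the bare "\'a" and "\v c" forms are handled; braces,
the \t tie and dotless i are left out.

```python
import re
import unicodedata

# TeX accent command -> Unicode combining mark (assumed mapping).
ACCENTS = {
    "`": "\u0300",  # grave
    "'": "\u0301",  # acute
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # umlaut or dieresis
    "~": "\u0303",  # tilde
    "=": "\u0304",  # macron (bar)
    ".": "\u0307",  # dot
    "u": "\u0306",  # breve
    "v": "\u030C",  # hacek
    "H": "\u030B",  # long Hungarian umlaut
    "c": "\u0327",  # cedilla
    "d": "\u0323",  # dot underneath
    "b": "\u0331",  # bar underneath
}

# Special letters from the list above.
LETTERS = {
    "oe": "\u0153", "OE": "\u0152", "ae": "\u00E6", "AE": "\u00C6",
    "aa": "\u00E5", "AA": "\u00C5", "o": "\u00F8", "O": "\u00D8",
    "l": "\u0142", "L": "\u0141", "ss": "\u00DF",
}

TOKEN = re.compile(
    r"\\(?:"
    r"(oe|OE|ae|AE|aa|AA|ss|o|O|l|L)(?![A-Za-z]) ?"  # special letter
    r"|([`'^\"~=.])\s*([A-Za-z])"                    # symbol accent + letter
    r"|([uvHcdb])\s+([A-Za-z])"                      # letter accent, space, letter
    r")"
)

def tex_to_unicode(text):
    """Replace the TeX accent notation in text with accented characters."""
    def expand(match):
        if match.group(1):
            return LETTERS[match.group(1)]
        accent = match.group(2) or match.group(4)
        base = match.group(3) or match.group(5)
        # The combining mark follows the base letter; NFC composes the pair.
        return unicodedata.normalize("NFC", base + ACCENTS[accent])
    return TOKEN.sub(expand, text)

print(tex_to_unicode(r"h\'a\v cek, Fran\c caise, \AA ngstrom"))
```

As in TeX itself, a special letter such as \aa must be followed by a space
or other non-letter, so a run-together "\aasbord" is left untouched.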
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

otto@whuxle.UUCP (George V.E. Otto) (05/20/84)

The German solution to this problem is, I believe, to use the older
spellings for names with "funny" (i.e., national) characters in them.  Hence
umlauted "o" is spelled "oe" when typed on a typewriter without national
characters.  Using this scheme, the name of Hofstadter's book would be typed
"Goedel, Escher, Bach."  I think the same scheme should be used with other
national characters.  This can be done easily when there is a history
connecting letter combinations to national characters.  I don't know if this
is possible with *all* national characters, however.

Even if so, there would have to be some sort of lexicon published on the net
to inform everyone of the conventions being followed.
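
Such a lexicon could be as small as a lookup table.  The following is only
a sketch in Python: the German entries follow the older spellings mentioned
above, and the table name FALLBACK and the function to_ascii_spelling are
hypothetical.  It assumes the input uses precomposed characters.

```python
# National character -> conventional ASCII spelling (German entries only;
# other languages would need their own agreed-upon rows).
FALLBACK = {
    "\u00E4": "ae", "\u00F6": "oe", "\u00FC": "ue",  # a, o, u with umlaut
    "\u00C4": "Ae", "\u00D6": "Oe", "\u00DC": "Ue",  # uppercase forms
    "\u00DF": "ss",                                  # sharp S
}

def to_ascii_spelling(name):
    """Spell a name with letter combinations instead of national characters."""
    return "".join(FALLBACK.get(ch, ch) for ch in name)

print(to_ascii_spelling("G\u00F6del, Escher, Bach"))
```

Characters with no row in the table pass through unchanged, which makes
gaps in the lexicon easy to spot.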

					George Otto
					AT&T Bell Labs, Whippany
					------------------------

davew@shark.UUCP (05/21/84)

 Goran brings up a problem that really has not been addressed
yet by the Unix community: what do you do when you need to
design a keyboard for a language whose character set is
larger than that of English? In many applications the ASCII codes
are stolen from symbols such as #, $, @, [, ], {, },
\, |, and/or ~. All these symbols have important functions
in Unix, C or file editors such as ed, ex or vi. The user
who wishes to use his or her national language is faced with a
problem. On a keyboard designed for that language,
all the commands and code using the above symbols must be
typed with the national characters that have replaced them.
 When using the net, spelling must be transliterated so
that a name like Bj(../o)rn does not show up as Bj|rn.
One way around this is to have a keyboard that could be
shifted in and out of the extended character set so that
both extra characters and the standard symbols could be used.
The news network would have to detect if the keyboard were
shifted in or out of the extended mode. What seems to be
required is some sort of standard to accomplish this.
Without this ability I feel Unix will meet with a lot
of resistance to its usage in Europe, particularly
among the business and governmental communities.
Any other thoughts on the subject out there?
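
To make the collision concrete, here is a small Python sketch of one such
national 7-bit code.  It assumes the Swedish assignment, in which the six
codes for [ \ ] { | } carry the letters A-umlaut, O-umlaut, A-ring and
their lowercase forms; the names SWEDISH_7BIT and show_as_swedish are
hypothetical.

```python
# ASCII code positions reused by the (assumed) Swedish 7-bit variant.
SWEDISH_7BIT = str.maketrans(
    "[\\]{|}",
    "\u00C4\u00D6\u00C5\u00E4\u00F6\u00E5",
)

def show_as_swedish(text):
    """Render 7-bit text the way a Swedish terminal would display it."""
    return text.translate(SWEDISH_7BIT)

# A name typed on a Swedish keyboard reads correctly there...
print(show_as_swedish("Bj|rn"))
# ...but a line of C is displayed with letters in place of its symbols.
print(show_as_swedish("if (a[i] | b) { x = 1; }"))
```

The same bytes mean two different things, and nothing in the text itself
says which reading was intended; that is the information a shift in/out
standard would have to carry.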
                             Dave Williams
                             Tektronix, Inc.
                             ECS

 (Unix is a trademark of Bell Laboratories.)

leif@erix.UUCP (Leif Samuelsson) (05/22/84)

Well, it seems that Chris has given us a veritable
sm\"org\aasbord of characters to choose from.
(Or sm|rg}sbord as we write it in "swescii"). 

Now it seems that we are only an \AAngstr\"om away
from an international standard :-)



	Leif Samuelsson

	..{decvax, philabs}!mcvax!enea!erix!leif

piet@mcvax.UUCP (Piet Beertema) (05/22/84)

<...>

	>Is this a "new" problem or has it been discussed before...
As far as I know it has been discussed before, but not on the net, i.e.
not in public. I know the problem too well from my attempts to keep
the EUNET map up-to-date, resulting only a few days ago in a fierce
discussion with Denmark.

	>It may be better to use some ascii-adapted spelling rather
	>than take the risk that someone out there throws his terminal
	>or system out of the window because he/she thinks it malfunctions
	>when strange characters appear in names.
I agree, but not for that reason. What would you say if a (postal)
address were entered in the maps like "L}d|r\" (assuming for a while the
place exists)? Would you really believe any postal service would be able
to interpret it and deliver the mail correctly?

	>I am getting used to have my name spelt in strange ways anyhow
	>(not to mention pronunciation)...
	>What translation rule should be used, similar appearance or
	>pronunciation, or perhaps the evolution of the letter?
Correct me if I'm wrong, but as far as I know there are "official" or
at least commonly used expansions for certain letters; e.g. the '}' in your
name I've commonly seen as 'aa' (especially in combination with a capital,
as 'Aa'); and the '|' as 'oe'; the latter also goes for German. And don't
worry about mispronunciation; that will happen anyway. After all, who
would ever think of pronouncing the first 'G' in your name as
a 'J' (and even that is only an approximation; the right pronunciation
doesn't exist in English)?

So I would say that you should stick to the "standard" alphabet. Otherwise
what would you expect from Greek sites (or even Russian...)? They are using
a transcription to the standard alphabet, e.g. 'omega' -> 'w', which works
very well.
-- 
	Piet Beertema, CWI, Amsterdam
	...{decvax,philabs}!mcvax!piet

dan@ttdsv.UUCP (Dan Sahlin) (05/24/84)

There IS a standard on how to represent the national characters
in a way that makes ASCII a true subset.
Also the Teletex and the Videotex standards are subsets of this
standard (ISO 6937/2-1983).
The code is essentially an 8-bit code with 328 different characters!
Most characters consist of just one byte, but the characters having
a diacritical mark consist of two bytes. The first byte is a "non-spacing"
diacritical mark affecting the next byte.
I have written an editor using this standard, and I find it very
convenient.
All languages in Europe, except English, need something more than ASCII
in order to be written properly. So far, each country has solved
this problem in its own way by replacing the characters "{}[]|\" with
national characters. Let's hope for something better in the future!
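
The two-byte scheme can be sketched as follows in Python.  This is an
illustration under stated assumptions, not an implementation of the
standard: the byte values in NONSPACING are quoted from memory and should
be checked against ISO 6937 itself, only ASCII base letters are handled,
and the names NONSPACING and decode are hypothetical.

```python
import unicodedata

# Non-spacing diacritical byte -> Unicode combining mark (assumed values).
NONSPACING = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC5: "\u0304",  # macron
    0xC6: "\u0306",  # breve
    0xC7: "\u0307",  # dot above
    0xC8: "\u0308",  # umlaut or dieresis
    0xCA: "\u030A",  # ring
    0xCB: "\u0327",  # cedilla
    0xCD: "\u030B",  # double acute
    0xCE: "\u0328",  # ogonek
    0xCF: "\u030C",  # caron
}

def decode(data):
    """Decode bytes in which a diacritical byte precedes its base letter."""
    out = []
    pending = None  # combining mark waiting for the next base letter
    for byte in data:
        if byte in NONSPACING:
            pending = NONSPACING[byte]
        elif byte < 0x80:
            ch = chr(byte)
            if pending is not None:
                ch += pending  # Unicode puts the mark after the base
                pending = None
            out.append(ch)
        else:
            out.append("\uFFFD")  # other 8-bit characters: not covered here
            pending = None
    return unicodedata.normalize("NFC", "".join(out))

print(decode(b"Bj\xc8orn"))
```

Plain ASCII text passes through byte for byte, which is the sense in which
ASCII is a true subset of the code.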

Dan Sahlin              (decvax!mcvax!enea!ttds!dan)

piet@mcvax.UUCP (Piet Beertema) (05/25/84)

<...>

Even if there is a standard, can you expect every Usenet reader to know
it? Or even worse: to know all the standards about all the languages
in use in the different countries hooked up to Usenet (Greece, Korea,
maybe Japan in the near future)?
-- 
	Piet Beertema, CWI, Amsterdam
	...{decvax,philabs}!mcvax!piet