chris@umcp-cs.UUCP (05/19/84)
Personally, I'd vote for accents \'a l\`a \TeX. (Forgive me if I've got the accents wrong here; I don't really know French, and I've just spent 15 minutes trying to find the right ones....) In \TeX, the following accents are available:

    \`   grave
    \'   acute
    \^   circumflex
    \"   umlaut or dieresis
    \~   tilde
    \=   macron (bar)
    \.   dot
    \u   breve (the little ``u'')
    \v   h\'a\v cek (the little ``v'')
    \H   long Hungarian umlaut
    \t   tie-after accent (I've no idea what this one is for)

and these, which go underneath:

    \c   cedilla (``Fran\c caise'')
    \d   dot, but underneath
    \b   bar, but underneath

Then there are some special letters:

    \oe  oe ligature (there is also \OE for uppercase)
    \ae  ae ligature (and \AE for uppercase)
    \aa  a with circle (``\AA ngstrom'')
    \o   o with slash (and \O for uppercase)
    \l   Polish suppressed-L (a small stroke through the middle; there is also \L for an uppercase version)
    \ss  German ``Eszett'' or sharp S

That covers everything I've ever seen, and a few I've never seen. I have no idea, though, whether it covers all the characters of the European languages that use the Roman alphabet.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP: {seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: chris@maryland
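Chris's list maps almost one-to-one onto the combining marks of modern Unicode, which did not exist in 1984. The following is a minimal Python sketch that converts the TeX spellings into Unicode text. The code-point choices are ours and not part of TeX. The sketch also handles only a single letter after each accent, and it leaves out \t.

```python
import re
import unicodedata

# Our own mapping from TeX accent commands to Unicode combining marks
# (\t, the tie accent, is left out of this sketch).
ACCENTS = {
    "`": "\u0300",  # grave
    "'": "\u0301",  # acute
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # umlaut or dieresis
    "~": "\u0303",  # tilde
    "=": "\u0304",  # macron (bar)
    ".": "\u0307",  # dot
    "u": "\u0306",  # breve
    "v": "\u030C",  # hacek (caron)
    "H": "\u030B",  # long Hungarian umlaut (double acute)
    "c": "\u0327",  # cedilla
    "d": "\u0323",  # dot underneath
    "b": "\u0331",  # bar underneath
}

# Special letters; longer names come first so "\oe" wins over "\o".
LETTERS = {
    "oe": "œ", "OE": "Œ", "ae": "æ", "AE": "Æ", "aa": "å", "AA": "Å",
    "ss": "ß", "o": "ø", "O": "Ø", "l": "ł", "L": "Ł",
}

# Symbol accents attach directly (\'a) or take a braced letter (\'{a}).
SYMBOL_ACCENT = re.compile(r"""\\([`'^"~=.])(?:([A-Za-z])|\{([A-Za-z])\})""")
# Letter accents need a space (\v c) or braces (\v{c}), as in TeX.
LETTER_ACCENT = re.compile(r"\\([uvHcdb])(?: +([A-Za-z])|\{([A-Za-z])\})")
# A special letter ends at a non-letter; one following space is swallowed.
SPECIAL_LETTER = re.compile(r"\\(" + "|".join(LETTERS) + r")(?![A-Za-z]) ?")


def _accent(match):
    command = match.group(1)
    letter = match.group(2) or match.group(3)
    # Base letter + combining mark, composed into one code point where Unicode has one.
    return unicodedata.normalize("NFC", letter + ACCENTS[command])


def tex_to_unicode(text):
    text = SYMBOL_ACCENT.sub(_accent, text)
    text = LETTER_ACCENT.sub(_accent, text)
    return SPECIAL_LETTER.sub(lambda m: LETTERS[m.group(1)], text)
```

For example, `tex_to_unicode(r"h\'a\v cek")` gives "háček". Letter accents require a space or braces here because TeX would otherwise read `\vc` as a single control word.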
otto@whuxle.UUCP (George V.E. Otto) (05/20/84)
The German solution to this problem is, I believe, to use the older spellings for names with "funny" (i.e., national) characters in them. Hence umlauted "o" is spelled "oe" when typed on a typewriter without national characters. Using this scheme, the title of Hofstadter's book would be typed "Goedel, Escher, Bach." I think the same scheme should be used with other national characters. This is easy when there is a history connecting letter combinations to national characters. I don't know whether this is possible with *all* national characters, however. Even if it is, some sort of lexicon would have to be published on the net to tell everyone which conventions are being followed.

George Otto
AT&T Bell Labs, Whippany
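Otto's scheme amounts to a one-way table lookup. Below is a minimal Python sketch that assumes the usual German substitutions (ä -> ae, ö -> oe, ü -> ue, ß -> ss). The table and function names are hypothetical.

```python
# Typewriter fallbacks for the German national letters; other languages
# would need their own entries in the "lexicon" the post asks for.
GERMAN_FALLBACK = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def to_typewriter_german(text):
    """Replace each German national letter with its two-letter spelling."""
    return text.translate(GERMAN_FALLBACK)
```

For example, `to_typewriter_german("Gödel, Escher, Bach")` returns "Goedel, Escher, Bach". The mapping cannot be reversed mechanically, because names such as "Goethe" are spelled with a genuine "oe".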
davew@shark.UUCP (05/21/84)
Goran brings up a problem that really has not been addressed yet by the Unix community: what do you do when you need to design a keyboard for a language whose alphabet is larger than English's? In many national character sets, the extra letters take over the ASCII codes of symbols such as #, $, @, [, ], {, }, \, | and/or ~. All of these symbols have important functions in Unix, in C, and in editors such as ed, ex and vi.

A user who wishes to use his or her national language is therefore faced with a problem. With a keyboard designed for that language, every command and every piece of code that uses the above symbols must be typed with the national characters that have replaced them. When posting to the net, spellings must be transliterated so that a name like Bj(../o)rn does not show up as Bj|rn.

One way around this is a keyboard that can be shifted in and out of the extended character set, so that both the extra characters and the standard symbols can be used. The news network would then have to detect whether the keyboard was shifted in or out of the extended mode. What seems to be required is some sort of standard to accomplish this. Without it, I feel Unix will meet a lot of resistance in Europe, particularly in the business and government communities.

Any other thoughts on the subject out there?

Dave Williams
Tektronix, Inc. ECS
(Unix is a trademark of Bell Laboratories.)
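One way to picture the shift-in/shift-out idea is a decoder that switches tables on control characters. The following is a minimal Python sketch built on two assumptions, both ours and not stated in the post:

- ASCII's Shift Out (0x0E) and Shift In (0x0F) mark the mode changes.
- The national table is the Swedish 7-bit one, which puts Ä Ö Å ä ö å on [ \ ] { | }.

```python
# Swedish 7-bit national letters (assumption: only these six positions
# change; everything else keeps its ASCII meaning).
SWEDISH = {"[": "Ä", "\\": "Ö", "]": "Å", "{": "ä", "|": "ö", "}": "å"}

SO, SI = "\x0e", "\x0f"  # ASCII Shift Out / Shift In control characters


def decode_shifted(text):
    """Read national letters between SO and SI, plain ASCII elsewhere."""
    out = []
    national = False  # start in standard ASCII mode
    for ch in text:
        if ch == SO:
            national = True
        elif ch == SI:
            national = False
        elif national:
            out.append(SWEDISH.get(ch, ch))
        else:
            out.append(ch)
    return "".join(out)
```

With this scheme, a name and a line of C can share one message: `decode_shifted("Bj\x0e|\x0frn: if (a[i] | b)")` gives "Björn: if (a[i] | b)". The cost is that every reader must know the table that applies inside the shifted region.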
leif@erix.UUCP (Leif Samuelsson) (05/22/84)
Well, it seems that Chris has given us a veritable sm\"org\aasbord of characters to choose from. (Or sm|rg}sbord as we write it in "swescii"). Now it seems that we are only an \AAngstr\"om away from an international standard :-)

Leif Samuelsson
..{decvax, philabs}!mcvax!enea!erix!leif
piet@mcvax.UUCP (Piet Beertema) (05/22/84)
<...>
>Is this a "new" problem or has it been discussed before...

As far as I know it has been discussed before, but not on the net, i.e. not in public. I know the problem too well from my attempts to keep the EUNET map up to date, which only a few days ago resulted in a fierce discussion with Denmark.

>It may be better to use some ascii-adapted spelling rather
>than take the risk that someone out there throws his terminal
>or system out of the window because he/she thinks it malfunctions
>when strange characters appear in names.

I agree, but not for that reason. What would you say if a (postal) address were entered in the maps as "L}d|r\" (assuming for a moment that the place exists)? Would you really believe any postal service could interpret it and deliver mail correctly?

>I am getting used to having my name spelt in strange ways anyhow
>(not to mention pronunciation)...
>What translation rule should be used, similar appearance or
>pronunciation, or perhaps the evolution of the letter?

Correct me if I'm wrong, but as far as I know there are "official", or at least commonly used, expansions for certain letters. For example, I've commonly seen the '}' in your name written as 'aa' (especially 'Aa' when capitalized) and the '|' as 'oe'; the latter also holds for German. And don't worry about mispronunciation; that will happen anyway. After all, who would ever think of pronouncing the first 'G' in your name as a 'J'? Even that is only an approximation, since the right sound doesn't exist in English. So I would say: stick to the "standard" alphabet. Otherwise, what would you expect from Greek sites (or even Russian ones...)? They use a transcription into the standard alphabet, e.g. 'omega' -> 'w', which works very well.
--
Piet Beertema, CWI, Amsterdam
...{decvax,philabs}!mcvax!piet
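Piet's expansions can be applied mechanically. The following Python sketch assumes 'aa' for å, 'ae' for ä and æ, and 'oe' for ö and ø. The rule for all-capital words (write 'AA' rather than 'Aa') is our own addition.

```python
import re

# Two-letter expansions; the capital forms assume a mixed-case word.
EXPANSIONS = {
    "å": "aa", "Å": "Aa",
    "ä": "ae", "Ä": "Ae",
    "æ": "ae", "Æ": "Ae",
    "ö": "oe", "Ö": "Oe",
    "ø": "oe", "Ø": "Oe",
}


def _expand_word(match):
    word = match.group(0)
    # In an all-capitals word (e.g. a map entry), write "AA" rather than "Aa".
    shout = len(word) > 1 and word.isupper()
    out = []
    for ch in word:
        rep = EXPANSIONS.get(ch, ch)
        out.append(rep.upper() if shout and ch in EXPANSIONS else rep)
    return "".join(out)


def expand(text):
    """Spell Scandinavian letters in the standard alphabet, word by word."""
    return re.sub(r"\w+", _expand_word, text)
```

For example, `expand("Ångström")` gives "Aangstroem", and `expand("ÅNGSTRÖM")` gives "AANGSTROEM". Working word by word lets the case of the whole word decide the case of the expansion.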
dan@ttdsv.UUCP (Dan Sahlin) (05/24/84)
There IS a standard for representing national characters in a way that keeps ASCII as a true subset: ISO 6937/2-1983. The Teletex and Videotex standards are also subsets of it. The code is essentially an 8-bit code with 328 different characters! Most characters consist of just one byte, but characters with a diacritical mark consist of two bytes: the first byte is a "non-spacing" diacritical mark that modifies the following byte. I have written an editor using this standard, and I find it very convenient.

All languages in Europe, except English, need something more than ASCII in order to be written properly. So far, each country has solved this problem in its own way by replacing the characters "{}[]|\" with national characters. Let's hope for something better in the future!

Dan Sahlin (decvax!mcvax!enea!ttds!dan)
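To make Dan's two-byte scheme concrete, here is a minimal Python decoder sketch. It makes three assumptions:

- It handles only the non-spacing diacritic column (0xC1-0xCF) followed by an ASCII letter.
- The byte-to-accent assignments come from our reading of ISO 6937 and should be checked against the standard.
- Every other byte above 0x7F becomes U+FFFD.

```python
import unicodedata

# Hypothetical subset: ISO 6937 non-spacing diacritics mapped to Unicode
# combining marks (verify the byte values against the standard).
DIACRITICS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC5: "\u0304",  # macron
    0xC6: "\u0306",  # breve
    0xC7: "\u0307",  # dot above
    0xC8: "\u0308",  # umlaut / dieresis
    0xCA: "\u030A",  # ring above
    0xCB: "\u0327",  # cedilla
    0xCD: "\u030B",  # double acute
    0xCE: "\u0328",  # ogonek
    0xCF: "\u030C",  # hacek (caron)
}


def decode_6937_subset(data):
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))  # ASCII is a true subset
            i += 1
        elif b in DIACRITICS and i + 1 < len(data) and data[i + 1] < 0x80:
            # The mark precedes the letter in ISO 6937 but follows it in Unicode.
            base = chr(data[i + 1])
            out.append(unicodedata.normalize("NFC", base + DIACRITICS[b]))
            i += 2
        else:
            out.append("\ufffd")  # outside this sketch's subset
            i += 1
    return "".join(out)
```

For example, `decode_6937_subset(b"Bj\xc8orn")` gives "Björn". ASCII text such as C source passes through unchanged, which is the property Dan highlights.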
piet@mcvax.UUCP (Piet Beertema) (05/25/84)
<...>
Even if there is a standard, can you expect every Usenet reader to know it? Or, even worse, to know all the standards for all the languages used in the different countries hooked up to Usenet (Greece, Korea, maybe Japan in the near future)?
--
Piet Beertema, CWI, Amsterdam
...{decvax,philabs}!mcvax!piet
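Piet's worry already shows up with the 7-bit national variants alone: the same bytes read differently depending on which table the reader assumes. The following Python sketch includes only the six replaced letters of the Swedish and the Danish/Norwegian variants. It assumes all other positions keep their ASCII meaning.

```python
# Letters only; other positions are assumed to keep their ASCII meaning.
VARIANTS = {
    "swedish": {"[": "Ä", "\\": "Ö", "]": "Å", "{": "ä", "|": "ö", "}": "å"},
    "danish": {"[": "Æ", "\\": "Ø", "]": "Å", "{": "æ", "|": "ø", "}": "å"},
}


def read_as(text, variant):
    """Interpret 7-bit text under the given national variant."""
    table = VARIANTS[variant]
    return "".join(table.get(ch, ch) for ch in text)
```

The bytes "n{r" read as "när" under the Swedish table and as "nær" under the Danish one. Nothing in the text itself says which reading is intended.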