sommar@enea.se (Erland Sommarskog) (08/09/90)
(The first attempt didn't seem to leave our site. Apologies if you are seeing this twice.)

Uwe Geuder (geuder@informatik.uni-stuttgart.de) writes:
>From Keld J|rn Simonsen:
> I use it in email, it is built into the sendmail we use here,
> and EUnet has decided to run this on an experimental basis
> on all the backbones of EUnet.
>
>What does this mean? When I get mail from Sweden, it's still in Swedish
>ASCII (is that SSCII??), which is horrible to read on (US) ASCII devices
>used in Germany (German 7-bit Code is never used here). If I run conv SE US
>on such files they get much prettier. So I can't imagine that any host in
>between has already done it. Or is there no "EUnet backbone" between Sweden
>and Germany?

I get a little anxious here, but I may be misunderstanding something. I certainly don't want mail I send out to be automatically transformed when it goes out. Yes, I understand that occurrences of ][\}{| are not nice to read, but it seems a risky business to translate them straight off. If I use them in a non-Swedish mail, I usually explain them. After an unwanted transformation, that would look a little stupid. (And how does the machine know that I use an "[" as a dotted capital "A" and not as a left bracket?) Wouldn't it be better if this was done at the receiver's end on request?

Another question: Through a mailing-list I have indirectly received a list of two-character codes stemming from Keld Simonsen. I don't know whether it is the one we are discussing, but I would assume so. I must admit that I laid that one aside with the thought: "My God, how unreadable and what an overkill!" I suspect I have missed the point of it. Could Keld or anyone else clarify?

And a final question: we are moving into an eight-bit world. Instead of relying on old standards, why not aim to have EUnet work with ISO 8859/1 instead? (8859 is apparently already obsolete with the recent changes in Eastern Europe, but that is another matter.)
-- Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
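A minimal Python sketch of the receiver-side conversion Erland proposes, assuming the Swedish 7-bit national variant of ISO 646 puts Ä, Ö, Å on [, \, ] and ä, ö, å on {, |, } (the six characters he lists). The table and the function name `se7_to_latin1` are illustrative assumptions, not taken from any tool mentioned in the thread. Because the mapping is purely positional, it also reproduces his objection: a bracket meant as a bracket comes out as a letter.

```python
# Assumed Swedish 7-bit (ISO 646 national variant) letter positions and the
# letters they stand for; only the six positions discussed in the thread.
SE7_TO_LATIN1 = {
    "[": "Ä", "\\": "Ö", "]": "Å",
    "{": "ä", "|": "ö", "}": "å",
}


def se7_to_latin1(text: str) -> str:
    """Hypothetical converter: reinterpret Swedish 7-bit text as letters.

    The result uses only characters present in ISO 8859-1, so it can be
    written out with .encode("latin-1"). The conversion is blind to intent:
    every '[' becomes 'Ä', even when the writer meant a real left bracket.
    """
    return text.translate(str.maketrans(SE7_TO_LATIN1))


if __name__ == "__main__":
    print(se7_to_latin1("R{ksm|rg}s"))  # Räksmörgås
    print(se7_to_latin1("a[i] = 0"))    # aÄiÅ = 0  (program text is mangled)
```

Running such a conversion only on request, as Erland suggests, leaves the decision with the reader, who knows whether a given message is Swedish prose or, say, program source.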
keld@login.dkuug.dk (Keld J|rn Simonsen) (08/10/90)
sommar@enea.se (Erland Sommarskog) writes:
>Uwe Geuder (geuder@informatik.uni-stuttgart.de) writes:
>>From Keld J|rn Simonsen:
>> I use it in email, it is built into the sendmail we use here,
>> and EUnet has decided to run this on an experimental basis
>> on all the backbones of EUnet.
>>
>>What does this mean? When I get mail from Sweden, it's still in Swedish
>>ASCII (is that SSCII??), which is horrible to read on (US) ASCII devices
>>used in Germany (German 7-bit Code is never used here). If I run conv SE US
>>on such files they get much prettier. So I can't imagine that any host in
>>between has already done it. Or is there no "EUnet backbone" between Sweden
>>and Germany?
>
>I get a little anxious here, but I may be misunderstanding something.
>I certainly don't want mail I send out to be automatically
>transformed when it goes out. Yes, I understand that occurrences
>of ][\}{| are not nice to read, but it seems a risky business to
>translate them straight off. If I use them in a non-Swedish mail,
>I usually explain them. After an unwanted transformation, that
>would look a little stupid. (And how does the machine know that
>I use an "[" as a dotted capital "A" and not as a left bracket?)
>Wouldn't it be better if this was done at the receiver's end on request?

Yes, I share Erland's concerns. You cannot just translate 7-bit [\] (these 7-bit values are defined as letters in both Swedish and Danish 7-bit) to ISO 8859-1 Swedish/Danish letters. What we do at dkuug.dk (the Danish Internet backbone) is to transform both 8-bit curly braces and Scandinavian letters to 7-bit [\]. In the other direction, from 7-bit Danish or Swedish to ASCII or some 8-bit code, we normally do not touch these codes. The conversions we do here are mostly for use on 8-bit machines, some of which run ISO 8859-1 and some an IBM code page.

Doing it at the receiver's end: well, the receiver needs to know what information is in there. This information must be generated on the sender's side, since the sender is the one who knows what the message is.

>Another question: Through a mailing-list I have indirectly received
>a list of two-character codes stemming from Keld Simonsen. I don't
>know whether it is the one we are discussing, but I would assume so.
>I must admit that I laid that one aside with the thought: "My God,
>how unreadable and what an overkill!" I suspect I have missed the
>point of it. Could Keld or anyone else clarify?

Yes, I have made a quite elaborate list of character names, which is being used for mail. It is designed for worldwide use, and the world is big. There are about 940 characters in there, covering all the 7- and 8-bit character sets I know of. It does not yet contain any Japanese or Chinese characters. The character names are primarily used to identify a character and to register its properties, such as membership of a character set, or the fact that it is a lower-case character, in which case the corresponding upper-case character can be specified alongside. The names do have some mnemonic value, e.g. a with dieresis (a-umlaut) is called "a:". How readable and beautiful this is can always be discussed, but there are some rules to it which are consistently applied. The list has also been designed with short character names, to improve compactness and lower translation costs, and also to improve readability and writability.

>And a final question: we are moving into an eight-bit world. Instead
>of relying on old standards, why not aim to have EUnet work with ISO
>8859/1 instead? (8859 is apparently already obsolete with the recent
>changes in Eastern Europe, but that is another matter.)

I am collaborating with a fellow countryman of yours, Dan Oscarsson from LTH, on using the new ISO 10646 character set for email. This character set has almost all the characters in the world in a 32-bit, compactable code set.

No, ISO 8859 is not outdated. ISO 8859-2 covers Eastern Europe, and ISO 8859-5 covers Russia (Cyrillic). 8859 does not cover Japanese and other Far Eastern character sets, though. This was the reason we decided on ISO 10646.

Keld Simonsen
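Going the other way, the 8-bit to 7-bit downgrade Keld describes at dkuug.dk can be sketched roughly as follows. This is a hypothetical illustration, not the actual dkuug.dk code: it assumes the Danish 7-bit variant places Æ, Ø, Å on [, \, ] and æ, ø, å on {, |, } (his own name, written "J|rn" in these headers, is an example), and it replaces anything without a 7-bit Danish form with "?". The function name `latin1_to_dk7` is made up for the example.

```python
# Assumed Danish 7-bit (ISO 646 national variant) positions for the six
# Danish letters; everything else in the 7-bit range is plain ASCII.
LATIN1_TO_DK7 = {
    "Æ": "[", "Ø": "\\", "Å": "]",
    "æ": "{", "ø": "|", "å": "}",
}


def latin1_to_dk7(data: bytes) -> bytes:
    """Hypothetical downgrade of ISO 8859-1 bytes to Danish 7-bit bytes.

    The six Danish letters are moved to their 7-bit positions. Any other
    non-ASCII character has no 7-bit Danish form and becomes b"?".
    ASCII brackets and braces already in the input pass through unchanged,
    so after conversion they can no longer be told apart from the letters.
    """
    text = data.decode("latin-1")
    downgraded = text.translate(str.maketrans(LATIN1_TO_DK7))
    return downgraded.encode("ascii", errors="replace")


if __name__ == "__main__":
    print(latin1_to_dk7("Keld Jørn Simonsen".encode("latin-1")))  # b'Keld J|rn Simonsen'
    print(latin1_to_dk7("Århus".encode("latin-1")))               # b']rhus'
```

The information loss in the last comment is the same ambiguity Erland raised, which is why the reverse direction cannot simply be applied to every message.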
sommar@enea.se (Erland Sommarskog) (08/12/90)
Keld J|rn Simonsen (keld@login.dkuug.dk) writes:
>No, ISO 8859 is not outdated. ISO 8859-2 covers Eastern Europe,
>and ISO 8859-5 covers Russia (Cyrillic). 8859 does not cover
>Japanese and other Far Eastern character sets, though. This was the reason
>we decided on ISO 10646.

What I meant when I said that 8859 was obsolete is that a year ago it seemed you could live with having to change to another character set to read and write Polish, Hungarian, etc., since the political and economic situation would make such cases rare. Now that these countries are suddenly joining the free world, such cases can be expected to be more frequent. And I am not only talking about articles and mail in these languages, but also about multilingual texts. For instance, if Lech Walesa ever appears on Usenet, it would be nice if his name could come out right, with a slashed "l" and an ogonek on the "e".

(But of course, with all the ancient mailers, archaic Unix varieties, GNU Emacs, etc., I doubt that anything other than plain seven-bit ASCII will ever be regarded as comme il faut on Usenet. :-)

-- Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
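Erland's Wałęsa example can be checked with Python's built-in codecs (`iso8859_1`, `iso8859_2` and `iso8859_5` are standard CPython codec names; the helper `parts_that_encode` is hypothetical). It reports which of the ISO 8859 parts mentioned in the thread can represent a given string:

```python
# The ISO 8859 parts discussed in the thread, as CPython codec names.
PARTS = ["iso8859_1", "iso8859_2", "iso8859_5"]


def parts_that_encode(text: str) -> list:
    """Return the codecs from PARTS in which every character of text exists."""
    usable = []
    for codec in PARTS:
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one character has no code in this part
        usable.append(codec)
    return usable


if __name__ == "__main__":
    print(parts_that_encode("Wałęsa"))              # ['iso8859_2']
    print(parts_that_encode("Räksmörgås"))          # ['iso8859_1']
    print(parts_that_encode("Wałęsa, Räksmörgås"))  # []
```

A text mixing Polish and Swedish fits none of the parts, which is exactly the multilingual case Erland describes and the one a single large set such as ISO 10646 is meant to cover.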