sommar@enea.se (Erland Sommarskog) (08/09/90)
(The first attempt didn't seem to leave our site. Apologies if you
are seeing this twice.)

Uwe Geuder (geuder@informatik.uni-stuttgart.de) writes:
>From Keld J|rn Simonsen:
> I use it in email, it is built into the sendmail we use here,
> and EUnet has decided to run this on an experimental basis
> on all the backbones of EUnet.
>
>What does this mean? When I get mail from Sweden, it's still in Swedish
>ASCII (is that SSCII??), which is horrible to read on (US) ASCII devices
>used in Germany (German 7-bit Code is never used here). If I run conv SE US
>on such files they get much prettier. So I can't imagine that any host in
>between has already done it. Or is there no "EUnet backbone" between Sweden
>and Germany?

I get a little anxious here, but I may misunderstand some things
here. I certainly don't want mail I send out to be automatically
transformed when it goes out. Yes, I understand that occurrences
of ][\}{| are not nice to read, but it seems a risky business to
translate them straight off. If I use them in a non-Swedish mail,
I usually explain them. With an unwanted transformation, that
would look a little stupid. (And how does the machine know that
I use an "[" as a dotted capital "A" and not as a left bracket?)
Wouldn't it be better if this was done at the receiver's end on request?

Another question: Through a mailing-list I have indirectly received
a list of two-character codes stemming from Keld Simonsen. I don't
know whether it is this one we discuss, but I would assume so. I
must admit that I laid that one aside with the thought: "My God,
how unreadable and what an overkill!" I tend to think I missed
some points with its purpose. Could Keld or anyone else clarify?

And a final question: we are moving into an eight-bit world. Instead
of relying on old standards, why not aim to have EUnet work with ISO
8859/1 instead? (8859 is apparently already obsolete with the recent
changes in Eastern Europe, but that is another matter.)
-- Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
keld@login.dkuug.dk (Keld J|rn Simonsen) (08/10/90)
sommar@enea.se (Erland Sommarskog) writes:

>Uwe Geuder (geuder@informatik.uni-stuttgart.de) writes:
>>From Keld J|rn Simonsen:
>> I use it in email, it is built into the sendmail we use here,
>> and EUnet has decided to run this on an experimental basis
>> on all the backbones of EUnet.
>>
>>What does this mean? When I get mail from Sweden, it's still in Swedish
>>ASCII (is that SSCII??), which is horrible to read on (US) ASCII devices
>>used in Germany (German 7-bit Code is never used here). If I run conv SE US
>>on such files they get much prettier. So I can't imagine that any host in
>>between has already done it. Or is there no "EUnet backbone" between Sweden
>>and Germany?

>I get a little anxious here, but I may misunderstand some things
>here. I certainly don't want mail I send out to be automatically
>transformed when it goes out. Yes, I understand that occurrences
>of ][\}{| are not nice to read, but it seems a risky business to
>translate them straight off. If I use them in a non-Swedish mail,
>I usually explain them. With an unwanted transformation, that
>would look a little stupid. (And how does the machine know that
>I use an "[" as a dotted capital "A" and not as a left bracket?)
>Wouldn't it be better if this was done at the receiver's end on request?

Yes, I share Erland's concerns. You cannot just translate 7-bit [\]
(these 7-bit values are defined as letters in both Swedish and Danish
7-bit) to ISO 8859-1 Swedish/Danish letters. What we do at dkuug.dk
(the Danish Internet backbone) is to transform both 8-bit curly braces
and Scandinavian letters to 7-bit [\]. The other way, from 7-bit Danish
or Swedish to ASCII or some 8-bit code, we normally do not touch these
codes. The conversions we do here are mostly for use on 8-bit machines,
where some run ISO 8859-1 and some run some IBM codepage.

Doing it at the receiver's end: well, the receiver needs to know what
information is in there. This information must be generated on the
sender's side, since the sender knows what the message is.

>Another question: Through a mailing-list I have indirectly received
>a list of two-character codes stemming from Keld Simonsen. I don't
>know whether it is this one we discuss, but I would assume so. I
>must admit that I laid that one aside with the thought: "My God,
>how unreadable and what an overkill!" I tend to think I missed
>some points with its purpose. Could Keld or anyone else clarify?

Yes, I have made a quite elaborate list of character names, which
is being used for mail. It is designed for worldwide use, and the
world is big. There are about 940 characters in there, covering all
7 and 8-bit character sets I know of. It does not yet contain any
Japanese or Chinese characters.

The character names are primarily used to identify a character and
to be able to register properties of it, such as membership of a
character set, or that it is a lower case character, in which case
the upper case character can be specified alongside. It does have
some mnemonic value, e.g. a with dieresis (a-umlaut) is called "a:".
How readable and beautiful this is can always be discussed, but there
are some rules to it which are consistently applied. It has also been
designed with short character names to improve compactness and reduce
translation costs, and also to improve readability and writability.

>And a final question: we are moving into an eight-bit world. Instead
>of relying on old standards, why not aim to have EUnet work with ISO
>8859/1 instead? (8859 is apparently already obsolete with the recent
>changes in Eastern Europe, but that is another matter.)

I am collaborating with a fellow countryman of yours, Dan Oscarsson
from LTH, on using the new ISO 10646 character set for email.
This character set has almost all characters in the world in
a 32 bit compactable code set.

No, ISO 8859 is not outdated. ISO 8859-2 covers Eastern Europe,
and ISO 8859-5 covers Russia (Cyrillic). 8859 does not cover
Japanese and other Eastern character sets, though. This was the reason
we decided on ISO 10646.

Keld Simonsen
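
The ambiguity raised in this exchange (the same 7-bit code is a bracket in US ASCII and a letter in the Swedish or Danish 7-bit sets) can be illustrated with a minimal Python sketch. It is not the conversion that dkuug.dk or the conv tool performs; the table and function names are hypothetical, and only the six bracket, backslash, brace and bar positions of the national sets are mapped.

```python
# Hypothetical tables: the six code positions where the Swedish and Danish
# 7-bit sets carry letters instead of the US ASCII characters [ \ ] { | }.
SE_TO_LATIN1 = str.maketrans("[\\]{|}", "ÄÖÅäöå")
DK_TO_LATIN1 = str.maketrans("[\\]{|}", "ÆØÅæøå")


def to_latin1(text, table):
    """Read 7-bit text as national letters (blind, context-free)."""
    return text.translate(table)


def to_7bit(text, table):
    """Map national letters back to the 7-bit code positions."""
    reverse = {letter: code for code, letter in table.items()}
    return text.translate(reverse)


if __name__ == "__main__":
    # A Danish name written in the Danish 7-bit set reads correctly.
    print(to_latin1("Keld J|rn", DK_TO_LATIN1))
    # A bracket that was meant as a bracket is turned into letters,
    # which is the risk of translating "straight off" in transit.
    print(to_latin1("a[0]", SE_TO_LATIN1))
```

Because the table has no way to tell a letter from a bracket, a gateway applying it to every message would also rewrite program text and addresses, which is why the sender's knowledge of the content matters.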
sommar@enea.se (Erland Sommarskog) (08/12/90)
Keld J|rn Simonsen (keld@login.dkuug.dk) writes:
>No, ISO 8859 is not outdated. ISO 8859-2 covers Eastern Europe,
>and ISO 8859-5 covers Russia (Cyrillic). 8859 does not cover
>Japanese and other Eastern character sets, though. This was the reason
>we decided on ISO 10646.

What I meant when I said that 8859 was obsolete is that one year
ago it seemed like you could live with having to change to another
character set to read and write Polish, Hungarian etc., since the
political and economic situation would make such cases rare. Now
that they suddenly are joining the free world, these cases can be
expected to be more frequent. And I am not only talking about articles
and mail in these languages, but also multi-language texts. For
instance, if Lech Walesa ever appears on Usenet, it would be nice if
his name could come up right, with a slashed "l" and cedilla on the "e".

(But of course, with all ancient mailers, archaic Unix varieties,
GNU-Emacs, etc., I doubt that anything other than plain seven-bit
ASCII will ever be regarded comme-il-faut on Usenet. :-)
-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
src@scuzzy.mbx.sub.org (Heiko Blume) (08/14/90)
sommar@enea.se (Erland Sommarskog) writes:
>(But of course, with all ancient mailers, archaic Unix varieties,
>GNU-Emacs, etc, I'm in doubts that anything than plain seven-bit
>ASCII will ever be regarded comme-il-faut on Usenet. :-)

as far as i know GNU emacs 19 will have 16bit characters.
-- 
Heiko Blume c/o Diakite   blume@scuzzy.mbx.sub.org   FAX   (+49 30) 882 50 65
Kottbusser Damm 28        blume@netmbx.UUCP          VOICE (+49 30) 691 88 93
D-1000 Berlin 61          blume@netmbx.de            TELEX 184174 intro d
scuzzy Any ACU,e 19200 6919520 ogin:--ogin: nuucp ssword: nuucp
yfcw14@castle.ed.ac.uk (K P Donnelly) (08/14/90)
I find that I can send 8-bit mail messages over the UK JANET network
to and from VAX/VMS machines without trouble. However, it seems that if
mail goes anywhere near a Unix machine it gets the eighth bit stripped.
The trouble seems to be the file transfer utility hhcp.

I find that I can work happily in 8 bits on Edinburgh University's
central Unix machine, provided I set stty -odd or stty -even. The new
versions of microEMACS, MS-KERMIT and TeX all support 8-bit text.
However, if I try to transfer an 8-bit text file into the Unix machine
using hhcp, it gets the eighth bit stripped; and if I try to transfer an
8-bit text file out from the Unix machine, hhcp treats it as a binary
file and the file gets horribly garbled.

An Irish Gaelic conferencing system which I have accessed is hosted on a
VAX/VMS machine and uses ISO 8859-1 as standard. However, when I access
it via IPSS and the Irish commercial packet switch network, EIRPAC, the
eighth bit gets stripped somewhere along the way.

Would anyone like to summarize experiences elsewhere with 8-bit mail?
Are any networks already happily using ISO 8859-1 for mail?
What are the main bottlenecks at present to 8-bit work?

Kevin Donnelly
prc@erbe.se (Robert Claeson) (08/15/90)
In article <5681@castle.ed.ac.uk>, yfcw14@castle.ed.ac.uk (K P Donnelly) writes:

> I find that I can send 8-bit mail messages over the UK JANET network
> to and from VAX/VMS machines without trouble. However, it seems that if
> mail goes anywhere near a Unix machine it gets the eighth bit stripped.
> The trouble seems to be the file transfer utility hhcp.

I assume that with "hhcp" you mean "uucp". Uucp doesn't strip anything.
One can send binary files using uucp (this is how we get our 'news').
I believe that the problem lies in "sendmail", especially as implemented
on many UNIX systems running BSD UNIX. Many System V UNIXes that come
with sendmail do allow 8-bit characters.

> Would anyone like to summarize experiences elsewhere with 8-bit mail.
> Are any networks already happily using ISO 8859-1 for mail?

Yes, our internal network is using ISO 8859/1 for e-mail. It consists
of UNIX hosts only. However, as soon as any message goes up to the
EUnet backbone here, the eighth bit is chopped off all characters in
the message.

> What are the main bottlenecks at present to 8-bit work?

I don't believe that there are any. In fact, chopping off the eighth
bit, as done on many hosts, is likely to consume more CPU power. The
network bandwidth used would be the same. The X.25 protocol as such is
also eight-bit clean, but when e-mail is transferred over X.3/X.29
(i.e., the PAD function), it is common to have to encode 8-bit data
using plain ASCII or something similar, as done by, for example, the
uucp 'f' protocol, btoa and uuencode. This is because many PADs don't
permit a completely eight-bit transparent path.
-- 
Robert Claeson      |Reasonable mailers: rclaeson@erbe.se
ERBE DATA AB        |      Dumb mailers: rclaeson%erbe.se@sunet.se
                    |  Perverse mailers: rclaeson%erbe.se@encore.com
These opinions reflect my personal views and not those of my employer (ask him).
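
What "chopping off the eighth bit" does to ISO 8859-1 text can be shown with a small Python sketch. It only models the effect described in the post, assuming a relay that masks every byte with 0x7F; the function name is hypothetical and no particular mailer is claimed to work this way internally.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Clear the top bit of every byte, as a 7-bit-only relay would."""
    return bytes(b & 0x7F for b in data)


if __name__ == "__main__":
    # Swedish text encoded in ISO 8859-1 (Latin-1).
    original = "smörgås".encode("latin-1")
    damaged = strip_eighth_bit(original)
    # Each accented letter silently becomes an unrelated ASCII letter,
    # so the damage cannot be detected or undone at the receiving end.
    print(original)
    print(damaged)
```

The mask is an extra operation per byte, which fits the remark that stripping costs CPU time rather than saving anything.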
tkld@castle.ed.ac.uk (K Davidson) (08/17/90)
In article <1740@hugo.erbe.se> prc@erbe.se (Robert Claeson) writes:
>In article <5681@castle.ed.ac.uk>, yfcw14@castle.ed.ac.uk (K P Donnelly) writes:
>
>> I find that I can send 8-bit mail messages over the UK JANET network
>> to and from VAX/VMS machines without trouble. However, it seems that if
>> mail goes anywhere near a Unix machine it gets the eighth bit stripped.
>> The trouble seems to be the file transfer utility hhcp.
>
>I assume that with "hhcp" you mean "uucp". Uucp doesn't strip anything.
>One can send binary files using uucp (this is how we get our 'news').
>I believe that the problem lies in "sendmail", especially as implemented
>on many UNIX systems running BSD UNIX. Many System V UNIXes that come
>with sendmail do allow 8-bit characters.

No, he really does mean hhcp. JANET does not use TCP/IP or uucp; it has
its own ``coloured book'' protocols, one of which is NIFTP (I forget the
colour) for transferring files. This is what hhcp uses.

	hhcp -b8 file remote:file

might work, but most Unix utilities that deal with rs232 lines seem to
have 7bit braindamage all the way through them.

Get hold of a uu{en,de}code that compiles on a VMS system and encode
your mail. ( Horrible isn't it :-( )

>--
>Robert Claeson      |Reasonable mailers: rclaeson@erbe.se
>ERBE DATA AB        |      Dumb mailers: rclaeson%erbe.se@sunet.se
>                    |  Perverse mailers: rclaeson%erbe.se@encore.com
>These opinions reflect my personal views and not those of my employer (ask him).

I have to add this because Pnews complains. What idiot wrote this? :-(
-- 
.Kevin. <tkld@castle.ed.ac.uk> || <tkld@lfcs.ed.ac.uk> || <tkld@tardis.cs.ed.ac.uk>
...and he did not think it too many.
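
The workaround suggested above, encoding 8-bit data as plain ASCII before it crosses a 7-bit path, can be sketched in Python with the standard binascii module. This is an illustration only: it produces uuencode-style data lines but omits the "begin" and "end" framing of the real uuencode tool, and the function names are hypothetical.

```python
import binascii


def encode_7bit(data: bytes) -> bytes:
    """Encode bytes as uuencode-style lines (45 input bytes per line)."""
    chunks = (data[i:i + 45] for i in range(0, len(data), 45))
    return b"".join(binascii.b2a_uu(chunk) for chunk in chunks)


def decode_7bit(encoded: bytes) -> bytes:
    """Reverse encode_7bit, one encoded line at a time."""
    lines = encoded.splitlines(keepends=True)
    return b"".join(binascii.a2b_uu(line) for line in lines)


if __name__ == "__main__":
    message = "Ä smörgås, tack".encode("latin-1")
    wire = encode_7bit(message)
    # Every byte on the wire has the eighth bit clear, so a stripping
    # relay leaves it unchanged and the receiver recovers the original.
    print(wire)
    print(decode_7bit(wire).decode("latin-1"))
```

The cost is size: every three input bytes become four output characters, plus a length character and newline per line.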
richard@aiai.ed.ac.uk (Richard Tobin) (08/21/90)
In article <1740@hugo.erbe.se> prc@erbe.se (Robert Claeson) writes:
>> I find that I can send 8-bit mail messages over the UK JANET network
>> to and from VAX/VMS machines without trouble. However, it seems that if
>> mail goes anywhere near a Unix machine it gets the eighth bit stripped.
>> The trouble seems to be the file transfer utility hhcp.
>
>I assume that with "hhcp" you mean "uucp".

No, he means hhcp. Hhcp is a program implementing NIFTP
("network-independent file transfer protocol"). Unfortunately, it
doesn't implement it well enough for 8-bit transfers to work.

The problem does not exist with the (free) unix-niftp software, and
possibly not with more recent versions of hhcp.

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin