jc@minya.UUCP (12/05/87)
Hello again. A week or so back I requested info on cases of damage to
files by various mailers. I got a lot of requests for a summary. Since
the responses have died down, it's about time to summarize.

First, I'd like to thank all the folks who sent entries. Some of the
mailers out there are truly demented! I got a lot of yuks from some of
the contributions.

As was to be expected, I got a good collection of flames telling me why
the things I listed were proper. I don't care about that. The point is
to make a list of the kinds of damage that might be done. A lot of us
are faced with "How can I get this document to so-and-so over on that
machine?" The idea is to get it there, undamaged, in whatever funny
format the word processor uses. Format translation is interesting, but
it's an unrelated problem. This can't be handled correctly by mailers,
anyway, so the ideal situation would be mailers that forward files
undamaged. The bits that go in should be the bits that come out the
other end. We aren't very close to that ideal.

To solve the problems with mailers, it is necessary to run some sort of
encoding program (such as uuencode) on the source file, and then run the
inverse decoding program at the receiving end. In order to write such
programs, it is helpful if we know just what sort of damage is possible
(not likely, not acceptable, not standard, but possible) from
intervening mailers.

Anyhow, off the soapbox and on to the list. Here's what I have now:

 1. Occurrences of the string "\nFrom " have '>' inserted before the
    'F'. This is from the uucp mailer.

 2. If the string "\n.\n" occurs, the tail end of the file (starting at
    the '.') is discarded. Some mailers try to prevent this by
    converting the offending string to "\n..\n". Both uucp mail and
    sendmail are guilty of this one.

 3. High-order bits are turned off (or set to parity or randomized).
    This is usually the fault of a serial-port interface.

 4. Null bytes are dropped. Also, strings between a null and the next
    CR or NL may be dropped. This often happens as a side-effect of the
    "standard" null-terminated string representation in C.

 5. If a backspace occurs, it and the preceding character are deleted.
    This is also usually due to a serial-port interface.

 6. ASCII tabs are expanded to some number of spaces. This may be done
    by just about any piece of hardware or software in the path.

 7. Spaces and tabs may be replaced by a compressed space count.

 8. Trailing spaces may be deleted from message lines, or added to make
    lines a multiple of some number (usually 4 or 6). This includes
    padding null lines (which are illegal on some systems).

 9. Truncation or wrapping of long lines. For instance, BITNET mail is
    80 column "PUNCH" files sent to a virtual card reader.

10. Silent discarding (or truncating) of mail which is "too long".
    Sendmail has a (configurable) limit on message size, which is
    usually something like 100K. Uucp truncates files to 32K on some
    16-bit machines. The mail system on [one system] has a limit of
    200 lines.

11. Some mailers add a ^M (CR) to the end of every line; others delete
    ^M before ^J (LF) or wherever it is found. This is part of the
    religious debate about whether "lines" should be separated by LF or
    by CR/LF. Sometimes this conversion is actually done by the
    low-level serial port interface.

12. Control chars other than CR, LF, FF, and TAB are converted to '?'.

There was also the interesting comment:

| And don't forget the worst damage of all - ASCII/EBCDIC translation!
| Since there's no one-to-one mapping, and different sites use different
| translation tables, there's no way you can know what the mail will look
| like when it gets through. Most commonly caught characters are characters
| in ASCII range 5B-5F and 7B-7F. And, of course, tabs are expanded to
| spaces and formfeeds are usually lost....

Another writer listed the characters most likely to be corrupted as:

    {}~`[]|^\"

This one is especially interesting, because it invalidates the uuencode
program. This encoding produces characters in the specified ranges, and
thus uuencoded files may be garbled as they pass through EBCDIC
machines.

It would be interesting to learn just what characters (i.e., hex values)
can be safely transferred through ASCII/EBCDIC interfaces. An encoding
scheme like uuencode could be written using translation tables, if there
are 64 character codes that can be guaranteed reliable in all
ASCII/EBCDIC interfaces. Can people out there with EBCDIC systems give
me some information about how their translation tables work? Are there
64 codes that can be trusted to any ASCII/EBCDIC translator, and that
will come out the same when fed to any other EBCDIC/ASCII translator?

To end with a bit of levity:

 ==> Mailers that let "From:" addresses like "user@host.UUCP",
     "host!user", or "user@host.BITNET" escape on to the Internet
     without fixing the address (e.g., "user@host.UUCP" becomes
     "user%host.UUCP@gateway.do.main").
 ==> Prepending "host!" to the From: lines of mail passing
     through the site and going out through UUCP.

Maybe I'm being weird, but I really can't see any end user getting
very excited about such things.

--
John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)
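
For anyone who wants to experiment with the translation-table idea in the post above, here is a rough Python sketch of a uuencode-like scheme driven by a 64-symbol table. The alphabet below (letters, digits, '+' and '-') is only a placeholder: nobody in this thread has confirmed that these 64 codes survive every ASCII/EBCDIC gateway, so treat it as a hypothetical table to be replaced once real data comes in. The names encode, decode, ALPHABET and LINE_WIDTH are made up for the illustration. The output uses short lines, no spaces, tabs, control characters or leading dots, and the decoder ignores added CRs and trailing padding, so several of the items on the list above should not hurt it.

```python
# Hypothetical 64-symbol table: A-Z, a-z, 0-9, '+' and '-'.
# ASSUMPTION: not verified against any real ASCII/EBCDIC translator;
# substitute whatever 64 codes turn out to be reliable.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+-"
)
LINE_WIDTH = 60  # symbols per line, comfortably under 80 columns


def encode(data: bytes, alphabet: str = ALPHABET) -> str:
    """Turn arbitrary bytes into lines drawn only from `alphabet`."""
    if len(set(alphabet)) != 64:
        raise ValueError("alphabet must hold 64 distinct symbols")
    # Pad to a multiple of 3 bytes; the true length goes on the first line.
    padded = data + b"\0" * (-len(data) % 3)
    symbols = []
    for i in range(0, len(padded), 3):
        n = int.from_bytes(padded[i:i + 3], "big")  # 24 bits
        # Split the 24 bits into four 6-bit table indexes.
        symbols.extend(alphabet[(n >> s) & 0x3F] for s in (18, 12, 6, 0))
    body = "".join(symbols)
    lines = [str(len(data))]
    lines += [body[i:i + LINE_WIDTH] for i in range(0, len(body), LINE_WIDTH)]
    return "\n".join(lines) + "\n"


def decode(text: str, alphabet: str = ALPHABET) -> bytes:
    """Inverse of encode(); tolerates added CRs, padding and blank lines."""
    value = {ch: i for i, ch in enumerate(alphabet)}
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    length = int(lines[0])
    symbols = "".join(lines[1:])
    out = bytearray()
    for i in range(0, len(symbols), 4):
        n = 0
        for ch in symbols[i:i + 4]:
            n = (n << 6) | value[ch]
        out += n.to_bytes(3, "big")
    return bytes(out[:length])  # drop the padding bytes
```

Something like decode(encode(data)) == data should hold for any byte string. Passing a different 64-character string as the alphabet argument is all it takes to try another table.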
pdb@sei.cmu.edu (Patrick Barron) (12/06/87)
In article <425@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>To end with a bit of levity:
>
> ==> Mailers that let "From:" addresses like "user@host.UUCP",
>     "host!user", or "user@host.BITNET" escape on to the Internet
>     without fixing the address (e.g., "user@host.UUCP" becomes
>     "user%host.UUCP@gateway.do.main").
> ==> Prepending "host!" to the From: lines of mail passing
>     through the site and going out through UUCP.
>
>Maybe I'm being weird, but I really can't see any end user getting
>very excited about such things.

Since I was the one who mentioned the first point, I'll explain why it's
a really bad thing:

Suppose I'm on a machine that understands Internet/ARPANET style mail
addresses. If I get a message that has (for instance) a "From:" line
containing "user@host.BITNET", and I try to use my mailer's "Reply"
command to reply to it, the message will be rejected by the mailer
because ".BITNET" is not a valid domain. Even worse, if the message
can't be delivered on the Internet side (because, for instance, the user
on the destination machine doesn't exist), the mail may not be returned
to sender, but might in fact end up in some "postmaster" mailbox
somewhere, again because the message didn't contain a valid "From:"
address.

Sure, you can hack up your mailer to try and do something intelligent
with that sort of address, but such hacks can't always be depended on.
Under BSD Unix, lots of people have rules in their sendmail.cf file to
send all ".BITNET" mail to wiscvm.wisc.edu. Well, in 10 days, all those
mailers are going to break, because wiscvm isn't going to be the gateway
anymore. The same thing happened with ".UUCP" mail, which a lot of
people hardwired to go to seismo. When the plug was pulled, a lot of
"end users" were real upset because "this address that worked yesterday
doesn't work today."

I guess the bottom line is that, if you're going to gateway mail to the
Internet from some other net, you had best make sure that you fix the
headers up to be in RFC 822 format; otherwise some mailer somewhere down
the line that you have no control over may choke on it.

--Pat.
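
As a rough illustration of the header fix-up argued for above, here is a small Python sketch of the one rewrite the thread actually spells out: "user@host.UUCP" becoming "user%host.UUCP@gateway.do.main". The function name fix_from_address is made up, and "gateway.do.main" is just the placeholder host from the original post. The handling of a bare "host!user" path (simply appending the gateway) is an assumption, since the thread does not say how a gateway should treat it. A real gateway would do this in its mailer configuration and would need to parse full RFC 822 headers, which this sketch does not attempt.

```python
GATEWAY = "gateway.do.main"             # placeholder name from the post
PSEUDO_DOMAINS = (".UUCP", ".BITNET")   # not valid Internet domains


def fix_from_address(addr: str, gateway: str = GATEWAY) -> str:
    """Return addr in a form an Internet mailer should be able to reply to."""
    local, at, domain = addr.rpartition("@")
    if not at:
        # No '@' at all, e.g. the bang path "host!user".
        # ASSUMPTION: the gateway itself can route bang paths.
        return f"{addr}@{gateway}" if "!" in addr else addr
    if domain.upper().endswith(PSEUDO_DOMAINS):
        # "user@host.UUCP" -> "user%host.UUCP@gateway.do.main"
        return f"{local}%{domain}@{gateway}"
    # Already looks like an ordinary domain address; leave it alone.
    return addr
```

With this, fix_from_address("user@host.BITNET") gives "user%host.BITNET@gateway.do.main", while an address such as "pdb@sei.cmu.edu" passes through unchanged.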
paul@umix.cc.umich.edu ('da Kingfish) (12/06/87)
In article <3473@aw.sei.cmu.edu> pdb@sei.cmu.edu (Pat Barron) writes:
>Under
>BSD Unix, lots of people have rules in their sendmail.cf file to send all
>".BITNET" mail to wiscvm.wisc.edu. Well, in 10 days, all those mailers
>are going to break, because wiscvm isn't going to be the gateway anymore.
>The same thing happened with ".UUCP" mail, which a lot of people hardwired
>to go to seismo.

Yeah, but the bitnet forward is just a one line change in sendmail.cf.
The uucp shouldn't be much more. Or, you could alias seismo to be
something else for uucp.

These things strike me as an aspect of site administration, and if the
warnings for the bitnet cutovers, or the shift from seismo -> uunet
didn't give people an opportunity to make the change, that's not the
notifiers' problem. Both cases provided quite a bit of lead time.

This message brings to mind something else. Once in a while, one sees a
message in this group (or a related one) that goes "i am writing this
new mailer, and it will fix these problems i have with sendmail and do
these sendmail things as trivial cases of this general principle." Now,
mailer writing is not a bad thing, and this is an area to learn by doing
and all that, but it sure seems to me that much of this is more the
result of someone knowing a programming language better than they know
the sendmail.cf syntax or sendmail itself.

I realize that not everybody likes sendmail, and there are some
attractive and very workable alternatives, like mmdf, but i'll bet that
people who don't understand their mailer are bound to repeat it.

Then again, I could have missed the point. Let me (oof) step down off of
this soapbox and get on to the next article.

--paul
--
Trying everything that whiskey cures in Ann Arbor, Michigan.
Over one billion messages read.
zeeff@b-tech.UUCP (Jon Zeeff) (12/09/87)
In article <425@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
> ==> Mailers that let "From:" addresses like "user@host.UUCP",
>     "host!user", or "user@host.BITNET" escape on to the Internet
>     without fixing the address (e.g., "user@host.UUCP" becomes
>     "user%host.UUCP@gateway.do.main").
> ==> Prepending "host!" to the From: lines of mail passing
>     through the site and going out through UUCP.
>
>Maybe I'm being weird, but I really can't see any end user getting
>very excited about such things.

I think you're not thinking about it enough. End users do tend to get
very excited when they find that they can't reply to their mail.

--
Jon Zeeff                       Branch Technology,
uunet!umix!b-tech!zeeff         zeeff%b-tech.uucp@umix.cc.umich.edu