[comp.mail.misc] Summary of mail-damage survey.

jc@minya.UUCP (12/05/87)

Hello again.  A week or so back I requested info on cases of damage
to files by various mailers.  I got a lot of requests for a summary.
Since the responses have died down, it's about time to summarize.

First, I'd like to thank all the folks who sent entries.  Some of
the mailers out there are truly demented!  I got a lot of yuks from
some of the contributions.

As was to be expected, I got a good collection of flames telling me
why the things I listed were proper.  I don't care about that.  The
point is to make a list of the kinds of damage that might be done.

A lot of us are faced with "How can I get this document to so-and-so 
over on that machine?"  The idea is to get it there, undamaged, in
whatever funny format the word processor uses.  Format translation
is interesting, but it's an unrelated problem.  This can't be handled 
correctly by mailers, anyway, so the ideal situation would be mailers
that forward files undamaged.  The bits that go in should be the bits
that come out the other end.  We aren't very close to that ideal.

To solve the problems with mailers, it is necessary to run some sort
of encoding program (such as uuencode) on the source file, and then
run the inverse decoding program at the receiving end.  In order to
write such programs, it is helpful if we know just what sort of damage
is possible (not likely, not acceptable, not standard, but possible)
from intervening mailers.
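
As a rough illustration of the encode/decode round trip described above,
here is a minimal Python sketch.  It uses the uuencode line format as
provided by Python's standard binascii module.  The helper names
uu_encode_lines and uu_decode_lines are made up for this example, and the
"begin"/"end" framing lines of a real uuencode file are left out.

```python
import binascii

def uu_encode_lines(data: bytes) -> list[bytes]:
    """Encode data as uuencode body lines (45 input bytes per line)."""
    # binascii.b2a_uu accepts at most 45 bytes per call and returns one
    # newline-terminated line: a length character plus the encoded text.
    return [binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)]

def uu_decode_lines(lines: list[bytes]) -> bytes:
    """Inverse of uu_encode_lines: decode each line and concatenate."""
    return b"".join(binascii.a2b_uu(line) for line in lines)

if __name__ == "__main__":
    payload = bytes(range(256))          # every possible byte value
    encoded = uu_encode_lines(payload)
    assert uu_decode_lines(encoded) == payload
    print(len(encoded), "lines, round trip OK")
```

The round trip only proves that the bits survive when nothing in between
touches the encoded lines; whether a given mailer leaves them alone is
the question the list below tries to answer.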

Anyhow, off the soapbox and on to the list.  Here's what I have now:

	1.	Occurrences of the string "\nFrom " have '>' inserted before
		the 'F'.  This is from the uucp mailer.

	2.	If the string "\n.\n" occurs, the tail end of the file (starting
		at the '.') is discarded.  Some mailers try to prevent this by
		converting the offending string to "\n..\n".  Both uucp mail and
		sendmail are guilty of this one.

	3.	High-order bits are turned off (or set to parity or randomized). 
		This is usually the fault of a serial-port interface.

	4.	Null bytes are dropped.  Also, strings between a null and the next
		CR or NL may be dropped. This often happens as a side-effect of the 
		"standard" null-terminated string representation in C.

	5.	If a backspace occurs, it and the preceding character are deleted.
		This is also usually due to a serial-port interface.

	6.	ASCII tabs are expanded to some number of spaces.  This may be
		done by just about any piece of hardware or software in the path.

	7.	Spaces and tabs may be replaced by a compressed space count.

	8.	Trailing spaces may be deleted from message lines, or added to
		make lines a multiple of some number (usually 4 or 6).  This
		includes padding null lines (which are illegal on some systems).

  	9.	Truncation or wrapping of long lines.  For instance, BITNET mail
		is 80 column "PUNCH" files sent to a virtual card reader.

  	10.	Silent discarding (or truncating) of mail which is "too long".  
		Sendmail has a (configurable) limit on message size, which is 
		usually something like 100K.  Uucp truncates files to 32K on 
		some 16-bit machines. The mail system on [one system] has a 
		limit of 200 lines.

	11.	Some mailers add a ^M (CR) to the end of every line; others
		delete ^M before ^J (LF) or wherever it is found.  This is 
		part of the religious debate about whether "lines" should
		be separated by LF or by CR/LF.  Sometimes this conversion
		is actually done by the low-level serial port interface.

	12.	Control chars other than CR, LF, FF, and TAB are converted to '?'.
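
One way to put the list to work is a checker that scans a file before it
is mailed and reports which of the numbered items could bite.  The
following Python sketch is only that, a sketch: the function name
find_hazards and the default limits (80 columns, 32K bytes) are
assumptions picked from the figures mentioned in the list, not values any
particular mailer is known to use.  Item 7 and the empty-line case of
item 8 are skipped, since nearly any text would trigger them.

```python
SAFE_CONTROLS = {0x0D, 0x0A, 0x0C, 0x09}   # CR, LF, FF, TAB (item 12)

def find_hazards(data: bytes, max_line: int = 80,
                 max_bytes: int = 32 * 1024) -> list[int]:
    """Return the item numbers from the list that could affect data."""
    hits = []
    lines = data.split(b"\n")
    if b"\nFrom " in data:                        # 1: '>' inserted
        hits.append(1)
    if b"\n.\n" in data:                          # 2: tail discarded
        hits.append(2)
    if any(b >= 0x80 for b in data):              # 3: high-order bits
        hits.append(3)
    if b"\x00" in data:                           # 4: null bytes
        hits.append(4)
    if b"\x08" in data:                           # 5: backspace
        hits.append(5)
    if b"\t" in data:                             # 6: tab expansion
        hits.append(6)
    if any(line.endswith((b" ", b"\t")) for line in lines):
        hits.append(8)                            # 8: trailing spaces
    if any(len(line) > max_line for line in lines):
        hits.append(9)                            # 9: long lines
    if len(data) > max_bytes:                     # 10: "too long" mail
        hits.append(10)
    if b"\r" in data:                             # 11: CR handling
        hits.append(11)
    if any((b < 0x20 or b == 0x7F) and b not in SAFE_CONTROLS
           for b in data):                        # 12: control chars
        hits.append(12)
    return hits

if __name__ == "__main__":
    print(find_hazards(b"x\nFrom me\n.\ntail\n"))   # [1, 2]
```

An empty result does not mean the file is safe, only that none of the
cases checked here applies.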

There was also the interesting comment:

| And don't forget the worst damage of all - ASCII/EBCDIC translation!
| Since there's no one-to-one mapping, and different sites use different
| translation tables, there's no way you can know what the mail will look
| like when it gets through.  Most commonly caught characters are characters
| in ASCII range 5B-5F and 7B-7F.  And, of course, tabs are expanded to
| spaces and formfeeds are usually lost....  

Another writer listed the characters most likely to be corrupted as:
	{}~`[]|^\"

This one is especially interesting, because it invalidates the uuencode
program. This encoding produces characters in the specified ranges, and
thus uuencoded files may be garbled as they pass through EBCDIC machines.
It would be interesting to learn just what characters (i.e., hex values)
can be safely transferred through ASCII/EBCDIC interfaces.  An encoding
scheme like uuencode could be written using translation tables, if there
are 64 character codes that can be guaranteed reliable in all ASCII/EBCDIC
interfaces.
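
To make the overlap concrete, the Python sketch below intersects the
classic uuencode alphabet (codes 0x20 through 0x5F) with the codes the
two writers reported as unreliable.  The names UU_ALPHABET, RISKY and
at_risk are invented for this example, and the risky set is simply what
was quoted above, not a verified table.

```python
# Classic uuencode maps each 6-bit value n to the character chr(0x20 + n),
# so its alphabet is the 64 codes 0x20 through 0x5F.
UU_ALPHABET = bytes(range(0x20, 0x60))

# Codes reported as unreliable in the comments quoted above.
RISKY = (
    set(range(0x5B, 0x60))      # ASCII range 5B-5F
    | set(range(0x7B, 0x80))    # ASCII range 7B-7F
    | set(b'{}~`[]|^\\"')       # the second writer's list
)

def at_risk(alphabet: bytes) -> bytes:
    """Return the members of alphabet that fall in the reported risky set."""
    return bytes(sorted(set(alphabet) & RISKY))

if __name__ == "__main__":
    print(at_risk(UU_ALPHABET))   # b'"[\\]^_'
```

Six of the 64 uuencode characters land in the reported trouble spots.
Space (0x20) is not in that set, but item 8 in the list above threatens
it whenever it ends a line.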

Can people out there with EBCDIC systems give me some information about
how their translation tables work?  Are there 64 codes that can be trusted
to survive any ASCII/EBCDIC translator, and to come out the same when fed
to any other EBCDIC/ASCII translator?
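
For anyone who wants to experiment, here is a hedged Python sketch of the
idea.  It treats two of Python's bundled EBCDIC code pages, cp037 and
cp500, as stand-ins for two sites' translation tables; real site tables
may well differ, so this is an assumption, not an answer to the question.
The 64-character alphabet (letters, digits, '+' and '-') and all function
names are choices made for the example.

```python
import string

CODE_PAGES = ("cp037", "cp500")   # assumed stand-ins for site tables
ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
            + string.digits + "+-")   # 64 candidate characters

def survivors(code_pages=CODE_PAGES) -> str:
    """Printable ASCII that maps to the same EBCDIC byte in every table."""
    printable = (chr(c) for c in range(0x21, 0x7F))
    return "".join(ch for ch in printable
                   if len({ch.encode(cp) for cp in code_pages}) == 1)

def encode(data: bytes, alphabet: str = ALPHABET) -> str:
    """Pack each 3 input bytes into 4 six-bit symbols (no padding)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        n = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        sextets = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        # A short final chunk of k bytes needs only k + 1 symbols.
        out.append("".join(alphabet[s] for s in sextets[:len(chunk) + 1]))
    return "".join(out)

def decode(text: str, alphabet: str = ALPHABET) -> bytes:
    """Inverse of encode."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    out = bytearray()
    for i in range(0, len(text), 4):
        group = text[i:i + 4]
        n = 0
        for ch in group:
            n = (n << 6) | index[ch]
        n <<= 6 * (4 - len(group))        # restore dropped low bits
        out += n.to_bytes(3, "big")[:len(group) - 1]
    return bytes(out)

if __name__ == "__main__":
    assert set(ALPHABET) <= set(survivors())
    text = encode(bytes(range(256)))
    # Out through one table, back through the other.
    assert text.encode("cp037").decode("cp500") == text
    assert decode(text) == bytes(range(256))
```

With these two tables the chosen alphabet survives a trip out through one
and back through the other, while a character such as '[' does not.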

To end with a bit of levity:

	==> Mailers that let "From:" addresses like "user@host.UUCP", 
		"host!user", or "user@host.BITNET" escape on to the Internet 
		without fixing the address (e.g., "user@host.UUCP" becomes 
		"user%host.UUCP@gateway.do.main").
  	==> Prepending "host!" to the From: lines of mail passing
  		through the site and going out through UUCP.
  
Maybe I'm being weird, but I really can't see any end user getting 
very excited about such things.

-- 
John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)

pdb@sei.cmu.edu (Patrick Barron) (12/06/87)

In article <425@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>To end with a bit of levity:
>
>	==> Mailers that let "From:" addresses like "user@host.UUCP", 
>		"host!user", or "user@host.BITNET" escape on to the Internet 
>		without fixing the address (e.g., "user@host.UUCP" becomes 
>		"user%host.UUCP@gateway.do.main").
>  	==> Prepending "host!" to the From: lines of mail passing
>  		through the site and going out through UUCP.
>  
>Maybe I'm being weird, but I really can't see any end user getting 
>very excited about such things.

Since I was the one who mentioned the first point, I'll explain why it's
a really bad thing:  Suppose I'm on a machine that understands Internet/ARPANET
style mail addresses.  If I get a message that has (for instance) a "From:"
line containing "user@host.BITNET", and I try to use my mailer's "Reply"
command to reply to it, the message will be rejected by the mailer because
".BITNET" is not a valid domain.  Even worse, if the message can't be delivered
on the Internet side (because, for instance, the user on the destination
machine doesn't exist), the mail may not be returned to sender, but might
in fact end up in some "postmaster" mailbox somewhere, again because the
message didn't contain a valid "From:" address.

Sure, you can hack up your mailer to try and do something intelligent with
that sort of address, but such hacks can't always be depended on.  Under
BSD Unix, lots of people have rules in their sendmail.cf file to send all
".BITNET" mail to wiscvm.wisc.edu.  Well, in 10 days, all those mailers
are going to break, because wiscvm isn't going to be the gateway anymore.
The same thing happened with ".UUCP" mail, which a lot of people hardwired
to go to seismo.  When the plug was pulled, a lot of "end users" were real
upset because "this address that worked yesterday doesn't work today."

I guess the bottom line is that, if you're going to gateway mail to
the Internet from some other net, you had best make sure that you fix
the headers up to be in RFC 822 format, otherwise some mailer somewhere
down the line that you have no control over may choke on it.
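
A gateway-side fix-up along these lines might look like the Python sketch
below.  It follows the "user%host.UUCP@gateway.do.main" example quoted
above; the function name to_internet_address, the pseudo-domain list, and
the handling of a single-hop "host!user" path are assumptions made for
illustration, and the gateway is a parameter precisely so that it is not
hardwired.

```python
PSEUDO_DOMAINS = (".UUCP", ".BITNET")   # not valid Internet domains

def to_internet_address(addr: str,
                        gateway: str = "gateway.do.main") -> str:
    """Rewrite a non-Internet address so replies route via gateway."""
    if "@" in addr:
        local, host = addr.rsplit("@", 1)
        if host.upper().endswith(PSEUDO_DOMAINS):
            # user@host.UUCP -> user%host.UUCP@gateway
            return f"{local}%{host}@{gateway}"
        return addr                      # already a usable address
    if addr.count("!") == 1:
        # Assumed treatment of a single-hop bang path: host!user
        host, user = addr.split("!")
        return f"{user}%{host}.UUCP@{gateway}"
    return addr                          # multi-hop paths left untouched

if __name__ == "__main__":
    print(to_internet_address("user@host.UUCP"))
    # user%host.UUCP@gateway.do.main
```

Multi-hop bang paths are left alone here because the right rewrite
depends on the route, which this sketch does not try to guess.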

--Pat.

paul@umix.cc.umich.edu ('da Kingfish) (12/06/87)

In article <3473@aw.sei.cmu.edu> pdb@sei.cmu.edu (Pat Barron) writes:
>Under
>BSD Unix, lots of people have rules in their sendmail.cf file to send all
>".BITNET" mail to wiscvm.wisc.edu.  Well, in 10 days, all those mailers
>are going to break, because wiscvm isn't going to be the gateway anymore.
>The same thing happened with ".UUCP" mail, which a lot of people hardwired
>to go to seismo.

Yeah, but the bitnet forward is just a one line change in
sendmail.cf.  The uucp shouldn't be much more.  Or, you could alias
seismo to be something else for uucp.  These things strike me as an
aspect of site administration, and if the warnings for the bitnet
cutovers, or the shift from seismo -> uunet didn't give people an
opportunity to make the change, that's not the notifiers' problems.
Both cases provided quite a bit of lead time.

This message brings to mind something else.  Once in a while, one sees a
message in this group (or a related one) that goes "i am writing this
new mailer, and it will fix these problems i have with sendmail and do
these sendmail things as trivial cases of this general principle."
Now, mailer writing is not a bad thing, and this is an area to learn by
doing and all that, but it sure seems to me that much of this is more
the result of someone knowing a programming language better than they
know the sendmail.cf syntax or sendmail itself.

I realize that not everybody likes sendmail, and there are some
attractive and very workable alternatives, like mmdf, but i'll bet that
people who don't understand their mailer are bound to repeat it.

Then again, I could have missed the point.  Let me (oof) step down off
of this soapbox and get on to the next article.

--paul
-- 
Trying everything that whiskey cures in Ann Arbor, Michigan.
Over one billion messages read.

zeeff@b-tech.UUCP (Jon Zeeff) (12/09/87)

In article <425@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>	==> Mailers that let "From:" addresses like "user@host.UUCP", 
>		"host!user", or "user@host.BITNET" escape on to the Internet 
>		without fixing the address (e.g., "user@host.UUCP" becomes 
>		"user%host.UUCP@gateway.do.main").
>  	==> Prepending "host!" to the From: lines of mail passing
>  		through the site and going out through UUCP.
>  
>Maybe I'm being weird, but I really can't see any end user getting 
>very excited about such things.

I think you're not thinking about it enough.  End users do tend to get 
very excited when they find that they can't reply to their mail.  

-- 
Jon Zeeff           		Branch Technology,
uunet!umix!b-tech!zeeff  	zeeff%b-tech.uucp@umix.cc.umich.edu