[comp.mail.misc] Survey on damage by mailers.

jc@minya.UUCP (John Chambers) (11/22/87)

Hello.  I'm interested in characterizing the sorts of damage that the
existing electronic mail systems can do to mail as they move it about.
To start the ball rolling, I'll give a few examples.  First, the mail
command that comes with most Unix systems:

	1. Any occurrence of the string "\nFrom " has '>' inserted before
		the 'F'.
	2. If the string "\n.\n" occurs, the tail end of the file (starting
		at the '.') is discarded.
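A minimal sketch of the quoting in item 1, in C, assuming the whole message body is held in memory as a NUL-terminated string. `stuff_from` is a hypothetical name, not code from any real mailer. It also shows why the change is lossy: afterwards, a line that originally read ">From here" cannot be told apart from one that read "From here".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of `text` in which every line that
 * begins with "From " has a '>' inserted in front of it.
 * The caller frees the result; NULL is returned if malloc fails. */
char *stuff_from(const char *text)
{
    size_t lines = 1;
    const char *p;
    char *out, *q;

    assert(text != NULL);
    for (p = text; *p != '\0'; p++)
        if (*p == '\n')
            lines++;
    out = malloc(strlen(text) + lines + 1);  /* at most one '>' per line */
    if (out == NULL)
        return NULL;

    q = out;
    for (p = text; *p != '\0'; ) {
        /* Here p is at the start of a line. */
        if (strncmp(p, "From ", 5) == 0)
            *q++ = '>';
        /* Copy the rest of the line, and its newline if there is one. */
        while (*p != '\0' && *p != '\n')
            *q++ = *p++;
        if (*p == '\n')
            *q++ = *p++;
    }
    *q = '\0';
    return out;
}
```

A line consisting of just "From" (no trailing space) is left alone, since only "From " marks a message boundary.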

In addition, I know of mailers that do the following:

	3. High-order bits are turned off (or set to parity).
	4. Null bytes are dropped.
	5. If a backspace occurs, it and the preceding character are deleted.
	6. ASCII tabs are expanded to some number of spaces.
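A hypothetical model, in C, of a mailer that applies items 3 to 5 to a byte buffer. The order of the rewrites (strip the high bit first, then act on NUL and backspace) is an assumption made for illustration. With that order a byte such as 0x80 becomes NUL and disappears, and 0x88 turns into a backspace that eats the character before it. None of these steps can be undone by the recipient.

```c
#include <assert.h>
#include <stddef.h>

/* Apply, in place, the lossy rewrites of items 3 to 5 to `len` bytes
 * and return the new length:
 *   - clear the high-order bit of every byte,
 *   - drop NUL bytes,
 *   - treat backspace (0x08) as "erase the previous character".
 * A made-up model of such a mailer, not any particular one. */
size_t mangle_like_old_mailer(unsigned char *buf, size_t len)
{
    size_t in, out = 0;

    assert(buf != NULL || len == 0);
    for (in = 0; in < len; in++) {
        unsigned char c = buf[in] & 0x7F;   /* high bit turned off */
        if (c == '\0')
            continue;                       /* NUL dropped */
        if (c == '\b') {
            if (out > 0)
                out--;                      /* preceding char deleted */
            continue;
        }
        buf[out++] = c;                     /* out <= in, so this is safe */
    }
    return out;
}
```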

Can you add to the list?

-- 
John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)

pdb@sei.cmu.edu (Patrick Barron) (11/22/87)

In article <408@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>	2. If the string "\n.\n" occurs, the tail end of the file (starting
>		at the '.') is discarded.

This isn't a bug, it's a feature.  And it can be disabled by putting
"unset dot" in your .mailrc file.

>In addition, I know of mailers that do the following:
>
>	3. High-order bits are turned off (or set to parity).
>	4. Null bytes are dropped.

Assuming that your mailer is RFC 822 compatible, these aren't bugs either.
RFC 822 specifies that mail messages can only contain 7-bit printable ASCII
characters, along with formatting codes like <CR> and <LF>.  If you really
have to mail such a file, you can use uuencode/uudecode, or some other
similar program.
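For reference, a sketch in C of how one uuencoded line is built, assuming the usual maximum of 45 input bytes per line. Every 3 input bytes become four 6-bit values, and each value v is printed as the character v + 0x20. Early uuencodes therefore emitted a space for a zero value; many later ones emit '`' instead, which matters whenever a mailer strips trailing spaces. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Map a 6-bit value to a printable character: v + ' ', or '`' for 0
 * when use_backquote is set, so that no line ends in a space. */
static char uu_enc(unsigned v, int use_backquote)
{
    v &= 077;
    if (v == 0 && use_backquote)
        return '`';
    return (char)(v + ' ');
}

/* Encode up to 45 bytes as one uuencoded line (without the newline):
 * a length character, then 4 characters per 3 input bytes.  `out`
 * must hold at least 62 bytes.  Returns the number of chars written,
 * not counting the terminating NUL. */
size_t uu_encode_line(const unsigned char *in, size_t n, char *out,
                      int use_backquote)
{
    size_t i, k = 0;

    assert(n <= 45 && out != NULL);
    out[k++] = uu_enc((unsigned)n, use_backquote);
    for (i = 0; i < n; i += 3) {
        /* A short final group is padded with zero bytes. */
        unsigned b0 = in[i];
        unsigned b1 = i + 1 < n ? in[i + 1] : 0;
        unsigned b2 = i + 2 < n ? in[i + 2] : 0;
        out[k++] = uu_enc(b0 >> 2, use_backquote);
        out[k++] = uu_enc((b0 << 4) | (b1 >> 4), use_backquote);
        out[k++] = uu_enc((b1 << 2) | (b2 >> 6), use_backquote);
        out[k++] = uu_enc(b2, use_backquote);
    }
    out[k] = '\0';
    return k;
}
```

For example, the three bytes "Cat" encode to the line `#0V%T` (the leading '#' is the length, 3 + 0x20), while a single zero byte encodes to '!' followed by four spaces unless the backquote option is used.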

>Can you add to the list?

How about:

   1) Mailers that delete trailing spaces from message lines, occasionally
      messing up uuencode/uudecode and other similar programs.
   2) Mailers that let "From:" addresses like "user@host.UUCP", "host!user",
      or "user@host.BITNET" escape on to the Internet without fixing the
      address (e.g., "user@host.UUCP" becomes "user%host.UUCP@gateway.do.main").
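A sketch in C of the rewrite a gateway ought to apply to such a From: address, producing the "%-hack" form in Pat's example. The function name, and the choice to handle only the @ form, are assumptions. A bang address such as "host!user" is rejected here and would need converting first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rewrite "user@host.UUCP" as "user%host.UUCP@gateway", so replies
 * come back through `gateway`.  Writes into `out` (size `outsz`).
 * Returns 0 on success, -1 if there is no '@' or `out` is too small. */
int percent_hack(const char *addr, const char *gateway,
                 char *out, size_t outsz)
{
    const char *at;
    int n;

    assert(addr != NULL && gateway != NULL && out != NULL);
    at = strrchr(addr, '@');
    if (at == NULL)
        return -1;
    /* local part, '%', the unreachable domain, then "@gateway" */
    n = snprintf(out, outsz, "%.*s%%%s@%s",
                 (int)(at - addr), addr, at + 1, gateway);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```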

--Pat.

billw@killer.UUCP (11/23/87)

In article <408@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>Hello.  I'm interested in characterizing the sorts of damage that the
>existing electronic mail systems can do to mail as they move it about.
[..]
>Can you add to the list?

The mailer at texsun, which about 1/4 of my mail is routed through, adds a
^M to the end of every line. I must manually strip them off of anything
that I plan to work with, like source.
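A small sketch in C of the stripping step, assuming the text is processed one line at a time (for instance as read by fgets()). `strip_cr` is a hypothetical helper; it removes only a carriage return at the end of the line and leaves any embedded ones alone.

```c
#include <assert.h>
#include <string.h>

/* Remove a carriage return just before the trailing newline, or at the
 * very end of the string, in place.  Returns the new length. */
size_t strip_cr(char *line)
{
    size_t len;

    assert(line != NULL);
    len = strlen(line);
    if (len >= 2 && line[len - 2] == '\r' && line[len - 1] == '\n') {
        line[len - 2] = '\n';      /* "...\r\n" becomes "...\n" */
        line[len - 1] = '\0';
        return len - 1;
    }
    if (len >= 1 && line[len - 1] == '\r') {
        line[len - 1] = '\0';      /* last line with no newline */
        return len - 1;
    }
    return len;
}
```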
-- 
Bill Wisner, HASA "A" Division		..{codas,ihnp4}!killer!billw
"I don't mind at all.." -- Bourgeois Tagg

david@ms.uky.edu (David Herron -- Resident E-mail Hack) (11/23/87)

In article <408@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>Hello.  I'm interested in characterizing the sorts of damage that the
>existing electronic mail systems can do to mail as they move it about.
>
>	1. Any occurrence of the string "\nFrom " has '>' inserted before
>		the 'F'.

Actually, the "From " line at the beginning of the file is a misfeature
(in this day and age ... when they came up with mail back then they
weren't connected to the arpanet, rfc822 hadn't been written, and all
sorts of fun things).

>	2. If the string "\n.\n" occurs, the tail end of the file (starting
>		at the '.') is discarded.

geeee ... 

>In addition, I know of mailers that do the following:
>
>	3. High-order bits are turned off (or set to parity).
>	4. Null bytes are dropped.

eh?  both of those things aren't part of normal ascii text.  Why would
you expect a system which is designed for passing normal ascii text
to do something reasonable with strange things?  If you wanna do something
like this then uuencode the file before mailing it.

>	5. If a backspace occurs, it and the preceding character are deleted.
>	6. ASCII tabs are expanded to some number of spaces.

Weeeell... my comment above sort-of counts here.  Also, different operating
systems treat tabs in different ways.  There is one OS around which sets
tab stops at 10 columns rather than 8.  Also, for some reason tabs MUST
be expanded with mail (I dunno why ... it simply must).  I have a
vague memory that the system in question is Multics, but may be mistaken.
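A sketch in C of tab expansion with the tab stop as a parameter, showing why the same message can come out differently depending on which system expanded it: with stops every 8 columns "a<TAB>b" puts the b in column 9, with stops every 10 in column 11. `expand_tabs` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Expand tabs in `src` into spaces, with tab stops every `tabstop`
 * columns.  Writes at most `outsz` bytes including the NUL; returns
 * the output length, or (size_t)-1 if `out` is too small. */
size_t expand_tabs(const char *src, int tabstop, char *out, size_t outsz)
{
    size_t k = 0;
    int col = 0;                       /* 0-based column of next char */

    assert(src != NULL && out != NULL && tabstop > 0 && outsz > 0);
    for (; *src != '\0'; src++) {
        if (*src == '\t') {
            /* Pad with spaces up to the next multiple of tabstop. */
            do {
                if (k + 1 >= outsz)
                    return (size_t)-1;
                out[k++] = ' ';
                col++;
            } while (col % tabstop != 0);
        } else {
            if (k + 1 >= outsz)
                return (size_t)-1;
            out[k++] = *src;
            col = (*src == '\n') ? 0 : col + 1;
        }
    }
    out[k] = '\0';
    return k;
}
```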

	7. Prepending "host!" to the From: lines of mail passing
	   through the site and going out through UUCP.

This is a problem because it creates unreplyable mail.  For instance,
we are on the mtxinu-users@emory.edu mailing list.  Before we joined
the Internet we were getting our mail from the list via uucp.  The
mail would arrive here with headers like:

	From: emory!someone@utah.edu

And the From: should have just been "someone@utah.edu".  Depending on
how we interpret the address, it will behave differently.  I suppose
that the people at emory wanted us to interpret ! first.  However we're
running MMDF and it's fairly tightly conformant to RFC-822.  It'll
interpret the @ first.  So the message trundles off to utah.edu who is
told to deliver the mail to emory!someone, which is almost certainly
not right because it'll cause the mail to be delivered to Georgia!
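To make the ambiguity concrete, here is a sketch in C of the two parsing rules applied to David's example. Under the RFC 822 rule the split is at the last '@'; under the UUCP rule it is at the first '!'. The function name and buffer-based interface are assumptions, and real mailers apply many more rules than this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Split an address into the next host and what that host is asked to
 * deliver to.
 *   prefer_at = 1: RFC 822 style, split at the LAST '@'
 *   prefer_at = 0: UUCP style, split at the FIRST '!'
 * Returns 0 on success, -1 if the separator is absent or a buffer is
 * too small. */
int next_hop(const char *addr, int prefer_at,
             char *host, size_t hostsz, char *rest, size_t restsz)
{
    const char *sep;
    int n1, n2;

    assert(addr != NULL && host != NULL && rest != NULL);
    if (prefer_at) {
        sep = strrchr(addr, '@');
        if (sep == NULL)
            return -1;
        n1 = snprintf(host, hostsz, "%s", sep + 1);
        n2 = snprintf(rest, restsz, "%.*s", (int)(sep - addr), addr);
    } else {
        sep = strchr(addr, '!');
        if (sep == NULL)
            return -1;
        n1 = snprintf(host, hostsz, "%.*s", (int)(sep - addr), addr);
        n2 = snprintf(rest, restsz, "%s", sep + 1);
    }
    if (n1 < 0 || n2 < 0 || (size_t)n1 >= hostsz || (size_t)n2 >= restsz)
        return -1;
    return 0;
}
```

With "emory!someone@utah.edu", the RFC 822 rule sends the message to utah.edu for "emory!someone", while the UUCP rule sends it to emory for "someone@utah.edu".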

Note that I'm only mentioning the problem at emory because it was the
first one which came to mind.  Many sites have this problem ...
It's a severe problem which should be fixed.

	8. Truncation of long lines.  For instance, mail on BITNET is
	   80 column PUNCH files sent to people's virtual reader.

	9. Silent discarding (or any other sort of discarding) of
	   mail which is "too long".  SendMail has a limit (configurable)
	   of message size ... it's usually set to something like 100K.
	   BUT the mail system on the main instructional system here
	   has a limit of 200 lines.  (!)  Further, it drops the
	   message silently if it's too long.

	   One of the common things to do here is print files at
	   cluster sites by mailing them from a unix machine to
	   "printer-name@ukpr.bitnet".  It works and happens to be
	   fairly fast ... however, if someone isn't careful
	   the file could be too long and you'd never know it.
-- 
<---- David Herron -- Resident E-Mail Hack     <david@ms.uky.edu>
<---- or:                {rutgers,uunet,cbosgd}!ukma!david, david@UKMA.BITNET
<---- "The market doesn't drop hundreds of points on a normal day..." --
<---- 		Fidelity Investments Corporation

jso@edison.GE.COM (John Owens) (11/25/87)

And don't forget the worst damage of all - ASCII/EBCDIC translation!
Since there's no one-to-one mapping, and different sites use different
translation tables, there's no way you can know what the mail will look
like when it gets through.  The characters most often mangled are those
in the ASCII ranges 5B-5F and 7B-7F (hex).  And, of course, tabs are expanded to
spaces and formfeeds are usually lost....

rshwake@irs1.UUCP (rshwake) (11/25/87)

	I don't know if the original poster's intent was to suggest that
prepending lines beginning with "From " with ">" constitutes damage. Since
the "From " string is used as a delimiter, separating one message from
another, SOME means is required to prevent lines beginning with such a
string from signaling the start of a new message.

	More critically, it would be TOO easy to fake a message from some
user if such potential delimiters were not masked.

					Ray Shwake
					IRS User Assistance Branch

blarson@skat.usc.edu (Bob Larson) (11/25/87)

In article <256@irs1.UUCP> rshwake@irs1.UUCP (rshwake) writes:
>	I don't know if the original poster's intent was to suggest that
>prepending lines beginning with "From " with ">" constitutes damage. 

I don't see how it could be considered anything but damage.

>Since
>the "From " string is used as a delimiter, separating one message from
>another, SOME means is required to prevent lines beginning with such a
>string from signaling the start of a new message.

This is true in some mail implementations.  They should NOT mess with
outgoing messages or messages just passing through, since not everyone
is using such poor software.

>	More critically, it would be TOO easy to fake a message from some
>user if such potential delimiters were not masked.

They could just figure out a way to delimit messages without messing
with the text of the message.  This isn't even an invertible bogosity.
--
Bob Larson		Arpa: Blarson@Ecla.Usc.Edu
Uucp: {sdcrdcf,cit-vax}!oberon!skat!blarson		blarson@skat.usc.edu
Prime mailing list (requests):	info-prime-request%fns1@ecla.usc.edu

henry@utzoo.UUCP (Henry Spencer) (11/25/87)

> Actually, the "From " line at the beginning of the file is a misfeature
> (in this day and age ...

Unfortunately, when Unix mail got RFC822ized (at Berkeley, I believe), it
did not occur to the people doing it that RFC822 format and the old Unix
format ("From " lines at the beginning) were two *different* formats and
that they should convert between them rather than smushing them together.
There is just no excuse for having the sender's address appear in two
different places in two different forms.  (Well, actually, nowadays it can
be handy to have a domainized address in "From:" and a bang form in "From ",
but that is a kludge if there ever was one.)
-- 
Those who do not understand Unix are |  Henry Spencer @ U of Toronto Zoology
condemned to reinvent it, poorly.    | {allegra,ihnp4,decvax,utai}!utzoo!henry

brian@ncrcan.UUCP (11/27/87)

In article <8991@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>> Actually, the "From " line at the beginning of the file is a misfeature
>> (in this day and age ...
>
>Unfortunately, when Unix mail got RFC822ized (at Berkeley, I believe), it
>did not occur to the people doing it that RFC822 format and the old Unix
>format ("From " lines at the beginning) were two *different* formats and
>that they should convert between them rather than smushing them together.
>There is just no excuse for having the sender's address appear in two
>different places in two different forms.  (Well, actually, nowadays it can
>be handy to have a domainized address in "From:" and a bang form in "From ",
>but that is a kludge if there ever was one.)

I agree!  I hate those "From " lines and "remote from" lines in mail messages.
I have been giving serious thought to hacking on smail so that it removes 
those lines.  As long as we have a domain mailer, I don't care how the mail
got here. 

Of course this means that all sites that mess with the "From: " line will
have to refrain from doing this :-).

Anyone have any reasons (besides the obvious one above) as to why I should
not go ahead and do this?

Brian.

-- 
 +-------------------+--------------------------------------------------------+
 | Brian Onn         | UUCP:..!{uunet!mnetor, watmath!utai}!lsuc!ncrcan!brian |
 | NCR Canada Ltd.   | INTERNET: Brian.Onn@Toronto.NCR.COM                    |
 +-------------------+--------------------------------------------------------+

daveb@geac.UUCP (11/30/87)

In article <471@ncrcan.Toronto.NCR.COM> brian@ncrcan.Toronto.NCR.COM (Brian Onn) writes:
|In article <8991@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
||| Actually, the "From " line at the beginning of the file is a misfeature
||| (in this day and age ...
|I agree!  I hate those "From " lines and "remote from" lines in mail messages.
|I have been giving serious thought to hacking on smail so that it removes 
|those lines.  As long as we have a domain mailer, I don't care how the mail
|got here. 
  You may... When the mailer frogs trying to reply.

|Of course this means that all sites that mess with the "From: " line will
|have to refrain from doing this :-).
|
|Anyone have any reasons (besides the obvious one above) as to why I should
|not go ahead and do this?

  Well, you probably want to do it in the mail display agent (mail
reader) and not in the transfer agent(s).  Otherwise you'll get
flamed by someone trying to get in and out of a domainized universe
to/from a path-based universe.

 --dave
-- 
 David Collier-Brown.                 {mnetor|yetti|utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind)
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.

alex@.UUCP (Alex Laney) (11/30/87)

In article <2181@killer.UUCP>, billw@killer.UUCP writes:
> In article <408@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
> >Hello.  I'm interested in characterizing the sorts of damage that the
> >existing electronic mail systems can do to mail as they move it about.
> [..]
> >Can you add to the list?

What I find annoying is mail spoolers that reverse the order of the mail
passing through them. This makes USENET news article replies arrive before
the original article! This, I know, is not damaging to the articles themselves,
but is part of the mailing/transport process.


-- 
Alex Laney   alex@xicom.UUCP   ...utzoo!dciem!nrcaer!xios!xicom!alex
Xicom Technologies, 205-1545 Carling Av., Ottawa, Ontario, Canada
We may have written the SNA software you use.
The opinions are my own.

coffin@xroads.UUCP (Chris Coffin) (12/01/87)

In article <471@ncrcan.Toronto.NCR.COM>, brian@ncrcan.UUCP writes:
> I have been giving serious thought to hacking on smail so that it removes 
> those lines.  As long as we have a domain mailer, I don't care how the mail
> got here. 
> 
> Anyone have any reasons (besides the obvious one above) as to why I should
> not go ahead and do this?

We here at crossroads are a new site on the net. We do not have
pathalias yet (waiting for the next posting). We did, however,
get smail2.5 from our news-feed and are using it as a
"smart-host". We have had problems with mail from our site being
bounced and wonder if mail to us has been getting bounced
because we are not in the pathalias database yet.

Chris Coffin

-- 
\  /  C r o s s r o a d s  C o m m u n i c a t i o n s
 \/   (602) 971-2240
 /\   (602) 992-5007 300|1200 Baud 24 hrs/day
/  \  ihnp4!mot!nud!xroads!coffin

henry@utzoo.UUCP (Henry Spencer) (12/02/87)

There is nothing wrong with having the mailer use "From " as a way of
finding the boundary of messages, and putting ">" in front to avoid false
boundaries arising from occurrences of that string in text, *provided*
that the transformation is reversible, and is in fact reversed when the
message leaves the mailer.  Unfortunately, neither of these conditions
is true of the existing scheme.
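One way to meet both of Henry's conditions, sketched in C: quote every line that matches ">*From " (zero or more '>' followed by "From "), not just "From ", and unquote by removing exactly one '>' from every line matching ">+From ". This is the scheme later known as "mboxrd"; the function names here are hypothetical. Because already-quoted lines gain one more '>', the original text can always be recovered.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* True if the line at p is zero or more '>' followed by "From ". */
static int is_from_line(const char *p)
{
    while (*p == '>')
        p++;
    return strncmp(p, "From ", 5) == 0;
}

/* quote = 1: add one '>' to every line matching ^>*From
 * quote = 0: remove one '>' from every line matching ^>+From
 * Returns a newly allocated string (caller frees), or NULL. */
char *from_requote(const char *text, int quote)
{
    size_t lines = 1;
    const char *p;
    char *out, *q;

    assert(text != NULL);
    for (p = text; *p != '\0'; p++)
        if (*p == '\n')
            lines++;
    out = malloc(strlen(text) + lines + 1);   /* at most one '>' per line */
    if (out == NULL)
        return NULL;

    q = out;
    for (p = text; *p != '\0'; ) {
        /* Here p is at the start of a line. */
        if (is_from_line(p)) {
            if (quote)
                *q++ = '>';          /* add one level of quoting */
            else if (*p == '>')
                p++;                 /* drop one level of quoting */
        }
        /* Copy the rest of the line and its newline, if any. */
        while (*p != '\0' && *p != '\n')
            *q++ = *p++;
        if (*p == '\n')
            *q++ = *p++;
    }
    *q = '\0';
    return out;
}
```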
-- 
Those who do not understand Unix are |  Henry Spencer @ U of Toronto Zoology
condemned to reinvent it, poorly.    | {allegra,ihnp4,decvax,utai}!utzoo!henry

matt@ncr-sd.UUCP (12/05/87)

In article <471@ncrcan.Toronto.NCR.COM> brian@ncrcan.Toronto.NCR.COM (Brian Onn) writes:
>I agree!  I hate those "From " lines and "remote from" lines in mail messages.
>I have been giving serious thought to hacking on smail so that it removes 
>those lines.  As long as we have a domain mailer, I don't care how the mail
>got here. 
>
>Of course this means that all sites that mess with the "From: " line will
>have to refrain from doing this :-).
>
>Anyone have any reasons (besides the obvious one above) as to why I should
>not go ahead and do this?

At least one From_ line is necessary at the beginning of a mail message
for the normal mailbox format.  The first one (without the '>') is used
to denote the start of a mail message in a mailbox file.  This convention
is used by /bin/mail, mailx, elm, etc.  If you are willing to use a
different scheme such as that used by MH then getting rid of the From_
lines might be reasonable.  As it is, you'd just be cutting your throat.

I have to agree that there can be too many From_ lines.  The solution is
to let smail (i.e. the message transport agent) collapse all the From_
lines to a single line.  Smail has supported this from at least version 1.3.
In smail2.5 the function rline() is defined at line 373 of the file
headers.c; this function collapses all the From_ lines into a single
line and also removes redundant host information.

-- 
Matt Costello	<matt.costello@SanDiego.NCR.COM>
+1 619 485 2926	<matt.costello%SanDiego.NCR.COM@Relay.CS.NET>
		{sdcsvax,cbosgd,pyramid,nosc.ARPA}!ncr-sd!matt

phil@amdcad.AMD.COM (Phil Ngai) (12/17/87)

In article <483@.UUCP> alex@.UUCP (Alex Laney) writes:
>What I find annoying is mail spoolers that reverse the order of the mail
>passing through them. This makes USENET news article replies arrive before
>the original article! This, I know, is not damaging to the articles themselves,
>but is part of the mailing/transport process.

This is, I believe, due to a misguided optimization by some uucps
which send the shortest files first. Do modern (i.e., HDB) uucps do so? 

-- 
Let's go to the mall and see how long people will wait for our parking space!

Phil Ngai, {ucbvax,decwrl,allegra}!amdcad!phil or amdcad!phil@decwrl.dec.com

mark@sickkids.UUCP (Mark Bartelt) (12/17/87)

In article <408@minya.UUCP> jc@minya.UUCP (John Chambers) writes:

> Hello.  I'm interested in characterizing the sorts of damage that the
> existing electronic mail systems can do to mail as they move it about.
> To start the ball rolling, I'll give a few examples.  First, the mail
> command that comes with most Unix systems:

[ ... ]

> 	2. If the string "\n.\n" occurs, the tail end of the file (starting
> 		at the '.') is discarded.

Are you sure?  Our /bin/mail (a truly ancient one, dating back to Seventh
Edition days, more or less) contains the following code ...

	onatty = isatty(0);
[ ... ]
	while (fgets(line, LSIZE, stdin) != NULL) {
		if (line[0] == '.' && line[1] == '\n' && onatty)
			break;
	[ etc. ]

... to avoid exactly that problem.  The /bin/mail that comes with 4.3bsd
contains different, but equivalent, code.  Are there *really* UNIX mailers
that exhibit that bug when passing mail between systems, or have you merely
inferred this because of the fact that a '.' bracketed by a pair of newlines
can be used as a message terminator from a terminal?
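The quoted fragment, filled out into a complete function as a sketch. The value of LSIZE and the names copy_body and copy_stdin are assumptions, not taken from the actual /bin/mail source. The '.' line ends the message only when onatty is set, so text piped in from a file or another program passes through intact.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>   /* isatty(), POSIX */

#define LSIZE 512     /* assumed line-buffer size */

/* Copy a message body from `in` to `out` a line at a time.  A line
 * holding only "." ends the message, but only when `onatty` is true,
 * i.e. when a person is typing at a terminal.  Returns the number of
 * lines (fgets() reads) copied. */
long copy_body(FILE *in, FILE *out, int onatty)
{
    char line[LSIZE];
    long n = 0;

    assert(in != NULL && out != NULL);
    while (fgets(line, LSIZE, in) != NULL) {
        if (line[0] == '.' && line[1] == '\n' && onatty)
            break;
        fputs(line, out);
        n++;
    }
    return n;
}

/* How a mail command would use it: interactivity is decided by
 * whether file descriptor 0 is a terminal. */
long copy_stdin(FILE *out)
{
    return copy_body(stdin, out, isatty(0));
}
```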

> Can you add to the list?

I'd be delighted.

I consider complaints about minor mangling of messages (for example, the
"From" ==> ">From" controversy; talk about getting worked up about trivia!)
to be hardly worth discussing, when compared to the real disasters:  Mailers
that diddle with headers, especially when they diddle in demonstrably WRONG
ways.  For example, one of the sites through which mail from here to other
places often passes (it shall remain nameless, to protect the guilty) seems
to like to play jokes on the recipients, by mis-identifying the senders.
Suppose a user at our site, "bozo", sends mail to a friend "zippy", using
"neighbor!somewhere!another!cleveland" as a path.  Bozo addresses mail to ...

	neighbor!somewhere!another!cleveland!zippy

... but when the mail arrives at its destination, the "From" line reads ...

	From: bozo@somewhere.uucp

Now if the intermediate mailer had mangled the header to read ...

	From: bozo@oursite.uucp

... I'd be only a tiny bit upset.  But if the recipient of this message
uses a "reply" option with one of the so-called "intelligent" mailers
plaguing us these days, some poor bozo at the wrong site will be getting
the replies intended for our bozo.  Thus far, we've been able to avoid
that problem by telling all our recipients to send mail to the address
which will actually work, but the behaviour of the mailer (hard to tell
whether it's at site "somewhere" or "another") is rather annoying.

This of course brings up another complaint:  Mailers that ignore explicit
uucp routings, and choose one of their own if the all-wise pathalias deems
it to be better.  I have no objection to using a pathalias-generated path
if mail is addressed to "someone@site.uucp", but if a sender explicitly
specifies a path, intermediate mailers have no business messing with it.
If mailers didn't misbehave in this way, we could avoid the first problem
above by routing mail to avoid "somewhere" and "another".  But, sadly, if
"neighbor" thinks that "somewhere!another" is the best path to "cleveland",
there's nothing we can do about it.  Sigh.

-----
				Mark Bartelt
				Hospital for Sick Children, Toronto
				416/598-6442
				{utzoo,decvax,ihnp4}!sickkids!mark

honey@umix.cc.umich.edu (Peter Honeyman) (12/18/87)

not so, phil, or at least not exactly.  some old versions of uucp
delivered in directory order.  honey danber delivers in sequence
order.  (there's a small problem at sequence number wraparound,
which embarrasses me.)

however, the behavior you describe, delivering small files before
large ones, pertains to sendmail, or so allman told me.  (having
never run it, i'm no authority on sendmail.)
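A sketch in C contrasting the two delivery orders discussed here, using a hypothetical queue entry that records a sequence number and a size. Sorting smallest first lets a short reply overtake the longer article it answers; sorting by sequence number keeps arrival order. The plain comparison below ignores the wraparound problem Peter mentions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical queue entry: arrival sequence number and size in bytes. */
struct qentry {
    unsigned seq;
    long size;
};

/* Arrival order: lower sequence number first (no wraparound handling). */
static int by_seq(const void *a, const void *b)
{
    const struct qentry *x = a, *y = b;
    return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Smallest file first. */
static int by_size(const void *a, const void *b)
{
    const struct qentry *x = a, *y = b;
    return (x->size > y->size) - (x->size < y->size);
}

/* Sort the queue into the order in which it will be sent. */
void order_queue(struct qentry *q, size_t n, int smallest_first)
{
    assert(q != NULL || n == 0);
    qsort(q, n, sizeof *q, smallest_first ? by_size : by_seq);
}
```

With an article queued as { 7, 4000 } and a reply as { 8, 300 }, smallest-first sends the reply (sequence 8) before the article.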

	peter

romain@pyrnj.uucp (Romain Kang) (12/30/87)

In article <77@sickkids.UUCP> mark@sickkids.UUCP (Mark Bartelt) writes:
| In article <408@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
| > 	2. If the string "\n.\n" occurs, the tail end of the file (starting
| > 		at the '.') is discarded.
|
| [...]  The /bin/mail that comes with 4.3bsd
| contains different, but equivalent, code.  Are there *really* UNIX mailers
| that exhibit that bug when passing mail between systems, or have you merely
| inferred this because of the fact that a '.' bracketed by a pair of newlines
| can be used as a message terminator from a terminal?

BSD mail maintainers take note:

There are a great many 4.2-based /bin/rmail's still out there (including
Pyramid's, *blush*) that invoke sendmail without the -i option; this means
that sendmail will use "\n.\n" as a message terminator and flush anything
else that rmail feeds it.

Thus if I run the following shell script, John's bug surfaces:

#! /bin/sh
/bin/rmail $USER << EoF
From adm Tue Dec 29 04:00 EST 1987 remote from test

IMPORTANT MESSAGE FOLLOWS:
.
.
.
***UPDATE /usr/lib/acct/holidays WITH NEW HOLIDAYS***
EoF

--
Romain Kang		{allegra,cmcl2,mirror,pyramid,rutgers}!pyrnj!romain
Pyramid Technology Corp. / 10 Woodbridge Center. Dr / Woodbridge, NJ  07095

"Eggheads unite! You have nothing to lose but your yolks!" -Adlai Stevenson

daveb@geac.UUCP (David Collier-Brown) (01/01/88)

>In article <408@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>
>> Hello.  I'm interested in characterizing the sorts of damage that the
>> existing electronic mail systems can do to mail as they move it about.
>> Can you add to the list?
>
In article <77@sickkids.UUCP> mark@sickkids.UUCP (Mark Bartelt) writes:
>I'd be delighted.
>
>I consider complaints about minor mangling of messages (for example, the
>"From" ==> ">From" controversy; talk about getting worked up about trivia!)
>to be hardly worth discussing, when compared to the real disasters:  Mailers
>that diddle with headers, especially when they diddle in demonstrably WRONG
>ways.

  Another interesting form of header munging is to assume that if I
reach site far via near, that far is a subdomain of near.
  Case in point?  If I send to near-host!medium-distance-host!somewhere!fred,
fred gets handed a message which claims that I'm
	daveb@geac.near-host.medium-distance-host[.uucp]
  I can assure you that geac is not a subdomain of near-host, much less
medium-distance-host.  Geac is the only Canadian mainframe
manufacturer, not a subdomain of some #$!&&%*+@!?? unregistered
domain... [Gee I'm grumpy today: happy new year?]
  This can make it hard for fred to reply to me unless he transforms
the address back to !s or to <@medium-distance-host,@near-host:daveb@geac>.
As you might guess, fred gets annoyed with me...
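A sketch in C of turning a bang path into an RFC 822 route-addr of the form <@h1,@h2:user@h3>, which is what fred would need in order to reply. The example reply path medium-distance-host!near-host!geac!daveb is reconstructed from the route described above, and the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Append `len` bytes of `s` at out[*k], keeping room for the NUL.
 * Returns -1 if `out` (of size `outsz`) would overflow. */
static int put(char *out, size_t outsz, size_t *k, const char *s, size_t len)
{
    if (*k + len + 1 > outsz)
        return -1;
    memcpy(out + *k, s, len);
    *k += len;
    out[*k] = '\0';
    return 0;
}

/* Convert "h1!h2!...!hn!user" into "<@h1,@h2,...:user@hn>"; a single
 * host ("hn!user") gives "<user@hn>".  Returns 0 on success, -1 on a
 * malformed path or a too-small `out`. */
int bang_to_route(const char *path, char *out, size_t outsz)
{
    const char *user, *last_host, *p;
    size_t k = 0;
    int first = 1;

    assert(path != NULL && out != NULL && outsz > 0);
    user = strrchr(path, '!');
    if (user == NULL || user == path || user[1] == '\0')
        return -1;
    last_host = user;                 /* the final '!' */
    user++;                           /* the user name follows it */
    while (last_host > path && last_host[-1] != '!')
        last_host--;                  /* back up to the start of hn */
    if (last_host == user - 1)
        return -1;                    /* empty final host */

    if (put(out, outsz, &k, "<", 1) < 0)
        return -1;
    /* Relay hosts h1 .. h(n-1) become "@h1,@h2,...:". */
    for (p = path; p < last_host; ) {
        const char *bang = strchr(p, '!');
        if (bang == p)
            return -1;                /* empty relay host */
        if (put(out, outsz, &k, first ? "@" : ",@", first ? 1 : 2) < 0 ||
            put(out, outsz, &k, p, (size_t)(bang - p)) < 0)
            return -1;
        first = 0;
        p = bang + 1;
    }
    if (!first && put(out, outsz, &k, ":", 1) < 0)
        return -1;
    /* Finally "user@hn>". */
    if (put(out, outsz, &k, user, strlen(user)) < 0 ||
        put(out, outsz, &k, "@", 1) < 0 ||
        put(out, outsz, &k, last_host, (size_t)(user - 1 - last_host)) < 0 ||
        put(out, outsz, &k, ">", 1) < 0)
        return -1;
    return 0;
}
```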
-- 
 David Collier-Brown.                 {mnetor|yetti|utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind)
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.