[net.mail.headers] Mail looping

PKARP@SRI-IU.ARPA (Peter Karp) (02/22/86)

I believe I have at least a good theoretical understanding of how
to prevent mail loops.  In the previous messages on this topic it
hasn't been clear to me how one could theoretically prevent mail
loops, or if this is even possible.  I believe I do know how to do
it in theory; if you all buy this argument then we can talk about
the implementation later.

I apologize if this is obvious to everyone; it wasn't obvious to me and on
re-consideration of the messages I've seen it still doesn't appear obvious.

Consider the following example.  A message originates on Host-A, and is
sent to a mailing list called "LIST@Host-B".  One element of LIST on Host-B
is the address SUB-LIST-1@Host-C.  An element of SUB-LIST-1 on Host-C is
SUB-LIST-2@Host-B, from which it gets distributed to various individuals.
Let us postulate that the original message was also sent to "USER@Host-B",
and that "USER@Host-B" is also a member of SUB-LIST-1. Thus,
"USER@Host-B" should receive two copies of the message: one direct from
Host-A, and one with return path: @Host-C,@Host-B:Originator@Host-A.

Notice that the "same message" gets routed through Host-B several times,
and that it would be incorrect for Host-B to think it has detected a loop
simply based on the Message-ID created by the message originator (this has
been pointed out before).  Also note that the "same message" gets sent to
the same recipient several times (USER@Host-B), and it would also be
incorrect for Host-B to suppress the duplicate simply because it sees two
messages with the same Message-ID going to the same recipient.  Both of
these conditions look like loops but are not.


So, what's the solution?  Consider an abstraction.  Imagine that a mail
message is simply a packet getting switched between the nodes of a
network. Mail packets are special in that any one packet can be duplicated
into several other packets at other nodes as mailing lists are expanded.
These child packets then go their own way in the network.  There are two
structures of interest here.  One is the path a given packet follows through
the network. The other is the mailing-list-based packet-synthesis tree
which shows how one initial message packet gets duplicated into a whole
swarm of child packets which eventually get either dropped on the floor or
land in someone's mailbox.

A loop condition occurs when a packet P with the following properties has
arrived at a network node:
	a) either that same packet or one of its ancestors in the packet-
	synthesis tree, P',  has been to that node before.

	b) Packet P and packet P' were both addressed to the same
	recipient.

How can a host detect a loop condition?  Simple: when it relays a packet
it puts a mark on that packet which it will recognize if that packet or
any of its descendants ever arrives at that host again.  It also must record
what recipients the packet was destined for, and check all incoming
packets to determine if it has seen them or an ancestor of theirs before,
addressed to the same recipient(s).
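
The marking scheme above can be sketched in a few lines of Python.  This is
only an illustration under stated assumptions, not a proposed implementation:
the names Packet, Host, relay and is_loop are invented here, and the sketch
assumes the mark travels inside the packet as a set of (host, recipient)
pairs that child packets inherit, rather than being kept in a table on the
relaying host.

```python
from dataclasses import dataclass


class LoopDetected(Exception):
    """Raised when a host sees a packet it (or an ancestor of it) already handled."""


@dataclass(frozen=True)
class Packet:
    # Hypothetical model: one recipient per packet, plus the marks left by
    # every host that relayed this packet or one of its ancestors.
    recipient: str
    marks: frozenset = frozenset()


class Host:
    def __init__(self, name):
        self.name = name

    def is_loop(self, packet):
        # Conditions (a) and (b): this host has already relayed the packet or
        # an ancestor of it, and that earlier packet had the same recipient.
        return (self.name, packet.recipient) in packet.marks

    def relay(self, packet, expansion):
        """Expand packet into one child per address in expansion."""
        if self.is_loop(packet):
            raise LoopDetected(f"{self.name} already handled {packet.recipient}")
        # Children inherit all ancestor marks plus this host's new mark.
        marks = packet.marks | {(self.name, packet.recipient)}
        return [Packet(address, marks) for address in expansion]
```

In the Host-A/Host-B/Host-C example, the packet for SUB-LIST-2@Host-B that
returns to Host-B carries the mark (Host-B, LIST@Host-B), which does not match
its current recipient, so it is relayed normally.  Only if some later
expansion addressed a descendant to LIST@Host-B again would Host-B refuse it.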


So again, if this looks right we can start worrying about implementation.


Peter
-------

stef@uci-icsa.ARPA (Einar Stefferud) (02/23/86)

Long ago in a discussion group not too far away, we discussed one
aspect of this "distribution" problem that I think is relevant in this
current looping discussion.  The original topic was failure return
addresses, and hinged on the question of what is legit for a
distributor to do to a message it is distributing to a list.

Can it ethically and morally change the From: Sender: Reply-to: or
Return-Path: fields to divert mail that might result from downstream
events like failure, or reply, or whatever?

Also, is it a violation of the sanctity of the seal for a LIST
DISTRIBUTOR to look inside the header in the first place?

As I recall the discussion, it was concluded that a LIST DISTRIBUTOR is
in fact operating as a "USER AGENT" and not as a "MAIL TRANSFER AGENT"
so it is fine for the list distributor to do anything its administrator
wants it to do to the content of any distributed item.  If there is any
kind of contract between the administrator and the subscribers, it is
that agreement that governs the administrator's actions.

With this in mind, it seems simple enough to me for any LIST
DISTRIBUTOR to put a special header field (like DIST-ID: <unique-id>)
which it can look for in any incoming item.  If it sees such a thing it
should shunt it aside for manual inspection, and this will prevent
loops through that LIST DISTRIBUTOR, as long as no intervening transfer
agents or user agents or LIST DISTRIBUTORS, et al, remove or change the
magic <unique-id> in any DIST-ID: fields.
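
A minimal sketch of the DIST-ID check, using Python's standard email package.
The function name distribute, the return values "send" and "hold", and the
sample id are assumptions made for illustration; only the DIST-ID field name
comes from the text above.

```python
from email.message import EmailMessage

# Hypothetical unique id for one particular list distributor.
DIST_ID = "<list-dist.1@host-b>"


def distribute(msg, dist_id=DIST_ID):
    """Return "hold" if msg already passed through this distributor,
    otherwise stamp it with our DIST-ID and return "send"."""
    seen = msg.get_all("DIST-ID", [])
    if dist_id in (str(value).strip() for value in seen):
        # Our own id came back: shunt aside for manual inspection.
        return "hold"
    # Add our id alongside any DIST-ID fields from other distributors.
    msg["DIST-ID"] = dist_id
    return "send"
```

Each distributor adds its own field and leaves the others alone, so a message
that has passed through several lists carries several DIST-ID fields.  As
noted, this works only while nothing downstream strips or rewrites them.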

This does not take care of other kinds of loops, but it would simply
take care of LIST DISTRIBUTOR LOOPS.

For those of you who are wondering what was decided about the earlier
failure return question, we simply decided to have the list distributor
insert a new "RETURN-PATH: list-request@host" field in place of the
original RETURN-PATH field, without deciding what to do with the old one.

 ---- Stef

netinfo%ucbjade@ucb-vax.ARPA (Postmaster + BITINFO) (02/26/86)

Handling of mailing list error messages (Reply-to, Errors-to, etc) was
discussed at length in this mailing list group about a year or two ago.

The final conclusion was that "mailing list exploders" should generate
a new message (ie. with a new Message-ID). (The "Errors-to" header was
also disapproved if I remember rightly.)

In reply to:

	Date: Fri 21 Feb 86 11:47:35-PST
	From: Peter Karp <PKARP@sri-iu.arpa>
	Subject: Mail looping
	To: header-people@mc.lcs.mit.edu
	Cc: pkarp@sri-iu.arpa
	Message-Id: <VAX-MM(174)+TOPSLIB(114)+PONY(0)
		21-Feb-86 11:47:35.SRI-IU.ARPA>

	... Thus,
	"USER@Host-B" should receive two copies of the message: one direct from
	Host-A, and one with return path: @Host-C,@Host-B:Originator@Host-A.

The first message would have the original Message-ID and the new message
would have a new Message-ID generated by the exploder.

	Notice that the "same message" gets routed through Host-B
	several times, and that it would be incorrect for Host-B to
	think it has detected a loop simply based on the Message-ID
	created by the message originator (this has been pointed out
	before).  Also note that the "same message" gets sent to the
	same recipient several times (USER@Host-B), and it would also
	be incorrect for Host-B to suppress the duplicate simply
	because it sees two messages with the same Message-ID going to
	the same recipient.  Both of these conditions look like loops
	but are not.

To avoid this problem, I suggest that each SUB-LIST exploder also replace
the Message-ID with a new Message-ID. (ie. Mailing list exploders at any
level should generate a new message.)

The mail exploder should also change the envelope and/or mail heading so
that error messages are returned to that exploder's administrator.
(ie. to the lowest level exploder).
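
As a rough sketch of what such an exploder might do, using Python's standard
email package: the function explode, its parameters, and the choice of one
new Message-ID per explosion (shared by all copies) are assumptions made for
illustration, not something settled in the earlier discussion.

```python
import copy
from email.message import EmailMessage
from email.utils import make_msgid


def explode(msg, members, admin, domain="host-b"):
    """Return a list of (envelope_sender, envelope_recipient, message).

    The exploded copies form a new message: they share one fresh Message-ID,
    and the envelope sender is the list administrator so that error messages
    come back to this exploder rather than to the originator.
    """
    new_id = make_msgid(domain=domain)  # domain is a hypothetical host name
    copies = []
    for member in members:
        child = copy.deepcopy(msg)      # leave the incoming message untouched
        del child["Message-ID"]         # no error if the field is absent
        child["Message-ID"] = new_id
        copies.append((admin, member, child))
    return copies
```

With this arrangement the copy USER@Host-B receives directly from Host-A keeps
the originator's Message-ID, while the copy that came through the lists
carries the id assigned by the last exploder it passed through.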

There is a basic assumption about mailing lists that may be wrong.  Can
we assume that list mail goes from the root to all branches without going
through the same node twice?

Bill Wells
netinfo%ucbjade@Berkeley.EDU

MRC%PANDA@SUMEX-AIM.ARPA (Mark Crispin) (02/26/86)

Bill Wells -

     You cannot assume that list mail goes from the root to all branches
without going through the same node twice.  This is especially true with
mailers which do not differentiate between "mail exploders", "mail aliases",
and "mail forwardings".  Add to this swamp the large numbers of individuals
who regularly use multiple hosts (I have two primary, two secondary, and
dozens of tertiary Internet hosts in addition to my DEC-20 at home) and
who have "classed mailboxes" and forwardings all over the place and it isn't
hard to see how you can lose big.

     I'd argue that if you really care about loop abatement you will use
what SMTP provides to help you out from this -- the EXPN command.  That is,
the sender fully resolves the destination list.  It isn't hard to detect
loops when you do this.
-------
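
To illustrate why loops are easy to spot once the sender has the full
expansion, here is a small Python sketch.  It assumes the expansions have
already been collected into a dictionary (called expand here, a hypothetical
name); in practice each lookup would be an EXPN query to the host concerned,
and nothing below talks to a real mail server.

```python
def resolve(address, expand, path=()):
    """Fully resolve address into final recipients.

    expand maps a list address to its members; any address not in the
    mapping is treated as an individual mailbox.  Returns a pair
    (recipients, loops), where each loop is the chain of list addresses
    that leads back to an address already being expanded.
    """
    if address in path:
        # Already expanding this list further up the chain: a loop.
        start = path.index(address)
        return set(), [path[start:] + (address,)]
    members = expand.get(address)
    if members is None:
        return {address}, []
    recipients, loops = set(), []
    for member in members:
        found, bad = resolve(member, expand, path + (address,))
        recipients |= found
        loops += bad
    return recipients, loops
```

A list reached twice by different branches is not reported, only a list that
contains itself somewhere in its own expansion.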

pallas@su-pescadero.ARPA (Joseph I. Pallas) (02/27/86)

Mark, you of all people should know just how futile it would be to attempt
to resolve the entire destination list of a message using EXPN.  Aside from
the fact that some of the destinations may go outside of SMTP, there's no
guarantee that EXPN will return meaningful information.  Several implementations
don't support it, and most that do don't deal correctly with naming-domain
boundary crossings (including yours).

joe

GUMBY@mit-mc.ARPA (David Vinayak Wallace) (02/27/86)

    Date: Tue, 25 Feb 86 21:41:33 pst
    From: netinfo%ucbjade at BERKELEY.EDU (Postmaster + BITINFO)

    The final conclusion was that "mailing list exploders" should generate
    a new message (ie. with a new Message-ID). (The "Errors-to" header was
    also disapproved if I remember rightly.)

    ...I suggest that each SUB-LIST exploder also replace
    the Message-ID with a new Message-ID. (ie. Mailing list exploders at any
    level should generate a new message.)

What if I accidentally get two copies of a message?  If they've come
through different forwarders there'll be no way for my mail reader to
detect this condition and delete excess copies.
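
The concern can be made concrete with a small Python sketch of a mail reader
that drops duplicates by Message-ID.  The function name and the use of plain
dictionaries for messages are assumptions made for illustration.

```python
def drop_duplicates(messages):
    """Keep the first message seen for each Message-ID.

    messages is a list of dicts with an optional "Message-ID" key
    (a hypothetical stand-in for parsed mail headers).
    """
    seen, kept = set(), []
    for msg in messages:
        msg_id = msg.get("Message-ID")
        if msg_id is not None and msg_id in seen:
            continue  # same id as an earlier message: treat as a duplicate
        if msg_id is not None:
            seen.add(msg_id)
        kept.append(msg)
    return kept
```

Two copies that still carry the originator's Message-ID collapse into one,
but two copies whose ids were replaced by different forwarders look like
unrelated messages and are both kept.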

netinfo%ucbjade@ucb-vax.ARPA (Postmaster + BITINFO) (02/27/86)

In reply to:

	Date: Wed 26 Feb 86 00:12:44-PST
	From: Mark Crispin <MRC%PANDA@sumex-aim.arpa>
	Subject: Re: Mail looping
	Message-Id: <12186377826.8.MRC@PANDA>

	     I'd argue that if you really care about loop abatement you will
	use what SMTP provides to help you out from this -- the EXPN command.
	That is, the sender fully resolves the destination list.  It isn't
	hard to detect loops when you do this.
	-------

Unfortunately, mail lists are not limited to the SMTP/RFC822 mail world. So
this is only a partial solution.  I suspect there is not one fail-safe method
and that a variety of methods will need to be used to reduce looping.

One alternative that is less subject to looping is the USENET news distribution
system.  It is available for Unix systems and shortly will be available for
IBM CMS systems. I would prefer to see large mailing lists converted to
news groups. It is much nicer to see one message going between news/conference
systems than to see several duplications of the same message resulting
from mail list explosion being transmitted.  I suggest that the USENET
article/batch formats be adopted as a standard method of transporting
collective address messages between news and conferencing systems.

Bill Wells
netinfo%ucbjade@Berkeley.EDU