[net.mail.headers] loop control

gds@SRI-SPAM.ARPA (02/19/86)

Hello everyone.  Remember this event?

> From @MIT-MC.ARPA:mcvax!ace!teus@seismo.CSS.GOV  Wed Dec 11 01:37:39 1985
> Received: from MIT-MC.ARPA by sri-spam.arpa (5.4/4.16)
> 	id AA01923; Wed, 11 Dec 85 01:37:39 PST
> Received: from seismo.CSS.GOV by MIT-MC.ARPA 10 Dec 85 17:28:49 EST
> Return-Path: <mcvax!ace!teus>
 ...
> Message-Id: <8512101018.AA16148@sunrel1>

This was the solution I previously proposed.

> It might be the case that we also want to modify our mailers so that
> upon receipt and full delivery of a message with a given Message-ID, all
> subsequent messages received with that same Message-Id are dropped on
> the floor (no need to advise anybody automagically of duplicates, since
> we don't want to see exponential growth of looping mail :-).  I don't
> know if this is an MHS requirement or not but it is essential in a
> conferencing system (where the same message might come from different
> nodes in a network) and would relieve us of seeing duplicates.  The only
> problem is that not all mailers generate Message-IDs at the source of
> the message, plus a redistributor may replace (or add) its own
> Message-ID.  We should limit the number of Message-IDs to one per
> message.

Given the growing number of "conferences" (actually Usenet groups)
which are fed into the ARPA Internet as mailing lists, I think we should
honor the Message-ID's generated at their source and refuse to
process anything arriving with a Message-ID that has previously been
processed.  Although Usenet lists arrive in the ARPA Internet through
single points of entry (commonly known as "gateways"), the increasing
load to these machines may cause them to use multiple entry points.
This increases the possibility of a loop if the gateways have common
forwarding entries.  Also, in our current mail environment, mail loops
arise for other, more obscure reasons.

I've never looked into the part of sendmail which handles Message-IDs,
but I imagine it wouldn't be too difficult to modify sendmail to refuse
to process a message with an ID which has already been processed.
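
A minimal sketch of the duplicate-dropping idea above, written in Python
rather than as a sendmail modification.  The class name DuplicateFilter
and the in-memory set are assumptions made for illustration; nothing here
reflects how sendmail actually stores or handles Message-IDs.

```python
from email import message_from_string


class DuplicateFilter:
    """Remember Message-IDs already handled and silently drop later copies."""

    def __init__(self):
        # Hypothetical in-memory store; a real mailer would need to persist it.
        self.seen = set()

    def should_deliver(self, raw_message):
        msg = message_from_string(raw_message)
        msg_id = msg.get("Message-Id")  # header lookup is case-insensitive
        if msg_id is None:
            # No ID at the source: nothing to compare against, so deliver.
            return True
        msg_id = msg_id.strip()
        if msg_id in self.seen:
            return False  # dropped on the floor, nobody is notified
        self.seen.add(msg_id)
        return True
```

A message without a Message-ID passes straight through, which is the
weakness already noted in the quoted proposal.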

There is a fundamental difference between Mailnet, Usenet and the ARPA
mailing lists, in that the ARPA lists are not currently conferenced.  Is
anyone here on the multimedia conferencing project?  Will standards be
set for converting our mailing lists into conferences, so we can read
the mail as a conference?  Also, the NNTP work can help solve looping
problems caused by generation of duplicate postings.

--gregbo

dennis@CSNET-SH.ARPA (Dennis Rockwell) (02/20/86)

	From: Greg Skinner <gds@sri-spam.ARPA>
	Date: Wed, 19 Feb 86 00:07:48 PST
	Subject: loop control

	[ ... ] I think we should
	honor the Message-ID's generated at their source and refuse to
	process anything arriving with a Message-ID that has previously been
	processed.

For end-delivery duplicate deletion, that would work OK, except in the
presence of resenders (not forwarders) that add annotations.  However, for
loop detection and breaking, the stroke is too broad; some simple cases
break down.  Consider our case (RELAY.CS.NET, admittedly not your typical
host): we see a submission go by for header-people from one of our PhoneNet
sites with message-id 1234.foo@pudunk.edu.  Your scheme would have us drop
the posting of that message to everybody behind relay.cs.net!  That doesn't
seem quite right, somehow.  With more local hosts being hidden behind mail
relays (when the MX namesolver implementations percolate out), this sort of
situation will be much more common than it is now.  It's already bad for
relay.cs.net, because we'll accept any mail message that we think we can
deliver, so some sites on the Internet use us as a staging host to get to
hard-to-reach sites.

The list exploder can't change the message-id, or direct recipients wouldn't
be able to filter out duplicates.

I'm all for loop control, but more information needs to be processed to
determine whether there's been a loop.  We use a very generous hop count,
and we would like something that would catch loops sooner.  Possibly a
repeated pattern or sequence of received postmarks would do it?
Unfortunately, that requires that at least two mailers in the loop insert
postmarks, which *most* mailers do (if only one mailer does, it reduces to
hop count, which is probably a good last-ditch scheme).  Of course, any
mailer trying to do loop detection should add its own postmark before
checking for a loop!  I don't think that simply checking for our own
postmark would work, either; consider a mail message being resent (as
opposed to forwarded) to multiple people or mailing lists by multiple
senders; all messages would have the original message-id.
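
One way the "repeated sequence of received postmarks" idea might be
sketched in Python.  The function names, the "by HOST" regular expression,
and the min_period and repeats parameters are all assumptions for
illustration; real Received lines vary far more than this pattern allows,
and a legitimate resend could in principle also produce a repeated
sequence.

```python
import re

# Assumed shape of a postmark: "Received: from X by HOST ..."
BY_HOST = re.compile(r"\bby\s+([^\s;]+)", re.IGNORECASE)


def postmark_hosts(received_lines):
    """Return the stamping hosts oldest-first.

    Received lines appear newest-first in a header, so the list is reversed.
    The caller is expected to have added its own postmark already.
    """
    hosts = []
    for line in received_lines:
        match = BY_HOST.search(line)
        if match:
            hosts.append(match.group(1).lower())
    hosts.reverse()
    return hosts


def has_repeated_tail(hosts, min_period=2, repeats=2):
    """True if the path ends with the same block of hosts `repeats` times.

    min_period=2 reflects the point that a single stamping mailer only
    gives a hop count, not a pattern.
    """
    n = len(hosts)
    for period in range(min_period, n // repeats + 1):
        tail = hosts[n - period:]
        if all(hosts[n - (i + 1) * period:n - i * period] == tail
               for i in range(repeats)):
            return True
    return False
```

The check is a handful of list comparisons per message, so the cost is
small compared with parsing the header itself.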

What heuristics do *you* use when looking at a message to detect a loop? Can
you write code that executes in reasonable time (specifically, that you
would be willing to add to the incoming message processing) to implement
it?  This isn't the AI Digest, but...

Dennis Rockwell
CSNET Technical Staff

BLARSON%ECLD@USC-ECL.ARPA (Bob Larson) (02/20/86)

It seems to me that the best idea is to have each mailing list keep
a list of message id's that have been sent to the list, and avoid
resending a message with a duplicate id.  (Preferably complaining
to the list maintainer.)  This of course does not work if something
deletes/changes the message id, so should be backed up with another
form of loop control.
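
A rough Python sketch of the per-list bookkeeping described above, with a
hop-count fallback for messages whose ID is missing.  ListExploder, the
(recipient, action) return shape, and the max_hops default are
hypothetical choices for illustration, not a description of any existing
mailer.

```python
class ListExploder:
    """Per-list record of message IDs already sent to this list."""

    def __init__(self, maintainer, members, max_hops=30):
        self.maintainer = maintainer
        self.members = list(members)
        self.max_hops = max_hops  # assumed limit for the fallback check
        self.sent_ids = set()     # IDs already resent to this list

    def handle(self, msg_id, hop_count):
        """Return a list of (recipient, action) pairs for one message."""
        if msg_id is None:
            # ID deleted or never generated: fall back to hop counting.
            if hop_count > self.max_hops:
                return [(self.maintainer, "hop limit exceeded")]
        elif msg_id in self.sent_ids:
            # Duplicate: do not resend, complain to the list maintainer.
            return [(self.maintainer, "duplicate " + msg_id)]
        else:
            self.sent_ids.add(msg_id)
        return [(member, "deliver") for member in self.members]
```

Because each list keeps its own table, the same ID can still be sent once
to every list that receives it.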

(P.S: I just added mailing lists to our local prime mailer--
currently without any form of loop control.  I should probably be
planning for the future, especially since we do have a (flaky)
link to the outside world...)

Bob Larson
Blarson@Usc-Ecl.Arpa
sdcrdcf!oberon!blarson
-------

GRZ027%DBNGMD21.BITNET@ucb-vax.A (Peter Sylvester +49 228 303245) (02/20/86)

Bob:

How long do You want to keep ids of old messages in Your system?
One year? Ten years? How much disk space do You have?
Although the idea is correct, I guess it is not practicable in a large
network environment.

Our local mailer has mailing lists. When messages arrive from
the outside, it knows that it must not use outside addresses.
This avoids loops in that special case.

The situation we are talking about is that we have cascades of
lists with one list pointing to the other or a loop in a list.

We should try to avoid such situations:

First, we should distinguish between two types of lists. The first
type are general redistribution lists, i.e. lists that have a "global"
target. Those lists should not be placed into other global lists.
The second type are more local lists that cover a local node
or a subdomain. Those lists can be contained in global lists
and can contain global lists when the following procedure is used:
The list processor must have a "responsibility list", i.e. a list
or an algorithm to determine which members of the target are to be
selected. A special case is a "local" redistribution list,
where messages are delivered to all local members when the message
comes from outside. When the message comes from a local user,
the message can be delivered to all users, or to remote users only
when it is known that one of the remote users is a global list
containing this list. Even if that is not done, the local users
will get about two copies (perhaps some more if more lists are
involved).
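
The "local" redistribution case above could be sketched in Python as
follows.  This is one reading of the procedure, not a tested design; the
function names, the global_list_covers_us flag, and the here.example
domain are invented for illustration.

```python
def select_targets(members, from_outside, is_local, global_list_covers_us=False):
    """Pick recipients for a "local" redistribution list.

    members: all addresses on the list
    from_outside: True if the message arrived from another host
    is_local: function deciding whether an address is local
    global_list_covers_us: True if a remote member is known to be a
        global list that itself contains this list
    """
    local = [m for m in members if is_local(m)]
    remote = [m for m in members if not is_local(m)]
    if from_outside:
        # Never send outside again: this is what prevents the loop.
        return local
    if global_list_covers_us:
        # The global list will send the message back to the local members.
        return remote
    return local + remote


def is_local(address):
    # Hypothetical test: everything at here.example counts as local.
    return address.endswith("@here.example")
```

With the flag left off, a local posting goes to everyone, and local
members may see the extra copies mentioned above when a remote list
sends the message back.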

In addition it would be helpful to have a centralized data base
or server that contains the names of ALL global lists.
Sometimes I get messages from somewhere that say: Now we
have a new list here at the "white house" where You can discuss
technical things about SDI or whatever.
The general problem is that normal users will have that information
earlier than a postmaster. Then it happens that a user quits his job
etc. and the postmaster has the poor job of finding out the source
of the distribution, i.e. all subscriptions.

"Local" redistribution lists must not be contained in that
data base, only global lists. I guess there are currently about
200 global lists?

Peter Sylvester GMD Bonn

fair@ucbarpa.berkeley.edu.BERKELEY.EDU (Erik E. Fair) (02/24/86)

There have been some misconceptions and I have a point to add:

1. sendmail does loop detection by counting Received: headers,
	WITH ITS OWN HOSTNAME IN THEM. After 30 such lines (i.e.
	thirty times through this host), the letter gets mailed
	to postmaster. It does not arbitrarily throw away letters
	that have passed through 30 hosts.

2. sendmail will add a message-id to any letter passing through it
	that does not already have one. Seeing as how mailers that send
	out mail messages without message-ids are in violation of the
	relevant standards (hello TOPS-20!), this is a good thing.

3. The recnews program running here at ucbvax (the one that I hacked up)
	does leave the message-id of mail coming from the internet
	through to the USENET strictly alone (so long as it is a legal
	message-id according to RFC822).
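
The counting rule in point 1 can be illustrated with a short Python
sketch.  This is not sendmail's code: the function names and the
substring match on the hostname are simplifying assumptions, and treating
exactly 30 lines as the trigger is one reading of "after 30 such lines".

```python
MAX_OWN_POSTMARKS = 30  # the figure given in point 1


def count_own_postmarks(received_lines, my_host):
    """Count Received: lines that mention this host's own name."""
    needle = my_host.lower()
    # Crude substring test; a real mailer would parse the header fields.
    return sum(1 for line in received_lines if needle in line.lower())


def route(received_lines, my_host):
    """Send to postmaster once this host has seen the letter too often."""
    if count_own_postmarks(received_lines, my_host) >= MAX_OWN_POSTMARKS:
        return "postmaster"
    return "deliver"
```

Lines stamped by other hosts are ignored, so a letter that has passed
through many different hosts is still delivered.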

Now. It should be possible for items originating on the ARPANET bound
for *moderated* USENET newsgroups to have multiple gateways, as long as
the originating site adds a unique message-id to the letter, and ALL of
the gateways run my hacked up recnews with changes to inews.

I've already fed the changes back to Rick Adams, rather than distribute
them myself, and when that release of netnews is official, I will try
and coordinate a multiple gateway effort.

	Erik E. Fair    ucbvax!fair     fair@ucbarpa.berkeley.edu