[net.sources.bugs] A proposal for a consistent REPOST scheme

reid@glacier.ARPA (Brian Reid) (01/25/86)

Reposting is a nuisance. The net is flooded with a lot of requests for
reposting, and (invisibly) the mail system is flooded with the replies. The
one time I asked for a copy of something I got 40 copies, all by mail.

Repostings are needed for 2 reasons:
  (1) The original copy got lost, and never made it to a particular segment
of the net (this happens a lot when big software systems are posted from
leaf nodes).
  (2) The person asking for the repost did not save a copy of the original.

The right thing to do for repostings is to repost. But nobody wants to
repost because they are afraid that somebody else has reposted, and we don't
want to flood the net with copies. So the requested repostings get sent by
mail.

There was an analysis by Chuq von Rospach about a year ago that showed that
it was cheaper (in terms of cost to the net) to post something instead of
mailing it if it is going to go to more than 15 or 20 people. 

The right thing to do is for each reposting request to have a serial number
or a Repost-request-ID. Anyone who sees that request and who would like to
be helpful should be encouraged to repost, **BUT**, with the Message-ID of
the reposting being the Repost-request-ID of the request. That way if 15
people repost something, the net is not flooded with 15 copies, because all
15 of them will have the same Message-ID, and the inews software will think
that they are all the same message and will not propagate it to or through a
site that already has one.
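
A minimal sketch of the duplicate suppression this relies on, in Python. The `Site` class, its `offer` method, and the example ID are hypothetical names made up for illustration and are not part of inews; the only behavior assumed is that a site refuses any article whose Message-ID it has already seen.

```python
class Site:
    """Hypothetical news site that remembers the Message-IDs it has accepted."""

    def __init__(self, name):
        self.name = name
        self.history = set()   # Message-IDs already seen at this site
        self.neighbors = []    # sites this one feeds

    def offer(self, message_id, body):
        """Accept and forward an article unless its Message-ID is already known.

        Returns True if the article was accepted, False if it was dropped
        as a duplicate.
        """
        if message_id in self.history:
            return False
        self.history.add(message_id)
        for neighbor in self.neighbors:
            neighbor.offer(message_id, body)
        return True


# Three sites in a chain: a -> b -> c
a, b, c = Site("a"), Site("b"), Site("c")
a.neighbors = [b]
b.neighbors = [c]

request_id = "<100@greipa.UUCP>"  # made-up Repost-request-ID

# Two helpful people repost the same article under the request's ID.
first = a.offer(request_id, "copy from poster 1")   # accepted, spreads to b and c
second = b.offer(request_id, "copy from poster 2")  # dropped: b already has this ID
```

The second copy stops at the first site that has already seen the ID, so only one copy travels over any given link.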

This scheme will handle, perfectly, the case in which somebody wants a copy
of an old message. The protocol would be that the request for reposting is
given a Message-ID, allocated from the name space of the poster's machine,
and posted to the newsgroup in which the original message appeared (this is
what people do anyhow right now, even though they are not supposed to). The
"repostnews" and/or "RPnews" programs would do 2 things:
	(1) Post the requested article under the Message-ID used by
	    the Repost-request
	(2) Send out a "cancel" control message on the reposting request.
	    (yes I know that this involves cancelling a message that
	     was posted by somebody else, but the software can cope).
I believe that this scheme will do an optimal job of handling the case of a
person asking for an old posting. If people respond fast enough then the
request will not even propagate very far.

The case of 200 people asking for an immediate reposting of something that
didn't get through is a bit harder to handle, because you don't want to have
to go through the task of figuring out which one of those 200 requests
should be the one to determine the Message-ID of the reposting. The only
sensible thing to do here is to repost the article under its original
message-ID, but with an "R" in front of it. So for example if I posted a
Squid Body Weight computation program with Message-ID: 12345@glacier.ARPA,
and John@greipa wanted a copy, he would post "Request repost of Squid
program", not knowing its Message-ID, and then smith@decwrl could
put up another copy as "Message-ID: R12345@glacier.ARPA".
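
The renaming rule can be sketched as a small Python function. The name `repost_message_id` is made up here, and the sketch assumes the ID may or may not be wrapped in angle brackets.

```python
def repost_message_id(original_id):
    """Return the repost Message-ID: the original ID with "R" prepended.

    Angle brackets around the ID, if present, are preserved.
    """
    if original_id.startswith("<") and original_id.endswith(">"):
        return "<R" + original_id[1:-1] + ">"
    return "R" + original_id


print(repost_message_id("12345@glacier.ARPA"))  # R12345@glacier.ARPA
```

Because every reposter derives the same ID from the same original, their copies collapse into one as far as the news software is concerned.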

The reason for not doing distant-past reposting by old-MessageID is that
many people trim the netnews headers from things that they post, so the
old Message-ID is not always available.

I believe that all of this algorithm can be easily implemented in a simple
"repost" program, which I propose to write and post in the next week or two
unless I hear wild complaints about the idea in the interim.
-- 
	Brian Reid	decwrl!glacier!reid
	Stanford	reid@SU-Glacier.ARPA

chuq@sun.uucp (Chuq Von Rospach) (01/25/86)

> There was an analysis by Chuq von Rospach about a year ago that showed that
> it was cheaper (in terms of cost to the net) to post something instead of
> mailing it if it is going to go to more than 15 or 20 people. 

Brian dropped a zero. I played with some numbers about 6 months ago and came
up with a break-even point of about 120-200 people (what I was looking at was
when a mailing list became large enough to be cheaper to the net as a
moderated group).


Actually, reposts are quite simple. The algorithm I've used for years is:

    if (I need something)
    {
	ask the net to tell me if they have it but not to send it immediately;
	if (I get multiple replies)
	{
	    ask the first or closest for a copy;
	    thank the rest;
	} else /* I get one reply */ {
	    ask for a copy;
	}
	if (I get only a few requests for it [fewer than 5-10])
	{
	    mail out copies;
	} else {
	    repost;
	}
    }

I've never been inundated by copies that way, and the control of the
posting remains in a single source -- the original requestor. You don't
end up with 99 people posting a version of shar to the net, and everyone
is happy.
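
The last step of that algorithm, as a runnable Python sketch. The function name `deliver` and the `mail_limit` parameter are illustrative only; the default of 5 is just one reading of the rough "5-10" threshold above.

```python
def deliver(num_requests, mail_limit=5):
    """Decide how to pass along a copy once you have one.

    Returns "mail" when only a few people asked, "repost" otherwise.
    mail_limit is an assumed cutoff, not a measured break-even point.
    """
    if num_requests < mail_limit:
        return "mail"
    return "repost"
```

For example, `deliver(3)` picks mail and `deliver(40)` picks a repost.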

-- 
:From catacombs of Castle Tarot:        Chuq Von Rospach 
sun!chuq@decwrl.DEC.COM                 {hplabs,ihnp4,nsc,pyramid}!sun!chuq

It's not looking, it's heat seeking.

gst@talcott.UUCP (Gary S. Trujillo) (01/30/86)

In article <3473@glacier.ARPA>, reid@glacier.ARPA (Brian Reid) writes:
> 
> Reposting is a nuisance. The net is flooded with a lot of requests for
> reposting, and (invisibly) the mail system is flooded with the replies.
> 
> ...
> 
> The right thing to do is for each reposting request to have a serial number
> or a Repost-request-ID.
> 
> ...
> 
> I believe that all of this algorithm can be easily implemented in a simple
> "repost" program, which I propose to write and post in the next week or two
> unless I hear wild complaints about the idea in the interim.
> -- 
> 	Brian Reid	decwrl!glacier!reid
> 	Stanford	reid@SU-Glacier.ARPA

What happens when someone blows it and reposts incorrectly, either
intentionally or unintentionally?  Especially in the case of source
code, I would imagine there could be massive confusion >= that which
already comes into being with multiple repostings.  One of the many
problems is that, depending on how messages bearing the same message-ID
propagate through the net, recipients end up potentially getting somewhat
or very different versions of something.  And what about the malicious
reposter who changes a few lines here and there?  Seems to me that if
the scheme is to work, it should be only the author who is allowed to
repost (maybe that makes it a different scheme).  I tend to think
mod.sources is a somewhat better solution in this case, for reasons
already cited in other discussions.
-- 
	Gary Trujillo
	(harvard!talcott!gst)