reid@glacier.ARPA (Brian Reid) (01/25/86)
Reposting is a nuisance. The net is flooded with a lot of requests for reposting, and (invisibly) the mail system is flooded with the replies. The one time I asked for a copy of something I got 40 copies, all by mail.

Repostings are needed for 2 reasons:
(1) The original copy got lost, and never made it to a particular segment of the net (this happens a lot when big software systems are posted from leaf nodes).
(2) The person asking for the repost did not save a copy of the original.

The right thing to do for repostings is to repost. But nobody wants to repost because they are afraid that somebody else has reposted, and we don't want to flood the net with copies. So the requested repostings get sent by mail. There was an analysis by Chuq von Rospach about a year ago that showed that it was cheaper (in terms of cost to the net) to post something instead of mailing it if it is going to go to more than 15 or 20 people.

The right thing to do is for each reposting request to have a serial number or a Repost-request-ID. Anyone who sees that request and who would like to be helpful should be encouraged to repost, **BUT**, with the Message-ID of the reposting being the Repost-request-ID of the request. That way if 15 people repost something, the net is not flooded with 15 copies, because all 15 of them will have the same Message-ID, and the inews software will think that they are all the same message and will not propagate it to or through a site that already has one.

This scheme will handle, perfectly, the case in which somebody wants a copy of an old message. The protocol would be that the request for reposting is given a Message-ID, allocated from the name space of the poster's machine, and posted to the newsgroup in which the original message appeared (this is what people do anyhow right now, even though they are not supposed to).

The "repostnews" and/or "RPnews" programs would do 2 things:
(1) Post the requested article under the Message-ID used by the Repost-request.
(2) Send out a "cancel" control message on the reposting request. (Yes, I know that this involves cancelling a message that was posted by somebody else, but the software can cope.)

I believe that this scheme will do an optimal job of handling the case of a person asking for an old posting. If people respond fast enough then the request will not even propagate very far.

The case of 200 people asking for an immediate reposting of something that didn't get through is a bit harder to handle, because you don't want to have to go through the task of figuring out which one of those 200 requests should be the one to determine the Message-ID of the reposting. The only sensible thing to do here is to repost the article under its original Message-ID, but with an "R" in front of it.

So for example if I posted a Squid Body Weight computation program with Message-ID: 12345@glacier.ARPA, and John@greipa wanted a copy, he would post "Request repost of Squid program", not knowing its Message-ID, and then smith@decwrl could put up another copy as "Message-ID: R12345@glacier.ARPA".

The reason for not doing distant-past reposting by old Message-ID is that many people trim the netnews headers from things that they post, so the old Message-ID is not always available.

I believe that all of this algorithm can be easily implemented in a simple "repost" program, which I propose to write and post in the next week or two unless I hear wild complaints about the idea in the interim.
-- 
    Brian Reid    decwrl!glacier!reid
    Stanford      reid@SU-Glacier.ARPA
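
The duplicate suppression this proposal leans on can be sketched roughly as follows. This is a toy model in Python, not the real inews code: the Site class, its history table, and the request ID used here are all made up for illustration. It only shows that 15 reposts sharing one Message-ID are accepted once by a given site, and how the "R in front" rule would be applied.

```python
class Site:
    """Hypothetical model of one news site's duplicate check (not real inews)."""

    def __init__(self, name):
        self.name = name
        self.history = {}  # Message-ID -> body of the first copy seen

    def receive(self, message_id, body):
        """Store the article unless its Message-ID was already seen.

        Returns True if the article is new (and would be passed on to
        neighbouring sites), False if it is dropped as a duplicate.
        """
        if message_id in self.history:
            return False
        self.history[message_id] = body
        return True


def repost_id(original_id):
    """Immediate-repost convention from the post: original ID with 'R' in front."""
    return "R" + original_id


site = Site("decwrl")
request_id = "request-1@example"  # hypothetical Repost-request-ID

# Fifteen helpful people repost the same article under the same ID.
accepted = [site.receive(request_id, "squid program") for _ in range(15)]
kept = accepted.count(True)

immediate_id = repost_id("12345@glacier.ARPA")
```

Here kept comes out as 1, since only the first copy passes the check, and immediate_id is the original ID with an "R" prefixed.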
chuq@sun.uucp (Chuq Von Rospach) (01/25/86)
> There was an analysis by Chuq von Rospach about a year ago that showed that
> it was cheaper (in terms of cost to the net) to post something instead of
> mailing it if it is going to go to more than 15 or 20 people.

Brian dropped a zero. I played with some numbers about 6 months ago and came up with a break-even point of about 120-200 people (what I was looking at was when a mailing list became large enough to be cheaper to the net as a moderated group).

Actually, reposts are quite simple. The algorithm I've used for years is:

    if (I need something) {
        ask the net to tell me if they have it but not to send it immediately;
        if (I get multiple replies) {
            ask the first or closest for a copy
            thank the rest
        } else /* I get one reply */ {
            ask for a copy
        }
        if (I get a few requests for it [<5-10]) {
            mail out copies
        } else {
            repost
        }
    }

I've never been inundated by copies that way, and the control of the posting remains in a single source -- the original requestor. You don't end up with 99 people posting a version of shar to the net, and everyone is happy.
-- 
:From catacombs of Castle Tarot:        Chuq Von Rospach
sun!chuq@decwrl.DEC.COM                 {hplabs,ihnp4,nsc,pyramid}!sun!chuq

It's not looking, it's heat seeking.
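
The two decisions in that procedure could be written as small Python helpers, roughly as below. Several things here are assumptions rather than anything the post specifies: "closest" is approximated by the number of hops in a UUCP bang path, the cutoff of 10 is just the upper end of the "<5-10" range, and the function names and example paths are invented.

```python
def pick_source(replies):
    """Pick which responder to ask for a copy.

    replies: UUCP bang paths in arrival order (hypothetical input format).
    "Closest" is approximated as the fewest '!' hops. min() returns the
    earliest entry on a tie, which covers the "first or closest" rule.
    """
    return min(replies, key=lambda path: path.count("!"))


def choose_delivery(num_requests, mail_limit=10):
    """Mail copies for a handful of requests, repost once there are more.

    mail_limit is an assumed cutoff; the post only says "<5-10".
    """
    return "mail" if num_requests < mail_limit else "repost"


# Hypothetical replies, in the order they arrived.
replies = ["hosta!hostb!hostc!alice", "hostd!bob", "hoste!carol"]
source = pick_source(replies)

few = choose_delivery(3)
many = choose_delivery(40)
```

With these inputs source is "hostd!bob" (one hop, and it arrived before the other one-hop reply), few is "mail", and many is "repost".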
gst@talcott.UUCP (Gary S. Trujillo) (01/30/86)
In article <3473@glacier.ARPA>, reid@glacier.ARPA (Brian Reid) writes:
>
> Reposting is a nuisance. The net is flooded with a lot of requests for
> reposting, and (invisibly) the mail system is flooded with the replies.
>
> ...
>
> The right thing to do is for each reposting request to have a serial number
> or a Repost-request-ID.
>
> ...
>
> I believe that all of this algorithm can be easily implemented in a simple
> "repost" program, which I propose to write and post in the next week or two
> unless I hear wild complaints about the idea in the interim.
> -- 
>     Brian Reid    decwrl!glacier!reid
>     Stanford      reid@SU-Glacier.ARPA

What happens when someone blows it and reposts incorrectly, either intentionally or unintentionally? Especially in the case of source code, I would imagine there could be massive confusion >= that which already comes into being with multiple repostings. One of the many problems is that, depending on how messages bearing the same Message-ID propagate through the net, recipients end up potentially getting somewhat or very different versions of something. And what about the malicious reposter who changes a few lines here and there?

Seems to me that if the scheme is to work, it should be only the author who is allowed to repost (maybe that makes it a different scheme). I tend to think mod.sources is a somewhat better solution in this case, for reasons already cited in other discussions.
-- 
    Gary Trujillo
    (harvard!talcott!gst)
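
The divergence described here can be illustrated with a small flooding model in Python. It is only a sketch under simplifying assumptions: four made-up sites in a chain, every hop taking the same time (modelled with a FIFO queue), and each site checking duplicates by Message-ID alone. The site names, the shared ID, and the article bodies are hypothetical.

```python
from collections import deque


def flood(links, injections):
    """Flood articles through the net; each site keeps the first copy per ID.

    links: dict mapping site -> list of neighbouring sites
    injections: list of (site, message_id, body) posted at the same moment
    Returns a dict mapping site -> {message_id: body}.
    """
    stored = {site: {} for site in links}
    queue = deque()
    for site, message_id, body in injections:
        if message_id not in stored[site]:
            stored[site][message_id] = body
            queue.append((site, message_id, body))
    while queue:
        site, message_id, body = queue.popleft()
        for neighbour in links[site]:
            # The duplicate check looks at the ID only, never at the body.
            if message_id not in stored[neighbour]:
                stored[neighbour][message_id] = body
                queue.append((neighbour, message_id, body))
    return stored


# Hypothetical chain of sites: a - b - c - d
links = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
shared_id = "request-1@example"  # hypothetical shared Message-ID

# Two reposters at opposite ends post different bodies under the same ID.
result = flood(links, [("a", shared_id, "original source"),
                       ("d", shared_id, "altered source")])
versions = {site: result[site][shared_id] for site in sorted(result)}
```

In this run sites a and b end up holding "original source" while c and d hold "altered source", and no site ever sees both, which is the situation the question above is worried about.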