alan@mn-at1.UUCP (Alan Klietz) (11/25/87)
From experience it seems that one of the most chronic modes of munging
articles is truncation.

A proposal,
    If a new article is received with the same message-ID and
    "Lines:" header as an old article, and the new article is
    physically larger than the corresponding old article, then
    replace the old article with the new article.
Also,
    Forward the new article.

Of course, if this has already been done, you may disregard this ar
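The proposed rule can be sketched as follows. This is only an illustration in Python (the news software itself is written in C): the `Article` record and the `should_replace` function are hypothetical names, the sample values are made up, and "physically larger" is assumed to mean a larger size in bytes.

```python
from typing import NamedTuple


class Article(NamedTuple):
    """Hypothetical record; not a structure from the real news software."""
    message_id: str    # value of the Message-ID: header
    lines_header: int  # value of the "Lines:" header
    size: int          # physical size in bytes (assumed meaning of "larger")


def should_replace(old: Article, new: Article) -> bool:
    """Return True when the new copy should replace the old one
    (and then be forwarded) under the proposed rule."""
    return (
        new.message_id == old.message_id
        and new.lines_header == old.lines_header
        and new.size > old.size
    )


# Made-up example values: same ID and Lines: header, different sizes.
short_copy = Article("<1@example.UUCP>", 12, 310)
full_copy = Article("<1@example.UUCP>", 12, 742)
print(should_replace(short_copy, full_copy))  # True
print(should_replace(full_copy, short_copy))  # False
```

Note that the rule never looks at the article body, so a copy that is larger because of extra garbage would also win; a real implementation would probably want a stricter check.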
jgp@moscom.UUCP (Jim Prescott) (12/03/87)
In article <403@mn-at1.UUCP> alan@mn-at1.UUCP (Alan Klietz) writes:
>A proposal,
> If a new article is received with the same message-ID and
> "Lines:" header as an old article, and the new article is
> physically larger than the corresponding old article, then
> replace the old article with the new article.
>Also,
> Forward the new article.

Good idea. Currently, even if you get a good copy of a truncated article
on a redundant feed, news just drops it as a duplicate. After looking
through the code a bit to see what would be involved, I've come up with
the following ideas/observations.

- Detecting potential replacements for truncated articles:
    To do this quickly, the history file must indicate whether an on-line
    article is truncated (rnews already notices the truncation and logs
    it). In the cleartext history files this probably isn't too much
    trouble, but in the DBM files it may be. The DBM file uses the
    Message-ID: as a key to the fseek() offset into the cleartext file.
    Some possibilities are:
    - add another bit to the contents entry.
    - use a bit from the existing entry. This would be safe if fseek()
      offsets were really longs, since the history file shouldn't be
      over 2G. Unfortunately the offsets can be magic cookies and there
      may not be any easy way to find an available bit.
    - always get the info out of the cleartext file. Since this needs
      to be done for every duplicate article, it may be too slow.
    - forget about DBM files. I remember some comments from when the
      split cleartext files first came out that they were faster than
      DBM. This doesn't sound likely to me, but maybe some measurements
      are in order. Certainly splitting the file into more than 11
      chunks would help.

- Determining if the new article is better than the old:
    This shouldn't be too bad since we know where the old article lives
    (having already looked it up in history).
    If the new article is worse than the old, just drop it; otherwise:

- Using the new article, some options are:
    - overwrite the old version, maybe clearing the truncated flag in
      history if appropriate (it must be clearable "in place" in the
      cleartext file). Actually, clearing it shouldn't matter too much:
      how many times are you going to receive the same article?
    - add the new article like any other, maybe with a Supersedes:
      header naming the original. This leaves the article in the
      cleartext file twice, which probably messes stuff up.
    - give the article a new Message-ID: and a Supersedes: for the old
      version :-)

- Sharing the wealth:
    - if the article is an improvement to you, then you should send it
      to your neighbors as if it had just come in for the first time.

Comments:
    - sites not running new software don't get truncated articles
      replaced (oh well) and also don't pass the untruncated one along
      (boo hiss).
    - if a large percentage of articles have an invalid Lines: header,
      then things will slow down a lot. Lots of linecount errors are
      only off by +/- 5 lines; these are probably not truncations but
      some kind of bug.
    - it would be nice if you could just send a sendme control message
      to some distant site that is likely to have received the article
      via a different path and have them send it, even if you don't
      normally exchange news with them. Does sendme only work on
      articles in newsgroups in your sys entry on the remote system?
    - it may be necessary to be able to tell if the old article has
      been canceled and whether it was done by us (as part of a
      Supersedes: on a previous replacement article).

Additional comments? Anybody feel like implementing it?
-- 
Jim Prescott    moscom!jgp@cs.rochester.edu
                {rutgers,ames,cmcl2}!rochester!moscom!jgp
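Two of the ideas above can be sketched in a few lines of Python (the news software itself is C). The first function treats an article as truncated only when it falls short of its Lines: header by more than a tolerance, following the observation that counts off by +/- 5 lines are probably not truncations. The other two pack a truncated flag into the top bit of a 32-bit history offset, which is only safe under the assumption already stated above: that offsets are plain byte positions below 2G and not magic cookies. All names, the 32-bit width, and the default tolerance of 5 are illustrative assumptions, not taken from the news source.

```python
from typing import Tuple

TRUNCATED_BIT = 1 << 31          # top bit of an assumed 32-bit offset
OFFSET_MASK = TRUNCATED_BIT - 1  # low 31 bits hold the real offset (< 2G)


def looks_truncated(lines_header: int, body_lines: int, tolerance: int = 5) -> bool:
    """True if the body is short of its Lines: count by more than tolerance."""
    return lines_header - body_lines > tolerance


def pack_entry(offset: int, truncated: bool) -> int:
    """Combine a history-file offset and the truncated flag into one value."""
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in 31 bits")
    return offset | (TRUNCATED_BIT if truncated else 0)


def unpack_entry(entry: int) -> Tuple[int, bool]:
    """Split a packed value back into (offset, truncated)."""
    return entry & OFFSET_MASK, bool(entry & TRUNCATED_BIT)


# Made-up numbers: header claims 40 lines, only 17 arrived.
entry = pack_entry(123456, looks_truncated(lines_header=40, body_lines=17))
print(unpack_entry(entry))  # (123456, True)
```

With the flag stored next to the offset, the duplicate check can decide whether a second copy is worth examining without opening the cleartext history file, which is the point of the "use a bit from the existing entry" option.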