[news.software.b] Proposal to stop truncation of articles

alan@mn-at1.UUCP (Alan Klietz) (11/25/87)

From experience it seems that one of the most chronic modes of
munging articles is truncation.

A proposal, 
	
	If a new article is received with the same message-ID and
	"Lines:" header as an old article, and the new article is
	physically larger than the corresponding old article, then
	replace the old article with the new article.

Also,

	Forward the new article.
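
A minimal sketch of the rule above, written in Python only for brevity.
The Article type and the should_replace name are made up for illustration;
they are not part of any existing news software.

```python
from dataclasses import dataclass


@dataclass
class Article:
    message_id: str    # value of the Message-ID: header
    lines_header: int  # value of the "Lines:" header
    text: bytes        # the article as stored on disk


def should_replace(old: Article, new: Article) -> bool:
    """Return True if `new` should replace `old` under the proposed rule."""
    return (
        new.message_id == old.message_id          # same message-ID
        and new.lines_header == old.lines_header  # same "Lines:" header
        and len(new.text) > len(old.text)         # physically larger
    )
```

When should_replace() returns True, the old copy would be replaced and
the new copy forwarded.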

Of course, if this has already been done, you may disregard this ar

jgp@moscom.UUCP (Jim Prescott) (12/03/87)

In article <403@mn-at1.UUCP> alan@mn-at1.UUCP (Alan Klietz) writes:
>A proposal, 
>	If a new article is received with the same message-ID and
>	"Lines:" header as an old article, and the new article is
>	physically larger than the corresponding old article, then
>	replace the old article with the new article.
>Also,
>	Forward the new article.
Good idea.  Currently, even if you get a good copy of a truncated article
on a redundant feed, news just drops it as a duplicate.

After looking through the code a bit to see what would be involved
I've come up with the following ideas/observations.

   - Detecting potential replacements for truncated articles:
	To do this quickly the history file must indicate whether an
	on-line article is truncated (rnews already notices the truncation
	and logs it).  In the cleartext history files this probably isn't
	too much trouble but in the DBM files it may be.  The DBM file
	uses the Message-ID: as a key to the fseek() offset into the
	cleartext file.  Some possibilities are:
		- add another bit to the contents entry.
		- use a bit from the existing entry.  This would be safe
			if fseek() offsets were really longs, since the
			history file shouldn't be over 2G.  Unfortunately
			the offsets can be magic cookies and there may not
			be any easy way to find an available bit.
		- always get the info out of the cleartext file.  Since
			this needs to be done for every duplicate article
			this may be too slow.
		- forget about DBM files.  I remember some comments from
			when the split cleartext files first came out that
			they were faster than DBM.  This doesn't sound
			likely to me but maybe some measurements are in
			order.  Certainly splitting the file into more
			than 11 chunks would help.
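
The "use a bit from the existing entry" option might look roughly like
the sketch below.  It assumes offsets are plain non-negative integers
below 2**31, which is exactly the assumption that fails where fseek()
offsets are magic cookies.  The names pack_entry, unpack_entry and
TRUNCATED_BIT are hypothetical.

```python
from typing import Tuple

TRUNCATED_BIT = 1 << 31          # assumed spare: offsets stay below 2G
OFFSET_MASK = TRUNCATED_BIT - 1  # low 31 bits hold the real offset


def pack_entry(offset: int, truncated: bool) -> int:
    """Combine a history-file offset and a truncated flag into one value."""
    if not 0 <= offset < TRUNCATED_BIT:
        raise ValueError("offset does not leave the top bit free")
    return offset | (TRUNCATED_BIT if truncated else 0)


def unpack_entry(entry: int) -> Tuple[int, bool]:
    """Split a stored value back into (offset, truncated)."""
    return entry & OFFSET_MASK, bool(entry & TRUNCATED_BIT)
```

With this layout the duplicate check can learn whether the on-line copy
is truncated from the DBM lookup alone, without reading the cleartext
file.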

   - Determining if the new article is better than the old:
	This shouldn't be too bad since we know where the old article
	lives (having already looked it up in history).  If the new
	article is worse than the old, just drop it; otherwise:

   - Using the new article, some options are:
	- overwrite the old version, maybe clearing the truncated flag in
		history if appropriate (must be clearable "in place" in
		the cleartext file) (actually clearing it shouldn't matter
		too much, how many times are you going to receive the same
		article?).
	- add the new article like any other, maybe with a Supersedes:
		header naming the original.  This leaves the article in
		the cleartext file twice, which probably messes stuff up.
	- give the article a new Message-ID: and a Supersedes: header
		for the old version :-)

   - Sharing the wealth:
	- if the article is an improvement to you, then you should send
		it to your neighbors as if it had just come in for the
		first time.
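
Putting the steps above together, the "overwrite the old version" path
could be sketched as follows.  This is only an illustration: history is
modeled as two in-memory dicts, "better" is assumed to mean "more body
lines", and receive_duplicate, body_lines and the forward callback are
invented names, not rnews functions.

```python
from typing import Callable, Dict


def body_lines(article: str) -> int:
    """Count the lines after the first blank line (header/body separator)."""
    _, _, body = article.partition("\n\n")
    return len(body.splitlines())


def receive_duplicate(
    store: Dict[str, str],       # message-ID -> stored article text
    truncated: Dict[str, bool],  # message-ID -> truncated flag from history
    msgid: str,
    new_article: str,
    declared_lines: int,         # value of the article's Lines: header
    forward: Callable[[str, str], None],
) -> bool:
    """Handle an article whose message-ID is already in history.

    Returns True if the stored copy was replaced and forwarded.
    """
    if not truncated.get(msgid, False):
        return False                   # old copy is fine: ordinary duplicate
    if body_lines(new_article) <= body_lines(store[msgid]):
        return False                   # new copy is no better: drop it
    store[msgid] = new_article         # overwrite the old version
    # Clear the flag only if the new copy is itself complete.
    truncated[msgid] = body_lines(new_article) < declared_lines
    forward(msgid, new_article)        # pass it on as if newly arrived
    return True
```

Once the flag is cleared, later copies of the same article fall through
the first test and are dropped as ordinary duplicates.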

Comments:
	- sites not running new software don't get truncated articles
		replaced (oh well) and also don't pass the untruncated
		one along (boo hiss).
	- if a large percentage of articles have invalid Lines: then things
		will slow down a lot.  Lots of linecount errors are only
		off by +/- 5 lines; these probably are not truncations
		but some kind of bug.
	- it would be nice if you could just send a sendme control message
		to some distant site that is likely to have received the
		article via a different path and have them send it, even if
		you don't normally exchange news with them.  Does sendme
		only work on articles in newsgroups in your sys entry on
		the remote system?
	- it may be necessary to be able to tell if the old article has
		been canceled and whether it was done by us (as part of
		a Supersedes: on a previous replacement article).
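
On the Lines: mismatch comment above, one way to keep small linecount
errors from being treated as truncations is a tolerance.  A rough
sketch, where the value 5 is simply lifted from the "+/- 5 lines"
observation and looks_truncated is a hypothetical name:

```python
SLOP = 5  # assumed tolerance, taken from the "+/- 5 lines" observation


def looks_truncated(declared_lines: int, actual_lines: int,
                    slop: int = SLOP) -> bool:
    """True if the body is short of the Lines: count by more than `slop`."""
    return declared_lines - actual_lines > slop
```

An article with more lines than Lines: claims is never flagged, since
truncation can only make a body shorter.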

Additional comments?  Anybody feel like implementing it?
-- 
Jim Prescott	moscom!jgp@cs.rochester.edu
		{rutgers,ames,cmcl2}!rochester!moscom!jgp