[comp.sources.bugs] My SPOOLNEWS patch for NNTP

david@ms.uky.edu (David Herron -- One of the vertebrae) (08/09/88)

As Matt pointed out before, I may lose big time by installing a patch
like the one I'd made.  He was concerned that I'd have a lot of
articles arriving from many different directions at the same time,
and that since they weren't being inserted immediately upon arrival,
I'd have 2 or 3 copies arrive and have 1 or 2 junked almost
immediately.

So I decided to give it a look-see, and the early results are
discouraging.  I used to have my background scripts run every 15
minutes; I later shortened that to 10 minutes and then to 5, each time
because I was seeing duplicate articles in SPOOLDIR/.rnews.  (I have
one script for unbatching news and another for batching, with locking
to ensure that only one is running at a time, and the two scripts are
offset from each other by a couple of minutes.)
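
For concreteness, the arrangement looks roughly like this -- the
paths, script names, and lock location here are made up for the
example, not the real ones:

	# crontab: unbatch every 5 minutes; batching offset by 2 minutes
	0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/lib/news/unbatch.sh
	2,7,12,17,22,27,32,37,42,47,52,57 * * * * /usr/lib/news/batch.sh

	# locking preamble shared by both scripts: mkdir is atomic, so
	# whichever script creates the lock directory first wins
	LOCK=/usr/spool/news/LOCKnews
	mkdir $LOCK 2>/dev/null || exit 0	# the other one is running
	trap 'rmdir $LOCK' 0			# release the lock on exit
	# ... unbatching (or batching) work goes here ...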

To examine the logs I ran the following script:

	awk <log '$5 == "Duplicate" {print $7}' | sort -u

which gives me a list of the message-ids that were duplicated.  A
different version, spelled out below, has just "print" and
"sort +6 -7" instead, which gives me all the lines about duplicate
articles from the log file, sorted by message-id.  I haven't taken
the output of the first command to its logical conclusion yet:

	awk <log '$5 == "Duplicate" {print $7}' | sort -u |
	while read id; do
		fgrep "$id" log		# fgrep: message-ids are full of dots
	done

to give me all the lines talking about the message-ids which had
been duplicated.
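
That different version, spelled out (it's the same awk with the whole
line printed instead of just field 7), would be:

	awk <log '$5 == "Duplicate" {print}' | sort +6 -7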

To make the rest of this story short ...

I saw what seems like a lot of articles arrive here 2 or more times
(a total of 515 message-ids were duplicated, with 653 total duplicates,
on the traffic from about 4 am til around midnight; I don't know
offhand how many total articles arrived in that time).  Of those 653,
188 were from psuvm.bitnet, a neighbor known to give us lots of
duplicates and which I'll have to deal with sometime Real Soon Now.
I don't know how many of the duplicates *only* came through psuvm.
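
A per-neighbor breakdown ought to fall out of the same log, though.
Something like this should do it, assuming the sending host shows up
in field 4 of the log line (check your own log format before trusting
that):

	# count duplicate copies per feeding host
	awk '$5 == "Duplicate" { count[$4]++ }
	     END { for (host in count) print count[host], host }' <log | sort -rn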

A quick reading of the log file showed that we were often getting two
copies of the same article within 2-3 minutes of each other, less
often getting 3 copies, and even less often getting 4.  That is, when
we had duplicates, that's the pattern they followed.  Since I don't
know what the total traffic for that time period was, I don't know
how significant those numbers are.  -- A look at some slightly old
reports (June) shows ~25000 articles per week, or ~3500 articles per
day.  If that's accurate then we're seeing a 10-20 percent rejection
rate (653 duplicates against ~3500 articles works out to about 19
percent).  Which maybe isn't so bad after all.
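
For a firmer denominator, the log itself can supply the totals.
Tallying by the same field the "Duplicate" test keys on shows every
disposition the log records and how often; this assumes each article
transaction logs one line with its disposition in field 5, which I
only know to be true of the Duplicate lines:

	# tally log lines by the disposition word in field 5;
	# "Duplicate" is one bucket, accepted articles another
	awk '{ count[$5]++ }
	     END { for (w in count) print count[w], w }' <log | sort -rn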

How does this stack up against other people's experiences?

-- 
<---- David Herron -- The E-Mail guy                         <david@ms.uky.edu>
<---- ska: David le casse\*'      {rutgers,uunet}!ukma!david, david@UKMA.BITNET
<----
<---- Looking forward to a particularly blatant, talkative and period bikini ...