[comp.archives] [nntp] Re: Argh! Duplicates abound!

paul@uxc.cso.uiuc.edu (Paul Pomes - UofIllinois CSO) (12/06/90)

Archive-name: news/nntp/msgidd/1990-12-05
Archive: uxc.cso.uiuc.edu:/pub/nntp-1.5.10+.tar.Z [128.174.5.50]
Original-posting-by: paul@uxc.cso.uiuc.edu (Paul Pomes - UofIllinois CSO)
Original-subject: Re: Argh! Duplicates abound!
Reposted-by: emv@ox.com (Edward Vielmetti)

brendan@cs.widener.edu (Brendan Kehoe) writes:

> I'm having a strange problem and I've exhausted all of the
>possibilities that I can think of.
> I recently added a second feed (both are partial, but with nearly
>exactly the same groups) to my newsfeed. Now every time the second site
>connects, a whole load of articles show up as duplicates (rather than
>being rejected). Syd Weinstein (one of the folks giving me a feed)
>suggested perhaps I'd built nntp wrong, since it may be reading the
>history file wrong.

I assume you are using cnews since you mention dbz.

What is likely happening is that while cnews is chunking away at a batch from
site #1, nntpd accepts a batch of the same articles from site #2 before the
message-ids in batch #1 make it into the history file.  When cnews finishes
with batch #1, many of the articles in batch #2 will be rightly rejected as
duplicates.  This all comes about because of the fast propagation rate
between Internet NNTP sites.
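
To make the timing problem concrete, here is a toy model in Python.  It is
only a sketch of the race described above, not cnews or nntpd code: the
function offer(), the batch contents, and the message-ids are all made up
for illustration, and the "history file" is reduced to a set that is
updated only once a batch has been fully processed.

```python
# Hypothetical model: the history is updated only after a batch is processed.
history = set()  # message-ids already filed by the news system


def offer(feed_batch, history):
    """Return the ids a receiver would accept, judging only by the history."""
    return [mid for mid in feed_batch if mid not in history]


# Made-up batches from two feeds carrying mostly the same articles.
batch1 = ["<a@x>", "<b@x>", "<c@x>"]
batch2 = ["<b@x>", "<c@x>", "<d@x>"]

accepted1 = offer(batch1, history)  # site #1 offers; everything looks new
accepted2 = offer(batch2, history)  # site #2 offers before batch #1 is filed

# Only now does processing of batch #1 finish and reach the history.
history.update(accepted1)

# Articles already transferred from site #2 that turn out to be duplicates.
wasted = [mid for mid in accepted2 if mid in history]
print(wasted)
```

In this model the two overlapping articles from site #2 are transferred in
full and only later thrown away, which is the wasted work described above.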

Our news machine, ux1.cso.uiuc.edu, has six outside feeds and was always
falling behind during the day.  When it finally caught up at night, most
of the batches were discarded as duplicates.  The fix was installing the
msgidd daemon written by Paul Vixie of DEC.  It stores in memory the last
N minutes' worth of message-ids received.  Each nntpd communicates with it
via a Unix-domain socket to check whether an incoming article has been
received but not yet processed.  This has been a huge improvement.
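
The idea behind such a daemon can be sketched in a few lines of Python.
This is not the msgidd source and makes no claim about how it is actually
written: the class name RecentIds, the method check_and_add(), the
600-second window, and the injectable clock are all assumptions chosen for
illustration, and the socket layer between nntpd and the daemon is left out
entirely.

```python
import time
from collections import OrderedDict


class RecentIds:
    """Hypothetical in-memory cache of message-ids seen in the last N seconds."""

    def __init__(self, window_seconds, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.seen = OrderedDict()  # message-id -> time first seen, oldest first

    def _expire(self, now):
        # Drop entries from the old end until one is still inside the window.
        while self.seen:
            oldest_id = next(iter(self.seen))
            if now - self.seen[oldest_id] < self.window:
                break
            self.seen.popitem(last=False)

    def check_and_add(self, message_id):
        """Return True if the id was seen recently; else record it, return False."""
        now = self.clock()
        self._expire(now)
        if message_id in self.seen:
            return True
        self.seen[message_id] = now
        return False


# Usage with a fake clock so the expiry can be shown without waiting.
fake_now = [0.0]
cache = RecentIds(window_seconds=600, clock=lambda: fake_now[0])

first = cache.check_and_add("<a@x>")   # not seen before: accept the article
second = cache.check_and_add("<a@x>")  # offered again by another feed: refuse
fake_now[0] = 601.0
third = cache.check_and_add("<a@x>")   # window has passed: entry has expired
```

By the time an entry expires, the article should long since be in the
history file, so the normal history lookup takes over from there.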

The msgidd code can be obtained from the NNTP managers archive on ucbvax
or ucbarpa.berkeley.edu, or in the file pub/nntp-1.5.10+.tar.Z on
uxc.cso.uiuc.edu.

/pbp
--
         Paul Pomes

alias emacs='/usr/ucb/vi' --  All the EMACS you need to know.

UUCP: {att,iuvax,uunet}!uiucuxc!paul   Internet, BITNET: paul@uxc.cso.uiuc.edu
US Mail:  UofIllinois, CSO, 1304 W Springfield Ave, Urbana, IL  61801-2910