[news.software.nntp] Argh! Duplicates abound!

brendan@cs.widener.edu (Brendan Kehoe) (11/29/90)

 I'm having a strange problem and I've exhausted all of the
possibilities that I can think of.
 I recently added a second feed (both are partial, but with nearly
exactly the same groups) to my newsfeed. Now every time the second site
connects, a whole load of articles show up as duplicates (rather than
being rejected). Syd Weinstein (one of the folks giving me a feed)
suggested perhaps I'd built nntp wrong, since it may be reading the
history file wrong.
 After some digging, I discovered I hadn't built nntp with DBZ (as I
did with cnews). So I went and rebuilt it & installed it. Nada. No change.
 So I ran mkhistory and rebuilt the history file. Even that didn't fix
it. I'm at a loss.
 I do have NDBM also defined. Should I? (I tried it with just DBZ and
it failed cuz it defined USGHIST which is totally wrong.)
 Anyone have any ideas what could be going wrong here? The log file is
growing exponentially with all of the so-called duplicates. Any help
at all, even if you think it's way off base, is more than welcome!

-- 
    Brendan Kehoe - Widener Sun Network Manager - brendan@cs.widener.edu
 Widener University in Chester PA              A Bloody Sun-vs-Dec War Zone
Top Ten Surprises in Rocky V  --  Number 5, Loveable Character Chewbacca Dies

paul@uxc.cso.uiuc.edu (Paul Pomes - UofIllinois CSO) (12/05/90)

brendan@cs.widener.edu (Brendan Kehoe) writes:

> I'm having a strange problem and I've exhausted all of the
>possibilities that I can think of.
> I recently added a second feed (both are partial, but with nearly
>exactly the same groups) to my newsfeed. Now every time the second site
>connects, a whole load of articles show up as duplicates (rather than
>being rejected). Syd Weinstein (one of the folks giving me a feed)
>suggested perhaps I'd built nntp wrong, since it may be reading the
>history file wrong.

I assume you are using cnews since you mention dbz.

What is likely happening is that while cnews is chunking away at a batch from
site #1, nntpd accepts a batch of the same articles from site #2 before the
message-ids in batch #1 make it into the history file.  When cnews finishes
with batch #1, many of the articles in batch #2 will be rightly rejected as
duplicates.  This all comes about because articles propagate so quickly
among Internet NNTP sites.

Our news machine, ux1.cso.uiuc.edu, has six outside feeds and was always
falling behind during the day.  When it finally caught up at night, most
of the batches were discarded as duplicates.  The fix was installing the
msgidd daemon written by Paul Vixie of DEC.  It stores in memory the last
N minutes' worth of message-ids received.  Each nntpd communicates with it
via a Unix-domain socket to check whether an incoming article has been
received but not yet processed.  This has been a huge improvement.
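
To make the idea concrete, here is a rough sketch in Python of the kind of
check described above: remember each message-id for a fixed window and refuse
a second offer of the same id while the first copy is still waiting to be
processed. This is not the msgidd source. The class name MessageIdCache, the
offer() method, and the 600-second default window are made up for
illustration, and the real daemon answers queries over a socket rather than
through a function call.

```python
import time


class MessageIdCache:
    """Remember message-ids offered within the last `window_seconds` seconds.

    Hypothetical illustration only; not the actual msgidd implementation.
    """

    def __init__(self, window_seconds=600.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock          # injectable so the cache can be tested
        self.seen = {}              # message-id -> time it was first offered

    def _expire(self, now):
        # Entries are only ever added with a non-decreasing timestamp, and
        # dicts keep insertion order, so the oldest entries come first.
        for message_id in list(self.seen):
            if now - self.seen[message_id] < self.window:
                break
            del self.seen[message_id]

    def offer(self, message_id):
        """Return True to accept the article, False if a copy is pending."""
        now = self.clock()
        self._expire(now)
        if message_id in self.seen:
            return False            # another feed already delivered it
        self.seen[message_id] = now
        return True


if __name__ == "__main__":
    cache = MessageIdCache(window_seconds=600.0)
    print(cache.offer("<123@site1>"))   # first feed offers the article
    print(cache.offer("<123@site1>"))   # second feed offers the same id
```

The window only needs to cover the time it takes the news system to get a
received article into the history file; after that the normal history lookup
catches the duplicate.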

The msgidd code can be obtained from the NNTP managers archive on either
ucbvax or ucbarpa.berkeley.edu, or in the file pub/nntp-1.5.10+.tar.Z
on uxc.cso.uiuc.edu.

/pbp
--
         Paul Pomes

alias emacs='/usr/ucb/vi' --  All the EMACS you need to know.

UUCP: {att,iuvax,uunet}!uiucuxc!paul   Internet, BITNET: paul@uxc.cso.uiuc.edu
US Mail:  UofIllinois, CSO, 1304 W Springfield Ave, Urbana, IL  61801-2910

brendan@cs.widener.edu (Brendan Kehoe) (12/05/90)

In <1990Dec5.034523.21682@ux1.cso.uiuc.edu>, paul@uxc.cso.uiuc.edu writes:
>I assume you are using cnews since you mention dbz.

 Yep. Sorry I neglected to mention it.

>The msgidd code can be obtained from the NNTP managers archive on either
>ucbvax or ucbarpa.berkeley.edu, or in the file pub/nntp-1.5.10+.tar.Z
>on uxc.cso.uiuc.edu.

 Thanks for the suggestion -- I'll check it out.
 It turns out that the problem was using NDBM at all. At least in
Sun's version it dies miserably. When I built NNTP with DBM & DBZ, it
worked like a dream! Over 3100 articles with *no* duplicates. You've
no idea how good that feels. Hehe.

 My thanks to everyone who responded. (Especially to Syd Weinstein
for putting up with my continuous stream of babble as I was trying to
get this thing figured out.)



-- 
    Brendan Kehoe - Widener Sun Network Manager - brendan@cs.widener.edu
 Widener University in Chester PA              A Bloody Sun-vs-Dec War Zone
 Hey ... do you think George Bush carries money or any kind of ID with him?