[news.software.b] Duplicate articles from B news site - would C news help ??

ianh@bhpmrl.oz.au (Ian Hoyle) (11/21/90)

Since moving to C news in the past week (with the associated monitoring of
log files to make sure everything is working) I've noticed lots of duplicated
articles being fed to us from our single, upstream newsfeed. They currently
batch news to us using B news 2.11.17 (I think that's the patch level they're
using), but they also have multiple feeds to themselves.

My question: does B news simply pass all articles on to the hosts it feeds,
as defined in the sys file, without checking whether it has already sent the
same article twice?  Does C news have the same behaviour?

I don't think C news does, but I'd really like that confirmed by those in
the know :-)  If C news does check, I'll probably recommend that our feed
change to C news, to save some transmission bandwidth by not transmitting
those annoying duplicates.

ian
--
                Ian Hoyle
     /\/\       Image Processing & Data Analysis Group
    / / /\      BHP Melbourne Research Laboratories
   / / /  \     245 Wellington Rd, Mulgrave, 3170
  / / / /\ \    AUSTRALIA
  \ \/ / / /
   \  / / /     Phone   :  +61-3-560-7066
    \/\/\/      FAX     :  +61-3-561-6709
                E-mail  :  ianh@bhpmrl.oz.au

gary@proa.sv.dg.com (Gary Bridgewater) (11/22/90)

In article <ianh.659156546@morgana> ianh@bhpmrl.oz.au (Ian Hoyle) writes:
>...  I've noticed lots of duplicated
>articles being fed to us from our single, upstream newsfeed. They currently
>batch news to us using B news 2.11.17 (I think that's the patch level they're
>using), but they also have multiple feeds to themselves.
>
>My question: does B news simply pass all articles on to the hosts it feeds,
>as defined in the sys file, without checking whether it has already sent the
>same article twice? ...

No - B news keeps a history file and should only forward "new" articles.
BUT - if they are running the broken DBM, or using some flavor other than
the current C news heuristic DBZ, then they may have experienced hash-key
overflow.  The unbelievable increase in article postings may have pushed
them past whatever carefully tuned values they were using (speaking from
experience), and they either haven't noticed it or haven't had the
time/motivation to fix it (or their news guru went on vacation, etc.).
When the hash keys overflow, duplicates are no longer detected, for the
most part.  This can result in spool/news partition overflow, extremely
long expire times, and general havoc on the system (which can also lead
them to consider duplicates a minor inconvenience for the nonce).
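
In outline, the check is just a history lookup keyed on the message-ID.
Here is a minimal sketch using the classic dbm calls; this is illustrative
only, not actual B news source.  When the hash overflows, fetch() can start
missing entries, and a check like this begins wrongly answering "never seen":

	#include <string.h>
	#include <dbm.h>	/* classic dbm: dbminit(), fetch(), store() */

	/*
	 * Has this message-ID been seen before?  Assumes dbminit() has
	 * already been called on the history database.
	 */
	int
	seen_before(msgid)
	char *msgid;
	{
		datum key, val;

		key.dptr = msgid;
		key.dsize = strlen(msgid) + 1;
		val = fetch(key);
		return val.dptr != NULL;	/* non-NULL: already in history */
	}
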
-- 
Gary Bridgewater, Data General Corporation, Sunnyvale California
gary@sv.dg.com or {amdahl,aeras,amdcad}!dgcad!gary
C++ - it's the right thing to do.

henry@zoo.toronto.edu (Henry Spencer) (11/23/90)

In article <ianh.659156546@morgana> ianh@bhpmrl.oz.au (Ian Hoyle) writes:
>Since moving to C news in the past week (with the associated monitoring of
>log files to make sure everything is working) I've noticed lots of duplicated
>articles being fed to us from our single, upstream newsfeed...

You shouldn't be getting literally-duplicate articles -- same article twice --
from any properly functioning news system, B or C.  However, beware:  there
has been at least one incident recently of a combination of software oddities
causing repeated posting of articles which *looked* similar but were really
different articles, with different message-IDs.  There is nothing any news
system can do about that.
-- 
"I'm not sure it's possible            | Henry Spencer at U of Toronto Zoology
to explain how X works."               |  henry@zoo.toronto.edu   utzoo!henry

andy@xwkg.Icom.Com (Andrew H. Marrinson) (11/25/90)

henry@zoo.toronto.edu (Henry Spencer) writes:

>In article <ianh.659156546@morgana> ianh@bhpmrl.oz.au (Ian Hoyle) writes:
>>Since moving to C news in the past week (with the associated monitoring of
>>log files to make sure everything is working) I've noticed lots of duplicated
>>articles being fed to us from our single, upstream newsfeed...

>You shouldn't be getting literally-duplicate articles -- same article twice --
>from any properly functioning news system, B or C.  However, beware:  there
>has been at least one incident recently of a combination of software oddities
>causing repeated posting of articles which *looked* similar but were really
>different articles, with different message-IDs.  There is nothing any news
>system can do about that.

Can you tell us a little more about this weird combination?  We are a
leaf site getting our feed only from uunet, and almost half the
articles arriving here recently have been rejected as duplicates.  We
are running C news (with libdbm, not dbz -- is that a problem?).  I
investigated this further today trying to figure out if it was my
problem or someone else's (uunet?).

I wrote some scripts that took the message IDs rejected as duplicates
from the log file and used gethist to construct a path to those
messages.  That showed me that the articles did exist on our system.
But if the failure is in the dbm fetch returning the wrong datum for
a given key, my technique wouldn't be correct, would it!  (I WILL go
back and try to verify that the key resulting from the dbm fetch
matches the one asked for.)
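
Something along these lines, perhaps -- a sketch that assumes C news's
convention of storing a seek offset into the flat history file as the dbm
datum (the exact key/datum layout is my guess, not checked against the
source):

	#include <stdio.h>
	#include <string.h>
	#include <dbm.h>

	/*
	 * Fetch the datum for a message-ID, treat it as an offset into
	 * the history file, and verify that the line at that offset
	 * really begins with the same message-ID.  Assumes dbminit()
	 * has already been called.
	 */
	int
	check_msgid(hist, msgid)
	FILE *hist;
	char *msgid;
	{
		datum key, val;
		long off;
		char line[1024];

		key.dptr = msgid;
		key.dsize = strlen(msgid);
		val = fetch(key);
		if (val.dptr == NULL)
			return 0;	/* not in history at all */

		memcpy((char *)&off, val.dptr, sizeof(off));
		if (fseek(hist, off, 0) < 0 ||
		    fgets(line, sizeof(line), hist) == NULL)
			return -1;	/* bad offset: wrong datum */

		/* history lines begin "<message-id>\t..." */
		return strncmp(line, msgid, strlen(msgid)) == 0;
	}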

We have been running news for a while, so it is possible that we are
having problems with hash buckets filling up or some such; but Mr.
Hoyle said he just installed news, so that shouldn't be his problem.
I sure would like to get to the bottom of this.  Either we are missing
about half the articles, or we are paying to transfer twice as much
data as we should be.

Is anyone else running a leaf site fed by uunet seeing lots of
duplicates in their logs?
--
		Andrew H. Marrinson
		Icom Systems, Inc.
		Wheeling, IL, USA
		(andy@icom.icom.com)

andy@xwkg.Icom.Com (Andrew H. Marrinson) (11/25/90)

andy@xwkg.Icom.Com (Andrew H. Marrinson) writes:

>I wrote some scripts that took the message IDs rejected as duplicates
>from the log file and used gethist to construct a path to those...

First of all, that should have been newshist, not gethist.  Sorry.

Second, I just went back and checked, and the message IDs in the log
that were rejected as duplicates exactly match the message IDs
recorded in the history file as already received, as well as the
message IDs in the articles themselves in /usr/spool/news.

Therefore, I assume I haven't run into a problem with libdbm yet --
right?  These certainly look like genuine duplicates.  A problem at
uunet?  (I sent a message to postmaster there, but have had no reply
yet.  You listening James?  Or is someone else responsible for news?)

I take it that switching to dbz is recommended in any case?
--
		Andrew H. Marrinson
		Icom Systems, Inc.
		Wheeling, IL, USA
		(andy@icom.icom.com)

news@ddi1.UUCP (News Administrator) (11/28/90)

In article <andy.659499523@xwkg> andy@xwkg.Icom.Com (Andrew H. Marrinson) writes:
>
>Is anyone else running a leaf site fed by uunet seeing lots of
>duplicates in their logs?


	Yes, thousands for the last week.  The ones I sampled were in
fact duplicates.  I am running B news, and the postmaster at uunet is silent.
What gives?

henry@zoo.toronto.edu (Henry Spencer) (11/28/90)

In article <andy.659499523@xwkg> andy@xwkg.Icom.Com (Andrew H. Marrinson) writes:
>>... at least one incident recently of a combination of software oddities
>>causing repeated posting of articles which *looked* similar but were really
>>different articles, with different message-IDs...
>
>Can you tell us a little more about this weird combination?

It was a combination of an obsolete version of C News that, in certain
circumstances, fouled up headers slightly, and an overly-helpful B News
that inserts a message-ID header if there isn't one.  So each B News site
that got the original article turned it into a "new" article.  (Sites
running modern C News discarded the original as illegal, but there were
enough B News sites that got it to produce considerable proliferation.)

>We are a
>leaf site getting our feed only from uunet, and almost half the
>articles arriving here recently have been rejected as duplicates...

If articles are being rejected as duplicates, then they really *are*
duplicates, and the "weird combination" business is entirely irrelevant.

>... running C news (with libdbm, not dbz -- is that a problem?).  I
>investigated this further today trying to figure out if it was my
>problem or someone else's (uunet?).

Barring the unlikely possibility that your C News is rejecting things
that it shouldn't be, the problem is at your feed site.  Uunet is in
the middle of some hardware transitions, I believe, and things may be
a bit confused.

Running dbm instead of dbz is slow, but should not break anything.
-- 
"I'm not sure it's possible            | Henry Spencer at U of Toronto Zoology
to explain how X works."               |  henry@zoo.toronto.edu   utzoo!henry

tneff@bfmny0.BFM.COM (Tom Neff) (11/28/90)

If you think you're seeing an unusual number of duplicate articles from
UUNET: you are!  The problem is with their software, not yours.  They
are working on it, but as of this writing it's not fixed yet.

rbj@uunet.UU.NET (Root Boy Jim) (12/04/90)

In article <16070@bfmny0.BFM.COM> tneff@bfmny0.BFM.COM (Tom Neff) writes:
>If you think you're seeing an unusual number of duplicate articles from
>UUNET: you are!  The problem is with their software, not yours.  They
>are working on it, but as of this writing it's not fixed yet.

At last the story can be told! We recently moved news from the Sequent
to the Pyramid. In the process, we used dbz instead of dbm.

DON'T DO THIS UNLESS YOU LIKE TO DEBUG SOFTWARE!
EVEN IF YOU *ARE* THE ADVENTURESOME TYPE, DON'T TRY THIS ON A
SYSTEM THAT CAN RETRANSMIT DUPLICATE ARTICLES INTO THE NETWORK!

Here is a rundown on the problems.  First, you must call DBMCLOSE
before you exit in order to flush all buffers.  If you don't, the
history file will be updated, but the database won't be, and none of
the new articles will be found.
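
In miniature, the pattern is roughly this -- a sketch of the dbm-compatible
interface dbz emulates, with names and the offset datum following my
understanding rather than our actual code:

	#include <string.h>
	#include <dbz.h>	/* dbz's dbm-compatible interface */

	/*
	 * Record a new article in the history database.  dbz buffers
	 * database pages in core, so a process that store()s entries
	 * but exits without dbmclose() leaves history.dir/history.pag
	 * stale even though the flat history file was appended.
	 */
	int
	remember(histfile, msgid, off)
	char *histfile, *msgid;
	long off;		/* offset of the new history line */
	{
		datum key, val;

		if (dbminit(histfile) < 0)
			return -1;

		key.dptr = msgid;
		key.dsize = strlen(msgid);
		val.dptr = (char *)&off;
		val.dsize = sizeof(off);

		if (store(key, val) < 0) {
			(void) dbmclose();
			return -1;
		}

		/* the crucial call: flushes buffered pages to disk */
		return dbmclose();
	}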

The second problem is that dbz uses stdio rather than read/write.
Stdio writes full buffers, not just the data you're interested in.
B news runs many copies of inews/rnews -U, while C news runs one
inews/rnews -U with the history buffer in core, so no one trips over
anyone else.

Actually, I'm not exactly sure why the preceding matters, unless
data read before a lock is obtained is rewritten. Ask Rick or Henry.

In any case, we are taking responsibility for it.
Where it counts. In the pocket.
Below is Rick's message to all our paying customers:

% From rick Sat Dec  1 12:58:24 1990
% Received: by uunet.UU.NET (5.61/1.14) 
% 	id AA07478; Sat, 1 Dec 90 12:52:15 -0500
% Date: Sat, 1 Dec 90 12:52:15 -0500
% From: rick (Rick Adams)
% Message-Id: <9012011752.AA07478@uunet.UU.NET>
% To: customer-list
% Subject: duplicate news
% Status: RO
% 
% 
% Over the last week or so, we have had problems with duplicate news
% articles.
% 
% In an effort to improve performance, we moved news processing to a
% separate machine. As part of the move, we recompiled Bnews with dbz
% instead of dbm. This allegedly would get us a large speedup.
% 
% Well, it did, but it also accepted a lot of duplicates!
% 
% I believe I've fixed the duplicate problem. (24 hours so far without
% duplicates)
% 
% There may be some still queued for transmission, but we should not be
% queueing up any more duplicates.
% 
% Unfortunately, we have no way of telling how many duplicates were erroneously
% transferred to your sites.
% 
% We will be happy to credit you for the transfer time for the useless
% articles. However, we can't calculate that credit.
% 
% So, if you would like to be credited for the duplicate news you were
% sent, send email to uunet!billing with your estimate of wasted connect
% time and the credit you would like.
% 
% This credit will be reflected on your 1/1/91 invoice.
% 
% Sorry for the trouble.
% 
% --rick
-- 

	Root Boy Jim Cottrell <rbj@uunet.uu.net>
	Close the gap of the dark year in between

jmaynard@thesis1.hsch.utexas.edu (Jay Maynard) (12/04/90)

In article <113417@uunet.UU.NET> rbj@uunet.UU.NET (Root Boy Jim) writes:
[description of tough-to-debug problem deleted]
>In any case, we are taking responsibility for it.
>Where it counts. In the pocket.
>Below is Rick's message to all our paying customers:
[...]
>% We will be happy to credit you for the transfer time for the useless
>% articles. However, we can't calculate that credit.
>% So, if you would like to be credited for the duplicate news you were
>% sent, send email to uunet!billing with your estimate of wasted connect
>% time and the credit you would like.
>% This credit will be reflected on your 1/1/91 invoice.

If anyone needed further proof that uunet's a class act...

No, I'm not a uunet customer...though if an Alternet POP appears in Houston,
I may become one - especially after this one.

-- 
Jay Maynard, EMT-P, K5ZC, PP-ASEL | Never ascribe to malice that which can
jmaynard@thesis1.hsch.utexas.edu  | adequately be explained by stupidity.
  "...flames are a specific art form of Usenet..." -- Gregory C. Woodbury

fletcher@cs.utexas.edu (Fletcher Mattox) (12/05/90)

In article <113417@uunet.UU.NET> rbj@uunet.UU.NET (Root Boy Jim) writes:
>Actually, I'm not exactly sure why the preceding matters, unless
>data read before a lock is obtained is rewritten. Ask Rick or Henry.

Speaking of B news locking, what prevents concurrent instances
of rnews from writing simultaneously to the history file?  A quick 
look at the code suggests savehist() could be doing this.  

mday@iconsys.icon.com (Matt Day) (12/06/90)

In article <264@ddi1.UUCP> news@ddi1.UUCP (News Administrator) writes:
>In article <andy.659499523@xwkg> andy@xwkg.Icom.Com (Andrew H. Marrinson) writes:
>>Is anyone else running a leaf site fed by uunet seeing lots of
>>duplicates in their logs?
>
>	Yes, thousands for the last week.  The ones I sampled were in
>fact duplicates.  I am running B news, and the postmaster at uunet is silent.
>What gives?

According to a message I received from the UUNET postmaster a few days ago,
the duplicates arose because they switched from dbm to dbz, which
caused some problems.  It looks like things are working fine once again.
-- 
- Matt Day, Sanyo/Icon, mday@iconsys.icon.com || uunet!iconsys!mday

henry@zoo.toronto.edu (Henry Spencer) (12/08/90)

In article <113417@uunet.UU.NET> rbj@uunet.UU.NET (Root Boy Jim) writes:
>The second problem is that dbz uses stdio rather than read/write.
>Stdio writes full buffers, not just the data you're interested in.
>B news runs many copies of inews/rnews -U, while C news runs one
>inews/rnews -U with the history buffer in core, so no one trips over
>anyone else.

However, dbm also writes full buffers (its own buffers, not stdio's).
I'm still rather puzzled as to why there was such a dramatic difference
in behavior.  Different usage patterns, maybe.  But in general...

Neither dbm nor any version of dbz was ever designed to be written by
multiple customers at once, or to be read while being written.  Having
only one process at a time working on the database is not a random
quirk of C News:  it is an absolute requirement if you want to be sure
of a complete and consistent database.  Software that writes dbm/dbz
databases without proper locking is asking for disaster.
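
The usual discipline, in miniature: take an exclusive lock before touching
the database and hold it until after dbmclose().  Below is a sketch of the
link(2)-style lock file common in news software of this vintage; the paths
and names are illustrative, not C News's actual locking code:

	#include <stdio.h>
	#include <errno.h>
	extern int errno;

	static char tmpname[] = "/usr/lib/news/LOCKTMP";
	static char lockname[] = "/usr/lib/news/LOCK";

	/*
	 * Create the lock with an atomic link(2); only one process can
	 * succeed, so only one process at a time works on the database.
	 */
	int
	newslock()
	{
		FILE *f;
		int tries;

		if ((f = fopen(tmpname, "w")) == NULL)
			return -1;
		fprintf(f, "%d\n", getpid());
		fclose(f);

		for (tries = 0; tries < 60; tries++) {
			if (link(tmpname, lockname) == 0) {
				unlink(tmpname);
				return 0;	/* safe to dbminit() now */
			}
			if (errno != EEXIST)
				break;
			sleep(5);	/* someone else has the database */
		}
		unlink(tmpname);
		return -1;
	}

	newsunlock()
	{
		unlink(lockname);
	}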

Altering dbz to use read/write rather than stdio seems to reduce the
frequency of difficulties, but it does not make the problem go away.
-- 
"The average pointer, statistically,    |Henry Spencer at U of Toronto Zoology
points somewhere in X." -Hugh Redelmeier| henry@zoo.toronto.edu   utzoo!henry