[news.admin] Who's Messing With The Usenet ?

childers@avsd.UUCP (Richard Childers) (09/16/89)

I've been noticing a lot of duplicate articles recently.

Now, I've been reading the Usenet, on and off, for about five or six years
now, and I have _never_ seen anything like this in my life.

At first, I assumed that it was something wrong with my installation, because
I was mucking around with things, and in fact as a result of trying to get
the IHAVE / SENDME  protocol to work with multiple sites I _was_, for a while,
getting considerable duplication. But a fair amount of rigorous thinking in
the past month has convinced me that I'm not the problem here.

I've been reading for, oh, over a month now, about how duplicate articles
have been appearing across many newsgroups. This was my first hint that it
was a widespread problem that was affecting everybody. Like everyone else,
I watched and waited for someone to do something, for someone to trace the
problem. Nothing happened. Oh, many people tried to find a pattern, but it
wasn't there. I began to smell a rat.

Now, like everyone else, I have gained much from judicious application of
the excellent phrase, "Assume stupidity before malice", derived from Bill
Davidson's .signature, I believe, and I'm still not convinced that what's
happening is anything other than normal hoseheadedness. But the absence of
any sort of pattern in the data acquired from articles' headers makes me
wonder, because it is quite common for people to modify headers for their
own immature reasons. ( Personally, I think it's equivalent to changing
the address on a letter, or otherwise defacing a piece of mail, without the
owner's permission. Quite inconsistent with the tenets of intercooperation
around which the Usenet was founded, more suggestive of children fighting
over a toy than of a tool developed for civilized and globally relevant
purposes of advancing human knowledge and accomplishment, if you know what
I mean. )

So, I finally vowed to explore the matter the next time it bugged me and I
had a few spare moments. Here's some actual real data, taken from a small
and thus uncomplicated sampling of a low-traffic newsgroup, alt.bbs.

	avsd# cd /usr/spool/news/alt/bbs

	avsd# ls
	231  232  233  234  235  236  237  238  239  240  241

A small sample, 11 articles, all less than two days old.

	avsd# grep Message-ID *
	231:Message-ID: <4347@udccvax1.acs.udel.EDU>
	232:Message-ID: <11290@kuhub.cc.ukans.edu>
	233:Message-ID: <533@sud509.RAY.COM>
	234:Message-ID: <534@sud509.RAY.COM>
	235:Message-ID: <537@sud509.RAY.COM>
	236:Message-ID: <9626@venera.isi.edu>
	237:Message-ID: <935C18IO029@NCSUVM>
	238:Message-ID: <37058@conexch.UUCP>
	239:Message-ID: <11519@kuhub.cc.ukans.edu>
	240:Message-ID: <37058@conexch.UUCP>
	241:Message-ID: <11519@kuhub.cc.ukans.edu>

Ah, we have some duplicate articles, two out of the last three ...

	avsd# grep Path 239 241

	239:Path: avsd!vixie!decwrl!wuarchive!kuhub.cc.ukans.edu!orand
	241:Path: avsd!octopus!sts!claris!apple!brutus.cs.uiuc.edu!...
		...wuarchive!kuhub.cc.ukans.edu!orand

Now we have some real data. There are three machines which are found in
both "Path:" fields. Two of them are the source and the destination. The
third is "wuarchive".
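( A check worth making here, for anyone repeating this at their own site --
a sketch, not a command I ran in the session above -- is to compare the two
copies directly :

	avsd# diff 239 241

A true duplicate should differ only in its headers, "Path:" and the like,
and never in its body. )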

Now, at this point it would normally be appropriate to run, screaming
accusations all the way, straight to the administration of "wuarchive",
all self-righteous as all get-out, demanding to know what they are doing.

But I'm not sure they are doing _anything_, because I'm assuming that I'm
not the first person who has approached this problem in this fashion. So,
instead, I'm going to take a leap of the imagination and try to imagine
why such a situation might occur, what circumstances might impinge upon
the Usenet that would lead to massive forgery and duplication.

The answer that occurs to me is, quite bluntly, sabotage. It is a well-
established trick, exemplified by the actions of senior Usenet people,
to generate forged headers, as I said before, and insert them into the
queue. These articles, given their untraceable nature, are very possibly
forged articles.

The sites found in the "Path:" field are, presumably, interconnected ...
which argues for a fairly sophisticated and detailed effort, not the act
of an average college student, who would presumably still be so dazzled
by the wealth of information available that s/he would never think of
sabotaging it. No, if such an act is being perpetrated, it
is coming from an individual or group thereof with considerable attention
to detail.

Why would someone do such a thing ? I can think of several reasons.

	(1)	Jealousy. There has been considerable territorialism
		lately, people posting to moderated groups and the like,
		commercialist behavior. Some people prefer to make sure
		that if they can't play, nobody can play.

	(2)	Disinformation. The Usenet represents a substantial and
		sophisticated alternative to normal channels of
		communication, one less subject to control through covert or
		coercive activities, as many of the sponsors are Real Big
		Corporations, not necessarily willing to agree with the
		marching orders of a hypothetically interested government.
		Remember COINTELPRO, multiply by several orders of magnitude
		where information processing capacity and expertise are
		concerned, divide by the number of Constitutional Amendments
		you've seen waylaid recently, and tell me what you get.

	(3)	Stupidity. Someone has some inscrutable motive, or there is
		a random scattering of badly-installed netnews sites that
		appear to constitute a significant minority and are scattered
		fairly evenly through the United States. ( Perhaps the next
		phase in this research might be to coordinate efforts to
		identify source(s) by collecting the name of every machine
		that _appears_ to be a problematic machine, using methods
		outlined above, and examine this with an eye for statistical
		anomalies or patterns of placement. For instance, they might
		all fall within a few states. If they are evenly distributed
		geographically, that is possibly evidence of a sophisticated
		effort to muddy the trail, and important to establish. )

	(4)	Malice. Some group has acquired sufficient expertise and
		an invisibly coordinated set of ostensibly independent
		Usenet sites, placed them in positions of moderate but not
		excessive visibility amongst the crowd, and is using that
		position to damage the Usenet's interconnectivity.

Why, you say ? What's the point ? Well, I think there is a clear end result
here, and it's clogging the channels. Duplicate an article here and there,
one per newsgroup per day, and pretty soon some of the lesser sites are
filling up their disks. Soon, the administrators are calling for things to
be omitted from the 'spool' partitions. Maybe the entire news installation
gets deinstalled, or perhaps only those parts of it irrelevant to the
specific commercial mission of the individual companies. It's been going
on for quite a while now, and it's gotten rather noticeable at my site. If
we hadn't enlarged our spool partition, we might still be getting regular
"filesystem full" messages, and that was with a _lot_ of space and 'expire'
getting rid of everything more than two days old.

I don't know who's doing it. To tell you the truth, I'm still prepared to
find an error in my thinking, all down the line. But it _seems_ to be common
everywhere, and so I hesitate to discount my hypotheses until I hear from
a few others on the topic, the results of their own research, and their
thoughts / hypotheses. I do know it needs to be fixed, since it won't fix
itself, wherever it's coming from.

I'm also curious if this type of thing has been encountered in other networks,
such as FIDO, which certainly has the circumstances under which such things
might happen. The problem is that I don't know if they have restricted
interconnectivity to conform with requirements for linearity, or have allowed
potential looping paths to evolve in their interconnections, compensating
with article ID checks in the software.

I must admit that I'm puzzled as to how this is happening, as netnews is
_supposed_ to be checking articles against the 'active' articles database.
Perhaps the "Message-ID:" field is being invisibly corrupted, or perhaps the
software compares both Message-ID and Path, classifying two articles as
identical only if _both_ match, to avoid the faint but real possibility of
two articles from divergent sites being generated with identical Message-IDs.

Anyhow, some thoughts that have been brewing for about two weeks. I'd like
to hear some responses ... reactions will be responded to in a vein similar
to that in which they were conveyed, but intelligent commentary will receive
the respect it deserves.

-- richard

-- 
 *                                                                            *
 *          Intelligence : the ability to create order out of chaos.          *
 *                                                                            *
 *      ..{amdahl|decwrl|octopus|pyramid|ucbvax}!avsd.UUCP!childers@tycho     *

karl@godiva.cis.ohio-state.edu (Karl Kleinpaste) (09/16/89)

childers@avsd.UUCP (Richard Childers) writes:
   I've been noticing a lot of duplicate articles recently.

Richard, truly I mean no disrespect, but I think you've been spending
too much time around alt.conspiracy.

   I began to smell a rat.

I think it's just news software suffering from bitrot, at your own site.

	   avsd# grep Message-ID *
	   ...
	   238:Message-ID: <37058@conexch.UUCP>
	   239:Message-ID: <11519@kuhub.cc.ukans.edu>
	   240:Message-ID: <37058@conexch.UUCP>
	   241:Message-ID: <11519@kuhub.cc.ukans.edu>
   Ah, we have some duplicate articles, two out of the last three ...

Stop right there.  If your news system is letting in articles with
identical Message-ID's, then there is braindeath in rnews' ability to
poke around in your history file.  That "can't happen" when things are
running properly.

Now, if you were to pull those lines out and check them for invisible
control characters, and found differences...well, then perhaps you'd
have a case for paranoia.  But up to this point, you've just got a
problem with your history file.  Try expire -r and wait a week.
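One way to make such characters visible, sketched against the filenames
from the posting above (adjust for your own spool):

	grep Message-ID 239 241 | sed -n l

sed's `l' command prints every byte unambiguously, so control characters
show up as backslash escapes instead of hiding.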

I, for one, have seen relatively little in the way of duplicated
stuff, even the items coming from the reputedly-corrupt `tropix'
system.  Just as a data point against which to compare, I keep a whole
lot of news around (290Mbytes), and alt.bbs ends with these
Message-ID's:

796:Message-ID: <533@sud509.RAY.COM>
797:Message-ID: <534@sud509.RAY.COM>
798:Message-ID: <537@sud509.RAY.COM>
799:Message-ID: <9626@venera.isi.edu>
800:Message-ID: <935C18IO029@NCSUVM>
801:Message-ID: <37058@conexch.UUCP>
802:Message-ID: <11519@kuhub.cc.ukans.edu>
803:Message-ID: <89257.213434JTW106@PSUVM.BITNET>
804:Message-ID: <10349@eerie.acsu.Buffalo.EDU>
805:Message-ID: <1668@ns.network.com>
808:Message-ID: <1652@psuhcx.psu.edu>

No dups, and I have both of the articles for which you got dups.

--Karl

tneff@bfmny0.UU.NET (Tom Neff) (09/16/89)

I don't understand this complaint.  These duplicate articles are proven
to exist by matching Message ID's, correct?  But news is supposed to
eliminate duplicate message ID's before storage.  If this is not
happening at some site then something is broken there.  Sites with
multiple feeds may commonly see duplicates in the batch -- they are not
supposed to make it to the spool directory as individual articles
though.

Rather than engage in lengthy disquisitions on others' motives, I would
get to work tracking down why inews and history are busted at my site.
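The first question I'd ask is whether the duplicated ID ever made it into
history at all (a sketch; /usr/lib/news is the B news default, your layout
may differ):

	grep '11519@kuhub.cc.ukans.edu' /usr/lib/news/history

If the ID is recorded and the second copy was filed anyway, the lookup is
broken; if it was never recorded, the append is.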
-- 
'We have luck only with women --    \\\     Tom Neff
          not spacecraft!'         *-((O    tneff@bfmny0.UU.NET
 -- R. Kremnev, builder of FOBOS      \\\   uunet!bfmny0!tneff (UUCP)

werner@utastro.UUCP (Werner Uhrig) (09/16/89)

> Anyhow, some thoughts that have been brewing for about two weeks. I'd like
> to hear some responses ... reactions will be responded to in a vein similar
> to that in which they were conveyed, but intelligent commentary will receive
> the respect it deserves.
 
	wow, yeah man, send me some of that stuff ... (I can't believe I read
	all 175 lines of that article;  must be a Friday night, you know) ...

-- 
      ----------->   PREFERRED RETURN-ADDRESS FOLLOWS   <--------------

  (ARPA)    werner@rascal.ics.utexas.edu   (Internet: 128.83.144.1)
  (UUCP)    ..!utastro!werner   or  ..!uunet!rascal.ics.utexas.edu!werner

coolidge@brutus.cs.uiuc.edu (John Coolidge) (09/16/89)

childers@avsd.UUCP (Richard Childers) writes:
>	[finds some duplicate articles, apparently with the same messageid]
>	239:Message-ID: <11519@kuhub.cc.ukans.edu>
>	241:Message-ID: <11519@kuhub.cc.ukans.edu>
>	[and checks the path]
>	239:Path: avsd!vixie!decwrl!wuarchive!kuhub.cc.ukans.edu!orand
>	241:Path: avsd!octopus!sts!claris!apple!brutus.cs.uiuc.edu!...
>		...wuarchive!kuhub.cc.ukans.edu!orand

>Now we have some real data. There are three machines which are found in
>both "Path:" fields. Two of them are the source and the destination. The
>third is "wuarchive".

>Now, at this point it would normally be appropriate to run, screaming
>accusations all the way, straight to the administration of "wuarchive",
>all self-righteous as all get-out, demanding to know what they are doing.

Wouldn't be appropriate, and it would be wrong. I've checked on both
wuarchive (where I'm a guest) and brutus (which I run) and we each have
only one copy of the offending article, with (as best as I can tell)
no offensive control characters or any such silliness.

If there's a Sinister Villain (TM) trashing Message-ID's out there,
it's not wuarchive or brutus. I sincerely doubt it's apple or decwrl
either. Take a good look at the Message-ID's in question. If there
are any control characters or other real differences, one of the
feed sites beyond wuarchive (239) or brutus (241) is causing problems.
If not, your news is messed up and needs an overhaul, since you should
never have two articles with the same Message-ID. Normal duplicates are
caused by old news being resent after the old Message-ID's have been
expired. These, clearly, are not normal duplicates.

One potential news problem that just might do this (I'm insufficiently
acquainted with the internal arcana to be sure) would be the case where
the history file can't be appended to (disk full, maybe?). In that case,
I could imagine news dropping the Message-ID on the floor, then accepting
the same id later. Doesn't seem likely to me (well, *I'D* check for it
when writing a news :-) ), but maybe...

--John

--------------------------------------------------------------------------
John L. Coolidge     Internet:coolidge@cs.uiuc.edu   UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.
You may redistribute this article if and only if your recipients may as well.

marc@lakesys.UUCP (Marc Rassbach) (09/16/89)

[nice text killed]

Hmmmm.   On our end here in Wisc, I'll see replies to threads where I've
never seen the first article, let alone the 4th article being replied to.
At first, I thought it was lakesys and its small capacity.   But my
understanding is this is occurring at the UW as well.

On the subject of FIDOnet.... Major political problems in that network.
People wanting to play God, cutting off nodes w/o notice, bombing runs
(a bombing run is when one re-mails the same packet out), etc.

With every node added to FIDOnet, the volume grows.   FIDOnet is beginning
to fall apart...

The originator of this post is correct: UseNet is a VERY good alternative
information source.   I'd hate to see it go.     I've fired off letters to
my rep. (as if it will do any good) supporting Sen. Gore's network proposal.
(Being shell-fish {crab-y  :-) } I want to see better net.traffic.)
When I get the $$$ ahead, I'm planning on getting myself one of them thar
satellite thingies to get my newsfeed from 0:00-6:00.

Go nuts!

							M.R.
							(stands for [M]ad
							bad, and dange[R]ous
							to know....) 


-- 
Marc Rassbach     marc@lakesys	              If you take my advice, that
"I can't C with my AI closed"                 is your problem, not mine!
              If it was said on UseNet, it must be true.

karl@ddsw1.MCS.COM (Karl Denninger) (09/16/89)

We're getting lots of dups too.

But they have DIFFERENT Message IDs here...  Here is but one of the dozen or
so examples I notice each day!

Path: ddsw1!mcdchg!att!dptg!rutgers!caip.rutgers.edu!toccata.rutgers.edu!rlr
From: rlr@toccata.rutgers.edu (Rich Rosen)
Newsgroups: news.admin
Subject: Re: Site Admin stuff -- what if I give boneheads accounts?
Message-ID: <Sep.16.02.00.12.1989.9161@toccata.rutgers.edu>
Date: 16 Sep 89 06:00:14 GMT
References: <7260@medusa.cs.purdue.edu>
Distribution: na

And...

Path: ddsw1!mcdchg!att!dptg!rutgers!caip.rutgers.edu!toccata.rutgers.edu!rlr
From: rlr@toccata.rutgers.edu (Rich Rosen)
Newsgroups: news.admin
Subject: Re: Site Admin stuff -- what if I give boneheads accounts?
Message-ID: <Sep.16.02.02.51.1989.9175@toccata.rutgers.edu>
Date: 16 Sep 89 06:02:52 GMT
References: <2380@flatline.UUCP> <1489@jolnet.ORPK.IL.US> <1989Sep6.001332.12167@NCoast.ORG>
Distribution: na
Organization: RLRCLC
Lines: 40

The differences?  Here's the diff between the two:

5,7c5,7
< Message-ID: <Sep.16.02.00.12.1989.9161@toccata.rutgers.edu>
< Date: 16 Sep 89 06:00:14 GMT
< References: <7260@medusa.cs.purdue.edu>
---
> Message-ID: <Sep.16.02.02.51.1989.9175@toccata.rutgers.edu>
> Date: 16 Sep 89 06:02:52 GMT
> References: <2380@flatline.UUCP> <1489@jolnet.ORPK.IL.US> <1989Sep6.001332.12167@NCoast.ORG>
10c10
< Lines: 39
---
> Lines: 40
21a22
> > 

Hmmm.....

It looks to me like someone is using some other mechanism than the Usenet
stuff to post, and in addition their "crosslink" software is posting
articles TWICE!

I've seen a lot of these the last few weeks, and I only read a couple of
dozen newsgroups.   The problem, of course, is that it is darn difficult if
not impossible to track these down automatically, since they are, by
definition, different articles.  I'd have to estimate that I see about 2-5
of these a day -- and I only read a small subpart of the entire net (who
could possibly read the entire thing?!).

Is this a case of broken software, or broken users posting things twice?

--
Karl Denninger (karl@ddsw1.MCS.COM, <well-connected>!ddsw1!karl)
Public Access Data Line: [+1 312 566-8911], Voice: [+1 312 566-8910]
Macro Computer Solutions, Inc.		"Quality Solutions at a Fair Price"

tneff@bfmny0.UU.NET (Tom Neff) (09/17/89)

In article <1989Sep16.152857.14239@ddsw1.MCS.COM> karl@ddsw1.MCS.COM (Karl Denninger) writes:
>Is this a case of broken software, or broken users posting things twice?

In the case of the Rosen piece, almost certainly a broken user.  I got
both of those here too.  It's just luser spoor.  Note that one header
had an Organization: line and the other didn't.

I do not believe there is a plague of dupes.  Any site with more than
one feed which is seeing dupes has something broken in history.  If
someone is corrupting Message IDs via modem noise or something, then
there could be a problem of course.
-- 
'We have luck only with women --    \\\     Tom Neff
          not spacecraft!'         *-((O    tneff@bfmny0.UU.NET
 -- R. Kremnev, builder of FOBOS      \\\   uunet!bfmny0!tneff (UUCP)

elliot@alfred.UUCP (Elliot Dierksen) (09/17/89)

If you don't want to see dups, keep your history files longer. I expire all
the news on my system within 3 days, mainly because of disk space
restrictions, but I keep 30 days of history, which only takes up 300-500
blocks. I also get some redundant information because I get half of my feed
from one system and half from another, and I NEVER see duplicate articles.
Even if someone re-releases an article to the net, if it is still in your
history file it will get shipped to the bit bucket. I use the following
command to purge news on my system (2.11 B):

	expire -e 3 -E 30

Try that and I doubt anyone will have dups again!!
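If you'd rather not run that by hand, a cron entry does it nightly (just an
example; crontab formats and the expire path vary from system to system):

	0 4 * * *	/usr/lib/news/expire -e 3 -E 30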

-- 
Elliot Dierksen                 UUCP: {peora,ucf-cs,uunet}!tarpit!alfred!elliot

"You can only be you once, but you can be immature forever!"

bill@twwells.com (T. William Wells) (09/17/89)

In article <2056@avsd.UUCP> childers@avsd.UUCP (Richard Childers) writes:
:       avsd# cd /usr/spool/news/alt/bbs
:
:       avsd# ls
:       231  232  233  234  235  236  237  238  239  240  241
:
: A small sample, about 11 articles, all less than two days old.
:
:       avsd# grep Message-ID *
:       231:Message-ID: <4347@udccvax1.acs.udel.EDU>
:       232:Message-ID: <11290@kuhub.cc.ukans.edu>
:       233:Message-ID: <533@sud509.RAY.COM>
:       234:Message-ID: <534@sud509.RAY.COM>
:       235:Message-ID: <537@sud509.RAY.COM>
:       236:Message-ID: <9626@venera.isi.edu>
:       237:Message-ID: <935C18IO029@NCSUVM>
:       238:Message-ID: <37058@conexch.UUCP>
:       239:Message-ID: <11519@kuhub.cc.ukans.edu>
:       240:Message-ID: <37058@conexch.UUCP>
:       241:Message-ID: <11519@kuhub.cc.ukans.edu>

That just means that your news software is broken. Messages with
duplicate message ids should not ever make it into your spool
directory. If two messages arrive with the same id, the later one is
supposed to be discarded. The only residue of this would be in your
log file.
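So, for example (a sketch -- the log's location and the exact wording of
its entries vary between installations):

	grep -i duplicate /usr/lib/news/log

should turn up the rejected copies, if the software is doing its job.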

:       239:Path: avsd!vixie!decwrl!wuarchive!kuhub.cc.ukans.edu!orand
:       241:Path: avsd!octopus!sts!claris!apple!brutus.cs.uiuc.edu!...
:               ...wuarchive!kuhub.cc.ukans.edu!orand

This just says that the news left kuhub.cc.ukans.edu, went to
wuarchive, and then was sent via two different paths from there, and
that the routes of the message happened to intersect at your site. All
perfectly normal, and the redundancy of it improves the reliability
of the net.

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

rlr@toccata.rutgers.edu (Rich Rosen) (09/17/89)

> We're getting lots of dups too.  But they have DIFFERENT Message IDs here... 

> From: rlr@toccata.rutgers.edu (Rich Rosen)
> Newsgroups: news.admin
> Subject: Re: Site Admin stuff -- what if I give boneheads accounts?
> Message-ID: <Sep.16.02.00.12.1989.9161@toccata.rutgers.edu>
> References: <7260@medusa.cs.purdue.edu>
>	And...
> From: rlr@toccata.rutgers.edu (Rich Rosen)
> Newsgroups: news.admin
> Subject: Re: Site Admin stuff -- what if I give boneheads accounts?
> Message-ID: <Sep.16.02.02.51.1989.9175@toccata.rutgers.edu>
> References: <2380@flatline.UUCP> <1489@jolnet.ORPK.IL.US> <1989Sep6.001332.12167@NCoast.ORG>
> Organization: RLRCLC

> It looks to me like someone is using some other mechanism than the Usenet
> stuff to post, and in addition their "crosslink" software is posting
> articles TWICE!

OHMYGOD!!!

The reason I had to post it twice in the first place was *because* "the Usenet
stuff" decided to "fix" (its words) my reference line "for" me.  (I'd always
used 'f' commands to post almost everything, since followup commands, in my day
(oh no, there he goes, talking about the good old days), didn't care what you
put into the article; you could "followup" an article in nose.sinus.headache
and replace the text with a completely different article intended for
elbow.funnybone.gonad and it would go to the intended place; apparently some
'f' functions are sneakier and clevererer nowadays...)  Anyway, since postnews
is an abomination to me, and since rn isn't installed here, I had to figure
out a vnews equivalent.  'f'-ing was the best I could come up with on a
moment's notice.  I wasn't able to cancel either article until much later
because I had no idea how to cancel something I didn't yet know the article-ID
of (it didn't appear here until much later).  I apologize if this caused
conniption fits among news administrators.

> The problem, of course, is that it is darn difficult if
> not impossible to track these down automatically, since they are, by
> definition, different articles.  I'd have to estimate that I see about 2-5
> of these a day -- and I only read a small subpart of the entire net
> (who could possibly read the entire thing?!).
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You're talking to him, but of course that was many eons ago.  I remember
when... [WAR STORIES TOLD WITH CREAKING OLD-MAN JEWISH ACCENT DELETED FOR
SANITY]  I'll bet where you see two (roughly) equivalent articles with
different IDs, you are witnessing an article that was posted twice for
whatever reason.  Since in many locally-networked or remote news servers
you can't tell if your article's been "posted" for an undetermined amount of
time, I can't think of a means of cancelling something that was just missent,
so I'm sure that accounts for a number of these dupes.  (Also the naive user
who doesn't see his/her article immediately and panics...)

> Is this a case of broken software, or broken users posting things twice?

In this case, a broken user, but worry not.  I've been fixed, and I won't be
impregnating the net with a litter of random/duplicated/otherwise inane
articles.  (That should help some people sleep better tonight... :-)
--
"Time to eat all your words, swallow your pride, open your eyes..."
	Rich Rosen	rlr@toccata.rutgers.edu

hoyt@polyslo.CalPoly.EDU (Sir Hoyt) (09/18/89)

In article <1989Sep16.061700.4572@brutus.cs.uiuc.edu> coolidge@cs.uiuc.edu writes:
>the history file can't be appended to (disk full, maybe?). In that case,
>I could imagine news dropping the Message-ID on the floor, then accepting
>the same id later. Doesn't seem likely to me (well, *I'D* check for it
>when writing a news :-) ), but maybe...

	It's happened here at polyslo before.  Expire on B news makes a copy of
	the history file before processing.  Our history file is
	about 4Meg, which makes 8Meg, and when you have ~10Meg free....

	Another problem we had was that we were deleting old Message-IDs
	from the history file too soon.  We now keep them for 30 days,
	and have not had a problem since.
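	A quick check before expire runs ( paths are the B news defaults;
	adjust for your site ):

		ls -l /usr/lib/news/history	# expire wants roughly 2x this free
		df /usr/lib/news		# free space on that filesystem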


-- 
John H. Pochmara				 A career is great, 
UUCP: {csun,voder,trwind}!polyslo!hoyt		 But you can't run your 
Internet: hoyt@polyslo.CalPoly.EDU		 fingers through its hair
							-Graffiti 4/13/83

coolidge@brutus.cs.uiuc.edu (John Coolidge) (09/18/89)

hoyt@polyslo.CalPoly.EDU (Sir Hoyt) writes:
>In article <1989Sep16.061700.4572@brutus.cs.uiuc.edu> I write:
>>the history file can't be appended to (disk full, maybe?). In that case,
>>I could imagine news dropping the Message-ID on the floor, then accepting
>>the same id later. Doesn't seem likely to me (well, *I'D* check for it
>>when writing a news :-) ), but maybe...
>	It's happened here at polyslo before.  Expire on B news makes a copy of
>	the history file before processing.  Our history file is
>	about 4Meg, which makes 8Meg, and when you have ~10Meg free....

But does it drop the message id from history while keeping/spooling the
article? That was the behavior that I thought was unlikely... just running
out of history file space seems perfectly reasonable, but reacting to
a failure to append to the history file by keeping the article sounds
somewhat strange...

--John

--------------------------------------------------------------------------
John L. Coolidge     Internet:coolidge@cs.uiuc.edu   UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.
You may redistribute this article if and only if your recipients may as well.

" Maynard) (09/18/89)

In article <1989Sep18.041601.15352@brutus.cs.uiuc.edu> coolidge@cs.uiuc.edu writes:
>But does it drop the message id from history while keeping/spooling the
>article? That was the behavior that I thought was unlikely... just running
>out of history file space seems perfectly reasonable, but reacting to
>a failure to append to the history file by keeping the article sounds
>somewhat strange...

That has happened here before. If I forget to patch the default ulimit
on this SysV.2 system when putting up a new kernel, I usually discover
the problem by seeing a bunch of duplicated articles.

Yes, it's strange, but that's how it happens...BTW, this is on B news
2.11.14.
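For anyone who wants to check their own setup, something like this works
from the Bourne shell (a generic sketch, nothing specific to 2.11.14):

	ulimit				# max file size, in 512-byte blocks
	ls -l /usr/lib/news/history	# compare -- ls reports bytes

If the history file is anywhere near ulimit * 512 bytes, appends are about
to start failing.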

-- 
Jay Maynard, EMT-P, K5ZC, PP-ASEL   | Never ascribe to malice that which can
jay@splut.conmicro.com       (eieio)| adequately be explained by stupidity.
{attctc,bellcore}!texbell!splut!jay +----------------------------------------
"The unkindest thing you can do for a hungry man is to give him food." - RAH

childers@avsd.UUCP (Richard Childers) (09/18/89)

coolidge@cs.uiuc.edu writes:

>If not, your news is messed up and needs an overhaul, since you should
>never have two articles with the same Message-ID. Normal duplicates are
>caused by old news being resent after the old Message-ID's have been
>expired. These, clearly, are not normal duplicates.

I've had a few people suggest this to me, and I am aware of the possibility.

But I should also make clear that there was a large gap of time between the
time I was playing with IHAVE / SENDME, and getting duplicate articles ...
and the time when I, having deinstalled IHAVE / SENDME a few months prior,
began to _again_ see duplicate news articles. A period of several months.
At the same time, many other people began to comment on the problem ...

I'll be looking into both possibilities as soon as I get a chance.

>--John

-- richard

-- 
 *                                                                            *
 *          Intelligence : the ability to create order out of chaos.          *
 *                                                                            *
 *      ..{amdahl|decwrl|octopus|pyramid|ucbvax}!avsd.UUCP!childers@tycho     *

childers@avsd.UUCP (Richard Childers) (09/19/89)

I've received one reply so far of a useful nature, as well as permission to
share the email. In the interests of the many people who have commented on
the problem of duplicate articles, it is reproduced in edited form below.

				-=*=-

Date: Sat, 16 Sep 89 13:55 CDT
From: sysop@attctc.Dallas.TX.US (Charles Boykin-BBS Admin)
To: avsd!childers
Subject: Re: Who's Messing With The Usenet ?

"Are you running dbz ?"

"... inews has error checking to reject duplicates but this depends upon dbz
 if you are using it. And, dbz doesn't always find the article id when it
 does, in fact, exist in the history file. I had as many as four copies of
 the same articles as a result. From a number of comments, I suspect that
 the use of dbz is fairly widespread, unfortunately. I did a local poll of
 sites connecting to this one and found that 100% of the ones that had
 multiple news feeds and were running dbz had duplicate articles. Since junking
 it, I have none. Further, I was sending the duplicate articles to a downstream
 site also running dbz, and they were duplicated on his system with this one as
 his only feed."

":-) As I am sending this email, you may certainly distribute it as you
 see fit. I do hope this is of some assistance -"

				-=*=-

-- richard

-- 
 *                                                                            *
 *          Intelligence : the ability to create order out of chaos.          *
 *                                                                            *
 *      ..{amdahl|decwrl|octopus|pyramid|ucbvax}!avsd.UUCP!childers@tycho     *

childers@avsd.UUCP (Richard Childers) (09/19/89)

bill@twwells.com (T. William Wells) writes:

>That just means that your news software is broken. Messages with
>duplicate message ids should not ever make it into your spool
>directory. If two messages arrive with the same id, the later one is
>supposed to be discarded. The only residue of this would be in your
>log file.

I'm aware of the possibility, but I'm puzzled by the periodicity of the
events. One would assume that broken software would fail consistently,
and there would be a constant, statistically steady stream of duplicate
articles. This is not the case. I installed netnews 2.11B about a year
ago, when I first started working here, and it worked fine. I haven't done
a thing other than edit /usr/lib/news/sys and play with pathalias.

>All perfectly normal, and the redundancy of it improves the reliability
>of the net.

Agreed, provided it works. I understand the _intent_. After all, I sought
duplicate newsfeeds for that precise reason ...

>bill@twwells.com

-- richard



-- 
 *                                                                            *
 *          Intelligence : the ability to create order out of chaos.          *
 *                                                                            *
 *      ..{amdahl|decwrl|octopus|pyramid|ucbvax}!avsd.UUCP!childers@tycho     *

childers@avsd.UUCP (Richard Childers) (09/19/89)

elliot@alfred.UUCP (Elliot Dierksen) writes:

>If you don't want to see dups, keep your history files longer. I expire all
>the news on my system within 3 days. mainly because of disk space ...

Here's the line in the crontab :

	30 3,21 * * *   root (find /usr/spool/news \( -size 0 -o -mtime +2 \)
	-exec rm -f {} \; ; /usr/lib/news/expire -n all,!comp.mail.maps -e 2
	-E 30 )

>I use the following command to purge news on my system (2.11 B):
>
>	expire -e 3 -E 30
>
>Try that and I doubt anyone will have dups again!!

A good suggestion, but I took that into account many months ago. That's not
the problem.

>Elliot Dierksen                 UUCP: {peora,ucf-cs,uunet}!tarpit!alfred!elliot
>
>"You can only be you once, but you can be immature forever!"

Isn't that supposed to be "You can only be young once" ?

-- richard



-- 
 *                                                                            *
 *          Intelligence : the ability to create order out of chaos.          *
 *                                                                            *
 *      ..{amdahl|decwrl|octopus|pyramid|ucbvax}!avsd.UUCP!childers@tycho     *

arf@chinet.chi.il.us (Jack Schmidling) (09/19/89)

 
Article 3708 (11 more) in news.misc: 
From: childers@avsd.UUCP (Richard Childers) 
Newsgroups: news.admin,news.misc,alt.conspiracy 
Subject: Who's Messing With The Usenet ? 
Keywords: article, duplication, monkey business 
 
>I've been noticing a lot of duplicate articles recently...... 
 
>Now, I've been reading the Usenet, on and off, for about five or six years 
now, and I have _never_ seen anything like this in my life...... 
 
>I've been reading for, oh, over a month now, about how duplicate articles 
have been appearing across many newsgroups......I began to smell a rat. 
 
>The answer that occurs to me is, quite bluntly, sabotage. It is a well-
established trick, exemplified by the actions of senior Usenet people,
to generate forged headers, as I said before, and insert them into the 
queue. These articles, given their untraceable nature, are very possibly 
forged articles. 
 
>The sites found in the "Path:" field are, presumably, interconnected ... 
which argues for a fairly sophisticated and detailed effort, not the act 
of an average college student, who would presumably still be so dazzled
by the wealth of information available that s/he would never think of
sabotaging it. No, if such an act is being perpetrated, it
is coming from an individual or group thereof with considerable attention 
to detail. 
 
>Why would someone do such a thing ? 
 
ARF says: 
 
This probably belongs on alt.conspiracies but since this is the first time I  
have even looked at this group, I was dazzled by the possible solution to a  
problem that has been bugging me since shortly after I joined usenet around  
June of this year. 
 
I am a born-again Anti-Zionist and post profusely to talk.politics.mideast  
with that point of view.  My usenet site (chinet) loses, on the average, about
50 articles every week.  The way it happens is.... 32 Articles  
talk.politics.mideast read now? (y n)..."y"....sorry all read...bye 
 
My sysop says it's because he has run out of disc space.   
 
It also happens that there are many duplicate articles on this news group and  
they obviously fill up the disk and prevent new articles from being  
processed. 
 
Keeping $Billions flowing to Israel is not to be overlooked as a possible  
reason why "someone would do such a thing". 
 
The Amateur Radio Forum (arf) 

gary@sci34hub.UUCP (Gary Heston) (09/19/89)

In article <1989Sep18.041601.15352@brutus.cs.uiuc.edu>, coolidge@brutus.cs.uiuc.edu (John Coolidge) writes:
> hoyt@polyslo.CalPoly.EDU (Sir Hoyt) writes:
> >In article <1989Sep16.061700.4572@brutus.cs.uiuc.edu> I write:
> >>the history file can't be appended to (disk full, maybe?). In that case,
> >	It's happened here at polyslo before.  Expire on B news makes a copy of
> >	the history file before processing.  Our history file is
> But does it drop the message id from history while keeping/spooling the
> article? That was the behavior that I thought was unlikely... just running

> John L. Coolidge     Internet:coolidge@cs.uiuc.edu   UUCP:uiucdcs!coolidge
> Of course I don't speak for the U of I (or anyone else except myself)
> Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.
> You may redistribute this article if and only if your recipients may as well.

I think it's highly likely--if the partition containing history runs out of 
space or inodes, then history data is lost. However, the news partition 
itself may be different from the history partition, and would have room 
(and inodes) for the article. My history files are in /usr/lib/news/history*
(the default, I think), which is part of my root partition (/dev/dsk/0s1).
My news files, however, are in /news/spool/news (/news is /dev/dsk/1s0, 
has 65K inodes, and 140M of space...). I have had problems with various 
partitions filling up, and the lost history/kept article sounds quite 
plausible on a configuration like mine. When the article comes in the 
second (or third, etc., assuming you haven't seen the disc full errors--
over a weekend, perhaps) time, r/inews scans the history file, doesn't 
find a match, posts the article and tries to append history again.
Article has space, history doesn't. 
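An easy way to keep an eye on a split layout like this (SysV df reports
both free blocks and free i-nodes; the mount points are the ones described
above):

	df /		# the partition holding /usr/lib/news/history*
	df /news	# the partition holding the spool

If the history partition is out of space or i-nodes while the spool
partition isn't, you're set up for exactly this failure.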

-- 
    Gary Heston     { uunet!gary@sci34hub  }    System Mismanager
   SCI Technology, Inc.  OEM Products Department  (i.e., computers)
      Hestons' First Law: I qualify virtually everything I say.

perry@ccssrv.UUCP (Perry Hutchison) (09/20/89)

In article <Sep.17.12.48.26.1989.11051@toccata.rutgers.edu> rlr@toccata.rutgers.edu (Rich Rosen) writes:

>The reason I had to post it twice in the first place was ...

We got _three_ copies of that one here, all with different message id's and
with minor variations in the text such that it looks like three (not just two)
separate postings.  The first two are less than three minutes apart, and
differ in the References: and number of blank lines.  The third was posted
some 13 hours later, and contains some minor textual revisions relative to
the second.  This is a diff3 comparison of the three messages:

====3
1:1c
2:1c
  Path: ccssrv!sequent!tektronix!zephyr.ens.tek.com!uunet!ginosko!ctrsol!cica!iuvax!rutgers!caip.rutgers.edu!toccata.rutgers.edu!rlr
3:1c
  Path: ccssrv!sequent!tektronix!zephyr.ens.tek.com!uunet!ginosko!usc!apple!rutgers!caip.rutgers.edu!toccata.rutgers.edu!rlr
====
1:5,7c
  Message-ID: <Sep.16.02.00.12.1989.9161@toccata.rutgers.edu>
  Date: 16 Sep 89 06:00:14 GMT
  References: <7260@medusa.cs.purdue.edu>
2:5,7c
  Message-ID: <Sep.16.02.02.51.1989.9175@toccata.rutgers.edu>
  Date: 16 Sep 89 06:02:52 GMT
  References: <2380@flatline.UUCP> <1489@jolnet.ORPK.IL.US> <1989Sep6.001332.12167@NCoast.ORG>
3:5,7c
  Message-ID: <Sep.16.13.38.01.1989.9463@toccata.rutgers.edu>
  Date: 16 Sep 89 17:38:03 GMT
  References: <2380@flatline.UUCP> <1489@jolnet.ORPK.IL.US> <1989Sep6.001332.12167@NCoast.ORG>
====1
1:10c
  Lines: 39
2:10c
3:10c
  Lines: 40
====1
1:21a
2:22c
3:22c
  > 
====3
1:36c
2:37c
  sure to mark whatever/whomever that may be with a stamp that says "UNRELIABLE
3:37c
  sure to mark whatever source that may be with a stamp that says "UNRELIABLE
====3
1:41,43c
2:42,44c
  Deadheads may have bitched and moaned, but I was never restricted from use of
  the net in any way, either during my period of employment at Bell Labs/Bellcore
  or thereafter.  Got it? ... :-?
3:42,44c
  Deadheads may have bitched and moaned, but I was never restricted from use
  of the net in any way, either during my period of employment at Bell Labs or
  Bellcore or anytime thereafter.  Got it? ... :-?

jack@cs.glasgow.ac.uk (Jack Campin) (09/20/89)

arf@chinet.chi.il.us (Jack Schmidling) wrote:
>> I've been reading for, oh, over a month now, about how duplicate articles 
>> have been appearing across many newsgroups [...]   
>> The answer that occurs to me is, quite bluntly, sabotage.
> Keeping $Billions flowing to Israel is not to be overlooked as a possible  
> reason why "someone would do such a thing". 

A screwup in tropix's news system was responsible for most of that (look at
the Path: in the offending articles).  They never told the net what exactly
went wrong, so the same problem may have come up at other places since.
The duplication occurred in many groups with no political content; sci.med
and sci.math, for example.  It is about as plausible to argue that it was a
conspiracy by the alternative dentistry establishment to prevent debunking
of the mercury-fillings scare in sci.med.

Why the people in charge at tropix have decided to keep these details
to themselves when sharing their experience could benefit every news
administrator on the net is entirely beyond me.

-- 
Jack Campin  *  Computing Science Department, Glasgow University, 17 Lilybank
Gardens, Glasgow G12 8QQ, SCOTLAND.    041 339 8855 x6045 wk  041 556 1878 ho
INTERNET: jack%cs.glasgow.ac.uk@nsfnet-relay.ac.uk  USENET: jack@glasgow.uucp
JANET: jack@uk.ac.glasgow.cs     PLINGnet: ...mcvax!ukc!cs.glasgow.ac.uk!jack

mhw@wittsend.lbp.harris.com (Michael H. Warfield (Mike)) (09/21/89)

In article <14684@bfmny0.UU.NET> tneff@bfmny0.UU.NET (Tom Neff) writes:
>I do not believe there is a plague of dupes.  Any site with more than
>one feed which is seeing dupes has something broken in history.  If
>someone is corrupting Message IDs via modem noise or something, then
>there could be a problem of course.

	It's not a plague, it's a curse.  I'm getting them here by the ton.
Many are weeks out of date, old enough to miss my history file (which goes
back two weeks).  I got dozens of dups in the "comp.dcom.telecom" newsgroup.
I can't confirm the message id's, but I even think I've seen some of my own
postings screwed up.  Note that "comp.dcom.telecom" is moderated, so it's not
because of some amateur poster.  Many of the articles should have been out of
circulation for some time.  I'm also noticing a large volume of articles
going into junk because they are too far out of date.  The messages all look
clean, with reasonable message-id's, so the modem corruption idea is also
highly unlikely.  I also haven't seen a modem yet that will store an article
for three weeks.

Michael H. Warfield  (The Mad Wizard)	| gatech.edu!galbp!wittsend!mhw
  (404)  270-2123 / 270-2098		| mhw@wittsend.LBP.HARRIS.COM
An optimist believes we live in the best of all possible worlds.
A pessimist is sure of it!

mhw@wittsend.lbp.harris.com (Michael H. Warfield (Mike)) (09/21/89)

In article <14682@bfmny0.UU.NET> tneff@bfmny0.UU.NET (Tom Neff) writes:
>I don't understand this complaint.  These duplicate articles are proven
>to exist by matching Message ID's, correct?  But news is supposed to
>eliminate duplicate message ID's before storage.  If this is not
>happening at some site then something is broken there.  Sites with
>multiple feeds may commonly see duplicates in the batch -- they are not
>supposed to make it to the spool directory as individual articles
>though.

	Problem is that history does not store message-id's indefinitely.
On my news engine (galbp.lbp.harris.com) I typically run a 7-day expire and
retain message-id's in the history file for 14 days (that makes for a 1-2Meg
file and a 1-4Meg file to accomplish that amount of latency).  Some of the
dup's I've seen are as much as four weeks out of date.  If they have an
"expires" header they go straight to junk.  I'm seeing a lot of those -- and
just how many of us really use that header?  I have identified some articles
by matching up message-id's on saved articles with the new ones, and there
are some true delayed dup's out there.  There also may be some message-id
fudging as well; I'm not real sure on that.
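	In B news terms, that policy is just the following, using the same
flags seen elsewhere in this thread (-e expires articles, -E expires
history entries):

	expire -e 7 -E 14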

	I'll try posting some more details as soon as my snark traps catch
something.  Loop time is so long, though, and the incidents are so sporadic
that by the time I get hit with another barrage, I've always figured the
mess has cleared itself and turned off my traps.

Michael H. Warfield  (The Mad Wizard)	| gatech.edu!galbp!wittsend!mhw
  (404)  270-2123 / 270-2098		| mhw@wittsend.LBP.HARRIS.COM
An optimist believes we live in the best of all possible worlds.
A pessimist is sure of it!

tneff@bfmny0.UU.NET (Tom Neff) (09/21/89)

Not to confuse matters here.

There *is* a completely separate issue involving some site (tropix?)
reposting hideously old articles with munged dates and times so they
don't match history.  That was not the original complaint in this
thread as far as I can tell.

The only cure for the (tropix?) thing is to lengthen your history
retention period.

The phenomenon under discussion here supposedly involves rapidly
duplicated articles.  Inews should catch those, period.
-- 
"My God, Thiokol, when do you     \\    Tom Neff
want me to launch -- next April?"  \\   uunet!bfmny0!tneff

perry@ccssrv.UUCP (Perry Hutchison) (09/21/89)

Here is a pair I noticed today.  One has a very interesting date field.
BTW I don't _look for_ these things, but I do sometimes _notice_ them
when they pop up in groups I read.

<file date Sep 20 02:05>

-> Path: ccssrv!sequent!ogccse!orstcs!rutgers!apple!csibtfr!excelan!edc
-> From: edc@excelan.com (Eric Christensen)
-> Newsgroups: comp.dcom.lans
-> Subject: Re: Netware 2.0a++ performance degradation, how come?
-> Message-ID: <403@excelan.COM>
-> Date: 19 Sep 89 20:24:21 GMT
-> References: <5620@decvax.dec.com>
-> Sender: news@excelan.COM
-> Reply-To: edc@ka.UUCP (Eric Christensen)
-> Organization: Excelan, San Jose, Califonia
-> Lines: 91
-> Posted: Tue Sep 19 13:24:21 1989

<file date Sep 20 02:08>

-> Path: ccssrv!sequent!ogccse!orstcs!rutgers!apple!csibtfr!excelan!edc
-> From: edc@excelan.com (Eric Christensen)
-> Newsgroups: comp.dcom.lans
-> Subject: Re: Netware 2.0a++ performance degradation, how come?
-> Message-ID: <406@excelan.COM>
-> Date: 10 Mar 90 01:08:12 GMT
         ^^^^^^^^^
-> References: <5620@decvax.dec.com>
-> Sender: news@excelan.COM
-> Reply-To: edc@excelan.com (Eric Christensen)
-> Organization: Excelan - A Novell Company, San Jose, Califonia
-> Lines: 91
-> Posted: Fri Mar  9 17:08:12 1990
               ^^^^^^          ^^^^

and here is the diff

5,6c5,6
< Message-ID: <403@excelan.COM>
< Date: 19 Sep 89 20:24:21 GMT
---
> Message-ID: <406@excelan.COM>
> Date: 10 Mar 90 01:08:12 GMT
9,10c9,10
< Reply-To: edc@ka.UUCP (Eric Christensen)
< Organization: Excelan, San Jose, Califonia
---
> Reply-To: edc@excelan.com (Eric Christensen)
> Organization: Excelan - A Novell Company, San Jose, Califonia
12c12
< Posted: Tue Sep 19 13:24:21 1989
---
> Posted: Fri Mar  9 17:08:12 1990
97c97
< around too. Please don't take my suggestion above as reccommendations,
---
> around too. Please don't take my suggestions above as recommendations,

Differences in Reply-To, Organization, and text are suggestive of a double
posting, but how did the second one get a date 6 months in the future?

gene@bu-cs.BU.EDU (Yevgeny Y. Itkis) (09/21/89)

In article <3438@midway.cs.glasgow.ac.uk> jack@cs.glasgow.ac.uk (Jack Campin) writes:
>
>arf@chinet.chi.il.us (Jack Schmidling) wrote:
>>> I've been reading for, oh, over a month now, about how duplicate articles 
>>> have been appearing across many newsgroups [...]   
>>> The answer that occurs to me is, quite bluntly, sabotage.
>
>A screwup in tropix's news system was responsible for most of that (look at
>the Path: in the offending articles).  They never told the net what exactly
>went wrong, so the same problem may have come up at other places since.
>The duplication occurred in many groups with no political content; sci.med
>and sci.math, for example.  It is about as plausible to argue that it was a
>conspiracy by the alternative dentistry establishment to prevent debunking
>of the mercury-fillings scare in sci.med.

Yeah, that's it. And do not forget that most of them are jewish and therefore
part of the zionist plot to.. to... what was it, Jack S.? Oh yeah, to take over
the world, right?
>
>Why the people in charge at tropix have decided to keep these details
>to themselves when sharing their experience could benefit every news
>administrator on the net is entirely beyond me.

You are so naive, Jack Campin. Don't you see they are part of the plot? Don't
you know that most of the administrators are jewish? After all computers are a
kind of media, which, as is well known, is controlled by zionists.
>
>Jack Campin  *  Computing Science Department, Glasgow University, 17 Lilybank

Actually, there is something sad going on right in front of our eyes. I hate
to play a net psychologist, but when I try to think about Jack Schmidling and
his postings (which I rarely read, I admit) I cannot help feeling bad about
the guy (after overcoming the more immediate reactions :-)). Jack, I am very
serious; I have not told this to any of the people I disagree with. But I
really think you should see someone for help. Jack's personality problems are
screaming through for attention, e.g. in his use of an organization name in
place of his proper name, not to mention a maniacal fixation on certain
problems. Sometimes I think that maybe he is just a zionist agent who on a
rare occasion brings up a troubling issue, but in such a way that it is
obvious to any human being that the presentation is warped and, therefore,
probably does not deserve any attention. In such a way those clever zionists
discredit the organization (which is probably set up and controlled by them
anyways) as well as divert attention from real problems. Btw, something I
never could figure out - am I in on the plot? Am I a zionist? But these are
my personality problems :-)

	-Gene

bob@MorningStar.COM (Bob Sutterfield) (09/22/89)

In article <14715@bfmny0.UU.NET> tneff@bfmny0.UU.NET (Tom Neff) writes:
   ...some site (tropix?)  reposting hideously old articles with
   munged dates and times so they don't match history... The only cure
   for the (tropix?) thing is to lengthen your history retention
   period.

A better cure, requiring fewer megabytes on each of untold thousands
of Usenet sites, would be for the news neighbors of the offending site
to firewall the damage by corking its outbound feed until repairs are
verified complete.  If it's an NNTP feed, say "no no" in nntp_access;
for a UUCP feed, deny access to rnews in Permissions.  The
bogon-generating site would become a news roach motel.
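Concretely, the two entries might look something like this (a sketch only:
the host name and UUCP login are invented, and file locations vary from
installation to installation):

	# nntp_access: the offender may neither transfer/read nor post
	broken.site.example	no	no

	# HDB Permissions: allow rmail only; leaving rnews out of COMMANDS
	# is what corks the inbound news feed
	LOGNAME=Ubroken MACHINE=broken COMMANDS=rmail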

No, this is neither fascist censorship nor asocial shunning.  It's
exercising collective responsibility for damage control.