childers@avsd.UUCP (Richard Childers) (09/16/89)
I've been noticing a lot of duplicate articles recently. Now, I've been reading the Usenet, on and off, for five or six years, and I have _never_ seen anything like this. At first I assumed something was wrong with my installation, because I was mucking around with things; in fact, as a result of trying to get the IHAVE / SENDME protocol to work with multiple sites, I _was_ for a while getting considerable duplication. But a fair amount of rigorous thinking in the past month has convinced me that I'm not the problem here.

I've been reading for, oh, over a month now about how duplicate articles have been appearing across many newsgroups. That was my first hint that this was a widespread problem affecting everybody. Like everyone else, I watched and waited for someone to do something, for someone to trace the problem. Nothing happened. Oh, many people tried to find a pattern, but it wasn't there. I began to smell a rat.

Now, like everyone else, I have gained much from judicious application of the excellent phrase "Assume stupidity before malice" (derived from Bill Davidson's .signature, I believe), and I'm still not convinced that what's happening is anything other than normal hoseheadedness. But the absence of any sort of pattern in the data acquired from articles' headers makes me wonder, because it is quite common for people to modify headers for their own immature reasons. ( Personally, I think it's equivalent to changing the address on a letter, or otherwise defacing a piece of mail, without the owner's permission: quite inconsistent with the tenets of cooperation around which the Usenet was founded, and more suggestive of children fighting over a toy than of a tool developed for the civilized and globally relevant purposes of advancing human knowledge and accomplishment, if you know what I mean. )

So I finally vowed to explore the matter the next time it bugged me and I had a few spare moments.
Here's some actual, real data, from a small and thus uncomplicated sampling of a small and low-traffic newsgroup, alt.bbs.

avsd# cd /usr/spool/news/alt/bbs

avsd# ls
231 232 233 234 235 236 237 238 239 240 241

A small sample, about 11 articles, all less than two days old.

avsd# grep Message-ID *
231:Message-ID: <4347@udccvax1.acs.udel.EDU>
232:Message-ID: <11290@kuhub.cc.ukans.edu>
233:Message-ID: <533@sud509.RAY.COM>
234:Message-ID: <534@sud509.RAY.COM>
235:Message-ID: <537@sud509.RAY.COM>
236:Message-ID: <9626@venera.isi.edu>
237:Message-ID: <935C18IO029@NCSUVM>
238:Message-ID: <37058@conexch.UUCP>
239:Message-ID: <11519@kuhub.cc.ukans.edu>
240:Message-ID: <37058@conexch.UUCP>
241:Message-ID: <11519@kuhub.cc.ukans.edu>

Ah, we have some duplicate articles, two out of the last three ...

avsd# grep Path 239 241
239:Path: avsd!vixie!decwrl!wuarchive!kuhub.cc.ukans.edu!orand
241:Path: avsd!octopus!sts!claris!apple!brutus.cs.uiuc.edu!...
          ...wuarchive!kuhub.cc.ukans.edu!orand

Now we have some real data. There are three machines which are found in both "Path:" fields. Two of them are the source and the destination. The third is "wuarchive".

Now, at this point it would normally be appropriate to run, screaming accusations all the way, straight to the administration of "wuarchive", self-righteous as all get-out, demanding to know what they are doing. But I'm not sure they are doing _anything_, because I'm assuming that I'm not the first person who has approached this problem in this fashion.

So, instead, I'm going to take a leap of the imagination and try to imagine why such a situation might occur, what circumstances might impinge upon the Usenet that would lead to massive forgery and duplication.

The answer that occurs to me is, quite bluntly, sabotage. It is a well-established trick, made exemplar by the actions of senior Usenet people, to generate forged headers, as I said before, and insert them into the queue.
These articles, given their untraceable nature, are very possibly forged. The sites found in the "Path:" field are, presumably, interconnected ... which argues for a fairly sophisticated and detailed effort, not the act of an average college student, who would presumably still be so dazzled by the wealth of information available that s/he would never think of sabotaging it. No, if such an act is being perpetrated, it is coming from an individual, or group thereof, with considerable attention to detail.

Why would someone do such a thing ? I can think of several reasons.

(1)  Jealousy. There has been considerable territorialism lately, people posting to moderated groups and the like, commercialist behavior. Some people prefer to make sure that if they can't play, nobody can play.

(2)  Disinformation. The Usenet represents a substantial and sophisticated alternative to normal channels of communication, one less subject to control through covert or coercive activities, as many of the sponsors are Real Big Corporations, not necessarily willing to agree with the marching orders of a hypothetically interested government. Remember COINTELPRO, multiply by several orders of magnitude where information processing capacity and expertise are concerned, divide by the number of Constitutional Amendments you've seen waylaid recently, and tell me what you get.

(3)  Stupidity. Someone has some inscrutable motive, or there is a random scattering of badly-installed netnews sites that approaches a significant minority, scattered fairly evenly through the United States.

( Perhaps the next phase in this research might be to coordinate efforts to identify the source(s) by collecting the name of every machine that _appears_ to be problematic, using the methods outlined above, and examining the list with an eye for statistical anomalies or patterns of placement. For instance, the sites might all fall within a few states.
If they are evenly distributed geographically, that is possibly evidence of a sophisticated effort to muddy the trail, and important to establish. )

(4)  Malice. Some group has acquired sufficient expertise and an invisibly coordinated set of ostensibly independent Usenet sites, positioned them in positions of moderate but not excessive visibility amongst the crowd, and is using that position to damage the Usenet's interconnectivity.

Why, you say ? What's the point ? Well, I think there is a clear end result here, and it's clogging the channels. Duplicate an article here and there, one per newsgroup per day, and pretty soon some of the lesser sites are filling up their disks. Soon the administrations are calling for things to be omitted from the 'spool' partitions. Maybe the entire news installation gets deinstalled, or perhaps only those parts of it irrelevant to the specific commercial mission of the individual companies.

It's been going on for quite a while now, and it's gotten rather noticeable at my site. If we hadn't enlarged our spool partition, we might still be getting regular "filesystem full" messages, and that was with a _lot_ of space and 'expire' getting rid of everything more than two days old.

I don't know who's doing it. To tell you the truth, I'm still prepared to find an error in my thinking, all down the line. But the problem _seems_ to be common everywhere, and so I hesitate to discount my hypotheses until I hear from a few others on the topic: the results of their own research, and their thoughts / hypotheses. I do know it needs to be fixed, since it won't fix itself, wherever it's coming from.

I'm also curious whether this type of thing has been encountered in other networks, such as FIDO, which certainly has the circumstances under which such things might happen.
The problem is that I don't know if they have restricted interconnectivity to conform with requirements for linearity, or have allowed potential looping paths to evolve in their interconnections, compensating with article ID checks in the software.

I must admit that I'm puzzled as to how this is happening, as netnews is _supposed_ to be checking articles against the 'active' articles database. Perhaps the "Message-ID:" field is being invisibly corrupted. Or perhaps the software decides by comparing both Message-ID and Path, classifying two articles as identical only if they _both_ match, to avoid the vague but present possibility of two articles from divergent sites being generated with identical Message-IDs.

Anyhow, some thoughts that have been brewing for about two weeks. I'd like to hear some responses ... reactions will be reacted to in a vein similar to that in which they were conveyed, but intelligent commentary will receive the respect it deserves.

-- richard

* * * Intelligence : the ability to create order out of chaos. * * *
* ..{amdahl|decwrl|octopus|pyramid|ucbvax}!avsd.UUCP!childers@tycho *
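[Editor's note: the spool check Richard walks through above can be scripted rather than eyeballed. A minimal sketch, assuming a B-news style spool where each article is a numbered file with its headers; the sample files below are fabricated for illustration, not taken from any real site.]

```shell
# Sketch: list any Message-ID that appears in more than one article file.
# A tiny fake spool is created so the pipeline can be run end to end;
# on a real system you would cd into /usr/spool/news/<group> instead.
spool=$(mktemp -d) && cd "$spool"
printf 'Message-ID: <37058@conexch.UUCP>\n'       > 238
printf 'Message-ID: <11519@kuhub.cc.ukans.edu>\n' > 239
printf 'Message-ID: <37058@conexch.UUCP>\n'       > 240
# -h drops filenames so identical header lines compare equal;
# sort groups them and uniq -d prints only the duplicated lines.
grep -h '^Message-ID:' [0-9]* | sort | uniq -d
# -> Message-ID: <37058@conexch.UUCP>
```

Run against a real spool directory, the same pipeline would surface every duplicated ID in a group at once.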
karl@godiva.cis.ohio-state.edu (Karl Kleinpaste) (09/16/89)
childers@avsd.UUCP (Richard Childers) writes:
> I've been noticing a lot of duplicate articles recently.
Richard, truly I mean no disrespect, but I think you've been spending
too much time around alt.conspiracy.
> I began to smell a rat.
I think it's just news software suffering from bitrot, at your own site.
> avsd# grep Message-ID *
> ...
> 238:Message-ID: <37058@conexch.UUCP>
> 239:Message-ID: <11519@kuhub.cc.ukans.edu>
> 240:Message-ID: <37058@conexch.UUCP>
> 241:Message-ID: <11519@kuhub.cc.ukans.edu>
> Ah, we have some duplicate articles, two out of the last three ...
Stop right there. If your news system is letting in articles with
identical Message-ID's, then there is braindeath in rnews' ability to
poke around in your history file. That "can't happen" when things are
running properly.
Now, if you were to pull those lines out and check them for invisible
control characters, and found differences...well, then perhaps you'd
have a case for paranoia. But up to this point, you've just got a
problem with your history file. Try expire -r and wait a week.
I, for one, have seen relatively little in the way of duplicated
stuff, even the items coming from the reputedly-corrupt `tropix'
system. Just as a data point against which to compare, I keep a whole
lot of news around (290Mbytes), and alt.bbs ends with these
Message-ID's:
796:Message-ID: <533@sud509.RAY.COM>
797:Message-ID: <534@sud509.RAY.COM>
798:Message-ID: <537@sud509.RAY.COM>
799:Message-ID: <9626@venera.isi.edu>
800:Message-ID: <935C18IO029@NCSUVM>
801:Message-ID: <37058@conexch.UUCP>
802:Message-ID: <11519@kuhub.cc.ukans.edu>
803:Message-ID: <89257.213434JTW106@PSUVM.BITNET>
804:Message-ID: <10349@eerie.acsu.Buffalo.EDU>
805:Message-ID: <1668@ns.network.com>
808:Message-ID: <1652@psuhcx.psu.edu>
No dups, and I have both of the articles for which you got dups.
--Karl
tneff@bfmny0.UU.NET (Tom Neff) (09/16/89)
I don't understand this complaint. These duplicate articles are proven to exist by matching Message-IDs, correct? But news is supposed to eliminate duplicate Message-IDs before storage. If this is not happening at some site, then something is broken there. Sites with multiple feeds may commonly see duplicates in the incoming batches; they are not supposed to make it into the spool directory as individual articles, though.

Rather than engage in lengthy disquisitions on others' motives, I would get to work tracking down why inews and history are busted at my site.

--
'We have luck only with women --    \\\     Tom Neff
not spacecraft!'                 *-((O      tneff@bfmny0.UU.NET
-- R. Kremnev, builder of FOBOS     \\\     uunet!bfmny0!tneff (UUCP)
werner@utastro.UUCP (Werner Uhrig) (09/16/89)
> Anyhow, some thoughts that have been brewing for about two weeks. I'd like
> to hear some responses ... reactions will be reacted to in a vein similar to
> that in which they were conveyed in, but intelligent commentary will receive
> the respect it deserves.

wow, yeah man, send me some of that stuff ...

(I can't believe I read all 175 lines of that article; must be a Friday night, you know) ...

--
-----------> PREFERRED RETURN-ADDRESS FOLLOWS <--------------
(ARPA)  werner@rascal.ics.utexas.edu    (Internet: 128.83.144.1)
(UUCP)  ..!utastro!werner  or  ..!uunet!rascal.ics.utexas.edu!werner
coolidge@brutus.cs.uiuc.edu (John Coolidge) (09/16/89)
childers@avsd.UUCP (Richard Childers) writes:
> [finds some duplicate articles, apparently with the same messageid]
> 239:Message-ID: <11519@kuhub.cc.ukans.edu>
> 241:Message-ID: <11519@kuhub.cc.ukans.edu>
> [and checks the path]
> 239:Path: avsd!vixie!decwrl!wuarchive!kuhub.cc.ukans.edu!orand
> 241:Path: avsd!octopus!sts!claris!apple!brutus.cs.uiuc.edu!...
>           ...wuarchive!kuhub.cc.ukans.edu!orand

>Now we have some real data. There are three machines which are found in
>both "Path:" fields. Two of them are the source and the destination. The
>third is "wuarchive".

>Now, at this point it would normally be appropriate to run, screaming
>accusations all the way, straight to the administration of "wuarchive",
>all self-righteous as all get-out, demanding to know what they are doing.

Wouldn't be appropriate, and it would be wrong. I've checked on both wuarchive (where I'm a guest) and brutus (which I run), and we each have only one copy of the offending article, with (as best as I can tell) no offensive control characters or any such silliness. If there's a Sinister Villain (TM) trashing Message-IDs out there, it's not wuarchive or brutus. I sincerely doubt it's apple or decwrl either.

Take a good look at the Message-IDs in question. If there are any control characters or other real differences, one of the feed sites beyond wuarchive (239) or brutus (241) is causing problems. If not, your news is messed up and needs an overhaul, since you should never have two articles with the same Message-ID. Normal duplicates are caused by old news being resent after the old Message-IDs have been expired. These, clearly, are not normal duplicates.

One potential news problem that just might do this (I'm insufficiently acquainted with the internal arcana to be sure) would be the case where the history file can't be appended to (disk full, maybe?). In that case, I could imagine news dropping the Message-ID on the floor, then accepting the same id later.
Doesn't seem likely to me (well, *I'D* check for it when writing a news :-) ), but maybe...

--John
--------------------------------------------------------------------------
John L. Coolidge     Internet:coolidge@cs.uiuc.edu    UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.
You may redistribute this article if and only if your recipients may as well.
marc@lakesys.UUCP (Marc Rassbach) (09/16/89)
[nice text killed]

Hmmmm. On our end here in Wisconsin, I'll see replies to threads where I've never seen the first, let alone the fourth, article being replied to. At first I thought it was lakesys and its small capacity, but my understanding is that this is occurring at the UW as well.

On the subject of FIDOnet ... major political problems in that network. People wanting to play God, cutting off nodes without notice, bombing runs (a bombing run is when one re-mails the same packet), etc. With every node added to FIDOnet, the volume grows. FIDOnet is beginning to fall apart ...

The originator of this post is correct: UseNet is a VERY good alternative information source. I'd hate to see it go. I've fired off letters to my rep. (as if it will do any good) supporting Sen. Gore's network proposal. (Being shell-fish {crab-y :-) } I want to see better net.traffic.) When I get the $$$ ahead, I'm planning on getting myself one of them thar satellite thingies to get my newsfeed from 0:00-6:00. Go nuts!

M.R. (stands for [M]ad, bad, and dange[R]ous to know ....)
--
Marc Rassbach    marc@lakesys
If you take my advice, that "I can't C with my AI closed" is your problem, not mine!
If it was said on UseNet, it must be true.
karl@ddsw1.MCS.COM (Karl Denninger) (09/16/89)
We're getting lots of dups too. But they have DIFFERENT Message-IDs here... Here is but one of the dozen or so examples I notice each day!

Path: ddsw1!mcdchg!att!dptg!rutgers!caip.rutgers.edu!toccata.rutgers.edu!rlr
From: rlr@toccata.rutgers.edu (Rich Rosen)
Newsgroups: news.admin
Subject: Re: Site Admin stuff -- what if I give boneheads accounts?
Message-ID: <Sep.16.02.00.12.1989.9161@toccata.rutgers.edu>
Date: 16 Sep 89 06:00:14 GMT
References: <7260@medusa.cs.purdue.edu>
Distribution: na

And...

Path: ddsw1!mcdchg!att!dptg!rutgers!caip.rutgers.edu!toccata.rutgers.edu!rlr
From: rlr@toccata.rutgers.edu (Rich Rosen)
Newsgroups: news.admin
Subject: Re: Site Admin stuff -- what if I give boneheads accounts?
Message-ID: <Sep.16.02.02.51.1989.9175@toccata.rutgers.edu>
Date: 16 Sep 89 06:02:52 GMT
References: <2380@flatline.UUCP> <1489@jolnet.ORPK.IL.US> <1989Sep6.001332.12167@NCoast.ORG>
Distribution: na
Organization: RLRCLC
Lines: 40

The differences? Here's the diff:

5,7c5,7
< Message-ID: <Sep.16.02.00.12.1989.9161@toccata.rutgers.edu>
< Date: 16 Sep 89 06:00:14 GMT
< References: <7260@medusa.cs.purdue.edu>
---
> Message-ID: <Sep.16.02.02.51.1989.9175@toccata.rutgers.edu>
> Date: 16 Sep 89 06:02:52 GMT
> References: <2380@flatline.UUCP> <1489@jolnet.ORPK.IL.US> <1989Sep6.001332.12167@NCoast.ORG>
10c10
< Lines: 39
---
> Lines: 40
21a22
>

Hmmm..... It looks to me like someone is using some mechanism other than the Usenet software to post, and in addition their "crosslink" software is posting articles TWICE! I've seen a lot of these the last few weeks, and I only read a couple of dozen newsgroups.

The problem, of course, is that it is darn difficult if not impossible to track these down automatically, since they are, by definition, different articles. I'd have to estimate that I see about 2-5 of these a day -- and I only read a small subpart of the entire net (who could possibly read the entire thing?!).
Is this a case of broken software, or broken users posting things twice?

--
Karl Denninger (karl@ddsw1.MCS.COM, <well-connected>!ddsw1!karl)
Public Access Data Line: [+1 312 566-8911], Voice: [+1 312 566-8910]
Macro Computer Solutions, Inc.    "Quality Solutions at a Fair Price"
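[Editor's note: Karl's duplicates have different Message-IDs, so no ID comparison can catch them. A rough heuristic is to key on From plus Subject instead. This is a sketch only, with fabricated sample articles; followups legitimately share a Subject, so it narrows a manual search rather than proving duplication.]

```shell
# Sketch: flag probable double-posts whose Message-IDs differ, by keying
# on From + Subject.  Sample spool files are fabricated for illustration;
# on a real system you would run this over actual article files.
d=$(mktemp -d) && cd "$d"
printf 'From: rlr@toccata.rutgers.edu\nSubject: Re: Site Admin stuff\n' > 100
printf 'From: rlr@toccata.rutgers.edu\nSubject: Re: Site Admin stuff\n' > 101
printf 'From: karl@ddsw1.MCS.COM\nSubject: dups everywhere\n'           > 102
for f in [0-9]*; do
  # build one "From|Subject" key per article, then print repeated keys
  printf '%s|%s\n' "$(sed -n 's/^From: //p' "$f")" \
                   "$(sed -n 's/^Subject: //p' "$f")"
done | sort | uniq -d
# -> rlr@toccata.rutgers.edu|Re: Site Admin stuff
```

Adding the Date header to the key (or comparing article bodies with diff, as Karl did by hand) would cut down on false hits from ordinary followups.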
tneff@bfmny0.UU.NET (Tom Neff) (09/17/89)
In article <1989Sep16.152857.14239@ddsw1.MCS.COM> karl@ddsw1.MCS.COM (Karl Denninger) writes:
>Is this a case of broken software, or broken users posting things twice?

In the case of the Rosen piece, almost certainly a broken user. I got both of those here too. It's just luser spoor. Note that one header had an Organization: line and the other didn't.

I do not believe there is a plague of dupes. Any site with more than one feed which is seeing dupes has something broken in history. If someone is corrupting Message-IDs via modem noise or something, then there could be a problem, of course.

--
'We have luck only with women --    \\\     Tom Neff
not spacecraft!'                 *-((O      tneff@bfmny0.UU.NET
-- R. Kremnev, builder of FOBOS     \\\     uunet!bfmny0!tneff (UUCP)
elliot@alfred.UUCP (Elliot Dierksen) (09/17/89)
If you don't want to see dups, keep your history files longer. I expire all the news on my system within 3 days, mainly because of disk space restrictions, but I keep 30 days of history, which only takes up 300-500 blocks. I also get some redundant information because I get half of my feed from one system and half from another, and I NEVER see duplicate articles. Even if someone re-releases an article to the net, if it is still in your history file it will get shipped to the bit bucket.

I use the following command to purge news on my system (2.11 B):

expire -e 3 -E 30

Try that and I doubt anyone will have dups again!!
--
Elliot Dierksen    UUCP: {peora,ucf-cs,uunet}!tarpit!alfred!elliot

"You can only be you once, but you can be immature forever!"
bill@twwells.com (T. William Wells) (09/17/89)
In article <2056@avsd.UUCP> childers@avsd.UUCP (Richard Childers) writes:
: avsd# cd /usr/spool/news/alt/bbs
:
: avsd# ls
: 231 232 233 234 235 236 237 238 239 240 241
:
: A small sample, about 11 articles, all less than two days old.
:
: avsd# grep Message-ID *
: 231:Message-ID: <4347@udccvax1.acs.udel.EDU>
: 232:Message-ID: <11290@kuhub.cc.ukans.edu>
: 233:Message-ID: <533@sud509.RAY.COM>
: 234:Message-ID: <534@sud509.RAY.COM>
: 235:Message-ID: <537@sud509.RAY.COM>
: 236:Message-ID: <9626@venera.isi.edu>
: 237:Message-ID: <935C18IO029@NCSUVM>
: 238:Message-ID: <37058@conexch.UUCP>
: 239:Message-ID: <11519@kuhub.cc.ukans.edu>
: 240:Message-ID: <37058@conexch.UUCP>
: 241:Message-ID: <11519@kuhub.cc.ukans.edu>
That just means that your news software is broken. Messages with
duplicate message ids should not ever make it into your spool
directory. If two messages arrive with the same id, the later one is
supposed to be discarded. The only residue of this would be in your
log file.
: 239:Path: avsd!vixie!decwrl!wuarchive!kuhub.cc.ukans.edu!orand
: 241:Path: avsd!octopus!sts!claris!apple!brutus.cs.uiuc.edu!...
: ...wuarchive!kuhub.cc.ukans.edu!orand
This just says that the news left kuhub.cc.ukans.edu, went to
wuarchive, and then was sent via two different paths from there and
that the routes of the message happened to intersect at your site. All
perfectly normal, and the redundancy of it improves the reliability
of the net.
---
Bill { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com
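[Editor's note: the acceptance rule Bill describes can be sketched as a toy loop. The `accept` function and flat-text `history` file here are illustrative stand-ins, not the real B-news implementation, which keeps an indexed history database.]

```shell
# Sketch of the check rnews is supposed to make: accept an article only if
# its Message-ID is absent from the history file, and record the ID the
# moment the article is accepted.  "history" is just a flat text file here.
d=$(mktemp -d) && cd "$d"
: > history
accept() {                        # accept <message-id>
  if grep -F -q -x "$1" history; then
    echo "DUPLICATE: $1"          # later copy: discard, log only
  else
    echo "$1" >> history
    echo "ACCEPT: $1"             # first copy: spool it
  fi
}
accept '<11519@kuhub.cc.ukans.edu>'   # -> ACCEPT: <11519@kuhub.cc.ukans.edu>
accept '<11519@kuhub.cc.ukans.edu>'   # -> DUPLICATE: <11519@kuhub.cc.ukans.edu>
```

The duplicates sitting in Richard's spool mean this check is failing or being skipped at his site, which is exactly Bill's point.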
rlr@toccata.rutgers.edu (Rich Rosen) (09/17/89)
> We're getting lots of dups too. But they have DIFFERENT Message IDs here...

> From: rlr@toccata.rutgers.edu (Rich Rosen)
> Newsgroups: news.admin
> Subject: Re: Site Admin stuff -- what if I give boneheads accounts?
> Message-ID: <Sep.16.02.00.12.1989.9161@toccata.rutgers.edu>
> References: <7260@medusa.cs.purdue.edu>

> And...

> From: rlr@toccata.rutgers.edu (Rich Rosen)
> Newsgroups: news.admin
> Subject: Re: Site Admin stuff -- what if I give boneheads accounts?
> Message-ID: <Sep.16.02.02.51.1989.9175@toccata.rutgers.edu>
> References: <2380@flatline.UUCP> <1489@jolnet.ORPK.IL.US> <1989Sep6.001332.12167@NCoast.ORG>
> Organization: RLRCLC

> It looks to me like someone is using some other mechanism than the Usenet
> stuff to post, and in addition their "crosslink" software is posting
> articles TWICE!

OHMYGOD!!! The reason I had to post it twice in the first place was *because* "the Usenet stuff" decided to "fix" (its words) my References line "for" me. (I'd always used 'f' commands to post almost everything, since followup commands, in my day (oh no, there he goes, talking about the good old days), didn't care what you put into the article; you could "follow up" an article in nose.sinus.headache, replace the text with a completely different article intended for elbow.funnybone.gonad, and it would go to the intended place; apparently some 'f' functions are sneakier and clevererer nowadays...)

Anyway, since postnews is an abomination to me, and since rn isn't installed here, I had to figure out a vnews equivalent. 'f'-ing was the best I could come up with on a moment's notice. I wasn't able to cancel either article until much later because I had no idea how to cancel something whose article-ID I didn't yet know (it didn't appear here until much later). I apologize if this caused conniption fits among news administrators.
> The problem, of course, is that it is darn difficult if
> not impossible to track these down automatically, since they are, by
> definition, different articles. I'd have to estimate that I see about 2-5
> of these a day -- and I only read a small subpart of the entire net
> (who could possibly read the entire thing?!).
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You're talking to him, but of course that was many eons ago. I remember when... [WAR STORIES TOLD WITH CREAKING OLD-MAN JEWISH ACCENT DELETED FOR SANITY]

I'll bet that where you see two (roughly) equivalent articles with different IDs, you are witnessing an article that was posted twice for whatever reason. Since on many locally-networked or remote news servers you can't tell for an undetermined amount of time whether your article has been "posted", and I can't think of a means of cancelling something that was just missent, I'm sure that accounts for a number of these dupes. (Also the naive user who doesn't see his/her article immediately and panics...)

> Is this a case of broken software, or broken users posting things twice?

In this case, a broken user, but worry not. I've been fixed, and I won't be impregnating the net with a litter of random/duplicated/otherwise inane articles. (That should help some people sleep better tonight... :-)
--
"Time to eat all your words, swallow your pride, open your eyes..."
    Rich Rosen    rlr@toccata.rutgers.edu
hoyt@polyslo.CalPoly.EDU (Sir Hoyt) (09/18/89)
In article <1989Sep16.061700.4572@brutus.cs.uiuc.edu> coolidge@cs.uiuc.edu writes:
>the history file can't be appended to (disk full, maybe?). In that case,
>I could imagine news dropping the Message-ID on the floor, then accepting
>the same id later. Doesn't seem likely to me (well, *I'D* check for it
>when writing a news :-) ), but maybe...

It's happened here at polyslo before. Expire on B news makes a copy of the history file before processing. Our history file is about 4Meg, which makes 8Meg, and when you have ~10Meg free....

Another problem we had was that we were deleting old Message-IDs from the history file too soon. We now keep them for 30 days, and have not had a problem since.

--
John H. Pochmara                            A career is great,
UUCP: {csun,voder,trwind}!polyslo!hoyt      But you can't run your
Internet: hoyt@polyslo.CalPoly.EDU          fingers through its hair
                                                 -Graffiti 4/13/83
coolidge@brutus.cs.uiuc.edu (John Coolidge) (09/18/89)
hoyt@polyslo.CalPoly.EDU (Sir Hoyt) writes:
>In article <1989Sep16.061700.4572@brutus.cs.uiuc.edu> I write:
>>the history file can't be appended to (disk full, maybe?). In that case,
>>I could imagine news dropping the Message-ID on the floor, then accepting
>>the same id later. Doesn't seem likely to me (well, *I'D* check for it
>>when writing a news :-) ), but maybe...
> It's happend here at polyslo before. Expire on B news makes a copy of
> the history file before processing. Our history file is
> about 4Meg, which makes 8Meg, and when you have ~10Meg free....

But does it drop the message id from history while keeping/spooling the article? That was the behavior I thought was unlikely... just running out of history file space seems perfectly reasonable, but reacting to a failure to append to the history file by keeping the article sounds somewhat strange...

--John
--------------------------------------------------------------------------
John L. Coolidge     Internet:coolidge@cs.uiuc.edu    UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.
You may redistribute this article if and only if your recipients may as well.
" Maynard) (09/18/89)
In article <1989Sep18.041601.15352@brutus.cs.uiuc.edu> coolidge@cs.uiuc.edu writes:
>But does it drop the message id from history while keeping/spooling the
>article? That was the behavior that I thought was unlikely... just running
>out of history file space seems perfectly reasonable, but reacting to
>a failure to append to the history file by keeping the article sounds
>somewhat strange...

That has happened here before. If I forget to patch the default ulimit on this SysV.2 system when putting up a new kernel, I usually discover the problem by seeing a bunch of duplicated articles. Yes, it's strange, but that's how it happens... BTW, this is on B news 2.11.14.

--
Jay Maynard, EMT-P, K5ZC, PP-ASEL   | Never ascribe to malice that which can
jay@splut.conmicro.com       (eieio)| adequately be explained by stupidity.
{attctc,bellcore}!texbell!splut!jay +----------------------------------------
"The unkindest thing you can do for a hungry man is to give him food." - RAH
childers@avsd.UUCP (Richard Childers) (09/18/89)
coolidge@cs.uiuc.edu writes:
>If not, your news is messed up and needs an overhaul, since you should
>never have two articles with the same Message-ID. Normal duplicates are
>caused by old news being resent after the old Message-ID's have been
>expired. These, clearly, are not normal duplicates.

I've had a few people suggest this to me, and I am aware of the possibility. But I should also make clear that there was a large gap of time between the period when I was playing with IHAVE / SENDME and getting duplicate articles ... and the time when, having deinstalled IHAVE / SENDME a few months prior, I began to _again_ see duplicate news articles. A period of several months. At about the same time, many other people began to comment on the problem ...

I'll be looking into both possibilities as soon as I get a chance.

>--John

-- richard

* * * Intelligence : the ability to create order out of chaos. * * *
* ..{amdahl|decwrl|octopus|pyramid|ucbvax}!avsd.UUCP!childers@tycho *
childers@avsd.UUCP (Richard Childers) (09/19/89)
I've received one reply so far of a useful nature, as well as permission to share the email. In the interest of the many people who have commented on the problem of duplicate articles, it is reproduced in edited form below.

-=*=-

Date: Sat, 16 Sep 89 13:55 CDT
From: sysop@attctc.Dallas.TX.US (Charles Boykin-BBS Admin)
To: avsd!childers
Subject: Re: Who's Messing With The Usenet ?

"Are you running dbz ?"

"... inews has error checking to reject duplicates, but this depends upon dbz if you are using it. And dbz doesn't always find the article id when it does, in fact, exist in the history file. I had as many as four copies of the same articles as a result. From a number of comments, I suspect that the use of dbz is fairly widespread, unfortunately. I did a local poll of sites connecting to this one and found that 100% of the ones that had multiple news feeds and were running dbz had duplicate articles. Since junking it, I have none. Further, I was sending the duplicate articles to a downstream site also running dbz, and they were duplicated on his system with this one his only feed."

":-) As I am sending this email, you may certainly distribute it as you see fit. I do hope this is of some assistance -"

-=*=-

-- richard

* * * Intelligence : the ability to create order out of chaos. * * *
* ..{amdahl|decwrl|octopus|pyramid|ucbvax}!avsd.UUCP!childers@tycho *
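[Editor's note: if Boykin's diagnosis is right, the indexed lookup (dbz) sometimes misses an ID that really is in the history file. A crude audit is to cross-check every duplicated spool ID against the history text directly; the spool and history files below are fabricated, and the flat-file history format is a simplification of the real one.]

```shell
# Sketch: for each Message-ID that occurs in more than one spool file,
# check whether the ID is already present in the history file.  If it is,
# the dedup lookup (dbz or otherwise) failed to stop the later copy.
d=$(mktemp -d) && cd "$d"
printf 'Message-ID: <11519@kuhub.cc.ukans.edu>\n' > 239
printf 'Message-ID: <11519@kuhub.cc.ukans.edu>\n' > 241
printf '<11519@kuhub.cc.ukans.edu>\n'             > history
grep -h '^Message-ID:' [0-9]* | sort | uniq -d |
while read -r _ id; do
  grep -F -q "$id" history &&
    echo "lookup failed for $id: in history, yet accepted twice"
done
```

A duplicated ID that is *absent* from history would instead point at expiry settings or a corrupted history file, so the two failure modes can be told apart.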
childers@avsd.UUCP (Richard Childers) (09/19/89)
bill@twwells.com (T. William Wells) writes:
>That just means that your news software is broken. Messages with
>duplicate message ids should not ever make it into your spool
>directory. If two messages arrive with the same id, the later one is
>supposed to be discarded. The only residue of this would be in your
>log file.

I'm aware of the possibility, but I'm puzzled by the periodicity of the events. One would assume that broken software would fail consistently, and that there would be a statistically steady stream of duplicate articles. This is not the case. I installed netnews 2.11.b about a year ago, when I first started working here, and it worked fine. I haven't done a thing since other than edit /usr/lib/news/sys and play with pathalias.

>All perfectly normal, and the redundancy of it improves the reliability
>of the net.

Agreed, provided it works. I understand the _intent_. After all, I sought duplicate newsfeeds for that precise reason ...

>bill@twwells.com

-- richard

* * * Intelligence : the ability to create order out of chaos. * * *
* ..{amdahl|decwrl|octopus|pyramid|ucbvax}!avsd.UUCP!childers@tycho *
childers@avsd.UUCP (Richard Childers) (09/19/89)
elliot@alfred.UUCP (Elliot Dierksen) writes:
>If you don't want to see dups, keep your history files longer. I expire all
>the news on my system within 3 days. mainly because of disk space ...

Here's the line in the crontab :

30 3,21 * * * root (find /usr/spool/news -size 0 -o -mtime +2 -exec rm -f {} \; /usr/lib/news/expire -n all,!comp.mail.maps -e 2 -E 30 )

>I use the following command to purge news on my system (2.11 B) expire -e 3 -E 30
>Try that and I doubt anyone will have dups again!!

A good suggestion, but I took that into account many months ago. That's not the problem.

>Elliot Dierksen    UUCP: {peora,ucf-cs,uunet}!tarpit!alfred!elliot
>
>"You can only be you once, but you can be immature forever!"

Isn't that supposed to be "You can only be young once" ?

-- richard

* * * Intelligence : the ability to create order out of chaos. * * *
* ..{amdahl|decwrl|octopus|pyramid|ucbvax}!avsd.UUCP!childers@tycho *
arf@chinet.chi.il.us (Jack Schmidling) (09/19/89)
Article 3708 (11 more) in news.misc:
From: childers@avsd.UUCP (Richard Childers)
Newsgroups: news.admin,news.misc,alt.conspiracy
Subject: Who's Messing With The Usenet ?
Keywords: article, duplication, monkey business

>I've been noticing a lot of duplicate articles recently ...

>Now, I've been reading the Usenet, on and off, for about five or six years
>now, and I have _never_ seen anything like this in my life ...

>I've been reading for, oh, over a month now, about how duplicate articles
>have been appearing across many newsgroups ... I began to smell a rat.

>The answer that occurs to me is, quite bluntly, sabotage. It is a well-
>established trick, exemplified by the actions of senior Usenet people, to
>generate forged headers, as I said before, and insert them into the queue.
>These articles, given their untraceable nature, are very possibly forged
>articles.

>The sites found in the "Path:" field are, presumably, interconnected ...
>which argues for a fairly sophisticated and detailed effort, not the act
>of an average college student, who would presumably still be so dazzled by
>the wealth of information available that s/he would never think of
>sabotaging it, incidentally. No, if such an act is being perpetrated, it
>is coming from an individual or group thereof with considerable attention
>to detail.

>Why would someone do such a thing ?

ARF says:

This probably belongs on alt.conspiracy, but since this is the first time
I have ever looked at this group, I was dazzled by the possible solution
to a problem that has been bugging me since shortly after I joined usenet
around June of this year.

I am a born-again Anti-Zionist and post profusely to talk.politics.mideast
with that point of view. My usenet site (chinet) loses, on average, about
50 articles every week. The way it happens is ....

  32 Articles talk.politics.mideast read now? (y n) ... "y" ... sorry all read ... bye

My sysop says it's because he has run out of disc space.

It also happens that there are many duplicate articles on this newsgroup,
and they obviously fill up the disk and prevent new articles from being
processed.

Keeping $Billions flowing to Israel is not to be overlooked as a possible
reason why "someone would do such a thing".

The Amateur Radio Forum (arf)
gary@sci34hub.UUCP (Gary Heston) (09/19/89)
In article <1989Sep18.041601.15352@brutus.cs.uiuc.edu>, coolidge@brutus.cs.uiuc.edu (John Coolidge) writes:
> hoyt@polyslo.CalPoly.EDU (Sir Hoyt) writes:
> >In article <1989Sep16.061700.4572@brutus.cs.uiuc.edu> I write:
> >>the history file can't be appended to (disk full, maybe?). In that case,
> > It's happened here at polyslo before. Expire on B news makes a copy of
> > the history file before processing. Our history file is
> But does it drop the message id from history while keeping/spooling the
> article? That was the behavior that I thought was unlikely... just running
> John L. Coolidge   Internet:coolidge@cs.uiuc.edu   UUCP:uiucdcs!coolidge
> Of course I don't speak for the U of I (or anyone else except myself)
> Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.
> You may redistribute this article if and only if your recipients may as well.

I think it's highly likely: if the partition containing history runs out of
space or inodes, then history data is lost. However, the news partition
itself may be different from the history partition, and would have room
(and inodes) for the article.

My history files are in /usr/lib/news/history* (the default, I think),
which is part of my root partition (/dev/dsk/0s1). My news files, however,
are in /news/spool/news (/news is /dev/dsk/1s0, has 65K inodes, and 140M of
space...). I have had problems with various partitions filling up, and the
lost-history/kept-article scenario sounds quite plausible on a
configuration like mine.

When the article comes in the second (or third, etc.) time, assuming you
haven't seen the disc-full errors (over a weekend, perhaps), r/inews scans
the history file, doesn't find a match, posts the article, and tries to
append to history again. The article has space; history doesn't.

-- 
Gary Heston  { uunet!gary@sci34hub }  System Mismanager
SCI Technology, Inc.  OEM Products Department  (i.e., computers)
Hestons' First Law: I qualify virtually everything I say.
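The asymmetry Gary describes is easy to check for by hand. A minimal
sketch; the paths and the threshold in the usage line are illustrative
assumptions, not his actual values:

```shell
# Report whether the filesystem holding a path has at least a given
# number of 1K blocks free. With history on / and the spool on /news,
# running this against both paths would expose the asymmetry that lets
# an article be stored while its history entry is silently lost.
min_free_kb() {
    # df -P guarantees the POSIX output format; field 4 of the second
    # line is the available space in 1K blocks.
    avail=$(df -P "$1" | awk 'NR==2 { print $4 }')
    [ "$avail" -ge "$2" ]
}
```

For example, `min_free_kb /usr/lib/news 1024 || echo "history partition
low"` (paths hypothetical); `df -i` would do the analogous check for
inodes.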
perry@ccssrv.UUCP (Perry Hutchison) (09/20/89)
In article <Sep.17.12.48.26.1989.11051@toccata.rutgers.edu> rlr@toccata.rutgers.edu (Rich Rosen) writes:
>The reason I had to post it twice in the first place was ...

We got _three_ copies of that one here, all with different message ids and
with minor variations in the text such that it looks like three (not just
two) separate postings. The first two are less than three minutes apart,
and differ in the References: and number of blank lines. The third was
posted some 13 hours later, and contains some minor textual revisions
relative to the second. This is a diff3 comparison of the three messages:

====3
1:1c
2:1c
  Path: ccssrv!sequent!tektronix!zephyr.ens.tek.com!uunet!ginosko!ctrsol!cica!iuvax!rutgers!caip.rutgers.edu!toccata.rutgers.edu!rlr
3:1c
  Path: ccssrv!sequent!tektronix!zephyr.ens.tek.com!uunet!ginosko!usc!apple!rutgers!caip.rutgers.edu!toccata.rutgers.edu!rlr
====
1:5,7c
  Message-ID: <Sep.16.02.00.12.1989.9161@toccata.rutgers.edu>
  Date: 16 Sep 89 06:00:14 GMT
  References: <7260@medusa.cs.purdue.edu>
2:5,7c
  Message-ID: <Sep.16.02.02.51.1989.9175@toccata.rutgers.edu>
  Date: 16 Sep 89 06:02:52 GMT
  References: <2380@flatline.UUCP> <1489@jolnet.ORPK.IL.US> <1989Sep6.001332.12167@NCoast.ORG>
3:5,7c
  Message-ID: <Sep.16.13.38.01.1989.9463@toccata.rutgers.edu>
  Date: 16 Sep 89 17:38:03 GMT
  References: <2380@flatline.UUCP> <1489@jolnet.ORPK.IL.US> <1989Sep6.001332.12167@NCoast.ORG>
====1
1:10c
  Lines: 39
2:10c
3:10c
  Lines: 40
====1
1:21a
2:22c
3:22c
  >
====3
1:36c
2:37c
  sure to mark whatever/whomever that may be with a stamp that says "UNRELIABLE
3:37c
  sure to mark whatever source that may be with a stamp that says "UNRELIABLE
====3
1:41,43c
2:42,44c
  Deadheads may have bitched and moaned, but I was never restricted from use
  of the net in any way, either during my period of employment at Bell
  Labs/Bellcore or thereafter.  Got it?  ... :-?
3:42,44c
  Deadheads may have bitched and moaned, but I was never restricted from use
  of the net in any way, either during my period of employment at Bell Labs
  or Bellcore or anytime thereafter.  Got it?  ... :-?
jack@cs.glasgow.ac.uk (Jack Campin) (09/20/89)
arf@chinet.chi.il.us (Jack Schmidling) wrote:

>> I've been reading for, oh, over a month now, about how duplicate articles
>> have been appearing across many newsgroups [...]
>> The answer that occurs to me is, quite bluntly, sabotage.

> Keeping $Billions flowing to Israel is not to be overlooked as a possible
> reason why "someone would do such a thing".

A screwup in tropix's news system was responsible for most of that (look
at the Path: in the offending articles). They never told the net what
exactly went wrong, so the same problem may have come up at other places
since. The duplication occurred in many groups with no political content;
sci.med and sci.math, for example. It is about as plausible to argue that
it was a conspiracy by the alternative dentistry establishment to prevent
debunking of the mercury-fillings scare in sci.med.

Why the people in charge at tropix have decided to keep these details to
themselves, when sharing their experience could benefit every news
administrator on the net, is entirely beyond me.

-- 
Jack Campin * Computing Science Department, Glasgow University, 17 Lilybank
Gardens, Glasgow G12 8QQ, SCOTLAND.  041 339 8855 x6045 wk  041 556 1878 ho
INTERNET: jack%cs.glasgow.ac.uk@nsfnet-relay.ac.uk  USENET: jack@glasgow.uucp
JANET: jack@uk.ac.glasgow.cs  PLINGnet: ...mcvax!ukc!cs.glasgow.ac.uk!jack
mhw@wittsend.lbp.harris.com (Michael H. Warfield (Mike)) (09/21/89)
In article <14684@bfmny0.UU.NET> tneff@bfmny0.UU.NET (Tom Neff) writes:
>I do not believe there is a plague of dupes. Any site with more than
>one feed which is seeing dupes has something broken in history. If
>someone is corrupting Message IDs via modem noise or something, then
>there could be a problem of course.

It's not a plague, it's a curse. I'm getting them here by the ton. Many
are weeks out of date and missing my history file (which goes back two
weeks). I got dozens of dups in the "comp.dcom.telecom" newsgroup. I can't
confirm the message-ids, but I even think I've seen some of my own
postings screwed up. Note that "comp.dcom.telecom" is moderated, so it's
not because of some amateur poster. Many of the articles should have been
out of circulation for some time.

I'm also noticing a large volume of articles going into junk because they
are too far out of date. The messages all look clean, with reasonable
message-ids, so the modem-corruption idea is also highly unlikely. I also
haven't seen a modem yet that will store an article for three weeks.

Michael H. Warfield  (The Mad Wizard) |  gatech.edu!galbp!wittsend!mhw
(404) 270-2123 / 270-2098             |  mhw@wittsend.LBP.HARRIS.COM
An optimist believes we live in the best of all possible worlds.
A pessimist is sure of it!
mhw@wittsend.lbp.harris.com (Michael H. Warfield (Mike)) (09/21/89)
In article <14682@bfmny0.UU.NET> tneff@bfmny0.UU.NET (Tom Neff) writes:
>I don't understand this complaint. These duplicate articles are proven
>to exist by matching Message ID's, correct? But news is supposed to
>eliminate duplicate message ID's before storage. If this is not
>happening at some site then something is broken there. Sites with
>multiple feeds may commonly see duplicates in the batch -- they are not
>supposed to make it to the spool directory as individual articles
>though.

Problem is that history does not store message-ids indefinitely. On my
news engine (galbp.lbp.harris.com) I typically run a 7-day expire and
retain message-ids in the history file for 14 days (that makes for a 1-2
Meg file and a 1-4 Meg file to accomplish that amount of latency). Some of
the dups I've seen are as much as four weeks out of date. If they have an
"expires" header they go straight to junk. I'm seeing a lot of those, and
just how many of us really use that header?

I have identified some articles by matching up message-ids on saved
articles with the new ones, and there are some true delayed dups out
there. There may also be some message-id fudging as well; I'm not real
sure on that. I'll try posting some more details as soon as my snark traps
catch some. Loop time is so long, though, and the incidents are so
sporadic, that by the time I get hit with another barrage I've always
figured the mess has cleared itself and turned off my traps.

Michael H. Warfield  (The Mad Wizard) |  gatech.edu!galbp!wittsend!mhw
(404) 270-2123 / 270-2098             |  mhw@wittsend.LBP.HARRIS.COM
An optimist believes we live in the best of all possible worlds.
A pessimist is sure of it!
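The retention window Michael describes is the crux: the duplicate check
reduces to a lookup of the Message-ID in the history file, so an ID that
has aged out of history looks brand new. A minimal sketch, assuming the B
news convention that each history line begins with the message-id:

```shell
# Return success if a message-id already appears in the given history
# file. A fixed-string search is a close (if loose) approximation of
# the real lookup, which matches the id at the start of a line.
seen_before() {
    grep -qF "$1" "$2"
}
```

Once expire has trimmed history back to 14 days, `seen_before` fails for
a four-week-old repost and the duplicate sails straight through into the
spool.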
tneff@bfmny0.UU.NET (Tom Neff) (09/21/89)
Not to confuse matters here. There *is* a completely separate issue
involving some site (tropix?) reposting hideously old articles with munged
dates and times so they don't match history. That was not the original
complaint in this thread, as far as I can tell. The only cure for the
(tropix?) thing is to lengthen your history retention period.

The phenomenon under discussion here supposedly involves rapidly
duplicated articles. Inews should catch those, period.

-- 
"My God, Thiokol, when do you     \\  Tom Neff
want me to launch -- next April?"  \\  uunet!bfmny0!tneff
perry@ccssrv.UUCP (Perry Hutchison) (09/21/89)
Here is a pair I noticed today. One has a very interesting date field.
BTW I don't _look for_ these things, but I do sometimes _notice_ them when
they pop up in groups I read.

<file date Sep 20 02:05>

-> Path: ccssrv!sequent!ogccse!orstcs!rutgers!apple!csibtfr!excelan!edc
-> From: edc@excelan.com (Eric Christensen)
-> Newsgroups: comp.dcom.lans
-> Subject: Re: Netware 2.0a++ performance degradation, how come?
-> Message-ID: <403@excelan.COM>
-> Date: 19 Sep 89 20:24:21 GMT
-> References: <5620@decvax.dec.com>
-> Sender: news@excelan.COM
-> Reply-To: edc@ka.UUCP (Eric Christensen)
-> Organization: Excelan, San Jose, Califonia
-> Lines: 91
-> Posted: Tue Sep 19 13:24:21 1989

<file date Sep 20 02:08>

-> Path: ccssrv!sequent!ogccse!orstcs!rutgers!apple!csibtfr!excelan!edc
-> From: edc@excelan.com (Eric Christensen)
-> Newsgroups: comp.dcom.lans
-> Subject: Re: Netware 2.0a++ performance degradation, how come?
-> Message-ID: <406@excelan.COM>
-> Date: 10 Mar 90 01:08:12 GMT
         ^^^^^^^^^
-> References: <5620@decvax.dec.com>
-> Sender: news@excelan.COM
-> Reply-To: edc@excelan.com (Eric Christensen)
-> Organization: Excelan - A Novell Company, San Jose, Califonia
-> Lines: 91
-> Posted: Fri Mar 9 17:08:12 1990
           ^^^^^^        ^^^^

and here is the diff:

5,6c5,6
< Message-ID: <403@excelan.COM>
< Date: 19 Sep 89 20:24:21 GMT
---
> Message-ID: <406@excelan.COM>
> Date: 10 Mar 90 01:08:12 GMT
9,10c9,10
< Reply-To: edc@ka.UUCP (Eric Christensen)
< Organization: Excelan, San Jose, Califonia
---
> Reply-To: edc@excelan.com (Eric Christensen)
> Organization: Excelan - A Novell Company, San Jose, Califonia
12c12
< Posted: Tue Sep 19 13:24:21 1989
---
> Posted: Fri Mar 9 17:08:12 1990
97c97
< around too. Please don't take my suggestion above as reccommendations,
---
> around too. Please don't take my suggestions above as recommendations,

Differences in Reply-To, Organization, and text are suggestive of a double
posting, but how did the second one get a date 6 months in the future?
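A date six months in the future also suggests the kind of sanity check a
receiving site could apply before filing an article. A sketch only: the
14-day window is an illustrative assumption, not any news system's actual
limit, and the article date is assumed to be already converted to epoch
seconds:

```shell
# Accept an article date (epoch seconds) only if it falls within a
# +/- 14 day window around the local clock; anything outside the
# window, like the Mar 90 header above, would be flagged or junked.
plausible_date() {
    now=$(date +%s)
    window=$((14 * 24 * 3600))
    [ "$1" -ge $((now - window)) ] && [ "$1" -le $((now + window)) ]
}
```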
gene@bu-cs.BU.EDU (Yevgeny Y. Itkis) (09/21/89)
In article <3438@midway.cs.glasgow.ac.uk> jack@cs.glasgow.ac.uk (Jack Campin) writes:

>arf@chinet.chi.il.us (Jack Schmidling) wrote:
>>> I've been reading for, oh, over a month now, about how duplicate articles
>>> have been appearing across many newsgroups [...]
>>> The answer that occurs to me is, quite bluntly, sabotage.
>
>A screwup in tropix's news system was responsible for most of that (look at
>the Path: in the offending articles). They never told the net what exactly
>went wrong, so the same problem may have come up at other places since.
>The duplication occurred in many groups with no political content; sci.med
>and sci.math, for example. It is about as plausible to argue that it was a
>conspiracy by the alternative dentistry establishment to prevent debunking
>of the mercury-fillings scare in sci.med.

Yeah, that's it. And do not forget that most of them are Jewish and
therefore part of the zionist plot to .. to ... what was it, Jack S.? Oh
yeah, to take over the world, right?

>Why the people in charge at tropix have decided to keep these details
>to themselves when sharing their experience could benefit every news
>administrator on the net is entirely beyond me.

You are so naive, Jack Campin. Don't you see they are part of the plot?
Don't you know that most of the administrators are Jewish? After all,
computers are a kind of media, which, as is well known, is controlled by
zionists.

>Jack Campin * Computing Science Department, Glasgow University, 17 Lilybank

Actually, there is something sad going on right in front of our eyes. I
hate to play net psychologist, but when I try to think about Jack
Schmidling and his postings (which I rarely read, I admit) I cannot help
feeling bad about the guy (after overcoming the more immediate reactions
:-)). Jack, I am very serious; I have not said this to any of the other
people I disagree with, but I really think you should see someone for
help. Jack's personality problems are screaming through for attention,
e.g. in his use of an organization name in place of his proper name, not
to mention a maniacal fixation on certain problems.

Sometimes I think that maybe he is just a zionist agent who on a rare
occasion brings up a troubling issue, but in such a way that it is obvious
to any human being that the presentation is warped and, therefore,
probably does not deserve any attention. In such a way those clever
zionists discredit the organization (which is probably set up and
controlled by them anyway) as well as divert attention from real problems.

Btw, something I never could figure out: am I in on the plot? Am I a
zionist? But those are my personality problems :-)

-Gene
bob@MorningStar.COM (Bob Sutterfield) (09/22/89)
In article <14715@bfmny0.UU.NET> tneff@bfmny0.UU.NET (Tom Neff) writes:
...some site (tropix?) reposting hideously old articles with
munged dates and times so they don't match history... The only cure
for the (tropix?) thing is to lengthen your history retention
period.
A better cure, requiring fewer megabytes on each of untold thousands
of Usenet sites, would be for the news neighbors of the offending site
to firewall the damage by corking its outbound feed until repairs are
verified complete. If it's an NNTP feed, say "no no" in nntp_access;
for a UUCP feed, deny access to rnews in Permissions. The
bogon-generating site would become a news roach motel.
No, this is neither fascist censorship nor asocial shunning. It's
exercising collective responsibility for damage control.
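For concreteness, the two knobs Bob mentions might look like this. These
entries are hypothetical: they assume the reference NNTP's nntp_access
format and the HoneyDanBer UUCP Permissions format, with "bogus" standing
in for the offending site's name.

```
# nntp_access: deny the broken site both article transfer and posting
bogus.site.name     no      no

# Permissions: let mail through, but leave rnews out of COMMANDS so
# the site's batches are refused until repairs are verified
LOGNAME=Ubogus MACHINE=bogus COMMANDS=rmail
```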