childers@avsd.UUCP (Richard Childers) (09/16/89)
I've been noticing a lot of duplicate articles recently. Now, I've been reading the Usenet, on and off, for five or six years, and I have _never_ seen anything like this. At first I assumed something was wrong with my installation, because I was mucking around with things; in fact, as a result of trying to get the IHAVE / SENDME protocol to work with multiple sites, I _was_, for a while, getting considerable duplication. But a fair amount of rigorous thinking in the past month has convinced me that I'm not the problem here.

I've been reading for, oh, over a month now, about how duplicate articles have been appearing across many newsgroups. This was my first hint that it was a widespread problem affecting everybody. Like everyone else, I watched and waited for someone to do something, for someone to trace the problem. Nothing happened. Oh, many people tried to find a pattern, but it wasn't there. I began to smell a rat.

Now, like everyone else, I have gained much from judicious application of the excellent phrase "Assume stupidity before malice", derived from Bill Davidson's .signature, I believe, and I'm still not convinced that what's happening is anything other than normal hoseheadedness. But the absence of any sort of pattern in the data acquired from articles' headers makes me wonder, because it is quite common for people to modify headers for their own immature reasons.

( Personally, I think it's equivalent to changing the address on a letter, or otherwise defacing a piece of mail, without the owner's permission. Quite inconsistent with the tenets of intercooperation around which the Usenet was founded, and more suggestive of children fighting over a toy than of a tool developed for the civilized and globally relevant purposes of advancing human knowledge and accomplishment, if you know what I mean. )

So, I finally vowed to explore the matter the next time it bugged me and I had a few spare moments.
Here's some actual real data, from a small and thus uncomplicated sampling of a small and low-traffic newsgroup, alt.bbs.

	avsd# cd /usr/spool/news/alt/bbs

	avsd# ls
	231 232 233 234 235 236 237 238 239 240 241

A small sample, about 11 articles, all less than two days old.

	avsd# grep Message-ID *
	231:Message-ID: <4347@udccvax1.acs.udel.EDU>
	232:Message-ID: <11290@kuhub.cc.ukans.edu>
	233:Message-ID: <533@sud509.RAY.COM>
	234:Message-ID: <534@sud509.RAY.COM>
	235:Message-ID: <537@sud509.RAY.COM>
	236:Message-ID: <9626@venera.isi.edu>
	237:Message-ID: <935C18IO029@NCSUVM>
	238:Message-ID: <37058@conexch.UUCP>
	239:Message-ID: <11519@kuhub.cc.ukans.edu>
	240:Message-ID: <37058@conexch.UUCP>
	241:Message-ID: <11519@kuhub.cc.ukans.edu>

Ah, we have some duplicate articles, two out of the last three ...

	avsd# grep Path 239 241
	239:Path: avsd!vixie!decwrl!wuarchive!kuhub.cc.ukans.edu!orand
	241:Path: avsd!octopus!sts!claris!apple!brutus.cs.uiuc.edu!...
	          ...wuarchive!kuhub.cc.ukans.edu!orand

Now we have some real data. There are three machines found in both "Path:" fields. Two of them are the source and the destination. The third is "wuarchive".

Now, at this point it would normally be appropriate to run, screaming accusations all the way, straight to the administration of "wuarchive", self-righteous as all get-out, demanding to know what they are doing. But I'm not sure they are doing _anything_, because I assume I'm not the first person who has approached this problem in this fashion.

So, instead, I'm going to take a leap of the imagination and try to imagine why such a situation might occur: what circumstances might impinge upon the Usenet that would lead to massive forgery and duplication. The answer that occurs to me is, quite bluntly, sabotage. It is a well-established trick, made exemplary by the actions of senior Usenet people, to generate forged headers, as I said before, and insert them into the queue.
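( The by-hand survey above generalizes readily. The sketch below scans a spool directory for article files that share a Message-ID and reports which articles hold each duplicate; "find_dups" is an illustrative helper name, not part of any news distribution. )

```shell
# Sketch: report Message-IDs that appear in more than one article file
# under a spool directory. "find_dups" is an illustrative helper.
find_dups() {
    ( cd "$1" || exit 1
      grep '^Message-ID:' * 2>/dev/null |
        awk -F': ' '{
            art = substr($1, 1, index($1, ":") - 1)   # article number
            ids[$2] = ids[$2] " " art                 # id -> article list
        }
        END {
            for (i in ids)
                if (split(ids[i], a, " ") > 1)        # seen twice or more
                    print i ids[i]
        }' )
}

# e.g.  find_dups /usr/spool/news/alt/bbs
# would report lines of the form:  <message-id> 239 241
```

( Run it over each group directory in turn to get a thread-wide picture instead of a single-group sample. )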
These articles, given their untraceable nature, are very possibly forged. The sites found in the "Path:" field are, presumably, interconnected ... which argues for a fairly sophisticated and detailed effort, not the act of an average college student, who would presumably still be so dazzled by the wealth of information available that s/he would never think of sabotaging it. No, if such an act is being perpetrated, it is coming from an individual, or group thereof, with considerable attention to detail.

Why would someone do such a thing ? I can think of several reasons.

(1)	Jealousy. There has been considerable territorialism lately: people posting to moderated groups and the like, commercialist behavior. Some people prefer to make sure that if they can't play, nobody can play.

(2)	Disinformation. The Usenet represents a substantial and sophisticated alternative to normal channels of communication, one less subject to control through covert or coercive activities, as many of the sponsors are Real Big Corporations, not necessarily willing to agree with the marching orders of a hypothetically interested government. Remember COINTELPRO, multiply by several orders of magnitude where information processing capacity and expertise are concerned, divide by the number of Constitutional Amendments you've seen waylaid recently, and tell me what you get.

(3)	Stupidity. Someone has some inscrutable motive, or there is a random scattering of badly-installed netnews sites, approaching a significant minority and scattered fairly evenly through the United States.

( Perhaps the next phase in this research might be to coordinate efforts to identify the source(s) by collecting the name of every machine that _appears_ to be problematic, using the methods outlined above, and examining the list with an eye for statistical anomalies or patterns of placement. For instance, the machines might all fall within a few states.
If they are evenly distributed geographically, that is possibly evidence of a sophisticated effort to muddy the trail, and important to establish. )

(4)	Malice. Some group has acquired sufficient expertise and an invisibly coordinated set of ostensibly independent Usenet sites, positioned them in positions of moderate but not excessive visibility amongst the crowd, and is using them to damage the Usenet's interconnectivity.

Why, you say ? What's the point ? Well, I think there is a clear end result here, and it's clogging the channels. Duplicate an article here and there, one per newsgroup per day, and pretty soon some of the lesser sites are filling up their disks. Soon the administrations are calling for things to be omitted from the 'spool' partitions. Maybe the entire news installation gets deinstalled, or perhaps only those parts of it irrelevant to the specific commercial mission of the individual companies.

It's been going on for quite a while now, and it's gotten rather noticeable at my site. If we hadn't enlarged our spool partition, we might still be getting regular "filesystem full" messages, and that was with a _lot_ of space and 'expire' getting rid of everything more than two days old.

I don't know who's doing it. To tell you the truth, I'm still prepared to find an error in my thinking, all down the line. But it _seems_ to be common everywhere, and so I hesitate to discount my hypotheses until I hear from a few others on the topic: the results of their own research, and their thoughts / hypotheses. I do know it needs to be fixed, since it won't fix itself, wherever it's coming from.

I'm also curious whether this type of thing has been encountered in other networks, such as FIDO, which certainly has the circumstances under which such things might happen.
The problem is that I don't know if they have restricted interconnectivity to conform with requirements for linearity, or have allowed potential looping paths to evolve in their interconnections, compensating with article ID checks in the software.

I must admit that I'm puzzled as to how this is happening, as netnews is _supposed_ to be checking articles against the 'active' articles database. Perhaps the "Message-ID:" field is being invisibly corrupted. Or perhaps the software decides by comparing both Message-ID and Path, classifying two articles as identical only if they _both_ match, to avoid the vague but present possibility of two articles from divergent sites being generated with identical Message-IDs.

Anyhow, some thoughts that have been brewing for about two weeks. I'd like to hear some responses ... reactions will be reacted to in a vein similar to that in which they were conveyed, but intelligent commentary will receive the respect it deserves.

-- richard

* * *  Intelligence : the ability to create order out of chaos.  * * *
*  ..{amdahl|decwrl|octopus|pyramid|ucbvax}!avsd.UUCP!childers@tycho  *
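( An editorial aside on the last hypothesis in the preceding article, that the software might classify two articles as identical only when Message-ID _and_ Path both match. The rule is easy to test in miniature, and it would indeed explain the symptom: two copies of one article arriving over different feeds always differ in Path, so such a rule would accept both. The "same_article" helper below is purely illustrative; as the replies point out, news is supposed to key on the Message-ID alone. )

```shell
# Sketch of the "both must match" rule, to show why it would pass
# duplicates. "same_article" is illustrative, not real news software.
same_article() {
    id1=$(grep '^Message-ID:' "$1"); id2=$(grep '^Message-ID:' "$2")
    p1=$(grep '^Path:' "$1");        p2=$(grep '^Path:' "$2")
    # Identical only if Message-ID AND Path agree -- too strict a test,
    # since a second feed always rewrites the Path header.
    [ "$id1" = "$id2" ] && [ "$p1" = "$p2" ]
}
```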
karl@godiva.cis.ohio-state.edu (Karl Kleinpaste) (09/16/89)
childers@avsd.UUCP (Richard Childers) writes:
I've been noticing a lot of duplicate articles recently.
Richard, truly I mean no disrespect, but I think you've been spending
too much time around alt.conspiracy.
I began to smell a rat.
I think it's just news software suffering from bitrot, at your own site.
avsd# grep Message-ID *
...
238:Message-ID: <37058@conexch.UUCP>
239:Message-ID: <11519@kuhub.cc.ukans.edu>
240:Message-ID: <37058@conexch.UUCP>
241:Message-ID: <11519@kuhub.cc.ukans.edu>
Ah, we have some duplicate articles, two out of the last three ...
Stop right there. If your news system is letting in articles with
identical Message-ID's, then there is braindeath in rnews' ability to
poke around in your history file. That "can't happen" when things are
running properly.
Now, if you were to pull those lines out and check them for invisible
control characters, and found differences...well, then perhaps you'd
have a case for paranoia. But up to this point, you've just got a
problem with your history file. Try expire -r and wait a week.
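( Karl's control-character check can be done with stock tools; od -c prints every byte, so a corrupted Message-ID shows up immediately. The "same_id" helper below is an illustrative sketch, not part of any news distribution. )

```shell
# Sketch: dump the Message-ID lines of two article files byte by byte,
# then say whether they are truly identical. "same_id" is illustrative.
same_id() {
    a=$(grep '^Message-ID:' "$1")
    b=$(grep '^Message-ID:' "$2")
    printf '%s\n' "$a" | od -c    # non-printing bytes appear as \nnn
    printf '%s\n' "$b" | od -c
    if [ "$a" = "$b" ]; then echo byte-identical; else echo differ; fi
}

# e.g.  same_id 239 241   in the spool directory shown earlier
```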
I, for one, have seen relatively little in the way of duplicated
stuff, even the items coming from the reputedly-corrupt `tropix'
system. Just as a data point against which to compare, I keep a whole
lot of news around (290Mbytes), and alt.bbs ends with these
Message-ID's:
796:Message-ID: <533@sud509.RAY.COM>
797:Message-ID: <534@sud509.RAY.COM>
798:Message-ID: <537@sud509.RAY.COM>
799:Message-ID: <9626@venera.isi.edu>
800:Message-ID: <935C18IO029@NCSUVM>
801:Message-ID: <37058@conexch.UUCP>
802:Message-ID: <11519@kuhub.cc.ukans.edu>
803:Message-ID: <89257.213434JTW106@PSUVM.BITNET>
804:Message-ID: <10349@eerie.acsu.Buffalo.EDU>
805:Message-ID: <1668@ns.network.com>
808:Message-ID: <1652@psuhcx.psu.edu>
No dups, and I have both of the articles for which you got dups.
--Karl
tneff@bfmny0.UU.NET (Tom Neff) (09/16/89)
I don't understand this complaint. These duplicate articles are proven to exist by matching Message-IDs, correct? But news is supposed to eliminate duplicate Message-IDs before storage. If this is not happening at some site then something is broken there.

Sites with multiple feeds may commonly see duplicates in the incoming batches -- they are not supposed to make it to the spool directory as individual articles, though.

Rather than engage in lengthy disquisitions on others' motives, I would get to work tracking down why inews and history are busted at my site.

-- 
'We have luck only with women --    \\\    Tom Neff
 not spacecraft!'                *-((O     tneff@bfmny0.UU.NET
 -- R. Kremnev, builder of FOBOS    \\\    uunet!bfmny0!tneff (UUCP)
werner@utastro.UUCP (Werner Uhrig) (09/16/89)
> Anyhow, some thoughts that have been brewing for about two weeks. I'd like
> to hear some responses ... reactions will be reacted to in a vein similar to
> that in which they were conveyed in, but intelligent commentary will receive
> the respect it deserves.

wow, yeah man, send me some of that stuff ...

(I can't believe I read all 175 lines of that article; must be a Friday night, you know) ...

-- 
----------->  PREFERRED RETURN-ADDRESS FOLLOWS  <--------------
(ARPA)  werner@rascal.ics.utexas.edu    (Internet: 128.83.144.1)
(UUCP)  ..!utastro!werner  or  ..!uunet!rascal.ics.utexas.edu!werner
coolidge@brutus.cs.uiuc.edu (John Coolidge) (09/16/89)
childers@avsd.UUCP (Richard Childers) writes:
> [finds some duplicate articles, apparently with the same message-id]
> 239:Message-ID: <11519@kuhub.cc.ukans.edu>
> 241:Message-ID: <11519@kuhub.cc.ukans.edu>
> [and checks the path]
> 239:Path: avsd!vixie!decwrl!wuarchive!kuhub.cc.ukans.edu!orand
> 241:Path: avsd!octopus!sts!claris!apple!brutus.cs.uiuc.edu!...
>           ...wuarchive!kuhub.cc.ukans.edu!orand
>Now we have some real data. There are three machines which are found in
>both "Path:" fields. Two of them are the source and the destination. The
>third is "wuarchive".
>Now, at this point it would normally be appropriate to run, screaming
>accusations all the way, straight to the administration of "wuarchive",
>all self-righteous as all get-out, demanding to know what they are doing.

Wouldn't be appropriate, and it would be wrong. I've checked on both wuarchive (where I'm a guest) and brutus (which I run), and we each have only one copy of the offending article, with (as best as I can tell) no offensive control characters or any such silliness. If there's a Sinister Villain (TM) trashing Message-IDs out there, it's not wuarchive or brutus. I sincerely doubt it's apple or decwrl either.

Take a good look at the Message-IDs in question. If there are any control characters or other real differences, one of the feed sites beyond wuarchive (239) or brutus (241) is causing problems. If not, your news is messed up and needs an overhaul, since you should never have two articles with the same Message-ID. Normal duplicates are caused by old news being resent after the old Message-IDs have been expired. These, clearly, are not normal duplicates.

One potential news problem that just might do this (I'm insufficiently acquainted with the internal arcana to be sure) would be the case where the history file can't be appended to (disk full, maybe?). In that case, I could imagine news dropping the Message-ID on the floor, then accepting the same id later. Doesn't seem likely to me (well, *I'D* check for it when writing a news :-) ), but maybe...

--John
--------------------------------------------------------------------------
John L. Coolidge    Internet:coolidge@cs.uiuc.edu    UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.
You may redistribute this article if and only if your recipients may as well.
marc@lakesys.UUCP (Marc Rassbach) (09/16/89)
[nice text killed]

Hmmmm. On our end here in Wisconsin, I'll see replies to threads where I've never seen the first, let alone the 4th, article being replied to. At first, I thought it was lakesys and its small capacity. But my understanding is that this is occurring at the UW as well.

On the subject of FIDOnet ... major political problems in that network. People wanting to play God cutting off nodes without notice, bombing runs (a bombing run is when one re-mails the same packet), etc. With every node added to FIDOnet, the volume grows. FIDOnet is beginning to fall apart ...

The originator of this post is correct: Usenet is a VERY good alternative information source. I'd hate to see it go. I've fired off letters to my rep. (as if it will do any good) supporting Sen. Gore's network proposal. (Being shell-fish {crab-y :-) } I want to see better net.traffic.) When I get the $$$ ahead, I'm planning on getting myself one of them thar satellite thingies to get my newsfeed from 0:00-6:00.

Go nuts!

M.R. (stands for [M]ad, bad, and dange[R]ous to know....)

-- 
Marc Rassbach    marc@lakesys
If you take my advice, that "I can't C with my AI closed" is your problem, not mine!
If it was said on UseNet, it must be true.
bill@twwells.com (T. William Wells) (09/17/89)
In article <2056@avsd.UUCP> childers@avsd.UUCP (Richard Childers) writes:
: avsd# cd /usr/spool/news/alt/bbs
:
: avsd# ls
: 231 232 233 234 235 236 237 238 239 240 241
:
: A small sample, about 11 articles, all less than two days old.
:
: avsd# grep Message-ID *
: 231:Message-ID: <4347@udccvax1.acs.udel.EDU>
: 232:Message-ID: <11290@kuhub.cc.ukans.edu>
: 233:Message-ID: <533@sud509.RAY.COM>
: 234:Message-ID: <534@sud509.RAY.COM>
: 235:Message-ID: <537@sud509.RAY.COM>
: 236:Message-ID: <9626@venera.isi.edu>
: 237:Message-ID: <935C18IO029@NCSUVM>
: 238:Message-ID: <37058@conexch.UUCP>
: 239:Message-ID: <11519@kuhub.cc.ukans.edu>
: 240:Message-ID: <37058@conexch.UUCP>
: 241:Message-ID: <11519@kuhub.cc.ukans.edu>
That just means that your news software is broken. Messages with
duplicate message ids should not ever make it into your spool
directory. If two messages arrive with the same id, the later one is
supposed to be discarded. The only residue of this would be in your
log file.
: 239:Path: avsd!vixie!decwrl!wuarchive!kuhub.cc.ukans.edu!orand
: 241:Path: avsd!octopus!sts!claris!apple!brutus.cs.uiuc.edu!...
: ...wuarchive!kuhub.cc.ukans.edu!orand
This just says that the news left kuhub.cc.ukans.edu, went to
wuarchive, and then was sent via two different paths from there and
that the routes of the message happened to intersect at your site. All
perfectly normal, and the redundancy of it improves the reliability
of the net.
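( The intersection Bill describes can be computed mechanically. A sketch; "common_sites" is an illustrative helper, and the example below drops the sites hidden behind the "..." in the quoted 241 header. )

```shell
# Sketch: print the sites that two Path: headers have in common.
# "common_sites" is an illustrative helper.
common_sites() {
    echo "$1" | tr '!' '\n' | sort -u > /tmp/cs1.$$
    echo "$2" | tr '!' '\n' | sort -u > /tmp/cs2.$$
    comm -12 /tmp/cs1.$$ /tmp/cs2.$$      # lines common to both files
    rm -f /tmp/cs1.$$ /tmp/cs2.$$
}

# With the two paths from the original article (elided sites dropped),
# the overlap is the two endpoints plus wuarchive:
#   common_sites 'avsd!vixie!decwrl!wuarchive!kuhub.cc.ukans.edu!orand' \
#     'avsd!octopus!sts!claris!apple!brutus.cs.uiuc.edu!wuarchive!kuhub.cc.ukans.edu!orand'
```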
---
Bill { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com
hoyt@polyslo.CalPoly.EDU (Sir Hoyt) (09/18/89)
In article <1989Sep16.061700.4572@brutus.cs.uiuc.edu> coolidge@cs.uiuc.edu writes:
>the history file can't be appended to (disk full, maybe?). In that case,
>I could imagine news dropping the Message-ID on the floor, then accepting
>the same id later. Doesn't seem likely to me (well, *I'D* check for it
>when writing a news :-) ), but maybe...

It's happened here at polyslo before. Expire on B news makes a copy of the history file before processing. Our history file is about 4Meg, which makes 8Meg, and when you have ~10Meg free....

Another problem we had was that we were deleting old Message-IDs from the history file too soon. We now keep them for 30 days, and have not had a problem since.

-- 
John H. Pochmara                        A career is great,
UUCP: {csun,voder,trwind}!polyslo!hoyt  But you can't run your
Internet: hoyt@polyslo.CalPoly.EDU      fingers through its hair
                                            -Graffiti 4/13/83
coolidge@brutus.cs.uiuc.edu (John Coolidge) (09/18/89)
hoyt@polyslo.CalPoly.EDU (Sir Hoyt) writes:
>In article <1989Sep16.061700.4572@brutus.cs.uiuc.edu> I write:
>>the history file can't be appended to (disk full, maybe?). In that case,
>>I could imagine news dropping the Message-ID on the floor, then accepting
>>the same id later. Doesn't seem likely to me (well, *I'D* check for it
>>when writing a news :-) ), but maybe...
> It's happened here at polyslo before. Expire on B news makes a copy of
> the history file before processing. Our history file is
> about 4Meg, which makes 8Meg, and when you have ~10Meg free....

But does it drop the message id from history while keeping/spooling the article? That was the behavior that I thought was unlikely ... just running out of history file space seems perfectly reasonable, but reacting to a failure to append to the history file by keeping the article sounds somewhat strange ...

--John
--------------------------------------------------------------------------
John L. Coolidge    Internet:coolidge@cs.uiuc.edu    UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.
You may redistribute this article if and only if your recipients may as well.
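( The ordering John is questioning can be made concrete. In the sketch below the Message-ID is committed to the history file _before_ the article is spooled, and a failed append rejects the article rather than spooling it with no record of its id; "accept_article", HISTORY, and SPOOL are illustrative names, not B news internals. )

```shell
# Sketch: duplicate rejection with the history append as the gate.
# All names here are illustrative, not actual B news internals.
HISTORY=${HISTORY:-/usr/lib/news/history}
SPOOL=${SPOOL:-/usr/spool/news}

accept_article() {
    msgid=$1; artfile=$2
    # Already recorded?  Discard -- the normal case with multiple feeds.
    if grep -F "$msgid" "$HISTORY" >/dev/null 2>&1; then
        echo "duplicate $msgid: discarded"
        return 1
    fi
    # Record the id FIRST; if the append fails (disk full, say), reject
    # the article instead of spooling it while dropping its id.
    if ! echo "$msgid" >> "$HISTORY"; then
        echo "history append failed: rejecting $msgid"
        return 1
    fi
    cp "$artfile" "$SPOOL/" && echo "accepted $msgid"
}
```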
childers@avsd.UUCP (Richard Childers) (09/18/89)
coolidge@cs.uiuc.edu writes:
>If not, your news is messed up and needs an overhaul, since you should
>never have two articles with the same Message-ID. Normal duplicates are
>caused by old news being resent after the old Message-ID's have been
>expired. These, clearly, are not normal duplicates.

I've had a few people suggest this to me, and I am aware of the possibility. But I should also make clear that there was a large gap of time between the period when I was playing with IHAVE / SENDME and getting duplicate articles ... and the time when, having deinstalled IHAVE / SENDME a few months prior, I began to _again_ see duplicate news articles. A period of several months. At the same time, many other people began to comment on the problem ...

I'll be looking into both possibilities as soon as I get a chance.

>--John

-- richard

* * *  Intelligence : the ability to create order out of chaos.  * * *
*  ..{amdahl|decwrl|octopus|pyramid|ucbvax}!avsd.UUCP!childers@tycho  *
childers@avsd.UUCP (Richard Childers) (09/19/89)
I've received one reply so far of a useful nature, as well as permission to share the email. In the interests of the many people who have commented on the problem of duplicate articles, it is reproduced in edited form below.

-=*=-

Date: Sat, 16 Sep 89 13:55 CDT
From: sysop@attctc.Dallas.TX.US (Charles Boykin-BBS Admin)
To: avsd!childers
Subject: Re: Who's Messing With The Usenet ?

"Are you running dbz ?"

"... inews has error checking to reject duplicates, but this depends upon dbz if you are using it. And dbz doesn't always find the article id when it does, in fact, exist in the history file. I had as many as four copies of the same articles as a result. From a number of comments, I suspect that the use of dbz is fairly widespread, unfortunately. I did a local poll of sites connecting to this one and found that 100% of the ones that had multiple news feeds and were running dbz had duplicate articles. Since junking it, I have none. Further, I was sending the duplicate articles to a downstream site also running dbz, and they were duplicated on his system with this one his only feed."

":-) As I am sending this email, you may certainly distribute it as you see fit. I do hope this is of some assistance -"

-=*=-

-- richard

* * *  Intelligence : the ability to create order out of chaos.  * * *
*  ..{amdahl|decwrl|octopus|pyramid|ucbvax}!avsd.UUCP!childers@tycho  *
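( A practical corollary of Charles's letter: the flat history file is the ground truth, so a plain exact-match scan of it can be used to audit whatever the dbz index answers. If inews claims an id is new while this scan finds it, the index rather than the history file is at fault. "in_history" is an illustrative helper name. )

```shell
# Sketch: authoritative (if slow) exact-match lookup in the flat
# history file. "in_history" is an illustrative helper.
in_history() {
    # grep -F: fixed-string match, so <, @ and . in the id are literal.
    grep -F "$1" "${2:-/usr/lib/news/history}" >/dev/null
}

# e.g.  in_history '<37058@conexch.UUCP>' && echo "already seen"
```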
childers@avsd.UUCP (Richard Childers) (09/19/89)
bill@twwells.com (T. William Wells) writes:
>That just means that your news software is broken. Messages with
>duplicate message ids should not ever make it into your spool
>directory. If two messages arrive with the same id, the later one is
>supposed to be discarded. The only residue of this would be in your
>log file.

I'm aware of the possibility, but I'm puzzled by the intermittency of the events. One would assume that broken software would fail consistently, and that there would be a statistically steady stream of duplicate articles. This is not the case. I installed netnews 2.11.b about a year ago, when I first started working here, and it worked fine. I haven't done a thing since other than edit /usr/lib/news/sys and play with pathalias.

>All perfectly normal, and the redundancy of it improves the reliability
>of the net.

Agreed, provided it works. I understand the _intent_. After all, I sought duplicate newsfeeds for that precise reason ...

>bill@twwells.com

-- richard

* * *  Intelligence : the ability to create order out of chaos.  * * *
*  ..{amdahl|decwrl|octopus|pyramid|ucbvax}!avsd.UUCP!childers@tycho  *
arf@chinet.chi.il.us (Jack Schmidling) (09/19/89)
Article 3708 (11 more) in news.misc:
From: childers@avsd.UUCP (Richard Childers)
Newsgroups: news.admin,news.misc,alt.conspiracy
Subject: Who's Messing With The Usenet ?
Keywords: article, duplication, monkey business

>I've been noticing a lot of duplicate articles recently ...
>Now, I've been reading the Usenet, on and off, for about five or six years
>now, and I have _never_ seen anything like this in my life ...
>I've been reading for, oh, over a month now, about how duplicate articles
>have been appearing across many newsgroups ... I began to smell a rat.
>The answer that occurs to me is, quite bluntly, sabotage. It is a
>well-established trick, made exemplar by the actions of senior Usenet
>people, to generate forged headers, as I said before, and insert them into
>the queue. These articles, given their untraceable nature, are very
>possibly forged articles.
>The sites found in the "Path:" field are, presumably, interconnected ...
>which argues for a fairly sophisticated and detailed effort, not the act
>of an average college student, whom would presumably still be so dazzled
>by the wealth of information available that s/he would never think of
>sabotaging it, incidentally. No, if such an act is being perpetuated, it
>is coming from an individual or group thereof with considerable attention
>to detail.
>Why would someone do such a thing ?

ARF says:

This probably belongs on alt.conspiracies, but since this is the first time I have even looked at this group, I was dazzled by the possible solution to a problem that has been bugging me since shortly after I joined usenet around June of this year.

I am a born-again Anti-Zionist and post profusely to talk.politics.mideast with that point of view. My usenet site (chinet) loses, on average, about 50 articles every week. The way it happens is....

	32 Articles talk.politics.mideast  read now? (y n) ... "y" ... sorry all read ... bye

My sysop says it's because he has run out of disc space.
It also happens that there are many duplicate articles on this newsgroup, and they obviously fill up the disk and prevent new articles from being processed. Keeping $Billions flowing to Israel is not to be overlooked as a possible reason why "someone would do such a thing".

The Amateur Radio Forum (arf)