[net.news.b] article duplications

lmc@denelcor.UUCP (Lyle McElhaney) (06/19/84)

I've recently been wondering why we are getting what seems to me to be a
lot of repeat messages. I haven't much heard anyone else complaining, so I
suspect I've missed a bug fix somewhere along the line. Anyway, it seems that
most of the problems stem from having a message originate from a site that
supports more than one domain. I received a message (one of many) twice from
bbncca, once with the ARPA domain and once via UUCP. The history lines are:

<778@bbncca.ARPA>	Wed, 13-Jun-84 20:51:56 MDT	net.unix-wizards/6863 net.lang.c/1753 
<778@bbncca.UUCP>	Thu, 14-Jun-84 02:12:12 MDT	net.unix-wizards/6877 

The headers show that the article reached us through two different paths:

  Relay-Version: version B 2.10.1 6/24/83; site denelcor.UUCP
  Path: denelcor!hao!seismo!cmcl2!floyd!whuxle!mit-eddie!genrad!
    decvax!bbncca!keesan
  From: keesan@bbncca.ARPA (Morris Keesan)
  Newsgroups: net.unix-wizards,net.lang.c
  Subject: Re: unsigned char -> unsigned int conversion
  Message-ID: <778@bbncca.ARPA>
  Date: Tue, 12-Jun-84 08:20:27 MDT
  Article-I.D.: bbncca.778
  Posted: Tue Jun 12 08:20:27 1984
  Date-Received: Wed, 13-Jun-84 20:51:56 MDT
  References: <183@haddock.UUCP>
  Organization: Bolt, Beranek and Newman, Cambridge, Ma.
  Lines: 30

  Relay-Version: version B 2.10.1 6/24/83; site denelcor.UUCP
  Path: denelcor!hao!seismo!cmcl2!rna!n44a!wjh12!vaxine!linus!bbncca!keesan
  From: keesan@bbncca.UUCP
  Newsgroups: net.unix-wizards
  Subject: Re: unsigned char -> unsigned int conversion
  Message-ID: <778@bbncca.UUCP>
  Date: Tue, 12-Jun-84 08:20:27 MDT
  Article-I.D.: bbncca.778
  Posted: Tue Jun 12 08:20:27 1984
  Date-Received: Thu, 14-Jun-84 02:12:12 MDT
  References: <183@haddock.UUCP>
  Lines: 30

I've also received pairs of articles via UUCP and SUN (the Australian
network). Has anyone else seen this and/or fixed it? It doesn't seem too
hard to imagine a fix to the history mechanism that scrubs the domain from
the entry, but it seems a little kludgey. Any words?
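To make the symptom concrete, here is a minimal Python sketch of a history check that keys on the full Message-ID, using the two history lines quoted above. It is only an illustration under that assumption, not the B news code; the `History` class and its `seen_before` method are hypothetical names.

```python
class History:
    """Toy stand-in for the news history file (hypothetical, not B news code)."""

    def __init__(self):
        self._ids = set()

    def seen_before(self, message_id):
        """Record message_id; return True if it was already in the history."""
        if message_id in self._ids:
            return True
        self._ids.add(message_id)
        return False


history = History()
# The same article arriving under two IDs, plus a true repeat of the first.
arrivals = ["<778@bbncca.ARPA>", "<778@bbncca.UUCP>", "<778@bbncca.ARPA>"]
accepted = [mid for mid in arrivals if not history.seen_before(mid)]
print(accepted)
```

Only the exact repeat is rejected. The copy whose domain differs is accepted as a new article, which is the duplication described here.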

Thanks and regards,
-- 
		Lyle McElhaney
		(hao,brl-bmd,nbires,csu-cs,scgvaxd)!denelcor!lmc

guy@rlgvax.UUCP (Guy Harris) (06/21/84)

> It doesn't seem too hard to imagine a fix to the history mechanism that
> scrubs the domain from the entry, but it seems a little kludgey. Any words?

It's not just kludgy, it's incorrect.  There could be a site called
"frobozz.UUCP" and a different site called "frobozz.ARPA".  One of the
purposes of the domain structure is to permit a domain to manage the names
of hosts within that domain without having to worry about collisions with
other domains.  Unfortunately, not all sites in the (unofficial) domain
UUCP know how to send mail to sites in the (official) domain ARPA yet, so
sites will let themselves be known by the same name in both domains.
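The objection can be shown with a short Python sketch. The `scrub_domain` helper is a hypothetical illustration of the proposed fix, not code from any news release, and the article number 42 for the two "frobozz" sites is made up for the example.

```python
def scrub_domain(message_id):
    """Drop everything after the first '.' in the host part of a Message-ID."""
    local, _, host = message_id.strip("<>").partition("@")
    return "<%s@%s>" % (local, host.split(".", 1)[0])


# The case the fix is meant to handle: one article, two domains.
same_article = scrub_domain("<778@bbncca.ARPA>") == scrub_domain("<778@bbncca.UUCP>")

# The case it breaks: two different sites that share a host name
# (hypothetical article number 42 on each).
false_match = scrub_domain("<42@frobozz.UUCP>") == scrub_domain("<42@frobozz.ARPA>")

print(same_article, false_match)
```

Both comparisons come out equal. The second is the failure: an article from one "frobozz" would be discarded as a duplicate of an unrelated article from the other.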

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

kre@ucbvax.UUCP (Robert Elz) (06/22/84)

From denelcor!lmc:
>I've recently been wondering why we are getting what seems to me to be a
>lot of repeat messages. I haven't much heard anyone else complaining, so I
>suspect I've missed a bug fix somewhere along the line. Anyway, it seems that
>most of the problems stem from having a message originate from a site that
>supports more than one domain. I received a message (one of many) twice from
>bbncca, once with the ARPA domain and once via UUCP.
>
>I've also received pairs of articles via UUCP and SUN (the Australian
>network). Has anyone else seen this and/or fixed it? It doesn't seem too
>hard to imagine a fix to the history mechanism that scrubs the domain from
>the entry, but it seems a little kludgey. Any words?

From rlgvax!guy:
>It's not just kludgy, it's incorrect.  There could be a site called
>"frobozz.UUCP" and a different site called "frobozz.ARPA".
>
>Unfortunately, not all sites in the (unofficial) domain
>UUCP know how to send mail to sites in the (official) domain ARPA yet, so
>sites will let themselves be known by the same name in both domains.

Guy's explanation might be right for UUCP/ARPA, but that's not
what is causing this problem if you get duplicate articles from
Australian sites.  There are no sites in Australia which use
the UUCP domain in news articles (at least, not in the recent past;
it might have been used a long time ago).

My guess is that there is some site somewhere that is illegally
altering the domain name part of the message ID.

Please, administrators (or anyone else who reads this group)
if you come across any duplicate articles from Australian sites
that exhibit this problem (that is, that appear to have been
sent from the UUCP domain as well as OZ (or SUN)) please mail (MAIL!)
me the headers of the articles.  Eventually, from that info, I
should be able to isolate bad sites from the combination of routes taken.

You can recognize Australian news by the OZ domain (or SUN from some
of the more conservative sites - SUN is going away as quickly as
we can make it happen), or by finding "mulga" anywhere in the
path the news arrived on.

Thanks for your co-operation.

Robert Elz				decvax!mulga!kre  (or ucbvax!kre)

ps: I know that neither "OZ" nor "SUN" is an "official" domain name,
so please don't flame at me about that.  It's not going to change,
and nothing that you can say will make it.

pps: This news was not sent from Australia, so a UUCP domain is
OK here!

mp@whuxle.UUCP (Mark Plotnick) (06/23/84)

I think I've found the cause of the duplicate article problem.  It
seems there's at least one site still running an old version of A news
- I heard it's version 1.6.  A news didn't support the Message-ID
field, only the Article-ID field, so the domain is getting stripped off
as it passes through this machine, and the machine after it is regenerating
the Message-ID by adding a .UUCP domain.  I think the A news system
may also truncate site names to 8 characters.
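The mechanism described above can be sketched in Python. Both functions are assumptions made for illustration: `a_news_relay` models an old site that forwards only the headers it understands, and `b_news_receive` models a downstream site that rebuilds a missing Message-ID from Article-I.D. by appending .UUCP. Neither is taken from the A news or B news sources, and the possible truncation of site names is not modelled.

```python
def a_news_relay(headers):
    """Assumed old-site behaviour: pass the article on without Message-ID."""
    return {k: v for k, v in headers.items() if k != "Message-ID"}


def b_news_receive(headers):
    """Assumed downstream behaviour: rebuild a missing Message-ID."""
    if "Message-ID" in headers:
        return dict(headers)
    # Article-I.D. has the form "site.number", e.g. "bbncca.778".
    site, _, number = headers["Article-I.D."].rpartition(".")
    rebuilt = dict(headers)
    rebuilt["Message-ID"] = "<%s@%s.UUCP>" % (number, site)
    return rebuilt


original = {"Message-ID": "<778@bbncca.ARPA>", "Article-I.D.": "bbncca.778"}
direct = b_news_receive(original)
via_old_site = b_news_receive(a_news_relay(original))
print(direct["Message-ID"], via_old_site["Message-ID"])
```

Under these assumptions the copy that went through the old site comes out as <778@bbncca.UUCP>, while the direct copy keeps <778@bbncca.ARPA>, matching the pair of headers reported at the start of the thread.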

If anyone has tips on making B news work on a PDP11 running PWB1.2, let
me know and I'll forward the fixes on to the site.
	Mark Plotnick

rees@apollo.UUCP (06/25/84)

This happens because some site out there is stripping the 2.10 headers.
It could be an A news site, or pre-2.7 site.  My guess is that it is
site "rna", because I know all the others on the path are 2.9 or later.

Please, if you must run out-of-date software, at least have pity on
the rest of us and don't feed news on to other sites.

lmc@denelcor.UUCP (06/25/84)

Concerning the multiple articles on the net I complained about earlier, I
received this message (quoted without permission, hope it's OK):

> From: hao!seismo!harpo!whuxle!mp
> Any article that passes through n44a-rna-wjh12 will have its domain
> stripped off and .UUCP added on.  This has been mentioned for months,
> but nobody there seems to notice.   I think Jim Rees posted a fix that
> only compares message IDs up to the first '.', and Horton said this
> defeated the purpose of message IDs (essentially article-IDs
> with domains), which were invented to handle the possibility of
> identically-named machines.

	Name: n44a
	Organization: Harvard Medical School
	Contact:
	Electronic-Address:

	Name: wjh12
	Organization: Harvard University, William James Hall
	Contact: Scott Bradner
	Electronic-Address: {genrad|allegra|ihnp4|amd70}!wjh12!sob

	Name: rna
	Organization: Rockefeller University, Dept. of Neurobiology
	Contact: Daniel Ts'o
	Electronic-Address: cmcl2!rna!dan

I would appeal to the above two gentlemen to find the problem (if indeed
the problem is there) and to fix it, and to lean on their unnamed colleague
at HMS to do likewise.  This has got to be costing the net a lot of money
(if that concerns us) and time (which certainly does).

An alternate solution which could be taken temporarily would be to break the
news link somewhere in the loop until the news problem is isolated. News
stats say some 1100 messages every two weeks come from ARPA; potentially
that many duplicate articles are circulating.

This seems to me to be a case in which a central authority (if we had one)
would be able to force a resolution to the matter. In the best of all worlds
we wouldn't need to do that, but unfortunately....

By the way, mp above says that "this has been mentioned for months...".
Out here in the sticks we haven't heard of it, and I've been a steady reader
of net.news.*. Where has it been mentioned? Why hasn't anyone else made
a stink about it? I can't be the only one getting these extra articles.

Oh, yes, here is a header (that came to us via (guess who?)) that originated
at elecvax.SUN:

>  Relay-Version: version B 2.10.1 6/24/83; site denelcor.UUCP
>  Path: denelcor!hao!seismo!cmcl2!rna!n44a!wjh12!genrad!decvax!mulga!munnari!basser!elecvax!dave
>  From: dave@elecvax.UUCP
>  Newsgroups: net.unix-wizards
>  Subject: Re: XMAGIC: a.out without a valid page 0?
>  Message-ID: <233@elecvax.UUCP>
>  Date: Tue, 12-Jun-84 10:05:33 MDT
>  Article-I.D.: elecvax.233
>  Posted: Tue Jun 12 10:05:33 1984
>  Date-Received: Wed, 13-Jun-84 23:10:10 MDT
>  References: <3732@mordor.UUCP>
>  Lines: 11
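Robert Elz's idea of isolating bad sites from the combination of routes can be tried on the three paths quoted in this thread. The Python sketch below is a rough heuristic, not a diagnosis: it assumes that a site appearing on a path that delivered an intact Message-ID is not the one rewriting IDs, and the `sites` helper is a hypothetical name.

```python
def sites(path):
    """Return the relay sites in a bang path, dropping the final user name."""
    return set(path.split("!")[:-1])


# Path of the copy that arrived with its original .ARPA Message-ID.
good = "denelcor!hao!seismo!cmcl2!floyd!whuxle!mit-eddie!genrad!decvax!bbncca!keesan"

# Paths of the copies that arrived with a .UUCP Message-ID.
bad = [
    "denelcor!hao!seismo!cmcl2!rna!n44a!wjh12!vaxine!linus!bbncca!keesan",
    "denelcor!hao!seismo!cmcl2!rna!n44a!wjh12!genrad!decvax!mulga!munnari!basser!elecvax!dave",
]

# Sites common to every bad path, minus sites seen on the good path.
suspects = set.intersection(*(sites(p) for p in bad)) - sites(good)
print(sorted(suspects))
```

With these three paths the candidates narrow to n44a, rna, and wjh12. More headers from other sites would be needed to narrow it further.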

-- 
		Lyle McElhaney
		(hao,brl-bmd,nbires,csu-cs,scgvaxd)!denelcor!lmc

mp@whuxle.UUCP (Mark Plotnick) (06/27/84)

...
>From: lmc@denelcor.UUCP
>
>I would appeal to the above two gentlemen to find the problem (if indeed
>the problem is there) and to fix it, and to lean on their unnamed colleague
>at HMS to do likewise.  This has got to be costing the net a lot of money
>(if that concerns us) and time (which certainly does).

I've already been in contact with the two gentlemen, and as I
mentioned in a followup a few days ago, I am looking for a replacement
for the A news that's running under PWB/UNIX on one of these machines.

>Out here in the sticks we haven't heard of it, and I've been a steady reader
>of net.news.*. Where has it been mentioned? Why hasn't anyone else made
>a stink about it? I can't be the only one getting these extra articles.

It's only been mentioned a couple times, mostly implicitly (by people
posting diffs of duplicate articles) rather than explicitly (as you're
doing).  Also, unless you have 3 good newsfeeds you're probably missing
articles; one of our feeds recently folded, and before we got another
hookup to a major site the number of incoming articles (including
duplicates) went down by 66%.  In any case, there's no need to "make a
stink".  Do not flood rna, n44a, and wjh12 with mail.  People know
about the problem and it's being worked on.
	Mark