[net.news] Duplicate articles and munging of message IDs

jerry@oliveb.UUCP (Jerry Aguirre) (04/05/84)

(Please excuse the warmup; I have a legitimate bug at the end.)

Having to skip over articles that you are not interested in goes
with the turf.  Having to skip over articles that are not worth
the phone charges to send them goes under the name of tolerance.

But getting the EXACT SAME ARTICLE TWICE is inexcusable.  I
occasionally see duplicate articles.  I check them out to see if
our news is allowing articles with the same ID.  Sometimes I see
that the IDs differ, as in:
    From: rusty@sun.uucp (Russel Sandberg)
    < Subject: Broadcast ICMP ECHO system crash fix
    < Message-ID: <745@sun.uucp>
    < Date: Mon, 2-Apr-84 16:26:11 PST
    < Article-I.D.: sun.745
    < Posted: Mon Apr  2 16:26:11 1984
    ---
    > Subject: Broadcast ICMP ECHO crash fix
    > Message-ID: <746@sun.uucp>
    > Date: Mon, 2-Apr-84 16:38:30 PST
    > Article-I.D.: sun.746
    > Posted: Mon Apr  2 16:38:30 1984

From the different Subject, ID, and date I assume that Sandberg
posted the article twice.  This probably goes under the heading
of tolerance and human error.

But now for something completely different.
    ==> 4005 <==
    Posting-Version: version B 2.10 5/3/83; site bbncca.ARPA
    Path: oliveb!hplabs!tektronix!decvax!bbncca!sdyer
    From: sdyer@bbncca.ARPA (Steve Dyer)
    Newsgroups: net.unix-wizards
    Subject: Re: Minor device numbers: too small!
  > Message-ID: <649@bbncca.ARPA>
    Date: Sat, 31-Mar-84 09:57:42 PST
  * Article-I.D.: bbncca.649
    Posted: Sat Mar 31 09:57:42 1984

    ==> 4030 <==
    Path: oliveb!hplabs!hao!seismo!harpo!decvax!genrad!grkermit!masscomp!clyde!floyd!cmcl2!rna!n44a!wjh12!bbncca!sdyer
    From: sdyer@bbncca.UUCP
    Newsgroups: net.unix-wizards
    Subject: Re: Minor device numbers: too small!
  > Message-ID: <649@bbncca.UUCP>
    Date: Sat, 31-Mar-84 09:57:42 PST
  * Article-I.D.: bbncca.649
    Posted: Sat Mar 31 09:57:42 1984
    Date-Received: Tue, 3-Apr-84 04:05:43 PST

This is an obvious example of the news software allowing the same
article to be received twice.  Two questions:
	1 - Which should the news software be using, the article or
	    message ID?  For news the article ID would seem more
	    correct.  What is the difference between the message and
	    article IDs?
	2 - Who is munging the Message ID?  There was some discussion
	    on the net recently about such munging of From/Sender
	    lines.  There is some argument for that munging, but how
	    do you justify munging a string whose only function is
	    to uniquely identify a message?
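
The duplicate above can be reproduced with a minimal sketch of a
duplicate check.  It assumes (this is not taken from the news source)
that the software keeps a set of Message-IDs already seen and compares
them as exact strings; `is_duplicate` and `history` are hypothetical
names used only for illustration.

```python
# Sketch only: assumes duplicates are detected by exact string
# comparison of Message-IDs against a set of IDs already seen.
def is_duplicate(message_id, history):
    """Return True if message_id was seen before; otherwise record it."""
    if message_id in history:
        return True
    history.add(message_id)
    return False


history = set()

# Article 4005 arrives with the original ID.
first = is_duplicate("<649@bbncca.ARPA>", history)

# Article 4030 is the same article, but its ID now ends in .UUCP,
# so the exact-match check treats it as new.
second = is_duplicate("<649@bbncca.UUCP>", history)

# A true repeat of the original ID is caught.
third = is_duplicate("<649@bbncca.ARPA>", history)
```

Under that assumption both copies are accepted, because the two
strings differ in the domain even though the Article-I.D. lines
(bbncca.649) are identical.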

Is this a bug, or one of those never-to-be-resolved ARPANET gateway
problems?

					    Jerry Aguirre
    {hplabs|fortune|ios|tolerant|allegra|tymix}!oliveb!jerry

mark@cbosgd.UUCP (Mark Horton) (04/10/84)

The Usenet standard (RFC850) specifies that the Message-ID is to be used.
The Article-I.D. is just there for upward compatibility with 2.9; it's
ignored by 2.10.  What's happening is that some old site (running B 2.6
or earlier or A news) is stripping out Message-ID, and the next 2.10 site
in the path guesses that .UUCP is the domain when it puts it back in.  The
guess is usually right, but in cases like this it's wrong.  Some site in
this path is the culprit:

    Path: oliveb!hplabs!hao!seismo!harpo!decvax!genrad!grkermit!masscomp!clyde!floyd!cmcl2!rna!n44a!wjh12!bbncca!sdyer

I suspect the problem is at rna or n44a, and can best be fixed by making
sure there are no loops in the network that go through these sites, or
else by getting them to upgrade.
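
The guess described above might look roughly like the following
sketch.  It assumes the Message-ID is rebuilt from the surviving
Article-I.D. line (site.number) with .UUCP appended as the domain;
that source for the rebuild, and the name `guess_message_id`, are
assumptions made for illustration, not the actual 2.10 code.

```python
# Sketch only: rebuild a Message-ID from an Article-I.D. of the form
# "site.number", guessing the domain.  Not the real B news routine.
def guess_message_id(article_id, default_domain="UUCP"):
    """Turn 'site.number' into '<number@site.DOMAIN>'."""
    site, sep, seq = article_id.rpartition(".")
    if not sep or not site or not seq:
        raise ValueError("expected site.number, got %r" % article_id)
    return "<%s@%s.%s>" % (seq, site, default_domain)


original = "<649@bbncca.ARPA>"            # as posted at bbncca
rebuilt = guess_message_id("bbncca.649")  # after the ID was stripped

# The rebuilt ID no longer matches the one the article was posted with.
ids_match = (rebuilt == original)
```

For a site that really is in the .UUCP domain the rebuilt string
matches the original, which is why the guess usually goes unnoticed.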

	Mark