[news.software.b] Supersedes problems with rapid-fire articles

brad@looking.on.ca (Brad Templeton) (08/29/89)

As a recent experiment, I decided to give away the ClariNet wireservice
articles on the Voyager flyby to USENET by cross-posting them to
sci.astro.

When a newswire does a major story, it sometimes releases dozens of new
versions of that story during the day, all under the same keyword.

Our interface currently uses the Supersedes header to have each version
replace the previous one.

Unfortunately, we got a few complaints.  This is in part due to the fact
that several sites, including watmath (our main connection) are not
handling supersedes as yet.   Anybody have any ideas on how many sites
are running news software that is old enough to not handle supersedes?

Admittedly, nobody on the net has ever made serious use of this header
before. 

Furthermore, there may be another problem.  If two articles in a chain come
in between two calls to sendbatch, the second supersedes the first, erasing
it.  Unfortunately, the erased article contains the Supersedes line needed
to cancel the article that went out in the first sendbatch, so that article
never gets deleted.   This means Supersedes and batching just don't seem
to work well together if articles come rapidly, either from the wire or
by hand when people make corrections.
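
The failure described above can be reproduced in a few lines.  What follows
is a minimal Python sketch, not the behavior of any real news system: the
run_site function, the event tuples, and the <vN@wire> message-ids are all
invented for illustration, and it assumes a superseded article is deleted
from the upstream spool before the next sendbatch runs.

```python
def run_site(events):
    """Simulate an upstream spool feeding one downstream site in batches.

    events holds ("post", msg_id, superseded_id_or_None) and ("sendbatch",).
    Returns the set of message-ids left on the downstream site.
    """
    spool = {}        # upstream: msg_id -> id it supersedes (or None)
    queued = []       # ids waiting for the next sendbatch
    downstream = {}   # downstream spool, same shape as the upstream one

    for ev in events:
        if ev[0] == "post":
            _, msg_id, old = ev
            spool.pop(old, None)      # Supersedes erases the old article locally
            spool[msg_id] = old
            queued.append(msg_id)
        else:
            # sendbatch ships only queued articles that still exist
            for msg_id in queued:
                if msg_id in spool:
                    old = spool[msg_id]
                    downstream.pop(old, None)
                    downstream[msg_id] = old
            queued = []
    return set(downstream)


events = [
    ("post", "<v1@wire>", None),
    ("sendbatch",),
    ("post", "<v2@wire>", "<v1@wire>"),
    ("post", "<v3@wire>", "<v2@wire>"),
    ("sendbatch",),
]
print(sorted(run_site(events)))
```

In this run <v1@wire> survives downstream, because <v2@wire>, the only
article that names it in Supersedes, was erased before it could be batched.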

One messy solution is to give up on supersedes and just use cancel
messages -- lots and lots of them.  I could see generating over 100 per
day, although few would make it past the first few downstream levels.

Any other thoughts on how we might make supersedes work?

A long term idea involves a new header, which would allow versions of
articles.  In such a case, we could just say, "this is version 1, this is
version 2" etc. and all articles would replace *all* their previous versions
as they arrived.

Alternatively, no version numbers would be needed.  One could simply use the
date field.  If a message had a "Supersedes" header of *any* sort on it,
the software would parse the message-id (the real one, not the dummy
supersedes argument) and remove the version number prefix.  It would then
cancel/replace all messages with a matching ID but an older date.

In fact, although it doesn't match current software, I think it would be
neat to have this work when a "duplicate" article comes in.  If
<abc@def> comes in, and you already have <abc@def> you check the dates.
If they are the same, or the incoming message is older, throw away the
incoming message.   If the incoming message is newer, cancel the old
one and put in the new, but with the same message-id!
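
The duplicate rule just described could look something like the sketch
below.  It is only an illustration of the proposal, which no current
software implements: the offer function and the dictionary used as a spool
are hypothetical, and dates are assumed to be already parsed into
comparable values.

```python
from datetime import datetime


def offer(spool, msg_id, date, body):
    """Apply the date rule to an incoming article; return True if stored.

    spool maps message-id -> (date, body).
    """
    held = spool.get(msg_id)
    if held is not None and date <= held[0]:
        return False              # same date or older: throw the newcomer away
    spool[msg_id] = (date, body)  # new id, or newer date: replace under the same id
    return True


spool = {}
offer(spool, "<abc@def>", datetime(1989, 8, 29, 10, 0), "first version")
offer(spool, "<abc@def>", datetime(1989, 8, 29, 12, 0), "second version")
offer(spool, "<abc@def>", datetime(1989, 8, 29, 11, 0), "late straggler")
print(spool["<abc@def>"][1])
```

A nice property of this rule is that the order of arrival stops mattering:
whichever copy carries the latest date wins.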
-- 
Brad Templeton, Looking Glass Software Ltd.  --  Waterloo, Ontario 519/884-7473

lmb@vicom.com (Larry Blair) (08/30/89)

In article <5200@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
=Unfortunately, we got a few complaints.  This is in part due to the fact
=that several sites, including watmath (our main connection) are not
=handling supersedes as yet.   Anybody have any ideas on how many sites
=are running news software that is old enough to not handle supersedes?

Or new enough.  For some reason, Geoff and Henry decided that C News wouldn't
properly handle Supersedes.  There is a separate program, called "superkludge"
(in the typical C News sarcastic manner), which tries to clean these up.
-- 
Larry Blair   ames!vsi1!lmb   lmb@vicom.com

Makey@LOGICON.ARPA (Jeff Makey) (08/30/89)

In article <5200@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
>Admittedly, nobody on the net has ever made serious use of [the Supersedes]
>header before. 

Have a look at comp.mail.maps.

>One messy solution is to give up on supersedes and just use cancel
>messages -- lots and lots of them.  I could see generating over 100 per
>day, although few would make it past the first few downstream levels.
               ^^^
Few cancel messages, or few "real" articles?  The real articles might
not make it very far, but the cancel messages would be fully
propagated unless you used something like the Supersedes header to
cancel *them*.

>Any other thoughts on how we might make supersedes work?

Allow multiple message ids to be specified in a single Supersedes
line.  Then, each article could supersede *all* (or at least many) of
its predecessors.
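
As a rough sketch of that suggestion (an assumption about how it might
work, not something any news software is known to accept), a multi-id
Supersedes value could be handled like this in Python.  The function names
and the sample message-ids are made up for the example.

```python
import re


def superseded_ids(header_value):
    """Pull every <message-id> out of a Supersedes header value."""
    return re.findall(r"<[^<>\s]+>", header_value)


def apply_supersedes(spool, header_value):
    """Remove each listed predecessor that is present; return those removed."""
    removed = []
    for msg_id in superseded_ids(header_value):
        if msg_id in spool:
            del spool[msg_id]
            removed.append(msg_id)
    return removed


# The downstream site holds version 1 and never saw version 2.
spool = {"<v1@wire>": "first version"}
print(apply_supersedes(spool, "<v1@wire> <v2@wire>"))
```

Because version 3 names version 1 directly, it no longer matters that
version 2 was erased before it could be batched.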

                           :: Jeff Makey

Department of Tautological Pleonasms and Superfluous Redundancies Department
    Disclaimer: Logicon doesn't even know we're running news.
    Internet: Makey@LOGICON.ARPA    UUCP: {nosc,ucsd}!logicon.arpa!Makey

tar@ksuvax1.cis.ksu.edu (Tim Ramsey) (08/30/89)

In article <536@logicon.arpa> Makey@LOGICON.ARPA (Jeff Makey) writes:

[ ... ]

>Few cancel messages, or few "real" articles?  The real articles might
>not make it very far, but the cancel messages would be fully
>propagated unless you used something like the Supersedes header to
>cancel *them*.

This shouldn't happen.  The cancel messages shouldn't make it past
the site *beyond* the last site the article made it to.

Wow, that last paragraph was confusing.  Let me quote RFC 1036:

3.1.  Cancel

                     cancel <Message-ID>


    If a message with the given Message-ID is present on the local
    system, the message is cancelled.  This mechanism allows a user to
    cancel a message after the message has been distributed over the
    network.

    If the system is unable to cancel the message as requested, it
    should not forward the cancellation request to its neighbor systems.

Just some unsolicited nit-picking,

Tim
--
 - VAX it to me at -              Dept. of Computing and Information Sciences
BITNET:   tar@KSUVAX1                       Kansas State University
Internet: tar@ksuvax1.cis.ksu.edu             Manhattan, KS 66506
UUCP:  ...!{rutgers,texbell}!ksuvax1!tar        (913) 532-6350

pst@anise.acc.com (Paul Traina) (08/31/89)

tar@ksuvax1.cis.ksu.edu (Tim Ramsey) writes:
>Wow, that last paragraph was confusing.  Let me quote RFC 1036:
>    If the system is unable to cancel the message as requested, it
>    should not forward the cancellation request to its neighbor systems.

However, I believe Henry & Geoff decided that this was a bad idea, so they
will pass on cancel messages.  (Right guys?)
-- 
A program should follow the 'Law of Least Astonishment.'  What is this law?
It is simply that the program should always respond to the user in the way
that astonishes him or her the least.
		-- from 'The Tao of Programming'

nagel@ics.uci.edu (Mark Nagel) (08/31/89)

lmb@vicom.com (Larry Blair) writes:

>Or new enough.  For some reason, Geoff and Henry decided that C News wouldn't
>properly handle Supersedes.  There is a separate program, called "superkludge"
>(in the typical C News sarcastic manner), which tries to clean these up.

Speaking of gratuitous "wholly user invisible/only sysadmin
visible" changes made in C news, has anyone out there modified C
news to properly handle the Lines header?  I know about the kludge
you can do in the inews script, but, unfortunately, that's not the
only place news may enter the system, at least here.  I believe it
belongs in relaynews, or possibly newsspool.  Clearly, Henry and
Geoff have decided to ignore the fact that this is a de facto
standard even if it doesn't appear in the cherished RFC, so if
anyone out there has an unofficial patch and/or ideas on where this
function should go in the system, please let me know.  Or post it --
I believe most people on the net would like their lines headers back
again.
--
Mark Nagel
UC Irvine, Department of Information and Computer Science
ARPA: nagel@ics.uci.edu         UUCP: ucbvax!ucivax!nagel

coolidge@brutus.cs.uiuc.edu (John Coolidge) (08/31/89)

nagel@ics.uci.edu (Mark Nagel) writes:
>[...] has anyone out there modified C
>news to properly handle the Lines header?  I know about the kludge
>you can do in the inews script, but, unfortunately, that's not the
>only place news may enter the system, at least here.  I believe it
>belongs in relaynews, or possibly newsspool.

I've patched C news here to generate the Lines: header for locally posted
articles, but I can't see any reason to generate it for articles which
aren't created locally. Re-writing incoming articles to create a Lines:
header seems just plain wrong to me. Perhaps it would be a good idea if
you're running some sort of gateway (news-notes, mailing list, etc) but
at that point it should be the gateway program's job, not C news'.
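
For illustration only, generating the header at posting time amounts to
something like the Python sketch below.  This is not the actual patch: the
add_lines_header name is invented, and it assumes the article is a single
string with the headers separated from the body by one blank line.

```python
def add_lines_header(article):
    """Add a Lines: header to a locally posted article that lacks one."""
    head, sep, body = article.partition("\n\n")
    if any(line.lower().startswith("lines:") for line in head.split("\n")):
        return article                    # already present: leave untouched
    count = len(body.splitlines())        # number of lines in the body only
    return f"{head}\nLines: {count}{sep}{body}"


post = "From: user@site\nSubject: test\n\nline one\nline two\n"
print(add_lines_header(post))
```

Articles that already carry the header pass through unchanged, which keeps
the rewriting confined to the posting site.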

On the other hand, you could have local news that doesn't pass through
C news (how?). If that's the case, I can see a problem --- but I think
the problem is in the path taken. Programs mainly intended for passing on
external news (relaynews, newsspool) still shouldn't do any rewriting not
required (and only Xref (not propagated) and Path are required, I think).

--John

--------------------------------------------------------------------------
John L. Coolidge     Internet:coolidge@cs.uiuc.edu   UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.

nagel@ics.uci.edu (Mark Nagel) (09/01/89)

coolidge@brutus.cs.uiuc.edu (John Coolidge) writes:

>I've patched C news here to generate the Lines: header for locally posted
>articles, but I can't see any reason to generate it for articles which
>aren't created locally. Re-writing incoming articles to create a Lines:
>header seems just plain wrong to me. Perhaps it would be a good idea if
>you're running some sort of gateway (news-notes, mailing list, etc) but
>at that point it should be the gateway program's job, not C news'.

>On the other hand, you could have local news that doesn't pass through
>C news (how?). If that's the case, I can see a problem --- but I think
>the problem is in the path taken. Programs mainly intended for passing on
>external news (relaynews, newsspool) still shouldn't do any rewriting not
>required (and only Xref (not propagated) and Path are required, I think).

Adding the code to inews is not sufficient, because not every site
is going to add the Lines header locally.  Thus you end up with
articles being propagated without Lines headers.  Amazingly enough,
many users actually find these headers useful in determining whether
an article is "read-worthy."  _Any_ piece of information aiding in
article weeding is useful enough that it should be present in all
articles.  The only place to properly ensure this is in the relay
software.  Admittedly, most of the net will remain B news for quite
some time (especially if C news is going to cause unnecessary
problems -- efficiency is great, but it is not _everything_), so the
Lines header will show up most of the time.  However, as (if) more
sites switch to C news, this useful header will eventually be lost
and/or useless.

Progress is great.
--
Mark Nagel
UC Irvine, Department of Information and Computer Science
ARPA: nagel@ics.uci.edu         UUCP: ucbvax!ucivax!nagel

coolidge@brutus.cs.uiuc.edu (John Coolidge) (09/01/89)

nagel@ics.uci.edu (Mark Nagel) writes:
>I write:
>>[...] Re-writing incoming articles to create a Lines:
>>header seems just plain wrong to me. Perhaps it would be a good idea if
>>you're running some sort of gateway (news-notes, mailing list, etc) but
>>at that point it should be the gateway program's job, not C news'.

>>Programs mainly intended for passing on
>>external news (relaynews, newsspool) still shouldn't do any rewriting not
>>required (and only Xref (not propagated) and Path are required, I think).

>Adding the code to inews is not sufficient, because not every site
>is going to add the Lines header locally.  Thus you end up with
>articles being propagated without Lines headers.  Amazingly enough,
>many users actually find these headers useful in determining whether
>an article is "read-worthy."  _Any_ piece of information aiding in
>article weeding is useful enough that it should be present in all
>articles.

Amazingly enough, I'm one of those people who finds Lines: very useful.
That's why I patched C News to do Lines: (right!) as soon as I knew that
it wasn't generating the header.

That being said, I reiterate my opposition to rewriting articles generated
at another site. With the exception of things like Path: (which is necessary
to ensure that things work right) and Xref: (which is really a performance
hack, and one graced by the RFC at that) I don't hold with rewriting at
all. In fact, this is one of the places I think Geoff and Henry should not
have "re-interpreted" the RFC --- Xrefs should not be passed on, IMHO,
simply because ANY local changes (excepting Path:) to an article not
generated locally shouldn't be passed on. If you want to add Lines: locally
but not propagate it, that's a different matter. But articles should be
passed on EXACTLY as received.

The alternative is to have every zealous sysadmin "fixing" broken articles
whenever possible. Bad subject line? No problem, I've got my Subject: fixer
that removes old was's and fixes spelling problems. Since lots more people
use Subject: to decide if an article is "read-worthy", it's obviously a
prime candidate for rewriting. Also, while we're at it, let's fix all the
Organizations: so they indicate the "correct" organization. And add
Reply-To:'s, Keywords:'s, and Summary:'s.

I just plain disagree here. Articles should be passed on exactly as
received. I strongly urge that everyone's machine generate Lines: (and
wish every newsreader would produce References:), and I urge Geoff and
Henry to put a notice in the release notes to C News saying "many people
think Lines: is a good idea. We don't enable it by default, and here's
why: <>. But if you think it's a good idea, turn it on by doing: <>."
I'd probably even support a change to the RFC making Lines: an officially
required header (especially if they'd make References: one too). But I
greatly disagree with rewriting articles on the fly, regardless of how
useful the resulting header is. The ends simply do not justify the means.

--John

--------------------------------------------------------------------------
John L. Coolidge     Internet:coolidge@cs.uiuc.edu   UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.

geoff@utstat.uucp (Geoff Collyer) (09/01/89)

Everyone who reads a specification must interpret it; the hope is that
the specification is clear and error-free.  Otherwise there will be
different interpretations of it by well-meaning people.

To avoid transmitting Xref:, the batcher would have to exclude Xref:s
when forming batches.  It could be done at some cost in CPU cycles.
I'll ask Henry when he gets back if he wants to do this.
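
To give a feel for the extra work involved, here is a small Python sketch
of dropping one header while an article is copied into a batch.  It is an
illustration only and says nothing about how the real batcher is written;
strip_header is a hypothetical helper, and it assumes headers end at the
first blank line and that continuation lines start with a space or tab.

```python
def strip_header(article, name="Xref"):
    """Return the article with every `name:` header (and its continuations) removed."""
    head, sep, body = article.partition("\n\n")
    prefix = name.lower() + ":"
    kept, skipping = [], False
    for line in head.split("\n"):
        if skipping and line[:1] in (" ", "\t"):
            continue                      # continuation of the dropped header
        skipping = line.lower().startswith(prefix)
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + sep + body


stored = "Path: a!b\nXref: site misc.test:12\n\tmisc.misc:3\nSubject: t\n\nbody\n"
print(strip_header(stored))
```

Every article would have its headers scanned once more on the way out,
which is where the cost in CPU cycles comes from.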
-- 
Geoff Collyer		utzoo!utstat!geoff, geoff@utstat.toronto.edu

henry@utzoo.uucp (Henry Spencer) (09/07/89)

In article <1989Aug30.052459.1166@vicom.com> lmb@vicom.COM (Larry Blair) writes:
>... For some reason, Geoff and Henry decided that C News wouldn't
>properly handle Supersedes...

More accurately, we decided (and we still believe this was the correct
decision at the time) that it wasn't worth handling "properly".  Control-
message handling is already a colossal pain, spreading slimy tentacles
everywhere inside relaynews.  Supersedes is even worse, as it's *both* a
regular message *and* a control message.  Until quite recently, the only
real use of Supersedes was in a couple of low-article-count groups that
used it for occasional updates.  And it isn't in the RFCs at all.  So we
considered superkludge an adequate approach.

Unfortunately, Brad seems to have found a real, legitimate, desirable
reason to make heavy use of what was previously a rather marginal feature.
So...

Our current plan is to fix relaynews to cope.  Actually, what's going to
happen is a radical revision of control-message handling, to split it out
into a largely-separate module, invoked (when necessary) at the end of each
batch.  This will implement Supersedes promptly and efficiently while
cleaning up relaynews considerably.  It will also give a noticeable net
performance boost.  Don't expect it right away -- this is going to be a
fair bit of work for Geoff.
-- 
V7 /bin/mail source: 554 lines.|     Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

henry@utzoo.uucp (Henry Spencer) (09/07/89)

In article <1989Aug30.174430.20687@anise.acc.com> pst@anise.acc.com (Paul Traina) writes:
>>    If the system is unable to cancel the message as requested, it
>>    should not forward the cancellation request to its neighbor systems.
>
>However, I believe Henry & Geoff decided that this was a bad idea, so they
>will pass on cancel messages.  (Right guys?)

That is correct.  Actually this wasn't a decision per se, since at the
time we sorted this out, all existing implementations did likewise, and
the RFC change that made it technically illegal came rather later.
More shortly on why we still think forwarding cancellations is the right
thing to do.
-- 
V7 /bin/mail source: 554 lines.|     Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

henry@utzoo.uucp (Henry Spencer) (09/07/89)

In article <3246@deimos.cis.ksu.edu> tar@ksuvax1.cis.ksu.edu (Tim Ramsey) writes:
> [quoting the RFC]
>    If a message with the given Message-ID is present on the local
>    system, the message is cancelled...
>
>    If the system is unable to cancel the message as requested, it
>    should not forward the cancellation request to its neighbor systems.

If we're in a mood to really study the apocrypha, this passage does not
completely and unambiguously rule out what C News does and B2.10 did
(forwarding cancellations for messages that have not yet arrived).

Clearly, if a cancellation arrives when the message is present, the
cancellation must occur, by the first verse.  Clearly, if the system
is unable to perform a requested cancellation, the cancellation must
not be forwarded.  But what exactly should be done if a cancellation
arrives when the message is not present?  Is this "unable to cancel"?
That's a strange way of putting it.  As I've mentioned in another
posting, the real Usenet is full of non-ideal behavior that makes this
a not-too-surprising event.  The Holy Verses don't even mention leaving
a note saying "cancel this sucker when it gets here".  Leaving such a
note can reasonably be considered at least a tentatively-successful
cancellation, in which case forwarding would seem legal.

As mentioned in my other posting, there are powerful arguments, based
on robustness, for taking this point of view.
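
The "leave a note" behavior can be pictured with a short Python sketch.
This is a simplified model under stated assumptions, not code from C News
or B2.10: the Site class and its method names are invented, and a real
system would keep the note in its history file rather than in memory.

```python
class Site:
    """Toy model of a site that defers cancels for articles not yet seen."""

    def __init__(self):
        self.spool = set()             # message-ids currently held
        self.pending_cancels = set()   # notes: "cancel this when it gets here"

    def receive_cancel(self, msg_id):
        """Act on a cancel; return True if it should be forwarded."""
        if msg_id in self.spool:
            self.spool.discard(msg_id)        # ordinary cancellation
        else:
            self.pending_cancels.add(msg_id)  # tentative cancellation
        return True

    def receive_article(self, msg_id):
        """Return True if the article is accepted (and would be forwarded)."""
        if msg_id in self.pending_cancels:
            return False                      # the note fires: never stored
        self.spool.add(msg_id)
        return True


site = Site()
print(site.receive_cancel("<x@a>"))    # cancel arrives first
print(site.receive_article("<x@a>"))   # article arrives later
```

Under this reading the cancel is forwarded in both branches, since in
neither case has the site failed to act on it.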
-- 
V7 /bin/mail source: 554 lines.|     Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

rick@uunet.UU.NET (Rick Adams) (09/07/89)

If the article is not present, then you can't cancel it.  If you can't
cancel it, then you don't forward the cancel message.

That's pretty unambiguous to me.  It's also the intent of the passage.

You are clearly violating the RFC.  There is no question about it.
If you want to ignore that passage, fine. However, don't pretend to
comply with it.

--rick

trent@.uucp (Trent MacDougall) (09/07/89)

From article <66812@uunet.UU.NET>, by rick@uunet.UU.NET (Rick Adams):
> If the article is not present, then you can't cancel it.  If you can't
> cancel it, then you don't forward the cancel message.
> 
Being an imperfect world, what about the following situation:

A small machine (A) gets a large feed, but doesn't have the room to keep
the articles around for long, so it expires some groups daily and others
every 3 days and yet others every 5 days.  This small site feeds a larger
machine (B) that keeps the articles around for 2 weeks, and it too feeds
other sites.  So a cancel message comes and fails on A and never gets
forwarded to B who still has the article (and quite possibly some of the
sites B feeds).

I tend to agree with the transmitting of cancel messages to downstream
sites even if the cancel message fails. I'm no news guru, so if I messed
this up, flame me gently :-).
-- 
//_//_//_//_//  Trent MacDougall @ Dalhousie University, CS Dept.
\\_\\_\\_\\_\\  UUCP               {uunet watmath}!dalcs!trent
// // // // //  INTERNET           trent@cs.dal.ca

stevea@laiter.i88.isc.com (Steve Alexander) (09/07/89)

In article <66812@uunet.UU.NET> rick@uunet.UU.NET (Rick Adams) writes:
>If the article is not present, then you can't cancel it.  If you can't
>cancel it, then you don't forward the cancel message.
>--rick

Anyone who's been around long enough to remember ``Orphaned Response''
should know that that's a bad way to do things.  Because 3 hours later
when the article arrives, it'll never get cancelled.  Then you'll forward
it to all your downstream sites.  C news may be violating the RFC, but
its approach makes more sense in the real world.

-- 
Steve Alexander, Software Technologies Group    | stevea@i88.isc.com
Interactive Systems Corporation, Naperville, IL | ...!{sun,ico}!laidbak!stevea

henry@utzoo.uucp (Henry Spencer) (09/08/89)

In article <66812@uunet.UU.NET> rick@uunet.UU.NET (Rick Adams) writes:
>If the article is not present, then you can't cancel it.  If you can't
>cancel it, then you don't forward the cancel message.
>
>That's pretty unambiguous to me.  It's also the intent of the passage.
>You are clearly violating the RFC.  There is no question about it.

Except that what is actually done is a third possibility:  one neither
cancels it nor fails to cancel it, one arranges for cancellation in
future... which may or may not ever happen.  The language in the RFC
simply does not cover that at all.  We are "clearly violating" the RFC
only if one assumes that the RFC *must* have the answer to *any* question,
i.e. the words must be bent and reinterpreted until an answer comes out.
Real documents don't have all the answers.  The real situation is that
the RFC just doesn't address deferred cancellations.

>If you want to ignore that passage, fine. However, don't pretend to
>comply with it.

Given a narrow interpretation of that passage, and at least one assumption
that is not in the RFC, we are ignoring it (for what we consider pretty 
good reason).  Given a broader interpretation without added assumptions,
we think it can fairly be said that we are complying with it.  The RFC
could use revision to clarify this (preferably with input from people
who've dealt with the robustness issues).   Until that is done, we think
the fairest statement is that we may violate one obvious interpretation
of the law, but we do not clearly and unambiguously violate the letter
of the law itself.

(As we've said before, the standard has to be the RFC as written, not what
the authors "really had in mind" or what their own software does.)
-- 
V7 /bin/mail source: 554 lines.|     Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

jerry@olivey.olivetti.com (Jerry Aguirre) (09/08/89)

In article <1989Sep7.121502.11649@.uucp> trent@.uucp (Trent MacDougall) writes:
>A small machine (A) gets a large feed, but doesn't have the room to keep
>the articles around for long, so it expires some groups daily and others
>every 3 days and yet others every 5 days.  This small site feeds a larger
>machine (B) that keeps the articles around for 2 weeks, and it too feeds
>other sites.  So a cancel message comes and fails on A and never gets
>forwarded to B who still has the article (and quite possibly some of the
>sites B feeds).

Even if site A expires the articles it should still keep them in its
history file for at least 2 weeks.  So when the cancel arrives it should
see the entry in the history file, mark it as canceled, and forward the
message.
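
In rough Python terms the idea is something like the sketch below.  It is
a toy model only: HistorySite, its methods, and the state names are
invented for the example, and a real history file also records arrival
times so that old entries can eventually be dropped.

```python
class HistorySite:
    """Toy site whose history outlives the article files themselves."""

    def __init__(self):
        self.spool = {}     # msg_id -> article text still on disk
        self.history = {}   # msg_id -> "present", "expired" or "cancelled"

    def receive(self, msg_id, body):
        if msg_id in self.history:
            return False                    # already seen: reject duplicate
        self.spool[msg_id] = body
        self.history[msg_id] = "present"
        return True

    def expire(self, msg_id):
        if msg_id in self.spool:
            del self.spool[msg_id]          # free the disk space...
            self.history[msg_id] = "expired"  # ...but remember the id

    def cancel(self, msg_id):
        """Return True if the cancel matched history and should be forwarded."""
        if msg_id not in self.history:
            return False
        self.spool.pop(msg_id, None)
        self.history[msg_id] = "cancelled"
        return True


a = HistorySite()
a.receive("<x@a>", "text")
a.expire("<x@a>")
print(a.cancel("<x@a>"))
```

Site A no longer holds the article, yet the cancel still finds something
to act on and goes out to site B.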

Of course this again gets into the fine points of the meaning of having
the cancel succeed.  The article wasn't on the system so it couldn't
actually be removed.  But the history entry was.  The intent
seems clear to me even if the wording of the RFC is a little vague.

Clearly this is an issue of robustness versus efficiency.  Nothing
breaks if the control message is sent round the world;  we just all pay
for a few more bytes.  Just as clearly it will not always catch all the
copies of the target articles even under the kind of failures it is
touted for.  So the trade-off is the extra overhead where the cancel
catches up with its target against the extra number of places where the
cancel doesn't catch up.

brad@looking.on.ca (Brad Templeton) (09/08/89)

The only reason to forward on a cancel message for a message that you
haven't gotten yet is if you fear that:

	your downstream sites also have an alternate feed, and

	the alternate feed runs an old B news that ignores cancels
	for messages that have not arrived, and

	the alternate feed gets articles out of order, so that a
	cancel arrives before a message.


I hardly think this is worth it.
-- 
Brad Templeton, Looking Glass Software Ltd.  --  Waterloo, Ontario 519/884-7473

sartin@hplabsz.HPL.HP.COM (Rob Sartin) (09/09/89)

[If you see an earlier version of this article, it's because cancel
doesn't work.  -Rob]

Disclaimer:  it's been a while since I had to administer a news system,
I haven't read the RFC recently.  Despite that, I think I have a valid
argument.

Suppose the following:

1.  Machine A receives article X and batches it for machine B.

2.  Machine A receives cancel for article X, cancels the article and
batches it for machine B.

3.  Machine B (running an old version of B news that doesn't save
cancels), due to the way it processes batches, processes the batch with
the cancel for article X, sees that it hasn't got article X and throws
away the cancel.

4.  Machine B processes the batch with article X and forwards the
article to machines C, D, E, ....

Seems to me this isn't a very robust way to cancel articles.  If you've
watched the way articles propagate, you know that the "due to ..."
clause in 3 is easily met.  It usually happens when a bunch of batches
get queued up and have names whose alphabetical ordering is not the same
as their chronological ordering.  If you've watched the net, you also
know that many sites run old or otherwise unusual software and may not
save a cancel for an article they didn't get.
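
One hypothetical way the ordering can flip is shown in the Python sketch
below.  The batch file names are invented for the example (real names
depend on the transport), and it assumes the receiving site simply
processes its queued batches in sorted-name order.

```python
# Batches in the order machine A created them.
chronological = [
    ("batch.9", "article <x@a>"),
    ("batch.10", "cancel <x@a>"),
]

# Machine B picks them up by name; "batch.10" sorts before "batch.9".
processed = [content for name, content in sorted(chronological)]
print(processed)
```

With an unpadded sequence number, the cancel is seen first even though it
was queued second, which is all that step 3 requires.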

You don't need dropped articles or alternate paths to spoil the idea
that cancels should only be forwarded if "successful".  If I were
writing news software (which, thankfully, I am not - it would be a
wasted duplication of effort) I would forward cancels.

Rob Sartin			internet: sartin@hplabs.hp.com
Software Technology Lab 	uucp    : hplabs!sartin
Hewlett-Packard			voice	: (415) 857-7592

ambar@bloom-beacon.mit.edu (Jean Marie Diaz) (09/18/89)

   From: brad@looking.on.ca (Brad Templeton)
   Date: 8 Sep 89 06:34:05 GMT

   The only reason to forward on a cancel message for a message that you
   haven't gotten yet is if you fear that:

	   your downstream sites also have an alternate feed, and

	   the alternate feed runs an old B news that ignores cancels
	   for messages that have not arrived, and

	   the alternate feed gets articles out of order, so that a
	   cancel arrives before a message.


   I hardly think this is worth it.

Given that these conditions apply to probably 90% of the NNTP-speaking
machines on the Internet, I think it is very much worth it.


				 AMBAR
ambar@bloom-beacon.mit.edu		   {mit-eddie,uunet}!bloom-beacon!ambar