[news.admin] Suggested Improvements to C News Error Handling

mathew@mantis-consultants.co.uk (mathew) (06/22/91)

                PLEASE HOLD OFF THE KILLFILE FOR A MOMENT.

I get the impression that many people still feel I am being unreasonable,
believe that I am not proposing any solution to the problem, or think that my
solution is some naive mail-based method which would fall over horribly.

I have therefore put together a short document describing exactly what
modifications I feel should be made to C News's handling of errors in news
articles.

Please send all suggested improvements, clarifications and corrections to me
so I can update the file.

Discussions of whether the system I am proposing will work should go to
news.software.b.  (Hopefully now people will be able to debate the technical
merits of what I am actually proposing, rather than just dismissing various
straw-men.)

Please read the document before flaming.  Apologies if bits of it seem
patronizing or excessively obvious, but I'm trying to stamp out as many
misunderstandings as possible in one short article.


IMPROVED ERROR HANDLING FOR C NEWS                           1991-06-21 18:15
==================================                           ================

PROPOSAL
========

Firstly, a newsgroup news.errors should be created for error reports.  Then,
C News should be modified to use the following algorithm for reporting
errors:

1. Parse article date.  If it is syntactically invalid, go to step 5.

2. If parsed date is too far in the past, discard the article silently,
   writing a message in the log file.  Seek to next article and go to step 1.

3. Check remainder of article header.  If it is syntactically invalid, go to
   step 5.

4. Forward the article as normal.  Seek to the next article and go to step 1.

5. Report errors according to the following procedure:

5.1. Parse the Message-ID.  If it is syntactically invalid, discard the
     article silently, writing a message in the log file.  Seek to the next
     article and go to step 1.

5.2. If the parsed Message-ID begins "<error", discard the article silently,
     writing a message in the log file.  Seek to the next article and go to
     step 1.

5.3. Check the Newsgroups line.  If it cannot be parsed, go to step 5.5.

5.4. Check the parsed Newsgroups line for the group "news.errors".  If the
     string is found, discard the article silently, writing a message in the
     log file.  Seek to the next article and go to step 1.

5.5. Take the Message-ID of the bad article and replace the opening "<" with
     "<error". Create an error report article with this new Message-ID.

5.6. Write an error report in the error article, and post the article to the
     newsgroup news.errors.

5.7. Discard the bad article.  Write a message in the log file.  Seek to the
     next article.  Go to step 1.
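
As an illustration only, the control flow of steps 1 to 5.7 can be sketched
in Python as below.  This is not C News code: the function names, the 14-day
staleness cutoff, the simplified Message-ID pattern and the list of required
header fields are all assumptions made for the example.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

MAX_AGE = timedelta(days=14)  # assumed cutoff for "too far in the past"
MSGID_RE = re.compile(r"^<[^<>\s@]+@[^<>\s@]+>$")  # deliberately simplified
GROUP_RE = re.compile(r"^[a-z0-9+_.-]+$")
REQUIRED = ("Path", "From", "Newsgroups", "Subject", "Message-ID")  # assumed


def parse_date(value):
    """Step 1: return an aware datetime, or None if the date is invalid."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:  # no zone given: treat as GMT
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def parse_newsgroups(value):
    """Step 5.3: return the list of groups, or None if unparseable."""
    if not value:
        return None
    groups = [g.strip() for g in value.split(",")]
    return groups if all(GROUP_RE.match(g) for g in groups) else None


def header_ok(headers):
    """Step 3, in its laxest form: only check that key fields exist."""
    return all(headers.get(name) for name in REQUIRED)


def classify(headers, now):
    """Return what should happen to one article under the proposal."""
    date = parse_date(headers.get("Date"))
    if date is not None:
        if now - date > MAX_AGE:
            return "drop-stale"            # step 2
        if header_ok(headers):
            return "forward"               # step 4
    # Step 5: the date or the rest of the header was invalid.
    message_id = headers.get("Message-ID", "")
    if not MSGID_RE.match(message_id):
        return "drop-bad-message-id"       # step 5.1
    if message_id.startswith("<error"):
        return "drop-error-loop"           # step 5.2
    groups = parse_newsgroups(headers.get("Newsgroups"))
    if groups is not None and "news.errors" in groups:
        return "drop-error-loop"           # step 5.4
    return "report"                        # steps 5.5 to 5.7
```

Every outcome other than "forward" would also write a line to the log file,
which the sketch leaves out.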


NOTES ON THE ALGORITHM
======================

Step 2 is essential, and ensures that the net continues to be protected from
stale news.  Since parseable but old articles are still discarded silently,
stale news will *not* (in general) cause the net to be flooded with error
reports.

Step 3 can be as lax or as pedantic as the implementors of C News desire.  If
speed is a priority, then C News should probably only check those parts of
the header which it needs in order to process the article.  If purity is
considered more important, then a more pedantic analysis can be performed.
(Perhaps the choice between "fast" and "pedantic" could be an option when
building C News?)

For valid articles, steps 1 to 4 will take no more time than C News currently
requires, so for them the new process is just as quick as the existing C News.

Step 5.1 is essential.  If the article does not have a valid Message-ID, then
we cannot ensure that duplicate error reports will be deleted by the news
transport mechanisms.  In view of this, perhaps "Bad message-ID" errors
should be placed in a separate log file and drawn to the attention of the
system administrator, since human intervention will be required to deal with
reporting them to the source site.

Steps 5.2 and 5.4 are to help ensure that error reports cannot themselves get
mangled, causing more error reports which get mangled, causing more... These
steps should help prevent the error reporting mechanism from going haywire.

Step 5.5 could be varied, but *MUST* be standardized across all copies of C
News.  The precise method used to produce the new Message-ID should probably
be mandated by an RFC.

Step 5.6 should probably also be standardized.  The format of the error
report should ideally be machine-readable as well as human-readable.  The
error report should probably include selected fields from the header of the
bad article -- the Path, Message-ID, From field, Subject and Newsgroups lines
-- and an indication of what the error was that C News detected.

For example:

    Path: ukc!slxsys!ibmpcug!news
    Subject: Article <1991Jun17.185953.26845@zoo.toronto.edu> discarded
    Date: 17 Jun 91 21:30:54 GMT
    Newsgroups: news.errors
    From: usenet@ibmpcug.co.uk
    Message-ID: <error1991Jun17.185953.26845@zoo.toronto.edu>

    | Path: mwowm!mantis!ibmpcug!slxsys!ukc!mcsun!zoo!henry
    | From: henry@zoo.toronto.edu (Henry Spencer)
    | Newsgroups: news.software.b
    | Subject: Re: IMPORTANT: Users of Rodney's UUCP modules / GUS
    | Message-ID: <1991Jun17.185953.26845@zoo.toronto.edu>
    | Date: 17 Jun 91 18:59:53 GMT

    The above article was discarded by ibmpcug.co.uk for the following
    reason:

    Header contained non-header line.
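
A report in the format of the example above could be produced along these
lines.  This is a rough sketch, not a proposed standard: the function and
parameter names are invented for the illustration, and the layout simply
copies the example.

```python
QUOTED_FIELDS = ("Path", "From", "Newsgroups", "Subject", "Message-ID", "Date")


def error_message_id(message_id):
    """Step 5.5: replace the opening "<" with "<error"."""
    if not message_id.startswith("<"):
        raise ValueError("not a bracketed Message-ID: %r" % message_id)
    return "<error" + message_id[1:]


def format_error_report(bad, reason, site, path, sender, date):
    """Step 5.6: build the text of the error article for news.errors.

    `bad` maps header names of the discarded article to their values.
    """
    header = [
        "Path: " + path,
        "Subject: Article %s discarded" % bad["Message-ID"],
        "Date: " + date,
        "Newsgroups: news.errors",
        "From: " + sender,
        "Message-ID: " + error_message_id(bad["Message-ID"]),
    ]
    # Quote the selected fields of the bad article, skipping absent ones.
    quoted = ["| %s: %s" % (name, bad[name])
              for name in QUOTED_FIELDS if name in bad]
    body = [
        "The above article was discarded by %s for the following" % site,
        "reason:",
        "",
        reason,
    ]
    return "\n".join(header + [""] + quoted + [""] + body) + "\n"
```

Because the new Message-ID depends only on the bad article's Message-ID,
every site that rejects the same article produces the same identifier.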


During step 5.7, the news software may optionally emit a self-satisfied
chuckle.


SOME LIKELY OBJECTIONS ANSWERED
===============================

1. "It's the posting software's fault if a bad article gets out.  Why should
    we have to deal with it?"

The fewer pieces of buggy software there are on the net, the better.  People
will not fix their software unless they know that it is faulty.  Therefore,
when their faulty software generates bad articles, it is important that this
fact is reported to them.  C News currently relies on the competence and
goodwill of every single C News administrator; this has been shown to be an
insufficiently reliable method of reporting errors.


2. "All this error reporting will waste my processor time."

Those who defend C News's current practice of simply dropping articles on the
floor, with no attempt to send an error report, assure us that only a tiny,
insignificant fraction of the articles posted to the net are lost in this
way.

If this statement is true, then only an insignificant number of articles will
trigger the error-report mechanism. The rest of the time, C News will be just
as fast as before.


3. "All this error reporting will waste net bandwidth."

Because all the error reports caused by a bad article have the same
Message-ID, the normal news transport mechanisms will ensure that each site
passes on exactly one copy of each error message to each site it feeds. The
worst possible case is that a site obtains one copy of an error report from
each site which feeds it.

The error reports themselves are shorter than the bad articles.  Therefore
the bandwidth consumed is significantly less than would be consumed if the
bad articles were allowed to pass.

It isn't less than would be consumed if the bad articles were silently
dropped as at present, but many people view that as an unacceptable solution
to the problem.  If we're allowed the luxury of unacceptable solutions, the
best way to prevent bad articles is to unplug your modem...
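
The duplicate-suppression argument can be checked with a toy model.  The
sketch below assumes a simplified flooding scheme in which every site keeps a
set of Message-IDs it has seen; the class, its attributes and the three-site
topology are invented for the illustration.

```python
class Site:
    """A news site that remembers every Message-ID it has accepted."""

    def __init__(self, name):
        self.name = name
        self.history = set()   # Message-IDs already seen
        self.feeds = []        # downstream sites
        self.offered = 0       # copies offered to this site
        self.accepted = 0      # copies actually stored

    def offer(self, message_id):
        """Accept the article once and pass it on; refuse duplicates."""
        self.offered += 1
        if message_id in self.history:
            return False
        self.history.add(message_id)
        self.accepted += 1
        for peer in self.feeds:
            peer.offer(message_id)
        return True


a, b, c = Site("a"), Site("b"), Site("c")
a.feeds = [b, c]
b.feeds = [a, c]
report_id = "<error1991Jun17.185953.26845@zoo.toronto.edu>"
a.offer(report_id)   # site a generates the report
b.offer(report_id)   # site b generates an identical one
```

In this model site c is offered the report twice, once by each site which
feeds it, but stores and would forward only one copy.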


4. "What about when old news is barfed onto the net?"

It is detected and deleted in step 2, just like at present.


5. "What if the barf of old news has all been made syntactically invalid?"

This is a more subtle problem.  See "REFINEMENTS" below.


6. "What about the bad articles you still end up dropping silently?"

No error-reporting system is perfect.  However, this system is better than
the existing system.  It drops articles only when reporting the error would
put the net at risk.


7. "Why should anyone want to carry news.errors?"

The newsgroup can also be used for reporting problems detected by humans --
such as the recent Fidonet news barf.


8. "If every site on the net mails you an error report, your mailbox will be
flooded!"

You really haven't been paying attention, have you?


REFINEMENTS TO THE SYSTEM
=========================

REFINEMENT #1: Add mail-based error reporting.
----------------------------------------------

"But if every site on the net mails you..."  "Sshhh.  Listen a minute..."

Once news.errors has been set up and error reports are being generated, one
or two sites can volunteer to act as mail-based error reporting sites.  At
the end of each week, they can examine the articles in news.errors, and mail
one copy of the appropriate error reports to each site mentioned as the
source of a bad article.

For example, if in a given week site luser.foovax.com produced 15 bad
articles, usenet@luser.foovax.com would be sent a mail message containing a
summary of the 15 error reports, and suggesting that the administrator look in
news.errors for further information.
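
A volunteer site's weekly run might look roughly like this.  The sketch
assumes that the source site is the host part of the bad article's Message-ID
and that "usenet" is the right mailbox there; both are guesses made for the
example, as are the function names.  It only builds the messages and does not
send any mail.

```python
from collections import defaultdict


def source_site(message_id):
    """Assumed rule: the source site is the host part of the Message-ID."""
    return message_id.strip("<>").rpartition("@")[2].lower()


def weekly_digests(bad_message_ids):
    """Return {recipient: mail body}, one entry per source site."""
    by_site = defaultdict(list)
    for message_id in bad_message_ids:
        by_site[source_site(message_id)].append(message_id)
    digests = {}
    for site, ids in by_site.items():
        lines = [
            "Your site produced %d bad article(s) this week:" % len(ids),
            "",
        ]
        lines += sorted(ids)
        lines += ["", "See the newsgroup news.errors for further information."]
        digests["usenet@" + site] = "\n".join(lines)
    return digests
```

However many reports a site causes, it receives one message per week from
each volunteer site.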


REFINEMENT #2: Add local mail-based error reporting.
----------------------------------------------------

Sites not short of CPU time can examine incoming error reports destined for
the news.errors newsgroup.  The news software can check whether the site
named in each report is the local site.  If so, it can send its system
administrator a warning note.
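
As a sketch of that check, assuming reports quote the bad article's header
with a leading "| " as in the example report, and assuming the site is
identified by the host part of the quoted Message-ID (both function names
are invented):

```python
def quoted_headers(report_body):
    """Collect the "| Name: value" lines quoted from the bad article."""
    headers = {}
    for line in report_body.splitlines():
        line = line.strip()
        if line.startswith("| "):
            name, sep, value = line[2:].partition(": ")
            if sep:
                headers[name] = value
    return headers


def concerns_local_site(report_body, local_site):
    """True if the quoted Message-ID was issued by local_site."""
    message_id = quoted_headers(report_body).get("Message-ID", "")
    host = message_id.strip("<>").rpartition("@")[2]
    return host.lower() == local_site.lower()
```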


REFINEMENT #3: Add a history file for errors.
---------------------------------------------

Steps 5.5 to 5.7 of the original algorithm are modified as follows:

5.5. Write the Message-ID of the bad article to the error history file, along
     with any header fields from the bad article which will be required for
     the error report.

5.6. Discard the bad article. Write a message in the log file. Seek to the
     next article. Go to step 1.

5.7. At the end of each week:

5.7.1. Go through the error history file.  For each site mentioned, take the
       alphanumerically first Message-ID out of all those logged, and use it
       to produce a new Message-ID by replacing the first "<" with "<error".

5.7.2. Write an article with the new Message-ID to news.errors, containing a
       summary of all the errors recorded for that site.

This modification helps to reduce the number of error reports when there is a
large barf of old *corrupted* news.
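
Steps 5.7.1 and 5.7.2 might be sketched as follows.  The record format, the
rule that a site is the host part of the Message-ID, and the reading of
"alphanumerically first" as plain string ordering are all assumptions made
for the example.

```python
def site_of(message_id):
    """Assumed rule: the source site is the host part of the Message-ID."""
    return message_id.strip("<>").rpartition("@")[2].lower()


def weekly_error_articles(history):
    """Steps 5.7.1 and 5.7.2 for a list of (message_id, reason) records."""
    by_site = {}
    for message_id, reason in history:
        by_site.setdefault(site_of(message_id), []).append((message_id, reason))
    articles = []
    for site in sorted(by_site):
        entries = sorted(by_site[site])
        first_id = entries[0][0]  # "alphanumerically first" as string order
        lines = ["%d bad article(s) from %s this week:" % (len(entries), site), ""]
        lines += ["%s  %s" % entry for entry in entries]
        articles.append({
            "Message-ID": "<error" + first_id[1:],
            "Newsgroups": "news.errors",
            "Subject": "Weekly error summary for " + site,
            "Body": "\n".join(lines),
        })
    return articles
```

A barf of a thousand corrupted articles from one site then costs one summary
article per reporting site rather than a thousand reports.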


REFINEMENT #4: Control the number of error reports.
---------------------------------------------------

For those really paranoid about net bandwidth, this is the ultimate
solution...

Keep a record of how many articles have passed through news.errors in the
past seven days.  Once that number reaches a threshold value, automatically
stop propagating news.errors until the number drops below the threshold
again.

This refinement gives 100% safety in the rare event that there is a major
barf of corrupted news.
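
One way to keep that record is a sliding seven-day window of timestamps.  The
class below is only a sketch: its name and interface are invented, the
threshold is left to the administrator, and it assumes articles are presented
in time order.

```python
from collections import deque
from datetime import datetime, timedelta


class ErrorGroupThrottle:
    """Decide whether a news.errors article may still be propagated."""

    def __init__(self, threshold, window=timedelta(days=7)):
        self.threshold = threshold
        self.window = window
        self._passed = deque()  # arrival times of propagated articles

    def should_propagate(self, now):
        # Forget articles that have fallen out of the window.
        while self._passed and now - self._passed[0] >= self.window:
            self._passed.popleft()
        if len(self._passed) >= self.threshold:
            return False        # threshold reached: hold the article back
        self._passed.append(now)
        return True
```

Propagation resumes by itself once enough old entries have aged out of the
window.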


THE OFFER
=========

My only concern throughout this debate (or 'flame war', if you wish) has been
to make the net more reliable.

I hereby make the following offers:

I offer to help co-ordinate the re-writing of this document into a proper
specification or RFC, if that is deemed appropriate.

I offer to hold the vote for news.errors, if it is felt that the group needs
to go through the normal voting process.  (The alternative is to mandate the
existence of the group in an RFC.)

I offer to spend my spare time writing code to implement the above
suggestions, so long as it doesn't cost me anything.  Unfortunately, I don't
have IP connectivity and transatlantic phone calls are expensive, so this
will probably necessitate my being given access to a UNIX system somewhere in
Cambridge.  Those wishing to take me up on this part of the offer should be
warned that I don't have very much spare time!

I offer to donate all code, documents, or specifications I write or help
write to the net at large or to the authors of C News, or to place them in the
public domain.


FINAL COMMENT
=============

I apologize for any misunderstandings which may have occurred during the
news.software.b flame-fest, and I regret that I didn't write up this proposal
sooner.  We might have saved a lot of bandwidth.

===


mathew