[news.software.b] warning to all sinners in regard to current C News patches

henry@zoo.toronto.edu (Henry Spencer) (03/24/91)

Okay, all you folks using improvised kludges to gateway mail into news,
and also all of you using bizarre and substandard news posting software:
the jig is up.

The current set of C News patches, the third of which will probably appear
tonight, drastically tightens up legality checking on headers.  A lot of
sloppiness that we once tolerated will now cause articles to be dropped on
the floor.  RFC1036's required headers must be present.  Non-header lines
may not be present in headers; note in particular that there *must* be
white space after the `:' in an RFC1036 header, even if the header is empty!
The message ID must contain `@'.  The contents of the Date header must be
a legal, unambiguous date and must not deviate too far from the format
recommended in RFC1036; in particular, obscenities like "11/12/91" are
outlawed.  The date can't be too far in the future; there is currently a
special one-day dispensation for the benefit of those whose software
gives local time but attaches "GMT" to it, but this is strictly temporary.
And in general, it's No More Mister Nice Guy when it comes to headers.

Time to get out a copy of RFC1036 and fix your software.
-- 
"[Some people] positively *wish* to     | Henry Spencer @ U of Toronto Zoology
believe ill of the modern world."-R.Peto|  henry@zoo.toronto.edu  utzoo!henry
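
[Illustration, not C News code.] The header rules listed above can be sketched roughly as follows, assuming the header block has already been split into lines with the newline removed. The function name check_headers and the wording of the complaints are invented for this example; the list of required headers is the one in RFC 1036 section 2.1.

```python
import re

# Headers that RFC 1036 section 2.1 lists as required.
REQUIRED = ("From", "Date", "Newsgroups", "Subject", "Message-ID", "Path")

# A header line: a name without spaces or colons, a colon, then white space.
HEADER_LINE = re.compile(r"^([!-9;-~]+):[ \t]")


def check_headers(lines):
    """Return a list of complaints about an article's header lines.

    Hypothetical helper: the real checks live in C News relaynews.
    """
    problems = []
    seen = {}
    for line in lines:
        if line[:1] in (" ", "\t") and seen:
            continue  # continuation of the previous header
        match = HEADER_LINE.match(line)
        if match is None:
            problems.append("non-header line: %r" % line)
            continue
        seen[match.group(1).lower()] = line[match.end():].strip()
    for name in REQUIRED:
        if name.lower() not in seen:
            problems.append("no %s: header" % name)
    if "message-id" in seen and "@" not in seen["message-id"]:
        problems.append("no @ in Message-ID")
    return problems
```

A line such as "References:<1@site>" is reported as a non-header line because nothing separates the colon from the value.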

heiko@methan.chemie.fu-berlin.de (Heiko Schlichting) (03/25/91)

henry@zoo.toronto.edu (Henry Spencer) writes:

>The current set of C News patches, the third of which will probably appear
>tonight, drastically tightens up legality checking on headers.  
[...]
>The contents of the Date header must be
>a legal, unambiguous date and must not deviate too far from the format
>recommended in RFC1036; in particular, obscenities like "11/12/91" are
>outlawed.  The date can't be too far in the future; there is currently a
>special one-day dispensation for the benefit of those whose software
>gives local time but attaches "GMT" to it, but this is strictly temporary.

Oh...I'm afraid there will be *much* trouble with worldwide news in the
near future. Has this new "feature" been tested in Europe and with a lot of
timezones? Has it been tested with news from GMT-12 to GMT+12 and GMT-23
to T23-1D23 (used in AIX 3.1)?
What about the European timezones "MET", "MEZ", "MESZ"... ?
It is not nice to forget Europe (and most of the other timezones outside
USA) in the RFC and I hope Cnews is still more international.

I hope the new Cnews is beta-tested in Europe...

Bye, Heiko.
-- 
 |~|    Heiko Schlichting                   | Freie Universitaet Berlin 
 / \    heiko@fub.uucp                      | Institut fuer Organische Chemie
/FUB\   heiko@methan.chemie.fu-berlin.de    | Takustrasse 3
`---'   phone +49 30 838-2677; fax ...-5163 | D-1000 Berlin 33  Germany

geoff@world.std.com (Geoff Collyer) (03/25/91)

Henry Spencer:
>>The contents of the Date header must be a legal, unambiguous date and
>>must not deviate too far from the format recommended in RFC1036; in
>>particular, obscenities like "11/12/91" are outlawed.

Heiko Schlichting:
>What about the European timezones "MET", "MEZ", "MESZ"... ?
>It is not nice to forget Europe (and most of the other timezones outside
>USA) in the RFC and I hope Cnews is still more international.

No need for paranoia :-); we haven't forgotten Europe (and it's not nice
to forget the rest of the Western Hemisphere, the same timezones that
apply to the US also apply to Canada, Mexico and South America :-) :-) or
the rest of the world.  We have a large table of world-wide timezones in
our date parsers.  (There are a few minor errors in the current table, but
that is to be expected in timezone tables, alas.)

However, non-numeric timezone names are a botch:  they are ambiguous and
people seem to just invent new abbreviations at will (Q: how many
timezones claim the name "CST"? "EST"?).  The less said about the mess in
the USSR and daylight savings time in general, the better.  Timezones in
Date:  headers should either be "GMT" (*strongly* preferred) or numeric
(+hhmm or -hhmm [e.g. -0500, +0100]), per RFCs 1036 (netnews) and 1123
(hosts requirements, which updates 822 [mail format], which is cited by
1036).  US military timezones are right out.  (While you are at it, make
sure the year in any date is given as four digits, per RFC 1123; 2000 is
only nine years away.)
-- 
Geoff Collyer		world.std.com!geoff, uunet.uu.net!geoff
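
[Illustration, not C News code.] A minimal sketch of the recommendation above, accepting only "GMT" or a numeric +hhmm/-hhmm zone. It is deliberately stricter than the C News date parsers, which the post says carry a large table of zone names. The function name zone_offset_minutes is made up for this example.

```python
import re

# Numeric zone as in the post: a sign followed by hhmm.
NUMERIC_ZONE = re.compile(r"[+-](\d\d)(\d\d)")


def zone_offset_minutes(zone):
    """Return the offset from GMT in minutes, or None if the zone is
    neither "GMT" nor a numeric +hhmm/-hhmm offset."""
    if zone == "GMT":
        return 0
    match = NUMERIC_ZONE.fullmatch(zone)
    if match is None:
        return None  # alphabetic names such as "CST" are ambiguous
    hours, minutes = int(match.group(1)), int(match.group(2))
    if minutes > 59:
        return None
    sign = -1 if zone[0] == "-" else 1
    return sign * (hours * 60 + minutes)
```

With this, "-0500" gives -300 and "+0100" gives 60, while "MET" and "CST" are refused rather than guessed at.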

henry@zoo.toronto.edu (Henry Spencer) (03/25/91)

In article <9ICP05C@methan.chemie.fu-berlin.de> admin@methan.chemie.fu-berlin.de writes:
>>The contents of the Date header must be
>>a legal, unambiguous date and must not deviate too far from the format
>>recommended in RFC1036...
>
>Oh...I'm afraid there will be *much* trouble with worldwide news in the
>near future...

I don't think so.  We've checked the new stuff against existing news; only
a bare handful of legitimate dates fail to pass it.  Virtually everybody is
doing the right thing and supplying dates in RFC1036 format, and even the
timezones usually aren't too messed-up.
-- 
"[Some people] positively *wish* to     | Henry Spencer @ U of Toronto Zoology
believe ill of the modern world."-R.Peto|  henry@zoo.toronto.edu  utzoo!henry

wisner@ims.alaska.edu (Bill Wisner) (03/25/91)

>However, non-numeric timezone names are a botch:  they are ambiguous and
>people seem to just invent new abbreviations at will (Q: how many
>timezones claim the name "CST"? "EST"?).  The less said about the mess in
>the USSR and daylight savings time in general, the better.

I'm rather fond of the way Sun arbitrarily decided that the abbreviation
for "Alaska Standard Time" is AKST in SunOS 4.1.  We use AST.  But then,
every other UNIX I've seen (and older SunOSes) uses YST for Yukon Standard
Time, a time zone designation that no longer exists.

>							     Timezones in
>Date:  headers should either be "GMT" (*strongly* preferred) or numeric
>(+hhmm or -hhmm [e.g. -0500, +0100]), per RFCs 1036 (netnews) and 1123
>(hosts requirements, which updates 822 [mail format], which is cited by
>1036).

Strongly seconded.

Bill Wisner <wisner@ims.alaska.edu> Gryphon Gang Fairbanks AK 99775
bnug, dude
yeah
.

rdc30@nmrdc1.nmrdc.nnmc.navy.mil (LCDR Michael E. Dobson) (03/27/91)

In article <1991Mar24.220537.14059@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <9ICP05C@methan.chemie.fu-berlin.de> admin@methan.chemie.fu-berlin.de writes:
>>>The contents of the Date header must be
>>>a legal, unambiguous date and must not deviate too far from the format
>>>recommended in RFC1036...
>>
>>Oh...I'm afraid there will be *much* trouble with worldwide news in the
>>near future...
>
>I don't think so.  We've checked the new stuff against existing news; only
>a bare handful of legitimate dates fail to pass it.  Virtually everybody is
>doing the right thing and supplying dates in RFC1036 format, and even the
>timezones usually aren't too messed-up.

Well I think I've got a problem with one of my leaf sites then.  The following
date is unparsable according to Cnews patch 24Mar91:

Mon Mar 25 20:40:20 1991 MET +0100

According to RFC1036 2.1.2  Date

this date "is not acceptable because it is not a valid RFC-822 date.  However,
since older software still generates this format, news implementations are
encouraged to accept this format and translate it into an acceptable format."

Am I correct in assuming that C-News will no longer tolerate this "sin"?  If
so I will inform this site to correct his software which is probably a good
idea in any event.
>-- 
>"[Some people] positively *wish* to     | Henry Spencer @ U of Toronto Zoology
>believe ill of the modern world."-R.Peto|  henry@zoo.toronto.edu  utzoo!henry


-- 
Mike Dobson, Sys Admin for      | Internet: rdc30@nmrdc1.nmrdc.nnmc.navy.mil
nmrdc1.nmrdc.nnmc.navy.mil      | UUCP:   ...uunet!mimsy!nmrdc1!rdc30
AT&T 3B2/600G Sys V R 3.2.2     | BITNET:   dobson@usuhsb or nrd0mxd@vmnmdsc
WIN/TCP for 3B2                 | MCI-Mail: 377-2719 or 0003772719@mcimail.com

henry@zoo.toronto.edu (Henry Spencer) (03/28/91)

In article <1991Mar27.165259.28073@nmrdc1.nmrdc.nnmc.navy.mil> rdc30@nmrdc1.nmrdc.nnmc.navy.mil (LCDR Michael E. Dobson) writes:
>I just noticed that C-News itself does not generate RFC822/1036 compliant dates
>because it uses the form:
>
>	Wdy Mon DD HH:MM:SS YYYY
>
>at least that is what's in the Date: field of several articles I just posted.

I think you may need to investigate your copy of C News, because that's not
what it's supposed to be supplying, if it's current.  The current version
will even try to canonicalize a date supplied by the news reader/poster.
No way will you get a date out of it that doesn't at least have a comma
in it, even if your "date" command is producing some totally weird output,
and if "date" is producing normal format, you should get a timezone too.
-- 
"[Some people] positively *wish* to     | Henry Spencer @ U of Toronto Zoology
believe ill of the modern world."-R.Peto|  henry@zoo.toronto.edu  utzoo!henry

henry@zoo.toronto.edu (Henry Spencer) (03/28/91)

In article <1991Mar27.152155.27218@nmrdc1.nmrdc.nnmc.navy.mil> rdc30@nmrdc1.nmrdc.nnmc.navy.mil (LCDR Michael E. Dobson) writes:
>The following date is unparsable according to Cnews patch 24Mar91:
>
>Mon Mar 25 20:40:20 1991 MET +0100

This is not a valid RFC1036 date because it is neither ctime(3) format
(which 1036 encourages you to accept for backward compatibility) nor
RFC 822 format.  You can't specify both an alphabetic timezone and
a numeric timezone; it has to be one or the other.

>"since older software still generates this format, news implementations are
>encouraged to accept this format and translate it into an acceptable format."
>
>Am I correct in assuming that C-News will no longer tolerate this "sin"?

The quoted phrase refers to ctime(3) format, which this date isn't.

However, your conclusion -- that C News will not tolerate this date --
is correct.
-- 
"[Some people] positively *wish* to     | Henry Spencer @ U of Toronto Zoology
believe ill of the modern world."-R.Peto|  henry@zoo.toronto.edu  utzoo!henry
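
[Illustration, not C News code.] The distinction drawn above can be sketched with two patterns, one for the RFC 822 shape (with the 2- or 4-digit year) and one for ctime(3). This only looks at the shape of the string; it does not check that the day, hour or zone name is sensible, and the zone pattern accepts any single alphabetic token. The name classify is invented for this example.

```python
import re

DAY = r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)"
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
TIME = r"\d\d:\d\d(?::\d\d)?"
ZONE = r"(?:[A-Za-z]+|[+-]\d{4})"  # exactly one zone, alphabetic or numeric

# Wdy, DD Mon YY[YY] HH:MM:SS TIMEZONE (day of week optional)
RFC822 = re.compile(
    r"(?:%s,\s+)?\d{1,2}\s+%s\s+(?:\d{4}|\d{2})\s+%s\s+%s"
    % (DAY, MONTH, TIME, ZONE)
)
# Wdy Mon DD HH:MM:SS YYYY
CTIME = re.compile(
    r"%s\s+%s\s+\d{1,2}\s+\d\d:\d\d:\d\d\s+\d{4}" % (DAY, MONTH)
)


def classify(date):
    """Return "rfc822", "ctime" or "neither" for a Date: header value."""
    text = date.strip()
    if RFC822.fullmatch(text):
        return "rfc822"
    if CTIME.fullmatch(text):
        return "ctime"
    return "neither"
```

The date from the leaf site, "Mon Mar 25 20:40:20 1991 MET +0100", falls into "neither": the trailing zones stop it being a ctime, and it does not have the RFC 822 field order.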

ronald@robobar.co.uk (Ronald S H Khoo) (03/28/91)

henry@zoo.toronto.edu (Henry Spencer) writes:

> admin@methan.chemie.fu-berlin.de writes:
> >>The contents of the Date header must be
> >>a legal, unambiguous date and must not deviate too far from the format
> >>recommended in RFC1036...
> >
> >Oh...I'm afraid there will be *much* trouble with worldwide news in the
> >near future...
> 
> I don't think so.  We've checked the new stuff against existing news; only
> a bare handful of legitimate dates fail to pass it.

Are you sure that your test is valid ?  Heiko's fears are related to
European news.  I suspect that by the time most of it has gone through
the European backbones + UUNET to reach Canada, the Date: field has
probably been rewritten in terms of GMT by 2.11, which is what the
backbones tend to run, no ? 

-- 
Ronald Khoo <ronald@robobar.co.uk> +44 81 991 1142 (O) +44 71 229 7741 (H)

geoff@world.std.com (Geoff Collyer) (03/28/91)

LCDR Michael E. Dobson:
>Well I think I've got a problem with one of my leaf sites then.  The following
>date is unparsable according to Cnews patch 24Mar91:
>
>Mon Mar 25 20:40:20 1991 MET +0100
>
>According to RFC1036 2.1.2  Date
>
>this date "is not acceptable because it is not a valid RFC-822 date.  However,
>since older software still generates this format, news implementations are
>encouraged to accept this format and translate it into an acceptable format."

I think you are misreading RFC 1036.  A more complete quotation is:

2.1.2.  Date

    The "Date" line (formerly "Posted") is the date that the message was
    originally posted to the network.  Its format must be acceptable
    both in RFC-822 and to the getdate(3) routine that is provided with
    the Usenet software.  This date remains unchanged as the message is
    propagated throughout the network.  One format that is acceptable to
    both is:

                      Wdy, DD Mon YY HH:MM:SS TIMEZONE

    Several examples of valid dates appear in the sample message above.
    Note in particular that ctime(3) format:

                          Wdy Mon DD HH:MM:SS YYYY

    is not acceptable because it is not a valid RFC-822 date.  However,
    since older software still generates this format, news
    implementations are encouraged to accept this format and translate
    it into an acceptable format.

    There is no hope of having a complete list of timezones.  Universal
    Time (GMT), the North American timezones (PST, PDT, MST, MDT, CST,
    CDT, EST, EDT) and the +/-hhmm offset specifed in RFC-822 should be
    supported.  It is recommended that times in message headers be
    transmitted in GMT and displayed in the local time zone.

Note that the date format referred to in the phrase you quoted is Unix
ctime format, and note that the sample unparsable date you give is neither
an RFC 822 date nor a ctime.  It is not a ctime because of the presence
of the (redundant) time zone at the end.

>Am I correct in assuming that C-News will no longer tolerate this "sin"?  If
>so I will inform this site to correct his software which is probably a good
>idea in any event.

C inews now accepts a wide range of date formats in Date: headers
(all-numeric dates are *not* acceptable, but then they have always been
ambiguous:  what will 1/2/3 mean in the next century [ten years from
now]?  in Europe? the US? Canada?).  inews will rewrite the contents of
the Date: header into an RFC 822 date in GMT (UTC, UT, whatever).  C
relaynews now expects Date: header contents to be strict RFC 822 dates:
no redundant timezones, no all-numeric dates, no ctimes, no funny
invented-here formats.

(Expires: headers are currently still parsed by getdate.y and can be any
darn thing you like, just beware that what you write may not be what
getdate understands.  We hope to stop using getdate.y eventually.)
-- 
Geoff Collyer		world.std.com!geoff, uunet.uu.net!geoff
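
[Illustration, not the inews implementation.] A rough sketch of rewriting a ctime(3) string into an RFC 822 date in GMT with a four-digit year. A ctime string carries no zone, so the caller has to supply the local offset; the parameter offset_minutes and the function name are assumptions made for this example. The month and day names are parsed with strptime, which assumes the default C locale.

```python
from datetime import datetime, timedelta


def ctime_to_rfc822_gmt(ctime_text, offset_minutes=0):
    """Convert "Wdy Mon DD HH:MM:SS YYYY" to "Wdy, DD Mon YYYY HH:MM:SS GMT".

    offset_minutes is the local zone's offset east of GMT (hypothetical
    parameter; ctime itself does not say which zone it is in).
    """
    # Collapse runs of blanks so that "Mar  3" parses like "Mar 3".
    normalized = " ".join(ctime_text.split())
    local = datetime.strptime(normalized, "%a %b %d %H:%M:%S %Y")
    gmt = local - timedelta(minutes=offset_minutes)
    # The weekday is recomputed from the date, so it stays right even
    # when the conversion crosses midnight.
    return gmt.strftime("%a, %d %b %Y %H:%M:%S GMT")
```

For example, "Mon Mar 25 20:40:20 1991" with an offset of 60 minutes becomes "Mon, 25 Mar 1991 19:40:20 GMT".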

peter@taronga.hackercorp.com (Peter da Silva) (03/28/91)

henry@zoo.toronto.edu (Henry Spencer) writes:
> Okay, all you folks using improvised kludges to gateway mail into news,
> and also all of you using bizarre and substandard news posting software:
> the jig is up.

Well, I'm gonna hold off on installing the latest patches until they get
some sort of track record...

> there *must* be white space after the `:' ...
> the Date header must be a legal, unambiguous date ...
> Time to get out a copy of RFC1036 and fix your software.

"Be liberal about what you receive, conservative about what you generate"
-- 
               (peter@taronga.uucp.ferranti.com)
   `-_-'
    'U`

src@scuzzy.in-berlin.de (Heiko Blume) (03/28/91)

henry@zoo.toronto.edu (Henry Spencer) writes:

>In article <1991Mar27.165259.28073@nmrdc1.nmrdc.nnmc.navy.mil> rdc30@nmrdc1.nmrdc.nnmc.navy.mil (LCDR Michael E. Dobson) writes:
>>I just noticed that C-News itself does not generate RFC822/1036 compliant dates
>>because it uses the form:
>>
>>	Wdy Mon DD HH:MM:SS YYYY
>>
>>at least that is what's in the Date: field of several articles I just posted.

>I think you may need to investigate your copy of C News, because that's not
>what it's supposed to be supplying, if it's current.  The current version
>will even try to canonicalize a date supplied by the news reader/poster.
>No way will you get a date out of it that doesn't at least have a comma
>in it, even if your "date" command is producing some totally weird output,
>and if "date" is producing normal format, you should get a timezone too.

possible danger ahead!
for example ISC, which has support for international stuff, enables you
to modify date's output with files in /lib/cftype/. that means here
date may put out stuff like

Dienstag, der 3. Maerz 1991 (13:00:00)

which is very convenient for german users. it sure has a comma, but that's
just for fun. i haven't investigated this further yet, but i thought
i'd mention it.
-- 
   Heiko Blume <-+-> src@scuzzy.in-berlin.de <-+-> (+49 30) 691 88 93 [voice!]
                  public UNIX source archive [HST V.42bis]:
        scuzzy Any ACU,f 38400 6919520 gin:--gin: nuucp sword: nuucp
                     uucp scuzzy!/src/README /your/home

henry@zoo.toronto.edu (Henry Spencer) (03/29/91)

In article <FPP2HQE@taronga.hackercorp.com> peter@taronga.hackercorp.com (Peter da Silva) writes:
>> Time to get out a copy of RFC1036 and fix your software.
>
>"Be liberal about what you receive, conservative about what you generate"

We're still much more liberal than B News in most areas.  But too many
people begged and pleaded with us to tighten up header checking.  We
were being *too* liberal, and articles which did not meet the specs in
various ways were getting through and wreaking havoc.
-- 
"The stories one hears about putting up | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 are all true."  -D. Harrison|  henry@zoo.toronto.edu  utzoo!henry

flee@cs.psu.edu (Felix Lee) (03/29/91)

>"Be liberal about what you receive, conservative about what you generate"

The latest C News complies with this dictum by dropping malformed
articles.

B news complies by liberally rewriting all articles it receives, since
each article, whether posted or not, gets run through the same
machinery.  This behavior has caused a number of problems, such as
duplicate articles, both harmless and painful.  Right now, there's a
complaint that some B News site is re-feeding fj.* with damaged
articles (missing ESC characters).
--
Felix Lee	flee@cs.psu.edu

rdc30@nmrdc1.nmrdc.nnmc.navy.mil (LCDR Michael E. Dobson) (03/29/91)

In article <1991Mar27.185851.9247@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <1991Mar27.165259.28073@nmrdc1.nmrdc.nnmc.navy.mil> rdc30@nmrdc1.nmrdc.nnmc.navy.mil (LCDR Michael E. Dobson) writes:
>>I just noticed that C-News itself does not generate RFC822/1036 compliant dates
>>because it uses the form:
>>
>>	Wdy Mon DD HH:MM:SS YYYY
>>
>>at least that is what's in the Date: field of several articles I just posted.
>
>I think you may need to investigate your copy of C News, because that's not
>what it's supposed to be supplying, if it's current.  The current version

I discovered almost immediately after I posted that it was the reader (trn)
displaying the date in ctime(3) format.  Using the verbose command, the
proper date format was there.  I cancelled the article but I guess not
fast enough, sigh.
-- 
Mike Dobson, Sys Admin for      | Internet: rdc30@nmrdc1.nmrdc.nnmc.navy.mil
nmrdc1.nmrdc.nnmc.navy.mil      | UUCP:   ...uunet!mimsy!nmrdc1!rdc30
AT&T 3B2/600G Sys V R 3.2.2     | BITNET:   dobson@usuhsb or nrd0mxd@vmnmdsc
WIN/TCP for 3B2                 | MCI-Mail: 377-2719 or 0003772719@mcimail.com

rdc30@nmrdc1.nmrdc.nnmc.navy.mil (LCDR Michael E. Dobson) (03/29/91)

In article <1991Mar27.190459.9384@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <1991Mar27.152155.27218@nmrdc1.nmrdc.nnmc.navy.mil> rdc30@nmrdc1.nmrdc.nnmc.navy.mil (LCDR Michael E. Dobson) writes:
>>The following date is unparsable according to Cnews patch 24Mar91:
>>
>>Mon Mar 25 20:40:20 1991 MET +0100
>
>This is not a valid RFC1036 date because it is neither ctime(3) format
>(which 1036 encourages you to accept for backward compatibility) nor
>RFC 822 format.  You can't specify both an alphabetic timezone and
>a numeric timezone; it has to be one or the other.
>
>>"since older software still generates this format, news implementations are
>>encouraged to accept this format and translate it into an acceptable format."
>>
>>Am I correct in assuming that C-News will no longer tolerate this "sin"?
>
>The quoted phrase refers to ctime(3) format, which this date isn't.
>
>However, your conclusion -- that C News will not tolerate this date --
>is correct.

Thank you for the info, I've passed it on to the author of the offending
news software, W-News.  He has a real incentive to fix it quick since I
transfer news to and from him and quite a few sites behind him in Europe.
All articles generated from these sites using W-News are being dropped on
the floor here.

One final question:  Will C-News accept a 4 digit YYYY field since RFC1036
specifies a 2 digit YY field?
-- 
Mike Dobson, Sys Admin for      | Internet: rdc30@nmrdc1.nmrdc.nnmc.navy.mil
nmrdc1.nmrdc.nnmc.navy.mil      | UUCP:   ...uunet!mimsy!nmrdc1!rdc30
AT&T 3B2/600G Sys V R 3.2.2     | BITNET:   dobson@usuhsb or nrd0mxd@vmnmdsc
WIN/TCP for 3B2                 | MCI-Mail: 377-2719 or 0003772719@mcimail.com

henry@zoo.toronto.edu (Henry Spencer) (03/29/91)

In article <1991Mar28.200952.12309@nmrdc1.nmrdc.nnmc.navy.mil> rdc30@nmrdc1.nmrdc.nnmc.navy.mil (LCDR Michael E. Dobson) writes:
>One final question:  Will C-News accept a 4 digit YYYY field since RFC1036
>specifies a 2 digit YY field?

RFC1123 (Host Requirements, part 2) amends the date syntax description in
RFC822 -- to which RFC1036 defers -- to permit 4 digits, and strongly
encourages use of 4.  C news accepts 4 and generates 4.
-- 
"The stories one hears about putting up | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 are all true."  -D. Harrison|  henry@zoo.toronto.edu  utzoo!henry

emv@ox.com (Ed Vielmetti) (03/29/91)

In article <1991Mar29.040656.2790@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:

   RFC1123 (Host Requirements, part 2) amends the date syntax description in
   RFC822 -- to which RFC1036 defers -- to permit 4 digits, and strongly
   encourages use of 4.  C news accepts 4 and generates 4.

Some site along the way mangled your 4 digit date back down to two
digits, chopping off the day of the week in the process.

-- 
 Msen	Edward Vielmetti
/|---	moderator, comp.archives
	emv@msen.com

original:
Path: news-server.csri.toronto.edu!utzoo!henry
From: henry@zoo.toronto.edu (Henry Spencer)
Subject: Re: warning to all sinners in regard to current C News patches
Date: Fri, 29 Mar 1991 04:06:56 GMT

at ox.com:
From: henry@zoo.toronto.edu (Henry Spencer)
Newsgroups: news.software.b
Date: 29 Mar 91 04:06:56 GMT
Path: ox.com!caen!uakari.primate.wisc.edu!samsung!rex!wuarchive!usc!rpi!news-server.csri.toronto.edu!utzoo!henry
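
[Illustration.] The two Date: lines above differ in form but name the same moment. One way to check that, assuming Python's standard email.utils parser is an acceptable stand-in for a news date parser (it reads a two-digit year of 91 as 1991), is sketched here; the helper name same_instant is invented.

```python
from email.utils import parsedate_to_datetime


def same_instant(date_a, date_b):
    """True if two RFC 822 style dates denote the same moment."""
    return parsedate_to_datetime(date_a) == parsedate_to_datetime(date_b)


original = "Fri, 29 Mar 1991 04:06:56 GMT"
rewritten = "29 Mar 91 04:06:56 GMT"
print(same_instant(original, rewritten))
```

So the rewrite loses the day of the week and the century, but not the time itself.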

henry@zoo.toronto.edu (Henry Spencer) (03/29/91)

In article <EMV.91Mar28232650@poe.aa.ox.com> emv@ox.com (Ed Vielmetti) writes:
>Some site along the way mangled your 4 digit date back down to two
>digits, chopping off the day of the week in the process.

It only takes one B News site in the path, alas.
-- 
"The stories one hears about putting up | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 are all true."  -D. Harrison|  henry@zoo.toronto.edu  utzoo!henry

rd@pixie.aii.com (Bob Thrush) (03/30/91)

In article <1991Mar24.035259.20738@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>
>among other things "the jig is up."

  I installed the 24Mar91 version and the tightening up has resulted in
the rejection of 84 articles in a little under 2 days.  It would be more
desirable for this version to just alert the local system administrator
about the problems rather than rejecting as well.  A future version
could provide the current tighter behavior after the offenders have had
a chance to mend their ways.

  The author of the rejected article is probably unaware that a problem
exists.  Rejecting the article won't make him/her any more aware unless
they happen to be posting on a newly upgraded C news machine.

  FYI, here's a summary of the rejections by this site:

Count Reason
----- ------
  75  article "header" contains non-header line
   5  no @ in Message-ID
   2  no From: header
   2  older than 30 days


  I investigated the `article "header" contains non-header line' reason
by asking an upstream feed to look at the headers.  Every rejected
article appeared to be missing the space following the colon.

  I would like to have a patch to the 24Mar91 version that maintains
the new tighter header checking but does *not* remove the offending
article.

  (At first glance, it looks like adding some return statements in a
few judicious places to relay/procart.c:reject() would provide the
alert without rejecting the article.  Could it be that simple?)
-- 
Bob Thrush rd@aii.com
Automation Intelligence,Inc.,1200 W. Colonial Drive, Orlando, FL 32804

henry@zoo.toronto.edu (Henry Spencer) (03/31/91)

In article <RD.91Mar29140206@pixie.aii.com> rd@pixie.aii.com (Bob Thrush) writes:
>... It would be more
>desirable for this version to just alert the local system administrator
>about the problems rather than rejecting as well.  A future version
>could provide the current tighter behavior after the offenders have had
>a chance to mend their ways.

There were various possibilities for how to go about this, but it always
seemed to boil down to the hard, cold fact that people seldom fix their
news software until they are forced to.  Alerts just don't help much.

>  The author of the rejected article is probably unaware that a problem
>exists.  Rejecting the article won't make him/her any more aware unless
>they happen to be posting on a newly upgraded C news machine.

An unfortunate problem.  Our feeling, though, is that the problem is
usually not with the article's author, but with the gatewaying software
the article is passing through, or the posting software being used.
It's really hard to come up with a way to get a complaint to the right
place.
-- 
"The stories one hears about putting up | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 are all true."  -D. Harrison|  henry@zoo.toronto.edu  utzoo!henry

heiko@methan.chemie.fu-berlin.de (Heiko Schlichting) (03/31/91)

rd@pixie.aii.com (Bob Thrush) writes:

>  I would like to have a patch to the 24Mar91 version that maintains
>the new tighter header checking but does *not* remove the offending
>article.

I'm afraid I need this too or I have to deinstall the newest patches.
In the last 24 hours a huge number of articles have been dropped. It looks
like some common software (only in Europe?) creates References: lines
without a space after the colon. All these articles are dropped.

Is there a simple hack or is it necessary to deinstall the complete
patches?

Thanks, Heiko.
-- 
 |~|    Heiko Schlichting                   | Freie Universitaet Berlin 
 / \    heiko@fub.uucp                      | Institut fuer Organische Chemie
/FUB\   heiko@methan.chemie.fu-berlin.de    | Takustrasse 3
`---'   phone +49 30 838-2677; fax ...-5163 | D-1000 Berlin 33  Germany

brad@looking.on.ca (Brad Templeton) (03/31/91)

In article <UHJPNUU@methan.chemie.fu-berlin.de> admin@methan.chemie.fu-berlin.de writes:
>
>I'm afraid I need this too or I have to deinstall the newest patches.
>In the last 24 hours a huge number of articles have been dropped. It looks
>like some common software (only in Europe?) creates References: lines
>without a space after the colon. All these articles are dropped.
>
>Is there a simple hack or is it necessary to deinstall the complete
>patches?
>

I think the plan is to get the people posting the non-conforming
articles to fix their software.  If you don't do this eventually, the net
will get more and more bitrot.

Actually, the best answer is to set up one central site that detects
bad articles and e-mails back to the originator.   This will get them to
fix things real quick.

It's not unjustified.  USENET can be a mishmash of software, but the one
thing everybody has to agree upon is the file format.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

peter@taronga.hackercorp.com (Peter da Silva) (03/31/91)

rd@pixie.aii.com (Bob Thrush) writes:
>   I investigated the `article "header" contains non-header line' reason
> by asking an upstream feed to look at the headers.  Every rejected
> article appeared to be missing the space following the colon.

Ah, just hack the header-reading code to insert the space after the colon
before the censor sees it.
-- 
               (peter@taronga.uucp.ferranti.com)
   `-_-'
    'U`
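
[Illustration, not a patch to C News.] The hack suggested above might look roughly like this if applied to each header line before the checks run. It is a sketch only: the function name is made up, and it must be applied to header lines only, since it would also "repair" any body line that happens to start with a word and a colon.

```python
import re

# A header name followed by a colon that is NOT followed by a blank or tab.
COLON_WITHOUT_SPACE = re.compile(r"^([!-9;-~]+):(?![ \t])")


def add_space_after_colon(line):
    """Insert the missing space after the colon of a header line.

    Lines that already have white space after the colon, and
    continuation lines (which start with white space), are unchanged.
    """
    return COLON_WITHOUT_SPACE.sub(r"\1: ", line, count=1)
```

"References:<1@site>" becomes "References: <1@site>", and an empty "Subject:" gains the trailing space that the new checks insist on.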

rli@buster.stafford.tx.us (Buster Irby) (04/03/91)

brad@looking.on.ca (Brad Templeton) writes:

>Actually, the best answer is to set up one central site that detects
>bad articles and e-mails back to the originator.   This will get them to
>fix things real quick.

If everyone is dropping the out of spec articles, how do you
propose that the offensive articles be passed along far enough
for one central site to get a look at them?  In order for this to
work, everyone would have to continue accepting the out of spec
articles for a period of time while we track down and notify the
source of the bad articles.

I have been running C-News for over a year now and have been very
happy with it up to this point.  However, I believe that this is
a very rash move, and I, for one, am not going to install these
patches until more thought has been given as to how to deal with
this problem.  Simply dropping the offensive articles on the
floor is not an acceptable answer.

henry@zoo.toronto.edu (Henry Spencer) (04/04/91)

In article <1991Apr03.033233.16633@buster.stafford.tx.us> rli@buster.stafford.tx.us writes:
>... everyone would have to continue accepting the out of spec
>articles for a period of time while we track down and notify the
>source of the bad articles.

Unfortunately, it's hard to make this work.  The combination of header
rewriting at B News sites and apathy/inertia/workload of sysadmins would
make it relatively difficult to get results this way.

The intent of the "bad articles" reports in newsdaily's output was to
achieve the same thing, on a more local level and with teeth:  sysadmins
can report to their neighbors that articles are being dropped for bad
headers.

>... Simply dropping the offensive articles on the
>floor is not an acceptable answer.

It's hard to do anything else.  Bouncing them back to the author is not
a viable approach.  Not only does it easily result in the originator
getting hundreds of mail messages, but *the originator is usually an
innocent victim of bad software*.  We judged that having a gateway
machine tell its neighbor "everything you send us is being dropped"
was much more effective, and had a better chance of getting the message
to people who could do something about it.
-- 
"The stories one hears about putting up | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 are all true."  -D. Harrison|  henry@zoo.toronto.edu  utzoo!henry

rd@pixie.aii.com (Bob Thrush) (04/05/91)

In article <1991Apr3.172825.27190@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <1991Apr03.033233.16633@buster.stafford.tx.us> rli@buster.stafford.tx.us writes:
>>... everyone would have to continue accepting the out of spec
>>articles for a period of time while we track down and notify the
>>source of the bad articles.
>
>Unfortunately, it's hard to make this work.  The combination of header
>rewriting at B News sites and apathy/inertia/workload of sysadmins would
>make it relatively difficult to get results this way.
>
>The intent of the "bad articles" reports in newsdaily's output was to
>achieve the same thing, on a more local level and with teeth:  sysadmins
>can report to their neighbors that articles are being dropped for bad
>headers.

  Ok.  I've alerted my neighbors for the "bad articles" that they have
generated.

>>... Simply dropping the offensive articles on the
>>floor is not an acceptable answer.
>
>It's hard to do anything else.  Bouncing them back to the author is not
>a viable approach.  Not only does it easily result in the originator
>getting hundreds of mail messages, but *the originator is usually an
>innocent victim of bad software*.  We judged that having a gateway
>machine tell its neighbor "everything you send us is being dropped"
>was much more effective, and had a better chance of getting the message
>to people who could do something about it.

  What should the response be?

  For the record, this site has rejected over 400 articles since
installing the 24Mar91 version of C News on March 28.  6 of those
articles were (as best I can determine) generated by downstream sites
who have been duly notified.

  Maybe it's time to compare notes and alert offending sites.  To that
end I have appended the summary of the first 400 "bad articles" from
174 sites that we have rejected as a result of the tighter header
checking.

  Does anyone have any suggestions about 1) a better way to determine
which sites are generating offending articles, and 2) how to alert
these offending sites?  Or should we just leave well enough alone and
hope that the offending sites will become aware of their silent disconnections
and eventually "do the right thing"?

------------------------------------------------------------------------

                 Summary of "bad articles"

  The summary does not include a few "older than 30 days" articles.
The sitename was determined by using the string following the '@' in
the Message-ID.  Each entry contains the total no. of offending articles,
sitename (as above), and reason.  For the few sites with multiple reasons,
the results are combined into one entry.

  Use the following legend to translate the terse reason back to the original
C News log reason:

nofrom     = no From: header
nonheader  = article "header" contains non-header line
nodate     = no Date: header
nosubject  = no Subject: header
unparsable = unparsable Date: `some-bad-date-string'

22 bria nonheader 4 nofrom 18
14 goofy.Apple.COM nonheader
12 sequent.UUCP nonheader
11 sequent.cs.qmw.ac.uk nonheader
10 novell.com nodate
10 maraba.tamu.edu nonheader
9 husc6.harvard.edu nonheader
8 umriscc.isc.umr.edu nonheader
8 nntp-server.caltech.edu nonheader
7 jarthur.Claremont.EDU nonheader
7 athena.mit.edu nonheader 5 nosubject 2
6 sirius.ucs.adelaide.edu.au nonheader
5 wsl.dec.com unparsable
5 sun13.scri.fsu.edu nonheader
5 eecs.nwu.edu nonheader 2 unparsable 3
5 dogface nofrom
5 amigash.UUCP unparsable
5 amc-gw.amc.com nonheader
4 rodan.acs.syr.edu nonheader
4 rex.cs.tulane.edu nonheader
4 relay1.UU.NET nofrom
4 puck.mrcu nonheader
4 occrsh.ATT.COM unparsable
4 murdoch.acc.Virginia.EDU nonheader 1 nosubject 3
4 dog.ee.lbl.gov nonheader
4 darkstar.ucsc.edu nonheader
4 cbnewsl.att.com nonheader
4 casbah.acns.nwu.edu nonheader
4 bu.edu.bu.edu nonheader
4 bolero.ati.com nonheader
4 andrew.cmu.edu nonheader
3 wet.UUCP nonheader
3 ut-emx.uucp nonheader
3 ucselx.sdsu.edu nonheader
3 tippy nofrom
3 scorn.sco.COM nonheader
3 lectroid.sw.stratus.com nonheader
3 hydra.gatech.EDU nonheader
3 helios.TAMU.EDU nonheader
3 erg.sri.com nosubject
3 brewing.cts.com nonheader
2 watdragon.waterloo.edu nonheader
2 ux.acs.umn.edu nonheader
2 unix.cis.pitt.edu nonheader
2 ucbvax.BERKELEY.EDU nonheader
2 sunfish.bellcore.com unparsable
2 stewart.UUCP nonheader
2 ssdc.honeywell.com nodate
2 shlump.nac.dec.com nonheader
2 pt.cs.cmu.edu nonheader
2 pasteur.Berkeley.EDU nonheader
2 netcom.COM nonheader
2 munnari.oz.au nonheader
2 milton.u.washington.edu nonheader
2 linac.fnal.gov nonheader
2 lib.tmc.edu nonheader
2 leland.Stanford.EDU nonheader
2 island.COM nonheader
2 hpwala.wal.hp.com nonheader
2 hplabsz.hplabs.hpl.hp.com nonheader
2 fcom.cc.utah.edu nonheader
2 enuxha.eas.asu.edu nonheader
2 crg5.UUCP nonheader
2 cogsci.cog.jhu.edu nofrom
2 cbnewse.att.com nosubject
2 cbnewsd.att.com nonheader
2 cbnewsc.att.com nonheader
2 castle.ed.ac.uk nonheader
2 cadillac.CAD.MCC.COM nonheader
2 brunix.UUCP nonheader
2 archive.BBN.COM nonheader
2 actrix.gen.nz nonheader
1 zardoz.eng.ohio-state.edu nonheader
1 xyzoom.UUCP nonheader
1 vicstoy.UUCP nodate
1 venera.isi.edu nonheader
1 ux1.cso.uiuc.edu nonheader
1 unisoft.UUCP nonheader
1 tymix.Tymnet.COM nonheader
1 ttidca.TTI.COM nonheader
1 trantor.harris-atd.com nonheader
1 tokyo07.UUCP nonheader
1 terminator.cc.umich.edu nonheader
1 techbook.com nonheader
1 tamarack12.timbuk nonheader
1 tahoe.unr.edu nonheader
1 swindj.UUCP nonheader
1 suns9.crosfield.co.uk nonheader
1 suns4.crosfield.co.uk nonheader
1 sun3.crosfield.co.uk nonheader
1 stsci.EDU nonheader
1 stl.stc.co.uk nonheader
1 spdcc.SPDCC.COM nonheader
1 solo.csci.unt.edu nofrom
1 server2.crosfield.co.uk nonheader
1 secola.Columbia.NCR.COM nonheader
1 saturn.uucp nofrom
1 salt.bellcore.com nonheader
1 ryn.mro4.dec.com nonheader
1 ronquil.cs.washington.edu nonheader
1 riscy.enet.dec.com nonheader
1 rins.ryukoku.ac.jp nonheader
1 rice.edu nosubject
1 prometheus.UUCP nonheader
1 powys.x.co.uk nonheader
1 pmafire.inel.gov nonheader
1 platypus.uofs.edu nonheader
1 panix.uucp nosubject
1 orca.wv.tek.com nonheader
1 oracle.com nonheader
1 odin.corp.sgi.com nonheader
1 oasys.dt.navy.mil nonheader
1 noname.edu nonheader
1 newsserver.sfu.ca nonheader
1 ncsu.edu nonheader
1 ncrwat.Waterloo.NCR.COM nonheader
1 nas.nasa.gov nosubject
1 murtoa.cs.mu.oz.au nonheader
1 mswind.UUCP nofrom
1 mnopltd.UUCP nonheader
1 mmc.mmmg.com nonheader
1 minnow.sp.unisys.com nonheader
1 microsoft.UUCP nonheader
1 mcdphx.phx.mcd.mot.com nonheader
1 maverick.ksu.ksu.edu nonheader
1 math.lsa.umich.edu nonheader
1 marlin.jcu.edu.au nonheader
1 manatee.UUCP nofrom
1 mack.uit.no nonheader
1 mace.cc.purdue.edu nonheader
1 lynx.CS.ORST.EDU nonheader
1 linus.mitre.org nonheader
1 levelland.cs.utexas.edu nonheader
1 kodak.kodak.com nonheader
1 jornada.nmsu.edu nonheader
1 jhunix.HCF.JHU.EDU nonheader
1 jethro.Corp.Sun.COM nonheader
1 jabbahybrid nofrom
1 iphase.UUCP nonheader
1 infonode.ingr.com nonheader
1 idunno.Princeton.EDU nonheader
1 groucho nonheader
1 giaea.gi.oz nonheader
1 flash.bellcore.com nosubject
1 europa.asd.contel.com nonheader
1 encore.Encore.COM nonheader
1 duke.cs.duke.edu nonheader
1 dsd.es.com nonheader
1 cybaswan.UUCP nonheader
1 csc.canberra.edu.au nonheader
1 cs.utk.edu nonheader
1 crpmks.UUCP nonheader
1 chpc.utexas.edu nonheader
1 chorus.fr nonheader
1 cgl.ucsf.EDU nonheader
1 ccncsu.ColoState.EDU nonheader
1 cci632.UUCP nonheader
1 cbmvax.commodore.com nonheader
1 bwdls61.bnr.ca nonheader
1 bunny.GTE.COM nonheader
1 borg.cs.unc.edu nonheader
1 bnlux1.bnl.gov nonheader
1 atari.UUCP nonheader
1 atacama.cs.utexas.edu nonheader
1 argosy.UUCP nonheader
1 aquarium.buffalo.ny.us nonheader
1 apple.com nonheader
1 apollo.HP.COM nonheader
1 amdahl.uts.amdahl.com nonheader
1 agora.rain.com nonheader
1 agate.berkeley.edu nonheader
1 aber-cs.UUCP nonheader
1 Think.COM nosubject
1 Shiva.COM nonheader
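
For anyone wanting to build a similar tally, here is a minimal sketch in
Python of the method described above: take the site from whatever follows
the '@' in the Message-ID, count rejects per site and reason, and set aside
any Message-ID with no '@' at all.  The input format is an assumption
(plain (message_id, reason) pairs); the layout of the actual C News log is
not shown in this thread, and the names site_of and summarize are
hypothetical.

```python
from collections import Counter, defaultdict


def site_of(message_id):
    """Return the text after '@' in a Message-ID, or None if there is no '@'."""
    mid = message_id.strip().strip("<>")
    _, at, site = mid.partition("@")
    return site if at else None


def summarize(rejects):
    """Tally rejected articles per site.

    rejects: iterable of (message_id, reason) pairs (assumed input format).
    Returns (lines, no_site): one summary line per site, plus the list of
    Message-IDs that could not be attributed because they contain no '@'.
    """
    per_site = defaultdict(Counter)
    no_site = []
    for message_id, reason in rejects:
        site = site_of(message_id)
        if site is None:
            no_site.append(message_id)
        else:
            per_site[site][reason] += 1

    # Largest totals first; ties broken by reverse site name.
    ordered = sorted(
        per_site.items(),
        key=lambda item: (sum(item[1].values()), item[0]),
        reverse=True,
    )
    lines = []
    for site, reasons in ordered:
        total = sum(reasons.values())
        if len(reasons) == 1:
            detail = next(iter(reasons))
        else:
            # Several reasons for one site: list each with its own count.
            detail = " ".join(f"{r} {n}" for r, n in reasons.items())
        lines.append(f"{total} {site} {detail}")
    return lines, no_site
```

Keeping the unattributable Message-IDs in a separate list, rather than
dropping them, avoids silently losing that category of reject.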
-- 
Bob Thrush rd@aii.com
Automation Intelligence,Inc.,1200 W. Colonial Drive, Orlando, FL 32804

rd@pixie.aii.com (Bob Thrush) (04/05/91)

In article <RD.91Apr4164531@pixie.aii.com> rd@pixie.aii.com (Bob Thrush) writes:
>
>                 Summary of "bad articles"
>
>  The summary does not include a few "older than 30 days" articles.
>The sitename was determined by using the string following the '@' in
>the Message-ID.  Each entry contains the total no. of offending articles,
>sitename (as above), and reason.  For the few sites with multiple reasons,
>the results are combined into one entry.
>
>  [ summary deleted ]

  I forgot to mention that my summary ignored another category of
header errors, i.e. no @ in Message-ID, due to relying on the @
character to determine the site.

  Here is the associated list of 26 Message-ID's that contained no @
character (and were omitted from the aforementioned summary):

002843.....
032991.170348WDBURNS%MTUS5.BITNET
032991.170514WDBURNS%MTUS5.BITNET
1991Apr2.124114
69280007Bhpcupt1.cup.hp.com
9103281920.kremvax.red.square.cccp
CBM.91.08.22
CBM.91.08.23
CBM.91.08.24
CBM.91.08.25
CBM.91.08.26
CBM.91.08.27
CBM.91.08.28
CBM.91.08.29
CBM.91.08.30
CBM.91.08.31
CBM.91.08.32
CBM.91.08.33
CBM.91.08.34
CBM.91.08.35
CBM.91.08.36
CBM.91.08.37
CBM.91.08.38
my.favorite.Message-ID
number.of.the.beast
smelly.animal.ick.yuck
-- 
Bob Thrush rd@aii.com
Automation Intelligence,Inc.,1200 W. Colonial Drive, Orlando, FL 32804

stealth@caen.engin.umich.edu (Mike Pelletier) (04/06/91)

I noticed that most of the articles rejected were because of "nonheader"
errors, where the header contained a non-header line.  What qualifies
as this?  Is the X- prefix supported, as in X-Zippy-Says: that some folks
are wont to put in their header line?  What sort of lines qualify
as non-header?

--
Mike Pelletier                     |
The University of Michigan's       |           [this section intentionally]
Computer Aided Engineering Network |           [         left blank       ]
  Usenet, UUCP, IRC and mail admin |

henry@zoo.toronto.edu (Henry Spencer) (04/06/91)

In article <1991Apr5.175447.27096@engin.umich.edu> stealth@caen.engin.umich.edu (Mike Pelletier) writes:
>I noticed that most of the articles rejected were because of "nonheader"
>errors, where the header contained a non-header line.  What qualifies
>as this?  Is the X- prefix supported, as in X-Zippy-Says: that some folks
>are wont to put in their header line?  What sort of lines qualify
>as non-header?

The "X-" is just part of the header name, and is a "prefix" only by
convention.  C News neither knows nor cares about it.  The usual type
of "non-header line" is a line (preceding the header-ending empty line)
that does not start with white space (which would make it a continuation of
a previous header) and does not conform to the proper header syntax
(header name, colon, white space, header body).
-- 
"The stories one hears about putting up | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 are all true."  -D. Harrison|  henry@zoo.toronto.edu  utzoo!henry

geoff@world.std.com (Geoff Collyer) (04/06/91)

Mike Pelletier:
>I noticed that most of the articles rejected were because of "nonheader"
>errors, where the header contained a non-header line.  What qualifies
>as this?  Is the X- prefix supported, as in X-Zippy-Says: that some folks
>are wont to put in their header line?  What sort of lines qualify
>as non-header?

RFC 822 (which RFC 1036 cites) defines a message header as consisting of
header lines, followed by a blank line.  In particular, the message
header isn't over until the blank line (two newlines in a row) is seen.
Header lines consist of a keyword, a colon, and some text, and may be
continued by line(s), each of which begins with a space or tab.  RFC 1036
further requires a space (not a tab, not a newline) after the colon.

So a non-header line in the message header is anything that doesn't fit
the above restrictions.  X-Foo: is okay.  The usual problems are lack of
space after the colon (e.g. "Subject:no time to waste") or random junk:

Subject: a long,
rambling subject line
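
The rules above can be sketched as a small checker.  This is a rough
illustration in Python, not the C News code: the regular expression is an
assumed approximation of "keyword, colon, space" (keyword taken to be any
run of printable ASCII other than space and colon), and the function name
nonheader_lines is hypothetical.

```python
import re

# Assumed approximation of a header line: a keyword made of printable
# ASCII characters other than space and colon, then a colon, then a space.
HEADER_LINE = re.compile(r"^[!-9;-~]+: ")


def nonheader_lines(article):
    """Return the lines in the message header that are neither header
    lines nor continuation lines.

    The header runs up to the first empty line; nothing after that
    (the body) is examined.
    """
    bad = []
    for line in article.split("\n"):
        if line == "":
            break  # blank line: the message header is over
        if line[0] in " \t":
            continue  # starts with space or tab: continuation line
        if not HEADER_LINE.match(line):
            bad.append(line)
    return bad


if __name__ == "__main__":
    sample = (
        "From: someone@site.example\n"
        "X-Foo: fine\n"
        "Subject: a long,\n"
        "rambling subject line\n"
        "\n"
        "body text:not checked\n"
    )
    for line in nonheader_lines(sample):
        print("non-header line:", line)
```

Both problem cases mentioned above are flagged by this sketch: a line with
no space after the colon, and an unindented line of junk between headers.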
-- 
Geoff Collyer		world.std.com!geoff, uunet.uu.net!geoff