rickert@mp.cs.niu.edu (Neil Rickert) (04/26/91)
It seems that when a batch of news is received from afar, cnews rejects any articles with bad dates. However, when a news article is received locally, any sort of date gets the article into the local news stream. (Basically, inews does not use the '-o' option when calling relaynews.)

This means that a site such as your feed, which is running the latest cnews, can still pump bad news into your system for you to see, without your feed ever getting a message to say something is wrong.

The cases I have seen (or more accurately, haven't seen, since relaynews drops them on the floor) are being gatewayed into news from mail at my feed, and probably have strange time zone names that getdate does not comprehend. I presume a similar problem could happen with a moderated group if the moderator took a long vacation, allowed a large backlog to develop, but didn't update the 'Date:' header.

Shouldn't anne.jones make sure dates are understood, or even insist that the date means the posting date, and so just recreate these headers?

--
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
Neil W. Rickert, Computer Science <rickert@cs.niu.edu>
Northern Illinois Univ.
DeKalb, IL 60115 +1-815-753-6940
geoff@world.std.com (Geoff Collyer) (04/26/91)
Neil Rickert:
> However, when a news article is received locally any sort of date gets the
>article into the local news stream.

No, inews and friends parse any existing Date: header contents and rewrite them in RFC 1036 (as modified by RFC 1123) notation. An unparsable date will cause the article to be bounced. (Dates need not be in 1036 format, merely unambiguous and absolute.)

>(Basically inews does not use the '-o' option when calling relaynews).

A change that just missed the last patch and should be in the next one is that inews will (finally) invoke newsspool or rnews instead of relaynews, and thus relaynews will get invoked with -o on local postings.

--
Geoff Collyer world.std.com!geoff, uunet.uu.net!geoff
per@erix.ericsson.se (Per Hedeland) (05/17/91)
In article <1991Apr25.223301.27280@world.std.com> geoff@world.std.com (Geoff Collyer) writes:
>Neil Rickert:
>> However, when a news article is received locally any sort of date gets the
>>article into the local news stream.
>
>No, inews and friends parse any existing Date: header contents and
>rewrite them in RFC 1036 (as modified by RFC 1123) notation. An
>unparsable date will cause the article to be bounced.

I don't think this is correct; here's the relevant part of the latest anne.jones:

    *)
        # canonicalise given date to RFC 822
        timet="` getabsdate \"$artdate\" `"
        case "$timet" in
        -*)
            echo "$0: bad Date: header" >&2
            rm -f $tmp $input
            exit 1
            ;;
        *)
            set `ctime -u "$timet"`
            defdate="$1, $3 $2 $5 $4 GMT"
            ;;
        esac

Getabsdate will not print anything on stdout for an "invalid" date, which causes $timet to be an empty string, and ctime to produce the well-known date "Thu Jan 1 00:00:00 1970" - which is indeed "too old" for most news setups.

>>(Basically inews does not use the '-o' option when calling relaynews).
>
>A change that just missed the last patch and should be in the next one is
>that inews will (finally) invoke newsspool or rnews instead of relaynews,
>and thus relaynews will get invoked with -o on local postings.

I really hope that inews in the future will not reject articles with bad dates, but fix up the date instead (just setting the posting date if the given one is unparsable seems quite reasonable to me) - IMHO, this *is* the place to try to "repair" headers, and it already does insert the missing spaces that are so impossible for relaynews to provide.

Btw, regarding inews rejecting articles - as far as I can determine (I tried an article with no Newsgroups: line), inews -W will exit with status 0 even if the posting failed - this causes e.g. NNTP postings to always appear as "successful".

--Per Hedeland
per@erix.ericsson.se or
per%erix.ericsson.se@uunet.uu.net or
...uunet!erix.ericsson.se!per
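[Editor's sketch] The failure mode described above can be reproduced without C News installed. The following is a minimal sketch, not code from anne.jones: `check_timet` and `original_check` are hypothetical helper names, and the only behaviour assumed is what the post states, namely that an unparsable date leaves the parser's output empty. It shows why a `-*` pattern alone lets the empty string through, and one way a check could also catch it.

```sh
#!/bin/sh
# Both functions are hypothetical illustrations, not code from C News.

# original_check mirrors only the `-*` test quoted in the post.
original_check() {
	case "$1" in
	-*)	echo bad ;;
	*)	echo ok ;;
	esac
}

# check_timet also treats empty or non-numeric parser output as unparsable.
check_timet() {
	case "$1" in
	""|-*)		# empty output or a negative value
		echo bad ;;
	*[!0-9]*)	# anything containing a non-digit
		echo bad ;;
	*)
		echo ok ;;
	esac
}

original_check ""        # the empty string slips through as "ok"
check_timet ""           # caught as "bad"
check_timet 675046800    # a plain number of seconds is "ok"
```

Whether a rejected date should bounce the article or be replaced is the policy question debated in the rest of the thread; the sketch only covers detection.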
mark@comp.vuw.ac.nz (Mark Davies) (05/17/91)
In article <1991May16.211311.3544@eua.ericsson.se>, per@erix.ericsson.se (Per Hedeland) writes:
|> Btw, regarding inews rejecting articles - as far as I can determine
|> (I tried an article with no Newsgroups: line), inews -W will exit with
|> status 0 even if the posting failed - this causes e.g. NNTP postings to
|> always appear as "successful".

I had this problem under HP-UX 7.0. I think it's a bug in HP-UX's /bin/sh. The exit status of the subshell is not being correctly passed to the parent. Since we always invoke inews with the -W option (from nntp), I just made the following patch to inews.

*** inews.cnews Wed Apr 17 16:32:11 1991
--- inews Wed Apr 17 21:37:28 1991
***************
*** 8,13 ****
--- 8,16 ----
  # Yes, it's big, slow and awkward. The alternative is casting a lot of
  # local policy in C.
  # TODO: rewrite Date: and Expires: dates
+ #
+ # Hacked to effectively always have "-W" on
+ # so that exit status is reported correctly in hp-ux
  
  # =()<. ${NEWSCONFIG-@<NEWSCONFIG>@}>()=
  . ${NEWSCONFIG-/usr/lib/news/bin/config}
***************
*** 372,378 ****
  ;;
  esac
  exit $status # trap 0 may cleanup, make dead.article
! ) &
! eval $waitcmd # wait & get status if -W given
  trap 0 # let the background run on unmolested
  exit $status
--- 375,381 ----
  ;;
  esac
  exit $status # trap 0 may cleanup, make dead.article
! )
! status=$? # wait & get status if -W given
  trap 0 # let the background run on unmolested
  exit $status

cheers
mark
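[Editor's sketch] The patch above swaps a background subshell followed by `eval $waitcmd` for a synchronous subshell whose status is read from `$?`. The sketch below shows both ways of collecting a subshell's exit status in a POSIX shell. It does not diagnose the HP-UX behaviour reported above, and the contents of `$waitcmd` are not shown in the post, so they are not reproduced here; `post_article`, `run_sync` and `run_bg` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical stand-in for the posting work done inside the subshell.
post_article() {
	return 3	# pretend the posting failed with status 3
}

# Synchronous subshell, as in the patched version: the shell blocks until
# the subshell finishes, and its status is available immediately.
run_sync() {
	status=0
	( post_article ) || status=$?
	echo "$status"
}

# Background subshell: wait with an explicit process ID to collect the
# status. POSIX specifies that wait with no operands exits with status
# zero, so the PID operand matters here.
run_bg() {
	status=0
	( post_article ) &
	wait $! || status=$?
	echo "$status"
}

run_sync
run_bg
```

The synchronous form gives up the ability to return to the caller before posting finishes, which is the trade the patch makes by behaving as if -W were always given.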
geoff@world.std.com (Geoff Collyer) (05/23/91)
Per Hedeland:
> In article ... (Geoff Collyer) writes:
> >No, inews and friends parse any existing Date: header contents and
> >rewrite them in RFC 1036 (as modified by RFC 1123) notation. An
> >unparsable date will cause the article to be bounced.
>
> I don't think this is correct; ...
...
> Getabsdate will not print anything on stdout for an "invalid" date,
> which causes $timet to be an empty string, and ctime to produce the
> well-known date "Thu Jan 1 00:00:00 1970" - which is indeed "too old"
> for most news setups.

Whoops, I was confused about error reporting conventions; fixed.

> I really hope that inews in the future will not reject articles with bad
> dates, but fix up the date instead (just setting the posting date if the
> given one is unparsable seems quite reasonable to me) - IMHO, this *is*
> the place to try to "repair" headers, and it already does insert the
> missing spaces that are so impossible for relaynews to provide.

As I said above, inews rewrites the date, if it can understand the date at all, into canonical form. If it can't understand the date using a fairly tolerant parser (getabsdate for Date: and getdate [for now] for Expires:), I don't think substituting the current date is a great idea.

--
Geoff Collyer world.std.com!geoff, uunet.uu.net!geoff
rickert@mp.cs.niu.edu (Neil Rickert) (05/23/91)
In article <1991May23.024533.9731@world.std.com> geoff@world.std.com (Geoff Collyer) writes: > >As I said above, inews rewrites the date, if it can understand the date >at all, into canonical form. If it can't understand the date using a >fairly tolerant parser (getabsdate for Date: and getdate [for now] for >Expires:), I don't think substituting the current date is a great >idea. If substituting the current date is a bad idea, what would you think is a better idea? Silently dumping the article, perhaps? Just bouncing the article with an error message back to the user might seem sensible. Except that if there is a user there to bounce it back to, then substituting the current date is absolutely the correct thing to do. Never forget that some users may be posting via an nntp connection which might quite possibly be closed before the date is scanned. The only evidence they see is that the article never appears. This is the same symptom as they would see if the article is mailed to a moderator with a slow response time, so it is pretty hard to see the difference. -- =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*= Neil W. Rickert, Computer Science <rickert@cs.niu.edu> Northern Illinois Univ. DeKalb, IL 60115 +1-815-753-6940
flee@cs.psu.edu (Felix Lee) (05/23/91)
> If substituting the current date is a bad idea, what would you think >is a better idea? If a news site rewrites the Date field, it has suddenly created a different article. Rewriting a header is not a trivial action! Any change made to an article creates a different article. Given the size of Usenet, this could happen thousands of times, creating thousands of different versions of the article. Different articles logically require different Message-IDs. If you don't give them different Message-IDs, then I have no assurance that article <xyz> on my site is the same as <xyz> on your site. You might think, we don't need network-wide consistency. Well, I've just created a news gateway that translates all articles to pig-latin while keeping the same Message-ID. Do you mind? You might think, rewriting the Date field is a trivial change that doesn't affect the substance of the article. Well, I've just created another news gateway that always changes the Date to something random. Do you care? I certainly do. I believe in high-fidelity. Rewriting articles is wrong. If you cannot deliver the article intact, don't deliver it at all. Understanding the Date field is a prerequisite for delivering the article, because the Date is used to reject articles that are too old. Here's a hypothetical. Site X rewrites the date into "MMM-DD-YYYY". Site Y rewrites the date into "DD MMM YYYY". X doesn't understand Y's format, and vice-versa. Each site replaces unparsable dates with the current date. If there is a long enough cycle in the network that includes X and Y, then X and Y will shuttle the same articles back and forth forever, continually updating the date. -- Felix Lee flee@cs.psu.edu
rickert@mp.cs.niu.edu (Neil Rickert) (05/23/91)
In article <b8bHel-b@cs.psu.edu> flee@cs.psu.edu (Felix Lee) writes: >> If substituting the current date is a bad idea, what would you think >>is a better idea? > >If a news site rewrites the Date field, it has suddenly created a >different article. Rewriting a header is not a trivial action! Any >change made to an article creates a different article. Given the size >of Usenet, this could happen thousands of times, creating thousands of >different versions of the article. Give us a break. We are not discussing relaynews. We are talking about inews and anne.jones, which are used when the article is INITIALLY submitted. This can not possibly create thousands of versions of the article. Until inews has done its job, there are 0 versions of the article. We are only discussing whether, when inews has finished, there will still be 0 versions, or there will be 1 version. -- =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*= Neil W. Rickert, Computer Science <rickert@cs.niu.edu> Northern Illinois Univ. DeKalb, IL 60115 +1-815-753-6940
moraes@cs.toronto.edu (Mark Moraes) (05/24/91)
Unfortunately, inews is not exclusively used for human input. Sites have been known to use it for gatewaying mailing lists, for instance. (Bad idea, given inews performance) Scripts will get used for things the authors did not intend they be used for; given the cost of errors on Usenet (as our usenet mailbox fills up with hundreds of newgroups caused by escaping control messages :-), it's best to be cautious. Note: if you leave the date header out, inews will fill in a valid date for you. And it does rewrite the date if it is understandable. There's no reason for posting agents to stick Date: headers on before handing articles to inews. And if they do insist on crafting their own Date: headers, the authors of said posting agents should have made sure they at least provide unambiguous dates -- getabsdate will accept almost anything. (the main exception being all-numeric dates. Geoff even hacked it to accept VMS dates, despite his religious persuasion!:-) Mark.
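[Editor's sketch] The advice above is that a posting agent need not supply a Date: header at all, since inews fills in a valid one when it is missing. A minimal sketch of a filter that drops Date: from the header block before the article is handed on, assuming a freshly written article rather than a gatewayed one; `strip_date` is a hypothetical name, not part of C News, and folded (continued) header lines are not handled.

```sh
#!/bin/sh
# strip_date is a hypothetical filter, not part of C News.
# It removes any Date: line from the header block (everything before the
# first empty line) and passes the body through untouched.
strip_date() {
	awk '
		inbody || !/^[Dd][Aa][Tt][Ee]:/ { print }
		/^$/ { inbody = 1 }
	'
}

# Usage sketch: in a real posting agent the output would be piped to inews.
printf 'From: a@example.com\nDate: 5/6/7\nSubject: test\n\nDate: stays in body\n' |
	strip_date
```

For gatewayed mail the original date carries information, so discarding it is a different policy decision from the one sketched here.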
rickert@mp.cs.niu.edu (Neil Rickert) (05/24/91)
[In response to suggestions that whenever the Date proves unparseable, inews should insert the date of posting, instead of allowing the article to just be discarded, ..] In article <91May23.151914edt.1030@smoke.cs.toronto.edu> moraes@cs.toronto.edu (Mark Moraes) writes: >Unfortunately, inews is not exclusively used for human input. Sites >have been known to use it for gatewaying mailing lists, for instance. Exactly the point. Most of the bad dates are probably from mailers, often due to unknown time zones. >(Bad idea, given inews performance) Scripts will get used for things >the authors did not intend they be used for; given the cost of errors >on Usenet (as our usenet mailbox fills up with hundreds of newgroups >caused by escaping control messages :-), it's best to be cautious. Well, I am not using inews for this, but I am using the posting date for gatewayed mail if the date header is unparseable. Now since you think I am doing something so very dangerous, could you please explain what it is? I know. You are worried that old news will be recirculated, and somehow get back to inews, flooding the net with stale news. No, that can't be it. After all the stale news has to have valid date headers to have made it past relaynews, so the dates in this case are not unparseable, the old date will be retained, and the article rejected as stale. Ah. I've thought of it. You are a purist. You can't stand the idea that we might give the posting date instead of the date of authorship. But no, that can't be it either, since inews does that all the time if there is no date header. Perhaps you are worried that someone will take some really old news, and maliciously damage the date beyond recognition before resubmitting to inews. But, come to think of it, that makes no sense either. After all it would be even easier for this malicious individual to just delete the date header, in which case inews will use the current date anyway. I give up. 
I just can't think of what the danger might be. Please enlighten us. -- =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*= Neil W. Rickert, Computer Science <rickert@cs.niu.edu> Northern Illinois Univ. DeKalb, IL 60115 +1-815-753-6940
moraes@cs.toronto.edu (Mark Moraes) (05/24/91)
Neil, I did not say that your mailing list gateway using posting dates was dangerous. (Our mailer rewrites all dates to conform to RFC1123, else I'd probably have had to do the same for our mail2news gateway.)

By caution, I meant (ambiguously:-) that *inews* replacing *ambiguous* dates with the current date, or trying to fix up dates that were *ambiguous*, is not safe. I claim it isn't safe because I suspect that one day it will fix the date incorrectly and will cause a mess as a result (I can't guess what the mess will be or why it will happen -- it's merely my respect for Murphy). I agree, this is an irrational fear.

Modulo bugs, the current inews (well, anne.jones) rewrites unambiguous Dates to the correct syntax without losing information or changing the date by more than a day. And it is very generous with the dates it accepts -- its timezone table may be a bit flaky but it will merely ignore timezones it cannot understand. (All words it cannot understand are assumed to be flaky timezones, if the code hasn't changed since I last saw it.)

If the date is ambiguous, replacing it with the current date means one is potentially changing information that is used to order the article in newsreaders and reject the article in transport. If the rewrite of the ambiguous date results in a date in the future, it will result in the article being tossed. A parallel thread in this newsgroup indicates how some people react to this :-)

Contrary to some assertions in that thread, I've seen the authors of C News take great care to avoid dropping news unless there's no reasonable alternative. C News relaynews does not rewrite headers and has a stricter parser partly because it makes a performance/size difference, and this was deemed to matter, especially around the University of Toronto where some of the news machines are Sun3/[12]80s that are also used as file servers or time-sharing machines, and consequently haven't got a lot of CPU to spare.

The C News suite of date routines includes support for date rewriting. One could use this in a mail gateway to fix the dates. Or use the getabsdate program as inews does if the gateway is a shell script. Another possibility is to modify relaynews to use prsabsdate() and then synthesize a new date header if getindate() fails. (B News emulation, more or less.) Given Geoff and Henry's strong belief in not rewriting headers, the last is unlikely to happen in the official C News distribution.

Mark.
---
"With netnews, *everything* eventually becomes a performance problem!" - Geoff Collyer
geoff@world.std.com (Geoff Collyer) (05/24/91)
rickert@mp.cs.niu.edu (Neil Rickert) writes: >Most of the bad dates are probably from mailers, often due to unknown time zones. Our date parsers, getindate and getabsdate, don't fault a date for an unknown time zone. Alphabetic time zones are so utterly arbitrary and unknown ones are so common that the date parsers just ignore an unknown time zone. I think we can cope with the mailers. Most bad dates seem empirically to be either non-822 dates supplied by news systems where 822 dates are required, or wildly ambiguous crud from people, like 5/6/7 (just imagine the possibilities for confusion in the next decade, taking into account differing conventions for all-numeric dates in different countries). getabsdate can parse pretty much any unambiguous absolute date, including some that getdate can't parse or misunderstands; see libc/getabsdate.3 for details. -- Geoff Collyer world.std.com!geoff, uunet.uu.net!geoff
per@erix.ericsson.se (Per Hedeland) (05/28/91)
In article <1991May24.050525.9055@world.std.com>, geoff@world.std.com (Geoff Collyer) writes:
|> I think we can cope with the mailers.

|> getabsdate can parse pretty much any unambiguous absolute date, including
|> some that getdate can't parse or misunderstands; see libc/getabsdate.3
|> for details.

Well, my main problem (like, I gather, Neil Rickert's) is indeed with mail gatewayed to news (via inews - and if this is a "bad idea" please suggest alternatives - given the discussions here, I can't really see relaynews being it).

Here are a couple of real-life dates that inews as previously described rewrote to "the epoch" - I certainly don't claim that they are "valid", but even the outdated getdate that comes with C news makes *some* sense of them (getabsdate didn't, obviously), and they did occur in a mailing list with world-wide distribution:

Date: 16 May 1991 1547-PDT (Thursday)
Date: Wed, 15 May 91 11:28:36+010

I suppose the best way to deal with this is to use a package like Rich $alz' newsgate, which aggressively makes sure that headers are in "standard" form before passing them on to news (looking inside mail2news should give the "headers are sanctified" proponents heartburn:-). The C news contrib stuff, on the other hand, which is often at least mentioned by C news authors in this context, does nothing of the kind - Date: is passed verbatim to inews, with effect as above.

--Per Hedeland
per@erix.ericsson.se or
per%erix.ericsson.se@sunic.sunet.se or
...uunet!erix.ericsson.se!per
geoff@world.std.com (Geoff Collyer) (05/29/91)
Per Hedeland:
>Well, my main problem (like, I gather, Neil Rickert's) is indeed with
>mail gatewayed to news (via inews - and if this is a "bad idea" please
>suggest alternatives - given the discussions here, I can't really see
>relaynews being it).

Rich Salz's mail2news is supposed to be quite good at this. inews is currently really too slow for this job and isn't really intended for it.

>Here are a couple of real-life dates that inews as previously described
>rewrote to "the epoch" - I certainly don't claim that they are "valid",
>but even the outdated getdate that comes with C news makes *some* sense
>of them (getabsdate didn't, obviously), and they did occur in a mailing
>list with world-wide distribution:

>Date: 16 May 1991 1547-PDT (Thursday)

getabsdate knows nothing about RFC 822 comments (or other extraneous tokens) and expects times of day to contain colons. (Note that getdate will usually claim it parsed any given date-like string, but it doesn't always get it right. For example, it can confuse year and time of day(!).)

>Date: Wed, 15 May 91 11:28:36+010

Sorry, I don't know what this means, so there's little chance of getabsdate understanding it. Does it mean

	Date: Wed, 15 May 91 11:28:36 +0100
or
	Date: Wed, 15 May 91 11:28:36 +0010
or even
	Date: Wed, 15 May 91 11:28:36 +1000

Time zones need to be separated from other strings in the date. Gluing them on to the end of some other nearby token won't work. Numeric time zones should be four digits long and signed.

There's really no need for all the cleverness we see exhibited in Date: headers. RFCs 822 and 1122/1123 describe a fairly simple date format, and for news, you mainly need to just omit 822 comments.

--
Geoff Collyer world.std.com!geoff, uunet.uu.net!geoff
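[Editor's sketch] Two of the rules stated above (omit RFC 822 comments; numeric time zones are signed and four digits long) are simple enough to check in a gateway script before a date reaches the parser. This is a minimal sketch under those stated rules only; `strip_comments` and `valid_numeric_zone` are hypothetical helpers, not part of C News, and nested comments are not handled.

```sh
#!/bin/sh
# Hypothetical pre-checks for a gateway script; not code from C News.

# Remove RFC 822 comments (parenthesised text), non-nested only.
strip_comments() {
	printf '%s\n' "$1" | sed -e 's/ *([^()]*)//g'
}

# Accept only a signed four-digit numeric zone such as +0100.
valid_numeric_zone() {
	case "$1" in
	[+-][0-9][0-9][0-9][0-9])	return 0 ;;
	*)				return 1 ;;
	esac
}

strip_comments '16 May 1991 1547-PDT (Thursday)'
valid_numeric_zone '+0100' && echo "+0100 accepted"
valid_numeric_zone '+010'  || echo "+010 rejected"
```

Stripping the comment from the first example still leaves a time of day without a colon, which, per the post, getabsdate does not expect, so the sketch narrows the problem rather than solving it.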
george@killer-tomato.cis.ohio-state.edu (George M. Jones) (05/30/91)
geoff@world.std.com (Geoff Collyer) writes:
> Rich Salz's mail2news is supposed to be quite good at this. inews is
> currently really too slow for this job and isn't really intended for it.

I am looking at installing mail2news on a system running C News, and I noticed
that it uses the B News date parsing routines. Will this cause problems ?
---George Jones
--
OSU Computer & Inf. Science 2036 Neil Ave.,Columbus,Ohio 43210. 614-292-7325
george@cis.ohio-state.edu or ...!osu-cis!george
"To err is human, to moo bovine"
rickert@mp.cs.niu.edu (Neil Rickert) (05/30/91)
In article <1991May30.141804.21755@cis.ohio-state.edu> george@killer-tomato.cis.ohio-state.edu (George M. Jones) writes: >geoff@world.std.com (Geoff Collyer) writes: > > Rich Salz's mail2news is supposed to be quite good at this. inews is > currently really too slow for this job and isn't really intended for it. > >I am looking at installing mail2news on a system running C News, and I noticed >that it uses the B News date parsing routines. Will this cause problems ? I just used the cnews getdate(). It caused no problems. Actually with a few changes to Makefile, you can just pull getdate from libcnews.a -- =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*= Neil W. Rickert, Computer Science <rickert@cs.niu.edu> Northern Illinois Univ. DeKalb, IL 60115 +1-815-753-6940
sob@tmc.edu (Stan Barber) (06/01/91)
In article <1991May23.024533.9731@world.std.com> geoff@world.std.com (Geoff Collyer) writes: >As I said above, inews rewrites the date, if it can understand the date >at all, into canonical form. If it can't understand the date using a >fairly tolerant parser (getabsdate for Date: and getdate [for now] for >Expires:), I don't think substituting the current date is a great >idea. I am afraid I disagree with this. If inews is used when posting an article and it can't figure out the date, I can't think of a rational case for not using the current date. Note that I am referring to when an article is first posted, NOT when it is relayed. If there is a rational case for not using the current date under these circumstances, I look forward to reading about it. -- Stan internet: sob@bcm.tmc.edu Director, Networking Olan uucp: rutgers!bcm!sob and Systems Support Barber Opinions expressed are only mine. Baylor College of Medicine
henry@zoo.toronto.edu (Henry Spencer) (06/02/91)
In article <5791@gazette.bcm.tmc.edu> sob@tmc.edu (Stan Barber) writes: >If inews is used when posting an article and it can't figure out the date, >I can't think of a rational case for not using the current date. Note >that I am referring to when an article is first posted, NOT when it is relayed. That is a crucial distinction. What if inews is being used for gatewaying? (Not a good idea, but people *do* it.) It is a terrible mistake to disregard an incomprehensible date on an article that is not being posted for the first time. -- "We're thinking about upgrading from | Henry Spencer @ U of Toronto Zoology SunOS 4.1.1 to SunOS 3.5." | henry@zoo.toronto.edu utzoo!henry
per@erix.ericsson.se (Per Hedeland) (06/03/91)
In article <1991May28.232833.22503@world.std.com>, geoff@world.std.com (Geoff Collyer) writes: |> Per Hedeland: |> >Well, my main problem (like, I gather, Neil Rickert's) is indeed with |> >mail gatewayed to news (via inews - and if this is a "bad idea" please |> >suggest alternatives - given the discussions here, I can't really see |> >relaynews being it). |> |> Rich Salz's mail2news is supposed to be quite good at this. inews is |> currently really too slow for this job and isn't really intended for it. Well, mail2news isn't an *alternative* to inews (the default config is for it to call inews), but perhaps mail2news + relaynews is? Soo, how does mail2news (now the "officially" sanctioned way of gatewaying mail to C news:-) deal with the Date: line in the mails? - It uses getdate! And if even getdate fails, it uses the current date!! (Shudder...:-) [ discussion of the ugly date examples I provided deleted ] As I said, I don't claim they are valid, the point is a) that they were acceptable to all the mailers they passed through, and presumably the authors or their sysadmins have limited interest in the fact that I'm unable to feed their letters into my local newsgroup, and b) that getdate *did* make some sense of them (as has been pointed out before, it doesn't matter if the result is a few hours wrong, that's exactly what happens when e.g. getabsdate ignores unknown timezones). --Per Hedeland per@erix.ericsson.se or per%erix.ericsson.se@sunic.sunet.se or ...uunet!erix.ericsson.se!per
geoff@world.std.com (Geoff Collyer) (06/04/91)
Per Hedeland:
>Well, mail2news isn't an *alternative* to inews (the default config is
>for it to call inews), but perhaps mail2news + relaynews is? Soo, how
>does mail2news (now the "officially" sanctioned way of gatewaying mail
>to C news:-) deal with the Date: line in the mails? - It uses getdate!
>And if even getdate fails, it uses the current date!! (Shudder...:-)

Well, I did say mail2news is *supposed* to be good at gatewaying. I haven't yet examined the code myself. In the mean time, another candidate for mail-to-news gateway has appeared. I'm not going to say anything more about it until I *have* examined it.

>As I said, I don't claim they are valid, the point is a) that they were
>acceptable to all the mailers they passed through,

These dates were probably "acceptable" to the mailers they passed through only because they weren't parsed by those mailers.

>... and b) that getdate *did* make some sense of them (as has been pointed
>out before, it doesn't matter if the result is a few hours wrong, that's
>exactly what happens when e.g. getabsdate ignores unknown timezones).

Right, a few hours off wouldn't matter, but getdate can be off by *centuries*. E.g. "getdate 'may 23 2100'" yields 675046800, which is "Thu May 23 21:00:00 1991" because getdate *knows*, by ghod, that no years after 1999 are valid. That does matter. At the very least, relaynews will toss an article whose date is off by centuries.

--
Geoff Collyer world.std.com!geoff, uunet.uu.net!geoff
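[Editor's sketch] The remark that an article whose date is wildly off will be tossed suggests a plausibility window on the parsed time. The sketch below is one way such a window could look; `plausible_date` is a hypothetical helper, and the 14-day age limit and 1-day future allowance are assumptions chosen for illustration, not values taken from C News.

```sh
#!/bin/sh
# plausible_date is a hypothetical sanity check, not part of C News.
# Arguments: parsed time, current time (both in seconds since the epoch),
# maximum age in days, maximum allowed skew into the future in days.
plausible_date() {
	oldest=$(( $2 - $3 * 86400 ))
	newest=$(( $2 + $4 * 86400 ))
	[ "$1" -ge "$oldest" ] && [ "$1" -le "$newest" ]
}

now=675046800	# the value quoted in the post, used here as "now"
plausible_date 675046800 "$now" 14 1 && echo "same instant: accepted"
plausible_date 0 "$now" 14 1 || echo "the epoch: rejected"
```

A window like this catches the epoch case discussed earlier in the thread, but it cannot catch a misparse such as the one quoted above, where a year is read as a time of day and the result still lands inside the window.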