[gnu.emacs.gnus] GNUS censors articles from Vint Cerf

sra@lcs.mit.edu (Rob Austein) (07/31/89)

[I tried sending this to info-gnus-english via mail, but it bounced.
 Just as well, since I had forgotten to say that this is GNUS 3.12,
 Emacs 18.54.  I'll send a separate note to the ohio-state people
 documenting the bounce.]

Symptom:

The GNUS newsreader doesn't like messages from Vint Cerf.  It gets
confused while trying to retrieve their headers from the NNTP server,
merges the headers with those of the preceding article, constructs a
bogus article header vector for the preceding article such that it
looks like the preceding article is from Vint, and forgets about
Vint's article (i.e., attempting to display the article with the bogus
header vector displays the one preceding Vint's article, and there is
no way to display Vint's article at all).  This is distressing, since,
on those occasions when Vint does post to the net, it's worth reading.

Diagnosis:

There are two bugs here.  One is in the mail composition program Vint
is using.  This is a known problem and the ISI people are working on
fixing it by decommissioning the offending program, which dates from
pre-RFC822 days.  It generates Message-ID: headers in the old format
that has been illegal since RFC822 came out.  E.g.:

	Message-ID: <[A.ISI.EDU]29-Jul-89.05:18:51.CERF>

There is also a bug in GNUS, specifically in the function
"nntp-retrieve-headers" (nntp.el).  The code

     ;; Skip invalid field (ex. Subject:abc)
     (if (looking-at "^[^:]*:[^ \t]")
	 (forward-line 1))

makes the invalid assumption that a Message-ID cannot have a ":"
character in it (it is legal, by RFC822 anyway, if it's enclosed in
double quote characters).  If there is a ":" character in a
Message-ID, the cited code will skip over it, effectively merging the
headers of the current article with those of the next.
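
The effect of the pattern can be checked outside GNUS.  The following
is only a sketch: the helper name demo-invalid-field-p is made up for
illustration and is not part of nntp.el, and the sample lines are
taken from [Exhibit A] below.  It suggests that the line that actually
matches is the "221" response line that echoes the Message-ID (the
first ":" there is followed by a digit), whereas a "Message-ID: <...>"
header line does not match, because its first ":" is followed by a
space.

```elisp
;; Illustration only: `demo-invalid-field-p' is a made-up helper,
;; not part of nntp.el.  The regexp is the one quoted above.
(defun demo-invalid-field-p (line)
  "Return t if LINE matches the \"invalid field\" pattern."
  (if (string-match "^[^:]*:[^ \t]" line) t nil))

;; The case the check was presumably written for:
(demo-invalid-field-p "Subject:abc")
;; => t

;; An ordinary header line and an ordinary response line are left alone:
(demo-invalid-field-p "Message-ID: <623@dtix.ARPA>")
;; => nil
(demo-invalid-field-p
 "221 1160 <623@dtix.ARPA> Article retrieved; head follows.")
;; => nil

;; The response line for Vint's article is treated as an invalid field:
(demo-invalid-field-p
 "221 1161 <[A.ISI.EDU]29-Jul-89.05:18:51.CERF> Article retrieved; head follows.")
;; => t
```

If that response line is skipped, nothing marks the start of article
1161, which would be consistent with its headers being folded into
those of article 1160.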

I'm not sure why this code is thought to be necessary at all, since a
delete-non-matching-lines had already been run over the buffer to
flush anything except the desired header lines.  Commenting out the
code fixes the problem without any known bad effects.

In case the description is unclear, here is an example.  If the
contents of the nntp-server-buffer are [Exhibit A], the resulting GNUS
*Subject* buffer will be [Exhibit B].  I've removed all the trailing
<CR> characters from [Exhibit A], for ease of reading, and the NNTP
server seems to dislike lines with just a dot on them, so I've
replaced all occurrences of "\n.\n" with "\n{.}\n" in [Exhibit A].

[Exhibit A]
================================================================
221 1159 <447@warlock.UUCP> Article retrieved; head follows.
Path: mintaka!bloom-beacon!tut.cis.ohio-state.edu!cs.utexas.edu!uunet!dowjone!gregb
From: gregb@dowjone.UUCP (Gregory S. Baber)
Newsgroups: comp.dcom.lans,comp.protocols.tcp-ip,comp.unix.questions
Subject: help with WANs
Message-ID: <447@warlock.UUCP>
Date: 28 Jul 89 12:49:34 GMT
Date-Received: 29 Jul 89 06:43:17 GMT
Reply-To: gregb@dowjone.UUCP (Gregory S. Baber)
Distribution: world
Organization: Dow Jones, Inc. Princeton, NJ
Xref: mintaka comp.dcom.lans:534 comp.protocols.tcp-ip:1159 comp.unix.questions:2570 
Lines: 27
{.}
221 1160 <623@dtix.ARPA> Article retrieved; head follows.
Path: mintaka!bloom-beacon!tut.cis.ohio-state.edu!cica!iuvax!uxc.cso.uiuc.edu!tank!mimsy!dtix!curt
From: curt@dtix.ARPA (Curt Welch)
Newsgroups: comp.dcom.lans,comp.protocols.tcp-ip
Subject: "dead ports" on a Bridge LS/1 terminal server
Message-ID: <623@dtix.ARPA>
Date: 28 Jul 89 17:54:43 GMT
Date-Received: 29 Jul 89 11:21:04 GMT
Reply-To: curt@dtix.arpa
Followup-To: comp.dcom.lans
Distribution: usa
Organization: David Taylor Research Center, Bethesda, MD
Xref: mintaka comp.dcom.lans:536 comp.protocols.tcp-ip:1160 
Lines: 26
{.}
221 1161 <[A.ISI.EDU]29-Jul-89.05:18:51.CERF> Article retrieved; head follows.
Path: mintaka!bloom-beacon!tut.cis.ohio-state.edu!ucbvax!A.ISI.EDU!CERF
From: CERF@A.ISI.EDU
Newsgroups: comp.protocols.tcp-ip
Subject: Re: Announcing a little board-room shakeup
Message-ID: <[A.ISI.EDU]29-Jul-89.05:18:51.CERF>
Date: 29 Jul 89 09:18:00 GMT
Date-Received: 29 Jul 89 11:51:28 GMT
References: <Jul.26.17.48.59.1989.12721@hardees.rutgers.edu>
Sender: daemon@ucbvax.BERKELEY.EDU
Distribution: world
Organization: The Internet
Lines: 16
{.}
================================================================

[Exhibit B]
================================================================
  1159: [ 27:gregb@dowjone] help with WANs
  1160: [ 16:CERF@A.ISI.ED] Re: Announcing a little board-room shakeup
================================================================

Fix:

In the immortal words of the DEC Software Dispatch: "Please!"
I would think that just flushing the offending code would work, but it
would be nice to know why it's there before doing so.

--Rob Austein, MIT Lab for Computer Science

sra@lcs.mit.edu (Rob Austein) (07/31/89)

An afterthought.  The (delete-non-matching-lines) in
(nntp-retrieve-headers) is a little risky, since it doesn't take into
account the possibility of RFC822 line continuation within the
headers.  Adding the two lines

      (goto-char (point-min))
      (replace-regexp "[ \t]*\\(\n[ \t]+\\)+" " ")

just after the comment

      ;; First, delete unnecessary lines.

fixes this.
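
To show what the unfolding regexp does, here is a small sketch that
can be evaluated on its own.  The function name demo-unfold-headers,
the scratch buffer name, and the sample header text are all made up
for illustration.  It uses a re-search-forward/replace-match loop in
place of replace-regexp, which should have the same effect when called
from Lisp.

```elisp
;; Illustration only: `demo-unfold-headers' is a made-up name.
(defun demo-unfold-headers (text)
  "Return TEXT with RFC822 continuation lines joined to their header."
  (let ((buf (get-buffer-create " *demo-unfold*")))
    (save-excursion
      (set-buffer buf)
      (erase-buffer)
      (insert text)
      (goto-char (point-min))
      ;; Same regexp as above: optional trailing whitespace, then one
      ;; or more newlines that are each followed by whitespace.
      (while (re-search-forward "[ \t]*\\(\n[ \t]+\\)+" nil t)
        (replace-match " " t t))
      (buffer-string))))

;; A References: field folded onto a second line becomes one line,
;; while the following header is left untouched.
(demo-unfold-headers "References: <a@b>\n\t<c@d>\nLines: 16\n")
;; => "References: <a@b> <c@d>\nLines: 16\n"
```

Once the continuation is joined, a line-by-line filter such as
delete-non-matching-lines sees the whole field instead of dropping its
second half.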

--Rob