[news.software.b] News header compliance...

lamy@ai.toronto.edu (Jean-Francois Lamy) (06/23/88)

The machine that injected the message in the news system does indeed run C
news.  utcsri runs an archaic B news -- the machine is being phased out, and I
have not heard of problems there.  The news posting script is my own, heavily
modified, version of the C news postnews script.  The news machine supports
about 80 news "clients" from at least 5 different subdomains of toronto.edu,
and the simplest solution to achieve centralized posting was to use mail and
use as much of the mail header as possible (at least that way the return
addresses are sane and marginally harder to spoof).  Yes, we know about NNTP.

I don't think the tab is inserted on purpose.  It may have been inserted by a
mailer, could easily be filtered out, and in any case *I* certainly have no
religious feelings about it.

Jean-Francois Lamy               lamy@ai.utoronto.ca, uunet!ai.utoronto.ca!lamy
AI Group, Department of Computer Science, University of Toronto, Canada M5S 1A4

clewis@spectrix.UUCP (Chris Lewis (It's loose again!)) (06/23/88)

I've been having our B News (2.11 patch 14) hang occasionally while trying
to parse batched news - goes into some sort of loop when it decides
that an article is garbled.  I'm still trying to fix the hang problem
(the "#!rnews" piping business appears to hang in the write),
but I've noticed that this usually comes from an article from utcsri
(often from one of the upcoming events articles).  

This is part of an article:

>#! rnews 6703
>Newsgroups: ut.na
>Path: tmsoft!utgpu!jarvis.csri.toronto.edu!csri.toronto.edu!krj
>From: krj@csri.toronto.edu (Ken Jackson)
>Subject: NA Digest   Volume 88 : Issue 24
>Message-ID: <8806201610.AA16854@gerrard.csri.toronto.edu>
>Organization: University of Toronto, CSRI
>Distribution: ut
>Date:	Mon, 20 Jun 88 10:50:36 EDT
>
... rest of article deleted ...

The article is considered garbled in this case because B-news was
unable to parse out the date.  Please look at the date shown above - after
the ":" there's a tab instead of a space.  Bnews does the parse by
means of a heavily optimized macro/function combination that effectively
does this:

	if (strncmp(artline, "Date: ", strlen("Date: ")) == 0) ...
				   ^--- space

Obviously, the comparison fails on a tab, so B-news will always consider
such an article garbled.
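
A minimal sketch of a more tolerant check, for illustration only.  The
helper name header_value is hypothetical and is not taken from the B news
source; it assumes a header line is a keyword, a colon, and then at least
one space or tab before the value.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not B news code): if `line` starts with the
 * keyword `name`, a colon, and at least one space or tab, return a
 * pointer to the first character of the value.  Otherwise return NULL.
 * The keyword comparison ignores case.
 */
const char *header_value(const char *line, const char *name)
{
    size_t i, n;

    assert(line != NULL && name != NULL);
    n = strlen(name);
    for (i = 0; i < n; i++) {
        /* A NUL in `line` never matches a keyword character, so the
           loop stops before running past the end of a short line. */
        if (tolower((unsigned char)line[i]) !=
            tolower((unsigned char)name[i]))
            return NULL;
    }
    if (line[n] != ':')
        return NULL;
    if (line[n + 1] != ' ' && line[n + 1] != '\t')
        return NULL;

    /* Skip the whole run of blanks after the colon. */
    line += n + 1;
    while (*line == ' ' || *line == '\t')
        line++;
    return line;
}
```

With this, header_value("Date:\tMon, 20 Jun 88 10:50:36 EDT", "Date")
points at "Mon, 20 Jun 88 10:50:36 EDT", the same result as for the
space-separated form, while "Date:Mon" (no blank at all) is still rejected.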

I've sent off a copy of this mail item to some of the local SA's
(including Henry Spencer who hasn't replied yet).

B-news doesn't appear to like tabs, but C-news (according to some of
the local C-newsing SA's, eg: Dave Mason at tmsoft, or Rayan at utai) 
does accept them, and this is supposedly legal according to the 
USENET RFC's (of which I don't presently have a copy).  However, the 
"standards.mn" document (that comes with 2.11) says:

	A message consists of several header lines, followed by a blank line,
	followed by the body of the message.  The header lines consist of a
	keyword, a colon, a blank, and some additional information.  This is a
	subset of the ARPANET standard, simplified to allow simpler software to
	handle it.

What do we do now?  Should B-news be hacked to accept tabs here?  It's
very easy to do.  However, as more people convert to C-news (What
does B-news 3.0 do?) this will represent a bigger and bigger problem to
people who can't or won't upgrade past their current B-news.

Comments?

Thanks,
-- 
Chris Lewis, Spectrix Microsystems Inc, Phone: (416)-474-1955
UUCP: {uunet!mnetor, utcsri!utzoo, lsuc, yunexus}!spectrix!clewis
Moderator of the Ferret Mailing List (ferret-list,ferret-request@spectrix)

clewis@spectrix.UUCP (Chris Lewis (It's loose again!)) (06/24/88)

In article <674@spectrix.UUCP>, clewis@spectrix.UUCP (Chris Lewis (It's loose again!)) writes:
> I've been having our B News (2.11 patch 14) hang occasionally while trying
> to parse batched news - goes into some sort of loop when it decides
> that an article is garbled.  I'm still trying to fix the hang problem
> (the "#!rnews" piping business appears to hang in the write),
> but I've noticed that this usually comes from an article from utcsri
> (often from one of the upcoming events articles).  

[synopsis: when an incoming batch contains an article with a tab instead
of a space after the "Date:" token, rnews will report that the article
is garbled and then sometimes hang.]

I seem to have resolved the hanging problem.

SYNOPSIS: 
	When the "rnews -S" invocation is parsing batches and tossing
	"#! rnews nnn"-prefixed chunks to separate forks of itself, if the
	forked rnews determines that the article is garbled, it will exit
	and the "rnews -S" will hang in the write to the pipe - potentially
	forever.  This happens regardless of whether the batch is
	compressed, or how many other articles are in the batch.

DISCUSSION:

	The "rnews -S" personality decides whether to use a pipe or a
	temporary file to pass one article to a forked copy of itself.
	If the article is smaller than the buffer (CPBFSZ in ifuncs.c)
	it pipes it; otherwise it dumps the article to a "/tmp/unb*"
	temporary.  It then forks itself with either the pipe or the
	temporary file as standard input.

	If the forked rnews determines that the message is garbled,
	it exits immediately.  HOWEVER, if the article is bigger than
	the in-core kernel pipe buffer size (PIPEMAX in some System V
	implementations - see SVID, or pipe(2)), but smaller than CPBFSZ,
	the write will not return immediately - it has written the
	first pipe-buffer-full, but has to wait for the destination
	to read the buffer so as to send the rest of the write buffer - 
	which never gets read since the child rnews has exited.

	This appears to be a problem only when the article is bigger than
	the in-core kernel pipe buffer size but smaller than CPBFSZ (which
	means CPBFSZ itself must exceed the pipe buffer size).  On Xenix
	2.1.3 and NCR Tower SVR2, and on many other systems (e.g. V7),
	though we only have documentation for those two, the pipe buffer
	size is 5120.  CPBFSZ is 8192 (which I *think* is the BSD pipe
	buffer size).
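
The size window described above can be written down as a small predicate.
This is only a sketch: write_can_block is a hypothetical name, and it
assumes "fits in the buffer" means a length less than or equal to CPBFSZ,
which the article does not state exactly.

```c
#include <stddef.h>

/*
 * Hypothetical predicate (not B news code).  Returns 1 when an article
 * of `article_len` bytes would be sent through the pipe (it fits in the
 * CPBFSZ-sized buffer) yet is larger than what the kernel pipe can hold,
 * so a single write() to a child that never reads could block.
 */
int write_can_block(size_t article_len, size_t cpbfsz, size_t pipe_capacity)
{
    /* Assumption: the pipe path is taken when the article fits in buf. */
    int piped = article_len <= cpbfsz;

    return piped && article_len > pipe_capacity;
}
```

With the figures quoted above (pipe buffer 5120, CPBFSZ 8192) a 6K article
falls inside the window.  With CPBFSZ lowered to 4096 the window is empty,
because anything bigger than 4096 bytes goes to a temporary file instead.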

REPEAT BY:
	Create a batch file (with "#! rnews" header) which has an 
	article about 6K long and garble the "Date: " header (eg: 
	change the space to a tab).

	Then issue:
		rnews -S < batchfile

	It will say something like:
		inews: : Inbound news is garbled.
	Then hang.

	If it doesn't hang, then congrats, you're probably on a machine
	with a bigger pipe buffer, and you don't need to do anything.
	Otherwise, you might want to do the fix given below.  The fix
	is relatively harmless even if you don't *have* to do it - there'll
	simply be a few more articles using temporary files rather than
	pipes - only a slight performance hit.

FIX:
	In ifuncs.c, find the "#define" for "CPBFSZ" and change it
	to be the same as or smaller than your pipe buffer size.  We
	chose 4096.

	Rebuild, reinstall and voila.

NOTE: This is a fragment of the piping code in ifuncs.c:

	/* parent of fork */
	if (rc == asize) {
		/* article fits in buffer */
		wc = write(piped[1], buf, rc);
		if (wc != rc) {
			fprintf(stderr, "write of %d to pipe returned %d",
				rc, wc);
			perror("rnews: write");
			exit(1);
		}
		(void) close(piped[0]);
		(void) close(piped[1]);
	}

We figured that part of the reason the write didn't terminate was that
the "close(piped[0])" comes *after* the write, not before.  Both sides
of the fork therefore still have the read side of the pipe open, so the
parent gets no broken pipe when the child exits.  I tried moving it
(before changing CPBFSZ - which is the size of buf in the write), and 
the test shown above under "REPEAT BY" then printed one "garbled" message, 
a blank line, and ignored the next article in the batch.
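
For illustration, here is a self-contained sketch of the broken-pipe
behaviour discussed above.  It is not the rnews code: feed_child_that_exits
and write_all are hypothetical names, the child simply exits without
reading (standing in for "article is garbled"), and SIGPIPE is ignored so
that the failed write shows up as EPIPE instead of killing the parent.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write all of buf to fd.  Returns 0 on success, -1 on error (errno set). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            return -1;          /* EPIPE once no reader is left */
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical stand-in for the parent side of the fork.  Returns 0 if
 * everything was written, or -1 with errno == EPIPE if the child went
 * away before reading it all.
 */
int feed_child_that_exits(const char *buf, size_t len)
{
    int piped[2], rc, saved;
    pid_t pid;

    if (pipe(piped) < 0)
        return -1;
    signal(SIGPIPE, SIG_IGN);   /* process-wide: report EPIPE, don't die */

    pid = fork();
    if (pid < 0) {
        close(piped[0]);
        close(piped[1]);
        return -1;
    }
    if (pid == 0) {             /* child: "garbled", exit without reading */
        close(piped[1]);
        _exit(1);
    }

    close(piped[0]);            /* parent drops the read end BEFORE writing */
    rc = write_all(piped[1], buf, len);
    saved = errno;
    close(piped[1]);
    waitpid(pid, NULL, 0);
    errno = saved;
    return rc;
}
```

Because the parent no longer holds the read end, the child's exit leaves
the pipe with no readers and the write fails with EPIPE rather than
waiting forever.  A real fix would still have to decide what to do with
the rest of the batch after such a failure.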
-- 
Chris Lewis, Spectrix Microsystems Inc, Phone: (416)-474-1955
UUCP: {uunet!mnetor, utcsri!utzoo, lsuc, yunexus}!spectrix!clewis
Moderator of the Ferret Mailing List (ferret-list,ferret-request@spectrix)

eric@snark.UUCP (Eric S. Raymond) (06/24/88)

In article <674@spectrix.uucp>, Chris Lewis (It's loose again!) writes:
>What do we do now?  Should B-news be hacked to accept tabs here?  It's
>very easy to do.  However, as more people convert to C-news (What
>does B-news 3.0 do?) this will represent a bigger and bigger problem to
>people who can't or won't upgrade past their current B-news.

The Right Thing in these situations is to be liberal about what you accept
and conservative about what you send. Accordingly, B3.0 won't barf on tabs
immediately after the colon, but (for header fields over which it has format
control at header write time) it always generates a space there. I say unto the
C news people: go thou and do likewise.
-- 
      Eric S. Raymond                     (the mad mastermind of TMN-Netnews)
      UUCP: {{uunet,rutgers,ihnp4}!cbmvax,rutgers!vu-vlsi,att}!snark!eric
      Post: 22 South Warren Avenue, Malvern, PA 19355   Phone: (215)-296-5718

clewis@spectrix.UUCP (Chris Lewis (It's loose again!)) (06/29/88)

In article <dSsed#3HOKJG=eric@snark.UUCP> eric@snark.UUCP (Eric S. Raymond) writes:
>In article <674@spectrix.uucp>, Chris Lewis (It's loose again!) writes:
>>What do we do now?  Should B-news be hacked to accept tabs here?  It's
>>very easy to do.  However, as more people convert to C-news (What
>>does B-news 3.0 do?) this will represent a bigger and bigger problem to
>>people who can't or won't upgrade past their current B-news.

>The Right Thing in these situations is to be liberal about what you accept
>and conservative about what you send. Accordingly, B3.0 won't barf on tabs
>immediately after the colon, but (for header fields over which it has format
>control at header write time) it always generates a space there. I say unto the
>C news people: go thou and do likewise.

So, I went and did likewise for B news.

For the B news 2.11 people, 2.11 *almost* does likewise.  It has "charmap" -
a character mapping array that is used for comparing strings when you don't
want to worry about case.  To fix the "tab problem":

	In file "funcs.c", look for the initialization of "charmap".
	In the second line of initialization, the second char is '\011'
	(map tab to tab).  Change that to '\040' (map tab to space).

	Rebuild.  Voila.

What this means is that for the purposes of string comparisons, consider
tab to be a space.  This is in line with charmap's normal usage: a case
translation table to be used in header field comparisons (so MESSAGE-ID
is the same as Message-ID etc).  Charmap is used elsewhere, but I don't
think that this will cause other problems.  Further, when 2.11 regenerates
the article to stuff into the spool area (hence of course outgoing batches), 
the thing after the colon is forced to be a space.  Exactly the "Right Thing". 
[Thanks Eric for the phone call and the consult...]
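
As a rough sketch of the idea (this is not the 2.11 source): the table and
function names below are hypothetical, and folding upper case to lower
case is an assumption about the direction of the mapping.  The point is
only that one extra table entry makes a tab compare equal to a space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical comparison table, in the spirit of "charmap". */
static unsigned char cmap[256];

/* Identity map, then fold case and treat tab as space (ASCII assumed). */
void cmap_init(void)
{
    int c;

    for (c = 0; c < 256; c++)
        cmap[c] = (unsigned char)c;
    for (c = 'A'; c <= 'Z'; c++)
        cmap[c] = (unsigned char)(c - 'A' + 'a');
    cmap['\t'] = ' ';           /* the one-entry "tab fix" */
}

/* Returns 1 if `line` begins with `prefix` under the mapping, else 0. */
int prefix_eq(const char *line, const char *prefix)
{
    assert(line != NULL && prefix != NULL);
    for (; *prefix != '\0'; line++, prefix++) {
        /* A NUL in `line` maps to 0 and so never matches a prefix char. */
        if (cmap[(unsigned char)*line] != cmap[(unsigned char)*prefix])
            return 0;
    }
    return 1;
}
```

After cmap_init(), prefix_eq("Date:\tMon", "Date: ") and
prefix_eq("DATE: Mon", "date: ") both succeed, while "Date:Mon" still
fails because there is no blank after the colon.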

This is only necessary if one of your neighboring feeds is a C-news site,
or someone's generating articles thru some non-standard approach.

So, for those C news sites generating headers with tabs (utcsri, uthub etc)
- well, if your stuff goes thru us, we'll fix it for you so that the rest
of the world can see it.  (For the next couple of days at least...)
-- 
Chris Lewis, 
These addresses will last only til June 30: (clewis@lsuc afterwards)
UUCP: {uunet!mnetor, utcsri!utzoo, lsuc, yunexus}!spectrix!clewis
Moderator of the Ferret Mailing List (ferret-list,ferret-request@spectrix)