[news.software.nntp] Useful NNTP Patch

chris@wugate.wustl.edu (Chris Myers) (08/31/89)

Here is a set of patches that I made to NNTP 1.5 and BNews 2.11 that helps
reduce the CPU time and I/O for transmitting news via NNTP.  I have been using
these patches on wuarchive.wustl.edu for several days now without any
difficulties.

If you use CNews, discard the patch named 'news-2.11-patch' and whenever I
mention a 'Z' flag below substitute 'n' instead.

The original nntpxmit program would open each message, scan the headers and
extract the message ID before sending the "IHAVE" command.  This actually uses
a fair bit of CPU time and causes a LOT of I/O.  Since rnews already knows
the message ID it's much more efficient to have rnews write the message ID (and
the message path) into the batch file and just have NNTP read it back out later.

This patch won't do a lot for lightly loaded NNTP servers, but will help a
well-connected site a BUNCH.

What this set of patches does is:

a) Modify rnews/inews to add a 'Z' flag in the sys file (used instead of the
'F' flag).  The 'Z' option causes rnews to create batch file entries of the
form: "<path> <messageID>" rather than just "<path>".

b) Modify nntpxmit to check whether each batch file entry contains both the
message path and the message ID.  If it finds both, it will not scan the
message for the ID.  If it does NOT find both (i.e. the old batch file format)
it will go ahead and scan for the message ID.  Thus the new nntpxmit is
compatible with the old one and can operate with either type of batch file (or
a batch file with BOTH types of entries for that matter).
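
As a rough illustration of (b), here is a minimal sketch in C of splitting one
batch file line into its path and optional message ID.  This is not the code
from the patch below: the function name split_batch_line and its buffer-size
arguments are made up for the example, and it assumes the trailing newline has
already been stripped and that the path itself contains no space.

```c
#include <string.h>

/*
 * Split one batch-file line into an article path and an optional message ID.
 * Returns 1 if the line carried a message ID ("<path> <messageID>"),
 * 0 if it was an old-style path-only line, and -1 if a field would not
 * fit in the caller's buffer.  Hypothetical helper, not part of nntpxmit.
 */
int split_batch_line(const char *line, char *path, size_t pathlen,
                     char *msgid, size_t idlen)
{
    const char *sp = strchr(line, ' ');   /* first space ends the path */
    size_t n;

    if (pathlen == 0 || idlen == 0)
        return -1;

    /* Copy the path field, checking its length first. */
    n = (sp != NULL) ? (size_t)(sp - line) : strlen(line);
    if (n >= pathlen)
        return -1;
    memcpy(path, line, n);
    path[n] = '\0';

    msgid[0] = '\0';
    if (sp == NULL)
        return 0;                         /* old format: path only */

    while (*sp == ' ')                    /* skip the separator */
        sp++;

    /* The message ID runs to the next blank or the end of the line. */
    n = strcspn(sp, " \t");
    if (n == 0)
        return 0;                         /* nothing after the space */
    if (n >= idlen)
        return -1;
    memcpy(msgid, sp, n);
    msgid[n] = '\0';
    return 1;
}
```

A caller would scan the article for its message ID only when this returns 0,
which is the fallback behaviour described in (b).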

Here is a shar file containing two patches.  The first, 'news-2.11-patch',
should be applied to ifuncs.c in ./news-2.11/src and the second,
'nntp-1.5-patch', should be applied to nntpxmit.c in ./nntp-1.5/xmit.

Chris Myers
Washington University in St. Louis

-------------------------------- CUT HERE -----------------------------------
#!/bin/sh
# to extract, remove the header and type "sh filename"
if `test ! -s ./news-2.11-patch`
then
echo "writing ./news-2.11-patch"
cat > ./news-2.11-patch << '\End\Of\Shar\'
*** ifuncs.c	Tue Aug 29 19:13:27 1989
--- ../ifuncs.c	Tue Aug 29 19:32:37 1989
***************
*** 281,287 ****
  	int useexist = (index(sp->s_flags, 'U') != NULL);
  /* I:	append messageid to file. implies F flag */
  	int appmsgid = maynotify && (index(sp->s_flags, 'I') != NULL);
! 
  	/* allow specification based on size */
  	if ((size_ptr = strpbrk(sp->s_flags, "<>")) != NULL) {
  		struct stat stbuf;
--- 281,288 ----
  	int useexist = (index(sp->s_flags, 'U') != NULL);
  /* I:	append messageid to file. implies F flag */
  	int appmsgid = maynotify && (index(sp->s_flags, 'I') != NULL);
! /* Z:   append name AND messageid to file. implies F flag */
! 	int appboth = (index(sp->s_flags, 'Z') != NULL);
  	/* allow specification based on size */
  	if ((size_ptr = strpbrk(sp->s_flags, "<>")) != NULL) {
  		struct stat stbuf;
***************
*** 362,368 ****
  			sp->s_name, oldid, hh.ident);
  	}
  
! 	if (appfile || appmsgid) {
  		if (firstbufname[0] == '\0') {
  			extern char histline[];
  			localize("junk");
--- 363,369 ----
  			sp->s_name, oldid, hh.ident);
  	}
  
! 	if (appfile || appmsgid || appboth) {
  		if (firstbufname[0] == '\0') {
  			extern char histline[];
  			localize("junk");
***************
*** 392,399 ****
  		ofp = fopen(sp->s_xmit, "a");
  		if (ofp == NULL)
  			xerror("Cannot append to %s", sp->s_xmit);
! 		fprintf(ofp, "%s", appmsgid ? hh.ident :
  			firstbufname);
  #ifdef MULTICAST
  		while (--mc >= 0)
  			fprintf(ofp, " %s", *sysnames++);
--- 393,402 ----
  		ofp = fopen(sp->s_xmit, "a");
  		if (ofp == NULL)
  			xerror("Cannot append to %s", sp->s_xmit);
! 		if (!appboth) fprintf(ofp, "%s", appmsgid ? hh.ident :
  			firstbufname);
+ 		else fprintf(ofp, "%s %s", firstbufname, hh.ident);
+ 
  #ifdef MULTICAST
  		while (--mc >= 0)
  			fprintf(ofp, " %s", *sysnames++);
\End\Of\Shar\
else
  echo "will not over write ./news-2.11-patch"
fi
if [ `wc -c ./news-2.11-patch | awk '{printf $1}'` -ne 1837 ]
then
echo `wc -c ./news-2.11-patch | awk '{print "Got " $1 ", Expected " 1837}'`
fi
if `test ! -s ./nntp-1.5-patch`
then
echo "writing ./nntp-1.5-patch"
cat > ./nntp-1.5-patch << '\End\Of\Shar\'
*** nntpxmit.c.old	Tue Aug 29 19:56:36 1989
--- nntpxmit.c	Tue Aug 29 19:56:40 1989
***************
*** 368,373 ****
--- 368,374 ----
  #else
  	char	*mode = "r";
  #endif	FTRUNCATE
+ 	char	mesgid[255];
  
  	if ((Qfp = fopen(file, mode)) == (FILE *)NULL) {
  		char	buf[BUFSIZ];
***************
*** 408,415 ****
  		*/
  		catchsig(interrupted);
  
! 		while((fp = getfp(Qfp, Article, sizeof(Article))) != (FILE *)NULL) {
! 			if (!sendarticle(host, fp)) {
  				(void) fclose(fp);
  				requeue(Article);
  				Article[0] = '\0';
--- 409,416 ----
  		*/
  		catchsig(interrupted);
  
! 		while((fp = getfp(Qfp, Article, sizeof(Article), mesgid)) != (FILE *)NULL) {
! 			if (!sendarticle(host, fp, mesgid)) {
  				(void) fclose(fp);
  				requeue(Article);
  				Article[0] = '\0';
***************
*** 449,463 ****
  **	Watch all network I/O for errors, return FALSE if
  **		the connection fails and we have to cleanup.
  */
! sendarticle(host, fp)
  char	*host;
  FILE	*fp;
  {
  	register int	code;
  	char	buf[BUFSIZ];
  	char	*e_xfer = "%s xfer: %s";
  
! 	switch(code = ihave(fp)) {
  	case CONT_XFER:
  		/*
  		** They want it. Give it to 'em.
--- 450,465 ----
  **	Watch all network I/O for errors, return FALSE if
  **		the connection fails and we have to cleanup.
  */
! sendarticle(host, fp, mesgid)
  char	*host;
  FILE	*fp;
+ char	*mesgid;
  {
  	register int	code;
  	char	buf[BUFSIZ];
  	char	*e_xfer = "%s xfer: %s";
  
! 	switch(code = ihave(fp, mesgid)) {
  	case CONT_XFER:
  		/*
  		** They want it. Give it to 'em.
***************
*** 753,766 ****
  ** Read the header of a netnews article, snatch the message-id therefrom,
  ** and ask the remote if they have that one already.
  */
! ihave(fp)
  FILE	*fp;
  {
  	register int	code;
  	register char	*id;
  	char	buf[BUFSIZ];
  
! 	if ((id = getmsgid(fp)) == (char *)NULL || *id == '\0') {
  		/*
  		** something botched locally with the article
  		** so we don't send it, but we don't break off
--- 755,770 ----
  ** Read the header of a netnews article, snatch the message-id therefrom,
  ** and ask the remote if they have that one already.
  */
! ihave(fp, mesgid)
  FILE	*fp;
+ char	*mesgid;
+ 
  {
  	register int	code;
  	register char	*id;
  	char	buf[BUFSIZ];
  
! 	if ((strlen(mesgid) == 0) && ((id = getmsgid(fp)) == (char *)NULL || *id == '\0')) {
  		/*
  		** something botched locally with the article
  		** so we don't send it, but we don't break off
***************
*** 771,776 ****
--- 775,782 ----
  		return(ERR_GOTIT);
  	}
  
+         if (strlen(mesgid) > 0) id = mesgid;
+ 
  	if (!msgid_ok(id)) {
  		sprintf(buf, "%s: message-id syntax error: %s", Article, id);
  		log(L_DEBUG, buf);
***************
*** 801,814 ****
  ** Returns a valid FILE pointer or NULL if end of file.
  */
  FILE *
! getfp(fp, filename, fnlen)
  register FILE	*fp;
  char	*filename;
  register int	fnlen;
  {
  	register FILE	*newfp = (FILE *)NULL;
  	register char	*cp;
  	char	*mode = "r";
  
  	while(newfp == (FILE *)NULL) {
  		if (fgets(filename, fnlen, fp) == (char *)NULL)
--- 807,822 ----
  ** Returns a valid FILE pointer or NULL if end of file.
  */
  FILE *
! getfp(fp, filename, fnlen, mesgid)
  register FILE	*fp;
  char	*filename;
  register int	fnlen;
+ char	*mesgid;
  {
  	register FILE	*newfp = (FILE *)NULL;
  	register char	*cp;
  	char	*mode = "r";
+ 	char	buffer[255];
  
  	while(newfp == (FILE *)NULL) {
  		if (fgets(filename, fnlen, fp) == (char *)NULL)
***************
*** 822,827 ****
--- 830,840 ----
  
  		if (filename[0] == '\0')
  			continue;
+ 
+ 		if (index(filename, ' ') != NULL) {
+ 			sscanf(filename, "%s %s", buffer, mesgid);
+ 			strcpy(filename, buffer);
+ 		} else strcpy(mesgid, "");
  
  		if ((newfp = fopen(filename, mode)) == (FILE *)NULL) {
  			/*
\End\Of\Shar\
else
  echo "will not over write ./nntp-1.5-patch"
fi
if [ `wc -c ./nntp-1.5-patch | awk '{printf $1}'` -ne 3781 ]
then
echo `wc -c ./nntp-1.5-patch | awk '{print "Got " $1 ", Expected " 3781}'`
fi
echo "Finished archive 1 of 1"
exit

jerry@olivey.olivetti.com (Jerry Aguirre) (09/01/89)

In article <237@wugate.wustl.edu> chris@wugate.wustl.edu (Chris Myers) writes:
>a) Modify rnews/inews to add a 'Z' flag in the sys file (used instead of the
>'F' flag).  The 'Z' option causes rnews to create batch file entries of the
>form: "<path> <messageID>" rather than just "<path>".

I have also made this change locally and have found it useful.  I used
the "Q" (Queue) flag instead of "Z" but that is, of course, not
important.

A more important difference is in how the file was organized.  I write
it as:

	<messageID> pathname

instead of the reverse.  This makes it much simpler to check for the new
format.  It is only necessary to check if the first character is a "<".
I was a little leery of the other format because there is no guarantee
that a "<" won't be embedded in a path name.  For example if one did an
index for "<" on a pathname that looked like:

	/usr/spool/newsfile/<890831FE03@foo.bar>

one could confuse the pathname with an ID.  The above is a perfectly
legal path name in Unix and I can imagine a news system that would use
something like that.

I have discussed this with others and they have convinced me that it
would be desirable to accept multiple pathnames and even no pathnames at
all.  (This is primarily for compatibility with CNEWS though others
might benefit.)  The optional formats would then be:

		pathname
		pathname pathname ...
		<messageID>
		<messageID> pathname
		<messageID> pathname pathname ...

Handling a message ID by itself shouldn't be too difficult as there is
already code to handle a request for an article ID and find the pathname
from the history file.  I happen to feel this is a high overhead way of
doing it but as it is not difficult to code it should probably be
supported.  The multiple pathnames handle the case where the article
might be expired early in one group but still exist in another.
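
To make the five formats above concrete, here is a minimal sketch in C of a
parser that accepts all of them.  It is only an illustration, not code from
any news or NNTP distribution: the struct batch_entry, the parse_entry
function and the MAX_PATHS limit are invented for the example, and it
assumes fields are separated by blanks and that pathnames contain none.

```c
#include <string.h>

#define MAX_PATHS 8            /* arbitrary limit for this sketch */

struct batch_entry {
    char *msgid;               /* NULL when the line has no message ID */
    char *paths[MAX_PATHS];    /* pointers into the caller's line buffer */
    int   npaths;              /* may be 0 for a bare "<messageID>" line */
};

/*
 * Parse one batch line in place (the line buffer is modified by strtok).
 * A leading '<' on the first field marks it as a message ID; every
 * remaining field is taken as a pathname.
 * Returns 0 on success, -1 on a blank line or too many pathnames.
 */
int parse_entry(char *line, struct batch_entry *e)
{
    char *tok;

    e->msgid = NULL;
    e->npaths = 0;

    tok = strtok(line, " \t\n");
    if (tok == NULL)
        return -1;                     /* blank line */

    if (tok[0] == '<') {               /* only the first character is tested */
        e->msgid = tok;
        tok = strtok(NULL, " \t\n");
    }

    while (tok != NULL) {
        if (e->npaths == MAX_PATHS)
            return -1;
        e->paths[e->npaths++] = tok;
        tok = strtok(NULL, " \t\n");
    }
    return 0;
}
```

With an entry parsed this way, a transmitter could try each pathname in turn
and fall back to a history lookup when npaths is 0 or none of the files can
be opened.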

I suggest that we decide on a standard for NNTP that the different
versions of news can code for.  Making the nntpxmit code flexible now
will prevent future hacking on the code to make it work for "DNews".

				Jerry

coolidge@brutus.cs.uiuc.edu (John Coolidge) (09/01/89)

jerry@olivey.olivetti.com (Jerry Aguirre) writes:
>A more important difference is in how the file was organized.  I write
>it as:

>	<messageID> pathname

>instead of the reverse.  This makes it much simpler to check for the new
>format.  It is only necessary to check if the first character is a "<".
>I was a little leery of the other format because there is no guarantee
>that a "<" won't be embedded in a path name.  For example if one did an
>index for "<" on a pathname that looked like:

>	/usr/spool/newsfile/<890831FE03@foo.bar>

>one could confuse the pathname with an ID.  The above is a perfectly
>legal path name in Unix and I can imagine a news system that would use
>something like that.

You can't confuse a message ID with a pathname in the original format

pathname <messageID>

because the whitespace separating the pathname and messageID is enough to
make the distinction between the two obvious. Spaces are illegal in both
pathnames and messageID's, so the space is a unique separator.

Another (and, at least in my case, a more immediately important) reason
I like the original system better is that C News comes preequipped with
the ability to generate files of this form. Just replace the 'F' with an
'n' in the sys file entry, install the nntp side of the patch, and you're
saving i/o bandwidth immediately.

--John

--------------------------------------------------------------------------
John L. Coolidge     Internet:coolidge@cs.uiuc.edu   UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.

coolidge@brutus.cs.uiuc.edu (John Coolidge) (09/01/89)

I write:
>Spaces are illegal in both
>pathnames and messageID's, so the space is a unique separator.

Of course this isn't true, as was pointed out to me via e-mail. Pathnames
can indeed have spaces. MessageID's, however, cannot have either spaces or
left angle brackets (other than the required starting bracket), and must close
with a right angle bracket, so as long as the filename is not of the form
/news/path/text <stuff>
(where the < is part of the filename, not a shell command), then the
proposed system implemented by the patch is safe. As long as a news
filename never includes the sequence "space left-angle-bracket"  the
system should still be safe, since space left-angle-bracket would then
be unambiguously the start of a messageid. I doubt that restriction is
likely to bother implementors too much :-)
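
For what it's worth, the rule above can be sketched in C roughly as follows.
This is only an illustration and not what the posted patch does (the patch
splits at the first space): the function name split_at_msgid is made up, and
the sketch assumes the line has already had its newline removed.

```c
#include <string.h>

/*
 * Look for a trailing message ID introduced by the sequence " <".
 * On success the separating space is overwritten with '\0', so that
 * line holds just the pathname, and a pointer to the message ID is
 * returned.  Returns NULL (leaving line untouched) when no message ID
 * is present.  Hypothetical helper, not part of nntpxmit.
 */
char *split_at_msgid(char *line)
{
    char *p, *last = NULL;
    size_t len;

    /* Find the last occurrence of " <" in the line. */
    for (p = strstr(line, " <"); p != NULL; p = strstr(p + 1, " <"))
        last = p;
    if (last == NULL)
        return NULL;

    /* The candidate ID must end with '>' and contain no spaces. */
    len = strlen(last + 1);
    if (last[len] != '>')
        return NULL;
    if (strchr(last + 1, ' ') != NULL)
        return NULL;

    *last = '\0';
    return last + 1;
}
```

A pathname with a space in it, or with an embedded "<" that is not preceded
by a space, comes through intact under this rule.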

--John

--------------------------------------------------------------------------
John L. Coolidge     Internet:coolidge@cs.uiuc.edu   UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.

sob@watson.bcm.tmc.edu (Stan Barber) (09/16/89)

I agree that we need to make nntp smarter about various versions of news.
This is why I added the CNEWS stuff to the last patch. I think nntp and
news will remain somewhat intertwined for some time (at least until
a major rewrite of nntp comes along). So, I'd say that if you change your
news software, you may need to recompile NNTP to adapt it to what you have
done. I don't think adding adaptive mechanisms to NNTP is necessary.

STAN
--
Stan           internet: sob@bcm.tmc.edu         Manager, Networking
Olan           uucp: {rutgers,mailrus}!bcm!sob   Information Technology
Barber         Opinions expressed are only mine. Baylor College of Medicine