[news.software.nntp] SPOOLNEWS & nntp

david@ms.uky.edu (David Herron -- One of the vertebrae) (07/28/88)

It's amazing what you'll do when you're bored.  One afternoon I moved the
SPOOLNEWS code from inews.c into nntpd to see what sort of effect it would
have on the system load.  It "seems" to help quite a bit.  [It's rather
hard to test, you see.]

What we had noticed here was that we'd often have 3 or 4 nntpd's running
at a time sending us news.  When nntpd receives an article it starts up
an rnews to handle the reception.  While that's nice and clean and elegant,
that rnews in reality executes very little code, which raises the overhead
percentage.  That is, it always costs "x" amount of resources to start up
a process, and the whole process execution costs "y"; since very little
is done to insert the article into the system (when you have SPOOLNEWS
defined), "y" is very close to "x".

Moving the SPOOLNEWS stuff into nntpd avoids "x" -- or rather, it avoids
having many "x"'s going on all at once.  Eventually, when the "rnews -U"
is run, all those "x"'s will happen anyway, but now they'll be happening
serially rather than in parallel.  (Parallel would be ok if this were
running over on our Sequent, but it's not over there yet ... instead
we're on a uVaxII with 13 megs of memory.)

The patch includes a context diff for server/spawn.c and also a line to
add to common/conf.h, and is derived from unadulterated nntp v1.5 sources.


As an added bonus you get to see one of the silliest programs I've ever
written.  I'm almost embarrassed to post this, especially since I'm going
to be looking for a job later this year :-).  Anyway.  The program is
probably the MOST stupid way of unbatching a batch in existence.  It
looks for "#! rnews" lines and assumes that each one is the beginning of
a new article.  It also strips blanks out of otherwise empty lines, strips
away empty lines before headers and otherwise cleans up the batch file.
All this so that I can have a news feed from an IBM mainframe at Penn State.
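
For comparison, here is a minimal sketch of recognizing a "#! rnews"
separator line as a whole, rather than character by character.  The
helper name batch_header_len is made up for illustration and is not part
of split.batch.c; it also parses the byte count that follows the magic
string, which the posted program reads and throws away.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the byte count from a "#! rnews <count>" batch separator line,
 * or -1 if the line is not a separator.  (Hypothetical helper; not part
 * of split.batch.c.)
 */
long batch_header_len(const char *line)
{
	static const char magic[] = "#! rnews ";
	const char *digits;
	char *end;
	long n;

	assert(line != NULL);
	if (strncmp(line, magic, sizeof magic - 1) != 0)
		return -1;
	digits = line + sizeof magic - 1;
	n = strtol(digits, &end, 10);
	if (end == digits || n < 0)
		return -1;	/* no usable count after the magic string */
	return n;
}
```

A splitter that trusts the separator lines (as split.batch.c does) only
needs to know whether the result is -1 or not; the count itself matters
only to an unbatcher that reads exactly that many bytes.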

The reason that it's being posted is that it also has had the SPOOLNEWS
code put into it.



#! /bin/sh
: This is a shell archive, meaning:
: 1. Remove everything above the '#! /bin/sh' line.
: 2. Save the resulting text in a file.
: 3. Execute the file with /bin/sh '(not csh)' to create the files:
:	'conf.DIFF'
:	'spawn.DIFF'
:	'split.batch.c'
: This archive created: 'Wed Jul 27 16:03:27 1988'
: By:	'David Herron -- One of the vertebrae ()'
export PATH; PATH=/bin:$PATH
echo shar: extracting "'conf.DIFF'" '(61 characters)'
if test -f 'conf.DIFF'
then
	echo shar: will not over-write existing file "'conf.DIFF'"
else
sed 's/^X//'  >'conf.DIFF' <<'SHAR_EOF'
X#define SPOOLNEWS	/* Emulate the SPOOLNEWS code from news */
SHAR_EOF
if test 61 -ne "`wc -c < 'conf.DIFF'`"
then
	echo shar: error transmitting "'conf.DIFF'" '(should have been 61 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'spawn.DIFF'" '(3343 characters)'
if test -f 'spawn.DIFF'
then
	echo shar: will not over-write existing file "'spawn.DIFF'"
else
sed 's/^X//'  >'spawn.DIFF' <<'SHAR_EOF'
X*** spawn.c.orif	Tue May 31 10:39:10 1988
X--- spawn.c	Thu Jul 14 22:02:37 1988
X***************
X*** 3,9 ****
X  #endif
X  
X  #include "common.h"
X! 
X  #include <signal.h>
X  
X  #ifdef XFER_TIMEOUT
X--- 3,10 ----
X  #endif
X  
X  #include "common.h"
X! #include <stdio.h>
X! #include <time.h>
X  #include <signal.h>
X  
X  #ifdef XFER_TIMEOUT
X***************
X*** 11,16 ****
X--- 12,33 ----
X  static int	old_xfer_lines;
X  #endif
X  
X+ 
X+ char *
X+ errmsg(code)
X+ int code;
X+ {
X+ 	extern int sys_nerr;
X+ 	extern char *sys_errlist[];
X+ 	static char ebuf[6+5+1];
X+ 
X+ 	if (code > sys_nerr) {
X+ 		(void) sprintf(ebuf, "Error %d", code);
X+ 		return ebuf;
X+ 	} else
X+ 		return sys_errlist[code];
X+ }
X+ 
X  static char	tempfile[256];
X  
X  /*
X***************
X*** 59,66 ****
X--- 76,88 ----
X  #endif not USG
X  	register FILE	*fp;
X  
X+ #ifdef SPOOLNEWS
X+ 	(void) sprintf(tempfile, "%s/.spXXXXXX", spooldir);
X+ 	(void) mktemp(tempfile);
X+ #else
X  	(void) strcpy(tempfile, "/tmp/rpostXXXXXX");
X  	(void) mktemp(tempfile);
X+ #endif /* SPOOLNEWS */
X  
X  	fp = fopen(tempfile, "w");
X  	if (fp == NULL) {
X***************
X*** 122,127 ****
X--- 144,209 ----
X  	(void) chown(tempfile, uid_poster, gid_poster);
X  #endif
X  
X+ 	/*
X+ 	 * Ok, now we have the article in "tempfile".  We
X+ 	 * should be able to fork off, close fd's 0 to 31 (or
X+ 	 * whatever), open "tempfile" for input, thus making
X+ 	 * it stdin, and then execl the inews.  We think.
X+ 	 */
X+ 
X+ #ifdef SPOOLNEWS
X+ 
X+ 	{
X+ 	register struct tm *tp;
X+ 	time_t t;
X+ #define BUFLEN 512
X+ 	char buf[BUFLEN];
X+ 	extern struct tm *gmtime();
X+ 	int randno = getpid();
X+ 
X+ 	(void) time(&t);
X+ 	tp = gmtime(&t);
X+ retry:
X+ 	/* This file name "has to" be unique  (right?) */
X+ 	(void) sprintf(buf, "%s/.rnews/%02d%02d%02d%02d%02d%x",
X+ 		spooldir,
X+ 		tp->tm_year, tp->tm_mon+1, tp->tm_mday,
X+ 		tp->tm_hour, tp->tm_min, randno);
X+ 
X+ 	if (link(tempfile, buf) < 0) {
X+ 		char dbuf[BUFLEN];
X+ 		if (errno == EEXIST) {
X+ 			randno++;
X+ 			goto retry;
X+ 		}
X+ 		sprintf(dbuf, "%s/.rnews", spooldir);
X+ #define N_UMASK 022 /* from localize.ukma */
X+ 		if (mkdir(dbuf, 0777&~N_UMASK) < 0) {
X+ 			sprintf(errbuf, "%s dospool: Cannot mkdir %s: %s",
X+ 				hostname, dbuf, errmsg(errno));
X+ # ifdef LOG
X+ 			syslog(LOG_ERR, "%s", errbuf);
X+ 			/* xerror("Cannot mkdir %s: %s", dbuf, errmsg(errno)); */
X+ #endif
X+ 			return(-1);
X+ 		}
X+ 		if (link(tempfile, buf) < 0) {
X+ 			sprintf(errbuf,  "%s dospool: Cannot link(%s,%s): %s",
X+ 				hostname, tempfile, buf, errmsg(errno));
X+ # ifdef LOG
X+ 			syslog(LOG_ERR, "%s", errbuf);
X+ 			/* xerror("Cannot link(%s,%s): %s", tempfile, buf,
X+ 			 *	errmsg(errno)); */
X+ #endif
X+ 			return(-1);
X+ 		}
X+ 	}
X+ 	(void) unlink(tempfile);
X+ 	return(1);
X+ 	}
X+ 
X+ #else /* SPOOLNEWS */
X+ 
X  	/* Set up a pipe so we can see errors from rnews */
X  
X  	if (pipe(fds) < 0) {
X***************
X*** 132,144 ****
X  		return (-1);
X  	}
X  
X- 	/*
X- 	 * Ok, now we have the article in "tempfile".  We
X- 	 * should be able to fork off, close fd's 0 to 31 (or
X- 	 * whatever), open "tempfile" for input, thus making
X- 	 * it stdin, and then execl the inews.  We think.
X- 	 */
X- 
X  	pid = vfork();
X  	if (pid == 0) {		/* We're in child */
X  #ifdef POSTER
X--- 214,219 ----
X***************
X*** 225,230 ****
X--- 300,306 ----
X  			
X  		return (exit_status ? -1 : 1);
X  	}
X+ #endif /* SPOOLNEWS */
X  }
X  
X  #ifdef XFER_TIMEOUT
SHAR_EOF
if test 3343 -ne "`wc -c < 'spawn.DIFF'`"
then
	echo shar: error transmitting "'spawn.DIFF'" '(should have been 3343 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'split.batch.c'" '(3105 characters)'
if test -f 'split.batch.c'
then
	echo shar: will not over-write existing file "'split.batch.c'"
else
sed 's/^X//'  >'split.batch.c' <<'SHAR_EOF'
X
X#include <stdio.h>
X#include <time.h>
X#include <sys/types.h>
X
X#define SPOOLNEWS
X
X#ifdef SPOOLNEWS
X
X#define spooldir "/net/spool.news"
X
Xextern struct tm *gmtime();
Xstruct tm *curtime2;
Xtime_t curtime1;
X
Xextern char *mktemp();
Xchar tempfile[256], outfile[256];
Xint tfd;
X
X#endif
X
Xmain(argc, argv)
Xint argc;
Xchar **argv;
X{
X	FILE *outf, *fopen();
X	register int bol, opn, c, boa;
X	int spccount;
X	char tmp[100], *command;
X	int len, randno, getpid();
X
X	/* puts("beginning");
X	 * fflush(stdout);
X	 * puts(argv[0]);
X	 * fflush(stdout);
X	 * puts(argv[1]);
X	 * fflush(stdout); */
X	if (argc != 2) {
X		fprintf(stderr, "Usage: split.batch command <file\n");
X		exit(1);
X	}
X	command = argv[1];
X	randno = getpid();
X	bol = (1==1);
X	boa = (1==1);
X	opn = (1==0);
X	while((c=getchar()) != EOF) {
X		if (bol && c == ' ') {
X			spccount = 1;
X			while ((c=getchar()) == ' ')
X				spccount++;
X			if (c == EOF) {
X				for (; spccount >= 1; spccount--)
X					fputc(' ', outf);
X				goto out;
X			}
X			if (c == '\n' && boa) {
X				bol = (1==1);
X				continue;
X			}
X			if (c != '\n')
X				for (; spccount >= 1; spccount--)
X					fputc(' ', outf);
X			/*
X			 * This falls into the part at the end of the while
X			 * loop.
X			 */
X		}
X		else if (boa && bol && c == '\n')
X			continue;
X		else if (bol && c == '#') {
X			bol = (1==0);
X			c = getchar();
X			if (c != '!')
X				fprintf(outf, "#%c", c);
X			else if ((c=getchar()) != ' ') 
X				fprintf(outf, "#!%c", c);
X			else if ((c=getchar()) != 'r')
X				fprintf(outf, "#! %c", c);
X			else if ((c=getchar()) != 'n')
X				fprintf(outf, "#! r%c", c);
X			else if ((c=getchar()) != 'e')
X				fprintf(outf, "#! rn%c", c);
X			else if ((c=getchar()) != 'w')
X				fprintf(outf, "#! rne%c", c);
X			else if ((c=getchar()) != 's')
X				fprintf(outf, "#! rnew%c", c);
X			else if ((c=getchar()) != ' ')
X				fprintf(outf, "#! rnews%c", c);
X			else {
X				gets(tmp);
X				/* puts(tmp); */
X				if (opn) {
X#ifndef SPOOLNEWS
X					pclose(outf);
X#else
X					fclose(outf);
X#endif
X					opn = (1==0);
X				}
X				/*
X				 * len = atoi(tmp);
X				 * sprintf(tmp, "%s.%d.%d", base, pid++, len);
X				 * printf("%s\n", tmp);
X				 */
X#ifndef SPOOLNEWS
X				outf = popen(command, "w");
X				if (outf == NULL) {
X					fprintf(stderr, 
X				    "Gosh darn! Can't open |%s!\n", command);
X					exit(1);
X				}
X#else
X				do {
X					(void) sprintf(tempfile, "%s/.spXXXXXX", spooldir);
X					(void) mktemp(tempfile);
X				} while ((tfd = creat(tempfile, 0644) < -1));
X				(void) close(tfd);
X				(void) time(&curtime1);
X				curtime2 = gmtime(&curtime1);
Xretry:
X			(void) sprintf(outfile, "%s/.rnews/%02d%02d%02d%02d%02d%x",
X					spooldir,
X					curtime2->tm_year, curtime2->tm_mon+1,
X					curtime2->tm_mday, curtime2->tm_hour,
X					curtime2->tm_min, randno);
X				if (link(tempfile, outfile) < 0) {
X					randno++;
X					goto retry;
X				}
X				unlink(tempfile);
X				outf = fopen(outfile, "w");
X#endif
X				opn = (1==1);
X				boa = (1==1);
X				bol = (1==1);
X				continue;
X			}
X		}
X		boa = (1==0);
X		if (opn)
X			fputc(c, outf);
X		/* putchar(c); */
X		if (c == '\n')
X			bol = (1==1);
X		else
X			bol = (1==0);
X	}
Xout:
X	if (opn) {
X		fflush(outf);
X		fclose(outf);
X	}
X	exit(0);
X}
SHAR_EOF
if test 3105 -ne "`wc -c < 'split.batch.c'`"
then
	echo shar: error transmitting "'split.batch.c'" '(should have been 3105 characters)'
fi
fi # end of overwriting check
:	End of shell archive
exit 0
-- 
<---- David Herron -- The E-Mail guy                         <david@ms.uky.edu>
<---- ska: David le casse\*'      {rutgers,uunet}!ukma!david, david@UKMA.BITNET
<----
<---- Looking forward to a particularly blatant, talkative and period bikini ...

matt@oddjob.UChicago.EDU (Maxwell House Daemon) (07/29/88)

David Herron sez:
> It's amazing what you'll do when you're bored.  One afternoon I moved the
> SPOOLNEWS code from inews.c into nntpd to see what sort of effect it would
> have on the system load, it "seems" to help quite a bit.

This may be a big loser, David.  Until you process the article it
won't be in the history file.  Hence nntpd will continue to accept
more copies of that article until you run rnews -U.  This increases
the network load, which defeats one of the goals of NNTP.  Since you
say you often have multiple incoming NNTP sessions, I think you will
get multiple copies quite often.  Check your log.  I know that oddjob
sometimes gets offered the same article from two sources (separated by
over 1000 miles) mere seconds apart.
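
The effect can be sketched in a few lines of C.  This is only an
illustration of the argument, not nntpd's code: the history file is
stood in for by an in-memory list of message-IDs, and the names id_known
and wants_article are invented for the example.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if msgid appears in the list of ids, else 0. */
int id_known(const char *const *ids, int n, const char *msgid)
{
	int i;

	assert(msgid != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(ids[i], msgid) == 0)
			return 1;
	return 0;
}

/*
 * Answer an offer of an article by consulting the history list only.
 * An article that has been spooled but not yet processed is not in the
 * list, so a second offer of it is accepted as well.
 */
int wants_article(const char *const *history, int nhist, const char *msgid)
{
	return !id_known(history, nhist, msgid);
}
```

Until the spooled copy is processed and its ID reaches the history
list, every offer of that ID gets the same "yes".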
________________________________________________________
Matt Crawford	     		matt@oddjob.uchicago.edu

flee@blitz (Felix Lee) (07/29/88)

In <10080@e.ms.uky.edu> David Herron writes:
> It's amazing what you'll do when you're bored.  One afternoon I moved the
> SPOOLNEWS code from inews.c into nntpd to see what sort of effect it would
> have on the system load, it "seems" to help quite a bit.

A better way to spend your time is to fix nntpd so that it doesn't spawn
a new inews every time it receives an article.  But B2.11 inews is going
to fork off a new inews for every article anyway.  This would still be a
slight win, especially if your fork() uses copy-on-write.  Even better
would be C news, which doesn't fork at all.  (What about 3.0?)

But nntpd would have to get the size of the article for #!rnews batches.
You could 1) save the article (either in memory or on disk); 2) have the
sender tell you the size; 3) create a batch that uses delimiters instead
of byte counts.  I like the last option best.
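
One way option 3 could look, as a sketch only: end each article with a
line holding a single ".", and double any "." that starts a line of the
article itself, the same trick NNTP uses on the wire.  The function name
frame_article and the batch format are assumptions made for this
example; no existing news software is claimed to read it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy one article (text whose last line ends in '\n') into out,
 * doubling any '.' at the start of a line, then append the terminator
 * line ".\n".  Returns the number of bytes written (not counting the
 * NUL), or 0 if out is too small.
 */
size_t frame_article(const char *art, char *out, size_t outsz)
{
	size_t o = 0;
	int bol = 1;		/* at beginning of a line? */

	assert(art != NULL && out != NULL);
	for (; *art != '\0'; art++) {
		if (o + 5 > outsz)	/* 2 bytes here + ".\n" + NUL later */
			return 0;
		if (bol && *art == '.')
			out[o++] = '.';
		out[o++] = *art;
		bol = (*art == '\n');
	}
	if (o + 3 > outsz)
		return 0;
	out[o++] = '.';
	out[o++] = '\n';
	out[o] = '\0';
	return o;
}
```

The receiver never needs the article's size in advance: it reads until
it sees the lone "." line and strips one leading "." from any line that
starts with two.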
--
Felix Lee	*!psuvax1!flee

david@ms.uky.edu (David Herron -- One of the vertebrae) (08/03/88)

In article <14956@oddjob.UChicago.EDU> matt@oddjob.UChicago.EDU (Maxwell House Daemon) writes:
>David Herron sez:
>> It's amazing what you'll do when you're bored.  One afternoon I moved the
>> SPOOLNEWS code from inews.c into nntpd to see what sort of effect it would
>> have on the system load, it "seems" to help quite a bit.
>This may be a big loser, David.

I know.  I knew that when I posted it.

>Until you process the article it
>won't be in the history file.

I know.  I've been running with SPOOLNEWS all along.  That's because I
wanted to be able to control how many "streams" of processes were
unbatching news.  The cost is two fork()/exec() pairs per article.
The first saves the standard input away into SPOOLDIR/.rnews; the
second examines it to determine whether it is already present in
HISTFILE and, if it is not, inserts it into the right place in the
news hierarchy.
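
The first of those two steps, as the patch does it inside nntpd, comes
down to picking a name under SPOOLDIR/.rnews and link()ing the temp file
to it.  Here is a small sketch of just the naming part, using the same
format string as the patch.  The function spool_name is invented for the
example, and it uses snprintf, which the 1988 sources did not have.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Build a spool file name with the patch's format:
 * <spooldir>/.rnews/YYMMDDhhmm<randno in hex>.
 * tm_year counts years since 1900, so the "YY" field grows to three
 * digits from the year 2000 on.
 * Returns 0 on success, -1 if buf is too small.
 */
int spool_name(char *buf, size_t len, const char *spooldir,
	       const struct tm *tp, int randno)
{
	int n;

	assert(buf != NULL && spooldir != NULL && tp != NULL);
	n = snprintf(buf, len, "%s/.rnews/%02d%02d%02d%02d%02d%x",
		     spooldir, tp->tm_year, tp->tm_mon + 1, tp->tm_mday,
		     tp->tm_hour, tp->tm_min, (unsigned) randno);
	return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

If link() to that name fails with EEXIST, the patch bumps randno and
tries again, so the name only has to be unique within one minute.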

On a normal system however you can easily get multiple streams of rnews
processes running, if you have SPOOLNEWS undefined.  That will happen
whenever you have >1 UUCP neighbor shipping you news at once.  Or if
you have multiple NNTP neighbors.  With multiple streams of these
processes running our uVaxII gets veery sloooowwww.  And I have a
vested interest in this machine not getting bogged down since it's the
one where I 'live' :-).

>Hence nntpd will continue to accept
>more copies of that article until you run rnews -U.  This increases
>the network load, which defeats one of the goals of NNTP.  

Yes I understand that.  I've switched from running my news scripts
every 15 minutes to every 10 minutes, plus I've staggered the news
transmission scripts with the news reception scripts so that they will
often sidestep each other.  There is also flock stuff going on so that
no more than one "newsdaemon" script is running on a machine at any
one time.  If 10 minutes isn't a fast enough turnaround time I could
shorten the interval to 5 minutes or some such.  It's merely
a matter of editing crontab ...
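
For anyone who wants the same one-at-a-time guarantee from C rather than
from a script, a minimal sketch with flock(2) might look like this.  The
function name and the lock file path are placeholders, not what the
scripts here actually use.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Try to become the only running "newsdaemon": returns an open fd that
 * holds the lock, or -1 if someone else holds it (or on error).
 * The lock goes away when the fd is closed or the process exits.
 */
int single_instance_lock(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
		(void) close(fd);	/* somebody else is already running */
		return -1;
	}
	return fd;
}
```

LOCK_NB makes a second invocation give up at once instead of queueing
behind the first, which is what you want when cron will simply start
another run a few minutes later.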

There is a trade-off between network load and host load.  I have
more host load than I can handle, but by reshuffling what happens
when, I can handle it.  As I see it the better fix is one
which was discussed on nntp-managers some time ago (but not yet
implemented to my knowledge).  That is to put something into
the nntp_access file which will put limits on the number of accesses
from certain subsets of the network.  This could be on a per-host,
per-net or everybody-else basis.  If I could use that to allow only
(say) 2 connections from the outside world, then this machine could
handle the load.  I would be able to turn SPOOLNEWS off in nntp,
and possibly in news in general.
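
The bookkeeping for such a limit is small.  What follows is a purely
hypothetical sketch: nntp_access has no such feature, and the struct and
function names are invented.  Since each connection is normally served
by its own nntpd process, a real version would also have to keep the
count somewhere shared between processes, which is left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit for one class of hosts (per-host, per-net, ...). */
struct conn_limit {
	int max;	/* most simultaneous connections allowed */
	int active;	/* connections currently open */
};

/* Returns 1 and counts the connection if there is room, else 0. */
int conn_admit(struct conn_limit *l)
{
	assert(l != NULL);
	if (l->active >= l->max)
		return 0;
	l->active++;
	return 1;
}

/* Call when a connection from this class closes. */
void conn_release(struct conn_limit *l)
{
	assert(l != NULL);
	if (l->active > 0)
		l->active--;
}
```

With max set to 2 for the "outside world" class, a third simultaneous
connection would be turned away until one of the first two finishes.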

This is yet another version of the local policy decisions versus
global policy decisions debate.  In my case I have a colleague who
was very adamant that I do something about nntp and the load it causes.
This was the easiest fix.

My patch does provide more flexibility in nntp administration ...

It also saves a fork()/exec() pair if you have SPOOLNEWS defined
in the underlying news system.  (The pair it saves is the one which
writes the article into SPOOLDIR/.rnews.)

>Since you
>say you often have multiple incoming NNTP sessions, I think you will
>get multiple copies quite often.  Check your log.  I know that oddjob
>sometimes get offered the same article from two sources (separated by
>over 1000 miles) mere seconds apart.

Unfortunately something broke the syslog stuff in news here a couple
of months ago and I haven't had time to fix it... sigh.


I've answered at length -- probably greater length than most of us need --
in order to let Matt know that I know what I'm talking about.  Also, if
there *is* a flaw in my reasoning then someone can point it out and let me
correct my ways.