[net.news] Wanted: news batching with compression

msc@qubix.UUCP (Mark Callow) (03/22/84)

The uucp overhead for transferring news is really not a problem as long
as you have Mark Horton's speedup fix in your uucp packet driver.  If you
still feel there is a problem, there are at least three different batching
schemes that reduce the uucp overhead a little further.

What really kills our 4.2BSD system and several others is the forking
of rnews once per article.  For this reason we have news flow
restricted to late night and early morning, though mail is received
at any time.

None of the batching schemes that I'm aware of address this problem at
all.  I urge anyone who is devoting time to news batching schemes to
attack this area rather than uucp.  If I had time I would look into it
myself.
-- 
From the Tardis of Mark Callow
msc@qubix.UUCP,  decwrl!qubix!msc@Berkeley.ARPA
...{decvax,ucbvax,ihnp4}!decwrl!qubix!msc, ...{ittvax,amd70}!qubix!msc

dmmartindale@watcgl.UUCP (Dave Martindale) (03/22/84)

If forking rnews once per article is your worst problem with news, you
aren't feeding many other machines.  We found that most of the overhead
was in uux, queueing articles for other machines.  Once we started
batching news, the overhead went down enormously.

msc@qubix.UUCP (Mark Callow) (03/23/84)

>	If forking rnews once per article is your worst problem with news, you
>	aren't feeding many other machines.  We found that most of the overhead
>	was in uux, queueing articles for other machines.  Once we started
>	batching news, the overhead went down enormously.
True, we don't feed many sites.  On the other hand, we also don't queue
articles for transmission; we queue requests.  I use the U flag in the
third field of the lines in my sys file.  The U flag says don't copy the
article to the uucp spool queue, but instead spool a C. file requesting
transmission of the article from its permanent storage in the news
morgue.  You need the -c option in your uux to do this; that change is
distributed with the news software.
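
If I read the sys file layout right (system:subscriptions:flags:command),
a line using it might look something like this, with a made-up neighbor
name and the default transmission command:

	neighbor:net,fa:U:

With U set, news should then invoke uux with -c, so only a small C.
request file lands in the uucp spool instead of a copy of each article.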
-- 
From the Tardis of Mark Callow
msc@qubix.UUCP,  decwrl!qubix!msc@Berkeley.ARPA
...{decvax,ucbvax,ihnp4}!decwrl!qubix!msc, ...{ittvax,amd70}!qubix!msc

phil@amd70.UUCP (Phil Ngai) (03/23/84)

> From: dmmartindale@watcgl.UUCP
> Subject: Re: Wanted: news batching with compression
> Message-ID: <2294@watcgl.UUCP>
> 
> If forking rnews once per article is your worst problem with news, you
> aren't feeding many other machines.  We found that most of the overhead
> was in uux, queueing articles for other machines.  Once we started
> batching news, the overhead went down enormously.

We feed news to decwrl, fortune, dual, onyx, sco, and cae780.  I think
that's a reasonable number of machines.  We couldn't do it without
batching.  With batching, rnews forking now seems to be where most of
the CPU time goes.
-- 
Phil Ngai (408) 988-7777 {ucbvax,decwrl,ihnp4,allegra,intelca}!amd70!phil

mjl@ritcv.UUCP (Mike Lutz) (03/24/84)

Once you HAVE batching, however, the rnews forking problems can be
significant if you normally have a moderate to heavy load.  Has
anyone looked into the possibility of having rnews do unbatching
directly rather than forcing it to be exec'ed repeatedly from
the batch processing program?
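
Just to sketch the idea (purely hypothetical: process_article below is a
made-up name for whatever internal routine rnews uses to file a single
article, and real code would need the truncation handling of the
unbatcher posted elsewhere in this thread):

# include <stdio.h>

extern int process_article();	/* assumed internal rnews routine */

unbatch_internal()
{
	register int c;
	register long size;
	char line[512], *filename, *mktemp();
	long atol();
	FILE *tmp;

	filename = mktemp("/tmp/unbXXXXXX");
	while (fgets(line, sizeof line, stdin) != NULL) {
		if (strncmp(line, "#! rnews ", 9))
			continue;		/* resync on the next marker */
		size = atol(line + 9);
		if ((tmp = fopen(filename, "w")) == NULL)
			break;
		while (--size >= 0 && (c = getc(stdin)) != EOF)
			putc(c, tmp);
		fclose(tmp);
		process_article(filename);	/* file it directly: no fork, no exec */
	}
	unlink(filename);
}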

Mike Lutz
-- 
Mike Lutz	Rochester Institute of Technology, Rochester NY
UUCP:		{allegra,seismo}!rochester!ritcv!mjl
ARPA:		ritcv!mjl@Rochester.ARPA

mp@mit-eddie.UUCP (Mark Plotnick) (03/24/84)

Here's a way to get news compression done.  Don't expect it
to be elegant; it only took half an hour to write.

There are two new filters, pnews to compact the news and unews
to uncompact it.  First, the compacting:

pnews for Berkeley systems:
#!/bin/sh
# pnews for berkeley systems
PATH=$PATH:/usr/local/lib/news
export PATH
TMP=/tmp/pn$$
rm -f $TMP >/dev/null 2>&1
compact > $TMP 2>/dev/null		# compact the batch arriving on stdin
echo "#! unews "`filesize $TMP`		# size header tells the unbatcher how many bytes follow
cat $TMP
rm -f $TMP >/dev/null 2>&1

For USG systems, this may work (I don't have access to a USG system
that runs netnews, so this is untested):

# pnews for USG
PATH=$PATH:/usr/local/lib/news
export PATH
TMP=/tmp/pn$$
rm -f $TMP $TMP.z >/dev/null 2>&1
cat <&0 >$TMP			# copy the batch from stdin
pack -f $TMP >/dev/null 2>&1	# pack(1) leaves its output in $TMP.z
echo "#! unews "`filesize $TMP.z`
cat $TMP.z
rm -f $TMP $TMP.z >/dev/null 2>&1

The command line in your hourly file should look something like:
batch /usr/spool/news/batch/whuxle | pnews | uux - -n -r whuxle\!rnews

Oh yes, you need filesize.c:
#include <sys/types.h>
#include <sys/stat.h>

/* filesize: print the size in bytes of the file named by argv[1] */
main(argc, argv)
int argc;
char **argv;
{
	struct stat sbuf;

	if (stat(argv[1], &sbuf) == -1)
		printf("0\n");
	else
		printf("%ld\n", sbuf.st_size);
	exit(0);
}
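
To use it with the scripts above, compile it and install it somewhere on
the PATH they set, for instance:

	cc -o /usr/local/lib/news/filesize filesize.c

(the exact destination is up to you; anywhere on that PATH will do).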

OK, now on the receiving end, you need a slightly more
flexible unbatch.c:

/*
 * unbatchnews: extract news in batched format and process it one article
 * at a time.  The format looks like
 *	#! rnews 1234
 *	article containing 1234 characters
 *	#! rnews 4321
 *	article containing 4321 characters
 * "#! unews nnnn" markers are also accepted; those articles are fed
 * to the unews uncompacting filter instead of directly to rnews.
 */

# include <stdio.h>
# include "defs.h"	/* to get RNEWS definition */
static char *sccsid = "@(#)unbatch.c	1.3	4/23/83";

char buf[512];
char *command;
#define UNEWS "/usr/local/lib/news/unews"

main(ac, av)
int ac;
char **av;
{
	register int c;
	register FILE *pfn;
	register long size;
	char *filename;
	int pid, wpid, exstat;
	char *mktemp();
	long atol();

	/* first, close any extraneous files that uucp may have left open */
	for(c=3; c<20;c++) close(c);
	filename = mktemp("/tmp/unbnewsXXXXXX");
	while(fgets(buf, sizeof buf, stdin) != NULL) {
		while (strncmp(buf, "#! rnews ", 9) && strncmp(buf, "#! unews ", 9)) {
			fprintf(stderr, "out of sync, skipping %s\n", buf);
			if (fgets(buf, sizeof buf, stdin) == NULL)
				exit(0);
		}
		command = (strncmp(buf, "#! unews ", 9)==0) ? UNEWS : RNEWS;
		if (buf[strlen(buf)-1]=='\n')
			buf[strlen(buf)-1]='\0';	/* no use aggravating atol */
		size = atol(buf+9);
		if(size <= 0)
			break;
		if ((pfn = fopen(filename, "w")) == NULL) {
			perror(filename);
			exit(1);
		}
		while(--size >= 0 && (c = getc(stdin)) != EOF)
			putc(c, pfn);
		fclose(pfn);

		/*
		 * If we got a truncated batch, don't process the
		 * last article; it will probably be received again.
		 * (size ends at -1 after a complete article but
		 * stays >= 0 after a short read, so test >=, not >.)
		 */
		if (size >= 0)
			break;

		/*
		 * rnews < filename
		 */
		while ((pid = fork()) == -1) {
			fprintf(stderr, "fork failed, waiting...\r\n");
			sleep(60);
		}
		if (pid == 0) {
			close(0);
			open(filename, 0);
			execlp(command, command, (char *)0);
			perror(command);
			exit(1);
		}
		while ((wpid = wait(&exstat)) >= 0 && wpid != pid)
			;
	}
	unlink(filename);
}

This new unbatch allows both "rnews" and "unews" markers.
I thought of just allowing arbitrary shell commands after the
"#!", but this would open up too many security holes.

unews on ucb looks like:

#!/bin/sh
# unews for berkeley systems
PATH=$PATH:/usr/local/lib/news
export PATH
uncompact | rnews


On USG, one possibility for unews is:

# unews for USG
PATH=$PATH:/usr/local/lib/news
export PATH
ZTMP=/tmp/un$$.z
rm -f $ZTMP
cat <&0 >$ZTMP
pcat $ZTMP | rnews
rm -f $ZTMP

But note that since this isn't a kernel-executable file (USG kernels
don't understand #!), you'd probably have to replace the execlp in
unbatch.c with a call to system() or an explicit "sh -c".
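
Concretely (untested, with the same caveat as the USG scripts above),
the child branch of unbatch.c might become:

	if (pid == 0) {
		close(0);
		open(filename, 0);
		/*
		 * Run the command via the shell instead of exec'ing it
		 * directly, so plain shell scripts work on kernels
		 * that don't understand #!.
		 */
		execl("/bin/sh", "sh", "-c", command, (char *)0);
		perror(command);
		exit(1);
	}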

	Mark Plotnick

P.S. If you exchange news with 2 or more systems, a MUCH better way
of reducing your uucp time would be to implement an ihave/sendme
protocol.
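
Roughly, the idea (in outline only, not a description of any particular
implementation) is:

	A tells B:  ihave <message-IDs of the articles queued for B>
	B replies:  sendme <just the ones B hasn't seen yet>
	A sends:    only those articles

so an article B has already received from another feed never crosses
the link at all.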

dmmartindale@watcgl.UUCP (Dave Martindale) (03/24/84)

My comments about rnews overhead being insignificant compared to uux
were directed at sites which don't use batching, since the original article
seemed to imply that this was the case for them.  Our experience has been
that using the 'U' option of the sys file does save a bit on overhead
compared to 'B' since the news spool file isn't copied, and saves a lot
on spool filesystem space since you don't have all those duplicate
copies of the news sitting around until they are sent to the destination
machine.  But doing one uux per article is still horribly inefficient,
since uux accesses several files and creates several new ones every time
it starts up.  Batching drastically reduces the time spent processing
incoming news (utzoo!henry claims something like a factor of ten, which
seems reasonable to me), and the added overhead of running the batcher
(and uux) once per site every half hour or hour seems insignificant.

In other words, I strongly recommend batching, even with the rather
simplistic unbatcher supplied with the news system.

jr@forcm5.UUCP (John Rogers) (03/25/84)

Hi.  I'm looking for some news batching software that does text compression.
I already have some stuff that combines lots of little articles into big
files (to reduce the UUCP per-file overhead), with a daemon that allows rnews
to run without uuxqt (to free up the communications line while receiving the
articles).  This was all put together by fortune!stein (Mark Stein) (thanks
Mark!).  But I'm wondering if we can get even more efficient...

Someone suggested using pack and unpack, which are from System III.
Unfortunately, the site that needs it most (the poorest site that calls me)
doesn't have a System III license.  Another possibility might be SQ and USQ
(from the CP/M world), but I've heard that they're pretty slow, and I'd
rather not spend a lot of time integrating them into netnews.

So, does anybody have any suggestions (or, better yet, source code)?

				Thanks in advance!
				JR (John Rogers)
				{ihnp4,harpo,cbosgd}!fortune!forcm5!jr
				also fortune!jr, proper!jr