[net.news.adm] how batch & unbatch work

lda@clyde.UUCP (10/25/83)

There is a news batching system distributed with news 2.10.1 that
makes transmission of news much more efficient.  News that is
batched takes much less time to transfer, because uucp has to process
only three files per batch rather than three files per article.
Another side effect is that the spool directory is kept smaller,
meaning that uucp has less searching to do to find each file.

This system is very simple, yet very effective.  The purpose of this
article is to encourage netnews administrators to use a batching scheme
in their news transmission, and to show how easily news batching can be
set up at a site where news 2.10.1 has been installed.

For the purposes of this discussion, I have assumed that the following
definitions were made in defs.h during the make of news 2.10.1
(Note: this is the way that the news source was distributed):

#define BATCH "/usr/lib/news/unbatch"	/* name of unbatcher 	*/
#define BATCHDIR "/usr/spool/batch"	/* location (dir) of batching files */

There are four programs involved in the batching scheme that is
distributed with news 2.10.1.  They are uux, rnews, batch and unbatch.
When news comes in from a system, rnews "posts" it on the local system
and then decides which systems it should be forwarded to.  If the
entry for a remote system (e.g. "rmt") has ":F:" in the third
field, then the fourth field is taken to be a filename into which the
location of the article (on "local") is written.

For example:

	The sys file for system "local" looks like this:

	local:net,fa,nj::
	rmt:net,fa,nj:F:/usr/spool/batch/rmt

	If an article is posted on local to net.news.b and it becomes
	article 909 in that group, a line will be written into the file
	"/usr/spool/batch/rmt" that says

	/usr/spool/news/net/news/b/909

Batch is a program that reads the list of filenames created by rnews,
makes a "batch" of news, and writes it on standard output.  This
"batch" is just a concatenation of the articles named in the list,
each preceded by a line that says:

	#! rnews length

where "length" is the length of that article in bytes (characters).
Once all the articles have been written to stdout, the list of
filenames is truncated to make it ready for the next batch.

If the file "/usr/spool/batch/rmt" had the lines

	/usr/spool/news/net/news/b/909
	/usr/spool/news/net/jokes/1222
	/usr/spool/news/fa/telecom/401

and you executed the command

	/usr/lib/news/batch /usr/spool/batch/rmt

then the following batch would be written on the standard output and
/usr/spool/batch/rmt would be made empty.

	#! rnews 1234		<where 1234 is the length of this article>
	<text of .../b/909>
	#! rnews 6833		<etc.>
	<text of .../jokes/1222>
	#! rnews 4563
	<text of .../fa/telecom/401>
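In shell terms, a batch pass boils down to the loop below.  This is
only a sketch of the idea, run on made-up articles in a scratch
directory -- it is not the real batch program, and the paths, article
bodies, and file names are invented for the demo:

```shell
# scratch-directory demo of what batch does (not the real program)
dir=/tmp/batchdemo
rm -rf $dir; mkdir $dir; cd $dir

# two fake "articles" and a list file naming them, as rnews would build it
echo "first article body" > art1
echo "second article, a bit longer" > art2
echo $dir/art1 >  list
echo $dir/art2 >> list

# the batch pass: each article is preceded by a "#! rnews length" line
while read art
do
	echo "#! rnews `wc -c < $art`"
	cat $art
done < list > batchout

> list		# truncate the list, ready for the next batch
```

In real use the loop's output would go to stdout (and from there into
uux) rather than into a file.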


"What good is all this?" you ask.  "Why would I want to write stuff to
stdout and lose it?"  Good question.  This is where uux comes in.  Uux
is a program that lets you execute a program on a remote system, "rmt".

Rather than letting the output of batch go to stdout and be lost (cf.
the story of Onan in the Bible), you pipe it into the stdin of uux and
send it as the input to the rnews command on "rmt".  This is how
that's done:

/usr/lib/news/batch /usr/spool/batch/rmt | uux - -r -go -z -n rmt!rnews

The flags are interpreted to mean this:

	-	read standard input as input for rnews.
	-r	queue but don't send right away.
	-gN	give the job grade N. I use N=o to give news batches
		a grade that's lower than regular uucp jobs.
		Mail goes out as grade A.
	-z	don't notify user on zero exit status of rnews.
	-n	don't notify me period.

So now you have a batched job queued for the remote system.  When
"rmt" receives the batch and executes rnews on it, "rmt"'s rnews will
see the "#! rnews 1234" on the first line and say to itself, "Aha! This
is a batched job.  I know what to do now."  It will then call "unbatch"
(the last link in the chain) to break the articles apart and post them
as it sees fit.
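The splitting that unbatch does can be pictured with a few lines of
awk.  The real unbatch reads exactly "length" bytes after each
"#! rnews length" line; this line-oriented sketch ignores the byte
count and just splits on the header lines, which is enough to show the
idea (scratch paths and a made-up two-article batch):

```shell
# scratch-directory demo of the unbatch idea (not the real program)
dir=/tmp/unbatchdemo
rm -rf $dir; mkdir $dir; cd $dir

# a tiny two-article batch, as batch would have produced it
cat > batch <<'EOF'
#! rnews 12
article one
#! rnews 12
article two
EOF

# split on the "#! rnews" headers; each piece would be handed to rnews
awk '/^#! rnews / { n++; out = "article." n; next }
     { print > out }' batch
```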

To make all this work together well, you should run batch from cron
periodically, say every hour (more often on backbone sites).  This can
be done for all the systems with which you exchange batched news by
using a shell file that contains command lines for each of those
systems and then invoking the shell file from cron.  Be sure that the
directory "/usr/spool/batch" exists and is writable by the rnews
program.  There is a script that is distributed with news 2.10.1 called
sendbatchednews that will do what you want, but it requires some
modification to uucp.  This is the script I use:

# @(#) /usr/lib/news/batch.sh 1.1
for rmt in akgua burl floyd ihnp4 masscomp
do
   # don't create a batch if the batch list is empty
   if test -s /usr/spool/batch/$rmt
   then
      /usr/lib/news/batch /usr/spool/batch/$rmt | uux - -r -go -z -n $rmt!rnews
   fi
done
/usr/lib/uucp/uucico -r1&	# send out the queued news
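For reference, a cron entry that runs the script above hourly might
look like the line below.  Crontab formats vary from system to system
(some add a user field, some do not); this is only an illustration
using the script path from above:

```shell
0 * * * * /bin/sh /usr/lib/news/batch.sh
```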

This is our sys file:

clyde:<groups we accept>:
akgua:<groups we send to akgua>:F:/usr/spool/batch/akgua
burl:<groups we send to burl>:F:/usr/spool/batch/burl
floyd:<groups we send to floyd>:F:/usr/spool/batch/floyd
ihnp4:<groups we send to ihnp4>:F:/usr/spool/batch/ihnp4
masscomp:<groups we send to masscomp>:F:/usr/spool/batch/masscomp
# make a list of everything that comes in
# be sure to 'mv list list.<date>' each night
list:all:F:/usr/spool/batch/list
-- 
Larry Auton WECo @ BTL WH 2C-123 (201)386-4272 ihnp4!clyde!lda

smb@ulysses.UUCP (10/27/83)

One addition to the excellent article on batching netnews:  don't use
batch intervals of more than an hour or so, even if you're not a backbone
site.  30 minutes is probably better for most sites.  The problem is
that the longer the batch interval, the larger the uucp file -- and
files over 200K tend to break uucp too often.  Besides, it's more to
retransmit if a link dies in the middle.  What I'd really like is a batcher
that's length-limited; i.e., one that sends off a batch when 50K bytes
or so has accumulated.  You're still getting most of the performance
advantage, but with fewer problems.  (Another problem for many sites:  if
too large a batch comes in, the LCK.XQT file can time out while all the
news submissions are taking place.  This isn't hypothetical; it used to
happen to us before we got a better uucp installed.  Using output batching
also helps prevent this, incidentally -- running a separate uux for each
article is *very* expensive, and can increase the load on a system by a
large amount.  Installing batching can tremendously reduce the load netnews
puts on a system.)
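A length-limited batcher along these lines is not hard to sketch in
the shell: walk the article list, keep a running byte total, and start
a new sub-list whenever the total passes the limit.  The loop below is
only a demonstration of the idea on made-up files in a scratch
directory -- the limit, paths, and "sub.N" names are all invented, and
a real version would run batch piped into uux on each sub-list:

```shell
# scratch demo of a length-limited batch list splitter (hypothetical)
dir=/tmp/limitdemo
rm -rf $dir; mkdir $dir; cd $dir

# three fake articles of 41 bytes each and a list naming them
for a in art1 art2 art3
do
	echo "0123456789012345678901234567890123456789" > $a
	echo $dir/$a >> list
done

LIMIT=100		# a real batcher might use 50000 (50K)
total=0
n=0
> sub.$n
while read art
do
	size=`wc -c < $art`
	total=`expr $total + $size`
	# start a new sub-list once the limit is passed
	# (but never leave a sub-list empty)
	if test $total -gt $LIMIT -a -s sub.$n
	then
		n=`expr $n + 1`
		> sub.$n
		total=$size
	fi
	echo $art >> sub.$n
done < list
# a real version would now do, for each sub-list:
#    /usr/lib/news/batch sub.N | uux - -r -go -z -n rmt!rnews
```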

teus@mcvax.UUCP (Teus Hagen) (11/03/83)

Sure I will encourage batching as well, but please take into account some
severe drawbacks:
- batching is another layer between news and uucp (uux).  Some old
  batch programs just quit after one error, and you will lose any news
  articles coming after that error (check your release!).
  Also, the site from which you received your articles will no longer
  get any error message on failures.  So you need to watch news more
  closely when you are using batching.
- batching creates large files.  The chance that uucp will be killed
  by some severe error is greater.  The next time uucp is called, the
  whole file is resent from the beginning, and so on.  So sometimes it
  will not save you any phone costs.  A bug in some old uucp versions
  does not allow enough time to copy the batch file to its final
  destination.  The copy fails, and the next time uucp starts sending
  the batched file all over again....

However, with batching you can use compacting (thanks to decvax!aps and
mcvax!jim for the idea and programs).  Together with some scheme to put
a limit on the batched file size, I think batching is OK.
The programs that do this have been tested and will be included in the
next news releases.
(By the way, the program is three pages long, so if you do not want to wait...).
-- 
	Teus Hagen	Center for Math., Comp. Science (CMCS)
			formerly Math. Centre (MC)
			mcvax!teus

lda@clyde.UUCP (Larry D. Auton) (11/03/83)

Thanks for the critique, Steve.  Your point is well taken.  I have
encountered the problem of batching very large files, too.  I diddled
up a shell that solved the problem pretty well.  I have included it
below.  It breaks the batches into groups of twenty articles each.
Batches of this size are usually quite manageable for uucp.

This same technique is useful for sending a bunch of news to a new
netnews site.  This way the new site can get a load of old news from
its feed in batches that are of reasonable length.

BATCHDIR=/usr/spool/batch
cd $BATCHDIR
for rmt in akgua burl floyd ihnp4	# sites that we feed news
do
    batchfile=$BATCHDIR/$rmt
    if test -s $batchfile
    then
         rm -f x??
         split -20 $batchfile
         >$batchfile
         for xfile in x??
         do
             /usr/lib/news/batch $BATCHDIR/$xfile|uux - -r -go -z -n $rmt!rnews
         done
    fi
done
rm -f x??
/usr/lib/uucp/uucico -r1&
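The key step above is "split -20", which breaks the article list into
sub-lists of at most twenty lines each, named xaa, xab, and so on.  A
quick scratch-directory check of that behavior, with made-up list
entries:

```shell
# scratch demo of the split step (made-up article names)
dir=/tmp/splitdemo
rm -rf $dir; mkdir $dir; cd $dir
i=0
while test $i -lt 45
do
	i=`expr $i + 1`
	echo /usr/spool/news/net/test/$i >> list
done
split -20 list		# produces xaa (20 lines), xab (20), xac (5)
```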
-- 
Larry Auton WECo @ BTL WH 2C-123 (201)386-4272 ihnp4!clyde!lda