[news.software.b] News Batching Software

ahby@meccts.UUCP (02/25/87)

Okay, here we go...

Last summer I was at the Atlanta USENIX.  There I met Mel Pleasant,
Mark Horton, and other people like that.  As part of the discussion, 
I mentioned that we had a program here in Minnesota that allows a 
site to feed N downstream sites while only using the spool
space for one downstream feed.  Everybody said "really?  send it out!"
Well, I always meant to, but the code is kind of messy, and the
solution isn't exactly perfect - let me summarize how it works.

News 2.10.3 and beyond has a facility called MULTICAST.  This allows
sites in Australia to send news to each other in the manner described
above.  However, there was no way supplied with the distribution for
doing this on UUCP sites.  Our solution was first to write a program
called uucast.  Uucast uses link() and a little spit and glue to create
uucp job files that are destined for a number of systems, as well as
associating a command with the file:

	uucast file command site1 site2 ...
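Uucast's source is not reproduced here, so what follows is only a minimal C sketch of the link() idea, under stated assumptions: the function names link_for_sites() and link_count() and the "D.<site>.<seq>" file names are hypothetical, not any real UUCP spool layout, and a real tool would also have to write a per-site control (C.) file naming the command to run.  One batch on disk gains an extra directory entry per destination site, so its data occupies spool space only once.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Give one batch file an extra hard link per destination site, so the
 * data occupies spool space only once.  The "D.<site>.<seq>" names are
 * an illustrative assumption, not any real UUCP spool layout; a real
 * tool would also write a control (C.) file per site naming the command. */
int link_for_sites(const char *datafile, const char *spooldir,
                   const char *const sites[], int nsites, int seq)
{
    char name[1024];
    int i, n;

    assert(datafile != NULL && spooldir != NULL);
    for (i = 0; i < nsites; i++) {
        n = snprintf(name, sizeof name, "%s/D.%s.%04d",
                     spooldir, sites[i], seq);
        if (n < 0 || (size_t)n >= sizeof name)
            return -1;                   /* name would not fit */
        if (link(datafile, name) != 0) {
            perror(name);                /* e.g. spool on another file system */
            return -1;
        }
    }
    return 0;
}

/* Return the hard-link count of path, or -1 if it cannot be stat()ed. */
long link_count(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}
```

After linking a batch for three sites, the original file reports a link count of 4 and the bytes exist once.  Because link() cannot cross file systems, the batch and the spool directory must live on the same partition.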

Uucast seems to work with HDB, Sys III, Sys V, BSD 4.2, BSD 4.3, Ultrix
versions 1.1 and 1.2, and a few different versions of Xenix.  However,
uucast's method of generating sequence numbers is a kludge, and I felt
it was inelegant.  This is actually the whole reason I have been
sitting on the news broadcasting system as long as I have.  It works,
but it's ugly.  Uucast uses its process ID as a seed, and generates
sequential UUCP sequence numbers from that.
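A minimal C sketch of the PID-seeded scheme described above may show why it counts as a kludge.  The four-digit decimal width and the names seq_init(), seq_init_from_pid() and seq_next() are assumptions for illustration (UUCP flavours format sequence numbers differently), and nothing here checks whether a number is already in use in the spool.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical sketch: start from the process ID and hand out
 * consecutive 4-digit sequence numbers.  The width is an assumption,
 * and no check is made for a number already present in the spool. */
static unsigned long next_seq;

void seq_init(unsigned long seed)
{
    next_seq = seed % 10000UL;
}

void seq_init_from_pid(void)
{
    seq_init((unsigned long)getpid());
}

/* Store the next sequence number in buf as four digits plus NUL,
 * wrapping from 9999 back to 0000. */
void seq_next(char buf[5])
{
    assert(next_seq < 10000UL);
    snprintf(buf, 5, "%04lu", next_seq);
    next_seq = (next_seq + 1) % 10000UL;
}
```

Two runs whose process IDs happen to fall close together can hand out overlapping ranges, which is the weakness of deriving sequence numbers from the PID instead of from a shared sequence file.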

The next phase of broadcasting was the program multibatch.  Multibatch
looks at a batch file generated by news 2.10.3 or later with the
MULTICAST option enabled and 'M' settings in the sys file.  It scans
this file for articles that are destined for the same set of systems,
and builds a traditional batch file for each of these sets.  Then for
each set it calls a shell script called multisend.  Multisend is a
close cousin of sendbatch, except it is designed with broadcasting in
mind.  It also cannot handle ihave/sendme protocols (for obvious
reasons).  Anyway, multisend takes this traditional batch file and
feeds it a piece at a time to the news batch program.  It then takes
these generated batches, compresses them if requested, and gives them
to uucast along with the set of systems indicated for that set of
articles.
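The grouping step multibatch performs can be sketched in C as follows.  This assumes each line of the MULTICAST batch file holds an article path followed by the names of its destination sites (the format Rick Perry shows later in this thread).  The names group_lines() and struct group are hypothetical, and the sketch only counts articles per destination set, where multibatch would write a batch list for each set and hand it to multisend.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAXGROUPS 64
#define MAXSITES  16    /* sites beyond this per line are ignored here */
#define KEYLEN    256

/* One destination set and how many articles are bound for it. */
struct group {
    char key[KEYLEN];   /* sorted site names joined by single spaces */
    int  narticles;
};

static int cmpstr(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Turn "path site1 site2 ..." into a canonical key ("site1 site2").
 * Sorting means "x y z" and "z y x" count as the same set.
 * Returns 0 when the line names no sites. */
static int line_key(const char *line, char key[KEYLEN])
{
    char copy[KEYLEN];
    char *sites[MAXSITES];
    char *tok;
    int n = 0, i;

    strncpy(copy, line, KEYLEN - 1);
    copy[KEYLEN - 1] = '\0';
    if (strtok(copy, " \t\n") == NULL)   /* skip the article path */
        return 0;
    while (n < MAXSITES && (tok = strtok(NULL, " \t\n")) != NULL)
        sites[n++] = tok;
    if (n == 0)
        return 0;
    qsort(sites, (size_t)n, sizeof sites[0], cmpstr);
    key[0] = '\0';
    for (i = 0; i < n; i++) {
        if (i > 0)
            strncat(key, " ", KEYLEN - strlen(key) - 1);
        strncat(key, sites[i], KEYLEN - strlen(key) - 1);
    }
    return 1;
}

/* Count articles per destination set.  Returns the number of sets
 * found, or -1 if there are more than MAXGROUPS distinct sets. */
int group_lines(const char *const lines[], int nlines,
                struct group g[MAXGROUPS])
{
    char key[KEYLEN];
    int ngroups = 0, i, j;

    assert(nlines >= 0);
    for (i = 0; i < nlines; i++) {
        if (!line_key(lines[i], key))
            continue;                    /* no destinations: skip */
        for (j = 0; j < ngroups; j++)
            if (strcmp(g[j].key, key) == 0)
                break;
        if (j == ngroups) {              /* first article for this set */
            if (ngroups == MAXGROUPS)
                return -1;
            strcpy(g[j].key, key);
            g[j].narticles = 0;
            ngroups++;
        }
        g[j].narticles++;
    }
    return ngroups;
}
```

Each resulting set is batched and compressed once, no matter how many sites it names.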

The upshot of all this is that if an article is destined for
5 downstream sites, it is only batched once, only compressed once, and
only takes one slice of the spool device (well, two slices if you
count the original).

You're probably asking yourself "How well does this work?"  Well,
Minnesota is an exception to just about every Usenet rule, I know.
However, my site has 2.3 meg of spool space maximum for traffic (news
is kept on another device) and we full feed 4 sites and partial feed 4
others.  We very rarely have a problem, and when we do we would have
had the same problem regardless of how many sites we feed.  Once you
have solved the disk space problem, news becomes a problem of dollars
and modem bandwidth.  A 1200 baud modem can perform a typical
compressed batched feed in about 2.2 hours per night.  Our modems are
pretty busy a lot of the time :-).
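A rough sense of what the 2.2-hour figure means, assuming asynchronous 8-N-1 framing (10 bits on the wire per byte) and ignoring UUCP protocol overhead, so the result is an upper bound:

```latex
\begin{aligned}
\text{throughput} &\approx \frac{1200\ \text{bit/s}}{10\ \text{bit/byte}} = 120\ \text{byte/s} \\
\text{time} &= 2.2\ \text{h} \times 3600\ \text{s/h} = 7920\ \text{s} \\
\text{data per night} &\lesssim 7920\ \text{s} \times 120\ \text{byte/s} = 950{,}400\ \text{bytes} \approx 0.95\ \text{MB}
\end{aligned}
```

On those assumptions, four full feeds each queued separately would need roughly 4 x 0.95 = 3.8 MB, more than the 2.3 meg spool, whereas one shared copy needs about 0.95 MB.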

I think it is high time that I got off my butt and published this
software.  I will be putting it all together and sending it out to
mod.sources tonight if at all possible.  It has been running on 5
sites for about 6 months, so I hope all of the bugs are out of it.
All I can say is that I'm sorry I took so long about this.
-- 
Shane P. McCarron		UUCP	ihnp4!meccts!ahby, ahby@MECC.COM
MECC Technical Services		ATT	(612) 481-3589

"Character is what you are in the dark!"

perry@vu-vlsi.UUCP (02/25/87)

In article <2234@meccts.MECC.COM> ahby@meccts.UUCP (Shane P. McCarron) writes:
>...
>called uucast.  Uucast uses link() and a little spit and glue to create
>uucp job files that are destined for a number of systems, as well as
>associating a command with the file:
>
>	uucast file command site1 site2 ...

   In Pyramid ucb universe you can use the -l option to uux and it will
make a link to the original file instead of copying it (the file being
sent and the uucp/spool area must be on the same disk partition for
this to work), so the equivalent of that uucast command is:

	uux -l -n -z "site1!command<" !file
	uux -l -n -z "site2!command<" !file
	...
	rm file

>Uucast seems to work with HDB, Sys III, Sys V, BSD 4.2, BSD 4.3, Ultrix
>versions 1.1 and 1.2, and a few different versions of Xenix.

   I suppose the uux -l option is not available on every system; on the
Pyramid, ucb uux has it but att uux doesn't, so the uucast solution will
be appreciated by many.

   The one thing that is preventing me from finishing the multi-destination
news batching that I've been working on is that for one of the sites we
feed (a vax/vms system) I must just copy the file to their spool directory
and it gets unbatched via a batch job there, and I haven't figured out
how to use uux directly to simply copy a file without sending a command
to be executed on the remote site (uucp itself here does not have -l
flag).  If anyone knows how to do that please let me know...

...Rick			..{cbmvax,pyrnj,bpa}!vu-vlsi!perry
			perry@vuvaxcom.bitnet

ahby@meccts.UUCP (02/27/87)

In article <635@vu-vlsi.UUCP> perry@vu-vlsi.UUCP (Rick Perry) writes:
>   In Pyramid ucb universe you can use the -l option to uux and it will
>make a link to the original file instead of copying it (the file being
>sent and the uucp/spool area must be on the same disk partition for
>this to work)

Actually, I am aware of this modification.  I believe Erik Fair has a
mod for Berkeley uux which allows uux -l.  Since Pyramid uses
Berkeley, maybe that is where it came from.  In the multibatch system,
uucast is called from within the shell script multisend.  You can
easily modify it to use uux -l if you have that available.  Mel
Pleasant is running multibatch at rutgers, and I believe he has done this.
-- 
Shane P. McCarron		UUCP	ihnp4!meccts!ahby, ahby@MECC.COM
MECC Technical Services		ATT	(612) 481-3589

"Character is what you are in the dark!"

stephen@dcl-cs.UUCP (02/27/87)

In article <2234@meccts.MECC.COM> ahby@meccts.UUCP (Shane P. McCarron) writes:
>Last summer I was at the Atlanta USENIX.  There I met Mel Pleasant,
>Mark Horton, and other people like that.  As part of the discussion, 
>I mentioned that we had a program here in Minnesota that allows a 
>site to feed N downstream sites while only using the spool
>space for 1 down stream feed.  Everybody said "really?  send it out!"

I have already posted one to mod.sources.  It appeared either this month or
last month.  The only problem is that it's written in C++.  What I really need
is someone to convert it to C and/or port it to System V.  If you can't do this, at least
have a look at it and see if any of my ideas are useful.  It's perfectly
readable if you only know C.
-- 
EMAIL:	stephen@comp.lancs.ac.uk	| Post: University of Lancaster,
UUCP:	...!mcvax!ukc!dcl-cs!stephen	|	Department of Computing,
Phone:	+44 524 65201 Ext. 4120		|	Bailrigg, Lancaster, UK.
Project:Alvey ECLIPSE Distribution	|	LA1 4YR

fair@ucbarpa.Berkeley.EDU (Erik E. Fair) (03/16/87)

Sorry, the "-l" (link) flag to uux wasn't my idea. It passed through my
hands long ago either as part of the B news 2.10 release (which
contained a number of mods for UUCP in the form of diffs that Mark
Horton put together), or I got it from Mark Stein, then of Fortune
Systems, now of SUN Microsystems. I forget which. I know that the
batching system we used this with (bnproc) was from Mark Stein.

Why this is useful is easy to see: if you can ship the exact same
batch file to several sites, you can use one queue file with links.
This is a win because:

1. you save on disk space in your /usr/spool/uucp queue area.
2. incremental cost of feeding a new site is some more C. files in the
	queue area, and more modem time.
	
There are several caveats, though:

	1. all neighbors *must* use the same batch file format
	2. all neighbors *must* be "leaf" nodes (i.e. they don't send
		you much in the way of news)
	3. if you are tight on queue space, it only takes one
		incommunicado neighbor to muck things up

Caveat #2 is most important: to do this right, you set up a "pseudo"
site in your sys file, from which you ship batches to the list of leaf
nodes. However, since the "pseudo" site name does not match the name of
any of the leaf sites that you're sending those articles to, anything
that THEY send to YOU will get sent back to THEM by this system. If one
of your leaves hooks up to another system and gets a major feed, you'll
be sending back copies of all the articles that come over that path, to
be rejected when they reach the leaf that sent them to you. Ugh.

This system does work pretty well, though. In the heyday of "dual" as
a major netnews hub for the San Francisco Bay Area, we were taking
three full netnews feeds, and feeding five leaf nodes with the linked
queue files (not bad for a small 68000 system with 160Mbytes of disk,
and two modems...)

	Erik E. Fair	ucbvax!fair	fair@ucbarpa.berkeley.edu

perry@vu-vlsi.UUCP (Rick Perry) (03/17/87)

In article <17859@ucbvax.BERKELEY.EDU> fair@ucbarpa.Berkeley.EDU
 (Erik E. Fair) writes:
> ...
>  However, since the "pseudo" site name does not match the name of
>any of the leaf sites that you're sending those articles to, anything
>that THEY send to YOU will get sent back to THEM by this system.

   I think this depends on how you deal with the batch files created
for the "pseudo" site.   I am using the MULTICAST news feature, and if
news comes in from site xxx it naturally would not get batched up for
sending to site xxx, but since the "pseudo" site name I have specifies
:all: for its newsgroups, it does get batched for that site but without
site xxx's name appended.  For example, the following excerpt from the
news log shows a message from devon that gets queued to multi (the pseudo
site):

Mar 16 23:02	devon	received <xxx@devon.UUCP> ng to.devon subj 'test' from rickp@vu-vlsi.UUCP (Rick Perry)
Mar 16 23:02	devon	<xxx@devon.UUCP> sent to multi

and in our /usr/spool/batchnews/multi file it says:

/usr/spool/news/to/devon/7

   Messages spooled to multi which are actually destined to go to systems
x, y, z would say (in the batchnews/multi file) something like:

/usr/spool/news/newsgroup/123 x y z

   In the way we handle this file, if no system names are specified on
a line then nothing gets sent anywhere.  So even though all sorts of
postings get sent to the "pseudo" system (like stuff posted to local
groups) it is no problem.
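The rule Rick describes can be sketched in C.  This is hypothetical, not news 2.11's code: the names on_path() and multicast_line() are invented, and it assumes every member site of the pseudo site wants the article, dropping only those already on the article's '!'-separated Path.  An article that came in from devon therefore gets a line with no site names, and the reader of the batch file sends it nowhere, which avoids the echo problem Erik describes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if site is one of the '!'-separated components of path. */
static int on_path(const char *path, const char *site)
{
    size_t len = strlen(site);
    const char *p = path;

    while (*p != '\0') {
        const char *end = strchr(p, '!');
        size_t n = end ? (size_t)(end - p) : strlen(p);
        if (n == len && strncmp(p, site, len) == 0)
            return 1;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return 0;
}

/* Write "file site1 site2 ..." into out, listing only the member sites
 * not already on the article's Path.  Returns the number of sites
 * listed (0 means nothing to send), or -1 if out is too small. */
int multicast_line(const char *file, const char *path,
                   const char *const members[], int nmembers,
                   char *out, size_t outlen)
{
    int i, nsites = 0, w;
    size_t used;

    assert(out != NULL && outlen > 0);
    w = snprintf(out, outlen, "%s", file);
    if (w < 0 || (size_t)w >= outlen)
        return -1;
    for (i = 0; i < nmembers; i++) {
        if (on_path(path, members[i]))
            continue;                 /* it came from (or via) this site */
        used = strlen(out);
        w = snprintf(out + used, outlen - used, " %s", members[i]);
        if (w < 0 || (size_t)w >= outlen - used)
            return -1;
        nsites++;
    }
    return nsites;
}
```

With devon as the only member and a Path of "devon!rickp", the line is just the file name, matching the /usr/spool/news/to/devon/7 entry above.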

   I guess the real problem is that, although MULTICAST is provided
in news 2.11, there is no standard way to deal with it on uucp-
connected systems.

...Rick			..{cbmvax,pyrnj,bpa}!vu-vlsi!perry
			perry@vuvaxcom.bitnet

stephen@comp.lancs.ac.uk (Stephen J. Muir) (03/17/87)

In article <17859@ucbvax.BERKELEY.EDU> fair@ucbarpa.Berkeley.EDU (Erik E. Fair) writes:
>	2. all neighbors *must* be "leaf" nodes (i.e. they don't send
>		you much in the way of news)

This is not necessary if you use my news batcher, which appeared recently on
mod.sources.  I am desperately wanting someone to convert it from C++ to C so
that it is usable by most people.  Also, it needs to be modified to work on
System V.

Don't worry if you don't know any C++, it is perfectly understandable if you
only know C.
-- 
EMAIL:	stephen@comp.lancs.ac.uk	| Post: University of Lancaster,
UUCP:	...!mcvax!ukc!dcl-cs!stephen	|	Department of Computing,
Phone:	+44 524 65201 Ext. 4120		|	Bailrigg, Lancaster, UK.
Project:Alvey ECLIPSE Distribution	|	LA1 4YR

ahby@meccts.UUCP (Shane P. McCarron) (03/18/87)

In article <659@vu-vlsi.UUCP> perry@vu-vlsi.UUCP (Rick Perry) writes:
>   I guess the real problem is that, although MULTICAST is provided
>in news 2.11, there is no standard way to deal with it on uucp-
>connected systems.

I have just recently submitted a set of programs to mod.sources that
deals with exactly this problem.  Many sites in Minnesota have been
running multibatch/uucast for some time now, and it works pretty well.
I have already written an article to this group explaining how the
software works, so I won't elaborate further.  Suffice it to say that
you should keep your eyes on mod.sources for further details.
-- 
Shane P. McCarron		UUCP	ihnp4!meccts!ahby, ahby@MECC.COM
MECC Technical Services		ATT	(612) 481-3589

"Character is what you are in the dark!"