[net.sources.d] Backbone automatic news-compression question ...

wcs@ho95e.UUCP (#Bill_Stewart) (09/21/86)

In article <4012@ut-ngp.UUCP> werner@ut-ngp.UUCP (Werner Uhrig) writes:
<discussion of whether to compress or not, since the backbone sites
<mostly compress news anyway, and double-compression tends to be
<counter-productive.

I've seen this done in two areas -
	Binary programs for PCs
	Very large text files (e.g. the Senate bill)

Binary programs have to be processed by a program such as BinHex or
uuencode to make them safe to transmit; my guess is that compressing
them *before* BinHexing would not have much effect on what compress
achieves afterwards.  Two comments: for PC programs, the most popular
compression seems to be ARC, which is shareware.  Compress is PD, its
behavior is better known, and source is available from mod.sources;
is there any reason to prefer ARC?  Also, the "btoa" program that
comes with compress expands files by only 20% instead of the 33% for
uuencode; that saving could be significant if more people used it.
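
For the curious, the comparison is easy to make yourself (this assumes
you have the btoa that comes with the compress distribution; the file
names here are just examples):

	compress < prog.exe | uuencode prog.exe.Z > prog.uu
	compress < prog.exe | btoa > prog.btoa
	ls -l prog.uu prog.btoa		# btoa output ~10% smaller

Both encoders are filters reading standard input; uuencode's argument
is just the name to embed in the output.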

For large text files, pre-compression is normally not appropriate,
because of the double-compression problem.  The Senate bill, however,
was >64K before compression/uuencoding, and significantly smaller
afterwards.  That let it survive braindamaged mailers, at the expense
of being more work to read, and the redundancy added by uuencode was
probably exploited by compress on the backbone.
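
For anyone who needs to do the same thing, the recipe is roughly this
(file names are illustrative; how small the pieces have to be depends
on how braindamaged your mailers are):

	compress < bill.txt | uuencode bill.txt.Z > bill.uu
	# if the encoded result is still over 64K, split it up:
	split -800 bill.uu part.	# about 50K of uuencoded text apiece

The recipient cats the pieces back together in order, then uudecodes
and uncompresses.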

On the subject of compression for transmission, if a backbone site is
transmitting a given message to 24 other sites, does it compress each
outgoing message once, or does it do it 24 times?  This is obviously
moot on a broadcast network (like Stargate, or ihnp4->rest of IH),
but could have a major CPU impact on an overloaded machine like
ucbvax, ihnp4, or allegra.  If I were designing "C News", I'd consider
storing all articles in compressed form, with special pre-defined
tokens for the common header items; this would have major disk space
and transmission savings, and would probably have low CPU impact for
news reading programs.
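
Just to make the token idea concrete, here is a crude approximation
(the "~X" tokens and file names are arbitrary, for illustration only;
a real implementation would pick the tokens more carefully and have
the readers reverse the mapping):

	sed	-e 's/^Path: /~P/' \
		-e 's/^From: /~F/' \
		-e 's/^Newsgroups: /~N/' \
		-e 's/^Subject: /~S/' \
		-e 's/^Message-ID: /~I/' \
		-e 's/^Date: /~D/' article | compress > article.Z

Reading an article back is the same sed script run with the patterns
and replacements swapped, applied to the output of uncompress.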
-- 
# Bill Stewart, AT&T Bell Labs 2G-202, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs

sewilco@mecc.UUCP (Scot E. Wilcoxon) (09/26/86)

In article <857@ho95e.UUCP> wcs@ho95e.UUCP (Bill Stewart 1-201-949-0705 ihnp4!ho95c!wcs HO 2G202) writes:
>In article <4012@ut-ngp.UUCP> werner@ut-ngp.UUCP (Werner Uhrig) writes:
><discussion of whether to compress or not, since the backbone sites
><mostly compress news anyway, and double-compression tends to be
><counter-productive.
...
>On the subject of compression for transmission, if a backbone site is
>transmitting a given message to 24 other sites, does it compress each
>outgoing message once, or does it do it 24 times?  This is obviously
...

At present each news transmission via uucp is separate, so the answer is
24 times.
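
The stock arrangement looks something like this (simplified, and the
exact sendbatch commands and uux flags vary from site to site), so
compress gets run once for every neighbor:

	for site in siteA siteB siteC	# ...one pass per neighbor
	do
		batch /usr/spool/batch/$site | compress | \
			uux - -r "$site!cunbatch"
	done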

Here in Minnesota several sites are testing two programs which allow sending
one batched news file to several sites.  With MULTISEND/uucast, the answer is
"24 times or less, probably approaching 1".  If all 24 sites get the same
messages, the answer is 1.  Yes, this does drastically reduce CPU and disk
usage.

MULTISEND/uucast will be posted to the net after news 2.11, as they fit
around news 2.11.  I mention them by name so you'll know them when they
are posted.
-- 
Scot E. Wilcoxon    Minn Ed Comp Corp  {quest,dicome,meccts}!mecc!sewilco
45 03 N  93 08 W (612)481-3507                  ihnp4!meccts!mecc!sewilco
	Laws are society's common sense, recorded for the stupid.
	The alert question everything anyway.

ahby@meccts.UUCP (Shane P. McCarron) (09/27/86)

In article <628@mecc.UUCP> sewilco@mecc.UUCP (Scot E. Wilcoxon) writes:
>MULTISEND/uucast will be posted to the net after news 2.11, as they fit
>around news 2.11.  I mention them by name so you'll know them when they
>are posted.

Actually, the names are Multibatch and uusend.  They really will be
posted REAL SOON NOW, possibly even before 2.11, seeing as how it is
pretty late in coming out.
-- 
Shane P. McCarron			UUCP	ihnp4!meccts!ahby
MECC Technical Services			ATT	(612) 481-3589

"They're only monkey boys;  We can still crush them here on earth!"

fair@ucbarpa.Berkeley.EDU (Erik E. Fair) (10/02/86)

In article <857@ho95e.UUCP> wcs@ho95e.UUCP (Bill Stewart 1-201-949-0705) writes:
> [deleted]
>
>On the subject of compression for transmission, if a backbone site is
>transmitting a given message to 24 other sites, does it compress each
>outgoing message once, or does it do it 24 times?  This is obviously
>moot on a broadcast network (like Stargate, or ihnp4->rest of IH),
>but could have a major CPU impact on an overloaded machine like
>ucbvax, ihnp4, or allegra.  If I were designing "C News", I'd consider
>storing all articles in compressed form, with special pre-defined
>tokens for the common header items; this would have major disk space
>and transmission savings, and would probably have low CPU impact for
>news reading programs.

If you send a single article to 24 sites, yes, it normally would get
compressed and sent 24 times. When I ran "dual", which had seven
"leaf" nodes feeding off of it at its peak, I set up a "pseudo-site"
named "leaf" that generated one batch file, and through the magic of
UNIX links, I sent a single batch to all seven sites. The key is that
the sites that you do this for *must* be

	1. real leaf nodes (that is, they don't get news from anywhere
		else, and generate very little themselves, because this
		scheme will send *everything* they feed to you right
		back to them, in addition to sending it to all the
		other leaves).

	2. all using the same unbatching scheme (that is, they must
		all accept the same format of batch file as input; you
		can vary the command given to uux on a per-site
		basis).

In order to realize maximum savings from this, I had to hack uux to
take one more option (not hard; the diff of the mod was distributed
with netnews for both v7 & sV UUCPs): the -l (minus el) flag.  It is
equivalent to the "-c" option, except that it attempts to make a hard
UNIX link between the input file given to uux and the queue data file
before trying an explicit "copy".  The effect was seven jobs queued
with UUCP but only one copy of the data file, which was a big win for
/usr/spool/uucp disk space; as a result, the incremental cost to
"dual" of feeding a new "leaf" node was more modem time and a little
more disk space for C. and X. files (i.e. not much).
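
To make that concrete, the whole arrangement amounts to something
like this (site names, paths, and the exact arguments here are
illustrative, not dual's real configuration; "-l" is the local uux
hack just described, not a stock option, and all the leaves here take
compressed batches via cunbatch):

	batch /usr/spool/batch/leaf | compress > /usr/spool/batch/leaf.work
	for site in leaf1 leaf2 leaf3 leaf4 leaf5 leaf6 leaf7
	do
		uux -r -l "$site!cunbatch < !/usr/spool/batch/leaf.work"
	done
	# the hard links made by uux -l keep the data alive in the
	# spool, so the original can go away immediately
	rm -f /usr/spool/batch/leaf.work

Inside uux, the -l change itself is tiny: try link(2) from the named
input file to the queue data file, and fall back to the normal copy
if the link fails (e.g. because the two are on different filesystems).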

As for keeping the articles compressed on the disk, I don't think
that would win much.  Articles are stored as single files (collecting
the articles in a newsgroup together would mean you couldn't link
articles across newsgroups), and the average article is pretty small
(last I checked, in 1984, admittedly a long time ago, it was 1600
bytes), so the resulting savings (if any) would not be much.

	Erik E. Fair	ucbvax!fair	fair@ucbarpa.berkeley.edu