[net.news] Data compression to lower phone bills

dyer@spdcc.UUCP (Steve Dyer) (05/29/86)

You know, there's no free lunch.  I seem to remember that compressing
and uncompressing a batch of news takes a significant chunk of a VAX 780,
probably a more significant impact than phone line charges for many sites,
which is one reason that compressed batching of news articles isn't as popular
as it might be.
-- 
Steve Dyer
dyer@harvard.HARVARD.EDU
{linus,wanginst,bbncca,bbnccv,harvard,ima,ihnp4}!spdcc!dyer

grr@cbmvax.cbm.UUCP (George Robbins) (05/29/86)

In article <327@spdcc.UUCP> dyer@spdcc.UUCP (Steve Dyer) writes:
>You know, there's no free lunch.  I seem to remember that compressing
>and uncompressing a batch of news takes a significant chunk of a VAX 780,
[...]
>Steve Dyer dyer@harvard.HARVARD.EDU

This is partly because when compress is built as part of news, it is
compiled with the large-system defaults, which trade a lot of memory
in favor of speed.  If you don't have the memory, or are actually
sharing it with other tasks, the resulting thrashing will put your machine
to sleep.

You can still get effective compression by using a smaller number of bits,
set either at compile time or at run time.  Of course, the other end
shouldn't send you data compressed with more bits than your end was built
to handle...
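
For what it's worth, the memory/bits connection falls straight out of the
algorithm: compress is LZW, and its string table can hold at most 2^bits
entries.  Here is a toy sketch in Python (nothing like the real compress
source, which packs variable-width codes and can reset its table) just to
show which knob the bits setting turns:

def lzw_codes(data, max_bits=12):
    # Toy LZW: emit integer codes, growing the string table until it
    # holds 2**max_bits entries, then freezing it.  Table size is the
    # memory cost; a frozen table is what hurts the compression ratio.
    max_codes = 1 << max_bits
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    w, out = b"", []
    for byte in data:                  # data is a bytes object
        wc = w + bytes([byte])
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            if next_code < max_codes:  # stop growing at the bit limit
                table[wc] = next_code
                next_code += 1
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

A 16-bit table is sixteen times the size of a 12-bit one; how much extra
compression the bigger table buys depends entirely on the data.
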
--
George Robbins - now working with,      uucp: {ihnp4|seismo|caip}!cbmvax!grr
but no way officially representing      arpa: cbmvax!grr@seismo.css.GOV
Commodore, Engineering Department       fone: 215-431-9255 (only by moonlite)

jerry@oliveb.UUCP (Jerry Aguirre) (05/31/86)

I think that people who are not using compress because of the
additional CPU overhead are not considering the entire picture.  Yes, it
takes cpu cycles to compress a batch of news.  But remember, by making
the batches smaller you save overhead in queueing and transmitting the
batch.  Here are some timings run on a moderately loaded VAX 750 running
4.2BSD.  The input file is a normal batch of news; compress is version 4.0.

First test is with two uncompressed batches of 50K
       12.5 real         1.7 user         1.7 sys  batch 50K
       13.5 real         2.7 user         2.0 sys  uux (copy and queue)
       15.7 real         1.8 user         2.2 sys  batch 50K
       16.6 real         2.9 user         2.2 sys  uux (copy and queue)
     1012.3 real        10.1 user        20.2 sys  uucico (2x50K)
                        ----             ----
                        19.2             28.3 = 47.5 cpu seconds

Second test is with a single compressed batch of 100K
       45.3 real         2.8 user         3.5 sys  batch 100K
       46.1 real         9.5 user         3.2 sys  compress 100K->50K
       46.3 real         2.4 user         2.5 sys  uux (copy and queue)
      508.3 real         6.2 user        12.5 sys  uucico (50K)
                        ----             ----
                        20.9             21.7 = 42.6 cpu seconds

These timings are of course subject to a lot of variation with different
hardware and different versions of uucp.  But in this configuration,
where cpu cycles are at a premium, it actually works out better to
compress than not!  The real difference is probably even larger, since
some of that extra uucico activity consists of DH interrupts that are
probably not being charged to the uucico process.  Also, the compress
process can easily be niced, while using nice on the uucico process
will cause problems.
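
The totals above are just user+sys summed down each column; a quick Python
sanity check of the posted numbers:

# (user, sys) pairs copied from the tables above
uncompressed = [(1.7, 1.7), (2.7, 2.0), (1.8, 2.2), (2.9, 2.2), (10.1, 20.2)]
compressed   = [(2.8, 3.5), (9.5, 3.2), (2.4, 2.5), (6.2, 12.5)]

cpu = lambda rows: round(sum(u + s for u, s in rows), 1)
print(cpu(uncompressed))   # 47.5 cpu seconds without compression
print(cpu(compressed))     # 42.6 cpu seconds with compression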

Older versions of compress would run a lot faster if given a smaller
number of bits.  I ran some timing tests on version 4.0, and while it
seems optimized for either 12 or 16 bits, the difference in cpu usage
between the two is negligible.  If you are concerned about memory
usage, then I suggest you use 12 bits.  The difference in output file
size between using 12 and 16 bits of compression is only about 6
percent.  I would also urge upgrading to version 4.0, as it is
significantly faster than older versions.
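
If you want to check the 12-versus-16-bit figure on your own traffic, a
sketch along these lines will do it.  (Assumptions on my part: compress is
in your search path, and a sample batch is sitting in a file called
"batch"; adjust to taste.)

import subprocess

# Compare output sizes at 12 and 16 bits using compress -b;
# -c sends the result to stdout so nothing is written to disk.
for bits in (12, 16):
    with open("batch", "rb") as f:
        out = subprocess.run(["compress", "-c", "-b", str(bits)],
                             stdin=f, capture_output=True, check=True).stdout
    print(bits, "bits:", len(out), "bytes")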

In terms of system memory usage, the roughly 46 seconds of real time during
which compress holds memory can be traded off against the extra roughly 504
seconds during which uucico would otherwise stay resident.

So, you can have your cake and eat it too: smaller queues, reduced phone
usage, and FEWER cpu cycles.

					Jerry Aguirre @ Olivetti ATC
{hplabs|fortune|idi|ihnp4|tolerant|allegra|glacier|olhqma}!oliveb!jerry

tanner@ki4pv.UUCP (Tanner Andrews) (06/06/86)

News is distributed via a "diffusion" scheme, where articles are
passed along paths which are often redundant.  When the n'th copy of
an article (for n != 1) arrives at a site, it is tossed out.  The
first copy is the only copy kept or passed along.
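
The duplicate check itself is nothing deep; it only requires seeing each
article's Message-ID as it arrives.  A toy sketch in Python (the real news
software keeps this in its history file, not an in-core set):

seen = set()

def accept(message_id):
    # First copy of an article: remember it, file it, pass it along.
    # Any later copy of the same Message-ID: toss it, don't forward it.
    if message_id in seen:
        return False
    seen.add(message_id)
    return True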

If we merely pass the news along without uncompressing and unbatching
it, we lose the ability to toss duplicates at our site.  We also lose
the ability to avoid paying phone bills to re-transmit those duplicates.

If you are not batching your news before transmission, your phone bills
are much higher than they should be.  You lose a fair amount of
the value of compression if you compress 100 short articles
separately rather than the same articles in a batch -- even if you
have an iAPX-286 processor that is limited to 12-bit compression.

There is also a certain amount of overhead PER FILE transmitted; for
short files the UUCP negotiation may take as long as the actual
file transmission.  If you batch and transmit 100 articles, only two
files are transmitted (the batch + the UUX file).  Transmit each article
separately, and you have to negotiate 200 times for 200 files.
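
A back-of-the-envelope model shows how badly the per-file negotiation
dominates for short articles.  The handshake time and line speed below are
made-up illustrative numbers, not measurements:

handshake = 10.0        # assumed seconds of UUCP negotiation per file
bytes_per_sec = 120     # roughly a 1200 bps line
articles, avg_size = 100, 1000

data_time = articles * avg_size / bytes_per_sec
batched  = 2 * handshake + data_time                # batch + UUX file
separate = 2 * articles * handshake + data_time     # 200 files, 200 negotiations
print(round(batched), "vs", round(separate), "seconds")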

-- 
<std dsclm, copies upon request>	   Tanner Andrews