[comp.sources.d] WANTED: News compression information...

time@ox.com (Tim Endres) (05/30/90)

I am working on an implementation of News for the lowly Macintosh.
I have had many requests to support compressed/batched news.
I have easily added the support for batched news, but have found
the support of compressed news to be a dilemma.

Most people have indicated that "compress", the PD version, is what
is normally used for news compression. This would *seem* fine, but
the darn thing requires 500K RAM just to uncompress. This not only
seems extraordinary, but I can not see how implementations on a PC
limited to 640K could even work.

QUESTIONS:
Is "compress" the common news compression algorithm?
Do all news feeds compress at 16 bits or 12 bits?
What are the implications of using compress in commercial sw?
Are there any other, more miserly, compress programs available?

Thanks, tim.
time@ox.com

jmv@sppy00.UUCP (J. Vickroy +1 614 764 4343) (05/30/90)

In article <1990May29.202056.26271@ox.com> time@ox.com (Tim Endres) writes:
=>
=>  [ stuff deleted ]
=>
=>QUESTIONS:
=>Is "compress" the common news compression algorithm?
=>Do all news feeds compress at 16 bits or 12 bits?
=>What are the implications of using compress in commercial sw?
=>Are there any other, more miserly, compress programs available?

I've been told that the compression algorithm that sendbatch uses is
a "modified" L-Z. If this is true, then the PD compress should work.
This is the theory that I'm working from, at least. I'm in the 
middle of a port to Think C 4.0. 

jim
--
Jim Vickroy                               | Voice:      +1 614 764 4343
Telecommunications Systems Engineering    | Internet:   jmv@rsch.oclc.org
Online Computer Library Center, Inc.      | uucp:       osu-cis!sppy00!jmv
6565 Frantz Road, Dublin Ohio, 43017-0702 | CompuServe: 73777,662

SA44@LIVERPOOL.AC.UK (Kevin Maguire) (05/31/90)

In article <1990May29.202056.26271@ox.com>, time@ox.com (Tim Endres) says:
>I am working on an implementation of News for the lowly Macintosh.
>I have had many requests to support compressed/batched news.
>Most people have indicated that "compress", the PD version, is what
>is normally used for news compression. This would *seem* fine, but
>the darn thing requires 500K RAM just to uncompress. This not only
>seems extraordinary, but I can not see how implementations on a PC
>limited to 640K could even work.

I had a look at the compress.c (comp.c ??) file in C news and it does
indeed seem to be the standard UN*X compress(1) command incognito.
It depends which defines you choose when compiling this file whether
you get 16bit/12bit/whatever compression. Yes, 16 bit does need ~400K
to compress in, but uncompressing even 16bit needs much less, ~70K (?)
So just get this file (compress.c) and compile in 16bit mode. Once compiled
compress automatically determines the compression mode a file was compressed
in so on your Mac you can compress your batches in 13 bit mode and send
them out and uncompress incoming 16bit batches as well. The commandline
option is "compress -b13" for thirteen bits.
    This way you'll only need about 70K of work space to send out 13bit
batches (which any compress should automatically handle) and uncompress
16bit batches. Alternatively, ask your feed site to use 13bit (or 12)
compression on your batch (don't think either C news or B news automatically
allow this however :-()
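
How a decoder "automatically determines the compression mode" can be
sketched as follows. A compress(1) stream starts with the two magic bytes
037 0235 (0x1f 0x9d); the third byte carries the maximum code width in its
low five bits and a block-mode flag in its high bit. The function names
and the sample byte strings below are made up for illustration; this is a
minimal sketch, not code from compress.c.

```python
MAGIC = b"\x1f\x9d"   # first two bytes of a compress(1) stream
BITS_MASK = 0x1F      # low five bits of byte 3: maximum code width
BLOCK_MODE = 0x80     # high bit of byte 3: block (CLEAR code) mode


def read_z_header(data):
    """Return (maxbits, block_mode) read from the start of a stream."""
    if len(data) < 3 or data[:2] != MAGIC:
        raise ValueError("not a compress(1) stream")
    flags = data[2]
    return flags & BITS_MASK, bool(flags & BLOCK_MODE)


def can_uncompress(data, built_for_bits):
    """True if a decoder built for built_for_bits can read this stream."""
    maxbits, _ = read_z_header(data)
    return maxbits <= built_for_bits


# Hypothetical three-byte headers: a 16-bit and a 13-bit stream.
sixteen = bytes([0x1F, 0x9D, 0x80 | 16])
thirteen = bytes([0x1F, 0x9D, 0x80 | 13])
```

A decoder built for 16 bits accepts both sample headers; one built for
13 bits has to reject the 16-bit stream, which is why the feed site's
setting matters more than your own.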

Kevin Maguire

Nsfnet : sa44%liv.ac.uk@nsfnet-relay.ac.uk
Uucp   : ...!mcsun!ukc!liv-ib!sa44

gz@cambridge.apple.com (Gail Zacharias) (06/01/90)

In article <1990May29.202056.26271@ox.com> time@ox.com (Tim Endres) writes:
>Most people have indicated that "compress", the PD version, is what
>is normally used for news compression. This would *seem* fine, but
>the darn thing requires 500K RAM just to uncompress. This not only
>seems extraordinary, but I can not see how implementations on a PC
>limited to 640K could even work.

I have a program (actually, a set of library routines) that will do
unix-compatible compression on the Mac.  The worst case uncompression (16 bit)
requires 200K.

I'd be happy to send it to anybody who asks.  It's about 20K of MPW assembler
source, for functions meant to be called from MPW C.

---
Home: gz@entity.com or ...mit-eddie!spt!gz
Work: gz@cambridge.apple.com

guy@auspex.auspex.com (Guy Harris) (06/01/90)

>I've been told that the compression algorithm that sendbatch uses is
>a "modified" L-Z.

Uhh, the compression algorithm that "sendbatch" uses *is* "compress":

	-c)	COMP='| $LIB/compress $cflags'
		ECHO='echo "#! cunbatch"'
		continue;;

and "/usr/lib/news/compress" is simply a symlink to "/usr/ucb/compress"
on our system.  There is a variant of "compress" supplied in source form
with the netnews source, but it's basically the same program as the one
that comes with, say, BSD.

bob@MorningStar.Com (Bob Sutterfield) (06/01/90)

In article <1990May29.202056.26271@ox.com> time@ox.com (Tim Endres) writes:
   Is "compress" the common news compression algorithm?

Yes.

   Do all news feeds compress at 16 bits or 12 bits?

Hard to say, but I don't know of any doing 12-bit compression.  But
then, I don't run in those circles.

   What are the implications of using compress in commercial sw?

Ask your lawyer.  Net advice is worth exactly what you pay for it :-)

   Are there any other, more miserly, compress programs available?

Some years ago, someone posted a micro-zcat to net.sources.  It
performed the uncompress-in-a-pipe function in about two dozen lines
of C, really pretty elegantly.  Of course, I can't dredge it up any
more.  Perhaps your friendly neighborhood source archive site would
have a copy?

clewis@eci386.uucp (Chris Lewis) (06/01/90)

In article <861@sppy00.UUCP> jmv@sppy00.UUCP (J. Vickroy +1 614 764 4343) writes:
> In article <1990May29.202056.26271@ox.com> time@ox.com (Tim Endres) writes:

> =>QUESTIONS:
> =>Is "compress" the common news compression algorithm?

In B-news and C-news "compress" is what is invoked.  compress source
comes with B-news, but apparently not C-news.  I suspect TMN (aka
news 3.0) uses compress too.

> =>Do all news feeds compress at 16 bits or 12 bits?

B-news by default uses whatever the compress source is compiled for.

C-news by default uses 12 bits (because >12 bits isn't always
possible on things like 286 Xenix or other small address space
machines like pdp11's) regardless of how compress is compiled.  16 versus
12 bit usually only makes a difference of a few percent in size, but oodles
in time (and memory space).  B-news sites (particularly with scant
CPU cycles and memory space) would be well advised to use -b12 anyways
for outgoing batches (and possibly recompile for 12 bit provided that
your feeders know about it).

Build your Mac thingie with compress *compiled* for 12 bit compression,
and include in the software notes that anybody feeding you has to set
their compression to 12.  Which means C-news feeders probably don't have
to do anything, and B-news feeders have to stuff the "-b12" option
into sendbatch's compress invocation (sendbatch can be parameterized
for this I believe).  Compress compiled for 12 bit compression has a
data area of something like 32K compared to 400K+ when compiled for
16 bits.
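
The arithmetic behind those data-area figures can be sketched roughly as
below. The hash-table sizes are the HSIZE values as recalled from the
compress 4.0 source, and the element sizes (a 4-byte long plus a 2-byte
short per slot) are assumptions about a typical build; check both against
your own copy of compress.c. The decoder estimate assumes a dedicated
uncompress that keeps only a 2-byte prefix and a 1-byte suffix per code,
and it leaves out the output stack and I/O buffers.

```python
# HSIZE per compile-time BITS, as recalled from compress 4.0 (an assumption).
HSIZE = {12: 5003, 13: 9001, 14: 18013, 15: 35023, 16: 69001}


def compress_table_bytes(bits, long_size=4, short_size=2):
    """Compressor tables: one long (htab) and one short (codetab) per slot."""
    return HSIZE[bits] * (long_size + short_size)


def uncompress_table_bytes(bits):
    """Dedicated decoder tables: 2-byte prefix plus 1-byte suffix per code."""
    return (1 << bits) * (2 + 1)


if __name__ == "__main__":
    for bits in sorted(HSIZE):
        print(bits,
              compress_table_bytes(bits) // 1024, "K to compress,",
              uncompress_table_bytes(bits) // 1024, "K to uncompress")
```

Under these assumptions the compressor tables come to roughly 29K for 12
bits and 404K for 16 bits, in line with the figures above. The decoder
tables come to 12K, 48K and 192K for 12, 14 and 16 bits, the last being
in the same range as the 200K reported earlier in this thread for a
16-bit Mac decoder.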

There's absolutely *no* problem with feeding 12 bit compressed
data to a 16 bit compress program (it figures it out itself).
(eg: a 12 bit feeding a 16 bit)
The problem arises when a 16 bit compress program generates
16 bit output and tries to run it through a compress compiled for
12 bits.  (eg: a 16 bit feeding a 12 bit)

Just make sure that your compress source is at least version 4.0.

> =>What are the implications of using compress in commercial sw?

The source appears PD.  Don't charge money for compress itself - the
authors will get pissed....  Nor pretend you wrote it.  A mention
of the authors in your documentation would be nice...  Given the
source, you should be able to contact the original authors to make
sure.

> =>Are there any other, more miserly, compress programs available?

Not that would be compatible with the majority of existing news sites.
Please don't introduce another batch protocol!  There appear to be about
a dozen "supported" in B-news and C-news land, plus untold numbers
in actual practise.  (Though *all* of the compressed-batch protocols
that are directly supported by the batchers use "compress" somewhere).

The "standard" compressed batch format is a normal uncompressed
news batch, run through compress, and occasionally (unmodified B-news)
prepended with "#!cunbatch" or "#!rnews".  C-news doesn't do the
prepends, but will *accept* either prefix or none automatically.
(which is what I suggest yours do too).
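
Accepting "either prefix or none" might look something like the sketch
below. The function name and return values are invented for illustration,
and the handling of blanks after "#!" is an assumption made so that both
the "#!cunbatch" and "#! cunbatch" spellings are accepted. An ordinary
uncompressed batch also begins with a "#! rnews <count>" line, so that
line is only stripped when compressed data follows it.

```python
import re

Z_MAGIC = b"\x1f\x9d"  # first two bytes of compress(1) output
# "#!", optional blanks, the command word, then the rest of the line.
PREFIX = re.compile(rb"#![ \t]*(cunbatch|rnews)[^\n]*\n")


def classify_batch(data):
    """Return (kind, payload), where kind is 'compressed' or 'plain'.

    For 'compressed', payload is the raw compress stream with any
    "#!" prefix line removed. For 'plain', payload is data unchanged.
    """
    m = PREFIX.match(data)
    if m:
        rest = data[m.end():]
        if m.group(1) == b"cunbatch" or rest.startswith(Z_MAGIC):
            return "compressed", rest
    if data.startswith(Z_MAGIC):
        return "compressed", data
    return "plain", data
```

The compressed payload would then go to uncompress, and the result fed
back through the ordinary batch unpacker.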
 
> I've been told that the compression algorithm that sendbatch uses is
> a "modified" L-Z. If this is true, then the PD compress should work.

In true USENET tradition, the de facto standard is "compress".  Which
just so happens to be "an" implementation of L-Z.  Without comparing
output formats, there's no way of telling whether any other L-Z
implementation would be compatible.
-- 
Chris Lewis, Elegant Communications Inc, {uunet!attcan,utzoo}!lsuc!eci386!clewis
Ferret mailing list: eci386!ferret-list, psroff mailing list: eci386!psroff-list

wcf@psuhcx.psu.edu (Bill Fenner) (06/01/90)

In article <BOB.90May31152322@volitans.MorningStar.Com> bob@MorningStar.Com (Bob Sutterfield) writes:
|In article <1990May29.202056.26271@ox.com> time@ox.com (Tim Endres) writes:
|   Do all news feeds compress at 16 bits or 12 bits?
|
|Hard to say, but I don't know of any doing 12-bit compression.  But
|then, I don't run in those circles.

hogbbs<>psuhcx used to be 12 bit, before I got 16-bit compression working.
sendbatch simply passes the -b option to compress, if it's on the sendbatch
command line, so I said sendbatch -b12 hogbbs.

  Bill
-- 
Bill Fenner     psuhcx is going away 5/31.  Use wcf@wcfpc.scol.pa.us or
sysop@hogbbs.fidonet.org (1:129/87 - 814/238-9633) ..psuvax1!hogbbs!wcfpc!wcf

jaw@riacs.edu (James A. Woods) (06/01/90)

# "don't compress that dwarf, hand me the pliers."  -- after firesign theatre

>   What are the implications of using compress in commercial sw?

since 'compress' is a component of s5r4 unix, it's been done very publicly.
you, too, can appropriate it in much the same manner as has at&t.

>   Are there any other, more miserly, compress programs available?

>>Some years ago, someone posted a micro-zcat to net.sources.  It
>>performed the uncompress-in-a-pipe function in about two dozen lines
>>of C, really pretty elegantly.  Of course, I can't dredge it up any
>>more.  Perhaps your friendly neighborhood source archive site would
>>have a copy?

the first micro-zcat was done in 1987 by karl f. fox of morningstar
technologies.  since then, it's become both more (and less),
as discussed under the rubric of other cult postings which
never were directed to an official public archive.

please excuse the ellipticity here, but since mr. fox, myself,
and paul eggert of twinsun.com have a closely-related official
entry in chongo's 7th intl. obfuscated C contest, we ask you *not*
to dredge up the old code for publicity here.

after mid-june, when the judges have pronounced, you'll come to
know more than you'd ever wanted to about this twisted effort.

ames!jaw

chip@tct.uucp (Chip Salzenberg) (06/01/90)

According to bob@MorningStar.Com (Bob Sutterfield):
>In article <1990May29.202056.26271@ox.com> time@ox.com (Tim Endres) writes:
>>Do all news feeds compress at 16 bits or 12 bits?
>Hard to say, but I don't know of any doing 12-bit compression.  But
>then, I don't run in those circles.

By default, C News uses 12-bit compression.  Its authors, Geoff
Collyer and Henry Spencer, are compulsive measurers [:-)].  Their
measurements of the relative efficiency of 12- and 16-bit compression
on news batches indicated that it wasn't that much of a gain,
especially considering how large a 16-bit compression process is
compared to a 12-bit compression process.
-- 
Chip, the new t.b answer man    <chip%tct@ateng.com>, <uunet!ateng!tct!chip>

zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (06/01/90)

>Hard to say, but I don't know of any doing 12-bit compression.  But
>then, I don't run in those circles.

C News tends to use 12 bit compression.  

It's unfortunate since even small address space sites can do 14 bit
uncompress by using a dedicated uncompress program or something like
my rnews.c.

-- 
Jon Zeeff (NIC handle JZ)	 zeeff@b-tech.ann-arbor.mi.us

I found a groundhog chewing on my car!

zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (06/03/90)

>C-news by default uses 12 bits (because >12 bits isn't always
>possible on things like 286 Xenix or other small address space
>machines like pdp11's) regardless of how compress is compiled.  16 versus

You can use something like my rnews.c that is not only faster, more secure,
never runs out of disk space, etc but also allows 14 bit uncompression even on
small machines.


-- 
Jon Zeeff (NIC handle JZ)	 zeeff@b-tech.ann-arbor.mi.us

zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (06/03/90)

>By default, C News uses 12-bit compression.  Its authors, Geoff
>Collyer and Henry Spencer, are compulsive measurers [:-)].  Their
>measurements of the relative efficiency of 12- and 16-bit compression
>on news batches indicated that it wasn't that much of a gain,

Let's look at some actual figures for fairly large batches (250k after 
compress). Believe me, with C News you want to use large batches, 
especially if your link is reliable: 

16 bit =  1.0
14 bit =  1.12
12 bit =  1.28

I consider a 28% difference quite significant.  Also consider that 
even small address space sites can do 14 bit uncompress with the right 
software (and no efficiency loss).  
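
The percentages depend on which size is taken as the baseline, which may
explain some of the disagreement over "a few percent". A small sketch
using only the three ratios quoted above (the function names are made up
for illustration):

```python
# Relative batch sizes quoted above, with 16-bit output as 1.0.
relative_size = {16: 1.00, 14: 1.12, 12: 1.28}


def extra_vs_16(bits):
    """Percent by which a bits-wide batch exceeds the 16-bit batch."""
    return (relative_size[bits] / relative_size[16] - 1.0) * 100.0


def saved_by_16(bits):
    """Percent of a bits-wide batch that 16-bit compression would save."""
    return (1.0 - relative_size[16] / relative_size[bits]) * 100.0


if __name__ == "__main__":
    for bits in (14, 12):
        print(bits, round(extra_vs_16(bits), 1), round(saved_by_16(bits), 1))
```

With these figures a 12-bit batch is 28% larger than the 16-bit one, or
put the other way, going from 12 to 16 bits shrinks the batch by about
22%.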

-- 
Jon Zeeff (NIC handle JZ)	 zeeff@b-tech.ann-arbor.mi.us

henry@utzoo.uucp (Henry Spencer) (06/03/90)

In article <90151.173150SA44@LIVERPOOL.AC.UK> SA44@LIVERPOOL.AC.UK (Kevin Maguire) writes:
>I had a look at the compress.c (comp.c ??) file in C news and it does
>indeed seem to be the standard UN*X compress(1) command incognito.

Actually, the C News distribution as shipped from here doesn't include
a compressor at all; we simply assumed that everyone had compress.
(We realize this isn't a safe assumption, but we have to cut things off
somewhere.)

>... Alternatively, ask your feed site to use 13bit (or 12)
>compression on your batch (don't think either C news or B news automatically
>allow this however :-()

C News defaults to 12-bit compression, precisely to be compatible with
small machines.  In fact, you have to go in and work at it to get 16-bit
compression.  (That probably should be made easier.)
-- 
As a user I'll take speed over|     Henry Spencer at U of Toronto Zoology
features any day. -A.Tanenbaum| uunet!attcan!utzoo!henry henry@zoo.toronto.edu

root@ozdaltx.UUCP (root) (06/03/90)

Here lately I've noticed that compress seems to "choke" on certain
news files. (we're running Bnews PL17 on a 286 XENIX).  I've checked
compress and recompiled it several times with various options always
with the same results.  It hits a point in the file and just stops.
I suspect that some of the recent uuencoded files that have been
passing through the net might have something to do with this.  But I
don't really have a way to check it.

Outta curiosity, could someone e-mail me a working make for SCO XENIX
286 for compress - maybe there is something I've missed and am not
aware of it.  Thanks in advance...
scotty
ozdaltx!sysop