[news.admin] Compressed archive format

xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (09/08/90)

[Crossposted to alt.sources.d to provide the data toward the end, and to
news.admin to show the possible telecomm savings of changing compressors;
the rest is Amiga specific.  Please select the appropriate followup group;
I've pointed it back to comp.sys.amiga, where this thread is ongoing.]

sparks@corpane.UUCP (John Sparks) writes:
>warren@hpindda.cup.hp.com (Warren Burnett) writes:
>|sparks@corpane.UUCP (John Sparks) writes:
>
>|> Lharc seems speed comparable with Zoo, maybe a bit slower at packing.
>
>|HA! That's a laugh.  Even on my 68030 workstation lharc is slow.  Lharc 
>|takes about five times as long as zoo does on compression and two to 
>|three times as long as zoo on decompression, at least on the kinds of 
>|files that I archive.

Hmm, on my 68000 Amiga I just extracted an 803 Kbyte file from an lharc
archive in under 60 seconds.  I can easily live with that speed for the
better compression.  No question, lharc is slow to compress, but see below.

>|Personally, I am sticking with zoo.  I would rather have my archive files
>|be slightly larger than sit around waiting for ten minutes to decompress
     ^^^^^^^^
>|this nifty new screen hack I just downloaded.

Well, I just compressed an 803 Kbyte file with lharc, zoo, and compress,
all on the Amiga.  Zoo (12 bit Lempel-Ziv) gave 131 Kbytes, compress (14 bit
Lempel-Ziv) gave 121 Kbytes, and lharc (?? bit Lempel-Ziv cascaded with
(adaptive ?) Huffman) gave 50 Kbytes!  You can try this yourself: it is
the StPauls.dat file in the recent DKBtrace distribution in
comp.binaries.amiga.
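To show what the "12 bit" versus "14 bit" cap on a Lempel-Ziv coder means, here is a minimal sketch of LZW, the Lempel-Ziv variant compress uses. It is a simplification, not the code in zoo or compress: codes start 9 bits wide and widen up to a hypothetical `max_bits` parameter, the dictionary simply freezes when full (the real programs emit clear/reset codes, omitted here), and the bit count is an estimate because codes are never actually packed. The function names are made up for illustration.

```python
def lzw_compress(data: bytes, max_bits: int = 12) -> tuple[list[int], int]:
    """Simplified LZW. Returns the code list and an estimate of the packed
    size in bits (codes start 9 bits wide and widen up to max_bits)."""
    table = {bytes([i]): i for i in range(256)}
    next_code, width = 256, 9
    codes: list[int] = []
    total_bits = 0
    w = b""
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                        # keep extending the current match
            continue
        codes.append(table[w])
        total_bits += width
        if next_code < (1 << max_bits):   # dictionary not yet full
            table[wc] = next_code
            next_code += 1
            if next_code > (1 << width) and width < max_bits:
                width += 1                # largest code no longer fits
        w = bytes([byte])
    if w:
        codes.append(table[w])
        total_bits += width
    return codes, total_bits


def lzw_decompress(codes: list[int], max_bits: int = 12) -> bytes:
    """Inverse of lzw_compress; rebuilds the same dictionary as it reads."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            entry = w + w[:1]             # code being defined by this very step
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        if next_code < (1 << max_bits):   # mirror the compressor's freeze
            table[next_code] = w + entry[:1]
            next_code += 1
        w = entry
    return b"".join(out)
```

Running the same file through `lzw_compress` with `max_bits=12` and then `14` gives a rough feel for why a bigger dictionary helps on long, repetitive input.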

Although the file is in fact a text picture description file, its contents
resemble raw pixel image data (lots of repeated bytes), so you might want
to seriously reconsider which compressor you use for image data; paying
for about 2.5 times the storage to get the higher speed of zoo is a big hit.
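Part of why data with lots of repeated bytes squeezes so well is the Huffman stage: common byte values get short codes. The sketch below builds a static Huffman code from byte frequencies and totals the bits. It is an illustration of the idea only, not lharc's coder (which the post guesses is adaptive); the function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data: bytes) -> dict[int, int]:
    """Code length in bits for each distinct byte, from a static Huffman tree."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): 1}      # a lone symbol still costs one bit
    # Heap entries: (weight, tiebreak, {symbol: depth within this subtree}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)   # two lightest subtrees
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def huffman_bits(data: bytes) -> int:
    """Total encoded size in bits, ignoring the cost of storing the code table."""
    lengths = huffman_code_lengths(data)
    freq = Counter(data)
    return sum(freq[s] * lengths[s] for s in freq)
```

Uniformly distributed bytes still cost 8 bits each, while a file that is mostly one repeated byte drops well below that.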

The performance of lharc on general comp.sources.amiga and
comp.binaries.amiga distributions is less spectacular, but still very good
compared to zoo or compress.  I almost always save at least 20%, which is
money in the bank for me.

>Well after downloading files to my amiga then starting to uncompress them,
>I usually stay at the console when uncompressing a Lharc file with Lhunarc
>but I can go watch about 15 minutes of TV when unzooing a zoo file. And
>that is on any kind of file. You must have a really lousy version of
>Lharc. 

Well, lharc does something on the screen to keep your attention; I'd be a
bit surprised if it were really faster than zoo at decompression: please
do a couple of timing tests on large files, and, if you find lharc actually
faster, _please_ publish your version number!  ;-)
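For anyone who does run those timing tests, a minimal harness sketch: it runs a command several times and keeps the best wall-clock time, which cuts down noise from other tasks on a multitasking machine. The function name is hypothetical, and the lharc/zoo command lines in the comment are assumptions; substitute whatever your versions actually accept.

```python
import subprocess
import sys
import time


def time_command(argv: list[str], repeats: int = 3) -> float:
    """Run argv `repeats` times; return the fastest run in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Hypothetical usage; command syntax depends on your archiver versions:
    #   time_command(["lharc", "x", "big.lzh"])
    #   time_command(["zoo", "x", "big.zoo"])
    # Sanity check: time a trivial Python process.
    print(time_command([sys.executable, "-c", "pass"]))
```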

Still, the Amiga is a multitasking machine, and I just start lharc
going in the background and do something else while it churns; I
really don't much care how fast it is, just how well it compresses.
I'm slowly converting zoo files to lharc files, to recover _lots_
of floppy disk space for reuse.

The _entire_ DKBtrace distribution (source, docs, data, and binaries)
lharc-ed like this:

Original file bytes:        1,920,793  (no file system overhead counted)
                                       (as reported by lharc)
Internal compressed bytes:    426,603  (sum of compressed files in .lzh)
                                       (as reported by lharc)
Archive size in bytes:        431,306  (External file size, not counting)
                                       (directory and extension blocks)
                                       (as reported by "list")

Compare to zoo:

Original file bytes:        1,920,793  (no file system overhead counted)
                                       (as reported by zoo)
Internal compressed bytes:    660,523  (sum of compressed files in .zoo)
                                       (as reported by zoo)
Archive size in bytes:        670,146  (External file size, not counting)
                                       (directory and extension blocks)
                                       (as reported by "list")

That's just too good a compression gain for me to ignore, even though it
really does take lharc almost five times as long to compress the files.
[Your data may differ slightly; I use my own arcane directory structure,
throw away the duplicate copies of the Docs files, and keep the header
from the first posting in each distribution group for reference.]

Think how much money that represents in transmission and storage costs for
the net, as well.  Even uuencoded, that roughly 35% data savings will be
the same, since uuencoding inflates both archives by the same factor;
perhaps it is time to really think about what we pay to use the compression
algorithm "everybody has" for Usenet data transfer.

It is interesting that lharc seems to need only about half as much overhead
as zoo to store full directory structures.  I wonder why.
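Where "about half" comes from: treating overhead as archive size minus the sum of the compressed members (an assumption; that difference lumps together per-file headers, stored pathnames, and any padding), the figures from the two listings give:

```python
# (internal compressed bytes, archive bytes), from the two listings
lharc_internal, lharc_archive = 426_603, 431_306
zoo_internal, zoo_archive = 660_523, 670_146

lharc_overhead = lharc_archive - lharc_internal   # 4,703 bytes
zoo_overhead = zoo_archive - zoo_internal         # 9,623 bytes
print(lharc_overhead, zoo_overhead, round(lharc_overhead / zoo_overhead, 2))
# 4703 9623 0.49
```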

Those intensely interested in compressing image data are referred to the
ongoing discussion in alt.sources.d.

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>