[comp.binaries.ibm.pc.d] Dictionary-based archiver

duncans@phoenix.Princeton.EDU (Duncan E Smith) (01/28/91)

Does anyone know of a (preferably PD/Shareware) program that compresses files
using a dictionary. PKZip, et al do a great job, but it seems that something
dictionary-based would really save space, and I have lots of English text files
lying around waiting to be squished. I believe that something like this exists
(saw it a long time ago in a magazine article about archivers), but I've never
been able to find one in any of the BBSes, anonymous FTP sites, etc. that I've
frequented.

carlsona@en.ecn.purdue.edu (The Lossel) (01/29/91)

 It's been a long time since I read the information on PKZIP's compression
methods, but doesn't the 'Imploding' compression use a 'sliding dictionary'
as part of the compression scheme?  Anyone really familliar with the PKZIP
compression care to comment?

						- The Lossel

sander@cwi.nl (Sander Plomp) (01/29/91)

carlsona@en.ecn.purdue.edu (The Lossel) writes:

> It's been a long time since I read the information on PKZIP's compression
>methods, but doesn't the 'Imploding' compression use a 'sliding dictionary'
>as part of the compression scheme?  Anyone really familliar with the PKZIP
>compression care to comment?

Yes, Imploding uses a 'sliding dictionary' method. You should really consult
the information provided with PKZIP, as it contains not only more detailed
information but also some literature references. The information provided in
that file (APPNOTE.ZIP) is sufficient to built yourself an unzipper, but the
details of the compression process are left out, making it harder to construct
your own zipper.

As far as I recall, the sliding dictionary is either 4k or 8k long, and its
output is a mixture of litteral characters and (distance, length) pairs.
Litterals are 8 bits, distances either 12 or 13 bits and lengths 6 bit.
The distance/length pairs indicate a string of length bytes which has already
occurred in the input distance bytes back. Literal bytes are used when no long
enough match could be found in the buffer.

The output of the sliding dictionary is then compressed using Shannon-Fano
trees. These are just like Huffman trees, only the algorithm to construct them
is different (Top-down instead of bottom-up). Separate trees are used for the
upper six bits of the distances, the lenghts, and the literals. (The
compression of litterals can be switched off however)

[The above information may be (wildly) inaccurate as it has been a long time
since I read APPNOTE.ZIP]

 -- Sander Plomp                                  sander@cwi.nl

duncans@phoenix.Princeton.EDU (Duncan E Smith) (01/30/91)

In article <2862@charon.cwi.nl> sander@cwi.nl (Sander Plomp) writes:

>Yes, Imploding uses a 'sliding dictionary' method. You should really consult

[description of "sliding dictionary" deleted]

> -- Sander Plomp                                  sander@cwi.nl

This looks like a standard compression method, but what I was thinking
of was more along the lines of an English dictionary. Perhaps the
program would search the file(s) to be compressed and compile a "common
word list." No doubt all compressors use this technique to some extent,
but it seems that one would get much better compression on plain text
files (i.e. written documents) if the compressor knew that this is what
it was squishing.

- D