duncans@phoenix.Princeton.EDU (Duncan E Smith) (01/28/91)
Does anyone know of a (preferably PD/Shareware) program that compresses files using a dictionary. PKZip, et al do a great job, but it seems that something dictionary-based would really save space, and I have lots of English text files lying around waiting to be squished. I believe that something like this exists (saw it a long time ago in a magazine article about archivers), but I've never been able to find one in any of the BBSes, anonymous FTP sites, etc. that I've frequented.
carlsona@en.ecn.purdue.edu (The Lossel) (01/29/91)
It's been a long time since I read the information on PKZIP's compression methods, but doesn't the 'Imploding' compression use a 'sliding dictionary' as part of the compression scheme? Anyone really familliar with the PKZIP compression care to comment? - The Lossel
sander@cwi.nl (Sander Plomp) (01/29/91)
carlsona@en.ecn.purdue.edu (The Lossel) writes: > It's been a long time since I read the information on PKZIP's compression >methods, but doesn't the 'Imploding' compression use a 'sliding dictionary' >as part of the compression scheme? Anyone really familliar with the PKZIP >compression care to comment? Yes, Imploding uses a 'sliding dictionary' method. You should really consult the information provided with PKZIP, as it contains not only more detailed information but also some literature references. The information provided in that file (APPNOTE.ZIP) is sufficient to built yourself an unzipper, but the details of the compression process are left out, making it harder to construct your own zipper. As far as I recall, the sliding dictionary is either 4k or 8k long, and its output is a mixture of litteral characters and (distance, length) pairs. Litterals are 8 bits, distances either 12 or 13 bits and lengths 6 bit. The distance/length pairs indicate a string of length bytes which has already occurred in the input distance bytes back. Literal bytes are used when no long enough match could be found in the buffer. The output of the sliding dictionary is then compressed using Shannon-Fano trees. These are just like Huffman trees, only the algorithm to construct them is different (Top-down instead of bottom-up). Separate trees are used for the upper six bits of the distances, the lenghts, and the literals. (The compression of litterals can be switched off however) [The above information may be (wildly) inaccurate as it has been a long time since I read APPNOTE.ZIP] -- Sander Plomp sander@cwi.nl
duncans@phoenix.Princeton.EDU (Duncan E Smith) (01/30/91)
In article <2862@charon.cwi.nl> sander@cwi.nl (Sander Plomp) writes: >Yes, Imploding uses a 'sliding dictionary' method. You should really consult [description of "sliding dictionary" deleted] > -- Sander Plomp sander@cwi.nl This looks like a standard compression method, but what I was thinking of was more along the lines of an English dictionary. Perhaps the program would search the file(s) to be compressed and compile a "common word list." No doubt all compressors use this technique to some extent, but it seems that one would get much better compression on plain text files (i.e. written documents) if the compressor knew that this is what it was squishing. - D