[comp.compression] Chameleon: a single decompression utility

jel@wet.UUCP (John Levine) (06/05/91)

Hello, all!  I was recently meditating on the hassle of having
to deal with all the different file compression formats (.zip,
.arc, .tar.Z, .zoo, et cetera), and I have a suggestion.

This suggestion concerns the implementation of data
decompression routines rather than their underlying algorithms.

Different compression algorithms seem to work best for different
data.

Indeed, I can imagine someone examining a file and compressing
it "by hand" to a small fraction of its original size, using
common sense, ad hoc techniques, and a priori knowledge about
its contents, and doing a better job than a program could.

So, why not format a compressed file as follows?  Put into the
header of the compressed file the decompressor itself, in a
terse but *general* machine language.  Since most of the expense
of data compression seems to be related to finding patterns in
the original data and then coding them as the compressed file,
decompression is usually just a matter of following the
directions for reconstructing the original file.  Fast, in other
words.  The decompressor header could even be interpreted,
though it would be faster to define it in such a way that it
could be compiled on the fly like some so-called
machine-independent assemblers.  Files big enough to need
compression are usually much bigger than a description of their
compression format, so there would be little overhead there.  In
fact, there is a PC compressor (PKZIP, I think) which can
produce compressed files such that, to recover the original
file, you just run the compressed file!  This
machine-independent machine language would be completely
general, perhaps with some special operations such as "return
the i'th most frequently occurring English word", according to
some standard table.
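
To make this concrete, here is a minimal sketch (in C) of what
an interpreter for such an embedded decompressor language might
look like.  Everything below is invented for illustration: the
opcode names, the encoding, and the tiny instruction set are all
hypothetical, and a real design would want a richer,
Turing-complete language.  Even so, LIT and MATCH are enough to
express LZ77-style output, and run-length encoding falls out of
overlapping matches.

    /* chameleon.c -- hypothetical sketch of a "decompressor in
     * the header" interpreter.  The opcodes and encoding are
     * made up here; nothing below comes from PKZIP or any
     * existing format.
     */
    #include <stdio.h>
    #include <string.h>

    enum {
        OP_END   = 0,  /* end of program                        */
        OP_LIT   = 1,  /* LIT n b1..bn : emit n literal bytes   */
        OP_MATCH = 2   /* MATCH n d : copy n bytes from d back  */
    };

    /* Run the embedded program and write the reconstructed file
     * into out[] (capacity outcap).  Returns bytes produced, or
     * -1 on a malformed program.
     */
    long chameleon_run(const unsigned char *prog, size_t proglen,
                       unsigned char *out, size_t outcap)
    {
        size_t ip = 0, op = 0;

        while (ip < proglen) {
            switch (prog[ip++]) {
            case OP_END:
                return (long)op;
            case OP_LIT: {
                size_t n;
                if (ip >= proglen) return -1;
                n = prog[ip++];
                if (ip + n > proglen || op + n > outcap) return -1;
                memcpy(out + op, prog + ip, n);
                ip += n;  op += n;
                break;
            }
            case OP_MATCH: {
                size_t n, d;
                if (ip + 1 >= proglen) return -1;
                n = prog[ip++];
                d = prog[ip++];
                if (d == 0 || d > op || op + n > outcap) return -1;
                /* byte-at-a-time so overlapping copies work */
                while (n--) { out[op] = out[op - d]; op++; }
                break;
            }
            default:
                return -1;  /* unknown opcode */
            }
        }
        return -1;  /* ran off the end without OP_END */
    }

The "compiled on the fly" variant of the idea would translate
such a program into native code once and then run it, instead of
dispatching on opcodes in a loop as this sketch does.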

The advantage of this generality is that there would be a
single compressed-file format, regardless of the technique used
for the actual compression.  So whether the
actual algorithm is LZ or arithmetic or fractal or xyz (which
will be invented in late 1998), there is still ONE decompression
program you run to recover the original file, whether it's audio
or text or graphics or numbers from your physics experiment.  An
important large static file at some distribution site might even
be compressed by hand, using a compression toolkit to analyze
the peculiar regularities of that file and tailor a compression
scheme to it.

Still, as far as the user is concerned it's all in the same format.
You can call this format "Chameleon".  :-)
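
To illustrate the one-format-many-algorithms point, here are two
hypothetical programs for the sketch above: one that encodes a
run the way an RLE compressor might, and one that is pure
literals.  Both decompress through the very same interpreter;
only the embedded program differs.

    /* Continues the sketch above.  An RLE-flavored program and
     * a literal-only program, decoded by the same
     * chameleon_run().
     */
    static const unsigned char rle_prog[] = { /* "aaaaaaaaab" */
        OP_LIT, 1, 'a',
        OP_MATCH, 8, 1,   /* copy 'a' eight more times (overlap) */
        OP_LIT, 1, 'b',
        OP_END
    };
    static const unsigned char lit_prog[] = { /* "hello" */
        OP_LIT, 5, 'h', 'e', 'l', 'l', 'o',
        OP_END
    };

    int main(void)
    {
        unsigned char buf[64];
        long n;

        n = chameleon_run(rle_prog, sizeof rle_prog,
                          buf, sizeof buf);
        if (n >= 0) { fwrite(buf, 1, (size_t)n, stdout);
                      putchar('\n'); }

        n = chameleon_run(lit_prog, sizeof lit_prog,
                          buf, sizeof buf);
        if (n >= 0) { fwrite(buf, 1, (size_t)n, stdout);
                      putchar('\n'); }

        return 0;
    }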

So, what's wrong with this idea?  Comments, criticism,
countersuggestions, improvements all welcome.



                                             jel @ (37 54 N / 122 18 W)
                                             jel@sutro.sfsu.edu
                                             ... uunet!sun!hoptoad!wet!jel