[net.math] Lempel and Ziv do it again

jaw@ames.UUCP (04/24/86)

# "Don't compress that dwarf -- hand me the pliers!" -- Watersign Theatre
  (or, everything you always wanted to know about LZ algorithms but were ...)
 
     The Lempel/Ziv 2-D compression work certainly merits respect for
its mathematical beauty.  This in addition to some "meta-respect" the good
professors have received from the USENET community via the working LZ code
which has processed this very message.

     I suspect the academic point Turkowski raises about difference coding
hinges upon the key words "finite-state".  Coding the "repeat count" Ken
describes requires an arbitrary amount of memory for arbitrary-length input,
and thus is more suitable for a more powerful automaton.  (Naturally this is
handwaving, but the interested reader is referred to thorough treatment in
Minsky's classic "Computation: Finite and Infinite Machines, Prentice-Hall,
1967).  It's rather like the impossibility of multiplication on an FSA. 
(see Minsky, section 2.6.)  But don't let this stop you from implementing
a difference (or run-length) compressor -- the enormous number of states
in a typical computer are adequate to process *bounded-size* input such as
a bitmapped screen.  Mathematical restrictions used for proofs in info.
theory journals are not reflective of what is obtainable with case-based
ad hoc-ery in the real world.  So, sure, difference coding is recommended
for monotonic sequences, but then, its non-universality shows up as failure
on text.

     To address another question posed by Ken [say, maybe it's time to
restart the defunct compress mailing list], the LZ method used by 'compress'
differs slightly from the LZ original, in that explicit data pointers are not
output.  An enlightening comparison of LZ variants appears in an article by
Miller/Wegman (of IBM) in the NATO symposium "Combinatorial Algorithms on Words"
(ed. Apostolico, Springer-Verlag, 1986.)  In particular, the "string extension"
variant yields somewhat better (a few percent) compression than does the
august 4.3 BSD utility.  However, the data structures chosen (trie forests,
LRU replacement stacks) seemingly require a factor-of-several slowdown
in execution time.  This would also apply to the newer splay-tree methods
of Tarjan/Bentley exhibited in a recent CACM article.  (Too bad they shied away
from LZ comparisons -- they likely lose to LZ in space as well as time.)
Believe me, the speedups I did to get compress into its present state would
have much less impact if the combined compress/uncompress CPU time were 
more than that of the less efficient pack/unpack duo.

     Now a bit about LZ2D pragmatics:  I programmed a simple Peano-Hilbert curve
image scanner shortly after looking at the paper preprint last year, thinking
that such a filter (in recursive C, though there are other ways (*) ) would be
perfect for piping into 'compress'.  Alas, after compressing a 512x512
gray-scale image or two (from the "spaceman" series), I lost interest after
discovering file *expansion* (1d gave me about 30% for the image, and
2d only about 20%).  It's conceivable I did something wrong, but it could
be that larger, or less "grainy" pictures are necessary for an LZ2D compressor
to shine in practice.  This would be ironic, given that the 1D LZ algorithm
approaches its asymptotic optimum much sooner than does the Huffman method.

     How LZ2D might deal with global structure is not yet clear to me, but
the curious might contact Mike Lee at !hplabs who worked with Lempel
during his sabbatical from Technion in Israel.  Since HP is working on
proprietary compression hardware (disk controllers and whatnot), you (or I)
might not find out as much as desired.

     -- ames!jaw@ames-aurora

(*)  e.g., see "A New Algorithm for Generating Hilbert Curves" in the 1/86
Software Practice and Experience, a one-up of the cleverly subtitled
"Space-filling Curves, or how to Waste Time with a Plotter", which appeared
in the same forum 15 years earlier.

ken@turtlevax.UUCP (04/25/86)

In article <1490@ames.UUCP> jaw@ames.UUCP (James A. Woods) writes:
>It's rather like the impossibility of multiplication on an FSA. 
>(see Minsky, section 2.6.)

It's results like the above that leave me skeptical of many information
theory results.  I personally know of several automatons that can do
multiplication on two 16x16 numbers :-) and I'm sure others do too.  I
have only found one issue (the March 1982 issue on quantization:  get
your hands on it and xerox(tm) the whole thing) that I have found to be
useful in the IEEE transactions on IT.

>  Now a bit about LZ2D pragmatics:  I programmed a simple Peano-Hilbert curve
>image scanner ...                         Alas, after compressing a 512x512
>gray-scale image or two (from the "spaceman" series), I lost interest after
>discovering file *expansion* (1d gave me about 30% for the image, and
>2d only about 20%).

Skeptic though I am, I really believed that the Peano-Hilbert scan
could compress better than the "video scan".  I'd like to believe that
James missed something (an event of measure zero), but I've been held
up in my own implementation because of a practical concern:

The LZ2D was designed to work only with square images with binary power
dimensions.  Usually, I like to save images with arbitrary rectangular
dimensions, perhaps with an associated alpha mask to get
non-rectangular silhouettes.  What is the best way to extend LZ2D to
rectangular images?  Find the next larger square and pad it with
zeros?  Find the next smaller square and break up the remainder into
smaller and smaller squares that can be P-H-scanned?
-- 
Ken Turkowski @ CIMLINC, Menlo Park, CA
UUCP: {amd,decwrl,hplabs,seismo}!turtlevax!ken
ARPA: turtlevax!ken@DECWRL.DEC.COM