jaw@ames.UUCP (04/24/86)
# "Don't compress that dwarf -- hand me the pliers!" -- Watersign Theatre (or, everything you always wanted to know about LZ algorithms but were ...) The Lempel/Ziv 2-D compression work certainly merits respect for its mathematical beauty. This in addition to some "meta-respect" the good professors have received from the USENET community via the working LZ code which has processed this very message. I suspect the academic point Turkowski raises about difference coding hinges upon the key words "finite-state". Coding the "repeat count" Ken describes requires an arbitrary amount of memory for arbitrary-length input, and thus is more suitable for a more powerful automaton. (Naturally this is handwaving, but the interested reader is referred to thorough treatment in Minsky's classic "Computation: Finite and Infinite Machines, Prentice-Hall, 1967). It's rather like the impossibility of multiplication on an FSA. (see Minsky, section 2.6.) But don't let this stop you from implementing a difference (or run-length) compressor -- the enormous number of states in a typical computer are adequate to process *bounded-size* input such as a bitmapped screen. Mathematical restrictions used for proofs in info. theory journals are not reflective of what is obtainable with case-based ad hoc-ery in the real world. So, sure, difference coding is recommended for monotonic sequences, but then, its non-universality shows up as failure on text. To address another question posed by Ken [say, maybe it's time to restart the defunct compress mailing list], the LZ method used by 'compress' differs slightly from the LZ original, in that explicit data pointers are not output. An enlightening comparison of LZ variants appears in an article by Miller/Wegman (of IBM) in the NATO symposium "Combinatorial Algorithms on Words" (ed. Apostolico, Springer-Verlag, 1986.) In particular, the "string extension" variant yields somewhat better (a few percent) compression than does the august 4.3 BSD utility. However, the data structures chosen (trie forests, LRU replacement stacks) seemingly require a factor-of-several slowdown in execution time. This would also apply to the newer splay-tree methods of Tarjan/Bentley exhibited in a recent CACM article. (Too bad they shied away from LZ comparisons -- they likely lose to LZ in space as well as time.) Believe me, the speedups I did to get compress into its present state would have much less impact if the combined compress/uncompress CPU time were more than that of the less efficient pack/unpack duo. Now a bit about LZ2D pragmatics: I programmed a simple Peano-Hilbert curve image scanner shortly after looking at the paper preprint last year, thinking that such a filter (in recursive C, though there are other ways (*) ) would be perfect for piping into 'compress'. Alas, after compressing a 512x512 gray-scale image or two (from the "spaceman" series), I lost interest after discovering file *expansion* (1d gave me about 30% for the image, and 2d only about 20%). It's conceivable I did something wrong, but it could be that larger, or less "grainy" pictures are necessary for an LZ2D compressor to shine in practice. This would be ironic, given that the 1D LZ algorithm approaches its asymptotic optimum much sooner than does the Huffman method. How LZ2D might deal with global structure is not yet clear to me, but the curious might contact Mike Lee at !hplabs who worked with Lempel during his sabbatical from Technion in Israel. Since HP is working on proprietary compression hardware (disk controllers and whatnot), you (or I) might not find out as much as desired. -- ames!jaw@ames-aurora (*) e.g., see "A New Algorithm for Generating Hilbert Curves" in the 1/86 Software Practice and Experience, a one-up of the cleverly subtitled "Space-filling Curves, or how to Waste Time with a Plotter", which appeared in the same forum 15 years earlier.
ken@turtlevax.UUCP (04/25/86)
In article <1490@ames.UUCP> jaw@ames.UUCP (James A. Woods) writes: >It's rather like the impossibility of multiplication on an FSA. >(see Minsky, section 2.6.) It's results like the above that leave me skeptical of many information theory results. I personally know of several automatons that can do multiplication on two 16x16 numbers :-) and I'm sure others do too. I have only found one issue (the March 1982 issue on quantization: get your hands on it and xerox(tm) the whole thing) that I have found to be useful in the IEEE transactions on IT. > Now a bit about LZ2D pragmatics: I programmed a simple Peano-Hilbert curve >image scanner ... Alas, after compressing a 512x512 >gray-scale image or two (from the "spaceman" series), I lost interest after >discovering file *expansion* (1d gave me about 30% for the image, and >2d only about 20%). Skeptic though I am, I really believed that the Peano-Hilbert scan could compress better than the "video scan". I'd like to believe that James missed something (an event of measure zero), but I've been held up in my own implementation because of a practical concern: The LZ2D was designed to work only with square images with binary power dimensions. Usually, I like to save images with arbitrary rectangular dimensions, perhaps with an associated alpha mask to get non-rectangular silhouettes. What is the best way to extend LZ2D to rectangular images? Find the next larger square and pad it with zeros? Find the next smaller square and break up the remainder into smaller and smaller squares that can be P-H-scanned? -- Ken Turkowski @ CIMLINC, Menlo Park, CA UUCP: {amd,decwrl,hplabs,seismo}!turtlevax!ken ARPA: turtlevax!ken@DECWRL.DEC.COM