jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) (05/22/91)
I just finished a quick bunch of experiments on splay-tree based prefix codes (for more information on splay-tree based codes, see my paper in the Aug. 1988 issue of CACM). A splay-tree can be viewed as a self-adjusting tree with the heuristic "whenever a leaf if referenced, move it half-way to the root". When this is used to manage the trie in a prefix code this means reducing the length of the code for each letter by one-half each time that letter is used. One question this raises is, why half? Why not 3/4 or 1/10? I used variations on my splay-tree based compression program, with a single-state source model, and applied it to about 1/2 megabyte of troff text for these experiments. Results: N = fraction by which code length is reduced when a letter is used R = compression ratio (plain/compressed) N R Notes 0.75 1.29 -- done by splaying as in N=0.5 twice after each letter. 0.44 1.35 -- done by splaying as in N=0.33 twice after each letter. 0.50 1.47 -- the original. 0.33 1.44 -- splay, modified to skip one node between rotations. 0.25 1.40 -- splay, modified to skip two nodes between rotations. I had speculated that the problem with splay-trees was that they adapted too quickly, and that they ought to work better if they could be made to adapt more slowly. This data shows that I was wrong, and that the original Sleator-Tarjan splaying heuristic is, if not optimal, close enough. The slope of the tail of the curve is shallow enough to suggest that for some data, slower adaptation may help, but I doubt it will help enough to be worth my effort to look for it. Doug Jones jones@herky.cs.uiowa.edu
jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) (05/22/91)
From article <6157@ns-mx.uiowa.edu>, by jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879): | | Results: | | N = fraction by which code length is reduced when a letter is used | R = compression ratio (plain/compressed) | | N R Notes | 0.75 1.29 -- done by splaying as in N=0.5 twice after each letter. 0.56 1.35 -- done by splaying as in N=0.33 twice after each letter. | 0.50 1.47 -- the original. | 0.33 1.44 -- splay, modified to skip one node between rotations. | 0.25 1.40 -- splay, modified to skip two nodes between rotations. | With red face, I note that I made a mistake in the table in my previous post! The correction is given above.