[comp.compression] A negative result on splay trees

jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) (05/22/91)

I just finished a quick bunch of experiments on splay-tree based prefix
codes (for more information on splay-tree based codes, see my paper in the
Aug. 1988 issue of CACM).

A splay-tree can be viewed as a self-adjusting tree with the heuristic
"whenever a leaf if referenced, move it half-way to the root".  When this
is used to manage the trie in a prefix code this means reducing the length
of the code for each letter by one-half each time that letter is used.
One question this raises is, why half?  Why not 3/4 or 1/10?

I used variations on my splay-tree based compression program, with a
single-state source model, and applied it to about 1/2 megabyte of troff
text for these experiments.

Results:

N = fraction by which code length is reduced when a letter is used
R = compression ratio (plain/compressed)

   N       R      Notes
 0.75    1.29      -- done by splaying as in N=0.5 twice after each letter.
 0.44    1.35      -- done by splaying as in N=0.33 twice after each letter.
 0.50    1.47      -- the original.
 0.33    1.44      -- splay, modified to skip one node between rotations.
 0.25    1.40      -- splay, modified to skip two nodes between rotations.

I had speculated that the problem with splay-trees was that they adapted
too quickly, and that they ought to work better if they could be made to
adapt more slowly.  This data shows that I was wrong, and that the original
Sleator-Tarjan splaying heuristic is, if not optimal, close enough.  The
slope of the tail of the curve is shallow enough to suggest that for some
data, slower adaptation may help, but I doubt it will help enough to be
worth my effort to look for it.
					Doug Jones
					jones@herky.cs.uiowa.edu

jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) (05/22/91)

From article <6157@ns-mx.uiowa.edu>, by jones@pyrite.cs.uiowa.edu
(Douglas W. Jones,201H MLH,3193350740,3193382879):
|
| Results:
|
| N = fraction by which code length is reduced when a letter is used
| R = compression ratio (plain/compressed)
|
|    N       R      Notes
|  0.75    1.29      -- done by splaying as in N=0.5 twice after each letter.
   0.56    1.35      -- done by splaying as in N=0.33 twice after each letter.
|  0.50    1.47      -- the original.
|  0.33    1.44      -- splay, modified to skip one node between rotations.
|  0.25    1.40      -- splay, modified to skip two nodes between rotations.
|
With red face, I note that I made a mistake in the table in my previous
post!  The correction is given above.