[comp.compression] In Search Of a key compression algorithm!

jok@niksula.hut.fi (Jouni Kosonen) (05/31/91)

I found this article in alt.comp.compression, and thought that this might be
a better place. So, here goes:

From: jerryw@tove.cs.umd.edu (Jerry Wieber)
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742

I am looking for an algorithm that can compress keys without destroying
lexicographic ordering.  A reply to the effect of "this is impossible" is
equally helpful, of course.  Any compression, no matter how small, may
be useful.  To wit, I have a very large number of variable length keys
from 1 to 64 bytes in length, and I have got to get the size down for
sorting....


All replies gratefully appreciated!
-Jerry
--
                                                            __________ 
UUCP:     uunet!cs.umd.edu!jerryw   SPOKEN: Jerry Wieber   |/ `-.    |  U of Md
INTERNET: jerryw@cs.umd.edu              "Disclaimer"            \_|.|-,  

jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) (06/03/91)

> From: jerryw@tove.cs.umd.edu (Jerry Wieber)
> Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
> 
> I am looking for an algorithm that can compress keys without destroying
> lexicographic ordering.

I'm at home and can't check my references, but the compression technique
that you might want is the Garsia-Wachs algorithm.  I have a reprint of
a paper by Jeff Kingston that discusses this algorithm.

In sum, the idea is to use a prefix code, as with Huffman codes, but
to balance the encoding tree without reordering the leaves.  Such an
order-preserving (alphabetic) code costs at most one bit per character
more than a Huffman code, which in turn costs at most one bit per
character more than the source entropy under the source model used
for compression.
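[A sketch of the idea, added in editing: this is not the Garsia-Wachs
algorithm itself (which produces an optimal alphabetic code), but a
simpler illustration of the same principle -- build a prefix code by
recursively splitting the sorted alphabet at a weight-balanced point,
never reordering the symbols.  Because the code is prefix-free and
alphabetic, bitstring comparison of the encodings preserves the
lexicographic order of the original keys.  All names here are
hypothetical.]

```python
def alphabetic_code(symbols, weights):
    """Order-preserving prefix code for symbols given in sorted order.

    Not Garsia-Wachs: a simple weight-balanced recursive split,
    which preserves symbol order but is not guaranteed optimal.
    Returns a dict mapping each symbol to its bitstring codeword.
    """
    code = {}

    def build(lo, hi, prefix):
        if hi - lo == 1:
            code[symbols[lo]] = prefix or "0"
            return
        total = sum(weights[lo:hi])
        # Choose the split that best balances weight between the halves;
        # left half gets '0', right half gets '1', so leaf order is kept.
        best_diff, split, acc = None, lo + 1, 0
        for i in range(lo + 1, hi):
            acc += weights[i - 1]
            diff = abs(total - 2 * acc)
            if best_diff is None or diff < best_diff:
                best_diff, split = diff, i
        build(lo, split, prefix + "0")
        build(split, hi, prefix + "1")

    build(0, len(symbols), "")
    return code


def encode(key, code):
    """Concatenate codewords; comparing results as strings of '0'/'1'
    gives the same order as comparing the original keys."""
    return "".join(code[c] for c in key)
```

For example, with symbols a,b,c,d and weights 5,1,1,5 the split falls
between b and c, giving the codewords 00, 01, 10, 11; encoding any set
of keys over that alphabet and sorting the bitstrings reproduces the
original key order.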
					Doug Jones
					jones@cs.uiowa.edu