[comp.compression] compress text once, decompress many

rogier@prl.philips.nl (04/12/91)

I am looking for a compression scheme which compresses static text,
compress once decompress many.

One way of doing it is prefix omission, like changing text into pointers
to a dictionary which is stored as in the following example:

text entry     prefix length       stored suffix
form                 0                 form
formally             4                 ally
format               4                 t

The suffices can be compressed by Huffman coding.

Ref: "Compression of Concordances in Full-Text Retrieval Systems"
     Y. Choueka et.al., ACM SIGIR, 11-th conf. on research & development
     in Information Retrieval, June 1988, Grenoble-France.


Another way of doing it is finding the number of occurrences of all
possible sub-strings in the text. Then have a good heuristic to pick
the sub-strings you are going to put in a dictionary.

Questions:
Does anyone know good heuristics for this,
does anyone know other solutions or references?


--------------------------------------------------
Rogier Wester
Philips Research Laboratories, The Netherlands.
e-mail: rogier@prle.prl.philips.nl

jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) (04/15/91)

From article <2700@prles2.prl.philips.nl>, by rogier@prl.philips.nl:
> 
> I am looking for a compression scheme which compresses static text,
> compress once decompress many.
> 

You want to look at "Models for compression in full-text retrieval systems"
by Witten, Bell and Nevill, presented at DCC91 a week ago (pages 23-32 of
the conference proceedings, IEEE Computer Society Press Number 2202,
ISBN 0-8186-9202-2).
				Doug Jones
				jones@cs.uiowa.edu