rogier@prl.philips.nl (04/12/91)
I am looking for a compression scheme which compresses static text,
compress once decompress many.
One way of doing it is prefix omission, like changing text into pointers
to a dictionary which is stored as in the following example:
text entry     prefix length       stored suffix
form                 0                 form
formally             4                 ally
format               4                 t
The suffices can be compressed by Huffman coding.
Ref: "Compression of Concordances in Full-Text Retrieval Systems"
     Y. Choueka et.al., ACM SIGIR, 11-th conf. on research & development
     in Information Retrieval, June 1988, Grenoble-France.
Another way of doing it is finding the number of occurrences of all
possible sub-strings in the text. Then have a good heuristic to pick
the sub-strings you are going to put in a dictionary.
Questions:
Does anyone know good heuristics for this,
does anyone know other solutions or references?
--------------------------------------------------
Rogier Wester
Philips Research Laboratories, The Netherlands.
e-mail: rogier@prle.prl.philips.nljones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) (04/15/91)
From article <2700@prles2.prl.philips.nl>, by rogier@prl.philips.nl: > > I am looking for a compression scheme which compresses static text, > compress once decompress many. > You want to look at "Models for compression in full-text retrieval systems" by Witten, Bell and Nevill, presented at DCC91 a week ago (pages 23-32 of the conference proceedings, IEEE Computer Society Press Number 2202, ISBN 0-8186-9202-2). Doug Jones jones@cs.uiowa.edu