rogier@prl.philips.nl (04/12/91)
I am looking for a compression scheme which compresses static text, compress once decompress many. One way of doing it is prefix omission, like changing text into pointers to a dictionary which is stored as in the following example: text entry prefix length stored suffix form 0 form formally 4 ally format 4 t The suffices can be compressed by Huffman coding. Ref: "Compression of Concordances in Full-Text Retrieval Systems" Y. Choueka et.al., ACM SIGIR, 11-th conf. on research & development in Information Retrieval, June 1988, Grenoble-France. Another way of doing it is finding the number of occurrences of all possible sub-strings in the text. Then have a good heuristic to pick the sub-strings you are going to put in a dictionary. Questions: Does anyone know good heuristics for this, does anyone know other solutions or references? -------------------------------------------------- Rogier Wester Philips Research Laboratories, The Netherlands. e-mail: rogier@prle.prl.philips.nl
jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) (04/15/91)
From article <2700@prles2.prl.philips.nl>, by rogier@prl.philips.nl: > > I am looking for a compression scheme which compresses static text, > compress once decompress many. > You want to look at "Models for compression in full-text retrieval systems" by Witten, Bell and Nevill, presented at DCC91 a week ago (pages 23-32 of the conference proceedings, IEEE Computer Society Press Number 2202, ISBN 0-8186-9202-2). Doug Jones jones@cs.uiowa.edu