alcmist@well.UUCP (Frederick Wamsley) (09/08/89)
I'm looking for text compression code which will be used to compress
blocks of text 1K-30K in size.  It should be able to compress/decompress
a 30K block in somewhere around 10 seconds on an AT-class machine.

The text will be source for a programming language, so there will be a
lot of common strings to take advantage of.  Compression should be better
than Huffman coding, since that's what we're using now.  Naturally it
will have to allow for 64K segments.

All pointers to something appropriate will be gratefully received ...
--
Fred Wamsley   {ucbvax,pacbell,apple,hplabs}!well!alcmist;  CIS 72247,3130;
GEnie FKWAMSLEY;  USPS - why bother?
Have you hugged your iguana today?
rmyers@net1.ucsd.edu (Robert Myers) (09/09/89)
In article <13511@well.UUCP> alcmist@well.UUCP (Frederick Wamsley) writes:
>I'm looking for text compression code which will be used to compress blocks
> ...

Contact Bookmaster Corp. in Telluride, CO.  They have text compression
utilities that were used in the past for legal software.  As I recall,
they have a generic cruncher that will compress, index, and create a
dictionary on text files with the performance you require (30K in 10sec
on AT).

I'm not sure of the price, but here's the address:

	Bookmaster Corp.
	Box 2396
	Telluride, CO  81435
	(303) 728-6412

T.K. Plummer
dmt@mtunb.ATT.COM (Dave Tutelman) (09/11/89)
In article <1958@network.ucsd.edu> rmyers@net1.UUCP (Robert Myers) writes:
>In article <13511@well.UUCP> alcmist@well.UUCP (Frederick Wamsley) writes:
>>I'm looking for text compression code which will be used to compress blocks
>> ...
>Contact Bookmaster Corp. in Telluride, CO...
>As I recall, they have a generic cruncher that will compress, index,
>and create a dictionary on text files with the performance you
>require (30K in 10sec on AT).

I may be missing something, but what's the difference between this and
the "archivers" that we all use like PKZIP, zoo, ARC (careful..) etc?
	- Function (sounds very similar)?
	- Speed?
	- Compression ratio?

I missed (and couldn't find) the base note, so maybe I've missed the key
that would enlighten me.  However, the Bookmaster reference doesn't have
anything in it that gives me a clue to the difference.

+----------------------------------------------+
|   Dave Tutelman                              |
|   Physical - AT&T Bell Labs - Middletown, NJ |
|   Logical -  ...att!mtunb!dmt                |
|   Audible -  (201) 957 6583                  |
+----------------------------------------------+
rmyers@net1.ucsd.edu (Robert Myers) (09/11/89)
In article <1656@mtunb.ATT.COM> dmt@mtunb.UUCP (Dave Tutelman) writes:
>In article <1958@network.ucsd.edu> rmyers@net1.UUCP (Robert Myers) writes:
>>In article <13511@well.UUCP> alcmist@well.UUCP (Frederick Wamsley) writes:
>>>I'm looking for text compression code which will be used to compress blocks
>>> ...
>>Contact Bookmaster Corp. in Telluride, CO...
>>As I recall, they have a generic cruncher that will compress, index,
>>and create a dictionary on text files with the performance you
>>require (30K in 10sec on AT).
>
>I may be missing something, but what's the difference between this
>and the "archivers" that we all use like PKZIP, zoo, ARC (careful..)
>etc?
>	- Function (sounds very similar)?
>	- Speed?
>	- Compression ratio?

Sorry for the lack of more specific information in the original posting.
Essentially, Bookmaster's program (not sure of the name) will take a text
file and *compress* it.  This process creates two files: one holds a word
dictionary and index, and the other is the original text encoded as one-
and two-byte tokens that correspond to entries in the dictionary.
Compressing a small (30K) text file takes about 10 seconds or so on an AT.

FUNCTION:  The function of this program is NOT archival.  Consider that it
makes two files out of one, not one file out of many.  Its function is
1) text compression, 2) ultra-fast decompression of text, and 3) high-speed
searching.  As I said before, this was part of a program for searching
legal depositions.

SPEED:  The initial compression is what takes the longest.  To decompress
the text, the dictionary file is loaded into RAM and the tokens are decoded
on the fly, so the words are decoded as they are read from the token file.
I'm not sure of any exact specifications on decompression speed, but I
believe it is in the area of a few thousand words per second.

COMPRESSION RATIO:  As always, this varies with the file.  Compression
ratios range from 50% (worst) to 20% (best) of the size of the original
file.  Normally (for a 50-page deposition!) the compressed text AND
dictionary (together - not separately) are one third (1/3) the size of
the original file.

I know they used this method to compress the Bible.  The total size of the
compressed text plus the dictionary is approx. 1.2 megs: compressed text =
1.03 megs, dictionary = 190K.  I believe the uncompressed text is around
4 to 6 megabytes.  This is considerably better than any of the archivers
I've seen, plus it can be decompressed very quickly.

Hope this has been of help.

T.K. Plummer
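To illustrate the kind of word-dictionary/token scheme described above, here
is a small sketch in C.  It is only an illustration of the general idea, not
Bookmaster's actual code or file format: the token layout (literal bytes
below 0x80, one-byte tokens 0x80-0xBF for the first 64 words, two-byte
tokens 0xC0-0xFF plus a low byte for the rest) is an assumption made for the
example, and a real program would write the dictionary and token stream to
two separate files and use a hash table rather than a linear search.

/* Minimal sketch of a word-dictionary/token compressor, assuming plain
 * ASCII input.  Assumed token layout (NOT Bookmaster's format):
 *   0x00-0x7F  literal byte (spaces, punctuation, newlines)
 *   0x80-0xBF  one-byte token for dictionary words 0..63
 *   0xC0-0xFF  first byte of a two-byte token; the second byte holds
 *              the low 8 bits of (index - 64)
 */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS (64 + 64 * 256)   /* capacity of one- and two-byte tokens */

static char *dict[MAX_WORDS];       /* the word dictionary, kept in RAM */
static int  ndict = 0;

/* Find a word in the dictionary, adding it if absent; return its index.
 * A real program would hash instead of scanning linearly. */
static int lookup(const char *w, int len)
{
    int i;
    for (i = 0; i < ndict; i++)
        if ((int)strlen(dict[i]) == len && memcmp(dict[i], w, len) == 0)
            return i;
    if (ndict == MAX_WORDS) {
        fprintf(stderr, "dictionary full\n");
        exit(1);
    }
    dict[ndict] = malloc(len + 1);
    memcpy(dict[ndict], w, len);
    dict[ndict][len] = '\0';
    return ndict++;
}

/* Encode text into the token stream; return the number of output bytes. */
static int encode(const char *text, unsigned char *out)
{
    int n = 0;
    while (*text) {
        if (isalnum((unsigned char)*text)) {
            int len = 0, idx;
            while (isalnum((unsigned char)text[len]))
                len++;
            idx = lookup(text, len);
            if (idx < 64) {                       /* one-byte token */
                out[n++] = (unsigned char)(0x80 + idx);
            } else {                              /* two-byte token */
                int j = idx - 64;
                out[n++] = (unsigned char)(0xC0 + (j >> 8));
                out[n++] = (unsigned char)(j & 0xFF);
            }
            text += len;
        } else {
            out[n++] = (unsigned char)*text++;    /* literal byte */
        }
    }
    return n;
}

/* Decode the token stream, looking each word up in the in-RAM dictionary. */
static void decode(const unsigned char *in, int n, FILE *fp)
{
    int i = 0;
    while (i < n) {
        unsigned char c = in[i++];
        if (c < 0x80)
            fputc(c, fp);
        else if (c < 0xC0)
            fputs(dict[c - 0x80], fp);
        else
            fputs(dict[64 + ((c - 0xC0) << 8) + in[i++]], fp);
    }
}

int main(void)
{
    const char *sample =
        "begin if count > 0 then count := count - 1; end if; end";
    unsigned char buf[1024];
    int n = encode(sample, buf);

    printf("original %d bytes, token stream %d bytes, %d dictionary words\n",
           (int)strlen(sample), n, ndict);
    decode(buf, n, stdout);
    putchar('\n');
    return 0;
}

On a string this small the token stream is barely shorter than the input;
the payoff comes on larger files, where each distinct word is stored once
in the dictionary but replaced by a one- or two-byte token everywhere it
recurs, and where the dictionary file can double as the index for the fast
searching described above.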