ppd491@leah.Albany.Edu (Peter P. Donohue) (11/20/89)
I am writing a program in turbo pascal 5.5 on a PC running MS-Dos that does a reasonable amount of reading and writing of ascii data. I was wondering if anyone could recommend a good alogarithm for compressing ascii data so that the data files take up less disk space? It doesn't necessarily have to be the best or most efficient; I'm willing to give up some for programming ease. Thanks. Pete -- Peter P. Donohue ppd491@albny1vx.bitnet . "Education is a journey, ppd491@leah.albany.edu . not a destination..."
leonard@bucket.UUCP (Leonard Erickson) (11/24/89)
ppd491@leah.Albany.Edu (Peter P. Donohue) writes: > I am writing a program in turbo pascal 5.5 on a PC running MS-Dos >that does a reasonable amount of reading and writing of ascii data. I >was wondering if anyone could recommend a good alogarithm for >compressing ascii data so that the data files take up less disk space? >It doesn't necessarily have to be the best or most efficient; I'm >willing to give up some for programming ease. Thanks. This probably won't help much in your situation, but if you are doing a lot of work reading *sorted* strings, you can get quite an improvement by using the following trick: Save each string on a sperate "line" (ie use cr or cr/lf as a record seperator). Now "encode" each string as <count><substring> where count is the number of characters it has in common with the previous record, and substring is the portion of the string that differs. Example: if the previos record was "previous" and the string for the current record is "previously", store the current record as <8><ly>. To avoid problems you'll need to do something to keep the count from "colliding" with the seperator, such as adding 32 or 128 to it. This would the record into: @ly (assuming +32 coding) The above trick is very useful for things like spell checkers where the data is sorted. And the compression ratio *improves* as the list gets longer! -- Leonard Erickson ...!tektronix!reed!percival!bucket!leonard CIS: [70465,203] "I'm all in favor of keeping dangerous weapons out of the hands of fools. Let's start with typewriters." -- Solomon Short