[comp.sys.ibm.pc] ansi data compression in turbo pascal

ppd491@leah.Albany.Edu (Peter P. Donohue) (11/20/89)

   I am writing a program in turbo pascal 5.5 on a PC running MS-Dos
that does a reasonable amount of reading and writing of ascii data.  I
was wondering if anyone could recommend a good alogarithm for
compressing ascii data so that the data files take up less disk space?
It doesn't necessarily have to be the best or most efficient; I'm
willing to give up some for programming ease.  Thanks.
   
					Pete
-- 
Peter P. Donohue 
ppd491@albny1vx.bitnet               .  "Education is a journey,
ppd491@leah.albany.edu               .    not a destination..."

leonard@bucket.UUCP (Leonard Erickson) (11/24/89)

ppd491@leah.Albany.Edu (Peter P. Donohue) writes:


>   I am writing a program in turbo pascal 5.5 on a PC running MS-Dos
>that does a reasonable amount of reading and writing of ascii data.  I
>was wondering if anyone could recommend a good alogarithm for
>compressing ascii data so that the data files take up less disk space?
>It doesn't necessarily have to be the best or most efficient; I'm
>willing to give up some for programming ease.  Thanks.

This probably won't help much in your situation, but if you are doing a lot
of work reading *sorted* strings, you can get quite an improvement by using
the following trick:

Save each string on a sperate "line" (ie use cr or cr/lf as a record
seperator). Now "encode" each string as <count><substring> where count
is the number of characters it has in common with the previous record,
and substring is the portion of the string that differs.

Example: if the previos record was "previous" and the string for the current
record is "previously", store the current record as <8><ly>.

To avoid problems you'll need to do something to keep the count from
"colliding" with the seperator, such as adding 32 or 128 to it. This would
the record into:
@ly
(assuming +32 coding)

The above trick is very useful for things like spell checkers where the
data is sorted. And the compression ratio *improves* as the list gets longer!

-- 
Leonard Erickson		...!tektronix!reed!percival!bucket!leonard
CIS: [70465,203]
"I'm all in favor of keeping dangerous weapons out of the hands of fools.
Let's start with typewriters." -- Solomon Short