[comp.compression] Help with selecting compression scheme

sichermn@beach.csulb.edu (Jeff Sicherman) (06/23/91)

  I would appreciate assistance in selecting the most appropriate
compression algorithm for the following application:

  Handwritten and typed documents will be scanned on a 200 DPI scanner.
Because the content is all text, there is no need for high resolution
(we may even reduce it to 150, 75, ... if the image remains readable
on screen). The documents are to be stored on disk and called up by
a program instead of being pulled from a file cabinet. This leads
to the following requirements:

1.  High compression ratios to minimize disk space consumption. There
    may eventually be 20,000 images of approximately 8" x 10" each,
    though some pages are not full and might be cropped by manual editing.

2.  Because the images will be scanned once and stored, compression
    time is not much of an issue. We don't have a supercomputer, though,
    so we would like it to take at most a few minutes per image.

3.  Since retrieval times are of some importance, decompression time
    should be reasonable. A few seconds, perhaps < 10 on a 286 chip,
    would be nice, though some problems could be mitigated by having
    the decompression and display proceed in parallel rather than
    decompressing to a file or memory first and then displaying it
    (this would also save memory and eliminate disk I/O as a slowing
    factor).

4.  We would like code available in the public domain, or a library
    that is not excessively expensive. A published algorithm is OK if it
    would not be terribly difficult to implement.
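
To put requirement 1 in numbers, here is a rough back-of-the-envelope
sketch of the uncompressed size. It assumes bilevel scans (1 bit per
pixel, rows padded to whole bytes), which the description above does not
state; the function name raw_bilevel_bytes is made up for illustration.

```python
def raw_bilevel_bytes(width_in, height_in, dpi):
    """Uncompressed size in bytes of a 1-bit-per-pixel scan.

    Assumption: each scan line is padded up to a whole number of bytes.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px + 7) // 8
    return bytes_per_row * height_px


IMAGE_COUNT = 20_000  # figure taken from requirement 1

for dpi in (200, 150, 75):
    per_image = raw_bilevel_bytes(8, 10, dpi)
    total = per_image * IMAGE_COUNT
    print(f"{dpi:3d} DPI: {per_image:,} bytes/image, "
          f"{total / 1e9:.3f} GB for {IMAGE_COUNT:,} images")
```

Under those assumptions a full 8" x 10" page at 200 DPI is 400,000 bytes
raw, so 20,000 pages come to about 8 GB before any compression; halving
the resolution cuts that by roughly a factor of four.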

    I wonder whether facsimile compression, which would exploit the
sameness of successive lines (text characters), would be appropriate.
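
As a rough illustration of that idea, the sketch below XORs each scan
line with the one above it and then run-length encodes the result, so
rows that repeat the previous row collapse to a single run. This is a
simplified stand-in, not actual CCITT Group 3/4 fax coding (which uses
Huffman-coded run lengths and, in its 2D modes, codes each line relative
to the line above). All function names here are hypothetical.

```python
def run_lengths(bits):
    """Run lengths of a row of 0/1 pixels.

    The first run always counts 0s (white), so it may be zero-length
    if the row starts with a 1.
    """
    runs = []
    current, count = 0, 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append(count)
            current, count = b, 1
    runs.append(count)
    return runs


def expand_runs(runs):
    """Inverse of run_lengths: alternate 0-runs and 1-runs."""
    bits, value = [], 0
    for n in runs:
        bits.extend([value] * n)
        value ^= 1
    return bits


def encode(rows):
    """XOR each row with the previous one, then run-length encode."""
    prev = [0] * len(rows[0])  # imaginary all-white line above the page
    encoded = []
    for row in rows:
        diff = [a ^ b for a, b in zip(row, prev)]
        encoded.append(run_lengths(diff))
        prev = row
    return encoded


def decode(encoded):
    """Rebuild the rows from the per-row run lists produced by encode."""
    prev = [0] * sum(encoded[0])
    rows = []
    for runs in encoded:
        diff = expand_runs(runs)
        row = [a ^ b for a, b in zip(diff, prev)]
        rows.append(row)
        prev = row
    return rows


# Three identical scan lines, e.g. part of a vertical character stroke.
page = [[0, 0, 1, 1, 1, 0, 0, 0]] * 3
packed = encode(page)
assert decode(packed) == page
```

Because decoding works one row at a time and needs only the previous
row, a decoder along these lines could hand each line to the display as
soon as it is rebuilt, which fits the parallel decompress-and-display
idea in requirement 3.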

    Thanks for any assistance.

Jeff Sicherman

brad@looking.on.ca (Brad Templeton) (06/23/91)

Clearly the best form of "compression" on such items is OCR.  A lot of
the commercial OCR out there is crap, but I hear that some of the very
best -- and most expensive -- actually does the job, although probably not
on handwriting.

To my mind, if I were working on one of those paperless office deals, I
would want OCR, not just because it would save tremendous amounts of
space, but because I don't want just a file of images; I would want to
be able to search the files too.

Many people are pushing paperless office image storage systems right now,
but I expect the first one to use decent OCR (with image storage where OCR
is impossible) will wipe all the others out.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473