sichermn@beach.csulb.edu (Jeff Sicherman) (06/23/91)
I would appreciate assistance in selecting the most appropriate compression algorithm for the following application:

Handwritten and typed documents will be scanned on a 200 DPI scanner. Because the content is all text, there is no need for high resolution (we may even reduce it to 150 DPI, 75 DPI, or lower if the image remains readable on-screen). The documents are to be stored on disk and called up by a program instead of having to pull them from a file cabinet. This leads to the following requirements:

1. High compression ratios to minimize disk space consumption. There may eventually be 20,000 images of approximately 8" x 10", though some are not full pages and might be cropped by manual editing.
2. Because the images will be scanned once and stored, compression time is not an issue. We don't have a supercomputer, though, and would like at most a few minutes per image, so don't get carried away.
3. Since retrieval times are of some importance, decompression time should be reasonable. A few seconds, perhaps < 10 on a 286, would be nice, though some problems could be mitigated by having decompression and display proceed in parallel rather than decompressing to a file/memory first and then displaying it (this would also save memory and eliminate disk I/O as a slowing factor).
4. Would like code available in the public domain, or a library which is not excessively expensive. A published algorithm is OK if it would not be terribly difficult to implement.

I wonder if facsimile compression, which would exploit the sameness of successive lines (text characters), would be very appropriate.

Thanks for any assistance.
Jeff Sicherman
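For scale: an 8" x 10" page at 200 DPI is 1600 x 2000 pixels, or 400,000 bytes at one bit per pixel, so 20,000 pages come to about 8 GB uncompressed. The "sameness of successive lines" idea can be illustrated with a toy sketch. This is NOT the CCITT Group 3/4 (T.4/T.6) coding itself, which codes run lengths with Modified Huffman tables and, in its two-dimensional modes, codes changing pixels relative to the previous (reference) line. Instead, this sketch XORs each scanline with the one above it, so a row identical to its predecessor becomes all zeros and collapses to a single run length, and then run-length codes the result. The function names (`encode_page`, `decode_rows`) are hypothetical. Decoding yields one row at a time, which fits the idea of displaying while decompressing.

```python
from typing import Iterator, List

Row = List[int]  # one scanline: 0 = white, 1 = black (all rows assumed the same width)


def runs(bits: Row) -> List[int]:
    """Alternating run lengths, always starting with a (possibly empty) run of 0s."""
    out: List[int] = []
    current, length = 0, 0
    for b in bits:
        if b == current:
            length += 1
        else:
            out.append(length)
            current, length = b, 1
    out.append(length)
    return out


def unruns(lengths: List[int]) -> Row:
    """Inverse of runs(): expand alternating 0/1 run lengths back into bits."""
    bits: Row = []
    value = 0
    for n in lengths:
        bits.extend([value] * n)
        value ^= 1
    return bits


def encode_page(rows: List[Row]) -> List[List[int]]:
    """XOR each row with the row above (all-white for the first row), then run-length code it."""
    prev: Row = [0] * len(rows[0]) if rows else []
    coded = []
    for row in rows:
        diff = [a ^ b for a, b in zip(row, prev)]
        coded.append(runs(diff))
        prev = row
    return coded


def decode_rows(coded: List[List[int]], width: int) -> Iterator[Row]:
    """Yield decoded rows one at a time so display can start before the page is finished."""
    prev: Row = [0] * width
    for lengths in coded:
        diff = unruns(lengths)
        row = [a ^ b for a, b in zip(diff, prev)]
        yield row
        prev = row


page = [
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],  # repeats the row above, so it codes as a single run
    [0, 1, 1, 1, 1, 0],
]
coded = encode_page(page)  # [[2, 2, 2], [6], [1, 1, 2, 1, 1]]
for row in decode_rows(coded, width=6):
    print(row)
```

In practice the run lengths would themselves be coded compactly (Group 3 fax uses fixed Modified Huffman code tables) rather than stored as plain integers as here.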
brad@looking.on.ca (Brad Templeton) (06/23/91)
Clearly the best form of "compression" on such items is OCR. A lot of the commercial OCR out there is crap, but I hear that some of the very best -- and most expensive -- actually does the job, although probably not on handwriting. To my mind, if I were working on one of those paperless office deals, I would want OCR, not just because it would save tremendous amounts of space, but because I don't want just a file of images; I want to be able to search the files too. Many people are pushing paperless-office image storage systems right now, but I expect that the first one to use decent OCR (with image storage where OCR is impossible) will wipe all the others out.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473