doug@herbert.uucp (Doug Phillipson 5-0134) (02/05/90)
Hello USA. I am looking for OCR software for a Microtek scanner (hooked up to a SUN 4/110). I have some good SPARC-based scanning software for it, but I need to do OCR. Also, does anyone know anything about a product called ReadRight? It is an OCR package for an HP Scanjet (sadly, for a PC). I need to know what format it scans into so I can possibly use the Microtek to scan and ReadRight to do the OCR. Rumor has it that Caere Corp. is going to bring out something SPARC-based for SUNs. Anybody got any info on SPARC-based OCR stuff?

Thanks in advance,
Douglas Phillipson
eugene@eos.UUCP (Eugene Miya) (02/13/90)
I posted a note and summary about this two years ago. I looked at everything available at the time, from the high-end Kurzweil (Xerox), DEST, Transimage, etc. We eventually got an Apple scanner and use Omnipage. It was just time.

Basically, the field is in its infancy. OCRs make numerous mistakes. It saves some effort, but OCR systems have difficulty distinguishing ells, ones, and pipes: l, 1, |. If a period (dot) is near (on the line above), is it a big 'i'? How about a : from a ;? Then you also have dust or paper marks. The bigger the point size, the better. You also have to think about multiple fonts. Omnipage requires a fairly clean large disk to help it work. Full disks work, it just takes longer.

I would NEVER buy an OCR system based on the word of ANY other individual. That is how crude the technology is. My suggestion is to take samples of what you need scanned and try it yourself; you get recognition and scanning time. Also, for a benchmark, you might just make up a sheet: for each font and point size you have, type out the entire keyboard onto a piece of paper and see which characters and sizes the system recognizes. A test drive is the most important thing you can do; otherwise, save your money.

Computers have a ways to go. I'm not happy with the Apple, but it is what we have, and I use it to scan in indices which get posted to other news groups like comp.parallel. Per-page error rate is an important measure.

Another gross generalization from
--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
resident cynic at the Rock of Ages Home for Retired Hackers:
"You trust the `reply' command with all those different mailers out there?"
"If my mail does not reach you, please accept my apology."
{ncar,decwrl,hplabs,uunet}!ames!eugene
Do you expect anything BUT generalizations on the net?
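[Editor's sketch] One way to put a number on the per-page error rate mentioned above is to compare each page of OCR output with a hand-typed reference copy of the same page. The Python below is only an illustration under that assumption, not anything the posters or the named products used. The function names `edit_distance` and `page_error_rate` and the sample strings are made up for the example. The rate is defined here as character edits (insertions, deletions, substitutions) divided by the length of the reference page; other definitions are possible.

```python
def edit_distance(truth: str, ocr: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions, and substitutions needed to turn `truth` into `ocr`."""
    prev = list(range(len(ocr) + 1))  # distances from the empty prefix of truth
    for i, t in enumerate(truth, start=1):
        curr = [i]  # turning i reference characters into an empty string
        for j, o in enumerate(ocr, start=1):
            cost = 0 if t == o else 1
            curr.append(min(
                prev[j] + 1,         # reference character dropped by the OCR
                curr[j - 1] + 1,     # spurious character added by the OCR
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def page_error_rate(truth: str, ocr: str) -> float:
    """Character edits per reference character for one page."""
    if not truth:
        raise ValueError("reference page is empty")
    return edit_distance(truth, ocr) / len(truth)


if __name__ == "__main__":
    # Hypothetical test sheet line and a made-up OCR result that confuses
    # ell, one, and pipe, the trouble spots described in the post.
    reference = "fill 1 | pipe; colon: done."
    recognized = "fi11 l 1 pipe: colon; done."
    print(f"page error rate: {page_error_rate(reference, recognized):.3f}")
```

Running the same reference sheet at each font and point size gives one rate per sheet, which makes two systems (or two scan settings) directly comparable.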
shankar@SRC.Honeywell.COM (Subash Shankar) (07/25/90)
What resolution (in dots per inch) is generally considered acceptable for optical character recognition, on typical textbook/journal sources? I realize this is a vague question, so vague answers are OK :-)

---
Subash Shankar
Honeywell Systems & Research Center MN65-2100
voice: (612) 782 7558
US Snail: 3660 Technology Dr., Minneapolis, MN 55418
shankar@src.honeywell.com srcsip!shankar