[comp.graphics] OCR

doug@herbert.uucp (Doug Phillipson 5-0134) (02/05/90)

	Hello USA.  I am looking for OCR software for a Microtek scanner
(hooked up to a SUN 4/110).  I have some good scanning software
(SPARC based) for it but need to do OCR.  Also, does anyone know anything
about a product called ReadRight?  It is an OCR package for an HP Scanjet
(sadly for a PC).  I need to know what format it scans into so I can
possibly use the Microtek to scan and ReadRight to do the OCR.

Rumor has it that Caere Corp. is going to bring out something SPARC based
for SUNs.  Anybody got any info on SPARC based OCR stuff?

Thanks in advance

Douglas Phillipson

eugene@eos.UUCP (Eugene Miya) (02/13/90)

I posted a note and summary about this two years ago.
I looked at everything available at the time from the high end
Kurzweil (Xerox), DEST, Transimage, etc.  We eventually got
an Apple scanner and use Omnipage.  It was just time.

Basically the field is in its infancy.  OCRs make numerous mistakes.
It saves some effort, but OCR systems have difficulty distinguishing
ells and ones and pipes: l, 1, |.  If a period (dot) from the line
above sits near one of them, is it a big 'i'?  How about telling a :
from a ;?  Then you also have dust or paper marks.  The bigger the
point size the better.  You also have to think about multiple fonts.

Omnipage requires a fairly clean large disk to help it work.
Full disks work; it just takes longer.  I would NEVER buy an OCR system
based on the word of ANY other individual.  That is how crude the
technology is.  My suggestion is to take samples of what you scan
and try it; you get recognition and scanning time.  Also, for a
benchmark, you might just make up a sheet: if you have multiple fonts
and point sizes, type out the entire keyboard for each size on a
piece of paper and see which characters and sizes the system recognizes.
A test drive is the most important thing you can do; otherwise save your
money.  Computers have a ways to go.
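
The keyboard-sheet check above could be scored roughly like this.  This is
only a sketch: it assumes you already have the text you typed on the sheet
and the OCR output as plain strings, and the names KEYBOARD and
missed_characters are made up for illustration, not part of Omnipage or
any other product.

```python
import string
from collections import Counter

# Every printable ASCII character you can type, minus whitespace
# (an assumption about what "the entire keyboard" means).
KEYBOARD = string.ascii_letters + string.digits + string.punctuation


def missed_characters(expected: str, recognized: str) -> dict:
    """Return how many occurrences of each expected character went missing.

    Compares character counts only, so it ignores ordering; it is a
    rough first pass, not an alignment of the two texts.
    """
    want = Counter(c for c in expected if not c.isspace())
    got = Counter(c for c in recognized if not c.isspace())
    # Counter subtraction keeps only the positive differences.
    return dict(want - got)


if __name__ == "__main__":
    # Pretend the OCR read every ell and pipe as a one.
    ocr_output = KEYBOARD.replace("l", "1").replace("|", "1")
    print(missed_characters(KEYBOARD, ocr_output))
```

Run it once per font and point size on the sheet; the characters that keep
turning up in the result are the ones that system cannot be trusted with.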

I'm not happy with the Apple, but it is what we have and I use it
to scan in indices which get posted to other news groups like comp.parallel.
Per page error rate is an important measure.
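
One possible way to put a number on per page error rate, assuming you have
a hand-checked copy of the page to compare against.  The choice of edit
distance as the measure and the names edit_distance and page_error_rate
are assumptions made for this sketch, not something any OCR package
reports.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def page_error_rate(truth: str, recognized: str) -> float:
    """Edit distance divided by the length of the hand-checked page."""
    if not truth:
        raise ValueError("truth page is empty")
    return edit_distance(truth, recognized) / len(truth)


if __name__ == "__main__":
    truth = "fill 1 | 0"
    ocr = "fi11 l 1 0"
    print(f"{page_error_rate(truth, ocr):.2f}")
```

Comparing this figure across a few sample pages gives a fairer picture
than a single page, since dust, point size, and font all move it around.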

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "You trust the `reply' command with all those different mailers out there?"
  "If my mail does not reach you, please accept my apology."
  {ncar,decwrl,hplabs,uunet}!ames!eugene
  Do you expect anything BUT generalizations on the net?

shankar@SRC.Honeywell.COM (Subash Shankar) (07/25/90)

What resolution (in dots per inch) is generally considered acceptable for
optical character recognition, on typical textbook/journal sources?

I realize this is a vague question, so vague answers are OK :-)
---
Subash Shankar             Honeywell Systems & Research Center MN65-2100
voice: (612) 782 7558      US Snail: 3660 Technology Dr., Minneapolis, MN 55418
shankar@src.honeywell.com  srcsip!shankar