[comp.sys.ibm.pc] Scanning, Optical Character Recognition

ns1b@TIBER.EDRC.CMU.EDU (Nikolaos Sahinidis) (05/18/89)

	I would like to transform a large amount of printed data into 
a text file.  Does anyone have experience with scanners and software
for Optical Character Recognition?
	Any suggestions would be appreciated.

Thanks,  Nikos

-- 

davidr@hplsla.HP.COM (David M. Reed) (05/19/89)

I have not yet gotten to work with an expensive system (such as the kind that
comes with a dedicated card and costs $3000, like TrueScan), but I have been
able to use some inexpensive (<$500) software-based OCR programs.  From that
experience I would say that my number 1 criterion is accuracy greater than 99%,
and second, the ability to read kerned print (preferably with the program
prompting for a translation when it comes across an unrecognized character,
such as a double f).  Most of the inexpensive programs seem to be limited to
typewritten and dot-matrix fixed-space print, which rules out what seems to be
98% of what I want to copy (books, magazines, newspapers, LaserJet
proportional-font output, legal documents, etc.).  A 99% accuracy rate will
still require you to carefully proofread what has been translated from image
to characters, because it means that roughly 1 character out of every 100 is
probably incorrect.  I frequently type 70+ characters per line, so I will
probably have at least 2 incorrect characters in every three lines of text!
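To put numbers on that, here is a rough sketch in Python.  It assumes that
each character is misread independently with probability (1 - accuracy).
That is a simplifying assumption; real OCR errors probably cluster on
particular fonts and letter shapes.  The function names are just for
illustration.

```python
def expected_errors(accuracy, chars_per_line, lines):
    """Expected number of misread characters in a block of text,
    assuming independent per-character errors."""
    total_chars = chars_per_line * lines
    return total_chars * (1.0 - accuracy)


def chance_of_clean_text(accuracy, chars_per_line, lines):
    """Probability that every character in the block is read correctly,
    under the same independence assumption."""
    return accuracy ** (chars_per_line * lines)


# Three lines of 70 characters at 99% accuracy (210 characters total).
print(f"{expected_errors(0.99, 70, 3):.1f}")       # 2.1
print(f"{chance_of_clean_text(0.99, 70, 3):.0%}")  # 12%
```

So at 99% you should expect about 2 bad characters per three lines.  Only
about one three-line block in eight would come through with no errors at all,
which is why proofreading is unavoidable.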