[comp.sys.ibm.pc] OCR for handwritten documents

anderson@uwmacc.UUCP (06/15/87)

I have a large, handwritten document that I would like to read
with a scanner and convert the original text to Ascii text. I
suppose the hardest part of the problem is teaching the program to
recognize the several signs that can represent any given letter.
The expert system itself doesn't seem to me like the daunting
part; recognizing the signs does, since I know nothing about the
theory behind that problem. The document is
big, you see -- over 37,000 pages -- and even counting the
time it would take me to develop the software, I would likely
save time compared with keying the text in by hand. The original
took 14 years to write.

I'd appreciate any pointers anyone has to offer.
-- 
==ARPA:===============anderson@vms.macc.wisc.edu===Jess Anderson======
| UUCP: {harvard,seismo,rutgers,  (avoid ihnp4!)   1210 W. Dayton    | 
|   akgua,allegra,usbvax}!uwvax!uwwircs!anderson   Madison, WI 53706 |
==BITNET:======================anderson@wiscmacc===608/263-6988=======

sarah@laticorp.UUCP (Sarah Groves Hobart) (06/16/87)

In article <1633@uwmacc.UUCP> anderson@uwmacc.UUCP (Jess Anderson) writes:
>I have a large, handwritten document that I would like to read
>with a scanner and convert the original text to Ascii text. 

You don't say whether your handwritten document is in cursive or hand-printed
form.  If it's hand-printed, a good OCR system could do most of your
work for you.  Some OCRs use context to improve character
recognition, relying on known patterns of English letters to decide
what a doubtful character most plausibly is.  (Is your
document in English?)
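
For what it's worth, here is a rough sketch (in Python) of how context
might settle a doubtful character.  It only illustrates the idea and is
not how any particular commercial OCR works: the function name
pick_by_context, the letter-pair table, and its scores are all made up
for the example.

```python
# Hypothetical illustration: choose among candidate readings of a doubtful
# character by scoring the letter pairs (bigrams) it forms with its neighbours.
# The scores below are invented placeholders, NOT measured English statistics.
BIGRAM_SCORE = {
    "th": 10, "he": 9, "qu": 8,
    "tb": 1, "be": 4,
}


def pick_by_context(prev_char, candidates, next_char, scores=BIGRAM_SCORE):
    """Return the candidate that best fits between prev_char and next_char."""
    def score(c):
        # Sum the scores of the pair to the left and the pair to the right;
        # pairs missing from the table count as 0.
        return scores.get(prev_char + c, 0) + scores.get(c + next_char, 0)

    return max(candidates, key=score)


# The scanner read "t?e" and could not decide between "h" and "b".
best = pick_by_context("t", ["h", "b"], "e")
```

With these invented scores "h" wins, since "th" and "he" both score
higher than "tb" and "be".  A real system would presumably use a much
larger table, or whole-word dictionary lookups.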

Even if the document is in cursive, take some sample sheets of the document
to your friendly OCR salesperson and see if any of the commercial scanners
can handle it as is.  A lot of scanners will make a valiant try, and
the error rate could be acceptable if you're going to edit the document
anyway.  Look for ones that will conveniently mark uncertain characters
and words for you.  (Now I may be talking through my hat here . . . I've
never fed cursive handwriting through a scanner, but a lot of them claim
they will try to separate characters correctly.)
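
As a sketch of what "marking uncertain characters" could look like,
suppose the recognizer hands back each character with a confidence
between 0 and 1.  That output format, the 0.8 cutoff, and the name
mark_uncertain are my assumptions for illustration, not any vendor's
actual interface.

```python
# Hypothetical illustration: flag low-confidence characters so a human
# proofreader can find them quickly while editing the recognized text.
def mark_uncertain(recognized, threshold=0.8, mark="?"):
    """recognized is a list of (character, confidence) pairs.

    Characters below the threshold are wrapped as [x?]; others pass through.
    """
    out = []
    for ch, conf in recognized:
        out.append(ch if conf >= threshold else f"[{ch}{mark}]")
    return "".join(out)


# Assumed recognizer output for the word "cat", with a shaky middle letter.
sample = [("c", 0.95), ("a", 0.55), ("t", 0.90)]
marked = mark_uncertain(sample)
```

Here marked comes out as "c[a?]t", so searching the text for "[" would
jump straight to the spots that need a second look.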

In a month or so I'll be working with a Palantir OCR, so I'll be
interested to see what your request turns up.  If you get much
interesting mail on this topic, please post a summary.  Thanks!

Sarah Groves Hobart