folta@tove.cs.umd.edu (Wayne Folta) (02/06/91)
I just got Olduvai's Read-It! OCR software (they had a sale). First, is there anyone from Olduvai out there? Also, are there any other users out there who have some tips to share? Second, here are some observations that might be of interest to others:

1. Since it is a relatively simple-minded pattern matcher, it can be trained to recognize non-standard things, such as handwritten block numbers. I have had good success recognizing my own (fairly carefully written) numbers as well as my parents'. It has no problem with the several different ways various people write 4's, 8's, etc. I also experimented with handwritten block letters. It worked okay, but I didn't pursue it.
2. I think you should be able to read fill-in-the-bubble exam answer sheets. Each answer would be a line through the bubbles, so the answers would be scanned as lines with dots at different points. I don't know whether it would have trouble with the usual horizontally oriented answers; if so, you would need a non-standard vertical answer layout.
3. I never got good results with the Washington Post. Maybe the program has some A.I. in it? :-)
4. I am satisfied with the results on U.S. News and World Report and the Wall Street Journal.
5. Monospaced fonts, such as Courier, work *very* well. I use this to transfer articles from DOS-bound friends who only have 5-1/4" drives.
6. The scanner driver for the HP ScanJet+ doesn't give a wide enough range of brightness control.
7. Most of the serious problems I have scanning magazines come not from mismatched characters, but from the program picking up the bottom of the previous line along with a character, which results in horrible mismatches.
8. It is not too hard to train it to recognize underlined characters. Bold is okay, but italic is harder: because it puts a rectangular bounding box around each character, you often get an italic character plus part of the character to its right.
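For anyone curious what a "simple-minded", trainable pattern matcher (items 1 and 8) might look like, here is a minimal Python sketch. It is hypothetical and NOT Read-It!'s actual method: the names TemplateOCR, normalize and segment, the 8x8 template size, and the '#'/'.' bitmap format are all assumptions made for illustration. It learns glyphs from labeled samples, picks the stored template with the fewest differing pixels, and cuts a line into rectangular boxes at blank columns.

```python
# Hypothetical sketch of a trainable template-matching OCR.
# NOT Read-It!'s real algorithm; the names, the 8x8 size and the
# '#'/'.' bitmap format are assumptions for illustration only.

SIZE = 8  # assumed side length of a normalized template


def normalize(bmp, size=SIZE):
    """Scale a bitmap (list of equal-length strings, '#' = ink) to
    size x size with nearest-neighbour sampling; rows of 0/1 ints."""
    h, w = len(bmp), len(bmp[0])
    return [[1 if bmp[r * h // size][c * w // size] == "#" else 0
             for c in range(size)]
            for r in range(size)]


def distance(a, b):
    """Number of pixels that differ between two normalized bitmaps."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


class TemplateOCR:
    def __init__(self):
        self.templates = []  # list of (label, normalized bitmap)

    def train(self, label, bmp):
        # Several samples may share a label, e.g. two styles of writing "4".
        self.templates.append((label, normalize(bmp)))

    def classify(self, bmp):
        # Nearest template wins; raises ValueError if nothing is trained.
        g = normalize(bmp)
        return min(self.templates, key=lambda t: distance(t[1], g))[0]


def segment(line):
    """Cut a one-line bitmap into glyphs at fully blank columns, so each
    glyph is the rectangular box spanning its inked columns."""
    width = len(line[0])
    blank = [all(row[c] == "." for row in line) for c in range(width)]
    glyphs, start = [], None
    for c in range(width + 1):
        if c < width and not blank[c]:
            if start is None:
                start = c  # a glyph begins
        elif start is not None:
            glyphs.append([row[start:c] for row in line])
            start = None  # glyph ended at a blank column or the right edge
    return glyphs


ONE = [".#.", "##.", ".#.", ".#.", "###"]
SEVEN = ["###", "..#", ".#.", ".#.", ".#."]

ocr = TemplateOCR()
ocr.train("1", ONE)
ocr.train("7", SEVEN)

spaced = [a + "." + b for a, b in zip(ONE, SEVEN)]  # blank column between
touching = [a + b for a, b in zip(ONE, SEVEN)]      # no blank column

print("".join(ocr.classify(g) for g in segment(spaced)))  # 17
print(len(segment(touching)))                              # 1
```

The last line shows the italic weakness from item 8 in miniature: when no blank column separates two glyphs, as happens when slanted strokes overlap, a rectangular cut returns one merged box, and the matcher sees a character plus part of its neighbour.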
If you have non-standard needs, or you scan certain publications regularly, Read-It!'s training time could be worth it. If you like to scan a lot of articles from all over, a smarter, non-trainable package would probably be a better buy. -- Wayne Folta (folta@cs.umd.edu 128.8.128.8)