[comp.ai] Character recognition

denis@lerouf.dec.com (MICHEL DENIS, @KALAMAZOO@VBO) (01/13/87)

About CHARACTER RECOGNITION :

Does anybody have a list of books and publications on character/word
recognition and its algorithms? Any piece of software that implements
some of those techniques would be especially useful as a start!

Thanks in advance and regards,

Michel.

ps: please mail me on :

(DEC E-NET)	LEROUF::DENIS
(UUCP)		...decvax!decwrl!dec-rhea!dec-lerouf!denis
(ARPA)		denis%lerouf.DEC@decwrl.ARPA

vic@zen.UUCP (Victor Gavin) (10/16/87)

I have been puttering about for the past few weeks with an HP ScanJet (one
of those 300dpi digitizers). I have been asked to write some software which
can (given an image produced by the scanner) reproduce the original text of
the paper in a machine readable form.

The text will normally be numbers and the image will initially be a bit
pattern.

If someone can point me to some introductory texts on character recognition
I would be grateful.

If someone has already tackled this problem, any help I can get will be much
appreciated.

		vic
--
Victor Gavin						Zengrange Limited
vic@zen.co.uk						Greenfield Road
..!mcvax!ukc!zen.co.uk!vic				Leeds LS9 8DB
+44 532 489048						England

roy@phri.UUCP (Roy Smith) (10/25/87)

In article <641@zen.UUCP> vic@zen.UUCP (Victor Gavin) writes:
> I have been asked to write some software which can (given an image
> produced by the scanner) reproduce the original text of the paper in a
> machine readable form.

	I don't know much about it, but a company called DEST markets a
300-dpi scanner for the Macintosh (and, I think, IBM-PC) for about $2k,
including character recognition software.  Unless your application has some
special requirements, I would imagine getting one of these jobs would be a
lot more cost-effective than writing your own software.

	I've added comp.sys.mac to the Newsgroups line to see if anybody
there has any experience with the DEST they could share.  While I'm at it,
can somebody compare and contrast the O($2k) scanners with the el-cheapo
Thunderscan for me.  What do the "real" scanners have going for them that I
can't do with a Thunderscan?
-- 
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016

oster@dewey.soe.berkeley.edu (David Phillip Oster) (10/25/87)

In article <2984@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>In article <641@zen.UUCP> vic@zen.UUCP (Victor Gavin) writes:
>> from a scanner image reproduce the original text of the paper in a
>> machine readable form.

>can somebody compare and contrast the O($2k) scanners with the el-cheapo
>Thunderscan for me.  What do the "real" scanners have going for them that I
>can't do with a Thunderscan?

Thunderscan offers very high quality scanning, at resolutions up to
300 dpi, and up to 5 bits per pixel. (32 grays.) It can handle
originals up to 15" wide (in a wide carriage imagewriter) and at least
32767 scan lines long. (I haven't actually tried anything longer than
11", but when it finishes, the "continue scan" button is still waiting
to be pressed.) However, it is slow (5 to 40 minutes, depending on
resolution and size of original) and only works on single-sheet,
thin, bendable material. (The material has to fit in the imagewriter
printer.) That means you'd do well to have a xerographic copier handy.
The expensive scanners are flat bed, copier style machines, and do
their work faster (can't be too much faster, though. It takes
15 minutes to send an 8"x10" page at 1-bit per pixel 300dpi, over a
9600 baud line if you do not use a compressing transfer protocol.)
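
The 15-minute figure can be checked with a little arithmetic. The sketch
below is only an illustration: the function name is made up here, and it
assumes an asynchronous line that spends 10 bits on the wire per data byte
(one start bit, eight data bits, one stop bit), which the post does not
state.

```python
def transfer_seconds(width_in, height_in, dpi, bits_per_pixel, baud,
                     wire_bits_per_byte=10):
    """Time to send an uncompressed scan over an asynchronous serial line.

    wire_bits_per_byte=10 is an assumption (start bit + 8 data bits + stop bit).
    """
    pixels = round(width_in * dpi) * round(height_in * dpi)
    payload_bits = pixels * bits_per_pixel
    payload_bytes = (payload_bits + 7) // 8  # round up to whole bytes
    return payload_bytes * wire_bits_per_byte / baud


# The case from the post: 8" x 10", 300 dpi, 1 bit per pixel, 9600 baud.
seconds = transfer_seconds(8, 10, 300, 1, 9600)
print(f"{seconds:.1f} s, i.e. {seconds / 60:.2f} minutes")
```

Under those assumptions the page is 900,000 bytes and the transfer takes
a little over 15 and a half minutes, in line with the estimate above.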

Olduvai Software makes a line of software that parses scanned pages
back into text. Either the current issue of MacUser has a review, or I
saw it in a recent copy of MacWeek, but for < $200.00 you get a
software package to do syntactic pattern recognition of letter
features, to determine the ASCII for the scanned page.

It is still cheaper to hire a human typist, but soon the cost balance
will flip the other way. (I expect that copy shops will offer a
service: bring in your books and blank disks, and for a few cents a
page, get them digitized to ASCII. (And won't that boost our needs for
on-line storage (What, only 300Gigabytes! How do you get by with such
a small library?)))

(note, I've directed followups to just comp.misc. If people want to continue
this discussion, they can read it there.)

--- David Phillip Oster            --A Sun 3/60 makes a poor Macintosh II.
Arpa: oster@dewey.soe.berkeley.edu --A Macintosh II makes a poor Sun 3/60.
Uucp: {uwvax,decvax,ihnp4}!ucbvax!oster%dewey.soe.berkeley.edu

korn@apple.UUCP (Peter "Arrgh" Korn) (10/26/87)

In <21433@ucbvax.BERKELEY.EDU>, oster@dewey.soe.berkeley.edu.UUCP (David Phillip Oster) said:  

>>In article <641@zen.UUCP> vic@zen.UUCP (Victor Gavin) writes:
>>> from a scanner image reproduce the original text of the paper in a
>>> machine readable form.
>
>...[discussion of the ThunderScan scanner]...
>
>The expensive scanners are flat bed, copier style machines, and do
>their work faster (can't be too much faster, though. It takes
>15 minutes to send an 8"x10" page at 1-bit per pixel 300dpi, over a
>9600 baud line if you do not use a compressing transfer protocol.)

That is true only if you assume that 9600 baud is the fastest they can
transmit data.  The Macintosh can accept data over its serial port at a
rate quite a bit faster than that (56K baud easily, and AppleTalk is
another 8 times faster than that).

Also, most of the newer 'professional' scanners are using the SCSI port,
which can get you a full page scanned and transmitted to the Mac's RAM,
displayed on the screen eagerly awaiting the deftest commands of the user
in as fast as 14 seconds (and perhaps even a second or two faster than that).
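
To get a feel for what 14 seconds per page implies, one can work backwards
to a data rate. This is a rough sketch, not a measurement: the function
name is invented for illustration, and the page is assumed to be 8.5" x 11"
at 300 dpi and 1 bit per pixel, none of which is stated above.

```python
def implied_bytes_per_second(width_in, height_in, dpi, bits_per_pixel, seconds):
    """Average data rate needed to deliver one uncompressed page in `seconds`."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    total_bytes = (pixels * bits_per_pixel + 7) // 8  # round up to whole bytes
    return total_bytes / seconds


# Assumed page: US letter, 300 dpi, 1 bit per pixel, delivered in 14 seconds.
rate = implied_bytes_per_second(8.5, 11, 300, 1, 14)
print(f"about {rate:,.0f} bytes per second")
```

Since the 14 seconds also covers the mechanical scan itself, the result is
a lower bound on what the interface has to sustain, not its peak speed.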

>Olduvai Software makes a line of software that parses scanned pages
>back into text. Either the current issue of MacUser has a review, or I
>saw it in a recent copy of MacWeek, but for < $200.00 you get a
>software package to do syntactic pattern recognition of letter
>features, to determine the ASCII for the scanned page.

Unfortunately their advertisements seemed to be a little ahead of their
ability to deliver when I spoke with them about a month ago.  I recall their
saying something about it being at least Christmas before they would actually
be shipping product--don't quote me on this last one, as the event happened
fully 30 days ago.  Nonetheless, after at least two months of advertising
in MacUser their product wasn't anywhere near shipping when I called them.

>It is still cheaper to hire a human typist, but soon the cost balance
>will flip the other way. 

I hope this happens soon.  However, from my experience with character
recognition, it won't happen for a little while yet.  *If* all that
you are scanning is 10 or 12 pitch mono-spaced Courier, Letter Gothic,
or one of a small set of other fonts, then computer character recognition 
is a viable option for you that may well save you a lot of $$ vs. paying
a typist to do it.  However, to my knowledge, there exists no scanner
anywhere that can properly deal with all types of proportional spaced
fonts at anything near acceptable accuracy (remember that 99.5% accuracy
works out to 3 errors every typewritten page) let alone handle typeset
text that is kerned (such as you find in the newspapers and books that
you read).
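
The accuracy figure converts to errors per page with one multiplication.
The sketch below assumes errors are independent and uniform per character;
the page sizes are illustrative guesses, not figures from this thread.
Three errors at 99.5% corresponds to a page of about 600 characters, and a
denser page gives proportionally more.

```python
def expected_errors(accuracy, characters):
    """Expected number of misread characters on a page.

    Assumes every character is misread independently with the same probability.
    """
    return (1.0 - accuracy) * characters


# Page sizes here are assumptions chosen only to show the scaling.
for chars in (600, 1500, 3000):
    print(chars, "characters:", round(expected_errors(0.995, chars), 1), "errors")
```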

Having spent the better part of 6 months selling these beasties, and
going to school at a University that had one of the more expensive
Kurzweil machines, I've become somewhat jaded by their promise.  They
seem to be much like expert systems--very good in a tightly controlled
environment, but not very good beyond that.

>...
>
>(note, I've directed followups to just comp.misc. If people want to continue
>this discussion, they can read it there.)

Normally I would have respected this, and I have redirected all followups
to this posting to comp.misc.  But I felt there has been enough interest,
at least in comp.sys.mac, to correct some of the statements made about
scanning speed and character recognition software in the forum in which
they were made.

Peter
-- 
 Peter "Arrgh" Korn    korn@apple.com   !hplabs!amdahl!apple!korn    "hi mom!"

wew@naucse.UUCP (Bill Wilson) (10/28/87)

Flagstaff Engineering (602-523-6461) is currently writing
character recognition software for scanners and PC's.  
You may want to give them a call.

kevinc@auvax.UUCP (Kevin Barry Crocker) (10/28/87)

In article <2984@phri.UUCP>, roy@phri.UUCP (Roy Smith) writes:
> In article <641@zen.UUCP> vic@zen.UUCP (Victor Gavin) writes:
> > I have been asked to write some software which can (given an image
> > produced by the scanner) reproduce the original text of the paper in a
> > machine readable form.
> 
> 	I don't know much about it, but a company called DEST markets a
> 300-dpi scanner for the Macintosh (and, I think, IBM-PC) for about $2k,

This may not be relevant to all, but a recent issue of PC Magazine does
a review of both Desktop Publishing and Scanners for the PC Market.
The issue is Volume 6 Number 17 October 13, 1987.  Now, I realize that
for Mac users this may not be totally relevant but some of these
companies may make suitable software to make their product usable on
the Mac - especially those that link to PageMaker.  In fact I seem to
remember some vendors' products being touted for both markets.

ihnp4!alberta!auvax!kevinc (Kevin Crocker Athabasca University)
Do our employers have opinions or is that what we get paid for! 

cem@ihlpa.ATT.COM (45261-Malloy) (11/03/87)

In article <477@naucse.UUCP>, wew@naucse.UUCP (Bill Wilson) writes:
> 
> Flagstaff Engineering (602-523-6461) is currently writing
> character recognition software for scanners and PC's.  
> You may want to give them a call.

I called them a while back and they sent me a demo of their
OCR software.  The demo does everything that I wanted.  I guess
the $600 is a little redundant.  Can anyone confirm this?

Clancy Malloy
ihlpj!cem

clive@drutx.ATT.COM (Clive Steward) (11/09/87)

in article <641@zen.UUCP>, vic@zen.UUCP (Victor Gavin) says:
> 
> 
> I have been puttering about for the past few weeks with an HP ScanJet (one
> of those 300dpi digitizers). I have been asked to write some software which
> can (given an image produced by the scanner) reproduce the original text of
> the paper in a machine readable form.
> If someone has already tackled this problem, any help I can get will be much
> appreciated.
> 

Yes, there's some software for the Macintosh which is purported to do
just this, with text.  Presumably, like other such systems, it's
pretty much confined to non-proportional fonts.  Since numbers are
often non-proportional even in otherwise proportional fonts so that
columns will look right, this sounds like it would do your job.
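
Why fixed-pitch digits are the easy case can be shown with a toy example.
The sketch below cuts a line image into equal-width cells and picks the
nearest stored bitmap for each cell by counting differing pixels. It is a
plain nearest-template matcher written for illustration, not the method
used by any product named in this thread, and the 3x5 digit bitmaps and
function names are made up.

```python
# Hypothetical 3x5 bitmaps for a few digits; '#' = ink, '.' = paper.
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}
CELL_W = 3  # every character occupies the same width (fixed pitch)


def split_cells(rows, cell_w=CELL_W):
    """Cut a fixed-pitch line image into equal-width character cells."""
    count = len(rows[0]) // cell_w
    return [[row[i * cell_w:(i + 1) * cell_w] for row in rows]
            for i in range(count)]


def distance(a, b):
    """Hamming distance: number of pixels that differ between two cells."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(rows):
    """Label each cell with the template that differs in the fewest pixels."""
    return "".join(
        min(TEMPLATES, key=lambda ch: distance(cell, TEMPLATES[ch]))
        for cell in split_cells(rows)
    )


# A scanned line containing the digits 7, 1 and 0 side by side.
line = [
    "###.#.###",
    "..###.#.#",
    ".#..#.#.#",
    ".#..#.#.#",
    ".#.######",
]
print(recognize(line))
```

With proportional or kerned type the cell boundaries are no longer known
in advance, so the simple equal-width split in `split_cells` stops working.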

There's at least one package which purports to do this; it's called
Read-it!, said to be for 'popular' scanners, which presumably includes 
all the 300 dpi ones as well as Thunderscan etc. which can do more.  
It was apparently demo'ed in 'pre-release form' at MacWorld Expo in August.

It's from:

    Olduvai Software, Inc.
    6900 Mentone
    Coral Gables, Florida 33146
    USA
    Phone  (305) 665-4665

They list it in the September MacUser ad for $295 list.  Reading that,
I find they say it works with scanners "including AST Turboscan, Microtek, Abaton 300,
MacScan, LoDown, Spectrum, Datacopy, Dest, etc."  "Type tables form
most popular typewriter and LaserWriter fonts are included, or you can
use it's unique "learning mode" to teach it to recognize an unlimited
number of fonts, includeing foriegn and special characters." (sic).

They also say, "Read-It TS, a special version of Read-It! optimized
for the Thunderscan is also available"  $149.00 list.  But though I
have and like Thunderscan, I don't know that it's what you want for
high volume.  It's 1/10 the price, and 1/10 the speed, though often
with better looking results for pictures.


Good Luck!

And if you get it and have results, I would appreciate mail to see what
it's like; probably others would like a posting too!


Clive Steward