denis@lerouf.dec.com (MICHEL DENIS, @KALAMAZOO@VBO) (01/13/87)
About CHARACTER RECOGNITION:

Does anybody have a list of books and publications related to character/word recognition and its algorithms? Any piece of software that implements some of those techniques would be especially useful for a start!

Thanks in advance and regards,
Michel.

ps: please mail me on:
(DEC E-NET) LEROUF::DENIS
(UUCP) ...decvax!decwrl!dec-rhea!dec-lerouf!denis
(ARPA) denis%lerouf.DEC@decwrl.ARPA
vic@zen.UUCP (Victor Gavin) (10/16/87)
I have been puttering about for the past few weeks with an HP ScanJet (one of those 300dpi digitizers). I have been asked to write some software which can (given an image produced by the scanner) reproduce the original text of the paper in a machine readable form. The text will normally be numbers and the image will initially be a bit pattern.

If someone can point me to some introductory texts on character recognition I would be grateful. If someone has already tackled this problem, any help I can get will be much appreciated.

vic
--
Victor Gavin                    Zengrange Limited
vic@zen.co.uk                   Greenfield Road
..!mcvax!ukc!zen.co.uk!vic      Leeds LS9 8DB
+44 532 489048                  England
roy@phri.UUCP (Roy Smith) (10/25/87)
In article <641@zen.UUCP> vic@zen.UUCP (Victor Gavin) writes:
> I have been asked to write some software which can (given an image
> produced by the scanner) reproduce the original text of the paper in a
> machine readable form.

I don't know much about it, but a company called DEST markets a 300-dpi scanner for the Macintosh (and, I think, IBM-PC) for about $2k, including character recognition software. Unless your application has some special requirements, I would imagine getting one of these jobs would be a lot more cost-effective than writing your own software. I've added comp.sys.mac to the Newsgroups line to see if anybody there has any experience with the DEST they could share.

While I'm at it, can somebody compare and contrast the O($2k) scanners with the el-cheapo Thunderscan for me? What do the "real" scanners have going for them that I can't do with a Thunderscan?
--
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016
oster@dewey.soe.berkeley.edu (David Phillip Oster) (10/25/87)
In article <2984@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>In article <641@zen.UUCP> vic@zen.UUCP (Victor Gavin) writes:
>> from a scanner image reproduce the original text of the paper in a
>> machine readable form.
>can somebody compare and contrast the O($2k) scanners with the el-cheapo
>Thunderscan for me. What do the "real" scanners have going for them that I
>can't do with a Thunderscan?

Thunderscan offers very high quality scanning, at resolutions up to 300 dpi and up to 5 bits per pixel (32 grays). It can handle originals up to 15" wide (in a wide-carriage ImageWriter) and at least 32767 scan lines long. (I haven't actually tried anything longer than 11", but when it finishes, the "continue scan" button is still waiting to be pressed.) However, it is slow (5 to 40 minutes, depending on resolution and size of original) and only works on single-sheet, thin, bendable material. (The material has to fit in the ImageWriter printer.) That means you'd do well to have a xerographic copier handy.

The expensive scanners are flat-bed, copier-style machines, and do their work faster. (They can't be too much faster, though: it takes 15 minutes to send an 8"x10" page at 1 bit per pixel, 300 dpi, over a 9600 baud line if you do not use a compressing transfer protocol.)

Olduvai Software makes a line of software that parses scanned pages back into text. Either the current issue of MacUser has a review, or I saw it in a recent copy of MacWeek, but for under $200.00 you get a software package that does syntactic pattern recognition of letter features to determine the ASCII for the scanned page.

It is still cheaper to hire a human typist, but soon the cost balance will flip the other way. (I expect that copy shops will offer a service: bring in your books and blank disks, and for a few cents a page, get them digitized to ASCII. (And won't that boost our needs for on-line storage (What, only 300 gigabytes! How do you get by with such a small library?)))

(note, I've directed followups to just comp.misc. If people want to continue this discussion, they can read it there.)

--- David Phillip Oster            --A Sun 3/60 makes a poor Macintosh II.
Arpa: oster@dewey.soe.berkeley.edu --A Macintosh II makes a poor Sun 3/60.
Uucp: {uwvax,decvax,ihnp4}!ucbvax!oster%dewey.soe.berkeley.edu
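
The 15-minute transfer figure quoted in this post can be checked with a little arithmetic. The sketch below is a rough estimate, not a measurement. It assumes that "baud" means bits per second and that each byte costs 10 bits on the line (8 data bits plus one start and one stop bit); the function names are made up for illustration.

```python
def page_bits(width_in, height_in, dpi, bits_per_pixel=1):
    """Number of bits in an uncompressed scan of a page."""
    return int(width_in * dpi) * int(height_in * dpi) * bits_per_pixel


def transfer_seconds(bits, baud, line_bits_per_byte=10):
    """Seconds to send `bits` of image data over an asynchronous serial line.

    Assumes each 8-bit byte occupies `line_bits_per_byte` bit times
    (10 for the common 8N1 framing), with no compression and no
    protocol overhead.
    """
    payload_bytes = bits / 8
    return payload_bytes * line_bits_per_byte / baud


bits = page_bits(8, 10, 300)            # 8"x10" page, 300 dpi, 1 bit per pixel
seconds = transfer_seconds(bits, 9600)  # 9600 baud line
print(f"{bits} bits, {seconds / 60:.1f} minutes at 9600 baud")
```

Under these assumptions the page is 7,200,000 bits (900,000 bytes), and the transfer takes a little over 15 minutes, which agrees with the figure in the post.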
korn@apple.UUCP (Peter "Arrgh" Korn) (10/26/87)
In <21433@ucbvax.BERKELEY.EDU>, oster@dewey.soe.berkeley.edu.UUCP (David Phillip Oster) said:
>>In article <641@zen.UUCP> vic@zen.UUCP (Victor Gavin) writes:
>>> from a scanner image reproduce the original text of the paper in a
>>> machine readable form.
>
>...[discussion of the ThunderScan scanner]...
>
>The expensive scanners are flat bed, copier style machines, and do
>their work faster (can't be too much faster, though. It takes
>15 minutes to send an 8"x10" page at 1-bit per pixel 300dpi, over a
>9600 baud line if you do not use a compressing transfer protocol.)

That holds only if you assume that 9600 baud is the fastest they transmit data. The Macintosh can accept data over its serial port at a rate that is quite a bit faster than that (56K baud easily, and AppleTalk is another 8 times faster than that). Also, most of the newer 'professional' scanners are using the SCSI port, which can get you a full page scanned and transmitted to the Mac's RAM, displayed on the screen eagerly awaiting the deftest commands of the user, in as little as 14 seconds (and perhaps even a second or two faster than that).

>Olduvai Software makes a line of software that parses scanned pages
>back into text. Either the current issue of MacUser has a review, or I
>saw it in a recent copy of MacWeek, but for < $200.00 you get a
>software package to do syntactic pattern recognition of letter
>features, to determine the ASCII for the scanned page.

Unfortunately their advertisements seemed to be a little ahead of their ability to deliver when I spoke with them about a month ago. I recall their saying something about it being at least Christmas before they would actually be shipping product--don't quote me on this last one, as the event happened fully 30 days ago. Nonetheless, after at least two months of advertising in MacUser their product wasn't anywhere near shipping when I called them.

>It is still cheaper to hire a human typist, but soon the cost balance
>will flip the other way.
I hope this happens soon. However, from my experience with character recognition, it won't happen for a little while yet. *If* all that you are scanning is 10 or 12 pitch mono-spaced Courier, Letter Gothic, or one of a small set of other fonts, then computer character recognition is a viable option for you that may well save you a lot of $$ vs. paying a typist to do it. However, to my knowledge, there exists no scanner anywhere that can properly deal with all types of proportionally spaced fonts at anything near acceptable accuracy (remember that 99.5% accuracy works out to 3 errors every typewritten page), let alone handle typeset text that is kerned (such as you find in the newspapers and books that you read).

Having spent the better part of 6 months selling these beasties, and going to school at a University that had one of the more expensive Kurzweil machines, I've become somewhat jaded by their promise. They seem to be much like expert systems--very good in a tightly controlled environment, but not very good beyond that.

>...
>
>(note, I've directed followups to just comp.misc. If people want to continue
>this discussion, they can read it there.)

Normally I would have respected this, and I have redirected all followups to this posting to comp.misc, but I felt that there's been enough interest at least in comp.sys.mac to correct some of the statements made about scanning speed and character recognition software in the forum in which they were made.

Peter
--
Peter "Arrgh" Korn
korn@apple.com
!hplabs!amdahl!apple!korn
"hi mom!"
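
How many errors 99.5% accuracy produces depends on how many characters a page holds. The sketch below is a minimal illustration; the page sizes are assumptions, not figures from the thread. At 99.5% accuracy, 3 errors corresponds to a page of roughly 600 characters, and a denser page yields proportionally more.

```python
def expected_errors(chars_per_page, accuracy):
    """Expected number of misread characters on one page."""
    return chars_per_page * (1.0 - accuracy)


# Page sizes are illustrative assumptions, not figures from the thread.
for chars in (600, 1800, 3000):
    errors = expected_errors(chars, 0.995)
    print(f"{chars} characters at 99.5% accuracy: about {errors:.0f} errors")
```

Every one of those errors still has to be found and fixed by a human proofreader, which is why accuracy figures that sound high can still leave a typist competitive.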
wew@naucse.UUCP (Bill Wilson) (10/28/87)
Flagstaff Engineering (602-523-6461) is currently writing character recognition software for scanners and PC's. You may want to give them a call.
kevinc@auvax.UUCP (Kevin Barry Crocker) (10/28/87)
In article <2984@phri.UUCP>, roy@phri.UUCP (Roy Smith) writes:
> In article <641@zen.UUCP> vic@zen.UUCP (Victor Gavin) writes:
> > I have been asked to write some software which can (given an image
> > produced by the scanner) reproduce the original text of the paper in a
> > machine readable form.
>
> I don't know much about it, but a company called DEST markets a
> 300-dpi scanner for the Macintosh (and, I think, IBM-PC) for about $2k,

This may not be relevant to all, but a recent issue of PC Magazine reviews both Desktop Publishing and Scanners for the PC market. The issue is Volume 6 Number 17, October 13, 1987.

Now, I realize that for Mac users this may not be totally relevant, but some of these companies may make suitable software to make their product usable on the Mac, especially those that link to PageMaker. In fact I seem to remember some vendors' products being touted as products for both markets.

ihnp4!alberta!auvax!kevinc (Kevin Crocker Athabasca University)
Do our employers have opinions or is that what we get paid for!
cem@ihlpa.ATT.COM (45261-Malloy) (11/03/87)
In article <477@naucse.UUCP>, wew@naucse.UUCP (Bill Wilson) writes:
>
> Flagstaff Engineering (602-523-6461) is currently writing
> character recognition software for scanners and PC's.
> You may want to give them a call.

I called them a while back and they sent me a demo of their OCR software. The demo does everything that I wanted. I guess the $600 is a little redundant. Can anyone confirm this?

Clancy Malloy
ihlpj!cem
clive@drutx.ATT.COM (Clive Steward) (11/09/87)
in article <641@zen.UUCP>, vic@zen.UUCP (Victor Gavin) says:
>
> I have been puttering about for the past few weeks with an HP ScanJet (one
> of those 300dpi digitizers). I have been asked to write some software which
> can (given an image produced by the scanner) reproduce the original text of
> the paper in a machine readable form.
> If someone has already tackled this problem, any help I can get will be much
> appreciated.
>

Yes, there's some software for the Macintosh which is purported to do just this, with text. Presumably, like other such systems, it's pretty much confined to non-proportional fonts. Since numbers are often non-proportional even in otherwise proportional fonts, so that columns will look right, this sounds like it would do your job.

The package I know of is called Read-It!, said to be for 'popular' scanners, which presumably includes all the 300 dpi ones as well as Thunderscan etc. which can do more. It was apparently demo'ed in 'pre-release form' at MacWorld Expo in August. It's from:

Olduvai Software, Inc.
6900 Mentone
Coral Gables, Florida 33146 USA
Phone (305) 665-4665

They list it in the September MacUser ad for $295 list. Reading that, I find they say it works on "including AST Turboscan, Microtek, Abaton 300, MacScan, LoDown, Spectrum, Datacopy, Dest, etc."

"Type tables form most popular typewriter and LaserWriter fonts are included, or you can use it's unique "learning mode" to teach it to recognize an unlimited number of fonts, includeing foriegn and special characters." (sic).

They also say, "Read-It TS, a special version of Read-It! optimized for the Thunderscan is also available" $149.00 list. But though I have and like Thunderscan, I don't know that it's what you want for high volume. It's 1/10 the price, and 1/10 the speed, though often with better looking results for pictures.

Good Luck!
And if you get it and have results, I would appreciate mail to see what it's like; probably others would like a posting too!

Clive Steward
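
For readers wondering what a "type table" with a "learning mode" might look like, here is a minimal sketch of template matching, one of the simplest character recognition techniques. It is not Olduvai's method (the thread does not describe it). The 3x5 glyphs, the table layout, and the function names are all hypothetical; a real table would hold much larger bitmaps for each font.

```python
# Hypothetical type table: each label maps to a tiny 3x5 bitmap,
# written as rows of '#' (ink) and '.' (paper).
TEMPLATES = {
    "0": ("###", "#.#", "#.#", "#.#", "###"),
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "7": ("###", "..#", ".#.", ".#.", ".#."),
}


def hamming(a, b):
    """Count the pixels that differ between two same-sized glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def learn(table, label, glyph):
    """'Learning mode': store a user-labelled glyph as a new template."""
    table[label] = tuple(glyph)


def recognize(table, glyph):
    """Return the label whose template differs from the glyph in the fewest pixels."""
    return min(table, key=lambda label: hamming(table[label], glyph))


# A scanned '7' with one stray pixel still matches the '7' template best.
noisy_seven = ("###", "..#", ".#.", "##.", ".#.")
print(recognize(TEMPLATES, noisy_seven))
```

Matching whole bitmaps like this only works when every character has a fixed size and position, which is one reason such systems tend to be limited to mono-spaced fonts.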