chuq@plaid.UUCP (05/15/87)
Date: Thu, 14 May 87 16:39:00 PDT
From: dick@ccb.ucsf.edu (Dick Karpinski)

I believe that some but not all of the scanners discussed in the last couple of weeks on desktop have software which tries to generate ASCII text from the scanned image of the input page. I believe that I would be most delighted with one that constructed the PostScript which would generate "approximately" the page that was scanned, but with the text in ASCII, not bitmap. Much too much to ask for, right?

I did take a page of dot-matrix print in to one outfit selling a $2k 300 dpi scanner with OCR software for the IBM-PC line, but there were several problems:

1) The spacing on my sample input overwhelmed the fixed-spacing OCR software available then.
2) Neither an interface nor software was available for the Macintosh.
3) Operation seemed awkward and not at all intuitive.
4) Software costs took the package price up to around $3k.
5) The best recognition rates seemed to be only 95-98% correct.

I am told that a $36k Kurzweil multi-font scanner will do just about everything I want. (Not sure about a Macintosh interface.) But I will never be able to afford that. Should I wait a few years, or is one of these current products really capable of reading most of the submissions to my newsletter so that I can convert them all to some pleasant consistent font? That would make my newsletter look more like a magazine and less like a piece of patchwork.
Dick

Dick Karpinski   Manager of Unix Services, UCSF Computer Center
UUCP: ...!ucbvax!ucsfcgl!cca.ucsf!dick   (415) 476-4529 (11-7)
BITNET: dick@ucsfcca or dick@ucsfvm   Compuserve: 70215,1277
USPS: U-76 UCSF, San Francisco, CA 94143-0704   Telemail: RKarpinski
Domain: dick@cca.ucsf.edu   Home (415) 658-6803   Ans 658-3797

----------------------------------------
Submissions to:   desktop%plaid@sun.com -OR- sun!plaid!desktop
Administrivia to: desktop-request%plaid@sun.com -OR- sun!plaid!desktop-request
Paths: {ihnp4,decwrl,hplabs,seismo,ucbvax}!sun
Chuq Von Rospach   chuq@sun.COM   [I don't read flames]
There is no statute of limitations on stupidity
chuq%plaid@Sun.COM (Chuq Von Rospach) (05/18/87)
From: rbl@nitrex.UUCP ( Dr. Robin Lake )
Date: 18 May 87 13:02:49 GMT
Distribution: comp
Organization: The Standard Oil Co., Cleveland

>I am told that a $36k Kurzweil multi-font scanner will do just about
>everything I want. (Not sure about a Macintosh interface.) But I
>will never be able to afford that. Should I wait a few years, or is
>one of these current products really capable of reading most of the
>submissions to my newsletter so that I can convert them all to some
>pleasant consistent font? That would make my newsletter look more
>like a magazine and less like a piece of patchwork.

We have a 4-year-old Kurzweil. It has not been able to handle dot matrix well, but we have not tried to fiddle with the threshold settings, etc. to make it do so. We are looking at a new system made by Palantir this Thursday. We'll give dot matrix a try and let you know.

Rob Lake
chuq%plaid@Sun.COM (Chuq Von Rospach) (05/19/87)
From: hoptoad!gnu@ucbvax.Berkeley.EDU (John Gilmore)
Date: 18 May 87 07:57:42 GMT
Organization: Nebula Consultants in San Francisco

From: dick@ccb.ucsf.edu (Dick Karpinski)
> I believe that I would be most delighted
> with one that constructed the PostScript which would generate "approximately"
> the page that was scanned, but with the text in ASCII, not bitmap.

Good luck. The scanner manufacturers have tried to jump on the coattails of the PostScript bandwagon by inventing a scanner language and calling it PreScript, but from what I've seen it is no relation at all and wasn't worth mentioning except for the cute name.

So far the scanner industry seems to be taking a severe split -- the guys who give you bits and, incidentally, here's a floppy for your FeeCees that might turn that into ascii, sort of; and the folks who are really working on read-anything scanners. Don't expect anything in the way of real character recognition from the cheapies.

> I did take a page of dot-matrix print in to one outfit selling a $2k 300 dpi
> scanner with OCR software for the IBM-PC line, but there were several problems
> 1) The spacing on my sample input overwhelmed the fixed spacing OCR
> software available then.
> 2) No interface nor software was available for the Macintosh.
> 3) Operation seemed awkward and not at all intuitive.
> 4) Software costs took the package price up to around $3k.
> 5) The best recognition rates seemed to be only 95-98% correct.
>
> I am told that a $36k Kurzweil multi-font scanner will do just about
> everything I want.

Since you're in San Francisco you can easily find out. Go downtown to the Krishna Copy Center and buy an hour or two's time on the Kurzweil. They can give you the data on Mac disks, IBM disks, or by modem.

I tried to scan in the draft ANSI C Standard a few months ago on that machine, and while I am not an experienced operator, it had too many troubles to be useful.
It made 10-20 mistakes per page on the best of pages (on multi-font typeset text, probably offset printed) and in many cases it would totally garble a line for no reason, while reading the preceding and following lines without trouble. As it was, it's faster to just type the page yourself (or hire somebody who types 90-100 wpm to do it) than to try to find and fix all the mistakes the scanner makes.

--
Copyright 1987 John Gilmore; you may redistribute only if your recipients may.
(This is an effort to bend Stargate to work with Usenet, not against it.)
{sun,ptsfa,lll-crg,ihnp4,ucbvax}!hoptoad!gnu   gnu@ingres.berkeley.edu
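[Editorial aside] The two error figures in this thread are stated in different units: a recognition rate of 95-98% correct, and 10-20 mistakes per page. The sketch below converts between them. The characters-per-page figure is an assumption made for illustration only; nobody in the thread gives one, and the function names are likewise hypothetical.

```python
# Assumed figure, NOT from the thread: a dense typeset page holds
# roughly 3,000 characters.
CHARS_PER_PAGE = 3000


def errors_per_page(accuracy, chars_per_page=CHARS_PER_PAGE):
    """Expected number of wrong characters on one page."""
    return round((1.0 - accuracy) * chars_per_page)


def accuracy_from_errors(errors, chars_per_page=CHARS_PER_PAGE):
    """Character accuracy implied by a per-page error count."""
    return 1.0 - errors / chars_per_page


# The 95-98% recognition rates, expressed as errors per page.
for acc in (0.95, 0.98):
    print("%.0f%% correct: about %d errors per page"
          % (acc * 100, errors_per_page(acc)))

# The 10-20 mistakes per page, expressed as character accuracy.
for errs in (10, 20):
    print("%d errors per page: about %.2f%% correct"
          % (errs, accuracy_from_errors(errs) * 100))
```

Under that assumption, 10-20 mistakes per page is already better than 99% character accuracy, which may explain why a 95-98% rate reads as "only": every one of those errors still has to be found and fixed by hand.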
chuq%plaid@Sun.COM (Chuq Von Rospach) (05/19/87)
Date: Tue, 19 May 87 09:53:50 CDT
From: James Peterson <peterson@MCC.COM>

> From: hoptoad!gnu@ucbvax.Berkeley.EDU (John Gilmore)
> I tried to scan in the draft ANSI C Standard a few months ago on that
> machine, and while I am not an experienced operator, it had too many
> troubles to be useful.

We have a Palantir that I have been using for several months to see how well it works. It is a 300 dpi scanner with built-in ASCII conversion. It comes complete with software to run on a SUN, but I found it easier to write my own programs to interpret the scanner output than to use theirs (personal taste -- their programs run under SunWindows, and I don't).

Overall I've found that they do a pretty good job on most input, but that the input that I want to scan is close to the margin of acceptable input -- tables tend to be too small or on poor contrast paper or ...

I can scan, for example, the Zip code directory for Austin in about an hour, but it then takes me two weeks of evening work to format it and correct the scanning errors.

So far I have only scanned stuff that should have built-in redundancy that I can check by program. For example, with the Zip codes, all scanned zip codes should be in a small range of legal values, the street names should all be alphabetic, and in order. And so on. This allows me to catch a lot of scanner errors without having to read and compare every entry. It also tends to expose errors in the printed input -- no multi-page reference table that I have scanned has been without printed errors.
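[Editorial aside] The redundancy checks described in this message (zip codes within a legal range, street names alphabetic and in sorted order) can be sketched roughly as follows. This is a minimal illustration, not the poster's actual program: the record layout, the zip range, the function name, and the sample entries are all assumptions.

```python
import re

# Hypothetical range of legal zip codes for the directory being scanned.
ZIP_MIN, ZIP_MAX = 78700, 78799


def check_records(records, zip_min=ZIP_MIN, zip_max=ZIP_MAX):
    """Return a list of (index, problem) pairs for suspicious records.

    Each record is a (street_name, zip_code) pair of strings, assumed
    to be what comes out of the scanner after basic formatting.
    """
    problems = []
    previous = None
    for i, (street, zip_code) in enumerate(records):
        # Zip codes must be five digits and inside the legal range.
        if not re.fullmatch(r"\d{5}", zip_code):
            problems.append((i, "zip not five digits: %r" % zip_code))
        elif not zip_min <= int(zip_code) <= zip_max:
            problems.append((i, "zip out of range: %r" % zip_code))
        # Street names should contain only letters and spaces.
        if not re.fullmatch(r"[A-Za-z ]+", street):
            problems.append((i, "non-alphabetic street: %r" % street))
        # The printed table is sorted, so no name should sort before
        # the entry directly above it.
        key = street.upper()
        if previous is not None and key < previous:
            problems.append((i, "out of order: %r" % street))
        previous = key
    return problems


# Made-up sample entries showing typical scanner confusions.
sample = [
    ("ABBEY LANE", "78704"),
    ("ACORN C0VE", "78744"),    # zero scanned in place of the letter O
    ("ADAMS AVENUE", "7875l"),  # letter l scanned in place of the digit 1
    ("AARON DRIVE", "78753"),   # sorts before the previous entry
    ("BAKER STREET", "98753"),  # outside the legal range
]

for index, problem in check_records(sample):
    print(index, problem)
```

Checks like these only flag entries for a human to look at. An entry that fails may be a scanner error or, as the message notes, an error in the printed original.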
chuq%plaid@Sun.COM (Chuq Von Rospach) (05/29/87)
From: seismo!sun!cwruecmp!rbl%nitrex.uucp@RUTGERS.EDU ( Dr. Robin Lake )
Date: 26 May 87 13:04:38 GMT
Organization: The Standard Oil Co., Cleveland

>From: rbl@nitrex.UUCP ( Dr. Robin Lake )
>Date: 18 May 87 13:02:49 GMT
>We have a 4-year-old Kurzweil. It has not been able to handle dot
>matrix well, but we have not tried to fiddle with the threshold settings,
>etc. to make it do so. We are looking at a new system made by Palantir
>this Thursday. We'll give dot matrix a try and let you know.

We did look at the Palantir, but did not test dot matrix as one of our "clients" had an 87 page copy of a typewritten document they wanted scanned. Palantir ran about 30 seconds per page. With no tune-up ("showroom stock") it picked up every mark on the page, missed some blended letters, read 0 as o and completely satisfied the "client".

I plan to run the same 87 pages thru the Kurzweil, with and without tuning. It may take 2-3 weeks, so stay tuned!

THIS IS NOT AN ENDORSEMENT OR CRITICISM OF ANY PRODUCT!!

"One Test is Worth a Thousand Expert Opinions"
The Riehle Axiom.

Rob Lake

----------------------------------------
Chuq Von Rospach   chuq@sun.COM   Delphi: CHUQ
Now, where did my ex-wife put my Fairy Dust?