[comp.sys.mac] Advanced OCR software for HP Scan-Jet Plus

ariel@bimacs.BITNET (Ariel J. Frank) (12/13/89)

Hello netland.

I need your collective advice on advanced OCR software for an HP
Scan-Jet Plus we are getting soon, with interfaces to both a PC and a
MAC. It comes with Scanning Gallery Plus software, but we need
something more advanced/sophisticated/intelligent/learning etc... What
do you recommend? Is there something good that runs on both a PC and
a MAC? Any experiences, warnings etc...? Is there any support for tough
languages such as Hebrew? Any recommendations for Read-It!, ReadStar II
(Plus), ReadRight, or anything else?

Any info, facts, advice, or insight will be appreciated. Please E-mail,
and I will summarize if there is enough interest. Thanks in advance, Ariel.

ariel@bimacs.BITNET (Ariel J. Frank) (12/21/89)

I got only a few answers this round, and I had some from before. Here is a
(slightly edited) summary. Thanks to all who replied. Ariel.

-------------------------------------------------------------------------------

From: ath@prosys (Anders Thulin)

Here's the summary (slightly updated as regards Recognita) from a
similar question I asked some time ago.  As you see, I didn't get much
response.

A rather disappointing result: only two responses. Hardly worth a
summary, I'd say.  Hope it can stimulate further investigations,
though.

Summary of question about IBM PC OCR software:
----------------------------------------------

* Received comments:

+ Paragor, Recognita:

> I have used several different packages, but I have not yet found anything
> that could really be used in Iceland. The problem has to do with all
> our accented characters. For example, most of the packages would be confused
> by "accented-i", taking it to be an "i" or an "l". This applied equally
> well to fixed-font programs like Paragor (I think that is how it was
> spelled) and "trainable" programs. One program proved to be a lot better
> than the rest: a Hungarian program named "Recognita".

  frisk@rhi.hi.is

+ Omnipage:

> I was told that this software can recognize the 10 Western
> languages, but I don't know if Swedish is included.
> The price is US-$ 1995 for an AT (80286); it's also available for a MAC,
> for which it is much cheaper.
> Friends told me that they are quite content with the results; they've
> tested it with some pages of the Bible, which worked out pretty well.

> It's available in 3 versions:
> MAC for US-$ 795.00
> MS-DOS for XT or AT up to 80286 US-$ 1995.00 (it comes with an additional
>       board for a long slot. That includes 2MB RAM and a coprocessor)
> 80386 for US-$ 895.00

  reischl@siecomp.UUCP

* Reviews

  No reviews were reported or found.

  The PC Omnipage package is probably close to the Mac Omnipage.
  Reviews for the Mac version can be found in:

  MacUser, February 1989
    reviews 8 OCR packages for Mac, one of which is Omnipage

  Personal Computer World (England), June 1989
    compares Omnipage and TextPert.

* Other software & comments:
  This list is based on magazine ads and info from a few Swedish retailers.
  I don't have any experience with any of these products.

- Recognita, Recognita Plus:

  Based on contour analysis of letters. Trainable.  Reads 6-24 point
  glyphs (300 dots/inch). Can read uncompressed TIFF files. 99.9% hit
  rate claimed (for vendor-provided test sheet). Can use EMS.

- OCR Systems: ReadRight

  Template matching.  Untrainable (?).

- Flagstaff Engineering: SPOT

  Trainable (?)
--
Anders Thulin, Programsystem AB, Teknikringen 2A, S-583 30 Linkoping, Sweden
ath@prosys.se   {uunet,mcsun}!sunic!prosys!ath

-------------------------------------------------------------------------------

From: "Michael W. Picher" <PICHER@MAINE>

I have a ScanJet Plus with Scangal and ReadRight software.  The ReadRight
software is not very intelligent.  It does a pretty good job with
normal sized courier type but most anything else you had better forget.
I do believe there is an add-in board from a company that Xerox owns
which is supposed to be excellent (the name escapes me however).
Good luck!

Mike

    Michael W. Picher,    /     **    ** MicroLab / LexIkon Microsystems
      Vice President     /     **    **         333 Water Street
                        /     ***   **        Augusta, Maine 04330
   Picher@Maine.Bitnet /    **  ****             (207) 623-4012
                      /   ** Consultants and PC Compatible Manufacturers
-------------------------------------------------------------------------------

From: Mahlzeit! <siecomp!reischl%harvard@harvunxw.BITNET>

Just recently I recommended the same to another reader of this group.
You might wanna look through his summary, which he published a couple
of days ago.  Anyway, my recommendation is "OMNIPAGE", which is
available for a MAC ($ 795.-), an 80286 ($ 1995.-, it comes with an
additional board) and an 80386 ($ 895.-).  The manufacturer can give
you more information and distributor info: Caere Corporation, 100
Cooper Court, Los Gatos, CA 95030; call toll free (800) 535-SCAN.

This product recognizes the 10 Western character sets (which probably
EXcludes Hebrew), and it is pretty reliable.  A demo version for the
80386 is available, with the "save" options removed.  Have fun!

Wolfgang Reischl      (reischl@siecomp.UUCP) Siemens Components Inc.
                                             2191 Laurelwood Rd.
                                             Santa Clara, CA 95054

-------------------------------------------------------------------------------

From: simon <simon%ALBERTA.UUCP@UALTAMTS>

We have several HP Scan-Jet Pluses and AccuText from Xerox.  The HP is
great, and was a good buy to boot.  AccuText is supposed to be one of
the better OCR (or ICR, as they call it) packages.  It works well with
printed type typically found in magazines and newspapers, even at small
type sizes.  Its drawbacks are with non-prose text, e.g., when scanning
a program listing, even when the source is perfectly printed (e.g.,
Numerical Recipes), or when the type is uneven, as when processing the
scan of a sheet typed on a typewriter (I got complete junk).  My
complaints to Xerox were treated as "some people are never satisfied",
with recommendations that I get more even text.  If the text is not
printed according to specifications, it is often quicker to type it in
oneself.  The technology still has some way to go, but in some instances
it does work very well.

Good luck.
-------------------
W. Simon Tortike,                         | tel    : 403/492-3338
Dept of Mining, Metallurgical             | fax    : 403/492-7219
      and Petroleum Engineering,          | CDNnet : simon@cs.UAlberta.CA
University of Alberta,                    | uucp   : simon@alberta.uucp
Edmonton, AB, CANADA T6G 2G6.             |

-------------------------------------------------------------------------------

From: C. H. Enchurle <chenhu@silver.bacs.indiana.edu>

        I have used an HP Scan-Jet with an AT&T 6386 in my office.  The
software I used was ReadRight OCR v1.3 and Flagstaff SPOT v2.1.
I am not very satisfied with these two packages, but they can do a
good job sometimes.  One thing they have in common is that they are
cheap, around $200-$300.  I was told that both have new versions
available now, but I don't know the prices for the new ones.
--chc

-------------------------------------------------------------------------------

From: yuan@uhccux.uhcc.hawaii.edu (Yuan 'Hacker' Chang)

        ReadRight can only read fixed-space fonts of around 11-13
points.  Pica and Courier seem to work fine, and maybe Roman (such as
the NLQ type from Epson FX printers).  For a program that can read
proportionally spaced fonts, Flagstaff Engineering's SPOT is a good
choice.  It'll "learn" whatever font you have (sufficiently dark, of
course).  The only problem is that it's a bit on the pricey side (~
$1,000).  So it all depends on what your needs are.

        Somebody disagreed with me about the capability of ReadRight
to read proportionally spaced fonts, so I'll try to elaborate more on
that.  According to the manual, ReadRight "can read a range of point
sizes and pitches -- 6 to 12 points and 10 to 15 pitch".  Your mileage
may vary here.  And here's a list of fonts that ReadRight supports:

        MONOSPACED                  PROPORTIONALLY SPACED
        Courier                     bold
        Pica                        Cubic
        Elite                       Roman/Madeleine
        Prestige Pica               Title
        Prestige Elite              Modern
        Letter Gothic               Thesis
        OCR-B                       Theme
        Bookface Academic           Arcadia
        Prestige Renown/Style       Gothic/Victory
                                    Majestic
-----
Problem is that most of your laser-printed fonts aren't supported.  The
manual states:

        "Laser printer fonts, such as Hewlett-Packard LaserJet and
        LaserJet+ fonts and Canon fonts, are supported as long as
        they are very similar to fonts included in the tables above.
        Examples include Courier, Elite, pica, and Letter Gothic.  Fonts
        similar to typeset fonts -- Helvetica, Times Roman, and others --
        are not supported at present but will be in the future."

I apologize if anybody was led to believe that ReadRight is totally
incapable of reading ANY proportionally spaced fonts.  If you want to
read laser-printed fonts such as Times Roman and Helvetica, though, you
are still out of luck.

Yuan Chang                                       "What can go wrong, did"
UUCP:      {uunet,ucbvax,dcdwest}!ucsd!nosc!uhccux!yuan
ARPA:           uhccux!yuan@nosc.MIL               "Wouldn't you like to
INTERNET:  yuan@uhccux.UHCC.HAWAII.EDU         be an _A_m_i_g_o_i_d too?!?"

-------------------------------------------------------------------------------

From: mitisft!dold@news.think.com

SPOT and other OCR and Document Translation programs come from
Flagstaff Engineering,
Flagstaff, Arizona
(602) 779-3341

-------------------------------------------------------------------------------

From: pete@octopus.UUCP (Pete Holzmann)

Calera Recognition Systems has come out with a Kurzweil-beating system for
a *LOT* less money: $2500 list for the 1-2 min/page version, $3500 for one
that is twice as fast and recognizes landscape-mode text. This does not
include the scanner, PC, or auto paper feeder. On the other hand, it
does more than even the latest $13000 Kurzweil scanner does:

        - auto column, table, figure recognition
        - auto recognition of all type 6-28 point, bold, italic, underlined.
        - *completely* formats text into the WP format of your choice,
                including columns, etc. Puts non-text stuff (whatever
                remains after recognition) into the graphics file format
                of your choice.

    With this thing, you really can slap a sheet of paper onto your scanner,
    and have a completely formatted WP document a minute later. No correcting
    needed. IT WORKS.

    Doesn't include a scanner 'cuz they work with any popular scanner. Or
    with a graphics file. Or a fax file (fax->wp is a nice idea, for those
    who use fax!)

Pete
--
  OOO   __| ___      Peter Holzmann, Octopus Enterprises
 OOOOOOO___/ _______ USPS: 19611 La Mar Court, Cupertino, CA 95014
  OOOOO \___/        UUCP: {hpda,pyramid}!octopus!pete
___| \_____          Phone: 408/996-7746


-------------------------------------------------------------------------------

--
    Ariel J. Frank
    Deputy Chairperson, Dept. of Mathematics and Computer Science
    Bar Ilan University, Ramat Gan, Israel 52100
    Tel: (972-3-) 5318407/8, Fax: (972-3-) 344766
    BITNET:   ariel@bimacs (also F68388@barilan)
    INTERNET: ariel@bimacs.biu.ac.il
    ARPA:     ariel%bimacs.bitnet@cunyvm.cuny.edu
    CSNET:    ariel%bimacs.bitnet%cunyvm.cuny.edu@csnet-relay
    UUCP:     ...uunet!mcvax!humus!bimacs!ariel

john@hpuorfa.HP.COM (John Scott) (01/04/90)

< I need your collective advice on advanced OCR software for an HP
< Scan-Jet Plus we are getting soon with interfaces both to a PC and a
< MAC. It comes with Scanning Gallery Plus software but we need
< something more advanced/sophisticated/intelligent/learning etc... What
< do you recommend?  Is there something good that runs both on a PC and
< a MAC? Any experiences, warnings etc... Any support for tough languages
< (Hebrew?). Any recommendations for Read-It!, ReadStar II (Plus),
< ReadRight, what else?
  
  Yes, and all the above.
  The package you want to look at is Omnipage.  It recognizes 11 European
  character sets and runs on PCs and MACs.  The only warning is that the
  version that runs on anything less than a 386 requires a special board
  that boosts the price by about $1,000.  It normally sells for $895.
  It is a package that we are endorsing in the Southern Sales Region.
   
					  Caere Corporation
					  100 Cooper Court
					  Los Gatos, CA 95030
					  1-800-535-SCAN


   John Scott
   Dealer Support Consultant
   Hewlett Packard 
   Southern Sales Region