ruslan@uncecs.edu (Robin C. LaPasha) (01/11/90)
Replies to my request for scanners and software that can handle
Cyrillic text:

[My comments -as here- will be in brackets and with my initials;
replies are separated by dashed lines - RL]

----------------------------------
Date: Thu, 04 Jan 90 16:27:41
From: Malcolm Brown <mbb@jessica.Stanford.EDU>

Robin,

here at Stanford, we've had some luck scanning Cyrillic using a
Kurzweil Model 4000. Kurzweil doesn't make this anymore; it was
the last of their stand-alone models. Anything less than a
trainable Kurzweil system would have a chance, I'd think.
[_would_ or _would NOT_? - RL]

Incidentally, we even got the Kurzweil to scan a Russian/English
dictionary. Here we had to contend not only with different
alphabets but different fonts as well. It was slow going, but it
seemed a bit more efficient than keyboarding.

Hence I'd recommend you check out the new, top-of-the-line
Kurzweil systems (the 5100, I believe).

good luck

Malcolm Brown
Stanford University

---------------------------------
Date: 4 Jan 90 16:42:58 PST (Thu)
From: gatech!mailrus!ames!elroy!gryphon.COM!richard@mcnc.org (Richard Sexton)

Hi Robin;

I think you need to talk to Flagstaff Engineering in Flagstaff,
Arizona. They claim to be able to scan this stuff. I don't have
an address, but they advertise regularly in Byte.

How the hell are ya anyway?

[Fine, Richard. How're the Drosophilae? ;^) - RL]

-----------------------------------------
Date: Fri, 05 Jan 90 06:39:20 EST
From: "Donald Parsons dfp10@uacsc2.albany.edu" <DFP10@ALBNYVM1.UUCP>

Contact Dimitri Vulis at dlv@cunyvms1 - he saw a fairly successful
demonstration of a particular OCR program working with a standard
flat bed scanner. He is owner of the RUSTEX-l@ubvm list, which
concentrates on TeX Cyrillic fonts (and PC fonts and telecom to
USSR).
--------------------------------
Date: Fri, 5 Jan 90 15:28 EDT
From: <DLV%CUNYVMS1.BITNET@CUNYVM.CUNY.EDU>

A few months ago I spent a few hours playing with a flat bed HP
scanner connected to an MS-DOS computer running Spot OCR software
from Flagstaff Engineering (602-779-3341). The software was very
very good at reading Russian text produced on an HP LJ using TeX
and WNCYR fonts (upright and italic) with some English words mixed
in. (I came across it unexpectedly at a trade show and I just
happened to have a business letter that I was about to mail to
Moscow; had I been prepared, I would have tried other kinds of
text).

It's trainable and it will remember the letters the next time you
use it (it remembers several typefaces). The output is saved in a
file and a little programming would be needed to convert the file
into whatever scheme you use to store Cyrillic (e.g. the software
is not smart enough to understand that 'A' is sometimes Latin A
and sometimes Russian A).

The only 2 problems that I recall were
1) it sometimes (not most of the time) choked on multi-part
   characters (i kratkoe, y),
2) it sometimes was confused by kerning (which WNCYR has more of
   than most typefaces).

Unfortunately, I was asked $850 for the software (and the list
price is even more), which is a bit more than I can afford. I
tried to convince my school to pay for it, but we're really short
of funds (this is not an election year and the financial situation
is even worse than it was last year, although it's not publicized
as much).

They said for OCR applications you need a flat bed scanner; hand
scanners just aren't adequate.

If it does become likely that we'll be able to afford OCR software
(we already have a scanner, but no OCR software for it), I'll also
test CAT software and a few others that I've heard are good before
choosing one.

I'd appreciate hearing about your experiences and/or replies to
your question.
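
The "little programming" mentioned above (telling Latin 'A' from
Russian 'A') might look roughly like the sketch below. It is only
an illustration and does not reflect SPOT's actual output format:
it assumes the OCR output has already been decoded to Unicode text
with lookalike letters emitted as Latin, and the lookalike table
and function names are invented for this example.

```python
# Hypothetical post-processing sketch; not tied to any real OCR package.
# Latin letters that are visually identical to Cyrillic ones.
LATIN_LOOKALIKES = "ABCEHKMOPTXaceopxy"
# The matching Cyrillic letters, written as escapes because the
# glyphs cannot be told apart by eye in source code.
CYRILLIC_LOOKALIKES = (
    "\u0410\u0412\u0421\u0415\u041d\u041a\u041c\u041e\u0420\u0422\u0425"
    "\u0430\u0441\u0435\u043e\u0440\u0445\u0443"
)
LATIN_TO_CYRILLIC = str.maketrans(LATIN_LOOKALIKES, CYRILLIC_LOOKALIKES)


def is_cyrillic(ch):
    """True if ch lies in the basic Unicode Cyrillic block."""
    return "\u0400" <= ch <= "\u04ff"


def fix_word(word):
    """Assumed heuristic: if a word contains at least one unambiguous
    Cyrillic letter, treat its Latin lookalikes as Cyrillic too."""
    if any(is_cyrillic(ch) for ch in word):
        return word.translate(LATIN_TO_CYRILLIC)
    return word


def fix_text(text):
    """Apply fix_word to each space-separated word."""
    return " ".join(fix_word(w) for w in text.split(" "))
```

A word made up entirely of lookalike letters cannot be resolved by
this heuristic and is left as Latin; deciding those cases would
need a word list or the surrounding context.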
[This next piece is from later mail by the same correspondent - RL]

If Duke folks can afford it, then an HP flatbed scanner + SPOT
software look like a very good choice; but advise them not to get
a hand scanner or one that pulls a sheet thru, like a fax machine.
CAT software also sounds very very good from the reviews, but I've
never tried it.

Dimitri Vulis
Department of Mathematics
City University of New York Graduate Center

------------------------------------------
Date: Sat, 6 Jan 90 15:34:21 gmt
From: frisk@rhi.hi.is (Fridrik Skulason)
Organization: University of Iceland (RHI)

I have seen a Hungarian product called 'Recognita' that should be
able to handle this. It runs on an AT-class machine with a 300 (or
more) dpi scanner.

I can find the address of the manufacturer if you want.

-frisk

----------------------------------------
Date: Sat, 6 Jan 90 20:38:00 -0600
From: David Boyes <dboyes@brazos.rice.edu>

I use AccuText on the Mac and OCR Plus on the PC (both done by
Xerox) to do classical Greek and Anglo-Saxon text. AccuText has a
Cyrillic configuration, but I've never used it. We scan the stuff,
create MS Word files on both machines and then feed them into our
structured archive system. We get about 90% accuracy on the Greek
texts (thank god -- they're a hassle to proof) and 92-95% accuracy
on the A-S texts. These packages are by the same programmers that
do the Kurzweil firmware. It's nice stuff.

[I import the texts into] MS Word on the PC, [use] the Adobe
downloadable fonts for Greek, and a PostScript printer. Works
great for me.

-----------------------------------
From @sunic.sunet.se:ath@prosys Sun Jan 7 04:14:22 1990
From: ath@prosys.UUCP (Anders Thulin)
Organization: Programsystem AB, Teknikringen 2A, S-583 30 Linkoping, Sweden

The Kurzweil OCR system seems to have a rather good reputation in
general. Its main drawback is the price.
Calera Recognition Systems (sorry - have no address) have produced
something of a Kurzweil killer - at least as good as K. but
cheaper.

In the PC world there are at least two packages that may be usable
here: Xerox Accutext and Recognita.

Accutext is claimed to be based on some kind of AI techniques. It
is trainable, although I cannot swear that it can deal with
non-Latin character sets. Some people have had trouble scanning
program listings with it - ordinary text seems OK.

Recognita is the only non-US OCR software I know of for the PC. It
is from a Hungarian company (Szyk?) and seems to be very good at
accents - reports indicate that it is able to handle both Swedish
and Icelandic better than most other comparable systems. I
wouldn't at all be surprised to learn that it is good at Russian.

I don't know what methods Accutext uses for character recognition
- Recognita, however, uses some form of contour-based method. It
should be possible to make it learn Russian.

There is one other European OCR package, but it is for the Mac:
Textpert. It is trainable, but some reviews seem to indicate that
it is somewhat complicated to train. In one review (in Personal
Computer World - a British magazine) Textpert was matched against
Omnipage. Omnipage 'won'. Again, I don't know what method of
recognition it uses.

I'm sorry that I don't have any hands-on experience with these - I
am hoping my dealer can give me a demo of Recognita soon, as I am
trying to find software that would permit me to scan Swedish
texts.

Hope this is of some help,

--
Anders Thulin    ath@prosys.se    {uunet,mcsun}!sunic!prosys!ath
Telelogic Programsystem AB, Teknikringen 2B, S-583 30 Linkoping, Sweden

-----------------------------------------
Date: Mon, 8 Jan 90 13:56:05 EST
From: doering@Kodak.COM (Paul Doering)

Hi, Robin. I saw your inquiry in comp.periphs.
Here is a bit of info paraphrased from a recent Electronic
Engineering Times --

Hungarian state-owned company SzKI offers its Recognita Plus
software for PC's: OCR on any Roman- or Greek-based alphabet
without preliminary training. Handles material from typesetters,
typewriters; daisywheel, NLQ dot-matrix, or laser printers. Uses a
feature-analysis recognition algorithm for all popular fonts.
Training procedure is available if needed.

I wish I could tell you how to contact SzKI. If I were interested
enough, I'd call the Hungarian embassy in Washington.

SzKI claims to be the world's leader in OCR. I'd feel better about
that claim if I had ever heard of them before. Still, here's a
lead, and you can handle it as you wish.... Good luck

Paul Doering; Kodak Research, Rochester NY. doering@kodak.com

-----------------------------------
Date: Sun, 7 Jan 90 10:29:10 EST
From: djb@wjh12.harvard.edu (David J. Birnbaum)

[Note - I'm combining several email messages from this
correspondent - RL]

Robin,

I am preparing an article for the Bits & Bytes Review on OCR for
non-Latin texts. As a Slavist who works with a variety of modern
and medieval materials (he alliterated), I am particularly
interested in Cyrillic.

++++++++++++++++++++++++++++
[Quoting previous email]
>I have gotten a lot more response than I would have expected, so
>I will be supplying a summary.

I'll look forward to it. I have considerable practical experience
with Datacopy and SPOT and I'll be looking into Textpert (soon to
be released for MS-DOS) and Kurzweil for my article in the Bits &
Bytes Review. If there is anything else in your summary that looks
promising, I'll try to include it as well.

+++++++++++++++++++++++
Still patiently looking forward to news of OCR for Cyrillic. I got
a call yesterday from someone in Kansas (whose name I promptly
misplaced), who said that he's about to post a summary of similar
information to Humanist.
[Excerpt from an earlier message: "the Humanist Listserv run
(until recently) by Willard McCarty at UToronto. If you don't
currently subscribe to Humanist, you might find it interesting;
their mission is to provide a forum for computing in the
humanities." - RL]

He has access to USENET and I told him to join comp.fonts and
watch for your posting.

Cheers,
David

-----------------------------

That's all for now! Thanks for all the input. I'm amazed at what's
out there; we have both hardware and software choices that just
didn't exist 5 years ago (the last time I asked anyone about it).
If I get more details about these products or info about products
not mentioned, I'll post later with a followup. - RL

Summary -

Hardware: if you're not going with a Kurzweil (turnkey?) system,
          hook a flatbed scanner up to a Mac or PC.

Software:
  PC:   SPOT (Flagstaff Engineering)
        CAT
        Recognita [plus] (SzKI)
        Datacopy
        OCR Plus (Xerox)
        (something by) (Calera Recognition Systems)
        [in development: Textpert]
        Kurzweil program...?
  Mac:  Accutext (Xerox)
        Textpert
        OmniPage

--
=-=-=-=-=-=-=-
Robin LaPasha            |Deep-Six your
ruslan@ecsvax.uncecs.edu |files with VI!  ;^) ;^) ;^)