npn@cbnewsl.att.com (nils-peter.nelson) (03/22/91)
Partly for amusement but mainly to test some new technology and software I did the following:

1. Scanned a random page of Swedish text on a 400 dpi Fujitsu scanner.
2. Ran it through some new optical character recognition software from Bell Labs research.
3. Ran the ASCII through de-hyphenation code to create unformatted, unfilled text.
4. Ran the ASCII through a new filter for creating ISO 8859-1 files from escaped ASCII.
5. Ran that through the DWB 3.2 troff.

I then cheated slightly and threw in 5 or 6 lines of troff to get 2 column output and one drop cap.

The resulting PostScript output looks amazingly like the original input, except that the hyphenation is all different: I used the standard English hyphenation rules. There were the usual number of OCR mis-recognitions, but there are also entire sentences that are flawless. I found it all pretty amazing.

Here's a sample line from the ASCII troff file:

Det \(a"r 60 \(a*r sedan detta skrevs \- och sedan Bj\(a"re h\(a"rads hembygdsf\(o"rening grundades. F\(o"reningen har v\(a"l
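
For readers who want to see what the escapes in that sample line stand for, here is a rough Python sketch of an escaped-ASCII to ISO 8859-1 conversion. It is not the Bell Labs filter, whose internals are not described here. The three-entry table is an assumption inferred only from the Swedish words in the sample (\(a" as ä, \(a* as å, \(o" as ö), and rendering \- as a plain hyphen is likewise an assumption. The names ESCAPES and decode_escapes are made up for illustration.

```python
import re

# Assumed mapping, inferred only from the sample line in the post;
# the real filter's table is not given there.
ESCAPES = {
    'a"': "ä",  # a with diaeresis
    "a*": "å",  # a with ring
    'o"': "ö",  # o with diaeresis
}

# A troff-style special character: backslash, "(", then a two-character name.
ESCAPE_RE = re.compile(r"\\\((..)")


def decode_escapes(line: str) -> str:
    """Replace known \\(xx escapes; leave unknown ones untouched."""
    def repl(match):
        return ESCAPES.get(match.group(1), match.group(0))

    text = ESCAPE_RE.sub(repl, line)
    # Assumption: show troff's \- (minus sign) as an ordinary hyphen.
    return text.replace("\\-", "-")


sample = (
    r'Det \(a"r 60 \(a*r sedan detta skrevs \- och sedan '
    r'Bj\(a"re h\(a"rads hembygdsf\(o"rening grundades.'
)

decoded = decode_escapes(sample)
latin1 = decoded.encode("iso-8859-1")  # one byte per character

print(decoded)
```

Unknown escapes pass through unchanged, so a name missing from the table stays visible in the output rather than being silently dropped.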