[comp.text] paper to troff to paper in Swedish

npn@cbnewsl.att.com (nils-peter.nelson) (03/22/91)

Partly for amusement, but mainly to test some new technology
and software, I did the following:
1. Scanned a random page of Swedish text on a 400 dpi
Fujitsu scanner.
2. Ran it through some new optical character recognition
software from Bell Labs research.
3. Ran the ASCII through de-hyphenation code to create
unformatted, unfilled text.
4. Ran the ASCII through a new filter for creating ISO 8859-1
files from escaped ASCII.
5. Ran that through DWB 3.2 troff.
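
Step 3 can be sketched roughly as below. This is only a guess at the
idea, not the actual de-hyphenation code, which is not shown here; the
function name dehyphenate and the rule "a trailing hyphen after a letter
means a broken word" are assumptions made for illustration.

```python
def dehyphenate(lines):
    """Join words that were split by a hyphen at the end of a line.

    Assumption: a line ending in '-' directly after a letter is a
    broken word. A real tool would likely consult a word list, since
    this rule also joins genuine compounds that happen to end a line.
    """
    out = []
    carry = ""  # text held over from a line that ended in a hyphen
    for line in lines:
        line = carry + line.strip()
        carry = ""
        if len(line) > 1 and line.endswith("-") and line[-2].isalpha():
            # Drop the hyphen and glue the next line onto this one.
            carry = line[:-1]
        else:
            out.append(line)
    if carry:
        # The input ended in mid-word; put the hyphen back.
        out.append(carry + "-")
    return out
```

Given the lines "hembygds-" and "forening grundades.", this returns the
single line "hembygdsforening grundades.". Line lengths no longer matter
at this stage, since troff refills the text later.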

I then cheated slightly and threw in 5 or 6 lines of troff
to get two-column output and one drop cap.
The resulting PostScript output looks amazingly like the
original input, except that the hyphenation is all different,
because I used the standard English hyphenation rules.
There were the usual number of OCR misrecognitions, but
there were also entire sentences that were flawless.
I found it all pretty amazing.
Here's a sample line from the ASCII troff file:
Det \(a"r 60 \(a*r sedan detta skrevs \- och sedan Bj\(a"re h\(a"rads hembygdsf\(o"rening grundades. F\(o"reningen har v\(a"l
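
The escapes in that sample line hint at what the step 4 filter has to do.
The sketch below is an assumption-laden illustration, not the real
filter: the table covers only the three escapes visible in the sample
(a" for a-umlaut, a* for a-ring, o" for o-umlaut), and the names ESCAPES
and escaped_to_latin1 are made up here.

```python
import re

# Mapping inferred from the sample line only; the real filter's table
# is not given in the post and is presumably much larger.
ESCAPES = {
    'a"': "\u00e4",  # a with umlaut
    "a*": "\u00e5",  # a with ring
    'o"': "\u00f6",  # o with umlaut
}

# A backslash, an open parenthesis, then a two-character name.
_ESCAPE_RE = re.compile(r'\\\((..)')


def escaped_to_latin1(text):
    """Replace known two-character escapes and return ISO 8859-1 bytes."""
    def repl(match):
        # Unknown names are left untouched so troff can still see them.
        return ESCAPES.get(match.group(1), match.group(0))

    return _ESCAPE_RE.sub(repl, text).encode("iso-8859-1")
```

Applied to the start of the sample line, 'Det \(a"r 60 \(a*r sedan', this
yields the bytes b'Det \xe4r 60 \xe5r sedan'. Other troff escapes such
as \- pass through unchanged.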