shiva@well.sf.ca.us (Kenneth Porter) (02/23/91)
A perennial question here is how to translate PostScript to ASCII. It occurred to me that if you have OCR software and a PostScript-to-bitmap converter like GoScript, you could combine the two to get a good approximation of the original document's text. Don't print the PostScript and scan the page back in: that intermediate step degrades the image and loses the regularity of the character bitmaps that a direct PostScript-to-bitmap conversion gives you.

Ken (shiva@well.sf.ca.us)
jwz@lucid.com (Jamie Zawinski) (02/25/91)
Ok, I've been meaning to write this for a while... Sort of the canonical 'sed' program. Put this in your .cshrc file:

    alias unps \(sed \
      \''s/^[ \t]*[^()]*$//g;s/%.*$//g;s/^[^(]*(//g;s/)[^(]*(/ /g;s/)[^)]*$//g;'\' \
      \| tr '\\012' '\\040' \| tr -s '\\040' '\\040' \; echo \'\'\)

and then "unps < somefile" will extract all of the PostScript strings from "somefile." There will be no newlines, and all occurrences of multiple spaces will be collapsed to a single space, so you'll need to wrap the lines by your favorite method.

-- Jamie
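For anyone without csh handy, here is a rough Python sketch of the same idea: pull out the contents of the parenthesised PostScript strings, skip % comments, and join everything with single spaces. It is not a translation of the sed script. The function name extract_ps_strings is made up for this example, and unlike the one-liner it tracks nested parentheses and backslash-escaped parens. It is still only an approximation: octal escapes, hex strings, and spacing implied by kerning or positioning are all ignored.

```python
def extract_ps_strings(source: str) -> str:
    """Return the contents of all (...) strings in source, space-separated.

    Hypothetical helper for illustration; a simplification of real
    PostScript string syntax (no octal escapes, no <hex> strings).
    """
    simple_escapes = {"n": "\n", "r": "\r", "t": "\t"}
    pieces = []    # completed strings, in order of appearance
    current = []   # characters of the string currently being read
    depth = 0      # paren nesting depth; 0 means we are outside any string
    i = 0
    while i < len(source):
        ch = source[i]
        if depth == 0:
            if ch == "%":
                # Comment outside a string: skip to the end of the line.
                while i < len(source) and source[i] != "\n":
                    i += 1
                continue
            if ch == "(":
                depth = 1
                current = []
        else:
            if ch == "\\" and i + 1 < len(source):
                # Backslash escape: map \n, \r, \t to whitespace and keep
                # any other escaped character (such as a paren) literally.
                nxt = source[i + 1]
                current.append(simple_escapes.get(nxt, nxt))
                i += 2
                continue
            if ch == "(":
                # Balanced nested parens are part of the string itself.
                depth += 1
                current.append(ch)
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    pieces.append("".join(current))
                else:
                    current.append(ch)
            else:
                current.append(ch)
        i += 1
    # Collapse every run of whitespace to one space, as the tr steps do.
    return " ".join(" ".join(pieces).split())
```

Feed it the whole file as one string, e.g. the result of reading "somefile", and print what comes back. As with unps, the output is a single long line, so you still have to wrap it yourself.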