[comp.lang.postscript] Translating PostScript to ASCII: A Proposal

shiva@well.sf.ca.us (Kenneth Porter) (02/23/91)

A perennial question here is how to translate PostScript to
ASCII. It occurred to me that if you have OCR software and a
PostScript-to-bitmap converter like GoScript, you could combine
the two to get a good approximation of the original document's
text. Feed the converter's bitmap output straight to the OCR
program rather than printing the PostScript and scanning the
page back in: the print-and-scan round trip degrades the image
and loses the regularity of the character bitmaps that a direct
PostScript-to-bitmap conversion gives you.
 
Ken (shiva@well.sf.ca.us)

jwz@lucid.com (Jamie Zawinski) (02/25/91)

Ok, I've been meaning to write this for a while...

Sort of the canonical 'sed' program.

Put this in your .cshrc file:

alias unps \(sed \
 \''s/^[ \t]*[^()]*$//g;s/%.*$//g;s/^[^(]*(//g;s/)[^(]*(/ /g;s/)[^)]*$//g;'\' \
 \| tr '\\012' '\\040' \| tr -s '\\040' '\\040' \; echo \'\'\)

and then "unps < somefile" will extract all of the PostScript strings from
"somefile."  There will be no newlines, and all occurrences of multiple spaces
will be collapsed to a single space, so you'll need to wrap the lines by your
favorite method.
			-- Jamie
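
For readers without csh handy, the same idea (pull out the parenthesized
PostScript strings, collapse whitespace, then wrap) can be sketched in
Python. This is only an illustration, not a translation of the alias above:
the names extract_ps_strings and unps are made up for this example, and the
sketch assumes text appears as ordinary (...) string literals. Unlike the
sed version it tracks nested parentheses and backslash escapes, but it
ignores hex strings written as <...>, so treat the result as a rough
approximation of the document text.

```python
import textwrap

# Single-character escapes allowed inside a PostScript (...) string.
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f"}


def extract_ps_strings(source: str) -> list[str]:
    """Return the contents of every top-level (...) string literal."""
    strings, buf = [], []
    depth, i, n = 0, 0, len(source)
    while i < n:
        ch = source[i]
        if depth == 0:
            if ch == "%":
                # Outside a string, % starts a comment: skip to end of line.
                while i < n and source[i] not in "\r\n":
                    i += 1
                continue
            if ch == "(":
                depth, buf = 1, []
            i += 1
            continue
        # From here on we are inside a string literal.
        if ch == "\\" and i + 1 < n:
            nxt = source[i + 1]
            if nxt in "01234567":
                # Octal escape: one to three octal digits.
                j = i + 1
                while j < n and j < i + 4 and source[j] in "01234567":
                    j += 1
                buf.append(chr(int(source[i + 1:j], 8) & 0xFF))
                i = j
            elif nxt == "\n":
                # Backslash-newline is a line continuation: drop both.
                i += 2
            else:
                # \( \) \\ and unknown escapes yield the character itself.
                buf.append(_ESCAPES.get(nxt, nxt))
                i += 2
            continue
        if ch == "(":
            # Balanced parentheses may nest inside a string.
            depth += 1
            buf.append(ch)
        elif ch == ")":
            depth -= 1
            if depth == 0:
                strings.append("".join(buf))
            else:
                buf.append(ch)
        else:
            buf.append(ch)
        i += 1
    return strings


def unps(source: str, width: int = 72) -> str:
    """Join the strings, collapse runs of whitespace, and wrap the result."""
    text = " ".join(" ".join(extract_ps_strings(source)).split())
    return textwrap.fill(text, width=width)
```

Something like print(unps(open("somefile").read())) would then play the
role of "unps < somefile", with the line wrapping already done.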