[alt.sources.d] Postscript to Text converter

nagar@netcom.COM ( Nagar) (05/26/91)

I am looking for a postscript to
text converter, is there such a
program available through
ftp from simtel20 or some other
site?

I would appreciate an e-mail.

thanks

mathew@jane.Jpl.Nasa.Gov (Mathew Yeates) (05/27/91)

In article <1991May26.063129.26177@netcom.COM> nagar@netcom.COM ( Nagar) writes:
>I am looking for a postscript to
>text converter, is there such a
>program available through
>ftp from simtel20 or some other
>site?
>
>I would appreciate an e-mail.
>
>thanks

I too am interested, and dubious that such a thing exists. 

gtoal@tardis.computer-science.edinburgh.ac.uk (05/27/91)

In article <1991May26.181915.14910@elroy.jpl.nasa.gov> mathew@jane.Jpl.Nasa.Gov (Mathew Yeates) writes:
>In article <1991May26.063129.26177@netcom.COM> nagar@netcom.COM ( Nagar) writes:
>>I am looking for a postscript to
>>text converter, is there such a
>>program available through
>>ftp from simtel20 or some other
>>site?
>>
>>I would appreciate an e-mail.
>>
>>thanks
>
>I too am interested, and dubious that such a thing exists. 


This is going to sound silly, but the best way of getting what you
want is to print out your postscript and scan it back in!

If you haven't got a scanner, get a copy of Ghostscript and output
to some bitmap form which can be read in by one of the PD OCR packages --
cut out the middle man :-)

If you're a real hacker, get the Ghostscript sources and hack them
to output any text to a data structure instead of the bitmap, and
do an x-y sort on your data structure.  Modulo superscripts and
subscripts, you might have a chance of reconstructing lines.
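
A minimal Python sketch of the x-y sort step, assuming the text has
already been captured as (x, y, text) fragments in PostScript coordinates
(origin at bottom-left). The function name, the tuple layout and the
y_tolerance parameter are made up for illustration; none of this is
Ghostscript code.

```python
def reconstruct_lines(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into lines, top to bottom, left to right."""
    # PostScript's origin is bottom-left, so a larger y is higher on the page.
    ordered = sorted(fragments, key=lambda f: (-f[1], f[0]))
    lines = []            # each entry is a list of (x, text) pairs
    current_y = None      # y of the first fragment on the current line
    for x, y, text in ordered:
        # Start a new line once we drop more than y_tolerance below it.
        if current_y is None or current_y - y > y_tolerance:
            lines.append([])
            current_y = y
        lines[-1].append((x, text))
    # Within a line, order fragments by x and join them with single spaces.
    return [" ".join(text for _, text in sorted(line)) for line in lines]
```

A fragment raised or lowered by more than y_tolerance (a superscript or
subscript, say) lands on a line of its own, which is the "modulo" caveat
above.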

Graham
PS Don't mail me asking where to find ghostscript or ocr software -
I don't know...

clewis@ferret.ocunix.on.ca (Chris Lewis) (05/28/91)

In article <9105262212.AA29690@ucbvax.Berkeley.EDU> gtoal@tardis.computer-science.edinburgh.ac.uk writes:
>In article <1991May26.181915.14910@elroy.jpl.nasa.gov> mathew@jane.Jpl.Nasa.Gov (Mathew Yeates) writes:
>>In article <1991May26.063129.26177@netcom.COM> nagar@netcom.COM ( Nagar) writes:
>>>I am looking for a postscript to
>>>text converter, is there such a
>>>program available through
>>>ftp from simtel20 or some other
>>>site?

>This is going to sound silly, but the best way of getting what you
>want is to print out your postscript and scan it back in!

That works only if you have a scan-to-text (OCR) converter rather than
simply a raster reader.

>If you're a real hacker, get the Ghostscript sources and hack them
>to output any text to a data structure instead of the bitmap, and
>do an x-y sort on your data structure.  Modulo superscripts and
>subscripts, you might have a chance of reconstructing lines.

You can do this without Ghostscript.  I've taken the output of
various text processors and reconstructed an ASCII version using
perl (this is also doable in awk).  You need to search for the (x,y)
coordinate settings, and translate these into row and column positions,
and then "drop" the strings enclosed in parentheses at that position.
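
The same idea as a rough Python sketch (not the lost perl script). It
assumes the file uses the literal operators "x y moveto" and "(string)
show", a fixed-pitch font 6 points wide, 12-point line spacing and a
792-point page; the names ps_to_text, char_width, line_height and
page_height are invented here. Real PostScript output often hides
moveto and show behind its own abbreviations, so the patterns would need
adjusting per generator.

```python
import re

# Either "x y moveto" (groups 1, 2) or "(string) show" (group 3).
TOKEN = re.compile(
    r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+moveto\b"
    r"|\(((?:[^()\\]|\\.)*)\)\s*show\b"
)

def ps_to_text(ps, char_width=6.0, line_height=12.0, page_height=792.0):
    """Drop each shown string onto a character grid at its (x, y) position."""
    rows = {}                      # row number -> list of characters
    x = y = 0.0
    for m in TOKEN.finditer(ps):
        if m.group(1) is not None:             # x y moveto
            x, y = float(m.group(1)), float(m.group(2))
            continue
        # Handles only simple escapes such as \( \) and \\ (assumption).
        text = re.sub(r"\\(.)", r"\1", m.group(3))
        row = round((page_height - y) / line_height)   # y grows upward
        col = max(0, round(x / char_width))
        line = rows.setdefault(row, [])
        if len(line) < col + len(text):
            line.extend(" " * (col + len(text) - len(line)))
        line[col:col + len(text)] = text       # overwrite cells in place
        x += len(text) * char_width            # crude advance of current point
    if not rows:
        return ""
    return "\n".join("".join(rows.get(r, [])).rstrip()
                     for r in range(min(rows), max(rows) + 1))
```

Keeping the rows in a dictionary until the end means the strings may
arrive in any order within a page.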

The hard cases are PostScript that contains reverse line motion
(which requires you to buffer a whole page) and documents where the
point sizes vary a lot.  Of course, this approach won't handle graphics
and other stuff, but as long as the scanner in your script is reasonably
accurate in snagging only the x:y and text display commands, it'll work
well enough.
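
To illustrate the page-buffering point, here is a small Python sketch.
It assumes the scanning step has already produced a stream of
("text", row, col, string) and ("showpage",) tuples; that event format
and the names render and pages_from_events are hypothetical. Nothing is
printed until the page ends, so text that moves back up the page still
comes out in the right place.

```python
def render(page):
    """Turn a {row: [(col, string), ...]} buffer into page text."""
    lines = []
    for row in range(min(page, default=0), max(page, default=-1) + 1):
        line = ""
        for col, string in sorted(page.get(row, [])):
            line = line.ljust(col) + string    # pad to the column, then append
        lines.append(line)
    return "\n".join(lines)

def pages_from_events(events):
    """Yield one text page per ("showpage",) event."""
    page = {}                                  # row -> list of (col, string)
    for event in events:
        if event[0] == "text":
            _, row, col, string = event
            page.setdefault(row, []).append((col, string))
        elif event[0] == "showpage":
            yield render(page)                 # flush only at end of page
            page = {}
    if page:                                   # trailing text without showpage
        yield render(page)
```

This does nothing about varying point sizes; rows and columns are taken
as given.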

If you're familiar with awk or perl, you can usually whomp one of these
things up in about an hour.  Sorry I didn't save the one I did for someone
else on the net.
-- 
Chris Lewis, Phone: (613) 832-0541, Domain: clewis@ferret.ocunix.on.ca
UUCP: ...!cunews!latour!ecicrl!clewis; Ferret Mailing List:
ferret-request@eci386; Psroff (not Adobe Transcript) enquiries:
psroff-request@eci386 or Canada 416-832-0541.  Psroff 3.0 in c.s.u soon!