[comp.misc] Looking for a PostScript to ASCII converter

add@sciences.sdsu.edu (James D. Murray) (07/17/90)

	After poking around via FTP for a while I haven't been able to
find a PostScript to ASCII converter.  Not having a PostScript printer
it would be a nice thing to have.  Even a PostScript to Microsoft Word,
Sprint, Wordstar, Wordperfect, etc. would be welcome.

	Anyone know where I might FTP this, or a similar program?

emv@math.lsa.umich.edu (Edward Vielmetti) (07/19/90)

In article <1990Jul17.044534.23396@ucselx.sdsu.edu> add@sciences.sdsu.edu (James D. Murray) writes:

	   After poking around via FTP for a while I haven't been able to
   find a PostScript to ASCII converter.  Not having a PostScript printer
   it would be a nice thing to have.  Even a PostScript to Microsoft Word,
   Sprint, Wordstar, Wordperfect, etc. would be welcome.

Postscript is a humongously flexible page description language.  If you have
a document written with (say) microsoft word it may be possible to deconstruct
the postscript and find out what actual characters are involved in the text.
Things like ligatures, font changes, odd spacing can make it tough.

It all depends on the document I guess is what I'm trying to say.

--Ed

Edward Vielmetti, U of Michigan math dept <emv@math.lsa.umich.edu>
comp.archives moderator

bambi@kirk.nmg.bu.oz (David J. Hughes) (07/20/90)

From article <EMV.90Jul18165824@urania.math.lsa.umich.edu>, by emv@math.lsa.umich.edu (Edward Vielmetti):
> In article <1990Jul17.044534.23396@ucselx.sdsu.edu> add@sciences.sdsu.edu (James D. Murray) writes:
> 
> 	   After poking around via FTP for a while I haven't been able to
>    find a PostScript to ASCII converter.  Not having a PostScript printer
>    it would be a nice thing to have.  Even a PostScript to Microsoft Word,
>    Sprint, Wordstar, Wordperfect, etc. would be welcome.
> 
> Postscript is a humongously flexible page description language.  If you have
> a document written with (say) microsoft word it may be possible to deconstruct
> the postscript and find out what actual characters are involved in the text.
> Things like ligatures, font changes, odd spacing can make it tough.

Another question on the same line - does anyone know of a filter for
translating PostScript into Epson-type graphics and character formatting
codes?  I know that the power of PostScript is substantially greater
than that of the Epson command code set, but in our situation a filter
such as this would be invaluable.  We have labs of student PCs that
spool their print jobs to UNIX hosts.  Final copies of assignments etc.
can be sent to LaserWriters but draft copies are sent to the (free)
dot-matrix printers (Epsons).  Has anyone done anything like this using
the GNU Ghostscript interpreter?

Any help would be appreciated.


bambi
+----------------------------------------------------------------------------+
| David J. Hughes   (AKA bambi)	 |   bambi@kowande.bu.oz.au                  |
| Systems Programmer		 |   bambi@kowande.bu.oz.au@uunet.uu.net     |
| Network Management Group       |   ..!uunet!munnari!kowande.bu.oz.au!bambi |
| Bond University, Gold Coast    |   Phone : +61 75 951111                   |
| Queensland,  Australia  4229   |   Fax :   +61 75 951456                   |
+----------------------------------------------------------------------------+

max@pnet51.orb.mn.org (Max Tardiveau) (07/20/90)

I think many people don't realize that Postscript is not a graphics
format like TIFF. Postscript is a programming language, with all
rights and privileges. Programs that generate Postscript usually
use only a subset of the language (except programs like Illustrator
which take (almost) full advantage of the language).

So what does that mean to you, if you have a Postscript file and
you want to print it? That means you must have a Postscript
interpreter. There is one (obviously) in all Postscript printers,
but you can also find others that can run on non-dedicated
computers (a Postscript printer is a dedicated computer with
the engine of a photocopy machine, roughly).

So where do you find Postscript interpreters? A few names come
to my mind:

- Freedom of the Press is a Postscript clone that runs on Macintosh.
- Ghostscript is a GNU Postscript clone (a little rough, though).
- Postscript display systems like NeWS and NeXT. I think there is
  a driver that will print Postscript on an Imagewriter from a NeXT.

The bottom line is that you should not expect a simple translation
from Postscript to HPGL, Quickdraw, TIFF or Epson codes. You need
a complete interpreter, and that's not a piece of cake.
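
A toy illustration of the point, sketched in Python under loud
assumptions: it handles only integers, (string) literals, { } procedures
and the two operators `show` and `repeat`, and it is nowhere near a real
interpreter. The names tokenize, parse and run are made up for this
example. It shows that the text a page ends up with is the result of
running the program, not just the strings that appear in the file.

```python
import re

# Toy tokenizer: (string) literals without nested parentheses, braces, and bare words.
TOKEN = re.compile(r"\((?:[^()\\]|\\.)*\)|[{}]|[^\s{}()]+")

def tokenize(src):
    return TOKEN.findall(src)

def parse(tokens):
    """Group the tokens between { and } into nested lists (procedures)."""
    stack = [[]]
    for tok in tokens:
        if tok == "{":
            stack.append([])
        elif tok == "}":
            proc = stack.pop()
            stack[-1].append(proc)
        else:
            stack[-1].append(tok)
    return stack[0]

def run(program, operands=None, shown=None):
    """Execute the toy subset and return the strings passed to `show`, in order."""
    operands = [] if operands is None else operands
    shown = [] if shown is None else shown
    for item in program:
        if isinstance(item, list):
            operands.append(item)            # a procedure is pushed, not executed
        elif item.startswith("("):
            operands.append(item[1:-1])      # string literal (escapes ignored here)
        elif item == "show":
            shown.append(operands.pop())
        elif item == "repeat":               # operand order: count proc repeat
            proc = operands.pop()
            count = operands.pop()
            for _ in range(count):
                run(proc, operands, shown)
        elif re.fullmatch(r"-?\d+", item):
            operands.append(int(item))
        else:
            raise ValueError("unsupported operator in this toy: " + item)
    return shown

SOURCE = "3 { (ab) show } repeat (!) show"

interpreted = run(parse(tokenize(SOURCE)))         # what the page would carry
scanned = re.findall(r"\(([^()\\]*)\)", SOURCE)    # what a plain string scan sees
```

Here the string scan finds (ab) once, while running the program shows
it three times. Real documents add fonts, encodings, coordinates and
hundreds of other operators on top of this.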

I hope this will clarify things a little bit.

Max

--------------------------------------------------------------------
We don't care. We don't have to. We're the phone company.
UUCP: {amdahl!bungia, uunet!rosevax, crash}!orbit!pnet51!max
ARPA: crash!orbit!pnet51!max@nosc.mil
INET: max@pnet51.orb.mn.org

add@sciences.tmc.edu (James D. Murray) (07/20/90)

	Perhaps I should try and write a PostScript to ASCII disassembler
using AWK.  All I need is the bare formatting of the text.  I don't care
about fonts, graphics, bold, italic, etc.

	I think I'd settle for PostScript to HPLJ PCL, though.

doug@eris.berkeley.edu (Doug Merritt) (07/23/90)

In article <1990Jul17.044534.23396@ucselx.sdsu.edu> add@sciences.sdsu.edu (James D. Murray) writes:
>
>	After poking around via FTP for a while I haven't been able to
>find a PostScript to ASCII converter.  Not having a PostScript printer
>it would be a nice thing to have.

Coincidentally, I just wrote one.

Reading the responses to this so far, I see the usual "it can't be done"
sort of thinking. Naturally this is true in the general case, but the
point is, what if you just want to read the damn document, and yet have
no Postscript support? In such cases, *anything* is better than nothing.
I've patiently read Postscript documents many times, searching for text
amidst billions and billions of commands by *eyeball*. FOO!

I've looked for such a converter ("text extractor" is probably a better term)
for some years now, including asking around in comp.sources.wanted and
inside sources at Adobe, but no one seems to even understand the utility
of such a beast. I *do* have access to PS printers, so my only motivation
is saving trees...why waste 30 pages of paper if I don't even know what
I'm printing or whether it's important to me? I want *some* idea of what's
in the document first.

So I wrote a very stupid, very naive, quick and dirty little program that
simply searches for the text embedded inside of the rest of the Postscript
program, and prints *that* out. Formatting is nonexistent, output is ugly,
numeric character escapes are minimally and badly handled.

But you can read the text. Let me know if you want the program and I'll
send it on. Improvements to the source very gladly accepted. Approximating
page coordinate placement commands is the next obvious extension, although
again it's not completely doable (e.g. spiral text).
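
For readers who want to try the idea, here is a minimal sketch of such a
text extractor in Python. It is not the program described above, whose
source does not appear in this thread; the function name extract_text and
every design choice are assumptions made for illustration. It collects
the contents of each (...) string literal, skips % comments, and decodes
backslash escapes, including octal character codes. It ignores hex
strings, fonts, encodings and page position entirely.

```python
def extract_text(ps_source):
    """Return the contents of every (...) string literal, in file order."""
    escapes = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
               "\\": "\\", "(": "(", ")": ")"}
    out = []
    buf = []
    depth = 0                      # nesting level of unescaped parentheses
    i = 0
    n = len(ps_source)
    while i < n:
        ch = ps_source[i]
        if depth == 0:
            if ch == "%":          # comment outside a string: skip to end of line
                while i < n and ps_source[i] not in "\r\n":
                    i += 1
                continue
            if ch == "(":          # start of a string literal
                depth = 1
                buf = []
            i += 1
            continue
        # From here on we are inside a string literal.
        if ch == "\\" and i + 1 < n:
            nxt = ps_source[i + 1]
            if nxt in "01234567":  # octal escape, one to three digits
                j = i + 1
                while j < n and j < i + 4 and ps_source[j] in "01234567":
                    j += 1
                # The raw code is kept; which glyph it names depends on the font.
                buf.append(chr(int(ps_source[i + 1:j], 8) & 0xFF))
                i = j
                continue
            if nxt == "\n":        # backslash-newline continues the string
                i += 2
                continue
            buf.append(escapes.get(nxt, nxt))
            i += 2
            continue
        if ch == "(":
            depth += 1             # balanced inner parentheses are part of the text
        elif ch == ")":
            depth -= 1
            if depth == 0:         # end of the literal
                out.append("".join(buf))
                i += 1
                continue
        buf.append(ch)
        i += 1
    return out
```

Reading a file as Latin-1 text and printing the result one string per
line gives the kind of ugly but readable output described above. Words
that the generating program split for kerning or justification will come
out in fragments.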

If you need more sophistication than something on this level, Ghostscript
is probably the best answer.
	Doug
	Doug Merritt		doug@eris.berkeley.edu (ucbvax!eris!doug)
			or	uunet.uu.net!crossck!dougm