[alt.sources.wanted] seeking PostScript->ASCII filter

dhw@iti.org (David H. West) (05/25/91)

The Subject says it all.  Fully general conversion isn't necessary,
luckily, since I'm only interested in the text content of the input.

I thought I saw one or more of these go by a month or two ago, but I
didn't need it then, and I can't remember the name(s), and a scan
through the comp.sources.misc index doesn't yield any likely names.


-David West    dhw@iti.org

raymond@math.berkeley.edu (Raymond Chen) (05/25/91)

The only way to do it (short of writing your own PostScript interpreter)
is to write a parser tailored to the particular kind of PostScript file
you have.  Here's one I wrote that handles groff PostScript output.

#!/usr/unsupported/perl
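# Skip the leading glop (everything up to the first page marker).
# Stop at end of input too, so a file without the marker can't hang us.

while (defined($_ = <>)) { last if /^%%Page: 1 2/; }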

@stack = ();
$y = 0;

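main: while(<>){ chop;
# a trailing backslash continues the string on the next line
while (s/\\$//) { $_ .= <>; chop; }
next if /^%/; # skip comment lines
# turn escaped parens into octal escapes so the string pattern below stays simple
s/\\\(/\\050/g;
s/\\\)/\\051/g;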
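  while ($_ ne "") { # compare with "" so a lone "0" isn't taken as false
  s/^\s*//; # nuke leading whitespace
  last if $_ eq ""; # only whitespace was left on this line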
  if (s/^([\d.-]+)//) { # a number
  push(@stack, $1); }
  elsif (s/^\/[@_a-zA-Z-]+//) { # a literal
    push(@stack, ""); }
  elsif (s/^\(([^)]*)\)//) { # a string
    push(@stack, $1); }
  elsif (s/^(\w+)//) { # a command
    $c = $1;
    if ($c eq "C") { print &spaceout($stack[2]); }
    elsif ($c eq "E") { print "~" if $stack[1] > 0; print &fixup($stack[0]); }
    elsif ($c eq "F") { print &fixup($stack[1]); }
    elsif ($c eq "F2") { ; }
    elsif ($c eq "G") { print &spaceout($stack[1]); }
    elsif ($c eq "H") { print &fixup($stack[2]); }
    elsif ($c eq "Q") { &moveshow; }
    elsif ($c eq "R") { shift(@stack); &moveshow; }
    elsif ($c eq "S") { shift(@stack); &spaceout($stack[0]); &moveshow; }
    elsif ($c eq "T") { shift(@stack); shift(@stack); &moveshow; }
    elsif ($c eq "BP") { }
    elsif ($c eq "EP") { print "\n", "-" x 40, "\n"; }
    elsif ($c eq "SF") { }
    elsif ($c eq "end") { last main; }
  else { print STDERR "\7", join(":", @stack), " <$c>?\n"; }
  @stack = ();
  }
  elsif (s/^<(.*)>//) { # a hex string
     $c = ""; $d = $1;
     while ($d =~ s/(..)//) { $c .= sprintf("%c", hex($1)); }
    push(@stack, $c); }
  else { print STDERR "\7How to parse $_?\n"; last; } # give up on this line; nothing was consumed, so retrying would loop forever
  }
  }

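# Print the string in $stack[0]: start a new output line when the
# position in $stack[2] differs from the last one seen, else print "~".
sub moveshow {
	if ($y != $stack[2]) { $y = $stack[2]; print "\n"; }
	else { print "~"; }
	print &fixup($stack[0]);
}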

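# Put a space between every character of the argument (modified in place).
sub spaceout { @t = split(//, $_[0]); $_[0] = &fixup(join(" ", @t)); }

# Clean up a string in place: shorten the first run of four or more dots
# to four, decode \ooo octal escapes, and spell out the fi/fl ligatures.
sub fixup { $_[0] =~ s/\.\.\.\.[.]*/\.\.\.\./;
$_[0] =~ s/\\(\d\d\d)/sprintf("%c",oct($1))/eg;
$_[0] =~ s/\214/fi/g; $_[0] =~ s/\215/fl/g;
$_[0]; }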
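
For anyone adapting this to another program's output, the tokenizing step
may be easier to follow in isolation.  What follows is only a rough sketch
in present-day Perl, not part of the script above: the names tokenize and
decode are made up for illustration, the number pattern is stricter than
the original's, and it only classifies tokens; it does not know what any
groff procedure means.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn \ooo octal escapes back into characters.
sub decode {
    my ($s) = @_;
    $s =~ s/\\([0-7]{3})/chr(oct($1))/eg;
    return $s;
}

# Hypothetical helper: split one line of PostScript into [type, value]
# pairs, covering only the token classes the script above recognizes.
sub tokenize {
    my ($line) = @_;
    my @tokens;
    # Rewrite escaped parens as octal escapes so the simple string
    # pattern below cannot stop at an escaped ")".
    $line =~ s/\\\(/\\050/g;
    $line =~ s/\\\)/\\051/g;
    while (1) {
        $line =~ s/^\s+//;
        last if $line eq '';
        if    ($line =~ s/^(-?(?:\d+\.?\d*|\.\d+))//) { push @tokens, ['number',  $1]; }
        elsif ($line =~ s/^\/([\w\@-]+)//)            { push @tokens, ['name',    $1]; }
        elsif ($line =~ s/^\(([^)]*)\)//)             { push @tokens, ['string',  decode($1)]; }
        elsif ($line =~ s/^<([0-9A-Fa-f]*)>//)        { push @tokens, ['string',  pack('H*', $1)]; }
        elsif ($line =~ s/^(\w+)//)                   { push @tokens, ['command', $1]; }
        else  { push @tokens, ['unknown', $line]; last; }  # give up on the rest
    }
    return @tokens;
}

# Example: two numbers, a string with escaped parens, and a command.
for my $t (tokenize('72 708 (Hello \(world\)) Q')) {
    print "$t->[0]: $t->[1]\n";
}
```

Run as is, this should print "number: 72", "number: 708",
"string: Hello (world)" and "command: Q", one per line.  What each command
does with the values before it still has to be worked out from the prologue
of whatever program produced the file.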