[comp.lang.postscript] ps2txt, extract strings from PostScript file

iqbal@seas.gwu.edu (Iqbal Qazi) (06/11/91)

	Here is a small C program I wrote that will extract strings
from a PostScript file.  Before you think this is the program to solve
your PostScript->text needs, let me tell you what this program does
NOT do:

	-  It won't do anything with graphics [Now THAT would be a good
trick!].
	-  It probably won't give you an accurate representation of
what your PS document will actually look like when it is printed.

	What it will do is take the strings (usually [hopefully] the
words/sentences in the file) out of a PostScript file (given either as
a filename or on stdin) and print them on stdout.  That's all.
If you're lucky and your PS file has few graphics and little
text-enhancement-nasty in it, then you will get the text of your
document in plain old text to fondle to your heart's content.  You
will also get other stuff which you can ignore, such as the "atend"
from comments like "%%Pages: (atend)", especially if there are
graphics.

It's a simple little program; I'm surprised no one else wrote it.  Of
course it has limited functionality and usefulness, so maybe that's
why :).

Any problems, comments, suggestions etc. to: iqbal@sparko.gwu.edu
If you do find it handy, drop me a line.

---------------------------------cut here---------------------------------

/*
	ps2txt.c						Iqbal Qazi
							      June 6, 1991

	USAGE:  ps2txt [-] [filename]

	Extracts strings from PostScript file.  Takes input from
	filename or standard input, output goes to standard output.

			e.g.	ps2txt homer.ps
				ps2txt dweezil.ps > dweezil.txt
			 	cat file.ps | ps2txt - | more
				ps2txt - < homer.ps 

	   Comments, suggestions, problems to:  Iqbal Qazi
			(iqbal@sparko.gwu.edu)
*/

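#include <stdio.h>
#include <stdlib.h>	/* exit() */

/* Print an error message and the usage line on stderr, then exit with
   a failure status. */
#define ABORT(X,Y) do { fprintf(stderr, X, Y); \
	fprintf(stderr, "%s\n", usage); exit(1); } while (0)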

char *usage="Usage:  ps2txt [-] [filename]";
char *str;
FILE *file, *source;

int main(int argc, char **argv)
{
	int ch;
	int para = 0;	/* nesting depth of ( ... ) */
	int last = 0;	/* set once a string has closed on the current line */

	if (argc == 1) { ABORT("", ""); }
	if (argc != 2) { ABORT("ps2txt: Too many arguments.\n\n", ""); }
	str = argv[1];
	if (str[0] == '-' && str[1] == '\0') {
		source = stdin;
	} else if (str[0] != '-') {
		if ((file = fopen(str, "r")) == NULL)
			{ ABORT("ps2txt: Error opening file:  %s.\n\n", str); }
		source = file;
	} else {
		ABORT("ps2txt: No such option:  %s.\n\n", str);
	}

	while ((ch = fgetc(source)) != EOF)
	{
		switch (ch)
		{
		case '\n' :	/* end an output line only after a string */
			if (last) { putchar('\n'); last = 0; }
			break;
		case '('  :	/* print only nested parens */
			if (para++ > 0) putchar(ch);
			break;
		case ')'  :	/* print only nested parens; never go below 0 */
			if (para > 1) putchar(ch);
			if (para > 0) para--;
			last = 1;
			break;
		case '\\' :	/* escape sequence, only meaningful in a string */
			if (para > 0)
			    switch (ch = fgetc(source))
			    {
				case EOF :  break;	/* truncated file */
				case 't' :  putchar('\t'); break;
				case 'n' :  putchar('\n'); break;
				case '\n':  break;	/* backslash-newline: continuation */
				case '0' :  case '1' : case '2' : case '3' :
				case '4' :  case '5' : case '6' : case '7' :
					    putchar('\\');	/* keep \ddd as-is */
					    /* fall through */
				default  :  putchar(ch); break;	/* \( \) \\ etc. */
			    }
			break;
		default	   :
			if (para > 0) putchar(ch);
		}
	}
	if (file != NULL)
		fclose(file);
	return 0;
}
-- 
~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~|~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^
Iqbal Qazi			     |   If cartoons were meant for adults,
Internet->    iqbal@sparko.gwu.edu   |      they'd be on in prime-time
(or even->   Bitnet:  WQ956C@GWUVM)  |                            -- Lisa