[comp.lang.postscript] PostScript page counter?

tdd@convex.cl.msu.edu (Thomas D. Davis) (07/27/90)

Please excuse me if this is the most-asked question of the year, but we're
looking for software that will read a PostScript file and do nothing more
than determine the number of pages it will print.  Does anyone know of 
such a beast?

FWIW, the reason we need this capability is to handle on-demand printing
in the micro labs around campus where we charge for LaserWriter output on
a per-page basis.  We would like to be able to avoid printing jobs that
no one ever pays for.  I'm sure other folks have run into this...
--
Tom Davis                 | The above statement shall be construed,
Network Software Services | interpreted, and governed by me alone.
Michigan State University | EMail: tdd@convex.cl.msu.edu

henry@zoo.toronto.edu (Henry Spencer) (07/30/90)

In article <1990Jul26.173633.13911@msuinfo.cl.msu.edu> tdd@convex.cl.msu.edu (Thomas D. Davis) writes:
>looking for software that will read a PostScript file and do nothing more
>than determine the number of pages it will print.  Does anyone know of 
>such a beast?

Basically, it can't be done.  You essentially have to do full PostScript
interpretation.  The language is too complex for any simple shortcut.
The usual technique is to read the printer's page count before and after,
and subtract -- *it* knows how many pages it printed.
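
A minimal Python sketch of the host side of this before/after technique. It assumes the printer understands the query `statusdict begin pagecount == flush end` and writes its counter back as a plain integer. statusdict's pagecount exists on many Adobe-interpreter printers such as the LaserWriter, but it is printer-dependent. The helper names are hypothetical, and the serial or AppleTalk plumbing is omitted.

```python
# Query sent to the printer before and after each job.  Assumes the
# printer's statusdict provides "pagecount" (printer-dependent).
PAGECOUNT_QUERY = "statusdict begin pagecount == flush end\n"


def parse_pagecount(reply: str) -> int:
    """Return the first all-digit token in the printer's reply."""
    for token in reply.split():
        if token.isdigit():
            return int(token)
    raise ValueError("no page count in reply: %r" % reply)


def pages_printed(before_reply: str, after_reply: str) -> int:
    """Pages the printer's own counter advanced during the job."""
    before = parse_pagecount(before_reply)
    after = parse_pagecount(after_reply)
    if after < before:
        # Counter reset or a different printer answered: refuse to guess.
        raise ValueError("page counter went backwards")
    return after - before
```

For example, replies of 1042 before the job and 1049 after it would bill seven pages.
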
-- 
The 486 is to a modern CPU as a Jules  | Henry Spencer at U of Toronto Zoology
Verne reprint is to a modern SF novel. |  henry@zoo.toronto.edu   utzoo!henry

roy@phri.nyu.edu (Roy Smith) (07/31/90)

tdd@convex.cl.msu.edu (Thomas D. Davis) wants:
>> software that will read a PostScript file and do nothing more
>> than determine the number of pages it will print.

henry@zoo.toronto.edu (Henry Spencer) says:
> Basically, it can't be done [...] read the printer's page count before
> and after, and subtract -- *it* knows how many pages it printed.

	While Henry's answer is certainly the correct one, it is also
almost certainly not the one Thomas wants to hear.  An alternative approach
is to simply look for a "%%Pages:" line in the PS file and use the number
therein.  The vast majority of PS-producing programs know how many pages
they intend to print (and indeed will print), and are well-behaved enough
to supply this count for people to use as a PS comment.

	Of course, if you go this route, you have to understand that the
%%Pages: line might be missing, or wrong (either through accident or
malicious intent by somebody trying to avoid paying per-page printing
charges).  Use it at your own peril.
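
A minimal Python sketch of this approach. It looks for the Document Structuring Conventions comment "%%Pages:". When the header says "(atend)", as the dvips output quoted later in this thread does, the real number is expected in the trailer, so the last numeric value wins. Lines between %%BeginDocument and %%EndDocument are skipped on the assumption that they belong to an embedded EPS file carrying its own count. The function name is hypothetical, and the result is only as trustworthy as the file.

```python
import re

# Matches "%%Pages: 12", "%%Pages: 12 1" or "%%Pages: (atend)".
PAGES_RE = re.compile(r"%%Pages:\s*(\S+)")


def dsc_page_count(ps_text: str):
    """Return the declared page count, or None if there is none."""
    count = None
    depth = 0  # nesting level of embedded documents (e.g. EPS figures)
    for line in ps_text.splitlines():
        if line.startswith("%%BeginDocument"):
            depth += 1
        elif line.startswith("%%EndDocument"):
            depth = max(depth - 1, 0)
        elif depth == 0:
            m = PAGES_RE.match(line)
            if m and m.group(1).isdigit():
                count = int(m.group(1))  # a later (trailer) value wins
    return count
```
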
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane?  Did you say arcane?  It wouldn't be Unix if it wasn't arcane!"

mitch@hq.af.mil (Mitch Wright) (07/31/90)

/*
 *  In article <1990Jul26.173633.13911@msuinfo.cl.msu.edu> 
 *  tdd@convex.cl.msu.edu (Thomas D. Davis) writes:
 * 
 */

Thomas> looking for software that will read a PostScript file and do nothing
Thomas> more than determine the number of pages it will print.  Does anyone
Thomas> know of such a beast?

For a ``guess-timate'', I do an egrep(1) for the word "showpage".  Often,
though, showpage is wrapped in another procedure that calls it along with
other routines; in that case you have to search for that procedure's name
instead.

The number of pages **should** then be:

	(# found) - 1* - 1**

*  = Subtract one if the string is a declared routine (the definition
     itself is one of the matches)
** = Subtract one if there is a default header page

WARNING::  This will ONLY be a rough estimate!
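
The same guess-timate as a small Python function. This sketch counts whole-word matches instead of running egrep(1), and the two flags mirror the footnotes above. `estimate_pages` and its parameters are hypothetical names, and the result is no better than the rough estimate warned about.

```python
import re


def estimate_pages(ps_text: str, name: str = "showpage",
                   is_routine: bool = False,
                   header_page: bool = False) -> int:
    """Count whole-word occurrences of `name`, then apply the corrections.

    is_routine:  subtract one because the definition ("/EP { ... } def")
                 also matches when `name` is a wrapper procedure.
    header_page: subtract one for a default header page.
    """
    found = len(re.findall(r"\b" + re.escape(name) + r"\b", ps_text))
    return found - int(is_routine) - int(header_page)
```

For a file that defines `/EP { ... showpage } def` and calls EP three times, searching for "EP" with `is_routine=True` gives 3.
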

--
   ..mitch

   mitch@hq.af.mil (Mitch Wright) | The Pentagon, 1B1046 | (202) 695-0262

   The two most common things in the universe are hydrogen and stupidity,
   but not necessarily in that order. 

skdutta@cs.tamu.edu (Saumen K Dutta) (07/31/90)

tdd@convex.cl.msu.edu (Thomas D. Davis) wants:
>> software that will read a PostScript file and do nothing more
>> than determine the number of pages it will print.

henry@zoo.toronto.edu (Henry Spencer) says:
> Basically, it can't be done [...] read the printer's page count before
> and after, and subtract -- *it* knows how many pages it printed.

roy@phri.nyu.edu (Roy Smith) writes:
>   An alternative approach
>is to simply look for a "%%Pages:" line in the PS file and use the number
>therein.  The vast majority of PS-producing programs know how many pages
>they intend to print (and indeed will print), and are well-behaved enough
>to supply this count for people to use as a PS comment.
>
>	Of course, if you go this route, you have to understand that the
>%%Pages: line might be missing, or wrong (either through accident or
>malicious intent by somebody trying to avoid paying per-page printing
>charges).  Use it at your own peril.


If you think there could be problems with the %%Pages: comment in a PostScript
file due to non-conformance with the standards, or suspect malicious
intent, you could also check for the control character ^L in the
PostScript file, which is usually used for pagebreak.
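
A quick cross-check along these lines, sketched in Python under the assumption that any form feeds in the file mark page breaks. It counts the ^L characters (ASCII 12, written "\f") and the %%Page: comments, so that a file whose two numbers disagree can be treated with suspicion. The function name is hypothetical.

```python
def formfeed_check(ps_text: str):
    """Return (form feeds, %%Page: comments) for comparison.

    str.splitlines() also splits at form feeds, so a "^L%%Page: 2 2"
    line still yields a line starting with "%%Page:".
    """
    formfeeds = ps_text.count("\f")
    page_comments = sum(1 for line in ps_text.splitlines()
                        if line.startswith("%%Page:"))
    return formfeeds, page_comments
```

A file with exactly one fewer form feed than %%Page: comments may simply have no ^L before its first page.
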



--
     _                                   ||Internet: skdutta@cssun.tamu.edu  
    (   /_     _ /   --/-/- _            ||Bitnet : skd8107@tamvenus.bitnet 
   __)_/(_____(_/_(_/_(_(__(_/_______    ||Uucp : uunet!cssun.tamu.edu!skdutta
                                 ..      ||Yellnet: (409) 846-8803

jgreely@oz.cis.ohio-state.edu (J Greely) (07/31/90)

In article <1990Jul30.153325.15342@zoo.toronto.edu> henry@zoo.toronto.edu
 (Henry Spencer) writes:
>The usual technique is to read the printer's page count before and after,
>and subtract -- *it* knows how many pages it printed.

Which is great for collecting usage information, but something I've
been considering is changing my spooler to redefine showpage before
each file is sent to the printer, aborting if it attempts to print
more than X pages.  Nice when you have students who send their output
to a printer without checking first.
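
A sketch of that spooler idea in Python, assuming the spooler can prepend text to each job before sending it. The prologue redefines showpage (and copypage, per the objections raised later in the thread) to count calls and execute `stop` once the limit is passed. On a typical job server that should abort the job, but this is an assumption. The sketch is easy to defeat: it ignores #copies, a job can call `systemdict /showpage get exec` directly, and because the counter lives in userdict a job's own save/restore can roll it back. The `spool-` names and `add_page_limit` are hypothetical.

```python
# PostScript prologue; @LIMIT@ is replaced with the page limit.
PROLOGUE = """%!
/spool-limit @LIMIT@ def
/spool-pages 0 def
/spool-check {
  userdict /spool-pages spool-pages 1 add put
  spool-pages spool-limit gt {
    (%%[ Error: page limit exceeded ]%%) = flush
    stop
  } if
} bind def
/showpage { spool-check systemdict /showpage get exec } bind def
/copypage { spool-check systemdict /copypage get exec } bind def
"""


def add_page_limit(job: str, limit: int) -> str:
    """Return the job with the page-limiting prologue in front of it."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return PROLOGUE.replace("@LIMIT@", str(limit)) + job
```

The check runs before the real showpage, so a limit of 25 lets exactly 25 pages through and aborts on the 26th.
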
--
J Greely (jgreely@cis.ohio-state.edu; osu-cis!jgreely)

glenn@heaven.woodside.ca.us (Glenn Reid) (07/31/90)

In article <6988@helios.TAMU.EDU> skdutta@cs.tamu.edu (Saumen K Dutta) writes:
>If you think there could be problems with the %%Pages: comment in a PostScript
>file due to non-conformance with the standards, or suspect malicious
>intent, you could also check for the control character ^L in the
>PostScript file, which is usually used for pagebreak.

In five years of looking at PostScript files I have never seen a control-L
character embedded in a file, nor should they be.  What application or
system do you use such that a ^L is "usually used for pagebreak," out
of curiosity?

I can't say I'd recommend that technique, although I guess it can't hurt
to look for ^L.

I have a feeling there is a midway-point between the "you can't count
pages without an interpreter" answer and the "just look at the %%Pages
comment" answers, although I'll admit that I haven't actually tried
it.

In reality, almost every PostScript driver defines some procedure in the
prologue of the file called "EndPage" or "EP" or "SP" or whatever, and
there is one invocation of that procedure for each page printed.  With
a moderately simple parser, it should be possible to figure out the
name of the procedure that contains "showpage" in it somewhere, then
go looking for instances of that procedure name in the rest of the
document.  Example:

	/BP { /page_save save def } def		% begin-page
	/EP { page_save restore showpage } def	% end-page
	 ...
	BP
	 % lots of stuff
	EP

Any volunteers to write the simple parser?  If it can recognize the
basic syntax

	/name { { stuff } }

I suspect that you could find the procedure that contained "showpage"
fairly easily.  Any volunteers?  Want to endear yourself to the
PostScript community with a great page-counting program?  [ Want to
earn real $$ in your spare time sharpening saw blades? ]
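
One way to sketch the parser described above, in Python. It tokenizes the file (dropping comments and simple strings), notes which top-level `/name { ... }` procedures mention showpage or an already-known page procedure, and counts invocations of those names, plus bare showpage calls, outside any procedure. This is an illustration only: loops such as `{ EP } repeat`, nested parentheses inside strings, and the copypage and #copies tricks raised later in the thread all fool it. `count_pages` is a hypothetical name.

```python
import re

# Comments, (strings) without nested parens, braces, and other tokens.
TOKEN_RE = re.compile(r"%[^\n\r]*|\((?:\\.|[^\\)])*\)|[{}]|[^\s{}()%]+",
                      re.DOTALL)


def count_pages(ps_text: str) -> int:
    tokens = [t for t in TOKEN_RE.findall(ps_text)
              if not t.startswith(("%", "("))]
    page_procs = set()  # procedures that (transitively) call showpage
    current = None      # top-level procedure being defined, e.g. "EP"
    depth = 0           # brace nesting level
    pages = 0
    for i, tok in enumerate(tokens):
        if tok == "{":
            if depth == 0 and i > 0 and tokens[i - 1].startswith("/"):
                current = tokens[i - 1][1:]
            depth += 1
        elif tok == "}":
            depth = max(depth - 1, 0)
            if depth == 0:
                current = None
        elif tok == "showpage" or tok in page_procs:
            if depth == 0:
                pages += 1               # a page is printed here
            elif current is not None:
                page_procs.add(current)  # this procedure prints a page
    return pages
```

Run on the BP/EP example above with two BP ... EP pairs, it reports 2.
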

(Glenn) cvn

-- 
 Glenn Reid				PostScript/NeXT consultant
 glenn@heaven.woodside.ca.us		Independent Software Developer
 ..{adobe,next}!heaven!glenn		415-851-1785

skdutta@cs.tamu.edu (Saumen K Dutta) (07/31/90)

in Article <226@heaven.woodside.ca.us> glenn@heaven.woodside.ca.us
(Glenn Reid) writes:
>
>In five years of looking at PostScript files I have never seen a control-L
>character embedded in a file, nor should they be.  What application or
>system do you use such that a ^L is "usually used for pagebreak," out
>of curiosity?
>
>I can't say I'd recommend that technique, although I guess it can't hurt
>to look for ^L.
>

Well, I saw them in some of the PostScript output from the dvips
translator. They were just where you would expect them to
be! I am not sure whether they were inherited from the
DVI file that was translated to PostScript. Anyway,
here is part of the file for your information.

%!
%%Dimensions: 0 0 612 792
%%Title: dvips test
%%CreationDate: Tue Jul 31 01:48:38 1990
%%Creator: skdutta and [TeX82 DVI Translator Version 2.10b for PostScri-
%-pt [Apple LaserWriter laser printer]
%%Pages: (atend)
%%BugHistory: Incorporates Allan Hetzel's 31-Oct-85 DARPA LASER-LOVERS -
%-PS Version 23.0 X-on/X-off bug workaround
%%EndComments
%%EndProlog

.......
......
^L%%Page: 2 2

.......
.....
^L%%Page: 3 3

......
....
etc.


The ^L has been retyped as caret-L just to avoid a page break in
this posting :->


--
     _                                   ||Internet: skdutta@cssun.tamu.edu  
    (   /_     _ /   --/-/- _            ||Bitnet : skd8107@tamvenus.bitnet 
   __)_/(_____(_/_(_/_(_(__(_/_______    ||Uucp : uunet!cssun.tamu.edu!skdutta
                                 ..      ||Yellnet: (409) 846-8803

lau@kings.wharton.upenn.edu (Yan K. Lau) (07/31/90)

In article <226@heaven.woodside.ca.us> glenn@heaven.UUCP (Glenn Reid) writes:
>
>Any volunteers to write the simple parser?  If it can recognize the
>basic syntax
>
>	/name { { stuff } }
>
>I suspect that you could find the procedure that contained "showpage"
>fairly easily.  Any volunteers?  Want to endear yourself to the
>
>(Glenn) cvn
I suspect most applications generate "well-behaved" code whose pages a simple
parser can count.  But users can get very devious if they write
their own code or change the code of a PS file generated by an application.
Two things that I can think of off the top of my head:  the user can replace
showpage with copypage and erasepage, and the parser would also need to
deal with #copies.  Looks like the simple parser needs a little more work.


Yan.
   )~  Yan K. Lau    lau@kings.wharton.upenn.edu      The Wharton School
 ~/~   -Sheenaphile-          128.91.11.233       University of Pennsylvania
 /\    God/Goddess/All that is -- the source of love, light and inspiration!

skdutta@cs.tamu.edu (Saumen K Dutta) (08/01/90)

In article <226@heaven.woodside.ca.us> glenn@heaven..... writes
>
>In reality, almost every PostScript driver defines some procedure in the
>prologue of the file called "EndPage" or "EP" or "SP" or whatever, and
>there is one invocation of that procedure for each page printed.  With
>a moderately simple parser, it should be possible to figure out the
>name of the procedure that contains "showpage" in it somewhere, then
>go looking for instances of that procedure name in the rest of the
>document.  Example:
>
>	/BP { /page_save save def } def		% begin-page
>	/EP { page_save restore showpage } def	% end-page
>	 ...
>	BP
>	 % lots of stuff
>	EP
>
>Any volunteers to write the simple parser?  If it can recognize the
>basic syntax
>
>	/name { { stuff } }
>
>I suspect that you could find the procedure that contained "showpage"
>fairly easily.  Any volunteers?  Want to endear yourself to the
>PostScript community with a great page-counting program?  
>

Based on the above idea, I tried to write a simple scanner (I won't
call it a parser) as an "awk" script, since awk is widely available. I ran
it on one or two outputs and it works well. I invite anybody interested to
read it, use it and criticize it! I am not responsible for any bugs, though I
would like to know about them.

--------------- CUT HERE --------------------------

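# Count pages by finding the top-level procedure whose body contains
# "showpage" and counting later invocations of that procedure's name.
BEGIN {
	found = 0; bracket = 0; procname = ""; Pages = 0
}

{
	for (i = 1; i <= NF; i++) {
		# a token starting with "%" begins a comment: skip rest of line
		if (substr($i,1,1) == "%") next

		# until showpage is seen, remember the last top-level /name
		if (substr($i,1,1) == "/" && bracket == 0 && found == 0)
			procname = substr($i,2)

		# track procedure nesting by counting every brace in the token
		t = $i
		bracket += gsub(/[{]/, "", t)
		bracket -= gsub(/[}]/, "", t)

		if ($i == "showpage") found = 1

		# each later use of the wrapping procedure is one page
		if (found == 1 && $i == procname) Pages++
	}
}

END { printf("Pages : %d\n", Pages) }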

----------- CUT HERE ---------------------------------

To run it, put it into a file and run the command

awk -f <program-file> <postscript-file>











--
     _                                   ||Internet: skdutta@cssun.tamu.edu  
    (   /_     _ /   --/-/- _            ||Bitnet : skd8107@tamvenus.bitnet 
   __)_/(_____(_/_(_/_(_(__(_/_______    ||Uucp : uunet!cssun.tamu.edu!skdutta
                                 ..      ||Yellnet: (409) 846-8803

lee@sq.sq.com (Liam R. E. Quin) (08/02/90)

Glenn Reid <glenn@heaven.UUCP> writes:
>I have a feeling there is a midway-point between the "you can't count
>pages without an interpreter" answer and the "just look at the %%Pages
>comment" answers [...]
and then suggests writing a simple parser to look for "showpage".

Some problems with that:
* you'd have to count copypage as well, of course
* you'd also have to take #copies into account (rather harder)
* what about
  NCopies { copypage } repeat erasepage
  produced by applications whose programmers didn't know about #copies...?

The original poster wanted to prevent students printing large jobs,
presumably either accidentally or intentionally.  The former case can be
met in most cases by looking for %%Page and #copies.
The deliberate Blatter Of Large Documents (BOLD) can be stopped by
threatening death, or stringing up by the ankles... but not by anything
else short of interpreting enough of the file to determine
that it exceeds the Allowed Page Count.
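
For the accidental case, a rough Python sketch: count the %%Page: comments and multiply by the largest literal "/#copies N def" found in the file. The regular expression only recognizes that one literal form, so computed copy counts and the repeat-copypage idiom above slip through. `estimate_sheets` is a hypothetical name.

```python
import re

# Only the literal form "/#copies 3 def" is recognized (assumption).
COPIES_RE = re.compile(r"/#copies\s+(\d+)\s+def\b")


def estimate_sheets(ps_text: str) -> int:
    """%%Page: comments times the largest literal #copies setting."""
    pages = sum(1 for line in ps_text.splitlines()
                if line.startswith("%%Page:"))
    copies = [int(n) for n in COPIES_RE.findall(ps_text)]
    return pages * max(copies, default=1)
```

Taking the largest #copies value errs toward overestimating, which suits a check meant to stop runaway jobs.
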

For example, it is clear that Woody and Friends can produce a "cexec"
file that calls showpage from machine-code.  You couldn't do anything
about that.

And routines to hex-encode data are easily available.

Perhaps a better way would be to redefine showpage and copypage in the
printer at the start of each job to cause an error after MAX$JOB pages.
	/oldshowpage /showpage load def
	/NumberOfPages 0 def
	/showpage {
	    /NumberOfPages NumberOfPages 1 add def
	    NumberOfPages LotsAndLots gt
	    { please ignore to EOF }
	    { oldshowpage }
	    ifelse
	} def

this relies on the undocumented PostScript command "please ignore to EOF"
which does exactly what it says [0.75 :-)].

Lee
-- 
Liam R. E. Quin,  lee@sq.com, {utai,utzoo}!sq!lee,  SoftQuad Inc., Toronto
``He left her a copy of his calculations [...]  Since she was a cystologist,
  she might have analysed the equations, but at the moment she was occupied
  with knitting a bootee.''  [John Boyd, Pollinators of Eden, 217]

glenn@heaven.woodside.ca.us (Glenn Reid) (08/02/90)

In article <1990Aug1.214448.17881@sq.sq.com> lee@sq.sq.com (Liam R. E. Quin) writes:
>Glenn Reid <glenn@heaven.UUCP> writes:
>>I have a feeling there is a midway-point between the "you can't count
>>pages without an interpreter" answer and the "just look at the %%Pages
>>comment" answers [...]
>and then suggests writing a simple parser to look for "showpage".

>Some problems with that:
>* you'd have to count copypage as well, of course
>* you'd also have to take #copies into account (rather harder)
>* what about
>  NCopies { copypage } repeat erasepage
>  produced by applications whose programmers didn't know about #copies...?

>The original poster wanted to prevent students printing large jobs,
>presumably either accidentally or intentionally.  The former case can be
>met in most cases by looking for %%Page and #copies.

Hmmm.  If the problem is multiple copies of the same page, then the
scheme that I proposed won't do much good, admittedly.  I doubt that
that comes up very often, but I defer to those who actually needed
the page counter to begin with :-)

>Perhaps a better way would be to redefine showpage and copypage in the
>printer at the start of each job to cause an error after MAX$JOB pages.
>	/oldshowpage /showpage load def
>	/NumberOfPages 0 def
>	/showpage {
>	    /NumberOfPages NumberOfPages 1 add def
>	    NumberOfPages LotsAndLots gt
>	    { please ignore to EOF }
>	    { oldshowpage }
>	    ifelse
>	} def

This is a good approach, but suffers from all the same defects (it doesn't
pay attention to "copypage" and "#copies", for example).  It can also
be fairly easily defeated by anybody who cares to do so, with
"systemdict /showpage get exec".

The advantage of the parser running on the host computer is that you
get to decide whether or not to print it at all.  For example, the
aforementioned "simple parser" could simply REJECT jobs that contained
"copypage" or "#copies" anywhere in the body of the document, on the
grounds that they aren't necessary and they may be malicious.
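
A sketch of such a host-side filter in Python, assuming the policy is simply to refuse any job whose program text (outside comments and simple strings) mentions copypage or #copies. As Liam Quin notes, hex-encoded or run-time-constructed code defeats any purely textual check. `rejection_reasons` is a hypothetical name.

```python
import re

# Names whose presence gets a job rejected (example policy).
SUSPECT_NAMES = ("copypage", "#copies")

# Comments and (strings) without nested parens are ignored.
COMMENT_OR_STRING = re.compile(r"%[^\n\r]*|\((?:\\.|[^\\)])*\)", re.DOTALL)
# PostScript delimiters split names; "/" is dropped so "/#copies"
# and "#copies" are treated alike.
NAME_RE = re.compile(r"[^\s{}()\[\]<>/%]+")


def rejection_reasons(ps_text: str, suspects=SUSPECT_NAMES):
    """Return the suspect names used in the program text (empty = accept)."""
    code = COMMENT_OR_STRING.sub(" ", ps_text)
    names = set(NAME_RE.findall(code))
    return sorted(name for name in suspects if name in names)
```

A spooler would print the job only when the returned list is empty.
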

I liked the awk script that Saumen Dutta posted....

Anyway, I still think that counting instances of "showpage" will work
on a large percentage of the world's documents, and the program can be
made more and more paranoid and careful as desired.

/Glenn

P.S.  While we're at it, what a great idea it would be to put a filter
into "lpd" that rejects any PostScript file that contains "systemdict",
"erasepage", "72 mul", "#copies" or "Courier" :-)

-- 
 Glenn Reid				PostScript/NeXT consultant
 glenn@heaven.woodside.ca.us		Independent Software Developer
 ..{adobe,next}!heaven!glenn		415-851-1785