tdd@convex.cl.msu.edu (Thomas D. Davis) (07/27/90)
Please excuse me if this is the most-asked question of the year but we're looking for software that will read a PostScript file and do nothing more than determine the number of pages it will print. Does anyone know of such a beast?

FWIW, the reason we need this capability is to handle on-demand printing in the micro labs around campus where we charge for LaserWriter output on a per-page basis. We would like to be able to avoid printing jobs that no one ever pays for. I'm sure other folks have run into this...
--
Tom Davis                 | The above statement shall be construed,
Network Software Services | interpreted, and governed by me alone.
Michigan State University | EMail: tdd@convex.cl.msu.edu
henry@zoo.toronto.edu (Henry Spencer) (07/30/90)
In article <1990Jul26.173633.13911@msuinfo.cl.msu.edu> tdd@convex.cl.msu.edu (Thomas D. Davis) writes:
>looking for software that will read a PostScript file and do nothing more
>than determine the number of pages it will print. Does anyone know of
>such a beast?

Basically, it can't be done. You essentially have to do full PostScript interpretation. The language is too complex for any simple shortcut.

The usual technique is to read the printer's page count before and after, and subtract -- *it* knows how many pages it printed.
--
The 486 is to a modern CPU as a Jules  | Henry Spencer at U of Toronto Zoology
Verne reprint is to a modern SF novel. | henry@zoo.toronto.edu utzoo!henry
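A minimal sketch of the before-and-after bookkeeping Henry describes. It assumes an Adobe-style printer whose statusdict provides `pagecount`, and it assumes the spooler already has some way to send a small job and read back the reply; that transport is not shown, and the function names are made up for illustration.

```python
import re

# PostScript job that asks the printer to report its lifetime page counter.
# Assumption: the printer's statusdict defines `pagecount`.
PAGECOUNT_QUERY = "%!\nstatusdict begin pagecount end == flush\n"


def parse_pagecount(reply: str) -> int:
    """Extract the first integer from the printer's reply to PAGECOUNT_QUERY."""
    match = re.search(r"\d+", reply)
    if match is None:
        raise ValueError(f"no page count in reply: {reply!r}")
    return int(match.group())


def pages_printed(reply_before: str, reply_after: str) -> int:
    """Pages used by a job: counter after the job minus counter before it."""
    return parse_pagecount(reply_after) - parse_pagecount(reply_before)
```

The spooler would send PAGECOUNT_QUERY once before the user's job and once after it, then charge for the difference. This only tells you the cost after the paper has come out, which is the limitation the rest of the thread tries to work around.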
roy@phri.nyu.edu (Roy Smith) (07/31/90)
tdd@convex.cl.msu.edu (Thomas D. Davis) wants:
>> software that will read a PostScript file and do nothing more
>> than determine the number of pages it will print.

henry@zoo.toronto.edu (Henry Spencer) says:
> Basically, it can't be done [...] read the printer's page count before
> and after, and subtract -- *it* knows how many pages it printed.

While Henry's answer is certainly the correct one, it is also almost certainly not the one Thomas wants to hear. An alternative approach is to simply look for a "%%Pages:" line in the PS file and use the number therein. The vast majority of PS-producing programs know how many pages they intend to print (and indeed will print), and are well-behaved enough to supply this count for people to use as a PS comment.

Of course, if you go this route, you have to understand that the %%Pages: line might be missing, or wrong (either through accident or malicious intent by somebody trying to avoid paying per-page printing charges). Use it at your own peril.
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane? Did you say arcane? It wouldn't be Unix if it wasn't arcane!"
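Roy's suggestion might be sketched as follows. The function name is hypothetical, and the sketch assumes the file follows the usual structuring-comment form, where the header either gives a number or says (atend) and defers the number to the end of the file. As Roy warns, the value is only as trustworthy as whoever wrote the file.

```python
import re
from typing import Optional

# Matches "%%Pages: 12" or "%%Pages: (atend)" at the start of a line.
PAGES_RE = re.compile(r"^%%Pages:[ \t]*(\(atend\)|\d+)", re.MULTILINE)


def declared_pages(ps_text: str) -> Optional[int]:
    """Return the page count declared by %%Pages:, or None if it is absent.

    If the first %%Pages: line says (atend), the last numeric %%Pages:
    line later in the file is used instead.
    """
    values = PAGES_RE.findall(ps_text)
    if not values:
        return None
    if values[0] != "(atend)":
        return int(values[0])
    numeric = [v for v in values[1:] if v != "(atend)"]
    return int(numeric[-1]) if numeric else None
```

A result of None means the comment is missing or never resolved, in which case the spooler has to fall back on some other estimate or refuse the job.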
mitch@hq.af.mil (Mitch Wright) (07/31/90)
/*
* In article <1990Jul26.173633.13911@msuinfo.cl.msu.edu>
* tdd@convex.cl.msu.edu (Thomas D. Davis) writes:
*
*/
Thomas> looking for software that will read a PostScript file and do nothing
Thomas> more than determine the number of pages it will print. Does anyone
Thomas> know of such a beast?
For a ``guess-timate'', I do an egrep(1) for the word "showpage". Many
times this is set to another function name that does the showpage along with
other routines. Obviously, you would have to search for that string instead.
The number of pages **should** then be:
(# found) -1* -1**
*= Subtract one if the string is a declared routine
**= Subtract one if there is a default header page
WARNING:: This will ONLY be a rough estimate!
--
..mitch
mitch@hq.af.mil (Mitch Wright) | The Pentagon, 1B1046 | (202) 695-0262
The two most common things in the universe are hydrogen and stupidity,
but not necessarily in that order.
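Mitch's guess-timate can be written down as a short sketch. The function and parameter names are invented here; the two flags correspond to his two footnotes, and the caller has to supply the routine name by hand, exactly as with egrep.

```python
import re


def estimate_pages(ps_text: str, page_word: str = "showpage",
                   is_declared_routine: bool = False,
                   has_header_page: bool = False) -> int:
    """Rough estimate: occurrences of page_word minus Mitch's two corrections."""
    # Drop % comments so commented-out calls are not counted.  This is
    # naive: it also cuts a line at a % that sits inside a string.
    code = re.sub(r"%.*", "", ps_text)
    # Whole-word matches only; "/EP" in a definition still counts as a hit,
    # which is why the definition is subtracted below.
    hits = len(re.findall(r"\b" + re.escape(page_word) + r"\b", code))
    estimate = hits - int(is_declared_routine) - int(has_header_page)
    return max(estimate, 0)
```

For a file whose prologue wraps showpage in a routine called EP, the call would be estimate_pages(text, "EP", is_declared_routine=True). As the warning above says, this remains a rough estimate.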
skdutta@cs.tamu.edu (Saumen K Dutta) (07/31/90)
tdd@convex.cl.msu.edu (Thomas D. Davis) wants:
>> software that will read a PostScript file and do nothing more
>> than determine the number of pages it will print.

henry@zoo.toronto.edu (Henry Spencer) says:
> Basically, it can't be done [...] read the printer's page count before
> and after, and subtract -- *it* knows how many pages it printed.

roy@phri.nyu.edu (Roy Smith) writes:
> An alternative approach
>is to simply look for a "%%Pages:" line in the PS file and use the number
>therein. The vast majority of PS-producing programs know how many pages
>they intend to print (and indeed will print), and are well-behaved enough
>to supply this count for people to use as a PS comment.
>
> Of course, if you go this route, you have to understand that the
>%%Pages: line might be missing, or wrong (either through accident or
>malicious intent by somebody trying to avoid paying per-page printing
>charges). Use it at your own peril.

If you think there could be problems with the %%Pages: comment in the PostScript file, either because the file does not conform to the standard or because you suspect malicious intent, you could also check for the control character ^L in the PostScript file, which is usually used for a page break.
--
     _                                  ||Internet: skdutta@cssun.tamu.edu
    ( /_ _ / --/-/- _                   ||Bitnet : skd8107@tamvenus.bitnet
  __)_/(_____(_/_(_/_(_(__(_/_______    ||Uucp : uunet!cssun.tamu.edu!skdutta
                          ..            ||Yellnet: (409) 846-8803
jgreely@oz.cis.ohio-state.edu (J Greely) (07/31/90)
In article <1990Jul30.153325.15342@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>The usual technique is to read the printer's page count before and after,
>and subtract -- *it* knows how many pages it printed.

Which is great for collecting usage information, but something I've been considering is changing my spooler to redefine showpage before each file is sent to the printer, aborting if it attempts to print more than X pages. Nice when you have students who send their output to a printer without checking first.
--
J Greely (jgreely@cis.ohio-state.edu; osu-cis!jgreely)
glenn@heaven.woodside.ca.us (Glenn Reid) (07/31/90)
In article <6988@helios.TAMU.EDU> skdutta@cs.tamu.edu (Saumen K Dutta) writes:
>If you think there could be problems with the %Pages in postscript
>file due no non-conformance of standards or suspecting malicious
>intent you could also check up for the control character ^L in the
>postscript file which is usually used for pagebreak

In five years of looking at PostScript files I have never seen a control-L character embedded in a file, nor should there be one. What application or system do you use such that a ^L is "usually used for pagebreak," out of curiosity?

I can't say I'd recommend that technique, although I guess it can't hurt to look for ^L.

I have a feeling there is a midway-point between the "you can't count pages without an interpreter" answer and the "just look at the %%Pages comment" answers, although I'll admit that I haven't actually tried it.

In reality, almost every PostScript driver defines some procedure in the prologue of the file called "EndPage" or "EP" or "SP" or whatever, and there is one invocation of that procedure for each page printed. With a moderately simple parser, it should be possible to figure out the name of the procedure that contains "showpage" in it somewhere, then go looking for instances of that procedure name in the rest of the document. Example:

	/BP { /page_save save def } def		% begin-page
	/EP { page_save restore showpage } def	% end-page
	...
	BP
	% lots of stuff
	EP

Any volunteers to write the simple parser? If it can recognize the basic syntax

	/name { { stuff } }

I suspect that you could find the procedure that contained "showpage" fairly easily. Any volunteers? Want to endear yourself to the PostScript community with a great page-counting program?

[ Want to earn real $$ in your spare time sharpening saw blades? ]

(Glenn) cvn
--
 Glenn Reid                        PostScript/NeXT consultant
 glenn@heaven.woodside.ca.us       Independent Software Developer
 ..{adobe,next}!heaven!glenn       415-851-1785
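One way the "moderately simple parser" might look, as a sketch only. It assumes procedures are defined in the `/name { ... }` form Glenn shows, treats any procedure whose body mentions showpage (or another such procedure) as a page-ending procedure, and counts top-level uses of those names. The function name is made up, and braces or percent signs inside strings will confuse it.

```python
import re

# A token is a single brace, or a run of characters with no space or brace.
TOKEN_RE = re.compile(r"[{}]|[^\s{}]+")


def count_pages(ps_text: str) -> int:
    """Count top-level showpage calls and calls to procedures that wrap it."""
    code = re.sub(r"%.*", "", ps_text)   # drop comments (naive about strings)
    wrappers = set()   # names of procedures whose body reaches showpage
    pages = 0
    depth = 0          # current { } nesting depth
    pending = None     # /name seen at depth 0 immediately before a "{"
    current = None     # name of the procedure whose body we are inside
    for tok in TOKEN_RE.findall(code):
        if tok == "{":
            if depth == 0:
                current = pending        # body of "/name { ... }" starts here
            depth += 1
        elif tok == "}":
            depth = max(depth - 1, 0)
            if depth == 0:
                current = None
        elif depth > 0:
            # Inside a named procedure body: does it reach showpage?
            if current is not None and (tok == "showpage" or tok in wrappers):
                wrappers.add(current)
        elif tok == "showpage" or tok in wrappers:
            pages += 1                   # executed at top level: one page
        pending = tok[1:] if depth == 0 and tok.startswith("/") else None
    return pages
```

Run on Glenn's BP/EP example with two pages of content, this counts two invocations of EP. It does not see showpage hidden in anonymous procedures such as loop bodies, which is part of why the interpreter-only answer is still the strictly correct one.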
skdutta@cs.tamu.edu (Saumen K Dutta) (07/31/90)
In article <226@heaven.woodside.ca.us> glenn@heaven.woodside.ca.us (Glenn Reid) writes:
>
>In five years of looking at PostScript files I have never seen a control-L
>character embedded in a file, nor should they be. What application or
>system do you use such that a ^L is "usually used for pagebreak," out
>of curiosity?
>
>I can't say I'd recommend that technique, although I guess it can't hurt
>to look for ^L.
>

Well, I saw them in some of the PostScript output from the dvips translator. It was just at the place where you expect it to be! I am not very sure whether it was inherited from the dvi file which was translated to the PostScript file. Anyway, I am enclosing here some part of the file for your info.

%!
%%Dimensions: 0 0 612 792
%%Title: dvips test
%%CreationDate: Tue Jul 31 01:48:38 1990
%%Creator: skdutta and [TeX82 DVI Translator Version 2.10b for PostScri-
%-pt [Apple LaserWriter laser printer]
%%Pages: (atend)
%%BugHistory: Incorporates Allan Hetzel's 31-Oct-85 DARPA LASER-LOVERS -
%-PS Version 23.0 X-on/X-off bug workaround
%%EndComments
%%EndProlog
.......
......
^L%%Page: 2 2
.......
.....
^L%%Page: 3 3
......
....
etc.

The ^L has been retyped as caret-L just to avoid a pagebreak in this :->
--
     _                                  ||Internet: skdutta@cssun.tamu.edu
    ( /_ _ / --/-/- _                   ||Bitnet : skd8107@tamvenus.bitnet
  __)_/(_____(_/_(_/_(_(__(_/_______    ||Uucp : uunet!cssun.tamu.edu!skdutta
                          ..            ||Yellnet: (409) 846-8803
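For files like the excerpt above, where the header says (atend), a spooler could count the per-page %%Page: comments directly, and count form feeds alongside for comparison. This is a sketch with made-up function names; it assumes each page is introduced by a %%Page: line, optionally preceded by a ^L on the same line as in the excerpt.

```python
import re

# "%%Page: <label> <ordinal>" at the start of a line, optionally after ^L.
# "%%Pages:" does not match because the colon must follow "Page" directly.
PAGE_COMMENT_RE = re.compile(r"^\f?%%Page:[ \t]*\S+[ \t]+\d+", re.MULTILINE)


def count_page_comments(ps_text: str) -> int:
    """Number of %%Page: comments found in the file."""
    return len(PAGE_COMMENT_RE.findall(ps_text))


def count_form_feeds(ps_text: str) -> int:
    """Number of ^L (form feed) characters found in the file."""
    return ps_text.count("\f")
```

The two counts need not agree: in the excerpt the ^L appears before pages 2 and 3, so a file laid out that way would have one fewer form feed than pages, and many generators emit no form feeds at all.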
lau@kings.wharton.upenn.edu (Yan K. Lau) (07/31/90)
In article <226@heaven.woodside.ca.us> glenn@heaven.UUCP (Glenn Reid) writes:
>
>Any volunteers to write the simple parser? If it can recognize the
>basic syntax
>
> /name { { stuff } }
>
>I suspect that you could find the procedure that contained "showpage"
>fairly easily. Any volunteers? Want to endear yourself to the
>
>(Glenn) cvn

I suspect most applications generate "well-behaved" code on which a simple parser can be used to count pages. But users can get very devious if they write their own code or change the code of a PS file generated by an application.

Two things come to mind off the top of my head. The user can replace the showpage with a copypage and erasepage. Also, the parser would need to deal with #copies. Looks like the simple parser needs a little more work.

Yan.

 )~  Yan K. Lau   lau@kings.wharton.upenn.edu   The Wharton School
~/~  -Sheenaphile-   128.91.11.233   University of Pennsylvania
/\   God/Goddess/All that is -- the source of love, light and inspiration!
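The two loopholes Yan mentions can at least be detected in the common case. The sketch below uses hypothetical helper names and only understands a literal `/#copies N def`; a computed value, or a hand-written loop around copypage, would slip past it.

```python
import re

# Literal assignments of the form "/#copies 3 def".
COPIES_RE = re.compile(r"/#copies\s+(\d+)\s+def\b")


def declared_copies(ps_text: str) -> int:
    """Largest literal value assigned to #copies, or 1 if none is found."""
    values = [int(n) for n in COPIES_RE.findall(ps_text)]
    return max(values, default=1)


def uses_copypage(ps_text: str) -> bool:
    """True if the word copypage appears outside % comments."""
    code = re.sub(r"%.*", "", ps_text)
    return re.search(r"\bcopypage\b", code) is not None


def estimated_sheets(page_count: int, ps_text: str) -> int:
    """Scale a page count from some other estimator by the #copies value."""
    return page_count * declared_copies(ps_text)
```

Taking the largest value seen is a deliberately pessimistic choice for billing purposes; a file that sets #copies differently for different pages would be overcharged.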
skdutta@cs.tamu.edu (Saumen K Dutta) (08/01/90)
In article <226@heaven.woodside.ca.us> glenn@heaven..... writes
>
>In reality, almost every PostScript driver defines some procedure in the
>prologue the file called "EndPage" or "EP" or "SP" or whatever, and
>there is one invocation of that procedure for each page printed. With
>a moderately simple parser, it should be possible to figure out the
>name of the procedure that contains "showpage" in it somewhere, then
>go looking for instances of that procedure name in the rest of the
>document. Example:
>
> /BP { /page_save save def } def % begin-page
> /EP { page_save restore showpage } def % end-page
> ...
> BP
> % lots of stuff
> EP
>
>Any volunteers to write the simple parser? If it can recognize the
>basic syntax
>
> /name { { stuff } }
>
>I suspect that you could find the procedure that contained "showpage"
>fairly easily. Any volunteers? Want to endear yourself to the
>PostScript community with a great page-counting program?
>

Based on the above idea I tried to write a simple scanner (I won't call it a parser) as a script for the widely available "awk". I ran it on one or two outputs and it works well. I request anybody interested to read it, use it and criticize it! I am not responsible for any bugs, though I would like to know about them.
--------------- CUT HERE --------------------------
# Find the procedure that wraps showpage, then count later uses of its name.
BEGIN { found = 0; bracket = 0; procname = 0; Pages = 0 }
{
    for (i = 1; i <= NF; i++) {
        # Skip the rest of the line once a comment starts.
        if (substr($i, 1, 1) == "%") { next }
        # A line that begins with /name at brace depth 0 starts a definition;
        # keep updating the name until showpage has been seen.
        if (substr($1, 1, 1) == "/") {
            if ((bracket == 0) && (found == 0)) {
                procname = substr($1, 2, length($1) - 1)
            }
        }
        if (substr($i, 1, 1) == "{") { bracket++ }
        if (substr($i, length($i), 1) == "}") { bracket-- }
        if ($i == "showpage") { found = 1 }
        # Once showpage has been seen, each use of the name is one page.
        if ((found == 1) && ($i == procname)) { Pages++ }
    }
}
END { printf("Pages : %d\n", Pages) }
----------- CUT HERE ---------------------------------

To run it, put it into a file and run the command

	awk -f <program-file> <postscript-file>
--
     _                                  ||Internet: skdutta@cssun.tamu.edu
    ( /_ _ / --/-/- _                   ||Bitnet : skd8107@tamvenus.bitnet
  __)_/(_____(_/_(_/_(_(__(_/_______    ||Uucp : uunet!cssun.tamu.edu!skdutta
                          ..            ||Yellnet: (409) 846-8803
lee@sq.sq.com (Liam R. E. Quin) (08/02/90)
Glenn Reid <glenn@heaven.UUCP> writes:
>I have a feeling there is a midway-point between the "you can't count
>pages without an interpreter" answer and the "just look at the %%Pages
>comment" answers [...]
and then suggests writing a simple parser to look for "showpage".

Some problems with that:
* you'd have to count copypage as well, of course
* you'd also have to take #copies into account (rather harder)
* what about
	1 NCopies 1 { copypage } repeat erasepage
  produced by applications whose programmers didn't know about #copies...?

The original poster wanted to prevent students printing large jobs, presumably either accidentally or intentionally. The former case can be met in most cases by looking for %%Page and #copies.

The deliberate Blatter Of Large Documents (BOLD) can be stopped by threatening death, or stringing up by the ankles... but not by anything else short of interpreting enough of the file completely to determine that it exceeds the Allowed Page Count.

For example, it is clear that Woody and Friends can produce a "cexec" file that calls showpage from machine-code. You couldn't do anything about that. And routines to hex-encode data are easily available.

Perhaps a better way would be to redefine showpage and copypage in the printer at the start of each job to cause an error after MAX$JOB pages.

	/oldshowpage { showpage } load bind etc def
	/showpage {
	    NumberOfPages LotsAndLots gt
	    { please ignore to EOF }
	    { oldshowpage }
	    ifelse
	}

this relies on the undocumented PostScript command "please ignore to EOF" which does exactly what it says [0.75 :-)].

Lee
--
Liam R. E. Quin, lee@sq.com, {utai,utzoo}!sq!lee, SoftQuad Inc., Toronto
``He left her a copy of his calculations [...] Since she was a cystologist,
she might have analysed the equations, but at the moment she was occupied
with knitting a bootee.'' [John Boyd, Pollinators of Eden, 217]
glenn@heaven.woodside.ca.us (Glenn Reid) (08/02/90)
In article <1990Aug1.214448.17881@sq.sq.com> lee@sq.sq.com (Liam R. E. Quin) writes:
>Glenn Reid <glenn@heaven.UUCP> writes:
>>I have a feeling there is a midway-point between the "you can't count
>>pages without an interpreter" answer and the "just look at the %%Pages
>>comment" answers [...]
>and then suggests writing a simple parser to look for "showpage".
>Some problems with that:
>* you'd have to count copypage as well, of course
>* you'd also have to take #copies into account (rather harder)
>* what about
> 1 NCopies 1 { copypage } repeat erasepage
> produced by applications whose programmers didn't know about #copies...?
>The original poster wanted to prevent students printing large jobs,
>presumably either accidentally or intentionally. The former case can be
>met in most cases by looking for %%Page and #copies.

Hmmm. If the problem is multiple copies of the same page, then the scheme that I proposed won't do much good, admittedly. I doubt that that comes up very often, but I defer to those who actually needed the page counter to begin with :-)

>Perhaps a better way would be to redefine showpage and copypage in the
>printer at the start of each job to cause an error after MAX$JOB pages.
> /oldshowpage { showpage } load bind etc def
> /showpage {
> NumberOfPages LotsAndLots gt
> { please ignore to EOF }
> { oldshowpage }
> ifelse
> }

This is a good approach, but suffers from all the same defects (it doesn't pay attention to "copypage" and "#copies", for example). It can also be fairly easily defeated by anybody who cares to do so, with "systemdict /showpage get exec".

The advantage of the parser running on the host computer is that you get to decide whether or not to print it at all. For example, the aforementioned "simple parser" could simply REJECT jobs that contained "copypage" or "#copies" anywhere in the body of the document, on the grounds that they aren't necessary and they may be malicious. I liked the awk script that Saumen Dutta posted....

Anyway, I still think that counting instances of "showpage" will work on a large percentage of the world's documents, and the program can be made more and more paranoid and careful as desired.

/Glenn

P.S. While we're at it, what a great idea it would be to put a filter into "lpd" that rejects any PostScript file that contains "systemdict", "erasepage", "72 mul", "#copies" or "Courier" :-)
--
 Glenn Reid                        PostScript/NeXT consultant
 glenn@heaven.woodside.ca.us       Independent Software Developer
 ..{adobe,next}!heaven!glenn       415-851-1785
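The host-side rejection idea could be sketched as a small filter. The token list and function name are assumptions for illustration: copypage and #copies come from the suggestion above, and systemdict is included because of the "systemdict /showpage get exec" bypass. A real spooler would have to tune the list, since legitimate prologues may mention some of these names.

```python
import re
from typing import List, Sequence

# Names that make a job suspect; an illustrative choice, not a vetted policy.
SUSPECT_TOKENS = ("copypage", "#copies", "systemdict")


def rejection_reasons(ps_text: str,
                      suspects: Sequence[str] = SUSPECT_TOKENS) -> List[str]:
    """Return the suspect names that occur in the job, in suspects order."""
    code = re.sub(r"%.*", "", ps_text)   # ignore % comments (naive)
    # Split on whitespace and PostScript delimiters so "/#copies" and
    # "{copypage}" still yield the bare names.
    tokens = set(re.findall(r"[^\s{}\[\]()/<>]+", code))
    return [name for name in suspects if name in tokens]
```

An empty list means the job passes this check and can go on to page counting; a non-empty list gives the spooler something to put in the refusal message. The filter is paranoid by design, so a suspect word inside a printed string also triggers it.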