[comp.lang.postscript] PostScript Grammar

bradlee@cg-atla.UUCP (Rob Bradlee) (07/26/89)

Has anyone seen or created their own grammar to describe PostScript?
How about using yacc and lex to parse PostScript (including comments)?
I'm having a go at it, and would love to hear anyone else's ideas or
efforts.   Thanks in advance.


-- 
Rob Bradlee  w:(508)-658-5600 X5153  h:(617)-944-5595
AGFA Compugraphic Division.    ...!{ima,ulowell,ism780c}!cg-atla!bradlee
200 Ballardvale St.
Wilmington, Mass. 01887           The Nordic Way: Ski till it hurts!

batcheldern@level.dec.com (Ned Batchelder) (07/27/89)

In article <7456@cg-atla.UUCP>, bradlee@cg-atla.UUCP (Rob Bradlee) writes:

> Has anyone seen or created their own grammar to describe PostScript?
> How about using yacc and lex to parse PostScript (including comments)?
> I'm having a go at it, and would love to hear anyone else's ideas or
> efforts.   Thanks in advance.

You could easily use lex to tokenize PostScript, but I don't think a grammar
makes much sense. PostScript is purely token-oriented; beyond the tokens
(and the braces that group procedures), there really isn't much structure.
For example, this is a common construct:

	foo bar gt
	{	this do }
	{	that do }
	ifelse

but in fact, there is no rule that it has to be done this way. I could have
said:

	{	this do }
	{	that do }
	foo bar gt
	3 1 roll
	ifelse

or,

	/baz load
	/quux load
	lic		% (Long Involved Computation)
	foo bar gt
	3 1 roll
	{ ifelse } stopped pop

or any number of other bizarre things. So long as there is a boolean and
two procedures on the stack when the ifelse is executed, it's legal. So
to determine whether you had valid PostScript "syntax", you would have to
interpret the tokens, not just clump them together into a grammar.
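Here is a minimal Python sketch of the point. It is not lex/yacc, and it is not a real PostScript scanner. It has three toy pieces. A tokenizer handles only comments, braces, literal names, and bare words. A grouping pass treats braces as the only nesting. A tiny stack machine implements just `gt`, `roll`, and `ifelse`. The names `tokenize`, `parse`, and `run` are hypothetical, and so are the stand-in definitions for `foo`, `bar`, `this`, `that`, and `do`. The two orderings above parse to differently shaped token lists, yet both take the same branch. The only check that would catch a bad `ifelse` is the operand-type test made at execution time.

```python
import re

# Toy tokenizer: only % comments, braces, /literal names, and bare words.
# Strings, hex strings, and radix numbers are deliberately left out.
TOKEN_RE = re.compile(r"%[^\n]*|[{}]|/?[^\s{}%/()<>\[\]]+")


def tokenize(src):
    """Yield tokens, dropping comments."""
    for m in TOKEN_RE.finditer(src):
        tok = m.group()
        if not tok.startswith("%"):
            yield tok


def parse(tokens):
    """Group tokens into nested lists; braces are the only nesting."""
    out, saved = [], []
    for tok in tokens:
        if tok == "{":
            saved.append(out)
            out = []
        elif tok == "}":
            proc, out = out, saved.pop()
            out.append(proc)  # a Python list stands in for a procedure
        else:
            out.append(tok)
    return out


def run(program, env, ostack):
    """Execute a grouped token list on an operand stack (toy subset)."""
    for item in program:
        if isinstance(item, list):
            ostack.append(item)  # procedures are pushed, not executed
        elif re.fullmatch(r"-?\d+", item):
            ostack.append(int(item))
        elif item == "gt":
            b, a = ostack.pop(), ostack.pop()
            ostack.append(a > b)
        elif item == "roll":  # n j roll
            j, n = ostack.pop(), ostack.pop()
            top = ostack[-n:]
            del ostack[-n:]
            j %= n
            ostack.extend(top[-j:] + top[:-j])
        elif item == "ifelse":
            proc2, proc1, cond = ostack.pop(), ostack.pop(), ostack.pop()
            # The only "syntax" check: operand types, at execution time.
            if not (isinstance(cond, bool) and isinstance(proc1, list)
                    and isinstance(proc2, list)):
                raise TypeError("typecheck in ifelse")
            run(proc1 if cond else proc2, env, ostack)
        elif callable(env.get(item)):
            env[item](ostack)
        else:
            ostack.append(env[item])  # KeyError stands in for undefined
    return ostack


log = []
env = {
    "foo": 2, "bar": 1,                    # hypothetical values
    "this": lambda s: s.append("this"),
    "that": lambda s: s.append("that"),
    "do": lambda s: log.append(s.pop()),   # records which branch ran
}
common = "foo bar gt { this do } { that do } ifelse"
rolled = "{ this do } { that do } foo bar gt 3 1 roll ifelse"
print(parse(tokenize(common)))
# ['foo', 'bar', 'gt', ['this', 'do'], ['that', 'do'], 'ifelse']
run(parse(tokenize(common)), env, [])
run(parse(tokenize(rolled)), env, [])
print(log)  # ['this', 'this']
```

A grammar rule of the form `cond proc proc ifelse` would accept the first program and reject the second, even though both behave identically here.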

And by the way, even at the token level, PostScript can get kind of
tricky, since the PostScript program can take over the reading of the
input from the interpreter (for example, with an image data procedure
built on currentfile and readhexstring). Check out any large image file:

	set_up_the_image
	foo bar baz image
	12d480c7b61a2c3e9f8d7c6a2f1e39d8c723b1987a58745
	% lots more lines of hex stuff...
	120d7a23498762d34a98f7e2c34b6978e62f3d498a76234
	showpage

Even to know that the hex should not be tokenized would require complex
interpretation of the PostScript, since how much data image consumes
depends on operand values computed at run time.
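The next toy sketch, again in Python and hypothetical throughout, shows why the lexer cannot run ahead of the interpreter. The tokenizer and the running program share one input stream. A simplified `image` takes only width, height, and bits per sample, unlike the real five-operand operator. It computes how many bytes it needs from those operands. Then it reads that many bytes' worth of hex digits straight from the stream, roughly as a `currentfile ... readhexstring` data procedure would. `Reader` and `interpret` are illustrative names, not any real API.

```python
import io


class Reader:
    """One input stream shared by the tokenizer and the running program."""

    HEX = "0123456789abcdefABCDEF"

    def __init__(self, text):
        self.f = io.StringIO(text)

    def next_token(self):
        """Skip whitespace, read one word, and consume its delimiter."""
        ch = self.f.read(1)
        while ch and ch.isspace():
            ch = self.f.read(1)
        tok = ""
        while ch and not ch.isspace():
            tok += ch
            ch = self.f.read(1)
        return tok or None

    def read_hex(self, nbytes):
        """Read raw hex digits directly, skipping non-hex characters."""
        out, pair = bytearray(), ""
        while len(out) < nbytes:
            ch = self.f.read(1)
            if not ch:
                break  # input ended before the data was complete
            if ch in self.HEX:
                pair += ch
                if len(pair) == 2:
                    out.append(int(pair, 16))
                    pair = ""
        return bytes(out)


def interpret(text):
    """Toy interpreter: numbers go on a stack; names are recorded."""
    r = Reader(text)
    stack, executed, data = [], [], b""
    while (tok := r.next_token()) is not None:
        if tok.lstrip("-").isdigit():
            stack.append(int(tok))
            continue
        executed.append(tok)
        if tok == "image":  # simplified: width height bits image
            bits, height, width = stack.pop(), stack.pop(), stack.pop()
            nbytes = (width * bits + 7) // 8 * height
            # The byte count is only known now, at run time.
            data = r.read_hex(nbytes)
    return executed, data


src = "4 1 8 image\n12d480c7\nshowpage\n"
executed, data = interpret(src)
print(executed)    # ['image', 'showpage']
print(data.hex())  # 12d480c7
```

A lexer that simply splits on whitespace would instead hand `12d480c7` to the parser as an ordinary token.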

Ned Batchelder, Digital Equipment Corp., BatchelderN@Hannah.DEC.com