johnl@ima.UUCP (08/16/87)
A couple of years ago I implemented a parser for a fairly large language (900+ states) using an SLR(1) parser generator, which I also wrote. (It was a cleaned-up version of one I wrote for a formal languages class; it's written in Common Lisp.) I found that it was necessary to provide two distinct phases of error handling -- error signalling and error recovery.

From what I remember of YACC, it provides only a way to do the error recovery: a way to include special error productions, something like

    <statement> :== <error> <;>

The problem is that once the syntax error is detected, the parser then has to skip ahead past a whole bunch of input tokens until it finds the semicolon, and *then* it will execute the actions associated with the production. If the action you specified was printing a diagnostic message, you've just lost all the context for the message.

For my parser generator (called STACC), I took a slightly different approach. As soon as the parser recognizes it's gotten an invalid token, it calls a user-supplied error signalling function. For my application, I had the tokenizer buffer the last line of input, so that my signalling function could produce a nice message indicating the context of the error. It's also possible to look at the internal state of the parser at this time, if you want to include specific information about what went wrong in your message.

The second phase is error recovery, and it happens much as in YACC:

    (1) the parse stack is popped until it finds a state with an error transition;
    (2) the error state is pushed;
    (3) tokens are read and discarded until one is found that has a valid transition; and
    (4) parsing continues normally from there.

Incidentally, I would never have undertaken this project if I had had to write the parser by hand.
Even with the time it took me to write the parser generator, I estimate it would have taken me two or three times longer to do it by hand, and the resulting code would have been much larger and more difficult to maintain. (It's particularly easy to implement parser generators in Lisp, given its ability to manipulate code fragments as data objects, and the presence of symbols as a primitive datatype.)

-Sandra Loosemore
sandra@cs.utah.edu, sandra@utah-cs.uucp
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.ARPA
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | cca}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers. Meta-mail to ima!compilers-request
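
To make the two phases described in the post concrete, here is a minimal sketch in Python (not the Common Lisp of STACC, whose source is not shown here). Everything in it is an assumption made for illustration: the toy grammar, the hand-built SLR(1) tables, and the names `parse`, `tokenize`, `report` and `on_error` are all hypothetical. It only tries to show the ordering the post argues for: the signalling function runs the moment an invalid token is seen, with the input line and parser stack still available, and the YACC-style recovery steps run afterwards.

```python
# Toy grammar, hypothetical and chosen only for illustration:
#   1: list -> list stmt      2: list -> stmt
#   3: stmt -> ID = NUM ;     4: stmt -> error ;
PRODUCTIONS = {1: ("list", 2), 2: ("list", 1), 3: ("stmt", 4), 4: ("stmt", 2)}


def reduce_on_follow(rule):
    # A statement or list can only be followed by another statement or end of input.
    return {"ID": ("reduce", rule), "$": ("reduce", rule)}


# Hand-built SLR(1) ACTION table: state -> {token kind -> (operation, argument)}.
ACTION = {
    0: {"ID": ("shift", 2), "error": ("shift", 3)},
    1: {"ID": ("shift", 2), "error": ("shift", 3), "$": ("accept", None)},
    2: {"=": ("shift", 8)},
    3: {";": ("shift", 6)},
    4: reduce_on_follow(1),
    5: reduce_on_follow(3),
    6: reduce_on_follow(4),
    7: reduce_on_follow(2),
    8: {"NUM": ("shift", 9)},
    9: {";": ("shift", 5)},
}
GOTO = {(0, "list"): 1, (0, "stmt"): 7, (1, "stmt"): 4}


def tokenize(line):
    """Split on whitespace and classify each word; append the end marker."""
    tokens = []
    for word in line.split():
        if word in ("=", ";"):
            tokens.append((word, word))
        elif word.isdigit():
            tokens.append(("NUM", word))
        else:
            tokens.append(("ID", word))
    return tokens + [("$", "")]


def parse(line, on_error):
    """Return the list of rule numbers reduced, or None if recovery fails."""
    tokens = tokenize(line)
    stack, pos, reductions = [0], 0, []
    while True:
        action = ACTION[stack[-1]].get(tokens[pos][0])
        if action is None:
            # Phase 1: signal right away, while the context is still intact.
            expected = sorted(k for k in ACTION[stack[-1]] if k != "error")
            on_error(line, tokens, pos, list(stack), expected)
            # Phase 2, step 1: pop until a state has an error transition.
            while stack and "error" not in ACTION[stack[-1]]:
                stack.pop()
            if not stack:
                return None
            # Step 2: push the error state.
            stack.append(ACTION[stack[-1]]["error"][1])
            # Step 3: discard tokens until one has a valid transition.
            while tokens[pos][0] not in ACTION[stack[-1]]:
                if tokens[pos][0] == "$":
                    return None
                pos += 1
            continue  # Step 4: resume normal parsing.
        op, arg = action
        if op == "shift":
            stack.append(arg)
            pos += 1
        elif op == "reduce":
            lhs, length = PRODUCTIONS[arg]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(arg)
        else:  # accept
            return reductions


def report(line, tokens, pos, stack, expected):
    # A sample signalling function: it sees the whole line and the parser state.
    print(f"syntax error in {line!r}: unexpected {tokens[pos][1]!r} "
          f"(token {pos}), expected one of {expected}, stack {stack}")


if __name__ == "__main__":
    parse("x = 1 ; y = = 2 ; z = 3 ;", report)
```

With the input in the last line, `report` is called once, at the second `=` of the middle statement, before any tokens are discarded; recovery then skips to the next `;`, reduces by the error production, and the third statement parses normally. Passing the stack and the expected-token list to the callback is one way to expose "the internal state of the parser"; a real generator might expose more or less than this.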