[comp.compilers] recursive-descent

johnl@ima.UUCP (08/16/87)

A couple of years ago I implemented a parser for a fairly large language
(900+ states) using an SLR(1) parser generator, which I also wrote.  (It
was a cleaned-up version of one I wrote for a formal languages class; 
it's written in Common Lisp.)  I found that it was necessary
to provide two distinct phases of error handling -- error signalling and
error recovery.

From what I remember of YACC, it provides only a way to do the error
recovery:  a way to include special error productions, something like

      <statement> ::= <error> <;>

The problem is that once a syntax error is detected, the parser has to
skip ahead past a whole bunch of input tokens until it finds the
semicolon, and only *then* does it execute the actions associated with
the production.  If the action you specified was printing a diagnostic
message, you've just lost all the context for the message.

For my parser generator (called STACC), I took a slightly different
approach.  As soon as the parser recognizes it's gotten an invalid
token, it calls a user-supplied error signalling function.  For my
application, I had the tokenizer buffer the last line of input, so that
my signalling function could produce a nice message indicating the
context of the error.  It's also possible to look at the internal
state of the parser at this time, if you want to include specific
information about what went wrong in your message.  The second phase is
error recovery, and it happens much as in YACC: (1) the parse stack is
popped until it finds a state with an error transition; (2) the error
state is pushed; (3) tokens are read and discarded until one is found
that has a valid transition; and (4) parsing continues normally from
there.
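
To make the two phases concrete, here is a rough sketch in Python (not
STACC itself, which was written in Common Lisp).  Everything in it is an
assumption made for illustration: the toy grammar, the hand-built
tables, the use of default reductions, and the names parse and
signal_error.  The point is only the ordering: the signalling callback
runs at the moment the bad token is seen, before any input is thrown
away, and the four recovery steps follow.

```python
# Hypothetical tables for a toy grammar (not taken from STACC or YACC):
#   list -> list stmt | stmt
#   stmt -> ID ';'    | error ';'
SHIFT = {
    (0, "ID"): 2, (0, "error"): 3,
    (1, "ID"): 2, (1, "error"): 3,
    (2, ";"): 6, (3, ";"): 7,
}
# state -> (left-hand side, length of right-hand side); these states
# reduce on any lookahead (default reductions), an assumed simplification.
REDUCE = {4: ("list", 1), 5: ("list", 2), 6: ("stmt", 2), 7: ("stmt", 2)}
GOTO = {(0, "list"): 1, (0, "stmt"): 4, (1, "stmt"): 5}
ACCEPT = (1, "$")


def parse(tokens, signal_error):
    """Parse (kind, text) pairs; return True if the accept state is reached."""
    tokens = list(tokens) + [("$", "")]
    stack, pos = [0], 0
    while True:
        state = stack[-1]
        kind, text = tokens[pos]
        if state in REDUCE:
            lhs, length = REDUCE[state]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])
        elif (state, kind) in SHIFT:
            stack.append(SHIFT[(state, kind)])
            pos += 1
        elif (state, kind) == ACCEPT:
            return True
        else:
            # Phase 1: signal immediately, while the offending token and
            # the parse stack are still available to the callback.
            signal_error(pos, kind, text, tuple(stack))
            # Phase 2, step 1: pop to a state with an error transition.
            while stack and (stack[-1], "error") not in SHIFT:
                stack.pop()
            if not stack:
                return False
            # Step 2: push the error state.
            stack.append(SHIFT[(stack[-1], "error")])
            # Step 3: discard tokens until one has a valid transition.
            while (stack[-1], tokens[pos][0]) not in SHIFT:
                if tokens[pos][0] == "$":
                    return False  # ran out of input while recovering
                pos += 1
            # Step 4: fall through; the main loop resumes normally.
```

Given the token stream for "a ; b c ; d ;", the callback is invoked once,
at the second identifier of the middle statement, and the parse then
resumes at the following semicolon.  A real signalling function would
presumably use the position to print the buffered input line.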

Incidentally, I would never have undertaken this project if I had had
to write the parser by hand.  Even with the time it took me to write
the parser generator, I estimate it would have taken me two or three
times longer to do it by hand, and the resulting code would have been
much larger and more difficult to maintain.  (It's particularly easy to
implement parser generators in Lisp, given its ability to manipulate
code fragments as data objects, and the presence of symbols as a
primitive datatype.)

-Sandra Loosemore
sandra@cs.utah.edu, sandra@utah-cs.uucp