[comp.compilers] More on yacc error reporting

drt@notecnirp.Princeton.EDU (David Tarditi) (08/10/89)
   The previous article by Heirach may contain some errors.  The principal
error is the confusion of possible lookahead sets for a reduction in a state
with the set of expected tokens.

  The possible lookaheads for a reduction may be larger than the set of
expected tokens, given the left context up to the point of the error in the
parse. This is because Yacc is an LALR parser and combines states containing
identical sets of items.  Thus the lookahead for a reduction may contain
some terminals for which a reduction is possible, but which will be detected
as an error later in the parse before the terminal is even shifted.

  This problem exists even without default reductions being present.  Default
reductions merely add the complication, as mentioned before, of ending up
in a state where possible shifts have been lost. 

  Here is an example showing how the possible lookaheads for a reduction
may be larger than the expected tokens:

Consider the following example:

	S -> A
	S -> a A b
	A -> x

The LALR(1) set of items is

  (1)	S -> .A	 , $		goto on A to (2)
	S -> .a A b, $		shift on a to (4)
	A -> .x	, $		shift  on x to (3)

  (2)   S -> A.  , $		reduce on $

  (3)   A -> x.	 , b/$		reduce on b or $
 
  (4)   S -> a .A b , $		goto on A to (5)
	A -> .x , b		shift on x to (3)
	
  (5)   S -> a A . b , $	shift on b to (6)
	
  (6)  S -> a A b., $		reduce on $

Consider the input string:

	x q

The actions would be

	shift on x from state 1 to state 3
	error in state 3  - possible lookahead is $ or b, but actually only $

 Now look at the input string:

	a x q

	shift on a from state 1 to state 4
	shift on x from state 4 to state 3
	error in state 3 - possible lookahead is $ or b, but actually only b

  Without default reductions, the set of expected tokens is easy to
calculate.  Take each token in the lookahead, and parse until either the
token is shifted or an error is encountered (whichever happens first).   If an
error is encountered, it is not possible to shift the token here, and the
token is not expected.

  Heirich does have the right idea for solving the problem with default
actions. Grab the state we are in before any reduce actions
called for by the current terminal are performed.   This can be done by
grabbing the state immediately after yylex returns a token, as Heirich points
out.

  When an error is detected, compute a list of states that would be
passed through while performing default reductions.  Start with the
saved state and perform default reductions until none are possible.
Then take the list and combine all the possible tokens which could be
shifted in each state to get a list of expected tokens.

  This requires adding code to save the stack for the LR parser and
to turn off the execution of semantic actions during the test parsing.
-------------------------------
David Tarditi	(drt@notecnirp.princeton.edu)
-- 
--
Send compilers articles to compilers@ima.isc.com or, perhaps, Levine@YALE.EDU
{ decvax | harvard | yale | bbn }!ima.  Meta-mail to ima!compilers-request.
Please send responses to the author of the message, not the poster.