[comp.compilers] Parser error detection and recovery

georgev@fiu.edu (Vincent George) (03/22/91)

I was interested in any published research material dealing with syntax
error detection and recovery for LL(k) or LR(k) parsers. I am really
interested in the "recovery" methodologies employed by parser generators.

Also, in regards to the very popular YACC parser generator, I was
interested in any research done in the area of maximizing the error
detection and recovery capabilities of a YACC generated parser by managing
the "proper" placement of error productions in a grammar.

Please direct any information and/or comments to:

			 georgev@fiu.edu

Thanks in advance.   Vince George.

[There has been considerable discussion of this in the past.  Yacc's
default reductions make error recovery hard, since by the time it realizes
there's an error it may have lost much of the context that one would want
to use to recover from the error. -John]
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

snyder@CHILDE.CS.NYU.EDU (Kirk Snyder) (03/23/91)

This is a response to the request by Vincent George for information on
error recovery in LR parser generators.
                                          
A reasonable place to start in looking for information about error
recovery is `A Practical Method for Constructing Efficient LALR(k) Parsers
with Automatic Error Recovery', by Phillippe Charles.  This is his Ph.D.
thesis at NYU.  Phillippe just defended a couple of weeks ago, so it might
be tough to get his thesis right away.  I know he submitted it to the
recording office, but I don't know how long it takes to get from there to
University Microfilms.

Phillippe's work includes recognizing errors at the earliest possible
point even if default rules are used, and automatic recognition of scopes
without adding error productions to the grammar.  Phillippe's method has
been implemented, but at the moment the implementation is IBM proprietary.
I've begun working on a non-proprietary version, but it's not a high
priority project for me.  If you'd like me to let you know when it's
finished send me email.

-- Kirk Snyder
   snyder@cs.nyu.edu
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

arnold@audiofax.com (Arnold Robbins) (03/23/91)

In article <9103211951.AA01615@fiu.edu> georgev@fiu.edu (Vincent George) writes:
>I was interested in any published research material dealing with syntax
>error detection and recovery for LL(k) or LR(k) parsers. I am really
>interested in the "recovery" methodologies employed by parser generators.

Check out either "Crafting A Compiler", or the newer "Crafting A Compiler
With C", both by Fischer & LeBlanc for good discussions of error recovery
in both LL and LR parsers.  Alas, I know of no Yacc compatible parser
generator using any of the methods discussed.

Disclaimer: I did the C code for "Crafting A Compiler With C".
-- 
Arnold Robbins				AudioFAX, Inc.
2000 Powers Ferry Road, #200 / Marietta, GA. 30067    
INTERNET: arnold@audiofax.com Phone:   +1 404 618 4281
UUCP:	  emory!audfax!arnold Fax-box: +1 404 618 4581
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

rekers@cwi.nl (Jan Rekers) (03/25/91)

In article <9103211951.AA01615@fiu.edu>, georgev@fiu.edu (Vincent
George) writes:
|> I was interested in any published research material dealing with syntax
|> error detection and recovery for LL(k) or LR(k) parsers. I am really
|> interested in the "recovery" methodologies employed by parser generators.

You might be interested in the following paper by Philippe Charles: An
LR(k) Error Diagnosis and Recovery Method. He describes a very
satisfactory method of error recovery, purely based on the grammar.

The paper can be found in the proceedings of the 2nd international
workshop on parsing technologies, Cancun, Mexico, February 1991,
Association for Computational Linguistics.

The email adress of Philippe Charles is charles@ibm.com

Jan Rekers (rekers@cwi.nl)   Centrum voor Wiskunde en Informatica (CWI)
			     P.O. Box 4079, 1009 AB Amsterdam, The Netherlands
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

djones@megatest.com (Dave Jones) (03/26/91)

>From article <9103211951.AA01615@fiu.edu>, by georgev@fiu.edu (Vincent George):

> [There has been considerable discussion of this in the past.  Yacc's
> default reductions make error recovery hard, since by the time it realizes
> there's an error it may have lost much of the context that one would want
> to use to recover from the error. -John]

The default reductions have never given me any trouble. As you will
recall, they do make it very tedious to list all the legal tokens when an
error occurs.

However, the original yacc has a bug related to default-reductions which
makes proper error-recovery impossible: It creates states with default
reductions and a shift-action on "error". The bug is a typo. To fix it,
find the following line in file y3.c 


		if(temp1[1] > 0) lastred = 0;

Change it to read

		if(temp1[2] > 0) lastred = 0;
[Well, what do you know. -John]
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

scott@bbxsda.UUCP (Scott Amspoker) (03/29/91)

In article <16273@prometheus.megatest.UUCP> djones@megatest.com (Dave Jones) writes:
>However, the original yacc has a bug related to default-reductions which
>makes proper error-recovery impossible: It creates states with default
>reductions and a shift-action on "error". The bug is a typo. ...

That might explain some problems I've been having.  Can somebody post (or
mail) a short yacc file that demonstrates the problem?  I need to determine
if the yacc I am using has this bug.

-- 
Scott Amspoker
Basis International, Albuquerque, NM
(505) 345-5232
unmvax.cs.unm.edu!bbx!bbxsda!scott
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

djones@decwrl.dec.com (Dave Jones) (04/02/91)

>From article <1796@bbxsda.UUCP>, by scott@bbxsda.UUCP (Scott Amspoker):
> In article <16273@prometheus.megatest.UUCP> djones@megatest.com (Dave Jones) writes:
>>However, the original yacc has a bug related to default-reductions which
>>makes proper error-recovery impossible: It creates states with default
>>reductions and a shift-action on "error". The bug is a typo. ...
> 
> That might explain some problems I've been having.  Can somebody post (or
> mail) a short yacc file that demonstrates the problem?  I need to determine
> if the yacc I am using has this bug.

You bet.

Before I get into that, I would like to thank Sun Microsystems. I reported
the bug to them a year or two ago, and in the letter explained how to
change the "1" into a "2" to make it work. Knowing as I do about priorities
at computer companies and "hot" projects, I expected that to be the
end of it; I never expected to see it fixed, even though it was a one-
character change to the source file. We have a source licence from BSD,
so I fixed it myself. But just now I checked out the Sun version, and
by golly, they fixed it. Amazing.

It is also a little amazing that in the years from 1983 and 1989,
the bug went unfixed and probably unnoticed. I guess the grammar-writers just
shook their heads and figured there was something about it they did not
understand.

Luckily, when I researched this before I made up and saved a test case.
Here 'tis.

botch.y
^^^^ snip ^^^^ snip ^^^^ snip ^^^^ snip ^^^^ snip ^^^^ snip ^^^^ snip ^^^^

	%token SYM
	%%
	start  : list
	       ;

	list : /* empty */
	     | list SYM
	     | list error
	     ;

^^^^ snip ^^^^ snip ^^^^ snip ^^^^ snip ^^^^ snip ^^^^ snip ^^^^ snip ^^^^


To test a yacc as to whether it has the bug, do a "yacc -vd botch.y",
and look at the y.output file. If you see a state which can shift "error",
but has a default reduction, you have a buggy yacc.

   Example of BUGGY y.output:

    state 2
        start :  list_    (1)
        list :  list_SYM
        list :  list_error 

        error  shift 4
        SYM  shift 3
        .  reduce 1

It would appear that it can shift to state 4 on a (synthesized) error token,
but in fact it will do the default reduction (reduction 1) before
the error-token can be synthesized by the parser.

A properly working yacc would produce the following:

    state 2
        start :  list_    (1)
        list :  list_SYM
        list :  list_error 

        $end  reduce 1
        error  shift 4
        SYM  shift 3
        .  error

Notice that the default action is "error", not a reduction. The
error-action synthesizes the error-token, which can then be shifted.
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

julia@cs.warwick.ac.uk (Julia Dain) (04/10/91)

This is another response to the request by Vincent George for information
on error recovery in LR parser generators.

Sorry it is rather late, I have not been able to read news for a few
weeks.

My PhD thesis `Automatic Error Recovery for LR Parsers in Theory and
Practice' (University of Warwick, Coventry, UK, 1990) contains
descriptions of 2 methods for automatic error recovery.  These are not
based on error productions.  Both were implemented for yacc and used to
build several different parsers, including for C and Pascal, which were
tested on collections of student C and Pascal programs. The first method
attempts a single token change to the input first; if this "failed"
(criterion was, could the parser accept N more tokens, where N was
parameterized for different versions) a phrase-level recovery based on
replacement of input by a goal non-terminal takes place.  The second
method is based on the concept of minimum distance - it constructs several
possible repairs, from the state information at point of detection of
error, and chooses the one at minimum distance from the actual input.

As John Levine mentioned, there has been a lot of stuff published in this
area.  Other error recovery schemes which have been incorporated into
parser generators have been described by
 Roehrich; Fischer, Milton and Quiring; Anderson and Backhouse;
 Sippu and Soisalon-Soininen; Spenke et al; Boullier and Jourdan;
 Gray; Koskimies et al.
I have got a large bibliography on syntax error handling for LR parsers if
anyone is interested.
I have also written a survey article (submitted for publication...)

--
Julia Dain, Dept Computer Science, University of Warwick, Coventry CV4 7AL, UK
julia@cs.warwick.ac.uk    Phone +44 203 523364    Fax +44 203 525714
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.