[comp.compilers] old yacc bug. Fixed?

djones@megatest.uucp (Dave Jones) (10/05/88)

I'm in the process of writing a compiler using yacc.  A fellow worker
loaned me  "Introduction to Compiler Construction with Unix"
by Axel T., Schreiner and H. George Friedman, Jr., Prentice Hall, 1985,
so I've been trying to read it.

The book alludes to a bug in yacc. The main thing I want to know is
whether the Sun Unix 4.2 release 3.4 has the bug, and if so what do
I do about it?

They give some examples of using the "error" token, with the following
footnote, which I quote verbatim:

   This way of extending repetative constructs has a drawback due to
   a bug in yacc ( as distributed with Bell version 7, Berkeley 
   4.2bsd [sic], and various derivatives): if in a state the default 
   action is to reduce, and if the next terminal symbol cannot be 
   shifted but _error_ could be (e.g., on a trailing _error_ rule),
   yacc's tables dictate that the reduction take place, event if the next 
   terminal symbol cannot be shifted subsequently.  In this case error
   recovery takes place "too late", and the parser can, in fact, go
   into a loop, mistakenly reduce rules several times, etc.  The
   4.1bsd [sic] distribution actually contains a correction for this
   bug, based on ["Practical LR error recovery", Sigplan Notices, 
   Aug. 79]. Essentially, in these cases all possible inputs must be
   enumerated, so that the error can be detected; this results in
   slightly larger parser tables.  The correction in 4.1bsd [sic] 
   contains a typographical error, however.  A definite correction 
   is available from the authors (S. Johnson, personal 
   communication, 1982)

Notice that it says the bug is fixed in 4.1 but not 4.2, except
that the fix is wrong!  At least that's the interpretation I put on it.
Can anyone make heads or tails of this gobblety-gook?  (I love that
"in fact" part.)

Some questions"

  1) How one may determine whether or not a given yacc has the bug?
  2) What to do about it if you've got a buggy one?
  3) Exactly what are the consequences -- does "go into a loop" mean
       "loop forever"?
  4) What is the correction in 4.1? Is it in the source code?
  5) What is the typographical error in the correction?
  6) What is the "definite correction"? Do you have to have source?
  7) How to obtain the correction from the authors?

The best try I can make is that maybe one could expect to see
entries of this form:


state 3
         X : Y _ error  (6)
         Z : A _        (5)

         error shift 4
         . reduce 5

And in this case the parser would reduce by rule 5, when shifting 
_error_ is in order.  But I can't find any such states in any
output I can generate.  Besides, this would just cause some extra
input to be thrown away.  The parser would not get into a loop
or make extra reductions.

Please, HELP! 
[From djones@megatest.uucp (Dave Jones)]
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

friedman@m.cs.uiuc.edu (10/07/88)

Dave Jones (djones@megatest.uucp) writes in part:

> I'm in the process of writing a compiler using yacc.  A fellow worker
> loaned me  "Introduction to Compiler Construction with Unix"
> by Axel T., Schreiner and H. George Friedman, Jr., Prentice Hall, 1985,
> so I've been trying to read it.

> The book alludes to a bug in yacc. The main thing I want to know is
> whether the Sun Unix 4.2 release 3.4 has the bug, and if so what do
> I do about it?

I'm one of the authors of this book, George Friedman.

I don't know whether this, or any other particular version of Unix, has a
bug-free yacc.  The best suggestion I can give is to try out some of the
kinds of code that we mention in the book will cause problems, and see if
they do indeed cause those problems on your system.

The correction, and machine readable copies of the files in the book, are
available from me, as stated in the book.  I'd be pleased to send a copy to
anyone who has purchased the book.  (Please pardon me if I'm less helpful
to those who have not.)

The two easiest methods of distribution are by magnetic tape, or by using
"ftp" if it is available to you.  You can send a tape to me at the address
below.  Return postage for the tape would be appreciated, but is not required.

If you prefer something other than tar format at 1600 cpi, please let me know.

With "ftp", it is easy.  Our machine is called a.cs.uiuc.edu (formerly known
as uiucdcs).  In the ftp home directory, pub/friedman is a subdirectory, from
which one can copy out files "tar" and "tar.out".  File tar is a tar output
file, and tar.out is the listing of what is in file tar.

       		H. George Friedman, Jr.
       		Department of Computer Science
       		University of Illinois at Urbana-Champaign
       		1304 West Springfield Avenue
       		Urbana, Illinois  61801

       		USENET, UUCP:  uunet!uiucdcs!friedman
       		CSNET, ARPA:   friedman@cs.uiuc.edu
		BITNET:        friedman@cs.uiuc.edu
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

scjones@uunet.uu.net (Larry Jones) (10/11/88)

In article <2737@ima.ima.isc.com>, djones@megatest.uucp (Dave Jones) writes:
> The book [Introduction to Compiler Construction with Unix by Schreiner
> and Friedman] alludes to a bug in yacc. The main thing I want to know is
> whether the Sun Unix 4.2 release 3.4 has the bug, and if so what do
> I do about it?

I can't speak for Sun or Berkeley, but I know that it's fixed on the
version of System V Release 3.0 that I have.

> Some questions"
>   1) How one may determine whether or not a given yacc has the bug?

Try yaccing the following grammar with -v:

	%token a
	%%
	s    : oseq
	     ;
	oseq : /* empty */
	     | oseq a
	     | oseq error
	     ;

and then look at the y.output file.  If your yacc has the bug, you will
see a state definition that looks like:

	state 2
		s :  oseq_    (1)
		oseq :  oseq_a 
		oseq :  oseq_error 
		
		error  shift 4
		a  shift 3
		.  reduce 1

For reference, the correct state definition is:

	state 2
		s :  oseq _     (1)
		oseq :  oseq _ a 
		oseq :  oseq _ error 
	
		$end  reduce 1
		error  shift 4
		a  shift 3
		.  error

That is, rule 1 should only be reduced if the next input token is the
end marker.  Other input tokens should be considered errors.

>   2) What to do about it if you've got a buggy one?

Complain to your vendor?

>   3) Exactly what are the consequences -- does "go into a loop" mean
>        "loop forever"?

I believe it is possible to get into an infinite loop, although I haven't
seen it happen.  What I have seen is the same rule get reduced twice
in a row even though the grammar should not allow that to occur.

>   4) What is the correction in 4.1? Is it in the source code?
>   5) What is the typographical error in the correction?

Don't know.  I don't have access to any Berkeley stuff.

>   6) What is the "definite correction"? Do you have to have source?

The correction is ridiculously simple if you have the source code - as I
recall it consisted of changing a 2 to a 1 in one place in the code.
Unfortunately, I don't seem to be able to lay my hands on it right now
to give you the details.  (I never did understand why S & F didn't just
publish it instead of screwing around like this!)

>   7) How to obtain the correction from the authors?

I wrote a letter to Friedman and asked for it.  I'm sure he's on the net,
although I don't have any idea what the address is.

----
Larry Jones                         UUCP: uunet!sdrc!scjones
SDRC                                      scjones@sdrc.uucp
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150                  AT&T: (513) 576-2070
"Save the Quayles" - Mark Russell
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request