djones@megatest.uucp (Dave Jones) (10/05/88)
I'm in the process of writing a compiler using yacc. A fellow worker loaned me "Introduction to Compiler Construction with Unix" by Axel T., Schreiner and H. George Friedman, Jr., Prentice Hall, 1985, so I've been trying to read it. The book alludes to a bug in yacc. The main thing I want to know is whether the Sun Unix 4.2 release 3.4 has the bug, and if so what do I do about it? They give some examples of using the "error" token, with the following footnote, which I quote verbatim: This way of extending repetative constructs has a drawback due to a bug in yacc ( as distributed with Bell version 7, Berkeley 4.2bsd [sic], and various derivatives): if in a state the default action is to reduce, and if the next terminal symbol cannot be shifted but _error_ could be (e.g., on a trailing _error_ rule), yacc's tables dictate that the reduction take place, event if the next terminal symbol cannot be shifted subsequently. In this case error recovery takes place "too late", and the parser can, in fact, go into a loop, mistakenly reduce rules several times, etc. The 4.1bsd [sic] distribution actually contains a correction for this bug, based on ["Practical LR error recovery", Sigplan Notices, Aug. 79]. Essentially, in these cases all possible inputs must be enumerated, so that the error can be detected; this results in slightly larger parser tables. The correction in 4.1bsd [sic] contains a typographical error, however. A definite correction is available from the authors (S. Johnson, personal communication, 1982) Notice that it says the bug is fixed in 4.1 but not 4.2, except that the fix is wrong! At least that's the interpretation I put on it. Can anyone make heads or tails of this gobblety-gook? (I love that "in fact" part.) Some questions" 1) How one may determine whether or not a given yacc has the bug? 2) What to do about it if you've got a buggy one? 3) Exactly what are the consequences -- does "go into a loop" mean "loop forever"? 4) What is the correction in 4.1? Is it in the source code? 5) What is the typographical error in the correction? 6) What is the "definite correction"? Do you have to have source? 7) How to obtain the correction from the authors? The best try I can make is that maybe one could expect to see entries of this form: state 3 X : Y _ error (6) Z : A _ (5) error shift 4 . reduce 5 And in this case the parser would reduce by rule 5, when shifting _error_ is in order. But I can't find any such states in any output I can generate. Besides, this would just cause some extra input to be thrown away. The parser would not get into a loop or make extra reductions. Please, HELP! [From djones@megatest.uucp (Dave Jones)] -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU Plausible paths are { decvax | harvard | yale | bbn}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request
friedman@m.cs.uiuc.edu (10/07/88)
Dave Jones (djones@megatest.uucp) writes in part: > I'm in the process of writing a compiler using yacc. A fellow worker > loaned me "Introduction to Compiler Construction with Unix" > by Axel T., Schreiner and H. George Friedman, Jr., Prentice Hall, 1985, > so I've been trying to read it. > The book alludes to a bug in yacc. The main thing I want to know is > whether the Sun Unix 4.2 release 3.4 has the bug, and if so what do > I do about it? I'm one of the authors of this book, George Friedman. I don't know whether this, or any other particular version of Unix, has a bug-free yacc. The best suggestion I can give is to try out some of the kinds of code that we mention in the book will cause problems, and see if they do indeed cause those problems on your system. The correction, and machine readable copies of the files in the book, are available from me, as stated in the book. I'd be pleased to send a copy to anyone who has purchased the book. (Please pardon me if I'm less helpful to those who have not.) The two easiest methods of distribution are by magnetic tape, or by using "ftp" if it is available to you. You can send a tape to me at the address below. Return postage for the tape would be appreciated, but is not required. If you prefer something other than tar format at 1600 cpi, please let me know. With "ftp", it is easy. Our machine is called a.cs.uiuc.edu (formerly known as uiucdcs). In the ftp home directory, pub/friedman is a subdirectory, from which one can copy out files "tar" and "tar.out". File tar is a tar output file, and tar.out is the listing of what is in file tar. H. George Friedman, Jr. Department of Computer Science University of Illinois at Urbana-Champaign 1304 West Springfield Avenue Urbana, Illinois 61801 USENET, UUCP: uunet!uiucdcs!friedman CSNET, ARPA: friedman@cs.uiuc.edu BITNET: friedman@cs.uiuc.edu -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU Plausible paths are { decvax | harvard | yale | bbn}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request
scjones@uunet.uu.net (Larry Jones) (10/11/88)
In article <2737@ima.ima.isc.com>, djones@megatest.uucp (Dave Jones) writes: > The book [Introduction to Compiler Construction with Unix by Schreiner > and Friedman] alludes to a bug in yacc. The main thing I want to know is > whether the Sun Unix 4.2 release 3.4 has the bug, and if so what do > I do about it? I can't speak for Sun or Berkeley, but I know that it's fixed on the version of System V Release 3.0 that I have. > Some questions" > 1) How one may determine whether or not a given yacc has the bug? Try yaccing the following grammar with -v: %token a %% s : oseq ; oseq : /* empty */ | oseq a | oseq error ; and then look at the y.output file. If your yacc has the bug, you will see a state definition that looks like: state 2 s : oseq_ (1) oseq : oseq_a oseq : oseq_error error shift 4 a shift 3 . reduce 1 For reference, the correct state definition is: state 2 s : oseq _ (1) oseq : oseq _ a oseq : oseq _ error $end reduce 1 error shift 4 a shift 3 . error That is, rule 1 should only be reduced if the next input token is the end marker. Other input tokens should be considered errors. > 2) What to do about it if you've got a buggy one? Complain to your vendor? > 3) Exactly what are the consequences -- does "go into a loop" mean > "loop forever"? I believe it is possible to get into an infinite loop, although I haven't seen it happen. What I have seen is the same rule get reduced twice in a row even though the grammar should not allow that to occur. > 4) What is the correction in 4.1? Is it in the source code? > 5) What is the typographical error in the correction? Don't know. I don't have access to any Berkeley stuff. > 6) What is the "definite correction"? Do you have to have source? The correction is ridiculously simple if you have the source code - as I recall it consisted of changing a 2 to a 1 in one place in the code. Unfortunately, I don't seem to be able to lay my hands on it right now to give you the details. (I never did understand why S & F didn't just publish it instead of screwing around like this!) > 7) How to obtain the correction from the authors? I wrote a letter to Friedman and asked for it. I'm sure he's on the net, although I don't have any idea what the address is. ---- Larry Jones UUCP: uunet!sdrc!scjones SDRC scjones@sdrc.uucp 2000 Eastman Dr. BIX: ltl Milford, OH 45150 AT&T: (513) 576-2070 "Save the Quayles" - Mark Russell -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU Plausible paths are { decvax | harvard | yale | bbn}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request