nigelh@sol.uvic.ca (R. Nigel Horspool) (01/13/91)
jmr1g@ra.cs.Virginia.EDU (Jeremiah M. Ratner) writes: >Is there any way i can configure yacc or some other tool to tell me, >at each step in a parse, the set of tokens that may follow the current >token? I am currently doing this by hand, and it is, as they say, >a tedious and error-prone process. Yacc optimizes its tables to use default reductions, so it is impossible to tell from the state listing in the y.output file which symbols are valid in any state that contains a reduce action. If you are content with an approximate answer, it can be obtained directly from LR parse tables that have not been optimized. E.g., the row of the LR parse table for state n might contain entries like: symbol a: shift to state s1 symbol b: shift to state s2 symbols {c,d,e}: reduce by rule r1 symbols {f,g}: reduce by rule r2 and the set of symbols for which actions are defined (i.e. {a,b,c,d,e,f}) is a good approximation to what symbols are allowed when state n is the top state on the LR state stack. This info is readily available from yacc's y.output file. The set is correct (not an approximation) if a full LR(1) parser generator is used. However, for the SLR(1) and LALR(1) methods (Yacc uses LALR(1)), the set of symbols that trigger reduce actions may be too large. To get an exact answer, it is necessary to take context into account to eliminate some impossible next symbols from the set. If you need to know the exact set of valid next symbols at some point in a parse, you can run an algorithm that simulates the updates that occur to the state stack when each reduce action is performed. An algorithm in pseudo Pascal goes something like this: ValidNextSymbols := Valid( SS, VT ); { where SS is the current state stack and VT is the set of all terminal symbols in the grammar} function Valid( SS, T ) Result := set of symbols, x, such that the top state on stack SS has a shift action for x or an accept action for x; Result := Result * T; for each reduce action in the top state on stack SS do Test := set of symbols that trigger the reduce action; Test := Test * T; if Test != empty then CSS := copy of SS; update the stack CSS as required by the reduce action; { i.e. pop k states where k is the length of the RHS of the rule being reduced by, and push the state reached by the goto action for the LHS symbol of the rule } Result := Result + Valid( CSS, Test ); end if end for; return Result; end function; {Note: `+' represents set union and `*' represents set intersection.} Since the algorithm checks which reduce symbols are valid (i.e. those that can eventually be shifted), it works equally well on parse tables that have been optimized by the use of default reductions. I.e., it could be implemented to work with the tables used by a yacc-generated parser. R. Nigel Horspool -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
djones@decwrl.dec.com (Dave Jones) (01/15/91)
> Is there any way i can configure yacc or some other tool to tell me, > at each step in a parse, the set of tokens that may follow the current > token? I am currently doing this by hand, and it is, as they say, > a tedious and error-prone process. > [This same question came up in November, but no answers. -John] There were plenty of answers John. Some of them were wrong, but not all. I do agree that the definitive answer has yet to be written. Anyway, here's the crux of the problem: The table-driven parser which a typical implementation of yacc generates keeps state-information on a stack. The things it puts on the stack are called "states" in the jargon, although the entire stack really comprises the parser's state. It is easy to determine which tokens can be shifted when a particular state is on top of the stack. To do that you can have your parser look in the tables, just as the regular parser does when it is doing its thing, or you can build something that reads the y.output file you get using the -v option. (AWK might come in handy.) But knowing which tokens can be shifted when a particular state is on top is not enough, because other states that might have allowed other tokens to be shifted may have been on top of the stack since the previous token was shifted. That is possible because yacc typically uses "default reductions". (You can find the technical info on this stuff in the _Dragon Book_, new or old version. In my copy of the new Dragon Book, the applicable part begins on page 244.) So what you have to do is to figure out what tokens *might* have gotten shifted in states that have been "default reduced" since the last token was shifted. Of course, you could just keep a token-set, coded as a bit map. That would be a straight-forward way to do it, and obviously correct, but would impose some overhead even when the input was error-free. Several people, including myself, posted algorithms that they claimed could reconstruct the sets after the fact, some by analysing the stack by running simulations, others by keeping some bookkeeping info on what states have been on the stack since the last shift. The only one I have a copy of is the algorithm I suggested, and it is no longer as obvious to me as it was then. More later... [My brain must have curdled, I forgot about the lively discussion in mid-1989 about finding the follow set in the context of error correction and detection. There was a lot of discussion including several trenchant messages from Mr. Jones, and several proposed solutions. -John] -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
djones@decwrl.dec.com (Dave Jones) (01/16/91)
> In the book 'Compiler Construction using Unix' by Schreiner & Friedman > (sorry, I don't have the book here at work, and I don't recall the > publisher) a modification to yacc is described which prints the set of > allowed tokens when encountering a syntax error. It's actually sufficient > to modify the yaccpar file. > > There is a tar file on a.cs.uiuc.edu in ~ftp/pub/friedman which contains the > modified yaccpar file. It's buggy. In fact, on any non-trivial grammar, it is likely to fail very badly, omiting most of the legal tokens. What the hack actually does is to print the tokens which could be shifted at the time the compiler detected the error. That is not sufficient, for the reason I first pointed out in a letter to comp.compilers, posted Feb. 23, 1989. Sorry, I don't have the article number. The problem is that the parser is likely to have altered its state since the last token-shift, (through "default reductions"), at times when other tokens could have been shifted. If anyone cares to see them, I still have a couple of counter-examples on hand. [Oops, right. Since yacc may have taken some default reductions before noticing an error, the set of allowable tokens in the state where it notices the error need not have much to do with the state where the error really occurred. -John] -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.