jtn@potomac.ads.com (John T. Nelson) (09/07/90)
I know this isn't a Un*x question specifically but I don't know where to post a question like this. It's a problem with YACC. I seem to have a reduce/reduce conflict that YACC is unable to resolve, yet it looks to me like it SHOULD be able to figure out the ambiguity. Now actually I'm using the Macintosh implementation of GNU Bison, however I assume that the two operate identically.

I've written a grammar to recognize the following input sequence:

A = \n
  B \n
  C \n
  D \n
E = \n
  F \n
  G \n
  H \n

etc etc.

Notice that line feeds are considered tokens here and not white space. I do this because the existence of a token helps delimit lines and distinguish the "X =" lines from the other lines. At least that's the idea.

Problem is that Bison kicks me out with an error when it gets to the second "=" in the line E = \n.

The grammar looks a lot like this:

SECTION
  :
    NAME SPECS
|
    SECTION NAME SPECS
  ;

NAME
  :
    TOKEN_WORD TOKEN_EQUAL_SIGN OPTIONAL_THING LINE_FEED
|
    NUMBER TOKEN_EQUAL_SIGN OPTIONAL_THING LINE_FEED
  ;

SPECS
  :
    SPEC
|
    SPECS SPEC
  ;

SPEC
  :
    TOKEN_WORD OPTIONAL_THING LINE_FEED
|
    NUMBER OPTIONAL_THING LINE_FEED
  ;

OPTIONAL_THING
  :
|
    TOKEN_SPECIAL_SYMBOL
  ;

"OPTIONAL_THING" recognizes a special identifier or an empty sequence, so with the above input string it always does the empty thing.

Bison seems to get confused when it gets to SPEC. It fails to match the second "=" and immediately enters yyerror. It's almost as if it were losing track of the alternative "NAME" in rule "SECTION." Is the intention not getting through to the implementation? Am I doing something utterly stupid?

Now the Bison processor does indeed catch a bunch of shift/reduce and 2 reduce/reduce conflicts, however the "TOKEN_EQUAL_SIGN" in the NAME rule should be enough to uniquify it from the "SPEC" rules and that's what bugs me..... Bison is unable to make a decision on the string when the token that resolves the ambiguity is one or perhaps two tokens ahead in the stream.
Notice that I am also using TWO recursive rules in tandem (SECTION, SPECS). Maybe this is confusing Bison.

Any ideas? I've gone over it and over it and worse, I can't figure out how to rewrite the grammar to do the same thing so I'm basically stuck with this form... or am I?

Hope you have some thoughts. I don't.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
ORGANIZATION: Advanced Decision Systems
GEOGRAPHIC: Arlington, VA
UUCP: kzin!speaker@mimsy.umd.edu
INTERNET: jtn@potomac.ads.com
SPOKEN: Yo... John!
PHONE: (703) 243-1611
PROJECT: The Conrail Locomotive/Harpsichord Fusion Program
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
volpe@underdog.crd.ge.com (Christopher R Volpe) (09/08/90)
In article <9138@potomac.ads.com>, jtn@potomac.ads.com (John T. Nelson) writes:
|>I've written a grammar to recognize the following input
|>sequence:
|>
|>A = \n
|>  B \n
|>  C \n
|>  D \n
|>E = \n
|>  F \n
|>  G \n
|>  H \n
|>The grammar looks a lot like this:
|>SECTION
|>  :
|>    NAME SPECS
|>|
|>    SECTION NAME SPECS
|>  ;
|>NAME
|>  :
|>    TOKEN_WORD TOKEN_EQUAL_SIGN OPTIONAL_THING LINE_FEED
|>|
|>    NUMBER TOKEN_EQUAL_SIGN OPTIONAL_THING LINE_FEED
|>  ;
|>SPECS
|>  :
|>    SPEC
|>|
|>    SPECS SPEC
|>  ;
|>SPEC
|>  :
|>    TOKEN_WORD OPTIONAL_THING LINE_FEED
|>|
|>    NUMBER OPTIONAL_THING LINE_FEED
|>  ;
|>OPTIONAL_THING
|>  :
|>|
|>    TOKEN_SPECIAL_SYMBOL
|>  ;
|>Now the Bison processor does indeed catch a bunch of shift/reduce and 2
|>reduce/reduce conflicts however the "TOKEN_EQUAL_SIGN" in the NAME rule
|>should be enough to uniquify it from the "SPEC" rules and that's what
|>bugs me..... Bison is unable to make a decision on the string when the
|>token that resolves the ambiguity is one or perhaps two tokens ahead
|>in the stream.

The problem isn't that the grammar is ambiguous, because it's not. The problem is that the grammar is not LALR(1), as yacc (and probably Bison) require. When the lookahead symbol is a TOKEN_WORD or a NUMBER (after just having reduced to SPEC and then SPECS), it can't tell whether it should reduce to SECTION or shift the current input symbol. You would need an LALR(2) parser to do that.

I'm not sure if this will work, but you might want to try making the SECTION rule right-recursive. This will consume MUCHO stack space during the parse because the reductions aren't done until the end of the input is reached. Therefore, the stack requirement is proportional to the size of the input, rather than constant, as would be the case with left-recursion (because the reductions are done at each step). So, if you have stack space to burn, try the following: (again, I'm not sure it will work anyway)

SECTION : NAME SPECS
        | NAME SPECS SECTION

Hope this helps...
==================
Chris Volpe
G.E. Corporate R&D
volpecr@crd.ge.com
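
A possible left-recursive alternative to the right-recursive rewrite suggested above, offered as a sketch rather than a tested fix. It folds the SPECS list into the outer list, so that after any complete line the parser only has to shift the next TOKEN_WORD or NUMBER, and can put off the NAME-versus-SPEC decision until it sees TOKEN_EQUAL_SIGN, TOKEN_SPECIAL_SYMBOL, or LINE_FEED. The token names come from the original post; the nonterminal SECTIONS, the %token and %start declarations, and the comments are assumptions added for illustration. The posted grammar was described as only "a lot like" the real one, so the real one may have further conflicts that this does not address.

```yacc
/* Sketch only. Token names are from the post; SECTIONS and the
   declarations below are assumed for illustration. */
%token TOKEN_WORD NUMBER TOKEN_EQUAL_SIGN TOKEN_SPECIAL_SYMBOL LINE_FEED
%start SECTIONS
%%
SECTIONS
    : NAME SPEC             /* first "X =" line plus its first spec   */
    | SECTIONS SPEC         /* one more spec in the current section   */
    | SECTIONS NAME SPEC    /* a new "X =" line plus its first spec   */
    ;

NAME
    : TOKEN_WORD TOKEN_EQUAL_SIGN OPTIONAL_THING LINE_FEED
    | NUMBER     TOKEN_EQUAL_SIGN OPTIONAL_THING LINE_FEED
    ;

SPEC
    : TOKEN_WORD OPTIONAL_THING LINE_FEED
    | NUMBER     OPTIONAL_THING LINE_FEED
    ;

OPTIONAL_THING
    : /* empty: reduced only when the lookahead is LINE_FEED */
    | TOKEN_SPECIAL_SYMBOL
    ;
```

This should accept the same input as the posted grammar (each NAME line followed by one or more SPEC lines) while keeping the stack depth constant, at the cost that any per-section grouping has to be done in the actions instead of by the grammar. Running bison -v on it and reading the generated .output file is the way to confirm that no conflicts remain.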