[gnu.misc.discuss] Problem with YACC and LEX

jtn@potomac.ads.com (John T. Nelson) (09/07/90)

I know this isn't a Un*x question specifically but I don't know where
to post a question like this.  It's a problem with YACC.  I seem to
have a reduce/reduce conflict that YACC is unable to resolve, yet it
looks to me like it SHOULD be able to figure out the ambiguity.
Actually, I'm using the Macintosh implementation of GNU Bison, but
I assume that the two operate identically.

I've written a grammar to recognize the following input
sequence:

A = \n
	B \n
	C \n
	D \n
E = \n
	F \n
	G \n
	H \n

etc etc.

Notice that line feeds are considered tokens here and not white space.
I do this because the existence of a token helps delimit lines and
distinguish the "X =" lines from the other lines.  At least that's the
idea.  Problem is that Bison kicks me out with an error when it gets
to the second "=" in the line E = \n.

The grammar looks a lot like this:


SECTION
	:
	NAME	SPECS
|
	SECTION		NAME	SPECS
	;

NAME
	:
	TOKEN_WORD	TOKEN_EQUAL_SIGN	OPTIONAL_THING	LINE_FEED
|
	NUMBER	TOKEN_EQUAL_SIGN	OPTIONAL_THING	LINE_FEED
	;

SPECS
	:
	SPEC
|
	SPECS	SPEC
	;

SPEC
	:
	TOKEN_WORD	OPTIONAL_THING	LINE_FEED
|
	NUMBER	OPTIONAL_THING	LINE_FEED
	;

OPTIONAL_THING
	:
|
	TOKEN_SPECIAL_SYMBOL
	;

"OPTIONAL_THING" recognizes a special identifier or an empty sequence
so with the above input string it always does the empty thing.

Bison seems to get confused when it gets to SPEC.  It fails to match the
second "=" and immediately enters yyerror.  It's almost as if it were
losing track of the alternative "NAME" in rule "SECTION."  Is the
intention not getting through to the implementation?  Am I doing
something utterly stupid?

Now the Bison processor does indeed catch a bunch of shift/reduce and 2
reduce/reduce conflicts however the "TOKEN_EQUAL_SIGN" in the NAME rule
should be enough to uniquify it from the "SPEC" rules and that's what
bugs me..... Bison is unable to make a decision on the string when the
token that resolves the ambiguity is one or perhaps two tokens ahead
in the stream.

Notice that I am also using TWO recursive rules in tandem (SECTION,
SPECS).  Maybe this is confusing Bison.  Any ideas?  I've gone over it
and over it and worse, I can't figure out how to rewrite the grammar to
do the same thing so I'm basically stuck with this form... or am I?

Hope you have some thoughts.  I don't.


-- 

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
ORGANIZATION:  Advanced Decision Systems   GEOGRAPHIC: Arlington, VA
UUCP:          kzin!speaker@mimsy.umd.edu  INTERNET:   jtn@potomac.ads.com
SPOKEN:        Yo... John!                 PHONE:      (703) 243-1611
PROJECT:       The Conrail Locomotive/Harpsichord Fusion Program
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

volpe@underdog.crd.ge.com (Christopher R Volpe) (09/08/90)

In article <9138@potomac.ads.com>, jtn@potomac.ads.com (John T. Nelson) writes:
|>I've written a grammar to recognize the following input
|>sequence:
|>
|>A = \n
|>	B \n
|>	C \n
|>	D \n
|>E = \n
|>	F \n
|>	G \n
|>	H \n
|>The grammar looks a lot like this:
|>SECTION
|>	:
|>	NAME	SPECS
|>|
|>	SECTION		NAME	SPECS
|>	;
|>NAME
|>	:
|>	TOKEN_WORD	TOKEN_EQUAL_SIGN	OPTIONAL_THING	LINE_FEED
|>|
|>	NUMBER	TOKEN_EQUAL_SIGN	OPTIONAL_THING	LINE_FEED
|>	;
|>SPECS
|>	:
|>	SPEC
|>|
|>	SPECS	SPEC
|>	;
|>SPEC
|>	:
|>	TOKEN_WORD	OPTIONAL_THING	LINE_FEED
|>|
|>	NUMBER	OPTIONAL_THING	LINE_FEED
|>	;
|>OPTIONAL_THING
|>	:
|>|
|>	TOKEN_SPECIAL_SYMBOL
|>	;
|>Now the Bison processor does indeed catch a bunch of shift/reduce and 2
|>reduce/reduce conflicts however the "TOKEN_EQUAL_SIGN" in the NAME rule
|>should be enough to uniquify it from the "SPEC" rules and that's what
|>bugs me..... Bison is unable to make a decision on the string when the
|>token that resolves the ambiguity is one or perhaps two tokens ahead
|>in the stream.

The problem isn't that the grammar is ambiguous, because it's not.
The problem is that the grammar is not LALR(1), as yacc (and probably
Bison) require. When the lookahead symbol is a TOKEN_WORD or a NUMBER
(after just having reduced to SPEC and then SPECS), it can't tell
whether it should reduce to SECTION or shift the current input symbol.
You would need an LALR(2) parser to do that.
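To illustrate why a second token of lookahead settles the question, here
is a minimal hand-written sketch in Python. It is not a yacc/Bison parser
and not the poster's code; the token names and function names are
assumptions made up for the example. The only point is that peeking at
the token after a TOKEN_WORD or NUMBER is enough to tell a NAME line from
a SPEC line.

```python
# Hypothetical token kinds standing in for the grammar's terminals.
WORD, NUMBER, EQUALS, SPECIAL, NEWLINE = "WORD", "NUMBER", "=", "SPECIAL", "NL"

def starts_name(tokens, i):
    """True if a NAME line starts at index i. Looks at token i AND token i+1."""
    return (i + 1 < len(tokens)
            and tokens[i] in (WORD, NUMBER)
            and tokens[i + 1] == EQUALS)

def parse_line(tokens, i, with_equals):
    """Consume (WORD|NUMBER) ['='] [SPECIAL] NEWLINE; return the next index."""
    if i >= len(tokens) or tokens[i] not in (WORD, NUMBER):
        raise SyntaxError(f"expected WORD or NUMBER at token {i}")
    i += 1
    if with_equals:
        if i >= len(tokens) or tokens[i] != EQUALS:
            raise SyntaxError(f"expected '=' at token {i}")
        i += 1
    if i < len(tokens) and tokens[i] == SPECIAL:   # the optional thing
        i += 1
    if i >= len(tokens) or tokens[i] != NEWLINE:
        raise SyntaxError(f"expected line feed at token {i}")
    return i + 1

def parse_sections(tokens):
    """Return the number of SPEC lines found in each section."""
    sections = []
    i = 0
    while i < len(tokens):
        if not starts_name(tokens, i):
            raise SyntaxError(f"expected a NAME line at token {i}")
        i = parse_line(tokens, i, with_equals=True)
        count = 0
        # Keep reading SPEC lines until two tokens ahead show "X =".
        while i < len(tokens) and not starts_name(tokens, i):
            i = parse_line(tokens, i, with_equals=False)
            count += 1
        if count == 0:
            raise SyntaxError("a section needs at least one SPEC line")
        sections.append(count)
    if not sections:
        raise SyntaxError("empty input")
    return sections
```

Fed the token stream for the sample input (a NAME line followed by three
SPEC lines, twice), parse_sections reports two sections of three SPEC
lines each.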
I'm not sure if this will work, but you might want to try making
the SECTION rule right-recursive. This will consume MUCHO stack space
during the parse because the reductions aren't done until the end of the
input is reached. Therefore, the stack requirement is proportional
to the size of the input, rather than constant, as would be the
case with left-recursion (because the reductions are done at each
step). So, if you have stack space to burn, try the following: (again,
I'm not sure it will work anyway)
SECTION
       :
       NAME SPECS
   |    NAME SPECS SECTION
       ;
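
The stack-space argument can be seen in a toy model. The Python sketch
below is an illustration only, under a simplifying assumption: each
section is treated as a single stack symbol (called ITEM here, a made-up
name), and the two functions mimic the shift and reduce steps for a
left-recursive list (LIST : ITEM | LIST ITEM) and a right-recursive one
(LIST : ITEM | ITEM LIST). It does not model the real Bison tables.

```python
def max_stack_left(n_sections):
    """Peak stack depth for  LIST : ITEM | LIST ITEM  (left recursion)."""
    stack, deepest = [], 0
    for _ in range(n_sections):
        stack.append("ITEM")                 # shift one section
        deepest = max(deepest, len(stack))
        if stack == ["LIST", "ITEM"]:
            stack[-2:] = ["LIST"]            # reduce LIST : LIST ITEM
        else:
            stack[-1:] = ["LIST"]            # reduce LIST : ITEM
    return deepest

def max_stack_right(n_sections):
    """Peak stack depth for  LIST : ITEM | ITEM LIST  (right recursion)."""
    stack, deepest = [], 0
    for _ in range(n_sections):
        stack.append("ITEM")                 # shift; nothing can be reduced yet
        deepest = max(deepest, len(stack))
    if stack:
        stack[-1:] = ["LIST"]                # end of input: reduce LIST : ITEM
    while len(stack) > 1:
        stack[-2:] = ["LIST"]                # reduce LIST : ITEM LIST
    return deepest
```

In this model the left-recursive form never holds more than two symbols,
while the right-recursive form holds one symbol per section until the end
of the input.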

Hope this helps...
==================
Chris Volpe
G.E. Corporate R&D
volpecr@crd.ge.com