[comp.lang.c] yacc with multiple attributes

jbaker@ee.ucla.edu (Joe Baker) (06/30/88)

[ This isn't really a C question, but all the yacc users seem to be
here.  Sorry for the interruption from the C/FORTRAN debate. ]

I am working on an interpreter for a little language with C-like
syntax.  Since I have several data types, and will probably be adding 
type constructors at some point, I'd like to set up a fairly general 
type system.  Because of this, lots of the non-terminals will have
several attributes.  I can see a couple of possible approaches:

1) Build an explicit parse tree, use the yacc value stack to hold
the node pointers, and carry the type information in the nodes.

2) Use a struct on the value stack, and carry both value and type
information (i.e., copy the stuff in Aho, Sethi, and Ullman, chap. 6.)
I think this would work with various intermediate code representations,
but I'm leaning to an explicit parse tree to simplify error recovery.

3) ?

Any opinions or pointers to examples?

Thanks in advance.  Please send mail to me and I will summarize to the
net if there is any interest.

- Joe Baker, Dept. of Electrical Engineering
6731 Boelter Hall, UCLA, L.A., CA 90024 (213) 825-7079, 825-2327
ARPA: jbaker@ee.ucla.edu UUCP: {ihnp4|randvax|ucbvax}!ucla-cs!uclaee!jbaker

smryan@garth.UUCP (07/01/88)

In article <16345@brl-adm.ARPA> jbaker@ee.ucla.edu (Joe Baker) writes:
>[ This isn't really a C question, but all the yacc users seem to be
>here.  Sorry for the interruption from the C/FORTRAN debate. ]

Please do.
 
>2) Use a struct on the value stack, and carry both value and type
>information (i.e., copy the stuff in Aho, Sethi, and Ullman, chap. 6.)
>I think this would work with various intermediate code representations,

I am currently doing something in this style and found out too late about
major Yacc bugs/features/problems/characteristics: although the stack is
very good about passing up synthesised attributes, it is terrible at passing
down inherited attributes.

I have patched around this by inserting semantics into a production to set
aside inherited attributes. Yacc converts these embedded semantics into a
null production, but seems unable/unwilling to compute lookahead through
these null productions. This has distorted the language and its grammar.

I would suggest, if you haven't started anything yet, avoid Yacc at all costs.

blandy@cornell.UUCP (07/01/88)

In article <838@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
>In article <16345@brl-adm.ARPA> jbaker@ee.ucla.edu (Joe Baker) writes:
>
>I am currently doing something in this style and found out too late about
>major Yacc bugs/features/problems/characteristics: although the stack is
>very good about passing up synthesised attributes, it is terrible at passing
>down inherited attributes.
>
>I have patched around this by inserting semantics into a production to set
>aside inherited attributes. Yacc converts these embedded semantics into a
>null production, but seems unable/unwilling to compute lookahead through
>these null productions. This has distorted the language and its grammar.
>

Yacc is actually resonably capable at this;  the problem you've having
with lookahead comes from the fact that you're using unnamed actions;
yacc assumes that all unnamed actions are different, and so cannot do
much lookahead beyond them.

Yacc will choke on:

prod:
   a { fiddle something } b c { first result }
 | b { identical fiddle } b c { different result }
 ;

because it doesn't realize that the two fiddles are identical; because it
generates a different null production for each, that grammar is truly
not LR(1).

Try replacing the above with:

FIDDLE:
    /* empty rule */ { the same fiddle as above }
   ;

prod:
    a FIDDLE b c { first result }
  | a FIDDLE b c { second result }
  ;

In this case, yacc knows that the two fiddles are the same, and so can 
generate a parser for the thing.

Right?


--
Jim Blandy - blandy@crnlcs.bitnet
"insects were insects when man was just a burbling whatisit."  - archie

smryan@garth.UUCP (Steven Ryan) (07/03/88)

>Yacc will choke on:
>
>prod:
>   a { fiddle something } b c { first result }
> | b { identical fiddle } b c { different result }
> ;

Do you mean?

    prod -> a {sem1} b {sem2} | a {sem3} c {sem4}

I would expect

    prod -> a $1 b {sem2} | a $2 c {sem4}
    $1   ->    {sem1}
    $2   ->    {sem3}

So given the items 'prod->a.$1 b' and 'prod->a.$2 c' it would then add
the items '$1->.' and '$2->.' which is a reduce/reduce conflict in
LR(0). Computing followers gives FOLLOWS($1)='b' and FOLLOWS($2)='c'. A
one symbol lookahead through the null production resolves the conflict.
Reduce to $1 and execute sem1 if the lookahead is b, and $2 with sem3 if
c.

The problem is not whether the grammar is LR(1) but is Yacc? It is generally
advertised as LR but sometimes the fine print says LALR. All too often LR
does not mean full LR(k) or even LR(1), but LALR, SLR, ....

Of course

    prod -> a $1 b c | a $2 b d

is LR(2) rather than LR(1).

hughes@jif.berkeley.edu (eric hughes) (07/03/88)

According to the abstract to "YACC: Yet Another Compiler Compiler", by
Stephen C. Johnson, Bell Labs (the long documentation, for those who have
not seen it)

	The class of specifications accepted is a very 
	general one: LALR(1) grammars with disambiguating rules.

YACC does not claim to be LR(1).

Eric Hughes
hughes@math.berkeley.edu

blandy@marduk.cs.cornell.edu (Jim Blandy) (07/05/88)

In article <859@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
>>Yacc will choke on:
>>
>>prod:
>>   a { fiddle something } b c { first result }
>> | b { identical fiddle } b c { different result }
>> ;
>
>Do you mean?
>
>    prod -> a {sem1} b {sem2} | a {sem3} c {sem4}

No, I don't think so.  Yacc will produce parsers for all LALR(1) grammars;
if I said that yacc would choke because something was not LR, that was
wrong.  LALR(1) is the class of grammars we're accepting here.

The original poster was complaining about the inability of yacc to cope
with situations like the above, where one wants to take the same action
near the beginning of several productions;  it comes up quite often.
If you realize that the two fiddles are identical, then you generate

prod -> a $1 b c { first result }
      | b $1 b c { first result }
      ;

$1 -> /* empty rule */ { fiddle zwingli }
    ;

which is fine by yacc.  His problem was that yacc DOESN'T realize that
the two fiddles are the same, and thus has a conflict, since the one-
token lookahead won't solve this problem.  Note that, yes, I do want
the same token after the { fiddle } s.
--
Jim Blandy - blandy@crnlcs.bitnet
"insects were insects when man was just a burbling whatisit."  - archie

merlin@ucrmath.UUCP (Mike Sullivan) (07/22/88)

In article <18840@cornell.UUCP> blandy@cs.cornell.edu (Jim Blandy) writes:
>In article <859@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
>>>Yacc will choke on:
>>>
>>>prod:
>>>   a { fiddle something } b c { first result }
>>> | b { identical fiddle } b c { different result }
>>> ;
>>
>>Do you mean?
>>
>>    prod -> a {sem1} b {sem2} | a {sem3} c {sem4}
>
>No, I don't think so.  
Lots of stuff deleted.
>
>His problem was that yacc DOESN'T realize that
>the two fiddles are the same, and thus has a conflict, since the one-
>token lookahead won't solve this problem.  Note that, yes, I do want
>the same token after the { fiddle } s.

However, in this case yacc doesn't NEED to realize that the two fiddles are
the same, because both these productions appear in different states
(being distinguished by the first token), and so there is no conflict.
Notice that yacc knows which rule it is in by seeing an 'a' for the first
and a 'b' for the second, there is no problem in this case.  
A case to demonstrate the problem in question here is this:
   prod:
      a { fiddle something } b c { first result }
    | a { identical fiddle } b c { different result }
    ;
Now there is a reduce/reduce conflict on 'b', since yacc is in both
rules at the same time (essentially), and 'b' does not distinguish
the two rules.


Mike Sullivan		ucsd!ucrmath!merlin

blandy@marduk.cs.cornell.edu (Jim Blandy) (07/23/88)

For the purpose of perhaps making things more coherent:  I messed up
on the posting;  Mike's sample productions are what I meant to post.

The point to note is that it is useful to give the semantic action a
name (by associating it with an epsilon production and then using the
production in place of the action) so that yacc does not feel the need
to differentiate between two productions we humans know to be
equivalent.
--
Jim Blandy - blandy@crnlcs.bitnet
"insects were insects when man was just a burbling whatisit."  - archy