[comp.compilers] I need answer to a simple YACC question.

najmi@hpcuhb.HP.COM (Farrukh Najmi) (02/04/89)

I would appreciate some help with a YACC related question.

I know I can look at the y.output (on HPUX anyways) to
find the productions in each state's LALR(1) items. However
I want to find the lookahead set for a given production
for a given state.

In other words I need to know how I can get the lookahead portion
of the LALR(1) item by looking at YACC generated files.
Thanks in advance for any help.

Farrukh Najmi.
[From hp-sdd!najmi@hpcuhb.HP.COM (Farrukh Najmi)]
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

djones@megatest.uucp (Dave Jones) (02/07/89)

> I know I can look at the y.output (on HPUX anyways) to
> find the productions in each state's LALR(1) items. However
> I want to find the lookahead set for a given production
> for a given state.
> 
> In other words I need to know how I can get the lookahead portion
> of the LALR(1) item by looking at YACC generated files.
> Thanks in advance for any help.
> 
> Farrukh Najmi.

Tain't easy. And if you could do it, you probably would not have
what you wanted anyway.  Do you want to calculate these for error-
messages? (If so, you might ought to have said so.) 

You see, which tokens can next be accepted depends not just on the 
state which is on the top of the stack, but potenitally on other 
states deep underneath. That's because yacc tables are LALR(1) (with 
default productions as well), not the space-consuming canonical LR(1)
tables.

So if you had a list of all tokens which could be shifted when a
given state was on top of the stack, it might have entries which would
not be legal as a next token in an input file which had blocked the 
parser with a syntax error.

But be of good cheer!

If you only need the information at runtime, for error-messages,
you can hack the parser, yyparse() or its genotype, yaccpar, so
that when it blocks, it will first save a copy of the stack, then run 
in simulation on each possible token until it either shifts or blocks.
If it shifts it, remember the token. In any case, restore the stack and 
try again, until you've tested all the tokens.  Then report, "legal
tokens are...", and spew out all the ones you have remembered.

                     Dave J.
P.S. I'm writing a compiler using yacc, even as we speak, and I just
figured this out on the fly.  I think I'm going to incorporate it!
Thanks for the inspiration.
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

steve@cs.utexas.edu (02/17/89)

In article <3328@ima.ima.isc.com>, djones@megatest.uucp (Dave Jones) writes:
> [yacc error states are hard to analyze, particularly if you want to see
> the follow sets in a particular context.]

In fact here is the code that does this error dumping:

257,275c256,257
< 			if ((yy_n = yypact[yystate]) > YYFLAG && yy_n < YYLAST)
< 			  {
< 			  short index;
< 			  extern char *p_error();
< 			  fprintf(stderr,"expecting: ");
< 			  for (index = yy_n>0 ? yy_n : 0;
< 			       index < YYLAST;
< 			       ++index)
< 			    {
< 			    if (yychk[yyact[index]] == index - yy_n
< 			        && index - yy_n != YYERRCODE)
< 			    fprintf(stderr,"%s, ",p_error(index-yy_n));
< 			    }
< 			  fprintf(stderr,"found:");
< 			  fprintf(stderr,"%s\n",p_error(yychar));
< 			  }
< 			else
< 			  yyerror( "syntax error" );
< 			  goto skip_init;
---
> 				yyerror( "syntax error" );
> 				goto skip_init;

For those of you who cannot tell this is a diff of yaccpar that does that
does this error dumping.  This yaccpar is from the standard SYSTEM V.  You
do need to supply the function char *p_error(int).  This function takes the
token number passed in, and changes it into a string name for that token.
Errors come out in the fashion:

expecting: ')', ',', '+', found '['.

Not totally coherent, but better than 'syntax error'.  This message gets
very long in some cases.

Three last notes:
  1) Credit for this change does not belong to me but to one of my partners
     at Remora, Inc.  I think he got this trick from a book (which my copy
     is unfortunately at home) on building a compiler in UNIX (reference
     supplied if requested).  At least I can claim I understand what is
     happening here.
  2) Do not comment on the programing style too much.  This code was hacked
     today from our (Remora's) version of yaccpar into the UNIX version
     just for this post.  The code as posted works (is tested).
  3) Excuse the cryptic nature of this message, I am typing it in on my
     own lunch time when I should be debuging Motorola's code.

                   enough from this mooncalf - Steven
--
Steven R Weintraub                        cs.utexas.edu!oakhill!devsys!steve
Motorola Inc.  Austin, Texas 
(512) 440-3023 (office) (512) 453-6953 (home)

--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request