[comp.unix.questions] YACC question

bobg+@andrew.cmu.edu (Robert Steven Glickstein) (01/15/90)

Here is a fragment of Yacc code:

    %token PLUS MINUS

    %%

    expr:       mulexpr PLUS mulexpr
        | mulexpr MINUS mulexpr

It's very straightforward; the yylex() routine must be written to return
the constant PLUS when it encounters a '+' in the input, and the
constant MINUS when it encounters a '-' in the input.  However, Yacc
allows you to rewrite the above fragment as

    %%

    expr:       mulexpr '+' mulexpr
        | mulexpr '-' mulexpr

My question is, where does Yacc find the '+' and the '-' characters? 
Apparently they're not gotten via a call to yylex().  Does Yacc simply
do a getchar()?

I ask because I have written a parser which can be configured to read
from various input sources (standard input, file, string).  There are
many places in the parser where token-name constants are used when a
single character will do; however, if the single character is retrieved
by getchar() or some other hardwired mechanism, I'll have to stick to
the yylex() approach (since my yylex() knows where to read characters
from).

Please e-mail your replies.  Thanks in advance.

_______________________________
Bob Glickstein, System Designer
Information Technology Center  room 220
Carnegie Mellon University
Pittsburgh, PA  15213-3890
(412) 268-6743

Internet: bobg+@andrew.cmu.edu
Bitnet: bobg%andrew.cmu.edu@cmuccvma.bitnet
UUCP: ...!harvard!andrew.cmu.edu!bobg

I could dance till the cows come home.  On second thought, I'd rather
dance with the cows till you come home.
		-- Groucho Marx

evan@plx.UUCP (Evan Bigall) (01/16/90)

>
>    expr:       mulexpr PLUS mulexpr
>        | mulexpr MINUS mulexpr
>
>It's very straightforward; the yylex() routine must be written to return
>the constant PLUS when it encounters a '+' in the input, and the
>constant MINUS when it encounters a '-' in the input.  However, Yacc
>allows you to rewrite the above fragment as
>
>    expr:       mulexpr '+' mulexpr
>        | mulexpr '-' mulexpr
>
>My question is, where does Yacc find the '+' and the '-' characters? 
>Apparently they're not gotten via a call to yylex().  Does Yacc simply
>do a getchar()?

Quoting from the yacc section of my sys5.2 "Support Tool Guide":

}	The rules section is made up of one or more grammar rules.  A grammar
}rule has the form 
}
}A : BODY ;
}
}where "A" represents a nonterminal name, and "BODY" represents a sequence of
}zero or more names and LITERALS {my emphasis}.  The colon and the semicolon
}are yacc punctuation. 

{later it says:}

}A literal consists of a character enclosed in single quotes (').  As in C
}language, the backslash (\) is an escape character within literals....

Really all that is going on here is that yacc uses the value of the
character literal as the token number, so yylex() still has to return '+'
or '-' itself; yacc never reads the input on its own.  This is why the
yacc-generated token numbers start at 257 (on machines with "normal" char
sets).
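
To make the numbering concrete, here is a minimal C sketch.  The values
given to PLUS and MINUS are assumptions in the style of a generated
y.tab.h (yacc picks the real numbers), and is_literal_token() is a
hypothetical helper that exists only for illustration:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical token codes, in the style of a yacc-generated y.tab.h.
   257 is the conventional starting point; real values are chosen by yacc. */
#define PLUS  257
#define MINUS 258

/* Returns 1 if tok is in the range used by character literals such as '+',
   0 if it is outside that range (for example a named token such as PLUS). */
int is_literal_token(int tok)
{
    return tok > 0 && tok <= UCHAR_MAX;
}
```

Under these assumptions '+' is a token code of its own and cannot collide
with PLUS, since every named token lies above the character range.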

The standard way to represent this as a lex rule is:

.                      	return(*yytext);

to return a literal for every character not recognized by another rule
(except newline, which "." does not match).
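
For a hand-written yylex() that reads from its own input source, the same
idea might look like the sketch below.  It assumes a string source; the
NUMBER token, its value, and set_input() are hypothetical names that are
not taken from the original post:

```c
#include <assert.h>
#include <ctype.h>

#define NUMBER 257              /* hypothetical named token */

static const char *input = "";  /* hypothetical string input source */
int yylval;                     /* value of the most recent NUMBER */

/* Point the lexer at a new string to scan. */
void set_input(const char *s)
{
    input = s;
}

int yylex(void)
{
    /* Skip blanks and tabs between tokens. */
    while (*input == ' ' || *input == '\t')
        input++;

    if (*input == '\0')
        return 0;               /* zero tells the parser: end of input */

    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))
            yylval = yylval * 10 + (*input++ - '0');
        return NUMBER;
    }

    /* Anything else is returned as itself, which is what lets the
       grammar say '+' or '-' directly (same effect as the "." rule). */
    return (unsigned char)*input++;
}
```

After set_input("12 + 3"), successive calls return NUMBER, '+', NUMBER and
then 0.  Because the parser only ever calls yylex(), switching between
standard input, a file, or a string stays entirely inside this routine.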

Evan


-- 
Evan Bigall, Plexus Software, Santa Clara CA (408)982-4840  ...!sun!plx!evan
"I barely have the authority to speak for myself, certainly not anybody else"