[net.lang.c] Fearless PCC Killers

donn (04/03/83)
Reference: ima.308

I know it's foolish to rise to the bait of a C syntax article posted on
April 1, but I couldn't resist...

John Levine says that the Portable C Compiler goofs when it parses the
expression

	a && b *= c;				(1)

as

	a && (b *= c);				(2)

because the && operator has higher precedence than *=.  If we look at
the definition in the C syntax summary in K&R (section 18.1 of the C
Reference Manual) we see that the following grammar rules define the
behavior of operators:

	expression:
		identifier			(3)
		expression binop expression	(4)
		lvalue asgnop expression	(5)
		...

	lvalue:
		identifier			(6)
		...

	binop:
		... && ...

	asgnop:
		... *= ...

An LR(1) parser looking at (1) will read up to the b and try to decide
whether to reduce according to rule (3) or rule (6).  Since the
lookahead set for rule (6) in this state contains *= and the lookahead
set for rule (3) in this state does not contain *=, the parser must
reduce by rule (6).  This forces the parser to continue shifting and
eventually reduce by rule (5) despite the fact that asgnops have lower
precedence than binops.  Hence we get the result that (2) is the
correct parse of (1), contrary to John's claim.
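
The lookahead argument can be written out as a small C sketch.  This is
not how a generated LR(1) table looks; it only mimics the one decision
described above.  The names reduce_identifier and is_asgnop, and the
abridged operator list, are my own assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for the two ways to reduce an identifier. */
enum reduction {
    TO_EXPRESSION,  /* rule (3): expression: identifier */
    TO_LVALUE       /* rule (6): lvalue: identifier */
};

/* Assumed, abridged list of asgnops; the real list is longer. */
static const char *const asgnops[] = { "=", "+=", "-=", "*=", "/=" };

/* Returns nonzero if tok is one of the assignment operators above. */
static int is_asgnop(const char *tok)
{
    size_t i;

    assert(tok != NULL);
    for (i = 0; i < sizeof asgnops / sizeof asgnops[0]; i++)
        if (strcmp(tok, asgnops[i]) == 0)
            return 1;
    return 0;
}

/*
 * Decide how to reduce the identifier b in "a && b <lookahead>".
 * Only rule (6) has asgnops in its lookahead set in this state, so an
 * asgnop forces the lvalue reduction; anything else takes rule (3).
 */
enum reduction reduce_identifier(const char *lookahead)
{
    return is_asgnop(lookahead) ? TO_LVALUE : TO_EXPRESSION;
}
```

Under these assumptions reduce_identifier("*=") picks rule (6), after
which only rule (5) can consume the lvalue, which gives parse (2).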

However, John is right in saying that the PCC is not doing the right
thing here, even though it gets the right results.  The problem is that
Dennis Ritchie's informal grammar has a reduce-reduce conflict in it,
so the PCC doesn't use it.  There is a situation where the parser can't
tell whether an identifier is an lvalue or an expression, namely:

	*a = ...

The grammar causes = to be in the lookahead set for both expressions
and lvalues here, hence the parser can't use the lookahead to tell
whether to reduce to an expression or an lvalue.  You could hack the
grammar to fix the problem, but it would be messy (for example you
could distinguish between *lvalue and *(expression)...).  On the other
hand, if there is no distinction between lvalues and expressions, then
the rules of precedence can fix this up; but if you drop this
distinction then you lose the ability to parse (1) correctly.  The PCC
takes the latter dodge and so it should flunk out on expressions
like (1).  So why does it still parse (1) as (2)?
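
As an aside on the "*a = ..." case: the two readings the parser is torn
between can be spelled out with explicit parentheses.  This is only a
sketch in present-day C, and the function names are made up for the
illustration; nothing here is taken from the PCC.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Reading A: a is reduced to an expression first, so *a is the lvalue
 * and the assignment stores through the pointer.
 */
int assign_through(int *a, int value)
{
    assert(a != NULL);
    (*a) = value;
    return *a;
}

/*
 * Reading B: a itself is the lvalue, so the pointer is reassigned and
 * the * applies to the result of the assignment.  The object that a
 * used to point at is left alone.
 */
int deref_after_assign(int *a, int *other)
{
    assert(other != NULL);
    return *(a = other);
}
```

With = in both lookahead sets, nothing in the informal grammar alone
says which of these two the parser should be building.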

What John Levine calls a bug in YACC is actually a "feature".  The
assignment operator *= is not atomic under the PCC; it is really just
the two tokens * and =.  (In fact you can write "a * = b"
for "a *= b", which is something I didn't know until I looked it up.)
The reason *= beats out && in precedence is that there is only one
symbol of lookahead (the "feature" which keeps parsers small) and when
trying to parse b in (1) the parser sees the * but not the =.  The *
token of course has higher precedence than &&, so the parser shifts
instead of reducing like it "should".  This is why the PCC "correctly"
barfs on:

	a && b = c;
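
The shift-or-reduce choice with one token of lookahead can be sketched
in C as below.  The precedence numbers are assumptions (only their
order matters), the names resolve and precedence are hypothetical, and
associativity is ignored, so ties simply reduce.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum action { SHIFT, REDUCE };

/* Assumed relative precedences; a larger number binds tighter. */
static int precedence(const char *op)
{
    assert(op != NULL);
    if (strcmp(op, "*") == 0)
        return 3;
    if (strcmp(op, "&&") == 0)
        return 2;
    if (strcmp(op, "=") == 0)
        return 1;
    return 0;   /* anything else, such as ";" */
}

/*
 * stack_op is the operator already on the parse stack and lookahead is
 * the single next token.  Shift if the lookahead binds tighter than
 * the operator on the stack, otherwise reduce.
 */
enum action resolve(const char *stack_op, const char *lookahead)
{
    return precedence(lookahead) > precedence(stack_op) ? SHIFT : REDUCE;
}
```

With *= split into two tokens, the parser sitting after "a && b" asks
for resolve("&&", "*") and shifts, while for "a && b = c" it asks for
resolve("&&", "=") and reduces "a && b" first, presumably leaving an
assignment whose left side is not an lvalue.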

I suppose it goes without saying that if the authors of the PCC had
made *= and the other assignment operators into single tokens then this
oddity would never have cropped up.  (Actually this is the way the PCC
handles the Version 6 assignment operators; the difference seems to have
been made for aesthetic reasons.)  But if they had made it easy, think
of all the fun we would have missed...

Donn Seeley  UCSD Chemistry Dept. RRCF  ucbvax!sdcsvax!sdchema!donn
             (619) 452-4016             sdamos!donn@nprdc