donn (04/03/83)
Reference: ima.308

I know it's foolish to rise to the bait of a C syntax article posted
on April 1, but I couldn't resist...

John Levine says that the Portable C Compiler goofs when it parses the
expression

        a && b *= c;                                    (1)

as

        a && (b *= c);                                  (2)

because the && operator has higher precedence than *=. If we look at
the definition in the C syntax summary in K&R (section 18.1 of the C
Reference Manual) we see that the following grammar rules define the
behavior of operators:

        expression:
                identifier                              (3)
                expression binop expression             (4)
                lvalue asgnop expression                (5)
                ...

        lvalue:
                identifier                              (6)
                ...

        binop:
                ... && ...

        asgnop:
                ... *= ...

An LR(1) parser looking at (1) will read up to the b and try to decide
whether to reduce according to rule (3) or rule (6). Since the
lookahead set for rule (6) in this state contains *= and the lookahead
set for rule (3) in this state does not contain *=, the parser must
reduce by rule (6). This forces the parser to continue shifting and
eventually reduce by rule (5) despite the fact that asgnops have lower
precedence than binops. Hence we get the result that (2) is the
correct parse of (1), contrary to John's claim.

However John is right in saying that the PCC is not doing the right
thing here, even though it gets the right results. The problem is that
Dennis Ritchie's informal grammar has a reduce-reduce conflict in it,
so the PCC doesn't use it. There is a situation where the parser can't
tell whether an identifier is an lvalue or an expression, namely:

        *a = ...

The grammar causes = to be in the lookahead set for both expressions
and lvalues here, hence the parser can't use the lookahead to tell
whether to reduce to an expression or an lvalue. You could hack the
grammar to fix the problem, but it would be messy (for example you
could distinguish between *lvalue and *(expression)...).
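
To make the lookahead argument concrete, here is a rough C sketch of
the decision an LR(1) parser faces when it has just read an
identifier. It is not taken from the PCC or from any real parser
table: the token codes, the function choose(), and the contents of the
four lookahead sets are all assumptions, filled in only as far as the
discussion above describes them.

```c
#include <assert.h>

/* Hypothetical token codes, used as bit positions in a lookahead set. */
enum token { T_ANDAND, T_STAR, T_ASSIGN, T_MULASSIGN, T_SEMI };
#define BIT(t) (1u << (t))

enum action {
    REDUCE_EXPR,    /* rule (3): expression: identifier */
    REDUCE_LVALUE,  /* rule (6): lvalue: identifier     */
    CONFLICT,       /* both rules accept the lookahead  */
    SYNTAX_ERROR    /* neither rule accepts it          */
};

/* Choose a reduction for an identifier from the two lookahead sets. */
enum action choose(unsigned la_expr, unsigned la_lvalue, enum token next)
{
    assert((unsigned)next <= (unsigned)T_SEMI);
    int e = (la_expr & BIT(next)) != 0;
    int l = (la_lvalue & BIT(next)) != 0;
    if (e && l) return CONFLICT;
    if (l) return REDUCE_LVALUE;
    if (e) return REDUCE_EXPR;
    return SYNTAX_ERROR;
}

/* Assumed sets for the state after "a && b": *= appears only in the
   lvalue set, so the choice is forced. */
const unsigned AFTER_B_EXPR   = BIT(T_ANDAND) | BIT(T_STAR) | BIT(T_SEMI);
const unsigned AFTER_B_LVALUE = BIT(T_ASSIGN) | BIT(T_MULASSIGN);

/* Assumed sets for the state after "* a": = appears in both sets,
   which is the reduce-reduce conflict. */
const unsigned AFTER_STAR_A_EXPR   = BIT(T_ASSIGN) | BIT(T_ANDAND) | BIT(T_SEMI);
const unsigned AFTER_STAR_A_LVALUE = BIT(T_ASSIGN) | BIT(T_MULASSIGN);
```

With these sets, choose(AFTER_B_EXPR, AFTER_B_LVALUE, T_MULASSIGN)
selects rule (6), while choose(AFTER_STAR_A_EXPR, AFTER_STAR_A_LVALUE,
T_ASSIGN) reports a conflict, since one symbol of lookahead cannot
separate the two reductions.
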
On the other hand if there is no distinction between lvalues and
expressions then the rules of precedence can fix this up; but if you
drop this distinction then you lose the ability to parse (1)
correctly. The PCC takes the latter dodge and so it should flunk out
on expressions like (1).

So why does it still parse (1) as (2)? What John Levine calls a bug in
YACC is actually a "feature". The assignment operator *= is not atomic
under the PCC; in fact it is really just the two tokens * and =. (In
fact you can write "a * = b" for "a *= b", which is something I didn't
know until I looked it up.) The reason *= beats out && in precedence
is that there is only one symbol of lookahead (the "feature" which
keeps parsers small) and when trying to parse b in (1) the parser sees
the * but not the =. * of course has higher precedence than && so the
parser shifts instead of reducing like it "should". This is why the
PCC "correctly" barfs on:

        a && b = c;

I suppose it goes without saying that if the authors of the PCC had
made *= and the other assignment operators into single tokens then
this oddity would never have cropped up. (Actually this is the way the
PCC handles the Version 6 assignment operators; the difference seems
to have been made for aesthetic reasons.) But if they had made it
easy, think of all the fun we would have missed...

Donn Seeley  UCSD Chemistry Dept. RRCF  ucbvax!sdcsvax!sdchema!donn
(619) 452-4016  sdamos!donn@nprdc
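
P.S. The one-symbol-lookahead effect described above can be sketched
in a few lines of C. This is only a toy precedence comparison, not the
PCC's actual YACC tables: the numeric precedences and the helper names
prec(), should_shift() and first_token() are assumptions made for
illustration, and the right associativity of assignment is ignored.

```c
#include <assert.h>
#include <string.h>

/* Illustrative binding strengths: a higher number binds tighter. */
int prec(const char *tok)
{
    assert(tok != NULL);
    if (strcmp(tok, "*") == 0)  return 3;
    if (strcmp(tok, "&&") == 0) return 2;
    if (strcmp(tok, "=") == 0 || strcmp(tok, "*=") == 0) return 1;
    return 0;   /* ";" and anything else ends the expression */
}

/* With "x OP y" on the stack and one token of lookahead, shift when
   the lookahead binds tighter than OP; otherwise reduce "x OP y".
   Returns 1 for shift, 0 for reduce. */
int should_shift(const char *stack_op, const char *lookahead)
{
    return prec(lookahead) > prec(stack_op);
}

/* The first token the parser sees for an operator as spelled in the
   source. If assignment operators are split, "*=" shows up as "*". */
const char *first_token(const char *op, int split_asgnops)
{
    if (split_asgnops && strcmp(op, "*=") == 0) return "*";
    return op;
}
```

With "a && b" on the stack, should_shift("&&", first_token("*=", 1))
says shift, which leads to parse (2). Treating *= as a single token,
should_shift("&&", first_token("*=", 0)) says reduce, and so does
should_shift("&&", "="), which matches the failure on "a && b = c;".
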