[comp.lang.c] A bug in 'lint'

frisk@askja.UUCP (08/17/87)

Take a look at the (*) marked line in the following code fragment:

    main()
    {
        int x;
        x = 1;
        x && x += 1;       /*     (*)     */
        return(0);
    }

That line does not make any sense, of course: && has higher precedence
than +=, so it groups as (x && x) += 1. Changing it to

        x && (x += 1);

would make it legal (but still stupid), but that's not the point here.

RATHER .....

The lint program is perfectly happy with it (at least on HP/UX, Eunice,
and Altos SysV Unix). 

(And what's even worse, so is cc on Eunice and Altos SysV Unix)

cc on HP/UX complains (correctly) (as MSC does) that "x && x" is not an lvalue.

Any comments ?
-- 
Fridrik Skulason  Univ. of Iceland, Computing Center
       UUCP  ...mcvax!hafro!askja!frisk                BIX  frisk

                     "This line intentionally left blank"

mouse@mcgill-vision.UUCP (der Mouse) (08/30/87)

In article <273@askja.UUCP>, frisk@askja.UUCP (Fridrik Skulason) writes:
> main() { int x; x=1;   x && x += 1;   return(0); }
> [ with reference to the "x && x += 1" statement ]

> The lint program is perfectly happy with it (at least on HP/UX,
> Eunice, and Altos SysV Unix).
> (And what's even worse, so is cc on Eunice and Altos SysV Unix)
> cc on HP/UX complains (correctly) (as MSC does) that "x && x" is not
> an lvalue.
> Any comments ?

4.3BSD cc accepts the above and rewrites it to x&&(x+=1) without
telling you anything about it (or at least that's what it looks like
from the assembly code produced with -S).  Just to check that it didn't
simply have the precedence wrong (!), I tried "x+=1&&x", which it
handled correctly.

4.3BSD lint is also silent, not surprisingly.

This is BROKEN, I say BROKEN, BROKEN BROKEN BROKEN....but I don't
suppose the C support group at Berzerkeley is listening. :-(

					der Mouse

				(mouse@mcgill-vision.uucp)

donn@utah-cs.UUCP (Donn Seeley) (09/17/87)

This issue seems to come up about once a year...  I posted a little
piece on the subject a few years ago; I'll append it to this message
for the edification of y'all.  The basic response: the parse
'x && (x += 1)' is arguably correct, and PCC-derived compilers will
produce this parse (but for the wrong reasons).

	This is BROKEN, I say BROKEN, BROKEN BROKEN BROKEN....but I
	don't suppose the C support group at Berzerkeley is listening.
	:-(

If you can locate a C support group at Berkeley, let me know :-),

Donn Seeley    University of Utah CS Dept    donn@cs.utah.edu
40 46' 6"N 111 50' 34"W    (801) 581-5668    utah-cs!donn

------------------------------------------------------------------------

I know it's foolish to rise to the bait of a C syntax article posted on
April 1, but I couldn't resist...

John Levine says that the Portable C Compiler goofs when it parses the
expression

	a && b *= c;				(1)

as

	a && (b *= c);				(2)

because the && operator has higher precedence than *=.  If we look at
the definition in the C syntax summary in K&R (section 18.1 of the C
Reference Manual) we see that the following grammar rules define the
behavior of operators:

	expression:
		identifier			(3)
		expression binop expression	(4)
		lvalue asgnop expression	(5)
		...

	lvalue:
		identifier			(6)
		...

	binop:
		... && ...

	asgnop:
		... *= ...

An LR(1) parser looking at (1) will read up to the b and try to decide
whether to reduce according to rule (3) or rule (6).  Since the
lookahead set for rule (6) in this state contains *= and the lookahead
set for rule (3) in this state does not contain *=, the parser must
reduce by rule (6).  This forces the parser to continue shifting and
eventually reduce by rule (5) despite the fact that asgnops have lower
precedence than binops.  Hence we get the result that (2) is the
correct parse of (1), contrary to John's claim.

However, John is right in saying that the PCC is not doing the right
thing here, even though it gets the right results.  The problem is that
Dennis Ritchie's informal grammar has a reduce-reduce conflict in it,
so the PCC doesn't use it.  There is a situation where the parser can't
tell whether an identifier is an lvalue or an expression, namely:

	*a = ...

The grammar causes = to be in the lookahead set for both expressions
and lvalues here, hence the parser can't use the lookahead to tell
whether to reduce to an expression or an lvalue.  You could hack the
grammar to fix the problem, but it would be messy (for example you
could distinguish between *lvalue and *(expression)...).  On the other
hand if there is no distinction between lvalues and expressions then
the rules of precedence can fix this up; but if you drop this
distinction then you lose the ability to parse (1) correctly.  The PCC
takes the latter dodge and so it should flunk out on expressions
like (1).  So why does it still parse (1) as (2)?

What John Levine calls a bug in YACC is actually a "feature".  The
assignment operator *= is not atomic under the PCC; it is really just
the two tokens * and =.  (You can even write "a * = b" for "a *= b",
which is something I didn't know until I looked it up.)
The reason *= beats out && in precedence is that there is only one
symbol of lookahead (the "feature" which keeps parsers small) and when
trying to parse b in (1) the parser sees the * but not the =.  * of
course has higher precedence than && so the parser shifts instead of
reducing like it "should".  This is why the PCC "correctly" barfs on:

	a && b = c;

I suppose it goes without saying that if the authors of the PCC had
made *= and the other assignment operators into single tokens then this
oddity would never have cropped up.  (Actually this is the way the PCC
handles the old Version 6 assignment operators such as =* and =+; the
difference seems to have been made for aesthetic reasons.)  But if they
had made it easy, think
of all the fun we would have missed...

Donn Seeley  UCSD Chemistry Dept. RRCF  ucbvax!sdcsvax!sdchema!donn
             (619) 452-4016             sdamos!donn@nprdc