[comp.lang.c] Pcc bites it

schwartz@shire.cs.psu.edu (Scott Schwartz) (03/12/89)

In article <24820@amdcad.AMD.COM>, tim@crackle (Tim Olson) writes:
>| > main(m,n){scanf("%d",&n);for(m=n>0^n>9;n&&m*=n--;);
>| > printf(m?"Answer=%d\n":"error\n",m);}

>Interesting -- the program *is* incorrect, because && has higher
>precedence than *= so it is parsed as:

Unfortunately, pcc accepts lots of things that aren't C.  No need to
be surprised this time!  Another fun example is "struct {int foo};".

>What do other compilers out there do?

Gcc 1.34 says "invalid lvalue in assignment".
-- 
Scott Schwartz		<schwartz@shire.cs.psu.edu>

decot@hpisod2.HP.COM (Dave Decot) (03/15/89)

> Interesting -- the program *is* incorrect, because && has higher
> precedence than *= so it is parsed as:
> 
> 	(n&&m) *= n--
> 
> which is illegal because (n&&m) is not an lvalue.

No it isn't.  It doesn't have two different valid interpretations, so it's
not ambiguous, so the precedence rules are not applicable.  The only valid
parsing of the expression is

    n && (m *= n--).

> Our MetaWare and Green Hills compilers catch it, but every pcc-based
> compiler I tried compiled it with no warnings.

Your compilers are broken.

> Looking at the assembly-language generated, it appears that it compiled as:
> 
> 	n && (m*=n--)

...which is correct.

> What do other compilers out there do?

I hope they perform as above.  I'm tired of compiler bugs.

Dave

tim@crackle.amd.com (Tim Olson) (03/17/89)

In article <2550086@hpisod2.HP.COM> decot@hpisod2.HP.COM (Dave Decot) writes:
| > Interesting -- the program *is* incorrect, because && has higher
| > precedence than *= so it is parsed as:
| > 
| > 	(n&&m) *= n--
| > 
| > which is illegal because (n&&m) is not an lvalue.
| 
| No it isn't.  It doesn't have two different valid interpretations, so it's
| not ambiguous, so the precedence rules are not applicable.  The only valid
| parsing of the expression is

Sorry -- I should have specified that I was talking about
pANS-conforming compilers.

| > Our MetaWare and Green Hills compilers catch it, but every pcc-based
| > compiler I tried compiled it with no warnings.
| 
| Your compilers are broken.

No, they are pANS-conforming.

The relevant grammar is:

	(3.3.13) logical-AND-expression:
		inclusive-OR-expression
		logical-AND-expression && inclusive-OR-expression


	(3.3.16) assignment-expression:
		conditional-expression
		unary-expression assignment-operator assignment-expression


There is no way to parse

 	n&&m *= n--

without adding parenthesis to make (n&&m) a unary-expression or (m *=
n--) an inclusive-OR-expression.


	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)

djones@megatest.UUCP (Dave Jones) (03/17/89)

From article <2550086@hpisod2.HP.COM>, by decot@hpisod2.HP.COM (Dave Decot):
  From some article, by somebody (??):
>> Interesting -- the program *is* incorrect, because && has higher
>> precedence than *= so it is parsed as:
>> 
>> 	(n&&m) *= n--
>> 
>> which is illegal because (n&&m) is not an lvalue.
> 
> No it isn't.  It doesn't have two different valid interpretations, so it's
> not ambiguous, so the precedence rules are not applicable.  The only valid
> parsing of the expression is
> 
>     n && (m *= n--).
> 

Two can play this "No it isn't" game.

I'm sure lots of others will correct you on this, but just in case...

A compiler which allows the above statement is broken, in as much
as it will compile nonportable programs without warning. The fact that
the statement is semanticly meaningless when parsed occording to the
precedence rules is not a legal reason for parsing it some other way.

The precedence rules, as defined in K&R, and the ANSII draft standard,
and as implemented in conforming compilers, are purely syntactic.
The part of the compiler which determines precedence, (called the "parser"),
does not refer to the semantic values assigned to identifiers, except in
the case of identifiers which have attained type-name status through the
typedef mechanism.

Many languages are completely context-insensitive when parsing.  In Pascal,
for example, if you write this:

   if (int1 + int2 > 0 or boolvar ) then foo;

It will tell you there is an error.

The only meaningful way to parse it is like this:

   ((int1 + int2) > 0) or boolvar


But because ">" is (incredibly) defined to have higher precedence
than "+", it parses it like this:


    (int1 + (int2>0)) or boolvar

When it gets to the semantic analysis phase, the compiler will complain
of type-mismatches.

Tim_CDC_Roberts@cup.portal.com (03/18/89)

In <2550086@hpisod2.HP.COM>, decot@hpisod2.HP.COM (Dave Decot) writes:

>>    (n&&m) *= n--
> ... doesn't have two different valid interpretations, so it's not
> ambiguous, so the precedence rules are not applicable.

I disagree with this statement.  The parser does not necessarily have
any knowledge about whether a construct is _semantically_ valid.  It
is attempting to make a _syntactically_ valid interpretation, and BOTH
interpretations [ (n&&m) *= n--  vs.  n && (m*=n--) ] are syntactically
valid.

I say this tentatively - I've written compilers, but not a C compiler.
I would appreciate an opinion by a C compiler writer.

Tim_CDC_Roberts@cup.portal.com                | Control Data...
...!sun!portal!cup.portal.com!tim_cdc_roberts |   ...or it will control you.

steve@umigw.MIAMI.EDU (steve emmerson) (03/20/89)

(This discussion reminds me of the skit on Saturday Night Live where one
side argues that a certain product is a dessert topping and the other argues 
that it's a floor wax :-).

I believe the string "n&&m*=n--" is legal under the grammar rules for
*K&R-1st edition C*.  Under pANS (or K&R-2nd edition C) however, I
believe it's illegal.  (Since in his original posting Tim Olson didn't 
say which grammar he was assuming, it turns out we're both right!).

I believe a given string is legal if it can be *generated* by the production
rules for the grammar.  Arguments as to whether or not a particular type 
of parser can correctly interpret a given string are, in my opinion, 
irrelevant.  The string "n&&m*=n--" can be generated from the production
rules for K&R-I C found in Appendix A.  Here's the parse tree:

                            expression
                                 |
            ---------------------------------------
            |             |                       |
        expression       binop                expression
            |             |                       |
            |             |         -------------------------------
            |             |         |             |               |
        primary           &&    expression      asgnop        expression
            |                       |             |               |
            |                       |             |         -------------
            |                       |             |         |           |
            n                   identifier        *=    expression     unop
                                    |                       |           |
                                    m                   identifier      --
                                                            |
                                                            n

It's up to the parser to (re)discover this parse tree upon encountering
the string "n&&m*=n--".  Some can; some can't.  Since it can be
*generated* from the K&R-I grammar, however, it's legal.

Admittedly, the Appendix A production rules are not very formal, still,
a very good argument can be made for their being the ultimate arbiter of
K&R-I C.  (They are certainly at least as valid as anything else ;-).
Arguments that the string is invalid Appendix A K&R-I C must show that

    1) the above parse tree is invalid; or
    2) the above parse tree is non-unique.  This would require the use
       of disambiguating precedence rules in order to parse it and a
       consequent conclusion that it's illegal.

In pANS C (and K&R-II C), on the other hand, the string "n&&m*=n--" cannot 
be generated---much less parsed.

Interesting, no?
-- 
Steve Emmerson                     Inet: steve@umigw.miami.edu [128.116.10.1]
SPAN: miami::emmerson (host 3074::)      emmerson%miami.span@star.stanford.edu
UUCP: ...!ncar!umigw!steve               emmerson%miami.span@vlsi.jpl.nasa.gov
"Computers are like God in the Old Testament: lots of rules and no mercy"

limes@wseng.sun.com (Greg Limes) (03/21/89)

In article <15936@cup.portal.com> Tim_CDC_Roberts@cup.portal.com writes:

>  In <2550086@hpisod2.HP.COM>, decot@hpisod2.HP.COM (Dave Decot) writes:

>  >>    (n&&m) *= n--
>  > ... doesn't have two different valid interpretations, so it's not
>  > ambiguous, so the precedence rules are not applicable.

>  I disagree with this statement.  The parser does not necessarily have
>  any knowledge about whether a construct is _semantically_ valid.  It
>  is attempting to make a _syntactically_ valid interpretation, and BOTH
>  interpretations [ (n&&m) *= n--  vs.  n && (m*=n--) ] are syntactically
>  valid.

[[disclaimer at the top: the last C compiler I wrote was about five
years ago; the grammar below is excerpted from dusty memories. I am
looking forward to getting my hands on the BNF for ANSI C.]]

Depends on how you write your syntax. If, for instance, you separate
the concept of "lvalue" (something you can assign to) from "expr" in
the parser instead of pushing that task downstream. For instance,
given this segment of a grammar:

	lval ::= IDENT
	lval ::= lval POSTOP
	expr ::= lval ASGOP expr
	expr ::= expr BINOP expr
	expr ::= lval

And this input:

	n&&m*=n--

The tokens emitted by the lexical analiser:

	IDENT BINOP IDENT ASGOP IDENT POSTOP

I contend that the reduction via the BINOP is delayed until after the
reduction via the ASGOP, simply because it is the only way that you
come to a valid parsing of the statement. The parser simply has no
choice; when it has shifted the second IDENT and reduced it to an
lval, and is considering what to do next, the following ASGOP forces
it to shift the asgop onto the stack, and defer the reduction on the
BINOP until later, since the result of such a reduction can not
possibly be on the left side of the ASGOP.

Just to be sure I am on the right track here, any compiler wizzes out
there want to take a look at a simple action trace for me?

 stack					token	action
					IDENT	shift
 IDENT					BINOP	reduce(1)
 lval					BINOP	reduce(5)
 expr					BINOP	shift
 expr BINOP				IDENT	shift
 expr BINOP IDENT			ASGOP	reduce(1)
 expr BINOP lval			ASGOP	reduce(1)
 expr BINOP lval ASGOP			IDENT	shift
 expr BINOP lval ASGOP IDENT		POSTOP	reduce(1)
 expr BINOP lval ASGOP lval		POSTOP	shift
 expr BINOP lval ASGOP lval POSTOP	<end>	reduce(2)
 expr BINOP lval ASGOP lval		<end>	reduce(3)
 expr BINOP expr			<end>	reduce(4)
 expr					<end>

I look forward to finding out what has changed in the C language that
would invalidate this parsing.