[comp.lang.c] possible operator precedence bug?

kenny@m.cs.uiuc.edu (10/07/88)

/* Written  6:17 pm  Oct  4, 1988 by jejones@mcrware.UUCP in m.cs.uiuc.edu:comp.lang.c */
/* ---------- "possible operator precedence bug?" ---------- */
[... expression that looks like]
|
|	M[Z]=Z[<hairy expression> ? <expression>, <expression>, "_." : " |"];
|
|The question is this: why does this make it through the compiler at all?
|K&R says that the comma operator has the lowest priority, so shouldn't
|the following be required for the code to correctly parse?
|
|	M[Z]=Z[<hairy expression> ? (<expression>, <expression>, "_.") : " |"];
|	                            ^                                ^
/* End of text from m.cs.uiuc.edu:comp.lang.c */

The precedence rules are used only when needed to resolve ambiguity
among two syntactically correct parses of the same sentence.  The use
of the comma between the query and the colon is unambiguous, and the
precedence rules do not come into play.  The relative precedence of ?:
and , is used is in ambiguous contexts like:

	x = ( a , b ? c : d );
and	x = ( a ? b : c , d);

where the higher precedence of ? : causes the statements to be
interpreted as

	x = ( a , ( b ? c : d ) );
and	x = ( ( a ? b : c ) , d ); respectively.

Kevin

Kevin Kenny			UUCP: {uunet,pur-ee,convex}!uiucdcs!kenny
Department of Computer Science	ARPA Internet or CSNet: kenny@CS.UIUC.EDU
University of Illinois
1304 W. Springfield Ave.
Urbana, Illinois, 61801		Voice: (217) 333-6680

jfh@rpp386.Dallas.TX.US (The Beach Bum) (10/07/88)

In article <751@mcrware.UUCP> jejones@mcrware.UUCP (James Jones) writes:
>A recently-posted C program that generates random mazes gives me cause to
>wonder about C operator precedence.
>
>I include the source here for reference--it's very short:

[ and I un - >'d it so I could throw some stuff at it ... ]

char*M,A,Z,E=40,J[40],T[40];main(C){for(*J=A=scanf(M="%d",&C);
--            E;             J[              E]             =T
[E   ]=  E)   printf("._");  for(;(A-=Z=!Z)  ||  (printf("\n|"
)    ,   A    =              39              ,C             --
)    ;   Z    ||    printf   (M   ))M[Z]=Z[A-(E   =A[J-Z])&&!C
&    A   ==             T[                                  A]     
|6<<27<rand()||!C&!Z?J[T[E]=T[A]]=E,J[T[A]=A-Z]=A,"_.":" |"];}


>The inner loop of this program contains an expression, which, as far as
>I can tell after feeding through cb, has the form
>
>	M[Z]=Z[<hairy expression> ? <expression>, <expression>, "_." : " |"];

This is the fully parenthised expression, as reported by "cparen".  Every
one should have this sucker - just after having "cdecl".

(M[Z])=
	(Z[(((A-(E=(A[J-Z])))&&(((!C)&(A==(T[A])))|((6<<27)<(rand()))))||
((!C)&(!Z)))?
	((((J[(T[E])=(T[A])])=E),((J[(T[A])=(A-Z)])=A)),"_.") :
	"|"])

>The question is this: why does this make it through the compiler at all?
>K&R says that the comma operator has the lowest priority, so shouldn't
>the following be required for the code to correctly parse?
>	M[Z]=Z[<hairy expression> ? (<expression>, <expression>, "_.") : " |"];
>	                            ^                                ^

No, as you can see above, the grammar groups the expression in the "correct"
fashion without the explicit parentheses.

Here is a snip from a YACC grammar which conforms to the 11/9/87 ANSI Draft.

conditional_expr
	: logical_or_expr
	| logical_or_expr '?' expr ':' conditional_expr
	;

assignment_expr
	: conditional_expr
	| unary_expr assignment_operator assignment_expr
	;

expr
	: assignment_expr
	| expr ',' assignment_expr
	;

As we see, conditional_expr is of a higher precedence, but the first
expression is an <expr>.  So, we luck out.  Watch this:

	a ? b : c , d , e

becomes

	((a ? b : c) , d) , e

Notice that you now get the behaviour you expected - ',' is of a
lower precendence and the left hand side groups to reflect this.
But the code of the form

	a ? b , c , d : e

becomes

	a ? ((b , c) , d) : e

because the grammar specifies the left hand side of ":" as a full-blown
expression.  Which implies that ? :'s nest the same way as if-then-elses
do.  Watch -

	a ?  b ? c : d : e	and
	v ?  w : x ? y : z

becomes [ you should be able to guess this one ;-) ]

	a ?  (b ? c : d) : e	and
	v ?  w : (x ? y : z)

10 points if you got it correct ...
-- 
John F. Haugh II (jfh@rpp386.Dallas.TX.US)                   HASA, "S" Division

      "Why waste negative entropy on comments, when you could use the same
                   entropy to create bugs instead?" -- Steve Elias

res@cbnews.ATT.COM (Robert E. Stampfli) (10/08/88)

>In article <751@mcrware.UUCP> jejones@mcrware.UUCP (James Jones) writes:
>>A recently-posted C program that generates random mazes gives me cause to
>>wonder about C operator precedence.
>>
>>I include the source here for reference--it's very short:
>
>char*M,A,Z,E=40,J[40],T[40];main(C){for(*J=A=scanf(M="%d",&C);
>--            E;             J[              E]             =T
>[E   ]=  E)   printf("._");  for(;(A-=Z=!Z)  ||  (printf("\n|"
>)    ,   A    =              39              ,C             --
>)    ;   Z    ||    printf   (M   ))M[Z]=Z[A-(E   =A[J-Z])&&!C
>&    A   ==             T[                                  A]     
>|6<<27<rand()||!C&!Z?J[T[E]=T[A]]=E,J[T[A]=A-Z]=A,"_.":" |"];}
  ^^^^^
>

When I tried compiling this, it didn't work, generating mazes which were
always columnar except for one horizontal row at the bottom.

However, by changing the "6<<27" to "4<<11" I found it worked
for my "big-endian" type machines.  The rand() function returns a
number in the range 0 - 2^15-1 on my systems, making the program 
always fail the comparison, as originally written.  Other systems
may have slightly different rand() functions, I suppose.  Regardless,
this is one jewel of a program.  Hope this helps someone.

Rob Stampfli
att!cbnews!res

kenny@m.cs.uiuc.edu (10/12/88)

karl@haddock.ima.isc.com writes:

>>>	M[Z]=Z[<hairy expression> ? <expression>, <expression>, "_." : " |"];
>>
>>Operator precedence only comes into play when there is ambiguity.  Here
>>there is no ambiguity - the above can only be legal C when parsed one
>>way, so there is no need to turn to operator precedence.

>This turns out not to be the case.  For example, "a + b=0" and "a+++++b" are
>both illegal, even though they could have been legal if parsed/scanned as
>"a + (b=0)" and "a++ + ++b".

Come on, Karl, you can do better than that.  You have to distinguish
between constructs that are incorrect *syntactically* (i.e., there is
no sequence of productions in the C grammar that can generate the
string) from those that are incorrect *semantically* (type
incompatibilities, expressions without lvalues on the left of an equal
sign, and so on).  You also have to look at the tokenized version of
the strings -- what the parser sees after the lexer is done with them.

For the first case you mention, 
	a + b = 0 ,
there is an ambiguous parse: one can interpret the statement as
	(a + b) = 0 ,
or as
	a + (b = 0) .
The precedence rules *do* apply, and resolve the ambiguity to the
interpretation,
	(a + b) = 0 , 
since + takes precedence over =.  Then we find that the statement,
while correct syntactically, is incorrect semantically, as (a + b)
does not have an lvalue.

In the second case, `a+++++b,' we note that lexical analysis is
defined to match the longest token scanning from left to right; the
tokenized version of the string is therefore
	 a ++ ++ + b .
There is only one parse for this string of tokens, so precedence
doesn't come into play:
	((a ++) ++) + b .
Once again, the resulting interpretation is syntactically correct, but
has a semantic error; the inner ++ operator postincrements a, but
(a++) has no lvalue, so the outer ++ operator gets a semantic error.

The case of
	a ? b , c : d
has no such problem.  There is no lexical confusion, particularly when
the spaces that I show are supplied.  There is no ambiguity in the
parse; the only syntactically correct interpretation is 
	a ? (b , c) : d .
Then, as long as c and d are type-compatible, and a can be coerced to
an integral type, the expression has a meaningful interpretation
semantically.  (The ice is thin if b and c are type-incompatible --
does anyone know if the latest dpANS is clearer on this point?)

Kevin Kenny			UUCP: {uunet,pur-ee,convex}!uiucdcs!kenny
Department of Computer Science	ARPA Internet or CSNet: kenny@CS.UIUC.EDU
University of Illinois
1304 W. Springfield Ave.
Urbana, Illinois, 61801		Voice: (217) 333-6680

gwyn@smoke.ARPA (Doug Gwyn ) (10/13/88)

In article <4700025@m.cs.uiuc.edu> kenny@m.cs.uiuc.edu writes:
-	a ? b , c : d
-has no such problem.  There is no lexical confusion, particularly when
-the spaces that I show are supplied.  There is no ambiguity in the
-parse; the only syntactically correct interpretation is 
-	a ? (b , c) : d .
-Then, as long as c and d are type-compatible, and a can be coerced to
-an integral type, the expression has a meaningful interpretation
-semantically.  (The ice is thin if b and c are type-incompatible --
-does anyone know if the latest dpANS is clearer on this point?)

I don't see the issue.  b,c does not require type compatibility
for b and c.  That comma-expression has the type of c.  Type
compatibility constraints apply after the parse is already done.

karl@haddock.ima.isc.com (Karl Heuer) (10/14/88)

In article <4700025@m.cs.uiuc.edu> kenny@m.cs.uiuc.edu writes:
>karl@haddock.ima.isc.com writes:
>>[someone writes:]
>>>Operator precedence only comes into play when there is ambiguity.  Here
>>>there is no ambiguity - the above can only be legal C when parsed one
>>>way, so there is no need to turn to operator precedence.
>>
>>This turns out not to be the case.  For example, "a + b=0" ...
>
>Come on, Karl, you can do better than that.

You're right.  The point I was trying to make was that the comment was
misleading, as it suggests that the compiler ought to parse "a+b=0" as
"a+(b=0)" because it's the "only way that yields legal C".  It doesn't work
that way, of course, but the stereotypical "dumb user" won't know that.

When I wrote that example, I had the dpANS but not K&R in front of me, and I
didn't realize that the latter uses an ambiguous grammar.  Using the ANSI
grammar as a base, I think the analogy holds: "a?b,c:d" can only be parsed one
way, so there is no need to parenthesize (b,c).  "a+b=0" can only be parsed
one way, which then fails to make sense semantically.  "Precedence" is just a
word you can use to help understand the situation; it's not used in the formal
grammar, which is already unambiguous.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint