maart@cs.vu.nl (Maarten Litmaath) (09/13/89)
gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
[... Must
0 ? 0 : i = 0
be parsed as
(0) ? (0) : (i=0)
or as
(0 ? 0 : i) = 0
? ...]
\
\OOPS! As Steve Emmerson was the first to point out to me, the correct
\way to parse expressions in C is to follow the formal grammar reduction
\rules, not rely on the precedence/associativity tables. Because the
\left operand in an assignment expression cannot be a conditional
\expression (it is constrained to be a unary expression with lvalueness),
\there is no legal way to parse the example into the second form. Thus
\I was mistaken, as were the compiler implementors who parsed it that
\way. Perhaps they made the same mistake I just did.
Then how about the following, Doug? "There is no legal way to parse..."
0 && i = 0
--
C, the programming language that's the same |Maarten Litmaath @ VU Amsterdam:
in all reference frames. |maart@cs.vu.nl, mcvax!botter!maart
gwyn@smoke.BRL.MIL (Doug Gwyn) (09/13/89)
In article <3236@solo10.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>Then how about the following, Doug? "There is no legal way to parse..."
>    0 && i = 0
Sounds good to me; there is no legal way to parse
    0 && i = 0
as
    (0 && i) = 0
but there is a legal parse as
    0 && (i = 0)
Why did you bring that example up? Are we using it wrong in the Standard,
or what?
maart@cs.vu.nl (Maarten Litmaath) (09/13/89)
gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
\... there is no legal way to parse
\ 0 && i = 0
\as
\ (0 && i) = 0
\but there is a legal parse as
\ 0 && (i = 0)
gcc (ANSI or what?) does accept
0 ? 0 : i = 0
but it does NOT accept
0 && i = 0
In fact, I've never used a C compiler that accepted the latter construct.
Of course I fully agree it should be accepted.
--
creat(2) shouldn't have been create(2): |Maarten Litmaath @ VU Amsterdam:
it shouldn't have existed at all. |maart@cs.vu.nl, mcvax!botter!maart
gwyn@smoke.BRL.MIL (Doug Gwyn) (09/14/89)
In article <3242@solo12.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
>\... there is no legal way to parse
>\ 0 && i = 0
>\as
>\ (0 && i) = 0
>\but there is a legal parse as
>\ 0 && (i = 0)

Actually, now that I've gotten back to my desk where I keep a copy of
the proposed Standard, I find that you must supply the parentheses to
get a legal parse. Without them there is no derivation from the
grammar production rules. (I think. I keep getting this wrong.)

>gcc (ANSI or what?) does accept
> 0 ? 0 : i = 0
>but it does NOT accept
> 0 && i = 0
>In fact, I've never used a C compiler that accepted the latter construct.
>Of course I fully agree it should be accepted.

Maybe the intention is to avoid confusion. For example,
    i = 0 && i = 0
would either have to have an ambiguous parse, or else its interesting
subexpression would be parsed differently depending on context, which
is confusing.

I seem to recall Dave Prosser telling me that the Standard's grammar
for C (apart from the preprocessor) constituted a "phrase structure"
grammar. Perhaps if I knew what that meant I'd understand the
rationale behind these particular expression parsing rules.

It does appear that the current rules avoid parsing ambiguity. That's
probably a worthwhile constraint.
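To make the parenthesized forms concrete, here is a minimal C sketch. The function names guarded_and and guarded_cond are made up for illustration and appear nowhere in the thread. With the parentheses supplied, both expressions derive from the grammar. The && form never evaluates its assignment because the left operand is 0, while the ?: form evaluates its third operand because the condition is 0.

```c
#include <assert.h>

/* 0 && (*i = 5): the parentheses turn the assignment into a
   primary-expression, so it can stand as the right operand of &&.
   The left operand is 0, so the assignment is never evaluated and
   *i keeps its old value. */
int guarded_and(int *i)
{
    assert(i != 0);
    return 0 && (*i = 5);
}

/* 0 ? 0 : (*i = 0): the condition is 0, so the third operand is
   evaluated, *i becomes 0, and the whole expression has the value 0. */
int guarded_cond(int *i)
{
    assert(i != 0);
    return 0 ? 0 : (*i = 0);
}
```

Calling guarded_and(&i) with i == 7 returns 0 and leaves i at 7; calling guarded_cond(&i) afterwards returns 0 and sets i to 0.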
djones@megatest.UUCP (Dave Jones) (09/14/89)
In article <3242@solo12.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
>>... there is no legal way to parse
>> 0 && i = 0
>>as
>> (0 && i) = 0
>>but there is a legal parse as
>> 0 && (i = 0)

According to the grammar at the back, there is no legal way to parse
this at all. But, according to the precedence table on page 53 it does
indeed parse as
    <0 && i> = 0
So if you believe the table, Mr. Gwyn had it exactly backwards, because
the following is NOT a legal parse, according to either the precedence
table or the grammar:
    0 && <i = 0>
But let's be generous and give him partial credit, assuming that
the grammar is the Law and the precedence table is for amusement
purposes only.

Of course, even if you do parse it according to the precedence rules,
you still get an error, because (0 && i) is not a legal L-value for an
assignment operator. No surprise there. That's the way it was in old C.

Like I said, it takes a bit of doing, but you can show that according to
the grammar, it does not parse at all. But, interestingly, you can use
parentheses to force it to parse the way the precedence rules say it does:
    ( 0 && i ) = 0
The grammar will parse that. Naturally, it parses as
    <0 && i> = 0
... and we are back to the old not-an-L-value-problem.

Allow me to test-burn my flamethrower for just one short burst...

The grammar at the back of the book, presumably close to the ANSI
standard, does not use precedence for "disambiguation". Too easy, I guess.
Instead it has lots of very confusing different kinds of expressions.
Among them are "assignment-expression" and "logical-and-expression",
for example. A crock, if you ask me. Presumably, (if they got it right),
those rules imply the precedence given in the K&R table, except that, as
we saw, some expressions are rejected by the grammar rather than by
"semantic" rules, such as the L-value rule.

I would like to hear the rationale behind creating such a convoluted
grammar, when the precedence rules are so much easier to understand.
I can just picture a beginning programmer trying to push and pop his
mental stack through that maze as he tries to figure out why his program
is in error.

Besides, the precedence rules are also much better for compiler-writers,
too! It's easy to go from a precedence table to a grammar like the
published one, although I can't think of any reason why you would want to.
But the reverse, going from the grammar to the precedence table, is
difficult. If it were easy, we wouldn't be having this discussion, right?

I think they really botched it there. Big time.

Ah. That's much better. Thank you.

(If I ever write a C compiler, I'm going to use precedence, and if
anybody ever complains that they got an "L-value" error message when they
distinctly asked for a "syntax" error message, maybe I'll offer to wash
their car.)
gwyn@smoke.BRL.MIL (Doug Gwyn) (09/14/89)
In article <8034@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes:
>But let's be generous and give him partial credit, assuming that
>the grammar is the Law and the precedence table is for amusement
>purposes only.

Don't give me any credit, because it turned out that there was no
legal parse for the expression at all.

The grammar IS the Law. I don't know exactly WHAT the precedence table
is for, but it may be intended as a convenient quick way to check
whether one needs to use grouping parentheses when writing code,
without having to painstakingly work through the production rules.

>I would like to hear the rationale behind creating such a convoluted
>grammar, when the precedence rules are so much easier to understand.

Having recently made several mistakes with this, I'm perhaps not the
best source for that rationale. Note, however, that the grammar
allowed me to eventually figure out the correct answer, whereas the
precedence rules initially led me to jump to a false conclusion.
(I made an error in applying the grammar, too, but that could be
manually verified and corrected. It wasn't the fault of the grammar.)

I don't find the grammar especially "convoluted"; in fact it's
relatively straightforward. The reason for the several types of
expression is precisely to reflect the correct precedence without
requiring precedence to be provided by means outside the grammar.

>Besides, the precedence rules are also much better for compiler-writers,
>too!

I disagree with this. In implementing several language translators,
I've always used the formal grammar production rules directly in the
implementation. I've never had occasion to try to impose precedence
rules any other way.
bill@twwells.com (T. William Wells) (09/15/89)
In article <11054@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
: I seem to recall Dave Prosser telling me that the Standard's grammar
: for C (apart from the preprocessor) constituted a "phrase structure"
: grammar. Perhaps if I knew what that meant I'd understand the
: rationale behind these particular expression parsing rules.
The term "phrase structure grammar" is pretty much equivalent to
"specified by a pure BNF". It also implies that other language things
that are not directly specified in the grammar can nonetheless be
specified in terms of the grammar, i.e., that not only does the
grammar accept C but that, by and large, the parse that results
makes some kind of sense.
C can be specified by an LALR(1) grammar if the lexical scanner
returns tokens that distinguish type names from other identifiers. If
it doesn't, there is no unambiguous context free grammar for it. (This
from some code that demonstrated that the language is ambiguous; if
the language is ambiguous, then there is no unambiguous grammar for
it.) The importance of LALR(1) is that there are a number of widely
used tools (like Yacc) that accept LALR(1) grammars.
(BTW, Doug, I expect that you know much of this, but I figured I'd
say it for everyone else.)
: It does appear that the current rules avoid parsing ambiguity. That's
: probably a worthwhile constraint.
Very much so. One can write an ambiguous grammar and then have
extragrammatical constraints on the parse, to make it unambiguous.
Yacc does this and, for certain kinds of grammars, the parsers are
smaller and faster than they would be if the grammar were written
unambiguously and without the extra constraints.
But, having the grammar specification itself ambiguous just means
that you will end up with portability problems. Just as has happened
with the ?: and = situation.
---
The way to determine if a piece of text is legal C is to try to parse
it with the grammar. If that fails, it isn't legal C. Then look at
each phrase of the parse and see if there are any constraints that
are violated. If so, again it isn't legal.
The ?: operator is specified in the grammar by:
    conditional-expression:
        logical-OR-expression
        logical-OR-expression ? expression : conditional-expression
The assignment operators are specified by:
    assignment-expression:
        conditional-expression
        unary-expression assignment-operator assignment-expression
A logical-OR-expression can't have, directly, an assignment. If there
is one, it has to be in parentheses. Thus, expressions like:
a = b ? c : d
have to be parsed as a = (b ? c : d) and this is how they have always
been. On the other hand,
a ? b = c : d
should parse as a ? (b = c) : d but I've seen compilers that refuse
to parse it, due to the fact that the specification in K&R didn't say
exactly what kind of expression belonged in the middle. Such are the
perils of not having a formal specification. Finally,
a ? b : c = d
can't be parsed as (a ? b : c) = d, since a ? b : c isn't a
unary-expression. It can, however, be parsed as a ? b : (c = d). As
far as I know, there are no compilers that this breaks on.
Similarly one analyzes a && b = c: the operands of && are specified
as logical-AND-expression and inclusive-OR-expression, neither of
which can contain an unparenthesized assignment. And the left operand
of = must be a unary-expression, which a && b certainly isn't. Thus
this expression is not legal C.
---
Bill { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com
bill@twwells.com (T. William Wells) (09/15/89)
In article <8034@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes:
: The grammar at the back of the book, presumably close to the ANSI
: standard, does not use precedence for "disambiguation". Too easy, I guess.
: Instead it has lots of very confusing different kinds of expressions.
: Among them are "assignment-expression" and "logical-and-expression",
: for example. A crock, if you ask me. Presumably, (if they got it right),
: those rules imply the precedence given in the K&R table, except that, as
: we saw, some expressions are rejected by the grammar rather than by
: "semantic" rules, such as the L-value rule. I would like to hear the
: rationale behind creating such a convoluted grammar, when the precedence
: rules are so much easier to understand.
The precedence rules are wrong. The grammar is right. The reason
for the grammar is that precedence rules can not be sufficient to
unambiguously specify the parse; the grammar is. Postfix operators
and ?: are the culprits. There is no simpler way, however, to
properly specify the grammar, in pure BNF, than by having one
itty-bitty rule set for each "level of precedence". Such is life.
: I can just picture a beginning
: programmer trying to push and pop his mental stack through that maze
: as he tries to figure out why his program is in error.
It actually isn't too hard, once you realize that the bulk of the
grammar contains rule chains like:
Z:
Y
Y
Y op Z
you just find the operator you want, look up towards the top of the
grammar to see if the surrounding operators would also parse (in
which case, they have precedence; this is a consequence of the
grammar being unambiguous), and if all goes well, you know how that
piece fits. Repeat for all the other pieces and you have your parse.
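As an illustration of how one small rule per level encodes precedence, here is a toy sketch in C. It is not taken from the Standard's grammar: the rule names sum and product, the function eval, and the restriction to single digits with + and * are all assumptions made for this example. Each grammar rule becomes one function, and the rule lower in the chain binds tighter.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical two-level chain, one function per grammar rule:
     sum:     product | sum + product
     product: digit   | product * digit
   product sits deeper in the chain, so * binds tighter than +.
   The left-recursive rules are rendered as loops, the usual
   recursive-descent treatment. */

static const char *p;            /* cursor into the text being parsed */

static int product(void)
{
    int v;
    assert(isdigit((unsigned char)*p));   /* toy: abort on bad input */
    v = *p++ - '0';
    while (*p == '*') {
        p++;
        assert(isdigit((unsigned char)*p));
        v *= *p++ - '0';
    }
    return v;
}

static int sum(void)
{
    int v = product();
    while (*p == '+') {
        p++;
        v += product();
    }
    return v;
}

/* Evaluate a string such as "2+3*4" (no spaces, single digits). */
int eval(const char *s)
{
    p = s;
    return sum();
}
```

With this chain, eval("2+3*4") yields 14 and eval("2*3+4") yields 10, without any precedence table outside the grammar.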
Besides which, the standard is explicitly *not* intended for use by a
beginning programmer.
: (If I ever write a C compiler, I'm going to use precedence, and if
: anybody ever complains that they got an "L-value" error message when they
: distinctly asked for a "syntax" error message, maybe I'll offer to wash
: their car.)
I wrote a precedence parser for C expressions. It was a mass of
kludges on top of fudges. I threw it away. C is just not specifiable
by precedence.
---
Bill { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com
maart@cs.vu.nl (Maarten Litmaath) (09/15/89)
gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
\... 0 && (i = 0)
\...
\Actually, now that I've gotten back to my desk where I keep a copy of
\the proposed Standard, I find that you must supply the parentheses to
\get a legal parse. Without them there is no derivation from the
\grammar production rules. (I think. I keep getting this wrong.)
You're right, according to the January '88 :-( copy of the dpANS I have
at home.
[... 0 ? 0 : i = 0
is accepted, whereas
0 && i = 0
is not.]
\Maybe the intention is to avoid confusion. For example,
\ i = 0 && i = 0
\would either have to have an ambiguous parse, or else its interesting
\subexpression would be parsed differently depending on context, which
\is confusing. [...]
All right, let's avoid the confusion; another example:
a + b = 7
But why allow the `?:' expression, why make it a special case?
--
creat(2) shouldn't have been create(2): |Maarten Litmaath @ VU Amsterdam:
it shouldn't have existed at all. |maart@cs.vu.nl, mcvax!botter!maart
maart@cs.vu.nl (Maarten Litmaath) (09/15/89)
bill@twwells.com (T. William Wells) writes:
\... a ? b : c = d
\
\can't be parsed as (a ? b : c) = d, since a ? b : c isn't a
\unary-expression. It can, however, be parsed as a ? b : (c = d). As
\far as I know, there are no compilers that this breaks on.
\
\Similarly one analyzes a && b = c:
STOP! Let's do precisely what you said, let's analyze a && b = c
SIMILARLY:
a && b = c
"can't be parsed as (a && b) = c, since a && b isn't a
unary-expression. It can, however, be parsed as a && (b = c). As
far as I know, there are no compilers that this breaks on."
But wait! This example breaks on EVERY compiler!
--
creat(2) shouldn't have been create(2): |Maarten Litmaath @ VU Amsterdam:
it shouldn't have existed at all. |maart@cs.vu.nl, mcvax!botter!maart
djones@megatest.UUCP (Dave Jones) (09/15/89)
From article <1989Sep14.181049.8175@twwells.com>, by bill@twwells.com (T. William Wells):
)
) : (If I ever write a C compiler, I'm going to use precedence, and if
) : anybody ever complains that they got an "L-value" error message when they
) : distinctly asked for a "syntax" error message, maybe I'll offer to wash
) : their car.)
)
) I wrote a precedence parser for C expressions. It was a mass of
) kludges on top of fudges. I threw it away. C is just not specifiable
) by precedence.
)
You misunderstand my intentions. I agree that precedence parsers are
obsolete. By "use precedence" I just refer to disambiguation rules
such as the yacc %left and %prec directives. They make life easy and
LR tables short.
Take a peek at the cgram.y file in the source of the so-called "portable"
C compiler, if you dare. It may not be pretty, but it would be even
not prettier if it were not for the precedence rules.
bill@twwells.com (T. William Wells) (09/16/89)
In article <3263@solo5.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
: bill@twwells.com (T. William Wells) writes:
: \... a ? b : c = d
: \
: \can't be parsed as (a ? b : c) = d, since a ? b : c isn't a
: \unary-expression. It can, however, be parsed as a ? b : (c = d). As
: \far as I know, there are no compilers that this breaks on.
: \
: \Similarly one analyzes a && b = c:
:
: STOP! Let's do precisely what you said, let's analyze a && b = c
: SIMILARLY:
:
: a && b = c
:
: "can't be parsed as (a && b) = c, since a && b isn't a
: unary-expression. It can, however, be parsed as a && (b = c). As
: far as I know, there are no compilers that this breaks on."
:
: But wait! This example breaks on EVERY compiler!

Yes. I was brain-dead on that one. I *know* how to read those damn
things. And I know that that bit of code won't work. But I was
asleep at the wheel. Somehow I reversed what I was going to say and
just continued writing with the negation substituted for what I
intended.

Sorry. You may take the flames I usually reserve for others as
having been directed at myself.

That paragraph should have said that a ? b : c = d is also not valid
because c = d is not a conditional-expression and so is not legal C.
And will not parse on any compiler I know of.
---
Bill { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com
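The corrected reading can be checked mechanically. What follows is a minimal recursive-descent recognizer in C for a cut-down version of the two productions quoted earlier in the thread. It is a sketch under stated assumptions, not anybody's compiler: operands are single letters or digits, unary-expression is reduced to an operand or a parenthesized expression, the levels between && and unary-expression are omitted, = is the only assignment operator, and every function name (parses, eat, and so on) is invented for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static const char *p;                       /* parse cursor */

static int expression(void);

static void skip(void) { while (*p == ' ') p++; }

/* Consume the token t if it comes next; return 1 on success. */
static int eat(const char *t)
{
    size_t n = strlen(t);
    skip();
    if (strncmp(p, t, n) != 0) return 0;
    p += n;
    return 1;
}

/* unary-expression, cut down to: letter | digit | ( expression ) */
static int unary(void)
{
    skip();
    if (isalnum((unsigned char)*p)) { p++; return 1; }
    return eat("(") && expression() && eat(")");
}

/* logical-AND-expression: unary | logical-AND && unary */
static int logical_and(void)
{
    if (!unary()) return 0;
    while (eat("&&")) if (!unary()) return 0;
    return 1;
}

/* logical-OR-expression: logical-AND | logical-OR || logical-AND */
static int logical_or(void)
{
    if (!logical_and()) return 0;
    while (eat("||")) if (!logical_and()) return 0;
    return 1;
}

/* conditional-expression:
     logical-OR | logical-OR ? expression : conditional-expression */
static int conditional(void)
{
    if (!logical_or()) return 0;
    if (!eat("?")) return 1;
    return expression() && eat(":") && conditional();
}

/* assignment-expression:
     conditional | unary = assignment-expression */
static int assignment(void)
{
    const char *save = p;
    if (unary() && eat("=") && assignment()) return 1;
    p = save;                               /* backtrack, try the other rule */
    return conditional();
}

static int expression(void) { return assignment(); }

/* Return 1 if the whole string derives from the toy grammar. */
int parses(const char *s)
{
    assert(s != 0);
    p = s;
    if (!expression()) return 0;
    skip();
    return *p == '\0';
}
```

Under these assumptions, parses("a = b ? c : d") and parses("a ? b = c : d") succeed, while parses("a ? b : c = d") and parses("a && b = c") fail. The parenthesized forms "a ? b : (c = d)" and "a && (b = c)" succeed. So does "(a && b) = c", which the grammar admits and which is rejected only afterwards by the lvalue constraint, as Dave Jones observed.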
news@ism780c.isc.com (News system) (09/16/89)
In article <3263@solo5.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
:STOP! Let's do precisely what you said, let's analyze a && b = c
:SIMILARLY:
:
: a && b = c
:
:"can't be parsed as (a && b) = c, since a && b isn't a
:unary-expression. It can, however, be parsed as a && (b = c). As
:far as I know, there are no compilers that this breaks on."
:
:But wait! This example breaks on EVERY compiler!
Unfortunately, pcc-based compilers accept a && b += c. They do reject
a && b = c, however. Writing a grammar for the language, as was done by ANSI,
helps avoid these funnies.
Marv Rubinstein
dg@lakart.UUCP (David Goodenough) (09/19/89)
OK, you say:
    a ? b : c = d
can't be parsed sensibly as
    (a ? b : c) = d
which I tend to agree with: the term in parentheses is not an lvalue.
But what about:
    int a;
    char *b, *c, d;
    .
    .
    .
    a ? *b : *c = d
Now, can that legally be done?? Whatever value a has (TRUE or FALSE)
the a ? *b : *c construct could be considered an lvalue (OK, so you
need a slightly twisted mind to see it :-) )
And it's a somewhat academic question, since the * (indirection) operator
distributes over the ?: operator: the above could equally well be written
as:
    *(a ? b : c) = d
which had better be fair game, otherwise I'm going to complain to my
compiler vendor.
--
        dg@lakart.UUCP - David Goodenough               +---+
                                                IHS     | +-+-+
        ....... !harvard!xait!lakart!dg                 +-+-+ |
AKA:    dg%lakart.uucp@xait.xerox.com                     +---+
karzes@mfci.UUCP (Tom Karzes) (09/22/89)
Neither of the following is legal because conditional expressions (?:)
are not regarded as lvalues:
    (a ? b : c) = d
    (a ? *b : *c) = d
However, the following is legal since indirection expressions (*) are
legal lvalues:
    *(a ? b : c) = d
I personally don't see why ?: expressions can't be lvalues, provided
the second and third operands have the same type. For example, the
following would be legal, with result type int:
    int a, b, c[10], d, i;
    ...
    (a ? b : c[i]) = d;
But the following would be illegal:
    int a, b, d;
    double c;
    ...
    (a ? b : c) = d;
Oh well...
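The legal form can be exercised directly. This is a small C sketch using the declarations from David Goodenough's post (int a; char *b, *c, d); the wrapper name store_selected is made up for the example.

```c
#include <assert.h>

/* *(a ? b : c) = d: the ?: picks one of two pointers, and the
   indirection expression wrapped around it is an lvalue, so the
   assignment is well formed. */
void store_selected(int a, char *b, char *c, char d)
{
    assert(b != 0 && c != 0);
    *(a ? b : c) = d;
}
```

With char x = 'x', y = 'y', the call store_selected(1, &x, &y, 'A') changes only x, and store_selected(0, &x, &y, 'B') changes only y.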