[comp.lang.c] C sintax

pardo@june.cs.washington.edu (David Keppel) (01/26/88)

( This isn't related to anything, but I've been unable to figure )
( out a mail path to the guy who asked, and I wanted to answer   )

>> 	;-D on  (My favorite sintax: switch(x+=*a+++*b){...})  Pardo
>
>Boy, you've got that. That's a sin, or should be.
>What, precisely, does x+=*a+++*b mean?

Well, actually it is ambiguous.  There are 2 (or more?) ways to parse
it.  I'm not sure the behavior is defined from lexer to lexer as to
which one is preferred.  Here are some possible interpretations:

    tmp = *a + ++*b;	/* means increment *b and add that to *a */
    x += tmp;
    switch (x) {...}

the (an) other is

    tmp = *a++ + *b;	/* means add *a to *b and increment a */
    x += tmp;
    switch (x) {...}

Not, of course, that this is the most obfuscated code possible, but
it does get the point across.


		    ;-D on  (What point?)  Pardo

ark@alice.UUCP (01/26/88)

In article <4080@june.cs.washington.edu>, pardo@uw-june.UUCP writes:
> ( This isn't related to anything, but I've been unable to figure )
> ( out a mail path to the guy who asked, and I wanted to answer   )
> 
> >> 	;-D on  (My favorite sintax: switch(x+=*a+++*b){...})  Pardo
> >
> >Boy, you've got that. That's a sin, or should be.
> >What, precisely, does x+=*a+++*b mean?
> 
> Well, actually it is ambiguous.  There are 2 (or more?) ways to parse
> it.  I'm not sure the behavior is defined from lexer to lexer as to
> which one is preferred.

It's not ambiguous: C lexical analysis is defined by "maximal munch."

There is one issue, though: whether this is compiled by an ancient C
compiler in which =* is a token.  If so, then += is probably lexed
as two tokens as well.  On such an ancient compiler, it means

	x + =* a ++ + * b

which is illegal.  On more modern compilers, it means

	x += * a ++ + * b

with no ambiguity.

bright@Data-IO.COM (Walter Bright) (01/28/88)

In article <4083@june.cs.washington.edu> pardo@uw-june.UUCP (David Keppel) writes:
<<What, precisely, does x+=*a+++*b mean?
<Well, actually it is ambiguous.  There are 2 (or more?) ways to parse
<it.  I'm not sure the behavior is defined from lexer to lexer as to
<which one is preferred.  Here are some possible interpretations:
<    x += *a++ + *b;	/* means add *a to *b and increment a */
<the (an) other is
<    x += *a + ++*b;	/* means increment *b and add that to *a */

It's not ambiguous since the following rule is always applied:
	Tokens are formed from the longest possible sequence of characters
	that could form a token.
Therefore, (x+=*a+++*b) always parses as (x += * a ++ + * b). If there weren't
this rule, there would be all kinds of problems, as in (x + = * a + ++ * b).
Note that the += was parsed as two separate tokens! Obviously impractical.

al@gtx.com (0732) (01/29/88)

In article <4080@june.cs.washington.edu> pardo@uw-june.UUCP (David Keppel) writes:
->>What, precisely, does x+=*a+++*b mean?
->
->Well, actually it is ambiguous.  There are 2 (or more?) ways to parse
->it.  I'm not sure the behavior is defined from lexer to lexer as to
->which one is preferred.  Here are some possible interpretations:
->
->    tmp = *a + ++*b;	/* means increment *b and add that to *a */
->    x += tmp;
->    switch (x) {...}
->
->the (an) other is
->
->    tmp = *a++ + *b;	/* means add *a to *b and increment a */
->    x += tmp;
->    switch (x) {...}
->
->Not, of course, that this is the most obfuscated code possible, but
->it does get the point across.

Actually, it is not ambiguous.  RTFK&R. The issue is whether x+++y means
(x++)+y or x+(++y). According to K&R, p. 179, the answer can only
be (x++)+y.  The rule is "the next token is taken to include the longest
string of characters which could possibly constitute a token."  I think
there is an implicit assumption of left-to-right parsing.

Notice that this makes 1+++b syntactically illegal, even though it might
reasonably be interpreted as 1+(++b).

I'll stick my neck out and say I don't think there ARE any syntactic
ambiguities in C. (There are, of course, the semantic ones resulting
from undefined order of evaluation).  The potential ambiguities such
as the dangling else and those involving the comma operator are solved
by fiat.

Are there any syntactic ambiguities K&R didn't resolve?

    ----------------------------------------------------------------------
   | Alan Filipski, GTX Corp, 2501 W. Dunlap, Phoenix, Arizona 85021, USA |
   | {ihnp4,cbosgd,decvax,hplabs,amdahl}!sun!sunburn!gtx!al (602)870-1696 |
    ----------------------------------------------------------------------

mnc@m10ux.UUCP (Michael Condict) (02/03/88)

In <548@gtx.com>, al@gtx.com (Alan Filipski) writes:
> I'll stick my neck out and say I don't think there ARE any syntactic
> ambiguities in C. (there are, of course, the semantic ones resulting
> from undefined order of evaluation).  The potential ambiguities such
> as the dangling else and those involving the comma operator are solved
> by fiat.
> 
> Are there any syntactic ambiguities K&R didn't resolve?

Yes, there are, at least if you mean context-free ambiguities.  That is,
without a symbol table or other context-sensitive techniques, the following
is ambiguous in C:

	{ t (x); ... }

If ``typedef ... t;'' appeared previously, then this is a declaration of a variable
named x of type t; otherwise it is a call to a function t with argument x.
This is more than a theoretical problem, since it implies that any parser
for C must include a symbol table package.  Thus if you wish to provide a
black-box C parser module to be used in multiple applications, either each
application must agree to use the symbol table package embedded in the parser,
or the application must redundantly provide its own symbol table.  Neither
prospect is very appealing to me.
-- 
Michael Condict		{ihnp4|vax135|cuae2}!m10ux!mnc
AT&T Bell Labs		(201)582-5911    MH 3B-416
Murray Hill, NJ

wsmith@uiucdcsb.cs.uiuc.edu (02/05/88)

In <548@gtx.com>, al@gtx.com (Alan Filipski) writes:
> I'll stick my neck out and say I don't think there ARE any syntactic
> ambiguities in C. (there are, of course, the semantic ones resulting
> from undefined order of evaluation).  The potential ambiguities such
> as the dangling else and those involving the comma operator are solved
> by fiat.
> 
> Are there any syntactic ambiguities K&R didn't resolve?

#include <stdio.h>

int main(void)
{
    int a, b;

    a = 0; b = 0;

    printf("%d\n", a+++b);
    return 0;
}

Is this (a++)+b or a+(++b)?

Bill Smith
pur-ee!uiucdcs!wsmith
wsmith@a.cs.uiuc.edu

john@viper.Lynx.MN.Org (John Stanley) (02/08/88)

In article <165600034@uiucdcsb> wsmith@uiucdcsb.cs.uiuc.edu writes:
 >
 >main()
 >{
 >    int a,b;
 > 
 >    a = 0; b = 0;
 >
 >    printf("%0d",a+++b);
 >}
 >
 >Is this (a++)+b or a+(++b)?
 >
 >Bill Smith
 >pur-ee!uiucdcs!wsmith
 >wsmith@a.cs.uiuc.edu


Because C lexical analysis is defined as always taking the largest number
of characters that can be interpreted as a single token, scanning from left
to right, the answer is always:

	((a++)+b)

--- 
John Stanley (john@viper.UUCP)
Software Consultant - DynaSoft Systems
UUCP: ...{amdahl,ihnp4,rutgers}!meccts!viper!john

daniel@sco.COM (daniel edelson) (02/09/88)

In article <4083@june.cs.washington.edu> pardo@uw-june.UUCP (David Keppel) writes:
<
...
<<< 	;-D on  (My favorite sintax: switch(x+=*a+++*b){...})  Pardo
<<
<<What, precisely, does x+=*a+++*b mean?
<
<Well, actually it is ambiguous.  There are 2 (or more?) ways to parse
<it.  I'm not sure the behavior is defined from lexer to lexer as to
<which one is preferred.  Here are some possible interpretations:
<
<    tmp = *a++ + *b;	/* means add *a to *b and increment a */
<    x += tmp;
<    switch (x) {...}
<
<the (an) other is
<
<    tmp = *a + ++*b;	/* means increment *b and add that to *a */
<    x += tmp;
<    switch (x) {...}
<
...
<		    ;-D on  (What point?)  Pardo

It is ambiguous in that there are two valid expressions that
could look that way. But correct compilers would only give
one of those two parsings. There is a little-known rule
in K&R that says that when forming tokens, you use the longest
string that could be a token. Thus, the expression a+++b
would be (correctly) parsed as a++ + b. Parsing it
as a + ++b would violate the longest-token rule.
Things become even more obfuscated in ANSI C with the
introduction of unary plus.
-- 
uucp:   {uunet|decvax!microsoft|ucbvax!ucscc|ihnp4|amd}!sco!daniel
ARPA:   daniel@sco.COM     Inter:  daniel@saturn.ucsc.edu--------------
pingpong: a dink to the right side with lots of top spin | Disclaimed |
fishing:  flies in morning and evening, day - spinners   | as usual   |