[comp.lang.c] What does "a---b" mean?

adamk@mit-amt.MEDIA.MIT.EDU (Adam Kao) (01/03/90)

I've been thinking about a C interpreter and ran into this problem.
Harbison & Steele says the lexical scanner always makes the largest
token, so I assume the example from the subject line would be tokenized
as "a -- - b" and parsed as "(a--) - b", and in general "a---...---b"
would come out as "(((a--)--)...--)-b" or something (ignoring what that
might mean).

But it's not clear to me that this is consistent with the precedence
and associativity rules (mainly because I have trouble understanding
them!).  Is it actually true that the lexical scanner is defined to be
"dumb" and MUST ignore syntax while tokenizing?  Then "a--b" would
always generate a compile error, right?  (Waitaminute!  This explains
that problem my friend was having . . .)
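
One way to check the "(a--) - b" reading is to hand the expression to a
compiler.  The following is only a sketch: the helper name
minus_minus_minus is made up for illustration, and it assumes the
compiler scans "---" as "--" followed by "-".

```c
/* Hypothetical helper: evaluates a---b for ints.
   Under the largest-token rule the scanner sees  x  --  -  b,
   so the expression groups as (x--) - b and x is decremented
   as a side effect. */
int minus_minus_minus(int *a, int b)
{
    int x = *a;
    int r = x---b;      /* tokens: x  --  -  b */

    *a = x;             /* hand the decremented value back */
    return r;
}
```

Called with *a == 5 and b == 2, this should return 3 and leave *a at 4.
Writing "x--b" in the same spot would be scanned as "x -- b", which is
why it is rejected as a syntax error.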

This is further complicated by the requirement that compound
assignment operators actually be considered as two separate tokens
during the scanning phase ("a - = b" is "a -= b" . . .) but the tokens
are combined into one token before being sent off to the syntax
analyzer.

More generally I'm not sure I understand how C handles ambiguity with
multiple-character tokens.  To summarize, my understanding is:

1. The lexical analyzer scans left to right making the biggest token possible.
2. Compound assignment tokens (ONLY) can be read as two consecutive
   tokens (i.e., separated by whitespace ONLY) that are IMMEDIATELY
   joined into one token.
3. Remaining ambiguity is resolved by simple (hah!) precedence and
   associativity rules.  All remaining ambiguities can be resolved by
   these rules, which essentially answer the questions "When do
   parentheses enclose this operator?" (precedence) and "Which
   operator does this operand go with?" (associativity).  In
   particular, the questions "Which operand does this operator go
   with?" and "What operator(s) is this?" never arise or are irrelevant.

Is this correct?  Is it complete?

Any other suggestions or "gotchas" about C interpreters are welcome.

Thank you for your time.

Adam

chris@mimsy.umd.edu (Chris Torek) (01/03/90)

In article <1303@mit-amt.MEDIA.MIT.EDU> adamk@mit-amt.MEDIA.MIT.EDU
(Adam Kao) writes:
>To summarize, my understanding is:
>
>1. The lexical analyzer scans left to right making the biggest token
>   possible.

Yes.
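
For an interpreter, the "biggest token possible" rule might be sketched
roughly as below.  This is a toy under stated assumptions, not a real C
scanner: the names toy_ops, toy_munch and toy_split are invented here,
the operator table is a small illustrative subset, and identifiers are
limited to single letters.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy operator table (an assumption for illustration, not the full C
   set).  Longer operators come first, so the first match found is
   also the longest one. */
static const char *const toy_ops[] = { "--", "-=", "->", "-", "=" };

/* Length of the longest known operator at the start of s, or 0. */
size_t toy_munch(const char *s)
{
    size_t i;

    for (i = 0; i < sizeof toy_ops / sizeof toy_ops[0]; i++) {
        size_t n = strlen(toy_ops[i]);
        if (strncmp(s, toy_ops[i], n) == 0)
            return n;
    }
    return 0;
}

/* Copy src into dst with one space between tokens.  Letters count as
   one-character identifiers; blanks and tabs are skipped.  Returns 0
   on success, -1 on an unknown character or a too-small buffer. */
int toy_split(const char *src, char *dst, size_t dstsize)
{
    size_t used = 0;

    while (*src != '\0') {
        size_t n;

        if (*src == ' ' || *src == '\t') {
            src++;
            continue;
        }
        n = isalpha((unsigned char)*src) ? 1 : toy_munch(src);
        if (n == 0)
            return -1;
        /* need room for separator, token, and terminating NUL */
        if (used + (used > 0) + n + 1 > dstsize)
            return -1;
        if (used > 0)
            dst[used++] = ' ';
        memcpy(dst + used, src, n);
        used += n;
        src += n;
    }
    if (dstsize == 0)
        return -1;
    dst[used] = '\0';
    return 0;
}
```

With this table, toy_split turns "a---b" into "a -- - b" and "a--b"
into "a -- b".  It never looks at syntax, so it has no way to notice
that the second result cannot be parsed.  It also leaves "a - = b" as
two separate tokens and never joins them.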

>2. Compound assignment tokens (ONLY) can be read as two consecutive
>   tokens (i.e., separated by whitespace ONLY) that are IMMEDIATELY
>   joined into one token.

Yes, but only in Classic C, not in New C (proposed ANSI standard C).

>3. Remaining ambiguity is resolved by simple (hah!) precedence and
>   associativity rules.

More or less.  C has no nonassociative operators, so every syntactically
correct expression has a defined grouping.  The result can still be
ambiguous in another sense, e.g.,

	p += *p++;

is easy to parse but hard to describe.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris