adamk@mit-amt.MEDIA.MIT.EDU (Adam Kao) (01/03/90)
I've been thinking about a C interpreter and ran into this problem.  Harbison & Steele says the lexical scanner always makes the largest token, so I assume the example from the subject line would be tokenized as "(a--) - b", and in general "a---...---b" would come out as "(((a--)--)...--)-b" or something (ignoring what that might mean).

But it's not clear to me that this is consistent with the precedence and associativity rules (mainly because I have trouble understanding them!).  Is it actually true that the lexical scanner is defined to be "dumb" and MUST ignore syntax while tokenizing?  Then "a--b" would always generate a compile error, right?  (Waitaminute!  This explains that problem my friend was having . . .)

This is further complicated by the rule that compound assignment operators may appear as two separate tokens during the scanning phase ("a - = b" is "a -= b" . . .), which are then combined into one token before being sent off to the syntax analyzer.  More generally, I'm not sure I understand how C handles ambiguity with multiple-character tokens.

To summarize, my understanding is:

1. The lexical analyzer scans left to right making the biggest token
   possible.

2. Compound assignment tokens (ONLY) can be read as two consecutive
   tokens (i.e. separated by whitespace ONLY) that are IMMEDIATELY
   joined into one token.

3. Remaining ambiguity is resolved by simple (hah!) precedence and
   associativity rules.

All remaining ambiguities can be resolved by these rules, which essentially answer the questions "When do parentheses enclose this operator?" (precedence) and "Which operator does this operand go with?" (associativity).  In particular, the questions "Which operand does this operator go with?" and "What operator(s) is this?" never arise or are irrelevant.

Is this correct?  Is it complete?  Any other suggestions or "gotchas" about C interpreters are welcome.

Thank you for your time.

Adam
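P.S.  To make the subject-line example concrete, here is the sort of test program I have in mind.  It is only a sketch of the behavior assumed above (the maximal-munch rule as Harbison & Steele describe it); the variable names and the values 5 and 2 are made up just to make the grouping visible.

    /* Sketch of a test for the maximal-munch rule; the values are
     * arbitrary and only serve to show how the expression groups.
     */
    #include <stdio.h>

    int main(void)
    {
        int a = 5, b = 2;

        /* If the scanner always grabs the longest token, "a---b" is
         * lexed as "a" "--" "-" "b", i.e. (a--) - b: the value is
         * 5 - 2 == 3, and a is 4 afterward.
         */
        int r = a---b;

        printf("a---b gives %d, a is now %d\n", r, a);

        /* By the same rule "a--b" lexes as "a" "--" "b", which is a
         * syntax error; the scanner never backs up to try "a" "-" "-b"
         * even though that would parse.  Uncommenting the next line
         * should therefore fail to compile.
         */
        /* int bad = a--b; */

        return 0;
    }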
chris@mimsy.umd.edu (Chris Torek) (01/03/90)
In article <1303@mit-amt.MEDIA.MIT.EDU> adamk@mit-amt.MEDIA.MIT.EDU (Adam Kao) writes:
>To summarize, my understanding is:
>
>1. The lexical analyzer scans left to right making the biggest token
>   possible.

Yes.

>2. Compound assignment tokens (ONLY) can be read as two consecutive
>   tokens (i.e. separated by whitespace ONLY) that are IMMEDIATELY
>   joined into one token.

Yes, but only in Classic C, not in New C (proposed ANSI standard C).

>3. Remaining ambiguity is resolved by simple (hah!) precedence and
>   associativity rules.

More or less.  C has no nonassociative operators, so every syntactically correct expression has a defined grouping.  The result can still be ambiguous in another sense, e.g.,

    p += *p++;

is easy to parse but hard to describe.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@cs.umd.edu    Path: uunet!mimsy!chris
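A follow-up sketch, since the original question was about writing an interpreter: under the maximal-munch rule the scanner can be written greedily, with one character of lookahead and no backing up.  The token names and the string-walking interface below are invented for illustration, not taken from any real compiler.  Note that whitespace between "-" and "=" stops the match, so a scanner written this way naturally produces two tokens for "- =", which New C treats as a syntax error rather than gluing them back into "-=".

    /* Sketch of a greedy scan of the '-' family.  The enum names and
     * the (const char **) stream interface are hypothetical.
     */
    #include <stdio.h>

    enum token { TOK_MINUS, TOK_MINUSMINUS, TOK_MINUSEQ };

    /* Scan one token starting at *s.  Take the longest match and never
     * back up, so "---" yields "--" on the first call, then "-".
     */
    static enum token scan_minus(const char **s)
    {
        (*s)++;                               /* consume the first '-' */
        if (**s == '-') { (*s)++; return TOK_MINUSMINUS; }
        if (**s == '=') { (*s)++; return TOK_MINUSEQ; }
        return TOK_MINUS;
    }

    int main(void)
    {
        const char *p = "---";

        while (*p == '-')
            switch (scan_minus(&p)) {
            case TOK_MINUSMINUS: puts("--"); break;
            case TOK_MINUSEQ:    puts("-="); break;
            default:             puts("-");  break;
            }
        return 0;
    }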