[net.lang.c] Other 'significant' blanks

kpmartin@watmath.UUCP (Kevin Martin) (12/09/84)

Should a compiler allow blanks between the 'op' and the '=' in an
'op=' operation? e.g. should
   a | = 4;
be legal?
What about other multi-character operators (like ++)? In this case,
blanks are used to prevent ambiguity, i.e.
   a - -- b    and    a -- - b
are both unambiguous. But if a blank were allowed between the two '-'s,
each expression could be read either way, so both would mean the same
two things.

There seems to be no good reason to allow such a blank, and for certain
operators (such as --) allowing a blank would create (more) ambiguity.
Perhaps the compiler which allows such blanks should just be called buggy.
                      Kevin Martin, UofW Software Development Group

keesan@bbncca.ARPA (Morris Keesan) (12/10/84)

--------------------------------
> From: kpmartin@watmath.UUCP (Kevin Martin)
> Subject: Other 'significant' blanks
> Should a compiler allow blanks between the 'op' and the '=' in an
> 'op=' operation? e.g. should
>    a | = 4;
> be legal?
> What about other multi-character operators (like ++)?
> .
> .
> .
> There seems to be no good reason to allow such a blank, and for certain
> operators (such as --) allowing a blank would create (more) ambiguity.
> Perhaps the compiler which allows such blanks should just be called buggy.
>                       Kevin Martin, UofW Software Development Group

From the C Reference Manual:

------
    2. Lexical conventions
	There are six classes of tokens: identifiers, keywords, constants,
    strings, operators, and other separators. . . . [White space is] ignored
    except as [it] serves to separate tokens.

    7.2 Unary operators
	[ ++ and -- are referred to as "operators"]

    7.14 Assignment operators
	. . . The two parts of a compound assignment operator are separate
    tokens.
------

    From this I read that tokens may not have spaces in them, operators are
tokens, ++ and -- are operators, and therefore ++ and -- may not have spaces
in them.  The same reasoning would apply to compound assignment operators
(e.g. +=), except for the explicit (and somewhat mystifying) exception in
7.14.  Several months ago I noticed that our C compiler allowed blanks in the
compound assignment operators, and I was about to fix this when I came across
the exception.  I see no reason why the exception should be there, and if I
were specifying the language from scratch, I wouldn't put in the exception,
but any compiler which claims to compile K&R C is buggy if it doesn't accept
white space in compound assignment operators.


-- 
			    Morris M. Keesan
			    {decvax,linus,ihnp4,wivax,wjh12,ima}!bbncca!keesan
			    keesan @ BBN-UNIX.ARPA

henry@utzoo.UUCP (Henry Spencer) (12/11/84)

> Should a compiler allow blanks between the 'op' and the '=' in an
> 'op=' operation? ...

K&R actually says explicitly that (e.g.) "+=" is two tokens; hence
space between them is allowable (section 7.14).  However, practically
no C compilers other than the original Ritchie compiler have done it
this way.  One reason for this is that it makes the language non-LALR(1),
so all the yacc-based parsers croak.  The current ANSI C draft (12 Nov)
says that "+=" is one token.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

donn@utah-gr.UUCP (Donn Seeley) (12/13/84)

From: Henry Spencer (henry@utzoo.UUCP):

	K&R actually says explicitly that (e.g.) "+=" is two tokens;
	hence space between them is allowable (section 7.14).  However,
	practically no C compilers other than the original Ritchie
	compiler have done it this way.  One reason for this is that it
	makes the language non-LALR(1), so all the yacc-based parsers
	croak. ...

It may be the case that YACC parsers can't correctly handle '+=' as two
tokens, but it's incorrect to say that they don't try to.  The PCC
(which has a YACC parser and is the basis of many specific C compilers)
considers '+=' to be two tokens and avoids problems most of the time by
using the rule 'shift when there is a shift-reduce conflict.' Last time
I checked on our 4.2 BSD PCC, there were 7 shift-reduce conflicts...
Precedence is also used to resolve conflicts.  (I should note that
'=+', the old style assignment operator, IS considered a single token
by the PCC.) Apart from the fact that the PCC allows space between the
arithmetic operator and the equals sign, you can also see the
'two-token' effect by comparing the following two statements:

	a && b += c;
	a && b  = c;

The first statement is parsed as 'a && (b += c)' because the PCC sees a
'+' as the token following the 'b', and a '+' is higher in precedence
than '&&', so it shifts; the 'two-token' effect prevents the compiler
from noticing that the precedence of an assignment operator is lower
than '&&'.  The second statement parses as '(a && b) = c' and earns an
error because 'a && b' is not an lvalue.

Donn Seeley    University of Utah CS Dept    donn@utah-cs.arpa
40 46' 6"N 111 50' 34"W    (801) 581-5668    decvax!utah-cs!donn

draves@harvard.ARPA (Richard Draves) (12/14/84)

> > Should a compiler allow blanks between the 'op' and the '=' in an
> > 'op=' operation? ...
> 
> K&R actually says explicitly that (e.g.) "+=" is two tokens; hence
> space between them is allowable (section 7.14).  However, practically
> no C compilers other than the original Ritchie compiler have done it
> this way.  One reason for this is that it makes the language non-LALR(1),
> so all the yacc-based parsers croak.  The current ANSI C draft (12 Nov)
> says that "+=" is one token.
> -- 
> 				Henry Spencer @ U of Toronto Zoology
> 				{allegra,ihnp4,linus,decvax}!utzoo!henry

Why not have the lexer recognize the double tokens?  For instance,
one rule for lex might be "+"[ \t\n]*"=", etc.

Rich