[comp.lang.c++] C++ Comments

sdm@cs.brown.edu (Scott Meyers) (05/22/89)

Consider the following C++ source line:

    //**********************

How should this be treated by the C++ compiler?  The GNU g++ compiler
treats this as a comment-to-EOL followed by a bunch of asterisks, but the
AT&T compiler treats it as a slash followed by an open-comment delimiter.
I want the former interpretation, and I can't find anything in Stroustrup's
book which indicates that any other interpretation is to be expected.

Actually, compiling with -E quickly shows that the culprit is the preprocessor,
so my questions are:
    1.  Is this a bug in the AT&T preprocessor?  If not, why not?  If so,
    	will it be fixed in 2.0, or are we stuck with it?
    2.  Is it a bug in the GNU preprocessor?  If so, why?

Scott Meyers
sdm@cs.brown.edu

chapman@eris.berkeley.edu (Brent Chapman) (05/23/89)

In article <6957@brunix.UUCP> sdm@cs.brown.edu (Scott Meyers) writes:
>Consider the following C++ source line:
>
>    //**********************
>
>How should this be treated by the C++ compiler?  The GNU g++ compiler
>treats this as a comment-to-EOL followed by a bunch of asterisks, but the
>AT&T compiler treats it as a slash followed by an open-comment delimiter.
>I want the former interpretation, and I can't find anything in Stroustrup's
>book which indicates that any other interpretation is to be expected.

I'm new to C++, but C has always used a "greedy" parsing algorithm (that is,
it always takes the longest possible next token); I don't know why C++ would
do otherwise.  From K&R, p. 179:

    If the input stream has been parsed into tokens up to a given character,
    the next token is taken to include the longest string of characters which
    could possibly constitute a token.

g++ is doing the right thing, and the AT&T compiler is wrong.
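
The longest-match rule quoted above can be illustrated with a toy lexer.
This is only a sketch under stated assumptions: tokenize and its tiny
operator table are hypothetical names made up for illustration, and it is
not how g++ or cfront actually lex.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical toy lexer: at each position, take the longest token that
// matches.  Only a handful of operators are known; anything else becomes
// a one-character token.
std::vector<std::string> tokenize(const std::string& src) {
    // Two-character candidates come first, so the first hit is the longest.
    static const std::string ops[] = {"//", "/*", "++", "+", "/", "*"};
    std::vector<std::string> toks;
    std::string::size_type i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::string::size_type j = i + 1;  // default: one unknown character
        if (std::isalnum(c) || c == '_') {
            // Identifier or number: extend as far as possible.
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) ||
                    src[j] == '_')) {
                ++j;
            }
        } else {
            for (const std::string& op : ops) {
                if (src.compare(i, op.size(), op) == 0) {
                    j = i + op.size();
                    break;
                }
            }
            if (src.compare(i, 2, "//") == 0) {
                // A // comment runs to the end of the line.
                j = src.find('\n', i);
                if (j == std::string::npos) j = src.size();
            } else if (src.compare(i, 2, "/*") == 0) {
                // A /* comment runs to the closing */ (or to end of input).
                j = src.find("*/", i + 2);
                j = (j == std::string::npos) ? src.size() : j + 2;
            }
        }
        toks.push_back(src.substr(i, j - i));
        i = j;
    }
    return toks;
}
```

Under these assumptions, tokenize("//**********************") yields a
single comment token, because "//" is matched before "/" at the first
position.  The same rule splits a+++++b into a, ++, ++, +, b.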


-Brent
--
Brent Chapman					Capital Market Technology, Inc.
Computer Operations Manager			1995 University Ave., Suite 390
{lll-tis,ucbvax!cogsci}!capmkt!brent		Berkeley, CA  94704
capmkt!brent@{lll-tis.arpa,cogsci.berkeley.edu} Phone: 415/540-6400

ark@alice.UUCP (Andrew Koenig) (05/23/89)

In article <6957@brunix.UUCP>, sdm@cs.brown.edu (Scott Meyers) writes:

> Consider the following C++ source line:

>     //**********************

> How should this be treated by the C++ compiler?

It's a // comment followed by a bunch of *'s.

However, many C preprocessors strip comments out of your
program and also don't recognize C++ comments.  Thus by the
time the C++ compiler sees this, it looks like this:

	/

AT&T does not supply a preprocessor with its C++ translator,
any more than it supplies a linker, assembler, or C compiler.
It's up to whoever ports C++ to deal with the preprocessor.
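
To make the effect concrete, here is a rough sketch of a comment stripper
that knows only about /* ... */ comments.  The name strip_c_comments is
hypothetical, and the sketch assumes that comments are deleted outright and
that an unterminated comment swallows the rest of the input; a real
preprocessor may differ on both points, and this one ignores string
literals entirely.

```cpp
#include <string>

// Hypothetical helper: deletes /* ... */ comments and has no notion of
// // comments, roughly as a C-only preprocessor might behave.
std::string strip_c_comments(const std::string& src) {
    std::string out;
    std::string::size_type i = 0;
    while (i < src.size()) {
        if (src[i] == '/' && i + 1 < src.size() && src[i + 1] == '*') {
            // Found "/*": skip ahead to the matching "*/".
            std::string::size_type end = src.find("*/", i + 2);
            if (end == std::string::npos) break;  // unterminated: drop the rest
            i = end + 2;
        } else {
            out += src[i++];
        }
    }
    return out;
}
```

Given the line "//*****" followed by more source text that contains no
"*/", this sketch returns just "/": the first slash is copied through, and
the second slash plus the first asterisk are taken as an open-comment
delimiter.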
-- 
				--Andrew Koenig
				  ark@europa.att.com

hansen@pegasus.ATT.COM (Tony L. Hansen) (05/23/89)

<>Consider the following C++ source line:
<>
<>    //**********************
<>
<>How should this be treated by the C++ compiler?  The GNU g++ compiler
<>treats this as a comment-to-EOL followed by a bunch of asterisks, but the
<>AT&T compiler treats it as a slash followed by an open-comment delimiter.
<>I want the former interpretation, and I can't find anything in Stroustrup's
<>book which indicates that any other interpretation is to be expected.
<
<I'm new to C++, but C has always used a "greedy" parsing algorithm (that is,
<it always takes the longest possible next token); I don't know why C++ would
<do otherwise.  From K&R, p. 179:
<
<    If the input stream has been parsed into tokens up to a given character,
<    the next token is taken to include the longest string of characters which
<    could possibly constitute a token.
<
<g++ is doing the right thing, and the AT&T compiler is wrong.

Actually, the problem is with the C preprocessor being used with the cfront
compiler, not with the AT&T compiler. The G++ preprocessor deals with //
comments. Whatever vendor supplied your port of cfront should have also used
a preprocessor which understands // comments.

					Tony Hansen
				att!pegasus!hansen, attmail!tony
				    hansen@pegasus.att.com

diamond@diamond.csl.sony.junet (Norman Diamond) (05/23/89)

In article <6957@brunix.UUCP> sdm@cs.brown.edu (Scott Meyers) writes:

>Consider:  //**********************

>The GNU g++ compiler
>treats this as a comment-to-EOL followed by a bunch of asterisks, but the
>AT&T compiler treats it as a slash followed by an open-comment delimiter.
>I want the former interpretation,

Greedy lexing suggests that you should get what you want.  Carelessness
(imprecision) in specifying grammars permits the AT&T interpretation,
but it really should not be allowed, especially if they still can't
parse a+++++b.

--
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.co.jp@relay.cs.net)
  The above opinions are my own.   |  Why are programmers criticized for
  If they're also your opinions,   |  re-implementing the wheel, when car
  you're infringing my copyright.  |  manufacturers are praised for it?

skinner@saturn.ucsc.edu (Robert Skinner) (05/23/89)

In article <2900@pegasus.ATT.COM>, hansen@pegasus.ATT.COM (Tony L. Hansen) writes:
> <>Consider the following C++ source line:
> <>
> <>    //**********************
> <>
> <>How should this be treated by the C++ compiler?  The GNU g++ compiler
> <
> <    If the input stream has been parsed into tokens up to a given character,
> <    the next token is taken to include the longest string of characters which
> <    could possibly constitute a token.
> <
> <g++ is doing the right thing, and the AT&T compiler is wrong.
> 
> Actually, the problem is with the C preprocessor being used with the cfront
> compiler, not with the AT&T compiler. The G++ preprocessor deals with //
> comments. Whatever vendor supplied your port of cfront should have also used
> a preprocessor which understands // comments.

The AT&T compiler uses /lib/cpp, which knows nothing about the //
comment.  This is great for portability, since every C compiler has a
preprocessor.

Unfortunately, it is a royal PAIN when using a package like
curses that uses lots of #define macros.  You can't put the name of
the macro in a // comment, e.g.
	// this routine uses move
or
	// move the object down

without getting an argument mismatch error from cpp.

Such is the price we pay for portability and 
building on previous tools.

Robert 
skinner@saturn.ucsc.edu

shap@polya.Stanford.EDU (Jonathan S. Shapiro) (05/23/89)

In article <24700@agate.BERKELEY.EDU> chapman@eris.berkeley.edu (Brent Chapman) writes:
>In article <6957@brunix.UUCP> sdm@cs.brown.edu (Scott Meyers) writes:
>>Consider the following C++ source line:
>>
>>    //**********************
>>
>>How should this be treated by the C++ compiler?

According to the C++ language definition, this is a single comment
extending to the end of the line.  Remember, however, that a
translator-based implementation applies the C preprocessor first,
which sees the /*... and eliminates it before the compiler gets a shot
at the input.

The moral of the story is: don't do this.  Whether it's right or wrong, it
isn't portable.

Jon

hansen@pegasus.ATT.COM (Tony L. Hansen) (05/23/89)

< the AT&T compiler uses /lib/cpp, which knows nothing about the // comment.

No it doesn't! The AT&T compiler, as sold, is a shell script (CC), a
compilation pass (cfront), some post-compilation pass programs (patch and
munch), and some libraries. It is up to the vendor who buys AT&T's compiler
to add in the preprocessor, C compiler and linker, and to modify CC
accordingly to find those pieces. It sounds like your vendor chose to use
/lib/cpp.

					Tony Hansen
				att!pegasus!hansen, attmail!tony
				    hansen@pegasus.att.com

nichols@cbnewsc.ATT.COM (robert.k.nichols) (05/24/89)

In article <9383@alice.UUCP> ark@alice.UUCP (Andrew Koenig) writes:
|In article <6957@brunix.UUCP>, sdm@cs.brown.edu (Scott Meyers) writes:
|> Consider the following C++ source line:
|>     //**********************
|> How should this be treated by the C++ compiler?
|
|It's a // comment followed by a bunch of *'s.
|
|However, many C preprocessors strip comments out of your
|program and also don't recognize C++ comments.  Thus by the
|time the C++ compiler sees this, it looks like this:
|
|	/

If cpp is invoked with the "-C" option it will leave comments as is,
which should solve problems like the above.  This won't solve problems
with // comments in macro definitions, though.
-- 
.sig included at no extra charge.          |  Disclaimer: My mind is my own.
Cute quote: `` ''                          |
>> Bob Nichols   nichols@iexist.att.com << |

easterb@ucscb.UCSC.EDU (William K. Karwin) (05/24/89)

In article <6957@brunix.UUCP> sdm@cs.brown.edu (Scott Meyers) writes:
>Consider the following C++ source line:
>
>    //**********************
>
>How should this be treated by the C++ compiler?  The GNU g++ compiler
>treats this as a comment-to-EOL followed by a bunch of asterisks, but the
>AT&T compiler treats it as a slash followed by an open-comment delimiter.

Some students ran into this problem, and the "macros-expanded-even-
though-they're-in-comments" problem this school term, in a class
using C++.  We think one way to solve it is to have in a makefile:

.c.o:
	@sed s/\\/\\/.\*// $< > $*.C
	CC $(CFLAGS) -c $*.C
	@/bin/rm -f $*.C

The sed command strips // comments and all characters following on a
line.  We are using the .c suffix for our C++ code files.

William Karwin, ...ucbvax!ucscc!ucscb!easterb

ark@alice.UUCP (Andrew Koenig) (05/24/89)

In article <7636@saturn.ucsc.edu>, easterb@ucscb.UCSC.EDU (William K. Karwin) writes:

> The sed command strips // comments and all characters following on a
> line.  We are using the .c suffix for our C++ code files.

What will it do with this?

	a = b /* *// c;
-- 
				--Andrew Koenig
				  ark@europa.att.com

gsf@ulysses.homer.nj.att.com (Glenn Fowler[drew]) (05/24/89)

In article <958@cbnewsc.ATT.COM>, nichols@cbnewsc.ATT.COM (robert.k.nichols) writes:
> > > Consider the following C++ source line:
> > >     //**********************
> If cpp is invoked with the "-C" option it will leave comments as is,
> which should solve problems like the above.  This won't solve problems
> with // comments in macro definitions, though.

Even with -C, the lines following a //*** may be treated as a single comment:

	#define X Y
	//*****
	X	/* X is not expanded */

To get this right, the // must be recognized by each component of the C++
compilation system.
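
A small sketch of why X is not expanded in the example above.  The helper
inside_c_comment is a hypothetical name; it models a scanner that
recognizes only /* ... */ comments and, as a simplification, ignores
string literals.

```cpp
#include <string>

// Hypothetical helper: true if offset pos lies inside a /* ... */ comment
// as seen by a scanner that does not know about // comments.
bool inside_c_comment(const std::string& src, std::string::size_type pos) {
    std::string::size_type i = 0;
    while (i < src.size()) {
        std::string::size_type open = src.find("/*", i);
        if (open == std::string::npos || open > pos) return false;
        std::string::size_type close = src.find("*/", open + 2);
        if (close == std::string::npos) return true;  // unterminated comment
        if (pos < close + 2) return true;             // pos is within the span
        i = close + 2;                                // look for the next comment
    }
    return false;
}
```

For the three-line example, the comment seen by such a scanner opens at
the second slash of the //***** line and closes only at the */ on the
following line, so the X at the start of that line falls inside it.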
-- 
Glenn Fowler    (201)-582-2195    AT&T Bell Laboratories, Murray Hill, NJ
uucp: {att,decvax,ucbvax}!ulysses!gsf       internet: gsf@ulysses.att.com

jima@hplsla.HP.COM (Jim Adcock) (05/25/89)

> AT&T does not supply a preprocessor with its C++ translator,
> any more than it supplies a linker, assembler, or C compiler.
> It's up to whoever ports C++ to deal with the preprocessor.
> -- 
> 				--Andrew Koenig

So are preprocessor commands to be considered part of C++
*the language*, or not?

cline@sunshine.ece.clarkson.edu (Marshall Cline) (05/25/89)

In article <7636@saturn.ucsc.edu> easterb@ucscb.UCSC.EDU (William K. Karwin) writes:
>Summary: one of many possible fixes
>Some students ran into this problem, and the "macros-expanded-even-
>though-they're-in-comments" problem this school term, in a class
>using C++.  We think one way to solve it is to have in a makefile:
>.c.o:
>	@sed s/\\/\\/.\*// $< > $*.C
>	CC $(CFLAGS) -c $*.C
>	@/bin/rm -f $*.C
>The sed command strips // comments and all characters following on a
>line.  We are using the .c suffix for our C++ code files.
>William Karwin, ...ucbvax!ucscc!ucscb!easterb

As you said, this is _one_ of _many_ fixes.  But it should be pointed
out that "sed" is ignorant of the appropriate language constructs.
Thus a printf which is supposed to print the string constant
	"double slash (//) starts a C++ comment"
would be bashed into
	"double slash (
which would undoubtedly cause numerous syntax errors.
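
For comparison, here is a sketch of a stripper that tracks whether it is
inside a string literal, a character literal, or a /* ... */ comment
before it treats // as a comment start.  The name strip_line_comments is
hypothetical, and the sketch makes no attempt at backslash-newline
continuations or macro expansion, so it is an illustration rather than a
replacement for a preprocessor that understands C++.

```cpp
#include <string>

// Hypothetical helper: removes // comments while leaving string literals,
// character literals, and /* ... */ comments untouched.
std::string strip_line_comments(const std::string& src) {
    enum State { Code, Str, Chr, Block, Line };
    State st = Code;
    std::string out;
    for (std::string::size_type i = 0; i < src.size(); ++i) {
        char c = src[i];
        char next = (i + 1 < src.size()) ? src[i + 1] : '\0';
        switch (st) {
        case Code:
            if (c == '/' && next == '/') {
                st = Line;              // drop the // and what follows
                ++i;
            } else if (c == '/' && next == '*') {
                st = Block;             // keep /* ... */ comments as they are
                out += "/*";
                ++i;
            } else {
                if (c == '"') st = Str;
                else if (c == '\'') st = Chr;
                out += c;
            }
            break;
        case Str:
        case Chr:
            out += c;
            if (c == '\\' && i + 1 < src.size()) {
                out += next;            // copy the escaped character verbatim
                ++i;
            } else if ((st == Str && c == '"') || (st == Chr && c == '\'')) {
                st = Code;
            }
            break;
        case Block:
            out += c;
            if (c == '*' && next == '/') {
                out += '/';
                ++i;
                st = Code;
            }
            break;
        case Line:
            if (c == '\n') {            // the newline itself is kept
                out += c;
                st = Code;
            }
            break;
        }
    }
    return out;
}
```

With this sketch the string "double slash (//) starts a C++ comment"
passes through unchanged, and so does a = b /* *// c; because the */
closes the block comment and the slash after it is an ordinary division
operator.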

Any regular expression parser (like "sed") is limited to regular languages.
Even a pushdown automaton (recognizing _context_free_languages_) is
insufficient.  The only correct "fix" is then a context *sensitive* language
recognizer, which is nearly as complex as a Turing Machine.  In other words,
somebody's gonna have to buckle down and write a "c++pp" (like GNU apparently
has done).

Marshall

--
	________________________________________________________________
	Marshall P. Cline	ARPA:	cline@sun.soe.clarkson.edu
	ECE Department		UseNet:	uunet!sun.soe.clarkson.edu!cline
	Clarkson University	BitNet:	BH0W@CLUTX
	Potsdam, NY  13676	AT&T:	(315) 268-6591

jmm@eci386.uucp (John Macdonald) (05/26/89)

In article <7636@saturn.ucsc.edu> easterb@ucscb.UCSC.EDU (William K. Karwin) writes:
|In article <6957@brunix.UUCP> sdm@cs.brown.edu (Scott Meyers) writes:
|>Consider the following C++ source line:
|>
|>    //**********************
|>
|
|...         We think one way to solve it is to have in a makefile:
|
|.c.o:
|	@sed s/\\/\\/.\*// $< > $*.C
|	CC $(CFLAGS) -c $*.C
|	@/bin/rm -f $*.C

This will cause rare, and therefore surprising, problems whenever a program
has a string containing //.  Examples: programs that generate ed scripts (or
anything else that uses pattern matches: sed, perl, ...); programs that
generate JCL (did I really admit that I thought of that example?); and checks
for doubled slashes in pathnames built by concatenating a bunch of strings.