[comp.lang.c] lex grammar for C comments

michael@nyit.UUCP (Michael Gwilliam) (04/04/88)

NOTE: Sorry this reply took so long, but our phone line was out for a long
time.

-----

Well, the information is back and I've summarized the replies.  In case
you forgot, the question was, "Can C comments be filtered out with
LEX as regular expressions?"

The answer is, "Yes, but it may not be a good idea."

The reasons are...

o	The regular expression is nearly impossible to read.

o	A long comment could overflow lex's token buffer.


The correct way of doing this seems to be:


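You could use states, something like this (I might have the syntax
a bit wrong; COMMENT has to be declared with %START, and "." does not
match a newline, so the newline gets its own rule):
	"/*"		{ BEGIN COMMENT; }
	<COMMENT>"*/"	{ BEGIN 0; }
	<COMMENT>.	|
	<COMMENT>\n	;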
The problem is that this requires you to set up states for everything,
which is a pain.

Here's what I did -- built my own little automata inside the action
for the "/*" pattern.  This is stripped out of working code.

"/*"	        {
		    /* Comment. */
		    register enum { S_STAR, S_NORMAL, S_END } S;

		    for (S = S_NORMAL; S != S_END; )
			switch (input()) {
			    case '\0':
				/* Complain about premature EOF? */
				S = S_END;
				break;
			    case '*':
				S = S_STAR;
				break;
			    case '/':
				if (S == S_STAR) {
				    S = S_END;
				    break;
				}
				/* FALLTHROUGH */
			    default:
				S = S_NORMAL;
				break;
			}
		}
(credit goes to rsalz)
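
The same idea can be tried outside lex.  The following is only a minimal
sketch, not rsalz's code: strip_comments() and its state names are
hypothetical, and an extra S_SLASH state is added because no lex pattern
is there to recognize the opening of the comment.  It assumes the input
contains no string literals or character constants, which a real scanner
would have to handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state names for this sketch. */
enum state { S_NORMAL, S_SLASH, S_COMMENT, S_STAR };

/* Copies src to dst with C comments dropped and returns the number of
 * characters written.  dst must have room for strlen(src) + 1 bytes.
 * String literals and character constants are NOT handled. */
size_t strip_comments(const char *src, char *dst)
{
    enum state s = S_NORMAL;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        char c = *src;

        switch (s) {
        case S_NORMAL:
            if (c == '/')
                s = S_SLASH;        /* may be the start of a comment */
            else
                dst[n++] = c;
            break;
        case S_SLASH:
            if (c == '*') {
                s = S_COMMENT;      /* the pending slash opened a comment */
            } else if (c == '/') {
                dst[n++] = '/';     /* emit the old slash, keep this one pending */
            } else {
                dst[n++] = '/';
                dst[n++] = c;
                s = S_NORMAL;
            }
            break;
        case S_COMMENT:
            if (c == '*')
                s = S_STAR;
            break;
        case S_STAR:
            if (c == '/')
                s = S_NORMAL;       /* end of comment */
            else if (c != '*')
                s = S_COMMENT;      /* a run of stars keeps us in S_STAR */
            break;
        }
    }
    if (s == S_SLASH)
        dst[n++] = '/';             /* input ended on a lone slash */
    dst[n] = '\0';
    return n;
}
```

An unterminated comment is silently dropped here; a real scanner would
probably want to complain, as the comment in the lex action suggests.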

Another method uses states.


%START Normal Comment
%%
					{ BEGIN Normal; }
<Normal>"/*"				{ ECHO; BEGIN Comment; }
<Comment>"*/"				{ ECHO; printf("\n"); BEGIN Normal; }
<Comment>\				|
<Comment>[^ \t\n*]+			|
<Comment>"*"/[^/]			|
<Comment>.				|
<Comment>\n				{ ECHO; }
<Normal>.				|
<Normal>\n				{ }

(credit goes to Tony Hansen)

If you're dead set on doing this with a single regular expression, a good
reference seems to be...

_Introduction_to_Compiler_Construction_with_Unix_, by Axel T. Schreiner and
H. George Friedman, Jr., Prentice-Hall, 1985, on page 25 gives:

	"/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/".

The reason that the expression I used was accepting nested comments
is that lex always takes the longest possible match.
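
The longest-match effect can be seen with the POSIX regex library, which
also returns the longest match at the leftmost position.  This is only a
sketch under assumptions: the expression originally used is not shown in
this summary, so NAIVE below is a guess at that kind of pattern, CAREFUL
is a commonly used stricter one (not the one from the book), and
first_match_len() is a hypothetical helper.  Note that here "." also
matches a newline, which it does not in lex.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Assumed naive pattern: anything at all between the delimiters. */
static const char NAIVE[] = "/\\*.*\\*/";

/* Stricter pattern: inside the comment, a run of stars must be followed
 * by something other than a star or a slash. */
static const char CAREFUL[] = "/\\*([^*]|\\*+[^*/])*\\*+/";

/* Hypothetical helper: length of the first match of the POSIX extended
 * regular expression `pattern` in `text`, or -1 if nothing matches or
 * the pattern does not compile. */
long first_match_len(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    long len = -1;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0)
        len = (long)(m.rm_eo - m.rm_so);
    regfree(&re);
    return len;
}
```

On the text "a /* x */ b /* y */ c", NAIVE runs from the first opening
delimiter to the last closing one and swallows the "b" in between, while
CAREFUL stops at the end of the first comment.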

Nested comments are not a regular language, so they are hopeless without
writing a little C code.  I never really wanted to do them anyway; I guess
I just didn't make myself clear.  (Besides, I'm told they're not ANSI.)
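
For the record, the "little C code" for nested comments amounts to a
depth counter, which is exactly the thing a regular expression cannot
keep.  A minimal sketch, assuming the caller has already consumed the
opening delimiter; skip_nested_comment() is a hypothetical name, and in
a lex action the characters would come from input() rather than from a
string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: s points just past the opening delimiter of a
 * comment.  Returns the index just past the matching closing delimiter,
 * or -1 if the comment is never closed. */
long skip_nested_comment(const char *s)
{
    int depth = 1;                  /* the opening we were called for */
    size_t i = 0;

    assert(s != NULL);
    while (s[i] != '\0') {
        if (s[i] == '/' && s[i + 1] == '*') {
            depth++;                /* nested opening */
            i += 2;
        } else if (s[i] == '*' && s[i + 1] == '/') {
            depth--;                /* closing at the current level */
            i += 2;
            if (depth == 0)
                return (long)i;
        } else {
            i++;
        }
    }
    return -1;                      /* ran off the end of the input */
}
```

Reading s[i + 1] is safe because s[i] is known not to be the terminating
NUL at that point.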


Thanks for all the help from...

Erik Baalbergen <mcvax!cs.vu.nl!erikb@uunet>
Kjell Post <cmcl2!ida.liu.se!kpo>
MH Cox <rutgers!garage.nj.att.com!mhc@gatech>
R. Nigel Horspool <rutgers!uw-beaver!uvicctr!nigelh@gatech>
cmcl2!gondor!psuvax1!gondor!schmidt@uiucdcs (David E. Schmidt)
cmcl2!harvard!pineapple.bbn.com!rsalz
harvard!gsg!gsgpyr!lew@linus (Paul Lew)
harvard!ll-xn!ames!sdcsvax!sdcc6.UCSD.EDU!ix426@linus (Tom Stockfisch)
sbcs!mmintl!franka@pwa-b
sbcs!pegasus!hansen@cbosgd

and I hope to goodness I gave proper credit to everyone.

michael

djones@megatest.UUCP (Dave Jones) (04/07/88)

in article <262@nyit.UUCP>, michael@nyit.UUCP (Michael Gwilliam) says:
> 
> 
> 
> NOTE: Sorry this reply took so long, but our phone line was out for a long
> time.
> 
> -----
> 
> Well the information is back and I've summarized the replies.  In case
> you forgot the question it is,  "Can C comments be filtered out with
> LEX as regular expressions?"
> 
> The answer is, "Yes, but it may not be a good idea."
> 

   Well... I agree with the "not a good idea" part.

> ...
> Here's what I did -- built my own little automata inside the action
				           ^^^^^^^^ pl.
> for the "/*" pattern.  This is stripped out of working code.
> 
> ...


Considered as a puzzle solution, that's cheating.   Of course you
can do it in C!  My question is, "Is there an LR(k) grammar for C comments?"
If so, show. If not, prove not.  This would make a good assignment in an
automata theory class.
 


	Dave /* ** ****/ Jones

djones@megatest.UUCP (Dave Jones) (04/07/88)

I just read a little further down.  I forgot about the caret ^ in Lex.
That makes it too easy.