michael@nyit.UUCP (Michael Gwilliam) (04/04/88)
NOTE: Sorry this reply took so long, but our phone line was out for a long
time.
-----
Well the information is back and I've summarized the replies. In case
you forgot the question it is, "Can C comments be filtered out with
LEX as regular expressions?"
The answer is, "Yes, but it may not be a good idea."
The reasons are...
o The regular expression is nearly impossible to read.
o A long comment could overflow lex's token buffer, since the whole
  comment is matched as a single token.
The correct way of doing this seems to be:
You could use start states, something like this (I might have the syntax
a bit wrong):
%START COMMENT
%%
"/*"            { BEGIN COMMENT; }
<COMMENT>"*/"   { BEGIN 0; }
<COMMENT>.      |
<COMMENT>\n     ;
The problem is that this requires you to set up states for everything,
which is a pain.
Here's what I did -- built my own little automata inside the action
for the "/*" pattern. This is stripped out of working code.
"/*"    {
            /* Comment. */
            register enum { S_STAR, S_NORMAL, S_END } S;
            for (S = S_NORMAL; S != S_END; )
                switch (input()) {
                case '\0':
                    /* Complain about premature EOF? */
                    S = S_END;
                    break;
                case '*':
                    S = S_STAR;
                    break;
                case '/':
                    if (S == S_STAR) {
                        S = S_END;
                        break;
                    }
                    /* FALLTHROUGH */
                default:
                    S = S_NORMAL;
                    break;
                }
        }
(credit goes to rsalz)
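For anyone who wants to try the idea outside of lex, here is a minimal
sketch of the same two-state automaton as a plain C function. The name
strip_comments and its interface are made up for illustration and are not
from rsalz's code; it also assumes the input has no string literals that
contain comment delimiters.

```c
#include <stddef.h>

enum state { S_NORMAL, S_STAR };

/* Hypothetical helper: copy src to dst with every C comment removed.
 * Returns the number of bytes written, not counting the final NUL.
 * dst must have room for strlen(src) + 1 bytes. */
size_t strip_comments(const char *src, char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        if (src[0] == '/' && src[1] == '*') {
            enum state s = S_NORMAL;

            src += 2;                       /* skip the opener */
            /* Only one bit of memory is needed: was the previous
             * character a star? */
            while (*src != '\0') {
                char c = *src++;

                if (c == '/' && s == S_STAR)
                    break;                  /* closer found */
                s = (c == '*') ? S_STAR : S_NORMAL;
            }
        } else {
            dst[n++] = *src++;              /* ordinary text, keep it */
        }
    }
    dst[n] = '\0';
    return n;
}
```

An unterminated comment simply runs to the end of the input, which is the
same choice the lex action makes on premature EOF.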
Another method uses states.
%START Normal Comment
%%
        { BEGIN Normal; }
<Normal>"/*" { ECHO; BEGIN Comment; }
<Comment>"*/" { ECHO; printf("\n"); BEGIN Normal; }
<Comment>\ |
<Comment>[^ \t\n*]+ |
<Comment>"*"/[^/] |
<Comment>. |
<Comment>\n { ECHO; }
<Normal>. |
<Normal>\n { }
(credit goes to Tony Hansen)
If you're hard set on doing this, a good reference seems to be...
_Introduction_to_Compiler_Construction_with_Unix_, by Axel T. Schreiner and
H. George Friedman, Jr., Prentice-Hall, 1985, on page 25 gives:
"/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/".
The reason that the expression I used was accepting nested comments
is that lex tries to match the longest possible string.
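To make the longest-match effect concrete, here is a small C sketch. It
assumes the offending expression allowed anything at all between the
delimiters (the original expression isn't shown here, so that is a guess).
Both function names are invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Length of the comment at s when it ends at the FIRST closer,
 * which is what C requires. Returns 0 if there is no comment at s. */
size_t shortest_comment(const char *s)
{
    const char *end;

    if (strncmp(s, "/*", 2) != 0)
        return 0;
    end = strstr(s + 2, "*/");
    return end ? (size_t)(end - s) + 2 : 0;
}

/* Length of the match when it runs to the LAST closer, which is what
 * a longest-match rule picks if anything may appear in between. */
size_t longest_comment(const char *s)
{
    const char *p, *last = NULL;

    if (strncmp(s, "/*", 2) != 0)
        return 0;
    for (p = strstr(s + 2, "*/"); p != NULL; p = strstr(p + 1, "*/"))
        last = p;
    return last ? (size_t)(last - s) + 2 : 0;
}
```

On a line with two comments and code between them, shortest_comment stops
at the end of the first comment, while longest_comment swallows the code
and the second comment as well.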
Nested comments are not a regular language, so they are hopeless without
writing a little C code. I never really wanted to do them anyway; I guess
I just didn't make myself clear. (Besides, I'm told they're not ANSI.)
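For the curious, the "little C code" for nesting comes down to a depth
counter, which is exactly the unbounded memory a finite automaton lacks.
A rough sketch follows; nested_comment_length is a made-up name, and
nothing here is from any of the replies.

```c
#include <stddef.h>

/* Hypothetical helper: s must point at a comment opener. Returns the
 * length of the comment including any comments nested inside it, or 0
 * if s does not start with an opener or the nesting never closes. */
size_t nested_comment_length(const char *s)
{
    size_t i = 0;
    int depth = 0;

    if (s[0] != '/' || s[1] != '*')
        return 0;
    while (s[i] != '\0') {
        if (s[i] == '/' && s[i + 1] == '*') {
            depth++;                /* one level deeper */
            i += 2;
        } else if (s[i] == '*' && s[i + 1] == '/') {
            depth--;                /* one level back out */
            i += 2;
            if (depth == 0)
                return i;           /* outermost comment closed */
        } else {
            i++;
        }
    }
    return 0;                       /* ran out of input while nested */
}
```

In a lex action the same loop would read characters with input() instead
of indexing a string.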
Thanks for all the help from...
Erik Baalbergen <mcvax!cs.vu.nl!erikb@uunet>
Kjell Post <cmcl2!ida.liu.se!kpo>
MH Cox <rutgers!garage.nj.att.com!mhc@gatech>
R. Nigel Horspool <rutgers!uw-beaver!uvicctr!nigelh@gatech>
cmcl2!gondor!psuvax1!gondor!schmidt@uiucdcs (David E. Schmidt)
cmcl2!harvard!pineapple.bbn.com!rsalz
harvard!gsg!gsgpyr!lew@linus (Paul Lew)
harvard!ll-xn!ames!sdcsvax!sdcc6.UCSD.EDU!ix426@linus (Tom Stockfisch)
sbcs!mmintl!franka@pwa-b
sbcs!pegasus!hansen@cbosgd
and I hope to goodness I gave proper credit to everyone.
michael
djones@megatest.UUCP (Dave Jones) (04/07/88)
in article <262@nyit.UUCP>, michael@nyit.UUCP (Michael Gwilliam) says:
> NOTE: Sorry this reply took so long, but our phone line was out for a long
> time.
>
> -----
>
> Well the information is back and I've summarized the replies. In case
> you forgot the question it is, "Can C comments be filtered out with
> LEX as regular expressions?"
>
> The answer is, "Yes, but it may not be a good idea."
>

Well... I agree with the "not a good idea" part.

> ...
> Here's what I did -- built my own little automata inside the action
                                           ^^^^^^^^ pl.
> for the "/*" pattern. This is stripped out of working code.
>
> ...

Considered as a puzzle solution, that's cheating. Of course you
can do it in C!

My question is, "Is there an LR(k) grammar for C comments?" If so, show.
If not, prove not.

This would make a good assignment in an automata theory class.

        Dave /* ** ****/ Jones
djones@megatest.UUCP (Dave Jones) (04/07/88)
I just read a little further down. I forgot about the caret ^ in Lex. That makes it too easy.