michael@nyit.UUCP (Michael Gwilliam) (04/04/88)
NOTE: Sorry this reply took so long, but our phone line was out for a long
time.

-----

Well the information is back and I've summarized the replies.  In case
you forgot the question it is, "Can C comments be filtered out with
LEX as regular expressions?"

The answer is, "Yes, but it may not be a good idea."

The reasons are...

    o It's nearly impossible to read.
    o An extended comment could overflow the buffer.

The correct way of doing this seems to be:

    You could use states, something like this (I might have the syntax a bit
    wrong):

        "/*"            { BEGIN COMMENT; }
        <COMMENT>.      ;
        <COMMENT>"*/"   { BEGIN 0; }

    The problem is that this requires you to set up states for everything,
    which is a pain.

    Here's what I did -- built my own little automata inside the action
    for the "/*" pattern.  This is stripped out of working code.

        "/*"    {
                    /* Comment. */
                    register enum { S_STAR, S_NORMAL, S_END } S;

                    for (S = S_NORMAL; S != S_END; )
                        switch (input()) {
                        case '\0':
                            /* Complain about premature EOF? */
                            S = S_END;
                            break;
                        case '*':
                            S = S_STAR;
                            break;
                        case '/':
                            if (S == S_STAR) {
                                S = S_END;
                                break;
                            }
                            /* FALLTHROUGH */
                        default:
                            S = S_NORMAL;
                            break;
                        }
                }

    (credit goes to rsalz)

Another method uses states.

        %START Normal Comment
        %%
                        { BEGIN Normal; }
        <Normal>"/*"    { ECHO; BEGIN Comment; }
        <Comment>"*/"   { ECHO; printf("\n"); BEGIN Normal; }
        <Comment>\              |
        <Comment>[^ \t\n*]+     |
        <Comment>"*"/[^/]       |
        <Comment>.              |
        <Comment>\n     { ECHO; }
        <Normal>.       |
        <Normal>\n      { }

    (credit goes to Tony Hansen)

If you're hard set on doing this, a good reference seems to be...

    _Introduction_to_Compiler_Construction_with_Unix_, by Axel T. Schreiner
    and H. George Friedman, Jr., Prentice-Hall, 1985, on page 25 gives:

        "/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"

The reason that the expression I used was accepting nested comments is
that lex always tries for the longest match.  Nested comments are not a
regular language, so they are hopeless without writing a little C code.
I never really wanted to do them anyway, I guess I just didn't make myself
clear.
(Besides, I'm told they're not ANSI.)

Thanks for all the help from...

    Erik Baalbergen <mcvax!cs.vu.nl!erikb@uunet>
    Kjell Post <cmcl2!ida.liu.se!kpo>
    MH Cox <rutgers!garage.nj.att.com!mhc@gatech>
    R. Nigel Horspool <rutgers!uw-beaver!uvicctr!nigelh@gatech>
    cmcl2!gondor!psuvax1!gondor!schmidt@uiucdcs (David E. Schmidt)
    cmcl2!harvard!pineapple.bbn.com!rsalz
    harvard!gsg!gsgpyr!lew@linus (Paul Lew)
    harvard!ll-xn!ames!sdcsvax!sdcc6.UCSD.EDU!ix426@linus (Tom Stockfisch)
    sbcs!mmintl!franka@pwa-b
    sbcs!pegasus!hansen@cbosgd

and I hope to goodness I gave proper credit to everyone.

michael
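
For anyone who wants to try the hand-built automaton outside of lex, here
is a minimal sketch in plain C.  It follows the same three states as the
action credited to rsalz, but reads from a string instead of calling
input().  The names skip_comment and strip_comments are made up for this
illustration and do not come from any of the replies.  It also assumes that
string and character literals need no special treatment, which a real C
scanner could not assume.

```c
#include <assert.h>
#include <stddef.h>

/* Same three states as the lex action: last char was a star, was
 * anything else, or the comment has just been closed. */
enum state { S_NORMAL, S_STAR, S_END };

/* Hypothetical helper: `p` points just past the opening slash-star.
 * Returns a pointer just past the closing star-slash, or a pointer to
 * the terminating NUL if the comment is never closed. */
static const char *skip_comment(const char *p)
{
    enum state s = S_NORMAL;

    while (s != S_END) {
        switch (*p) {
        case '\0':              /* premature end of input */
            return p;
        case '*':
            s = S_STAR;
            break;
        case '/':
            s = (s == S_STAR) ? S_END : S_NORMAL;
            break;
        default:
            s = S_NORMAL;
            break;
        }
        p++;
    }
    return p;
}

/* Hypothetical driver: copy `in` to `out` with comments removed.
 * `out` must have room for at least as many bytes as `in`, terminator
 * included.  Returns the number of bytes written, not counting the NUL.
 * Assumption: quotes are ordinary characters here. */
size_t strip_comments(const char *in, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (in[0] == '/' && in[1] == '*')
            in = skip_comment(in + 2);
        else
            out[n++] = *in++;
    }
    out[n] = '\0';
    return n;
}
```

Because the first star-slash always closes the comment, nested comments
are not recognized: the inner opener is just more comment text.  Handling
nesting would take a depth counter on top of these states.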
djones@megatest.UUCP (Dave Jones) (04/07/88)
in article <262@nyit.UUCP>, michael@nyit.UUCP (Michael Gwilliam) says:
>
> NOTE: Sorry this reply took so long, but our phone line was out for a long
> time.
>
> -----
>
> Well the information is back and I've summarized the replies.  In case
> you forgot the question it is, "Can C comments be filtered out with
> LEX as regular expressions?"
>
> The answer is, "Yes, but it may not be a good idea."
>

Well... I agree with the "not a good idea" part.

> ...
> Here's what I did -- built my own little automata inside the action
                                           ^^^^^^^^ pl.
> for the "/*" pattern.  This is stripped out of working code.
>
> ...

Considered as a puzzle solution, that's cheating.  Of course you can
do it in C!

My question is, "Is there an LR(k) grammar for C comments?"  If so, show.
If not, prove not.

This would make a good assignment in an automata theory class.


        Dave /* ** ****/ Jones
djones@megatest.UUCP (Dave Jones) (04/07/88)
I just read a little further down.  I forgot about the caret ^ in Lex.
That makes it too easy.