820785gm@aucs.UUCP (09/21/87)
Well, I wasn't sure where to post this, but I'm sure someone on this group
will know the answer. My question concerns the regular grammar of lex. For
recognizing comments (i.e. (*...*) ) I have a horribly long statement that
works. However, I have a much nicer one, and I don't understand why it
doesn't work. (Nobody here seems able to tell me why either.) Here is the
short lex program:

%%
"(*"([^*]|("*"/[^)]))*"*)" printf("&%s&\n", yytext);

but it doesn't work. Would somebody mind explaining why? The part that
really confuses me is shown in this run:

(*andrew*) macleod (*was*) here
&(*andrew*&
) macleod &(*was*&
) here
(**)kk
&(**&               It recognizes the grammar here, but it doesn't
)kk                 accept the last ')'.
(****)              Anyway, can anybody help me? I'm sure it must be simple.
&(****&
)
(***) *)            Oh yeah, it's under 4.3BSD.
&(***) *&
)                   Thanks a lot, ANDROID.
rsalz@bbn.com (Richard Salz) (09/22/87)
In comp.lang.c (<443@aucs.UUCP>), 820785gm@aucs.UUCP (ANDREW MACLEOD) writes:
] how do I write a lex regexp that recognizes comments (* .... *) ?

You're probably better off doing what I did. This is from "real-life" code:

/* State of our comment automata. */
typedef enum { S_STAR, S_NORMAL, S_END } STATE;

"/*" {
	/* Comment. */
	register STATE S;

	for (S = S_NORMAL; S != S_END; )
	    switch (input()) {
	    case '\0':
		S = S_END;
		break;
	    case '*':
		S = S_STAR;
		break;
	    case '/':
		if (S == S_STAR) {
		    S = S_END;
		    break;
		}
		/* FALLTHROUGH */
	    default:
		S = S_NORMAL;
		break;
	    }
	}
--
For comp.sources.unix stuff, mail to sources@uunet.uu.net.
vern%lbl-helios@LBL-RTSG.ARPA (Vern Paxson) (09/22/87)
Regarding the lex pattern:

"(*"([^*]|("*"/[^)]))*"*)" printf("&%s&\n", yytext);
              ^
              |
Lex does not handle trailing context properly inside of ()'s.
Essentially you can only have one instance of trailing context in a
pattern, and it splits the pattern into two halves, the
stuff-to-be-matched and the stuff-that-must-follow. This limitation is
well rooted in the overall DFA approach lex uses for pattern matching,
so don't expect it to go away. The limitations mentioned by Esmond
Pitt in comp.unix.wizards (that ^ and $ are not matched inside ()'s)
have a similar origin.
Vern
Vern Paxson vern@lbl-csam.arpa
Real Time Systems ucbvax!lbl-csam.arpa!vern
Lawrence Berkeley Laboratory (415) 486-6411
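
The split into stuff-to-be-matched and stuff-that-must-follow can be illustrated with a small hand-written C sketch. It is not lex output and not from the thread: it assumes a hypothetical rule of the form [0-9]+/".." (an integer that must be followed by a range operator), and the function name match_int_before_range is made up for the example. The point is only that the trailing part is checked but never becomes part of the token.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper emulating a rule of the form  [0-9]+/".."
 * Returns the length of the digit run (the token) when that run is
 * immediately followed by "..", otherwise 0. The ".." is only checked,
 * not consumed, so a caller would rescan it as the next token.
 */
size_t match_int_before_range(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (isdigit((unsigned char)s[n]))
        n++;                    /* the stuff-to-be-matched */
    if (n == 0)
        return 0;               /* no digits, so no token */
    if (strncmp(s + n, "..", 2) != 0)
        return 0;               /* the stuff-that-must-follow is missing */
    return n;                   /* token is the head only */
}
```

Called on "10..20" this returns 2, leaving "..20" to be scanned again. There is exactly one boundary between head and tail, which is why a `/' buried inside a parenthesized, repeated subexpression has no sensible meaning in this scheme.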
wsmith@uiucdcsb.cs.uiuc.edu (09/23/87)
This works well:

nostarp ([^*]|\*[^)])
%%
\(\*{nostarp}*\*\)	return(PCOMMENT);	/* normal comment */
\(\*{nostarp}*		return(BADCOMMENT);	/* unterminated comment */

Bill Smith
wsmith@a.cs.uiuc.edu
ihnp4!uiucdcs!wsmith
chris@mimsy.UUCP (Chris Torek) (09/25/87)
In article <443@aucs.UUCP> 820785gm@aucs.UUCP (ANDREW MACLEOD) writes:
>... For recognizing comments (ie (*...*) ) I have this horrible long
>statement that works. However, I have a much nicer one that I don't
>understand why it doesn't work. ... Here is the short lex program:
>
>"(*"([^*]|("*"/[^)]))*"*)" printf("&%s&\n", yytext);
>(*andrew*) macleod (*was*) here
>&(*andrew*&
>) macleod &(*was*&
>) here

Your `/' (right context) causes lex to back up one character after
matching a string that contains one `*)'. In general, right context
does not work; it should be used sparingly, if at all. The expression

\(\*(\*[^)]|[^*])*\*+\)

should work. You should be aware that this stores the entire comment
in the array `yytext'---whether it fits or not. Comments can easily
overrun lex's token buffer.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
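
For anyone who wants to check the sample inputs from the original post without going through lex at all, here is a minimal C sketch of a hand-coded scanner. It borrows the three-state loop posted earlier in the thread, adapted to (* ... *) delimiters and to reading from a string rather than input(). The names comment_length and enum state are made up for the example. It only counts characters and copies nothing, so the length of the comment is not tied to any token buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Scanner states, mirroring the three-state loop from the thread. */
enum state { S_NORMAL, S_STAR, S_END };

/*
 * Hypothetical helper: if s starts with "(*", return the length of the
 * comment up to and including the first closing "*)". Return 0 if s
 * does not start a comment or if the comment is unterminated.
 */
size_t comment_length(const char *s)
{
    enum state st = S_NORMAL;
    size_t i;

    assert(s != NULL);
    if (s[0] != '(' || s[1] != '*')
        return 0;               /* not a comment opener */

    for (i = 2; st != S_END; i++) {
        switch (s[i]) {
        case '\0':
            return 0;           /* ran off the end: unterminated */
        case '*':
            st = S_STAR;        /* a ')' next would close the comment */
            break;
        case ')':
            if (st == S_STAR) {
                st = S_END;
                break;
            }
            /* FALLTHROUGH */
        default:
            st = S_NORMAL;
            break;
        }
    }
    return i;                   /* index just past the closing "*)" */
}
```

On the inputs from the original run, this gives 10 for "(*andrew*) macleod", 4 for "(**)kk", 6 for "(****)" and 5 for "(***) *)". The scanner always stops at the first `*)', which a longest-match regular expression will not necessarily do.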