schmidt@zola.ics.uci.edu (Doug Schmidt) (07/10/89)
In their book ``Introduction to Compiler Construction with UNIX,''
Schreiner and Friedman provide the following LEX regular expression
for recognizing C comments:

----------------------------------------
"/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
----------------------------------------

Ignoring the possibility of overflowing an internal LEX buffer,
does anyone know of any legal C comments that fail to match
with this regular expression?

thanks,

Doug
--
Master Swordsman speak of humility; | schmidt@ics.uci.edu (ARPA)
Philosophers speak of truth; | office: (714) 856-4034
Saints and wisemen speak of the Tao of no doubt;
The moon, sun, and sea speaks for itself.
-- Hiroshi Hamada
merlyn@iwarp.intel.com (Randal Schwartz) (07/11/89)
In article <19365@paris.ics.uci.edu>, schmidt@zola (Doug Schmidt) writes:
| In their book ``Introduction to Compiler Construction with UNIX,''
| Schreiner and Friedman provide the following LEX regular expression
| for recognizing C comments:
|
| ----------------------------------------
| "/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
| ----------------------------------------
|
| Ignoring the possibility of overflowing an internal LEX buffer,
| does anyone know of any legal C comments that fail to match
| with this regular expression?

Try:

/***/

(This one was easy. :-)

The problem with this one is that "star" "not-slash" can match
"star" "star", where the second star is sometimes the first star of
the terminating "star" "slash". This is a common problem with these
regexes.

Now come on people, is this one of those "20 most commonly asked
questions in comp.lang.c"? And who has the right answer? (Not me....
I screwed up last time... :-)

Just another C hacker,
--
/== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\
| on contract to Intel, Hillsboro, Oregon, USA |
| merlyn@iwarp.intel.com ...!uunet!iwarp.intel.com!merlyn |
\== Cute Quote: "Welcome to Oregon... Home of the California Raisins!" ==/
kearns@read.columbia.edu (Steve Kearns) (07/11/89)
A previous article claimed that

>| ----------------------------------------
>| "/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
>| ----------------------------------------

does not match

> /***/
>(This one was easy. :-)

I guess it was not so easy. "/***/" does indeed match the above
regular expression:

"/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
       0                       0   1

Under each closure star I have listed how many times it iterates to
match "/***/".

-steve
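The iteration counts above can be checked mechanically. What follows is a minimal sketch, assuming the lex pattern is transliterated by hand into Python `re` syntax (quoted lex strings become escaped literals). The name `SF` and the two capture groups are additions for illustration only and are not part of the original pattern.

```python
import re

# Schreiner and Friedman's lex pattern, hand-transliterated to Python `re`
# syntax (an assumption of this sketch). Two capture groups are added so the
# text consumed by the big alternation and by the trailing "*"* is visible.
SF = re.compile(r'/\*/*((?:[^*/]|[^*]/|\*[^/])*)(\**)\*/')

m = SF.fullmatch('/***/')
print(m is not None)       # whether all of /***/ is matched
print(repr(m.group(1)))    # text consumed by the ( ... )* alternation
print(repr(m.group(2)))    # text consumed by the trailing "*"* closure
```

The alternation consumes nothing and the trailing closure consumes a single star, which agrees with the counts 0, 0, 1 listed in the post.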
greg@uop.EDU (Greg Onufer) (07/12/89)
kearns@read.columbia.edu (Steve Kearns) writes:
>A previous article claimed that
>>| ----------------------------------------
>>| "/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
>>| ----------------------------------------

I believe that

"/*""/"*([^*/]|[^*]"/"|"/*"|"*"[^/])*"*"*"*/"
                       +++++

will work better. Try "/* * /* */" with the previous regex... I could
not claim to know whether or not it is a legal comment, but the previous
regex DOES pass "/* /* */".

Cheers!
greg
tps@chem.ucsd.edu (Tom Stockfisch) (07/12/89)
In article <19365@paris.ics.uci.edu> schmidt@zola.ics.uci.edu (Doug Schmidt) writes:
>In their book ``Introduction to Compiler Construction with UNIX,''
>Schreiner and Friedman provide the following LEX regular expression
>for recognizing C comments:
>"/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"

This expression fails on each of the following:

	/*****//hello world */

	/* hello /* /* world */

So, who has the shortest single LEX expression that correctly
matches C comments -- ignoring string and character constants, and
disallowing start conditions?

Mine is

	"/*"\/*([^/]|{[^*/]\/+})*"*/"
--
|| Tom Stockfisch, UCSD Chemistry tps@chem.ucsd.edu
flynn@anyguay.acm.rpi.edu (Kevin Lincoln Flynn) (07/12/89)
In article <502@chem.ucsd.EDU> tps@chem.ucsd.edu (Tom Stockfisch) writes:
>This expression fails on each of the following:
>
> /*****//hello world */
>

Please correct me if I'm wrong, but I don't believe this is a legal C
comment. /*****/ should be a complete comment... /hello world */ is not
part of it.

- Flynn

Kevin Lincoln Flynn flynn@anyguay.acm.rpi.edu, userfwvl@mts.rpi.edu
2151 12th Street H (518) 273-6914 W (518) 447-8561
Troy, NY 12180
...Argue for your limitations, and sure enough they're yours.
tps@chem.ucsd.edu (Tom Stockfisch) (07/12/89)
In article <5939@rpi.edu> flynn@anyguay.acm.rpi.edu (Kevin Lincoln Flynn) writes:
>In article <502@chem.ucsd.EDU> tps@chem.ucsd.edu (Tom Stockfisch) writes:
>>This expression fails on each of the following:
>> /*****//hello world */
> Please correct me if I'm wrong, but I don't believe this is a legal C
>comment. /*****/ should be a complete comment... /hello world */ is not
>part of it.

Precisely. The given pattern,

	"/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"

considers

	/hello world */

to be part of the /*****/ comment. I presume the original idea was
to have lex pick legal comments out of C source code. Given arbitrary
preceding and following text,

	/*****//hello world */

is certainly legal C.
--
|| Tom Stockfisch, UCSD Chemistry tps@chem.ucsd.edu
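Because lex always takes the longest possible match, it is enough to ask whether the pattern can cover an entire input. The sketch below does that, assuming the lex pattern is hand-transliterated into Python `re` syntax; `SF` and `sf_accepts_whole` are hypothetical names introduced here, and `fullmatch` is used only so that the answer does not depend on Python's leftmost-first matching.

```python
import re

# Schreiner and Friedman's lex pattern, hand-transliterated to Python `re`
# syntax (an assumption of this sketch, not lex itself).
SF = re.compile(r'/\*/*(?:[^*/]|[^*]/|\*[^/])*\**\*/')

def sf_accepts_whole(text):
    """Return True if the pattern can match all of `text`.

    lex prefers the longest match, so if the whole string can be matched,
    lex would return the whole string as a single token.
    """
    return SF.fullmatch(text) is not None

for text in ['/*****/',
             '/*****//hello world */',
             '/* hello /* /* world */']:
    print(repr(text), sf_accepts_whole(text))
```

The pattern can cover all of `/*****//hello world */`, so a longest-match scanner would swallow the text after the real comment, and it cannot cover `/* hello /* /* world */` at all; these are the two failures reported in the thread.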
mccanne@cory.Berkeley.EDU (Steven McCanne) (07/12/89)
In article <502@chem.ucsd.EDU> tps@chem.ucsd.edu (Tom Stockfisch) writes:
>So, who has the shortest single LEX expression that correctly
>matches C comments --
>Mine is
> "/*"\/*([^/]|{[^*/]\/+})*"*/"

How about:

	"/*"([^*]|\*+[^/*])*\*+\/

Steve
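The last pattern can be tried against the test strings raised earlier in the thread. This is a minimal sketch, assuming a hand transliteration of the lex pattern into Python `re` syntax; `MC` and `first_comment` are hypothetical names used only for this illustration. Python's `re` is leftmost-first rather than longest-match, but neither alternative inside the loop can consume a `*` that is immediately followed by `/`, so a match of this pattern has to end at the first `*/` after the opener and the two disciplines agree.

```python
import re

# Steven McCanne's lex pattern, hand-transliterated to Python `re` syntax
# (an assumption of this sketch, not lex itself).
MC = re.compile(r'/\*([^*]|\*+[^/*])*\*+/')

def first_comment(text):
    """Return the comment matched at the start of `text`, or None."""
    m = MC.match(text)
    return m.group(0) if m else None

samples = [
    '/***/',
    '/*****//hello world */',
    '/* hello /* /* world */',
    '/* * /* */',
    '/* /* */',
]
for s in samples:
    print(repr(s), '->', repr(first_comment(s)))
```

On these inputs the match stops at the first `*/`: for `/*****//hello world */` only `/*****/` is returned, and the strings containing a nested `/*` are accepted in full.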