[comp.lang.c] regex for C comments

schmidt@zola.ics.uci.edu (Doug Schmidt) (07/10/89)

In their book ``Introduction to Compiler Construction with UNIX,''
Schreiner and Friedman provide the following LEX regular expression
for recognizing C comments:

----------------------------------------
"/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
----------------------------------------

Ignoring the possibility of overflowing an internal LEX buffer,
does anyone know of any legal C comments that fail to match
with this regular expression?
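
To judge any candidate expression, it helps to have a reference answer that uses no regular expression at all. Here is a minimal sketch in C. It assumes, as the question does, that string and character constants can be ignored and that the input starts at the comment opener. The name comment_length is made up for illustration. Since C comments do not nest, a comment ends at the first "*/" after its opening "/*".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Reference answer: the length of the C comment that begins at s, or 0 if
// s does not begin with a complete comment.  String and character
// constants are ignored, as in the question.
size_t comment_length(const char *s)
{
    assert(s != NULL);
    if (strncmp(s, "/*", 2) != 0)
        return 0;                       // not a comment opener
    // Comments do not nest, so the comment ends at the first "*/".
    // Searching from s + 2 keeps the opener's '*' from being reused,
    // so "/*/" is reported as unterminated.
    const char *end = strstr(s + 2, "*/");
    return end != NULL ? (size_t)(end - s) + 2 : 0;
}
```

For example, comment_length("/*****//hello world */") returns 7, because only "/*****/" is comment.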

thanks,

  Doug
--
Master Swordsman speak of humility;             | schmidt@ics.uci.edu (ARPA)
Philosophers speak of truth;                    | office: (714) 856-4034
Saints and wisemen speak of the Tao of no doubt;
The moon, sun, and sea speaks for itself. -- Hiroshi Hamada

merlyn@iwarp.intel.com (Randal Schwartz) (07/11/89)

In article <19365@paris.ics.uci.edu>, schmidt@zola (Doug Schmidt) writes:
| In their book ``Introduction to Compiler Construction with UNIX,''
| Schreiner and Friedman provide the following LEX regular expression
| for recognizing C comments:
| 
| ----------------------------------------
| "/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
| ----------------------------------------
| 
| Ignoring the possibility of overflowing an internal LEX buffer,
| does anyone know of any legal C comments that fail to match
| with this regular expression?

Try:
  /***/

(This one was easy. :-)
The problem with this one is that
  "star" "not-slash"
can match
  "star" "star"
where the second star is sometimes the first star of the terminating
  "star" "slash"

This is a common problem with these regexes.  Now come on people,
is this one of those "20 most commonly asked questions in comp.lang.c"?

And who has the right answer?  (Not me.... I screwed up last time... :-)

Just another C hacker,
-- 
/== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\
| on contract to Intel, Hillsboro, Oregon, USA                           |
| merlyn@iwarp.intel.com ...!uunet!iwarp.intel.com!merlyn	         |
\== Cute Quote: "Welcome to Oregon... Home of the California Raisins!" ==/

kearns@read.columbia.edu (Steve Kearns) (07/11/89)

A previous article claimed that 
>| ----------------------------------------
>| "/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
>| ----------------------------------------

does not match 

>  /***/
>(This one was easy. :-)

I guess it was not so easy.  "/***/" does indeed match the 
above regular expression:

"/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
       0                       0   1

Under each unquoted star (the closure operators, not the quoted
literal stars) I have listed how many times it iterates to match
"/***/".

-steve

greg@uop.EDU (Greg Onufer) (07/12/89)

kearns@read.columbia.edu (Steve Kearns) writes:

>A previous article claimed that 
>>| ----------------------------------------
>>| "/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
>>| ----------------------------------------

I believe that 

"/*""/"*([^*/]|[^*]"/"|"/*"|"*"[^/])*"*"*"*/"
                       +++++
will work better.

Try   "/* * /* */"  with the previous regex...
I could not claim to know whether or not it is a legal comment, but
the previous regex DOES pass "/* /* */".

Cheers!greg

tps@chem.ucsd.edu (Tom Stockfisch) (07/12/89)

In article <19365@paris.ics.uci.edu> schmidt@zola.ics.uci.edu (Doug Schmidt) writes:
>In their book ``Introduction to Compiler Construction with UNIX,''
>Schreiner and Friedman provide the following LEX regular expression
>for recognizing C comments:
>"/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"

This expression fails on each of the following:

	/*****//hello world */

	/* hello /* /* world */

So, who has the shortest single LEX expression that correctly
matches C comments --
ignoring string and character constants,
and disallowing start conditions?

Mine is

	"/*"\/*([^/]|{[^*/]\/+})*"*/"
-- 

|| Tom Stockfisch, UCSD Chemistry	tps@chem.ucsd.edu

flynn@anyguay.acm.rpi.edu (Kevin Lincoln Flynn) (07/12/89)

In article <502@chem.ucsd.EDU> tps@chem.ucsd.edu (Tom Stockfisch) writes:
>This expression fails on each of the following:
>
>	/*****//hello world */
>
  Please correct me if I'm wrong, but I don't believe this is a legal C
comment.  /*****/ should be a complete comment...  /hello world */ is not
part of it.  
 
  - Flynn

Kevin Lincoln Flynn    flynn@anyguay.acm.rpi.edu, userfwvl@mts.rpi.edu
2151 12th Street       H (518) 273-6914  W (518) 447-8561
Troy, NY  12180        
...Argue for your limitations, and sure enough they're yours.

tps@chem.ucsd.edu (Tom Stockfisch) (07/12/89)

In article <5939@rpi.edu> flynn@anyguay.acm.rpi.edu (Kevin Lincoln Flynn) writes:
>In article <502@chem.ucsd.EDU> tps@chem.ucsd.edu (Tom Stockfisch) writes:
>>This expression fails on each of the following:
>>	/*****//hello world */
>  Please correct me if I'm wrong, but I don't believe this is a legal C
>comment.  /*****/ should be a complete comment...  /hello world */ is not
>part of it.  

Precisely.  The given pattern,

	"/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"

considers

	/hello world */

to be part of the

	/*****/

comment.  I presume the original idea was to have lex pick legal comments
out of C source code.  Given arbitrary preceding and following text,

	/*****//hello world */

is certainly legal C.
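
Tom's point turns on lex's longest-match rule: among the possible matches, lex takes the longest. The sketch below imitates that rule for a single pattern by testing every prefix of the input against an anchored POSIX ERE and keeping the longest one that matches. This brute-force stand-in is an assumption made for illustration. It is far slower than the automaton lex generates, though it should pick the same length. SF_ERE and longest_prefix_match are hypothetical names.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

// The book's lex pattern as an anchored POSIX ERE (hand translation).
static const char SF_ERE[] = "^/\\*/*([^*/]|[^*]/|\\*[^/])*\\**\\*/$";

// Imitate lex's longest-match rule for one pattern: return the length of
// the longest prefix of text that the anchored ERE matches, or 0 if none.
size_t longest_prefix_match(const char *ere, const char *text)
{
    regex_t re;
    int rc = regcomp(&re, ere, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);                    // fixed pattern: failure is a bug

    size_t n = strlen(text), best = 0;
    char *prefix = malloc(n + 1);
    assert(prefix != NULL);
    for (size_t len = 1; len <= n; len++) {
        memcpy(prefix, text, len);      // copy the first len characters
        prefix[len] = '\0';
        if (regexec(&re, prefix, 0, NULL, 0) == 0)
            best = len;                 // keep the longest success
    }
    free(prefix);
    regfree(&re);
    return best;
}
```

On "/*****//hello world */" this returns 22 for the book's pattern, so the trailing "/hello world */" is swallowed even though the comment proper is only the first 7 characters. On "/* a */ b */" it stops at 7, as it should.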
-- 

|| Tom Stockfisch, UCSD Chemistry	tps@chem.ucsd.edu

mccanne@cory.Berkeley.EDU (Steven McCanne) (07/12/89)

In article <502@chem.ucsd.EDU> tps@chem.ucsd.edu (Tom Stockfisch) writes:
>So, who has the shortest single LEX expression that correctly
>matches C comments --
>Mine is
>	"/*"\/*([^/]|{[^*/]\/+})*"*/"

How about:

	"/*"([^*]|\*+[^/*])*\*+\/
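
One way to gain confidence in an expression like this is to compare it with a non-regex reference on every short input. The sketch below does that for comment bodies of up to eight characters over the alphabet "/", "*" and "a", on the assumption that "a" can stand for every character other than "/" and "*". It uses hand translations to anchored POSIX EREs, reads the braces in Tom's expression as parentheses, and ignores string and character constants. An exhaustive check of short inputs is evidence rather than proof, and all names are illustrative.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

// Hand translations of three patterns from the thread to anchored POSIX EREs.
static const char SF_ERE[]      = "^/\\*/*([^*/]|[^*]/|\\*[^/])*\\**\\*/$";
static const char TOM_ERE[]     = "^/\\*/*([^/]|[^*/]/+)*\\*/$";  // braces read as parens
static const char MCCANNE_ERE[] = "^/\\*([^*]|\\*+[^/*])*\\*+/$";

// Reference: text is exactly one comment if it starts with "/*" and the
// first "*/" after the opener ends at the last character.
static bool is_one_comment(const char *text)
{
    if (strncmp(text, "/*", 2) != 0)
        return false;
    const char *end = strstr(text + 2, "*/");
    return end != NULL && end[2] == '\0';
}

// Count inputs "/*" + body, for every body of length 0..max_len over the
// alphabet {'/', '*', 'a'}, on which the pattern and the reference disagree.
int count_mismatches(const char *ere, int max_len)
{
    enum { MAX_BODY = 12 };
    static const char alphabet[] = "/*a";
    char text[2 + MAX_BODY + 1] = "/*";
    int digits[MAX_BODY];
    regex_t re;

    assert(max_len >= 0 && max_len <= MAX_BODY);
    int rc = regcomp(&re, ere, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);                    // fixed patterns: failure is a bug

    int mismatches = 0;
    for (int len = 0; len <= max_len; len++) {
        memset(digits, 0, sizeof digits);
        for (;;) {
            // Spell out the current body after the "/*" opener.
            for (int i = 0; i < len; i++)
                text[2 + i] = alphabet[digits[i]];
            text[2 + len] = '\0';

            bool by_regex = regexec(&re, text, 0, NULL, 0) == 0;
            if (by_regex != is_one_comment(text))
                mismatches++;

            // Advance a base-3 counter over the body; stop once it wraps.
            int i = 0;
            while (i < len && ++digits[i] == 3)
                digits[i++] = 0;
            if (i == len)
                break;
        }
    }
    regfree(&re);
    return mismatches;
}
```

Run with max_len 8, McCanne's and Tom's patterns should report no disagreements, while the book's pattern does, for instance on "/**a/*/", a legal comment it rejects.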

Steve