merlyn@intelob.intel.com (Randal L. Schwartz @ Stonehenge) (03/17/89)
In article <9900010@bradley>, brian@bradley writes:
| > Does anyone have a sed or awk script which we
| > can use to preprocess the C source and get rid of all the comments before
| > sending it to the compiler?
|
| The following works in vi: :%s/\/\*.*\*\///g
|
| I don't know if it will work in sed, but it should...

Nope.  Just try it on the line:

    foo; bar; /* comment1 */ bletch; /* comment2 */

'bletch;' disappears with the comments.

The regexp that matches comments looks like (in egrep/lex notation):

    [/][*]([*]*[^*/])*[*]+[/]

(I use [X] here instead of \X because I hate backslashes...).

Sed and vi are not powerful enough to eat things like this in one
regexp.

Didn't we just go through this about nine months ago? :-)
(And didn't I give the wrong answer at least twice? :-) :-)
-- 
Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095
on contract to BiiN (for now :-), Hillsboro, Oregon, USA.
ARPA: <@intel-iwarp.arpa:merlyn@intelob> (fastest!)
MX-Internet: <merlyn@intelob.intel.com> UUCP: ...[!uunet]!tektronix!biin!merlyn
Standard disclaimer: I *am* my employer!
Cute quote: "Welcome to Oregon... home of the California Raisins!"

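
The 'bletch;' failure is easy to reproduce outside vi. What follows is a minimal sketch in Python, assuming its `re` module as a stand-in for the vi and egrep engines (the greedy `.*` behaves the same way there). The names `strip_greedy` and `strip_posted` are hypothetical and do not come from the thread.

```python
import re

LINE = "foo; bar; /* comment1 */ bletch; /* comment2 */"

# The vi substitution's pattern: .* is greedy, so it runs from the
# first "/*" on the line to the last "*/".
GREEDY = re.compile(r"/\*.*\*/")

# The egrep/lex-notation pattern from the post, copied unchanged.
POSTED = re.compile(r"[/][*]([*]*[^*/])*[*]+[/]")


def strip_greedy(text):
    """Delete whatever the greedy vi-style pattern matches."""
    return GREEDY.sub("", text)


def strip_posted(text):
    """Delete whatever the posted egrep-style pattern matches."""
    return POSTED.sub("", text)


print(repr(strip_greedy(LINE)))  # 'foo; bar; '
print(repr(strip_posted(LINE)))  # 'foo; bar;  bletch; '
```

The posted pattern keeps 'bletch;' on this line, but because its body excludes `/` it leaves a comment such as `/* / */` untouched.
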
leo@philmds.UUCP (Leo de Wit) (03/18/89)
In article <4221@omepd.UUCP> merlyn@intelob.intel.com (Randal L. Schwartz @ Stonehenge) writes:
[]
|Nope.  Just try it on the line:
|
|    foo; bar; /* comment1 */ bletch; /* comment2 */
|
|'bletch;' disappears with the comments.
|
|The regexp that matches comments looks like (in egrep/lex notation):
|
|    [/][*]([*]*[^*/])*[*]+[/]
|
|(I use [X] here instead of \X because I hate backslashes...).
|
|Sed and vi are not powerful enough to eat things like this in one
|regexp.

Sed is often underestimated; it IS powerful enough:

s/\/\*\([^*/]*\*\)\1*\///g

will eat the comments away just nicely (I'll leave the HOW as an
exercise for the reader).

    Leo.

leo@philmds.UUCP (Leo de Wit) (03/18/89)
In article <981@philmds.UUCP> leo@philmds.UUCP (Leo de Wit) writes:
|Sed is often underestimated; it IS powerful enough:
|
|s/\/\*\([^*/]*\*\)\1*\///g
|
|will eat the comments away just nicely (I'll leave the HOW as an
|exercise for the reader).

Shame on me. That it works is merely a coincidence (put a * in a
comment and see it fail). \1 matches the previous string matching a
\( \) expression, not the expression itself. And since sed doesn't like
\( \)* type expressions, this would be hard to do in one regexp.

Can it be proven to be impossible (that is, deleting the comments with
one sed command - multi-line comments not considered) ?

    Leo.

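
The difference between a backreference and a repeated group can be checked directly. This is a sketch in Python, assuming its `re` syntax as a transliteration of the sed expression (unescaped parentheses group, and `\1` means the same thing as in sed). Unlike the sed described in the post, Python's engine accepts a repeated group, which makes the contrast visible. The function names are hypothetical.

```python
import re

# The sed expression transliterated: \1* repeats the *text* that the
# group matched, not the group's pattern.
BACKREF = re.compile(r"/\*([^*/]*\*)\1*/")

# Repeating the group itself re-applies the pattern on each iteration.
REPEATED = re.compile(r"/\*([^*/]*\*)+/")


def strip_backref(text):
    """Delete matches of the backreference form."""
    return BACKREF.sub("", text)


def strip_repeated(text):
    """Delete matches of the repeated-group form."""
    return REPEATED.sub("", text)


plain = "a /* one */ b"
starred = "a /* one * two */ b"

print(repr(strip_backref(plain)))     # 'a  b'
print(repr(strip_backref(starred)))   # 'a /* one * two */ b'
print(repr(strip_repeated(starred)))  # 'a  b'
```

The repeated-group form is still not a complete answer: its body excludes `/`, so a comment like `/* a/b */` is left in place.
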
tps@chem.ucsd.edu (Tom Stockfisch) (03/23/89)
In article <4221@omepd.UUCP> merlyn@intelob.intel.com (Randal L. Schwartz @ Stonehenge) writes:
>| >Does anyone have a sed or awk script which we
>| > can use to preprocess the C source and get rid of all the comments
>| The following works in vi: :%s/\/\*.*\*\///g
>Nope.  Just try it on the line:
>    foo; bar; /* comment1 */ bletch; /* comment2 */
>'bletch;' disappears with the comments.
>The regexp that matches comments looks like (in egrep/lex notation):
>    [/][*]([*]*[^*/])*[*]+[/]
>Didn't we just go through this about nine months ago? :-)
>(And didn't I give the wrong answer at least twice? :-) :-)

You still don't have it right, I'm afraid.  This pattern won't work on

    / /* / */

It is unbelievable how hard this task is in regular expressions, when it is
trivial to code by hand.

To convince yourself that a pattern is correct, I think you have to
show two things

1.  That the body between the "/*" and "*/" cannot possibly contain a "*/",
2.  That the body can contain any other sequence of characters.

Various other patterns which have been posted (including ones by famous net
gurus) have failed correctly to match the following:

1.  /*****//hello world */
2.  /* hello /* /* world */
3.  /* */ hello /* */
4.  /**// /* this input should produce "/ \n" for output */
5.  /* */ hello */

So what works?  I haven't been able to crack this one, which also correctly
ignores comments in strings and character constants.  If you want a practical
program, use start states and don't match an entire comment with one pattern --
you won't be in danger of overflowing yytext[].  If you want to see how it's
done with regular expressions, study the following.

/* lex program that strips comments */

okslash ([^*/]"/"+)

%%

"/*""/"*([^/]|{okslash})*"*/"   ;
\"((\\(.|\n))|[^\\"])*\"        ECHO;
\'((\\(.|\n))|[^\\'])*\'        ECHO;
.|\n                            ECHO;

-- 

|| Tom Stockfisch, UCSD Chemistry   tps@chem.ucsd.edu

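
The lex rules above can be tried against the five test lines without a lex toolchain. This is a sketch that transliterates the three patterns into Python's `re` syntax. It assumes one alternation is a fair substitute for lex's longest-match rule selection, which seems reasonable here because the comment pattern can only end at the first closing "*/". The names `COMMENT`, `STRING`, `CHAR`, `TOKEN` and `strip_comments` are hypothetical.

```python
import re

# "/*""/"*([^/]|{okslash})*"*/"  with okslash = ([^*/]"/"+)
COMMENT = r"/\*/*(?:[^/]|[^*/]/+)*\*/"
# \"((\\(.|\n))|[^\\"])*\"
STRING = r'"(?:\\(?:.|\n)|[^\\"])*"'
# \'((\\(.|\n))|[^\\'])*\'
CHAR = r"'(?:\\(?:.|\n)|[^\\'])*'"

# Only the comment alternative is captured, so the callback can tell
# a comment (drop it) from a string or character constant (keep it).
TOKEN = re.compile(f"({COMMENT})|{STRING}|{CHAR}")


def strip_comments(source):
    """Remove C comments, leaving string and character constants alone."""
    return TOKEN.sub(
        lambda m: "" if m.group(1) is not None else m.group(0), source
    )


CASES = [
    "/*****//hello world */",
    "/* hello /* /* world */",
    "/* */ hello /* */",
    r'/**// /* this input should produce "/ \n" for output */',
    "/* */ hello */",
]

for case in CASES:
    print(repr(strip_comments(case)))
```

Text that matches none of the three alternatives is copied through by `re.sub` itself, which plays the part of the final `.|\n ECHO;` rule.
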
lfoard@wpi.wpi.edu (Lawrence C Foard) (03/24/89)
I tried the comment stripper I posted earlier today on these pathological
cases and it seems to get the right answer.

Script started on Fri Mar 24 01:56:10 1989
% cat tmp3.tmp
Commented    should be
/ /* / */    # /
/*****//hello world */    # /hello world */
/* hello /* /* world */    #
/* */ hello /* */    # hello
/**// /* this input should produce "/ \n" for output */    # /
/* */ hello */    # hello */
% ../tmp/a.out <tmp3.tmp
Commented    should be
/     # /
/hello world */    # /hello world */
    #
 hello     # hello
/     # /
 hello */    # hello */
% ^D
script done on Fri Mar 24 01:56:36 1989

Now the only question is did I parse it right?
-- 
Disclaimer: My school does not share my views about FORTRAN.
            FORTRAN does not share my views about my school.

rupley@arizona.edu (John Rupley) (03/25/89)
In article <1492@wpi.wpi.edu>, lfoard@wpi.wpi.edu (Lawrence C Foard) writes:
> I tried the comment stripper I posted earlier today on these pathological
> cases and it seems to get the right answer.

Close, but no cigar.  We're talking real pathology, here.... try:

    (echo '/*';yes '*//*';echo 'cosmetic */') | stripper_name

Recursion blows the stack for your program.  Previously posted strippers
handle the above.  If you insist on a compilable file, use a script to
produce:

    /*
    [stack-blowing number of lines of *//*]
    */
    compilable program text

Why strip comments?  (1) the original poster had a broken compiler
that choked on comments; (2) the start of a cheap way to get a list or
inverted index of identifiers (cpp does too much).

I suspect all useful points (and more? :-) have been made about comment
stripping -- perhaps this thread should die now.

John Rupley
rupley!local@megaron.arizona.edu
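
A stripper that loops instead of recursing is not bothered by input of this shape. This is a minimal hand-coded sketch in Python, not a reconstruction of any program posted in the thread. `strip_comments_iter` and `pathological` are hypothetical names, and a finite line count stands in for the endless output of `yes`. Two behaviours are assumptions made for the sketch: an unterminated comment swallows the rest of the input, and an unterminated string or character constant is copied through to the end.

```python
def strip_comments_iter(source):
    """Strip C comments with a single loop; no recursion, no regexps."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if source.startswith("/*", i):
            # Search from i + 2 so the opening "*" cannot double as
            # the closing one ("/*/" is not a complete comment).
            end = source.find("*/", i + 2)
            if end == -1:
                break  # assumption: unterminated comment eats the rest
            i = end + 2
        elif ch in "\"'":
            # Copy a string or character constant verbatim, stepping
            # over backslash escapes so \" or \' does not end it.
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(source[i:j])
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def pathological(lines):
    """Build the echo/yes/echo input with a finite number of lines."""
    return "/*\n" + "*//*\n" * lines + "cosmetic */\n"


print(repr(strip_comments_iter(pathological(100000))))  # '\n'
```

Each `*//*` line closes one comment and opens the next, so the whole input is comments except for the final newline.
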