brian@bradley.UUCP (03/14/89)
> /* Written  9:58 am  Mar  9, 1989 by jrv@siemens.UUCP */
> /* ---------- "Want a way to strip comments from a" ---------- */
> Does anyone have a sed or awk script which we
> can use to preprocess the C source and get rid of all the comments before
> sending it to the compiler?

The following works in vi:   :%s/\/\*.*\*\///g

I don't know if it will work in sed, but it should...

...............................................................................
"Don't drop acid, take it pass-fail!"
Brian Michael Wendt       UUCP: {cepu,uiucdcs,noao}!bradley!brian
Bradley University        ARPA: cepu!bradley!brian@seas.ucla.edu
(309) 677-2335            ICBM: 40 40' N  89 34' W
smk@cbnews.ATT.COM (Stephen M. Kennedy) (03/17/89)
In article <9900010@bradley> brian@bradley.UUCP writes:
> The following works in vi:   :%s/\/\*.*\*\///g

/*
 * Unfortunately, multi-line comments aren't deleted.
 */

---
Steve Kennedy
cbatt!cbosgd!smk
smk@cbnews.ATT.COM (Stephen M. Kennedy) (03/17/89)
In article <9900010@bradley> brian@bradley.UUCP writes:
> The following works in vi:   :%s/\/\*.*\*\///g

/* And this */ important_variable = 42 /* doesn't work either! */

---
Steve Kennedy
cbatt!cbosgd!smk
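[A minimal C sketch, not taken from any of the posters, of why the two
counterexamples above defeat the vi command. It assumes the substitution
is applied one line at a time and that `.*` is greedy. The helper name
strip_greedy is made up for illustration, and it imitates the
substitution by hand rather than calling a regex library.]

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: imitate the vi substitution on ONE line.
 * Because the wildcard is greedy, the match runs from the first
 * comment opener on the line to the last comment closer on it.
 * Returns 1 if something was deleted, 0 if the line is unchanged.
 */
int strip_greedy(char *line)
{
    char *start = strstr(line, "/*");
    char *end = NULL;
    char *p;

    if (start == NULL)
        return 0;
    /* remember the last closer that begins after the opener */
    for (p = start + 2; (p = strstr(p, "*/")) != NULL; p += 2)
        end = p;
    if (end == NULL)
        return 0;   /* the comment closes on a later line: no match */
    /* delete everything from the opener through the last closer */
    memmove(start, end + 2, strlen(end + 2) + 1);
    return 1;
}
```

Given the one-line example above, the helper deletes the assignment
together with both comments. Given a line that only opens or only
continues a comment, it changes nothing, which is the multi-line case.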
rkl1@hound.UUCP (K.LAUX) (03/17/89)
In article <9900010@bradley>, brian@bradley.UUCP writes: | | > /* Written 9:58 am Mar 9, 1989 by jrv@siemens.UUCP */ | > /* ---------- "Want a way to strip comments from a" ---------- */ | > Does anyone have a sed or awk script which we | > can use to preprocess the C source and get rid of all the comments before | > sending it to the compiler? | | The following works in vi: :%s/\/\*.*\*\///g | | I don't know if it will work in sed, but it should... | Yes, it will. The only problem is that it won't strip out comments that span more than one line...'Aye, There's the Rub. --rkl
leo@philmds.UUCP (Leo de Wit) (03/17/89)
In article <4896@cbnews.ATT.COM> smk@cbnews.ATT.COM (Stephen M. Kennedy) writes:
|In article <9900010@bradley> brian@bradley.UUCP writes:
|> The following works in vi:   :%s/\/\*.*\*\///g
|
|/* And this */ important_variable = 42 /* doesn't work either! */

And how about:

    puts(" A comment /* in here */");

And you can give more examples showing it isn't that trivial; a challenge
for the sed adept, perhaps ...

    Leo.
loo@mister-curious.sw.mcc.com (Joel Loo) (03/18/89)
In article <978@philmds.UUCP>, leo@philmds.UUCP (Leo de Wit) writes: > In article <4896@cbnews.ATT.COM> smk@cbnews.ATT.COM (Stephen M. Kennedy) writes: > |In article <9900010@bradley> brian@bradley.UUCP writes: > |> The following works in vi: :%s/\/\*.*\*\///g > | > |/* And this */ important_variable = 42 /* doesn't work either! */ > > And how about: > > puts(" A comment /* in here */"); > > And you can give more examples showing it isn't that trivial; a challenge > for the sed adept, perhaps ... > > Leo. [And a lot of previous articles on the same topic] The problem is: sed and vi do not understand C syntax. Solution: write a lex program to strip comments. The program must understand C syntax enough to know what is a comment and what is not. Encouragement: it should not be too difficult. -------------------------------------------------------------------- Joel Loo Peing Ling composed on Fri Mar 17 10:44:52 CST 1989 ---------------------------- Now: ---------------------------------- MCC | Email: loo@sw.mcc.com 3500 West Balcones Centre Dr. | Voice: (512)338-3680 (O) Austin, TX 78759 | (512)343-1780 (H)
rupley@arizona.edu (John Rupley) (03/18/89)
In article <2131@mister-curious.sw.mcc.com>, loo@mister-curious.sw.mcc.com (Joel Loo) writes: > In article <978@philmds.UUCP>, leo@philmds.UUCP (Leo de Wit) writes: > > And how about: > > puts(" A comment /* in here */"); > > And you can give more examples showing it isn't that trivial; a challenge > > for the sed adept, perhaps ... > > Leo. > [And a lot of previous articles on the same topic] > > The problem is: sed and vi do not understand C syntax. > > Solution: write a lex program to strip comments. The program must > understand C syntax enough to know what is a comment and what is not. > > Encouragement: it should not be too difficult. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ It isn't. Six lines of Lex source (not counting initialization) are enough. A Lex source for ``uncomment'' has been posted in comp.sources.unix, as part of: Subject: Volume 16 (Ends January 17, 1989) identlist List identifiers and declarations for C sources Attached is a minimum test for an uncommenting algorithm, including tests for quotes inside and outside comments. John Rupley uucp: ..{uunet | ucbvax | cmcl2 | hao!ncar!noao}!arizona!rupley!local internet: rupley!local@megaron.arizona.edu (O) Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721 - (602) 621-3929 ---------------------------------------------------------------------------- /* * tests for ``uncomment'' * assume C-code conventions: * strings start and end on one line * comments can be multi-line * no tests for varieties of: '"' \'"\' etc * no tests for strings with newline escaped */ string4 "hi /*\"hi there*/there\"" comment1 /*one"*/"*/ comment2 /*\"hi there"*/"*/" comment3 /*\"hi there*/ comment4 /* hello/*hello/*hello/*hello*/ comment5 /*******/ comment6 /*/*/ a /**/ b /***/ c /****/ d /*////*/ comment7 /*/*// a /**// b /***// c /****// d /*////*// 1. /*****//"hello world */" ok /"hello world */" 2. /* hello /* /* world */ ok 3. /* */ hello /* */ ok hello 4. /**// /* this should produce "/ \n" for output */ ok / 5. 
/* */ hello */ ok hello */ 6. /*/*/ hello ok hello 7. /*////*/ ok 8. /*//*/ ok 9. abc = "/* fake comment"; /* got who ? */ ok abc = "STRING"; 10. /* "start quote "then next line end quote, after more characters than on line 1" more more more */ " ok " ----------------------------------------------------------------------------
jeenglis@nunki.usc.edu (Joe English) (03/19/89)
leo@philmds.UUCP (Leo de Wit) writes:
>In article <4896@cbnews.ATT.COM> smk@cbnews.ATT.COM (Stephen M. Kennedy) writes:
>|In article <9900010@bradley> brian@bradley.UUCP writes:
>|> The following works in vi:   :%s/\/\*.*\*\///g
>|
>|/* And this */ important_variable = 42 /* doesn't work either! */
>
>And how about:
>
>    puts(" A comment /* in here */");
>
>And you can give more examples showing it isn't that trivial; a challenge
>for the sed adept, perhaps ...

Does it *have* to be done in sed/awk/other text processor?
This problem is fairly difficult to solve using regexp/editor
commands, but it's a piece of cake to do in C:

#include <stdio.h>

void eatcomment(void);

main()
{
    int ch;
    int instring = 0;

    ch = getchar();
    while (ch != EOF) {
        switch (ch) {
        case '"' :
            instring = !instring;
            break;
        case '/' :
            if (!instring)
                if ((ch = getchar()) == '*') { eatcomment(); ch=getchar(); }
                else putchar('/');
            break;
        case '\\' :         /* in case this is a \" in a string, */
            putchar('\\');  /* pass it through now and don't let */
            ch = getchar(); /* the switch() eat it */
        }
        putchar(ch);
        ch = getchar();
    }
    exit(0);
}

void eatcomment(void)
{
    int ch;

    for (;;) {
        ch = getchar();
        while (ch == '*')
            if ((ch = getchar()) == '/') return;
        if (ch == EOF) exit(1); /* oops */
    }
}

------------

This hasn't been tested thoroughly; it's mostly
from memory.

Joe English

jeenglis@nunki.usc.edu
jeenglis@nunki.usc.edu (Joe English) (03/20/89)
I made a mistake in the comment-eating program I
posted yesterday -- it won't handle

/* something like *//* this. */

Change the line in the '/' case from:

    if ((ch = getchar()) == '*') { eatcomment(); ch=getchar(); }

to:

    if ((ch = getchar()) == '*') { eatcomment(); ch=getchar(); continue; }

and it will work.  If anyone's interested.

--Joe English

  jeenglis@nunki.usc.edu
rupley@arizona.edu (John Rupley) (03/20/89)
In article <3145@nunki.usc.edu>, jeenglis@nunki.usc.edu (Joe English) writes:
> I made a mistake in the comment-eating program I
> posted yesterday -- it won't handle
> /* something like *//* this. */
> Change the line in the '/' case from:
>    if ((ch = getchar()) == '*') { eatcomment(); ch=getchar(); }
> to:
>    if ((ch = getchar()) == '*') { eatcomment(); ch=getchar(); continue; }
> and it will work.  If anyone's interested.

It still doesn't work.  It won't uncomment itself.  Or the following line:

    '"' /* hi there */ '"'

Or distinguish a correct string, with escaped newlines,

    "hi\
    /*\*/ /**/\
    there"

from an incorrect string without the escapes.

The point is not _whether_ one can write an ``uncomment'' in C, but
how, and in what language, one can do it most simply.  It is certainly
right to use C if uncommenting is part of a larger design, as in cpp or
ctags.  But if the whole aim is to uncomment, then a pattern-handling
language, such as Lex, is more appropriate.  A few lines of Lex source
do the job, and assuming familiarity with regular expression syntax, it
is easy to write and understand, and hard to get the logic wrong.  It
should be doable with sed or awk, but probably not as easily, because
they see a file as a stream of lines rather than characters.  In C, the
proper setting up of the switch and flags is not trivial, as the
previous posting witnesses.

A Lex source for uncommenting is attached (which I hope does not belie
the remark above about hard to get the logic wrong :-).

John Rupley
 uucp: ..{uunet | ucbvax | cmcl2 | hao!ncar!noao}!arizona!rupley!local
 internet: rupley!local@megaron.arizona.edu

--------------------------------------------------------------------
%{
/* UNCOMMENT- */
/* regexp for comment recognition based on usenet posting by: */
/*    Chris Thewalt; thewalt@ritz.cive.cmu.edu */
%}
STRING          \"(\\\n|\\\"|[^"\n])*\"
COMMENTBODY     ([^*\n]|"*"+[^*/\n])*
COMMENTEND      ([^*\n]|"*"+[^*/\n])*"*"*"*/"
QUOTECHAR       \'[^\\]\'|\'\\.\'|\'\\[x0-9][0-9]*\'
ESCAPEDCHAR     \\.
%START COMMENT
%%
<COMMENT>{COMMENTBODY}          ;
<COMMENT>{COMMENTEND}           BEGIN 0;
<COMMENT>.|\n                   ;
"/*"                            BEGIN COMMENT;
{STRING}                        ECHO;
{QUOTECHAR}                     ECHO;
{ESCAPEDCHAR}                   ECHO;
.|\n                            ECHO;
---------------------------------------------------------------------------
leo@philmds.UUCP (Leo de Wit) (03/20/89)
In article <3114@nunki.usc.edu> jeenglis@nunki.usc.edu (Joe English) writes: | |leo@philmds.UUCP (Leo de Wit) writes: |> |> puts(" A comment /* in here */"); |> |>And you can give more examples showing it isn't that trivial; a challenge |>for the sed adept, perhaps ... | |Does it *have* to be done in sed/awk/other text processor? |This problem is fairly difficult to solve using regexp/editor |commands, but it's a piece of cake to do in C: Piece of cake? Your program can't even strip its own comments (try it)! Reason: | case '"' : | instring = !instring; | break; This is both a defect in your program, and the cause that subsequent comments aren't detected when using the source as input. After the sequence '"' instring is 1. Besides it doesn't handle multiple character char constants (e.g. '/*', though one could perhaps argue whether it should). |This hasn't been tested thoroughly; it's mostly |from memory. If your memory was ok, the program wasn't tested thoroughly 8-). Though the problem isn't difficult, it isn't so trivial as you thought it was. Leo.
dave@motto.UUCP (dave brown) (03/21/89)
I missed the original posting, so I didn't catch the exact question, but when I have needed to remove comments, I have simply passed the source through the preprocessor stage of the compiler only. Granted, this does a lot of other things, which may or may not be undesirable for your application. Some compilers, however, have options on the preprocessor which can limit the scope of the damage. If the original poster still hasn't solved his problem, he can contact me. I think we also have a quick and dirty C program which someone wrote which does the job. ----------------------------------------------------------------------------- | David C. Brown | uunet!mnetor!motto!dave | | Motorola Canada, Ltd. | 416-499-1441 ext 3708 | | Communications Division | Disclaimer: Motorola is a very big company | -----------------------------------------------------------------------------
jeenglis@nunki.usc.edu (Joe English) (03/21/89)
In article <9797@megaron.arizona.edu> you write:
>It still doesn't work.  It won't uncomment itself.  Or the following line:
>
>    '"' /* hi there */ '"'
>

Thanks -- I had a feeling I was forgetting something.

I wrote an uncomment program a couple years ago
(and I swear, it *did* work and it wasn't too hard
to write :-) and I was trying to recall it from
memory.  Characters in single-quotes were the other
case I forgot about -- and if I had tested the
program on its own source I would have caught
that oversight.  (I feel really stupid now... I think
I'm going to stop posting to this newsgroup, as I have
failed to say anything correct or intelligent for about
a month now.)

The Lex solution posted is much more elegant and simple;
but since lex isn't universally available a C version is
also useful...  (I'm not going to try a third time, though.)

--Joe English

  jeenglis@nunki.usc.edu
pem@zyx.SE (Per-Erik Martin) (03/22/89)
In article <983@philmds.UUCP> leo@philmds.UUCP (Leo de Wit) writes:
>In article <3114@nunki.usc.edu> jeenglis@nunki.usc.edu (Joe English) writes:
>|
>|Does it *have* to be done in sed/awk/other text processor?
>|This problem is fairly difficult to solve using regexp/editor
>|commands, but it's a piece of cake to do in C:
>
>Piece of cake? Your program can't even strip its own comments (try it)!

Here's another example in C. It *is* a piece of cake (15 minutes work).
The problem can be described with a simple automaton which is easily coded
in C (with goto's, >yech<). I've tested it on most of the pathological
examples given in this group and it seems to work.

----------------------------------------------------------------------------
/* cstrip.c   pem@zyx.SE, 1989 */

#include <stdio.h>

main()
{
  char c, c1;

  goto into_code;

 in_code:
  putchar(c);
 into_code:
  switch (c = (char)getchar()) {
  case EOF:
    exit(0);
  case '\'':
    goto in_char;
  case '"':
    goto in_string;
  case '/':
    c1 = c;
    if ((c = (char)getchar()) == '*')
      goto in_comment;
    putchar(c1);
  default:
    goto in_code;
  }

 in_char:
  putchar(c);
  switch (c = (char)getchar()) {
  case EOF:
    exit(1);
  case '\\':
    putchar(c);
    c = (char)getchar();
  default:
    putchar(c);
    while ((c = (char)getchar()) != '\'')
      putchar(c);
    goto in_code;
  }

 in_string:
  putchar(c);
  switch (c = (char)getchar()) {
  case EOF:
    exit(1);
  case '"':
    goto in_code;
  case '\\':
    putchar(c);
    c = (char)getchar();
  default:
    goto in_string;
  }

 in_comment:
  switch (c = (char)getchar()) {
  case EOF:
    exit(1);
  case '*':
    if ((c = (char)getchar()) == '/')
      goto into_code;
  default:
    goto in_comment;
  }
}
----------------------------------------------------------------------------
--
-------------------------------------------------------------------------------
- Per-Erik Martin, ZYX Sweden AB, Bangardsgatan 13, S-753 20  Uppsala, Sweden -
- Email: pem@zyx.SE                                                           -
-------------------------------------------------------------------------------
Tim_CDC_Roberts@cup.portal.com (03/22/89)
You know, this discussion has brought up something that has bothered me (although not a great deal). When scanning the result of preprocessing a nontrivial C program with many include files, one finds dozens (in some cases hundreds) of blank lines. Obviously, they are the result of eliminating preprocessor directives and multiline comments. What I have always wondered is why, given the #line directive which can re-sync the preprocessor and the compiler, does the preprocessor insist on keeping all those blank lines? Why not eliminate them and issue a #line instead? Just curious. Tim_CDC_Roberts@cup.portal.com | Control Data... ...!sun!portal!cup.portal.com!tim_cdc_roberts | ...or it will control you.
rupley@arizona.edu (John Rupley) (03/22/89)
In article <852@lynx.zyx.SE>, pem@spunk.zyx.SE (Per-Erik Martin) writes:
> Here's another example in C. It *is* a piece of cake (15 minutes work).
> The problem can be described with a simple automaton which is easily coded
> in C (with goto's, >yech<). I've tested it on most of the pathological
> examples given in this group and it seems to work.

This one fails, too.  Try:

    /***/ hi there /**/

Goes to show, for a quick and clean coding of a pattern-matching
automaton, think Lex.  The Lex source that was posted is so simple it
would be hard to get the logic wrong.  Two out of two C postings suggest
that it may be easier to err in coding the same automaton in C.

Not to imply that C has no advantages -- the following comparison is for
size of source and for time of uncommenting main.c of an emacs distribution:

    timex/real   wc -l
         13.95      10   eatLex.l   Lex
          2.53      37   eatC.c     C code that works
       1:27.13      78   eat.sed    Maarten L's recently posted sed script
                                    (more lines than the C code :-) :-)

John Rupley
rupley!local@megaron.arizona.edu
leo@philmds.UUCP (Leo de Wit) (03/22/89)
In article <852@lynx.zyx.SE> pem@spunk.zyx.SE (Per-Erik Martin) writes:
|Here's another example in C. It *is* a piece of cake (15 minutes work).
|The problem can be described with a simple automaton which is easily coded
|in C (with goto's, >yech<). I've tested it on most of the pathological
|examples given in this group and it seems to work.

[]

Appearances are deceptive: it won't handle trigraphs. For instance, try
??' (trigraph for ^) and your code thinks it is in_char.

What's worse, on systems where char isn't signed and EOF == -1, it will
fail to see EOF (suggestion: don't use a char to compare against EOF).

Another cake that is hard to digest (let alone the goto's, it was baked
in only 15 minutes) 8-).

    Leo.

P.S. What's the benefit of having a separate program strip off comments anyway?
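[The EOF suggestion above can be sketched in a few lines of C. This is
an illustration only, not code from the thread: the function name
count_bytes is made up, and the point is simply that the variable
receiving getc() is declared int, so a data byte of 0xFF and the
end-of-file indicator stay distinct whatever the signedness of char.]

```c
#include <stdio.h>

/*
 * Count the bytes remaining on a stream.  getc() returns an int:
 * either a byte value converted from unsigned char, or the negative
 * value EOF.  Keeping the result in an int preserves that difference;
 * storing it in a char first would lose it.
 */
long count_bytes(FILE *fp)
{
    int c;        /* int, not char */
    long n = 0;

    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

With a char variable the same loop either stops early at the first 0xFF
byte (signed char) or never stops (unsigned char), depending on the
system.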
chris@mimsy.UUCP (Chris Torek) (03/22/89)
In article <16078@cup.portal.com> Tim_CDC_Roberts@cup.portal.com writes: >When scanning the result of preprocessing a nontrivial C program with >many include files, one finds dozens (in some cases hundreds) of blank >lines. ... Why not eliminate them and issue a #line instead? Why bother? Typically there are at most a few tens in a row. It is probably faster to count 20 blank lines than to process one `#line 1234' directive. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
mnc@m10ux.UUCP (Michael Condict) (03/23/89)
In <9900010@bradley>, brian@bradley.UUCP writes:
>> /* Written  9:58 am  Mar  9, 1989 by jrv@siemens.UUCP */
>> Does anyone have a sed or awk script which we
>> can use to preprocess the C source and get rid of all the comments before
>> sending it to the compiler?
>
> The following works in vi:   :%s/\/\*.*\*\///g
>
> I don't know if it will work in sed, but it should...

Lest anyone actually be tempted to use such a naive method, you should be
aware that it DOESN'T WORK, except for the simplest case of one comment per
line and no multi-line comments.  A correct sed command, which I may have
posted before (forgive me) is shown below.  To use it on SystemV-derived seds,
you have to first delete all the comments from the sed script itself
(ironically, enough!).

To see all of the reasons why the simple method doesn't work, try this:  Take
the test C file appended after the sed script below and run it through the sed
script into a file.  Now run diff on the original C file and the one with
comments removed.  What you are looking at is all of the various ways that
comments and things looking almost like comments can be intertwined in C
source files.

Michael Condict		{att|allegra}!m10ux!mnc
AT&T Bell Labs		(201)582-5911    MH 3B-416
Murray Hill, NJ

-------------------- Sed script to delete C comments -------------------------
# Delete comments from C source files:
: delcom
/\/\*/{
	# Change first comment delim to @ (after eliminating existing @'s):
	s/@/<Used#to%be+an-At>/g
	s:/\*:@:
	# Read until we have the end comment:
	: morecm
	/\*\//!{
		# Just to cut down on max buffer length:
		s/@.*/@/
		N
		b morecm
	}
	# Get rid of any $'s:
	s/\$/<Used#to%be+a-Dollar>/g
	# First occurrence of */ is guaranteed to be the corresponding end
	# comment, because it is otherwise not legal C, so:
	s:\*/:$:
	s/@[^$]*\$/ /
	# Restore $'s and @'s:
	s/<Used#to%be+a-Dollar>/$/g
	s/<Used#to%be+an-At>/@/g
	b delcom
}
------------------------ The test C program ----------------------------------
#define APAP\
37
# /*hi*/ define GOO(x) y
char *abc = "hi \"Joe\"";
/* this is
 * a comment */
struct A_S {
	int wopper /**** a *** b *** c *//*again*/ ;
};

int f (x, /* a * in a comment */ yoohoo) /**/ /* a /* b */
char *yoohoo;
{
	int a, b, c = '\'';
	char * quote="h#w \
#bo{ut @hat?";
	a = b /*oops*/*c; /****************/
}
enum goober {a,b};

struct A_S *george(x) struct {int x; float y;} x; { return 0; }

typedef int bar;
struct A_S * * george2(moo, x, glop, foo)
struct { int q[13]; float y;} x[];
bar moo , *foo[]; struct A_S *glop;
/*a*/{ return 0; }

/* Try various combinations of register arg decls:*/
flop(a_1, b) register a_1; { return 0; }

struct BB {int f,g;} floop(a_1, b_1) register char *a_1; float register*b_1;
{ struct BB j; return j;}

/* Test arg names that are substrings of one another: */
char sub1(abc, abcdef) int* abcdef; float abc; { return 0; }
-----------------------------------------------------------------------------
--
Michael Condict		{att|allegra}!m10ux!mnc
AT&T Bell Labs		(201)582-5911    MH 3B-416
Murray Hill, NJ
bph@buengc.BU.EDU (Blair P. Houghton) (03/23/89)
In article <16492@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes: >In article <16078@cup.portal.com> Tim_CDC_Roberts@cup.portal.com writes: >>When scanning the result of preprocessing a nontrivial C program with >>many include files, one finds dozens (in some cases hundreds) of blank >>lines. ... Why not eliminate them and issue a #line instead? > >Why bother? Typically there are at most a few tens in a row. It is >probably faster to count 20 blank lines than to process one >`#line 1234' directive. Howsabout 'cat -s file.c | whatever' or just 'more -s file.c' ? --Blair "What is the sound of one Usener posting...many times?"
ftw@masscomp.UUCP (Farrell Woods) (03/23/89)
In article <9833@megaron.arizona.edu> rupley@arizona.edu (John Rupley) writes: >This one fails, too. Try: > > /***/ hi there /**/ Shouldn't it be a requirement that the program to be stripped at least compile? This example will generate a syntax error. -- Farrell T. Woods Voice: (508) 392-2471 Concurrent Computer Corporation Domain: ftw@masscomp.com 1 Technology Way uucp: {backbones}!masscomp!ftw Westford, MA 01886 OS/2: Half an operating system
pem@zyx.SE (Per-Erik Martin) (03/24/89)
In article <9833@megaron.arizona.edu> rupley@arizona.edu (John Rupley) writes:
>
>This one fails, too.  Try:
>
>    /***/ hi there /**/
>

Oops! Well, if you change the '*'-case in 'in_comment:' to this:

    do {
      if ((c = (char)getchar()) == '/')
        goto into_code;
    } while (c == '*');

it should work better. (Funny no one found the other bug yet...
What do you expect after 15 minutes? ;-)

>Goes to show, for a quick and clean coding of a pattern-matching
>automaton, think Lex.  The Lex source that was posted is so simple it
>would be hard to get the logic wrong.  Two out of two C postings suggest
>that it may be easier to err in coding the same automaton in C.
>
>Not to imply that C has no advantages -- following comparison is for
>size of source and for time of uncommenting main.c of an emacs distribution:
>
>[...timings...]

Another advantage with C is that it's portable outside the Unix universe...
--
-------------------------------------------------------------------------------
- Per-Erik Martin, ZYX Sweden AB, Bangardsgatan 13, S-753 20  Uppsala, Sweden -
- Email: pem@zyx.SE                                                           -
-------------------------------------------------------------------------------
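[For comparison, a minimal sketch of the same automaton written without
goto's, with the run-of-stars case handled by looking one character
ahead. None of this is the posters' code. The name strip_comments, the
string-in/string-out interface and the decision to replace a comment by
nothing at all (cpp leaves a space) are assumptions made for the
illustration. Trigraphs and backslash-newline between the two
characters of a delimiter are not handled.]

```c
#include <string.h>

enum state { CODE, STRING, CHARCONST, COMMENT };

/*
 * Copy `in` to `out` with C comments removed.  `out` must have room
 * for strlen(in) + 1 bytes.  Comment delimiters are recognised only
 * in the CODE state, so string literals and character constants are
 * copied through untouched.
 */
void strip_comments(const char *in, char *out)
{
    enum state st = CODE;
    const char *p = in;

    while (*p != '\0') {
        switch (st) {
        case CODE:
            if (strncmp(p, "/*", 2) == 0) {   /* comment opener */
                st = COMMENT;
                p += 2;
                continue;
            }
            if (*p == '"')
                st = STRING;
            else if (*p == '\'')
                st = CHARCONST;
            *out++ = *p++;
            break;
        case STRING:
        case CHARCONST:
            if (*p == '\\' && p[1] != '\0') {
                *out++ = *p++;   /* the backslash */
                *out++ = *p++;   /* the character it escapes */
                continue;
            }
            if (*p == (st == STRING ? '"' : '\''))
                st = CODE;       /* closing quote */
            *out++ = *p++;
            break;
        case COMMENT:
            if (strncmp(p, "*/", 2) == 0) {   /* comment closer */
                st = CODE;
                p += 2;
            } else {
                p++;             /* a lone star stays inside the comment */
            }
            break;
        }
    }
    *out = '\0';
}
```

Because the closer is tested as a two-character pair at every position
inside a comment, an input such as /***/ ends at the last star and the
slash, which is the case the original '*' branch missed.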
pem@zyx.SE (Per-Erik Martin) (03/24/89)
In article <987@philmds.UUCP> leo@philmds.UUCP (Leo de Wit) writes: > >Appearances are deceptive, it won't handle trigraphs. For instance, try: >??' (trigraph for ^) and your code thinks it is in_char. > >What's worse, on systems where char isn't signed and EOF == -1, it will >fail to see EOF (suggestion: don't use a char to compare against EOF). > I simply didn't include trigraphs in the automaton and I'm well aware of the problem with EOF. The point I tried to make was that it's possible to solve a problem like that in, for example, C in a reasonable time, instead of using sed-scripts or lex (which is of no use outside the unix-world anyway). If you really want a comment stripper you can easily add trigraphs, handle EOF, etc. > >P.S. What's the benefit of having a separate program strip off comments anyway? Good question. None, as far as I know... -- ------------------------------------------------------------------------------- - Per-Erik Martin, ZYX Sweden AB, Bangardsgatan 13, S-753 20 Uppsala, Sweden - - Email: pem@zyx.SE - -------------------------------------------------------------------------------
Tim_CDC_Roberts@cup.portal.com (03/24/89)
I hereby revoke my suggestion that the preprocessor should suppress blank lines and use #line instead. In a typically homocentric fashion, I neglected to realize that even though it is more difficult for *ME* to read a preprocessor output with many blank lines, it is trivially easy for the compiler lexical analyzer to ignore them, since a "blank line" is only one byte long. Thanks to those who pointed this out. Tim_CDC_Roberts@cup.portal.com | Control Data... ...!sun!portal!cup.portal.com!tim_cdc_roberts | ...or it will control you.
rupley@arizona.edu (John Rupley) (03/24/89)
In article <1179@masscomp.UUCP>, ftw@masscomp.UUCP (Farrell Woods) writes:
>In article <9833@megaron.arizona.edu> rupley@arizona.edu (John Rupley) writes:
>>This one fails, too.  Try:
>>    /***/ hi there /**/
>
>Shouldn't it be a requirement that the program to be stripped at least compile?
>This example will generate a syntax error.

Aw, c'mon... be imaginative... replace "hi there" by a proper statement
or whatever:

    /***/ main() {printf("hi there\n");} /**/

Cpp strips the comments (properly) and passes the program text.  The
buggy C code, which was being discussed in the previous posting, strips
everything.  Both of the earlier Lex postings do it right, which would
seem to be the take-home lesson.

John Rupley
rupley!local@megaron.arizona.edu
daveb@gonzo.UUCP (Dave Brower) (03/24/89)
In article <16492@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <16078@cup.portal.com> Tim_CDC_Roberts@cup.portal.com writes:
>>When scanning the result of preprocessing a nontrivial C program with
>>many include files, one finds dozens (in some cases hundreds) of blank
>>lines. ...  Why not eliminate them and issue a #line instead?
>
>Why bother?  Typically there are at most a few tens in a row.  It is
>probably faster to count 20 blank lines than to process one
>`#line 1234' directive.

Yup, true enough for compilation.  It is sort of annoying though when
you need to look at the intermediate file to figure something out.

So, I offer this week's challenge:  Smallest program that will take
"blank line" style cpp output on stdin and send to stdout a scrunched
version with appropriate #line directives.  [f]lex, Yacc, [na]awk, sed,
perl, c, c++ are all acceptable.  This will be an amusing exercise in
typical text massaging that can be enlightening for many people.

Is this branching out of comp.lang.c?  Where should it go?

-dB
--
"I came here for an argument."  "Oh.  This is getting hit on the head"
{sun,mtxinu,amdahl,hoptoad}!rtech!gonzo!daveb	daveb@gonzo.uucp
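[One way the challenge above might be approached in C; a sketch under
stated assumptions and no contender for "smallest". The function name
scrunch and the constants MIN_RUN and MAXLINE are invented here. It
assumes that only completely empty lines count as blank, that every
input line is shorter than MAXLINE, and that cpp line markers have the
form `# N "file"`. Blank lines at the very end of the input are
dropped.]

```c
#include <stdio.h>
#include <string.h>

#define MIN_RUN 3      /* assumed threshold; shorter runs are copied as-is */
#define MAXLINE 4096   /* assumed upper bound on the length of a line */

/*
 * Copy `in` to `out`, replacing each run of MIN_RUN or more empty
 * lines with a single "#line N" directive, where N is the number of
 * the text line that follows the run.  cpp line markers are copied
 * through and reset the line count.
 */
void scrunch(FILE *in, FILE *out)
{
    char line[MAXLINE];
    long lineno = 1;   /* number of the line just read */
    long blanks = 0;   /* empty lines held back so far */
    long n;

    while (fgets(line, sizeof line, in) != NULL) {
        if (strcmp(line, "\n") == 0) {
            blanks++;          /* hold it back, decide later */
            lineno++;
            continue;
        }
        if (sscanf(line, "# %ld", &n) == 1) {
            /* a line marker resyncs by itself; drop pending blanks */
            blanks = 0;
            fputs(line, out);
            lineno = n;        /* the next line is line n */
            continue;
        }
        if (blanks >= MIN_RUN)
            fprintf(out, "#line %ld\n", lineno);
        else
            while (blanks-- > 0)
                putc('\n', out);
        blanks = 0;
        fputs(line, out);
        lineno++;
    }
}
```

Called as scrunch(stdin, stdout), this would turn a line of text, three
empty lines and a second line of text into the first line, "#line 5",
and the second line.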
mnc@m10ux.UUCP (Michael Condict) (03/24/89)
Oops, the previous lex script I posted for deleting comments from
C source code is incorrect -- it doesn't recognize: /***...**/
Here is a better one (simpler, too):

%%
\"([^\\"]*\\(.|\n))*[^\\"]*\"           ECHO;
"/*"([^*]|"*"+[^/*])*"*"*"*/"           ;
.                                       ECHO;

Okay, I promise to stop now.  (Unless there is a bug in this one.)
--
Michael Condict		{att|allegra}!m10ux!mnc
AT&T Bell Labs		(201)582-5911    MH 3B-416
Murray Hill, NJ
bill@twwells.uucp (T. William Wells) (03/26/89)
In article <9797@megaron.arizona.edu> rupley@arizona.edu (John Rupley) writes:
: A Lex source for uncommenting is attached (which I hope does not belie
: the remark above about hard to get the logic wrong :-).
Try it on a very long comment. You might discover an overflowed lex
buffer. On the other hand, this shouldn't be too hard to fix. Just do
for the comment what you did for the noncommented text.
---
Bill { uunet | novavax } !twwells!bill
(BTW, I may be looking for a new job sometime in the next few
months. If you know of a good one where I can be based in South
Florida do send me e-mail.)
rupley@arizona.edu (John Rupley) (03/26/89)
> In article <620@gonzo.UUCP>, daveb@gonzo.UUCP (Dave Brower) writes:
> So, I offer this week's challenge:  Smallest program that will take
> "blank line" style cpp output on stdin and send to stdout a scrunched
> version with appropriate #line directives.  [f]lex, Yacc, [na]awk, sed,
> perl, c, c++ are all acceptable.  This will be an amusing exercise in
> typical text massaging that can be enlightening for many people.

"Scrunching" is probably a matter of taste, with regard to the format
of the output.  So I am not sure what you, yourself, want.  But below is
a guess.  Lex, of course.  May not be portable, but it should work with
minor mods on other Unices.  Should be easy to modify for different
output format.

John Rupley
rupley!local@megaron.arizona.edu

%{
/*---------------------------start of text---------------------------*/
/*-
 * SCRUNCH.l
 *
 * Scrunch cpp output.
 *      In-Reply-To: daveb@gonzo.UUCP (Dave Brower)
 *      Message-ID: <620@gonzo.UUCP> #comp.lang.c
 *
 * Compress runs of "#" lines and blank lines, or runs of two or more
 * blank lines:
 *      (\n*# lineno "file"\n+)*   or   \n\n\n+
 * into a single line:
 *      #line lineno "file"\n
 * which is output before the next line of program text
 * (corresponding to line "lineno" of the source "file").
 * The values of "lineno" and "file" are adjusted for changes in
 * source resulting from #include statements.
 * Lines with whitespace are not considered blank and are passed.
 *
 * Compilation:
 *      lex scrunch.l
 *      cc -O lex.yy.c -ll -o scrunch
 *
 * Minimally tested with UNIX sys5r2 cpp only, as follows:
 * (a)  /lib/cpp -Dprocessor=1 lex.yy.c >scrunch.cpp   #specify your processor
 *      scrunch <scrunch.cpp >scrunch.cpp.c
 *      cc -O scrunch.cpp.c -ll
 *      cmp -l a.out scrunch            #should give date/name diffs only
 * (b)  compare line numbers in scrunch.cpp.c with lex.yy.c and scrunch.cpp
 *      (no differences stood out)
 *
 * Possible bugs:
 *      escaped newlines in macros.
 *      ????
 *
 * John Rupley
 * rupley!local@megaron.arizona.edu
 */
%}
        char file[BUFSIZ];
POUND   #[ ]+[0-9]+[ ]+\".*$
TEXT    [^#\n].*$
%START POUND TEXT
%%
<INITIAL>.      {unput(yytext[0]); BEGIN TEXT;}
<POUND>{POUND}  sscanf(yytext, "# %d %s", &yylineno, &file[0]);
<POUND>{TEXT}   {printf("#line %d %s\n", yylineno-1, file); ECHO; BEGIN TEXT;}
<POUND>\n       ;
<TEXT>{POUND}   {sscanf(yytext, "# %d %s", &yylineno, &file[0]); BEGIN POUND;}
<TEXT>\n{3,}    {printf("\n"); BEGIN POUND;}
<TEXT>{TEXT}|\n ECHO;
.               printf("\nERROR: file %s, line %d, char 0x%x=%c\n", file,
                        yylineno, (unsigned int) yytext[0], yytext[0]);
%%
/*----------------------------end of text-------------------------------*/
rupley@arizona.edu (John Rupley) (03/26/89)
In article <893@m10ux.UUCP>, mnc@m10ux.UUCP (Michael Condict) writes:
> Oops, the previous lex script I posted for deleting comments from
> C source code is incorrect -- it doesn't recognize: /***...**/
> Here is a better one (simpler, too):
>
> %%
> \"([^\\"]*\\(.|\n))*[^\\"]*\"           ECHO;
> "/*"([^*]|"*"+[^/*])*"*"*"*/"           ;
> .                                       ECHO;

You indeed fixed the /***/ error, but two errors remain.

First, no handling of single-quoted double quotes:

    main() {printf("%c\n", '"');/*gotcha*/printf("%c\n", '"');}

Second, your program crashes when uncommenting a real source file, with
a sizeable change history or whatever inside a comment.  You need at
least one state change, so a comment can be matched line-by-line, and
so not overflow a Lex buffer.  Both previous Lex postings did it right.
A third state, to handle quoted strings line-by-line, is perhaps
optional, and the previous postings differ here.

Apparently you missed the previous Lex postings, which I will be happy
to email you on request.

My argument, that it's difficult to make a logical error in coding this
problem in Lex, has now been demonstrated wrong (sob :-).  But at least
Lex is still outscoring straight C (faint praise :-?).

John Rupley
rupley!local@megaron.arizona.edu
danw@tekchips.LABS.TEK.COM (Daniel E. Wilson) (03/27/89)
Why is everyone obsessed with stripping comments from their C programs. Is this some new programming trend? :) Dan Wilson
rupley@arizona.edu (John Rupley) (03/28/89)
In article <795@twwells.uucp>, bill@twwells.uucp (T. William Wells) writes:
> In article <9797@megaron.arizona.edu> rupley@arizona.edu (John Rupley) writes:
> : A Lex source for uncommenting is attached (which I hope does not belie
> : the remark above about hard to get the logic wrong :-).
>
> Try it on a very long comment. You might discover an overflowed lex
> buffer. On the other hand, this shouldn't be too hard to fix. Just do
> for the comment what you did for the noncommented text.

Nope.... no problem.... comments are thrown away line-by-line, by
design, so that very long comments indeed do not blow the buffer.

A very long string, however, will overflow the buffer, but clearly this
is understood, and it can be viewed as a feature, although
idiosyncratic, as noted in <9888@megaron.arizona.edu>.  If you want to
handle strings differently, add another start condition (state) begun
by '"' and make explicit start condition 0 = <INITIAL>, or change the
size of the match buffer (yytext[]) by including in the definitions:

    %{
    #define YYLMAX 5000  /* or whatever */
    %}

John Rupley
rupley!local@megaron.arizona.edu
stil@nikhefh.hep.nl (Gertjan Stil) (03/29/89)
In article <4895@cbnews.ATT.COM> smk@cbnews.ATT.COM (Stephen M. Kennedy) writes: >In article <9900010@bradley> brian@bradley.UUCP writes: >> The following works in vi: :%s/\/\*.*\*\///g > >/* > * Unfortunately, multi-line comments aren't deleted. > */ What about the following command in vi: :%s/\/\*[.|\n]*\*\///g This will work for multi-line comments. Gertjan Stil <no signature yet>
maw@auc.UUCP (Michael A. Walker) (03/30/89)
In article <9887@megaron.arizona.edu>, rupley@arizona.edu (John Rupley) writes:
>
> > In article <620@gonzo.UUCP>, daveb@gonzo.UUCP (Dave Brower) writes:
> > So, I offer this week's challenge:  Smallest program that will take
> > "blank line" style cpp output on stdin and send to stdout a scrunched
> > version with appropriate #line directives.  [f]lex, Yacc, [na]awk, sed,
> > perl, c, c++ are all acceptable.  This will be an amusing exercise in
> > typical text massaging that can be enlightening for many people.
>
> "Scrunching" is probably a matter of taste, with regard to the format
> of the output.

I don't know what is meant by the term scrunching, but here is my entry to
the problem of removing comments in a C program.  YACCR (Yet Another C
Comment Remover :-) is a crazy looking lex specification that removes C
comments from a source file.  It also does not put out a lot of extra blank
lines that cpp does.  I have tested on most styles of C comments that I have
seen and it seems to work, but PLEASE no flames if it doesn't!!!!

In an earlier message, someone addressed the problem of a yytext overflow.
YACCR redefines the YYLMAX constant as 500, but you can test it with other
values.

To use:
    1. Save message in file called yaccr.l and edit this file to remove
       unwanted text.
    2. Type: lex yaccr.l
    3. Type: cc lex.yy.c -ll -o yaccr

It should then be ready to go.  Good luck.

---mike

EMAIL: ...!gatech!auc!rambro!maw

--------------------------------cut here--------------------------
%{
/*
**  Specification: YACCR
**  Description  : YACCR removes comments from C programs.
*/

#define CR 0x0d

#ifdef YYLMAX
#undef YYLMAX
#define YYLMAX 500
#endif
%}
%%
"/*""*"*("/*"*|[^*/]|[^*]"/"|"*"[^/])*"*"*"*/"  putchar(CR);
.                                               printf("%s",yytext);
--------------------------------cut here--------------------------