lfoard@wpi.wpi.edu (Lawrence C Foard) (03/22/89)
I just made this C comment stripper. I tried it on itself and it works OK. If anyone finds code it pukes on, tell me (there is probably still something I missed).

----------------------cut here-------------------
/* Public domain C comment stripper created by Lawrence Foard */
#include <stdio.h>
char *a="/* this is a test 'of the emergency \" comment stripper \\ \'*/";
/* this is a'nasty\' "comment" meant / * to really confuse it"*//*\*/
int no_com()
{
  int c;
  static int quote=0,squote=0,slash=0;
  c=getc(stdin);
  if (slash || (c=='\\'))
  {
    slash=!slash;
    return(c);
  }
  if ((quote^=((c=='"') && !squote)) || (squote^=((c=='\''/*\ and right here two \*/) && !quote)))
    return(c);
  if (c=='/')
    if ((c=getc(stdin))!='*')
    {
      ungetc(c,stdin);
      return('/');
    }
    else
    {
      do
        while(getc(stdin)!='*');
      while(getc(stdin)!='/');
      return(no_com());
    }
  return(c);
}

int main(void)
{
  int c;
  while((c=no_com())!=EOF)
    fputc(c,stdout);
}
--
Disclaimer: My school does not share my views about FORTRAN.
FORTRAN does not share my views about my school.
rupley@arizona.edu (John Rupley) (03/24/89)
In article <1453@wpi.wpi.edu>, lfoard@wpi.wpi.edu (Lawrence C Foard) writes:
> I just made this C comment stripper, I tried it on it self and it works
> ok. If any one finds code it pukes on tell me (there is probably still
> something I missed).

It pukes ... try:

/***/
main() {printf("hi there\n");}
/* */

Score, anyone? (recent postings tested on K&R-I-syntax code)

    sed    1/1 correct
    Lex    2/2 correct
    C      2/2 wrong

Hmmm.

John Rupley
rupley!local@megaron.arizona.edu
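Rupley's test trips the comment-skipping loop in the first program: the character read to check for '/' is thrown away even when it is another '*', so in `/***/` the closing pair is never seen and the scan swallows the following code up to the next `*/`. A minimal sketch of a comment-end scan that avoids this, assuming the input is already in a NUL-terminated string; `skip_comment_body` is a hypothetical name, not part of either posted program. It remembers whether the previous character was a star instead of discarding it:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given a pointer just past an opening slash-star,
 * return a pointer just past the matching star-slash, or NULL if the
 * comment is unterminated. A star is remembered rather than discarded,
 * so a run of stars followed by a slash still closes the comment.
 */
const char *skip_comment_body(const char *p)
{
    int prev_star = 0;          /* was the previous character a '*'? */

    assert(p != NULL);
    for (; *p != '\0'; p++) {
        if (prev_star && *p == '/')
            return p + 1;       /* found the closing star-slash */
        prev_star = (*p == '*');
    }
    return NULL;                /* end of input inside a comment */
}
```

Called on the text after the opening `/*` of Rupley's first line (that is, `**/` followed by the `main()` line), it returns a pointer to the newline that ends the comment line, leaving the code intact.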
chris@mimsy.UUCP (Chris Torek) (03/26/89)
In article <9864@megaron.arizona.edu> rupley@arizona.edu (John Rupley) writes:
>Score, anyone? (recent postings tested on K&R-I-syntax code)
>
>    sed    1/1 correct
>    Lex    2/2 correct
>    C      2/2 wrong

This sounds like a CHALLENGE! :-)

I wrote the following working against the ten-minute spaghetti clock. It is slightly tested, and probably works, with the exception of

    #include <some/*weird:file[syntax]>

(and unclosed comments, etc., in included files). It is more permissive than real C (allowing newlines in string and character constants, and allowing infinitely long character constants) but should not get anything wrong that cpp gets right. Of course, there are no comments in it. :-)

#include <stdio.h>
#include <stdlib.h>

enum states { none, slash, quote, qquote, comment, cstar };

int
main(void)
{
    register int c, q = 0;
    register enum states state = none;

    while ((c = getchar()) != EOF) {
        switch (state) {

        case none:
            if (c == '"' || c == '\'') {
                state = quote;
                q = c;
            } else if (c == '/') {
                state = slash;
                continue;
            }
            break;

        case slash:
            if (c == '*') {
                state = comment;
                continue;
            }
            state = none;
            (void) putchar('/');
            break;

        case quote:
            if (c == '\\')
                state = qquote;
            else if (c == q)
                state = none;
            break;

        case qquote:
            state = quote;
            break;

        case comment:
            if (c == '*')
                state = cstar;
            continue;

        case cstar:
            if (c != '*')
                state = c == '/' ? none : comment;
            continue;

        default:
            fprintf(stderr, "impossible state %d\n", state);
            exit(1);
        }
        (void) putchar(c);
    }
    if (state != none)
        fprintf(stderr,
            "warning: file ended with unterminated %s\n",
            state == quote || state == qquote ?
                (q=='"' ? "string" : "character constant") :
                "comment");
    exit(0);
}
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu    Path: uunet!mimsy!chris
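For experimenting, the same state machine can be run over a string instead of stdin. The sketch below is an adaptation, not Torek's posted code: `strip_comments` is a hypothetical name, it writes into a caller-supplied buffer, and it differs in two places. In the posted program the character after a pending '/' is printed without being re-examined, so `a//* c */b` (a divided by b, with a comment in between) passes through unstripped, and a '/' at the very end of the input is dropped. Here the slash state falls through into the none state, and a trailing '/' is still emitted:

```c
#include <assert.h>
#include <stddef.h>

/* Same six states as the posted program. */
enum states { none, slash, quote, qquote, comment, cstar };

/*
 * Hypothetical string-to-string variant of the state machine: copies src
 * into dst with comments deleted and returns the final state (none means
 * the input ended cleanly). Output is never longer than input, so dst
 * needs room for strlen(src) + 1 bytes.
 */
enum states strip_comments(const char *src, char *dst)
{
    enum states state = none;
    char q = 0;                 /* quote character that opened the literal */
    char c;

    assert(src != NULL && dst != NULL);
    while ((c = *src++) != '\0') {
        switch (state) {
        case slash:
            if (c == '*') {
                state = comment;
                continue;
            }
            *dst++ = '/';       /* the pending slash was not a comment */
            state = none;
            /* c has not been examined yet, so handle it as in state none */
            /* FALLTHROUGH */
        case none:
            if (c == '"' || c == '\'') {
                state = quote;
                q = c;
            } else if (c == '/') {
                state = slash;
                continue;
            }
            break;
        case quote:
            if (c == '\\')
                state = qquote;
            else if (c == q)
                state = none;
            break;
        case qquote:
            state = quote;
            break;
        case comment:
            if (c == '*')
                state = cstar;
            continue;
        case cstar:
            if (c != '*')
                state = (c == '/') ? none : comment;
            continue;
        }
        *dst++ = c;
    }
    if (state == slash) {       /* input ended with a lone '/' */
        *dst++ = '/';
        state = none;
    }
    *dst = '\0';
    return state;
}
```

As in the posted program, a comment is deleted outright rather than replaced by a space, so `int/**/x;` comes out as `intx;`; an ANSI C preprocessor would instead treat the comment as whitespace.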
rupley@arizona.edu (John Rupley) (03/27/89)
In article <16539@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> In article <9864@megaron.arizona.edu> rupley@arizona.edu (John Rupley) writes:
> >Score, anyone? (recent postings tested on K&R-I-syntax code)
> >
> >    sed    1/1 correct
> >    Lex    2/2 correct
> >    C      2/2 wrong
>
> This sounds like a CHALLENGE! :-)

Unclear again (sob :-). Meant it as a comment, implying the VIRTUES of Lex (and even sed :-) for pattern matching.

Contest-wise, your C code is the first to be correct as initially posted, it runs faster than the previous postings (after correction of the latter), and one can follow the neat state-machine implementation at first reading.

> I wrote the following working against the ten-minute spaghetti clock.

Wow! It took me longer to test it.

For what it's worth (as COMMENTARY -- please, no contest), counting new postings, too:

    sed, awk    2/3 correct as first posted (test vs K&R-I-type code)
    Lex         2/3
    C           1/3

Hmmm. Conclusion? The probability of any particular piece of code being correct is independent of language and is a toss-up (:-)?

But I still like Lex for this particular type of problem.

John Rupley
rupley!local@megaron.arizona.edu