[comp.lang.c] C comment stripper

lfoard@wpi.wpi.edu (Lawrence C Foard) (03/22/89)

I just wrote this C comment stripper. I tried it on itself and it works
OK. If anyone finds code it pukes on, tell me (there is probably still
something I missed).

----------------------cut here-------------------
/* Public domain C comment stripper created by Lawrence Foard */
#include <stdio.h>
char *a="/* this is a test 'of the emergency \" comment stripper \\ \'*/";
/* this is a'nasty\' "comment" meant / * to really confuse it"*//*\*/
int no_com()
 {
  int c; 
  static int quote=0,squote=0,slash=0;
  c=getc(stdin);
  if (slash || (c=='\\'))
   {
    slash=!slash;
    return(c);
   }
  if ((quote^=((c=='"') && !squote)) ||
      (squote^=((c=='\''/*\ and right here two \*/) && !quote)))
   return(c);
  if (c=='/') 
   if ((c=getc(stdin))!='*')
    {
     ungetc(c,stdin);
     return('/');
    }
   else
    {
     do
      while(getc(stdin)!='*');
     while(getc(stdin)!='/');
     return(no_com());
    }
   return(c);
  }  
   
main()
 {
  int c;
  while((c=no_com())!=EOF)
   fputc(c,stdout);
 }  
 -- 
Disclaimer: My school does not share my views about FORTRAN.
            FORTRAN does not share my views about my school.

rupley@arizona.edu (John Rupley) (03/24/89)

In article <1453@wpi.wpi.edu>, lfoard@wpi.wpi.edu (Lawrence C Foard) writes:
> I just wrote this C comment stripper. I tried it on itself and it works
> OK. If anyone finds code it pukes on, tell me (there is probably still
> something I missed).

It pukes ... try:

	/***/ main() {printf("hi there\n");} /* */
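
Why this input trips the posted code, as far as can be read from the source:
the nested loop first reads up to a '*', then reads one more character to
test for '/'. In "/***/" that extra read swallows the second '*', so the '/'
that follows is seen with no '*' pending, and scanning runs on into the next
comment. Neither loop checks for EOF, so an unclosed comment never
terminates. Below is a minimal sketch of one way to do the skip, remembering
whether the previous character was a '*'. The helper name skip_comment and
the choice to scan a string rather than stdin are assumptions made for
illustration; they are not from the original post.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not from the original post.  s points just past an
 * opening slash-star.  Returns the number of characters up to and including
 * the closing star-slash, or 0 if the comment never closes.
 */
size_t skip_comment(const char *s)
{
	size_t i;
	int star = 0;	/* nonzero when the previous character was '*' */

	for (i = 0; s[i] != '\0'; i++) {
		if (star && s[i] == '/')
			return i + 1;
		star = (s[i] == '*');
	}
	return 0;	/* ran off the end: unterminated comment */
}
```

A getc()-based version could keep the same flag, provided it also stops
when getc() returns EOF.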

Score, anyone? (recent postings tested on K&R-I-syntax code)

	sed        1/1 correct
	Lex        2/2 correct
	C          2/2 wrong

Hmmm.

John Rupley
rupley!local@megaron.arizona.edu

chris@mimsy.UUCP (Chris Torek) (03/26/89)

In article <9864@megaron.arizona.edu> rupley@arizona.edu (John Rupley) writes:
>Score, anyone? (recent postings tested on K&R-I-syntax code)
>
>	sed        1/1 correct
>	Lex        2/2 correct
>	C          2/2 wrong

This sounds like a CHALLENGE!  :-)

I wrote the following working against the ten-minute spaghetti clock.
It is slightly tested, and probably works, with the exception of

	#include <some/*weird:file[syntax]>

(and unclosed comments, etc., in included files).  It is more
permissive than real C (allowing newlines in string and character
constants, and allowing infinitely long character constants) but should
not get anything wrong that cpp gets right.

Of course, there are no comments in it. :-)


#include <stdio.h>
#include <stdlib.h>

enum states { none, slash, quote, qquote, comment, cstar };

int main(void)
{
	register int c, q = 0;
	register enum states state = none;

	while ((c = getchar()) != EOF) {
		switch (state) {
		case none:
			if (c == '"' || c == '\'') {
				state = quote;
				q = c;
			} else if (c == '/') {
				state = slash;
				continue;
			}
			break;
		case slash:
			if (c == '*') {
				state = comment;
				continue;
			}
			(void) putchar('/');
			if (c == '/')
				continue;
			state = none;
			if (c == '"' || c == '\'') {
				state = quote;
				q = c;
			}
			break;
		case quote:
			if (c == '\\')
				state = qquote;
			else if (c == q)
				state = none;
			break;
		case qquote:
			state = quote;
			break;
		case comment:
			if (c == '*')
				state = cstar;
			continue;
		case cstar:
			if (c != '*')
				state = c == '/' ? none : comment;
			continue;
		default:
			fprintf(stderr, "impossible state %d\n", state);
			exit(1);
		}
		(void) putchar(c);
	}
	if (state == slash)
		(void) putchar('/');
	else if (state != none)
		fprintf(stderr, "warning: file ended with unterminated %s\n",
			state == quote || state == qquote ?
				(q=='"' ? "string" : "character constant") :
			"comment");
	exit(0);
}
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

rupley@arizona.edu (John Rupley) (03/27/89)

In article <16539@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> In article <9864@megaron.arizona.edu> rupley@arizona.edu (John Rupley) writes:
> >Score, anyone? (recent postings tested on K&R-I-syntax code)
> >
> >	sed        1/1 correct
> >	Lex        2/2 correct
> >	C          2/2 wrong
> 
> This sounds like a CHALLENGE!  :-)

Unclear again (sob :-).  I meant it as a comment on the VIRTUES of
Lex (and even sed :-) for pattern matching.  Contest-wise, your C code
is the first that was correct as initially posted; it runs faster than the
previous postings (after those were corrected); and one can follow the neat
state-machine implementation at first reading.
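
For anyone who wants to rerun the test line from earlier in the thread
without piping files around, here is a rough sketch of the same six states
recast as a function over an in-memory string. It is a reworking for
illustration, not Chris Torek's code: the name strip_comments, the
upper-case state names, and the output-buffer interface are all assumptions.
After a '/' that does not open a comment, the sketch also looks at the
following character for a quote or another '/', so that inputs such as
a/'*' and //* are tracked.

```c
#include <stddef.h>

enum st { NONE, SLASH, QUOTE, QQUOTE, COMMENT, CSTAR };

/*
 * Hypothetical string-to-string variant, for testing only.  Copies in to
 * out with K&R-style comments removed and returns the number of characters
 * written.  out must have room for strlen(in) + 1 bytes.
 */
size_t strip_comments(const char *in, char *out)
{
	enum st state = NONE;
	size_t n = 0;
	int c, q = 0;

	while ((c = (unsigned char)*in++) != '\0') {
		switch (state) {
		case NONE:
			if (c == '"' || c == '\'') {
				state = QUOTE;
				q = c;		/* remember which quote closes it */
			} else if (c == '/') {
				state = SLASH;	/* hold the '/' for one character */
				continue;
			}
			break;
		case SLASH:
			if (c == '*') {
				state = COMMENT;
				continue;
			}
			out[n++] = '/';		/* the held '/' was ordinary text */
			if (c == '/')
				continue;	/* this '/' may open a comment itself */
			state = NONE;
			if (c == '"' || c == '\'') {
				state = QUOTE;
				q = c;
			}
			break;
		case QUOTE:
			if (c == '\\')
				state = QQUOTE;	/* next character is escaped */
			else if (c == q)
				state = NONE;
			break;
		case QQUOTE:
			state = QUOTE;
			break;
		case COMMENT:
			if (c == '*')
				state = CSTAR;
			continue;
		case CSTAR:
			if (c != '*')
				state = (c == '/') ? NONE : COMMENT;
			continue;
		}
		out[n++] = (char)c;
	}
	if (state == SLASH)
		out[n++] = '/';		/* input ended on a lone '/' */
	out[n] = '\0';
	return n;
}
```

Fed the line "/***/ main() {printf("hi there\n");} /* */", this should leave
only the code between the two comments, including the blanks on either side.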

> I wrote the following working against the ten-minute spaghetti clock.

Wow! It took me longer to test it.

For what it's worth (as COMMENTARY -- please, no contest), counting new
postings, too:

      sed, awk        2/3 correct as first posted (test vs K&R-I-type code)
      Lex             2/3 
      C               1/3

Hmmm.  Conclusion?  The probability of any particular piece of code being
correct is independent of language and is a toss-up (:-)?  

But I still like Lex for this particular type of problem.

John Rupley
rupley!local@megaron.arizona.edu