[comp.lang.c] Stripping C comments: what about quotes??

krazy@claris.com (Jeff Erickson) (03/18/89)

From article <4221@omepd.UUCP>, by merlyn@intelob.intel.com (Randal L. Schwartz @ Stonehenge):
> The regexp that matches comments looks like (in egrep/lex notation):
> 
>   [/][*]([*]*[^*/])*[*]+[/]
> 
> (I use [X] here instead of \X because I hate backslashes...).
>
The problem with that expression is that it doesn't account for quotes.
For example:

	printf(foo ? "/*" : "*/");

gets turned into:

	printf(foo ? "");

if you aren't careful.  That regular expression handles simple C comments,
but only if you assume there are no quoted strings (it also misses any comment
containing a '/', since [^*/] excludes it).  I'm not sure, but I don't think
you can find REAL C comments (ones with no part inside quotes) with a regular
expression search, or a series of them.

Handle the following cases:

	printf(foo ? "/*" : "*/");
	printf("/*");   /*/ hi! /*/
	char foo[] = /*/"bar /*/"baz /*/";
	#define MYNAME "/*/"/*/"/*/"Jeff/*/"/*/"Ernie/*/"/*/"

All of these are legal under ANSI C.  Only the last is questionable under
classic C, because it relies on "x""y" being turned into "xy".

The last one should translate into:

	#define MYNAME "/*/" "Jeff/*/" "/*/"
or
	#define MYNAME "/*/Jeff/*//*/"
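
For what it's worth, one way to get all four cases right is a small
hand-written scanner that tracks whether it is inside a string, a character
constant, or a comment.  What follows is only a minimal sketch under some
assumptions: the name strip_comments is made up for illustration, each comment
is replaced by a single space (which is why the last case comes out in the
first of the two forms above), and only /* */ comments are recognized.  It does
not try to handle trigraphs or backslash-newline splicing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the quoted article): copy src to dst,
 * replacing every comment with one space.  Quoted strings and character
 * constants are copied through untouched.  dst must have room for
 * strlen(src) + 1 bytes; the output is never longer than the input.
 * Returns the number of characters written, not counting the '\0'.
 */
size_t strip_comments(const char *src, char *dst)
{
    enum { CODE, SLASH, COMMENT, STAR, STRING, CHARLIT } state = CODE;
    int escaped = 0;    /* previous char inside a literal was a backslash */
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; src++) {
        char c = *src;

        switch (state) {
        case CODE:
            if (c == '/') {
                state = SLASH;          /* might start a comment: hold it */
            } else {
                if (c == '"')
                    state = STRING;
                else if (c == '\'')
                    state = CHARLIT;
                dst[n++] = c;
            }
            break;
        case SLASH:
            if (c == '*') {
                state = COMMENT;
                dst[n++] = ' ';         /* the whole comment becomes a space */
            } else if (c == '/') {
                dst[n++] = '/';         /* emit held slash, hold the new one */
            } else {
                dst[n++] = '/';         /* it was just a division sign */
                dst[n++] = c;
                state = (c == '"') ? STRING : (c == '\'') ? CHARLIT : CODE;
            }
            break;
        case COMMENT:
            if (c == '*')
                state = STAR;
            break;
        case STAR:                      /* inside a comment, just saw '*' */
            if (c == '/')
                state = CODE;
            else if (c != '*')
                state = COMMENT;
            break;
        case STRING:
        case CHARLIT:
            dst[n++] = c;
            if (escaped)
                escaped = 0;            /* escaped char can't end the literal */
            else if (c == '\\')
                escaped = 1;
            else if (c == (state == STRING ? '"' : '\''))
                state = CODE;
            break;
        }
    }
    if (state == SLASH)
        dst[n++] = '/';                 /* input ended on a lone slash */
    dst[n] = '\0';
    return n;
}
```

Feed it each of the four test lines with a buffer at least as large as the
input and compare the result with strcmp().  The first line should come back
unchanged, and the #define should come back with its two comments reduced to
the spaces between the three string literals.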

Happy decommenting!!!

-- 
Jeff Erickson     \  Internet: krazy@claris.com          AppleLink: Erickson4
Claris Corporation \      UUCP: {ames,apple,portal,sun,voder}!claris!krazy
415/960-2693        \________________________________________________________
____________________/        "I'm so heppy I'm mizzabil!" -- Krazy Kat