[comp.sys.sun] C comment remover

mukul@hi-csc.honeywell.com (Mukul Agrawal) (12/09/89)

I need a program that removes all the comments from a C program.  I use a
Sun 3/50 running SunOS 4.0.3.  I need this in order to count source lines
of code excluding comments.

Is there a program in any of the Usenet archives or elsewhere that does
this?  Are there any options to other utilities (such as indent, cxref
or cpp) that will do this for me?

Please mail me the replies.

Thanks in advance.
-- Mukul

moraes@cs.toronto.edu (Mark Moraes) (12/22/89)

In Sun-Spots-Digest Volume 8, Issue 219, Mukul Agrawal asks for a tool to
remove C comments.

The following sed script, posted by maart@cs.vu.nl (Maarten Litmaath) to
comp.unix.wizards, seems to remove comments cleanly and correctly.

X_From: maart@cs.vu.nl (Maarten Litmaath)
X_Subject: Sed wins! It IS possible to strip C comments with 1 sed command!

leo@philmds.UUCP (Leo de Wit) writes:
\Can it be proven to be impossible (that is, deleting the comments
\with one sed command - multi-line comments not considered) ?

No, because the script below WILL do it. It won't touch "/*...*/" inside
strings. Multi-line comments ARE considered and handled OK.
One can either use "sed -f script" or "sed -n '<contents of script>'".
After the script some test input follows (an awful but valid C program).
Spoiler: the sequence

	H
	x
	s/\n\(.\).*/\1/
	x
	s/.//

deletes the first character of the pattern space and appends it to the hold
space; the hold space thus accumulates the characters that are to be kept.
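
To see the idiom in isolation, here is a small sketch that runs the five
commands once on the single input line "abc" and then prints both spaces.
The input line and the two trailing print steps are illustrative additions,
not part of the posted script.

```shell
# One pass of the five-command idiom on the line "abc".
# -n suppresses automatic printing; each -e supplies one sed command.
result=$(printf 'abc\n' | sed -n \
    -e 'H' \
    -e 'x' \
    -e 's/\n\(.\).*/\1/' \
    -e 'x' \
    -e 's/.//' \
    -e 'p' \
    -e 'x' \
    -e 'p')
# H   : hold = "" + newline + "abc"
# x   : pattern = newline + "abc", hold = "abc"
# s   : pattern = "a" (old hold contents plus the first character)
# x   : pattern = "abc", hold = "a"
# s   : pattern = "bc"
# p   : prints the pattern space, then x/p prints the hold space
printf '%s\n' "$result"
```

This prints "bc" (what is left in the pattern space) followed by "a" (what
has been collected in the hold space).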

----------8<----------8<----------8<----------8<----------8<----------
#n

: loop
/^$/{
	x
	p
	n
	b loop
}
/^"/{
	: double
	/^$/{
		x
		p
		n
		b double
	}
	H
	x
	s/\n\(.\).*/\1/
	x
	s/.//
	/^"/b break
	/^\\/{
		H
		x
		s/\n\(.\).*/\1/
		x
		s/.//
	}
	b double
}
/^'/{
	: single
	/^$/{
		x
		p
		n
		b single
	}
	H
	x
	s/\n\(.\).*/\1/
	x
	s/.//
	/^'/b break
	/^\\/{
		H
		x
		s/\n\(.\).*/\1/
		x
		s/.//
	}
	b single
}
/^\\/{
	H
	x
	s/\n\(.\).*/\1/
	x
	s/.//
	/^$/b loop
	b break
}
/^\/\*/{
	s/.//
	: comment
	s/.//
	/^$/n
	/^*\//{
		s/..//
		b loop
	}
	b comment
}
: break
H
x
s/\n\(.\).*/\1/
x
s/.//
b loop

----------8<----------8<----------8<----------8<----------8<----------

main()
{
	/* this
	 * is
	   a comment
	 */
	char /* Z /* Z / Z * Z /*/ *s = "/*", /* Z /* Z / Z * Z **/ c = '*',
		d = '/', f = '\\', g = '\'',

		*q = "*/", *p = "\
/* these characters are\
 inside a string \"\\\
*/";
	int	i = 12 / 2 * 3;

	exit(0);
}

maart@cs.vu.nl (Maarten Litmaath) (01/10/90)

In article <4041@brazos.Rice.edu> moraes@cs.toronto.edu (Mark Moraes)
posted a (slow!) C comment remover I once wrote for fun in sed.  Sed *is*
a good tool to remove C comments, but it's clearer (and faster!) to divide
the task into 3 parts, each having its own sed script.  Furthermore there
was a small bug in the original script: it replaced each comment by *zero*
spaces instead of *one* space (easily fixed).  Anyway, below are the 3
scripts, to be used as follows:

	sed -f Cstrip.1.sed [file] | sed -f Cstrip.2.sed | sed -f Cstrip.3.sed
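
Why the zero-versus-one-space difference matters can be shown with a
deliberately simplified substitution.  This is NOT one of the three Cstrip
scripts: it is an illustrative one-liner that only handles a comment that
opens and closes on one line and contains no '*' in its body, and the
sample input is made up.  C itself treats each comment as a single space,
so deleting it outright can fuse two tokens.

```shell
# Illustration only: a naive single-line comment stripper, run twice
# on a made-up input where the comment is the only token separator.
src='int/**/x;'

# Replace the comment with nothing: the two tokens run together.
zero=$(printf '%s\n' "$src" | sed 's|/\*[^*]*\*/||g')

# Replace the comment with one space: the tokens stay separate.
one=$(printf '%s\n' "$src" | sed 's|/\*[^*]*\*/| |g')

printf 'zero spaces: %s\n' "$zero"
printf 'one space:   %s\n' "$one"
```

The first form yields "intx;", which no longer declares anything; the
second yields "int x;".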

A still more twisted sample file is included.

[[Ed's Note: Placed in archives at Rice.]]

FTP:	Hostname : titan.rice.edu (128.42.1.30)
	Directory: sun-source
	Filename : crem.shar

Archive Server Address: archive-server@rice.edu
Archive Server Command: send sun-source crem.shar