[comp.lang.c] Sed wins! It IS possible to strip C comments with 1 sed command!

maart@cs.vu.nl (Maarten Litmaath) (03/21/89)

leo@philmds.UUCP (Leo de Wit) writes:
\Can it be proven to be impossible (that is, deleting the comments
\with one sed command - multi-line comments not considered) ?

No, because the script below WILL do it. It won't touch "/*...*/" inside
strings. Multi-line comments ARE considered and handled OK.
One can either use "sed -f script" or "sed -n '<contents of script>'".
After the script some test input follows (an awful but valid C program).
Spoiler: the sequence

	H
	x
	s/\n\(.\).*/\1/
	x
	s/.//

moves the first character of the pattern space to the end of the hold
space; the hold space accumulates the characters that are to be kept.
----------8<----------8<----------8<----------8<----------8<----------
#n

: loop
/^$/{
	x
	p
	n
	b loop
}
/^"/{
	: double
	/^$/{
		x
		p
		n
		b double
	}
	H
	x
	s/\n\(.\).*/\1/
	x
	s/.//
	/^"/b break
	/^\\/{
		H
		x
		s/\n\(.\).*/\1/
		x
		s/.//
	}
	b double
}
/^'/{
	: single
	/^$/{
		x
		p
		n
		b single
	}
	H
	x
	s/\n\(.\).*/\1/
	x
	s/.//
	/^'/b break
	/^\\/{
		H
		x
		s/\n\(.\).*/\1/
		x
		s/.//
	}
	b single
}
/^\\/{
	H
	x
	s/\n\(.\).*/\1/
	x
	s/.//
	/^$/b loop
	b break
}
/^\/\*/{
	s/.//
	: comment
	s/.//
	/^$/n
	/^*\//{
		s/..//
		b loop
	}
	b comment
}
: break
H
x
s/\n\(.\).*/\1/
x
s/.//
b loop
----------8<----------8<----------8<----------8<----------8<----------
main()
{
	/* this
	 * is
	   a comment
	 */
	char /* Z /* Z / Z * Z /*/ *s = "/*", /* Z /* Z / Z * Z **/ c = '*',
		d = '/', f = '\\', g = '\'',

		*q = "*/", *p = "\
/* these characters are\
 inside a string \"\\\
*/";
	int	i = 12 / 2 * 3;

	exit(0);
}
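
The states the script moves through (ordinary code, inside a double-quoted
string, inside a character constant, inside a comment) can also be written
as a small state machine in C. The sketch below is an assumption-laden
illustration, not a translation of the script: the name strip_comments is
made up, the whole input is taken as one NUL-terminated string, and a
backslash outside a string or character constant gets no special treatment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical function (not from the post): copy `src` to `dst` with
 * C comments removed.  `dst` must hold at least strlen(src) + 1 bytes.
 * Comment markers inside string literals and character constants are
 * left alone.
 */
void strip_comments(const char *src, char *dst)
{
    enum { CODE, DQUOTE, SQUOTE, COMMENT } state = CODE;
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL);
    while (src[i] != '\0') {
        char c = src[i];

        switch (state) {
        case CODE:
            if (c == '/' && src[i + 1] == '*') {
                state = COMMENT;
                i += 2;         /* skip both opener characters at once */
                continue;
            }
            if (c == '"')
                state = DQUOTE;
            else if (c == '\'')
                state = SQUOTE;
            break;
        case DQUOTE:
        case SQUOTE:
            if (c == '\\' && src[i + 1] != '\0') {
                dst[o++] = c;   /* keep the backslash ... */
                c = src[++i];   /* ... and the character it escapes */
            } else if (c == (state == DQUOTE ? '"' : '\'')) {
                state = CODE;   /* closing quote; copied below */
            }
            break;
        case COMMENT:
            if (c == '*' && src[i + 1] == '/') {
                state = CODE;
                i += 2;         /* skip the closer */
            } else {
                i++;
            }
            continue;           /* nothing inside a comment is copied */
        }
        dst[o++] = c;
        i++;
    }
    dst[o] = '\0';
}
```

Skipping the two opener characters together matters for input such as the
"/*/" sequences in the test program: the slash after the star must not be
taken as the end of the comment.
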
-- 
 Modeless editors and strong typing:   |Maarten Litmaath @ VU Amsterdam:
   both for people with weak memories. |maart@cs.vu.nl, mcvax!botter!maart

envbvs@epb2.lbl.gov (Brian V. Smith) (03/22/89)

> leo@philmds.UUCP (Leo de Wit) writes:
> \Can it be proven to be impossible (that is, deleting the comments
> \with one sed command - multi-line comments not considered) ?
> 

THIS must be the method EVERYONE uses to strip comments from every C program
I have ever seen in the public domain or from Berkeley/DEC/Sun/etc...

(flame-suit ready)

There must be either a conspiracy against, or a primeval fear of including
comments in C programs.

chris@mimsy.UUCP (Chris Torek) (03/22/89)

In article <2164@helios.ee.lbl.gov> envbvs@epb2.lbl.gov (Brian V. Smith)
writes:
>THIS [sed script] must be the method EVERYONE uses to strip comments from
>every C program I have ever seen in the public domain or from Berkeley/
>DEC/Sun/etc... ...  There must be either a conspiracy against, or a
>primeval fear of including comments in C programs.

This accusation is mostly unfounded, but there is at least one known
case of someone at Berkeley removing comments someone else at Berkeley
had added.  No doubt I am unaware of various reasons for each of these
actions....

Anyway, we have already had the `how much commenting is proper' war
in comp.lang.misc or comp.misc.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

davidsen@steinmetz.ge.com (Wm. E. Davidsen Jr) (03/24/89)

  Once upon a time someone gave me a program written in lex and C which
(a) removed all comments, (b) replaced every symbol with a (non-keyword)
token from /usr/dict/words, and (c) formatted the source into N column
lines, refraining only from breaking strings over a line (strings which
*were* broken over lines became single l_o_n_g lines).

  It was written when a client of the author found it was cheaper to get
the source by lawsuit than to pay for it.

  I have sent a note to the last address I have for the original author
asking permission to post it here.
-- 
	bill davidsen		(wedu@crd.GE.COM)
  {uunet | philabs}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

dwc@homxc.ATT.COM (Malaclypse the Elder) (03/24/89)

In article <2164@helios.ee.lbl.gov>, envbvs@epb2.lbl.gov (Brian V. Smith) writes:
> 
> There must be either a conspiracy against, or a primeval fear of including
> comments in C programs.

all my C programs are ENTIRELY comments.  i then wait for the
maintainers to correctly debug my programs :-)

danny chen
att!homxc!dwc