[comp.lang.c] C comment stripper shell script? -> use sed pipeline

maart@cs.vu.nl (Maarten Litmaath) (03/25/89)

jim@bilpin.UUCP (Jim G) writes:
#{ zapcom.sh }
#  Remove comments from a C program
#  sed removes comment strings which begin and end on the same line
#  awk removes comment strings which extend across multiple lines
#  sed/awk both handle nesting of comments within their context

Aha! You're using a SHELL script! Well, in that case there's another word
for my `sed approach' :-)
No awk necessary. This pipeline is reasonably fast too!
Usage:
	sed -f Cstrip.1.sed foo.c | sed -f Cstrip.2.sed | sed -f Cstrip.3.sed

: This is a shar archive.  Extract with sh, not csh.
: This archive ends with exit, so do not worry about trailing junk.
: --------------------------- cut here --------------------------
PATH=/bin:/usr/bin:/usr/ucb
echo Extracting 'Cstrip.1.sed'
sed 's/^X//' > 'Cstrip.1.sed' << '+ END-OF-FILE ''Cstrip.1.sed'
X#n
Xs/\(.\)/\1\
X/g
Xs/$/==/p
+ END-OF-FILE Cstrip.1.sed
chmod 'u=rw,g=r,o=r' 'Cstrip.1.sed'
set `wc -c 'Cstrip.1.sed'`
count=$1
case $count in
27)	:;;
*)	echo 'Bad character count in ''Cstrip.1.sed' >&2
		echo 'Count should be 27' >&2
esac
echo Extracting 'Cstrip.2.sed'
sed 's/^X//' > 'Cstrip.2.sed' << '+ END-OF-FILE ''Cstrip.2.sed'
X#n
X/"/{
X	: L0
X	p
X	n
X	/"/{
X		p
X		b
X	}
X	/\\/{
X		p
X		n
X	}
X	b L0
X}
X/'/{
X	: L1
X	p
X	n
X	/'/{
X		p
X		b
X	}
X	/\\/{
X		p
X		n
X	}
X	b L1
X}
X/\\/{
X	p
X	n
X	p
X	b
X}
X/\//{
X	h
X	n
X	/*/{
X		: L2
X		n
X		: L3
X		/*/{
X			n
X			/\//b
X			b L3
X		}
X		b L2
X	}
X	H
X	g
X}
Xp
+ END-OF-FILE Cstrip.2.sed
chmod 'u=rw,g=r,o=r' 'Cstrip.2.sed'
set `wc -c 'Cstrip.2.sed'`
count=$1
case $count in
232)	:;;
*)	echo 'Bad character count in ''Cstrip.2.sed' >&2
		echo 'Count should be 232' >&2
esac
echo Extracting 'Cstrip.3.sed'
sed 's/^X//' > 'Cstrip.3.sed' << '+ END-OF-FILE ''Cstrip.3.sed'
X#n
X/==/{
X	g
X	s/\n//gp
X	s/.*//
X	x
X	b
X}
XH
+ END-OF-FILE Cstrip.3.sed
chmod 'u=rw,g=r,o=r' 'Cstrip.3.sed'
set `wc -c 'Cstrip.3.sed'`
count=$1
case $count in
40)	:;;
*)	echo 'Bad character count in ''Cstrip.3.sed' >&2
		echo 'Count should be 40' >&2
esac
exit 0
-- 
 Modeless editors and strong typing:   |Maarten Litmaath @ VU Amsterdam:
   both for people with weak memories. |maart@cs.vu.nl, mcvax!botter!maart
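
[Editorial sketch, not part of the original post.] Read off the scripts
above, the three stages seem to divide the work as follows: Cstrip.1.sed puts
every character on a line of its own and marks each original end of line with
`==`; Cstrip.2.sed walks that stream one character at a time, printing string
and character literals as they are and skipping everything between /* and */;
Cstrip.3.sed gathers the surviving characters in the hold space and prints the
reassembled line at each `==` marker. The fragment below illustrates only the
first and last stages, with the sed commands inlined under `sed -n` instead of
read from the extracted files. The function names split_chars and join_chars
are made up for this example.

```shell
#!/bin/sh
# Illustration only: stages 1 and 3 of the pipeline, inlined.
# split_chars and join_chars are hypothetical names, not part of the post.

# Stage 1 (same commands as Cstrip.1.sed): one character per line, then
# "==" to mark where the original line ended.  -n replaces the "#n" line.
# The sed script lines must start in column 0: the "/g" line closes the
# replacement text that began before the escaped newline.
split_chars() {
	sed -n 's/\(.\)/\1\
/g
s/$/==/p'
}

# Stage 3 (same commands as Cstrip.3.sed): collect lines in the hold space
# until a "==" marker arrives, then print them joined and clear the buffer.
join_chars() {
	sed -n '/==/{
g
s/\n//gp
s/.*//
x
b
}
H'
}

# With no stage 2 in between, a non-empty line comes back unchanged.
printf 'x = 1;\n' | split_chars | join_chars
```

Because the `p` flag on `s/\n//gp` fires only when a substitution was made, a
line that ends up with no characters at all (a blank line, or one that held
nothing but a comment) appears not to be printed by the last stage.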

jim@bilpin.UUCP (Jim G) (03/30/89)

    #{ v_langC.2 }
    IN ARTICLE <2216@solo8.cs.vu.nl>, maart@cs.vu.nl (Maarten Litmaath) WRITES:
>   jim@bilpin.UUCP (Jim G) [**THAT'S ME, FOLKS!**] writes:
>   #{ zapcom.sh }
>   #  Remove comments from a C program
>   #  sed removes comment strings which begin and end on the same line
>   #  awk removes comment strings which extend across multiple lines
>   #  sed/awk both handle nesting of comments within their context
    [small but perfectly formed awk/sed script deleted]
>   
>   Aha! You're using a SHELL script! Well, in that case there's another word
>   for my `sed approach' :-)
>   No awk necessary. This pipeline is reasonably fast too!
    [immense sed script deleted]

    Although I don't dispute the efficacy of the supplied script ( I haven't
    checked it out, though ), I think that this m-iii-ght be taking a
    preference for sed a m-iii-te too far. My 3 line sed + 13 line awk
    script has been replaced by a 101 line script with 66 lines of sed -
    hmmm. Although awk is undoubtedly slower than sed, I use it in
    preference for solving editing problems which can be defined on a field
    basis, as I find it much easier to conceptualise solutions; I do not
    find the sed syntax or operation conducive to an intuitive
    problem/solution association ( obviously some peculiarity in how my
    brain, errrm, works ).

    I aimed for conciseness and a simple, balanced structure in the code
    (rather than maximum efficiency, or universal application), as this is
    easier for people (including me) to understand, and therefore
    alter/improve, if they wish; especially for novice users, who would
    probably feel safe in tinkering with zapcom.sh, but would probably have
    to be restrained and sedated after seeing Cstrip :-)

    Also, zapcom.sh is not universally applicable, in that it requires
    comment delimiters to be themselves delimited by white space/EOL (so awk
    can treat them as individual fields); and it won't correctly handle
    comment delimiters embedded in quotes. There obviously comes a point
    where the effort required to handle a special case outweighs the benefit
    achieved; I considered these cases to come into that category. 

    We have now had enough constructive postings on this subject to give
    all interested parties a good set of approaches from which to choose.
    Thank you and goodnight ...
-- 
	   <Path: mcvax!ukc!icdoc!bilpin!jim> <UUCP: jim@bilpin.uucp>
  Programmers' maxim : If it's not aesthetically pleasing, it's probably wrong.
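
[Editorial sketch, not part of the original post.] The white-space
restriction Jim mentions follows from how awk splits input: by default each
line is cut into fields on runs of blanks, so a `/*` is a field of its own
only when blanks surround it. The fragment below is not zapcom.sh (which was
not reposted in this thread); count_openers is a hypothetical helper that
merely counts the fields exactly equal to `/*` on each line.

```shell
#!/bin/sh
# Illustration only; count_openers is a made-up helper, not Jim's script.
# It prints, per input line, how many fields are exactly "/*".
count_openers() {
	awk '{ n = 0
	       for (i = 1; i <= NF; i++) if ($i == "/*") n++
	       print n }'
}

# Line 1: delimiters set off by blanks            -> opener seen as a field
# Line 2: delimiters glued to neighbouring code   -> no field equals "/*"
# Line 3: delimiter inside a string literal       -> still counted as one
printf 'x; /* c */ y;\nx;/* c */y;\ns = " /* not a comment */ ";\n' |
	count_openers
```

The second line shows why a purely field-based test misses comments written
without surrounding blanks, and the third why quoted delimiters are mistaken
for real ones; a character-at-a-time approach such as Cstrip avoids both at
the cost of a longer script.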