[comp.lang.c] v16i088: List identifiers and declarations for C sources

rsalz@uunet.uu.net (Rich Salz) (11/18/88)

Submitted-by: arizona!rupley!local (John Rupley)
Posting-number: Volume 16, Issue 88
Archive-name: identlist

TITLE: cdeclist, identlist - list external declarations 
for C source files; list identifiers

SUMMARY: The attached Lex source files are for filters that
generate, for a C source file:  
	(1) a list of external declarations (functions, arrays,
		variables, structures); 
	(2) a file of identifiers suitable for inverted indexing, 
		making a word list, etc.  
The filters run under SysV or BSD; see the README for test instructions.
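
As a rough illustration of item (2), the last stage of the word-list
pipeline (the tr, sort and uniq tail used by the testw target in the
Makefile) can be tried on its own.  This is only a sketch: the function
name wordlist and the sample input are made up for the example, and the
input stands in for text that identlist and identlist1 would already
have filtered.

```shell
# wordlist is a hypothetical helper name, not part of the archive.
# It puts blank- or tab-separated words on separate lines,
# then sorts them and drops duplicates.
wordlist() {
    tr -s ' \t' '\012\012' | sort | uniq
}

# Made-up sample standing in for already-filtered C source.
printf 'int  count\tmain\ncount argv\n' | wordlist
```

Each distinct word comes out once, one per line, in sorted order.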


Hope you find them useful,

John Rupley
 uucp: ..{cmcl2 | hao!ncar!noao}!arizona!rupley!local
 internet: rupley!local@megaron.arizona.edu

-------------------------------------------------------------------------
#!/bin/sh
# to extract, remove the header and type "sh filename"
if `test ! -s ./README`
then
echo "writing ./README"
cat > ./README << '\Rogue\Monster\'
README - Sat Aug 20 23:22:04 MST 1988

The attached Lex source files are for filters that generate, 
for a C source file:
	(1) a list of external declarations (functions, arrays, 
		variables, structures);
		each declaration on one line, for manipulation 
		by awk, etc.;
		initializers replaced by a /*comment*/;
		function definitions with parameter declarations and
		a dummy statement with appropriate return:
			{ [return [(n)];] }
		preprocessing with cpp replaces defined names and executes
		compilation conditionals;
	(2) a file of identifiers suitable for inverted indexing, making
		a word list, etc.

Use of the filters is explained by example:
	(1) mk_cdeclist and the test targets in the Makefile show the 
		generation of a declaration list;
	(2) mk_identlist shows the generation of a list of identifiers.

The make file works under csh/BSD and ksh/SYSV.
To test:
	copy a C source file and any non-system #include dependencies into
		the directory with the unshared files from cdeclist.shar
	run:
		make TESTC="C_src_file" testall
	you should get a file LLLLfilename, which is the declaration list,
		some intermediate files LLfilename and temp?, which show
		what is going on at each stage, and a file ZZZZfilename,
		which is the list of C keywords and identifiers.
		

The cdeclist filters, although simple, should parse most styles of coding.  
Adjustment may be needed for the new ANSI standard.  The output
is suitable for making a lint library.

The identlist filters are offered without much being claimed for them.
I use the list as a guide and aid, without assuming it absolutely correct.

The shell scripts mk_* were written under ksh on a SYSV machine, and they
need modification to run under csh or BSD.

John Rupley
 uucp: ..{cmcl2 | hao!ncar!noao}!arizona!rupley!local
 internet: rupley!local@megaron.arizona.edu
 (H) 30 Calle Belleza, Tucson AZ 85716 - (602) 325-4533
 (O) Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721 - (602) 621-3929

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
MANIFEST: (for cdeclist.shr1)

README
Makefile
cdeclist.1		man page

uncomment.l	|
cdeclist1.l	|
cdeclist2.l	|	shell wrapper and filters generating declaration list
cdeclist3.l	|
cdeclist4.l	|
mk_cdeclist	|

identlist.l	|	shell wrapper and filters generating file of identifiers
identlist1.l	|
mk_identlist	|

\Rogue\Monster\
else
  echo "will not over write ./README"
fi
if [ `wc -c ./README | awk '{printf $1}'` -ne 2358 ]
then
echo `wc -c ./README | awk '{print "Got " $1 ", Expected " 2358}'`
fi
if `test ! -s ./Makefile`
then
echo "writing ./Makefile"
cat > ./Makefile << '\Rogue\Monster\'
#MAKEFILE -
#5 filters for extracting a list of declarations from a C source file;
#	delete: comments; 
#		(uncomment.l)
#	filter with cpp: replace #defines; execute #ifdef control statements;
#		(cdeclist1.l pre-processes, cdeclist2.l post-processes)
#	delete: function {bodies}; initializations (= ....;);
#		(cdeclist3.l)
#	reformat: to one-line declarations; and a bit more;
#		(cdeclist4.l)
#output should be suitable for:
#	making /*LINTLIBRARY*/, massaging with grep, awk..., etc;
#all filters read stdin, write stdout;
#probably need some adaptation for new ANSI standard;
#
#also 2 filters (identlist and identlist1) that prepare a C source
#file for making a word list or index of identifiers; please don't 
#flame me if the word list misses a few identifiers or includes a few 
#non-existent ones -- I make no great claim for it -- and it assumes,
#I am sure, an idiosyncratic style (mine).
#
#John Rupley
# uucp: ..{cmcl2 | hao!ncar!noao}!arizona!rupley!local
# internet: rupley!local@megaron.arizona.edu
# (H) 30 Calle Belleza, Tucson AZ 85716 - (602) 325-4533
# (O) Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721 - (602) 621-3929

OPT=

dummy:
	@echo please supply a target

all:	uncomment cdeclist1 cdeclist2 cdeclist3 cdeclist4 identlist identlist1

#remove comments
uncomment:	uncomment.l
	lex -v uncomment.l
	$(CC) lex.yy.c $(CFLAGS) -ll $(LDFLAGS) -o uncomment


#remove comments and quoted strings etc
identlist:	identlist.l
	lex -v identlist.l
	$(CC) lex.yy.c $(CFLAGS) -ll $(LDFLAGS) -o identlist

#remove upper-case words and numbers
identlist1:	identlist1.l
	lex -v identlist1.l
	$(CC) lex.yy.c $(CFLAGS) -ll $(LDFLAGS) -o identlist1

#prepare C source file for cpp processing
cdeclist1:	cdeclist1.l
	lex -v cdeclist1.l
	$(CC) lex.yy.c $(CFLAGS) -ll $(LDFLAGS) -o cdeclist1

#remove additions associated with cpp processing
cdeclist2:	cdeclist2.l
	lex -v cdeclist2.l
	$(CC) lex.yy.c $(CFLAGS) -ll $(LDFLAGS) -o cdeclist2

#remove function {bodies}, put in appropriate { return (n); }
cdeclist3:	cdeclist3.l
	lex -v cdeclist3.l
	$(CC) lex.yy.c $(CFLAGS) -ll $(LDFLAGS) -o cdeclist3

#one-line declarations;remove initializations;beautify a little
cdeclist4:	cdeclist4.l
	lex -v cdeclist4.l
	$(CC) lex.yy.c $(OPT) $(CFLAGS) -ll $(LDFLAGS) -o cdeclist4 

SHARLIST=	\
	README\
	Makefile\
	cdeclist.1\
	uncomment.l\
	cdeclist1.l\
	cdeclist2.l\
	cdeclist3.l\
	cdeclist4.l\
	mk_cdeclist\
	identlist.l\
	identlist1.l\
	mk_identlist

mkshar:
	shar -f cdeclist -c $(SHARLIST)

TESTC="please define TESTC=C_src_file on make line"
#HDIR=hard_wired_path_for_headers
#CDIR=hard_wired_path_for_sources
HDIR="."
CDIR="."

#test making a declaration list
testc:		all
	cat $(CDIR)/$(TESTC)|uncomment|cdeclist1 >temp1
	/lib/cpp -P -C -I$(HDIR) temp1 >temp2
	cat temp2|cdeclist2 >temp3
	echo /*$(TESTC)*/ >LL$(TESTC)
	echo >>LL$(TESTC)
	cat temp3|cdeclist3 >>LL$(TESTC)

test2c:		cdeclist2
	cat temp2|cdeclist2 >temp3

test3c:		cdeclist3
	echo /*$(TESTC)*/ >LL$(TESTC)
	echo >>LL$(TESTC)
	cat temp3|cdeclist3 >>LL$(TESTC)

TESTLL=LL$(TESTC)

testll:		cdeclist4
	cat $(TESTLL)|cdeclist4 >LL$(TESTLL)


#test making a word list
testw:		identlist identlist1
	cat $(TESTC)|identlist|identlist1	|\
		tr -s " 	" "\012\012"|sort|uniq	>ZZZZ$(TESTC)


#careful!
testlint:
	lint -uvx -Ml -otemp LL$(TESTLL)
	cp llib-ltemp.ln /usr/lib/large
	lint -uvx -Ml -ltemp $(TESTC)

testall: testc testll testw		#testlint 
\Rogue\Monster\
else
  echo "will not over write ./Makefile"
fi
if [ `wc -c ./Makefile | awk '{printf $1}'` -ne 3409 ]
then
echo `wc -c ./Makefile | awk '{print "Got " $1 ", Expected " 3409}'`
fi
if `test ! -s ./cdeclist.1`
then
echo "writing ./cdeclist.1"
cat > ./cdeclist.1 << '\Rogue\Monster\'
.TH CDECLIST 1
.SH NAME
mk_cdeclist, mk_identlist \- list external declarations 
for C source files; list identifiers
.SH SYNOPSIS
.B mk_cdeclist
filelist
.br
.B mk_identlist
filelist
.SH DESCRIPTION
.B Mk_cdeclist
is a shell script that links a set of filters and the C preprocessor,
.B cpp,
to convert C source files
into a list of external declarations (functions, arrays, 
variables, structures).
Each declaration is on one line, to simplify subsequent manipulation 
by awk, etc.
Initializers are replaced by a comment, /*INITIALIZED*/.
Function definitions include parameter declarations and
a dummy statement with appropriate return:
.in +10
{ [return [(n)];] }
.in
The C preprocessor is used to replace defined names and to execute 
compilation conditionals;
declarations introduced by the preprocessing are removed, and
include directives are restored.
.PP
The output of 
.B mk_cdeclist
can be compiled into a lint library.
.PP
.B Mk_identlist
is a shell script with filters for converting C source files
into a file of identifiers suitable for inverted indexing, making
a word list, etc.
.PP
.SH FILES
The filters for both
.B mk_cdeclist 
and
.B mk_identlist
are from Lex source files:
.in +5
uncomment.l, cdeclist[1-4].l, identlist.l and identlist1.l.
.in
The executables are correspondingly named.
.br
.B Cpp
is used in 
.B mk_cdeclist.
.SH AUTHOR
John Rupley
.br
 uucp: ..{cmcl2 | hao!ncar!noao}!arizona!rupley!local
.br
 internet: rupley!local@megaron.arizona.edu
.br
 Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721
.SH BUGS
The cdeclist filters, although simple, should parse most styles of coding.  
Adjustment may be needed for the new ANSI standard.
If there is a problem, try filtering first with
.B cb
or
.B indent.
.sp
The identlist filters are offered without much being claimed for them.
Use the list as a guide and aid, without assuming it absolutely correct.
.sp
The shell scripts mk_* were written under ksh on a SYSV machine, and they
need modification to run under csh or BSD.
\Rogue\Monster\
else
  echo "will not over write ./cdeclist.1"
fi
if [ `wc -c ./cdeclist.1 | awk '{printf $1}'` -ne 1998 ]
then
echo `wc -c ./cdeclist.1 | awk '{print "Got " $1 ", Expected " 1998}'`
fi
if `test ! -s ./uncomment.l`
then
echo "writing ./uncomment.l"
cat > ./uncomment.l << '\Rogue\Monster\'
%{
/*UNCOMMENT- based on usenet posting by: */
/*	Chris Thewalt; thewalt@ritz.cive.cmu.edu */
%}
STRING		\"([^"\n]|\\\")*\"
COMMENTBODY	([^*\n]|"*"+[^*/\n])*
COMMENTEND	([^*\n]|"*"+[^*/\n])*"*"*"*/"
%START	COMMENT
%%
<COMMENT>{COMMENTBODY}		;
<COMMENT>{COMMENTEND}		BEGIN 0;
<COMMENT>.|\n			;
"/*"				BEGIN COMMENT;
{STRING}			ECHO;
.|\n				ECHO;


\Rogue\Monster\
else
  echo "will not over write ./uncomment.l"
fi
if [ `wc -c ./uncomment.l | awk '{printf $1}'` -ne 349 ]
then
echo `wc -c ./uncomment.l | awk '{print "Got " $1 ", Expected " 349}'`
fi
if `test ! -s ./cdeclist1.l`
then
echo "writing ./cdeclist1.l"
cat > ./cdeclist1.l << '\Rogue\Monster\'
%{
/*-CDECLIST1: prepare for cpp execution of #ifdefs, etc.; */
/*i.e., setup to restore #includes & remove code added by cpp */
%}
%%
^\#[ \t]*include.*$		{printf("/*%s*/\n", yytext);
	printf("/*DINGDONGDELL*/\n");
	printf("%s\n", yytext);
	printf("/*DELLDONGDING*/\n");}
.|\n				ECHO;


\Rogue\Monster\
else
  echo "will not over write ./cdeclist1.l"
fi
if [ `wc -c ./cdeclist1.l | awk '{printf $1}'` -ne 289 ]
then
echo `wc -c ./cdeclist1.l | awk '{print "Got " $1 ", Expected " 289}'`
fi
if `test ! -s ./cdeclist2.l`
then
echo "writing ./cdeclist2.l"
cat > ./cdeclist2.l << '\Rogue\Monster\'
%{
/*-CDECLIST2: remove cpp-included code, restore #include's */
%}
%START DING
%%
^"/*DINGDONGDELL*/"$		BEGIN DING;
^"/*DELLDONGDING*/"$		BEGIN 0;
<DING>^"/*#"[^*]*"*/"$ 		;
<DING>.|\n			;
^"/*#"[^*]*"*/"$ 		{yytext[yyleng-2] = 0;
	printf("%s", &yytext[2]);}
^[ \t]*\n			;
.|\n	 			ECHO;


\Rogue\Monster\
else
  echo "will not over write ./cdeclist2.l"
fi
if [ `wc -c ./cdeclist2.l | awk '{printf $1}'` -ne 291 ]
then
echo `wc -c ./cdeclist2.l | awk '{print "Got " $1 ", Expected " 291}'`
fi
if `test ! -s ./cdeclist3.l`
then
echo "writing ./cdeclist3.l"
cat > ./cdeclist3.l << '\Rogue\Monster\'
%{
/*-CDECLIST3: for each function, remove the {function body}; */
/*	(look out for {}'s escaped or  within " or ' pairs) */
%}
 int curly, retval;
WLF		([ \t\n\r\f]*)
FUNCSTRT	(\){WLF}\{|\;{WLF}\{)
SKIPALLQUOTED	(\"([^"\n]|\\\")*\"|\'.\'|\\.)
%START CURLY
%%
<CURLY>\{			curly++;
<CURLY>\}			{if (--curly == 0) {
					if (retval > 1)
					    printf(" return (%d); }", retval);
					else if (retval)
					    printf(" return; }");
					else
					    printf("}");
					BEGIN 0;
				}}
<CURLY>{FUNCSTRT}		curly++;
<CURLY>{SKIPALLQUOTED}|.|\n	;
<CURLY>"return"{WLF}\;		retval |= 1;
<CURLY>"return"{WLF}([^;]|{SKIPALLQUOTED})*\;	retval |= 2;
{FUNCSTRT}			{curly=1;retval=0;ECHO;BEGIN CURLY;}
^[ \t]*\n			;
.|\n	 			ECHO;


\Rogue\Monster\
else
  echo "will not over write ./cdeclist3.l"
fi
if [ `wc -c ./cdeclist3.l | awk '{printf $1}'` -ne 720 ]
then
echo `wc -c ./cdeclist3.l | awk '{print "Got " $1 ", Expected " 720}'`
fi
if `test ! -s ./cdeclist4.l`
then
echo "writing ./cdeclist4.l"
cat > ./cdeclist4.l << '\Rogue\Monster\'
%{
/*-CDECLIST4: process output of cdeclist3:
	delete initialization expressions;
	reformat for one-line declarations;
	some beautifying -- wise to process original source with cb|indent;
parsing in DECLST is hacked; would be better to base it cleanly on std ANSI;
to delete externs or whatever, try: {WLF}extern[^;]*\;{WLF}
*/
%}
 int curly;
W		([ \t]*)
WLF		([ \t\n\f\r]*)
LET		[_a-zA-Z]
DIGIT		[0-9+-/*]
DIGLET		[_a-zA-Z0-9]
NAME		([*]*{LET}{DIGLET}*)
ARRAY		(\[{DIGIT}*\])
DECL		([;,*]|{WLF}|{NAME}|{ARRAY})
FUNCPTR		(\({DECL}*\)\({DECL}*\))
DECLST		({DECL}|{FUNCPTR})*
FINDSTRUCT	(struct|union|enum){WLF}{NAME}?{WLF}\{
ENDSTRUCT	{WLF}\}{WLF}
FINDFUNC	\){DECLST}\{[ ]?(return[^}\n]*)?\}
ENDFUNC		{WLF}\{[ ]?(return[^}\n]*)?\}
FINDINIT	{WLF}\={WLF}
ENDINIT1	{WLF}\;{WLF}
ENDINIT2	{WLF}\,{WLF}
SKIPALLQUOTED	(\"([^"\n]|\\\")*\"|\'.\'|\\.)
%START NORM DELETE DECL
%{
#include <ctype.h>
main()
{
	BEGIN NORM;
	yylex();
	return 0;
}
%}
%%
<NORM>"/*"[^\n]*"*/"		print_skip(yytext);
<NORM>"#"[^\n]*$ 		print_skip(yytext);
<NORM>{WLF}{FINDINIT}		{BEGIN 0;BEGIN DELETE;}
<NORM>{FINDFUNC}|{FINDSTRUCT}	{curly=0;BEGIN 0;BEGIN DECL;REJECT;}
<DELETE>{SKIPALLQUOTED}|[^{},;]	;
<DELETE>\{			curly++;
<DELETE>\}			curly--;
<DELETE>{ENDINIT1}		{if (curly == 0) {
				printf("\;\t/*INITIALIZED*/\n");
				;BEGIN 0; BEGIN NORM;}}
<DELETE>{ENDINIT2}		{if (curly == 0) {
				printf(" /*INITIALIZED*/, ");
				;BEGIN 0; BEGIN NORM;}}
<DECL>{ENDSTRUCT}		{printf("} ");
				if (--curly == 0) {
					BEGIN 0;BEGIN NORM;
				}}
<DECL>{ENDFUNC}			{printf(" ");print_skip(yytext);
				BEGIN 0;BEGIN NORM;}
<DECL>\{			{ECHO;curly++;}
<DECL>{WLF}\){WLF}/{ENDFUNC}	printf(")");
<DECL>{WLF}\;{WLF}/{ENDFUNC}	printf(";");
<DECL>{WLF}\;{WLF}		printf("; ");
<NORM>{WLF}\;{WLF}		printf(";\n");
<NORM,DECL>{WLF}\,{WLF}		printf(", ");
<NORM,DECL>{WLF}/[/#]		;
<NORM,DECL>{WLF}		printf(" ");
<NORM,DECL>.	 		ECHO;
%%
print_skip(s) char * s;
{
	int c;
	while (isspace(*s))
		s++;
	printf("%s\n", s);
	while (isspace(c=input()))
		;
	unput(c);
}




\Rogue\Monster\
else
  echo "will not over write ./cdeclist4.l"
fi
if [ `wc -c ./cdeclist4.l | awk '{printf $1}'` -ne 2012 ]
then
echo `wc -c ./cdeclist4.l | awk '{print "Got " $1 ", Expected " 2012}'`
fi
if `test ! -s ./mk_cdeclist`
then
echo "writing ./mk_cdeclist"
cat > ./mk_cdeclist << '\Rogue\Monster\'
# MK_CDECLIST

DEFAULTDIR="."
CDIR=`dirname $1`
if [ ${CDIR} = "." ]
then
	CDIR=${DEFAULTDIR}
fi
HDIR=${CDIR}
CDECLIST=""
#option: if define LLOUT="Sall.1", then output concatenated onto Sall.1;
#else get individual output files with prefix "LL";
LLOUT="Sall.1"
#LLOUT="dumdumdum"

for i in $*
do
	FILE=`basename ${i}`
	echo processing file: $i >&2
	if [ ${LLOUT} != "Sall.1" ]
	then
		LLOUT=LL${FILE}
		CDECLIST="${CDECLIST} ${LLOUT}"
	fi
	echo "\n\n/*********************************************/\n" >>${LLOUT}
	echo /*${FILE}*/ >>${LLOUT}
	echo >>${LLOUT}
	cat ${CDIR}/${FILE}|uncomment|cdeclist1|/lib/cpp -P -C -I${HDIR}|
		cdeclist2|cdeclist3|cdeclist4 >>${LLOUT}
done

#temporary++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
exit

#NOTE: what follows is an example of manipulation of the cdeclist output.

#the following cats a list of separate "LL" files onto Sall.1
echo
echo +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
echo combining files: ${CDECLIST}
>Sall.1
for i in ${CDECLIST}
do

	cat ${i} >>Sall.1
done

#the following extracts a list of function definitions _only_;
#(no static or external declarations; no definitions of arrays/vars/structs);
#collect comment header lines, functions, and struct declarations;
#get rid of register specs;
#strip any variable declarations trailing struct declarations;
#change return (n) -> return (0), so ok for return of pointers;
#insert VARARGS where needed;
#to include structure definitions, add to 1st egrep: |struct[ ]+[^;({]+\{
#remove several functions with "static" scope -- and that give errors in
#compilation without struct definitions (same scope);
#this gives a partial lint library - full library needs array/variables;

cat Sall.1|egrep '\/\*[^L]|\{\}|\ return[ ;]' |
	egrep -v '^static\ |INITIALIZED' |
	egrep -v '\ engr_at[(]|^del_engr[(]|\ outentry[(]|^ini_inv[(]' |
	sed -e "s/register[ ]\([^;, ][^;, ]*[;,]\)/int\ \1/g"	\
		-e "s/register[ ]//g"	\
		-e "s/\}[^;}]*\;$/\}\ \;/"	\
		-e "s/return\ .[1-3]./return\ \(0\)/g"	\
		-e "s/^panic/\/\*VARARGS1\*\/\\
panic/"	\
		-e "s/^error/\/\*VARARGS1\*\/\\
error/"	\
		-e "s/^pline/\/\*VARARGS1\*\/\\
pline/"	\
		-e "s/^impossible/\/\*VARARGS1\*\/\\
impossible/p" >Sall.2

\Rogue\Monster\
else
  echo "will not over write ./mk_cdeclist"
fi
if [ `wc -c ./mk_cdeclist | awk '{printf $1}'` -ne 2230 ]
then
echo `wc -c ./mk_cdeclist | awk '{print "Got " $1 ", Expected " 2230}'`
fi
if `test ! -s ./identlist.l`
then
echo "writing ./identlist.l"
cat > ./identlist.l << '\Rogue\Monster\'
%{
/*IDENTLIST- comments & strings deleted; punc to space; etc*/
/*(a filter to prep C source for bib INDEXing, making a word list, ...)*/
/*convert to white space all tokens except C keywords and identifiers;*/
/*comment recognition based on usenet posting by: */
/*	Chris Thewalt; thewalt@ritz.cive.cmu.edu */
%}
W		[ \t]
STRING		\"([^"\n]|\\\")*\"
%START	COMMENT
%%
<COMMENT>([^*\n]|"*"+[^*/\n])*			;
<COMMENT>([^*\n]|"*"+[^*/\n])*"*"*"*/"		BEGIN 0;
<COMMENT>.|\n			;
"/*"				BEGIN COMMENT;
{STRING}			;
^#.*$				;
[a-zA-Z_0-9]\.[a-zA-Z_0-9]	ECHO;
[a-zA-Z_0-9]\-\>[a-zA-Z_0-9]	ECHO;
\\|\||\+|\)|\(|\*|\&|\^|\%|\$		printf(" ");
\#|\@|\!|\~|\`|\-|\=|\}|\{		printf(" ");
\]|\[|\"|\:|\'|\;|\?|\/|\>|\.|\<|\,	printf(" ");
.|\n				ECHO;

\Rogue\Monster\
else
  echo "will not over write ./identlist.l"
fi
if [ `wc -c ./identlist.l | awk '{printf $1}'` -ne 735 ]
then
echo `wc -c ./identlist.l | awk '{print "Got " $1 ", Expected " 735}'`
fi
if `test ! -s ./identlist1.l`
then
echo "writing ./identlist1.l"
cat > ./identlist1.l << '\Rogue\Monster\'
%{
/*IDENTLIST1- remove strings that are all uppercase or all numbers*/
/*(another filter to prep C source for bib INDEXing, making a word list, ...)*/
/*
^[A-Z_0-9]+/{W}+		printf(" ");
{W}+[A-Z_0-9]+$			printf(" ");
^[A-Z_0-9]+$			printf(" ");
*/
%}
W		[ \t]
%START	COMMENT
%%
(^|{W}+)[A-Z_0-9]+/(\n|{W}+)	printf(" ");
.|\n				ECHO;

\Rogue\Monster\
else
  echo "will not over write ./identlist1.l"
fi
if [ `wc -c ./identlist1.l | awk '{printf $1}'` -ne 335 ]
then
echo `wc -c ./identlist1.l | awk '{print "Got " $1 ", Expected " 335}'`
fi
if `test ! -s ./mk_identlist`
then
echo "writing ./mk_identlist"
cat > ./mk_identlist << '\Rogue\Monster\'
# MK_IDENTLIST
#prepare C source files for making an inverted index, or whatever, using
#identlist and identlist1 to: delete quoted strings, comments; 
#convert punctuation to white space; delete numbers and all upper case words;
#the result, with some stuff flagged for later reversal,
#should be suitable for making an inverted index of identifiers in each 
#source file.
#
#if use "invert" from the "bib" suite of programs, can exclude, as common
#words, the C keywords and the system library
#
#(NOTE: as a test, the output of the loop containing the filters is 
#sent through tr, sort and uniq, to produce a word list)

DEFAULTDIR="."
CDIR=`dirname $1`
if [ ${CDIR} = "." ]
then
	CDIR=${DEFAULTDIR}
fi

for i in $*
do
	FILE=`basename ${i}`
	echo processing file: $i >&2
	cat ${CDIR}/${FILE}|identlist|identlist1|
	sed -e '/^[ 	]*$/d'
done |tr -s " 	" "\012\012"|sort|uniq			#produce a word list


#temporary++++++++++++++++++++++++++++++++++++++++++++++++++
exit

#to make an inverted index, replace the above loop by the following:

for i in $*
do
	FILE=`basename ${i}`
	echo processing file: $i
	cat ${CDIR}/${FILE}|identlist|identlist1|
	sed -e 's/\([a-zA-Z_0-9]\)\.[^ \t]*[ \t]/\1qsq\ /g' \
		-e 's/\([a-zA-Z_0-9]\)\-\>[^ \t]*[ \t]/\1qsq\ /g' \
		-e 's/\([a-zA-Z0-9]\)[_]\([a-zA-Z0-9]\)/\1quq\2/g' |
	sed -e '/^[ 	]*$/d' 		>${FILE}
	FILELIST=${FILELIST}" "${FILE}
done


#NOTE: if the above changes are made, mk_identlist should be run in 
#an empty (scratch) directory;
#intermediate files of the same name as the C source files are created;
#the directory with the C source files can be given as DEFAULTDIR or as
#the absolute pathname of the first file in the list;

#NOTE: the following is an example of processing to get an inverted index.

invert -ccommon -k5000 -l30 ${FILELIST}
mv INDEX INDEX.long
cat INDEX.long|
sed -e 's/quq/\_/g' -e 's/qsq/STRUCT/g' \
	-e 's/\([^0-9]\)[0-9][0-9]*\/[0-9][0-9]*\([^0-9]\)/\1\2/g'	\
	-e 's/\([^0-9]\)[0-9][0-9]*\/[0-9][0-9]*$/\1/' |
sort -o INDEX

\Rogue\Monster\
else
  echo "will not over write ./mk_identlist"
fi
if [ `wc -c ./mk_identlist | awk '{printf $1}'` -ne 2001 ]
then
echo `wc -c ./mk_identlist | awk '{print "Got " $1 ", Expected " 2001}'`
fi
echo "Finished archive 1 of 1"
exit



-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.