[comp.sources.unix] v18i052: Hat/Coat, dependency analysis tools, Part01/02

rsalz@uunet.uu.net (Rich Salz) (03/23/89)

Submitted-by: Bob McQueer <mtxinu!rtech!weevil!bobm>
Posting-number: Volume 18, Issue 52
Archive-name: hat-n-coat/part01

Hat (header analysis tool) analyzes header file dependencies/references,
etc.  The "symbols" recognized are #define's, enum classes, typedef's and
structure definitions.  It can also do a reasonable job of figuring out
what header files are needed by a group of C source files.

Coat (C object analysis tool) produces a topologically sorted dependency
list/symbol cross reference for a group of objects or libraries, assuming
the convention that the "real" symbol name has an underscore prepended for
the linker.  All it actually does is massage the output from nm(1) to pass
it into the analyzer program of hat.

IMPORTANT NOTE: the bobm.a library is needed to build this; it is
being sent as a separate archive.  Grab that, too.

--------------
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create:
#	README
#	coat.1
#	hat.1
#	Makefile
#	coat.tpl
#	scan.l
#	parse.y
#	config.h
#	node.h
#	stdio.h
export PATH; PATH=/bin:/usr/bin:$PATH
echo shar: "extracting 'README'" '(1547 characters)'
if test -f 'README'
then
	echo shar: "will not over-write existing file 'README'"
else
cat << \SHAR_EOF > 'README'
hat (header analysis tool) analyzes header file dependencies/references, etc.
The "symbols" recognized are #define's, enum classes, typedef's and structure
definitions.  It can also do a reasonable job of figuring out what header
files are needed by a group of C source files.

See hat.1 man page for full description.

The program actually invokes a parser which walks over header files
and produces output for an analysis program, which in turn produces a
topologically sorted dependency list and symbol cross reference.  This
analyzer may also be used to analyze references / definitions coming
from any other source, provided you massage the information into the
right format.
The "coat" program (compiled object analysis tool) is a trivial shell script
which routes the output of "nm" into the analyzer to produce a similar
analysis for libraries or object files.  It was used, for instance, to
assure that "static" was placed on all possible procedure declarations
in these source files.

Look over the makefile before making it.  You are expected to fill in
some configuration information.  If you leave MANDIR ".", make will report
a failure because it can't copy the manual pages onto themselves, but you
may ignore that.  This uses a utility library, bobm.a, which I use for
other things as well, and which is packed in a separate archive.  You will
have to build it first.

I haven't tried any of the fast lex'es with this.  scan.l should be a good
test for them.  I also haven't run this on any SYSV systems, and it
probably needs some SYSV ifdef's.
SHAR_EOF
fi
echo shar: "extracting 'coat.1'" '(936 characters)'
if test -f 'coat.1'
then
	echo shar: "will not over-write existing file 'coat.1'"
else
cat << \SHAR_EOF > 'coat.1'
.TH COAT LOCAL 3/1/87
.UC
.SH NAME
coat - "c" object analysis tool
.SH SYNOPSIS
.B coat
[-s[desmu]] [-r[desmu]] [-v<num>] [-z] -n<symbol> <files>
.SH DESCRIPTION
.I Coat
produces a topologically sorted dependency list / symbol cross
reference for a group of objects or libraries, assuming the convention
that the "real" symbol name has an underscore prepended for the linker.
All it actually does is massage the output from
.I nm(1)
to pass it into the analyzer program of
.I hat,
producing a similar listing.  See the
.I hat
manual page for details.
.SH OPTIONS
The options shown are the same as for
.I hat,
except that the number on the -v option should not be negative.
See the
.I hat
manual page for details.
.sp
In the absence of
.I -z,
referring files will be sorted ahead of defining files, i.e.
the order wanted for
.I ld
lists.
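.SH EXAMPLES
coat -nmain *.o
.sp
prints a link-order dependency list and symbol cross reference for the
objects in the current directory.  (The choice of
.I main
as a symbol to ignore here is only illustrative.)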
.SH "SEE ALSO"
.I hat(local), nm(1)
.SH AUTHOR
Robert L. McQueer, bobm@rtech.
SHAR_EOF
fi
echo shar: "extracting 'hat.1'" '(16084 characters)'
if test -f 'hat.1'
then
	echo shar: "will not over-write existing file 'hat.1'"
else
cat << \SHAR_EOF > 'hat.1'
.TH HAT LOCAL 3/1/87
.UC
.SH NAME
hat - header analysis tool
.SH SYNOPSIS
.B hat
[-s[desmu]] [-r[desmu]] [-z] [-q] [-i] [-x] [-v<n>] [-f<sym>] [-n<sym>] [-c[-.]<sym>] [-a] [-p[-]<sym>] [<cppopt>] <files>
.SH DESCRIPTION
.I Hat
is a tool for analyzing #define and typedef statements and structure /
union / enum definitions and references in header
files and determining their dependencies.  It produces five sections
of information:
.sp
First, a list of the files together with the files they depend directly upon,
in topological sort order.  Each dependency also includes the first
symbol that caused the dependency.  See next paragraph concerning sort
order and cyclical references.
.sp
Second, an expanded dependency list.  For each file, this shows the expanded
list that results from descending the dependency tree.  If there are
cyclical references, this section lists the cycles.  Cycles will have
been broken arbitrarily in determining the topological sort order.
One cycle will be shown for each time a dependency is "ignored" to
allow the topological sort to proceed.
.sp
Third, a symbol cross-reference listing of defines and references.
.sp
Fourth, a listing of multiply defined symbols.
.sp
Fifth, a listing of undefined symbols.
.sp
.I Hat
handles preprocessor conditional compilation constructs by the simple
expedient of invoking
.I /lib/cpp
explicitly.  What it does is go through the files, throwing
out everything except preprocessor syntax, and adding special lines
indicating references and definitions.  The result is piped through
.I /lib/cpp
before being analyzed so that #ifdef's affecting what will
be defined or referenced will be properly treated by having
.I /lib/cpp
remove whatever of the special lines have been conditionally compiled out.
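.sp
As a rough sketch (the names here are purely illustrative, and the exact
spacing and #line bookkeeping differ), a header foo.h containing only
.sp
.in +5
.nf
 #ifdef FOO
 #define BAR 1
 #endif
.fi
.in -5
.sp
reaches
.I /lib/cpp
as something like
.sp
.in +5
.nf
 @="foo.h"
 #line 1 "foo.h"
 #ifdef FOO
 #define BAR 1
 @!"BAR"
 #endif
.fi
.in -5
.sp
so the @! definition line survives only when FOO is defined.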
.sp
Many of the options are aimed at letting
.I hat
control the #ifdef's, if desired, and will probably be unused in
most cases.  For "reasonably" ifdef'ed files (refraining from #ifdef'ing
alternate versions of partial syntax), you should be able to simply let
.I /lib/cpp
do the work, as intended.
.sp
Normally, #include lines are not passed through, although this can be
overridden.  It should be overridden only if the #include's in the
header files affect how conditional compilations work (a questionable
arrangement, in the author's opinion).  At any rate, the expanded text
resulting from the #include is skipped during the analysis.  Note that
you could consider
.I hat
a tool for telling you what nested #include's are necessary in the
first place, should you be in the camp that supports using them.
.SH OPTIONS
All options with attached strings require that the string be part of
the argument, e.g. "-v4", and not "-v 4".  The reason for this
is that
.I hat
passes all unrecognized option arguments on to
.I /lib/cpp,
and it obviously couldn't know whether a given argument should include
a following string or not without building in
.I /lib/cpp's
argument syntax.  Instead, we insist on joining the argument to its
option consistently, and allowing any special options the local
.I /lib/cpp
has to be used as long as they don't conflict with
.I hat
options.
.I Hat
stays away from upper case options.
.sp
The
.I -s
and
.I -r
options allow printing of only certain sections of the output.  The attached
desmu characters indicate the dependency list, expanded dependency list,
symbol cross reference, multiple definition and undefined sections
respectively.  The
.I -s
option specifies what sections to print, while the
.I -r 
specifies printing of all sections except those given.  If multiple
specifications are given, only the last is effective, and these options
will have the same effect wherever placed in the argument list.
.sp
The
.I -z
option reverses the sense of the topological sort.  Normally, the order
presented is defining file before referring file, which is the order
you would want for #include lines.  Using
.I -z
causes the order to be referring file followed by defining file.  This
option is mainly for use of the analyzer with alternate input - for
instance,
.I coat
uses this option to present libraries in link order.
.sp
The
.I -q
option specifies that symbols in unrecognized syntax within the files are
to be treated as references.
.I Hat
normally only recognizes #define's, typedef's, externs (which
are ignored except for what appears to be the type declaration),
array dimension expressions, and struct / union / enum definitions.
Everything else is normally ignored since the syntax isn't
understood.  Using this option will cause every symbol not a keyword or
part of understood syntax to be treated as a reference, and for instance
may be used to generate references from normal .c code, at the expense
of generating many undefined symbols.
.sp
The
.I -i
option specifies that #include lines are to be passed on to
.I /lib/cpp
as discussed above.  If none of the #include's affect conditional
compilation, the only effect of this option is to make
.I /lib/cpp
do more work and pass more lines of output to the analysis routines.
.sp
The
.I -c
option causes
.I hat
to go ahead and treat specific #if[n]def's.  Normally, stuff on both
sides of an ifdef is parsed, allowing
.I /lib/cpp
to resolve the results.  If you specify -c<symbol>,
.I hat
will act as if that symbol was defined for #if[n]def's.  -c-<symbol>
makes it specifically undefined, taking the other leg of conditional
constructs.  -c.<symbol> causes the normal interpretation, i.e. both
sides of the conditional expression will be parsed.  This option is mostly
intended to allow you to resolve cases where the normal parsing causes
syntax errors, e.g.:
.sp
.in +5
.nf
 #ifdef YUCK
 struct onething {
 #else
 struct another {
 #endif
.fi
.in -5
.sp
The
.I -a
option makes
.I hat
treat ALL #if[n]defs, STRICTLY on the basis of command line flags.  #if
sections will still be unconditionally parsed, as will
their #else clauses.  In this case, -c.<symbol>
specifications will be ineffective.
.sp
It is generally preferable to allow
.I /lib/cpp
to resolve things.  In the vast majority of cases, syntax errors from constructs
such as the one given above cause no problems, or a missed symbol or two
at most.
.sp
The
.I -f
option (forget) causes a symbol to be ignored for analytical purposes.
Neither definitions of, nor references to, this symbol will show up.
.sp
The
.I -n
option (negate) is the same as
.I -f
in effect, except that it cannot be mingled with the files, or used as
a toggle.  The actual difference is that
.I -n
is handled by the analyzer program rather than the parser, and is
thus available for other programs feeding the analyzer.
.sp
The
.I -p
option causes #define lines for the symbol to be inserted into the
input for
.I /lib/cpp
before each file.  -p-<symbol> causes #undef lines to be inserted.
.sp
The
.I -i, -f, -c, -a, -p
and
.I -q
options may be intermingled with the files to control these features
on a file-by-file basis.  They actually act as toggles, the
second invocation turning the feature off again, the third turning
it on again, and so on.  In the case of
.I -f,
the second invocation allows the symbol to be considered again.
Repeated uses of
.I -p, -c
override the previous disposition of the symbol.
.sp
The
.I -x
option suppresses use of
.I /lib/cpp.
This is appropriate if none of the header files contain any
conditional compilation constructs, allowing one less process
to be spawned.
It can also be used to remedy problems arising from too much
input (most likely too many #define's) for
.I /lib/cpp
to handle.  The parser output will simply be fed directly
into the analyzer.  If this is used, and the files DO contain
conditional sections, the result will be that all sections
will be analyzed, however #ifdef'ed, unless explicitly
suppressed with the other options.  This may be useful as
some kind of "worst-case" dependency independent of conditional
compilation.  This option will have the same effect anywhere in
the argument list, and makes any use of
.I -i, -p
options and all unrecognized options irrelevant.
.sp
The
.I -v
option specifies a numeric level for tracing.  Positive numbers
indicate trace levels (1-5) for the parsing of the files.  Negative
numbers indicate levels for the analysis (-1 through -4).
.I -v
is equivalent to
.I -v1.
Only someone debugging the program is likely to be interested
in trace levels with absolute value > 1.  The default is level 0 -
no tracing for either parsing or analysis.
.sp
As mentioned before, all unrecognized options are passed on to
.I /lib/cpp.
The most common one used will probably be -D options to drive
definitions for conditional compilation.
.SH "ANALYZER SYNTAX"
The analyzer part of the program is actually a separate entity that can
be used to process files, references, and definitions from any source, not
just the
.I hat
parser program.  If you run "hat -v1 -x ..." it will print the pipeline
it is executing, which consists of the parser being fed into the analyzer
program.  The names and locations of these programs are configurable locally,
so you will either have to do this or check the local installation to figure
out where the analyzer is, and what it is named.
.sp
The
.I coat
command produces this sort of analysis of references in objects or libraries by
using the output of
.I nm
massaged appropriately (via
.I sed),
and fed into the analyzer.  It is actually just
a short shell script, and will probably provide a good example.
.sp
The analyzer program reads standard input, and simply ignores lines not
beginning with "@".  The syntax is very simple:
.sp
@=<filename> - to specify a new file.
.sp
@!<symbol> - current file defines a symbol.
.sp
@?<symbol> - current file references a symbol.
.sp
@< and @> may be used to bracket stuff which should be ignored.  Inside
a @<, the only significant lines will be @< (an error), and @> (close section).
.sp
The <filename> or <symbol> may optionally have white space and quotes
around them.  Note that the quotes are only treated as whitespace
characters - there is no mechanism to include whitespace characters in
the symbol or filename.  Use of the quotes is simply a mechanism to
prevent expansion by
.I /lib/cpp,
which is also the reason for use of non-alphanumerics in the
rest of the syntax.
.sp
For instance, if you input a @= and @! for each named node, followed by
a @? for each arc starting at the given node, the analyzer could be used
to generate a topological sort of a general directed graph.  It is the
author's belief that the cycles generated are a fundamental set, also,
although he won't swear to it without further analysis.
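.sp
For instance, under this scheme the minimal input
.sp
.in +5
.nf
 @=A
 @!A
 @?B
 @=B
 @!B
.fi
.in -5
.sp
describes a single arc from node A to node B; the default sort lists
B (the defining file) before A, and
.I -z
reverses the order.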
.SH DIAGNOSTICS
The only diagnostics tied to a file and line number come from the parser, and are
self-explanatory, except for "syntax error".  If the latter happens,
parsing resumes at any point that the parser can make sense out of the file,
and may not affect much of anything.
.sp
Errors in the analyzer input, which the parser should never produce,
cause fatal error messages.
.SH EXAMPLES
hat -DSYSV *.h
.sp
Will tell you massive amounts of information about the header files in
the current directory.  The -DSYSV option is given to /lib/cpp to
control ifdef's in the input.
.sp
hat f1.h f2.h -fBUGABOO f3.h -fBUGABOO ....
.sp
Analyzes ignoring definitions / references of BUGABOO in f3.h.
.sp
hat -sed *.h -q *.c
.sp
Print only the dependency list and expanded dependency list for the files,
and treat all unrecognized syntax in the *.c's as references.  This does
a pretty fair job of telling you what header files are referenced by the
various .c files, as well as generating the order they should be included
in.
.sp
hat -x -cFUNKY -c-STUFF *.h
.sp
Analyze without passing through /lib/cpp; FUNKY is defined, STUFF is not,
and both branches will be analyzed for any other #ifdef constructs.
.sp
hat .... -c-UGLY hack.h -c.UGLY ....
.sp
In hack.h only, UGLY is undefined for #ifdef constructs, presumably to
resolve some syntax error which would be generated otherwise.
.SH BUGS
The parser is in no way, shape, or form intended to check proper
C syntax.
Since it is only looking for certain constructs, it accepts
anything else as irrelevant stuff and ignores it.  Even within the
constructs it is looking for, it is only interested in certain
expected pieces of the syntax, and will actually accept all sorts
of meaningless trash ("register static auto long unsigned short
short double" gets taken to be a reasonable
type declaration for instance - as far as this analysis is
concerned, that is no different than saying "int".  Or
typedef +++% int += bar; is taken as a perfectly rational typedef,
since characters not needed to distinguish what the tool is looking
for are treated as simple white space).
.sp
On a more objectionable level, the syntax which it tries to recognize is
a mixture of preprocessor syntax and C language proper.  Interactions
between the two which result in perfectly compilable C may make
.I hat
see syntax errors, or fool it into a wrong interpretation of a symbol
as a definition or a reference.  The author got the grammar in shape
by testing it on /usr/include, /usr/include/sys, and some large
local header directories until it only got a few syntax errors.
Specifically, it gets none on the local /usr/include, and one on the
local /usr/include/sys, which could have been remedied by an appropriate
use of -c.  The causes in local files were questionable constructs:
.in +5
.sp
Placement of the datatype portion of a typedef in a header
file to be included in front of the actual names being defined
in the source file.
.sp
Use of a macro to provide a portion of the syntax for a typedef.
.in -5
.sp
Making it really bulletproof would involve essentially doing all the
work of
.I /lib/cpp
while simultaneously realizing that macro expansions are really
references to the macro name, and so on.  Then the analysis could
come out different with different file orders - exactly what the
tool is trying to figure out in the first place.
As it stands, it generally does pretty well,
occasionally getting a syntax error or misinterpreting something.
The stuff that one "normally" places in header files works pretty well.
.sp
For a large number of files (or a number of large files), you may be
forced to use the -x option because you will blow up
.I /lib/cpp.
Suspect this if you get some message about "too
many define's", or some such thing.  The author found that the Pyramid
version quit with an error at around 3000 #define's.
.sp
Line numbers on diagnostics coming from
.I /lib/cpp
may not match up to the original file because of accumulated "@" lines
exceeding the number of newlines which were in the original file at
that point (the program tries to "soak up" the additional lines every
time it comes to some suppressible newlines).  The
.I /lib/cpp
on some machines may not pay any attention to #line
directives for generating its error messages, still giving you
a line number in reference to stdin, and simply passing on line
number information to its output.
In this case the line numbers for
.I /lib/cpp
messages will be entirely bogus.
The author tested this on two systems, Pyramid OSx and Sun 3/60.  The Sun
.I /lib/cpp
generates messages based on the #line directives, and maintains an approximate
correctness, while the Pyramid version simply counts from
the start of input, no matter what #line directives you inserted.
Redefinitions may be ignored, anyway, since the analysis will
tell you about them in gory detail.
.sp
The analyzer is a dynamic memory pig.  It builds a symbol table containing
all definitions and references for every #define, structure name and
typedef across the entire set of files.  On a system with limited
memory, the program will probably halt with an "out of memory"
message on some number of input files which seems perfectly
reasonable to handle.
.sp
If there is a conceivable way that lines beginning with "@" other than
those inserted by the parser could reach the analyzer, it will cause
problems.
.SH "SEE ALSO"
.I coat(local)
.SH AUTHOR
Robert L. McQueer, bobm@rtech.
SHAR_EOF
fi
echo shar: "extracting 'Makefile'" '(2177 characters)'
if test -f 'Makefile'
then
	echo shar: "will not over-write existing file 'Makefile'"
else
cat << \SHAR_EOF > 'Makefile'
#
# libraries.  You will want the bobm.a utility library, wherever
# you decided to put it, and the lex library.
#
LIBS = $(HOME)/lib/bobm.a -ll

#
# -d is needed to generate y.tab.h for the scanner, and for keycheck.c
#
YFLAGS = -d

#
# don't know what all you'll have to do for SYSV, other than
# -Dindex=strchr -Drindex=strrchr
#
# if you want to dink with the keywords recognized, maybe add special
# ones for your c compiler, see keycheck.c
#
CFLAGS = -O

#
# LOCAL CONFIGURATION
#
# These definitions also drive the making of a header file.  HATDIR is the
# directory you want the analyzer and parser placed in, BINDIR is the
# directory you want the command programs to be placed in.  PARSER and
# ANALYZER are the names you want to give those respective executables.
# CPPCMD is the c-preprocessor, and SHELLCMD a shell which will be
# execl'ed to execute the PARSER | [CPPCMD] | ANALYZER pipe.
# MANDIR is where to put the manual pages.
#
# Some of the definitions will be placed in header file localnames.h, and
# moved to lastnames.h after compiling hat.c
#
MANDIR = .
HATDIR = $(HOME)/bin
BINDIR =  $(HOME)/bin
PARSER = hat_p
ANALYZER = hat_a
CPPCMD = "/lib/cpp"
SHELLCMD = "/bin/sh"

#
# object lists for the three executables - no remarks from the peanut gallery
# concerning the analyzer abbreviation :-).
#
ANALOBJ = amain.o anread.o table.o analyze.o topsort.o listsort.o
PARSOBJ = parse.o scan.o pmain.o keycheck.o
HATOBJ = hat.o

all: hat parser anal coat man

parser: $(PARSOBJ)
	cc -o $(PARSER) $(PARSOBJ) $(LIBS)
	mv $(PARSER) $(HATDIR)

anal:	$(ANALOBJ)
	cc -o $(ANALYZER) $(ANALOBJ) $(LIBS)
	mv $(ANALYZER) $(HATDIR)

hat:	$(HATOBJ)
	cc -o hat $(HATOBJ)
	mv hat $(BINDIR)

coat:
	sed -e "s,ANALYZER,$(HATDIR)/$(ANALYZER)," coat.tpl >coat
	chmod 755 coat
	mv coat $(BINDIR)

man:
	cp hat.1 $(MANDIR)
	cp coat.1 $(MANDIR)

hat.o:
	echo "#define HATDIR \"$(HATDIR)\"" >localnames.h
	echo "#define PARSER \"$(PARSER)\"" >>localnames.h
	echo "#define ANALYZER \"$(ANALYZER)\"" >>localnames.h
	echo "#define CPPCMD \"$(CPPCMD)\"" >>localnames.h
	echo "#define SHELLCMD \"$(SHELLCMD)\"" >>localnames.h
	cc $(CFLAGS) -c hat.c
	mv localnames.h lastnames.h
SHAR_EOF
fi
echo shar: "extracting 'coat.tpl'" '(321 characters)'
if test -f 'coat.tpl'
then
	echo shar: "will not over-write existing file 'coat.tpl'"
else
cat << \SHAR_EOF > 'coat.tpl'
#!/bin/sh

TMP=/tmp/coat.$$
NMOUT=/tmp/coat.nm.$$
OPT=

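# Pass option arguments through to the analyzer.  For each object or
# library, emit an "@=" file line, turn globally defined symbols
# (nm types T, B, D and C) into "@!" lines, and undefined symbols
# (type U) into "@?" lines, stripping the prepended underscore.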
for x in $*
do
	case $x in
	-*) OPT="$OPT $x" ;;
	*) echo "@=$x" >>$TMP
		echo $x >&2
		nm -g $x >$NMOUT
		grep " [TBDC] " $NMOUT | sed -e "s/^.* _/@!/" >>$TMP
		grep " U " $NMOUT | sed -e "s/^.* _/@?/" >>$TMP
	esac
done

cat $TMP | ANALYZER -z $OPT
rm $TMP $NMOUT
SHAR_EOF
chmod +x 'coat.tpl'
fi
echo shar: "extracting 'scan.l'" '(9868 characters)'
if test -f 'scan.l'
then
	echo shar: "will not over-write existing file 'scan.l'"
else
cat << \SHAR_EOF > 'scan.l'
 
 /*
 **
 ** Copyright (c) 1988, Robert L. McQueer
 ** 	All Rights Reserved
 **
 ** Permission granted for use, modification and redistribution of this
 ** software provided that no use is made for commercial gain without the
 ** written consent of the author, that all copyright notices remain intact,
 ** and that all changes are clearly documented.  No warranty of any kind
 ** concerning any use which may be made of this software is offered or implied.
 **
 */

 extern int Diag_line;
 extern char *Diag_file;
 extern int Iflag;
 extern int Xflag;
 extern int Aflag;
 extern int Cflag;
 extern int Verbosity;

 extern char *Ftab;

 extern char Fextra[];

 int Add_line;	/* referenced by yyparse() */

 static int Outflag;
 static int Oldstate;
 static int Close_include;

 static int Squelch;

 /*
 ** ifdef stack.  Cheap to specify a large number of nesting levels,
 ** so we don't bother making this configurable.  Number allowed
 ** is 32 times the length of the array (it's a bit map)
 ** Estack length matches Ifstack, and simply says whether the
 ** else goes with an if we are processing or not.
 */
 static unsigned long Ifstack[50];	/* yep, 1600 nested ifdefs! */
 static unsigned long Estack[50];
 static int Ifidx;

 /*
 ** a local stdio.h is used to pull in yytab.h and define
 ** MYPUTS(), which outputs a string conditionally on the
 ** setting of Outflag.  Also defines SQRET as a conditional
 ** return based on Squelch setting.
 */

 /*
 ** NOTES on interaction with yyparse:
 **
 ** returning WORD, NAWORD or AWORD indicates a string constant
 ** has been queued up using q_str().  It is up to the parser
 ** to dequeue the returned strings without overflowing the queue.
 **
 ** All /lib/cpp syntax is delimited for yyparse with the token
 ** CPPEND once we reach the end of the # line, or possible
 ** continuations for #define constructs.  Separate tokens
 ** DEFINE and CPP distinguish #define lines from other cpp
 ** lines.  States <PREPRO>, <DEF1> and <INDEF> are used for this.
 **
 ** Squelch controls #ifdef treatment.  If set, we simply continue
 ** scanning rather than passing tokens back to yyparse.  Thus
 ** we turn off #ifdef'ed out sections by simply not allowing the
 ** parser to see them.  Squelch is changed in such a manner as
 ** to send back or suppress things only on CPP boundaries. An
 ** entire CPP -> CPPEND statement is suppressed when we suppress,
 ** together with everything until the CPP -> CPPEND sequence
 ** turning it back on again, which is sent back in its entirety.
 **
 ** MYPUTS controls output of /lib/cpp constructs.  The scanner
 ** outputs all the /lib/cpp lines, unless the Xflag is on.  It
 ** also outputs "@<" , "@>" lines around includes if Iflag
 ** is on, and outputs the "@=" line, a "#line" directive, and
 ** any optional stuff specified by the command line at the
 ** beginning of each file.  The parser outputs "@!", "@?" lines
 ** only.
 **
 ** Much stuff is never seen by the parser.  Many characters such
 ** as +, -, (, ), = which are not important to the constructs the
 ** parser is looking for are treated as simple white space.  Numeric,
 ** single-quote and double-quote constants are also ignored, being
 ** treated much as commentary.  <COMMENT> and <QUOTE> states apply
 ** to this.  Note that they have to resume an old state, rather
 ** than an unconditional 0 to handle comment and quote constants
 ** within cpp syntax.
 **
 ** AWORD, NAWORD tokens apply only to the word following #define.
 ** AWORD indicates an argument list, NAWORD none.  ')' is returned
 ** as a token (MEND) only inside #define's, allowing the parser
 ** to pick up argument lists.
 **
 ** Obvious item references (.<something> or -><something>) are
 ** also thrown out inside #defines, so we don't see them as
 ** symbol references.
 **
 ** Add_line is a mechanism to attempt to make line numbers
 ** match up for /lib/cpp.  Every time we generate a spare \n
 ** for a @ line, we bump it.  When an "optional" newline comes along,
 ** we decrement it if > 0, or output a newline.
 */

%Start COMMENT QUOTE INDEF DEF1 PREPRO
%%
<COMMENT>\n		{
				if (!Outflag && !Xflag)
				{
					if (Add_line > 0)
						--Add_line;
					else
						fputs(yytext,stdout);
				}
				++Diag_line;
			}
<COMMENT>[^*\n]+	;
<COMMENT>\*\/		BEGIN Oldstate;
<COMMENT>\*		;

<QUOTE>\n		{
				MYPUTS("\n");
				++Diag_line;
				diagnostic("unclosed quote");
				BEGIN Oldstate;
			}
<QUOTE>[^\\"\n]+	MYPUTS(yytext);
<QUOTE>\\\\		MYPUTS(yytext);
<QUOTE>\\\"		MYPUTS(yytext);
<QUOTE>\"		{
				BEGIN Oldstate;
				MYPUTS(yytext);
			}
<QUOTE>\\\n		{
				++Diag_line;
				MYPUTS(yytext);
			}
<PREPRO>\n		{
				++Diag_line;
				if (Close_include)
				{
					MYPUTS("\n@>\n");
					Add_line += 2;
				}
				else
					MYPUTS("\n");
				Close_include = Outflag = Oldstate = 0;
				BEGIN 0;
				SQRET(CPPEND);
			}

<DEF1>[A-Za-z_][A-Za-z0-9_]*\(	{
					MYPUTS(yytext);
					yytext[yyleng-1] = '\0';
					BEGIN INDEF;
					if (!Squelch)
					{
						q_str(yytext);
						SQRET(AWORD);
					}
				}

<DEF1>[A-Za-z_][A-Za-z0-9_]*	{
					MYPUTS(yytext);
					BEGIN INDEF;
					if (!Squelch)
					{
						q_str(yytext);
						SQRET(NAWORD);
					}
				}

<DEF1>\\\n	MYPUTS(yytext);
<DEF1>[ \t]+	MYPUTS(yytext);
<DEF1>.		{
			diagnostic("bizarre #define - can't find symbol");
			MYPUTS(yytext);
			BEGIN 0;
			REJECT;
		}

<INDEF>\\\n		{
				++Diag_line;
				MYPUTS(yytext);
			}
<INDEF>\)		{
				MYPUTS(yytext);
				SQRET(MEND);
			}

<INDEF>\-\>[A-Za-z0-9_]*	MYPUTS(yytext);
<INDEF>\.[A-Za-z0-9_]*		MYPUTS(yytext);

<INDEF>\\\\		MYPUTS(yytext);
<INDEF>\n		{
				++Diag_line;
				MYPUTS("\n");
				Outflag = Oldstate = 0;
				BEGIN 0;
				SQRET(CPPEND);
			}

\/\*	BEGIN COMMENT;
\"	{
		MYPUTS("\"");
		BEGIN QUOTE;
	}

^\#[ \t]*define		{
				Oldstate = INDEF;
				BEGIN DEF1;
				Outflag = ! Xflag;
				MYPUTS(yytext);
				SQRET(DEFINE);
			}
^\#[ \t]*include	{
				if (Iflag)
				{
					Close_include = 1;
					Outflag = 1;
					MYPUTS("\n@<\n");
					Add_line += 2;
					MYPUTS(yytext);
				}
				else
					Close_include = 0;
				Oldstate = PREPRO;
				BEGIN PREPRO;
				SQRET(CPP);
			}

^\#[ \t]*ifdef[ \t]*[A-Za-z0-9_]+	{
						Outflag = ! Xflag;
						MYPUTS(yytext);
						if (Cflag || Aflag)
							do_ifd(yytext,0);
						Oldstate = PREPRO;
						BEGIN PREPRO;
						SQRET(CPP);
					}
^\#[ \t]*ifndef[ \t]*[A-Za-z0-9_]+	{
						Outflag = ! Xflag;
						MYPUTS(yytext);
						if (Cflag || Aflag)
							do_ifd(yytext,1);
						Oldstate = PREPRO;
						BEGIN PREPRO;
						SQRET(CPP);
					}
^\#[ \t]*if		{
				Outflag = ! Xflag;
				MYPUTS(yytext);

				if (Cflag || Aflag)
					do_if();
				Oldstate = PREPRO;
				BEGIN PREPRO;
				SQRET(CPP);
			}
^\#[ \t]*else		{
				Outflag = ! Xflag;
				MYPUTS(yytext);
				if (Cflag || Aflag)
					do_else();
				Oldstate = PREPRO;
				BEGIN PREPRO;
				SQRET(CPP);
			}
^\#[ \t]*endif		{
				Outflag = ! Xflag;
				MYPUTS(yytext);
				if (Cflag || Aflag)
					do_end();
				Oldstate = PREPRO;
				BEGIN PREPRO;
				SQRET(CPP);
			}

^\#	{
		Oldstate = PREPRO;
		Outflag = ! Xflag;
		BEGIN PREPRO;
		MYPUTS("#");
		SQRET(CPP);
	}

\;	{
		MYPUTS(yytext);
		SQRET(SEMICOLON);
	}
\{	{
		MYPUTS(yytext);
		SQRET(LBRACE);
	}
\}	{
		MYPUTS(yytext);
		SQRET(RBRACE);
	}
\[	{
		MYPUTS(yytext);
		SQRET(LSQ);
	}
\]	{
		MYPUTS(yytext);
		SQRET(RSQ);
	}
\,	{
		MYPUTS(yytext);
		SQRET(COMMA);
	}

[0-9]+\.[0-9]*[Ee][+-]*[0-9]	MYPUTS(yytext);
[0-9]+[Ee][+-]*[0-9]		MYPUTS(yytext);

[0-9]+\.[0-9]*		MYPUTS(yytext);
[0-9]+L			MYPUTS(yytext);
[0-9]+			MYPUTS(yytext);
0[Xx][a-fA-F0-9]*L	MYPUTS(yytext);
0[Xx][a-fA-F0-9]*	MYPUTS(yytext);

\'\\\'\'	MYPUTS(yytext);
\'.*\'		MYPUTS(yytext);

[A-Za-z_][A-Za-z0-9_]*	{
				int i;

				MYPUTS(yytext);
				if (!Squelch)
				{
					if ((i = keycheck(yytext)) == WORD)
						q_str(yytext);
					SQRET(i);
				}
			}

\n	{
		if (!Outflag && !Xflag)
		{
			if (Add_line > 0)
				--Add_line;
			else
				fputs(yytext,stdout);
		}
		else
			MYPUTS(yytext);
		++Diag_line;
	}
.	MYPUTS(yytext);
%%

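/*
** trace wrapper - when Verbosity > 4, SQRET (see the local stdio.h)
** routes returned tokens through this
*/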
static
tok_out(i)
int i;
{
	diagnostic("scanned token %d",i);
	return (i);
}

/*
** called on each new file, to reset the scanner
*/
init_lex(name)
char *name;
{
	Add_line = Close_include = Ifidx = Squelch = Outflag = Oldstate = 0;

	printf("@=\"%s\"\n",name);
	if (! Xflag)
		printf("%s#line 1 \"%s\"\n",Fextra,name);
	Diag_line = 1;
	Diag_file = name;
}

char *strtok();
char *htab_find();

/*
** do_ifd is destructive, so it must be called AFTER output of text
*/
static
do_ifd(s,rev)
char *s;
int rev;
{
	int idx;
	int shift;

	idx = Ifidx/32;
	shift = Ifidx % 32;

	if (Squelch)
		Ifstack[idx] |= 1L << shift;
	else
		Ifstack[idx] &= ~(1L << shift);

	++Ifidx;

	if (Squelch)
	{
		Estack[idx] &= ~(1L << shift);
		return;
	}

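	/*
	** isolate the symbol, then probe the flag table: a '+' prefix
	** marks a symbol forced defined (-c<sym>), a '-' prefix one
	** forced undefined (-c-<sym>).  With -a, symbols found under
	** neither prefix count as undefined.
	*/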
	strtok(s," \t#");
	s = strtok(NULL," \t");
	--s;			/* we KNOW the strtok's bumped the string */
	*s = '+';
	if (htab_find(Ftab,s) != NULL)
	{
		if (rev)
			Squelch = 1;
		else
			Squelch = 0;
		Estack[idx] |= 1L << shift;
	}
	else
	{
		*s = '-';
		if (htab_find(Ftab,s) != NULL || Aflag)
		{
			if (rev)
				Squelch = 0;
			else
				Squelch = 1;
			Estack[idx] |= 1L << shift;
		}
		else
			Estack[idx] &= ~(1L << shift);
	}
}

/*
** ignore sense of all #if statements.
*/
static
do_if()
{
	int idx;
	int shift;

	idx = Ifidx/32;
	shift = Ifidx % 32;

	if (Squelch)
		Ifstack[idx] |= 1L << shift;
	else
		Ifstack[idx] &= ~(1L << shift);

	++Ifidx;

	Estack[idx] &= ~(1L << shift);
}

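/*
** pop the ifdef stack, restoring the Squelch state that was in
** effect when the matching #if[n]def was pushed.
*/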
static
do_end()
{
	int idx;
	int shift;

	if (Ifidx == 0)
	{
		diagnostic("unmatched #endif");
		Squelch = 0;
		return;
	}

	--Ifidx;
	idx = Ifidx/32;
	shift = Ifidx % 32;

	Squelch = (Ifstack[idx] >> shift) & 1;
}

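/*
** #else toggles Squelch only when the matching #if[n]def was resolved
** from command line flags (its Estack bit is set); conditionals left
** to /lib/cpp keep both branches visible.
*/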
static
do_else()
{
	int idx;
	int shift;

	if (Ifidx == 0)
	{
		diagnostic("unmatched #else");
		Squelch = 0;
		return;
	}

	idx = Ifidx - 1;
	shift = idx % 32;
	idx /= 32;

	if ((Estack[idx] >> shift) & 1)
		Squelch = ! Squelch;
}
SHAR_EOF
fi
echo shar: "extracting 'parse.y'" '(6842 characters)'
if test -f 'parse.y'
then
	echo shar: "will not over-write existing file 'parse.y'"
else
cat << \SHAR_EOF > 'parse.y'

/*
**
**	Copyright (c) 1988, Robert L. McQueer
**		All Rights Reserved
**
** Permission granted for use, modification and redistribution of this
** software provided that no use is made for commercial gain without the
** written consent of the author, that all copyright notices remain intact,
** and that all changes are clearly documented.  No warranty of any kind
** concerning any use which may be made of this software is offered or implied.
**
*/

%token TYPEDEF EXTERN STRUCT ENUM
%token DEFINE WORD CPP CPPEND MEND AWORD NAWORD
%token ADJ STCLASS NTYPE KEYWORD
%token LSQ RSQ LBRACE RBRACE SEMICOLON COMMA
%right ADJ NTYPE
%%
file	:
		{
			p_init();
		}
	| file blurb
	;

blurb	: TYPEDEF st tdef tlist SEMICOLON
	| EXTERN ext SEMICOLON
	| pre
	| WORD
		{
			if (Qflag)
				r_out(next_str());
			else
				next_str();
		}
	| COMMA
	| LBRACE
	| RBRACE
	| SEMICOLON
	| KEYWORD
	| STCLASS
	| native
	| s1def
	| e1def
	| arrdim
	| error
		{
			Sn_def = 0;
			Head = Tail = 0;
			p_init();
		}
	;

pre	: DEFINE NAWORD
		{
			d_enter(next_str());
		}
		macro CPPEND
		{
			do_refs();
		}
	| DEFINE AWORD
		{
			d_enter(next_str());
		} margs MEND
		macro CPPEND
		{
			do_refs();
		}
	| CPP pjunk CPPEND
	;

margs	: 
	| mlist
	;

mlist	: WORD
		{
			a_enter(next_str());
		}
	| mlist COMMA WORD
		{
			a_enter(next_str());
		}
	;

s1def	: snword LBRACE
		{
			d_out(next_str());
		}
		struct RBRACE
	| STRUCT LBRACE struct RBRACE
	| snword WORD
		{
			r_out(next_str());
			next_str();
		}
	;

snword	: STRUCT WORD
	;

e1def	: eword LBRACE
		{
			d_out(next_str());
		}
		elist RBRACE
	| ENUM LBRACE elist RBRACE
	| eword WORD
		{
			r_out(next_str());
			next_str();
		}
	;

eword	: ENUM WORD
	;

st	:
	| st STCLASS
	;

tdef	: s2def
	| e2def
	| WORD
		{
			r_out(next_str());
			Sn_def = 0;
		}
	| native
		{
			Sn_def = 0;
		}
	;

elist	: 
		{
			Ecount = 0;
		}
	| elist WORD
		{
			if (Ecount)
				r_out(next_str());
			else
				d_out(next_str());
			++Ecount;
		}
	| elist COMMA
		{
			Ecount = 0;
		}
	;

native	: NTYPE
	| ADJ
	| ADJ native
	;

s2def	: snword LBRACE
		{
			strcpy(Sname,next_str());
			d_out(Sname);
			Sn_def = 1;
		}
		struct RBRACE
	| STRUCT LBRACE struct RBRACE
		{
			Sn_def = 0;
		}
	| snword WORD
		{
			Sn_def = 0;
			r_out(next_str());
			d_out(next_str());
		}
	;

e2def	: eword LBRACE
		{
			strcpy(Sname,next_str());
			d_out(Sname);
			Sn_def = 1;
		}
		elist RBRACE
	| ENUM LBRACE elist RBRACE
		{
			Sn_def = 0;
		}
	| eword WORD
		{
			Sn_def = 0;
			r_out(next_str());
			d_out(next_str());
		}
	;

struct	:
	| struct pre
	| struct s1def items SEMICOLON
	| struct e1def items SEMICOLON
	| struct WORD
		{
			r_out(next_str());
		} items SEMICOLON
	| struct native items SEMICOLON;
	;

items	:
	| items COMMA
	| items WORD
		{
			if (Qflag)
				d_out(next_str());
			else
				next_str();
		}
	| items arrdim
	;

tlist	:
	| tlist COMMA
	| tlist WORD
		{
			char *ptr;

			ptr = next_str();
			if (! Sn_def || strcmp(ptr,Sname) != 0)
				d_out(ptr);
		} tlarr
	;

tlarr	:
	| tlarr arrdim
	;

ext	: WORD
		{
			r_out(next_str());
		}
		ejunk
	| STRUCT WORD 
		{
			r_out(next_str());
		}
		ejunk
	| ENUM WORD
		{
			r_out(next_str());
		}
		ejunk
	| native ejunk
	;

ejunk	: 
	| ejunk WORD
		{
			next_str();
		}
	| ejunk LBRACE ebal RBRACE
	| ejunk STRUCT
	| ejunk COMMA
	| ejunk arrdim
	;

ebal	:
	| ebal LBRACE ebal RBRACE
	| ebal WORD
		{
			next_str();
		}
	| ebal COMMA
	| ebal SEMICOLON
	| ebal STRUCT
	| ebal arrdim
	| ebal native
	;

macro	:
	| macro WORD
		{
			r_enter(next_str());
		}
	| macro COMMA
	| macro SEMICOLON
	| macro LBRACE
	| macro RBRACE
	| macro STRUCT
	| macro LSQ
	| macro KEYWORD
	| macro RSQ
	| macro MEND
	| macro native
	| macro STCLASS
	| macro TYPEDEF
	| macro EXTERN
	| macro ENUM
	;

pjunk	:
	| pjunk WORD
		{
			next_str();
		}
	| pjunk COMMA
	| pjunk SEMICOLON
	| pjunk LBRACE
	| pjunk RBRACE
	| pjunk STRUCT
	| pjunk DEFINE
	| pjunk TYPEDEF
	| pjunk ENUM
	| pjunk EXTERN
	| pjunk LSQ
	| pjunk RSQ
	| pjunk native
	| pjunk STCLASS
	| pjunk KEYWORD
	;

arrdim	: LSQ dstuff RSQ
	;

dstuff	:
	| dstuff COMMA
	| dstuff WORD
		{
			r_out(next_str());
		}
	| dstuff STRUCT
	| dstuff native
	| dstuff KEYWORD
	;
%%

#include <stdio.h>
#include "config.h"


/*
** yyparse can look ahead one token.  Must make SDEPTH
** sufficient for all tokens parsed before calling next_str(), taking
** this into account.  I think 2 would actually work, currently.
*/
#define SDEPTH 3

static char Sq[SDEPTH][BUFSIZ];
static int Tail = 0;
static int Head = 0;

static int Sn_def;
static int Ecount;

static char Sname[BUFSIZ];

static char *Dtab = NULL;
static char *Def;

extern char *Ftab;

extern int Qflag;
extern int Iflag;
extern int Verbosity;

extern int Add_line;	/* see scanner */

char *htab_init();
char *htab_find();

char *str_store();

/* called by yylex() */
q_str(s)
char *s;
{
	strcpy(Sq[Tail],s);
	Tail = (Tail+1)%SDEPTH;
	if (Verbosity > 4)
		diagnostic("PUSH: %s",s);
}

static char *
next_str()
{
	char *ptr;

	ptr = Sq[Head];
	Head = (Head+1)%SDEPTH;
	if (Verbosity > 4)
		diagnostic("NEXT: %s",ptr);
	return (ptr);
}

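/*
** reset per-construct state: (re)initialize the table that tracks a
** macro's arguments and references, and check that the string queue
** stayed in synch with yylex()
*/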
static
p_init()
{
	if (Dtab == NULL)
	{
		Dtab = htab_init(SMALL_TABLE,NULL,NULL,NULL);
		if (Verbosity > 3)
			diagnostic("tab INIT");
	}
	else
	{
		htab_clear(Dtab);
		if (Verbosity > 3)
			diagnostic("tab CLEAR");
		str_free();
	}

	if (Tail != Head)
		diagnostic("OOPS - parser/lex synch problem");
	Head = Tail;
}

static
d_enter(s)
char *s;
{
	if (Verbosity > 2)
		diagnostic("#def: %s",s);
	if (keycheck(s) != WORD)
		diagnostic("Redefining keywords is not a good idea");
	Def = str_store(s);
}

static
a_enter(s)
char *s;
{
	if (Verbosity > 2)
		diagnostic("#arg: %s",s);
	htab_enter(Dtab,str_store(s),"ARG");
	if (Verbosity > 3)
		diagnostic("tab enter '%s', 'ARG'",s);
}

static
r_enter(s)
char *s;
{
	if (Verbosity > 2)
		diagnostic("#ref: %s",s);
	if (htab_find(Dtab,s) == NULL)
	{
		if (Verbosity > 3)
			diagnostic("tab enter '%s', ''",s);
		htab_enter(Dtab,str_store(s),"");
	}
	else
	{
		if (Verbosity > 3)
			diagnostic("tab contains '%s'",s);
	} 
}

static
do_refs()
{
	int i;
	char *k,*d;

	if (Verbosity > 2)
		diagnostic("#end");
	d_out(Def);
	for (i=htab_list(Dtab,1,&d,&k); i != 0; i=htab_list(Dtab,0,&d,&k))
	{
		if (Verbosity > 3)
			diagnostic("tab list '%s', '%s'",k,d);
		if (*d == '\0')
			r_out(k);
	}
	p_init();
}

static
r_out(s)
char *s;
{
	if (Verbosity > 1)
		diagnostic("REF: %s",s);
	if (Ftab != NULL && htab_find(Ftab,s) != NULL)
	{
		if (Verbosity > 1)
			diagnostic("Forget %s",s);
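# ANALYZER is a placeholder - the Makefile substitutes the installed
# analyzer's full path when building coat from this template.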
		return;
	}
	printf("@?\"%s\"\n",s);
	++Add_line;
}


static
d_out(s)
char *s;
{
	if (Verbosity > 1)
		diagnostic("DEF: %s",s);
	if (Ftab != NULL && htab_find(Ftab,s) != NULL)
	{
		if (Verbosity > 1)
			diagnostic("Forget %s",s);
		return;
	}
	printf("@!\"%s\"\n",s);
	++Add_line;
}

yyerror(s)
char *s;
{
	diagnostic(s);
}
SHAR_EOF
fi
echo shar: "extracting 'config.h'" '(1276 characters)'
if test -f 'config.h'
then
	echo shar: "will not over-write existing file 'config.h'"
else
cat << \SHAR_EOF > 'config.h'
/*
** size of hash tables
**
** TABLE_SIZE is for the analysis program's table, which will contain
** entries for every symbol reference, definition, file, etc. - it can
** potentially contain a great many entries.
**
** SMALL_TABLE is for three tables used by the parser - language keywords,
** -f/-c options, and the symbols referenced by a single macro respectively.
** None is expected to require a huge number of entries.
**
** These sizes DO NOT affect how many entries may be made to the tables -
** they use a linked-list arrangement and may exceed 100% density.
**
** Actual table sizes will be adjusted upwards to a prime number.
*/
#define TABLE_SIZE 4000
#define SMALL_TABLE 60

/*
** hash table node allocation block for the analysis program.
*/
#define NODE_BLOCK 200

/*
** buffer size for strings containing lists of files, and lists
** of #define's and #undef's for -p options
*/
#define CATBUFFER 4800

/*
** A couple of things which ought to be defined in stdio.h and
** sys/param.h, in case they aren't.
*/
#ifndef MAXPATHLEN
#define MAXPATHLEN 240
#endif

#ifndef BUFSIZ
#define BUFSIZ 1024
#endif

/*
** longest command line possible using your favorite shell(s)
** Used to allocate buffers for strings constructed from command line
** arguments
*/
#define MAXCMDLEN 4096
SHAR_EOF
fi
echo shar: "extracting 'node.h'" '(882 characters)'
if test -f 'node.h'
then
	echo shar: "will not over-write existing file 'node.h'"
else
cat << \SHAR_EOF > 'node.h'
#define REF 1
#define DEF 2
#define FNAME 3
#define DEP 4
#define DDEF 5
#define UDEF 6
#define CYCLE 7
#define FORGET 8

#define MAXTYPE 8	/* maximum of above node types */

typedef struct
{
	char *name;
	int type;
} KEY;

typedef struct _nd_s
{
	struct _nd_s *next;
	KEY key;
	union
	{
		struct _nd_s *file;	/* file for DEF, DDEF, REF, UREF */
		char *cycle;
		struct
		{
			struct _nd_s *dep;	/* FNAME dependency list */
			int refcount;		/* refcount for use in sort */
			int mark;		/* mark for DFS's */
		} fname;
		struct
		{
			struct _nd_s *dfile;	/* defining file */
			struct _nd_s *rfile;	/* referring file */
			struct _nd_s *next;	/* next list element */
			char *sym;		/* symbol causing dependency */
			int erase;		/* erase edge (cyclic) */
		} dep;
		struct
		{
			char *files;	/* up to MAXCATFILE */
			int count;	/* actual number of files */
		} ud;
	} d;
} NODE;
SHAR_EOF
fi
echo shar: "extracting 'stdio.h'" '(397 characters)'
if test -f 'stdio.h'
then
	echo shar: "will not over-write existing file 'stdio.h'"
else
cat << \SHAR_EOF > 'stdio.h'
/*
** Little trick to get token codes into lex.  lex.yy.c includes "stdio.h".
** In here we define everything needed that can't be indented ahead of the
** lex script %%, and pick up the "real" stdio.h via <stdio.h>.
*/

#include <stdio.h>
#include "y.tab.h"
#include "config.h"

#define MYPUTS(S) if (Outflag) fputs(S,stdout)

#define SQRET(X) if (!Squelch) return(Verbosity > 4 ? tok_out(X) : X)
SHAR_EOF
fi
exit 0
#	End of shell archive

-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.