bobm@rtech.UUCP (Bob Mcqueer) (03/27/88)
Be sure to pick up a third article containing a utility library, too. Read man pages for details. For the discussion that came up in comp.sources.wanted, try: hat -sde <hdr files> -q <.c files> {amdahl, sun, mtxinu, hoptoad, cpsc6a}!rtech!bobm ---------------- #! /bin/sh # This is a shell archive, meaning: # 1. Remove everything above the #! /bin/sh line. # 2. Save the resulting text in a file. # 3. Execute the file with /bin/sh (not csh) to create: # coat.1 # hat.1 # coat.tpl # parse.y # scan.l # Makefile # This archive created: Thu Mar 24 17:21:23 1988 export PATH; PATH=/bin:/usr/bin:$PATH echo shar: "extracting 'coat.1'" '(925 characters)' if test -f 'coat.1' then echo shar: "will not over-write existing file 'coat.1'" else cat << \SHAR_EOF > 'coat.1' .TH COAT LOCAL 3/1/87 .UC .SH NAME coat - "c" object analysis tool .SH SYNOPSIS .B coat [-s[desmu]] [-r[desmu]] [-v<num>] [-z] <files> .SH DESCRIPTION .I Coat produces a topologically sorted dependency list / symbol cross reference for a group of objects or libraries, assuming the convention that the "real" symbol name has an underscore prepended for the linker. All it actually does is massage the output from .I nm(1) to pass it into the analyzer program of .I hat, producing a similar listing. See the .I hat manual page for details. .SH OPTIONS The options shown are the same as for .I hat, except that the number on the -v option should not be negative. See the .I hat manual page for details. .sp The order in the absence of .I -z will be that referring files will be sorted ahead of defining files, ie. the order wanted for .I ld lists. .SH "SEE ALSO" .I hat(local), nm(1) .SH AUTHOR Robert L. McQueer, bobm@rtech. 
SHAR_EOF fi echo shar: "extracting 'hat.1'" '(14793 characters)' if test -f 'hat.1' then echo shar: "will not over-write existing file 'hat.1'" else cat << \SHAR_EOF > 'hat.1' .TH HAT LOCAL 3/1/87 .UC .SH NAME hat - header analysis tool .SH SYNOPSIS .B hat [-s[desmu]] [-r[desmu]] [-z] [-q] [-i] [-x] [-v<n>] [-f<sym>] [-c[-.]<sym>] [-a] [-p[-]<sym>] [<cppopt>] <files> .SH DESCRIPTION .I Hat is a tool for analyzing the #define and typedef statements and structure / union / enum definitions and references in header files and determining their dependencies. It produces five sections of information: .sp First, a list of the files together with the files they depend directly upon, in topological sort order. Each dependency also includes the first symbol that caused the dependency. See next paragraph concerning sort order and cyclical references. .sp Second, an expanded dependency list. For each file, this shows the expanded list that results from descending the dependency tree. If there are cyclical references, this section lists the cycles. Cycles will have been broken arbitrarily in determining the topological sort order. One cycle will be shown for each time a dependency is "ignored" to allow the topological sort to proceed. .sp Third, a symbol cross-reference listing of defines and references. .sp Fourth, a listing of multiply defined symbols. .sp Fifth, a listing of undefined symbols. .sp .I Hat handles preprocessor conditional compilation constructs by the simple expedient of invoking .I /lib/cpp explicitly. What it does is go through the files, throwing out everything except preprocessor syntax, and adding special lines indicating references and definitions. The result is piped through .I /lib/cpp before being analyzed so that #ifdef's affecting what will be defined or referenced will be properly treated by having .I /lib/cpp remove whatever of the special lines have been conditionally compiled out. 
.sp Many of the options are aimed at letting .I hat control the #ifdef's, if desired, and will probably be unused in most cases. For "reasonably" ifdef'ed files (refraining from #ifdef'ing alternate versions of partial syntax), you should be able to simply let .I /lib/cpp do the work, as intended. .sp Normally, #include lines are not passed through, although this can be overridden. It should be overridden only if the #include's in the header files affect how conditional compilations work (a questionable arrangement, in the author's opinion). At any rate, the expanded text resulting from the #include is skipped during the analysis. Note that you could consider .I hat a tool for telling you what nested #include's are necessary in the first place, should you be in the camp that supports using them. .SH OPTIONS All options with attached strings require that the string be part of the argument, eg. "-v4", and not "-v 4". The reason for this is that .I hat passes all unrecognized option arguments on to .I /lib/cpp, and it obviously couldn't know whether a given argument should include a following string or not without building in .I /lib/cpp's argument syntax. Instead, we insist on joining the argument to its option consistently, and allowing any special options the local .I /lib/cpp has to be used as long as they don't conflict with .I hat options. .I Hat stays away from upper case options. .sp The .I -s and .I -r options allow printing of only certain sections of the output. The attached desmu characters indicate the dependency list, expanded dependency list, symbol cross reference, multiple definition and undefined sections respectively. The .I -s option specifies what sections to print, while the .I -r specifies printing of all sections except those given. If multiple specifications are given, only the last is effective, and these options will have the same effect wherever placed in the argument list. .sp The .I -z option reverses the sense of the topological sort. 
Normally, the order presented is defining file before referring file, which is the order you would want for #include lines. Using .I -z causes the order to be referring file followed by defining file. This option is mainly for use of the analyzer with alternate input - for instance, .I coat uses this option to present libraries in link order. .sp The .I -q option specifies that symbols in unrecognized syntax within the files are to be treated as references. .I Hat normally only recognizes #define's, typedef's, externs (which are ignored except for what appears to be the type declaration), array dimension expressions, and struct / union / enum definitions. Everything else is normally ignored since the syntax isn't understood. Using this option will cause every symbol not a keyword or part of understood syntax to be treated as a reference, and for instance may be used to generate references from normal .c code, at the expense of generating many undefined symbols. .sp The .I -i option specifies that #include lines are to be passed on to .I /lib/cpp as discussed above. If none of the #include's affect conditional compilation, the only effect of this option is to make .I /lib/cpp do more work and pass more lines of output to the analysis routines. .sp The .I -c option causes .I hat to go ahead and treat specific #if[n]def's. Normally, stuff on both sides of an ifdef is parsed, allowing .I /lib/cpp to resolve the results. If you specify -c<symbol>, .I hat will act as if that symbol was defined for #if[n]def's. -c-<symbol> makes it specifically undefined, taking the other leg of conditional constructs. -c.<symbol> causes the normal interpretation, ie. both sides of the conditional expression will be parsed. 
This option is mostly intended to allow you to resolve cases where the normal parsing causes syntax errors, eg: .sp .in +5 .nf #ifdef YUCK struct onething { #else struct another { #endif .fi .in -5 .sp The .I -a option makes .I hat treat ALL #if[n]defs, STRICTLY on the basis of command line flags. #if sections will still be unconditionally parsed, as well as their #else clauses. In this case, -c.<symbol> specifications will be ineffective. .sp It is generally preferable to allow .I /lib/cpp to resolve things. In the vast majority of cases, syntax errors from constructs such as the one given above cause no problems, or a missed symbol or two at most. .sp The .I -f option (forget) causes a symbol to be ignored for analytical purposes. Neither definitions of, nor references to, this symbol will show up. .sp The .I -p option causes #define lines for the symbol to be inserted into the input for .I /lib/cpp before each file. -p-<symbol> causes #undef lines to be inserted. .sp The .I -i, -f, -c, -a, -p and .I -q options may be intermingled with the files to control these features on a file-by-file basis. They actually act as toggles, the second invocation turning the feature off again, the third turning it on again, and so on. In the case of .I -f, the second invocation allows the symbol to be considered again. Repeated uses of .I -p, -c override the previous disposition of the symbol. .sp The .I -x option suppresses use of .I /lib/cpp. This is appropriate if none of the header files contain any conditional compilation constructs, allowing one less process to be spawned. It can also be used to remedy problems arising from too much input (most likely too many #define's) for .I /lib/cpp to handle. The parser output will simply be fed directly into the analyzer. If this is used, and the files DO contain conditional sections, the result will be that all sections will be analyzed, however #ifdef'ed, unless explicitly suppressed with the other options. 
This may be useful as some kind of "worst-case" dependency independent of conditional compilation. This option will have the same effect anywhere in the argument list, and makes any use of .I -i, -p options and all unrecognized options irrelevant. .sp The .I -v option specifies a numeric level for tracing. Positive numbers indicate trace levels (1-5) for the parsing of the files. Negative numbers indicate levels for the analysis (-1 through -4). .I -v is equivalent to .I -v1. Nobody but somebody debugging the program will likely be interested in trace levels with absolute value > 1. The default is level 0 - no tracing for either parsing or analysis. .sp As mentioned before, all unrecognized options are passed on to .I /lib/cpp. The most common one used will probably be -D options to drive definitions for conditional compilation. .SH "ANALYZER SYNTAX" The analyzer part of the program is actually a separate entity that can be used to process files, references, and definitions from any source, not just the .I hat parser program. If you run "hat -v1 -x ..." it will print the pipeline it is executing, which consists of the parser being fed into the analyzer program. The names and locations of these programs are configurable locally, so you will either have to do this or ask about the local installation to figure out where the analyzer is, and what it is named. .sp The .I coat command produces this sort of analysis of references in objects or libraries by using the output of .I nm massaged appropriately (via .I sed), and fed into the analyzer. It is actually just a short shell script, and will probably provide a good example. .sp The analyzer program reads standard input, and simply ignores lines not beginning with "@". The syntax is very simple: .sp @=<filename> - to specify a new file. .sp @!<symbol> - current file defines a symbol. .sp @?<symbol> - current file references a symbol. .sp @<, and @> may be used to bracket stuff which should be ignored. 
Inside a @<, the only significant lines will be @< (an error), and @> (close section). .sp The <filename> or <symbol> may optionally have white space and quotes around them. Note that the quotes are only treated as whitespace characters - there is no mechanism to include whitespace characters in the symbol or filename. Use of the quotes is simply a mechanism to prevent expansion by .I /lib/cpp, which is also the reason for use of non-alphanumerics in the rest of the syntax. .sp For instance, if you input a @= and @! for each named node, followed by a @? for each arc starting at the given node, the analyzer could be used to generate a topological sort of a general directed graph. It is the author's belief that the cycles generated are a fundamental set, also, although he won't swear to it without further analysis. .SH DIAGNOSTICS The only file, line number oriented diagnostics come from the parser, and are self-explanatory, except for "syntax error". If the latter happens, parsing resumes at any point that the parser can make sense out of the file, and may not affect much of anything. .sp Errors in the analyzer input, which the parser should never produce, cause fatal error messages. .SH BUGS The parser is in no way, shape, or form intended to check proper C syntax. Since it is only looking for certain constructs, it accepts anything else as irrelevant stuff and ignores it. Even within the constructs it is looking for, it is only interested in certain expected pieces of the syntax, and will actually accept all sorts of meaningless trash ("register static auto long unsigned short short double" gets taken to be a reasonable type declaration for instance - as far as this analysis is concerned, that is no different than saying "int". Or typedef +++% int += bar; is taken as a perfectly rational typedef, since characters not needed to distinguish what the tool is looking for are treated as simple white space). 
.sp On a more objectionable level, the syntax which it tries to recognize is a mixture of preprocessor syntax and C language proper. Interactions between the two which result in perfectly compilable C may make .I hat see syntax errors, or fool it into a wrong interpretation of a symbol as a definition or a reference. The author got the grammar in shape by testing it on /usr/include, /usr/include/sys, and some large local header directories until it only got a few syntax errors. Specifically, it gets none on the local /usr/include, and 1 on the local /usr/include/sys, which could have been remedied by an appropriate use of -c. The causes in local files were questionable constructs: .in +5 .sp Placement of the datatype portion of a typedef in a header file to be included in front of the actual names being defined in the source file. .sp Use of a macro to provide a portion of the syntax for a typedef. .in -5 .sp Making it really bulletproof would involve essentially doing all the work of .I /lib/cpp while simultaneously realizing that macro expansions are really references to the macro name, and so on. Then the analysis could come out different with different file orders - exactly what the tool is trying to figure out in the first place. As it stands, it generally does pretty well, occasionally getting a syntax error or misinterpreting something. The stuff that one "normally" places in header files works pretty well. .sp For a large number of files (or a number of large files), you may be forced to use the -x option because you will blow up .I /lib/cpp. Suspect this if you get some message about "too many define's", or some such thing. The author found that the Pyramid version quit with an error at around 3000 #define's. 
.sp Line numbers on diagnostics coming from .I /lib/cpp may not match up to the original file because of accumulated "@" lines exceeding the number of newlines which were in the original file at that point (the program tries to "soak up" the additional lines every time it comes to some suppressable newlines). The .I /lib/cpp on some machines may not pay any attention to #line directives for generating its error messages, still giving you a line number in reference to stdin, and simply passing on line number information to its output. In this case the line numbers for .I /lib/cpp messages will be entirely bogus. The author tested this on two systems, Pyramid OSx and Sun 3/60. The Sun .I /lib/cpp generates messages based on the #line directives, and maintains an approximate correctness, while the Pyramid version simply gives you messages based from the start of input, no matter what #line directives you inserted. Redefinitions may be ignored, anyway, since the analysis will tell you about them in gory detail. .sp The analyzer is a dynamic memory pig. It builds a symbol table containing all definitions and references for every #define, structure name and typedef across the entire set of files. On a system with limited memory, the program will probably halt with an "out of memory" message on some number of input files which seems perfectly reasonable to handle. .sp If there is a conceivable way that lines beginning with "@" other than those inserted by the parser could reach the analyzer, it will cause problems. .SH "SEE ALSO" .I coat(local) .SH AUTHOR Robert L. McQueer, bobm@rtech. 
SHAR_EOF fi echo shar: "extracting 'coat.tpl'" '(321 characters)' if test -f 'coat.tpl' then echo shar: "will not over-write existing file 'coat.tpl'" else cat << \SHAR_EOF > 'coat.tpl' #!/bin/sh TMP=/tmp/coat.$$ NMOUT=/tmp/coat.nm.$$ OPT= for x in $* do case $x in -*) OPT="$OPT $x" ;; *) echo "@=$x" >>$TMP echo $x >&2 nm -g $x >$NMOUT grep " [TBDC] " $NMOUT | sed -e "s/^.* _/@!/" >>$TMP grep " U " $NMOUT | sed -e "s/^.* _/@?/" >>$TMP esac done cat $TMP | ANALYZER -z $OPT rm $TMP $NMOUT SHAR_EOF chmod +x 'coat.tpl' fi echo shar: "extracting 'parse.y'" '(6842 characters)' if test -f 'parse.y' then echo shar: "will not over-write existing file 'parse.y'" else cat << \SHAR_EOF > 'parse.y' /* ** ** Copyright (c) 1988, Robert L. McQueer ** All Rights Reserved ** ** Permission granted for use, modification and redistribution of this ** software provided that no use is made for commercial gain without the ** written consent of the author, that all copyright notices remain intact, ** and that all changes are clearly documented. No warranty of any kind ** concerning any use which may be made of this software is offered or implied. 
** */ %token TYPEDEF EXTERN STRUCT ENUM %token DEFINE WORD CPP CPPEND MEND AWORD NAWORD %token ADJ STCLASS NTYPE KEYWORD %token LSQ RSQ LBRACE RBRACE SEMICOLON COMMA %right ADJ NTYPE %% file : { p_init(); } | file blurb ; blurb : TYPEDEF st tdef tlist SEMICOLON | EXTERN ext SEMICOLON | pre | WORD { if (Qflag) r_out(next_str()); else next_str(); } | COMMA | LBRACE | RBRACE | SEMICOLON | KEYWORD | STCLASS | native | s1def | e1def | arrdim | error { Sn_def = 0; Head = Tail = 0; p_init(); } ; pre : DEFINE NAWORD { d_enter(next_str()); } macro CPPEND { do_refs(); } | DEFINE AWORD { d_enter(next_str()); } margs MEND macro CPPEND { do_refs(); } | CPP pjunk CPPEND ; margs : | mlist ; mlist : WORD { a_enter(next_str()); } | mlist COMMA WORD { a_enter(next_str()); } ; s1def : snword LBRACE { d_out(next_str()); } struct RBRACE | STRUCT LBRACE struct RBRACE | snword WORD { r_out(next_str()); next_str(); } ; snword : STRUCT WORD ; e1def : eword LBRACE { d_out(next_str()); } elist RBRACE | ENUM LBRACE elist RBRACE | eword WORD { r_out(next_str()); next_str(); } ; eword : ENUM WORD ; st : | st STCLASS ; tdef : s2def | e2def | WORD { r_out(next_str()); Sn_def = 0; } | native { Sn_def = 0; } ; elist : { Ecount = 0; } | elist WORD { if (Ecount) r_out(next_str()); else d_out(next_str()); ++Ecount; } | elist COMMA { Ecount = 0; } ; native : NTYPE | ADJ | ADJ native ; s2def : snword LBRACE { strcpy(Sname,next_str()); d_out(Sname); Sn_def = 1; } struct RBRACE | STRUCT LBRACE struct RBRACE { Sn_def = 0; } | snword WORD { Sn_def = 0; r_out(next_str()); d_out(next_str()); } ; e2def : eword LBRACE { strcpy(Sname,next_str()); d_out(Sname); Sn_def = 1; } elist RBRACE | ENUM LBRACE elist RBRACE { Sn_def = 0; } | eword WORD { Sn_def = 0; r_out(next_str()); d_out(next_str()); } ; struct : | struct pre | struct s1def items SEMICOLON | struct e1def items SEMICOLON | struct WORD { r_out(next_str()); } items SEMICOLON | struct native items SEMICOLON; ; items : | items COMMA | items WORD { if (Qflag) 
d_out(next_str()); else next_str(); } | items arrdim ; tlist : | tlist COMMA | tlist WORD { char *ptr; ptr = next_str(); if (! Sn_def || strcmp(ptr,Sname) != 0) d_out(ptr); } tlarr ; tlarr : | tlarr arrdim ; ext : WORD { r_out(next_str()); } ejunk | STRUCT WORD { r_out(next_str()); } ejunk | ENUM WORD { r_out(next_str()); } ejunk | native ejunk ; ejunk : | ejunk WORD { next_str(); } | ejunk LBRACE ebal RBRACE | ejunk STRUCT | ejunk COMMA | ejunk arrdim ; ebal : | ebal LBRACE ebal RBRACE | ebal WORD { next_str(); } | ebal COMMA | ebal SEMICOLON | ebal STRUCT | ebal arrdim | ebal native ; macro : | macro WORD { r_enter(next_str()); } | macro COMMA | macro SEMICOLON | macro LBRACE | macro RBRACE | macro STRUCT | macro LSQ | macro KEYWORD | macro RSQ | macro MEND | macro native | macro STCLASS | macro TYPEDEF | macro EXTERN | macro ENUM ; pjunk : | pjunk WORD { next_str(); } | pjunk COMMA | pjunk SEMICOLON | pjunk LBRACE | pjunk RBRACE | pjunk STRUCT | pjunk DEFINE | pjunk TYPEDEF | pjunk ENUM | pjunk EXTERN | pjunk LSQ | pjunk RSQ | pjunk native | pjunk STCLASS | pjunk KEYWORD ; arrdim : LSQ dstuff RSQ ; dstuff : | dstuff COMMA | dstuff WORD { r_out(next_str()); } | dstuff STRUCT | dstuff native | dstuff KEYWORD ; %% #include <stdio.h> #include "config.h" /* ** yyparse can look ahead one token. Must make SDEPTH ** sufficient for all tokens parsed before calling next_str(), taking ** this into account. I think 2 would actually work, currently. 
*/ #define SDEPTH 3 static char Sq[SDEPTH][BUFSIZ]; static int Tail = 0; static int Head = 0; static int Sn_def; static int Ecount; static char Sname[BUFSIZ]; static char *Dtab = NULL; static char *Def; extern char *Ftab; extern int Qflag; extern int Iflag; extern int Verbosity; extern int Add_line; /* see scanner */ char *htab_init(); char *htab_find(); char *str_store(); /* called by yylex() */ q_str(s) char *s; { strcpy(Sq[Tail],s); Tail = (Tail+1)%SDEPTH; if (Verbosity > 4) diagnostic("PUSH: %s",s); } static char * next_str() { char *ptr; ptr = Sq[Head]; Head = (Head+1)%SDEPTH; if (Verbosity > 4) diagnostic("NEXT: %s",ptr); return (ptr); } static p_init() { if (Dtab == NULL) { Dtab = htab_init(SMALL_TABLE,NULL,NULL,NULL); if (Verbosity > 3) diagnostic("tab INIT"); } else { htab_clear(Dtab); if (Verbosity > 3) diagnostic("tab CLEAR"); str_free(); } if (Tail != Head) diagnostic("OOPS - parser/lex synch problem"); Head = Tail; } static d_enter(s) char *s; { if (Verbosity > 2) diagnostic("#def: %s",s); if (keycheck(s) != WORD) diagnostic("Redefining keywords is not a good idea"); Def = str_store(s); } static a_enter(s) char *s; { if (Verbosity > 2) diagnostic("#arg: %s",s); htab_enter(Dtab,str_store(s),"ARG"); if (Verbosity > 3) diagnostic("tab enter '%s', 'ARG'",s); } static r_enter(s) char *s; { if (Verbosity > 2) diagnostic("#ref: %s",s); if (htab_find(Dtab,s) == NULL) { if (Verbosity > 3) diagnostic("tab enter '%s', ''",s); htab_enter(Dtab,str_store(s),""); } else { if (Verbosity > 3) diagnostic("tab contains '%s'",s); } } static do_refs() { int i; char *k,*d; if (Verbosity > 2) diagnostic("#end"); d_out(Def); for (i=htab_list(Dtab,1,&d,&k); i != 0; i=htab_list(Dtab,0,&d,&k)) { if (Verbosity > 3) diagnostic("tab list '%s', '%s'",k,d); if (*d == '\0') r_out(k); } p_init(); } static r_out(s) char *s; { if (Verbosity > 1) diagnostic("REF: %s",s); if (Ftab != NULL && htab_find(Ftab,s) != NULL) { if (Verbosity > 1) diagnostic("Forget %s",s); return; } 
printf("@?\"%s\"\n",s); ++Add_line; } static d_out(s) char *s; { if (Verbosity > 1) diagnostic("DEF: %s",s); if (Ftab != NULL && htab_find(Ftab,s) != NULL) { if (Verbosity > 1) diagnostic("Forget %s",s); return; } printf("@!\"%s\"\n",s); ++Add_line; } yyerror(s) char *s; { diagnostic(s); } SHAR_EOF fi echo shar: "extracting 'scan.l'" '(9833 characters)' if test -f 'scan.l' then echo shar: "will not over-write existing file 'scan.l'" else cat << \SHAR_EOF > 'scan.l' /* ** ** Copyright (c) 1988, Robert L. McQueer ** All Rights Reserved ** ** Permission granted for use, modification and redistribution of this ** software provided that no use is made for commercial gain without the ** written consent of the author, that all copyright notices remain intact, ** and that all changes are clearly documented. No warranty of any kind ** concerning any use which may be made of this software is offered or implied. ** */ extern int Diag_line; extern char *Diag_file; extern int Iflag; extern int Xflag; extern int Aflag; extern int Cflag; extern int Verbosity; extern char *Ftab; extern char Fextra[]; int Add_line; /* referenced by yyparse() */ static int Outflag; static int Oldstate; static int Close_include; static int Squelch; /* ** ifdef stack. Cheap to specify a large number of nesting levels, ** so we don't bother making this configurable. Number allowed ** is 32 times the length of the array (it's a bit map) ** Estack length matches Ifstack, and simply says whether the ** else goes with an if we are processing or not. */ static unsigned long Ifstack[50]; /* yep, 1600 nested ifdefs! */ static unsigned long Estack[50]; static int Ifidx; /* ** a local stdio.h is used to pull in yytab.h and define ** MYPUTS(), which outputs a string conditionally on the ** setting of Outflag. Also defines SQRET as a conditional ** return based on Squelch setting. 
*/ /* ** NOTES on interaction with yyparse: ** ** returning WORD, NAWORD or AWORD indicates a string constant ** has been queued up using q_str(). It is up to the parser ** to dequeue the returned strings without overflowing the queue. ** ** All /lib/cpp syntax is delimited for yyparse with the token ** CPPEND once we reach the end of the # line, or possible ** continuations for #define constructs. Separate tokens ** DEFINE and CPP distinguish #define lines from other cpp ** lines. States <PREPRO>, <DEF1> and <INDEF> are used for this. ** ** Squelch controls #ifdef treatment. If set, we simply continue ** scanning rather than passing tokens back to yyparse. Thus ** we turn off #ifdef'ed out sections by simply not allowing the ** parser to see them. Squelch is changed in such a manner as ** to send back or suppress things only on CPP boundaries. An ** entire CPP -> CPPEND statement is suppressed when we suppress, ** together with everything until the CPP -> CPPEND sequence ** turning it back on again, which is sent back in its entirety. ** ** MYPUTS controls output of /lib/cpp constructs. The scanner ** outputs all the /lib/cpp lines, unless the Xflag is on. It ** also outputs "@<" , "@>" lines around includes if Iflag ** is on, and outputs the "@=" line, a "#line" directive, and ** any optional stuff specified by the command line at the ** beginning of each file. The parser outputs "@!", "@?" lines ** only. ** ** Much stuff is never seen by the parser. Many characters such ** as +, -, (, ), = which are not important to the constructs the ** parser is looking for are treated as simple white space. Numeric, ** single-quote and double-quote constants are also ignored, being ** treated much as commentary. <COMMENT> and <QUOTE> states apply ** to this. Note that they have to resume an old state, rather ** than an unconditional 0 to handle comment and quote constants ** within cpp syntax. ** ** AWORD, NAWORD tokens apply only to the word following #define. 
** AWORD indicates and argument list, NAWORD none. ')' is returned ** as a token (MEND) only inside #define's, allowing the parser ** to pick up argument lists. ** ** obvious item reference, .<something> or -><something> are ** also thrown out inside #defines, so we don't see them as ** symbol references. ** ** Add_line is a mechanism to attempt to make line numbers ** match up for /lib/cpp. Every time we generate a spare \n ** for a @ line, we bump it. When an "optional" newline comes along, ** we decrement it if > 0, or output a newline. */ %Start COMMENT QUOTE INDEF DEF1 PREPRO %% <COMMENT>\n { if (!Outflag && !Xflag) { if (Add_line > 0) --Add_line; else fputs(yytext,stdout); } ++Diag_line; } <COMMENT>[^*\n]+ ; <COMMENT>\*\/ BEGIN Oldstate; <COMMENT>\* ; <QUOTE>\n { MYPUTS("\n"); ++Diag_line; diagnostic("unclosed quote"); BEGIN Oldstate; } <QUOTE>[^\\"\n]+ MYPUTS(yytext); <QUOTE>\\\\ MYPUTS(yytext); <QUOTE>\\\" MYPUTS(yytext); <QUOTE>\" { BEGIN Oldstate; MYPUTS(yytext); } <QUOTE>\\\n { ++Diag_line; MYPUTS(yytext); } <PREPRO>\n { ++Diag_line; if (Close_include) { MYPUTS("\n@>\n"); Add_line += 2; } else MYPUTS("\n"); Close_include = Outflag = Oldstate = 0; BEGIN 0; SQRET(CPPEND); } <DEF1>[A-Za-z_][A-Za-z0-9_]*\( { MYPUTS(yytext); yytext[yyleng-1] = '\0'; BEGIN INDEF; if (!Squelch) { q_str(yytext); SQRET(AWORD); } } <DEF1>[A-Za-z_][A-Za-z0-9_]* { MYPUTS(yytext); BEGIN INDEF; if (!Squelch) { q_str(yytext); SQRET(NAWORD); } } <DEF1>\\\n MYPUTS(yytext); <DEF1>[ \t]+ MYPUTS(yytext); <DEF1>. { diagnostic("bizarre #define - can't find symbol"); MYPUTS(yytext); BEGIN 0; REJECT; } <INDEF>\\\n { ++Diag_line; MYPUTS(yytext); } <INDEF>\) { MYPUTS(yytext); SQRET(MEND); } <INDEF>\-\>[A-Za-z0-9_]* MYPUTS(yytext); <INDEF>\.[A-Za-z0-9_]* MYPUTS(yytext); <INDEF>\\\\ MYPUTS(yytext); <INDEF>\n { ++Diag_line; MYPUTS("\n"); Outflag = Oldstate = 0; BEGIN 0; SQRET(CPPEND); } \/\* BEGIN COMMENT; \" { MYPUTS("\""); BEGIN QUOTE; } ^\#[ \t]*define { Oldstate = INDEF; BEGIN DEF1; Outflag = ! 
Xflag; MYPUTS(yytext); SQRET(DEFINE); } ^\#[ \t]*include { if (Iflag) { Close_include = 1; Outflag = 1; MYPUTS("\n@<\n"); Add_line += 2; MYPUTS(yytext); } else Close_include = 0; Oldstate = PREPRO; BEGIN PREPRO; SQRET(CPP); } ^\#[ \t]*ifdef[ \t]*[A-Za-z0-9_]+ { Outflag = ! Xflag; MYPUTS(yytext); if (Cflag || Aflag) do_ifd(yytext,0); Oldstate = PREPRO; BEGIN PREPRO; SQRET(CPP); } ^\#[ \t]*ifndef[ \t]*[A-Za-z0-9_]+ { Outflag = ! Xflag; MYPUTS(yytext); if (Cflag || Aflag) do_ifd(yytext,1); Oldstate = PREPRO; BEGIN PREPRO; SQRET(CPP); } ^\#[ \t]*if { Outflag = ! Xflag; MYPUTS(yytext); if (Cflag || Aflag) do_if(); Oldstate = PREPRO; BEGIN PREPRO; SQRET(CPP); } ^\#[ \t]*else { Outflag = ! Xflag; MYPUTS(yytext); if (Cflag || Aflag) do_else(); Oldstate = PREPRO; BEGIN PREPRO; SQRET(CPP); } ^\#[ \t]*endif { Outflag = ! Xflag; MYPUTS(yytext); if (Cflag || Aflag) do_end(); Oldstate = PREPRO; BEGIN PREPRO; SQRET(CPP); } ^\# { Oldstate = PREPRO; Outflag = ! Xflag; BEGIN PREPRO; MYPUTS("#"); SQRET(CPP); } \; { MYPUTS(yytext); SQRET(SEMICOLON); } \{ { MYPUTS(yytext); SQRET(LBRACE); } \} { MYPUTS(yytext); SQRET(RBRACE); } \[ { MYPUTS(yytext); SQRET(LSQ); } \] { MYPUTS(yytext); SQRET(RSQ); } \, { MYPUTS(yytext); SQRET(COMMA); } [0-9]+\.[0-9]*[Ee][+-]*[0-9] MYPUTS(yytext); [0-9]+[Ee][+-]*[0-9] MYPUTS(yytext); [0-9]+\.[0-9]* MYPUTS(yytext); [0-9]+L MYPUTS(yytext); [0-9]+ MYPUTS(yytext); 0[Xx][a-fA-F0-9]*L MYPUTS(yytext); 0[Xx][a-fA-F0-9]* MYPUTS(yytext); \'\\\'\' MYPUTS(yytext); \'.*\' MYPUTS(yytext); [A-Za-z_][A-Za-z0-9_]* { int i; MYPUTS(yytext); if (!Squelch) { if ((i = keycheck(yytext)) == WORD) q_str(yytext); SQRET(i); } } \n { if (!Outflag && !Xflag) { if (Add_line > 0) --Add_line; else fputs(yytext,stdout); } else MYPUTS(yytext); ++Diag_line; } . 
MYPUTS(yytext); %% tok_out(i) int i; { diagnostic("scanned token %d",i); return (i); } /* ** called on each new file, to reset the scanner */ init_lex(name) char *name; { Add_line = Close_include = Ifidx = Squelch = Outflag = Oldstate = 0; printf("@=\"%s\"\n",name); if (! Xflag) printf("%s#line 1 \"%s\"\n",Fextra,name); Diag_line = 1; Diag_file = name; } char *strtok(); char *htab_find(); /* ** do_ifd is destructive, so it must be called AFTER output of text */ do_ifd(s,rev) char *s; int rev; { int idx; int shift; idx = Ifidx/32; shift = Ifidx % 32; if (Squelch) Ifstack[idx] |= 1L << shift; else Ifstack[idx] &= ~(1L << shift); ++Ifidx; if (Squelch) { Estack[idx] &= ~(1L << shift); return; } strtok(s," \t#"); s = strtok(NULL," \t"); --s; /* we KNOW the strtok's bumped the string */ *s = '+'; if (htab_find(Ftab,s) != NULL) { if (rev) Squelch = 1; else Squelch = 0; Estack[idx] |= 1L << shift; } else { *s = '-'; if (htab_find(Ftab,s) != NULL || Aflag) { if (rev) Squelch = 0; else Squelch = 1; Estack[idx] |= 1L << shift; } else Estack[idx] &= ~(1L << shift); } } /* ** ignore sense of all #if statements. */ do_if() { int idx; int shift; idx = Ifidx/32; shift = Ifidx % 32; if (Squelch) Ifstack[idx] |= 1L << shift; else Ifstack[idx] &= ~(1L << shift); ++Ifidx; Estack[idx] &= ~(1L << shift); } do_end() { int idx; int shift; if (Ifidx == 0) { diagnostic("unmatched #endif"); Squelch = 0; return; } --Ifidx; idx = Ifidx/32; shift = Ifidx % 32; Squelch = (Ifstack[idx] >> shift) & 1; } do_else() { int idx; int shift; if (Ifidx == 0) { diagnostic("unmatched #else"); Squelch = 0; return; } idx = Ifidx - 1; shift = idx % 32; idx /= 32; if ((Estack[idx] >> shift) & 1) Squelch = ! Squelch; } SHAR_EOF fi echo shar: "extracting 'Makefile'" '(2167 characters)' if test -f 'Makefile' then echo shar: "will not over-write existing file 'Makefile'" else cat << \SHAR_EOF > 'Makefile' # # libraries. You will want the bobm.a utility library, wherever # you decided to put it, and the lex library. 
# LIBS = $(HOME)/lib/bobm.a -ll # # -d is needed to generate y.tab.h for the scanner, and for keycheck.c # YFLAGS = -d # # don't know what all you'll have to do for SYSV, other than # -Dindex=strchr -Drindex=strrchr # # if you want to dink with the keywords recognized, maybe add special # ones for your c compiler, see keycheck.c # CFLAGS = -O # # LOCAL CONFIGURATION # # These definitions also drive the making of a header file. HATDIR is the # directory you want the analyzer and parser placed in, BINDIR is the # directory you want the command programs to be placed in. PARSER and # ANALYZER are the names you want to give those respective executables. # CPPCMD is the c-preprocessor, and SHELLCMD a shell which will be # execl'ed to execute the other PARSER | [CPPCMD] | ANALYZER pipe. # MANDIR is where to put the manual pages. # # Some of the definitions will be placed in header file localnames.h, and # moved to lastnames.h after compiling hat.c # MANDIR = . HATDIR = $(HOME)/bin BINDIR = $(HOME)/bin PARSER = hat_p ANALYZER = hat_a CPPCMD = "/lib/cpp" SHELLCMD = "/bin/sh" # # object lists for the three executables - no remarks from the peanut gallery # concerning the analyzer abbreviation :-). 
# ANALOBJ = amain.o anread.o table.o analyze.o topsort.o listsort.o PARSOBJ = parse.o scan.o pmain.o keycheck.o HATOBJ = hat.o all: hat parser anal coat man parser: $(PARSOBJ) cc -o $(PARSER) $(PARSOBJ) $(LIBS) mv $(PARSER) $(HATDIR) anal: $(ANALOBJ) cc -o $(ANALYZER) $(ANALOBJ) $(LIBS) mv $(ANALYZER) $(HATDIR) hat: $(HATOBJ) cc -o hat $(HATOBJ) mv hat $(BINDIR) coat: sed -e "s/ANALYZER/$(ANALYZER)/" coat.tpl >coat chmod 755 coat mv coat $(BINDIR) man: cp hat.1 $(MANDIR) cp coat.1 $(MANDIR) hat.o: echo "#define HATDIR \"$(HATDIR)\"" >localnames.h echo "#define PARSER \"$(PARSER)\"" >>localnames.h echo "#define ANALYZER \"$(ANALYZER)\"" >>localnames.h echo "#define CPPCMD \"$(CPPCMD)\"" >>localnames.h echo "#define SHELLCMD \"$(SHELLCMD)\"" >>localnames.h cc $(CFLAGS) -c hat.c mv localnames.h lastnames.h SHAR_EOF fi exit 0 # End of shell archive
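Postscript (this text sits after the "exit 0" above, so sh ignores it when unpacking the archive): the analyzer input syntax described in hat.1 (@= for a new file, @! for a definition, @? for a reference) is easy to generate by hand, e.g. for the general directed-graph use the manual page mentions. The helper below is a sketch, not part of the distribution - gen_graph is a hypothetical function name, and the analyzer executable is assumed to be named hat_a as in the Makefile defaults; your local installation may differ.

```shell
# Sketch: emit analyzer input for a small dependency graph.
# Each argument is "node" or "node:dep1,dep2,...".  Every node gets
# a "@=" (new file) line and a "@!" (defines its own name) line;
# each listed dependency gets a "@?" (reference) line.
gen_graph() {
    for spec in "$@"; do
        node=${spec%%:*}        # part before the first ':'
        deps=${spec#*:}         # part after it (equals $spec if no ':')
        echo "@=$node"
        echo "@!$node"
        if [ "$deps" != "$spec" ]; then
            for d in $(echo "$deps" | tr ',' ' '); do
                echo "@?$d"     # this node references (depends on) d
            done
        fi
    done
}

# main.h depends on defs.h and types.h; defs.h depends on types.h.
gen_graph "main.h:defs.h,types.h" "defs.h:types.h" "types.h"
```

Piped into the installed analyzer (for instance "gen_graph ... | hat_a"), input like this should sort types.h ahead of defs.h ahead of main.h - the defining-before-referring order you would want for #include lines; add -z to get the reverse, as coat does for link order.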