[net.sources] lintlib - lint library generator

chuqui@cae780.UUCP (Chuq Von Rospach) (11/28/83)

As long as I am tossing things out onto the net, here is another program
that I put together to remove another Unix drudgery. The program enclosed
will automatically generate lint libraries for a given set of input files,
so there is no longer any excuse for out of date lints.

This program is rather new, and still has some quirks. You should go
through the libraries after they have been generated to make sure there
aren't any problems that need hand fixing (it does do about 99.9% of the
work, though).

Comments, questions, and enhancements to me (please!)

chuq

=== Delete everything to here and feed to shell ===
echo x = Makefile
cat >Makefile <<!Chuquico!Software
CFLAGS=-O

system: lintlib1 lintlib2 lintlib3

clean:
	rm -f *.o lintlib1.c lintlib2.c *.BAK

lintlib1: lintlib1.o
	cc lintlib1.o -o lintlib1 -ll 

lintlib2: lintlib2.o
	cc lintlib2.o -o lintlib2 -ll 

lintlib3: lintlib3.o
	cc lintlib3.o -o lintlib3
!Chuquico!Software
echo x = README
cat >README <<!Chuquico!Software
README for lintlib

Lintlib is a set of programs that will automatically create a lint library
for a file or a group of files. It consists of the following programs:

	lintlib: a Csh script that controls the whole process
	lintlib1: the first pass. this program strips comments
	lintlib2: second pass. this program strips out everything except the
		function definitions and #include lines and sends them out to 
		two files for sorting.
	lintlib3: final pass. Used to format the sorted function definitions 
		so that they look somewhat decent.

By default the program reads stdin and outputs to stdout. Any parameters 
given are assumed to be filenames and will be read instead (output still goes
to stdout).

The format of the output is as follows:

/*LINTLIBRARY*/
#include "file"
	.
	.
	.
#include <file>
	.
	.
	.
function definitions (sorted by function name)
	.
	.
	.
EOF

The general definition of the function in the library looks like this:

[type]	name(arg [,...]) arg_type arg;[...;] { [return([(type)] 0)] ;}
 
Where [type] is the optional type of the function; arg_type is type of the
argument.

If the function returns a value (using return(something) as opposed to
dropping through or using return()) then the lint function will return
0 cast to the type of the function. If the type of the function is defaulted
to int, the cast is dropped.

For example, the C library function strcpy() would look like:

char *strcpy(s,s2) char *s; char *s2; { return ((char *) 0) ; }

If the function definition is longer than will comfortably fit on one line,
lintlib attempts to break it up onto multiple lines so that reading is
enhanced. Note that lintlib is not nearly as smart as you are, so it doesn't
always work (especially with really_long_variable_names).


Bugs

1> Lintlib doesn't handle #ifdefs yet. A section of code such as:

#ifdef DEBUG
#include "testincludes"
#else
#include "systemincludes"
#endif DEBUG

will include BOTH include files into the lint library. I haven't figured out 
a decent way of handling this yet without putting ALL ifdefs into the file so
you should plan on looking at the file when it is done to take care of these
things.

2> lintlib will not work on function definitions where the function type and
	the name are on separate lines. It will think that
		char *
		strcpy();
	is an int.

3> If you define your global functions at the top of your program instead of 
	putting them into an include file they may show up in the lint library
	if they happen to match what lex thinks is a function. For example,
		int foo(), bar(); char *hi(); 
	will sneak in. It's probably a good idea to double check the lint library
	manually for these anomalies.
!Chuquico!Software
echo x = lintlib
cat >lintlib <<'!Chuquico!Software'
#! /bin/csh
set tmp1="/tmp/incl$$"
set tmp2="/tmp/func$$"
lintlib1 $* | lintlib2 $tmp1 $tmp2
echo "/*LINTLIBRARY*/"
sort +1 -2 -u $tmp1
sort "-t	" +1 -2 $tmp2 | lintlib3
rm -f $tmp1 $tmp2
!Chuquico!Software
echo x = lintlib.1
cat >lintlib.1 <<'!Chuquico!Software'
.TH LINTLIB 1 (CAE)
.SH NAME
lintlib - create a lint library
.SH SYNOPSIS
.B lintlib
[filename1] [filename2 ...]
.SH DESCRIPTION
.PP
Lintlib is a program that will automatically create a lint library
for a file or a group of files.
.PP
By default the program reads stdin and outputs to stdout. Any parameters 
given are assumed to be filenames and will be read instead (output still goes
to stdout).
.PP
The program parses function definitions out of the input stream and
captures the following information:
.RS
type of the function
.br
name of the function
.br
function parameters (if any)
.br
parameter type definitions (if any)
.br
if there is a return statement with a value for the function.
.RE
.PP
The program will output a line for each function encountered with the
following information:
.RS
type and name of the function
.br
parameters and their types
.br
a program body that consists of an open brace, an optional return
statement (see below), a semicolon, and a close brace.
.RE
.PP
If there was a valued return statement in the function, the lint
version will return a 0 value cast to the type of the function (as in
'return ((long int) 0)'). If the function type is defaulted the cast is
dropped. If the return statement had no value or if there was no
return in the function then the entire return statement is dropped.
.PP
As an example, take the function:
.RS
long int foo(bar)
.br
char bar;
.br
{
.br
[body of function]
.br
return(0L);
.br
}
.br
.RE
.PP
The output for this function would be:
.RS
long int foo(bar) char bar; { return ((long int) 0) ; }
.RE
.SH FILES
.br
/usr/cae/lintlib - controlling shell script
.br
/usr/cae/lintlib1 - first pass
.br
/usr/cae/lintlib2 - second pass
.br
/usr/cae/lintlib3 - third pass
.br
/tmp/func$$ /tmp/incl$$ - temporary files
.SH AUTHOR
Chuqui Von Rospach
.SH SEE ALSO
lint(1)
.SH DIAGNOSTICS
If it can't open an input file it is mentioned and ignored.
.br
If it can't open a temporary file it writhes horribly and dies.
.SH BUGS
.PP
Lintlib doesn't handle #ifdefs yet. A section of code such as:
.RS
#ifdef DEBUG
.br
#include "testincludes"
.br
#else
.br
#include "systemincludes"
.br
#endif DEBUG
.RE
will include BOTH include files into the lint library. I haven't figured out 
a decent way of handling this yet without putting ALL ifdefs into the file so
you should plan on looking at the file when it is done to take care of these
things.
.PP
Lintlib will not work on function definitions where the function type and
the name are on separate lines. It will think that
.RS
char *
.br
strcpy();
.RE
is an int.
.PP
If you define your global functions at the top of your program instead of 
putting them into an include file they may show up in the lint library
if they happen to match what lex thinks is a function. For example,
.RS
int foo(), bar(); char *hi(); 
.RE
will sneak in. It's probably a good idea to double check the lint library
manually for these anomalies.
.sp
This is a new program, so expect some unknown bugs and other shakedown
related problems. Please pass along any problems (with examples, I
hope) to account chuqui.
!Chuquico!Software
echo x = lintlib1.l
cat >lintlib1.l << '!Chuquico!Software'
%{
/* 
 *	Name: lintlib1.l					Part of: lintlib
 *
 *  Author: chuqui Von Rospach
 *
 *	Functional Description:
 *
 *	lintlib1.l is a lex program that is used as the first pass of the lintlib
 *  automatic lint library generator package. Its purpose is to read all of 
 *  the files given to it as parameters (default is stdin) and output them to
 *  the standard output after stripping all comments and packing white space.
 *
 * Parameters:
 *	argv[0] is the program name.
 *	Any other parameters are taken as filenames. Files that cannot be opened 
 *		are ignored.
 *
 * Procedures called:
 *	All called procedures are within this file.
 *
 * Revision history:
 * 		created 10/4/83 by chuqui
 */

/* Name: main					Part of: lintlib1.l
 *
 *	Author: chuqui Von Rospach
 * 
 *  Description: Control procedure for first pass of lintlib. Takes care
 *		of opening files to be processed and passing them to the lex
 *		procedure yylex().
 *
 *	Parameters:
 *		argc: number of arguments passed to program.
 *		argv: array of pointers to arguments passed.
 *			[0]: name of program.
 *			[1] to [argc-1]: filenames to process.
 *
 *	Routines called:
 *			yylex
 *			freopen
 *			fprintf
 *
 *	Revision history
 *		created 10/4/83 by chuqui
 */
main (argc,argv)
int argc;
char *argv[];
{
   char *progname;

   progname = argv[0];
   if (argc <= 1)
   {
      yylex();
   }else{
      while (argc > 1)
      {
         if (freopen(argv[1],"r",stdin)== NULL)
         {
            fprintf(stderr,"%s: cannot open %s\n",progname,argv[1]);
         } else { 
            yylineno = 1;
            yylex();
         }
         argc--; argv++;
      }
   }
}
%}
%%
"/*"   comment();				/* strip the comment */
[ \t]+   putchar(' ');			/* replace all whitespace with a space */
"\n"[ \t]*"("	putchar('(');	/* removes newline between function & parms*/
","[ \t]*"\n"		putchar(','); /* these two cause multiline parameters */
"\n"[ \t]*","		putchar(','); /* to pack onto one line */
extern				eat_externs();	/* eat up extern definitions */
%%

/*
 *	Name: comment				Part of: lintlib1
 *
 * Author: chuqui Von rospach
 *
 * Description: This function discards the input until it locates an end of
 *		comment token. It is called after the start comment token 
 *		has been gobbled.
 *
 * Parameters: none
 *
 * routines called: yyinput
 *
 * revision history:
 *	created 10/4/83 by chuqui
 */
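comment()
{
   int c;
   int prev;                /* character read just before c */

   prev = 0;
   while (c = yyinput()) {
      /* remembering the previous character lets a comment that ends in
       * several stars before the slash close properly
       */
      if (prev == '*' && c == '/')
         break;
      prev = c;
   }
}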


 /*
 *	Name: eat_externs				Part of: lintlib1
 *
 * Author: chuqui Von rospach
 *
 * Description: This function discards the input until it locates a semicolon
 *		that defines the end of the extern statement.
 *
 * Parameters: none
 *
 * routines called: yyinput
 *
 * revision history:
 *	created 10/4/83 by chuqui
 */
eat_externs()
{
   char c;
   while (c = yyinput()) {
	  if (c == ';') break;
	  if (c == '\\') yyinput();
   }
}
!Chuquico!Software
echo x = lintlib2.l
cat >lintlib2.l << !Chuquico!Software
%{
/*   Name: lintlib2.l         Part of: lintlib
 *
 *   Author:   chuqui Von Rospach
 *
 *   Functional description:
 *      This is a lex program that takes the output of lintlib1 on its
 *      input and parses out two things: The lines in the file starting 
 *      with #include and the function definitions. A function definition 
 *      contains the following items:
 *         o   an optional function type
 *         o   the function name, followed by an open parenthesis '('
 *         o   an optional list of parameters, each separated by a comma
 *         o   a close parenthesis ')'
 *         o   optional type definitions for the parameters
 *         o   an open brace '{'
 *         o   the body of the function
 *         o   a close brace '}'
 *
 *      The body of the function can optionally have a return statement
 *      with or without a parameter.
 *
 *      This program takes the include file definitions and stores them in the
 *      filename defined by argv[1] for sorting. Function definitions are 
 *      converted into a format useful for creating lint libraries and are 
 *      stored in the filename defined by argv[2] for sorting. The output 
 *      record is defined as:
 *
 *      [<type>]<tab><name>([arg_list]) [arg_defs] { [return ([(<type>)] 0)]; }
 *
 *      where:
 *         o   <type> is the optional type for the function
 *         o   <tab> is the tab character '\t'
 *         o   <name> is the name of the function
 *         o   [arg_list] is an optional list of parameters to the function
 *         o   [arg_defs] is the optional list of type definitions for the 
 *             parameters to the function
 *         o   if the function had a parameterized return statement (i.e.
 *             return(value)) then the function will be given a return
 *             statement with a return value cast to the type of the 
 *             function (as in 'long foo() { return( (long) 0); }' ).
 *             if the function type is default, the cast is dropped. 
 *             if the function has no value to the return or if there 
 *             is no return at all then the return statement is dropped
 *             and the body of the function becomes '{ ; }'.
 *
 *   Parameters:
 *      argv[0]:   Name of the program
 *      argv[1]:   filename used to store '#include' lines
 *      argv[2]:   filename used to store function definitions
 *
 *   Procedures called:   All procedures are internal to this file.
 *
 *   Revision History:
 *      created 10/4/83 by chuqui Von Rospach
 */

#define MAXLINE 512               /* Maximum size of strings */
#define FALSE   0                 /* logical false */

int levels,                       /* nesting levels of braces */
    has_return,                   /* a return statement has been found */
    in_function;                  /* currently processing a function */
char function_type[MAXLINE];      /* Storage for the type of the function */
char *progname;                   /* Name of this program */
FILE *inclfp,                     /* file pointer for include lines */
     *funfp,                      /* file pointer for function lines */
     *fopen();   
 
/*   Name: main         Part of: lintlib2.l
 *
 *   Author: chuqui Von Rospach
 *
 *   Description: This function opens up the output files and then calls 
 *      yylex() to do the real processing.
 *
 *   Parameters:
 *      argv[0] - name of the program
 *      argv[1] - filename to store the include lines
 *      argv[2] - filename to store the function lines
 *
 *   Routines called:
 *      fprintf
 *      exit
 *      fopen
 *      yylex
 *      fclose
 *
 *   Revision history:
 *      created 10/4/83 by chuqui Von Rospach
 */

main (argc,argv)
int argc;
char *argv[];
{
   levels = FALSE;
   has_return = FALSE;
   in_function = FALSE;
   progname = argv[0];

   if (argc != 3)
   {
      fprintf(stderr,"Usage: %s [include file] [function file]\n",progname);
      exit(1);
   }
   if (((inclfp = fopen(argv[1],"w")) == NULL ) ||
      ((funfp = fopen(argv[2],"w")) == NULL ))
   {
      fprintf(stderr,"%s: cannot open output files\n",progname);
      exit(1);
   }
   yylex();
   fclose(inclfp);
   fclose(funfp);
}
%}

%%
^#include.*                                     fprintf(inclfp,"%s\n",yytext);
^[a-zA-Z_][a-zA-Z0-9 _*]*"("[a-zA-Z0-9, _]*")"  function();
"{"                                             levels++;
"}"                                             if (!--levels) end_function();
return[( ]*[^);]                                has_return++;
.                                               ;
\n                                              ;
%%

/* Name: function            Part of: lintlib2.l
 *
 * Author: chuqui Von Rospach
 *
 * Description: function handles the processing of the first part of a 
 *   function in the source file. It splits the function type from the function
 *   name and then packs everything it finds up to the opening brace onto 
 *   the line with the function call.
 *
 * Parameters: none
 * 
 *   Routines called:
 *      where
 *      rwhere
 *      strncpy
 *      fprintf
 *      strcpy
 *      putc
 *
 *   Revision History
 *      created 10/4/83 by chuqui
 */

function()
{
   char c;
   char s[MAXLINE],
       function_name[MAXLINE];      /* where the function name is stored*/
   int i;

   if (in_function) return;         /* if we are in a function, don't start 
                                     * a new one!
                                     */

   function_type[0] = '\0';
   function_name[0] = '\0';
   has_return = FALSE;
   levels = 1;
   in_function = 1;

   i = where(yytext,'(');            /* get everything to the parm list */
   strncpy(s,yytext,i);            /* and put it somewhere */
   s[i] = '\0';                  /* strncpy won't add nulls for you */

/* if there was a space between the function name and the '(' we need
 * to get rid of it
 */
   if (s[i-1] == ' ') s[i-1] = '\0';   
                           
/* Now check to see if there is a space left in the function name. If there
 * is, it shows that there is a function type there also, so parse it out.
 */
   if ((i = rwhere(s,' ')) >= 0) {   
/* check to see if the first character of the function name is a '*'. If it
 * is we have something like 'char *strcpy()' that needs to be parsed as 
 * 'char *' 'strcpy' instead of the normal 'char' '*strcpy'. This piece of
 * code looks (and is) kludgey, but it works. What it does is look at the 
 * first character of the function name and if it is a star it bumps the
 * pointer forward to it (remember that the i'th character lives in [i-1]
 * and so i+1 is two characters forward *sigh*). We then check to see if 
 * the i'th character is ' ' and if it is we skip over it so that everything
 * lines up and sorts properly later. If you can think of an easier way to
 * do this, let me know....
 */
      if (yytext[i+1] == '*')
         i+=2;   /* kludge to put function pointer into type instead of 
                * name (as in char *strcpy)
                */
      strncpy(function_type,yytext,i);
      function_type[i] = '\0';   /* strncpy won't add nulls here either */
      if (yytext[i] == ' ') i++;
      strcpy(function_name,&yytext[i]);
   } else      /* end if ((i */
      strcpy(function_name,yytext);      /* no function type, so skip 
                                  * all of that garbage...
                                  */
/* Print out the type and the name */
   fprintf(funfp,"%s\t%s ",function_type,function_name);

/* and copy the input to the output (stripping newlines) until you reach the
 * open bracket. This gets all of the parameter definitions that we need.
 */
   while ((c = yyinput()) != '{') {
      if (c == 0) {              /* end of input with no body: stop here */
         putc('\n',funfp);
         return;
      }
      if (c != '\n') 
         putc(c,funfp);
   }
   putc(c,funfp);
}

/*   Name: end_function            Part of: lintlib2.l
 *
 *   Author:   chuqui Von Rospach
 *
 *   Description: End function is called when the lex analyser drops out 
 *   of the final level of brace nesting. Its purpose is to output the 
 *   return() call if the function had one with any appropriate cast and 
 *   the close brace.
 */ 

end_function()
{
   if (!in_function) return;      /* happens with structures */
   if (has_return)
   {
      if (strlen(function_type))
         fprintf(funfp,"return ((%s)0);}\n",function_type);
      else 
         fprintf(funfp,"return (0);}\n");
   } else {
      fprintf(funfp,";}\n");
   }
   in_function = FALSE;
}

/*   Name:   where            Part of: lintlib2.l
 *
 *   Author: chuqui Von Rospach
 *
 *   Description: where does what index(3) does, except that it returns
 *      an integer that indexes into s[] instead of a character pointer.
 *
 *   Parameters:
 *      s:   String to look at
 *      c:   Character to find
 *
 *   functions called:
 *      strlen
 *      return
 *
 *   Revision history:
 *      created 10/4/83 by chuqui
 */

where(s,c) /* return first location of character c in string s */
char s[];
char c;
{
   register int i;

   for (i = 0; i < strlen(s); i++)
      if (s[i] == c)
         return(i);
   return(-1);
}


/*   Name:   rwhere            Part of: lintlib2.l
 *
 *   Author: chuqui Von Rospach
 *
 *   Description: rwhere does what rindex(3) does, except that it returns
 *      an integer that indexes into s[] instead of a character pointer.
 *
 *   Parameters:
 *      s:   String to look at
 *      c:   Character to find
 *
 *   functions called:
 *      strlen
 *      return
 *
 *   Revision history:
 *      created 10/4/83 by chuqui
 */
rwhere(s,c) /* return last location of character c in string s */
char s[];
char c;
{
   register int i;

   for (i = strlen(s); i >= 0; i--)
      if (s[i] == c)
         return(i);
   return(-1);
}
!Chuquico!Software
echo x = lintlib3.c
cat >lintlib3.c << !Chuquico!Software
/*   Name: lintlib3.c            Part of: lintlib
 *
 *   Author: chuqui Von Rospach
 *
 *   Functional description: This is a C program that takes the standard input
 *      (which is assumed to be the sorted output of function definitions from
 *      lintlib2) and tries to make it pretty for people to look at. If a 
 *      line is longer than will comfortably fit on a CRT screen this program
 *      attempts to break it up in a reasonable way by splitting the 
 *      definition, argument types, and program body onto separate lines. It 
 *      doesn't always work real well when these segments are real long, but
 *      it does its best.
 *
 *   Parameters: none
 *
 *   Routines called:
 *      gets
 *      strlen
 *      puts
 *      putchar
 *      printf
 *      isspace
 *
 *   Revision history:
 *      created by chuqui 10/4/83
 */

#include <stdio.h>
#include <ctype.h>

#define MAXLINE  512              /* Maximum string length */
#define INDENT_S "\n\t    "       /* used to indent continued lines */
#define INDENT   12               /* size of that indent */
#define CUTOFF   65               /* where to start splitting */
#define FALSE    0                /* logical false */

/*   Name:   main         Part of: lintlib3.c
 *
 *   Author: chuqui Von Rospach
 *
 *   Description: This reads stdin until EOF. If the line is short 
 *      enough, it tosses to stdout, otherwise it gives it to format()
 *      to play with.
 *
 *   Parameters: none
 *
 *   functions called:
 *      gets
 *      strlen
 *      puts
 *      format
 *
 *   Revision history:
 *      created 10/4/83 by chuqui 
 */
main()
{
      char line[MAXLINE];         /* used to store line being processed */
     int len;                  /* length of string */
      while (gets(line) != NULL)
     {
           if ((len = strlen(line)) < CUTOFF)
            puts(line);
         else
            format(line,len);
      }
   exit(0);
}

/*   Name: format         Part of: lintlib3.c
 *
 *   Author: chuqui Von Rospach
 *
 *   Description: this function takes the function stored in line and tries
 *      to break it up so that it will not go off the edge of a CRT. This
 *      is done by splitting it up into three parts, the definition and 
 *      parameters, the parameter types, and the function body. If these 
 *      are still too long, it tries to break them up at reasonably logical
 *      places, which doesn't always work.
 *
 *   Parameters:
 *      char line[]      the string to process.
 *      int len          length of the string.
 *
 *   Functions called:
 *      putchar
 *      printf
 *      isspace
 *
 *   Revision history
 *      created 10/4/83 by chuqui
 */
format(line,len)
char line[];
int len;
{
      char c;                           /* convenient storage */
      register int col;                 /* # of columns output */
      register int i;                   /* convenient storage */
      register int paren_done;          /* whether we found an open paren */

     paren_done = FALSE;
/* if the first character is a tab, set the column marker to 7 so that it all 
 * evens out. If this seems kludgey, it is, so feel free to suggest a nicer 
 * way 
 */
      col = (line[0] == '\t' ? 7 : 0);
      for (i = 0; i < len; i++) {
         c = line[i];
/* The first close paren marks the end of the function definition, so
 * break the line and indent for the next.
 */
         if ( c == ')' && !paren_done) {
            paren_done++;
            putchar(c);
            printf(INDENT_S);
            col = INDENT;
/* The open brace shows the start of the function body so break the line 
 * again
 */
         } else if (c == '{') { 
            printf(INDENT_S);
            putchar(c);
            col = INDENT+1;
/* if we are farther than CUTOFF, look for a logical breaking point,
 * such as a comma, semi, or close paren. This is where our formatting
 * gets into trouble because people who use real_long_names can fool us
 * terribly
 */
         } else if (col > CUTOFF && 
                     (c == ',' || c == ';' || c == ')' || c == ' ')) {
            putchar(c);
            printf(INDENT_S);
            col = INDENT;
/* check for white space in the parameters and function body and only print
 * a single space
 */
         } else if (isspace(c) && paren_done) {
               if (col != INDENT)
                  putchar(' ');
               while (isspace(line[i+1]))
                  i++;
/* default for everything else: print it (of course) */
         } else {
            putchar(c);
            col++;
         }
      }
      putchar('\n');      /* end of the string, so start a new line */
     return;
}
!Chuquico!Software
-- 
From the dungeons of the warlock:		{amd70 qubix}!cae780!chuqui
		Chuqui the Plaid		*pif*