[net.sources] lintlib - automatic creation of human readable lint libraries

chuqui@nsc.UUCP (Chuq Von Rospach) (09/27/84)
This is a shell archive- run with /bin/sh to extract.

---- Cut here ---
#! /bin/sh
# The rest of this file is a shell script which will extract:
# README Usage Makefile lintlib lintlib.1 lintlib1.l lintlib2.l lintlib3.c
echo x - README
cat >README <<'!!ChuquiCo!!Software!!'
README for lintlib

Lintlib is a set of programs that will automatically create a lint library
for a file or a group of files. It consists of the following programs:

	lintlib: a Bourne shell (/bin/sh) script that controls the whole process
	lintlib1: the first pass. This program strips comments and packs white space
	lintlib2: second pass. this program strips out everything except the
		function definitions and #include lines and sends them out to 
		two files for sorting.
	lintlib3: final pass. Used to format the sorted function definitions 
		so that they look somewhat decent.

By default the program reads stdin and outputs to stdout. Any parameters 
given are assumed to be filenames and will be read instead (output still goes
to stdout).

The format of the output is as follows:

/*LINTLIBRARY*/
#include "file"
	.
	.
	.
#include <file>
	.
	.
	.
function definitions (sorted by function name)
	.
	.
	.
EOF

The general definition of the function in the library looks like this:

[type]	name(arg [,...]) arg_type arg;[...;] { [return([(type)] 0)] ;}
 
Where [type] is the optional type of the function and arg_type is the type
of the argument.

If the function returns a value (using return(something) as opposed to
dropping through or using return()) then the lint function will return
0 cast to the type of the function. If the type of the function is defaulted
to int, the cast is dropped.

For example, the C library function strcpy() would look like:

char *strcpy(s,s2) char *s; char *s2; { return ((char *) 0) ; }
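
The rule for the function body can be sketched in C. This is only an
illustration and is not the code lintlib2 uses; the name make_stub and its
parameters are invented for the example, and the spacing of its output is
not guaranteed to match lintlib exactly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of lintlib): build one lint library
 * entry from the pieces pass two collects.  "type" may be "" when the
 * function type defaulted to int; "decl" is the name plus argument
 * list; "argdefs" holds the argument type definitions.  The type and
 * the name are separated by a tab, as in the pass two output record.
 */
int make_stub(char *out, size_t outlen, const char *type, const char *decl,
              const char *argdefs, int has_return)
{
    assert(out != NULL && type != NULL && decl != NULL && argdefs != NULL);

    if (!has_return)                 /* no valued return: empty body */
        return snprintf(out, outlen, "%s\t%s %s { ; }", type, decl, argdefs);
    if (strlen(type) == 0)           /* defaulted to int: drop the cast */
        return snprintf(out, outlen, "%s\t%s %s { return (0); }",
                        type, decl, argdefs);
    return snprintf(out, outlen, "%s\t%s %s { return ((%s) 0); }",
                    type, decl, argdefs, type);
}
```

Calling make_stub with the type "char *", the declaration "strcpy(s,s2)",
the argument definitions "char *s; char *s2;" and has_return set gives an
entry of the same shape as the strcpy() line above.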

If the function definition is longer than will comfortably fit on one line,
lintlib attempts to break it up onto multiple lines so that reading is
enhanced. Note that lintlib is not nearly as smart as you are, so it doesn't
always work (especially with really_long_variable_names).


Bugs

1> Lintlib doesn't handle #ifdefs yet. A section of code such as:

#ifdef DEBUG
#include "testincludes"
#else
#include "systemincludes"
#endif DEBUG

will include BOTH include files into the lint library. I haven't figured out 
a decent way of handling this yet without putting ALL ifdefs into the file so
you should plan on looking at the file when it is done to take care of these
things.

2> lintlib will not work on function definitions where the function type and
	the name are on separate lines. It will think that
		char *
		strcpy();
	is an int.

3> If you define your global functions at the top of your program instead of 
	putting them into an include file they may show up in the lint library
	if they happen to match what lex thinks is a function. For example,
		int foo(), bar(); char *hi(); 
	will sneak in. It's probably a good idea to double check the lint library
	manually for these anomalies.

4> lintlib doesn't handle static functions yet, so these need to be removed
	by hand for now.
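
Bugs 2 and 3 both follow from the pattern the second pass uses to spot the
start of a function. The sketch below hand-codes that pattern in C so the
behaviour can be tried on single lines. It is an illustration only; the
name match_function_start is invented here and the real matching is done
by lex.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical check (not part of lintlib): return the length of the
 * prefix of "line" that looks like the start of a function definition
 * under the pass two pattern
 *     ^[a-zA-Z_][a-zA-Z0-9 _*]*"("[a-zA-Z0-9, _]*")"
 * or 0 when the line does not match.  Assumes the "C" locale.
 */
size_t match_function_start(const char *line)
{
    size_t i;

    /* first character: a letter or an underscore */
    if (!(isalpha((unsigned char)line[0]) || line[0] == '_'))
        return 0;
    i = 1;

    /* type and name: letters, digits, spaces, underscores, stars */
    while (isalnum((unsigned char)line[i]) || line[i] == ' ' ||
           line[i] == '_' || line[i] == '*')
        i++;
    if (line[i] != '(')
        return 0;
    i++;

    /* parameter list: letters, digits, commas, spaces, underscores */
    while (isalnum((unsigned char)line[i]) || line[i] == ',' ||
           line[i] == ' ' || line[i] == '_')
        i++;
    if (line[i] != ')')
        return 0;
    return i + 1;
}
```

A line holding only "char *" returns 0, while a following line
"strcpy(s,s2)" matches with no type in front of it, which is bug 2. The
declaration "int foo(), bar();" matches on its "int foo()" prefix, which
is how bug 3 sneaks in.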
!!ChuquiCo!!Software!!
echo x - Usage
cat >Usage <<'!!ChuquiCo!!Software!!'
		How to use the system linter

There are three programs in the lint system: lintlib, mklintlib, and
lintsys. Each is described below.

	lintlib - This is the program that creates the automated lint
	library.

Three subsidiary programs, lintlib1, lintlib2, and lintlib3 need to be
installed in the path somewhere as well so that lintlib will work.

The mklintlib program should be run on a regular basis to update the
lint library. This can be done whenever /hp2/test rolls over or by an
automatic call in cron. This takes a while, so it should either be run
very niced or at night when nobody is around (it was running 2-5 hours
for me depending on system load). It needs write ability to
/usr/lib/lint. It stores the old version of the lint library in
llib-lsys.old so you can diff and see what changed.

lintlib can be used for any set of source files when you want to, but
its major purpose is to support mklintlib. There is a man page
available for people who want to find out how it works so that they
can use it for their own work.

lintsys is used to replace the lint call for the CAE system. Any
number of files can be given to it for linting. It takes care of
locating the .h files in /hp2/test and the lint library for you, and 
it removes the lines that say:

	warning: function_name defined, but never used

to minimize the amount of drek you have to wade through. Expect
lintsys to run at least 5 minutes or more, depending on the number of
files you give it and the system load.

Please report any bugs and other uglies to chuqui.....


!!ChuquiCo!!Software!!
echo x - Makefile
cat >Makefile <<'!!ChuquiCo!!Software!!'
CFLAGS=-O
BIN   = /usr/nsc
LIB   = /usr/nsc/lib/lintlib
MAN   = /usr/man/man1

system: lintlib1 lintlib2 lintlib3

clean:
	rm -f *.o lintlib1.c lintlib2.c *.BAK

lintlib1: lintlib1.o
	cc lintlib1.o -o lintlib1 -ll 

lintlib2: lintlib2.o
	cc lintlib2.o -o lintlib2 -ll 

lintlib3: lintlib3.o
	cc lintlib3.o -o lintlib3

install: system
	cp lintlib $(BIN)/lintlib
	chmod 555 $(BIN)/lintlib
	cp lintlib1 lintlib2 lintlib3 $(LIB)

man:
	cp lintlib.1 $(MAN)

!!ChuquiCo!!Software!!
echo x - lintlib
cat >lintlib <<'!!ChuquiCo!!Software!!'
#! /bin/sh
# set lib to the location where the subprocesses are stored!
lib="/usr/nsc/lib/lintlib"
tmp1="/tmp/incl$$"
tmp2="/tmp/func$$"
$lib/lintlib1 $* | $lib/lintlib2 $tmp1 $tmp2
echo "/*LINTLIBRARY*/"
sort +1 -2 -u $tmp1
sort "-t	" +1 -2 $tmp2 | $lib/lintlib3
rm -f $tmp1 $tmp2
!!ChuquiCo!!Software!!
echo x - lintlib.1
cat >lintlib.1 <<'!!ChuquiCo!!Software!!'
.TH LINTLIB 1 NSC
.SH NAME
lintlib - create a lint library
.SH SYNOPSIS
.B lintlib
[filename1] [filename2 ...]
.SH DESCRIPTION
.PP
Lintlib is a program that will automatically create a lint library
for a file or a group of files.
.PP
By default the program reads stdin and outputs to stdout. Any parameters 
given are assumed to be filenames and will be read instead (output still goes
to stdout).
.PP
The program parses function definitions out of the input stream and
captures the following information:
.RS
type of the function
.br
name of the function
.br
function parameters (if any)
.br
parameter type definitions (if any)
.br
if there is a return statement with a value for the function.
.RE
.PP
The program will output a line for each function encountered with the
following information:
.RS
type and name of the function
.br
parameters and their types
.br
a program body that consists of an open brace, an optional return
statement (see below), a semicolon, and a close brace.
.RE
.PP
If there was a valued return statement in the function, the lint
version will return a 0 value cast to the type of the function
(as in 'return ((long int) 0)'). If the function type is defaulted the cast is
dropped. If the return statement had no value or if there was no
return in the function then the entire return statement is dropped.
.PP
As an example, take the function:
.RS
long int foo(bar)
.br
char bar;
.br
{
.br
[body of function]
.br
return(0L);
.br
}
.br
.RE
.PP
The output for this function would be:
.RS
long int foo(bar) char bar; { return ((long int) 0) ; }
.RE
.SH FILES
.br
/lintlib - controlling shell script
.br
/lintlib1 - first pass
.br
/lintlib2 - second pass
.br
/lintlib3 - third pass
.br
/tmp/func$$ /tmp/incl$$ - temporary files
.SH AUTHOR
Chuq Von Rospach
.SH SEE ALSO
lint(1)
.SH DIAGNOSTICS
If it can't open an input file it is mentioned and ignored.
.br
If it can't open a temporary file it writhes horribly and dies.
.SH BUGS
.PP
Lintlib doesn't handle #ifdefs yet. A section of code such as:
.RS
#ifdef DEBUG
.br
#include "testincludes"
.br
#else
.br
#include "systemincludes"
.br
#endif DEBUG
.RE
will include BOTH include files into the lint library. I haven't figured out 
a decent way of handling this yet without putting ALL ifdefs into the file so
you should plan on looking at the file when it is done to take care of these
things.
.PP
Lintlib will not work on function definitions where the function type and
the name are on separate lines. It will think that
.RS
char *
.br
strcpy();
.RE
is an int.
.PP
If you define your global functions at the top of your program instead of 
putting them into an include file they may show up in the lint library
if they happen to match what lex thinks is a function. For example,
.RS
int foo(), bar(); char *hi(); 
.RE
will sneak in. It's probably a good idea to double check the lint library
manually for these anomalies.
.sp
This is a new program, so expect some unknown bugs and other shakedown
related problems. Please pass along any problems (with examples, I
hope) to account chuqui.

!!ChuquiCo!!Software!!
echo x - lintlib1.l
cat >lintlib1.l <<'!!ChuquiCo!!Software!!'
%{
/* 
 * Name: lintlib1.l
 * Author: chuq Von Rospach
 * Functional Description:
 *  lintlib1.l is a lex program that is used as the first pass of the
 *  lintlib automatic lint library generator package. Its purpose is to
 *  read all of the files given to it as parameters (default is stdin)
 *  and output them to the standard output after stripping all comments
 *  and packing white space.
 * Revision history:
 *   created 10/4/83 by chuqui
 */
main (argc,argv)
int argc;
char *argv[];
{
    char *progname;
    progname = argv[0];

    if (argc <= 1)
    {
	yylex();
    } else
	while (argc > 1)
	{
	    if (freopen(argv[1],"r",stdin)== NULL)
		fprintf(stderr,"%s: cannot open %s\n",progname,argv[1]);
	    else
	    { 
		yylineno = 1;
		yylex();
	    }
	    argc--; argv++;
	}
}
%}
%%
"/*"   comment();                /* strip the comment */
[ \t]+  putchar (' ');            /* replace all whitespace with a space */
"\n"[ \t]*"("   putchar ('(');    /* removes newline between function & parms */
","[ \t]*"\n"   putchar (',');    /* these two cause multiline parameters */
"\n"[ \t]*","   putchar (',');    /* to pack onto one line */
extern  eat_externs ();          /* eat up extern definitions */
%%
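comment ()
{
    char    c, prev;

    /* remember the previous character so that a run of stars in front
     * of the closing slash still ends the comment */
    prev = '\0';
    while (c = yyinput ())
    {
	if (prev == '*' && c == '/')
	    break;
	prev = c;
    }
}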

eat_externs ()
{
    char    c;
    while (c = yyinput ())
    {
	if (c == ';')
	    break;
	if (c == '\\')
	    yyinput ();
    }
}
!!ChuquiCo!!Software!!
echo x - lintlib2.l
cat >lintlib2.l <<'!!ChuquiCo!!Software!!'
%{
/* Name: lintlib2.l
 * Author: chuq Von Rospach
 * Functional description:
 *  This is a lex program that reads the output of lintlib1 on its 
 *  input and parses out two things: The lines in the file starting 
 *  with #include and the function definitions. A function definition 
 *  contains the following items:
 *   o an optional function type
 *   o the function name, followed by an open parenthesis '('
 *   o an optional list of parameters, each separated by a comma
 *   o a close parenthesis ')'
 *   o optional type definitions for the parameters
 *   o an open brace '{'
 *   o the body of the function
 *   o a close brace '}'
 *  The body of the function can optionally have a return statement
 *  with or without a parameter.
 *
 *  This program takes the include file definitions and stores them in the
 *  filename defined by argv[1] for sorting. Function definitions are 
 *  converted into a format useful for creating lint libraries and is 
 *  stored in the filename defined by argv[2] for sorting. The output 
 *  record is defined as:
 *
 *   [<type>]<tab><name>([arg_list]) [arg_defs] { [return ([(<type>)] 0)]; }
 *
 *  where:
 *   o <type> is the optional type for the function
 *   o <tab> is the tab character '\t'
 *   o <name> is the name of the function
 *   o [arg_list] is an optional list of parameters to the function
 *   o [arg_defs] is the optional list of type definitions for the 
 *     parameters to the function
 *   o if the function had a parameterized return statement (i.e.
 *     return(value)) then the function will be given a return
 *     statement with a return value cast to the type of the 
 *     function (as in 'long foo() { return( (long) 0); }' ).
 *     if the function type is default, the cast is dropped. 
 *     if the function has no value to the return or if there 
 *     is no return at all then the return statement is dropped
 *     and the body of the function becomes '{ ; }'.
 *
 * Revision History:
 *  created 10/4/83 by chuq Von Rospach
 */

#define MAXLINE 512               /* Maximum size of strings */
#define FALSE   0                 /* logical false */

int levels,                       /* nesting levels of braces */
    has_return,                   /* a return statement has been found */
    in_function;                  /* currently processing a function */
char function_type[MAXLINE];      /* Storage for the type of the function */
char *progname;                   /* Name of this program */
FILE *inclfp,                     /* file pointer for include lines */
     *funfp,                      /* file pointer for function lines */
     *fopen();   
  
/* Name: main
 * Author: chuq Von Rospach
 * Description: This function opens up the output files and then calls 
 *  yylex() to do the real processing.
 * Revision history:
 *  created 10/4/83 by chuq Von Rospach
 */

main (argc,argv)
int argc;
char *argv[];
{
    levels = FALSE;
    has_return = FALSE;
    in_function = FALSE;
    progname = argv[0];

    if (argc != 3)
    {
	fprintf(stderr,"Usage: %s [include file] [function file]\n",progname);
	exit(1);
    }
    if (((inclfp = fopen(argv[1],"w")) == NULL ) ||
    ((funfp = fopen(argv[2],"w")) == NULL ))
    {
	fprintf(stderr,"%s: cannot open output files\n",progname);
	exit(1);
    }
    yylex();
    fclose(inclfp);
    fclose(funfp);
}
%}

%%
^#include.*                                     fprintf(inclfp,"%s\n",yytext);
^[a-zA-Z_][a-zA-Z0-9 _*]*"("[a-zA-Z0-9, _]*")"  function();
"{"                                             levels++;
"}"                                             if (!--levels) end_function();
return[( ]*[^);]                                has_return++;
.                                               ;
\n                                              ;
%%

/* Name: function
 * Author: chuq Von Rospach
 * Description: function handles the processing of the first part of a 
 *  function in the source file. It splits the function type from the function
 *  name and then packs everything it finds up to the opening brace onto 
 *  the line with the function call.
 * Revision History
 *  created 10/4/83 by chuq
 */

function()
{
    char c,
         s[MAXLINE],
         function_name[MAXLINE];     /* where the function name is stored*/
    int i;

    if (in_function) return;         /* if we are in a function, don't start 
                                      * a new one! */

    function_type[0] = '\0';
    function_name[0] = '\0';
    has_return = FALSE;
    levels = 1;
    in_function = 1;

    i = where(yytext,'(');           /* get everything to the parm list */
    strncpy(s,yytext,i);             /* and put it somewhere */
    s[i] = '\0';                     /* strncpy won't add nulls for you */

/* if there was a space between the function name and the '(' we need
 * to get rid of it */
    if (s[i-1] == ' ') s[i-1] = '\0';   
   
/* Now check to see if there is a space left in the function name. If there
 * is, it shows that there is a function type there also, so parse it out. */
    if ((i = rwhere(s,' ')) >= 0)
    {   
/* check to see if the first character of the function name is a '*'. If it
 * is we have something like 'char *strcpy()' that needs to be parsed as 
 * 'char *' 'strcpy' instead of the normal 'char' '*strcpy'. This piece of
 * code looks (and is) kludgey, but it works. What it does is look at the 
 * first character of the function name and if it is a start it bumps the
 * pointer forward to it (remember that the i'th character lives in [i-1]
 * and so i+1 is two characters forward *sigh*). We then check to see if 
 * the i'th character is ' ' and if it is we skip over it so that everything
 * lines up and sorts properly later. If you can think of an easier way to
 * do this, let me know....*/
	if (yytext[i+1] == '*')
	    i+=2;   /* kludge to put function pointer into type instead of 
		     * name (as in char *strcpy) */
	strncpy(function_type,yytext,i);
	function_type[i] = '\0';    /* strncpy won't add the null here either */
	if (yytext[i] == ' ') i++;
	strcpy(function_name,&yytext[i]);
    } else      /* end if ((i */
	strcpy(function_name,yytext);      /* no function type, so skip 
						* all of that garbage...  */
/* Print out the type and the name */
    fprintf(funfp,"%s\t%s ",function_type,function_name);

/* and copy the input to the output (stripping newlines) until you reach the
 * open bracket. This gets all of the parameter definitions that we need.  */
    while ((c = yyinput()) != '{' && c != 0)   /* yyinput gives 0 at EOF */
	if (c != '\n') 
	    putc(c,funfp);
    putc('{',funfp);
}

/* Name: end_function
 * Author: chuq Von Rospach
 * Description: End function is called when the lex analyser drops out 
 *  of the final level of brace nesting. Its purpose is to output the 
 *  return() call if the function had one with any appropriate cast and 
 *  the close brace.
 */ 
end_function()
{
    if (!in_function) return;      /* happens with structures */
    if (has_return)
    {
	if (strlen(function_type))
	    fprintf(funfp,"return ((%s)0);}\n",function_type);
	else 
	    fprintf(funfp,"return (0);}\n");
    } else
    {
	fprintf(funfp,";}\n");
    }
    in_function = FALSE;
}

/* Name: where
 * Author: chuqui Von Rospach
 * Description: where does what index(3) does, except that it returns
 *  an integer that indexes into s[] instead of a character pointer.
 * Revision history:
 *  created 10/4/83 by chuq
 */
where(s,c) /* return first location of character c in string s */
char s[];
char c;
{
    register int i;

    for (i = 0; i < strlen(s); i++)
	if (s[i] == c)
	    return(i);
    return(-1);
}

/* Name: rwhere
 * Author: chuq Von Rospach
 * Description: rwhere does what rindex(3) does, except that it returns
 *  an integer that indexes into s[] instead of a character pointer.
 * Revision history:
 *  created 10/4/83 by chuq
 */
rwhere(s,c)
char s[];
char c;
{
    register int i;

    for (i = strlen(s); i >= 0; i--)
	if (s[i] == c)
	    return(i);
    return(-1);
}
!!ChuquiCo!!Software!!
echo x - lintlib3.c
cat >lintlib3.c <<'!!ChuquiCo!!Software!!'
/* Name: lintlib3.c
 * Author: chuq Von Rospach
 * Functional description: This is a C program that takes the standard input
 *  (which is assumed to be the sorted output of function definitions from
 *  lintlib2) and tries to make it pretty for people to look at. If a 
 *  line is longer than will comfortably fit on a CRT screen this program
 *  attempts to break it up in a reasonable way by splitting the 
 *  definition, argument types, and program body onto separate lines. It 
 *  doesn't always work real well when these segments are real long, but
 *  it does its best.
 * Revision history:
 *  created by chuq 10/4/83
 */

#include <stdio.h>
#include <ctype.h>

#define MAXLINE  512              /* Maximum string length */
#define INDENT_S "\n\t    "       /* used to indent continued lines */
#define INDENT   12               /* size of that indent */
#define CUTOFF   65               /* where to start splitting */
#define FALSE    0                /* logical false */

/* Name: main
 * Author: chuq Von Rospach
 * Description: This reads stdin until EOF. If the line is short 
 *  enough, it tosses it to stdout, otherwise it gives it to format() to play with.
 * Revision history:
 *  created 10/4/83 by chuq 
 */
main()
{
    char line[MAXLINE];         /* used to store line being processed */
    int len;                    /* length of string */

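    /* fgets cannot overrun line[] the way gets can */
    while (fgets(line,MAXLINE,stdin) != NULL)
    {
	len = strlen(line);
	if (len > 0 && line[len-1] == '\n')
	    line[--len] = '\0';     /* drop the newline that fgets keeps */
	if (len < CUTOFF)
	    puts(line);
	else
	    format(line,len);
    }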
    exit(0);
}

/* Name: format
 * Author: chuq Von Rospach
 * Description: this function takes the function stored in line and tries
 *  to break it up so that it will not go off the edge of a CRT. This
 *  is done by splitting it up into three parts, the definition and 
 *  parameters, the parameter types, and the function body. If these 
 *  are still too long, it tries to break them up at reasonably logical
 *  places, which doesn't always work
 * Revision history
 *  created 10/4/83 by chuq
 */
format(line,len)
char line[];
int len;
{
    char c;                           /* convenient storage */
    register int col,                 /* # of columns output */
                 i,                   /* convenient storage */
                 paren_done;          /* whether we found an open paren */

    paren_done = FALSE;

/* if the first character is a tab, set the column marker to 7 so that it all 
 * evens out. If this seems kludgey, it is, so feel free to suggest a nicer 
 * way */

    col = (line[0] == '\t' ? 7 : 0);
    for (i = 0; i < len; i++)
    {
	c = line[i];

/* The first close paren marks the end of the function definition, so
 * break the line and indent for the next. */

	if ( c == ')' && !paren_done)
	{
	    paren_done++;
	    putchar(c);
	    printf(INDENT_S);
	    col = INDENT;

/* The open brace shows the start of the function body so break the line 
 * again */

	} else
	    if (c == '{')
	    { 
		printf(INDENT_S);
		putchar(c);
		col = INDENT+1;

/* if we are farther than CUTOFF, look for a logical breaking point,
 * such as a comma, semi, or close paren. This is where our formatting
 * gets into trouble because people who use real_long_names can fool us
 * terribly */

	    } else 
	    if (col > CUTOFF && (c == ',' || c == ';' || c == ')' || c == ' '))
	    {
		putchar(c);
		printf(INDENT_S);
		col = INDENT;

/* check for white space in the parameters and function body and only print
 * a single space */

	    } else
	    if (isspace(c) && paren_done)
	    {
		if (col != INDENT)
		    putchar(' ');
		while (isspace(line[i+1]))
		    i++;

/* default for everything else: print it (of course) */

	    } else
	    {
		putchar(c);
		col++;
	    }
    }
    putchar('\n');      /* end of the string, so start a new line */
    return;
}
!!ChuquiCo!!Software!!
-- 
From the Department of Bistromatics:			Chuq Von Rospach
{amd,decwrl,fortune,hplabs,ihnp4}!nsc!chuqui	nsc!chuqui@decwrl.ARPA

Flying is the art of throwing yourself at the ground and missing.