[comp.sources.unix] v20i010: Tools for generating software metrics, Part03/14

rsalz@uunet.uu.net (Rich Salz) (09/20/89)

Submitted-by: Brian Renaud <huron.ann-arbor.mi.us!bdr>
Posting-number: Volume 20, Issue 10
Archive-name: metrics/part03

---- Cut Here and unpack ----
#!/bin/sh
# this is part 3 of a multipart archive
# do not concatenate these parts, unpack them in order with /bin/sh
# file doc/Results1 continued
#
CurArch=3
if test ! -r s2_seq_.tmp
then echo "Please unpack part 1 first!"
     exit 1; fi
( read Scheck
  if test "$Scheck" != $CurArch
  then echo "Please unpack part $Scheck next!"
       exit 1;
  else exit 0; fi
) < s2_seq_.tmp || exit 1
echo "x - Continuing file doc/Results1"
sed 's/^X//' << 'SHAR_EOF' >> doc/Results1
Xheavily to the change count.  This is also intuitive.  (I think there
Xare two reasons for this: (1) code with many special cases requiring
Xreturns/exits is not well thought out ahead of time, but rather coded
Xas the implementor thought, and (2) changes to code with many returns
Xcan be difficult, and can easily be incorrect.)
X
XYou might think at this point that you have a magic formula for predicting
Xerrors.  Unfortunately, although this stuff does a very good job of
Xpredicting problems on the pascal project, the best R-squared I ever
Xgot for the RSM(*1) data was still less than .4.  Apparently, there are
Xother factors at work.  I believe that one very important factor is the
Xexperience/skill of the original implementor and of those who made changes
Xto the module.  The uids of these people can be obtained from the sccs file.
XYou then need to have some way to associate a skill level with the person
X(note that time may be important here too).  I never completed this final
Xstep.  
X
X
X*1 - RSM == Remote Software Management, a distributed source distribution
Xand control system.  See Results2 for more info.
X
XAs always, feel free to contact me with questions or for assistance in
Xsetting this stuff up to work on a project you want to analyze.
X
XBrian Renaud
SHAR_EOF
echo "File doc/Results1 is complete"
chmod 0644 doc/Results1 || echo "restore of doc/Results1 fails"
echo "x - extracting doc/Results2 (Text)"
sed 's/^X//' << 'SHAR_EOF' > doc/Results2
XSome additional example metrics.  These are from the RSM and pascal
X(combined) projects.
X
XThis example predicts changes (errors) by using (halstead) volume,
Xcomments, functions (number of functions in file) and returns
X(total returns).  There are 490 files (data points).  The data contains
Xa number of significant outliers.  For every variable, the standard
Xdeviation is greater than the mean.
X
XNote that I included function count rather than mccabe.  This
Xseemed to be a more significant variable.
X
XThe regression equation is:
X
Xchanges	=   0.00164 volume  -  0.1097 comments  -  0.2515 functions
X	  + 0.07432 returns  +  3.18609
X
XR-Squared is .5818
X
XNote that, once again, we have a counter-intuitive variable.  This time
Xfunctions varies inversely with changes.  (That is, more functions in
Xa file imply fewer changes.)  Strange; I speculate that since file size
Xis explained by the volume variable, many functions in a file means small
Xfunctions.  Small functions typically contain fewer errors.
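X
XAs a rough illustration, the regression equation above can be applied to
Xa single file with a few lines of shell and awk.  This is only a sketch:
Xthe function name predict_changes and the sample input values are made
Xup here, and it assumes an awk that supports the -v option.
X
```shell
# Hypothetical helper: apply the regression equation quoted above.
# Arguments: Halstead volume, comment count, function count, return count.
predict_changes() {
    awk -v volume="$1" -v comments="$2" -v functions="$3" -v returns="$4" '
    BEGIN {
        changes  = 3.18609                  # intercept
        changes += 0.00164 * volume
        changes -= 0.1097  * comments
        changes -= 0.2515  * functions
        changes += 0.07432 * returns
        printf("%.2f\n", changes)
    }'
}

# The input values below are invented for illustration only.
predict_changes 5000 40 10 25
```
X
XWith those made-up inputs this prints 6.34.  Given the outliers noted
Xabove, treat any single prediction with caution.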
X
XThe correlation matrix is:
X
Xchanges		1.0000
Xvolume		0.6630	1.0000
Xcomments	0.2152	0.7068	1.0000
Xfunctions	0.2394	0.6311	0.7528	1.0000
Xreturns		0.6879	0.7087	0.1825	0.3079	1.0000
X
X
XThe t test results are:
X
Xvolume		10.5995
Xcomments	 5.0503
Xfunctions	 1.8426
Xreturns		 3.8045
SHAR_EOF
chmod 0644 doc/Results2 || echo "restore of doc/Results2 fails"
echo "x - extracting doc/halstead.doc (Text)"
sed 's/^X//' << 'SHAR_EOF' > doc/halstead.doc
XHalstead provides various indicators of the module's complexity.
XThe module is examined to determine the following:
X
Xn1 == number of unique operators
Xn2 == number of unique operands
XN1 == number of total operators
XN2 == number of total operands
X
XThe program writes these output fields:
X
XFile Name
XProgram Length          (N  = N1 + N2)
XProgram Volume          (V  = N log<base2> (n1 + n2))
XProgram Level           (L  = (2/n1) * (n2/N2))
XMental Discriminations  (E^ = V / L)
X
XThe program volume (aka ``Halstead Volume'') is probably most useful
Xand seems to be reasonably well correlated with error counts for modules.
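X
XThe formulas above can be checked by hand with a small awk sketch.  The
Xfunction name halstead_fields and the sample counts are made up for this
Xexample; it is not the halstead program itself, it does not guard against
Xn1 or N2 being zero, and it assumes an awk that supports the -v option.
X
```shell
# Hypothetical helper: derive the output fields from the four counts.
# Arguments: n1 n2 N1 N2 (unique operators, unique operands,
# total operators, total operands).
halstead_fields() {
    awk -v n1="$1" -v n2="$2" -v N1="$3" -v N2="$4" '
    BEGIN {
        N = N1 + N2                      # program length
        V = N * log(n1 + n2) / log(2)    # program volume (log base 2)
        L = (2 / n1) * (n2 / N2)         # program level
        E = V / L                        # mental discriminations
        printf("N=%d V=%.1f L=%.4f E=%.1f\n", N, V, L, E)
    }'
}

# The counts below are invented for illustration only.
halstead_fields 8 8 20 12
```
X
XFor these counts the length is 32 and the volume is 32 * log2(16) = 128.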
SHAR_EOF
chmod 0644 doc/halstead.doc || echo "restore of doc/halstead.doc fails"
echo "x - extracting doc/kdsi.1L (Text)"
sed 's/^X//' << 'SHAR_EOF' > doc/kdsi.1L
X.TH KDSI "L COSI" 2/2/86
X.\" bdr
X.UC
X.SH NAME
Xkdsi - count number of lines of code in a C program
X.SH SYNOPSIS
X.B kdsi
X.B [ file ]*
X.SH DESCRIPTION
X.I Kdsi
Xcounts the lines of code in a C program.
XIt provides the following information:
X.sp 1
X.nf
X          lines of code
X          blank lines
X          comment lines
X          number of comments
X.fi
X.LP
XIf you specify no files,
X.I kdsi
Xwill read from stdin.
XIf you specify more than one file on the command line,
X.I kdsi
Xwill print the total for each category.
X.SH SEE ALSO
Xwc(1)
SHAR_EOF
chmod 0644 doc/kdsi.1L || echo "restore of doc/kdsi.1L fails"
echo "x - extracting doc/mccabe.doc (Text)"
sed 's/^X//' << 'SHAR_EOF' > doc/mccabe.doc
XMccabe determines function complexity based on the McCabe model
Xof program complexity.
X
XUsage: mccabe [-n] file [file]
X
XThe -n flag (No Header) is useful if you are using this to produce data for
Xother tools.
X
XMccabe produces (in order) the following output:
X
X	File Name
X	Function Name	(or *** for complexity not in a function, as in yacc)
X	Complexity
X	Count of Return statements
X
XTypically a complexity over 10 is cause for further examination.  The
Xnumber of returns in a function is also important in predicting error
Xcounts for that function, perhaps even more important than the complexity.
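X
XOne way to apply the "over 10" rule of thumb is to filter the headerless
Xoutput through awk.  This is a sketch only: the function name flag_complex
Xis made up, and it assumes the four fields listed above arrive on one
Xline, separated by white space, in that order.
X
```shell
# Hypothetical filter: print functions whose complexity exceeds a limit.
# Reads "mccabe -n" style lines on stdin; assumes whitespace-separated
# fields in the documented order: file, function, complexity, returns.
# The optional first argument overrides the default limit of 10.
flag_complex() {
    awk -v limit="${1:-10}" '$3 + 0 > limit + 0 { print $1, $2, $3, $4 }'
}

# The sample lines are invented for illustration, not real mccabe output.
printf '%s\n' 'main.c main 4 1' 'parse.c yyparse 37 9' | flag_complex
```
X
XIn real use the input would come from something like
X"mccabe -n file.c | flag_complex".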
SHAR_EOF
chmod 0644 doc/mccabe.doc || echo "restore of doc/mccabe.doc fails"
echo "x - extracting src/control/README (Text)"
sed 's/^X//' << 'SHAR_EOF' > src/control/README
XThis directory contains an example of some tools which glom together the
Xoutput of various metrics tools into some (potentially) usable databases.
XThere are two databases produced, one of which is file-based (containing
Xhalstead, change count and line count information) and one of which is
Xfunction-based (containing mccabe and return information).  Obviously,
Xthere can be one or more functions in a file.
X
XI have only a few printouts left indicating the results of the multiple
Xregression models I built.  The results from these are presented later in
Xthis file.  The most (statistically) significant variable for the
Xprediction of errors (SCCS changes) was halstead volume.  Mccabe, returns
Xand comment counts were also significant, at similar statistical levels.
XKdsi and volume were highly correlated.  The only values I disbelieve are
Xthose for mccabe.  See my comments in the doc directory.
X
XThe structure of the scripts follows.  They are not easy to read, although
Xif you lay them all out before you, they are not too bad.
X
XThe file pascal_stats is the highest level script.  It actually knows about
Xthe structure of the application's directories.  (But, before trying to use
Xthis file, see the comments about proj_stats below.) It calls gather_stats
Xto pull stuff together for each directory.  The output from gather_stats is
Xjust appended to the appropriate metrics database.
X
XGather_stats actually gets the statistics for the file specification
Xpassed to it.  Note that I have modified it since it was last used.  I
Xassume that it still compiles.  In addition, some of the commands produce
Xmore and different output than they used to.  I have changed the
Xspecification of the joins to what I think it should be.
X
XI have created a new script proj_stats which employs a different (easier to
Xread) method than that in "pascal_stats" for naming all of the directories
Xand files in a project.  It reads a file for the names of the directories
Xand the patterns which describe files in those directories.  An example
Xfile which should work for the pascal project is in "example_spec".
X
XThe other new scripts are altparse.prs and altkdsi.  Altkdsi just post-
Xprocesses the output of the kdsi command to strip off the totals line.
XAltparse.prs finds the change count from sccs.  It does not count ".1"
Xdeltas, assuming that those are new releases, rather than bug fixes.
SHAR_EOF
chmod 0644 src/control/README || echo "restore of src/control/README fails"
echo "x - extracting src/control/altkdsi (Text)"
sed 's/^X//' << 'SHAR_EOF' > src/control/altkdsi
X: postprocess kdsi to put it in form for gather_stats
X
Xkdsi $* | awk '$5 != "total" {printf("%s\t%s\t%s\n", $5, $1, $4);}'
SHAR_EOF
chmod 0644 src/control/altkdsi || echo "restore of src/control/altkdsi fails"
echo "x - extracting src/control/altparse.prs (Text)"
sed 's/^X//' << 'SHAR_EOF' > src/control/altparse.prs
X: parse output from sccs prs command
X
Xfor file in $*
Xdo
X	prs ${file} | awk '
X	BEGIN {
X		True = 1;
X		False = 0;
X		inMR = False;
X		inComment = False;
X		first = True;
X	}
X
X	$0 == "" {
X		inMR = False;
X		inComment = False;
X		next;
X	}
X
X	$0 ~ /^D [0-9][0-9]*\.[0-9][0-9]*/ {
X		# got delta entry
X		len = length( $2 );
X		if ( substr($2, len - 1) != ".1" )
X			changect++;
X		next;
X	}
X
X	$1 ~ /^MRs:/ {
X		inMR = True;
X		next;
X	}
X
X	$1 ~ /^COMMENT/ {
X		inComment = True;
X		next;
X	}
X
X	inMR == 1 {	# skipping through MR section
X		next;
X	}
X
X	inComment == 1 {	# skipping through comment section
SHAR_EOF
echo "End of part 3"
echo "File src/control/altparse.prs is continued in part 4"
echo "4" > s2_seq_.tmp
exit 0


-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.