rsalz@uunet.uu.net (Rich Salz) (09/20/89)
Submitted-by: Brian Renaud <huron.ann-arbor.mi.us!bdr> Posting-number: Volume 20, Issue 10 Archive-name: metrics/part03 ---- Cut Here and unpack ---- #!/bin/sh # this is part 3 of a multipart archive # do not concatenate these parts, unpack them in order with /bin/sh # file doc/Results1 continued # CurArch=3 if test ! -r s2_seq_.tmp then echo "Please unpack part 1 first!" exit 1; fi ( read Scheck if test "$Scheck" != $CurArch then echo "Please unpack part $Scheck next!" exit 1; else exit 0; fi ) < s2_seq_.tmp || exit 1 echo "x - Continuing file doc/Results1" sed 's/^X//' << 'SHAR_EOF' >> doc/Results1 Xheavily to the change count. This is also intuitive. (I think there Xare two reasons for this, (1) that code with many special cases requiring Xreturns/exits is not well thought out ahead of time, but rather, coded Xas the implementor thought and (2) that changes to code with many returns Xcan be difficult, and can easily be incorrect. X XYou might think at this point that you have a magic formula for predicting Xerrors. Unfortunately, although this stuff does a very good job of Xpredicting problems on the pascal project, the best R-sqared I ever Xgot for the RSM(*1) data was still less than .4. Apparently, there are Xother factors at work. I believe that one very important factor is the Xexperience/skill of the original implementor and of those who made changes Xto the module. The uids of these people can be obtained from the sccs file. XYou then need to have some way to associate a skill level with the person X(note that time may be important here too). I never completed this final Xstep. X X X*1 - RSM == Remote Software Management, a distributed source distribution Xand control system. See results2 for more info. X XAs always, feel free to contact me with questions or assistance in setting Xthis stuff up to work on a project you want to analyze. X XBrian Renaud SHAR_EOF echo "File doc/Results1 is complete" chmod 0644 doc/Results1 || echo "restore of doc/Results1 fails" echo "x - extracting doc/Results2 (Text)" sed 's/^X//' << 'SHAR_EOF' > doc/Results2 XSome additional example metrics. These are from the RSM and pascal X(combined) projects. X XThis example predicts changes (errors) by using (halstead) volume, Xcomments, functions (number of functions in file) and returns X(total returns). There are 490 files (data points). The data contains Xa number of significant outliers. For every variable, the standard Xdeviation is greater than the mean. X XNote that I included function count rather than mccabe. This Xseemed to be a more significant variable. X XThe regression equation is: X Xchanges = 0.00164 volume - 0.1097comments - 0.2515 functions X + 0.07432 returns + 3.18609 X XR-Squared is .5818 X XNote that, once again we have a counter-intuitive variable. This time Xfunctions varies inversely with changes. (That is, more functions in Xa file imply fewer changes.) Strange; I speculate that since file size Xis explained by the volume variable, many functions in a file means small Xfunctions. Small functions typically contain fewer errors. X XThe correlation matrix is: X Xchanges 1.0000 Xvolume 0.6630 1.0000 Xcomments 0.2152 0.7068 1.0000 Xfunctions 0.2394 0.6311 0.7528 1.0000 Xreturns 0.6879 0.7087 0.1825 0.3079 1.0000 X X XThe t test results are: X Xvolume 10.5995 Xcomments 5.0503 Xfunctions 1.8426 Xreturns 3.8045 SHAR_EOF chmod 0644 doc/Results2 || echo "restore of doc/Results2 fails" echo "x - extracting doc/halstead.doc (Text)" sed 's/^X//' << 'SHAR_EOF' > doc/halstead.doc XHalstead provides various indicators of the module's complexity. XThe module is examined to determine the following X Xn1 == number of unique operators Xn2 == number of unique operands XN1 == number of total operators XN2 == number of total operands X XThe program writes these output fields: X XFile Name XProgram Length (N = N1 + N2) XProgram Volume (V = N log<base2> (n1 + n2)) XProgram Level (L = (2/n1) * (n2/N2)) XMental Discriminations (E^ = V / L) X XThe program volume (aka ``Halstead Volume'') is probably most useful Xand seems to be reasonably well correlated with error counts for modules. SHAR_EOF chmod 0644 doc/halstead.doc || echo "restore of doc/halstead.doc fails" echo "x - extracting doc/kdsi.1L (Text)" sed 's/^X//' << 'SHAR_EOF' > doc/kdsi.1L X.TH KDSI "L COSI" 2/2/86 X.\" bdr X.UC X.SH NAME Xkdsi - count number of lines of code in a C program X.SH SYNOPSIS X.B kdsi X.B [ file ]* X.SH DESCRIPTION X.I Kdsi Xcounts the lines of code in a C program. XIt provides the following information: X.sp 1 X.nf X lines of code X blank lines X comments lines X number of comments X.fi X.LP XIf you specify no files, X.I kdsi Xwill read from stdin. XIf you specify more than one file on the command line, X.I kdsi Xwill print the total for each category. X.SH SEE ALSO Xwc(1) SHAR_EOF chmod 0644 doc/kdsi.1L || echo "restore of doc/kdsi.1L fails" echo "x - extracting doc/mccabe.doc (Text)" sed 's/^X//' << 'SHAR_EOF' > doc/mccabe.doc XMccabe determines function complexity based on Mccabe model Xof program complexity. X XUsage: mccabe [-n] file [file] X XThe -n flag (No Header) is useful if you are using this to produce data for Xother tools X XMccabe produces (in order) the following output X X File Name X Function Name (or *** for complexity not in a function, as in yacc) X Complexity X Count of Return statements X XTypically a complexity over 10 is cause for further examination. The Xnumber of returns in a function is also important in predicting error Xcounts for that function, perhaps even more important than the complexity. SHAR_EOF chmod 0644 doc/mccabe.doc || echo "restore of doc/mccabe.doc fails" echo "x - extracting src/control/README (Text)" sed 's/^X//' << 'SHAR_EOF' > src/control/README XThis directory contains an example of some tools which glom together the Xoutput of various metrics tools into some (potentially) usable databases. XThere are two databases produced, one of which is file-based (containing Xhalstead, change count and line count information) and one of which is Xfunction based (containing mccabe and return information). Obviously, Xthere can be one or more functions in a file. X XI have only a few printouts left indicating the results of the multiple Xregression models I built. The results from these are presented later in Xthis file. The most (statistically) significant variable for the Xprediction of errors (SCCS changes) was halstead volume. Mccabe, returns Xand comment counts were also significant, at similar statistical levels. XKdsi and volume were highly correlated. The only variable I disbelieve is Xthe values for mccabe. See my comments in the doc directory. X XThe structure of the scripts follows. They are not easy to read, although Xif you lay them all out before you, they are not too bad. X XThe file pascal_stats is the highest level script. It actually knows about Xthe structure of the application's directories. (But, before trying to use Xthis file, see the comments about proj_stats below.) It calls gather_stats Xto pull stuff together for each directory. The output from gather_stats is Xjust appended to the appropriate metrics database. X XGather_stats actually gets the statistics, for the file specification Xpassed to it. Note that I have modified it since it was last used. I Xassume that it still compiles. In addition, some of the commands produce Xmore and different output than they used to. I have changed the Xspecification of the joins to what I think it should be. X XI have created a new script proj_stats which employs a different (easier to Xread) method than that in "pascal_stats" for naming all of the directories Xand files in a project. It reads a file for the names of the directories Xand the patterns which describe files in those directories. An example Xfile which should work for the pascal project is in "example_spec". X XThe other new scripts are altparse.prs and altkdsi. Altkdsi just post- Xprocesses the output of the kdsi command to strip off the totals line. XAltparse.prs finds the change count from sccs. It does not count ".1" Xdeltas, assuming that those are new releases, rather than bug fixes. SHAR_EOF chmod 0644 src/control/README || echo "restore of src/control/README fails" echo "x - extracting src/control/altkdsi (Text)" sed 's/^X//' << 'SHAR_EOF' > src/control/altkdsi X: postprocess kdsi to put it in form for gather_stats X Xkdsi $* | awk '$5 != "total" {printf("%s\t%s\t%s\n", $5, $1, $4);}' SHAR_EOF chmod 0644 src/control/altkdsi || echo "restore of src/control/altkdsi fails" echo "x - extracting src/control/altparse.prs (Text)" sed 's/^X//' << 'SHAR_EOF' > src/control/altparse.prs X: parse output from sccs prs command X Xfor file in $* Xdo X prs ${file} | awk ' X BEGIN { X True = 1; X False = 0; X inMR = False; X inComment = False; X first = True; X } X X $0 == "" { X inMR = False; X inComment = False; X next; X } X X $0 ~ /^D [0-9][0-9]*\.[0-9][0-9]*/ { X # got delta entry X len = length( $2 ); X if ( substr($2, len - 1) != ".1" ) X changect++; X next; X } X X $1 ~ /^MRs:/ { X inMR = True; X next; X } X X $1 ~ /^COMMENT/ { X inComment = True; X next; X } X X inMR == 1 { # skipping through MR section X next; X } X X inComment == 1 { # skipping through comment section SHAR_EOF echo "End of part 3" echo "File src/control/altparse.prs is continued in part 4" echo "4" > s2_seq_.tmp exit 0 -- Please send comp.sources.unix-related mail to rsalz@uunet.uu.net. Use a domain-based address or give alternate paths, or you may lose out.