[comp.lang.c] Determining C Complexity

tomr@ashtate (Tom Rombouts) (07/20/90)

In article <3205@mica6.UUCP> motcid!henley@uunet.uu.net writes:
>I'm looking for the different points of view related to determining
>the complexity of a C program.  If anyone knows of any programs available
>that already do this I'd like to have some pointers to those as well.  The
>tool or theories would be applied to source code to enable management the
>ability to maintain stats regarding such things as:

In the DOS world, PC Metric for about $200 is a pretty good little
tool for this.  The manual also includes some history of the various
complexity algorithms it can use.  I see that versions for FORTRAN,
Pascal and Modula-2 are also available.  

However, be warned that (in my experience, at least), trying to quantify
software development into precise statistics based on things such as
lines of code, percentage of project completed, etc. is difficult at
best.  You can make sophisticated charts and graphs that will impress
managers or investors, but there is no real way to measure such things
as intelligence of the overall design and architecture, elegance of
algorithms, or quality of comments and documentation.  

Tom Rombouts  Torrance Techie  tomr@ashtate.A-T.com  V: (213) 538-7108

DISCLAIMER:  The above opinions are my own and do not necessarily
represent the views or opinions of any known corporate entity.

warren@eecs.cs.pdx.edu (Warren Harrison) (07/21/90)

In article <1050@ashton.UUCP> tomr@ashton.UUCP (Tom Rombouts) writes:
>In article <3205@mica6.UUCP> motcid!henley@uunet.uu.net writes:
>>If anyone knows of any programs available
>>that already do this I'd like to have some pointers to those as well. 
>
>In the DOS world, PC Metric for about $200 is a pretty good little
>tool for this.  The manual also includes some history of the various
>complexity algorithms it can use.  I see that versions for FORTRAN,
>Pascal and Modula-2 are also available.  
>
There are versions of PC-METRIC for UNIX, VMS and Apple Macintosh too.
The languages supported are Ada, C, COBOL, Pascal, FORTRAN, dBASE, Modula-2,
QuickBASIC and a variety of assembly languages (not all languages are
supported on all machines). The UNIX versions are $495, but the PC and Mac
versions are still $199.
versions are still $199.

Disclaimer: My wife owns the company that makes and sells PC-METRIC ...

Warren


==========================================================================
Warren Harrison                                          warren@cs.pdx.edu
Department of Computer Science                                503/725-3108
Portland State University   

johnb@srchtec.UUCP (John Baldwin) (07/24/90)

In article <1050@ashton.UUCP> tomr@ashton.UUCP (Tom Rombouts) writes:
>
>.......................... there is no real way to measure such things
>as intelligence of the overall design and architecture, elegance of
>algorithms, or quality of comments and documentation.  

True.  But you CAN quantify (statistically, not deterministically) such
things as "did the programmer, in general, decompose the design into
subroutines (functions, procedures, ad infinitum) of manageable size?"

Naturally, you shouldn't become the "code police" on the basis of those
quantifiers alone.  For instance, a certain routine may have to run on
a special stack of limited size; therefore it might be coded as an extremely
long procedure, with calls only to "primitive" subroutines, in order to
avoid exceeding stack depth.

A good policy (by my experience) is to take complexity metrics (and the like)
with a grain of salt:  your metric rates some piece of code with a
McCabe complexity of 22, and the average is 8.2.  So you check the code;
there's no immediate reason for the excess complexity (based on your perusal).
You talk to Joe Programmer, who provides an extremely logical explanation
AND a reasonable defense for doing things this way (as opposed to....).
BTW, note that both an explanation AND a defense are required.

Having satisfied all of this, you good-naturedly remind Joe P. that he
should place a comment near the beginning of the source code which contains
a synopsis of his explanation and defense of the design.  In parallel, you
jot "ok" next to this metric, with some kind of cross reference.

-- 
John T. Baldwin            | Disclaimer:
search technology, inc.    |    Some people claim I never existed.
Norcross, Georgia          | (real .sig under construction
johnb@srchtec.uucp         |  at Starfleet Orbital Navy Yards ;-)

jamiller@hpcupt1.HP.COM (Jim Miller) (07/24/90)

>A good policy (by my experience) is to take complexity metrics (and the like)
>with a grain of salt:  your metric rates some piece of code with a
>McCabe complexity of 22, and the average is 8.2.  So you check the code;
>
>-- 
>John T. Baldwin            | Disclaimer:
>search technology, inc.    |    Some people claim I never existed.
>Norcross, Georgia          | (real .sig under construction
>johnb@srchtec.uucp         |  at Starfleet Orbital Navy Yards ;-)
>----------

For example, I understand that a huge case statement, where each of
hundreds of cases may be only 2-6 lines long (can you say "apply
production" :-), will give an enormous McCabe complexity.
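
To make the joke concrete, here is a hypothetical fragment of such a
routine (the function and its numbers are invented); every case label
adds one to the cyclomatic count, so three hundred trivial cases score
over three hundred:

	/* Hypothetical "apply production" dispatcher: every case is
	   tiny and the whole thing is easy to read, yet each case
	   label bumps the McCabe number by one. */
	int apply_production(int prod)
	{
	    int result;

	    switch (prod) {
	    case 1:   result = 10;  break;  /* e.g. expr -> expr + term */
	    case 2:   result = 20;  break;  /* e.g. expr -> term        */
	    case 3:   result = 30;  break;  /* e.g. term -> factor      */
	    /* imagine cases 4 through 299 in exactly the same style */
	    case 300: result = 999; break;
	    default:  result = -1;  break;
	    }
	    return result;
	}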

It is obviously better coding style for the apply-production routine
to break the range up into subranges, and call routines like
apply1to20(), apply21to40(), ..., apply201to220(), ...
Much easier to maintain and read :-)

Obviously.

    jim miller
    Standard disclaimer

bright@Data-IO.COM (Walter Bright) (07/26/90)

In article <142@srchtec.UUCP> johnb@srchtec.UUCP (John Baldwin) writes:
<In article <1050@ashton.UUCP> tomr@ashton.UUCP (Tom Rombouts) writes:
<<.......................... there is no real way to measure such things
<<as intelligence of the overall design and architecture, elegance of
<<algorithms, or quality of comments and documentation.  
<True.  But you CAN quantify (statistically, not deterministically) such
<things as "did the programmer, in general, decompose the design into
<subroutines (functions, procedures, ad infinitum) of manageable size?"
<A good policy (by my experience) is to take complexity metrics (and the like)
<with a grain of salt:  your metric rates some piece of code with a
<McCabe complexity of 22, and the average is 8.2.  So you check the code;
<there's no immediate reason for the excess complexity (based on your perusal).

In my experience, metrics are completely useless. There is no substitute for
an organized code review. This is where a listing of the entire project's
code is run off and passed around to the other programmers, who review it
and mark it up for style, clarity, obvious bugs, aesthetics, whatever.
Then you have a meeting where you go through all the markups.
You'd be amazed at the quality improvements that result. Another major
advantage of this is that the junior members of the team learn a lot from
the senior members, and the senior members learn new techniques from the
juniors. It's time well spent.

henry@zoo.toronto.edu (Henry Spencer) (07/26/90)

In article <2592@dataio.Data-IO.COM> bright@Data-IO.COM (Walter Bright) writes:
>In my experience, metrics are completely useless...

I concur.  Code metrics, like line counts and structure charts and a zillion
other programming fads, are an attempt to substitute rules and procedures
for adequate resources and competence.  Bureaucrats cling to the belief
that this approach can be made to work, even in the face of overwhelming
evidence that it doesn't, because adequate resources are expensive and
competence is difficult to assess and costly to retain.  TANSTAAFL.
-- 
NFS:  all the nice semantics of MSDOS, | Henry Spencer at U of Toronto Zoology
and its performance and security too.  |  henry@zoo.toronto.edu   utzoo!henry

johnb@srchtec.UUCP (John Baldwin) (07/27/90)

In article <2592@dataio.Data-IO.COM> bright@Data-IO.COM (Walter Bright) writes:
>....[excerpted]...
> There is no substitute for an organized code review.

Agreed.  In my posting on the use of complexity metrics, I did not mean to
suggest that metrics be used to the exclusion of code reviews or walkthroughs.
We are using both design walkthroughs and code walkthroughs on the project I
am currently involved with, and I instituted a review process with a previous
employer.  In that previous environment, I found that the metrics were helpful
just for the nonsubjective input they provided in the review process, even
though the interpretation was subjective.  Some friends of mine in other
organizations have mentioned that metrics, when properly used, allowed team
leaders to continue to maintain a reasonable "situation assessment" even
when unreasonable time constraints prevented thorough reviews of all code.

Perhaps I was too subtle earlier, but I also wanted to make the point that
when using imperfect tools such as code metrics, it is vitally important
that the user differentiate between ACCURACY and PRECISION.  For those who
work in organizations where you are required to submit metrics or other
"statistics" (I use the term very loosely here) to nontechnical management,
it may be very helpful to express your numbers in the form "value (+|-) err".
If I have some measurement that I know is only accurate to within 50% of
its value, I will very likely reduce a number such as "6.12214" to the form
"6 (+|-) 3".  [Interpret "(+|-)" as the over/under plus-or-minus symbol :-)]

-- 
John T. Baldwin                      |  johnb@srchtec.uucp
Search Technology, Inc.              |  johnb%srchtec.uucp@mathcs.emory.edu
standard disclaimer:                 |  ...uunet!samsung!emory!stiatl!srchtec..
opinions and mistakes purely my own. |  ...mailrus!gatech!stiatl!srchtec...

ssdken@watson.Claremont.EDU (Ken Nelson) (07/28/90)

In article <2592@dataio.Data-IO.COM> bright@Data-IO.COM (Walter Bright) writes:
>....[excerpted]...
> There is no substitute for an organized code review.


   Right.  Metrics don't replace reviews, they augment them. 
   Where they really help is when time and resources limit
   what you can review.  If you use high complexity or some other
   metric to help you decide what to review, and how deeply to review it,
   you can prevent or detect a lot more bugs.

   I am interested in how you (or anybody who cares to
   comment) organize code reviews.  Do you have a checklist,
   is there a packet of information that is prepared before
   the review?  How is it prepared or collected?  What people
   are invited?  Too often I have worked on projects where the
   "organization" for the review meant merely rounding up the
   people.  Once organized, if not controlled, they would degenerate into
   religious debates between factions of different programming styles.

   Are there any articles or books you could recommend that
   detail a "good" review process?

   I have seen these items used in past reviews:

     1. A static analyzer's output including full cross reference.
     2. A list of known bugs and/or needed improvements.
     3. A list of To Be Determined items about the code.
     4. A pretty printed output.
     5. LINT output, if written in C.
     6. The set of requirements for the program or program section.
     7. Coding standards for the project.


   Any additions, subtractions, comments??


			
				Ken Nelson
				Principal Engineer
				Software Systems Design
				3627 Padua Av.
				Claremont, CA 91711
				(714) 624-3402

henry@zoo.toronto.edu (Henry Spencer) (07/29/90)

In article <1990Jul26.165322.2729@zoo.toronto.edu> I wrote:
>... Code metrics, like line counts and structure charts and a zillion
>other programming fads, are an attempt to substitute rules and procedures
>for adequate resources and competence...  TANSTAAFL.

Several people have asked about the acronym; the expansion is "There Ain't
No Such Thing As A Free Lunch".  (I believe it was Heinlein who coined
the acronym, although the phrase has been around for a long time.)
-- 
The 486 is to a modern CPU as a Jules  | Henry Spencer at U of Toronto Zoology
Verne reprint is to a modern SF novel. |  henry@zoo.toronto.edu   utzoo!henry

bright@Data-IO.COM (Walter Bright) (07/31/90)

In article <7967@jarthur.Claremont.EDU> ssdken@watson.Claremont.EDU (Ken Nelson) writes:
>In article <2592@dataio.Data-IO.COM> bright@Data-IO.COM (Walter Bright) writes:
<< There is no substitute for an organized code review.
<   Right.  Metrics don't replace reviews, they augment them. 
<   Where they really help is when time and resources are limiting
<   what you can review.

I'm arguing against using metrics at all. If you're limited in resource,
decide to review the newest code, or the most critical.

<   Do you have a checklist,
<   is there a packet of information that is prepared before
<   the review?  How is it prepared or collected?  

A listing of all the code is run off.  Each reviewer gets a pen of a
different color, and they simply read the code and mark it up.
Some things they look for:
	o Obvious bugs
	o Worthless comments:	i++;	/* increment i */
	o Failing to check for error returns from functions
	o Failing to check malloc returns for NULL
	o Meaningless comments
	o Array fencepost problems
	o Possible array overflows
	o Portability problems
	o Inefficient code	(i & 1) ? 1 : 0
	o Dead code
	o Violations of group style consensus
	o Things that should be typedef'd, things that should not be typedef'd
	o Global functions and variables that should be static
	o Modularity
	o Anything else that the reviewer thinks should be discussed or
	  revised
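
To make a couple of those concrete, here is an invented fragment (not
from any real project) that manages to hit several of the checklist
items in a dozen lines:

	#include <stdlib.h>
	#include <string.h>

	/* Invented fragment: each marked line is a checklist item. */
	char *save_name(const char *name, int *odd)
	{
	    char buf[16];
	    char *copy;

	    strcpy(buf, name);                /* possible overflow: will  */
	                                      /* name fit in 16 bytes?    */
	    copy = malloc(strlen(buf) + 1);   /* malloc return not        */
	    strcpy(copy, buf);                /* checked for NULL         */
	    *odd = (strlen(buf) & 1) ? 1 : 0; /* inefficient: & 1 is      */
	                                      /* already 0 or 1           */
	    return copy;                      /* return the copy (a       */
	                                      /* worthless comment)       */
	}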

<   What people are invited.

The team programmers, plus 'guest' programmers from other projects.
Managers are excluded.

<   Once organized, if not controlled, they would degenerate into
<   religious debates between factions of different programming styles.

The moderator's job is to prevent such degenerations and to keep
things constructive.

petej@ssg0.UUCP (Peter M. Jansson) (08/01/90)

In article <2600@dataio.Data-IO.COM> bright@Data-IO.COM (Walter Bright) writes:
>I'm arguing against using metrics at all. If you're limited in resource,
>decide to review the newest code, or the most critical.
>

Has anyone really run a study of how code metrics correlate to
maintenance costs?  Something like "In this module, the complexity was
high, and we had to make more changes to it after delivery" for a
reasonable size system would be helpful.

Data like these would help to determine in what circumstances metrics
are useful.

By the way, how does one define "most critical?"

johnb@srchtec.UUCP (John Baldwin) (08/02/90)

In article <2600@dataio.Data-IO.COM> bright@Data-IO.COM (Walter Bright) writes:
|
|I'm arguing against using metrics at all. If you're limited in resource,
|decide to review the newest code, or the most critical.
|
   [excerpted]
|Some things they look for:
   [more deleted]
|	o Array fencepost problems
|	o Possible array overflows
|	o Portability problems
|	o Inefficient code	(i & 1) ? 1 : 0
|	o Dead code

All of the above are good candidates for automation: they can be detected
just as easily using "mechanical" methods.  Supplying the review group with
the output from a code analyser will provide MORE TIME for the people to do
what humans do better anyway, namely, exercise judgement.  It is exactly
this aspect which makes code reviews so valuable (in both your opinion and
mine).  A properly-written automated tool will be much better at catching
dead code and the like than a human reader will; filter programs don't get
tired, distracted, or bleary-eyed.
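
For instance, even a trivial pattern-matching filter along these lines
(a sketch of my own, with an invented list of suspect strings, not a
real static analyzer) can pull candidate lines out of a source file for
the review packet:

	#include <stdio.h>
	#include <string.h>

	/* Crude "review helper": echo any line containing a pattern
	   that often accompanies checklist problems.  It understands
	   nothing about C; it just saves human eyes for judgement. */
	int main(void)
	{
	    static const char *suspect[] = {
	        "strcpy(",   /* unchecked copy into a fixed buffer?  */
	        "gets(",     /* unbounded read                       */
	        "<=",        /* frequent companion of fenceposts     */
	        "malloc(",   /* is the return checked for NULL?      */
	    };
	    char line[1024];
	    int lineno = 0;
	    size_t i;

	    while (fgets(line, sizeof line, stdin) != NULL) {
	        lineno++;
	        for (i = 0; i < sizeof suspect / sizeof suspect[0]; i++)
	            if (strstr(line, suspect[i]) != NULL) {
	                printf("%d: %s", lineno, line);
	                break;
	            }
	    }
	    return 0;
	}

Fed a source file on standard input, it prints numbered lines worth a
second look; the judgement about whether they are actually wrong stays
with the reviewers.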

-- 
John T. Baldwin                      |  johnb@srchtec.uucp
Search Technology, Inc.              |  johnb%srchtec.uucp@mathcs.emory.edu
standard disclaimer:                 |  ...uunet!samsung!emory!stiatl!srchtec..
opinions and mistakes purely my own. |  ...mailrus!gatech!stiatl!srchtec...

bright@Data-IO.COM (Walter Bright) (08/02/90)

In article <204@ssg0.UUCP> petej@ssg0.UUCP (Peter M. Jansson) writes:
<In article <2600@dataio.Data-IO.COM> bright@Data-IO.COM (Walter Bright) writes:
<<I'm arguing against using metrics at all. If you're limited in resource,
<<decide to review the newest code, or the most critical.
<By the way, how does one define "most critical?"

I'd define most critical as code that is the most difficult to find problems
in. Examples are interrupt handlers. Interrupt handlers can easily have bugs
in them that only show up once every 1000 times they are called, and thus
are very difficult to verify correct by running test cases. The best
way to test and debug interrupt handlers is by "gedanken experiments",
that is, carefully examining the code with your brain.

A least critical piece would be, say, the code that parses the command
line switches. Bugs there usually show up without too much prodding and
are easily tested and fixed.

donm@margot.Eng.Sun.COM (Don Miller) (08/02/90)

In article <1990Jul26.165322.2729@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <2592@dataio.Data-IO.COM> bright@Data-IO.COM (Walter Bright) writes:
>>In my experience, metrics are completely useless...
>
>I concur.  Code metrics, like line counts and structure charts and a zillion
>other programming fads, are an attempt to substitute rules and procedures
>for adequate resources and competence.  Bureaucrats cling to the belief
>that this approach can be made to work, even in the face of overwhelming
>evidence that it doesn't, because adequate resources are expensive and
>competence is difficult to assess and costly to retain.  TANSTAAFL.
>-- 
>NFS:  all the nice semantics of MSDOS, | Henry Spencer at U of Toronto Zoology
>and its performance and security too.  |  henry@zoo.toronto.edu   utzoo!henry


Better late than never with a rebuttal, I suppose.  

No one would argue that substituting metrics, or any other "rules and
procedures", for adequate resources and competence is a good software
development strategy.  Although I'm
not a bureaucrat, I cling to the belief that the use of metrics
during the software development process does work.  

To some extent, I'm reminded of the wise and talented American 
auto worker of the 70's, dismissing the statistical process
improvement methods (i.e. metrics) with the view that nobody 
beats the talented, creative, hard-working American labor force.
Apparently, Japan didn't hold the same view.  Japanese auto makers
have used metrics to improve product quality to superior levels.

Granted, cars aren't software.  Software, on the other hand, is
a product, no matter how "non-assembly line".  The software 
market is just beginning to become one in which quality can be
used as a distinct competitive advantage.  Thus, an attempt to
measure software quality with an eye toward continuous improvement
seems a rational course.

Please feel free to throw opinions at me.  I am very interested
in the impact of quality processes on the software development 
process.

Don Miller
Software Quality Engineering
Sun Microsystems, Inc.
donm@eng.sun.com

henry@zoo.toronto.edu (Henry Spencer) (08/03/90)

In article <140003@sun.Eng.Sun.COM> donm@margot.Eng.Sun.COM (Don Miller) writes:
>Granted, cars aren't software.  Software, on the other hand, is
>a product, no matter how "non-assembly line".  The software 
>market is just beginning to become one in which quality can be
>used as a distinct competitive advantage...

Couldn't prove it by me, considering the absolute garbage that is usually
shipped, and the marketdroids' fixation on new features, as opposed to
making the old ones work.  I would *hope* for such a change, and have
been known to pointedly suggest it to certain companies, but I'm not
very optimistic.

>Thus, an attempt to
>measure software quality with an eye toward continuous improvement
>seems a rational course.

Certainly.  What does that have to do with code metrics?

That is the crux of the problem:  there is little or no evidence that
those code metrics measure anything *useful*.

If you want to measure quality, measure quality.  Count verified bug
reports and performance problems, and perhaps introduce some sort of
modifier for memory consumption.  These are not terribly good measures
of quality, but at least they measure real problems!

(If you want a suggestion on what to *do* with that information, forget
Japan for the moment and apply a dose of capitalism.  Start with the
number 1 at the beginning of the quarter.  Every time you get a legitimate
report of a flaw -- bug, performance problem, etc. -- multiply the
current number by 0.99.  At the end of the quarter, each person in
the programming group gets that fraction of his salary as a lump-sum
performance bonus.  Note that this applies across the whole group, not
person-by-person, which encourages cooperative efforts to reduce the
flaw rate.  And yes, this means that a very low flaw rate means paying
those people a lot of extra money -- they'll be worth it!)
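
(To put rough numbers on that, the fraction after n legitimate flaw
reports is 0.99^n, which a throwaway program can tabulate; ten flaws
still leave about 90% of a salary as bonus, a hundred about 37%.)

	#include <math.h>
	#include <stdio.h>

	/* Bonus fraction under the scheme above: 0.99 raised to the
	   number of legitimate flaw reports during the quarter. */
	int main(void)
	{
	    int flaws;

	    for (flaws = 0; flaws <= 100; flaws += 20)
	        printf("%3d flaws -> %3.0f%% of salary as bonus\n",
	               flaws, 100.0 * pow(0.99, flaws));
	    return 0;
	}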

(There are obviously a lot of details that need ironing out, and the
classification of things as flaws has to be done by some independent
assessment group, and it would be better to have an additive scheme
that lets people get *more* than 100% bonuses, but the general
point is clear, I think.  Do this and you can forget about silliness
like imposing code-metric testing on everyone -- the programmers themselves
will figure out which quality-improvement schemes work and which don't.)
-- 
The 486 is to a modern CPU as a Jules  | Henry Spencer at U of Toronto Zoology
Verne reprint is to a modern SF novel. |  henry@zoo.toronto.edu   utzoo!henry

daves@hpopd.HP.COM (Dave Straker) (08/03/90)

>In article <1990Jul26.165322.2729@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>>In article <2592@dataio.Data-IO.COM> bright@Data-IO.COM (Walter Bright) writes:
>>>In my experience, metrics are completely useless...
>>
>>I concur.  Code metrics, like line counts and structure charts and a zillion
>>other programming fads, are an attempt to substitute rules and procedures
>>for adequate resources and competence.  Bureaucrats cling to the belief
>
When illuminati like Walter and Henry say such things I despair!  
I concur more with Tom DeMarco:

"You can't control what you can't measure"
"A useful metric is measurable, independent, accountable and precise"
"Measuring any project parameter and attaching evident significance to it 
will affect the usefulness of that parameter"
"Rational, competent men and women can work effectively to maximise
*any single observed* indication of success"

Metrics are not perfect, but when properly understood they can lead to
better decisions.

Dave Straker            Pinewood Information Systems Division (PWD not PISD)
[8-{)                   HPDESK: David Straker/HP1600/01
                        Unix:   daves@hpopd.hp.com

petej@ssg0.UUCP (Peter M. Jansson) (08/03/90)

In article <1990Aug2.195825.29393@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>That is the crux of the problem:  there is little or no evidence that
>those code metrics measure anything *useful*.
>

There's little or no evidence that metrics *don't* measure anything useful,
either.  In fact, there's little or no evidence about metrics at all, at
least that I've seen.  Can anyone provide citations of work that 
shows how metrics correlate to quality/maintenance cost?  We don't seem to
be having much substantial discussion without this information. ( "I hate
'em" "I like 'em" "I don't know")

	Pete.

bright@Data-IO.COM (Walter Bright) (08/04/90)

In article <161@srchtec.UUCP> johnb@srchtec.UUCP (John Baldwin) writes:
<In article <2600@dataio.Data-IO.COM> bright@Data-IO.COM (Walter Bright) writes:
<|Some things they look for:
<|	o Array fencepost problems
<|	o Possible array overflows
<|	o Portability problems
<|	o Inefficient code	(i & 1) ? 1 : 0
<|	o Dead code
<All of the above are good candidates for automation: they can be detected
<just as easily using "mechanical" methods.

True, simple cases can be discovered by automation. But the cases I worry
about most *cannot* be (and I've seen all of these):

o Array fencepost problems
	void func(char *p, int max)
	{	int i;
		for (i = 0; i <= max; i++)	/* ERROR! should be i < max */
			p[i] = 5;
	}

o Possible array overflows
	char a[10];
	strcpy(a,p);		/* is p guaranteed to fit?	*/

o Portability problems
	Not fopen'ing a binary file with "rb" instead of "r".

o Inefficient code
	Replacing a bubble sort with qsort().
	Replacing scaling by a double with scaling by a long.

o Dead code (typically there due to maintenance, the case may be handled
  elsewhere in the program or may have been designed out of existence)
	switch (i)
	{	case 1:  ....
		case 2:  ....
		case 3:  ....	/* but i can never be 3 in this function! */
	}

The possibilities, of course, are endless. An analysis program simply
cannot know what the design of the program is supposed to be.

donm@margot.Eng.Sun.COM (Don Miller) (08/04/90)

First, a quick definition:

A software metric defines a standard way of measuring some attribute 
of the software development process.  For example, size, cost, defects,
communications, difficulty, and environment are all attributes.

from the book Software Metrics: Establishing a Company-Wide Program,
an account of the implementation of a metrics program at HP.

In article <1990Aug2.195825.29393@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
I said (among other things):
>>Thus, an attempt to
>>measure software quality with an eye toward continuous improvement
>>seems a rational course.
>
>Certainly.  What does that have to do with code metrics?
>
>That is the crux of the problem:  there is little or no evidence that
>those code metrics measure anything *useful*.
>
>If you want to measure quality, measure quality.  Count verified bug
>reports and performance problems, and perhaps introduce some sort of
>modifier for memory consumption.  These are not terribly good measures
>of quality, but at least they measure real problems!
  
  When I first joined this thread I understood and agreed with
  the discussions dismissing code metrics as an end in themselves.
  My assertion is that the measurement of the software development
  process (including code, defects, people, time, cost, etc.) is 
  the only way to evaluate changes to the process, presumably made
  to increase quality, speed up development time, decrease resource
  usage, or increase functionality.  In general, metrics are a 
  means towards developing a maximally efficient software development
  process.
  
>(If you want a suggestion on what to *do* with that information, forget
>Japan for the moment and apply a dose of capitalism.  Start with the
>number 1 at the beginning of the quarter.  Every time you get a legitimate
>report of a flaw -- bug, performance problem, etc. -- multiply the
>current number by 0.99.  At the end of the quarter, each person in
>the programming group gets that fraction of his salary as a lump-sum
>performance bonus.  Note that this applies across the whole group, not
>person-by-person, which encourages cooperative efforts to reduce the
>flaw rate.  And yes, this means that a very low flaw rate means paying
>those people a lot of extra money -- they'll be worth it!)

  Here is a fatal flaw of metrics practice.  Using them to evaluate
  and reward is counterproductive.  Usage of metrics data for such
  purposes will only cause future data to be distorted and useless.
  The goal becomes ensuring no one knows you have bugs rather than
  not having them.

> ... -- the programmers themselves
>will figure out which quality-improvement schemes work and which don't.)
 
  Not if they don't have the means to evaluate the meaning of "work"
  (especially "works better").

  I've redirected followups to comp.software-eng for further
  interesting discussion.


Don Miller
Software Quality Engineering
Sun Microsystems, Inc.
donm@margot.eng.sun.com

johnb@srchtec.UUCP (John Baldwin) (08/04/90)

In article <204@ssg0.UUCP> petej@ssg0.UUCP (Peter M. Jansson) writes:
>
>Has anyone really run a study of how code metrics correlate to
>maintenance costs?  Something like "In this module, the complexity was
>high, and we had to make more changes to it after delivery" for a
>reasonable size system would be helpful.

1. T. Capers Jones; _Programming_Productivity_; McGraw-Hill; New York; 1986;
                    pp. 107-110.

2. T. Capers Jones; _Program_Quality_and_Programmer_Productivity_; TR 02.764;
                    IBM Corp.; San Jose, CA; Jan 1977.

[If I can find the bibliography I did last year, there are some good ref-
erences in places such as "IEEE Transactions on Software Engineering".
I'll look.]
-- 
John T. Baldwin                      |  johnb@srchtec.uucp
Search Technology, Inc.              |  johnb%srchtec.uucp@mathcs.emory.edu
standard disclaimer:                 |  ...uunet!samsung!emory!stiatl!srchtec..
opinions and mistakes purely my own. |  ...mailrus!gatech!stiatl!srchtec...

henry@zoo.toronto.edu (Henry Spencer) (08/05/90)

In article <7990015@hpopd.HP.COM> daves@hpopd.HP.COM (Dave Straker) writes:
>"You can't control what you can't measure"

So what, precisely, are code metrics measuring?

Not code quality, as seen by the customers.  They care about whether it
works, whether it's fast and small, and whether it will continue to work
after maintenance.

The connection between code metrics and any of these things is, at best,
unverified conjecture.

If you want to measure bugs, performance, and maintenance ease, there are
better metrics.  Like number of bugs, timing figures, and maintenance
man-hours.  Admittedly, there are a lot of variables involved, and it is
hard work to measure these things well.  Running a program to determine
the cyclomatic complexity of the code is much easier.  But the customers
don't *care* about the cyclomatic complexity!

Measure the things you care about, and forget the silly code metrics.
-- 
The 486 is to a modern CPU as a Jules  | Henry Spencer at U of Toronto Zoology
Verne reprint is to a modern SF novel. |  henry@zoo.toronto.edu   utzoo!henry

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (08/05/90)

In article <7990015@hpopd.HP.COM>, daves@hpopd.HP.COM (Dave Straker) writes:
> When illuminati like Walter and Henry say such things I despair!  
> I concur more with Tom DeMarco:

> "You can't control what you can't measure"

I think Henry Spencer is saying what Kamin, Gould, and others have said
about the Great IQ Debate:
  "Just because you can measure something doesn't mean it's real"
(or relevant).

-- 
Distinguishing between a work written in Hebrew and one written in Aramaic
when we have only a Latin version made from a Greek translation is not easy.
(D.J.Harrington, discussing pseudo-Philo)

daves@hpopd.HP.COM (Dave Straker) (08/06/90)

henry@zoo.toronto.edu (Henry Spencer) / 10:45 pm  Aug  4, 1990 / writes:

>In article <7990015@hpopd.HP.COM> daves@hpopd.HP.COM (Dave Straker) writes:
>>"You can't control what you can't measure"
>
>So what, precisely, are code metrics measuring?
>
>Not code quality, as seen by the customers.  They care about whether it
>works, whether it's fast and small, and whether it will continue to work
>after maintenance.

They'd probably prefer it worked *without* maintenance. I agree wholeheartedly
with starting and ending with the customer and his needs (HP has a strong
focus here), and the most important measures are those that the customer
uses. However, he *does* measure code, possibly not in a rigorous, quantitative
manner, but in a way that says 'this is a good program' (or otherwise).
The most obvious quantitative measure he does make is in defects, which are
eminently countable. We can also analyse them: How did they happen? Why did
they get to the customer?  Can we improve our processes so that it would
not happen again?

>The connection between code metrics and any of these things is, at best,
>unverified conjecture.

Start with the customer's needs. Work backwards into your own processes.
Some of these will be related to code. Find the best measure you can
and use it. Close the loop! Make sure the measure and the action resulting
from its use actually does result in improved customer satisfaction.

>If you want to measure bugs, performance, and maintenance ease, there are
>better metrics.  Like number of bugs, timing figures, and maintenance
>man-hours.  Admittedly, there are a lot of variables involved, and it is
>hard work to measure these things well.  Running a program to determine
>the cyclomatic complexity of the code is much easier.  But the customers
>don't *care* about the cyclomatic complexity!

Yes, I suppose the basis of this discussion is McCabe. Nevertheless
his measure is a *tool*. It can be run over a lot of files quickly to
flag *potential* trouble spots. The final judgement must be human.

>Measure the things you care about, and forget the silly code metrics.

Measure the things the *customer* cares about, and forget *all* silly metrics,
code or otherwise. Take time to find the good metrics (possibly by trying
the silly ones first - you often don't know they're silly until you try them).

>The 486 is to a modern CPU as a Jules  | Henry Spencer at U of Toronto Zoology
>Verne reprint is to a modern SF novel. |  henry@zoo.toronto.edu   utzoo!henry

Dave Straker            Pinewood Information Systems Division (PWD not PISD)
[8-{)                   HPDESK: David Straker/HP1600/01
                        Unix:   daves@hpopd

rmj@tcom.stc.co.uk (Rhodri James) (08/06/90)

To give an example of a metric which certainly is useful, how about the
percentage of code actually exercised by the tests?  This was very revealing
when it was first introduced at various companies -- at one that I know, the
engineers discovered that their initial testing covered less than 10% of
the code, and "really thorough testing" tended to reach only about 60%.
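
The mechanics needn't be fancy, either; even hand-inserted counters give
a rough statement-coverage figure.  A throwaway sketch (invented here,
not any particular company's tool):

	#include <stdio.h>

	/* Hand-rolled statement coverage: bump a counter at each point
	   of interest, then report which points the test run reached. */
	#define NPOINTS 4
	static long hits[NPOINTS];
	#define COVER(n) (hits[(n)]++)

	static int classify(int x)
	{
	    COVER(0);
	    if (x < 0)  { COVER(1); return -1; }
	    if (x == 0) { COVER(2); return 0; }
	    COVER(3);
	    return 1;
	}

	int main(void)
	{
	    int i, reached = 0;

	    classify(5);        /* a deliberately thin "test suite" */
	    classify(7);

	    for (i = 0; i < NPOINTS; i++)
	        if (hits[i] > 0)
	            reached++;
	    printf("coverage: %d of %d points (%.0f%%)\n",
	           reached, NPOINTS, 100.0 * reached / NPOINTS);
	    return 0;
	}
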
-- 
* Windsinger                 * "Nothing is forgotten..."
* rmj@islay.tcom.stc.co.uk   *                    Mike Whitaker
*     or                     * "...except sometimes the words"
* ...!mcvax!ukc!stc!rmj      *                    Phil Allcock

peter@ficc.ferranti.com (Peter da Silva) (08/07/90)

In article <7990015@hpopd.HP.COM> daves@hpopd.HP.COM (Dave Straker) writes:
> "Rational, competent men and women can work effectively to maximise
> *any single observed* indication of success"

This can be observed in the Soviet Union.  When a nail plant is required
to maximise the number of nails produced, it produces needles.  When
required to maximise the weight produced, it produces railroad spikes.
Neither is terribly useful for building houses.

When required to turn a profit, however...
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
<peter@ficc.ferranti.com>

scs@adam.mit.edu (Steve Summit) (08/07/90)

In article <7990016@hpopd.HP.COM> daves@hpopd.HP.COM (Dave Straker) writes:
>henry@zoo.toronto.edu (Henry Spencer) / 10:45 pm  Aug  4, 1990 / writes:
>>If you want to measure bugs, performance, and maintenance ease, there are
>>better metrics.  Like number of bugs, timing figures, and maintenance
>>man-hours.  Admittedly, there are a lot of variables involved, and it is
>>hard work to measure these things well.  Running a program to determine
>>the cyclomatic complexity of the code is much easier...
>
>Yes, I suppose the basis of this discussion is McCabe. Nevertheless
>his measure is a *tool*. It can be run over a lot of files quickly to
>flag *potential* trouble spots. The final judgement must be human.

Henry and Dave both have good arguments, and neither of them
is really wrong.  I think the argument boils down to a tautology similar
to the one behind the occasional attempt to ban the goto
statement: good programmers write good code without metrics, and
might have good reason to write code which a metric says is "bad"
(presumably such code is backed up by comments which the metric
can't see).  Struggling programmers will probably find ways to
write lousy code even in the presence of metrics (and the lousy
code may even manage to have low "complexity" scores).  The
metrics may help to manage a project staffed by struggling
programmers (and there are probably a lot of them out there) but
they aren't going to magically save that project, or the rest of
the industry.  (And, to be fair, no one in this discussion has
claimed that they would.)

Someone asked about real studies attempting to correlate
complexity metric scores with "real" code quality (measured by
programmer assessment, or bug reports, or something).  I remember
a paper from the Pacific Northwest Software Quality Conference in
Portland in 1985 or 1986 describing a study which tried to do
just that.  Rather sheepishly, the researchers had to admit that,
of all the metrics studied, the only valid correlation was seen
in the case of the control, namely the wc metric.  Of course,
metrics were rather new then; some may have gotten better by
now...

                                            Steve Summit
                                            scs@adam.mit.edu

johnb@srchtec.UUCP (John Baldwin) (08/08/90)

In article <2611@dataio.Data-IO.COM> bright@Data-IO.COM (Walter Bright) writes:
>
>The possibilities, of course, are endless. An analysis program simply
>cannot know what the design of the program is supposed to be.

True enough.  Let me stop the discussion long enough to make sure I understand
what your point really is, and see if you and others understand mine.

If you are saying "software metrics can never replace thoughtful code
inspection,"  then I must heartily agree with you, and the discussion can
go on to more fruitful things than this.

If you are saying "there are things which 'mechanical' metrics or analysis
cannot catch, therefore they are useless," then I must disagree.

Let me draw a (very poor) analogy:  my electric dishwasher at home does not
do a good job of removing cooked-on food from broiler pans and the like.
If I am dumb enough to put those straight into the dishwasher, it only makes
them more stubborn to remove.  So I accept the less-than-optimal limits of
my tool and only use it for the great volume of "not quite as dirty"
kitchenware, and continue to tackle the worst of the cleaning by hand.

I am proposing that software metrics are a tool to be used in a like manner.
A fellow writer (I have forgotten whom) posted a message to the effect
"...if you want to measure quality, then measure quality..."
                                                 ^^^^^^^
That is precisely the purpose of any *metric*: to objectively quantify those
aspects of the software development process which ARE quantifiable (and which
we find it profitable to quantify).  [For example, I don't see how it might
benefit us to measure the number of times the word "the" occurs within a
comment, although it's certainly feasible to build a filter to look for that.]

Metrics (and analysis tools) will never replace thoughtful people.  They will
augment our capabilities by  (a) providing useful objective information in
numeric (measurable) form, and (b) performing meticulous analysis at the
level where automation is feasible and people are likely to err.

Consider LINT as an example of type (b) above; properly set up and used, it
is invaluable for assisting the developer in producing solid code.
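
A few invented lines of the kind lint flags immediately, while a tired
human reader sails right past them (every flagged line is a deliberate
defect, put there for illustration):

	#include <stdio.h>
	#include <stdlib.h>

	/* Invented examples of the type (b) checks lint automates. */
	int main(void)
	{
	    long count = 100000L;
	    char *p;
	    int unused;                    /* lint: declared, never used   */

	    printf("count = %d\n", count); /* lint: %d given a long        */
	    p = malloc(10);
	    strcpy(p, "hello");            /* lint: strcpy used without a  */
	                                   /* declaration (no string.h)    */
	    free(p);
	    return 0;
	}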


Hopefully, this will clear up some of this discussion.

Incidentally, as an aside, the current project I'm involved with under my
current employer is NOT using software-metrics per se.  We ARE using careful
walkthroughs and code examinations.... and I will be the first to object if
we decide to do away with this MOST USEFUL tool.
-- 
John T. Baldwin                      |  johnb@srchtec.uucp
Search Technology, Inc.              |  johnb%srchtec.uucp@mathcs.emory.edu
standard disclaimer:                 |  ...uunet!samsung!emory!stiatl!srchtec..
opinions and mistakes purely my own. |  ...mailrus!gatech!stiatl!srchtec...

mike@hpfcso.HP.COM (Mike McNelly) (08/08/90)

We may all be missing a point here:  Complexity metrics seem to be an
attempt at PREDICTING software quality, not a very useful way of
MEASURING it.  Once the product's out, it's silly to measure quality
indirectly via complexity metrics when there are more direct measures.

Personal $.02:  Any measurement technique can be subverted and sooner or
later it will be if it's employed as a stick by management against the
developer.  The measurement tools can be effective when they're used as
aids by the developer and not as rating systems. When the rules of the
game are set by metrics (or by any other means) pretty soon everyone
of any intelligence learns to play by the rules. It does not necessarily
follow that better products will be produced, only that these products
will conform to the "guidelines".

Example:  Bug reports in databases.  They're good until someone gets the
bright idea to use them as a lever for salaries, etc.  Then what happens
when the developer finds a bug in his own code?  He keeps a private list
of these bugs.  Is he really rewarded for finding bugs?  For other bugs,
his tendency is to downgrade them.

Example:  Code size (larger or smaller) as a metric.  As an aid in
prediction I suppose it might have its uses but as a guideline for
performance measurement it stinks.  The obvious result of such a
guideline is uninspired code.

Mike McNelly
mike@fc.hp.com