[comp.software-eng] program complexity metrics

ncmagel@ndsuvax.UUCP (ken magel) (12/10/87)

     Does anyone actually use program complexity metrics for practical 
work?  I am interested in how companies actually use complexity metrics
ranging from lines of code to cyclomatic complexity to software science to 
the many other metrics which have been proposed.  Today one can get, either
in the public domain or as commercial products, programs which will compute
and report several different metrics.  Are these actually used?
     If you do not use program complexity metrics, what sort of proof, evidence
or demonstrations would your company need to cause you to consider using
them?  What kinds of payoffs do they have to have and how would those payoffs
be established to your satisfaction?
     I am particularly interested in answers to the second paragraph of 
questions.
                                                 Thank you.

rion@wdl1.UUCP (Rion Cassidy) (12/12/87)

>     If you do not use program complexity metrics, what sort of proof, evidence
>or demonstrations would your company need to cause you to consider using
>them?  What kinds of payoffs do they have to have and how would those payoffs
>be established to your satisfaction?
>     I am particularly interested in answers to the second paragraph of 
>questions.


I would like to answer this question since I don't use program
complexity metrics and am very interested in software engineering.
However, except for what the name implies, I really don't know what
program complexity metrics are and therefore can't answer.  How about
a brief overview?  Anybody?


Rion Cassidy
rion@ford-wdl1.arpa
...{sgi,sun,ucbvax}!wdl1!rion

All opinions stated here are my own, not my employer's.

chip@pedsga.UUCP (12/15/87)

In article <3850002@wdl1.UUCP> rion@wdl1.UUCP writes:
>>     If you do not use program complexity metrics, what sort of proof, evidence
>>or demonstrations would your company need to cause you to consider using
>>them?  What kinds of payoffs do they have to have and how would those payoffs
>>be established to your satisfaction?
>>     I am particularly interested in answers to the second paragraph of 
>>questions.
>
>I would like to answer this question since I don't use program
>complexity metrics and am very interested in software engineering.
>However, except for what the name implies, I really don't know what
>program complexity metrics are and therefore can't answer.  How about
>a brief overview?  Anybody?

According to my textbook, Software Engineering, by Shooman, McGraw-Hill,
complexity metrics serve the following purposes:

1. As a ranking of the difficulty of various software modules, to be used along
with other factors in the assignment of personnel.
2. In the case of module complexity, as a guide to judge whether subdivision
of a complex module is warranted.
3. As a direct measure of progress and an indirect measure of quality during
the various phases of development.
4. As an aid in normalizing data used in retrospective studies of past 
development projects.

Personally, I feel that the second use is of most value.  

The general idea is to be able to apply formulas to software (either as a
whole or individually to modules) to determine the "quality" of the software.

One of the more famous complexity metrics papers was written by Thomas McCabe
(IEEE Transactions on Software Engineering, 12/76 pp 308-320) in which he
describes his graph-theoretic complexity measure.  His approach is that
the more paths there are in a given module, the more complex it is (seems
simple enough).  The formula will produce a complexity value for a module.
His conclusion is that modules with complexity values over 10 will have
a tendency to have more errors (I guess the programmer can't track 
all possible paths).  By breaking the module up into smaller modules, you
reduce the chance for inherent code errors.  Putting it simply, the formula
is as follows:

v = sum(branches, loops, case, conditions,...ANDs, ORs, NOTs) + 1

His studies showed that in several large projects, 23% of the routines
with a value greater than 10 accounted for 53% of the bugs.
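
To make the counting rule concrete, here is a small made-up C routine (my
own, not one of McCabe's examples).  It contains one loop, two IFs and one
AND, so the formula gives v = 4 + 1 = 5:

    /* count the "small" negative values in an array */
    int count_small_negatives(int a[], int n)
    {
        int i, count = 0;

        for (i = 0; i < n; i++)              /* decision 1: loop        */
            if (a[i] < 0 && a[i] > -100)     /* decisions 2, 3: IF, AND */
                count++;
        if (count > n / 2)                   /* decision 4: IF          */
            return -count;
        return count;
    }

That is well under the threshold of 10; a routine that nested several more
loops and conditions inside the first one would not be, and would be a
candidate for subdivision.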

Metrics can provide a lot of insight into software quality.  Unfortunately,
they are not viewed as a means of bringing software development out of
its "primitive state" (said in jest of course, see previous articles in 
this newsgroup :-)

Hope this provides some insight into a relatively unknown topic.

-- 
         Chip ("My grandmother called me Charles once. ONCE!!") Maurer
     Concurrent Computer Corporation, Tinton Falls, NJ 07724 (201)758-7361
        uucp: {mtune|purdue|rutgers|princeton|encore}!petsd!pedsga!chip
                       arpa: pedsga!chip@UXC.CSO.UIUC.EDU

samperi@dasys1.UUCP (Dominick Samperi) (12/17/87)

If McCabe's cyclomatic complexity metric is essentially equal to the number of
decisions in a program module (IF's, AND, OR, etc.) plus 1, I don't see the
advantage in giving it a graph-theoretic interpretation. It is not equal to
the number of paths through a module, and it seems to be little more than the
number of forkings in a control graph (plus one). True, it can be computed
by counting nodes, edges, and components, but it is much easier to simply
count the number of forks and add 1, or to dispense with the control graph
altogether and simply count IF's, AND's, NOT's, etc. In Shooman's Software
Engineering text 'phantom paths' are added to a graph in order to compute
its cyclomatic complexity. I don't see the point here at all (and the formula
used in this text, V(G) = e - n + p, will not work for graphs with p > 1;
one must use V(G) = e - n + 2p).
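
A quick worked example (mine) illustrates both points.  A single if-then-else
has a control graph with n = 4 nodes, e = 4 edges and p = 1 component, so

    V(G) = e - n + 2p = 4 - 4 + 2 = 2,

which is just "number of forks plus one" = 1 + 1.  A "program" made of two
such disconnected graphs (p = 2) has e = 8 and n = 8, so e - n + 2p = 4, the
sum of the two modules' values, whereas e - n + p would give only 3.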

-- 
	Dominick Samperi, Manhattan College, New York, NY
		...!ihnp4!cmcl2!manhat!samperi
		...!ihnp4!cmcl2!phri!dasys1!samperi

cl@dlhpedg.co.uk (Charles Lambert) (12/18/87)

In article <3850002@wdl1.UUCP> rion@wdl1.UUCP (Rion Cassidy) writes:
>
>However, except for what the name implies, I really don't know what
>program complexity metrics are and therefore can't answer.  How about
>a brief overview?  Anybody?
>

As the name implies,  they are techniques to measure the complexity of
a (proposed) programming task; they estimate how laborious and difficult
it will be to implement.  From an analysis of complexity, you can predict
several things (according to the advocates),  for instance:
    (i)   how much the implementation will cost (in terms of effort);
    (ii)  the number of errors you can expect;
    (iii) the number and type of test cases you will need;
    (iv)  how costly the system will be to maintain.

Ideally,  you want to make these predictions as early in the development
as possible;  however,  my studies to date show that the current techniques
address the detailed Module Design best, the System Design less well and
the Functional Specification not at all.

In their excellent survey of software development techniques [1],  Birrell
and Ould describe complexity measurements at the stages of System Design
and System Production (the latter being Module Design and Coding).  For
System Design,  they describe only one measurement that I would truly
call a complexity measurement: the "interconnectivity metric" of Kafura
and Henry [2].   In contrast,  three measurements are described for the
Module Design:  "McCabe's (cyclomatic) number [3]", "Halstead's E [4]" and
"control entropy".   I quote, without permission.

    On interconnectivity:
    "[Kafura and Henry] describe it as follows. 'The first [phase]
     involves generating a set of relations indicating the flow of
     information through input parameters, output parameters, returned-
     values functions and data structures. The second phase generates
     an information-flow structure from these relations.  The third
     phase analyses the information-flow structure to determine the
     derived calls, the local flows and the global flows'.  A special
     notation is used to record the result of the first analysis so
     that automated tools can be used for the second and third phases.
     The analysis can be carried out on any level of design or
     production provided a precise enough description is available..."
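
If memory serves, the figure Kafura and Henry finally compute from those
flows weights each procedure by its length times the square of (fan-in x
fan-out).  A minimal sketch in C, assuming the flow counting (the hard part)
has already been done:

    struct proc {
        int length;     /* e.g. lines of code                        */
        int fan_in;     /* local and global flows into the procedure */
        int fan_out;    /* local and global flows out of it          */
    };

    long info_flow(struct proc *p)     /* length * (fan_in * fan_out)^2 */
    {
        long flows = (long)p->fan_in * p->fan_out;
        return (long)p->length * flows * flows;
    }

Procedures whose value stands far above the rest of the system are the ones
whose interfaces deserve a second look.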

    On cyclomatic numbers:
    "McCabe (1976) introduced the cyclomatic number of the control flow
     graph of a program as a measure of program complexity.  The
     cyclomatic number of a graph counts the number of linearly
     independent cycles in the graph and thus gives the number of basic
     paths through the program.....

    "Alternatively it can be shown that for structured programs the
     cyclomatic number can be obtained by adding one to the number of
     predicates in the program segment...

    "The cyclomatic number has been shown to correlate quite well with
     the number of errors found in the code and the programming time.
     It can also clearly be used as an indicator of the number of test
     cases that should be applied to a module to test all basic paths."
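
To illustrate that last point with an example of my own (not from the book):
a routine with two predicates, say a loop containing one IF, has cyclomatic
number 3, and three test cases cover a set of basic paths:

    int count_negatives(int a[], int n)
    {
        int i, neg = 0;

        for (i = 0; i < n; i++)     /* predicate 1 */
            if (a[i] < 0)           /* predicate 2 */
                neg++;
        return neg;
    }

    /* basic paths:  n = 0;  a = {3} (loop taken, IF false);
                     a = {-3} (loop taken, IF true)            */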

References
----------
[1] Birrell, N.D. and Ould, M.A; A Practical Handbook for Software Development.
    Cambridge University Press. 1985.  ISBN 0 521 25462 0

[2] Kafura, D. and Henry, S; "Software quality metrics based on
    interconnectivity" in the Journal of Systems and Software, 2, 121-131.
    1981.

[3] McCabe, T.J; "A complexity measure" in IEEE Transactions on Software
    Engineering, SE-2, 308-320. 1976.

[4] Halstead, M; Elements of Software Science. Elsevier North-Holland,
    New York. 1977.

UH2@PSUVM.BITNET (Lee Sailer) (12/18/87)

This is an interesting topic.  How would YOU "measure" the complexity
of a module?  How can we "validate" the measures?
     
The <number of paths> thru a module doesn't impress me at all.  A very simple
module constructed from a sequence of big CASE or SWITCH statements will have
many paths, but is easy to understand.
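
For instance (a made-up C fragment), the dispatcher below picks up a decision
point per case, so by the usual counting rules its cyclomatic value comes out
just over McCabe's threshold of 10 (the exact figure depends on how a tool
counts the cases), yet nobody would call it hard to read or hard to test:

    int apply(int op, int a, int b)
    {
        switch (op) {
        case '+': return a + b;
        case '-': return a - b;
        case '*': return a * b;
        case '/': return (b != 0) ? a / b : 0;
        case '<': return a < b;
        case '>': return a > b;
        case '&': return a & b;
        case '|': return a | b;
        case '^': return a ^ b;
        default:  return 0;
        }
    }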
     
How 'bout a measure based on the *hierarchy* of Data Flow Diagrams?  For
example, some modules might have a simple top level, with increasing
complexity as you break down each level.  Another module might have a
complex top level, but the lower levels might be simple.  Which is more
complex overall, the first or the second?
     
My *guess* is that the first is simpler, overall, on the principle that
it hides the complexity.  Yet, it probably has more paths, more tokens, more
ifs, ands, and whiles (fewer gotos?).
     
The main point is that the COMPLEXITY of a module depends to some extent
on how detailed an understanding of the module you want or need.
     
                                                                lee
     

marc@apollo.uucp (Marc Gibian) (12/19/87)

The current crop of complexity measures do NOT improve the code.  There
is an excellent article in a past ACM Communications on this issue.  The
conclusions of the article match the experiences of a major software
organization at Raytheon.  I will take the liberty here to summarize
the ACM article since I forget the issue number.  The basic conclusion of
the article is that no complexity measure currently exists that has been
designed specifically for the purpose for which it is being used by
software developers.  Secondly, current complexity measures do not
validate when tested against these software development goals.

The application of complexity measures at Raytheon, required by almost
every software contract there, is usually accomplished by requiring that
every module (however a module is defined) has a set of complexity values
less than some arbitrary number.  Unfortunately, the process of reducing
the measured complexity of a module most often decreased the quality of
the module, made it harder to maintain, harder to understand, and generally 
did not accomplish the intended goal.  I can provide specific examples if
anyone is interested.

I believe that a measure designed specifically to accomplish software
development goals, and validated against those goals, could be quite
helpful.  In the meantime, I think it is very dangerous to use the
current instruments since they mislead management and customers into
believing they have a good means of measuring some aspects of their
software.

Marc S. Gibian
Software Engineer
Apollo Computer

email:  marc@apollo.UUCP

tom@pedsga.UUCP (12/23/87)

In article <392487e8.fed5@apollo.uucp> marc@apollo.UUCP (Marc Gibian) writes:
>The current crop of complexity measures do NOT improve the code.  There . . .
> ----- summary of recent ACM article deleted -----
>
>The application of complexity measures at Raytheon, required by almost
>every software contract there, is usually accomplished by requiring that
>every module (however a module is defined) has a set of complexity values
>less than some arbitrary number.  Unfortunately, the process of reducing
>the measured complexity of a module most often decreased the quality of
>the module, made it harder to maintain, harder to understand, and generally 
>did not accomplish the intended goal.  I can provide specific examples if
>anyone is interested.

Marc states (very well) an important point, but I, for one, would like to
see some examples.  The phrase "the process of reducing the measured
complexity" worries me a bit.  The article Marc posted led me to beleive
that Raytheon has used the complexity formula to "reverse engineer" their
standard for software development.  Just because their particular standard
resulted in "decreased quality", does not mean there isn't a standard that
results in a high level of quality PLUS a favorable complexity value.
I have used the term quality to refer to the original article:
maintainability, readability.  It still makes sense that lower complexity
values would improve this quality.  I'm not sure that Raytheon's bad
experience disproves this.  It certainly bears further investigation.

pase@ogcvax.UUCP (Douglas M. Pase) (12/25/87)

In article <PSUVM.28001UH2> UH2@PSUVM.BITNET (Lee Sailer) writes:
>The <number of paths> thru a module doesn't impress me at all.  A very simple
>module constructed from a sequence of big CASE or SWITCH statements will have
>many paths, but is easy to understand.
>How 'bout a measure based on the *hierarchy* of Data Flow Diagrams?

As I see it there are two problems.  The first is the complexity of the
paths through a module.  A number of metrics have been proposed elsewhere
and mentioned in this forum (cycle count, independent path count, etc.).
The second problem is the complexity of the conditions each path places
on the data that passes through, a la Floyd/Hoare program verification.
For example, simple arithmetic expressions replace old conditions with
new.  An example might be: x > 5 --> 'x := x + 1' --> x > 6.  Other types
of expressions may replace old conditions with new, or add new restrictions.
'if' expressions are bad news, 'while' and recursive expressions can be worse.
But, not all 'if', 'while', or recursive expressions are equally bad, and it's
not just the relative location of each in a program.
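
A small made-up C routine shows the sort of bookkeeping I mean; the comments
carry the conditions along the way a Floyd/Hoare proof would:

    int f(int x)          /* assume on entry:  x > 5                    */
    {
        x = x + 1;        /* now:  x > 6                                */
        if (x % 2 == 0) { /* path A:  x > 6 and x even                  */
            x = x / 2;    /*          so here x > 3                     */
        } else {          /* path B:  x > 6 and x odd, i.e. x >= 7      */
            x = x + 1;    /*          so here x > 7 (and x even)        */
        }
        return x;         /* at the join only the weaker fact survives:
                             x > 3                                      */
    }

Each extra branch adds path conditions that must be carried along and then
merged, and how hard that is depends on the conditions themselves, not just
on how many paths there are.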

Lest anyone think I'm saying something profound, all I have said is that if
one wishes to measure the complexity of a program, one must consider the program
itself (or, equivalently, its transformations of the data it modifies) and not just
the flow of data through it.
--
Doug Pase   --   ...ucbvax!tektronix!ogcvax!pase  or  pase@cse.ogc.edu.csnet