[comp.software-eng] Metrics Example

hlavaty@CRVAX.Sri.Com (05/22/91)

I guess the best way to illustrate my position on metrics (previously discussed
in several of the Re:bridge buildings...) is to use an example.  Kers. Caravan:
sorry if my original response to your post was "flamelike" - it was not 
intended to be.  From your response, I think you missed my point entirely.  
Anyway...

Several years ago, my job as a customer representative was to ensure that the
software developments currently underway were progressing smoothly (i.e. the
customer's money was being spent correctly).  By progressing smoothly I mean
the advertised schedules were indeed accurate, and no "short-term" gains were
being taken in place of "long-term" overall health (such as skimping on unit
testing to keep to the schedule, for now).  This was a cost-plus contract
(which means the customer pays for overruns), so the customer was very
interested in looking over the developer's shoulder (bitter experience in the
past had taught them this hard lesson). 

I decided that we would have a monthly status meeting where I would meet with
the managers of the different software projects and talk with each of them for
about an hour on how their efforts were going.  At the first meeting, all of the
discussions were what I call qualitative (and yes, I have confused qualitative
with subjective.  I welcome discussion on how they're different).  Each manager
showed his schedule chart (the schedule that they themselves managed to - not
one in my format; I wanted the *real* story), and proceeded to discuss why their
development was progressing just fine.  The only thing I had to go on to agree
with them was their qualitative opinion (and since it wasn't necessarily my
opinion, I considered it subjective - meaning it was their opinion and I didn't
see any evidence other than their experience/feelings/motivations that led me
to believe this to be true).  Now, after this meeting I had to go to the 
customer and tell them what I thought of the development efforts.  I quickly
realized I had a problem, since I was very uncomfortable telling them what I
thought; it was really only what the software managers thought (and the customer
already knew that!).  What to do...What to do...

I seized on metrics as the answer to my problems.  If I could get the managers
to show me various metric data, then *I* could interpret it my own way and make
my opinions!  This is what I was referring to when I previously posted about a
"common reference point".  Before, since I was not part of the software 
development, I had no common reference point.  I had to accept what the managers
were telling me at face value (and my customer's experience has shown this to be
dangerous in a cost-plus contract).  Once we were using metrics, at least we
could have some numbers to discuss.  So, here is the set of metrics I got them
to start presenting:

	1) A graph of unit test cases complete over time.  Both projected and
	   completed test cases were plotted.
	2) The same graph for integration test cases
	3) The number of errors detected, fixed, and still open each month
	4) The location of these errors
	5) SLOC coded, integrated, and projected each month
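
For concreteness, here is a minimal sketch (Python, with numbers I have
invented purely for illustration - they are not from the actual contract) of
the kind of monthly bookkeeping behind items 1 and 3:

    # Hypothetical monthly records -- all numbers invented for illustration only.
    unit_tests_projected = {"Jan": 40, "Feb": 90, "Mar": 150, "Apr": 220}
    unit_tests_completed = {"Jan": 35, "Feb": 70, "Mar": 100, "Apr": 120}
    errors = {"Jan": {"detected": 12, "fixed": 9},
              "Feb": {"detected": 30, "fixed": 18},
              "Mar": {"detected": 25, "fixed": 20}}

    # Metric 1: completed vs. projected unit test cases, month by month.
    for month, projected in unit_tests_projected.items():
        completed = unit_tests_completed.get(month, 0)
        print(f"{month}: {completed} of {projected} projected unit tests complete")

    # Metric 3: errors detected, fixed, and still open (open count is cumulative).
    open_errors = 0
    for month, e in errors.items():
        open_errors += e["detected"] - e["fixed"]
        print(f"{month}: {e['detected']} detected, {e['fixed']} fixed, {open_errors} still open")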

Why did I pick these?  I acquired information from the industry that talked 
about what these metrics had meant on OTHER programs done by other companies.
I therefore had other data to compare the contractor's results to, and 
determine for myself if it compared favorably to what the contractors were
saying (things were on schedule and fine).  None of these metrics were in use
by the software managers at the time.  Well, several interesting things
happened...

After several months, the unit/integration test case graphs clearly showed that
the existing schedule for one of the developments was not accurate.  The rate
at which the development was accomplishing these was not matching the projected
rate.  Could this problem have been caught without this metric, say by an
experienced manager following his own qualitative opinion?  Yes, it could have.
But the manager in question *didn't* catch it.  He didn't realize his problem
until confronted with the question "Since your demonstrated rate of testing is
not matching your original projection, how are you going to finish on time?"
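
The arithmetic behind that question is simple enough to sketch (again in
Python, with invented figures; the real numbers are long gone):

    # Hypothetical figures -- invented for illustration, not the actual project data.
    total_cases = 400                 # unit test cases planned
    completed = 120                   # completed so far
    months_elapsed = 4
    months_left_on_schedule = 4

    demonstrated_rate = completed / months_elapsed                      # cases/month
    needed_rate = (total_cases - completed) / months_left_on_schedule   # cases/month
    months_needed = (total_cases - completed) / demonstrated_rate

    print(f"Demonstrated rate: {demonstrated_rate:.0f}/month; "
          f"rate needed to hold the schedule: {needed_rate:.0f}/month")
    print(f"At the demonstrated rate, about {months_needed:.1f} more months are "
          f"needed, not {months_left_on_schedule}.")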

A big advantage gained using metrics is that it gave me a window into the 
development process through which I could look to determine if things
were in control.  With the metric charts, now my questions would be
"Gee, you had a large error spike last month.  What caused that?" or "You
didn't get your normal amount of unit testing in last month.  Why was that?"
Suddenly we were discussing *real* issues.  The answers to these questions
invariably led to the identification of specific problems the manager was
having (not enough computers, someone quit unexpectedly, and the occasional
"that's a problem we don't know how to solve just yet").  Now I had what I
needed - a window into the process that communicated to me enough information
to ask the right questions.

After a while, the chief software manager at the company and the program
manager started using these charts themselves for their own purposes.  They
realized that they couldn't afford to miss this information!

Metrics give you a way to keep an eye on a development along with the
other million things your job entails.  Once you have found a metric that
indicates future success, by measuring it you are encouraging the development
to conform to practices that have proven successful in the past.  You also
have a common framework from which to ask questions and also compare your
development to others that have also used the metric.

I hope this illustrates my point more concretely.  

Jim Hlavaty

kers@hplb.hpl.hp.com (Chris Dollin) (05/23/91)

hlavaty@CRVAX.Sri.Com says:

   ...  Kers. Caravan: sorry if my original response to your post was 
   "flamelike" - it was not intended to be.  From your repsonse, I think you
   missed my point entirely.  

Mine was not the original post - you may have been misled by my comment about
``peculiar''. 

And I don't think I missed your point. As I stated, my response was not 
intended to criticise the use of *metrics* - it was to point out (what I
regarded as) weak points in your *case*.

Unless you can address these issues, you are vulnerable to attack from the
anti-metrics brigade. Anecdotes are not good enough as evidence.

   ... (and yes, I have confused qualitative
   with subjective.  I welcome discussion on how they're different).

Quantitative ~= with numbers. Qualitative ~= without numbers.
Subjective ~= private, local, person-dependent. Objective ~= public, ``real''.
[I'm *not* attempting to give a precise definition here, and I *know*
subjective doesn't equate to unreal. I'm attempting to compare and contrast.]

``I have a strength 5 headache'' is quantitative and subjective.

``My briefcase is heavier than my mug'' is qualitative and objective.

``McKillip writes better fantasy than Anthony'' is qualitative and subjective
[*1].

``This window is 80 characters wide in font hp8.6x13'' is quantitative and
objective.

    I seized on metrics as the answer to my problems.  If I could get the
    managers to show me various metric data, then *I* could interpret it my 
    own way and make my opinions!  

But why should your opinions be any better than theirs?

   A big advantage gained using metrics is that it gave me a window into the 
   development process through which I could look to determine if things
   were in control.  With the metric charts, now my questions would be
   "Gee, you had a large error spike last month.  What caused that?" or "You
   didn't get your normal amount of unit testing in last month.  Why was that?"
   Suddenly we were discussing *real* issues.  

The only justification you have presented for the act of faith that a ``large
error spike'' is a ``real'' issue is industrial studies data [omitted above].
Go for it! *That* is the thing that makes metrics important - *data exists
that show that they work*. It's not because they're ``quantitative'' (but being
``objective'' helps).

   The answers to these questions
   invariably led to the identification of specific problems the manager was
   having (not enough computers, someone quit unexpectedly, and the occasional
   "that's a problem we don't know how to solve just yet").  Now I had what I
   needed - a window into the process that communicated to me enough 
   information to ask the right questions.

Great. Again, it's not the *numbers*, it's what they *mean*, and the existence
of a communications process.

   After a while, the chief software manager at the company and the program
   manager started using these charts themselves for their own purposes.  They
   realized that they couldn't afford to miss this information!

And that is data worth having.

   I hope this illustrates my point more concretely.  

And I hope my remarks help you to present your case better when you get the next
bunch of people to convince.

Numbers aren't real, they're complex - use with caution; don't place the
numbers above their meaning.

[*1] Some people don't believe the statement, some people don't think it's
subjective, but I'd be surprised if anyone thought it was quantitative.
--

Regards, Kers.      | "You're better off  not dreaming of  the things to come;
Caravan:            | Dreams  are always ending  far too soon."

jls@netcom.COM (Jim Showalter) (05/24/91)

>>    I seized on metrics as the answer to my problems.  If I could get the
>>    managers to show me various metric data, then *I* could interpret it my 
>>    own way and make my opinions!  

>But why should your opinions be any better than theirs?

His weren't necessarily better. However, what he determined
was that the managers didn't have any data to back up their own opinions.
Thus, at the outset of the exercise, prior to the introduction of any
metrics, all opinions were of equivalent validity/nonvalidity. It was
the introduction of the metrics--and ONLY that--that provided a framework
in which opinions could be tested against real measures. Until the metrics
were introduced, "Yeah, we're on schedule" was it as far as insight into
the project was concerned. This went away very quickly when it was possible
to say "Well, you say we're on schedule, but according to this chart we're
not".
-- 
**************** JIM SHOWALTER, jls@netcom.com, (408) 243-0630 ****************
*Proven solutions to software problems. Consulting and training on all aspects*
*of software development. Management/process/methodology. Architecture/design/*
*reuse. Quality/productivity. Risk reduction. EFFECTIVE OO usage. Ada/C++.    *

kers@hplb.hpl.hp.com (Chris Dollin) (05/24/91)

Jim Showalter makes a remark on one of my responses to ... duh, lost their 
name:

[a. n. other]
   >>    I seized on metrics as the answer to my problems.  If I could get the
   >>    managers to show me various metric data, then *I* could interpret it
   >>    my own way and make my opinions!  

[me]
   >But why should your opinions be any better than theirs?

[Jim]
   His weren't necessarily better. However, what he determined
   was that the managers didn't have any data to back up their own opinions.
   Thus, at the outset of the exercise, prior to the introduction of any
   metrics, all opinions were of equivalent validity/nonvalidity. It was
   the introduction of the metrics--and ONLY that--that provided a framework
   in which opinions could be tested against real measures.

The point that worries me is this assertion that the metrics are ``real
measures''. [I'm happy to believe that they are; it's just that I think they
need justification. As earlier posts of mine have said, just because they're
*numbers* doesn't make them *meaningful*.]

Suppose I proposed the following metrics:

* the number of cups (or cans) of beverage consumed per developer per day.

* number of characters of output generated (on paper, that is) per day.

* hours spent on aerobics

* numbers of relevant papers photocopied per week

* average number of windows present on the screen at once

Are these ``real measures''? A priori, they seem as real as (say) defect
density, or uncommented-lines-of-source-code, or
compiles-needed-before-no-syntax-errors.

You may question the relevance of aerobics [*1]. I might argue ``a fit mind in
a fit body''. In any case, one of the points of the original poster was that
you could shotgun the team with a variety of metrics and keep those that
``worked'' (whatever that means).

If we don't understand why they work (for example, suppose that aerobic hours
turned out to be a good predictor for project success), then we should say so.
If their effectiveness has been determined by purely empirical means (i.e., we
have no underlying theory), we should say so. What we should *not* do is call
metrics ``real measures'' without pointing to a justification.

Let me try and make my point clear. I want metrics to be a Good Thing; I want
to be able to make estimates, plan with them, and measure to see if the plan is
being kept to. [I'm not too worried about using numbers to do it, although
that's traditional; it's easy to attribute to numbers precision, accuracy, and
meaning that they do not in fact possess.] But I think we must be clear as to
*why* we think metrics are ``real'' and *why* we think they work; and we must
not allow our enthusiasm to blind us to perfectly sensible questions.

What metrics do we have on the effectiveness of posting?

[*1] Indeed, many do, even outside this context.


--

Regards, Kers.      | "You're better off  not dreaming of  the things to come;
Caravan:            | Dreams  are always ending  far too soon."

jls@netcom.COM (Jim Showalter) (05/25/91)

>The point that worries me is this assertion that the metrics are ``real
>measures''. [I'm happy to believe that they are; it's just that I think they
>need justification. As earlier posts of mine have said, just because they're
>*numbers* doesn't make them *meaningful*.]

We are veering off into semantics and the philosophy of causality. Define
what you mean by "meaningful". In a previous post I pointed out that if
rutabaga consumption per developer could be shown to be a good predictor
of project success, then it was a valid metric with which to make such
predictions. The counterargument is absurd: "Yeah, there's a 100% correlation,
but we won't use that metric anyway because it's SILLY."

>Are these ``real measures''? A priori, they seem as real as (say) defect
>density, or uncommented-lines-of-source-code, or
>compiles-needed-before-no-syntax-errors.

If they can be shown to have a correlation with project success, then, yes,
they ARE real measures. What else would you call them?

>In any case, one of the points of the original poster was that
>you could shotgun the team with a variety of metrics and keep those that
>``worked'' (whatever that means).

What "worked" means is that you can demonstrate from historical data
that Metric #7 is an accurate predictor of project success.
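
As a sketch only -- the metric and the numbers below are invented --
"demonstrate from historical data" can be as plain as checking how well a
candidate metric tracked the outcome on past projects:

    # Invented historical data: one candidate metric value per past project,
    # paired with an outcome measure (say, percent schedule overrun).
    metric_values   = [0.8, 1.1, 2.3, 0.5, 1.9, 3.0]   # e.g. open-error density
    overrun_percent = [5.0, 10.0, 35.0, 2.0, 28.0, 50.0]

    def pearson(xs, ys):
        """Plain Pearson correlation coefficient."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov  = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        varx = sum((x - mx) ** 2 for x in xs)
        vary = sum((y - my) ** 2 for y in ys)
        return cov / (varx * vary) ** 0.5

    r = pearson(metric_values, overrun_percent)
    print(f"Correlation between candidate metric and overrun: {r:.2f}")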

>If we don't understand why they work (for example, suppose that aerobic hours
>turned out to be a good predictor for project success), then we should say so.
>If their effectiveness has been determined by purely empirical means (i.e., we
>have no underlying theory), we should say so. What we should *not* do is call
>metrics ``real measures'' without pointing to a justification.

Smoking is a great predictive metric for lung cancer, even though the precise
mechanisms for the entire chain of events leading to the cancer have yet to
be fully elucidated. Would you reject smoking as an indicator of lung cancer
because you haven't yet proven causality? This is the tack taken by the
tobacco lobby, but who wants to be on their team? Lack of dietary fiber is
a good predictor of colon cancer, and yet researchers are much further from
having a precise explanation of the mechanisms whereby eating nothing but
Ho-hos ruins one's bowels. Would you then decide to eat nothing but Ho-hos,
just because the precise mechanism is not yet explained?

>But I think we must be clear as to
>*why* we think metrics are ``real'' and *why* we think they work;

We think they're real because projects in trouble tend not to have any
metrics, and projects on schedule tend to use metrics. As for WHY we
think they work, this is just the causality issue again--I'm personally
not particularly concerned with why the metrics work: I'm concerned
with them working, and that's about it. 

Let me make my position clear: I'd practice VOODOO if someone could
provide me with evidence that projects using it are more successful
than projects not using it. And I would do this without spending
ten seconds trying to figure out WHY it worked.
-- 
**************** JIM SHOWALTER, jls@netcom.com, (408) 243-0630 ****************
*Proven solutions to software problems. Consulting and training on all aspects*
*of software development. Management/process/methodology. Architecture/design/*
*reuse. Quality/productivity. Risk reduction. EFFECTIVE OO usage. Ada/C++.    *

donm@margot.Eng.Sun.COM (Don Miller) (05/25/91)

In article <KERS.91May24091125@cdollin.hpl.hp.com> kers@hplb.hpl.hp.com (Chris Dollin) writes (in jest, to make a point):
>
>Suppose I proposed the following metrics:
>
>* the number of cups (or cans) of beverage consumed per developer per day.
>
>* number of characters of output generated (on paper, that is) per day.
>
>* hours spent on aerobics
>
>* numbers of relevant papers photocopied per week
>
>* average number of windows present on the screen at once

   This reminds me of a funny example of the misapplication of
   a metric.  In the movie, "Monty Python and the Holy Grail", 
   townspeople attempt to apply metrics to determine if a woman
   is a witch:

   1. Witches are made of wood.

   2. Wood floats.

   3. Ducks float.

   4. Therefore, if she weighs as much as a duck, she must be
      a witch.

   The townspeople proceed to weigh the woman on an elaborate
   scale counterweighted with a duck.  Of course, this scale
   indicates that her weight and the duck's are indeed the same.
   Thus, she is unarguably a witch.

   I guess this means that regardless of how appropriate a
   metric appears to be, someone needs to be able to tell
   when it's misapplied or faulty.  It also means that if
   you have the power and the will, metrics can be powerful
   tools for manipulation - especially in an unenlightened 
   environment.

   However, in our enlightened environments the above scenario
   would never happen.  We all know that witches are made of
   stone. (don't we? :-)
   
--
Don Miller                              |   #include <std.disclaimer>
Software Quality Engineering            |   #define flame_retardant \
Sun Microsystems, Inc.                  |   "I know you are but what am I?"
donm@eng.sun.com                        |   

orville@weyrich.UUCP (Orville R. Weyrich) (05/25/91)

In article <1991May24.201741.14138@netcom.COM> jls@netcom.COM (Jim Showalter) writes:
>>The point that worries me is this assertion that the metrics are ``real
>>measures''. [I'm happy to believe that they are; it's just that I think they
>>need justification. As earlier posts of mine have said, just because they're
>>*numbers* doesn't make them *meaningful*.]
>
>We are veering off into semantics and the philosophy of causality. Define
>what you mean by "meaningful". In a previous post I pointed out that if
>rutabaga consumption per developer could be shown to be a good predictor
>of project success, then it was a valid metric with which to make such
>predictions. The counterargument is absurd: "Yeah, there's a 100% correlation,
>but we won't use that metric anyway because it's SILLY.".

>If they can be shown to have a correlation with project success, then, yes,
>they ARE real measures. What else would you call them?

Yes, if there is a demonstrated correlation, then they can be used as a 
predictor of success. The danger is that the conclusion might be drawn that
if the project seems to be slipping, the management should take steps to 
increase the rutabaga consumption. This would only be valid if there were
a demonstration of cause and effect.
 
Observation of a cause/effect relationship implies a correlation; 
the observation of a correlation does not necessarily imply a cause/effect
relationship.
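
A toy simulation (entirely invented, just to make the point) shows how this
goes wrong: let a hidden factor drive both the metric and the outcome, then
"intervene" on the metric alone and watch nothing improve:

    import random

    random.seed(0)

    # A hidden factor (say, team discipline) drives BOTH the visible metric
    # (rutabaga consumption, to keep the running joke) and project success.
    projects = []
    for _ in range(100):
        discipline = random.random()
        rutabagas = 5 * discipline + random.gauss(0, 0.5)
        success   = 10 * discipline + random.gauss(0, 1.0)
        projects.append((rutabagas, success))

    # The two correlate strongly, yet forcing the metric up (handing out
    # rutabagas) leaves discipline -- and therefore success -- unchanged.
    forced = [(r + 3, s) for r, s in projects]
    before = sum(s for _, s in projects) / len(projects)
    after  = sum(s for _, s in forced) / len(forced)
    print(f"Average success before intervention: {before:.2f}")
    print(f"Average success after forcing the metric up: {after:.2f}")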

>We think they're real because projects in trouble tend not to have any
>metrics, and projects on schedule tend to use metrics. As for WHY we
>think they work, this is just the causality issue again--I'm personally
>not particularly concerned with why the metrics work: I'm concerned
>with them working, and that's about it. 

Just to be the devil's advocate, can we assume that:

1) Projects in which the managers feel that their subordinates are not too 
bright, but are doing the best that they can, and cannot be induced to improve,
may be in trouble.

2) Projects with such managers do not use metrics, because they feel that
it is no use -- you can't induce the programmers to improve.

In this situation, it seems to me that the project will be in trouble, and will
not be using metrics. Inducing the manager to use metrics will not solve the
problem. Inducing the manager to change his/her attitude will.

>Let me make my position clear: I'd practice VOODOO if someone could
>provide me with evidence that projects using it are more successful
>than projects not using it. And I would do this without spending
>ten seconds trying to figure out WHY it worked.

Be careful here -- it may be that programmers who enjoy rutabagas come from
a part of the world where VOODOO is prevalent, and that rutabagas contain
some particular nutrient which stimulates programmers to do a particularly
good job.  If you simply introduce VOODOO into your failing project, you 
are not likely to get the desired effect, and will have wasted time and
effort that would have been better spent encouraging your project team to eat
more rutabagas.

This is why the FDA is so down on "quack" medicines.

I would insist on evidence that using VOODOO *causes* projects to be more
successful before committing to using VOODOO [unless you have a boss yelling
"I don't care what you do, just do SOMETHING so that I can tell the Board
of Directors that the problem has been taken care of." :-)].

I agree that it is not so important to determine how the cause induces the
effect. But even here, it is useful to ponder. If VOODOO is demonstrated to 
cause better projects, do you need to have the whole shebang, or is it
sufficient to toss a dead chicken at the programmers periodically? 
It certainly is cheaper to pick up a few dead chickens from beside the road
than it is to import a witch-doctor [you think that MD's are overpaid? You
should check out what even a mediocre witch-doctor gets paid! :-)].


--------------------------------------           ******************************
Orville R. Weyrich, Jr., Ph.D.                   Certified Systems Professional
Internet: orville%weyrich@uunet.uu.net             Weyrich Computer Consulting
Voice:    (602) 391-0821                         POB 5782, Scottsdale, AZ 85261
Fax:      (602) 391-0023                              (Yes! I'm available)
--------------------------------------           ******************************

adam@visix.com (05/26/91)

In article <1991May24.201741.14138@netcom.COM>, jls@netcom.COM (Jim Showalter) writes:

>We think they're real because projects in trouble tend not to have any
>metrics, and projects on schedule tend to use metrics. As for WHY we
>think they work, this is just the causality issue again--I'm personally
>not particularly concerned with why the metrics work: I'm concerned
>with them working, and that's about it. 

This argument in favor of metrics shows precisely what is wrong with
non-causal correlations.

This is a familiar problem in social science.  For example, students
at private schools generally perform better.  Students in more wealthy
neighborhoods do better.  Asian students do better.  Therefore, to
improve your child's performance, send them to private school, or move
into a wealthy neighborhood, or make them Asian.

This is faulty reasoning.  The greatest cause of good performance in
school is parents who take an interest in their child's education.  If
you care enough to spend extra money or move to a better neighborhood
for the sake of your child, then your child has the advantage of
supportive parents.  The actual act of spending money or moving is
irrelevant next to the will to do so.  If you send your child to
private school in order to get rid of him, then he's in trouble.

Let me restate the analogy.  When a project manager cares enough
about his project to try metrics for improving quality, then
regardless of the metrics he chooses, the project has an advantage
(namely, the manager himself).  When a project manager is sick to
death of his project and wants to use metrics to save himself some
work, then the project is likely to fail.

When someone cares about you and your work, you will do better.

When someone cares only about the number of lines of code you write,
you will do worse.


Adam

ftower@ncar.ucar.EDU (Francis Tower) (05/27/91)

Supportive managerial involvement is vital, but as a project becomes really huge
the managerial span of control places some managers out of direct contact with
the worker bees.  Then metrics appear because they give a quantitative (if not
always accurate) feel for the project.

The key point in my experience is that upper management will almost always
demand to see metrics.  Since humans have a tendency to avoid looking bad on
these metrics, programming professionals will'Game' their approach to the
metrics.  The best move my firm made was when the new CEO decided  that each
division would create a list of what was really important to measure based on
their unique mission and their customers.  The parent division approved the
division lists, the divisions were expected to implement their chosen metrics,
BUT! the divisions were not to report any numbers to the parent or to other
divisions.  

Programmers still 'Gamed' the metrics but the gaming was in a direction which
promoted better software design and development.  

Such changes were of course fought, especially by management which didn't like
change or the sense of losing touch.  How could they compare one division
against another, was their lament.  The Boss's answer was: you can't.




Symptomatic Relief?  Our chief scientist kept cranking parameters until a
given piece of code finally gave reasonable results.  Later, the error in the
code was found, which negated the heroic fiddling.