[comp.software-eng] Personal growth and software engineering!

sean@castle.ed.ac.uk (S Matthews) (03/19/91)

How to become a better human being and develop a more rounded (or
even perfectly spherical) personality through programming.

gwharvey@lescsse.uucp (Greg Harvey) writes:

> These statements form the rhetorical foundation for a much needed
> "quality revolution." Software QA is just one area where these belief
> statements can be applied effectively.  Zero defects is a difficult
> concept to grasp because we humans lack perfection and even have
> difficulty visualizing perfection.  Zero defects is an accountability
> method where we introspectively examine defects in order to determine
> how each occurred.  If done honestly and carefully, the person comes to
> the realization that faulty results follow faulty methods.  An honest
> person realizes, at the exact same instant, that faulty methods are not
> character flaws, but instead are opportunities for improvement!

> Simple accountability, which most people avoid (myself included!), helps
> us do our best in every life situation.  Admittedly, being responsible
> or accountable for our actions can make life uncomfortable.  Software QA
> strives to recreate the desire for "directed perfection" in the software
> creator.  It creates this desire by measuring the effectiveness, as best
> it is able, of the creators and discussing the results with them.  Its
> focus should be enhancing the creative powers of the individual through
> incremental improvement of method.

Somehow I do not think that feeling good and a `can do' attitude are enough.

Is there anyone out there who wants to suggest half an hour of
transcendental meditation before tackling the morning's coding?

Sean

Now back to my prayer mat...

Sorry about the extended quotation, but I was not sure how to edit it
down without destroying the flavour---indeed the flavour is in the size.

klb@unislc.uucp (Keith L. Breinholt) (03/26/91)

Sean> How to become a better human being and develop a more rounded (or
Sean> even perfectly spherical) personality through programming.

Greg> Simple accountability, which most people avoid (myself included!), helps
Greg> us do our best in every life situation.  Admittedly, being responsible
Greg> or accountable for our actions can make life uncomfortable.  Software QA
Greg> strives to recreate the desire for "directed perfection" in the software
Greg> creator.  It creates this desire by measuring the effectiveness, as best
Greg> it is able, of the creators and discussing the results with them.  Its
Greg> focus should be enhancing the creative powers of the individual through
Greg> incremental improvement of method.

Sean> Somehow I do not think that feeling good and a `can do' attitude
Sean> are enough.
Sean>
Sean> Is there anyone out there who wants to suggest half an hour of
Sean> transcendental meditation before tackling the morning's coding.

I don't think Greg was talking about a 'feeling good' or 'can do'
attitude to improve your programming.  If you reread the message
you'll find that he's talking about measured improvement.

Here's a simple fact of life--Measurement of a process or skill is the
first step towards control of the same.

Now, serious professionals are the type who like to improve their
skills through whatever means necessary.  If I have a method of
measuring when I'm doing better versus when I screw up, I've just
learned how to improve my skills.

Now if a supposed professional comes to me and tells me that they
don't want to be measured (i.e. they don't want to improve), I have
real doubts about the future of that individual.

Keith L. Breinholt
Unisys, Unix Systems Group
-- 
___________________________________________________________________________

Keith L. Breinholt		hellgate.utah.edu!uplherc!unislc!klb or
Unisys, Unix Systems Group	kbreinho@peruvian.utah.edu

duncan@ctt.bellcore.com (Scott Duncan) (04/03/91)

In article <1991Mar25.164133.29674@unislc.uucp> klb@unislc.UUCP (Keith L. Breinholt,B2G08) writes:
>
>Here's a simple fact of life--Measurement of a process or skill is the
>first step towards control of the same.

I disagree.  Recognition and acceptance that the process or skill needs to be
brought under control is the first step.  There is the "if it ain't broke
don't fix it" feeling of many software practitioners to overcome first.  In
some cases, measurements can be used to point out that something really is
"broken," from the perspective of clients/customers, even if the developers
don't see anything wrong.  However, these are usually product measures rather
than ones that address the process.  And product measures generate most of
the heat in discussion about metrics, i.e., people get stuck in lines of code
and field defects and "productivity" goals, etc.

>Now, serious professionals are the type that like to improve their
>skills through whatever means is necessary.  If I have a method of
>measuring when I'm doing better versus when I screw up, I've just
>learned how to improve my skills.

That's you measuring you, though.  These are not the kinds of measurement efforts
often undertaken in large organizations.  Organizational goals and measures of
quality and productivity often end up at odds with individual ones.  The former
are more formal, visible, and based on quantifiable measures while the latter
are more likely to be informal, personally derived/tracked, and dependent upon
line management and peer feedback.  Both of these have validity to those who
use them, but most metrics "programs" are of the organizational variety, not
the local work group or individual improvement kind.

>Now if a supposed professional comes to me and tells me that they
>don't want to be measured (i.e. they don't want to improve).  I have
>real doubts about the future of that individual.

And I think it is not a matter of not wanting "to improve," but of not being
introduced to measurement with the sense that it is being offered for improve-
ment.  Most people get introduced to it in a way that suggests it is there for
evaluation purposes.  Or it is not clear why it is there, but the management
has heard/believes that "you can't manage what you can't measure" or some such
statement without thinking that this may apply to process and product but not
personnel.

In any event, there is a great difference between having folks come to you and
say
	"We notice your quality/productivity measures for the past
	quarter have been lower than hoped for, what are you going
	to do about it."

as opposed to saying

	"We notice your quality/productivity measures for the past 
        quarter have been lower than hoped for, what we do to help
        you improve."

Most metrics programs seem to start with the former message implied (largely
because the latter is not made explicit).

>Keith L. Breinholt		hellgate.utah.edu!uplherc!unislc!klb or
>Unisys, Unix Systems Group	kbreinho@peruvian.utah.edu

Speaking only for myself, of course, I am...
Scott P. Duncan (duncan@ctt.bellcore.com OR ...!bellcore!ctt!duncan)
                (Bellcore, 444 Hoes Lane  RRC 1H-210, Piscataway, NJ  08854)
                (908-699-3910 (w)   609-737-2945 (h))

jgautier@vangogh.ads.com (Jorge Gautier) (04/04/91)

In article <1991Mar25.164133.29674@unislc.uucp> klb@unislc.uucp (Keith L. Breinholt) writes:
>   Here's a simple fact of life--Measurement of a process or skill is the
>   first step towards control of the same.

No.  Realization that the process or skill needs to be controlled is
the first step towards control of the same.

>   Now, serious professionals are the type that like to improve their
>   skills through whatever means is necessary.  If I have a method of
>   measuring when I'm doing better versus when I screw up, I've just
>   learned how to improve my skills.

IF you have a method of measuring the quality of the process (and I
think that's assuming a lot), you still have to be willing to CHANGE
the process in order to improve it.  Changing the way they do things
is a very scary thought to many people.  Don't ask me why, I've never
understood it.

>   Now if a supposed professional comes to me and tells me that they
>   don't want to be measured (i.e. they don't want to improve).  I have
>   real doubts about the future of that individual.

I don't mean to sound negative about this, but your i.e. is bullshit.
The people who don't want to change are the ones who don't want to
improve.  Measurement has nothing to do with it.  Some people don't
want to be measured because they know that the metrics being used are
bogus.  If a supposed manager comes to me and tells me that they want
to "measure the software development process," I have real doubts
about the past and future of that individual.
--
Jorge A. Gautier| "The enemy is at the gate.  And the enemy is the human mind
jgautier@ads.com|  itself--or lack of it--on this planet."  -General Boy
DISCLAIMER: All statements in this message are false.

mcgregor@hemlock.Atherton.COM (Scott McGregor) (04/04/91)

In article <JGAUTIER.91Apr3131954@vangogh.ads.com>,
jgautier@vangogh.ads.com (Jorge Gautier) writes:

> Changing the way they do things
> is a very scary thought to many people.  Don't ask me why, I've never
> understood it.

I think there are two very good psychological bases for this fear.  First,
experience conveys advantage. We see that in "the learning curve" or
"experience curve" effect; those who have done it the most times can often
do it again (the same way) fastest.  When you ask someone experienced
to change, you offer them the opportunity to start over BEHIND the people they
used to be in front of.  In organizations with a meritocracy (e.g.
ranking between individuals leads to commensurate benefits) this means a
short-term loss in ranking and commensurate benefits.  The long term might
be different, but the short-term effect is enough to introduce hesitancy and
a desire to "time" the transition to a convenient moment, hence resistance to
change.

The second reason is that whenever there is change, there is the possibility
that not only will you lose experience, but quite possibly competency.  For
example, I am partially red-green color blind.  If a change were made
to my job that required considerable facility in color recognition, I might
no longer be qualified.  That's a clear physical limitation.  But there are
also mental limitations.  I have some facility with mathematics, but there
are still areas of mathematics which despite considerable effort on my part
have overwhelmed me.  Again, a change that required that sort of mathematical
facility could undermine my success, or even competency for my job. Sure,
I could probably find another one in a different area, but I have a lot
of psychological investment in what I have enjoyed doing well in the past.

People often comment on how "young people" are so much more open to change
than their elders.  The above two factors go a long way to explain why.
When you are at the bottom of the experience curve, starting over on a
different one seems to matter relatively little.  Moreover, with so
little background invested in one area, going into a different area
costs less psychically.

Scott McGregor
Atherton Technology

lfd@cbnewsm.att.com (Lee Derbenwick) (04/05/91)

In article <JGAUTIER.91Apr3131954@vangogh.ads.com>,
jgautier@vangogh.ads.com (Jorge Gautier) writes:
> In article <1991Mar25.164133.29674@unislc.uucp> klb@unislc.uucp (Keith L. Breinholt) writes:
[ ... ]
> >   Now if a supposed professional comes to me and tells me that they
> >   don't want to be measured (i.e. they don't want to improve).  I have
> >   real doubts about the future of that individual.
> 
> I don't mean to sound negative about this, but your i.e. is bullshit.
> The people who don't want to change are the ones who don't want to
> improve.  Measurement has nothing to do with it.  Some people don't
> want to be measured because they know that the metrics being used are
> bogus.  If a supposed manager comes to me and tells me that they want
> to "measure the software development process," I have real doubts
> about the past and future of that individual.

A metric isn't necessarily bogus, though I suspect that any _single_
metric of the software process, in isolation, omits vastly more than it
measures.  And all the numerical metrics we've got, put together, still
omit a significant amount of what you'd _like_ to measure.  So our best
metrics, combined, still can't describe the whole process.  But I don't
fault people for wanting to measure the software process, as long as
they know that they can only measure _some aspects of some pieces_.

On the other hand, sometimes _truly_ bogus metrics get introduced...

My candidate for truly-bogus-productivity-metric, one that I've actually
seen used (briefly): number of program changes ("PC"s) per staff-year.
In an environment where there is a large body of existing code being
maintained and enhanced with new features, it is superficially plausible
that the more program changes, the more old bugs and new features are
being dealt with.

Unfortunately, the easiest way to be a "productive" developer by that
metric is to code sloppily, skip unit testing, and let the system test
group find your bugs for you.  Each bug they find has to be fixed and
PCed, so you up your PC count with minimal effort.  (And avoid unit
testing, besides!)

Bogus metrics like this one, rewarding the _opposite_ of desired
behavior, are among the worst.  And if I "don't desire to be measured"
according to them, it is _not_ because I don't wish to improve.

 -- Speaking strictly for myself,
 --   Lee Derbenwick, AT&T Bell Laboratories, Warren, NJ
 --   lfd@cbnewsm.ATT.COM  or  <wherever>!att!cbnewsm!lfd

duncan@ctt.bellcore.com (Scott Duncan) (04/05/91)

In article <34953@athertn.Atherton.COM> mcgregor@hemlock.Atherton.COM (Scott McGregor) writes:
>
>                                     When you offer someone experienced
>to change, you offer the opportunity to start over BEHIND the people they
>used to be in front of.  In organizations with a meritocracy (e.g.
>ranking between individuals leads to commesurate benefits) this means a
>short term loss in ranking and commensurate benefits.  Long term thing might
>be different but the short term affect is enough to introduce hesitancy and
>a desire to "time" the transition to a convenient time, hence resistance to
>change.

And this is heightened if there is not obvious "merit" assigned to change, but
simply the expectation that change will occur and that people will "produce"
at some acceptable level throughout.  If it is very clear that those who
judge merit consider output the important factor (i.e., "what have
you done for me lately"), then changing the input or the way input is turned
into output is less likely to be seen as encouraged.

Change often means some reduction in performance level while mastery is being
achieved.  If you can see this happening and the merit system only looks at
levels of output as a constant, then change is quickly equated with diminished
rewards.  The person undergoing the change has to look for future rewards and
sacrifice near-term ones.

>The second reason is that whenever there is change, there is the possibility
>that not only will you lose experience, but quite possibly competancy.

I think this is just the other side of the coin rather than a second reason.
The issue with experience is that you will likely gain that back after a
while.  But a loss based on some skill/ability you might never gain (or
which impedes you), as you've described it, is a permanent loss of "merit"
since the rules for what is valued change and you cannot change.

I think both have to do with what is considered valued and how that is communi-
cated to people.  In very competitive environments, "professionals" are ex-
pected to manage their own growth to a large extent, but they are often not
able to determine what that growth should be.

>                                                       but I have a lot
>of psychological investment in what I have enjoyed doing well in the past.

And I think this is a key point.  It is perfectly okay to expect professionals
to be responsible for their development, but one cannot be surprised that,
after many years of skill-building (and reward for that skill, don't forget),
when the values change, people resist.  People want to feel supported (appre-
ciated) during the period of change.  I think leaving too much up to simple
"professional" expectations leaves people with the feeling that they are on
their own.

>People often comment on how "young people" are so much more open to change
>than their elders.  The above two factors go a long way to explain why.
>When you are at the bottom of the experience curve, starting over on a
>different one, seems to matter relatively little.

You've also not been reinforced in your behavior for many years through the
merit system that told you what you were doing was good.  Older employees are
often praised for being "dependable" while younger ones are rewarded for their
willingness to adapt and change.

Accepting change is often tough for those who seem to mandate it as well.  It
is one thing to be supportive of change in a theoretical sense as a value.  It
is another to be committed to change as a way of life (even for a while).  One
can delegate (or mandate) responsibility for change onto others; one cannot
delegate commitment to change if people have not been a part of the decision
to change.

Watts Humphrey in his book _Managing_for_Innovation_ makes this point.  He also
notes that professionals, out of a sense of responsibility and some pride, will
try to fulfill expectations that others place upon them (e.g., tight schedule).
However, this is not the same as being committed to the obligation because one
has their own credibility at stake (e.g., they came up with the schedule them-
selves).

People who have been around a while have also seen lots of fads (which bring
change with them) come and go -- or, at least, come and get superseded by the
next one.  Hence, it is often harder to get experienced employees as excited
by change because they see the surface/outward trappings change, but life going
on pretty much as always from the social/political/rewards perspective.

Speaking only for myself, of course, I am...
Scott P. Duncan (duncan@ctt.bellcore.com OR ...!bellcore!ctt!duncan)
                (Bellcore, 444 Hoes Lane  RRC 1H-210, Piscataway, NJ  08854)
                (908-699-3910 (w)   609-737-2945 (h))

alan@tivoli.UUCP (Alan R. Weiss) (04/06/91)

In article <JGAUTIER.91Apr3131954@vangogh.ads.com> jgautier@vangogh.ads.com (Jorge Gautier) writes:
>In article <1991Mar25.164133.29674@unislc.uucp> klb@unislc.uucp (Keith L. Breinholt) writes:
>>   Here's a simple fact of life--Measurement of a process or skill is the
>>   first step towards control of the same.
>
>No.  Realization that the process or skill needs to be controlled is
>the first step towards control of the same.

Bzzt.  Wrong.  The *first* step is identifying a problem in specific terms.
If there is not a problem, why bother?  Note that this can include a
desire to increase productivity and/or quality.  But you better be specific.
And numbers help here.

>>   Now if a supposed professional comes to me and tells me that they
>>   don't want to be measured (i.e. they don't want to improve).  I have
>>   real doubts about the future of that individual.
>
>I don't mean to sound negative about this, but your i.e. is bullshit.
>The people who don't want to change are the ones who don't want to
>improve.  Measurement has nothing to do with it.  Some people don't
>want to be measured because they know that the metrics being used are
>bogus.  If a supposed manager comes to me and tells me that they want
>to "measure the software development process," I have real doubts
>about the past and future of that individual.
>--
>Jorge A. Gautier| "The enemy is at the gate.  And the enemy is the human mind
>jgautier@ads.com|  itself--or lack of it--on this planet."  -General Boy
>DISCLAIMER: All statements in this message are false.

Some people may not have a standard by which to compare themselves.
True, some people don't want to change.  We have a term for that:
unemployed.  In the '90s, we can be sure that change is constant ;-)

If the metrics are bogus, then fix them by including the workers
in the process.  In this case, "process" can mean calling a short
meeting, identifying dumb metrics, and coming up with meaningful ones.

But to discredit measurements because some are unreliable is to
throw the baby out with the bath water.

If I were YOUR manager, I would have YOU measure yourself.  And rest
assured, you would not question my future because of it :-)
You *might* even appreciate the chance to improve your skills
(and therefore your value) in a non-threatening fashion.

Clearly the absence of measurements relegates software creation to
the arts, rather than an engineering and/or scientific discipline.
If this is your desire, you should make sure you get what you
bargained for.

No one is more aware than I of the role of people in the process.
But treating programmers like anything less than professionals is insulting.
Surely you don't mean this, do you Jorge?  :-)


_______________________________________________________________________
Alan R. Weiss                           TIVOLI Systems, Inc.
E-mail: alan@tivoli.com                 6034 West Courtyard Drive,
E-mail: alan@whitney.tivoli.com	        Suite 210
Voice : (512) 794-9070                  Austin, Texas USA  78730
Fax   : (512) 794-0623
_______________________________________________________________________

duncan@ctt.bellcore.com (Scott Duncan) (04/08/91)

In article <549@tivoli.UUCP> alan@tivoli.UUCP (Alan R. Weiss) writes:
>
>Bzzt.  Wrong.  The *first* step is identifying a problem in specific terms.

Agreed and measurement can help do this.  However, identifying a place where
you think you want to change can direct early measurement effort toward some
goal.  Usually, a problem prompting people to look for a solution manifests
itself as the need to reduce the cost of development, improve the quality of
the output, etc.  Measurement can make the terms in which that problem is
expressed more specific.

>If there is not a problem, why bother?

Agreed but individuals may feel like they are having no problems while the
organizational output seems to have some.  If individuals are not used to
public visibility of their work, things may seem fine to them as they are
getting "their work done" to their own satisfaction.  If the standards of
performance (day to day...I am not discussing merit reviews, etc.) are up
to the individual professional alone, there can be tremendous variation in
the direction in which folks are headed.  (Perhaps any one of the directions
is fine, but a half dozen different ones makes for organizational chaos.)

>                                        Note that this can include a
>desire to increase productivity and/or quality.  But you better be specific.

I'm not sure you are implying that you can't have both.  But it sounds like it.
From what I have heard many organizations report, you can have both (except,
perhaps, in the most stringent quality situations).  Most recently, I heard
a speaker indicate that their best day for quality so far was also their
most productive.  (This was not in software, but in a delivery service con-
text.  Large software producers have reported that a focus on quality has
led them to improved productivity.  It is not clear that the reverse would
be as true.)

>Some people may not have a standard by which to compare themselves.

Or it is their own personally developed one which is fine for their own sense
of well-being, but has much less value in a larger context.

>True, some people don't want to change.  We have a term for that:
>unemployed.  In the '90's, we can be sure that change is constant ;-)

Yes...people talk about this a great deal.  I am not sure...other than the day
the axe falls, that the sense of urgency is portrayed effectively enough so a
large enough number of professionals (and their management) feel this.  There
has to be a better way to get started down the road toward improvement than
simply to suddenly become unemployed.  That just says people missed the boat
and that the situation went well beyond where they could do anything about
it.  We need to be able to tell that things are heading out of control
at a point where something can be done about it -- and show this to people so they
understand what to do.  (Threats of job loss just get people looking for some
more secure job.  They do not get people to change very effectively.)

>If the metrics are bogus, then fix them by including the workers
>in the process.

Hopefully, you try to avoid putting in "bogus" metrics in the first place by
including workers in the process of instituting them.

>                 In this case, "process" can mean calling a short
>meeting, identifying dumb metrics, and coming up with meaningful ones.

I suggest it will take longer than "a short meeting" to do this and get some
useful measurement effort in place.  Being too efficiency-minded at the outset
is likely to produce "bogus" measures or a "bogus" process.  One of the complaints
heard most often about metrics efforts is that they take too much time.  Too
short a preparation period is likely to have just this effect because the
effort required to collect and analyze the results falls on the shoulders of
those who need to be creating the system and may be ill-prepared to initiate a
measurement program.

Probably the value of a short meeting is, as you say, to "identify dumb
metrics" and avoid them.  Then encourage folks to propose useful ones, describe
how they could be most painlessly collected and analyzed, and suggest changes
they would like to see made in their development process.  Arrange another
meeting to have this information presented and see what convergence of views
can be reached.  If you can achieve agreement here, then setting folks to
work on implementing some measures should go more smoothly.

>But to discredit measurements because some are unreliable is to
>throw the baby out with the bath water.

Absolutely.  On the other hand, to parade the idea of measurement around just
because it seems "scientific" and "professional" will make a joke out of the
effort fairly quickly.

>If I were YOUR manager, I would have YOU measure yourself.

And coupling that with the opportunity to improve oneself as you suggest would
make the measurement effort more meaningful.

However, as I pointed out in my previous posting, most measurement programs
seem to get initiated from top-down at some large organizational level.  It
is rare that things are started from the bottom up since the reason for the
program always seems to be a management desire for information/data and not
individual improvement.  (Management expects that the latter will happen as
if by some "professional" magic.)

>Clearly the absence of measurements relegates software creation to
>the arts, rather than as an engineering and/or scientific discipline.

Agreed.

>No one is more aware than I of the role of people in the process.
>But treating programmers like anything less than professionals is insulting.

But I feel that what lots of people think it means to be a "professional"
includes _not_ being measured too specifically, since it implies a
lack of trust.  I think we have to change the sense of what it means to be
measured, targeting improvement rather than evaluation, as well as establish
that "professionalism" means more public sharing of what's going on, i.e.,
more visibility, not because of lack of trust, but because of the necessity
for growth through exchange of experience.

>Alan R. Weiss                           TIVOLI Systems, Inc.
>E-mail: alan@tivoli.com                 6034 West Courtyard Drive,
>E-mail: alan@whitney.tivoli.com	        Suite 210
>Voice : (512) 794-9070                  Austin, Texas USA  78730
>Fax   : (512) 794-0623

Speaking only for myself, of course, I am...
Scott P. Duncan (duncan@ctt.bellcore.com OR ...!bellcore!ctt!duncan)
                (Bellcore, 444 Hoes Lane  RRC 1H-210, Piscataway, NJ  08854)
                (908-699-3910 (w)   609-737-2945 (h))

lfd@cbnewsm.att.com (Lee Derbenwick) (04/08/91)

In article <549@tivoli.UUCP>, alan@tivoli.UUCP (Alan R. Weiss) writes:

> If the metrics are bogus, then fix them by including the workers
> in the process.  In this case, "process" can mean calling a short
> meeting, identifying dumb metrics, and coming up with meaningful ones.

And how are the workers to know how to measure the process?  We may be
able to _reject_ certain measures as bogus, but that is much easier
than creating good measures, which is an open research area.

Here are a couple of quality metrics I would _like_: they seem to capture
two key areas of software quality -- faults and maintainability.  Both,
unfortunately, violate causality:

1. Total number, severity, and time-to-discovery of remaining faults that
   will be experienced by customers as software failures.

2. Cost of introducing the next several enhancements that will be required
   of this code.

Please suggest how a "short meeting" of software developers could come up
with _feasible_ versions of these.  (It is _not_ feasible to wait a year
or two for the results of measurement -- by then, you've already made
changes to your development process and probably to your staff, so the
time constant is too long to use the metrics for process improvement.)

> Clearly the absence of measurements relegates software creation to
> the arts, rather than as an engineering and/or scientific discipline.

Yes, the _absence_ of measurements would relegate software creation
to the arts.  But the fact that our measurement capabilities are
incomplete forces it to be a mix of science and art -- just as the
other branches of engineering are.

Note that I am _not_ saying that there is no basis for measurements,
but many of the ones I've seen published seem to rest on very shaky
assumptions.  E.g., cost to maintain may be statistically correlated
with some function of the cyclomatic numbers of the routines composing
a module, but a statistical correlation doesn't say anything about any
specific case.  Treating it as if it does (unless the correlation is
nearly perfect) is pseudo-science, not science.
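
For anyone who hasn't run into the metric: a routine's cyclomatic number can
be taken as its count of binary decisions plus one.  A rough sketch, with the
decision counts simply assumed rather than derived by parsing real code:

    /* Sketch: McCabe's cyclomatic number for one routine, taken as
     * (number of binary decisions) + 1.  The decision counts below are
     * assumed; a real tool derives them by parsing the routine. */
    #include <stdio.h>

    int main(void)
    {
        int ifs = 4, loops = 2, case_labels = 3;    /* hypothetical routine */
        int decisions = ifs + loops + case_labels;
        int v_of_g = decisions + 1;

        printf("v(G) = %d\n", v_of_g);   /* 10 is a commonly quoted threshold */
        return 0;
    }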

 -- Speaking strictly for myself,
 --   Lee Derbenwick, AT&T Bell Laboratories, Warren, NJ
 --   lfd@cbnewsm.ATT.COM  or  <wherever>!att!cbnewsm!lfd

jgautier@vangogh.ads.com (Jorge Gautier) (04/09/91)

In article <549@tivoli.UUCP> alan@tivoli.UUCP (Alan R. Weiss) writes:
>> No.  Realization that the process or skill needs to be controlled is
>> the first step towards control of the same.
>
> Bzzt.  Wrong.  The *first* step is identifying a problem in specific terms.
> If there is not a problem, why bother?  Note that this can include a
> desire to increase productivity and/or quality.  But you better be specific.
> And numbers help here.

The problem is the cost (in the most inclusive sense) of producing
software.  The numbers are at least monetary units and time units.
Cost is related to process and system complexity.  Since we wouldn't
want to unnecessarily restrict ourselves to low complexity systems, we
could improve the efficiency of the process (if we wanted to ;-).

> If the metrics are bogus, then fix them by including the workers
> in the process.  In this case, "process" can mean calling a short
> meeting, identifying dumb metrics, and coming up with meaningful ones.

I don't know how to fix them, and I'm not particularly interested in
figuring it out.  Most processes are too informal to measure
effectively.  By effectively I mean that the metric should give you a
nonambiguous message of what needs to be done given your intentions.
For example, a coverage measuring tool measures the amount of code
exercised by tests.  If you want 90% coverage and you are getting 30%,
the message is clear: write more and/or better tests.  Testing is a
well formalized process and can be meaningfully measured in this way.
Contrast this with metrics like "we wrote a 1000 line program in three
months and found 50 defects."  What does this tell you?  Is it good
or bad?  Should you do anything about your process because of these
metrics?  Lines of code, hours of programming and number of defects
are so variable and dependent on so many informal factors that their
effectiveness is limited.  Really, am I the only one who can see the
difference between these two examples?
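
To make the contrast concrete, here is a sketch of the "unambiguous" kind of
measurement; the counts are hypothetical and would come from a coverage tool:

    /* Sketch of the "unambiguous" sort of metric: statement coverage.
     * The counts are hypothetical; a coverage tool would supply them. */
    #include <stdio.h>

    int main(void)
    {
        int statements_total    = 1200;   /* statements in the program  */
        int statements_executed =  360;   /* executed by the test suite */
        double coverage = 100.0 * statements_executed / statements_total;
        double goal = 90.0;

        printf("coverage = %.0f%% (goal %.0f%%)\n", coverage, goal);
        if (coverage < goal)
            printf("action: write more and/or better tests\n");
        return 0;
    }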

> But to discredit measurements because some are unreliable is to
> throw the baby out with the bath water.

Measurement is a useful approach in some circumstances.  However, if
measurements don't "tell you what to do" about your process they can
be misleading at best.

> If I were YOUR manager, I would have YOU measure yourself.  And rest
> assured, you would not question my future because of it :-)
> You *might* even appreciate the chance to improve your skills
> (and therefore your value) in a non-threatening fashion.

How much are you paying for this self-measurement job? :-)  Thanks,
but no, thanks.  I prefer to construct software rather than measure
something when I don't even know what it is.  I already work regularly
on improving my skills as a matter of principle.

> Clearly the absence of measurements relegates software creation to
> the arts, rather than as an engineering and/or scientific discipline.
> If this is your desire, you should make sure you get what you
> bargained for.

The absence of formality in software development processes is what
makes their measurement unreliable and relegates software creation to
the arts.  I don't desire absence of measurements, only of unreliable
ones. 

> No one is more aware than I of the role of people in the process.
> But treating programmers like anything less than professionals is insulting.
> Surely you don't mean this, do you Jorge?  :-)

No :-).  (BTW, I love the way the word "professional" is used to keep
programmers "in line".  This is not the first time I've seen it done.
Students, take note; this is usually not taught in school. :-)
Programmers may be professionals, but they are not engineers.
Just introducing some random kind of measurement into their process
won't automatically make them engineers.

--
Jorge A. Gautier| "The enemy is at the gate.  And the enemy is the human mind
jgautier@ads.com|  itself--or lack of it--on this planet."  -General Boy
DISCLAIMER: All statements in this message are false.

mcgregor@hemlock.Atherton.COM (Scott McGregor) (04/12/91)

In article <JGAUTIER.91Apr8184934@vangogh.ads.com>,
jgautier@vangogh.ads.com (Jorge Gautier) writes:


> Contrast this with metrics like "we wrote a 1000 line program in three
> months and found 50 defects."  What does this tell you?  Is it good
> or bad?  Should you do anything about your process because of these
> metrics?  Lines of code, hours of programming and number of defects
> are so variable and dependent on so many informal factors that their
> effectiveness is limited. 

My answers to the above: a) it tells you to inspect your process for flaws,
b) it is bad (defects were known to be present), c) yes, you should try
to understand what is causing the flaws.  For more examples of what specifically
you should do to understand your process better, and to see
how this analysis is arrived at, read on (otherwise skip ahead; the following
is long--sorry!)

Perhaps simple metrics like 1000 lines/3 months, or 50 defects/1000 lines
don't tell you much without comparison to similar metrics from other
people.  That may still be an argument for comparative metric usage:
If one process/person generates substantially different results from
the rest of people, it might indicate a fruitful place to do some
study just to understand what is different.  A much lower than average
number of defects/month might mean a poorer defect detection process or
a better defect avoidance process.  A little study might help you determine
which. And knowing that might help you repeat/avoid that cause in the future.
Or it might be that it is unavoidable/uncontrollable in the future, but
at least it would be more predictable.

That said, I think that there is also non-comparative information
in the metrics.  There are some absolute values to compare against.
Fewer defects (detected + undetected) is better than more defects;
hence any defect found is an opportunity for improvement.  The total
number of defects being nonzero does not provide an absolute answer
as to what is wrong, but it suggests some useful experiments or
questions to answer (see below) in order to reduce defects in the future.
Similarly, fewer months is better--time is money, and more time also 
introduces more likelihood of mis-estimation (not on a percentage basis,
but definitely on an absolute basis).  Fewer lines is also better.
Again on an absolute level fewer lines means fewer possible errors.  Less
time to test, or inspect in general.

So what are some questions the above metrics suggest we explore?
They tell me a number of things.  First of all, you had 50 defects.
You might want to see if there is anything you can do to your process
to find those 50 defects sooner.  Some defects might have been
side-effects of other defects. Early recognition might have avoided
these.  You might want to see if there is anything you can do to prevent
some of the other defects.  Are some of them defects that lint would
catch?  Can you make lint run before things are checked in each time?  
Would a language sensitive editor catch
some of those things? (maybe force all switch statements to have a
specified default that catches all unexpected values?)  Would inspections
find them sooner? If you ran branch coverage tests would you have 
discovered some defects sooner?  

There was a rate of nearly 17 defects a month.  On average there are 22
working days a month.  This is almost a defect per working day.  Is there
something about your working area that causes people to become distracted
and more error prone?  Or is there something in your environment that perhaps
keeps distractions *down* to such a level that you are not getting even more
defects!

There were 1000 lines.  At 66 lines/printed page that is at least 16
pages.  At 24 lines per window that is at least 42 window-fulls.
Is there a place where the engineer can put 16 pages of printout so
all are visible at one time?  If not, does the fact that the engineer
has to "swap" between pages make it difficult to understand the code
and lead to errors (in this respect this IS a valid measure of complexity)?
Are things isolated enough that all the relevant pieces can be in a
single window-full, or does swapping here lead to problems too?  Even
if there is a wall or table where you can put 16 printed pages, can an
engineer actually see and understand all of it at once?  Is there a more
compact representation that would have made errors more obvious?  Taken less
time to write and debug?

1000 lines / 3 months / 22 days/month = 15.15 lines/day.  15.15 lines/day /
8 work hours/day = 1.89 lines/hour.  Clearly typing a couple of lines
does not take an hour.  There must be something else going on.  But what?
Are meetings taking up a lot of time?  Is training taking up a lot of time?
Does that mean we are asking people to do things that they don't know
how to do?  Are people spending a lot of time looking things up in
manuals?  Do they have the manuals or do they have to walk somewhere to
get them?  If they are on line, is access fast?  Are the manuals easy to
find things in?  Easy to read?  Easy to understand?  Or is a lot of this
"thought" time?  Does this indicate that the task is complex?  Or just
poorly understood?  Do people need to spend time contacting other people
and querying them in order to better understand things?  Are multiple
people developing simultaneously?  Are some of the delays due to
coordination problems?  Might there be a better way to coordinate and
speed things?  Or maybe a different design that isolates interactions
(reduces coordination delays) more?
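
For anyone who wants to redo the arithmetic with their own numbers, here is a
small sketch of the conversions used above; the 22 working days/month and 8
work hours/day are the same assumptions as in the text:

    /* Sketch: turning the raw metrics (lines, months, defects) into the
     * per-day and per-hour figures discussed above.  22 working days/month
     * and 8 work hours/day are assumptions, as in the text. */
    #include <stdio.h>

    int main(void)
    {
        double lines = 1000.0, months = 3.0, defects = 50.0;
        double days_per_month = 22.0, hours_per_day = 8.0;

        double lines_per_day     = lines / months / days_per_month;  /* ~15.15 */
        double lines_per_hour    = lines_per_day / hours_per_day;    /* ~1.89  */
        double defects_per_month = defects / months;                 /* ~16.7  */

        printf("%.2f lines/day, %.2f lines/hour, %.1f defects/month\n",
               lines_per_day, lines_per_hour, defects_per_month);
        return 0;
    }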

In conclusion, I think that the  problem with metrics such as the ones
Jorge gives is not in their derivation, but in ourselves.  Mostly when
people see numbers like this they are annoyed because they don't give
specific answers about what to do to improve your process.  That is
correct.  But they do provide benefit in that they can suggest specific
questions.  Unfortunately, this is where one problem in ourselves lies.
Mostly we don't really want more questions.  Mostly we don't enjoy the
painstaking observation and study necessary to answer these questions.
We might well prefer to get on with fun activities like coding.
We want answers, not more questions.  Since metrics like these raise more
questions, we don't want them.  In fact, we don't want them so strongly
that we ignore useful information that could be derived from them.
It is hard to understand what 1000 lines in 3 months means; 2 lines/hour
is more understandable, but it takes some work to get from the former to
the latter.  We don't like these metrics puzzlers, so we don't bother
to do the work to really understand the number.  Instead we allow
ourselves to get more numb about these metrics (number and number about
more numbers? :-)
Then we get all hung up on whether it was 1030 statements by counting
semicolons, or 988 by counting each "for" expression as one statement (either
one comes to "about 2 lines/hour", which means this difference doesn't matter).
When metrics become hard to ground in reality (like the size of the US national
debt!) we call them meaningless. And of course this is self-fulfilling
because if we don't investigate they ARE meaningless. 

To make progress from metrics like these is like trying to take the collected
wisdom of the alchemists and derive the periodic table.  A lot of people
don't want to do that work.  They want to do the scientific predictive
work that is only possible once the periodic table is derived.  But
we aren't there yet. So computer "scientists" will be frustrated by
such metrics until computer "alchemists" slog through the awful questions,
observations and compilation chores, and stumble on significant parts of
the periodic table of computer programming elements that make prediction
possible.


Scott McGregor
Atherton Technology
mcgregor@atherton.com

marick@m.cs.uiuc.edu (Brian Marick) (04/12/91)

jgautier@vangogh.ads.com (Jorge Gautier) writes:

>... By effectively I mean that the metric should give you a
>nonambiguous message of what needs to be done given your intentions.
>For example, a coverage measuring tool measures the amount of code
>exercised by tests.  If you want 90% coverage and you are getting 30%,
>the message is clear: write more and/or better tests.  Testing is a
>well formalized process and can be meaningfully measured in this way.

Bad analogy.  It is by no means clear what 90% coverage means in terms
of the measure that really counts: what proportion of bugs are left in
the program.  Or, wait: what *really* counts is how many *failures*
the average customer sees.  No, that's not quite it: following Deming
and Taguchi, what counts is also the variability from customer to
customer.

And what's the difference between 90% coverage, 80% coverage, and 100%
coverage?  Does it depend on the program?  

Suppose you have two test suites.  The first was developed from the
specification (black box), and achieved 90% coverage.  The second was
developed from the specification, achieved 50% coverage, and then was
augmented to reach 90% coverage.  Which is a better test suite?  Which
is more likely to detect bugs due to missing code?  Which signals a
better test process?


I don't mean to say that measuring test coverage is useless, because I
happen to find it very useful -- when interpreted in context.  The
best numbers focus your attention, say, "Hey!  Look over here; there's
something odd going on."  Without them, problem solving is next to
impossible; with them, it's merely difficult.  (Shewhart control
charts, used in manufacturing quality control, are a perfect example
of what I mean.)
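
For those who haven't seen one: a Shewhart chart for defect counts (the
c-chart variant) just plots the counts against limits of cbar +/- 3*sqrt(cbar)
and flags anything outside them.  A minimal sketch with invented weekly
counts:

    /* Minimal sketch of a Shewhart c-chart for defect counts: limits are
     * cbar +/- 3*sqrt(cbar); a point outside the limits is the
     * "look over here" signal.  The weekly counts are invented. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double defects[] = { 12, 9, 11, 10, 13, 27, 10, 12 };
        int n = sizeof defects / sizeof defects[0];
        double sum = 0.0, cbar, ucl, lcl;
        int i;

        for (i = 0; i < n; i++)
            sum += defects[i];
        cbar = sum / n;                       /* 13.0  */
        ucl  = cbar + 3.0 * sqrt(cbar);       /* ~23.8 */
        lcl  = cbar - 3.0 * sqrt(cbar);       /* ~2.2  */

        printf("cbar %.1f, limits [%.1f, %.1f]\n", cbar, lcl, ucl);
        for (i = 0; i < n; i++)
            if (defects[i] > ucl || defects[i] < lcl)
                printf("week %d (%.0f defects): something odd going on\n",
                       i + 1, defects[i]);
        return 0;
    }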

Now, it's debatable whether complexity metrics focus our attention in
the right place.  But it's not fair to complain simply because they require
interpretation.


Brian Marick
Motorola @ University of Illinois
marick@cs.uiuc.edu, uiucdcs!marick

alan@tivoli.UUCP (Alan R. Weiss) (04/12/91)

In article <1991Apr8.163111.3968@cbnewsm.att.com> lfd@cbnewsm.att.com (Lee Derbenwick) writes:
>In article <549@tivoli.UUCP>, alan@tivoli.UUCP (Alan R. Weiss) writes:
>
>> If the metrics are bogus, then fix them by including the workers
>> in the process.  In this case, "process" can mean calling a short
>> meeting, identifying dumb metrics, and coming up with meaningful ones.
>
>And how are the workers to know how to measure the process?  We may be
>able to _reject_ certain measures as bogus, but that is much easier
>than creating good measures, which is an open research area.

While I believe that this *IS* an open area of research, as you say,
I TRULY believe that people-who-are-doing-the-work ("workers")
MUST be involved in setting their own standards.  How are they to
know?  Management outlines the broad parameters of the problem
and/or improvement goals, and then provides leadership.  If you
do not believe this, then you are ignoring the last 10 years of
management research (In Search of Excellence, Theory Z, One
Minute Management, Japanese Management, etc., ad nauseam).

It is easy to start the measuring process IF you get participant
buy-in, something scientists forget is crucial to success.
(But hey, lots of managers forget this, too!).  Once you get
buy-in to the concept, you step-wise refine the metrics selected
as you LEARN what makes sense.  You do a GREAT disservice trying
to avoid the learning process itself.  Organizations and people
MUST be encouraged to react quickly, make mistakes, and fix them.
 

>Here are a couple of quality metrics I would _like_: they seem to capture
>two key areas of software quality -- faults and maintainability.  Both,
>unfortunately, violate causality:
>
>1. Total number, severity, and time-to-discovery of remaining faults that
>   will be experienced by customers as software failures.
>
>2. Cost of introducing the next several enhancements that will be required
>   of this code.


The first one should be "mom and apple pie" for all software orgs.
The second one is basic risk analysis, and it's good, too.
 
>Please suggest how a "short meeting" of software developers could come up
>with _feasible_ versions of these.  (It is _not_ feasible to wait a year
>or two for the results of measurement -- by then, you've already made
>changes to your development process and probably to your staff, so the
>time constant is too long to use the metrics for process improvement.)

Hire me and I'll set this up for you :-)  Just kidding ... I LIKE
TIVOLI Systems.  Look, this is deceptively easy:

1.  Do your homework and find out what your current state is
	(the SEI calls this Self-Assessment).

2.  Decide what needs to be improved, and make it quantifiable.

3.  Define some *preliminary* metrics and ways to go about measuring.

4.  Go around to the development managers and get their buy-in.

5.  Go around to the key influencers in the organization and clue them in.

6.  Have the Development Managers and QA call a joint meeting
	on the theme "Development Quality Program"

7.  Learn to listen.

8.  Explain what you're trying to do, how it adds value by saving
	money, schedule, and improves market share, etc. etc.
	Be prepared to back it up with case studies and evidence.

9.  Outline the broad objectives, then turn the meeting over to THEM.

10. Listen.  Guide.  Brainstorm.  Then set some preliminary goals.

11. At various checkpoints, review your data and analyzed information
	to see if it's telling you what you need to know.  Keep the
	developers/engineers posted on the data.  This is not QA's
	game, this is a Development effort.

12. Stepwise refine.


Now, it's going to be VERY easy to poke lots of holes in this process,
so I suggest you contact Watts Humphrey at the Software Engineering
Institute (SEI) at Carnegie-Mellon for a more formalized definition.
But frankly, I just use common sense.  :-)


QA SUPPORTS THE DEVELOPMENT PLAN AND THE BUSINESS PLAN.

>> Clearly the absence of measurements relegates software creation to
>> the arts, rather than as an engineering and/or scientific discipline.
>
>Yes, the _absence_ of measurements would relegate software creation
>to the arts.  But the fact that our measurement capabilities are
>incomplete forces it to be a mix of science and art -- just as the
>other branches of engineering are.

Looking for the magic bullet reminds me of cancer research:  prevention
is a LOT better than correction. 

>Note that I am _not_ saying that there is no basis for measurements,
>but many of the ones I've seen published seem to rest on very shaky
>assumptions.  E.g., cost to maintain may be statistically correlated
>with some function of the cyclomatic numbers of the routines composing
>a module, but a statistical correlation doesn't say anything about any
>specific case.  Treating it as if it does (unless the correlation is
>nearly perfect) is pseudo-science, not science.

Yes.


>
> -- Speaking strictly for myself,
> --   Lee Derbenwick, AT&T Bell Laboratories, Warren, NJ
> --   lfd@cbnewsm.ATT.COM  or  <wherever>!att!cbnewsm!lfd

Good posting.  Hope I've helped a little.

_______________________________________________________________________
Alan R. Weiss                           TIVOLI Systems, Inc.
E-mail: alan@tivoli.com                 6034 West Courtyard Drive,
E-mail: alan@whitney.tivoli.com	        Suite 210
Voice : (512) 794-9070                  Austin, Texas USA  78730
Fax   : (512) 794-0623  "I speak only for myself, not necessarily my firm"
_______________________________________________________________________

jgautier@vangogh.ads.com (Jorge Gautier) (04/12/91)

In article <1991Apr12.134908.20245@m.cs.uiuc.edu> marick@m.cs.uiuc.edu (Brian Marick) writes:
> >... By effectively I mean that the metric should give you a
> >nonambiguous message of what needs to be done given your intentions.
> >For example, a coverage measuring tool measures the amount of code
> >exercised by tests.  If you want 90% coverage and you are getting 30%,
> >the message is clear: write more and/or better tests.  Testing is a
> >well formalized process and can be meaningfully measured in this way.

> Bad analogy.  It is by no means clear what 90% coverage means in terms
> of the measure that really counts: what proportion of bugs are left in
> the program.  Or, wait: what *really* counts is how many *failures*
> the average customer sees.  No, that's not quite it: following Deming
> and Taguchi, what counts is also the variability from customer to
> customer.

> And what's the difference between 90% coverage, 80% coverage, and 100%
> coverage?  Does it depend on the program?  

Percent coverage does not tell you how good your test suite is or how
well you have tested your code.  It measures exactly what it says: how
much of your code was exercised during execution.  Any other
interpretation is a likely self-deception.  I used this metric as an
example of one of the less bogus metrics.  Actually, I have found %
coverage less useful in practice (except for giving numbers to the
boss ;-) than the number of times that specific branches were taken.
By examining the coverage for specific branches you can sometimes
determine whether you have tested important situations.  Overall,
coverage metrics are some of the least bogus ones around, but I'm sure
that they can also be misused.
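
As an invented illustration of the difference (the branch names and counts
are made up, of the sort a branch-coverage tool would report), a respectable
overall percentage can still hide the one branch you care about never being
taken:

    /* Sketch: why per-branch counts can say more than an overall percentage.
     * The table is invented; a branch-coverage tool would produce it. */
    #include <stdio.h>

    struct branch { const char *name; long taken; };

    int main(void)
    {
        struct branch b[] = {
            { "parse_ok",     412 },
            { "parse_error",   37 },
            { "cache_hit",    398 },
            { "cache_miss",    14 },
            { "disk_full",      0 },   /* the case you actually care about */
        };
        int n = sizeof b / sizeof b[0], covered = 0, i;

        for (i = 0; i < n; i++)
            if (b[i].taken > 0)
                covered++;
            else
                printf("never taken: %s\n", b[i].name);

        printf("overall branch coverage: %d%%\n", 100 * covered / n);  /* 80% */
        return 0;
    }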

As for measuring the number of bugs left in a program, it seems
impossible without extensive use of formal and effective behavioral
specifications in the development process.
--
Jorge A. Gautier| "The enemy is at the gate.  And the enemy is the human mind
jgautier@ads.com|  itself--or lack of it--on this planet."  -General Boy
DISCLAIMER: All statements in this message are false.

cole@farmhand.rtp.dg.com (Bill Cole) (04/13/91)

In article <1991Apr12.134908.20245@m.cs.uiuc.edu>, marick@m.cs.uiuc.edu (Brian Marick) writes:
|> jgautier@vangogh.ads.com (Jorge Gautier) writes:
|> 
|> >... By effectively I mean that the metric should give you a
|> >nonambiguous message of what needs to be done given your intentions.
|> >For example, a coverage measuring tool measures the amount of code
|> >exercised by tests.  If you want 90% coverage and you are getting 30%,
|> >the message is clear: write more and/or better tests.  Testing is a
|> >well formalized process and can be meaningfully measured in this way.
|> 
|> (Much stuff deleted)
|> 
|> And what's the difference between 90% coverage, 80% coverage, and 100%
|> coverage?  Does it depend on the program?  
|> 
|> I don't mean to say that measuring test coverage is useless, because I
|> happen to find it very useful -- when interpreted in context.  The
|> best numbers focus your attention, say, "Hey!  Look over here; there's
|> something odd going on."  Without them, problem solving is next to
|> impossible; with them, it's merely difficult.  (Shewhart control
|> charts, used in manufacturing quality control, are a perfect example
|> of what I mean.)
|> 

Tell me what you mean by 'coverage'.  I submit that if you can define 100%
coverage, then testing the software might not be meaningful (or challenging,
since we're talking about personal growth).

My worry is always that we haven't covered some specific, but minor, situation
that will eventually bite a user in the ear -- or lower.  We try to do at least
100% coverage on any new feature with both positive and negative tests.  The
interesting thing is that we can't build all the scenarios that our customers
will build in their own shops.  We do, however, learn from reported bugs.

Part of the bug closure process in my shop requires that a test be added to the
test suite so that we keep the bug out of future released software.  The down
side to all of this is that we know that we're not going to find all the bugs
in our software, but the positive side is that we're going to keep getting
better.
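
As a sketch of the kind of test that gets added at closure (the routine and
the bug number are hypothetical), it is just a small self-checking case that
reproduces the reported failure and so stays in the suite forever:

    /* Sketch: a self-checking regression test added when a bug is closed.
     * Both the routine and the bug number are hypothetical. */
    #include <assert.h>
    #include <stdlib.h>

    /* stand-in for the real routine that once mishandled empty input */
    static int parse_width(const char *s)
    {
        if (s == NULL || *s == '\0')
            return 80;              /* documented default; old code crashed here */
        return atoi(s);
    }

    int main(void)    /* regression test for (hypothetical) bug #1234 */
    {
        assert(parse_width("") == 80);      /* the reported failure case      */
        assert(parse_width(NULL) == 80);
        assert(parse_width("132") == 132);  /* and the fix broke nothing else */
        return 0;
    }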

The views must be my own because nobody wants to share them,
/Bill 

marick@m.cs.uiuc.edu (Brian Marick) (04/15/91)

cole@farmhand.rtp.dg.com (Bill Cole) writes:

>Tell me what you mean by 'coverage'.  

I was referring to code coverage.  For example, 100% branch coverage
means that running the program against your test suite forces every
branch in the program to be taken in both the TRUE and FALSE
directions.  100% statement coverage means that every statement has
been executed.  100% loop coverage is usually taken to mean that every
loop has executed 0, 1, or many times.
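
As a tiny concrete example (hypothetical code, but the definitions above apply
directly), here is a function with one branch and one loop, plus the inputs a
suite would need for 100% branch and loop coverage:

    /* Tiny example for the definitions above.  Full branch coverage of
     * count_positives() needs the "if" taken both TRUE and FALSE; the usual
     * loop criterion also needs the loop run 0, 1, and many times. */
    #include <assert.h>

    static int count_positives(const int *a, int n)
    {
        int i, count = 0;
        for (i = 0; i < n; i++)        /* loop: executed 0, 1, or many times */
            if (a[i] > 0)              /* branch: must go both TRUE and FALSE */
                count++;
        return count;
    }

    int main(void)
    {
        int one[]  = { 5 };
        int many[] = { -1, 2, 0 };

        assert(count_positives(one, 0)  == 0);  /* loop 0 times                */
        assert(count_positives(one, 1)  == 1);  /* loop once, branch TRUE      */
        assert(count_positives(many, 3) == 1);  /* loop many, branch both ways */
        return 0;
    }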

Code coverage is a measure of how thoroughly your test suite exercises
the actual text of the program.  The problem with coverage is that it
does NOT measure how thoroughly you've explored the program's
specification.  That is, it doesn't directly address your worry:

>... that we haven't covered some specific, but minor, situation
>that will eventually bite a user in the ear -- or lower.  

In general, there's no way to measure specification coverage, so
people resort to using code coverage as an approximation.  My point
was that the approximation gets better when you use it as a
springboard for thought.  For example, this has happened to me:

I have an unexercised branch.  I could simply derive a test case to
exercise it and be done with it.  Instead, I treat this as a signal
that there's some entire area of the specification that I've failed to
explore.  I develop several tests, only one of which exercises that
branch.  One of the *other* tests -- seemingly useless as far as
coverage is concerned -- finds a bug.

The next step is then to think a bit about why you missed that
specification area in the first place and how you can avoid missing
ones like it next time, when you might not have a coverage clue.  See
also Scott McGregor's recent note in this newsgroup.

Of course, even if you could measure specification coverage -- whether
you tested everything the program's supposed to do -- now you get to
worry about requirements coverage -- does the program do what the user
wants?  This is where it gets really interesting, where a lot of
payback lies.  Testing techniques are applicable.

Hope I've answered your question.

Brian Marick
Motorola @ University of Illinois
marick@cs.uiuc.edu, uiucdcs!marick

cole@farmhand.rtp.dg.com (Bill Cole) (04/16/91)

Scott McGregor writes:
|> 
|> (Bunch o' stuff deleted)
|> 
|> In conclusion, I think that the  problem with metrics such as the ones
|> Jorge gives is not in their derivation, but in ourselves.  Mostly when
|> people see numbers like this they are annoyed because they don't give
|> specific answers about what to do to improve your process.  That is
|> correct.  But they do provide benefit in that they can suggest specific
|> questions.  Unfortunately, this is where one problem in ourselves lies.
|> Mostly we don't really want more questions.  Mostly we don't enjoy the
|> painstaking observation and study necessary to answer these questions.
|> We might well prefer to get on with fun activities like coding.
|> We want answers, not more questions.  Since metrics like these raise more
|> questions, we don't want them.  In fact, we don't want them so strongly,
|> that we ignore useful information in them that would have to be derived.
|> It is hard to understand what 1000 lines in 3 months means; 2 lines/hour
|> is more understandable, but it takes some work to get from the former to
|> the latter.  We don't like these metrics puzzlers, so we don't bother
|> to do the work to really understand the number. Instead we allow
|> ourselves to get more numb about these metrics (number and number about
|> more numbers? :-)
|> 

1. What's a defect?  Is it a mis-spelling?  An annoying placement of a 
message?  A mis-mapped message?  Is it a corruption of a database?  Is it
a crashed system?  The defect-is-a-defect-is-a-defect mentality doesn't
reflect the way we view products -- software or not.

2. How do we measure productivity in rev n+1 software?  That is, we
seldom deal with clean-sheet projects; we spend most of our lives refining
what we did before, maybe adding features or functionality.  But we don't
get to work on completely new projects for the vast majority of our
professional lives.  So how do you measure productivity/defects in that
environment?

3. Who cares what the lines-per-timeframe is? or the defects-per-line
if the software is delivered per the agreed-upon schedule and with the
agreed-upon level of quality (as measured by the known bug count of various
levels of problems)?  Notice that the assumption here is that the schedule
was 'reasonable' and that the quality level was 'rational' -- and that the
folks doing the developing agreed to all this.

We can measure and 'quantify' all day, but the bottom line is customer
satisfaction.  And the perception of quality.

I'm alone in the room, so the opinions are my own,
/Bill  

cole@farmhand.rtp.dg.com (Bill Cole) (04/16/91)

|> I was referring to code coverage.  For example, 100% branch coverage
|> means that running the program against your test suite forces every
|> branch in the program to be taken in both the TRUE and FALSE
|> directions.  100% statement coverage means that every statement has
|> been executed.  100% loop coverage is usually taken to mean that every
|> loop has executed 0, 1, or many times.
|> 
|> Code coverage is a measure of how thoroughly your test suite exercises
|> the actual text of the program.  The problem with coverage is that it
|> does NOT measure how thoroughly you've explored the program's
|> specification.  That is, it doesn't directly address your worry:
|> 
|> >... that we haven't covered some specific, but minor, situation
|> >that will eventually bite a user in the ear -- or lower.  
|> 
|> In general, there's no way to measure specification coverage, so
|> people resort to using code coverage as an approximation. 
|> 
|> The next step is then to think a bit about why you missed that
|> specification area in the first place and how you can avoid missing
|> ones like it next time, when you might not have a coverage clue.  See
|> also Scott McGregor's recent note in this newsgroup.
|> 

I don't think that covering every branch, loop or case statement will
assure you of 100% coverage.  I'll admit to being unsure on this one.
It seems to me that, by the time I get to the point where each bit of
functionality is pretty sound on its own, it's the way my tests wander
through the code paths that makes the coverage 100%.  For instance:
you've got a working bit of software that rarely fails (say it takes
3 weeks of running flat out to reproduce the problem) and you finally
discover the bug was due to 'timing' or 'circumstances you couldn't
foresee' -- perhaps a failure condition that the software didn't deal
with particularly well.  Experience -- I won't tell you how much -- 
tells me that most really evil bugs in released software are either
in timing situations or they occur at the interface between pieces
of functionality.

We try to learn from our 'mistakes' -- bugs that we could have found
but didn't.  We have a set of tests that used to run for two weeks on
two machines; that test suite has been enhanced so that it now runs
for three weeks on seven machines.  The good thing, though, is that
the bugs that get by us now are much more obscure.

Thanks for the reply.

I represent these opinions to be my own,
/Bill

lfd@cbnewsm.att.com (Lee Derbenwick) (04/16/91)

In article <581@tivoli.UUCP>, alan@tivoli.UUCP (Alan R. Weiss) writes:
> In article <1991Apr8.163111.3968@cbnewsm.att.com> lfd@cbnewsm.att.com (Lee Derbenwick) writes:
> >In article <549@tivoli.UUCP>, alan@tivoli.UUCP (Alan R. Weiss) writes:
> >
> >> If the metrics are bogus, then fix them by including the workers
> >> in the process.  In this case, "process" can mean calling a short
> >> meeting, identifying dumb metrics, and coming up with meaningful ones.
> >
> >And how are the workers to know how to measure the process?  We may be
> >able to _reject_ certain measures as bogus, but that is much easier
> >than creating good measures, which is an open research area.
> 
> While I believe that this *IS* an open area of research, as you say,
> I TRULY believe that people-who-are-doing-the-work ("workers")
> MUST be involved in setting their own standards.  How are they to
> know?  Management outlines the broad parameters of the problem
> and/or improvement goals, and then provides leadership.  [ ... ]

This begs the question.  Of course the workers must be involved in
setting the standards.  But, right now, we know of _no_ metrics that
measure some of what we really want to measure.  No amount of
management leadership is going to allow anyone to create those
metrics.  But enough management pressure will force people to create
bogus ones.

> >Here are a couple of quality metrics I would _like_: they seem to capture
> >two key areas of software quality -- faults and maintainability.  Both,
> >unfortunately, violate causality:
> >
> >1. Total number, severity, and time-to-discovery of remaining faults that
> >   will be experienced by customers as software failures.
> >
> >2. Cost of introducing the next several enhancements that will be required
> >   of this code.
> 
> The first one should be "mom and apple pie" for all software orgs.
> The second one is basic risk analysis, and its good, too.

I'm afraid Alan missed the tense in #1.  It is (reasonably) easy to
measure what _did_ happen: but that tells you what your software
process was doing one or two years ago.  And if you haven't changed
your process at all in that time, you haven't been paying attention
even to qualitative information.  To improve your process _now_, you
need to know what faults your customers will experience as future
failures.  So it's far from "Mom and Apple Pie."

(In cases where you can accurately characterize your customers' usage
patterns, John Musa's work on software reliability gives you ways of
estimating MTBF for customers, which is close to what I want.  But
with new or significantly changed software, you often have no better
than guesses about the usage.)

The second metric also requires prediction of the future.  Measures
of structure, etc., can be useful at a crude level, but they can't
capture "What is the chance that the customer will decide they
desperately need something that violates a fundamental assumption of
this code, so that it will have to be totally rewritten?"  And I don't
know of any good way to measure fundamental assumptions.  Again, you
can find out later whether you had assumed too much -- but that tells
you about your process in the past.  "Basic risk analysis" isn't even
close, though it _is_ a first step in that direction.

There are metrics that tell you about your process right now: but in
most cases, they are very crude approximations, or they are based on
assumptions that are little more than guesswork.  If you change your
process based on them, you may be as likely to worsen as to improve it.

On the other hand, there are metrics to tell you very accurately about
your process at some time in the past.  Since it takes customers time
to find errors, these tend to be at least 6 months and easily two years
or more out of date.  But you can make significant process changes
within a few months.  In terms of control theory, you have a process
with a certain time constant, but your observations of that process have
a significantly longer time constant.  By attempting to use those
observations to control the process, you are likely to introduce
oscillatory or chaotic behavior.
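
A toy simulation (mine, with made-up numbers -- not a model of any real
organization) shows the effect.  Each month we correct a "quality level"
toward a target, but the correction is driven by field data that is nine
months old; the level swings back and forth instead of settling:

    /* Correcting against a delayed observation of your own process.
     * With this gain and no delay the level settles in a few months;
     * with a nine-month delay the same gain overshoots repeatedly. */
    #include <stdio.h>

    #define MONTHS 36
    #define DELAY  9           /* months until customers report faults */

    int main(void)
    {
        double q[MONTHS + 1];
        double target = 100.0, gain = 0.7;
        int t;

        q[0] = 60.0;
        for (t = 0; t < MONTHS; t++) {
            /* we only "see" the process as it was DELAY months ago */
            double seen = (t >= DELAY) ? q[t - DELAY] : q[0];
            q[t + 1] = q[t] + gain * (target - seen);
            printf("month %2d  actual %8.2f  (acting on %8.2f)\n",
                   t + 1, q[t + 1], seen);
        }
        return 0;
    }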

 -- Speaking strictly for myself,
 --   Lee Derbenwick, AT&T Bell Laboratories, Warren, NJ
 --   lfd@cbnewsm.ATT.COM  or  <wherever>!att!cbnewsm!lfd

mcgregor@hemlock.Atherton.COM (Scott McGregor) (04/18/91)

In article <1991Apr15.200458.11331@dg-rtp.dg.com>,
cole@farmhand.rtp.dg.com (Bill Cole) writes:

> 1. What's a defect?  Is it a mis-spelling?  An annoying placement of a 
> message?  A mis-mapped message?  Is it a corruption of a database?  Is it
> a crashed system?  The defect-is-a-defect-is-a-defect mentality doesn't
> reflect the way we view products -- software or not.

A defect is something that matters, and matters in a way that is not beneficial
to you.  It is something that, if removed, fixed or avoided, would avoid
problems or bring benefits.  This is very subjective--does a mis-spelling
matter?  If it matters it is a defect.  If it doesn't it isn't.  Whose
opinion counts?  Obviously--the people who get to make the decisions.  It
could be you, it could be a senior manager, it could be a customer.  It
could be all of you.

> 2. How do we measure productivity in rev n+1 software?  That is, we
> seldom deal with clean-sheet projects; we spend most of our lives refining
> what we did before, maybe adding features or functionality.  But we don't
> get to work on completely new projects for the vast majority of our
> professional lives.  So how do you measure productivity/defects in that
> environment?

Go back to what matters to the subjective decision makers above.

> 3. Who cares what the lines-per-timeframe is? or the defects-per-line
> if the software is delivered per the agreed-upon schedule and with the
> agreed-upon level of quality (as measured by the known bug count of various
> levels of problems)?  Notice that the assumption here is that the schedule
> was 'reasonable' and that the quality level was 'rational' -- and that the
> folks doing the developing agreed to all this.

The people who care what the lines-per-timeframe is are the people who want
to be able to do something about it, or the people who need to assure
themselves that competitors will not be any more able to do something
about it than they are.  As a manager, I see 2 lines an hour and I know
that there are things going on that are chewing up time, besides keying
in those two lines.  This does not mean that the people doing the coding
are bad people.  Maybe that number is consistent with past history.
But WHY does it take so long?  Some of that time is think time.  That's
good, but if I gave a person more training, would they solve problems
faster?  Might my competitors do just that?  Would a copy of the Collected
Algorithms of the ACM help?  Or am I chewing up too much time with meetings?
Meetings that perhaps customers might not have?

What does it mean for a schedule to be reasonable?  Does it not mean mainly
that it is in line with historical local experience?   Might a manager
agree to a schedule and quality that was consistent with past
market requirements, only to be surprised by a competitor that made
a major change?   What was an acceptable price/performance for a workstation
before the HP 700 announcement?  Will it be the same afterwards?  What
were the quality requirements for American cars in the 60s?  After the
improvements in quality by the Japanese and Europeans, can the agreements
of the past continue?

Managers often make judgements with limited information.  They do not
know for sure what the competitors will do, or what the market will do.
They may agree on the reasonableness of a certain schedule, or quality
plan based upon past experience.  If you meet it, then everything is okay,
right?  You are a success.  If the quality is too low, or the product
too little, too late, then it isn't your fault, right?  But will the fact that
the mistakes weren't your fault make you feel better when the project or
company or national industry fails?  Will being a faultless victim feel
good?  (Digression: for ten years I worked for a major hardware vendor.
The last years there I managed a project team.  We delivered EVERY project
on time, or sometimes earlier.  We did several crash projects successfully.
We even helped bail out another team's project effort that had bogged down.
But the company still wasn't successful in that market, and basically
exited the area, displacing the "successful" project team.  It didn't
feel good to be a faultless failure.)

In other words, it is not sufficient to do things right, you must also
do the right things.  And part of this means doing more than what is being
asked of you, but looking over your shoulder at the competition and using
your head to understand where improvements are possible by yourself, or 
your competitors.

> We can measure and 'quantify' all day, but the bottom line is customer
> satisfaction.  And the perception of quality.

Absolutely correct.  And if you measure and quantify all day you'll never
do the real work to get customer satisfaction.  On the other hand, if
you always put your head down and do things the way they have been done
in the past, without questioning how improvements might be made, you
can find yourself passed by competitors who make continual improvements.
You must have a balance.  Otherwise you may be merely successful in doing
what is asked of you, but become a "faultless failure" victim in the end.
It has happened before to many companies and national industries.  The best
way to avoid this, if avoidance is possible, is to look around you and
question why things are the way they are, and what can change.

Scott McGregor
Atherton Technology
mcgregor@atherton.com

rogers@ficc.ferranti.com (keith rogers) (04/18/91)

Some comments on various postings (all pasted together - sorry!).... 

> If the metrics are bogus, then fix them by including the workers
> in the process.

"Bogus" may be in the eye of the beholder.  Failure to use the
numbers (and this includes being too concerned with the collection
of data, at the expense of analysis) has probably doomed more 
metrics than "bogusness." 

>          ....  Most processes are too informal to measure effectively.  

If the process is too informal to be measured, then either the
process should be formalized, and made measurable, or you should
decide, a priori, to not care too much what the result/output of 
the process will be.  Without some definition of what is going 
on in a process, you cannot expect to control it, or its output.

> Measurement is a useful approach in some circumstances.  However, if
> measurements don't "tell you what to do" about your process they can
> be misleading at best.

Measurements are unlikely to "tell you what to do."  Most "misleading"
is really misinterpretation; i.e., the failure to do the proper analysis, 
and then use the data correctly, perhaps to improve the measurements.

>           ... software engineering is a pseudo-science, right up there
> with, and in the grand style of, say, phrenology.

You have to admit that for all the failed sciences, a few did pan out.
I mean, who was to know whether phrenology was right or wrong at the
time, ludicrous as it seems now?  Invasive surgery probably seemed
a little radical, too, along with others.  And, maybe metrics will
be relegated to the dustbin when it's all said and done.  But, if your
alternative is to toss a specification "over the wall" and say a prayer,
I'd put that in the same class as 19th century rainmaking :-).

> [ ... lots of good stuff deleted ... ]
> We want answers, not more questions.  Since metrics like these raise more
> questions, we don't want them.  In fact, we don't want them so strongly,
> that we ignore useful information in them that would have to be derived.
> [ ... more good stuff deleted ... ]

Yes!

> There are metrics that tell you about your process right now: but in
> most cases, they are very crude approximations, or they are based on
> assumptions that are little more than guesswork.  If you change your
> process based on them, you may be as likely to worsen as to improve it.

Right.  But if you are changing your process without understanding it,
or knowing whether the variation you have measured is due to special
(i.e., non-process related) or common causes, then you deserve what you get.
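
For instance, the arithmetic for separating the two is not hard (the
weekly defect counts are made up, and mean +/- 3 sigma of individual
values is a simplification of a real Shewhart chart):

    /* Points inside the limits are ordinary process noise (common
     * causes); a point outside them is worth investigating as a
     * possible special cause before you change the process over it. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double defects[12] = { 14, 11, 13, 15, 12, 14, 40, 13, 12, 15, 11, 14 };
        int n = 12, i;
        double sum = 0.0, var = 0.0, mean, sigma, ucl, lcl;

        for (i = 0; i < n; i++) sum += defects[i];
        mean = sum / n;
        for (i = 0; i < n; i++) var += (defects[i] - mean) * (defects[i] - mean);
        sigma = sqrt(var / (n - 1));
        ucl = mean + 3.0 * sigma;
        lcl = mean - 3.0 * sigma;

        printf("mean %.1f  limits [%.1f, %.1f]\n", mean, lcl, ucl);
        for (i = 0; i < n; i++)
            if (defects[i] > ucl || defects[i] < lcl)
                printf("week %2d: %g defects -- possible special cause\n",
                       i + 1, defects[i]);
        return 0;
    }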

> 1.            .... The defect-is-a-defect-is-a-defect mentality doesn't
> reflect the way we view products -- software or not.

Who's "we?"  (I believe defects are usually classified by level of severity.) 

> 2. How do we measure productivity in rev n+1 software?

Productivity is difficult, but it's very important.  If you bid a project 
that involves modifications to your base product, you need to be able to 
estimate how long the work will take.  Then you need to measure, somehow, 
the amount of work actually done, so that when you ultimately find out 
that your estimate wasn't [very good], you have some likelihood of doing 
a better job on your next estimate, before you go out of business.  

> 3. Who cares what the lines-per-timeframe is? or the defects-per-line
> if the software is delivered per the agreed-upon schedule and with the
> agreed-upon level of quality ...

Who cares?  The banks, for one.  If you have to hire twice as many people
to make that agreed upon schedule, and your bottom line starts showing
up in red, you won't be around to sell another high-quality product.

> We can measure and 'quantify' all day, but the bottom line is customer
> satisfaction.  And the perception of quality.

The bottom line is measured in $.  Customer satisfaction is paramount,
of course, but you can't do it for long on a non-positive cash-flow.
-----
Keith Rogers (rogers@ficc.ferranti.com)
(standard disclaimer)

alan@tivoli.UUCP (Alan R. Weiss) (04/19/91)

In article <1991Apr15.221119.6242@cbnewsm.att.com> lfd@cbnewsm.att.com (Lee Derbenwick) writes:
>>In article <581@tivoli.UUCP>, alan@tivoli.UUCP (Alan R. Weiss) writes:

>> While I believe that this *IS* an open area of research, as you say,
>> I TRULY believe that people-who-are-doing-the-work ("workers")
>> MUST be involved in setting their own standards.  How are they to
>> know?  Management outlines the broad parameters of the problem
>> and/or improvement goals, and then provides leadership.  [ ... ]
>
>This begs the question.  Of course the workers must be involved in
>setting the standards.

Good.  Then we agree on something.

>But, right now, we know of _no_ metrics that
>measure some of what we really want to measure.

You are too vague here, so I'll give you an idea of just *some*
of the things we are measuring:
	
	Gilb Inspection Process Metrics
	--------------------------------

	# of Sev 1 defects found during inspections
	# of Sev 2 defects found during inspections
	# of Sev 3 defects found during inspections
	# of Sev 4 defects found during inspections
	Total of all defects logged
	Total minutes spent in Defect Logging Meeting
	Total Defects Logged/Minute
	Total Pages Inspected 
	% Pages Inspected
	Total Defects/Page
	Average Defects/Page
	Pages/Hour
	Total Reported Minutes Spent Preparing For Inspections
	Total Cost of Preparation
	Cumulative Total Time Spent in Inspection Process
	Cumulative Total Cost of Inspection Process
	Average Cost to Fix A Defect During Test Phase
	Total SAVINGS (estimated) Due To Inspections
	Cost of Quality

	Test Execution Metrics
	----------------------

	Total Test Cases Generated (automatically)
	Total Test Assertions Generated (automatically)
	% Test Cases to Specifications (Function Points)
	Approximate % Test Coverage
	Total Test Assertions Executed
	Total Test Assertions Succeeded (PASSED)
	Total Test Assertions Failed (FAILED)
	Total Test Assertions Blocked (BLOCKED)
	% Success to Total
	Total Sev 1 Defects Found in Testing
	Total Sev 2 Defects Found in Testing
	Total Sev 3 Defects Found in Testing
	Total Sev 4 Defects Found in Testing
	Performance Measurements
	etc. etc. etc.


	Source Code Analysis Metrics
	----------------------------

	Relative Cyclomatic Code Complexity
	Total Lines of Code
	Defects per KLOC
	Defects per Function Point


	Schedule Metrics
	----------------

	Time to Code
	Time to Build
	Time to Integration Test
	Time to Functional Test (Component/LPP Test)
	Time to System Test
	Number of schedule slips (if any)
	etc. etc. etc.
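
Several of the derived inspection numbers above are plain arithmetic on
a few raw counts logged at the meeting; a trivial sketch, with invented
figures, just to show how they fall out:

    /* Invented figures; the point is only that the derived inspection
     * metrics come straight from counts anyone can log at the meeting. */
    #include <stdio.h>

    int main(void)
    {
        int    defects_logged  = 37;      /* all severities               */
        int    pages_inspected = 22;
        double meeting_minutes = 90.0;
        double prep_minutes    = 240.0;   /* total reported by inspectors */
        double loaded_rate     = 1.00;    /* dollars per minute, made up  */

        printf("Defects/Page    %.2f\n", (double)defects_logged / pages_inspected);
        printf("Defects/Minute  %.2f\n", defects_logged / meeting_minutes);
        printf("Pages/Hour      %.1f\n", pages_inspected / (meeting_minutes / 60.0));
        printf("Prep cost       $%.2f\n", prep_minutes * loaded_rate);
        return 0;
    }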

Now then, Lee, I fail to see how you could *possibly* say that
"we" can't measure software, because "we" are doing it every
single day.  Please note that not all of these measurements
have equal weight, and different metrics have different audiences
and purposes.  This is where management leadership comes into play.
NOWHERE are these metrics used to beat people up.  They are used
to improve both the product and the process, and are used by
individuals interested in improving their own performance.
Managers only get to see aggregates, not individual measurements.

Whether it's called "metrics", or "measurements", or just simply
"running the numbers", we like to use BOTH halves of our brains:
the creative half, and the logical half. 


(Thanks to Tom Gilb, wherever the hell you are, for your work on
Inspections.  Thanks also to Michael Fagan, William Howden, Boris
Beizer, Kerry Kimbrough, Robert Choate, C.A.R. Hoare, Tom
DeMarco, Ed Yourdon, and literally thousands of other people who are
building Software Science.  Also, thanks to my agent, my mom .... :-) )
	
>No amount of
>management leadership is going to allow anyone to create those
>metrics.  But enough management pressure will force people to create
>bogus ones.

Balderdash :-)  I guess you just have had unfortunate experiences.
Management creates the space that allows for a process that
results in metrics.  If that sounds too "California-speak", then
forgive me:  I *came* from California.  Talk to Tom Peters
and Ken Blanchard if you don't dig this.  And maybe W. Edwards Deming,
Phil Crosby, and J.M. Juran, too.

>
>> >Here are a couple of quality metrics I would _like_: they seem to capture
>> >two key areas of software quality -- faults and maintainability.  Both,
>> >unfortunately, violate causality:
>> >
>> >1. Total number, severity, and time-to-discovery of remaining faults that
>> >   will be experienced by customers as software failures.
>> >
>> >2. Cost of introducing the next several enhancements that will be required
>> >   of this code.
>> 
>> The first one should be "mom and apple pie" for all software orgs.
>> The second one is basic risk analysis, and its good, too.
>
>I'm afraid Alan missed the tense in #1.  It is (reasonably) easy to
>measure what _did_ happen: but that tells you what your software
>process was doing one or two years ago.


Yeah yeah, except that we do this kind of stuff daily and weekly
and monthly and yearly, and we don't wait.  Besides, we can
calculate the Escape Rate by knowing the incoming rate, the find
rate, the fix rate, and the code coverage rate.  Sure, it's just
an approximation, but so what?  Management is NEVER precise,
because PEOPLE aren't, and software is a People-business.
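
One deliberately crude back-of-the-envelope version -- NOT our actual
formula, which leans on local history -- conveys the flavor:

    /* If the tests cover roughly a fraction c of the product and found
     * F defects, a naive estimate of the total is F/c, so roughly
     * F*(1-c)/c defects "escape" to the field.  Crude, but it gives a
     * number to argue with. */
    #include <stdio.h>

    int main(void)
    {
        double found    = 120.0;   /* defects found in test (made up)     */
        double coverage = 0.80;    /* approximate test coverage (made up) */
        double est_total  = found / coverage;
        double est_escape = est_total - found;

        printf("estimated total defects : %.0f\n", est_total);
        printf("estimated escapes       : %.0f  (escape rate %.0f%%)\n",
               est_escape, 100.0 * est_escape / est_total);
        return 0;
    }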



>And if you haven't changed
>your process at all in that time, you haven't been paying attention
>even to qualitative information.  To improve your process _now_, you
>need to know what faults your customers will experience as future
>failures.  So it's far from "Mom and Apple Pie."

No, I believe that if you are still doing things algorithmically
rather than constantly adjusting your process to improve quality, 
you don't deserve to live economically.  Which is exactly what
Tom Peters has been preaching for about 10 years.

>(In cases where you can accurately characterize your customers' usage
>patterns, John Musa's work on software reliability gives you ways of
>estimating MTBF for customers, which is close to what I want.  But
>with new or significantly changed software, you often have no better
>than guesses about the usage.)

Could you please send me some information on John Musa's work?
Sounds useful.  Right now, we go to customer sites, sit down with
them, learn their business, and try to learn their work habits.

>The second metric also requires prediction of the future.  Measures
>of structure, etc., can be useful at a crude level, but they can't
>capture "What is the chance that the customer will decide they
>desperately need something that violates a fundamental assumption of
>this code, so that it will have to be totally rewritten?"  And I don't
>know of any good way to measure fundamental assumptions.  Again, you
>can find out later whether you had assumed too much -- but that tells
>you about your process in the past.  "Basic risk analysis" isn't even
>close, though it _is_ a first step in that direction.

Hey, I haven't a clue.  We manage this process by getting as close
to our customers as possible, by trying to learn their business,
and by creating a business relationship.  But, if a customer changes
their mind, they're the customer.  We try to stay light on our
feet enough to either modify the current product or come out with
a follow-on fast.  If you think this is just small-company think,
then look at 3M, H-P, etc. 


>or more out of date.  But you can make significant process changes
>within a few months.  In terms of control theory, you have a process
>with a certain time constant, but your observations of that process have
>a significantly longer time constant.  By attempting to use those
>observations to control the process, you are likely to introduce
>oscillatory or chaotic behavior.
>
> -- Speaking strictly for myself,
> --   Lee Derbenwick, AT&T Bell Laboratories, Warren, NJ
> --   lfd@cbnewsm.ATT.COM  or  <wherever>!att!cbnewsm!lfd


Continuous quality improvement, Lee.  Say it like a mantra,
or we'll all be learning to speak Japanese Real Soon Now. :-}

Companies are NOT like market systems, in which changes to the
process (like market intrusion) cause increases in the oscillations.
This may be true for the Federal Reserve Board, but it falls apart
for firms.  The reason:  communication of information can be
sped up (the velocity of price information, as it were).

Besides, what you are advocating is to just leave well enough
alone.  It appeals to my libertarian nature, Lee, but the
manager in me sees disaster.  And hey, what DO you advocate?

_______________________________________________________________________
Alan R. Weiss                           TIVOLI Systems, Inc.
E-mail: alan@tivoli.com                 6034 West Courtyard Drive,
E-mail: alan@whitney.tivoli.com	        Suite 210
Voice : (512) 794-9070                  Austin, Texas USA  78730
Fax   : (512) 794-0623
_______________________________________________________________________

cole@farmhand.rtp.dg.com (Bill Cole) (04/20/91)

|> > 1. What's a defect?  Is it a mis-spelling?  An annoying placement of a 
|> > message?  A mis-mapped message?  Is it a corruption of a database?  Is it
|> > a crashed system?  The defect-is-a-defect-is-a-defect mentality doesn't
|> > reflect the way we view products -- software or not.
|> 
|> A defect is something that matters, and matters in a way that is not beneficial
|> to you.  It is something that, if removed, fixed or avoided, would avoid
|> problems or bring benefits.  This is very subjective--does a mis-spelling
|> matter?  If it matters it is a defect.  If it doesn't it isn't.  Whose
|> opinion counts?  Obviously--the people who get to make the decisions.  It
|> could be you, it could be a senior manager, it could be a customer.  It
|> could be all of you.

My response was directed at the academic folks who use the term 'defects'
without differentiation.  I agree with you in general.  I'd add that a
defect to one is not necessarily a defect to all; and what was a nit can
become a major malfunction if it comes at the end of a chain of problems
or other defects.

|> 
|> > 2. How do we measure productivity in rev n+1 software?  That is, we
|> > seldom deal with clean-sheet projects; we spend most of our lives refining
|> > what we did before, maybe adding features or functionality.  But we don't
|> > get to work on completely new projects for the vast majority of our
|> > professional lives.  So how do you measure productivity/defects in that
|> > environment?
|> 
|> Go back to what matters to the subjective decision makers above.

I don't understand your response.  The two don't seem to map together.

|> 
|> > 3. Who cares what the lines-per-timeframe is? or the defects-per-line
|> > if the software is delivered per the agreed-upon schedule and with the
|> > agreed-upon level of quality (as measured by the known bug count of various
|> > levels of problems)?  Notice that the assumption here is that the schedule
|> > was 'reasonable' and that the quality level was 'rational' -- and that the
|> > folks doing the developing agreed to all this.
|> 
|> The people who care what the lines-per-timeframe is are the people who want
|> to be able to do something about it, or the people who need to assure
|> themselves that competitors will not be any more able to do something
|> about it than they are.  As a manager, I see 2 lines an hour and I know
|> that there are things going on that are chewing up time, besides keying
|> in those two lines.  This does not mean that the people doing the coding
|> are bad people.  Maybe that number is consistent with past history.
|> But WHY does it take so long?  Some of that time is think time.  That's
|> good, but if I gave a person more training, would they solve problems
|> faster?  Might my competitors do just that?  Would a copy of the Collected
|> Algorithms of the ACM help?  Or am I chewing up too much time with meetings?
|> Meetings that perhaps customers might not have?

We should be searching for 'fat' in what we do and continue to question the
process.  Other examples: Will taking the group out for a day/week as a
reward help the process?  Are the right people assigned in the right places?
Could we adjust the hours and get some overlap?

|> What does it mean for a schedule to be reasonable?  Does it not mean mainly
|> that it is in line with historical local experience?

For me, a schedule is reasonable if (1) we're getting to the marketplace in a
rational time with a product that has the majority of the features needed, and
(2) if the folks doing the job agree that they can get it done in that time.
If I impose my managerial will and dictate a schedule which is simply not
within the realm of possibility, then I'll probably fail.  My feel for the work
as a former technician should give me an indication of whether or not the
productivity makes sense.

|> Managers often make judgements with limited information.  They do not
|> know for sure what the competitors will do, or what the market will do.
|> They may agree on the reasonableness of a certain schedule, or quality
|> plan based upon past experience.  If you meet it, then everything is okay,
|> right?  You are a success.  If the quality is too low, or the product
|> too little, too late, then it isn't your fault, right?  But will the fact that
|> the mistakes weren't your fault make you feel better when the project or
|> company or national industry fails?  Will being a faultless victim feel
|> good?

Yup.  In fact, there's an extension to this idea which says that it may not
be a good idea to be the first one in the marketplace; that way you can
learn what's important -- and not important -- and move quickly to cash in
on your competitor's mistakes or deficiencies.

|> In other words, it is not sufficient to do things right, you must also
|> do the right things.  And part of this means doing more than what is being
|> asked of you, but looking over your shoulder at the competition and using
|> your head to understand where improvements are possible by yourself, or 
|> your competitors.

George Patton said that you should never believe that your enemy can't do
the same things you can do.

Sorry to include so much of your reply, but it didn't seem fair not to.
/Bill

lfd@cbnewsm.att.com (Lee Derbenwick) (04/22/91)

In article <599@tivoli.UUCP>, alan@tivoli.UUCP (Alan R. Weiss) writes:
> In article <1991Apr15.221119.6242@cbnewsm.att.com> lfd@cbnewsm.att.com (Lee Derbenwick) writes:
> >But, right now, we know of _no_ metrics that
> >measure some of what we really want to measure.
> 
> You are too vague here, so I'll give you an idea of just *some*
> of the things we are measuring:
> 	
[ long list deleted ]

Yes, you are measuring lots of things.  Do you know what they mean,
or are you measuring them because they are things you can measure?

To pull one out in isolation, it's not clear to me whether you want
to increase or decrease "# of Sev 3 defects found during inspections."
An increase could mean your coding is getting sloppier, or that your
inspections are getting more effective.  (If your coding is getting
enough sloppier, your inspections could even be getting _less_
effective.)  "No change" might mean that your coding and inspections
are both getting better, or it might mean that they're both getting
worse.  And you won't know which till you have enough feedback from
system test and from customers: maybe months or more from now.

Also, with so many things to measure, you need to do _something_ to
condense them down to a relatively few composite metrics to use in
optimizing your process.  You may well be doing this already...
(Indeed, I must assume you are.)

And you have at least one non-metric that you list as a metric:

> 	Total SAVINGS (estimated) Due To Inspections

This is not measurable -- you even note that it is an estimate; at
worst, it's purely subjective; at best it's derived statistically from
other metrics, in which case it's not a metric in its own right.

> Now then, Lee, I fail to see how you could *possibly* say that
> "we" can't measure software, because "we" are doing it every
> single day.  [ ... ]

You are collecting lots of data.  Most of these metrics have ambiguous
interpretations.  The conversion of this data into information is very
much an open area.

> >(In cases where you can accurately characterize your customers' usage
> >patterns, John Musa's work on software reliability gives you ways of
> >estimating MTBF for customers, which is close to what I want.  But
> >with new or significantly changed software, you often have no better
> >than guesses about the usage.)
> 
> Could you please send me some information on John Musa's work?
> Sounds useful.  Right now, we go to customer sites, sit down with
> them, learn their business, and try to learn their work habits.

John D. Musa, Anthony Iannino, Kazuhira Okumoto,
_Software Reliability: Measurement, Prediction, Application_
McGraw-Hill, 1990.

To make best use of it, you have to know the actual statistics of
customer inputs.  (A reasonable approximation will do, but even that
is often not available.)
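
To give the flavor only (this is a caricature of the basic
execution-time model, from memory -- see the book for the real
treatment and for the calendar-time component):

    /* Failure intensity falls off as failures are experienced and fixed:
     *     lambda = lambda0 * (1 - mu / nu0)
     * where mu is failures experienced so far and nu0 is the total
     * expected over the software's life.  MTBF is roughly 1/lambda. */
    #include <stdio.h>

    int main(void)
    {
        double lambda0 = 10.0;    /* initial failures per CPU-hour (made up) */
        double nu0     = 200.0;   /* total expected failures (made up)       */
        double mu;

        for (mu = 0.0; mu < nu0; mu += 50.0) {
            double lambda = lambda0 * (1.0 - mu / nu0);
            printf("after %3.0f failures: intensity %5.2f /CPU-hr, MTBF %6.2f CPU-hr\n",
                   mu, lambda, 1.0 / lambda);
        }
        return 0;
    }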

> Besides, what you are advocating is to just leave well enough
> alone.  It appeals to my libertarian nature, Lee, but the
> manager in me sees disaster.  And hey, what DO you advocate?

I advocate working on metrics, but I recognize that at the moment we
are more capable of collecting large amounts of data than we are at
understanding what those data really mean.  And I _don't_ advocate
waiting until we have fully reliable metrics before making changes.  I
advocate using an out-of-fashion concept called engineering judgment
to work on continuous improvement _now_, making use of what we can
measure, but not treating it as a religion (or as the science it isn't,
yet), while we learn how to measure more meaningfully.

 -- Speaking strictly for myself,
 --   Lee Derbenwick, AT&T Bell Laboratories, Warren, NJ
 --   lfd@cbnewsm.ATT.COM  or  <wherever>!att!cbnewsm!lfd

mcgregor@hemlock.Atherton.COM (Scott McGregor) (04/23/91)

In article <1991Apr19.203033.2151@dg-rtp.dg.com>,
cole@farmhand.rtp.dg.com (Bill Cole) writes:

|> > 2. How do we measure productivity in rev n+1 software?  That is, we
|> > seldom deal with clean-sheet projects; we spend most of our lives refining
|> > what we did before, maybe adding features or functionality.  But we don't
|> > get to work on completely new projects for the vast majority of our
|> > professional lives.  So how do you measure productivity/defects in that
|> > environment?

(Bill quoting me:)
|> Go back to what matters to the subjective decision makers above.

>I don't understand your response.  The two don't seem to map together.

What I was trying to get at is that the way you measure productivity and
defects in a constant revision process is that you measure "things that
mattered" per unit time.  Things that mattered in a bad way measure
defects per unit time.  Things that mattered in a good way (e.g.
functionality delivered) per unit time measure productivity.

One of the problems is that "things that matter in a good way" are often
only fuzzily defined.  Moreover, two things of equal value may not
require equal effort.  That's unfortunate, and you do what you can with
such fuzzy results.  Obviously, high levels of precision are not possible
in this case--but high levels are not necessarily required.  In my
experience the differences can vary between (by analogy) a football field and
a ten-yard mark.  You don't have to measure with a micrometer to find
which size you are closer to.  A car-length isn't a perfect yardstick,
but it might do in a pinch when a car is close at hand.  Similarly for
metrics.  Functionality is not one-to-one with lines of code, but for similar
people doing the same sort of changes to an existing application, lines
of code may be more or less associated with functionality delivered.
You'd be able to tell if you had more like a football field's difference
or more like a ten-yard-mark difference.  It would be stupid to quibble
about a few percent difference in productivity by that measure--not simply
because the association between functionality and code may not be that
close, but also because the measures of functionality are not that tight
anyway.  It is as silly to argue about a few inches when using the
car-length measure.

This doesn't invalidate the metric.  It qualifies it.  For all the science
and engineering background, there are many problems where we just don't
need ultimate precision to have value.  A  civil engineer might well
measure a large jobs in numbers of cement truck loads, or dump trucks of
fill, or tons of steel rods, none of them absolutely precise measures,
but each fully sufficient for charaterizing the work enough for rule
of thumb metrics.  Why should it be any different in software development?

Scott McGregor                Don't expect too much of numbers, but don't
Atherton Technology           ignore the little you do get.
mcgregor@atherton.com

cole@farmhand.rtp.dg.com (Bill Cole) (04/24/91)

|> > Measurement is a useful approach in some circumstances.  However, if
|> > measurements don't "tell you what to do" about your process they can
|> > be misleading at best.
|> 
|> Measurements are unlikely to "tell you what to do."  Most "misleading"
|> is really misinterpretation; i.e., the failure to do the proper analysis, 
|> and then use the data correctly, perhaps to improve the measurements.

The danger is that we measure what's easy to measure and neglect the 
difficult things simply because they are difficult.  Or we choose to measure
based on what we expect the conclusions to be (i.e., a method used widely in
the hard sciences).
  
|> > 1.            .... The defect-is-a-defect-is-a-defect mentality doesn't
|> > reflect the way we view products -- software or not.
|> 
|> Who's "we?"  (I believe defects are usually classified by level of severity.) 

The comment was aimed at the academics who use the word 'defect' very loosely;
you and I are in agreement -- as are a mess o' people who responded.

|> > 3. Who cares what the lines-per-timeframe is? or the defects-per-line
|> > if the software is delivered per the agreed-upon schedule and with the
|> > agreed-upon level of quality ...
|> 
|> Who cares?  The banks, for one.  If you have to hire twice as many people
|> to make that agreed upon schedule, and your bottom line starts showing
|> up in red, you won't be around to sell another high-quality product.

This is a point of view I had not considered, but it is certainly valid.  My
'out' is the 'reasonable' phrase.  We make assumptions as managers that our
folks can produce software at some sustainable rate with occasional ventures
into hyper-sustainable rates, and we gauge that productivity on experience.  The
best way I've found to be 'reasonable' is to involve the folks building the
software in the process so that the schedule is in some large measure their
doing and not something imposed purely by 'management'.  But I do agree with
your point.

/Bill
Disclaimer on file...........

alan@tivoli.UUCP (Alan R. Weiss) (04/24/91)

In article <1991Apr21.181153.17062@cbnewsm.att.com> lfd@cbnewsm.att.com (Lee Derbenwick) writes:

>[ long list deleted ]

I would have liked to have had your feedback on the long list I
posted.

>Yes, you are measuring lots of things.  Do you know what they mean,
>or are you measuring them because they are things you can measure?

At this stage in our development as a company we are measuring for
a number of reasons, including (but not exclusively) the need to
establish some baselines.  Yes, I know what they mean.  There are
many more things that we do NOT measure, too long to post here.

>To pull one out in isolation, it's not clear to me whether you want
>to increase or decrease "# of Sev 3 defects found during inspections."
>An increase could mean your coding is getting sloppier, or that your
>inspections are getting more effective.  (If your coding is getting
>enough sloppier, your inspections could even be getting _less_
>effective.)  "No change" might mean that your coding and inspections
>are both getting better, or it might mean that they're both getting
>worse.  And you won't know which till you have enough feedback from
>system test and from customers: maybe months or more from now.

Correct.  Right now, we are measuring to build up our empirical
data and to start to form our heuristics.  In point of fact, though,
defects logged per minute tends to look like a bell curve:  you
start by not getting very many, then you get good at inspections,
then the number of actual defects decreases as the rework is
accomplished PER deliverable.  You are correct that these measurements
will be VERY useful in the NEXT release of this product, but since
we are on a VERY rapid development pace this will happen soon anyway. 

>Also, with so many things to measure, you need to do _something_ to
>condense them down to a relatively few composite metrics to use in
>optimizing your process.  You may well be doing this already...
>(Indeed, I must assume you are.)

Yes indeed, eventually.  Right now we're still learning which
levers and dials (engine room metaphor) are important in our
environment.

>And you have at least one non-metric that you list as a metric:
>
>> 	Total SAVINGS (estimated) Due To Inspections
>
>This is not measurable -- you even note that it is an estimate; at
>worst, it's purely subjective; at best it's derived statistically from
>other metrics, in which case it's not a metric in its own right.


First, the fact that it is an estimate does not mean it is not
measurable, only that it has a relevant range.  It IS derived
statistically both from other metrics AND using other estimates
and assumptions.  We are performing some sensitivity analysis
now to encourage a step-wise refinement (specifically on
Cost-to-Fix, Cost-Per-Defect-If-Escaped, etc).  Some of this
is proprietary technology (sigh ... setenv COP-OUT).  Mostly
we don't want our competitors to know our productivity rates
(only our customers!).


>You are collecting lots of data.  Most of these metrics have ambiguous
>interpretations.  The conversion of this data into information is very
>much an open area.

Yes, and as I stated we don't beat people up over them.  We encourage
the developers to look at them, throw spears at them, and help
refine them.  They are posted on a big whiteboard in the kitchen,
with lots of space for flames and comments.  By doing so, we help
focus their thoughts for some amount of time on Quality.  All
quality studies I've seen suggest that this is A Good Thing.


>John D. Musa, Anthony Iannino, Kazuhira Okumoto,
>_Software Reliability: Measurement, Prediction, Application_
>McGraw-Hill, 1990.

Thank you!

>To make best use of it, you have to know the actual statistics of
>customer inputs.  (A reasonable approximation will do, but even that
>is often not available.)

Yes, we do this, too.  We have been going to customer sites,
soliciting feedback, filming (well, taping) customer responses
to our graphical user interface environment (kinda like the
Mazda Kansei idea), etc.  

We're not wiring customers up to feedback machines yet (EEG, EKG,
galvanic skin response, etc.) ... dunno, think this is a good idea? :-)
Kinda violates their privacy, eh?

>I advocate working on metrics, but I recognize that at the moment we
>are more capable of collecting large amounts of data than we are at
>understanding what those data really mean.  And I _don't_ advocate
>waiting until we have fully reliable metrics before making changes.  I
>advocate using an out-of-fashion concept called engineering judgment
>to work on continuous improvement _now_, making use of what we can
>measure, but not treating it as a religion (or as the science it isn't,
>yet), while we learn how to measure more meaningfully.
>
> -- Speaking strictly for myself,
> --   Lee Derbenwick, AT&T Bell Laboratories, Warren, NJ
> --   lfd@cbnewsm.ATT.COM  or  <wherever>!att!cbnewsm!lfd


OK, I can dig that.  So, in order to reach a stage where the
measurements are more meaningful, we are actually doing The
Right Thing by gathering the data and stepwise refining it into
nuggets of information.  We use LOTS of engineering judgement:
this IS a startup company, after all!



_______________________________________________________________________
Alan R. Weiss                           TIVOLI Systems, Inc.
E-mail: alan@tivoli.com                 6034 West Courtyard Drive,
E-mail: alan@whitney.tivoli.com	        Suite 210
Voice : (512) 794-9070                  Austin, Texas USA  78730
Fax   : (512) 794-0623
_______________________________________________________________________

bwf@cbnewsc.att.com (bernard.w.fecht) (04/24/91)

In article <1991Apr21.181153.17062@cbnewsm.att.com> lfd@cbnewsm.att.com (Lee Derbenwick) writes:
>I advocate working on metrics, but I recognize that at the moment we
>are more capable of collecting large amounts of data than we are at
>understanding what those data really mean.  And I _don't_ advocate
>waiting until we have fully reliable metrics before making changes.  I
>advocate using an out-of-fashion concept called engineering judgment
>to work on continuous improvement _now_, making use of what we can
>measure, but not treating it as a religion (or as the science it isn't,
>yet), while we learn how to measure more meaningfully.
>

Hmmm (forgive the possibly obvious, but enthusiastic revelation here),
we should make an effort to decouple process improvement and process
measurement -- "black-box metrics" in a sense.

Too often, the "process people" invent their own metrics and know too
much about them.  Its very tempting then, in errant or not, to chase
the metrics instead of the process and things become "metrics improvement"
exercise.