hlavaty@CRVAX.Sri.Com (05/22/91)
I guess the best way to illustrate my position on metrics (previously discussed in several of the Re: bridge buildings... threads) is to use an example. Kers. Caravan: sorry if my original response to your post was "flamelike" - it was not intended to be. From your response, I think you missed my point entirely. Anyway...

Several years ago, my job as a customer representative was to ensure that the software developments currently underway were progressing smoothly (i.e. the customer's money was being spent correctly). By progressing smoothly I mean the advertised schedules were indeed accurate, and no "short-term" gains were being taken in place of "long-term" overall health (such as skimping on unit testing to keep to the schedule, for now). This was a cost-plus contract (which means the customer pays for overruns), so the customer was very interested in looking over the developer's shoulder (bitter experience in the past had taught them this hard lesson).

I decided that we would have a monthly status meeting where I would meet with the managers of the different software projects and talk with each of them for about an hour on how their efforts were going. At the first meeting, all of the discussions were what I call qualitative (and yes, I have confused qualitative with subjective. I welcome discussion on how they're different). Each manager showed his schedule chart (the schedule that they themselves managed to - it was not my format; I wanted the *real* story), and proceeded to discuss why their development was progressing just fine. The only thing I had to go on to agree with them was their qualitative opinion (and since it wasn't necessarily my opinion, I considered it subjective - meaning it was their opinion, and I didn't see any evidence other than their experience/feelings/motivations that led me to believe it to be true). Now, after this meeting I had to go to the customer and tell them what I thought of the development efforts.
I quickly realized I had a problem, since I was very uncomfortable telling them what I thought; it was really only what the software managers thought (and the customer already knew that!). What to do... What to do... I seized on metrics as the answer to my problems. If I could get the managers to show me various metric data, then *I* could interpret it my own way and form my own opinions! This is what I was referring to when I previously posted about a "common reference point". Before, since I was not part of the software development, I had no common reference point. I had to accept what the managers were telling me at face value (and my customer's experience had shown this to be dangerous in a cost-plus contract). Once we were using metrics, at least we would have some numbers to discuss.

So, here is the set of metrics I got them to start presenting:

1) A graph of unit test cases complete over time. Both projected and completed test cases were plotted.
2) The same graph for integration test cases.
3) The number of errors detected, fixed, and still open each month.
4) The location of these errors.
5) SLOC coded, integrated, and projected each month.

Why did I pick these? I acquired information from the industry that talked about what these metrics had meant on OTHER programs done by other companies. I therefore had other data to compare the contractor's results to, and could determine for myself whether they squared with what the contractors were saying (things were on schedule and fine). None of these metrics were in use by the software managers at the time.

Well, several interesting things happened... After several months, the unit/integration test case graphs clearly showed that the existing schedule for one of the developments was not accurate. The rate at which the development was accomplishing these test cases was not matching the projected rate. Could this problem have been caught without this metric, say by an experienced manager following his own qualitative opinion? Yes, it could have.
But the manager in question *didn't* catch it. He didn't realize his problem until confronted with the question "Since your demonstrated rate of testing is not matching your original projection, how are you going to finish on time?"

A big advantage gained using metrics is that it gave me a window into the development process through which I could look to determine if things were in control. With the metric charts, now my questions would be "Gee, you had a large error spike last month. What caused that?" or "You didn't get your normal amount of unit testing in last month. Why was that?" Suddenly we were discussing *real* issues. The answers to these questions invariably led to the identification of specific problems the manager was having (not enough computers, someone quit unexpectedly, and the occasional "that's a problem we don't know how to solve just yet"). Now I had what I needed - a window into the process that communicated to me enough information to ask the right questions.

After a while, the chief software manager at the company and the program manager started using these charts themselves for their own purposes. They realized that they couldn't afford to miss this information!

Once you have found a metric that indicates future success, by measuring it you are encouraging the development to conform to practices that have proven successful in the past. You also have a common framework from which to ask questions, and you can compare your development to others that have used the same metric.

I hope this illustrates my point more concretely.

Jim Hlavaty
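A minimal sketch of the rate check that metric 1) supports, in Python. The month-by-month figures below are invented for illustration; only the technique - comparing the demonstrated test-completion rate against the projection to estimate the real finish date - comes from the discussion above.

```python
# Hypothetical illustration of metric 1): cumulative unit test cases,
# projected vs. completed, by month.  All figures are invented.
projected = [10, 25, 45, 70, 100]   # plan: all cases done by month 5
completed = [8, 18, 30, 41, 52]     # what actually got finished

total = projected[-1]
plan_months = len(projected)

# Demonstrated rate: average test cases completed per elapsed month.
rate = completed[-1] / len(completed)

# At that rate, in which month does the remaining work actually finish?
est_finish = len(completed) + (total - completed[-1]) / rate

if est_finish > plan_months:
    print(f"Schedule slip: {rate:.1f} cases/month implies finishing in "
          f"month {est_finish:.1f}, not month {plan_months} as planned.")
```

This is exactly the question the manager was confronted with: the demonstrated rate, extrapolated, contradicts the published schedule.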
kers@hplb.hpl.hp.com (Chris Dollin) (05/23/91)
hlavaty@CRVAX.Sri.Com says:

> ... Kers. Caravan: sorry if my original response to your post was
> "flamelike" - it was not intended to be. From your response, I think
> you missed my point entirely.

Mine was not the original post - you may have been misled by my comment about ``peculiar''. And I don't think I missed your point. As I stated, my response was not intended to criticise the use of *metrics* - it was to point out (what I regarded as) weak points in your *case*. Unless you can address these issues, you are vulnerable to attack from the anti-metrics brigade. Anecdotes are not good enough as evidence.

> ... (and yes, I have confused qualitative with subjective. I welcome
> discussion on how they're different).

Quantitative ~= with numbers. Qualitative ~= without numbers. Subjective ~= private, local, person-dependent. Objective ~= public, ``real''. [I'm *not* attempting to give precise definitions here, and I *know* subjective doesn't equate to unreal. I'm attempting to compare and contrast.]

``I have a strength 5 headache'' is quantitative and subjective. ``My briefcase is heavier than my mug'' is qualitative and objective. ``McKillip writes better fantasy than Anthony'' is qualitative and subjective [*1]. ``This window is 80 characters wide in font hp8.6x13'' is quantitative and objective.

> I seized on metrics as the answer to my problems. If I could get the
> managers to show me various metric data, then *I* could interpret it my
> own way and form my own opinions!

But why should your opinions be any better than theirs?

> A big advantage gained using metrics is that it gave me a window into
> the development process through which I could look to determine if
> things were in control. With the metric charts, now my questions would
> be "Gee, you had a large error spike last month. What caused that?" or
> "You didn't get your normal amount of unit testing in last month. Why
> was that?" Suddenly we were discussing *real* issues.
The only justification you have presented for the act of faith that a ``large error spike'' is a ``real'' issue is industrial studies data [omitted above]. Go for it! *That* is the thing that makes metrics important - *data exist that show that they work*. It's not because they're ``quantitative'' (but being ``objective'' helps).

> The answers to these questions invariably led to the identification of
> specific problems the manager was having (not enough computers, someone
> quit unexpectedly, and the occasional "that's a problem we don't know
> how to solve just yet"). Now I had what I needed - a window into the
> process that communicated to me enough information to ask the right
> questions.

Great. Again, it's not the *numbers*, it's what they *mean*, and the existence of a communications process.

> After a while, the chief software manager at the company and the program
> manager started using these charts themselves for their own purposes.
> They realized that they couldn't afford to miss this information!

And that is data worth having.

> I hope this illustrates my point more concretely.

And I hope my remarks help you present your case better when you get the next bunch of people to convince. Numbers aren't real, they're complex - use with caution; don't place the numbers above their meaning.

[*1] Some people don't believe the statement, some people don't think it's subjective, but I'd be surprised if anyone thought it was quantitative.

--
Regards, Kers.   | "You're better off not dreaming of the things to come;
Caravan:         | Dreams are always ending far too soon."
jls@netcom.COM (Jim Showalter) (05/24/91)
>> I seized on metrics as the answer to my problems. If I could get the
>> managers to show me various metric data, then *I* could interpret it my
>> own way and form my own opinions!

> But why should your opinions be any better than theirs?

His weren't necessarily better. However, what he determined was that the managers didn't have any data to back up their own opinions. Thus, at the outset of the exercise, prior to the introduction of any metrics, all opinions were of equivalent validity/nonvalidity. It was the introduction of the metrics--and ONLY that--that provided a framework in which opinions could be tested against real measures. Until the metrics were introduced, "Yeah, we're on schedule" was all the insight into the project anyone had. This went away very quickly when it became possible to say "Well, you say we're on schedule, but according to this chart we're not."

--
**************** JIM SHOWALTER, jls@netcom.com, (408) 243-0630 ****************
*Proven solutions to software problems. Consulting and training on all aspects*
*of software development. Management/process/methodology. Architecture/design/*
*reuse. Quality/productivity. Risk reduction. EFFECTIVE OO usage. Ada/C++.    *
kers@hplb.hpl.hp.com (Chris Dollin) (05/24/91)
Jim Showalter makes a remark on one of my responses to ... duh, lost their name:

[a. n. other]
>> I seized on metrics as the answer to my problems. If I could get the
>> managers to show me various metric data, then *I* could interpret it
>> my own way and form my own opinions!

[me]
> But why should your opinions be any better than theirs?

[Jim]
> His weren't necessarily better. However, what he determined was that the
> managers didn't have any data to back up their own opinions. Thus, at
> the outset of the exercise, prior to the introduction of any metrics,
> all opinions were of equivalent validity/nonvalidity. It was the
> introduction of the metrics--and ONLY that--that provided a framework in
> which opinions could be tested against real measures.

The point that worries me is this assertion that the metrics are ``real measures''. [I'm happy to believe that they are; it's just that I think they need justification. As earlier posts of mine have said, just because they're *numbers* doesn't make them *meaningful*.]

Suppose I proposed the following metrics:

* the number of cups (or cans) of beverage consumed per developer per day
* number of characters of output generated (on paper, that is) per day
* hours spent on aerobics
* number of relevant papers photocopied per week
* average number of windows present on the screen at once

Are these ``real measures''? A priori, they seem as real as (say) defect density, or uncommented-lines-of-source-code, or compiles-needed-before-no-syntax-errors. You may question the relevance of aerobics [*1]. I might argue ``a fit mind in a fit body''. In any case, one of the points of the original poster was that you could shotgun the team with a variety of metrics and keep those that ``worked'' (whatever that means). If we don't understand why they work (for example, suppose that aerobic hours turned out to be a good predictor for project success), then we should say so.
If their effectiveness has been determined by purely empirical means (i.e. we have no underlying theory), we should say so. What we should *not* do is call metrics ``real measures'' without pointing to a justification.

Let me try and make my point clear. I want metrics to be a Good Thing; I want to be able to make estimates, plan with them, and measure to see if the plan is being kept to. [I'm not too worried about using numbers to do it, although that's traditional; it's easy to attribute to numbers a precision, accuracy, and meaning that they do not in fact possess.] But I think we must be clear as to *why* we think metrics are ``real'' and *why* we think they work; and we must not allow our enthusiasm to blind us to perfectly sensible questions.

What metrics do we have on the effectiveness of posting?

[*1] Indeed, many do, even outside this context.

--
Regards, Kers.   | "You're better off not dreaming of the things to come;
Caravan:         | Dreams are always ending far too soon."
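The ``shotgun'' screening described above can be made concrete with a minimal Python sketch. All per-project figures and the 0.7 cutoff are invented for illustration; the point is that a metric can pass a purely empirical screen (here, aerobics hours) with no theory at all of why it works.

```python
# Hypothetical sketch: screen candidate metrics by their correlation with
# a project-success score across past projects.  All figures are invented.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# One value per past project for each candidate metric,
# plus each project's overall success score.
candidates = {
    "defect_density":    [9.1, 7.4, 6.0, 4.2, 3.1],
    "aerobics_hours":    [1.0, 3.0, 2.0, 5.0, 4.0],
    "beverages_per_day": [4.0, 4.1, 3.9, 4.0, 4.2],
}
success = [0.2, 0.4, 0.5, 0.8, 0.9]

# Keep whatever correlates strongly -- with no theory of why it does.
kept = {name: round(pearson(vals, success), 2)
        for name, vals in candidates.items()
        if abs(pearson(vals, success)) > 0.7}
print(kept)
```

On these invented numbers, both defect density (negatively) and aerobics hours (positively) pass the screen, while beverage consumption does not - precisely the situation in which we ought to say ``empirical only; no underlying theory''.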
jls@netcom.COM (Jim Showalter) (05/25/91)
> The point that worries me is this assertion that the metrics are ``real
> measures''. [I'm happy to believe that they are; it's just that I think
> they need justification. As earlier posts of mine have said, just
> because they're *numbers* doesn't make them *meaningful*.]

We are veering off into semantics and the philosophy of causality. Define what you mean by "meaningful". In a previous post I pointed out that if rutabaga consumption per developer could be shown to be a good predictor of project success, then it would be a valid metric with which to make such predictions. The counterargument is absurd: "Yeah, there's a 100% correlation, but we won't use that metric anyway because it's SILLY."

> Are these ``real measures''? A priori, they seem as real as (say) defect
> density, or uncommented-lines-of-source-code, or
> compiles-needed-before-no-syntax-errors.

If they can be shown to have a correlation with project success, then, yes, they ARE real measures. What else would you call them?

> In any case, one of the points of the original poster was that you could
> shotgun the team with a variety of metrics and keep those that
> ``worked'' (whatever that means).

What "worked" means is that you can demonstrate from historical data that Metric #7 is an accurate predictor of project success.

> If we don't understand why they work (for example, suppose that aerobic
> hours turned out to be a good predictor for project success), then we
> should say so. If their effectiveness has been determined by purely
> empirical means (i.e. we have no underlying theory), we should say so.
> What we should *not* do is call metrics ``real measures'' without
> pointing to a justification.

Smoking is a great predictive metric for lung cancer, even though the precise mechanisms for the entire chain of events leading to the cancer have yet to be fully elucidated. Would you reject smoking as an indicator of lung cancer because you haven't yet proven causality?
That is the tack taken by the tobacco lobby, but who wants to be on their team? Lack of dietary fiber is a good predictor of colon cancer, and yet we are much further from a precise explanation of the mechanisms whereby eating nothing but Ho-hos ruins one's bowels. Would you then decide to eat nothing but Ho-hos, just because the precise mechanism is not yet explained?

> But I think we must be clear as to *why* we think metrics are ``real''
> and *why* we think they work;

We think they're real because projects in trouble tend not to have any metrics, and projects on schedule tend to use metrics. As for WHY we think they work, this is just the causality issue again--I'm personally not particularly concerned with why the metrics work: I'm concerned with them working, and that's about it.

Let me make my position clear: I'd practice VOODOO if someone could provide me with evidence that projects using it are more successful than projects not using it. And I would do this without spending ten seconds trying to figure out WHY it worked.

--
**************** JIM SHOWALTER, jls@netcom.com, (408) 243-0630 ****************
*Proven solutions to software problems. Consulting and training on all aspects*
*of software development. Management/process/methodology. Architecture/design/*
*reuse. Quality/productivity. Risk reduction. EFFECTIVE OO usage. Ada/C++.    *
donm@margot.Eng.Sun.COM (Don Miller) (05/25/91)
In article <KERS.91May24091125@cdollin.hpl.hp.com> kers@hplb.hpl.hp.com (Chris Dollin) writes (in jest, to make a point):

> Suppose I proposed the following metrics:
>
> * the number of cups (or cans) of beverage consumed per developer per day
> * number of characters of output generated (on paper, that is) per day
> * hours spent on aerobics
> * number of relevant papers photocopied per week
> * average number of windows present on the screen at once

This reminds me of a funny example of the misapplication of a metric. In the movie "Monty Python and the Holy Grail", townspeople attempt to apply metrics to determine whether a woman is a witch:

1. Witches are made of wood.
2. Wood floats.
3. Ducks float.
4. Therefore, if she weighs as much as a duck, she must be a witch.

The townspeople proceed to weigh the woman on an elaborate scale counterweighted with a duck. Of course, this scale indicates that her weight and the duck's are indeed the same. Thus, she is unarguably a witch.

I guess this means that regardless of how appropriate a metric appears to be, someone needs to be able to tell when it's misapplied or faulty. It also means that if you have the power and the will, metrics can be powerful tools for manipulation - especially in an unenlightened environment. However, in our enlightened environments the above scenario would never happen. We all know that witches are made of stone. (don't we? :-)

--
Don Miller                   | #include <std.disclaimer>
Software Quality Engineering | #define flame_retardant \
Sun Microsystems, Inc.       |     "I know you are but what am I?"
donm@eng.sun.com             |
orville@weyrich.UUCP (Orville R. Weyrich) (05/25/91)
In article <1991May24.201741.14138@netcom.COM> jls@netcom.COM (Jim Showalter) writes:

>> The point that worries me is this assertion that the metrics are
>> ``real measures''. [I'm happy to believe that they are; it's just that
>> I think they need justification. As earlier posts of mine have said,
>> just because they're *numbers* doesn't make them *meaningful*.]
>
> We are veering off into semantics and the philosophy of causality.
> Define what you mean by "meaningful". In a previous post I pointed out
> that if rutabaga consumption per developer could be shown to be a good
> predictor of project success, then it would be a valid metric with which
> to make such predictions. The counterargument is absurd: "Yeah, there's
> a 100% correlation, but we won't use that metric anyway because it's
> SILLY."
>
> If they can be shown to have a correlation with project success, then,
> yes, they ARE real measures. What else would you call them?

Yes, if there is a demonstrated correlation, then they can be used as a predictor of success. The danger is that the conclusion might be drawn that, if the project seems to be slipping, management should take steps to increase the rutabaga consumption. This would only be valid if there were a demonstration of cause and effect. Observation of a cause/effect relationship implies a correlation; the observation of a correlation does not necessarily imply a cause/effect relationship.

> We think they're real because projects in trouble tend not to have any
> metrics, and projects on schedule tend to use metrics. As for WHY we
> think they work, this is just the causality issue again--I'm personally
> not particularly concerned with why the metrics work: I'm concerned
> with them working, and that's about it.

Just to be the devil's advocate, can we assume that:

1) Projects in which the managers feel that their subordinates are not too bright, but are doing the best that they can, and cannot be induced to improve, may be in trouble.
2) Projects with such managers do not use metrics, because the managers feel it is no use -- you can't induce the programmers to improve.

In this situation, it seems to me that the project will be in trouble, and will not be using metrics. Inducing the manager to use metrics will not solve the problem. Inducing the manager to change his/her attitude will.

> Let me make my position clear: I'd practice VOODOO if someone could
> provide me with evidence that projects using it are more successful
> than projects not using it. And I would do this without spending
> ten seconds trying to figure out WHY it worked.

Be careful here -- it may be that programmers who enjoy rutabagas come from a part of the world where VOODOO is prevalent, and that rutabagas contain some particular nutrient which stimulates programmers to do a particularly good job. If you simply introduce VOODOO into your failing project, you are not likely to get the desired effect, and you will have wasted time and effort that would have been better spent encouraging your project team to eat more rutabagas. This is why the FDA is so down on "quack" medicines.

I would insist on evidence that using VOODOO *causes* projects to be more successful before committing to using VOODOO [unless you have a boss yelling "I don't care what you do, just do SOMETHING so that I can tell the Board of Directors that the problem has been taken care of." :-)].

I agree that it is not so important to determine how the cause induces the effect. But even here, it is useful to ponder. If VOODOO is demonstrated to cause better projects, do you need the whole shebang, or is it sufficient to toss a dead chicken at the programmers periodically? It certainly is cheaper to pick up a few dead chickens from beside the road than it is to import a witch-doctor [you think that MDs are overpaid? You should check out what even a mediocre witch-doctor gets paid! :-)].

--------------------------------------   ******************************
Orville R. Weyrich, Jr., Ph.D.           Certified Systems Professional
Internet: orville%weyrich@uunet.uu.net   Weyrich Computer Consulting
Voice: (602) 391-0821                    POB 5782, Scottsdale, AZ 85261
Fax: (602) 391-0023                      (Yes! I'm available)
--------------------------------------   ******************************
adam@visix.com (05/26/91)
In article <1991May24.201741.14138@netcom.COM>, jls@netcom.COM (Jim Showalter) writes:

> We think they're real because projects in trouble tend not to have any
> metrics, and projects on schedule tend to use metrics. As for WHY we
> think they work, this is just the causality issue again--I'm personally
> not particularly concerned with why the metrics work: I'm concerned
> with them working, and that's about it.

This argument in favor of metrics shows precisely what is wrong with non-causal correlations. This is a familiar problem in social science. For example, students at private schools generally perform better. Students in wealthier neighborhoods do better. Asian students do better. Therefore, to improve your child's performance, send them to private school, or move into a wealthy neighborhood, or make them Asian. This is faulty reasoning. The greatest cause of good performance in school is parents who take an interest in their child's education. If you care enough to spend extra money or move to a better neighborhood for the sake of your child, then your child has the advantage of supportive parents. The actual act of spending money or moving is irrelevant next to the will to do so. If you send your child to private school in order to get rid of him, then he's in trouble.

Let me restate the analogy. When a project manager cares enough about his project to try metrics for improving quality, then regardless of the metrics he chooses, the project has an advantage (namely, the manager himself). When a project manager is sick to death of his project and wants to use metrics to save himself some work, then the project is likely to fail.

When someone cares about you and your work, you will do better. When someone cares only about the number of lines of code you write, you will do worse.

Adam
ftower@ncar.ucar.EDU (Francis Tower) (05/27/91)
Supportive managerial involvement is vital, but as a project becomes really huge the managerial span of control places some managers out of direct contact with the worker bees. Then metrics appear, because they give a quantitative (if not always accurate) feel for the project. The key point in my experience is that upper management will almost always demand to see metrics. Since humans have a tendency to avoid looking bad on these metrics, programming professionals will 'game' their approach to the metrics.

The best move my firm made was when the new CEO decided that each division would create a list of what was really important to measure, based on its unique mission and its customers. The parent company approved the division lists, and the divisions were expected to implement their chosen metrics, BUT! the divisions were not to report any numbers to the parent or to other divisions. Programmers still 'gamed' the metrics, but the gaming was in a direction which promoted better software design and development. Such changes were of course fought, especially by managers who didn't like change or the sense of losing touch. How could they compare one division against another, was their lament. The Boss's answer was: you can't.

Symptomatic relief? Our chief scientist kept cranking parameters until a given piece of code finally gave reasonable results. Later, the error in the code was found, which negated the heroic fiddling.