rcd@ico.isc.com (Dick Dunn) (05/23/91)
jls@netcom.COM (Jim Showalter) writes a bunch of generally on-target stuff,
but he uses one example that should caution us in our quest for decent
metrics...

> ...Think of metrics like the SAT college admissions test. It doesn't
> purport to measure intelligence, it just claims to be a reasonably accurate
> predictor of success in college. The evidence supports this claim: SOMETHING
> that has some bearing on success in college is being measured by the SAT's,
> since those with lower scores tend to do worse in college...

OK, I don't argue with the SAT's success there, but consider: What is
"success in college"? Generally it's a matter of satisfying another set of
metrics which purport to be related to the acquisition of knowledge and
skills. HOWEVER, these metrics (grades, oversimplifying a bit) are also
indirect measures. So, for the SAT to work, all it has to do is measure a
student's ability to perform well according to the college metrics; it may
not mean squat about what a student is actually going to get out of college.

In particular, both the SAT and college grades often reflect one's ability
to take multiple-choice tests. (One of my favorite examples is that I've
done well on a multiple-choice test in basic French, in spite of never
having learned the language...hell, I can barely read a Bordeaux label.)

Now, note that this doesn't make the SAT inaccurate--it DOES predict what
it's supposed to predict (for whatever reason), just as Jim said. But we
have to be careful that the metrics, particularly if two levels deep,
predict something useful in the end result.
-- 
Dick Dunn     rcd@ico.isc.com -or- ico!rcd      Boulder, CO   (303)449-2870
   ...Simpler is better.
jls@netcom.COM (Jim Showalter) (05/23/91)
>Now, note that this doesn't make the SAT inaccurate--it DOES predict what
>it's supposed to predict (for whatever reason), just as Jim said. But we
>have to be careful that the metrics, particularly if two levels deep,
>predict something useful in the end result.

Agreed. It may seem silly, but my basic point is that if it turned out
that there was a very strong correlation between a metric that measured
the monthly rutabaga consumption of a development team and that team's
success on a project, then it is quite arguable that rutabaga consumption
is a VALID metric. I realize this is a reductio ad absurdum, but consider
this: the SAT's only really measure one's ability to take multiple-choice
tests...and yet there is apparently a very strong correlation between that
ability and one's success in college (this may well say something very
nasty about the state of education in this country, but that's for another
newsgroup!). Hopefully we can even do BETTER than rutabaga consumption,
but if it works, what the hell...

This is all a rather murky area, really. My father pointed out one time
that the statement "it only provides symptomatic relief" was stupid: if
the symptoms of a broken arm are pain, bone jutting from muscle, and an
inability to lift objects with the arm, then relieving those symptoms
is the same as curing the problem--so what's the objection? Similarly, if
SAT's only measure the ability to take multiple-choice tests, but this
predicts success in college, then that's functionally equivalent to
directly measuring college-success-ish-ness. You've provided symptomatic
relief.

The real danger is in arguing backwards from a metric. For example, I
remember reading somewhere that 90% of all violent felons in prison were
determined to have eaten potatoes in one form or another within the 48
hours preceding their commission of the crime for which they were
convicted. Obviously, then, potato consumption is a valid metric for
violent criminal behavior...
;-)
-- 
**************** JIM SHOWALTER, jls@netcom.com, (408) 243-0630 ****************
*Proven solutions to software problems. Consulting and training on all aspects*
*of software development. Management/process/methodology. Architecture/design/*
*reuse. Quality/productivity. Risk reduction. EFFECTIVE OO usage. Ada/C++.    *
frank@grep.co.uk (Frank Wales) (05/25/91)
In article <1991May23.014904.5896@netcom.COM> jls@netcom.COM
(Jim Showalter) writes:
>It may seem silly, but my basic point is that if it turned out
>that there was a very strong correlation between a metric that measured
>the monthly rutabaga consumption of a development team and that team's
>success on a project, then it is quite arguable that rutabaga consumption
>is a VALID metric.

It must be possible to establish credible causality too, otherwise you can't
be sure what you're measuring. Say you notice that levels of ice-cream
consumption correlate strongly with deaths at the beach. Does this mean
ice-cream is a killer? Not if you realise that both variables have a common
influence, such as sunny weather.

>My father pointed out one time
>that the statement "it only provides symptomatic relief" was stupid: if
>the symptoms of a broken arm are pain, bone jutting from muscle, and an
>inability to lift objects with the arm, then relieving those symptoms
>is the same as curing the problem--so what's the objection?

I take it your father wasn't a doctor, then? :-) Taking Tylenol whenever
you have a headache doesn't cure your brain tumour.

>Obviously, then, potato consumption is a valid metric
>for violent criminal behavior...
>;-)

Indeed. Just like looking at local death rates makes hospitals hazardous.
-- 
Frank Wales, Grep Limited,             [frank@grep.co.uk<->uunet!grep!frank]
Kirkfields Business Centre, Kirk Lane, LEEDS, UK, LS19 7LX. (+44) 532 500303
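[Editor's aside: Frank's common-influence point is easy to demonstrate
numerically. The sketch below simulates a season in which sunshine drives
both ice-cream sales and beach deaths; the two then correlate strongly even
though neither causes the other. All variable names and coefficients are
invented for illustration.]

```python
import random

random.seed(42)

# Hypothetical daily data: sunshine drives BOTH ice-cream sales and
# beach attendance (hence beach deaths); the two never touch each other.
days = 365
sunshine = [random.uniform(0, 10) for _ in range(days)]          # hours of sun
ice_cream = [s * 30 + random.gauss(0, 20) for s in sunshine]     # cones sold
beach_deaths = [s * 0.5 + random.gauss(0, 1) for s in sunshine]  # incidents

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(ice_cream, beach_deaths)
print(f"correlation(ice_cream, beach_deaths) = {r:.2f}")
# Strongly positive -- yet banning ice-cream would change nothing,
# because both series are driven by the shared 'sunshine' variable.
```

The correlation is real and would make ice-cream a perfectly good
*predictor*, which is Jim's point; the causal structure is what tells you it
is useless as a lever, which is Frank's.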
jls@netcom.COM (Jim Showalter) (05/25/91)
>It must be possible to establish credible causality too, otherwise you can't
>be sure what you're measuring. Say you notice that levels of ice-cream
>consumption correlate strongly with deaths at the beach. Does this mean
>ice-cream is a killer?

Not at all. But it DOES mean ice-cream is a good predictor for beach-deaths,
which was precisely my point. Thanks for providing another example to support
my thesis! :-)

>>My father pointed out one time
>>that the statement "it only provides symptomatic relief" was stupid: if
>>the symptoms of a broken arm are pain, bone jutting from muscle, and an
>>inability to lift objects with the arm, then relieving those symptoms
>>is the same as curing the problem--so what's the objection?

>I take it your father wasn't a doctor, then? :-) Taking Tylenol whenever
>you have a headache doesn't cure your brain tumour.

My dad was only pointing it out for broken arms. It doesn't work for
everything.
-- 
**************** JIM SHOWALTER, jls@netcom.com, (408) 243-0630 ****************
*Proven solutions to software problems. Consulting and training on all aspects*
*of software development. Management/process/methodology. Architecture/design/*
*reuse. Quality/productivity. Risk reduction. EFFECTIVE OO usage. Ada/C++.    *
adamksh@ip2020.Berkeley.EDU (Adam Kao (KSh)) (05/26/91)
In article <1991May25.053304.10445@netcom.COM>, jls@netcom.COM
(Jim Showalter) writes:

[attribution lost]
>>It must be possible to establish credible causality too, otherwise you can't
>>be sure what you're measuring. Say you notice that levels of ice-cream
>>consumption correlate strongly with deaths at the beach. Does this mean
>>ice-cream is a killer?

>Not at all. But it DOES mean ice-cream is a good predictor for beach-deaths,
>which was precisely my point. Thanks for providing another example to support
>my thesis! :-)

No. Please, take the argument one step further.

Imagine that a high-level committee is formed for the purpose of reducing
deaths at the beach. Upon discovering the correlation above, the committee
promptly imposes limits upon the consumption of ice-cream. Surprisingly,
nothing happens. (More likely, the committee acts in late September, and
then declares victory as beach deaths decline.)

We don't go around discovering correlations just for fun. We usually wish
to draw conclusions about actions we should take to reach a desired
outcome. We don't want software metrics just to predict when our project
will fail (as we stand helpless); we want to be able to prevent a possible
project failure. What most people don't understand is that we must
establish causality before we can know what action to take. Correlations
do not establish causality.

>>>My father pointed out one time
>>>that the statement "it only provides symptomatic relief" was stupid: if
>>>the symptoms of a broken arm are pain, bone jutting from muscle, and an
>>>inability to lift objects with the arm, then relieving those symptoms
>>>is the same as curing the problem--so what's the objection?

>>I take it your father wasn't a doctor, then? :-) Taking Tylenol whenever
>>you have a headache doesn't cure your brain tumour.

>My dad was only pointing it out for broken arms. It doesn't work for
>everything.

This is exactly the point. Now that you admit symptomatic relief doesn't
work for everything, you must show that symptomatic relief does work for
software.

Adam
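[Editor's aside: Adam's committee story can be run as a tiny experiment.
The sketch below uses the same invented sunshine model as before and applies
the committee's intervention -- capping ice-cream sales -- while deaths
depend only on sunshine. The cap, the rates, and the names are all made up
for illustration; the point is that intervening on a non-cause leaves the
outcome untouched.]

```python
import random

def simulate(ice_cream_cap=None, seed=0):
    """One hypothetical beach season; returns total deaths.

    ice_cream_cap, if given, is the committee's daily limit on cones --
    deliberately a knob on a NON-cause of deaths.
    """
    rng = random.Random(seed)
    deaths = 0
    for _day in range(365):
        sun = rng.uniform(0, 10)          # the common cause
        cones = sun * 30                  # ice-cream sales track the weather
        if ice_cream_cap is not None:
            cones = min(cones, ice_cream_cap)   # intervene on the *metric*
        # Risk depends only on sunshine (crowds), never on cones sold.
        if rng.random() < sun / 200:
            deaths += 1
    return deaths

before = simulate()
after = simulate(ice_cream_cap=50)
print(before, after)   # identical: the ban changed nothing
```

Observing the metric (Jim's prediction) and acting on it (Adam's committee)
are different operations, and only a causal link makes the second one work.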
frank@grep.co.uk (Frank Wales) (05/28/91)
JS == jls@netcom.COM (Jim Showalter)
ME == me

ME>It must be possible to establish credible causality too, otherwise you
ME>can't be sure what you're measuring. Say you notice that levels of
ME>ice-cream consumption correlate strongly with deaths at the beach.
ME>Does this mean ice-cream is a killer?

JS>Not at all. But it DOES mean ice-cream is a good predictor for
JS>beach-deaths, which was precisely my point. Thanks for providing another
JS>example to support my thesis! :-)

"It's dead, Jim!" :-) Beach deaths are already an excellent measure of
beach deaths. There is little point in obtaining others unless they buy you
something; for example, insight or understanding.

My concern here is not with whatever other auxiliary numbers can be
obtained that mean the same thing as the original statistics; it's what
people *do* with these other numbers, and especially what they attempt to
intuit from the relationship between them. For example, if people attempt
to cure the "beach-death problem" by restricting ice-cream sales on the
basis of these data, they've only bought disappointment and frustration.
More to the point, such bogus applications of non-causal correlations
undermine similar, perhaps more valid, measures, and that is a real cost;
for a start, it makes it harder to convince people of the value of metrics.

JS>My father pointed out one time
JS>that the statement "it only provides symptomatic relief" was stupid: if
JS>the symptoms of a broken arm are pain, bone jutting from muscle, and an
JS>inability to lift objects with the arm, then relieving those symptoms
JS>is the same as curing the problem--so what's the objection?

ME>I take it your father wasn't a doctor, then? :-) Taking Tylenol whenever
ME>you have a headache doesn't cure your brain tumour.

JS>My dad was only pointing it out for broken arms. It doesn't work for
JS>everything.

Indeed. Like helping to 'cure' ailing software projects, for example.
-- 
Frank Wales, Grep Limited,             [frank@grep.co.uk<->uunet!grep!frank]
Kirkfields Business Centre, Kirk Lane, LEEDS, UK, LS19 7LX. (+44) 532 500303
hlavaty@CRVAX.Sri.Com (05/28/91)
In article <1991May24.192101.22317@grep.co.uk>, frank@grep.co.uk
(Frank Wales) writes...
>In article <1991May23.014904.5896@netcom.COM> jls@netcom.COM
>(Jim Showalter) writes:
>
>It must be possible to establish credible causality too, otherwise you can't
>be sure what you're measuring. Say you notice that levels of ice-cream
>consumption correlate strongly with deaths at the beach. Does this mean
>ice-cream is a killer? Not if you realise that both variables have
>a common influence, such as sunny weather.

While causality is important and helpful, I don't agree that its absence
prevents the use of the metric. In your example, if you wanted to prevent
deaths at the beach, you could use the ice-cream metric to alert you when
more deaths at the beach were imminent. Then you could go to the beach and
figure out what was going on.

While a simple (and silly) example, the same principle applies to something
more arcane like the health and status of an integration effort. If you
have noticed that more overtime usually indicates that your effort is in
trouble, you certainly don't prevent people from working overtime. You
spend some time going over your integration effort and try to figure out
what the problems are.

The major problem facing any software development is that so many potential
things can go wrong that are *not* obvious, or that "seemed like a good
idea at the time" but turned out later to be short-term thinking. The
quest for metrics is a search for something (ANY something) that you can
demonstrate is a good indication of success or failure, OR that allows you
more insight into the inner workings of the project. An example of the
former would be looking at overtime or the number of compiles per day. A
good example of the latter would be tracking test cases completed over time
and comparing that to your original plan.

>Indeed. Just like looking at local death rates makes hospitals hazardous.

Well, yes.
Being in a hospital is a good metric to indicate that your chances of dying
in the near future MAY have gone up. So you look closer. Why am I here?
If it's because I broke my foot, I relax and conclude that there is no
cause for concern. If it's because I am having an operation that requires
me to be knocked out, I get a little more concerned and may look at this
hospital's track record for 1) knocking people out and not killing them
and 2) their overall success at this type of operation.

Applying the analogy directly to software development, let's say I have
been tracking the success rate of all my programmers (how I am doing this,
or what my definition of success is, I will leave to future discussions).
Now, when I realize that a particular module that I am concerned with is
being worked on by a programmer with a demonstrated poor performance, I get
concerned and proceed to look closer at the situation. Has that programmer
ever done something similar to this module? How did he do on that one?
What did he learn (if anything) from his mistakes? After going through
this process I may decide that no action is warranted, or that some help
is in order, or that the module should be given to someone else entirely.

In fact, here I am not concerned too much with causality. Last time the
programmer tried this it got all bungled up. I am now concerned, whether
or not I know why he bungled it. If I know why it got bungled, I can
possibly make a more educated decision. If I don't know why it got
bungled, I can decide to 1) watch the development very closely this time
and try to figure out why the programmer has problems or 2) decide the
module is too important to risk and give it to someone else. The metric
is more valuable to me with causality, but still useful even without it.

Jim Hlavaty
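[Editor's aside: Hlavaty's use of a non-causal metric -- as an alarm that
triggers investigation rather than a knob to turn -- can be sketched in a
few lines. The overtime threshold and the data below are invented for
illustration.]

```python
# Use a correlated-but-not-causal metric purely as a tripwire: when it
# fires, a human goes and looks; nobody "fixes" the metric itself.

OVERTIME_ALERT_HOURS = 10   # hypothetical weekly threshold per person

def review_flags(weekly_overtime):
    """Return the weeks whose average overtime warrants a closer look.

    weekly_overtime: list of (week_label, avg_overtime_hours) tuples.
    The metric doesn't say WHAT is wrong -- only WHERE to look.
    """
    return [week for week, hours in weekly_overtime
            if hours > OVERTIME_ALERT_HOURS]

history = [("wk1", 4), ("wk2", 6), ("wk3", 12), ("wk4", 15)]
print(review_flags(history))   # ['wk3', 'wk4'] -- go talk to the team
```

The wrong response to a flagged week would be forbidding overtime (the
ice-cream ban again); the right one is the investigation Hlavaty describes.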
jls@netcom.COM (Jim Showalter) (05/29/91)
>Beach deaths are already an excellent measure of
>beach deaths.

Granted, just as project failures are an excellent measure of project
failures. But so what? We are trying to intercede BEFORE the death occurs
or the project fails.

>More to the point, such bogus applications of non-causal correlations
>undermine similar, perhaps more valid, measures,

Also granted. Now, I invite you to provide me with a list of causal
correlations with respect to software projects. All the ones I know of are
based on experience and intuition into the process of software
development--and are good metrics--but none of them have, to the best of my
knowledge, ever been proven to be the CAUSE of a failed project.
-- 
**************** JIM SHOWALTER, jls@netcom.com, (408) 243-0630 ****************
*Proven solutions to software problems. Consulting and training on all aspects*
*of software development. Management/process/methodology. Architecture/design/*
*reuse. Quality/productivity. Risk reduction. EFFECTIVE OO usage. Ada/C++.    *