mcmahan@netcom.UUCP (Dave Mc Mahan) (06/27/90)
In a previous article, flint@gistdev.gist.com (Flint Pellett) writes:
>mcmahan@netcom.UUCP (Dave Mc Mahan) writes:
>> In a previous article, Don.Allingham@FtCollins.NCR.COM (Don Allingham) writes:
>>>What is a good measure of code quality? Management likes to use "bugs
>>>per line of code". Is this really a good measure? Doesn't seem so to
>>>me. And what is a line of code? Does it include comments? Do you make
>>>adjustments for coding style? Program complexity?
>
>>Personally, I think there is only one standard, and it is set by the 'user' of
>>the code. If the software does what it is supposed to do, the user will be
>>happy...
>
>>Other metrics of 'code quality' should include the perspective of those
>>programmers that come later in the project to do maintenance. Version 1.0
>>of a program could be great, but if the new programmer breaks a finely tuned
>>system that isn't robust and works because the original programmer had special
>>knowledge he didn't leave behind, the code is of poor quality in the eyes of
>>the maintainer. I always shudder when I'm asked to modify someone else's
>>existing code. Scrapping the current stuff is usually not possible due to time
>>and cost, so the code must be re-analyzed and adjusted to take in the new
>>features.
>
>>I guess the bottom line is, if the code does what it is supposed to do to the
>>satisfaction of the user, it is 'good code'. If it contains bugs or puts major
>>constraints on the user in the environment it was designed for, it could use
>>some more work before it is ready for a final release. It may not get that
>>extra work, but it could use it.
>
>>>Don Allingham
>
>I certainly cannot agree with this definition, because the concept of quality
>is not one that lends itself to anything this simple. For code to be good
>code, it has to measure up against more than just one yardstick: the measure
>of "does it do what it was supposed to do" is only one yardstick. A few
>simple examples should suffice to show what I mean.

I would claim that for any given perspective (user, maintainer, coder,
administrator, etc.) there is only one yardstick. It may be different
depending upon the hat you wear, but for each mode there is only one. Those
who wear multiple hats at the same time may have more than one yardstick.

>Given: the code does exactly what it is supposed to do to the
>satisfaction of the user, works great, no constraints on the user, does
>what it was designed to do, no bugs, in the environment it was designed for.
>
>However: If the following is also true, I believe anyone qualified to judge
>good code is going to tell you that the code is not good code:
>
>1. The code is completely uncommented. (My personal yardstick: if you have
>   one line of code, there should be 1 line of comment explaining that
>   line of code, and at least 25% of the lines of code should have
>   comments on the line explaining them. I include documentation files
>   that explain usage, file formats, etc. in my list of required
>   documentation.)

Doesn't this fall under the 'hat' of the maintainer? A 'user' couldn't care
less whether comments exist or not, since he can't ever see them. If one used
an interpreted language such as BASIC and it included comments, I argue that
running the program and viewing the source code for comments requires wearing
two different hats. I agree that a different hat provides a different
viewpoint about what is and isn't good code, but the user couldn't care less
about the internal details.
S/he just wants a program that gets the job done; lack of documentation isn't
important to him. The maintainer, however, is a different matter. Again, I
would argue that lack of comments in the source falls under the category of
the original programmer having special knowledge that wasn't left behind.
(Personally, I tend to agree with your rules of thumb for the percentages of
comments; a rough sketch of how such a ratio might be checked mechanically
appears at the end of this post. However, each person and program has its own
view of what is correct.) From the user's point of view, the code may be fine.
From the viewpoint of the maintainer, it's an absolute living nightmare come
true. Is there really any one perspective that is right for everyone? I think
not.

>2. The code doesn't document itself. If you've ever seen code where
>   all the variables are named "t1", "t2", "t3", etc., you know what
>   I mean.

Once again, I claim this falls under "original programmer had special
knowledge that he didn't properly leave behind".

>3. You cannot perform even trivial additions to the code without major
>   amounts of effort. (When your user says "The product is great, can
>   you add <trivial thing>?" and you have to say "Sure, for $10K", even your
>   user is going to wonder about the code quality.)

Again, "original programmer...." The user isn't going to wonder about code
quality. The person paying the bills may, however. These two personae are not
always the same individual.

>4. The choice of language was inappropriate: 2000 lines of code were
>   written to do what 5 lines would do in another language.

I feel this is a relative point. Is there really any 'quality' difference
between a program that is 20,000 lines of assembly and a program that is 5
lines of Lisp (or another language) which requires a 200K interpreter to run
it? What is a 'line of code' in one language equivalent to when compared to
another language? It is quite possible that the 2000 lines of code may be
much faster than the other version that requires only 5 lines. What if the
rest of the program is already written in the language of the 2000 lines, and
switching to a more compact language isn't even close to trivial? I've seen
some pretty wicked-looking APL programs that are only 5 lines, but you would
need 5 years of dedicated APL experience to figure them out. Quality code?
You make the call.

>5. The code re-invented the wheel. It is not great code to write your
>   own routines to do things the standard libraries do for you, even if
>   your routines work. It is not good code if your code could have been
>   driven from standard file formats and wasn't.

IMHO this is bad form, but is it really bad code? Most probably yes, but who
can say? Perhaps the redundant code was written to make the program truly
portable across a large spectrum of machines. Is it then redundant and 'bad
form' just because the machine you are running it on has a better way?
Wouldn't this conflict with item #6 below?

>6. The code isn't portable. Well, maybe that isn't important to you, it
>   does work perfectly in the original environment.

Once again, it may be non-portable, but is that a measure of bad code quality?
It is optimal (IMHO) to write code that IS portable across machines as often
as possible, but the obvious case of an assembly language program is clearly
never portable (or hardly ever). Is this poor quality code?

>7. Code complexity. This goes to the issue of maintenance referred to
>   before.
>   One tool I use is a McCabe complexity analysis: I get a
>   number back for each routine telling me its complexity according to
>   the McCabe model, and I know if I see anything above a 10 I had better
>   go look at it and see if I can rewrite it in a better fashion.

I'm not familiar with the McCabe model. It sounds like a good thing from what
you write. I feel that this might provide a good guidepost to code that is
too complex, but is this always the case? To steal a phrase from many
.signature lines (I think the original author was Alan Kay), "Simple things
should be simple. Complex things should be possible." I don't feel that code
complexity is a hard and fast good/bad indicator of code quality. Ever try to
filter out satellite frequency shift from a local oscillator in 10 lines or
less? Some things just take a lot of complicated code.

>I could probably list a few dozen more things, but this posting is too
>long already: but maybe that is the point: there is no simple definition
>of what represents code quality.

This is a statement I totally agree with. I was trying to write about the
most common metric I use, the perspective of the end user. I feel the bottom
line is that no matter how elegant the code is, if it doesn't do the job
correctly, it's not worth the paper it is printed on.

>In the meantime, don't
>trust anyone who says they can define what quality software is in one
>paragraph, because it cannot be done.

I'm not sure if that was the message that went out, but it's definitely not
the message I was trying to deliver. The original poster asked for some ways
to measure quality, and I gave him the ones I use to sell the product. You
are quite correct in stating that there are many more ways than one to look
at what makes 'quality software'.

>Flint Pellett, Global Information Systems Technology, Inc.

-dave
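As promised above, here is a crude way a comment-ratio yardstick could be
checked mechanically. It is only a rough sketch: it assumes C-style /* */ and
// comments, detects them line by line rather than with a real parser, and
the script itself is illustrative, not an existing tool.

# Rough sketch: estimate what fraction of non-blank source lines carry a
# comment, to check a "25% of lines commented" style yardstick.
# Assumes C-style comments (/* ... */ and //); a line-oriented approximation.

import sys

def comment_density(path):
    """Return (non-blank lines, lines carrying a comment) for one file."""
    total = commented = 0
    in_block = False                      # inside a /* ... */ block?
    with open(path) as f:
        for line in f:
            text = line.strip()
            if not text:
                continue                  # ignore blank lines
            total += 1
            has_comment = False
            if in_block:                  # continuing a block comment
                has_comment = True
                if "*/" in text:
                    in_block = False
                    text = text.split("*/", 1)[1]
            if "/*" in text:              # a block comment starts here
                has_comment = True
                if "*/" not in text.split("/*", 1)[1]:
                    in_block = True
            if "//" in text:              # line comment (naive: ignores strings)
                has_comment = True
            if has_comment:
                commented += 1
    return total, commented

if __name__ == "__main__":
    for path in sys.argv[1:]:
        total, commented = comment_density(path)
        pct = 100.0 * commented / total if total else 0.0
        print(f"{path}: {commented}/{total} non-blank lines commented ({pct:.0f}%)")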
mcmahan@netcom.UUCP (Dave Mc Mahan) (06/27/90)
In a previous article, cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
>In article <10865@netcom.UUCP> mcmahan@netcom.UUCP (Dave Mc Mahan) writes:
>
>   Personally, I think there is only one standard, and it is set by the 'user'
>   of the code. If the software does what it is supposed to do, the user will
>   be happy.
>
>This seems all right for an internal, "captive" user, but how about the
>end-user of a software product? Must he buy the product only to determine
>that he isn't happy with it? Isn't that a little late?

Yes, especially if s/he didn't buy from a place that has a money-back
guarantee! I'm a firm believer that every buyer is responsible for
determining the 'fitness' of any program purchased. I always try to demo a
piece of software before I spend money.

However, fairness aside, I still stick to my original premise that the
yardstick for determining whether the software is 'quality' or not belongs to
the user. Who else knows better than s/he what problem needs to be solved?
Finding out that the 'quality' is lacking after the purchase is indeed an
unfortunate surprise, but not related to whether or not you have just
purchased a 'quality' product. The product was either fit or unfit before the
purchase. Finding out before is always better (although usually harder) than
finding out after the purchase. Caveat emptor!!

>David Masterson     Consilium, Inc.

-dave
flint@gistdev.gist.com (Flint Pellett) (06/28/90)
To answer your question on the McCabe complexity, since I've gotten several
pieces of e-mail on it as well: what it measures is the complexity of each
"routine", not each "file". I have some "quality" code (IMHO) that performs
some conceptually very complex tasks, and it is well organized into a
collection of small routines, no one of which is especially complex. I also
know of some examples of code where the job being done is relatively simple
and straightforward (albeit with a lot of special cases), but it was written
as one huge routine about 3000 lines long. That routine earned an incredibly
high complexity rating. The interesting thing is that the very complex
routine has required fixing again and again and again, but the uncomplicated
routines that perform the more complicated job have not. "Needing fixing" is
not just a maintainer's view either: if some user hadn't had a problem with
it, the maintainer wouldn't have dared to touch the thing.

With some problems, you just aren't able to break the problem apart enough,
of course, and you will have complexity. If you use the complexity measure to
conclude "this is bad code" then you've misused the measure. (There may be a
high correlation between complex code and poor code, but one does not
perfectly imply the other.)

On the issue of "quality is in the eye of the user" only: the problem with
that as the sole measure of quality is that it is not a stable "measure";
it's like a rubber yardstick. For example, say that I buy a new car, and I
and several million other owners are completely happy with it for two years.
Then we find out on the news that our cars are unsafe if we ever need to
perform a quick turn to the left, and will always roll over. Overnight the
user satisfaction with the product drops to half what it was: can you claim
that the quality of the product changed? I contend that the quality was
constant (and not very good), but by measuring only how the users felt about
it I misled myself. To really know what the quality of the product is
(regardless of what hat I'm wearing), I need to examine several things, not
just one.
-- 
Flint Pellett, Global Information Systems Technology, Inc.
1800 Woodfield Drive, Savoy, IL 61874     (217) 352-1165
uunet!gistdev!flint or flint@gistdev.gist.com
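Since several people asked what the number actually is: McCabe's cyclomatic
complexity is V(G) = E - N + 2P for the routine's control flow graph (edges,
nodes, connected components), which for a single-entry, single-exit routine
works out to the number of binary decision points plus one. A very crude
approximation can be had just by counting decision keywords. The sketch below
is only illustrative: it assumes C-like source, counts per file rather than
per routine, and its keyword list is far from complete.

# Crude cyclomatic-complexity estimate for C-like source: count branch points
# and add one. This is only a sketch of the "decisions + 1" idea, not a real
# control-flow-graph computation; the keyword list and threshold are
# illustrative.

import re
import sys

DECISIONS = re.compile(r'\b(?:if|while|for|case)\b|&&|\|\|')

def estimate_complexity(lines):
    """Approximate V(G) as (number of decision points) + 1."""
    decisions = 0
    for line in lines:
        code = line.split("//")[0]        # crudely drop trailing // comments
        decisions += len(DECISIONS.findall(code))
    return decisions + 1

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as f:
            v = estimate_complexity(f.readlines())
        flag = "  <-- worth a closer look" if v > 10 else ""
        print(f"{path}: estimated complexity {v}{flag}")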
ssdken@watson.Claremont.EDU (Ken Nelson) (06/29/90)
I am not sure that any metric is an absolute indicator of quality, or even a
close estimate all the time. I also don't really care, as long as the metric
gives me a general estimate of quality that is accurate much of the time. I
use metrics to prevent bugs, and to optimize and focus my maintenance and
testing efforts. If a high McCabe complexity generally indicates bugs, then
it only makes sense to allocate more design review, testing, and checkout
effort to modules with high complexity.

Metrics won't replace users' feelings about quality, or programmers' feelings
about quality. But they can act as a CONSISTENT and IMPARTIAL consultant
which, when merged with users' and programmers' gut feelings, can make the
bug prevention and detection effort more efficient. Metrics don't have the
personal, political, or other problems generally associated with people. They
can be measured at night, on as small or large a piece of the software as you
like. Just like computers, they HELP but don't replace humans.


  Ken Nelson
  Software Systems Design
  (714) 624-3402
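To make the "focus the effort" idea concrete, here is a small sketch of the
kind of triage that could be done with a complexity report in hand. The
module names, numbers, and weighting scheme are all made up; the only point
is ranking the limited review/testing budget by the metric.

# Sketch: use a complexity metric (plus known defect history) to decide where
# to spend limited review and testing effort. All data and the weighting
# below are invented for illustration.

modules = [
    # (name, McCabe-style complexity, bugs found so far)
    ("parse_input.c", 23, 7),
    ("report_gen.c",  14, 2),
    ("screen_io.c",    6, 1),
    ("util.c",         3, 0),
]

def priority(entry):
    """Higher score = review first: complexity weighted by defect history."""
    _, complexity, bugs = entry
    return complexity * (1 + bugs)

for name, complexity, bugs in sorted(modules, key=priority, reverse=True):
    score = complexity * (1 + bugs)
    print(f"{name:14s} complexity={complexity:3d} bugs={bugs:2d} priority={score:4d}")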
mcmahan@netcom.UUCP (Dave Mc Mahan) (07/02/90)
In a previous article, flint@gistdev.gist.com (Flint Pellett) writes:
>On the issue of "quality is in the eye of the user" only: the problem
>with that as the sole measure of quality is that it is not a stable
>"measure", it's like a rubber yardstick. For example, say that I buy
>a new car, and I and several million other owners are completely happy
>with it for two years. Then we find out on the news that our cars are
>unsafe if we ever need to perform a quick turn to the left, and will
>always roll over. Overnight the user satisfaction with the product
>drops to half what it was: can you claim that the quality of the
>product changed? I contend that the quality was constant (and not
>very good) but by measuring only how the users felt about it I misled
>myself. To really know what the quality of the product is (regardless
>of what hat I'm wearing) I need to examine several things, not just one.

Yes, in my original posting I mentioned that this method of measurement is
one of the harder ones to quantify, since the user often can't even tell you
in detailed words what is good or bad about the program. Things like "well,
it just doesn't seem to make sense" or "I like it, but it's not really what I
want" don't help me too much.

Changing expectations are always a job hazard. In more formal descriptions,
this is called "requesting more features". The yardstick can change (since
people are fickle) and you can't really do too much about it, since there was
never any formal acceptance by the user as to what the program would do. If
there had been, the programmer/engineer could use that as justification for
more money, since the program now has to do more things.

I still stand by my original premise, however. If it doesn't do what the
customer wants and needs, the program can use adjustment.

>Flint Pellett, Global Information Systems Technology, Inc.

-dave
xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (07/02/90)
In article <CIMSHOP!DAVIDM.90Jun25103758@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
>In article <10865@netcom.UUCP> mcmahan@netcom.UUCP (Dave Mc Mahan) writes:
>
>   Personally, I think there is only one standard, and it is set by the 'user'
>   of the code. If the software does what it is supposed to do, the user will
>   be happy.
>
>This seems all right for an internal, "captive" user, but how about the
>end-user of a software product? Must he buy the product only to determine
>that he isn't happy with it? Isn't that a little late?

Moreover, if management wants to reward creation of quality code (what a
novel concept, but how wise), you'd better have a lot more objective ways of
measuring quality. Bugs reported per time period, effort per bug to repair,
changes requested per time period, time DIV complexity to make those changes,
several metrics (granted they're mostly snake oil and predict little; let's
improve them) all need to be considered before bonuses are paid.

Kent, the (unemployable) man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>
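For what it's worth, here is a sketch of what a couple of those numbers (bugs
reported per time period, effort per bug to repair) might look like when
actually computed from a defect log. The record layout and the figures are
invented purely for illustration.

# Sketch: compute two of the metrics mentioned above from a defect log.
# The record layout and the data are invented for illustration only.

from collections import Counter

# (bug id, month reported, hours spent repairing it)
defect_log = [
    ("B-101", "1990-03",  4.0),
    ("B-102", "1990-03", 12.5),
    ("B-103", "1990-04",  1.5),
    ("B-104", "1990-06",  8.0),
    ("B-105", "1990-06",  2.0),
]

bugs_per_month = Counter(month for _, month, _ in defect_log)
effort_per_bug = sum(hours for _, _, hours in defect_log) / len(defect_log)

for month in sorted(bugs_per_month):
    print(f"{month}: {bugs_per_month[month]} bugs reported")
print(f"average effort per bug: {effort_per_bug:.1f} hours")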
davidm@uunet.UU.NET (David S. Masterson) (07/02/90)
In article <1990Jul2.000639.14545@zorch.SF-Bay.ORG> xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:

   Moreover, if management wants to reward creation of quality code (what a
   novel concept, but how wise), you'd better have a lot more objective ways
   of measuring quality. Bugs reported per time period, effort per bug to
   repair, changes requested per time period, time DIV complexity to make
   those changes, several metrics (granted they're mostly snake oil and
   predict little; let's improve them) all need to be considered before
   bonuses are paid.

Perhaps it's better for management to reward a percentage of the return on
the code than to worry about the quality of the code. No measurements, no
predictions, just simple sharing of returns (product sales, money saved,
etc.). I know this sort of contradicts what I've said before, but what the
hey...

Actually, I think it's more important to be able to predict outcomes with
some measure of accuracy going into a project than coming out of one (when
you would normally decide on rewards).
-- 
===================================================================
David Masterson                       Consilium, Inc.
uunet!cimshop!davidm                  Mt. View, CA 94043
===================================================================
"If someone thinks they know what I said, then I didn't say it!"
flint@gistdev.gist.com (Flint Pellett) (07/02/90)
Expectations are probably the biggest variable in our industry: how happy a
customer is with something depends a lot on what they expect out of it.
Continuing the previous analogy: If I bought a $50,000 sports car today, and
tomorrow a different company announced one that was better at everything and
cost half as much, my satisfaction would nosedive. With cars that isn't very
likely, but with software it is VERY likely. ("Gee, I just bought your $2K
UNIX based system, and today I see a $50 package on a Mac that does "X" so
much better..." and expectations rise.)

I've noticed that this is particularly prevalent when people make comparisons
between complete systems and individual components, such as the recent spate
of comparisons between the compiler provided as part of Coherent's UNIX
system and stand-alone DOS compilers. (People seem to "expect" the same level
of performance in the compiler they got as part of a whole system (for $100)
as they expect out of a stand-alone compiler (that costs $150).)
-- 
Flint Pellett, Global Information Systems Technology, Inc.
1800 Woodfield Drive, Savoy, IL 61874     (217) 352-1165
uunet!gistdev!flint or flint@gistdev.gist.com
brianc@labmed.ucsf.edu (Brian Colfer) (07/03/90)
In article <CIMSHOP!DAVIDM.90Jul2005035@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
>
>   Moreover, if management wants to reward creation of quality code
>   (what a novel concept, but how wise), you'd better have a lot
>   more objective ways of measuring quality. Bugs reported per time
>   period, effort per bug to repair, changes requested per time period,
>   time DIV complexity to make those changes, several metrics (granted
>   they're mostly snake oil and predict little; let's improve them)
>   all need to be considered before bonuses are paid.
>
>Perhaps it's better for management to reward a percentage of the return on the
>code than worry about the quality of the code. No measurements, no
>predictions, just simple sharing of returns (product sales, money saved,
>etc.).

Yet this is a measurement... For systems which are designed with the primary
motivation of generating profits, I think this is the only way to reward
software creation. There are two problems with this measurement system:

1) There may be a latency between production of the system and realization of
   return. This will dampen the reward effect, but probably not by very
   much... at least you would reward the units and individuals who follow
   good practices (code review, etc.). It will also suppress the link, for
   management, between good software principles and rewards.

2) How do we reward public-sector programmers? For example, if I write code
   which makes it easier for an M.D. to save a life, how much is that code
   worth? To belabor the point: how much is the Hubble telescope code worth?

We still need a way to recognize quality and reinforce it.

>Actually, I think it's more important to be able to predict outcomes with some
>measure of accuracy going into a project than coming out of one (when you
>would normally decide on rewards).

Predict outcomes of what the code will do, or the return on investment?
-- 
Brian Colfer           | UC San Francisco        |------------------------|
                       | Dept. of Lab. Medicine  | System Administrator,  |
brianc@labmed.ucsf.edu | S.F. CA, 94143-0134 USA | Programmer/Analyst     |
BRIANC@UCSFCCA.BITNET  | PH. (415) 476-2325      |------------------------|
tom@stl.stc.co.uk (Tom Thomson) (07/05/90)
In article <926@gistdev.gist.com> flint@gistdev.gist.com (Flint Pellett) writes:
>To answer your question on the McCabe complexity, .................
>
>With some problems, you just aren't able to break the problem apart
>enough of course, and you will have complexity. If you use the
>complexity measure to conclude "this is bad code" then you've misused
>the measure. (There may be a high correlation between complex code
>and poor code, but one does not perfectly imply the other.)
>
Some published work indicates no correlation at all between the McCabe
complexity measure and rate of change of code after release, while there's a
very strong correlation between rate of change after release and source code
size. [Barbara Kitchenham published some work on this years ago; I can't
remember the exact reference.]

For an economic measure of quality, post-release rate of change is pretty
good: change arises because there were bugs, and debugging after release is
expensive. [Obviously, for something that is going through a planned series
of releases with planned facility enhancements, not all change is to do with
bugs, but the change that is bug-related is bad news.]

So code size is a better (very much better) measure of code quality than
McCabe complexity: the smaller the source code to do a job, the better the
quality. The McCabe measure tells us something about the control flow graph
of the program, so it is completely useless for code not written in a control
flow language (ML, pure Lisp, Prolog, ......); it favours (assigns lower
complexity to) programs written in a style which uses jump tables in data
areas instead of case statements in code and computes program addresses by
arithmetic instead of using labels or procedure names, so it's going to tell
you that really awful programs are pretty good.

Just as the quote above is not claiming that every program with a high McCabe
number is a bad one, I'm not claiming that every program with a low McCabe
number is a bad one either; I'm just pointing out that the McCabe number is
of very little use for anything.
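For anyone who wants to try this on their own change history, here is a
minimal sketch of the kind of comparison such studies make: correlate each
candidate metric against post-release change counts and see which one tracks
better. The module data below are invented; only the method is the point.

# Sketch: compare how well two candidate metrics (source size, McCabe number)
# track post-release change counts, using a plain Pearson correlation.
# All module data below are invented; only the method is the point.

from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# (module, source lines, McCabe number, changes after release)
history = [
    ("a.c",  300,  4,  1),
    ("b.c",  900, 12,  4),
    ("c.c", 2500,  9, 11),
    ("d.c", 1200, 22,  5),
    ("e.c", 4000,  7, 17),
]

size    = [m[1] for m in history]
mccabe  = [m[2] for m in history]
changes = [m[3] for m in history]

print(f"size   vs post-release changes: r = {pearson(size, changes):+.2f}")
print(f"McCabe vs post-release changes: r = {pearson(mccabe, changes):+.2f}")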
warren@eecs.cs.pdx.edu (Warren Harrison) (07/06/90)
In article <3182@stl.stc.co.uk> "Tom Thomson" <tom@stl.stc.co.uk> writes:
>In article <926@gistdev.gist.com> flint@gistdev.gist.com (Flint Pellett) writes:
>Some published work indicates no correlation at all between the McCabe
>complexity measure and rate of change of code after release, while there's a
>very strong correlation between rate of change after release and source code
>size. [Barbara Kitchenham published some work on this years ago; I can't
>remember the exact reference.]

This isn't surprising. Rate of change after code release has very little to
do with (sorry, but it seems like the most appropriate term) "code quality".
It is well known that under 20% of maintenance (i.e., code changes after
release) is due to bugs (see Lientz & Swanson's research for the exact
percentages); the rest is primarily new functionality and adaptations to new
environments (aka "porting"). We recently looked at about 250,000 lines of
embedded avionics software from several families of Navy attack aircraft. Our
percentages agreed with the L & S study. Not surprisingly, we had minimal
correlations between metrics and change.

>For an economic measure of quality, post-release rate of change is pretty
>good: change arises because there were bugs, and debugging after release is
>expensive.

Not true for most software (see above)!

>[Obviously, for something that is going through a planned series of releases
>with planned facility enhancements, not all change is to do with bugs, but
>the change that is bug-related is bad news.]

Even isolating the changes due to bugs does not give you a suitable basis for
evaluating code metrics, since large percentages of bugs are typically due to
specification and/or design errors (in one of our studies we found about 25%
of the recorded bugs during testing were due to coding - the rest were put in
at the spec or design stage). Obviously you can have the best code in the
world, but if the design or spec is wrong, it's still a bug.

>So code size is a better (very much better) measure of code quality than
>McCabe complexity: the smaller the source code to do a job, the better the
>quality. The McCabe measure tells us something about the control flow graph
>of the program, so it is completely useless for code not written in a control
>flow language (ML, pure Lisp, Prolog, ......); it favours (assigns lower
>complexity to) programs written in a style which uses jump tables in data
>areas instead of case statements in code and computes program addresses by
>arithmetic instead of using labels or procedure names, so it's going to tell
>you that really awful programs are pretty good.

All metrics should be assumed to be useful only within their specific domain -
it's asking a little much for a universal property to be applied to all
programming paradigms - consider that people measure expert system
performance using LIPS instead of MIPS. In fact, a number of studies have
shown that as programs get smaller, their bug rate (i.e., bugs per KLOC)
increases (Basili's work stands out most in my mind, but others have done
this too), so while larger modules have more bugs, they often have fewer bugs
per thousand lines of code.

>Just as the quote above is not claiming that every program with a high McCabe
>number is a bad one, I'm not claiming that every program with a low McCabe
>number is a bad one either; I'm just pointing out that the McCabe number is
>of very little use for anything.
It *can* identify source code that is hard to follow due to explicit flow of
control (to evaluate the McCabe metric, only consider code errors that were
due to control flow problems). This won't give you the whole picture, but it
will give you one part of it. Most metricians recommend that you use a set of
metrics to get a handle on the different aspects of the code (sorry again)
"quality", just like a physician will tell you about blood pressure, height,
weight, cholesterol, etc. The mistake is evaluating the code using *one*
number.

I would be happy to send copies of tech reports or reprints of our papers to
anyone who is interested. Just send me your US Mail address (we don't have
most of them on-line).

Warren
==========================================================================
Warren Harrison                                          warren@cs.pdx.edu
Department of Computer Science                                503/725-3108
Portland State University
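A quick numerical illustration of the bugs-per-KLOC point above. The module
names and figures are made up; they just show how raw bug counts and bugs per
KLOC can point in opposite directions.

# Sketch: the same defect data viewed as raw bug counts vs. bugs per KLOC.
# The modules and figures are invented to illustrate how a larger module can
# have more bugs in total yet fewer bugs per thousand lines of code.

modules = [
    # (name, lines of code, bugs found)
    ("big_module.c",   8000, 24),
    ("small_module.c",  500,  4),
]

for name, loc, bugs in modules:
    per_kloc = bugs / (loc / 1000.0)
    print(f"{name:15s} {loc:5d} LOC  {bugs:2d} bugs  {per_kloc:4.1f} bugs/KLOC")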
davidm@uunet.UU.NET (David S. Masterson) (07/06/90)
In article <3016@ucsfcca.ucsf.edu> brianc@labmed.ucsf.edu (Brian Colfer) writes:

   In article <CIMSHOP!DAVIDM.90Jul2005035@uunet.UU.NET>
   cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
   >
   >   Moreover, if management wants to reward creation of quality code
   >   (what a novel concept, but how wise), you'd better have a lot
   >   more objective ways of measuring quality. Bugs reported per time
   >   period, effort per bug to repair, changes requested per time period,
   >   time DIV complexity to make those changes, several metrics (granted
   >   they're mostly snake oil and predict little; let's improve them)
   >   all need to be considered before bonuses are paid.
   >
   >Perhaps it's better for management to reward a percentage of the return on
   >the code than worry about the quality of the code. No measurements, no
   >predictions, just simple sharing of returns (product sales, money saved,
   >etc.).

   Yet this is a measurement...

Yeh, but not necessarily a measurement within the control of the organization
(how many high-quality programs have you seen generate little return?) and
certainly not something knowable going into a project. Besides, if a project
fails, is it the programmers' fault or top management's?

   >Actually, I think it's more important to be able to predict outcomes with
   >some measure of accuracy going into a project than coming out of one (when
   >you would normally decide on rewards).

   Predict outcomes of what the code will do or the return on investment?

By the above, isn't this one and the same? The need to know how code will
perform comes from the desire to maximize a return on investment. Without
some measure of certainty of getting a good return on investment, why bother
to invest?
-- 
===================================================================
David Masterson                       Consilium, Inc.
uunet!cimshop!davidm                  Mt. View, CA 94043
===================================================================
"If someone thinks they know what I said, then I didn't say it!"
mcgregor@hemlock.Atherton.COM (Scott McGregor) (07/07/90)
In article <3182@stl.stc.co.uk>, tom@stl.stc.co.uk (Tom Thomson) writes:
> Just as the quote above is not claiming that every program with a high
> McCabe number is a bad one, I'm not claiming that every program with a low
> McCabe number is a bad one either; I'm just pointing out that the McCabe
> number is of very little use for anything.

In one case that I am aware of, a group decided to put a ceiling on McCabe
complexity numbers for routines in their product. Any routine that was too
complex was sent back for rewrite. An unexpected side effect was that the
total number of files in the system grew as the McCabe numbers declined. And
as the number of files grew, the complexity of the configuration grew, and
there were more misconfigurations made in submittals to QA as a result. So
while bugs in the modules were reduced, bugs in the configuration increased
and required extra effort to fix.

Again, this doesn't mean McCabe numbers are bad; I can imagine that if you
put a ceiling on source statements you might have a similar effect. But I
agree with other earlier posters that you need to be very careful what you
measure and what interpretation you put on it.

--Scott McGregor
  mcgregor@atherton