richard@iesd.auc.dk (Richard Flamsholt S0rensen) (03/13/91)
>>>>> On 11 Mar 91 09:06:47 GMT, raj@crosfield.co.uk (Ray Jones) said:

Ray> In article <1991Mar6.214157.18633@ntpal.uucp> dcavasso@ntpal.uucp (Dana Cavasso) writes:
>
>    I need a "C" code line counter program, preferably written in
>"C".  It will be used on several platforms, so solutions involving
>shell scripts and other UNIX utilities won't work.  I'm not very
>picky (although I'd like something that did a little more than count
>newlines :-)

Ray> So how about one that counts semi-colons :-)

  Doesn't sound that silly after all. The only things (somebody correct
me if I'm wrong) that "terminate" statements are semicolons and right
braces. (Sure, I know that a right brace doesn't terminate a statement
in the sense that you can write x = 1} y = 3; . What I mean is that
when you encounter a }, you can be at the end of a compound statement,
the only statement that's *not* terminated by a semicolon.)

  The problem is, that

	if (x == 7) {
	  y = 1;
	}

  - would count as two statements, though that's actually quite correct;
a compound statement is a statement in itself and need not include any
statements at all inside it; here, it does include one, so all in all we
have two statements.

  In a line counting program, however, you probably wouldn't want the
{}'s themselves to count as a statement, would you? Furthermore, if we
count }'s as statements this also makes function definitions, structs,
unions, enums and initializers - which all end with a } - look like
statements.

  Therefore, I think the idea of dismissing the compound statements
and only counting semicolons is all right (as long as you stay out of
comments, character constants and strings, of course ...) - and if I'm
wrong, somebody'll probably tell me so ;-)

  Richard
--
/Richard Flamsholt
richard@iesd.auc.dk
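A minimal sketch of the semicolon count Richard describes, written as a small state machine that stays out of comments, character constants and strings. The name count_semicolons is hypothetical (it is not from the thread), and the sketch assumes /* ... */ comments only and source text that has not been run through the preprocessor.

```c
#include <assert.h>
#include <stddef.h>

/* Count semicolons in C source text, ignoring those inside comments,
   string literals and character constants.
   Hypothetical helper; handles only the old-style comment form. */
static long count_semicolons(const char *s)
{
    enum { CODE, COMMENT, STRING, CHARLIT } state = CODE;
    long n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        switch (state) {
        case CODE:
            if (s[0] == '/' && s[1] == '*') { state = COMMENT; s++; }
            else if (*s == '"')  state = STRING;
            else if (*s == '\'') state = CHARLIT;
            else if (*s == ';')  n++;
            break;
        case COMMENT:
            if (s[0] == '*' && s[1] == '/') { state = CODE; s++; }
            break;
        case STRING:
        case CHARLIT:
            if (*s == '\\' && s[1] != '\0')
                s++;                        /* skip the escaped character */
            else if (state == STRING && *s == '"')
                state = CODE;
            else if (state == CHARLIT && *s == '\'')
                state = CODE;
            break;
        }
    }
    return n;
}
```

Note that a header such as for (;;) contributes two semicolons to this count, so the result is a rough size indicator rather than a true statement count.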
frank@grep.co.uk (Frank Wales) (03/14/91)
In article <RICHARD.91Mar12184440@dompap.iesd.auc.dk> richard@iesd.auc.dk (Richard Flamsholt S0rensen) writes:
>The only things (somebody correct me if I'm wrong) that "terminate"
>statements are semicolons and right braces.
>
>  The problem is, that
>	if (x == 7) {
>	  y = 1;
>	}
>  - would count as two statements, though that's actually quite correct;

Then it's not a problem, is it?

>Furthermore, if we
>count }'s as statements this also makes function definitions, structs,
>unions, enums and initializers - which all end with a } - look like
>statements.

You mean you don't consider all that stuff to be statements?
If one is attempting to make statement counts meaningful, surely
they must include all components of the program, and that must
include all the above.
--
Frank Wales, Grep Limited,             [frank@grep.co.uk<->uunet!grep!frank]
Kirkfields Business Centre, Kirk Lane, LEEDS, UK, LS19 7LX. (+44) 532 500303
lws@comm.wang.com (Lyle Seaman) (03/15/91)
richard@iesd.auc.dk (Richard Flamsholt S0rensen) writes:

>  In a line counting program, however, you probably wouldn't want the
>{}'s themselves to count as a statement, would you? Furthermore, if we
>count }'s as statements this also makes function definitions, structs,
>unions, enums and initializers - which all end with a } - look like
>statements.

>  Therefore, I think the idea of dismissing the compound statements
>and only counting semicolons is all right (as long as you stay out of
>comments, character constants and strings, of course ...) - and if I'm
>wrong, somebody'll probably tell me so ;-)

Well, that's a matter of interpretation.  I think compound statements
are a useful complexity indicator, and reasonably simple to count.
I guess it depends on *why* you're counting things in your code--
what your goals are, what you hope to be able to use such data for.
--
Lyle                  508 967 2322  "We have had television problems directly
lws@capybara.comm.wang.com           attributable to something not understandable"
Wang Labs, Lowell, MA, USA           - unnamed believer in poltergeists
jameson@jade.uucp (Kevin Jameson) (03/16/91)
If the author figured that something (semicolons, brackets, partial boolean
expressions, etc) was worth putting on a line by itself when the code was
written, wouldn't it be reasonable to count it as a distinct line of code?

Surely something like this

if ((some-long-condition1
    && some-other-long-condition2)
    || third condition) {
   ...;
   ...;
}

should count as more lines of code than something like this:

if (debug) ...;
EGNILGES@pucc.Princeton.EDU (Ed Nilges) (03/16/91)
In article <1991Mar15.231326.972@jade.uucp>, jameson@jade.uucp (Kevin Jameson) writes:
>If the author figured that something (semicolons, brackets, partial boolean
>expressions, etc) was worth putting on a line by itself when the code was
>written, wouldn't it be reasonable to count it as a distinct line of code?
>
>Surely something like this
>
>if ((some-long-condition1
>    && some-other-long-condition2)
>    || third condition) {
>   ...;
>   ...;
>}
>
>should count as more lines of code than something like this:

This is one of the few sensible remarks on this issue I've seen (other
than my own, of course).  While the facts remain that:

     *  "Line of code" is not a meaningful concept in a free-form
        language like C

     *  "Statement" as precisely defined by the Backus-Naur Form
        definition of the language is the only meaningful entity
        to count (if, like that character in Sesame Street, you
        Must Count)

     *  "Measuring complexity" is suspect

you are right that some sort of meaningful complexity of intent is
shown by spreading a complex if conditional over a couple of lines
of code.  But then what is more complex?

     if ( logical_cond_1 || logical_cond_2 && logical_cond_3 ) ...

or

     if ( logical_cond_1
          ||
          logical_cond_2 && logical_cond_3 ) ...

The programmer of the second example has used more lines of code...but
has, I think, more perspicuously expounded the intent of her code.  She
places the major operator on a line by itself, and displays the fact
that the logical AND has higher priority than the OR by keeping it on
the same line as its operands.  Yet a "lines of code" metric would
score the first example as being simpler and (as we all know) Simplicity
is Good.

The only responsible metric is not psychological, it is mathematical:
it is counting statements according to the strict Backus-Naur Form
definition of the language.

And to avoid scoring

     { a = b; }    /* Two statements: an assignment & a compound */

as having more statements than

     a = b;        /* One statement: an assignment */

you must do what an optimizing compiler does and detect code that adds
nothing to the "real", executable program...not only compound statements
with zero or one inner statements but also unreachable "dead" code.

But even if you do not do what an optimizer does and your statement
counter counts dead code and useless compounds, your metric is
considerably superior to any pseudo-clever one-line sed or awk rune.

Why is it so darn important to get a "good" metric (and where's the
meta-metric that tells me what's better? just asking.)  Because, dammit,
you are not really measuring software.  You are measuring people.

+--------------------------------+   Edward G. Nilges
| Child support, tax-deductible  |   Princeton University
| to payer AND receiver: an idea |   Information Center
| whose time has come.           |   Bitnet: EGNILGES@PUCC
+--------------------------------+   (609) 258-2985
bobd@zaphod.UUCP (Bob Dalgleish) (03/26/91)
Define your accuracy requirements!
Some C statements don't show up as such because they are primarily
expressions and show up in expression lists:
	for ( count = 0, f = front; f->value; f = f->next) {
		count++;
	}
would show only two statements, even though six kinds of actions are
occurring.
Other arguments about what constitutes a statement have been better
presented by others.
Prescription: choose something that is easy to count with a standard
Unix tool: i.e., wc, grep, sed, etc. Then, determine how accurate it is
for the sample code that you are using - within 5%, within 2.5%,
whatever. Is this good enough for your usage?
Don't get bogged down in what is a statement, or other metaphysical
things. Just find something that is well correlated with your needs,
and measure it cheaply. Anything else misses the point.
--
-- * * * Remember: I before E except after DALGL * * *--
Bob Dalgleish bobd@zaphod.UUCP
egnilges@phoenix.Princeton.EDU (Ed Nilges) (03/27/91)
In article <4196@zaphod.UUCP> bobd@zaphod.UUCP (Bob Dalgleish) writes:
>Define your accuracy requirements!
>
>Some C statements don't show up as such because they are primarily
>expressions and show up in expression lists:

Good point, Bob.  A strict metric using "statement" as defined in the
Backus-Naur Form definition of C would measure the following code
fragment

     index = 0;
     count = 0;
     for ( ; index<limit; index++ ) if ( condition ) count++;

as "larger" than

     for ( index = 0, count = 0; index<limit; index++ )
         if ( condition ) count++;

The first example contains five "BNF statements":

     1.  index = 0;
     2.  count = 0;
     3.  for ( clause ) ...
     4.  if ...
     5.  count++

while the second example contains three.  Apart from showing the
ultimate folly of "measuring complexity", this shows that the metric
based on BNF statements can be refined although it is vastly superior
to line counts.  Since in C assignment is an operator, perhaps a
separate count of all expressions that use assignment should be
provided.  But you cannot escape the "complexity" of BNF.

>Prescription: choose something that is easy to count with a standard
>Unix tool: i.e., wc, grep, sed, etc.  Then, determine how accurate it is
>for the sample code that you are using - within 5%, within 2.5%,
>whatever.  Is this good enough for your usage?

Within 5 percent of what?  What's your independent measure of
complexity?  And note that yacc and lex are also standard unix tools.

>Don't get bogged down in what is a statement, or other metaphysical
>things.  Just find something that is well correlated with your needs,
>and measure it cheaply.  Anything else misses the point.

There is nothing "metaphysical" about using the right tool for the job.
Unless "metaphysical"=="I don't understand it".
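The two fragments Ed compares can be wrapped in functions to show that they compute the same result while scoring differently on a statement count. This is only a sketch: the function names, the values array and the test values[index] > 0 standing in for "condition" are assumptions made for illustration, and the assert and return lines are scaffolding rather than part of either fragment.

```c
#include <assert.h>
#include <stddef.h>

/* First fragment: five BNF statements
   (two expression statements, the for, the if, and count++). */
static int count_separate(const int *values, int limit)
{
    int index, count;

    assert(values != NULL || limit <= 0);
    index = 0;
    count = 0;
    for ( ; index < limit; index++ ) if ( values[index] > 0 ) count++;
    return count;
}

/* Second fragment: three BNF statements (the for, the if, and count++);
   the two initializations have moved into the for header. */
static int count_folded(const int *values, int limit)
{
    int index, count;

    assert(values != NULL || limit <= 0);
    for ( index = 0, count = 0; index < limit; index++ )
        if ( values[index] > 0 ) count++;
    return count;
}
```

Both functions return the same value for any input, which is the point: a pure statement count separates two pieces of code that do identical work.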
jch@hollie.rdg.dec.com (John Haxby) (03/28/91)
In article <7547@idunno.Princeton.EDU>, egnilges@phoenix.Princeton.EDU (Ed Nilges) writes:
|> Good point, Bob.  A strict metric using "statement" as defined in the
|> Backus-Naur Form definition of C would measure the following code
|> fragment
|>
|>
|>      index = 0;
|>      count = 0;
|>      for ( ; index<limit; index++ ) if ( condition ) count++;
|>
|>
|> as "larger" than
|>
|>
|>      for ( index = 0, count = 0; index<limit; index++ )
|>          if ( condition ) count++;
|>

Actually, both of these contain six "terms" (I'm
not sure what C calls these primitive units, that's
why I've put it in quotes).  In both cases the terms
are

	index = 0
	count = 0
	index < limit
	index++
	condition
	count++

If you want a metric that doesn't depend (much) on
coding style, then I suggest you count terms rather
than statements.  The metric has inaccuracies when
you consider statements like

	index = count = 0;

which is still one term and

	index++ < limit;

which is also one term.

There is no absolute metric for counting useful
chunks of code (what does useful mean?), it's better
to choose a metric and know what the limitations of
that metric are than to spend weeks writing some tool
(ie a parser) that counts chunks and then falls over
in a heap because you have to run it through the C
pre-processor first.

Personally, I count semi-colons, I know it won't
deal correctly with

	if (something-failing)
		print (error), exit (1);

but then I'm not that bothered--I don't really
regard this as more than one chunk anyway.  And I know
that the metric will differ between programmer
styles, and I don't mind that.  I don't mind these
inaccuracies because I'm only after about a 10%
accuracy--I want to know if two programs are
more-or-less the same size, or one is half the size
of the other; I want to know, roughly, what proportion
of the code is non-functional (whether comments or
whitespace).  Mind you, that tends to be clouded by
whether people include RCS change histories in their
files: every change adds a minimum of three lines of
comment!
--
John Haxby, Definitively Wrong.
Digital				<jch@wessex.rdg.dec.com>
Reading, England		<...!ukc!wessex!jch>
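One rough way to get at the proportion John mentions (how much of a file is comments or whitespace) is to classify each line as blank, comment-only or code. The sketch below is an assumption-laden illustration, not anyone's actual tool: the names line_counts and classify_lines are hypothetical, only /* ... */ comments are recognized, and a comment opener inside a string literal will be misread as a comment.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical tally of line kinds in a piece of C source text. */
struct line_counts { long blank, comment_only, code; };

static struct line_counts classify_lines(const char *s)
{
    struct line_counts c = {0, 0, 0};
    int in_comment = 0;                 /* currently inside a comment      */
    int saw_code = 0, saw_comment = 0;  /* what the current line contained */
    size_t len = 0;                     /* characters seen on this line    */

    assert(s != NULL);
    for (;;) {
        if (*s == '\n' || *s == '\0') {
            /* A final line without a newline counts only if non-empty. */
            if (*s == '\n' || len > 0) {
                if (saw_code)         c.code++;
                else if (saw_comment) c.comment_only++;
                else                  c.blank++;
            }
            if (*s == '\0')
                break;
            saw_code = 0;
            saw_comment = in_comment;   /* a comment may span lines */
            len = 0;
            s++;
            continue;
        }
        len++;
        if (in_comment) {
            if (s[0] == '*' && s[1] == '/') { in_comment = 0; s++; len++; }
        } else if (s[0] == '/' && s[1] == '*') {
            in_comment = 1; saw_comment = 1; s++; len++;
        } else if (!isspace((unsigned char)*s)) {
            saw_code = 1;
        }
        s++;
    }
    return c;
}
```

A line that holds both code and a comment is counted as code, so (blank + comment_only) divided by the total gives a rough "non-functional" fraction. RCS change histories, as John notes, would inflate the comment_only figure.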
EGNILGES@pucc.Princeton.EDU (Ed Nilges) (03/29/91)
In article <1991Mar28.091725.17574@hollie.rdg.dec.com>, jch@hollie.rdg.dec.com (John Haxby) writes:
>
>Actually, both of these contain six "terms" (I'm
>not sure what C calls these primitive units, that's
>why I've put it in quotes).  In both cases the terms
>are
>
>	index = 0
>	count = 0
>	index < limit
>	index++
>	condition
>	count++

"Terms" don't form a well-defined syntactic category.  C has
expressions and C has statements.  C doesn't have "terms".  You could
try to define a "term" as a "simple" expression consisting of no
subexpressions but then how many of these "terms" does "a+b*c" have?
One?  Two?  One and a half?

You could build a shaky metric on expression complexity and work
is available in this area, but here the metric would be very, very
complex.  You'd have to take into account differences in
psychological complexity of operators: is division more complicated
than addition?

>
>If you want a metric that doesn't depend (much) on
>coding style, then I suggest you count terms rather
>than statements.  The metric has inaccuracies when
>you consider statements like
>
>	index = count = 0;
>
>which is still one term and

One and a half?

>
>	index++ < limit;
>
>which is also one term.
>
>There is no absolute metric for counting useful
>chunks of code (what does useful mean?), it's better
>to choose a metric and know what the limitations of
>that metric are than to spend weeks writing some tool
>(ie a parser) that counts chunks and then falls over
>in a heap because you have to run it through the C
>pre-processor first.

Huh?

Run what through the C preprocessor first?  I don't think you'd
want to run the measured code through the PP first because your
engineers will deal with the unpreprocessed code in nearly all
cases.

There is no absolute metric but there are lousy metrics and good
metrics.  As to spending weeks, some compilers will produce counts
of BNF statements and you could awk/sed/grep the output of such
a compiler to strip everything but the count, and voila there's
your statement count.  There should be a readily available skeleton
C compiler in the public domain consisting of nothing more than a
parser and a lexer that people could modify to build good, solid,
syntax-driven tools.

Your example of "term" is a good engineer trying to do mathematics
without having the time to do mathematics.  Mathematicians and
language designers have already cooked up the syntax of C in
Backus-Naur Form and devising sed/awk tools that ignore this
work is labor that may seem efficient but which actually wastes
the work done by the person who first formalized the syntax of
C.

+--------------------------------+   Edward G. Nilges
| Child support, tax-deductible  |   Princeton University
| to payer AND receiver: an idea |   Information Center
| whose time has come.           |   Bitnet: EGNILGES@PUCC
+--------------------------------+   (609) 258-2985
jch@hollie.rdg.dec.com (John Haxby) (04/11/91)
In article <12644@pucc.Princeton.EDU>, EGNILGES@pucc.Princeton.EDU (Ed Nilges) writes:
|> >There is no absolute metric for counting useful
|> >chunks of code (what does useful mean?), it's better
|> >to choose a metric and know what the limitations of
|> >that metric are than to spend weeks writing some tool
|> >(ie a parser) that counts chunks and then falls over
|> >in a heap because you have to run it through the C
|> >pre-processor first.
|>
|> Huh?
|>
|> Run what through the C preprocessor first?  I don't think you'd
|> want to run the measured code through the PP first because your
|> engineers will deal with the unpreprocessed code in nearly all
|> cases.

You can't parse C without running the source through the PP
first--what about people that define things like "BEGIN" and "END"
to be "{" and "}" and other less obvious syntactic munging?

|>
|> Your example of "term" is a good engineer trying to do mathematics
|> without having the time to do mathematics.  Mathematicians and
|> language designers have already cooked up the syntax of C in
|> Backus-Naur Form and devising sed/awk tools that ignore this
|> work is labor that may seem efficient but which actually wastes
|> the work done by the person who first formalized the syntax of
|> C.

You're right.  Actually, an ex-mathematician.  C remains, however,
difficult to do anything at all with because, as we both pointed out,
it doesn't have obviously countable chunks.  Better designed or
simpler languages have easy to recognize code `chunks'.
--
John Haxby, Definitively Wrong.
Digital				<jch@wessex.rdg.dec.com>
Reading, England		<...!ukc!wessex!jch>
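A small illustration of the kind of macro use John has in mind. The function name clamp_to_zero is made up for the example. Before preprocessing, this source contains no literal braces at all, so a parser (or a brace counter) working on the raw text would not see a function body or a compound statement here.

```c
/* Pascal-flavoured macros of the kind mentioned in the post. */
#define BEGIN {
#define END   }

/* Hypothetical example function: compiles as ordinary C once the
   preprocessor has replaced BEGIN and END with braces. */
static int clamp_to_zero(int x)
BEGIN
    if (x < 0)
    BEGIN
        x = 0;
    END
    return x;
END
```

A semicolon count gives the same answer (two) before and after preprocessing for this fragment, which is one reason the cheap metric is insensitive to this particular trick.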
egnilges@phoenix.Princeton.EDU (Ed Nilges) (04/12/91)
In article <1991Apr11.160835.15130@hollie.rdg.dec.com> jch@hollie.rdg.dec.com (John Haxby) writes:
>
>|> Run what through the C preprocessor first?  I don't think you'd
>|> want to run the measured code through the PP first because your
>|> engineers will deal with the unpreprocessed code in nearly all
>|> cases.
>
>You can't parse C without running the source through the PP
>first--what about people that define things like "BEGIN" and "END"
>to be "{" and "}" and other less obvious syntactic munging?
>

Although this is an excellent point, I don't think our metric should
concern itself with pathological uses of the C preprocessor such as
you describe, or it should give a "complexity" rating of "infinity"
to a program that uses the preprocessor to alter syntax.  There is no
need to replace, for example, left curly bracket by BEGIN and the
programmer doing it should be sent to Code Camp until he forgets
he ever knew Pascal :-).

The metric should determine if the preprocessor is being used at the
level you describe, and fail deliberately if it is.

>
>
>You're right.  Actually, an ex-mathematician.  C remains, however,
>difficult to do anything at all with because, as we both pointed out,
>it doesn't have obviously countable chunks.  Better designed or
>simpler languages have easy to recognize code `chunks'.

What languages?  Any block-structured language with a many-many
relationship between lines of code and statements is going to have
hard-to-count chunks.  FORTRAN and assembler will be "simpler"
but does this mean that they are better designed?

I maintain still you cannot "measure" complexity in any straightforward
sense acceptable to the beancounter mentality.
jch@hollie.rdg.dec.com (John Haxby) (04/12/91)
In article <8176@idunno.Princeton.EDU>, egnilges@phoenix.Princeton.EDU (Ed Nilges) writes:
|> >                                       Better designed or
|> >simpler languages have easy to recognize code `chunks'.
|>
|> What languages?  Any block-structured language with a many-many
|> relationship between lines of code and statements is going to have
|> hard-to-count chunks.  FORTRAN and assembler will be "simpler"
|> but does this mean that they are better designed?
|>
|> I maintain still you cannot "measure" complexity in any straightforward
|> sense acceptable to the beancounter mentality.

If you're treating code as a cost (I hope) then the metric
is probably not as simple as the number of chunks.

I would hate to confuse simplicity with good design.  Good
designs tend to be simple, but simple designs aren't
necessarily good.

I think chunk counting in CLU is fairly easy: once you can decide
on the chunks you are going to count.
--
John Haxby, Definitively Wrong.
Digital				<jch@wessex.rdg.dec.com>
Reading, England		<...!ukc!wessex!jch>
daves@hpopd.pwd.hp.com (Dave Straker) (04/17/91)
On measuring complexity...

Before you measure anything, you should ask: Why am I doing this?
What am I going to do with the information? What questions will
it help answer? What decisions will it help me make?

Thus, for example, you might be measuring complexity in order
to find defect-prone modules which you might inspect.

Note the word 'help' in the above questions. Metrics don't make
decisions for you. Thus, you wouldn't just blindly inspect all
modules with a McCabe factor of >10, but you would look at them,
and ask: Does this look like inspection would help? The final
decision is human.

Dave Straker            Pinewood Information Systems Division (PWD not PISD)
[8-{)                   HPDESK: David Straker/HP1600/01
                        Unix:   daves@hpopd.pwd.hp.com
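For readers who want a feel for the "McCabe factor" Dave mentions: for structured code, cyclomatic complexity is commonly estimated as one plus the number of decision points. The sketch below is a crude token-level approximation under that assumption, not a real McCabe tool. The names is_decision_keyword and approx_mccabe are hypothetical, and the input is assumed to be a single function body with comments and string literals already stripped.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Keywords treated as decision points in this rough estimate. */
static int is_decision_keyword(const char *word, size_t len)
{
    static const char *const kw[] = { "if", "for", "while", "case" };
    size_t i;

    for (i = 0; i < sizeof kw / sizeof kw[0]; i++)
        if (strlen(kw[i]) == len && strncmp(kw[i], word, len) == 0)
            return 1;
    return 0;
}

/* Estimate: 1 + decision keywords + occurrences of &&, || and ?. */
static int approx_mccabe(const char *s)
{
    int n = 1;

    assert(s != NULL);
    while (*s != '\0') {
        unsigned char ch = (unsigned char)*s;

        if (isalpha(ch) || ch == '_') {
            const char *start = s;          /* scan a whole identifier */
            while (isalnum((unsigned char)*s) || *s == '_')
                s++;
            if (is_decision_keyword(start, (size_t)(s - start)))
                n++;
        } else if (isdigit(ch)) {           /* skip numbers such as 0x1f */
            while (isalnum((unsigned char)*s) || *s == '_')
                s++;
        } else if ((s[0] == '&' && s[1] == '&') ||
                   (s[0] == '|' && s[1] == '|')) {
            n++;
            s += 2;
        } else {
            if (ch == '?')
                n++;
            s++;
        }
    }
    return n;
}
```

In the spirit of the post, a number from something like this is only a prompt to go and look at a module; whether an inspection would help remains a human call.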
alan@tivoli.UUCP (Alan R. Weiss) (04/19/91)
In article <36650004@hpopd.pwd.hp.com> daves@hpopd.pwd.hp.com (Dave Straker) writes:
>On measuring complexity...
>
>Before you measure anything, you should ask: Why am I doing this?
>What am I going to do with the information? What questions will
>it help answer? What decisions will it help me make?
>
>Thus, for example, you might be measuring complexity in order
>to find defect-prone modules which you might inspect.
>
>Note the word 'help' in the above questions. Metrics don't make
>decisions for you. Thus, you wouldn't just blindly inspect all
>modules with a McCabe factor of >10, but you would look at them,
>and ask: Does this look like inspection would help? The final
>decision is human.
>
>Dave Straker            Pinewood Information Systems Division (PWD not PISD)
>[8-{)                   HPDESK: David Straker/HP1600/01
>                        Unix:   daves@hpopd.pwd.hp.com

YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!

Thank you.

_______________________________________________________________________
Alan R. Weiss                           TIVOLI Systems, Inc.
E-mail: alan@tivoli.com                 6034 West Courtyard Drive,
E-mail: alan@whitney.tivoli.com         Suite 210
Voice : (512) 794-9070                  Austin, Texas USA  78730
Fax   : (512) 794-0623
_______________________________________________________________________
jgautier@vangogh.ads.com (Jorge Gautier) (04/22/91)
In article <36650004@hpopd.pwd.hp.com> daves@hpopd.pwd.hp.com (Dave Straker) writes:
> Before you measure anything, you should ask: Why am I doing this?
> What am I going to do with the information? What questions will
> it help answer? What decisions will it help me make?

I hope you mean not only to ask, but to answer as well.  I would also
add questions on validity and reliability of the measures.  Using
metrics without answering these questions is the worst of the
pseudo-sciences that comprise "software engineering."
--
Jorge A. Gautier| "The enemy is at the gate.  And the enemy is the human mind
jgautier@ads.com|  itself--or lack of it--on this planet."  -General Boy
DISCLAIMER:  All statements in this message are false.