[comp.software-eng] Counting semicolons

richard@iesd.auc.dk (Richard Flamsholt S0rensen) (03/13/91)

>>>>> On 11 Mar 91 09:06:47 GMT, raj@crosfield.co.uk (Ray Jones) said:
Ray> In article <1991Mar6.214157.18633@ntpal.uucp> dcavasso@ntpal.uucp (Dana Cavasso) writes:
>
>     I need a "C" code line counter program, preferably written in
>"C".  It will be used on several platforms, so solutions involving
>shell scripts and other UNIX utilities won't work.  I'm not very 
>picky (although I'd like something that did a little more than count 
>newlines :-) 

Ray> So how about one that counts semi-colons :-)

  Doesn't sound that silly after all. The only things (somebody correct
me if I'm wrong) that "terminate" statements are semicolons and
right braces. (Of course I know that a right brace doesn't terminate a
statement in the sense that you could write x = 1} y = 3; . What I mean
is that when you encounter a }, you may be at the end of a compound
statement, the only statement that's *not* terminated by a semicolon.)

 The problem is that
	if (x == 7) {
	  y = 1;
	}
  - would count as two statements, though that's actually quite correct;
a compound statement is a statement in itself and need not include any
statements at all inside it; here it does include one, so all in all
we have two statements.

  In a line counting program, however, you probably wouldn't want the
{}'s themselves to count as a statement, would you? Furthermore, if we
count }'s as statements this also makes function definitions, structs,
unions, enums and initializers - which all end with a } - look like
statements.

  Therefore, I think the idea of dismissing the compound statements
and only counting semicolons is all right (as long as you stay out of
comments, character constants and strings, of course ...) - and if I'm
wrong, somebody'll probably tell me so  ;-)
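
  A minimal sketch of such a counter, written in C as the original
request asked. The name count_semicolons and the state names are
illustrative assumptions, not anything posted in the thread; the
scanner knows nothing about the preprocessor, trigraphs or line
splices.

```c
#include <stddef.h>

/* Lexical states for a very small scanner over C source text. */
enum scan_state { CODE, COMMENT, STRING, CHARLIT };

/*
 * count_semicolons: hypothetical helper, not from the thread.
 * Counts ';' characters in src that lie outside comments, string
 * literals and character constants.
 */
size_t count_semicolons(const char *src)
{
    enum scan_state state = CODE;
    size_t count = 0;
    size_t i;

    for (i = 0; src[i] != '\0'; i++) {
        char c = src[i];

        switch (state) {
        case CODE:
            if (c == '/' && src[i + 1] == '*') {
                state = COMMENT;
                i++;                    /* skip the '*' */
            } else if (c == '"') {
                state = STRING;
            } else if (c == '\'') {
                state = CHARLIT;
            } else if (c == ';') {
                count++;
            }
            break;
        case COMMENT:
            if (c == '*' && src[i + 1] == '/') {
                state = CODE;
                i++;                    /* skip the '/' */
            }
            break;
        case STRING:
        case CHARLIT:
            if (c == '\\' && src[i + 1] != '\0') {
                i++;                    /* skip the escaped character */
            } else if ((state == STRING && c == '"') ||
                       (state == CHARLIT && c == '\'')) {
                state = CODE;
            }
            break;
        }
    }
    return count;
}
```

  Fed the if example above, this returns 1: the braces are ignored and
only the assignment is counted.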

  Richard

--
/Richard Flamsholt
richard@iesd.auc.dk

frank@grep.co.uk (Frank Wales) (03/14/91)

In article <RICHARD.91Mar12184440@dompap.iesd.auc.dk> richard@iesd.auc.dk 
(Richard Flamsholt S0rensen) writes:
>The only things (somebody correct me if I'm wrong) that "terminate"
>statements are semicolons and right braces. 
>
> The problem is that
>	if (x == 7) {
>	  y = 1;
>	}
>  - would count as two statements, though that's actually quite correct;

Then it's not a problem, is it?

>Furthermore, if we
>count }'s as statements this also makes function definitions, structs,
>unions, enums and initializers - which all end with a } - look like
>statements.

You mean you don't consider all that stuff to be statements?  If one
is attempting to make statement counts meaningful, surely they must
include all components of the program, and that must include all the above.
--
Frank Wales, Grep Limited,             [frank@grep.co.uk<->uunet!grep!frank]
Kirkfields Business Centre, Kirk Lane, LEEDS, UK, LS19 7LX. (+44) 532 500303

lws@comm.wang.com (Lyle Seaman) (03/15/91)

richard@iesd.auc.dk (Richard Flamsholt S0rensen) writes:
>  In a line counting program, however, you probably wouldn't want the
>{}'s themselves to count as a statement, would you? Furthermore, if we
>count }'s as statements this also makes function definitions, structs,
>unions, enums and initializers - which all end with a } - look like
>statements.

>  Therefore, I think the idea of dismissing the compound statements
>and only counting semicolons is all right (as long as you stay out of
>comments, character constants and strings, of course ...) - and if I'm
>wrong, somebody'll probably tell me so  ;-)

Well, that's a matter of interpretation.  I think compound statements
are a useful complexity indicator, and reasonably simple to count.
I guess it depends on *why* you're counting things in your code--
what your goals are, what you hope to be able to use such data for.

-- 
Lyle 	508 967 2322  		"We have had television problems directly
lws@capybara.comm.wang.com 	 attributable to something not understandable"
Wang Labs, Lowell, MA, USA 	 - unnamed believer in poltergeists

jameson@jade.uucp (Kevin Jameson) (03/16/91)

If the author figured that something (semicolons, brackets, partial boolean 
expressions, etc) was worth putting on a line by itself when the code was 
written, wouldn't it be reasonable to count it as a distinct line of code?

Surely something like this

if ((some-long-condition1
     && some-other-long-condition2)
     || third condition) {
        ...;
        ...;
}   

should count as more lines of code than something like this:

if (debug)
   ...;

EGNILGES@pucc.Princeton.EDU (Ed Nilges) (03/16/91)

In article <1991Mar15.231326.972@jade.uucp>, jameson@jade.uucp (Kevin Jameson) writes:

>If the author figured that something (semicolons, brackets, partial boolean
>expressions, etc) was worth putting on a line by itself when the code was
>written, wouldn't it be reasonable to count it as a distinct line of code?
>
>Surely something like this
>
>if ((some-long-condition1
>     && some-other-long-condition2)
>     || third condition) {
>        ...;
>        ...;
>}
>
>should count as more lines of code than something like this:

This is one of the few sensible remarks on this issue I've seen (other
than my own, of course).  While the facts remain that:

     *  "Line of code" is not a meaningful concept in a free-form
        language like C

     *  "Statement" as precisely defined by the Backus-Naur Form
        definition of the language is the only meaningful entity
        to count (if, like that character in Sesame Street, you
        Must Count)

     *  "Measuring complexity" is suspect

you are right that some sort of meaningful complexity of intent is
shown by spreading a complex if conditional over a couple of lines
of code.

But then what is more complex?

     if ( logical_cond_1 || logical_cond_2 && logical_cond_3 ) ...

or

     if ( logical_cond_1
          ||
          logical_cond_2 && logical_cond_3 ) ...

The programmer of the second example has used more lines of code...but
has, I think, more perspicuously expounded the intent of her code.  She
places the major operator on a line by itself, and displays the
fact that the logical AND has higher priority than the OR by keeping
it on the same line as its operands.

Yet a "lines of code" metric would score the first example as being
simpler and (as we all know) Simplicity is Good.

The only responsible metric is not psychological, it is mathematical:
it is counting statements according to the strict Backus-Naur Form
definition of  the language.  And to avoid scoring

     { a = b; }      /* Two statements: an assignment & a compound */

as having more statements than

     a = b;          /* One statement: an assignment */

you must do what an optimizing compiler does and detect code that adds
nothing to the "real", executable program...not only compound statements
with zero or one inner statements but also unreachable "dead" code.
But even if you do not do what an optimizer does and your statement
counter counts dead code and useless compounds, your metric is
considerably superior to any pseudo-clever one-line sed or awk rune.

Why is it so darn important to get a "good" metric (and where's the
meta-metric that tells me what's better?  just asking.)  Because,
dammit, you are not really measuring software.  You are measuring
people.
+--------------------------------+ Edward G. Nilges
| Child support, tax-deductible  | Princeton University
| to payer AND receiver: an idea | Information Center
| whose time has come.           | Bitnet: EGNILGES@PUCC
+--------------------------------+ (609) 258-2985

bobd@zaphod.UUCP (Bob Dalgleish) (03/26/91)

Define your accuracy requirements!

Some C statements don't show up as such because they are primarily
expressions and show up in expression lists:
	for ( count = 0, f = front; f->value; f = f->next) {
		count++;
	}
would show only two statements, even though six kinds of actions are
occurring.

Other arguments as to what constitutes a statement have been better
presented by others.

Prescription: choose something that is easy to count with a standard
Unix tool: e.g., wc, grep, sed, etc.  Then, determine how accurate it is
for the sample code that you are using - within 5%, within 2.5%,
whatever.  Is this good enough for your usage?

Don't get bogged down in what is a statement, or other metaphysical
things.  Just find something that is well correlated with your needs,
and measure it cheaply.  Anything else misses the point.
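
The accuracy check can be sketched in C as follows. The names
percent_error and within_tolerance are hypothetical, and the sketch
assumes a trusted reference count exists for the sample (for instance
one obtained by hand), which is an assumption the prescription above
leaves open.

```c
/*
 * percent_error: hypothetical helper, not from the thread.
 * Returns how far a cheap count deviates from a trusted reference
 * count, as a percentage of the reference.  Returns -1.0 if the
 * reference is zero, since the ratio is then undefined.
 */
double percent_error(unsigned long cheap, unsigned long reference)
{
    double diff;

    if (reference == 0)
        return -1.0;
    diff = cheap > reference ? (double)(cheap - reference)
                             : (double)(reference - cheap);
    return 100.0 * diff / (double)reference;
}

/* Nonzero if the cheap count is within tolerance_pct of the reference. */
int within_tolerance(unsigned long cheap, unsigned long reference,
                     double tolerance_pct)
{
    double err = percent_error(cheap, reference);

    return err >= 0.0 && err <= tolerance_pct;
}
```

A cheap count of 1030 against a reference of 1000 is 3% off: good
enough for a 5% requirement, not for a 2.5% one.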
-- 
-- * * * Remember: I before E except after DALGL * * *--
Bob Dalgleish		bobd@zaphod.UUCP

egnilges@phoenix.Princeton.EDU (Ed Nilges) (03/27/91)

In article <4196@zaphod.UUCP> bobd@zaphod.UUCP (Bob Dalgleish) writes:
>Define your accuracy requirements!
>
>Some C statements don't show up as such because they are primarily
>expressions and show up in expression lists:

Good point, Bob.  A strict metric using "statement" as defined in the
Backus-Naur Form definition of C would measure the following code
fragment

     index = 0;
     count = 0;
     for ( ; index<limit; index++ ) if ( condition ) count++;

as "larger" than                     

     for ( index = 0, count = 0; index<limit; index++ )
         if ( condition ) count++;

The first example contains five "BNF statements":

     1.  index = 0;
     2.  count = 0;
     3.  for ( clause ) ...
     4.  if ...
     5.  count++

while the second example contains three.  
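
A rough C sketch that reproduces these two counts without a real
parser: it adds up semicolons outside parentheses and the keywords
if, for, while and switch. The name approx_statement_count is
hypothetical, and the heuristic is an assumption standing in for a
grammar-driven count: it skips compound statements, counts
declarations, counts a do-while twice, and does not recognise
comments or strings.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if the n characters at word spell one of the counted keywords. */
static int is_stmt_keyword(const char *word, size_t n)
{
    static const char *const keywords[] = { "if", "for", "while", "switch" };
    size_t k;

    for (k = 0; k < sizeof keywords / sizeof keywords[0]; k++) {
        if (strlen(keywords[k]) == n && strncmp(word, keywords[k], n) == 0)
            return 1;
    }
    return 0;
}

/*
 * approx_statement_count: hypothetical helper, not from the thread.
 * Heuristic only: semicolons at parenthesis depth 0 plus statement
 * keywords.  It is not a parse of the C grammar.
 */
size_t approx_statement_count(const char *src)
{
    size_t count = 0;
    int depth = 0;          /* current parenthesis nesting depth */
    size_t i = 0;

    while (src[i] != '\0') {
        unsigned char c = (unsigned char)src[i];

        if (isalpha(c) || c == '_') {
            /* Consume a whole identifier or keyword. */
            size_t start = i;

            while (isalnum((unsigned char)src[i]) || src[i] == '_')
                i++;
            if (is_stmt_keyword(src + start, i - start))
                count++;
            continue;
        }
        if (c == '(')
            depth++;
        else if (c == ')' && depth > 0)
            depth--;
        else if (c == ';' && depth == 0)
            count++;        /* semicolons inside for (...) are skipped */
        i++;
    }
    return count;
}
```

On the first fragment this yields 5 and on the second 3, matching the
counts above.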

Apart from showing the ultimate folly of "measuring complexity", this
shows that the metric based on BNF statements can be refined although
it is vastly superior to line counts.  Since in C assignment is an
operator, perhaps a separate count of all expressions that use
assignment should be provided.  But you cannot escape the "complexity"
of BNF.

>Prescription: choose something that is easy to count with a standard
>Unix tool: e.g., wc, grep, sed, etc.  Then, determine how accurate it is
>for the sample code that you are using - within 5%, within 2.5%,
>whatever.  Is this good enough for your usage?

Within 5 percent of what?  What's your independent measure of
complexity?

And note that yacc and lex are also standard Unix tools.

>Don't get bogged down in what is a statement, or other metaphysical
>things.  Just find something that is well correlated with your needs,
>and measure it cheaply.  Anything else misses the point.

There is nothing "metaphysical" about using the right tool for the
job.  Unless "metaphysical"=="I don't understand it".

jch@hollie.rdg.dec.com (John Haxby) (03/28/91)

In article <7547@idunno.Princeton.EDU>, egnilges@phoenix.Princeton.EDU (Ed Nilges) writes:
|> Good point, Bob.  A strict metric using "statement" as defined in the
|> Backus-Naur Form definition of C would measure the following code
|> fragment
|> 
|> 
|>      index = 0;
|>      count = 0;
|>      for ( ; index<limit; index++ ) if ( condition ) count++;
|> 
|> 
|> as "larger" than                     
|> 
|> 
|>      for ( index = 0, count = 0; index<limit; index++ )
|>          if ( condition ) count++;
|> 

Actually, both of these contain six "terms" (I'm
not sure what C calls these primitive units, that's
why I've put it in quotes).  In both cases the terms
are

	index = 0
	count = 0
	index < limit
	index++
	condition
	count++

If you want a metric that doesn't depend (much) on
coding style, then I suggest you count terms rather
than statements. The metric has inaccuracies when
you consider statements like

	index = count = 0;

which is still one term and

	index++ < limit;

which is also one term.

There is no absolute metric for counting useful
chunks of code (what does useful mean?); it's better
to choose a metric and know what the limitations of
that metric are than to spend weeks writing some tool
(i.e. a parser) that counts chunks and then falls over
in a heap because you have to run it through the C
pre-processor first.

Personally, I count semi-colons.  I know it won't deal
correctly with

	if (something-failing)
		print (error),
		exit (1);

but then I'm not that bothered--I don't really regard
this as more than one chunk anyway. And I know that
the metric will differ between programmer styles, and I
don't mind that.  I don't mind these inaccuracies because
I'm only after about a 10% accuracy--I want to know
if two programs are more-or-less the same size, or
one is half the size of the other; I want to know,
roughly, what proportion of the code is non-functional
(whether comments or whitespace).  Mind you, that tends to
be clouded by whether people include RCS change histories
in their files: every change adds a minimum of three lines of comment!
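
A C sketch of the "non-functional proportion" measurement. The name
nonfunctional_fraction is hypothetical, and measuring by characters
rather than by lines is an assumption made to keep the example short.
String and character literals are not recognised, so a comment opener
inside a string is miscounted.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * nonfunctional_fraction: hypothetical helper, not from the thread.
 * Returns the fraction (0.0 to 1.0) of characters in src that are
 * whitespace or belong to a comment, delimiters included.
 * Returns 0.0 for empty input.
 */
double nonfunctional_fraction(const char *src)
{
    size_t total = 0, idle = 0;
    int in_comment = 0;
    size_t i;

    for (i = 0; src[i] != '\0'; i++) {
        total++;
        if (in_comment) {
            idle++;
            if (src[i] == '*' && src[i + 1] == '/') {
                i++;            /* consume the closing '/' as well */
                total++;
                idle++;
                in_comment = 0;
            }
        } else if (src[i] == '/' && src[i + 1] == '*') {
            i++;                /* consume the '*' as well */
            total++;
            idle += 2;
            in_comment = 1;
        } else if (isspace((unsigned char)src[i])) {
            idle++;
        }
    }
    return total == 0 ? 0.0 : (double)idle / (double)total;
}
```

For two programs of similar size, comparing the two fractions is
enough for the rough 10% picture described above.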
-- 
John Haxby, Definitively Wrong.
Digital				<jch@wessex.rdg.dec.com>
Reading, England		<...!ukc!wessex!jch>

EGNILGES@pucc.Princeton.EDU (Ed Nilges) (03/29/91)

In article <1991Mar28.091725.17574@hollie.rdg.dec.com>, jch@hollie.rdg.dec.com (John Haxby) writes:
>
>Actually, both of these contain six "terms" (I'm
>not sure what C calls these primitive units, that's
>why I've put it in quotes).  In both cases the terms
>are
>
>        index = 0
>        count = 0
>        index < limit
>        index++
>        condition
>        count++

"Terms" don't form a well-defined syntactic category.  C has
expressions and C has statements.  C doesn't have "terms".

You could try to define a "term" as a "simple" expression consisting
of no subexpressions but then how many of these "terms" does "a+b*c"
have?  One?  Two? One and a half?

You could build a shaky metric on expression complexity, and work is
available in this area, but here the metric would be very, very
complex.  You'd have to take into account differences in the
psychological complexity of operators: is division more complicated
than addition?
>
>If you want a metric that doesn't depend (much) on
>coding style, then I suggest you count terms rather
>than statements. The metric has inaccuracies when
>you consider statements like
>
>        index = count = 0;
>
>which is still one term and

One and a half?

>
>        index++ < limit;
>
>which is also one term.
>
>There is no absolute metric for counting useful
>chunks of code (what does useful mean?), it's better
>to choose a metric and know what the limitations of
>that metric are than to spend weeks writing some tool
>(ie a parser) that counts chunks and then falls over
>in a heap because you have to run it through the C
>pre-processor first.

Huh?

Run what through the C preprocessor first?  I don't think you'd
want to run the measured code through the PP first because your
engineers will deal with the unpreprocessed code in nearly all
cases.

There is no absolute metric but there are lousy metrics and
good metrics.  As to spending weeks, some compilers will produce
counts of BNF statements and you could awk/sed/grep the output
of such a compiler to strip everything but the count, and voila
there's your statement count.  There should be a readily available
skeleton C compiler in the public domain consisting of nothing
more than a parser and a lexer that people could modify to
build good, solid, syntax-driven tools.

Your example of "term" is a good engineer trying to do mathematics
without having the time to do mathematics.  Mathematicians and
language designers have already cooked up the syntax of C in
Backus-Naur Form and devising sed/awk tools that ignore this
work is labor that may seem efficient but which actually wastes
the work done by the person who first formalized the syntax of
C.
+--------------------------------+ Edward G. Nilges
| Child support, tax-deductible  | Princeton University
| to payer AND receiver: an idea | Information Center
| whose time has come.           | Bitnet: EGNILGES@PUCC
+--------------------------------+ (609) 258-2985

jch@hollie.rdg.dec.com (John Haxby) (04/11/91)

In article <12644@pucc.Princeton.EDU>, EGNILGES@pucc.Princeton.EDU (Ed Nilges) writes:

|> >There is no absolute metric for counting useful
|> >chunks of code (what does useful mean?), it's better
|> >to choose a metric and know what the limitations of
|> >that metric are than to spend weeks writing some tool
|> >(ie a parser) that counts chunks and then falls over
|> >in a heap because you have to run it through the C
|> >pre-processor first.
|> 
|> Huh?
|> 
|> Run what through the C preprocessor first?  I don't think you'd
|> want to run the measured code through the PP first because your
|> engineers will deal with the unpreprocessed code in nearly all
|> cases.

You can't parse C without running the source through the PP
first--what about people who define things like "BEGIN" and "END"
to be "{" and "}" and other less obvious syntactic munging?

|> 
|> Your example of "term" is a good engineer trying to do mathematics
|> without having the time to do mathematics.  Mathematicians and
|> language designers have already cooked up the syntax of C in
|> Backus-Naur Form and devising sed/awk tools that ignore this
|> work is labor that may seem efficient but which actually wastes
|> the work done by the person who first formalized the syntax of
|> C.


You're right.  Actually, I'm an ex-mathematician.  C remains, however,
difficult to do anything at all with because, as we both pointed out,
it doesn't have obviously countable chunks.  Better designed or
simpler languages have easy-to-recognize code `chunks'.
-- 
John Haxby, Definitively Wrong.
Digital				<jch@wessex.rdg.dec.com>
Reading, England		<...!ukc!wessex!jch>

egnilges@phoenix.Princeton.EDU (Ed Nilges) (04/12/91)

In article <1991Apr11.160835.15130@hollie.rdg.dec.com> jch@hollie.rdg.dec.com (John Haxby) writes:
>
>|> Run what through the C preprocessor first?  I don't think you'd
>|> want to run the measured code through the PP first because your
>|> engineers will deal with the unpreprocessed code in nearly all
>|> cases.
>
>You can't parse C without running the source through the PP
>first--what about people who define things like "BEGIN" and "END"
>to be "{" and "}" and other less obvious syntactic munging?
>

Although this is an excellent point, I don't think our metric should
concern itself with pathological uses of the C preprocessor such as
you describe, or it should give a "complexity" rating of "infinity"
to a program that uses the preprocessor to alter syntax.  There is
no need to replace, for example, left curly bracket by BEGIN and the
programmer doing it should be sent to Code Camp until he forgets he
ever knew Pascal :-).  The metric should determine if the preprocessor
is being used at the level you describe, and fail deliberately if it
is.
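
One way to sketch that deliberate failure in C: scan the #define
lines and refuse to measure if any replacement text has unbalanced
braces. The name alters_block_syntax is hypothetical, and "unbalanced
braces" is only an assumed stand-in for "alters syntax"; continued
lines (backslash-newline) and braces inside string literals are not
handled.

```c
#include <string.h>

/*
 * alters_block_syntax: hypothetical helper, not from the thread.
 * Returns 1 if any "#define" line in src has a replacement text whose
 * braces do not balance (for example "#define BEGIN {"), else 0.
 */
int alters_block_syntax(const char *src)
{
    const char *line = src;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        const char *stop = line + len;
        const char *p = line;

        while (p < stop && (*p == ' ' || *p == '\t'))
            p++;
        if (p < stop && *p == '#') {
            p++;
            while (p < stop && (*p == ' ' || *p == '\t'))
                p++;
            if ((size_t)(stop - p) >= 6 && strncmp(p, "define", 6) == 0) {
                int balance = 0;

                for (p += 6; p < stop; p++) {
                    if (*p == '{')
                        balance++;
                    else if (*p == '}')
                        balance--;
                    if (balance < 0)
                        return 1;   /* closes a block it never opened */
                }
                if (balance != 0)
                    return 1;       /* opens a block it never closes */
            }
        }
        line = stop;
        if (*line == '\n')
            line++;
    }
    return 0;
}
```

A counting tool could call this first and report "not measurable"
instead of producing a number for such a file.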

>
>
>You're right.  Actually, an ex-mathematician. C remains, however,
>difficult to do anything at all with because, as we both pointed out,
>it doesn't have obviously countable chunks.  Better designed or
>simpler languages have easy to recognize code `chunks'.

What languages?  Any block-structured language with a many-many
relationship between lines of code and statements is going to have
hard-to-count chunks.  FORTRAN and assembler will be "simpler"
but does this mean that they are better designed?

I still maintain that you cannot "measure" complexity in any
straightforward sense acceptable to the beancounter mentality.

jch@hollie.rdg.dec.com (John Haxby) (04/12/91)

In article <8176@idunno.Princeton.EDU>, egnilges@phoenix.Princeton.EDU (Ed Nilges) writes:
|> >                                             Better designed or
|> >simpler languages have easy to recognize code `chunks'.
|> 
|> What languages?  Any block-structured language with a many-many
|> relationship between lines of code and statements is going to have
|> hard-to-count chunks.  FORTRAN and assembler will be "simpler"
|> but does this mean that they are better designed?
|> 
|> I maintain still you cannot "measure" complexity in any straightforward
|> sense acceptable to the beancounter mentality.

If you're treating code as a cost (I hope) then
the metric is probably not as simple as the number
of chunks.

I would hate to confuse simplicity with good design.
Good designs tend to be simple, but simple designs
aren't necessarily good.

I think chunk counting in CLU is fairly easy, once you
have decided on the chunks you are going to count.
-- 
John Haxby, Definitively Wrong.
Digital				<jch@wessex.rdg.dec.com>
Reading, England		<...!ukc!wessex!jch>

daves@hpopd.pwd.hp.com (Dave Straker) (04/17/91)

On measuring complexity...

Before you measure anything, you should ask: Why am I doing this?
What am I going to do with the information? What questions will
it help answer? What decisions will it help me make?

Thus, for example, you might be measuring complexity in order
to find defect-prone modules which you might inspect.

Note the word 'help' in the above questions. Metrics don't make
decisions for you. Thus, you wouldn't just blindly inspect all
modules with a McCabe factor of >10, but you would look at them,
and ask: Does this look like inspection would help? The final
decision is human.
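
As a sketch of how such a screen might look in C: the functions below
approximate McCabe's cyclomatic number for one function body as one
plus the number of decision points, and flag anything over 10 for a
human to look at. The names approx_cyclomatic and needs_a_look are
hypothetical, and counting if, for, while, case, &&, || and ? as
decision points is an assumed approximation that ignores comments and
strings.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if the n characters at word spell a branching keyword. */
static int is_decision_keyword(const char *word, size_t n)
{
    static const char *const keywords[] = { "if", "for", "while", "case" };
    size_t k;

    for (k = 0; k < sizeof keywords / sizeof keywords[0]; k++) {
        if (strlen(keywords[k]) == n && strncmp(word, keywords[k], n) == 0)
            return 1;
    }
    return 0;
}

/*
 * approx_cyclomatic: hypothetical helper, not from the thread.
 * Returns 1 + the number of decision points found in body.
 */
unsigned approx_cyclomatic(const char *body)
{
    unsigned decisions = 0;
    size_t i = 0;

    while (body[i] != '\0') {
        unsigned char c = (unsigned char)body[i];

        if (isalpha(c) || c == '_') {
            /* Consume a whole identifier or keyword. */
            size_t start = i;

            while (isalnum((unsigned char)body[i]) || body[i] == '_')
                i++;
            if (is_decision_keyword(body + start, i - start))
                decisions++;
            continue;
        }
        if ((c == '&' && body[i + 1] == '&') ||
            (c == '|' && body[i + 1] == '|')) {
            decisions++;        /* short-circuit operators branch too */
            i += 2;
            continue;
        }
        if (c == '?')
            decisions++;
        i++;
    }
    return decisions + 1;
}

/* Nonzero if the body exceeds the threshold of 10 mentioned above. */
int needs_a_look(const char *body)
{
    return approx_cyclomatic(body) > 10;
}
```

The result only nominates candidates; whether to inspect a flagged
module remains the reviewer's call.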

Dave Straker            Pinewood Information Systems Division (PWD not PISD)
[8-{)                   HPDESK: David Straker/HP1600/01
                        Unix:   daves@hpopd.pwd.hp.com

alan@tivoli.UUCP (Alan R. Weiss) (04/19/91)

In article <36650004@hpopd.pwd.hp.com> daves@hpopd.pwd.hp.com (Dave Straker) writes:
>On measuring complexity...
>
>Before you measure anything, you should ask: Why am I doing this?
>What am I going to do with the information? What questions will
>it help answer? What decisions will it help me make?
>
>Thus, for example, you might be measuring complexity in order
>to find defect-prone modules which you might inspect.
>
>Note the word 'help' in the above questions. Metrics don't make
>decisions for you. Thus, you wouldn't just blindly inspect all
>modules with a McCabe factor of >10, but you would look at them,
>and ask: Does this look like inspection would help? The final
>decision is human.
>
>Dave Straker            Pinewood Information Systems Division (PWD not PISD)
>[8-{)                   HPDESK: David Straker/HP1600/01
>                        Unix:   daves@hpopd.pwd.hp.com

 YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
 YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
 YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
 YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
 YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
 YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
 YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
 YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
 YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!
 YESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYESYES!!

Thank you.

_______________________________________________________________________
Alan R. Weiss                           TIVOLI Systems, Inc.
E-mail: alan@tivoli.com                 6034 West Courtyard Drive,
E-mail: alan@whitney.tivoli.com	        Suite 210
Voice : (512) 794-9070                  Austin, Texas USA  78730
Fax   : (512) 794-0623
_______________________________________________________________________

jgautier@vangogh.ads.com (Jorge Gautier) (04/22/91)

In article <36650004@hpopd.pwd.hp.com> daves@hpopd.pwd.hp.com (Dave Straker) writes:
>   Before you measure anything, you should ask: Why am I doing this?
>   What am I going to do with the information? What questions will
>   it help answer? What decisions will it help me make?

I hope you mean not only to ask, but to answer as well.  I would also
add questions on validity and reliability of the measures.  Using
metrics without answering these questions is the worst of the
pseudo-sciences that comprise "software engineering."
--
Jorge A. Gautier| "The enemy is at the gate.  And the enemy is the human mind
jgautier@ads.com|  itself--or lack of it--on this planet."  -General Boy
DISCLAIMER: All statements in this message are false.