[comp.software-eng] WANTED: "C" code line counter program

dcavasso@ntpal.uucp (Dana Cavasso) (03/07/91)

     I need a "C" code line counter program, preferably written in
"C".  It will be used on several platforms, so solutions involving
shell scripts and other UNIX utilities won't work.  I'm not very 
picky (although I'd like something that did a little more than count 
newlines :-) 

     With the growing trend toward gathering metrics, I expect
such beasts are out there in force.  If you would be willing to
share your source, let me know.

-- 
Dana Cavasso                            | "A rock pile ceases to be a rock pile
dcavasso%ntpal@egsner.cirr.com          | the moment a single man contemplates 
ntpal!dcavasso@egsner.cirr.com          | it, bearing within him the image of a
...!cs.utexas.edu!egsner!ntpal!dcavasso | cathedral." - Antoine de Saint-Exupery

theo.bbs@shark.cs.fau.edu (Theo Heavey) (03/09/91)

dcavasso@ntpal.uucp (Dana Cavasso) writes:

> 
>      I need a "C" code line counter program, preferably written in
> "C".  It will be used on several platforms, so solutions involving
> shell scripts and other UNIX utilities won't work.  I'm not very 
> picky (although I'd like something that did a little more than count 
> newlines :-) 
> 
Why not use the "wc" program on the UNIX systems?  It gives a line
count --- not very sophisticated, BUT the source may be available
for altering!

Theo Heavey
Florida Atlantic University

EGNILGES@pucc.Princeton.EDU (Ed Nilges) (03/09/91)

In article <1991Mar6.214157.18633@ntpal.uucp>, dcavasso@ntpal.uucp (Dana Cavasso) writes:

>
>     I need a "C" code line counter program, preferably written in
>"C".  It will be used on several platforms, so solutions involving
>shell scripts and other UNIX utilities won't work.  I'm not very
>picky (although I'd like something that did a little more than count
>newlines :-)
>
>     With the growing trend toward gathering metrics, I expect
>such beasts are out there in force.  If you would be willing to
>share your source, let me know.

This request sounds innocuous: but think about it for a second.

The following nonsense C code has three lines:


     a = 1;
     a = a+c;
     a = c*7;


and the following equivalent code has one line!


     a = 1; a = a+c; a = c*7;


It gets worse.  How many lines in the following?



     if ( a = 1 )
        { a = 0; c = 1; }


Two, says the FORTRAN programmer.  But if a line is roughly equivalent
to a statement, there is only ONE line in the above...a line that
contains two lines inside it...

In short, a "line" of C code is a meaningless concept.  Count STATEMENTS.
If you must count.  In the last example above, there are 3 statements.

Software engineering: computer science for people who can't program
(Edsger Dijkstra.)

raj@crosfield.co.uk (Ray Jones) (03/11/91)

In article <1991Mar6.214157.18633@ntpal.uucp> dcavasso@ntpal.uucp (Dana Cavasso) writes:
>
>     I need a "C" code line counter program, preferably written in
>"C".  It will be used on several platforms, so solutions involving
>shell scripts and other UNIX utilities won't work.  I'm not very 
>picky (although I'd like something that did a little more than count 
>newlines :-) 

So how about one that counts semi-colons :-)

Ray

-- 
- raj@cel.co.uk           - Ray Jones, Crosfield Electronics, -
- raj@crosfield.co.uk     - Hemel Hempstead, HP2 7RH  UK      -

lws@comm.wang.com (Lyle Seaman) (03/12/91)

dcavasso@ntpal.uucp (Dana Cavasso) writes:
>     I need a "C" code line counter program, preferably written in
>"C".  It will be used on several platforms, so solutions involving

Counting semi-colons is a pretty good approach, as that counts C 
statements.  Lines is kind of less meaningful.  Counting '{' is 
an interesting one, too.

-- 
Lyle 	508 967 2322  		"We have had television problems directly
lws@capybara.comm.wang.com 	 attributable to something not understandable"
Wang Labs, Lowell, MA, USA 	 - unnamed believer in poltergeists

mitch@hq.af.mil (Mitch Wright) (03/12/91)

/* 
 * On 11 Mar 91 09:06:47 GMT, 
 * raj@crosfield.co.uk (Ray Jones) said:
 * 
 */ 

>     I need a "C" code line counter program, preferably written in
>"C".  It will be used on several platforms, so solutions involving
>shell scripts and other UNIX utilities won't work.  I'm not very 
>picky (although I'd like something that did a little more than count 
>newlines :-) 

Ray> So how about one that counts semi-colons :-)

Because:

#include <stdio.h>
main()
{
  printf(";;;; Hello World ;;;;\n");
}



		:-)

--
  ..mitch

   mitch@hq.af.mil (Mitch Wright) | The Pentagon, 1B1046 | (703) 695-0262

   ``A system without PERL is like a hockey game without a fight.''
		-- Mitch Wright

bhoughto@pima.intel.com (Blair P. Houghton) (03/12/91)

In article <1991Mar11.182848.26693@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes:
>Counting semi-colons is a pretty good approach, as that counts C 
>statements.  Lines is kind of less meaningful.  Counting '{' is 
>an interesting one, too.

{{{{{{{{printf("Oh, if I were a rich man... ;;;;;;;;;;;;;;;;;;;;;;;\n");}}}}}}}}

				--Blair
				  "Currently sleeping with my eyes open."

eversole@acae037.cadence.com (Richard Eversole; x6239) (03/12/91)

-- 
In article <MITCH.91Mar11175953@hq.af.mil>, mitch@hq.af.mil (Mitch Wright) writes:
|> >     I need a "C" code line counter program, preferably written in
|> >"C".  It will be used on several platforms, so solutions involving

|> Ray> So how about one that counts semi-colons :-)
|> 
|> Because:
|> 
|> #include <stdio.h>
|> main()
|> {
|>   printf(";;;; Hello World ;;;;\n");
|> }

But that is still only 1 semi-colon. It is very simple code to ignore
	quoted strings !!!!

Someone who knows YACC & LEX can write that in only a few lines. Take 
maybe an hour to do. (Of course counting only semi-colons not in
quoted strings using only C code would take not much more than the
same hour to code.)

Counting semi-colons is a trivial programming task.
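
A minimal sketch of the "C code only" version, for illustration.  The
function name count_semicolons is made up here, and the scanner assumes
the input is ordinary C with /* ... */ comments and no trigraphs or
line-splicing tricks.  It walks the text once and counts a semicolon
only while it is outside string literals, character constants and
comments:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count ';' characters in src that are not inside
   a string literal, a character constant, or a block comment. */
size_t count_semicolons(const char *src)
{
    enum { CODE, STRING, CHARLIT, COMMENT } state = CODE;
    size_t count = 0;
    const char *p;

    assert(src != NULL);
    for (p = src; *p != '\0'; p++) {
        switch (state) {
        case CODE:
            if (*p == '"') {
                state = STRING;
            } else if (*p == '\'') {
                state = CHARLIT;
            } else if (p[0] == '/' && p[1] == '*') {
                state = COMMENT;
                p++;                    /* skip the '*' */
            } else if (*p == ';') {
                count++;
            }
            break;
        case STRING:
        case CHARLIT:
            if (*p == '\\' && p[1] != '\0')
                p++;                    /* skip the escaped character */
            else if (*p == (state == STRING ? '"' : '\''))
                state = CODE;
            break;
        case COMMENT:
            if (p[0] == '*' && p[1] == '/') {
                state = CODE;
                p++;                    /* skip the '/' */
            }
            break;
        }
    }
    return count;
}
```

Fed the body of the "Hello World" example, this should report one
semicolon rather than nine.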
  
  =====================================================================

    eversole@cadence.com
  
    Live long and prosper !

dave@cs.arizona.edu (Dave P. Schaumann) (03/12/91)

(had to trim alt.sources.wanted since we don't get it here...)

In article <2969@inews.intel.com> bhoughto@pima.intel.com (Blair P. Houghton) writes:
|In article <1991Mar11.182848.26693@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes:
|>Counting semi-colons is a pretty good approach, as that counts C 
|>statements.  Lines is kind of less meaningful.  Counting '{' is 
|>an interesting one, too.
|
|{{{{{{{{printf("Oh, if I were a rich man... ;;;;;;;;;;;;;;;;;;;;;;;\n");}}}}}}}}

I think I'd have to call this a pathological case.  Bidee bidee bum.

I think the root of the trouble is that we have no clear-cut
definition of what a "line of code" is.

So what we really need is to define what we mean by "# of statements" by
setting up some rules like
	o declaration/assignment/function call is 1 statement

	o rules for counting constructs like if/while/for
	  a good idea might be to just count the # of statements it groups, so
	  if( foo ) S would have the same statement count as S (perhaps +1)

	o a rule for do S while() - perhaps the same as above, but you may
	  want to treat this separately.

	o { S } has the same statement count as S

	o rules for macros, comments, function headers, etc.

This scheme seems to be workable to me, and has lots of room for fine-tuning
the results to your personal taste.  And it should be a simple matter to hack
together a fairly simple yacc grammar to parse the code.

Note that with this scheme (assuming strings are parsed correctly), the
above "pathological" case now has a statement count of 1, as we might
expect.

-- 
Dave Schaumann | dave@cs.arizona.edu | Short .sig's rule!

pyoung@axion.bt.co.uk (Pete Young) (03/13/91)

From article <2969@inews.intel.com>, by bhoughto@pima.intel.com (Blair P. Houghton):
> In article <1991Mar11.182848.26693@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes:
>>Counting semi-colons is a pretty good approach, as that counts C 
>>statements.  Lines is kind of less meaningful.  Counting '{' is 
>>an interesting one, too.

> {{{{{{{{printf("Oh, if I were a rich man... ;;;;;;;;;;;;;;;;;;;;;;;\n");}}}}}}}}


Tee Hee.

Good point though.

Counting lines, or semicolons, or braces is much more meaningful if
you have some kind of standard to compare your figures with. In this
instance such a standard might take the form of a set of guidelines
about the use of symbols in comments, layout and indentation of code
etc. Or even a machine to generate the code from a specification
(don't scoff too loud, it might happen one day!)

It seems to me (although I am quite prepared to admit I'm wrong) that
there are two generic questions about gathering metrics. The first is,
"what do I want to know about this
program/specification/bridge/whatever?" The second is "What can I
measure to get this information?"

Counting statements is a possible answer to the second question. So,
has the first question been satisfactorily answered? In many cases, I
suspect not. But counting lines of code is a lot easier than thinking
about useful measures of the size and complexity of a program.



  ____________________________________________________________________
  Pete Young         pyoung@axion.bt.co.uk        Phone +44 473 645054
  British  Telecom  Research Labs,  Martlesham Heath   IPSWICH IP5 7RE

cpm00@duts.ccc.amdahl.com (Craig P McLaughlin) (03/13/91)

In article <1991Mar11.182848.26693@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes:
>dcavasso@ntpal.uucp (Dana Cavasso) writes:
>>     I need a "C" code line counter program, preferably written in
>>"C".  It will be used on several platforms, so solutions involving
>
>Counting semi-colons is a pretty good approach, as that counts C 
>statements.  Lines is kind of less meaningful.  Counting '{' is 
>an interesting one, too.
>

  Counting semi-colons may miscount setups like the one below:

    while(condition)
        do_this;

  That's two, I think. :)  What about counting newlines, but ignoring those
that immediately follow another newline (ie, skip blank lines)?

Craig McLaughlin   cpm00@duts.ccc.amdahl.com   V:(408)737-5502
   I think it's time to come up with a witty signature and disclaimer...

cml@tove.cs.umd.edu (Christopher Lott) (03/13/91)

In article <1142@caslon.cs.arizona.edu> dave@cs.arizona.edu (Dave P. Schaumann) writes:
>So what we really need is to define what we mean by "# of statements" by
>setting up some rules like
>	o declaration/assignment/function call is 1 statement
>	o rules for counting constructs like if/while/for
>	  a good idea might just count the # of statements it groups, so
>	  if( foo ) S would have the same statement count as S (perhaps +1)
>	o a rule for do S while() - perhaps the same as above, but you may
>	  want to treat this separately.
>	o { S } has the same statement count as S
>	o rules for macros, comments, function headers, etc.


First off, remember that SLOC is not a good metric for answering hard
questions like productivity.  It's a great measure of size, though, and
cheap to compute.

Define your SLOC carefully - but I recommend that you contact the IEEE
and get their standards document on this (sorry, don't have the ref).  Anyone?
Then apply your definition carefully to projects in your environment - 
but don't even try to compare your numbers to those from another environment
until you and your fellow SLOC-counter have discussed how you defined SLOC.

For example, someone has quoted numbers from Japanese software companies.
I am told that these folks usually report their numbers in assembler-equivalent
lines - what size they believe their program would be in assembler.
This is ok, I guess, if you have an assembler available beneath your compiler
(like C) but who knows what they did for other environments?

Good luck.  A tool to do what you desire will be straightforward to build.
Lessee, are you offering consultant rates?  I might be interested  :-)  :-)

chris...
--
Christopher Lott \/ Dept of Comp Sci, Univ of Maryland, College Park, MD 20742
  cml@cs.umd.edu /\ 4122 AV Williams Bldg  301.405.2721 <standard disclaimers>

ikluft@uts.amdahl.com (Ian Kluft) (03/13/91)

In article <MITCH.91Mar11175953@hq.af.mil> mitch@hq.af.mil (Mitch Wright) writes:
>>     I need a "C" code line counter program, [...]
>
>Ray> So how about one that counts semi-colons :-)
>
>Because:
>
>#include <stdio.h>
>main()
>{
>  printf(";;;; Hello World ;;;;\n");
>}

For any simple data-gathering tool, there's always a pathological case.  You
just named one.  But counting semicolons still isn't a bad idea.

It's a little more work, but try counting uncommented, unquoted semicolons.
(Lex and yacc could be used to put that together in a couple of hours.
Unfortunately, I don't have a couple of hours to spare.  Maybe someone else
can take the idea somewhere.)  That ought to give a usable and reasonably
accurate count.
-- 
#include                                                     <std-disclaimer.h>
#define      UTS                     (( Unix System V ) + ( Amdahl mainframe ))
-------------------------------------------------------------------------------
Ian Kluft                              UTS Systems Software, Amdahl Corporation
ikluft@uts.amdahl.com                                           Santa Clara, CA

jgautier@vangogh.ads.com (Jorge Gautier) (03/13/91)

In article <1991Mar6.214157.18633@ntpal.uucp> dcavasso@ntpal.uucp (Dana Cavasso) writes:
>	I need a "C" code line counter program, [...]

If you're on *nix, try "wc".  Actually, "ls -l" on the file will give
you a more accurate "metric," since the number of characters per line may
vary.  They were both written in C, I think.
--
Jorge A. Gautier| "The enemy is at the gate.  And the enemy is the human mind
jgautier@ads.com|  itself--or lack of it--on this planet."  -General Boy
DISCLAIMER: All statements in this message are false.

session@uncw.UUCP (Zack C. Sessions) (03/14/91)

cpm00@duts.ccc.amdahl.com (Craig P McLaughlin) writes:

|In article <1991Mar11.182848.26693@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes:
||dcavasso@ntpal.uucp (Dana Cavasso) writes:
|||     I need a "C" code line counter program, preferably written in
|||"C".  It will be used on several platforms, so solutions involving
||
||Counting semi-colons is a pretty good approach, as that counts C 
||statements.  Lines is kind of less meaningful.  Counting '{' is 
||an interesting one, too.
||

|  Counting semi-colons may miscount setups like the one below:

|    while(condition)
|        do_this;

|  That's two, I think. :)  What about counting newlines, but ignoring those
|that immediately follow another newline (ie, skip blank lines)?

|Craig McLaughlin   cpm00@duts.ccc.amdahl.com   V:(408)737-5502

Counting newlines may not be the way to go either. It is perfectly
legitimate for a statement to span multiple source lines. Take a
complex if() condition, for example, which you might spread over a
few lines for readability. A true C source line counter would almost
have to be the front end to a full compiler.

Zack Sessions
session@uncw.UUCP

EGNILGES@pucc.Princeton.EDU (Ed Nilges) (03/14/91)

In article <1991Mar12.163607.18799@axion.bt.co.uk>, pyoung@axion.bt.co.uk (Pete Young) writes:
>
>Counting statements is a possible answer to the second question. So,
>has the first question been satisfactorily answered? In many cases, I
>suspect not. But counting lines of code is a lot easier than thinking
>about useful measures of the size and complexity of a program.

I'd like to make a philosophical point.  Since it questions the
foundations of software engineering, busy managers and engineers
may wish to skip this post, or save it for later perusal.

Much of software engineering seems to be a doomed attempt to
measure complexity.  I pointed out in a previous post that if
you count lines of code in a C program you may be misapplying a
model valid in the one-statement-per-line FORTRAN world but invalid
in the world of C, since C statements may be many per line, may be
spread over many lines, and may even contain other statements.

So the best metric is to count statements, and given a good yacc
description of C and a lexical analyser such a tool is
easy to develop.

But even if you have measured statements, have you measured
complexity?  A program written in simple French (using French
rather than English identifiers) is "too complex to maintain"
when given to an American programmer since there is a 99.99%
probability that such a programmer has never learned French.
Complexity is a PSYCHOLOGICAL property.

I suggest that complexity PRECEDES measurement.

Instead of measuring complexity and appealing to your programmers
to reduce the complexity of their code, I respectfully submit
that software engineers concentrate on providing the environment
wherein the programmers can write CORRECT code...at whatever
level of complexity is appropriate.  If this means that software
engineers spend their time designing good PHYSICAL environments
for programmers, determining the best location of vending machines
and making certain that programmers have privacy and interaction
when necessary, then such activity would be more productive than
what goes under the rubric of software engineering.

Although there certainly is a lot of crud code out there, much of
it is not "overly complex".  A lot of it is overly simple, not
capturing the problem requirements.  And a lot of it is wrong.

Telling a programmer who has written a correct program that is
complicated according to your metric is peculiarly offensive.  It
reminds me of the critics in the movie Amadeus, who tell the
young Mozart that his music has "too many notes."  Not all
programmers are Mozarts, but let's not go in the opposite direction,
which I believe abuses a valuable economic resource (the nation's
programming workforce) by being concerned that its members jump
over artificial and ill-considered hoops represented by software
metrics.

rar@saturn.ads.com (Bob Riemenschneider) (03/14/91)

In article <12583@pucc.Princeton.EDU> EGNILGES@pucc.Princeton.EDU (Ed Nilges) writes:

=>   ...  A program written in simple French (using French
=>   rather than English identifiers) is "too complex to maintain"
=>   when given to an American programmer since there is a 99.99%
=>   probability that such a programmer has never learned French.
=>   Complexity is a PSYCHOLOGICAL property.

A non sequitur if I've ever seen one!  All you've shown is that a
program that isn't complex may still be hard to understand if, e.g.,
identifiers are poorly chosen.  Everyone knows that.

=>   ... If this means that software
=>   engineers spend their time designing good PHYSICAL environments
=>   for programmers, determining the best location of vending machines
=>   and making certain that programmers have privacy and interaction
=>   when necessary, then such activity would be more productive than
=>   what goes under the rubric of software engineering. ...

Optimizing the physical environment is important, and a great deal
of work has gone into figuring out what the best environment is --
from serious ergonomic studies to more informal studies, such as
DeMarco and Lister's _Peopleware_.  It doesn't follow that complexity
metrics are useless.  Programming is hard for many reasons, and different
people have chosen to address different limited sets of reasons.  This
is called "separation of concerns".

=>   ... Telling a programmer who has written a correct program that is
=>   complicated according to your metric is peculiarly offensive. ...

Obviously, correctness is a desirable property of programs.  But it's
not the only desirable property.  In the "real world", where far more
is spent on maintaining the average program than on developing it --
and the maintenance is usually done by someone other than the original
programmer -- simplicity is also a very desirable property.  In many
cases, a correct but overly complex program is useless.  What do you
find so offensive in asking a programmer to make a useless program
more useful?

=>   ... [Software metrics are] artificial and ill-considered hoops [that
=>   programmers are forced to jump through].

While some metrics are, no doubt, fairly worthless, others seem to
correlate strongly with our intuitive notion of complexity.  (This
isn't guesswork based on personal prejudice.  Studies have been
performed to determine how well complexity measures correlate with
understandability -- see, e.g., the proceedings of the "Empirical
Studies of Programmers" workshops.)  They may not be perfect, but
they're better than nothing, so people use them.

							-- rar

jeffv@bisco.kodak.com (Jeff Van Epps) (03/14/91)

I suppose what one really wants to know is: "how much code is there", i.e.
what is the quantity of effective instructions?

Use the size of the object (*.o) file.   :-)

-- 
If the From: line says nobody@kodak.com, don't believe it.

    Jeff Van Epps          jeffv@bisco.kodak.com
                           rochester!kodak!bisco!jeffv

mwb@ulysses.att.com (Michael W. Balk) (03/14/91)

In article <1991Mar11.182848.26693@comm.wang.com>, lws@comm.wang.com (Lyle Seaman) writes:
> dcavasso@ntpal.uucp (Dana Cavasso) writes:
> >     I need a "C" code line counter program, preferably written in
> >"C".  It will be used on several platforms, so solutions involving
> 
> Counting semi-colons is a pretty good approach, as that counts C 
> statements.  Lines is kind of less meaningful.  Counting '{' is 
> an interesting one, too.


If you just count semi-colons, then in for-loops such as

	for(i = 0; i < 10; i++)
	  {
	     ...
	  }


i = 0; and i < 10; will be counted as individual statements.
In fact they are, but if you want to count for( ... ) as a single statement
then count the semi-colons and correct the count by subtracting 1 for every
for-statement.  There might be other cases like this that you may want to
consider.  Then again, in most cases this is just probably nit-picking.
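
The correction described above can be sketched roughly as follows.  The
function name is invented for this example, and the sketch assumes the
text contains no semicolons or the word "for" inside strings or
comments (those would need to be stripped first):

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: count every ';' once, then subtract 1 for each
   `for` keyword, so a for-header contributes one statement, not two. */
long corrected_statement_count(const char *src)
{
    long semis = 0, fors = 0;
    const char *p;

    assert(src != NULL);
    p = src;
    while (*p != '\0') {
        if (isalpha((unsigned char)*p) || *p == '_') {
            /* Scan a whole identifier or keyword so that names such as
               "before" or "for_each" are not mistaken for `for`. */
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            if (p - start == 3 && strncmp(start, "for", 3) == 0)
                fors++;
        } else {
            if (*p == ';')
                semis++;
            p++;
        }
    }
    return semis - fors;
}
```

For a loop with a one-statement body, this gives 2: one for the
for-statement itself and one for the body.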

dlee@pallas.athenanet.com (Doug Lee) (03/14/91)

In article <dcda02id05Q.01@JUTS.ccc.amdahl.com> cpm00@DUTS.ccc.amdahl.com (PUT YOUR NAME HERE) writes:
>What about counting newlines, but ignoring those
>that immediately follow another newline (ie, skip blank lines)?

My first thought was to skip all comments (single- and multi-line) and then
count only lines containing characters other than whitespace.  This should
be close, though it will still overcount on constructs like
    if (( <long_condition_1> ) ||
        ( <long_condition_2> ) ||
        ... )
Then again, maybe a line that long *should* count as more than one line.  We
also run into the somewhat common declaration syntax
    char *
    foo()
which, by my method, counts as two lines.

Unfortunately, I see no quick way to give a consistent line count regardless
of program syntax.  Counting lines ending in '{' or ';' (after removing
comments and trailing whitespace) would catch most loops and function
definitions without counting them more than once, but constructs like
    while (line = get_next_line(file))
        (void) process_line(line);
would still count only once unless the braces were included (not a bad idea,
imho).  We need a more precise definition of "line" for this, I fear.
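
For what it's worth, the first method (skip comments, then count lines
that still contain something other than whitespace) might look roughly
like this.  The name count_code_lines is made up, and the sketch does
not try to recognise a comment opener that appears inside a string
literal:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count lines that contain at least one
   non-whitespace character outside of block comments. */
size_t count_code_lines(const char *src)
{
    size_t lines = 0;
    int in_comment = 0;     /* currently inside a block comment */
    int has_code = 0;       /* current line has a code character */
    const char *p;

    assert(src != NULL);
    for (p = src; *p != '\0'; p++) {
        if (*p == '\n') {
            lines += has_code;          /* close off the current line */
            has_code = 0;
        } else if (in_comment) {
            if (p[0] == '*' && p[1] == '/') {
                in_comment = 0;
                p++;                    /* skip the '/' */
            }
        } else if (p[0] == '/' && p[1] == '*') {
            in_comment = 1;
            p++;                        /* skip the '*' */
        } else if (!isspace((unsigned char)*p)) {
            has_code = 1;
        }
    }
    return lines + has_code;    /* last line may lack a newline */
}
```

As predicted, a declaration split as "char *" and "foo()" counts as two
lines under this scheme.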

Does this remind anyone else of _The Mythical Man Month_?  :-)

-- 
Doug Lee  (dlee@athenanet.com or {bradley,uunet}!pallas!dlee)

ksh@ai.mit.edu (K. Shane Hartman) (03/14/91)

>For example, someone has quoted numbers from Japanese software companies.
>I am told that these folks usually report their numbers in assembler
>equivalent lines - what size they believe their program would be in assembler.
>This is ok, I guess, if you have an assembler available beneath your compiler
>(like C) but who knows what they did for other environments?

Counting assembler equivalents is useless given optimizing compilers
of varying abilities.  Better (functional) metrics such as function
points and feature points have been around for a while.  Functional
metrics can be applied to 'languages' which have no lines of code
(code generation from design diagrams for example).

-[Shane]->

dalamb@avi.umiacs.umd.edu (David Lamb) (03/15/91)

In article <1031@pallas.athenanet.com> dlee@pallas.athenanet.com (Doug Lee) writes:
>...  We need a more precise definition of "line" for this, I fear.

An excellent comment.  People interested in size metrics such as
lines-of-code and number-of-tokens should read

%A T. Capers Jones
%T Programming Productivity
%I McGraw-Hill
%C New York
%D 1986

It has a very good discussion on the flaws of size metrics (especially
lines-of-code), but also how to get the most value out of them despite
the flaws.



--

David Alex Lamb				internet: dalamb@umiacs.umd.edu

lws@comm.wang.com (Lyle Seaman) (03/15/91)

bhoughto@pima.intel.com (Blair P. Houghton) writes:

>In article <1991Mar11.182848.26693@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes:
>>Counting semi-colons is a pretty good approach, as that counts C 
>>statements.  Lines is kind of less meaningful.  Counting '{' is 
>>an interesting one, too.

>{{{{{{{{printf("Oh, if I were a rich man... ;;;;;;;;;;;;;;;;;;;;;;;\n");}}}}}}}}

Yeah, but that could just as easily be written:
{
{
{
{
{
{
{
{
printf
(
"Oh, if I were a rich man... ;;;;;;;;;;;;;;;;;;;;;;;\n"
)
;
}
}
}
}
}
}
}
}

So either simple approach is susceptible to intentional obfuscation
(but then, most such schemes are).  No one claimed that counting semis 
and curlies was foolproof.  You've demonstrated that it isn't.  On the
other hand, seasoned coders don't usually use ; and } to such excess.
(Yes, there are *occasional* duplicates).

However, they do usually include quite a few redundant newlines.  Comments,
preprocessor directives and white space are very common, and apparently
the original poster didn't wish to count them.  I stand by my suggestion.

-- 
Lyle 	508 967 2322  		"We have had television problems directly
lws@capybara.comm.wang.com 	 attributable to something not understandable"
Wang Labs, Lowell, MA, USA 	 - unnamed believer in poltergeists

EGNILGES@pucc.Princeton.EDU (Ed Nilges) (03/15/91)

In article <9122@suns6.crosfield.co.uk>, raj@crosfield.co.uk (Ray Jones) writes:

>In all the silly examples of multiple semicolons, nobody has mentioned
>the obvious one;
>
>        for(i=0; i<end; i++)
>
>How many statements? One, surely.

COME ON, people!  Use the Backus Naur Form definition of the language, as
printed in The C Programming Language, second edition, p. 234.  Here we
read that a statement of the type "iteration statement" is (among other
things)


     for ( optexp; optexp; optexp ) statement



which means that
>
>        for(i=0; i<end; i++)

is NOT "one, surely": it is not a well-formed statement.
+--------------------------------+ Edward G. Nilges
| Child support, tax-deductible  | Princeton University
| to payer AND receiver: an idea | Information Center
| whose time has come.           | Bitnet: EGNILGES@PUCC
+--------------------------------+ (609) 258-2985

kenr@cruise.cc.rochester.edu (Kenneth C. Rich) (03/15/91)

In article <9082@suns6.crosfield.co.uk> raj@crosfield.co.uk (ray a jones) writes:
>In article <1991Mar6.214157.18633@ntpal.uucp> dcavasso@ntpal.uucp (Dana Cavasso) writes:
>>
>>     I need a "C" code line counter program, preferably written in
>
>So how about one that counts semi-colons :-)

Seriously I emailed him a sh script that counted ';', '}' and '#',
to count simple statements, blocks of statements, and cpp directives.  !-)
Sorry, I discarded it.  !-(
It was a one liner, too, so it could be an alias in csh.  !-)
A nice improvement would be to make sure to not count them
inside quotes and comments...
--
-ken rich                   -=!=-                   kenr@cc.rochester.edu

jpc@fct.unl.pt (Jose Pina Coelho) (03/18/91)

In article <1991Mar14.192419.1576@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes:

   bhoughto@pima.intel.com (Blair P. Houghton) writes:

   >In article <1991Mar11.182848.26693@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes:
   >>Counting semi-colons is a pretty good approach, as that counts C 
   >>statements.  Lines is kind of less meaningful.  Counting '{' is 
   >>an interesting one, too.

   >{{{{{{{{printf("Oh, if I were a rich man... ;;;;;;;;;;;;;;;;;;;;;;;\n");}}}}}}}}

[... How to fool char counters ...]

Why not compile the sources and check the size of the object code?

--
Jose Pedro T. Pina Coelho   | BITNET/Internet: jpc@fct.unl.pt
Rua Jau N 1, 2 Dto          | UUCP: ...!mcsun!unl!jpc
1300 Lisboa, PORTUGAL       | Home phone: (+351) (1) 640767

- If all men were brothers, would you let one marry your sister ?

boyd@necisa.ho.necisa.oz.au (Boyd Roberts) (03/18/91)

What about?

    $ grep -c '[{});]$' *.c

Or?

    $ egrep -c '(^#)|([{});]$)' *.c
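
Since the original request ruled out UNIX utilities, here is a rough C
sketch of the same two patterns.  The function name and the
include_directives flag are invented for this example.  Like the grep
versions, it looks only at the very last character of each line, so
trailing whitespace or a trailing comment defeats it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count lines whose last character is one of
   { } ) ; and, if include_directives is nonzero, also lines that
   begin with '#'. */
size_t count_terminated_lines(const char *src, int include_directives)
{
    size_t count = 0;
    const char *line;

    assert(src != NULL);
    line = src;
    while (*line != '\0') {
        size_t len = strcspn(line, "\n");   /* length without newline */
        int ends_ok = len > 0 && strchr("{});", line[len - 1]) != NULL;
        int is_directive = include_directives && line[0] == '#';

        if (ends_ok || is_directive)
            count++;
        line += len;
        if (*line == '\n')
            line++;                         /* step over the newline */
    }
    return count;
}
```

Calling it with include_directives set to 0 corresponds to the first
command, and with 1 to the second.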


Boyd Roberts			boyd@necisa.ho.necisa.oz.au

``When the going gets weird, the weird turn pro...''

carl@p4tustin.UUCP (Carl W. Bergerson) (03/19/91)

dcavasso@ntpal.uucp (Dana Cavasso) writes:
> 
>      I need a "C" code line counter program, preferably written in
> "C".  It will be used on several platforms, so solutions involving
> shell scripts and other UNIX utilities won't work.  I'm not very 
> picky (although I'd like something that did a little more than count 
> newlines :-) 

In the October or November issue of Unix World the Wizard's Grabbag column
contained three programs for removing comments from C and C++ code. I
believe that one of them was in C.

Once you have the comments removed, you can use the wc program that is
listed in "Software Tools in Pascal" by Kernighan and (memory fails me).
Translating to C shouldn't be all that difficult.
-- 
Carl Bergerson                                           uunet!p4tustin!carl  
Point 4 Data Corporation                                     carl@point4.com
15442 Del Amo Avenue                                   Voice: (714) 259 0777
Tustin, CA 92680-6445                                    Fax: (714) 259 0921

hammes@dill.informatik.uni-kl.de (Stefan Hammes (HiWi Mattern)) (03/20/91)

In article <JPC.91Mar17162220@terra.fct.unl.pt>, jpc@fct.unl.pt (Jose
Pina Coelho) writes:
|>
|>In article <1991Mar14.192419.1576@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes:
|>
|>   bhoughto@pima.intel.com (Blair P. Houghton) writes:
|>
|>   >In article <1991Mar11.182848.26693@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes:
|>   >>Counting semi-colons is a pretty good approach, as that counts C 
|>   >>statements.  Lines is kind of less meaningful.  Counting '{' is 
|>   >>an interesting one, too.
|>
|>   >{{{{{{{{printf("Oh, if I were a rich man... ;;;;;;;;;;;;;;;;;;;;;;;\n");}}}}}}}}
|>
|>[... How to fool char counters ...]
|>
|>Why not compile the sources and checking the size of object code ?

Because this is very machine, compiler and linker dependent!

Stefan
                      
+---------------------------------------+-------------------------------------+
| Stefan Hammes                         | e-Mail: hammes@informatik.uni-kl.de |
| FB Informatik, SFB 124-D1             +-------------------------------------+
| Universitaet Kaiserslautern, P.O.Box 3049, D-W6750 Kaiserslautern, Germany  |
+-------------+------------------------------------------------+--------------+
              | Language definition: Recursion - see recursion |
              +------------------------------------------------+

hsrender@happy.colorado.edu (03/21/91)

In article <4816@berry19.UUCP>, crocker@motcid.UUCP (Ronald T. Crocker) writes:
> Ok, so back to the question at hand: does anyone have a C program that
> counts C lines of code.  I have a sed(1) script that will do it (I'll
> post it if anyone is interested), but that isn't what is wanted.  The
> following lex program "almost" works (It doesn't handle quoted strings
> quite correctly, and it counts lines that consist of only comments as
> lines of code...; it is an exercise for the reader to modify the
> program ;-> )

There's a set of tools filed under 'metrics' under the comp.sources.unix 
archive on gatekeeper.dec.com.  Included is a tool called kdsi for counting
C LOC.  There's also a stripcom tool that strips comments out of C programs.
I don't know what volume of the archive it's under, sorry.  All of this
stuff is available via anonymous ftp to gatekeeper or any other of the
comp.sources.unix archive sites.

hal.

EGNILGES@pucc.Princeton.EDU (Ed Nilges) (03/21/91)

In article <4816@berry19.UUCP>, crocker@motcid.UUCP (Ronald T. Crocker) writes:

>
>Get off of this guy's back already.  He wants a simple C program to
>count C lines of code.  Whether LOC is a valid metric is not the
>question at hand; LOC is, generally speaking, a good heuristic
>measurement of the size of a program.  It is not the only one, nor is
>it the best. As with any code-based metric, its value can be skewed by
>"non-conforming" code; you know, GIGO. Let him worry about ensuring
>the validity of the information, just get him a tool to help out
>counting lines of code.

A metric that counts statements is much better than an inaccurate
heuristic and a tool to do so, while not trivial, is easily
generated using yacc and lex.  I believe Richard Slomka is the
author of a book, "No Nonsense Management", in which he declares
that the fact that you cannot obtain perfect numbers does not mean
that you should stop searching for better numbers.
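
Short of a full yacc and lex grammar, a rough statement count can be had by
counting semicolons that sit outside comments, literals and parentheses.
This hand-written scanner is a stand-in for the generated parser suggested
above, not an equivalent; the name count_statements_approx is invented for
the example, and it will also count declarations.

```c
#include <stddef.h>

/* Hypothetical helper: count ';' characters that are not inside a
 * comment, a string or character literal, or a parenthesised group
 * (so the two semicolons in a for(;;) header are not counted). */
size_t count_statements_approx(const char *src)
{
    size_t count = 0;
    int depth = 0;              /* parenthesis nesting */
    size_t i = 0;

    while (src[i] != '\0') {
        char c = src[i];

        if (c == '/' && src[i + 1] == '*') {        /* skip a comment */
            i += 2;
            while (src[i] != '\0' &&
                   !(src[i] == '*' && src[i + 1] == '/'))
                i++;
            if (src[i] != '\0')
                i += 2;
            continue;
        }
        if (c == '"' || c == '\'') {                /* skip a literal */
            i++;
            while (src[i] != '\0' && src[i] != c) {
                if (src[i] == '\\' && src[i + 1] != '\0')
                    i++;
                i++;
            }
            if (src[i] != '\0')
                i++;
            continue;
        }
        if (c == '(')
            depth++;
        else if (c == ')' && depth > 0)
            depth--;
        else if (c == ';' && depth == 0)
            count++;
        i++;
    }
    return count;
}
```

On the brace-and-semicolon printf line quoted earlier in this thread, this
counts a single statement.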

A crude line counter has negative worth.  It will assess programs
that use white space to clarify logic as "more complicated" than
programs that look like a burst of line noise.

I'm not "on the guy's back".  But if you're going to measure, do
it right.  This whole discussion shows why Dijkstra defines
software engineering as "computer science for people who can't
program."
+--------------------------------+ Edward G. Nilges
| Child support, tax-deductible  | Princeton University
| to payer AND receiver: an idea | Information Center
| whose time has come.           | Bitnet: EGNILGES@PUCC
+--------------------------------+ (609) 258-2985

jgautier@vangogh.ads.com (Jorge Gautier) (03/21/91)

In article <1991Mar20.114450.19653@rhrk.uni-kl.de> hammes@dill.informatik.uni-kl.de (Stefan Hammes (HiWi Mattern)) writes:
>   |>[... How to fool char counters ...]
>   |>
>   |>Why not compile the sources and checking the size of object code ?
>
>   Because this is very machine, compiler and linker dependent!

So is the size of the source code.

jpc@fct.unl.pt (Jose Pina Coelho) (03/21/91)

In article <1991Mar20.114450.19653@rhrk.uni-kl.de>
	hammes@dill.informatik.uni-kl.de (Stefan Hammes (HiWi Mattern)) writes:
   In article <JPC.91Mar17162220@terra.fct.unl.pt>, jpc@fct.unl.pt (Jose
   Pina Coelho) writes:
   |>
   |>In article <1991Mar14.192419.1576@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes:
   |>
   |>
   |>[... How to fool char counters ...]
   |>
   |>Why not compile the sources and checking the size of object code ?

   Because this is very machine, compiler and linker dependent!


It's probably the best bet [better reliability/work ratio]: you can
compare by compiling all programs on the same architecture with the
same compiler.

Another bet [machine, compiler and linker independent]:

	Get a C grammar and count tokens.
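
A full grammar is not needed just to count tokens; a small hand-written
scanner gets close. The sketch below is an approximation only: the name
count_tokens is made up, preprocessor lines are tokenised like ordinary code,
and only /* */ comments are skipped.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count identifiers, keywords, numbers, literals
 * and punctuators in `s`, skipping whitespace and block comments. */
size_t count_tokens(const char *s)
{
    /* Multi-character punctuators; three-character ones come first so
     * that "<<=" is not split into "<<" and "=". */
    static const char *const ops[] = {
        "<<=", ">>=", "...", "->", "++", "--", "<<", ">>", "<=", ">=",
        "==", "!=", "&&", "||", "+=", "-=", "*=", "/=", "%=", "&=",
        "^=", "|=", "##"
    };
    size_t count = 0;

    while (*s != '\0') {
        if (isspace((unsigned char)*s)) {
            s++;
            continue;
        }
        if (s[0] == '/' && s[1] == '*') {           /* comment */
            const char *e = strstr(s + 2, "*/");
            s = e ? e + 2 : s + strlen(s);
            continue;
        }
        count++;
        if (isalpha((unsigned char)*s) || *s == '_') {
            /* identifier or keyword */
            while (isalnum((unsigned char)*s) || *s == '_')
                s++;
        } else if (isdigit((unsigned char)*s) ||
                   (*s == '.' && isdigit((unsigned char)s[1]))) {
            /* number, scanned loosely like a preprocessing number */
            s++;
            while (isalnum((unsigned char)*s) || *s == '.' || *s == '_' ||
                   ((*s == '+' || *s == '-') &&
                    (s[-1] == 'e' || s[-1] == 'E')))
                s++;
        } else if (*s == '"' || *s == '\'') {
            /* string or character literal counts as one token */
            char q = *s++;
            while (*s != '\0' && *s != q) {
                if (*s == '\\' && s[1] != '\0')
                    s++;
                s++;
            }
            if (*s == q)
                s++;
        } else {
            /* punctuator: take a listed multi-char operator, else one char */
            size_t i, len = 1;
            for (i = 0; i < sizeof ops / sizeof ops[0]; i++) {
                size_t n = strlen(ops[i]);
                if (strncmp(s, ops[i], n) == 0) {
                    len = n;
                    break;
                }
            }
            s += len;
        }
    }
    return count;
}
```

Because layout and comments are ignored, the count does not depend on how
the programmer formats the code.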



On the other hand, you need a method that is programmer-independent because:
	- Programmer A came from Fortran, so he isn't used to:
	  - ``Creative'' for's :-) .
	  - Recursion.
	- Programmer B came from C.

B is going to produce the same functionality with a smaller amount of
code.




--
Jose Pedro T. Pina Coelho   | BITNET/Internet: jpc@fct.unl.pt
Rua Jau N 1, 2 Dto          | UUCP: ...!mcsun!unl!jpc
1300 Lisboa, PORTUGAL       | Home phone: (+351) (1) 640767

- If all men were brothers, would you let one marry your sister ?

rlandsma@bbn.com (Rick Landsman) (03/21/91)

Sorry if this has already been mentioned.

We have been using a "C" SLOC counting tool available via anonymous
FTP from uunet.uu.net comp.sources.unix archives. I believe it is
available in volume 20, 21 or 22 but interested parties should get the
Index and search it for "metrics" to see where the actual package is
located in the archive.

Not only does it generate slocs (we modified it to count ASSY lines as
well), it generates McCabe and Halstead complexity metrics for C
modules. We have found it very useful.

regards rick landsman - bbn

Rick Landsman
address: uunet.uu.net!bbn.com!rlandsman
         rlandsman@bbn.com

chip@tct.uucp (Chip Salzenberg) (03/25/91)

According to dww@math.fu-berlin.de (Debora Weber-Wulff):
>(GULP!) ~20 LOC/person/day. This was the number management
>wanted to see, it made their day when we were able to give
>them their little number. So it goes.

Management education is a part of a real programmer's job.

If you hand them a meaningless number, you do them and yourself a
disservice -- even if they think they want it.
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>
   "All this is conjecture of course, since I *only* post in the nude.
    Nothing comes between me and my t.b.  Nothing."   -- Bill Coderre

louk@tslwat.UUCP (Lou Kates) (03/26/91)

In article <12609@pucc.Princeton.EDU# EGNILGES@pucc.Princeton.EDU writes:
#In article <4816@berry19.UUCP#, crocker@motcid.UUCP (Ronald T. Crocker) writes:
#
##
##Get off of this guy's back already.  He wants a simple C program to
##count C lines of code.  Whether LOC is a valid metric is not the
##question at hand; LOC is, generally speaking, a good heuristic
#
#A metric that counts statements is much better than an inaccurate
#heuristic and a tool to do so, while not trivial, is easily
#generated using yacc and lexx.
#
#A crude line counter has negative worth.  It will assess programs
#that use white space to clarify logic as "more complicated" than
#programs that look like a burst of line noise.
#
#I'm not "on the guy's back".  But if you're going to measure, do
#it right.  This whole discussion shows why Dijkstra defines
#software engineering as "computer science for people who can't
#program."

All the popular measures are highly correlated (over 90% positive
correlation in  the study  I saw)  so for most purposes it really
doesn't matter what you use.

Lou Kates, Teleride Sage Ltd., louk%tslwat@watmath.waterloo.edu

EGNILGES@pucc.Princeton.EDU (Ed Nilges) (03/26/91)

In article <350@tslwat.UUCP>, louk@tslwat.UUCP (Lou Kates) writes:

>In article <12609@pucc.Princeton.EDU# EGNILGES@pucc.Princeton.EDU writes:
>#In article <4816@berry19.UUCP#, crocker@motcid.UUCP (Ronald T. Crocker) writes:
>#
>##
>##Get off of this guy's back already.  He wants a simple C program to
>##count C lines of code...
>#
>#A crude line counter has negative worth.  It will assess programs
>#that use white space to clarify logic as "more complicated" than
>#programs that look like a burst of line noise.
>
>All the popular measures are highly correlated (over 90% positive
>correlation in  the study  I saw)  so for most purposes it really
>doesn't matter what you use.

This sounds impressive but it is quite vague.  "Correlated" with
what?  Each other?  But if any number of measures "correlate"
with each other, what does THAT mean?  Are they GOOD measures?
Enquiring minds want to know.

And the susceptibility of software systems, so emphasized by Tony
Hoare in his critique a few years ago of Strategic Defense Initiative,
to outlier events and chaotic behavior (captured in such gnomic
statements as "ten percent of the code causes ninety percent of the
bugs") means that a 90 percent correlation is not so good after
all (assuming we know what it means.)

Case in point: program A falls thru the cracks as a "noncomplicated
program" because it has only ONE line.  It fails.  The
programmer assigned to fix it NOW opens it up and finds that
this "noncomplicated" program is one, 4096 byte, line of C
code containing a thousand statements.  The programmer is unable
to get it working and is sacked for not being competent enough
to fix "simple" code (don't laugh: this stuff happens.)  The
Exchange opens up on the following Monday, the one-line program
starts a wild run on cocoa which translates into a generalized
financial panic.  So much for "metrics".
+--------------------------------+ Edward G. Nilges
| Child support, tax-deductible  | Princeton University
| to payer AND receiver: an idea | Information Center
| whose time has come.           | Bitnet: EGNILGES@PUCC
+--------------------------------+ (609) 258-2985

larryd@toby.bnr.ca (Larry Dunkelman) (03/27/91)

Once a metric is published, programmers can easily find ways (if they want
to) to fool the system.  If management wants to measure bugs/lines of code,
then programmers could just write lengthy programs.  If management wants
to measure productivity as lines of code/man month, then again, programmers
could "pad" their programs.

In my experience this does not happen.  We have been using a simple "C" code
line counter which does exactly that -- counts lines of code (not statements).
A source line is either blank, a comment or a line of code.  That's it.
If a programmer splits an assignment statement over three source lines, then
it will be counted as three lines of code.  If he writes 5 statements on the
same line, then they will be counted as one line of code.

In practice this works fine since *most* programmers are reasonable about
how they "format" their code.  

Our counter has an option to ignore lines with just a "{" or "}", but in
the final analysis, even that option doesn't make much difference.

Maybe a way to rid ourselves of the degenerate cases being discussed lately
is to run the programs through pretty printers before counting.  

I am not a great defender of Lines of Code (or Kilo Lines Of Code - KLOC)
as a good metric.  There are cases where a designer has had a negative
productivity since he managed to reduce the size of the program and add
functionality at the same time.  BUT, until someone comes up with a better
metric, LOCs are what management will ask for.

Larry
larryd!bnrmtl@larry.mcrcim.mcgill.edu

louk@tslwat.UUCP (Lou Kates) (03/27/91)

In article <12632@pucc.Princeton.EDU# EGNILGES@pucc.Princeton.EDU writes:
#In article <350@tslwat.UUCP#, louk@tslwat.UUCP (Lou Kates) writes:
#
##In article <12609@pucc.Princeton.EDU# EGNILGES@pucc.Princeton.EDU writes:
###In article <4816@berry19.UUCP#, crocker@motcid.UUCP (Ronald T. Crocker) writes:
###
####
####Get off of this guy's back already.  He wants a simple C program to
####count C lines of code...
###
###A crude line counter has negative worth.  It will assess programs
###that use white space to clarify logic as "more complicated" than
###programs that look like a burst of line noise.
##
##All the popular measures are highly correlated (over 90% positive
##correlation in  the study  I saw)  so for most purposes it really
##doesn't matter what you use.
#
#This sounds impressive but it is quite vague.  "Correlated" with
#what?  Each other?  But if any number of measures "correlate"
#with each other, what does THAT mean?  
#

For every pair  of measures X and  Y in the group under study the
correlation of X and  Y exceeds  90%. Actually the minimum  might
have been higher than this. I'm just going by memory.
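
To make "the correlation of X and Y exceeds 90%" concrete, here is a sketch
that tests one pair of measures against a threshold using the usual Pearson
coefficient. The function name is invented, the study's own method is not
known to me, and the comparison is done on squared values so that no square
root is needed.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if the Pearson correlation of x[0..n-1]
 * and y[0..n-1] is greater than `threshold` (0 <= threshold < 1),
 * otherwise 0. With cov, vx and vy as scaled sums, r = cov / sqrt(vx * vy),
 * so r > t is the same as cov > 0 and cov * cov > t * t * vx * vy. */
int correlation_exceeds(const double *x, const double *y, size_t n,
                        double threshold)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0;
    double cov, vx, vy;
    size_t i;

    for (i = 0; i < n; i++) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        syy += y[i] * y[i];
        sxy += x[i] * y[i];
    }
    cov = (double)n * sxy - sx * sy;
    vx = (double)n * sxx - sx * sx;
    vy = (double)n * syy - sy * sy;

    if (cov <= 0.0 || vx <= 0.0 || vy <= 0.0)
        return 0;       /* negative, zero or undefined correlation */
    return cov * cov > threshold * threshold * vx * vy;
}
```

Here x might hold line counts and y statement counts for the same set of
modules; the claim is that every such pair of measures passes with a
threshold of 0.9.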

Lou Kates, Teleride Sage Ltd., louk%tslwat@watmath.waterloo.edu

Lee Sailer <UH2@psuvm.psu.edu> (03/28/91)

Did the original poster ever get a C LOC counter?  You could probably
build one based on the GNU indent program.  Indent is a C pretty printer,
like cb only more so.

                     lee

frank@grep.co.uk (Frank Wales) (03/28/91)

In article <12630@pucc.Princeton.EDU> EGNILGES@pucc.Princeton.EDU writes:
>Case in point: program A falls thru the cracks as a "noncomplicated
>program" because it has only ONE statement.  It fails.  The
>programmer assigned to fix it NOW opens it up and finds that
>this "noncomplicated" program is one, 4096 byte, line of C
>code containing a thousand statements.

4096 bytes on a line?  Feh, kid's stuff.  :-)

Cases like this (fictitious or not) seem to me to highlight
the problems of metrics like "lines of code" or "statement
counts", because these are predicated on the notion that statements
or lines are uniformly complex, fundamental pieces from which
programs are built.  It's hard for me to accept such a notion
for anything other than assembly language.

Quite apart from the difficulties associated with inter-language
comparisons, is there any work on enumerating programs at the token
level, a sort of "component count" metric?  This, at least, would
be an honest assessment of the fundamental syntactic pieces,
whatever that might be worth.  (Yes, I'm a "metric sceptic".)
--
Frank Wales, Grep Limited,             [frank@grep.co.uk<->uunet!grep!frank]
Kirkfields Business Centre, Kirk Lane, LEEDS, UK, LS19 7LX. (+44) 532 500303

EGNILGES@pucc.Princeton.EDU (Ed Nilges) (03/30/91)

In article <1991Mar28.145235.14313@grep.co.uk>, frank@grep.co.uk (Frank Wales) writes:

>
>Cases like this (fictitious or not) seem to me to highlight
>the problems of metrics like "lines of code" or "statement
>counts", because these are predicated on the notion that statements
>or lines are uniformly complex, fundamental pieces from which
>programs are built.  It's hard for me to accept such a notion
>for anything other than assembly language.

Even if you counted lexemes (these are the things that the "lexi-
cal analyzer" recognizes, including identifiers, constants,
operators and so on), you'd not measure complexity fully.
There are two reasons for this: one psychological and one
mathematical.

The psychological reason?  Different lexemes have different
intrinsic complexity, considering complexity as a property of
our minds.  The "+" sign is less complex, perhaps, than the "-"
sign.  The trouble with this is that psychological measurements
typically ignore enormous individual differences in perception
of complexity.   Niklaus Wirth has pointed out that whenever
you add two positive numbers (a seemingly simple operation)
you are in danger of overflow: does this mean that addition
of two positive integers is more complex than subtraction of
two positive integers, where neither overflow nor underflow
can occur (0-(2**31-1) in twos-complement is the worst case)?
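
In C terms the asymmetry looks like this. It is a sketch with invented
function names, assuming both operands are non-negative ints.

```c
#include <limits.h>

/* Hypothetical helper: for a, b >= 0, store a + b in *sum and return 1
 * if the addition cannot overflow; otherwise return 0 and leave *sum
 * untouched. The test is done by subtraction, which is always safe. */
int add_nonneg_checked(int a, int b, int *sum)
{
    if (a > INT_MAX - b)
        return 0;
    *sum = a + b;
    return 1;
}

/* For a, b >= 0 the difference always lies in [-INT_MAX, INT_MAX],
 * so no check is needed. */
int sub_nonneg(int a, int b)
{
    return a - b;
}
```

The "simple" addition needs a guard and the subtraction does not, yet a
lexeme count scores both operators the same.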

The mathematical reason?  The total number of lexemes increases
complexity, but so do the ways in which those lexemes are
COMBINED.   The structured statement a+b is simply a more
complex artifact than the unordered, unstructured SET of
lexemes {a,b,+}.  Both have the "metric" 3.
+--------------------------------+ Edward G. Nilges
| Child support, tax-deductible  | Princeton University
| to payer AND receiver: an idea | Information Center
| whose time has come.           | Bitnet: EGNILGES@PUCC
+--------------------------------+ (609) 258-2985

locke@paradyne.com (Richard Locke) (04/02/91)

In article <1991Mar28.145235.14313@grep.co.uk> frank@grep.co.uk (Frank Wales) writes:
>In article <12630@pucc.Princeton.EDU> EGNILGES@pucc.Princeton.EDU writes:

There has been a lot of work done in software metrics that addresses
the topics & concerns that have been discussed (extensively) in this thread.

>>Case in point: program A falls thru the cracks as a "noncomplicated
>>program" because it has only ONE statement.  It fails.  The
...
>...is there any work on enumerating programs at the token
>level, a sort of "component count" metric?  This, at least, would
>be an honest assessment of the fundamental syntactic pieces,
>whatever that might be worth.  (Yes, I'm a "metric sceptic".)

The Halstead "Software Science" family of metrics are based on the number
of operators and operands.  They give you some sense of a body of code's
"size".  The other major metrics family, McCabe's "Cyclomatic Complexity"
metrics, assesses complexity based on a program's flow of control.

Both these classes of metrics help evaluate "size" and "complexity", and
address the "but what about *this* program!" cases that have been kicked
around.
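
As a taste of the McCabe side, the cyclomatic number of a structured
function can be estimated as one plus its number of decision points. The
sketch below is a simplification and not how UX-Metric works: the name
cyclomatic_estimate is made up, && and || are counted as extra decisions,
and the input is assumed to have had comments and string literals removed
already.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical helper: return 1 + the number of if, while, for and case
 * keywords, '?' operators, and && / || operators found in `body`. */
int cyclomatic_estimate(const char *body)
{
    static const char *const keywords[] = { "if", "while", "for", "case" };
    int complexity = 1;
    const char *s = body;

    while (*s != '\0') {
        if (isalpha((unsigned char)*s) || *s == '_') {
            /* read a whole identifier so "iffy" does not match "if" */
            const char *start = s;
            size_t len, k;

            while (is_ident_char(*s))
                s++;
            len = (size_t)(s - start);
            for (k = 0; k < sizeof keywords / sizeof keywords[0]; k++)
                if (strlen(keywords[k]) == len &&
                    strncmp(start, keywords[k], len) == 0)
                    complexity++;
        } else if (isdigit((unsigned char)*s)) {
            /* skip a number together with any letters attached to it */
            while (is_ident_char(*s))
                s++;
        } else if ((s[0] == '&' && s[1] == '&') ||
                   (s[0] == '|' && s[1] == '|')) {
            complexity++;
            s += 2;
        } else {
            if (*s == '?')
                complexity++;
            s++;
        }
    }
    return complexity;
}
```

A thousand straight-line statements score 1 here and a short function full
of nested tests scores high, which is the distinction LOC cannot make.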

The best introduction to these metrics of which I'm aware is in
"A Software Metrics Tutorial", which is included with the UX-Metric
or PC-Metric tools that Set Labs sells.  Unfortunately, I don't
believe it's available anywhere else.  These XX-Metric tools
read a body of code and spit out various metrics including McCabe
and Halstead stuff, LOC, semi-colons, comment lines, estimated
number of errors, estimated time to develop, etc.
They are cheap, too -- cheaper to your company than having a programmer
mess around for a couple of days coding and perfecting a line counter ;-)

Anyway, I suggest that people interested in metrics, but unfamiliar with
the McCabe and Halstead metrics, do some investigation.

If you're more interested in software productivity than software complexity,
I further suggest that you get and read Jones' "Programming Productivity".
This has an excellent discussion as to why LOC is bad for productivity
measures.


-dick "just a satisfied customer of UX-Metric"

p.s.  Send me mail if you want pointers or references to the stuff I mention...

rlandsma@bbn.com (Rick Landsman) (04/05/91)

I have mentioned this before but for those that missed it, there is a
"C" package called metrics in volume 20 of the comp.sources.unix
archive available via anonymous ftp from uunet.uu.net that generates
both McCabe and Halstead metrics from "C" files. We have found it
useful as a way to determine POTENTIAL software functions for
re-design to reduce ERROR PRONENESS :-) due to overly complex control
and syntactical complexity. This is one of the tools we use for defect
reduction by targeting error prone modules.

There is documentation included with the package that discusses some
of the Halstead volume and other metrics used by the author.

Rick Landsman
address: uunet.uu.net!bbn.com!rlandsman
                    OR
         rlandsman@bbn.com

klb@unislc.uucp (Keith L. Breinholt) (04/05/91)

In article <1991Mar28.145235.14313@grep.co.uk> frank@grep.co.uk (Frank Wales) writes:
>4096 bytes on a line?  Feh, kid's stuff.  :-)
>
>Cases like this (fictitious or not) seem to me to highlight
>the problems of metrics like "lines of code" or "statement
>counts", because these are predicated on the notion that statements
>or lines are uniformly complex, fundamental pieces from which
>programs are built.  It's hard for me to accept such a notion
>for anything other than assembly language.

Which is why a combination of measures is important.  Lines of code is
only meant to measure complexity due to size.  Some other measures of
size complexity are the number of unique identifiers, keyword statements,
# of operators, .....

Measures of control flow complexity are McCabe's cyclomatic complexity
or essential complexity, branch statement size (operators not
characters) and so on.

>Quite apart from the difficulties associated with inter-language
>comparisons, is there any work on enumerating programs at the token
>level, a sort of "component count" metric?  This, at least, would
>be an honest assessment of the fundamental syntactic pieces,
>whatever that might be worth.  (Yes, I'm a "metric sceptic".)

Key word statements and unique or total identifier counts are in the
ballpark of what you're talking about.  Key word statements are if,
then, else, while, do...with assignments (=) and function calls thrown
in to round out other cases.  Identifier counts follow the same idea
but try to measure more than the language specific identifiers.

I have some tools to measure the above, they're in lex and took about
a week to write.  (search for identifiers and look them up in a table,
or just count them).

Keith
-- 
___________________________________________________________________________

Keith L. Breinholt		hellgate.utah.edu!uplherc!unislc!klb or
Unisys, Unix Systems Group	kbreinho@peruvian.utah.edu