dcavasso@ntpal.uucp (Dana Cavasso) (03/07/91)
I need a "C" code line counter program, preferably written in "C". It will be used on several platforms, so solutions involving shell scripts and other UNIX utilities won't work. I'm not very picky (although I'd like something that did a little more than count newlines :-)

With the growing trend toward gathering metrics, I expect such beasts are out there in force. If you would be willing to share your source, let me know.
--
Dana Cavasso
dcavasso%ntpal@egsner.cirr.com
ntpal!dcavasso@egsner.cirr.com
...!cs.utexas.edu!egsner!ntpal!dcavasso

"A rock pile ceases to be a rock pile the moment a single man contemplates it, bearing within him the image of a cathedral." - Antoine de Saint-Exupery
theo.bbs@shark.cs.fau.edu (Theo Heavey) (03/09/91)
dcavasso@ntpal.uucp (Dana Cavasso) writes: > > I need a "C" code line counter program, preferably written in > "C". It will be used on several platforms, so solutions involving > shell scripts and other UNIX utilities won't work. I'm not very > picky (although I'd like something that did a little more than count > newlines :-) > Why not use the "wc" program on the UNIX systems. It gives a line count --- not very sophisticated BUT the source may be available for altering! Theo Heavey Florida Atlantic University
EGNILGES@pucc.Princeton.EDU (Ed Nilges) (03/09/91)
In article <1991Mar6.214157.18633@ntpal.uucp>, dcavasso@ntpal.uucp (Dana Cavasso) writes:
>
> I need a "C" code line counter program, preferably written in
>"C". It will be used on several platforms, so solutions involving
>shell scripts and other UNIX utilities won't work. I'm not very
>picky (although I'd like something that did a little more than count
>newlines :-)
>
> With the growing trend toward gathering metrics, I expect
>such beasts are out there in force. If you would be willing to
>share your source, let me know.

This request sounds innocuous, but think about it for a second. The following nonsense C code has three lines:

    a = 1;
    a = a+c;
    a = c*7;

and the following equivalent code has one line!

    a = 1; a = a+c; a = c*7;

It gets worse. How many lines in the following?

    if ( a = 1 )
        { a = 0; c = 1; }

Two, says the FORTRAN programmer. But if a line is roughly equivalent to a statement, there is only ONE line in the above...a line that contains two lines inside it...

In short, a "line" of C code is a meaningless concept. Count STATEMENTS. If you must count. In the last example above, there are 3 statements.

Software engineering: computer science for people who can't program (Edsger Dijkstra.)
raj@crosfield.co.uk (Ray Jones) (03/11/91)
In article <1991Mar6.214157.18633@ntpal.uucp> dcavasso@ntpal.uucp (Dana Cavasso) writes: > > I need a "C" code line counter program, preferably written in >"C". It will be used on several platforms, so solutions involving >shell scripts and other UNIX utilities won't work. I'm not very >picky (although I'd like something that did a little more than count >newlines :-) So how about one that counts semi-colons :-) Ray -- - raj@cel.co.uk - Ray Jones, Crosfield Electronics, - - raj@crosfield.co.uk - Hemel Hempstead, HP2 7RH UK -
lws@comm.wang.com (Lyle Seaman) (03/12/91)
dcavasso@ntpal.uucp (Dana Cavasso) writes: > I need a "C" code line counter program, preferably written in >"C". It will be used on several platforms, so solutions involving Counting semi-colons is a pretty good approach, as that counts C statements. Lines is kind of less meaningful. Counting '{' is an interesting one, too. -- Lyle 508 967 2322 "We have had television problems directly lws@capybara.comm.wang.com attributable to something not understandable" Wang Labs, Lowell, MA, USA - unnamed believer in poltergeists
mitch@hq.af.mil (Mitch Wright) (03/12/91)
/* * On 11 Mar 91 09:06:47 GMT, * raj@crosfield.co.uk (Ray Jones) said: * */ > I need a "C" code line counter program, preferably written in >"C". It will be used on several platforms, so solutions involving >shell scripts and other UNIX utilities won't work. I'm not very >picky (although I'd like something that did a little more than count >newlines :-) Ray> So how about one that counts semi-colons :-) Because: #include <stdio.h> main() { printf(";;;; Hello World ;;;;\n"); } :-) -- ..mitch mitch@hq.af.mil (Mitch Wright) | The Pentagon, 1B1046 | (703) 695-0262 ``A system without PERL is like a hockey game without a fight.'' -- Mitch Wright
bhoughto@pima.intel.com (Blair P. Houghton) (03/12/91)
In article <1991Mar11.182848.26693@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes: >Counting semi-colons is a pretty good approach, as that counts C >statements. Lines is kind of less meaningful. Counting '{' is >an interesting one, too. {{{{{{{{printf("Oh, if I were a rich man... ;;;;;;;;;;;;;;;;;;;;;;;\n");}}}}}}}} --Blair "Currently sleeping with my eyes open."
eversole@acae037.cadence.com (Richard Eversole; x6239) (03/12/91)
-- In article <MITCH.91Mar11175953@hq.af.mil>, mitch@hq.af.mil (Mitch Wright) writes: |> > I need a "C" code line counter program, preferably written in |> >"C". It will be used on several platforms, so solutions involving |> Ray> So how about one that counts semi-colons :-) |> |> Because: |> |> #include <stdio.h> |> main() |> { |> printf(";;;; Hello World ;;;;\n"); |> } But that is still only 1 semi-colon. It is very simple code to ignore quoted strings !!!! Someone who knows YACC & LEX can write that in only a few lines. Take maybe an hour to do. (Of course counting only semi-colons not in quoted strings using only C code would take not much more than the same hour to code.) Counting semi-colons is a trivial programming task. ===================================================================== eversole@cadence.com Live long and prosper !
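The claim that skipping quoted strings is simple can be illustrated with a small state machine. This is only a sketch under stated assumptions, not code anyone posted in the thread: the function name `count_code_semicolons` is hypothetical, the source is assumed to already be in memory as a NUL-terminated string, and only `/* ... */` comments are recognised.

```c
#include <assert.h>
#include <stddef.h>

/* Count semicolons in C source text, ignoring any that appear inside
 * string literals, character constants, or comments.
 * Hypothetical helper for illustration; the name is an assumption. */
size_t count_code_semicolons(const char *src)
{
    enum { CODE, STRING, CHARLIT, COMMENT } state = CODE;
    size_t count = 0;
    size_t i;

    assert(src != NULL);
    for (i = 0; src[i] != '\0'; i++) {
        char c = src[i];

        switch (state) {
        case CODE:
            if (c == '"') {
                state = STRING;
            } else if (c == '\'') {
                state = CHARLIT;
            } else if (c == '/' && src[i + 1] == '*') {
                state = COMMENT;
                i++;                    /* skip the '*' */
            } else if (c == ';') {
                count++;
            }
            break;
        case STRING:
        case CHARLIT:
            if (c == '\\' && src[i + 1] != '\0') {
                i++;                    /* skip the escaped character */
            } else if (c == (state == STRING ? '"' : '\'')) {
                state = CODE;
            }
            break;
        case COMMENT:
            if (c == '*' && src[i + 1] == '/') {
                state = CODE;
                i++;                    /* skip the '/' */
            }
            break;
        }
    }
    return count;
}
```

Fed Mitch Wright's "Hello World" program, this should report one semicolon, as Eversole says. Reading a file into a buffer first is left out to keep the sketch short.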
dave@cs.arizona.edu (Dave P. Schaumann) (03/12/91)
(had to trim alt.sources.wanted since we don't get it here...)

In article <2969@inews.intel.com> bhoughto@pima.intel.com (Blair P. Houghton) writes:
|In article <1991Mar11.182848.26693@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes:
|>Counting semi-colons is a pretty good approach, as that counts C
|>statements. Lines is kind of less meaningful. Counting '{' is
|>an interesting one, too.
|
|{{{{{{{{printf("Oh, if I were a rich man... ;;;;;;;;;;;;;;;;;;;;;;;\n");}}}}}}}}

I think I'd have to call this a pathological case.

Bidee bidee bum.

I think the trouble caused by this problem is that we have no clear-cut definition of what a "line of code" is. So what we really need is to define what we mean by "# of statements" by setting up some rules like

  o declaration/assignment/function call is 1 statement
  o rules for counting constructs like if/while/for
    a good idea might just count the # of statements it groups, so
    if( foo ) S would have the same statement count as S (perhaps +1)
  o a rule for do S while() - perhaps the same as above, but you may
    want to treat this separately.
  o { S } has the same statement count as S
  o rules for macros, comments, function headers, etc.

This scheme seems to be workable to me, and has lots of room for fine-tuning the results to your personal taste. And it should be a simple matter to hack together a fairly simple yacc grammar to parse the code.

Note that with this scheme (assuming strings are parsed correctly), the above "pathological" case now has a statement count of 1, as we might expect.
--
Dave Schaumann | dave@cs.arizona.edu | Short .sig's rule!
pyoung@axion.bt.co.uk (Pete Young) (03/13/91)
From article <2969@inews.intel.com>, by bhoughto@pima.intel.com (Blair P. Houghton): > In article <1991Mar11.182848.26693@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes: >>Counting semi-colons is a pretty good approach, as that counts C >>statements. Lines is kind of less meaningful. Counting '{' is >>an interesting one, too. > {{{{{{{{printf("Oh, if I were a rich man... ;;;;;;;;;;;;;;;;;;;;;;;\n");}}}}}}}} Tee Hee. Good point though. Counting lines, or semicolons, or braces is much more meaningful if you have some kind of standard to compare your figures with. In this instance such a standard might take the form of a set of guidelines about the use of symbols in comments, layout and indentation of code etc. Or even a machine to generate the code from a specification (don't scoff too loud, it might happen one day!) It seems to me (although I am quite prepared to admit I'm wrong) that there are two generic questions about gathering metrics. The first is, "what do I want to know about this program/specification/bridge/whatever?" The second is "What can I measure to get this information?" Counting statements is a possible answer to the second question. So, has the first question been satisfactorily answered? In many cases, I suspect not. But counting lines of code is a lot easier than thinking about useful measures of the size and complexity of a program. ____________________________________________________________________ Pete Young pyoung@axion.bt.co.uk Phone +44 473 645054 British Telecom Research Labs, Martlesham Heath IPSWICH IP5 7RE
cpm00@duts.ccc.amdahl.com (Craig P McLaughlin) (03/13/91)
In article <1991Mar11.182848.26693@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes: >dcavasso@ntpal.uucp (Dana Cavasso) writes: >> I need a "C" code line counter program, preferably written in >>"C". It will be used on several platforms, so solutions involving > >Counting semi-colons is a pretty good approach, as that counts C >statements. Lines is kind of less meaningful. Counting '{' is >an interesting one, too. > Counting semi-colons may miscount setups like the one below: while(condition) do_this; That's two, I think. :) What about counting newlines, but ignoring those that immediately follow another newline (ie, skip blank lines)? Craig McLaughlin cpm00@duts.ccc.amdahl.com V:(408)737-5502 I think it's time to come up with a witty signature and disclaimer...
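The blank-line-skipping count suggested here might look like the sketch below. It is an illustration only: the name `count_nonblank_lines` is hypothetical, the input is assumed to be a NUL-terminated buffer, and it is slightly broader than the suggestion in that lines holding only spaces or tabs are also treated as blank.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count lines holding at least one non-whitespace character.
 * A final line without a trailing newline is still counted.
 * Hypothetical helper for illustration; the name is an assumption. */
size_t count_nonblank_lines(const char *src)
{
    size_t count = 0;
    int seen_text = 0;      /* non-whitespace seen on the current line? */

    assert(src != NULL);
    for (; *src != '\0'; src++) {
        if (*src == '\n') {
            count += seen_text;
            seen_text = 0;
        } else if (!isspace((unsigned char)*src)) {
            seen_text = 1;
        }
    }
    return count + seen_text;
}
```

On the two-line `while` example above this gives two, which matches the count McLaughlin is after, though comments still inflate the total.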
cml@tove.cs.umd.edu (Christopher Lott) (03/13/91)
In article <1142@caslon.cs.arizona.edu> dave@cs.arizona.edu (Dave P. Schaumann) writes:
>So what we really need is to define what we mean by "# of statements" by
>setting up some rules like
> o declaration/assignment/function call is 1 statement
> o rules for counting constructs like if/while/for
>   a good idea might just count the # of statements it groups, so
>   if( foo ) S would have the same statement count as S (perhaps +1)
> o a rule for do S while() - perhaps the same as above, but you may
>   want to treat this separately.
> o { S } has the same statement count as S
> o rules for macros, comments, function headers, etc.

First off, remember that SLOC is not a good metric for answering hard questions like productivity. It's a great measure of size, though, and cheap to compute.

Define your SLOC carefully - but I recommend that you contact the IEEE and get their standards document on this (sorry, don't have the ref). Anyone?

Then apply your definition carefully to projects in your environment - but don't even try to compare your numbers to those from another environment until you and your fellow SLOC-counter have discussed how you defined SLOC.

For example, someone has quoted numbers from Japanese software companies. I am told that these folks usually report their numbers in assembler-equivalent lines - what size they believe their program would be in assembler. This is ok, I guess, if you have an assembler available beneath your compiler (like C) but who knows what they did for other environments?

Good luck. A tool to do what you desire will be straightforward to build. Lessee, are you offering consultant rates? I might be interested :-) :-)

chris...
--
Christopher Lott \/ Dept of Comp Sci, Univ of Maryland, College Park, MD 20742
cml@cs.umd.edu /\ 4122 AV Williams Bldg 301.405.2721 <standard disclaimers>
ikluft@uts.amdahl.com (Ian Kluft) (03/13/91)
In article <MITCH.91Mar11175953@hq.af.mil> mitch@hq.af.mil (Mitch Wright) writes:
>> I need a "C" code line counter program, [...]
>
>Ray> So how about one that counts semi-colons :-)
>
>Because:
>
>#include <stdio.h>
>main()
>{
> printf(";;;; Hello World ;;;;\n");
>}

For any simple data-gathering tool, there's always a pathological case. You just named one. But counting semicolons still isn't a bad idea.

It's a little more work, but try counting uncommented, unquoted semicolons. (Lex and yacc could be used to put that together in a couple hours. Unfortunately, I don't have a couple hours to spare. Maybe someone else can take the idea somewhere.) That ought to give a usable and reasonably accurate count.
--
#include <std-disclaimer.h>
#define UTS (( Unix System V ) + ( Amdahl mainframe ))
-------------------------------------------------------------------------------
Ian Kluft UTS Systems Software, Amdahl Corporation
ikluft@uts.amdahl.com Santa Clara, CA
jgautier@vangogh.ads.com (Jorge Gautier) (03/13/91)
In article <1991Mar6.214157.18633@ntpal.uucp> dcavasso@ntpal.uucp (Dana Cavasso) writes: > I need a "C" code line counter program, [...] If you're on *nix, try "wc". Actually, "ls -l" on the file will give you a more accurate "metric," since the number of characters per line may vary. They were both written in C, I think. -- Jorge A. Gautier| "The enemy is at the gate. And the enemy is the human mind jgautier@ads.com| itself--or lack of it--on this planet." -General Boy DISCLAIMER: All statements in this message are false.
session@uncw.UUCP (Zack C. Sessions) (03/14/91)
cpm00@duts.ccc.amdahl.com (Craig P McLaughlin) writes:
|In article <1991Mar11.182848.26693@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes:
||dcavasso@ntpal.uucp (Dana Cavasso) writes:
||| I need a "C" code line counter program, preferably written in
|||"C". It will be used on several platforms, so solutions involving
||
||Counting semi-colons is a pretty good approach, as that counts C
||statements. Lines is kind of less meaningful. Counting '{' is
||an interesting one, too.
||
| Counting semi-colons may miscount setups like the one below:
| while(condition)
| do_this;
| That's two, I think. :) What about counting newlines, but ignoring those
|that immediately follow another newline (ie, skip blank lines)?
|Craig McLaughlin cpm00@duts.ccc.amdahl.com V:(408)737-5502

Counting newlines may not be the way to go either. It is perfectly legitimate for a statement to span multiple source lines. Take a complex if() condition, for example, which you might spread over a few lines for readability.

A true C source line counter would almost have to be the front end to a full compiler.

Zack Sessions
session@uncw.UUCP
EGNILGES@pucc.Princeton.EDU (Ed Nilges) (03/14/91)
In article <1991Mar12.163607.18799@axion.bt.co.uk>, pyoung@axion.bt.co.uk (Pete Young) writes: > >Counting statements is a possible answer to the second question. So, >has the first question been satisfactorily answered? In many cases, I >suspect not. But counting lines of code is a lot easier than thinking >about useful measures of the size and complexity of a program. I'd like to make a philosophical point. Since it questions the foundations of software engineering, busy managers and engineers may wish to skip this post, or save it for later perusal. Much of software engineering seems to be a doomed attempt to measure complexity. I pointed out in a previous post that if you count lines of code in a C program you may be misapplying a model valid in the one-statement-per-line FORTRAN world but invalid in the world of C, since C statements may be many per line, may be spread over many lines, and may even contain other statements. So the best metric is to count statements, and given a good yacc description of C and a lexical analyser such a tool is easy to develop. But even if you have measured statements, have you measured complexity? A program written in simple French (using French rather than English identifiers) is "too complex to maintain" when given to an American programmer since there is a 99.99% probability that such a programmer has never learned French. Complexity is a PSYCHOLOGICAL property. I suggest that complexity PRECEDES measurement. Instead of measuring complexity and appealing to your programmers to reduce the complexity of their code, I respectfully submit that software engineers concentrate on providing the environment wherein the programmers can write CORRECT code...at whatever level of complexity is appropriate. 
If this means that software engineers spend their time designing good PHYSICAL environments for programmers, determining the best location of vending machines and making certain that programmers have privacy and interaction when necessary, then such activity would be more productive than what goes under the rubric of software engineering.

Although there certainly is a lot of crud code out there, much of it is not "overly complex". A lot of it is overly simple, not capturing the problem requirements. And a lot of it is wrong. Telling a programmer who has written a correct program that is complicated according to your metric is peculiarly offensive. It reminds me of the critics in the movie Amadeus, who tell the young Mozart that his music has "too many notes." Not all programmers are Mozarts, but let's not go in the opposite direction, which I believe abuses a valuable economic resource (the nation's programming workforce) by being concerned that its members jump over artificial and ill-considered hoops represented by software metrics.
rar@saturn.ads.com (Bob Riemenschneider) (03/14/91)
In article <12583@pucc.Princeton.EDU> EGNILGES@pucc.Princeton.EDU (Ed Nilges) writes:
=> ... A program written in simple French (using French
=> rather than English identifiers) is "too complex to maintain"
=> when given to an American programmer since there is a 99.99%
=> probability that such a programmer has never learned French.
=> Complexity is a PSYCHOLOGICAL property.
A non sequitur if I've ever seen one! All you've shown is that a
program that isn't complex may still be hard to understand if, e.g.,
identifiers are poorly chosen. Everyone knows that.
=> ... If this means that software
=> engineers spend their time designing good PHYSICAL environments
=> for programmers, determining the best location of vending machines
=> and making certain that programmers have privacy and interaction
=> when necessary, then such activity would be more productive than
=> what goes under the rubric of software engineering. ...
Optimizing the physical environment is important, and a great deal
of work has gone into figuring out what the best environment is --
from serious ergonomic studies to more informal studies, such as
DeMarco and Lister's _Peopleware_. It doesn't follow that complexity
metrics are useless. Programming is hard for many reasons, and different
people have chosen to address different limited sets of reasons. This
is called "separation of concerns".
=> ... Telling a programmer who has written a correct program that is
=> complicated according to your metric is peculiarly offensive. ...
Obviously, correctness is a desirable property of programs. But it's
not the only desirable property. In the "real world", where far more
is spent on maintaining the average program than on developing it --
and the maintenance is usually done by someone other than the original
programmer -- simplicity is also a very desirable property. In many
cases, a correct but overly complex program is useless. What do you
find so offensive in asking a programmer to make a useless program
more useful?
=> ... [Software metrics are] artificial and ill-considered hoops [that
=> programmers are forced to jump through].
While some metrics are, no doubt, fairly worthless, others seem to
correlate strongly with our intuitive notion of complexity. (This
isn't guesswork based on personal prejudice. Studies have been
performed to determine how well complexity measures correlate with
understandability -- see, e.g., the proceedings of the "Empirical
Studies of Programmers" workshops.) They may not be perfect, but
they're better than nothing, so people use them.
-- rar
jeffv@bisco.kodak.com (Jeff Van Epps) (03/14/91)
I suppose what one really wants to know is: "how much code is there", i.e. what is the quantity of effective instructions? Use the size of the object (*.o) file. :-) -- If the From: line says nobody@kodak.com, don't believe it. Jeff Van Epps jeffv@bisco.kodak.com rochester!kodak!bisco!jeffv
mwb@ulysses.att.com (Michael W. Balk) (03/14/91)
In article <1991Mar11.182848.26693@comm.wang.com>, lws@comm.wang.com (Lyle Seaman) writes: > dcavasso@ntpal.uucp (Dana Cavasso) writes: > > I need a "C" code line counter program, preferably written in > >"C". It will be used on several platforms, so solutions involving > > Counting semi-colons is a pretty good approach, as that counts C > statements. Lines is kind of less meaningful. Counting '{' is > an interesting one, too. If you just count semi-colons, then in for-loops such as for(i = 0; i < 10; i++) { ... } i = 0; and i < 10; will be counted as individual statements. In fact they are, but if you want to count for( ... ) as a single statement then count the semi-colons and correct the count by subtracting 1 for every for-statement. There might be other cases like this that you may want to consider. Then again, in most cases this is just probably nit-picking.
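The correction described here (count semicolons, then subtract one per for-statement) can be sketched as follows. This is an illustration rather than anything posted in the thread: `count_for_adjusted` and `is_ident_char` are hypothetical names, and the sketch assumes comments and string literals have already been stripped, so a `for` inside a string would be miscounted.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True for characters that may appear inside a C identifier. */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Semicolon count minus one for every `for` keyword, so the two
 * semicolons in a for-header contribute a single statement.
 * Assumes comments and string literals were removed beforehand.
 * Hypothetical helper for illustration; the name is an assumption. */
long count_for_adjusted(const char *src)
{
    long semis = 0, fors = 0;
    size_t i;

    assert(src != NULL);
    for (i = 0; src[i] != '\0'; i++) {
        if (src[i] == ';') {
            semis++;
        } else if (strncmp(src + i, "for", 3) == 0
                   && (i == 0 || !is_ident_char(src[i - 1]))
                   && !is_ident_char(src[i + 3])) {
            fors++;     /* whole word `for`, not `format` or `before` */
        }
    }
    return semis - fors;
}
```

The word-boundary check is the only subtle part: without it, identifiers that merely contain the letters "for" would be subtracted as well.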
dlee@pallas.athenanet.com (Doug Lee) (03/14/91)
In article <dcda02id05Q.01@JUTS.ccc.amdahl.com> cpm00@DUTS.ccc.amdahl.com (PUT YOUR NAME HERE) writes:
>What about counting newlines, but ignoring those
>that immediately follow another newline (ie, skip blank lines)?

My first thought was to skip all comments (single- and multi-line) and then count only lines containing characters other than whitespace. This should be close, though it will still overcount on constructs like

    if (( <long_condition_1> ) ||
        ( <long_condition_2> ) ||
        ... )

Then again, maybe a line that long *should* count as more than one line. We also run into the somewhat common declaration syntax

    char *
    foo()

which, by my method, counts as two lines.

Unfortunately, I see no quick way to give a consistent line count regardless of program syntax. Counting lines ending in '{' or ';' (after removing comments and trailing whitespace) would catch most loops and function definitions without counting them more than once, but constructs like

    while (line = get_next_line(file))
        (void) process_line(line);

would still count only once unless the braces were included (not a bad idea, imho). We need a more precise definition of "line" for this, I fear.

Does this remind anyone else of _The Mythical Man Month_? :-)
--
Doug Lee (dlee@athenanet.com or {bradley,uunet}!pallas!dlee)
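The first idea in this post (drop comments, then count lines that still contain something other than whitespace) might be sketched like this. The name `count_code_lines` is hypothetical, the input is assumed to be a NUL-terminated buffer, and the sketch knowingly ignores string literals, so a `/*` inside a quoted string would be mistaken for the start of a comment.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count lines that contain non-whitespace text outside comments.
 * Limitation: string literals are not tracked.
 * Hypothetical helper for illustration; the name is an assumption. */
size_t count_code_lines(const char *src)
{
    size_t count = 0;
    int in_comment = 0;     /* inside a comment right now? */
    int seen_code = 0;      /* code seen on the current line? */
    size_t i;

    assert(src != NULL);
    for (i = 0; src[i] != '\0'; i++) {
        char c = src[i];

        if (c == '\n') {
            count += seen_code;
            seen_code = 0;
        } else if (in_comment) {
            if (c == '*' && src[i + 1] == '/') {
                in_comment = 0;
                i++;                /* skip the '/' */
            }
        } else if (c == '/' && src[i + 1] == '*') {
            in_comment = 1;
            i++;                    /* skip the '*' */
        } else if (!isspace((unsigned char)c)) {
            seen_code = 1;
        }
    }
    return count + seen_code;
}
```

As the post predicts, a declaration split as `char *` on one line and `foo()` on the next is counted as two lines by this method.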
ksh@ai.mit.edu (K. Shane Hartman) (03/14/91)
>For example, someone has quoted numbers from Japanese software companies. >I am told that these folks usually report their numbers in assembler >equivalent lines - what size they believe their program would be in assembler. >This is ok, I guess, if you have an assembler available beneath your compiler >(like C) but who knows what they did for other environments? Counting assembler equivalents is useless given optimizing compilers of varying abilities. Better (functional) metrics such as function points and feature points have been around for a while. Functional metrics can be applied to 'languages' which have no lines of code (code generation from design diagrams for example). -[Shane]->
dalamb@avi.umiacs.umd.edu (David Lamb) (03/15/91)
In article <1031@pallas.athenanet.com> dlee@pallas.athenanet.com (Doug Lee) writes: >... We need a more precise definition of "line" for this, I fear. An excellent comment. People interested in size metrics such as lines-of-code and number-of-tokens should read %A T. Capers Jones %T Programming Productivity %I McGraw-Hill %C New York %D 1986 It has a very good discussion on the flaws of size metrics (especially lines-of-code), but also how to get the most value out of them despite the flaws. -- David Alex Lamb internet: dalamb@umiacs.umd.edu
lws@comm.wang.com (Lyle Seaman) (03/15/91)
bhoughto@pima.intel.com (Blair P. Houghton) writes: >In article <1991Mar11.182848.26693@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes: >>Counting semi-colons is a pretty good approach, as that counts C >>statements. Lines is kind of less meaningful. Counting '{' is >>an interesting one, too. >{{{{{{{{printf("Oh, if I were a rich man... ;;;;;;;;;;;;;;;;;;;;;;;\n");}}}}}}}} Yeah, but that could just as easily be written: { { { { { { { { printf ( "Oh, if I were a rich man... ;;;;;;;;;;;;;;;;;;;;;;;\n" ) ; } } } } } } } } So either simple approach is susceptible to intentional obfuscation (but then, most such schemes are). No one claimed that counting semis and curlies was foolproof. You've demonstrated that it isn't. On the other hand, seasoned coders don't usually use ; and } to such excess. (Yes, there are *occasional* duplicates). However, they do usually include quite a few redundant newlines. Comments, preprocessor directives and white space are very common, and apparently the original poster didn't wish to count them. I stand by my suggestion. -- Lyle 508 967 2322 "We have had television problems directly lws@capybara.comm.wang.com attributable to something not understandable" Wang Labs, Lowell, MA, USA - unnamed believer in poltergeists
EGNILGES@pucc.Princeton.EDU (Ed Nilges) (03/15/91)
In article <9122@suns6.crosfield.co.uk>, raj@crosfield.co.uk (Ray Jones) writes:
>In all the silly examples of multiple semicolons, nobody has mentioned
>the obvious one;
>
> for(i=0; i<end; i++)
>
>How many statements? One, surely.

COME ON, people! Use the Backus Naur Form definition of the language, as printed in The C Programming Language, second edition, p. 234. Here we read that a statement of the type "iteration statement" is (among other things)

    for ( optexp; optexp; optexp ) statement

which means that

> for(i=0; i<end; i++)

is NOT "one, surely": it is not a well-formed statement.

+--------------------------------+  Edward G. Nilges
| Child support, tax-deductible  |  Princeton University
| to payer AND receiver: an idea |  Information Center
| whose time has come.           |  Bitnet: EGNILGES@PUCC
+--------------------------------+  (609) 258-2985
kenr@cruise.cc.rochester.edu (Kenneth C. Rich) (03/15/91)
In article <9082@suns6.crosfield.co.uk> raj@crosfield.co.uk (ray a jones) writes: >In article <1991Mar6.214157.18633@ntpal.uucp> dcavasso@ntpal.uucp (Dana Cavasso) writes: >> >> I need a "C" code line counter program, preferably written in > >So how about one that counts semi-colons :-) Seriously I emailed him a sh script that counted ';', '}' and '#', to count simple statements, blocks of statements, and cpp directives. !-) Sorry, I discarded it. !-( It was a one liner, too, so it could be an alias in csh. !-) A nice improvement would be to make sure to not count them inside quotes and comments... -- -ken rich -=!=- kenr@cc.rochester.edu
jpc@fct.unl.pt (Jose Pina Coelho) (03/18/91)
In article <1991Mar14.192419.1576@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes:

 bhoughto@pima.intel.com (Blair P. Houghton) writes:

 >In article <1991Mar11.182848.26693@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes:
 >>Counting semi-colons is a pretty good approach, as that counts C
 >>statements. Lines is kind of less meaningful. Counting '{' is
 >>an interesting one, too.

 >{{{{{{{{printf("Oh, if I were a rich man... ;;;;;;;;;;;;;;;;;;;;;;;\n");}}}}}}}}

[... How to fool char counters ...]

Why not compile the sources and check the size of the object code?
--
Jose Pedro T. Pina Coelho | BITNET/Internet: jpc@fct.unl.pt
Rua Jau N 1, 2 Dto | UUCP: ...!mcsun!unl!jpc
1300 Lisboa, PORTUGAL | Home phone: (+351) (1) 640767
- If all men were brothers, would you let one marry your sister ?
boyd@necisa.ho.necisa.oz.au (Boyd Roberts) (03/18/91)
What about?

    $ grep -c '[{});]$' *.c

Or?

    $ egrep -c '(^#)|([{});]$)' *.c

Boyd Roberts boyd@necisa.ho.necisa.oz.au
``When the going gets weird, the weird turn pro...''
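Since the original request ruled out UNIX utilities, the second pattern above can be approximated in portable C. This is only a sketch: `count_terminated_lines` is a hypothetical name, the input is assumed to be a NUL-terminated buffer, and, like the regular expression, it makes no attempt to ignore comments or trailing whitespace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count lines that start with '#' or whose last character is one of
 * '{', '}', ')' or ';', mirroring the egrep pattern above.
 * Hypothetical helper for illustration; the name is an assumption. */
size_t count_terminated_lines(const char *src)
{
    size_t count = 0;
    const char *line = src;

    assert(src != NULL);
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (len > 0 &&
            (line[0] == '#' || strchr("{});", line[len - 1]) != NULL)) {
            count++;
        }
        line += len + (end ? 1 : 0);    /* step past the newline, if any */
    }
    return count;
}
```

Note that a bare `main()` line is counted because it ends in ')', which is faithful to the pattern but arguably not what a statement counter wants.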
carl@p4tustin.UUCP (Carl W. Bergerson) (03/19/91)
dcavasso@ntpal.uucp (Dana Cavasso) writes: > > I need a "C" code line counter program, preferably written in > "C". It will be used on several platforms, so solutions involving > shell scripts and other UNIX utilities won't work. I'm not very > picky (although I'd like something that did a little more than count > newlines :-) In the October or November issue of Unix World the Wizard's Grabbag column contained three programs for removing comments from C and C++ code. I believe that one of them was in C. Once you have the comments removed, you can use the wc program that is listed in "Software Tools in Pascal" by Kernighan and (memory fails me). Translating to C shouldn't be all that difficult. -- Carl Bergerson uunet!p4tustin!carl Point 4 Data Corporation carl@point4.com 15442 Del Amo Avenue Voice: (714) 259 0777 Tustin, CA 92680-6445 Fax: (714) 259 0921
hammes@dill.informatik.uni-kl.de (Stefan Hammes (HiWi Mattern)) (03/20/91)
In article <JPC.91Mar17162220@terra.fct.unl.pt>, jpc@fct.unl.pt (Jose Pina Coelho) writes: |> |>In article <1991Mar14.192419.1576@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes: |> |> bhoughto@pima.intel.com (Blair P. Houghton) writes: |> |> >In article <1991Mar11.182848.26693@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes: |> >>Counting semi-colons is a pretty good approach, as that counts C |> >>statements. Lines is kind of less meaningful. Counting '{' is |> >>an interesting one, too. |> |> >{{{{{{{{printf("Oh, if I were a rich man... ;;;;;;;;;;;;;;;;;;;;;;;\n");}}}}}}}} |> |>[... How to fool char counters ...] |> |>Why not compile the sources and checking the size of object code ? Because this is very machine, compiler and linker dependent! Stefan +---------------------------------------+-------------------------------------+ | Stefan Hammes | e-Mail: hammes@informatik.uni-kl.de | | FB Informatik, SFB 124-D1 +-------------------------------------+ | Universitaet Kaiserslautern, P.O.Box 3049, D-W6750 Kaiserslautern, Germany | +-------------+------------------------------------------------+--------------+ | Language definition: Recursion - see recursion | +------------------------------------------------+
hsrender@happy.colorado.edu (03/21/91)
In article <4816@berry19.UUCP>, crocker@motcid.UUCP (Ronald T. Crocker) writes: > Ok, so back to the question at hand: does anyone have a C program that > counts C lines of code. I have a sed(1) script that will do it (I'll > post it if anyone is interested), but that isn't what is wanted. The > following lex program "almost" works (It doesn't handle quoted strings > quite correctly, and it counts lines that consist of only comments as > lines of code...; it is an exercise for the reader to modify the > program ;-> ) There's a set of tools filed under 'metrics' under the comp.sources.unix archive on gatekeeper.dec.com. Included is a tool called kdsi for counting C LOC. There's also a stripcom tool that strips comments out of C programs. I don't know what volume of the archive it's under, sorry. All of this stuff is available via anonymous ftp to gatekeeper or any other of the comp.sources.unix archive sites. hal.
EGNILGES@pucc.Princeton.EDU (Ed Nilges) (03/21/91)
In article <4816@berry19.UUCP>, crocker@motcid.UUCP (Ronald T. Crocker) writes: > >Get off of this guy's back already. He wants a simple C program to >count C lines of code. Whether LOC is a valid metric is not the >question at hand; LOC is, generally speaking, a good heuristic >measurement of the size of a program. It is not the only one, nor is >it the best. As with any code-based metric, its value can be skewed by >"non-conforming" code; you know, GIGO. Let him worry about ensuring >the validity of the information, just get him a tool to help out >counting lines of code. A metric that counts statements is much better than an inaccurate heuristic, and a tool to do so, while not trivial, is easily generated using yacc and lex. I believe Richard Slomka is the author of a book, "No Nonsense Management", in which he declares that the fact that you cannot obtain perfect numbers does not mean that you should stop searching for better numbers. A crude line counter has negative worth. It will assess programs that use white space to clarify logic as "more complicated" than programs that look like a burst of line noise. I'm not "on the guy's back". But if you're going to measure, do it right. This whole discussion shows why Dijkstra defines software engineering as "computer science for people who can't program." +--------------------------------+ Edward G. Nilges | Child support, tax-deductible | Princeton University | to payer AND receiver: an idea | Information Center | whose time has come. | Bitnet: EGNILGES@PUCC +--------------------------------+ (609) 258-2985
jgautier@vangogh.ads.com (Jorge Gautier) (03/21/91)
In article <1991Mar20.114450.19653@rhrk.uni-kl.de> hammes@dill.informatik.uni-kl.de (Stefan Hammes (HiWi Mattern)) writes: > |>[... How to fool char counters ...] > |> > |>Why not compile the sources and checking the size of object code ? > > Because this is very machine, compiler and linker dependent! So is the size of the source code.
jpc@fct.unl.pt (Jose Pina Coelho) (03/21/91)
In article <1991Mar20.114450.19653@rhrk.uni-kl.de> hammes@dill.informatik.uni-kl.de (Stefan Hammes (HiWi Mattern)) writes: In article <JPC.91Mar17162220@terra.fct.unl.pt>, jpc@fct.unl.pt (Jose Pina Coelho) writes: |> |>In article <1991Mar14.192419.1576@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes: |> |> |>[... How to fool char counters ...] |> |>Why not compile the sources and checking the size of object code ? Because this is very machine, compiler and linker dependent! It's probably the best bet [better reliability/work ratio]: you can compare by compiling all programs on the same architecture with the same compiler. Another bet [machine, compiler and linker independent]: get a C grammar and count tokens. On the other hand, you need a method that is programmer-independent because: - Programmer A came from Fortran, so he isn't used to: - ``creative'' for's :-) - recursion. - Programmer B came from C. B is going to produce the same functionality with a smaller amount of code. -- Jose Pedro T. Pina Coelho | BITNET/Internet: jpc@fct.unl.pt Rua Jau N 1, 2 Dto | UUCP: ...!mcsun!unl!jpc 1300 Lisboa, PORTUGAL | Home phone: (+351) (1) 640767 - If all men were brothers, would you let one marry your sister ?
rlandsma@bbn.com (Rick Landsman) (03/21/91)
Sorry if this has already been mentioned. We have been using a "C" SLOC counting tool available via anonymous FTP from the uunet.uu.net comp.sources.unix archives. I believe it is in volume 20, 21 or 22, but interested parties should get the Index and search it for "metrics" to see where the actual package is located in the archive. Not only does it generate SLOC counts (we modified it to count ASSY lines as well), it generates McCabe and Halstead complexity metrics for C modules. We have found it very useful. regards rick landsman - bbn Rick Landsman address: uunet.uu.net!bbn.com!rlandsman rlandsman@bbn.com
chip@tct.uucp (Chip Salzenberg) (03/25/91)
According to dww@math.fu-berlin.de (Debora Weber-Wulff): >(GULP!) ~20 LOC/person/day. This was the number management >wanted to see, it made their day when we were able to give >them their little number. So it goes. Management education is a part of a real programmer's job. If you hand them a meaningless number, you do them and yourself a disservice -- even if they think they want it. -- Chip Salzenberg at Teltronics/TCT <chip@tct.uucp>, <uunet!pdn!tct!chip> "All this is conjecture of course, since I *only* post in the nude. Nothing comes between me and my t.b. Nothing." -- Bill Coderre
louk@tslwat.UUCP (Lou Kates) (03/26/91)
In article <12609@pucc.Princeton.EDU# EGNILGES@pucc.Princeton.EDU writes: #In article <4816@berry19.UUCP#, crocker@motcid.UUCP (Ronald T. Crocker) writes: # ## ##Get off of this guy's back already. He wants a simple C program to ##count C lines of code. Whether LOC is a valid metric is not the ##question at hand; LOC is, generally speaking, a good heuristic # #A metric that counts statements is much better than an inaccurate #heuristic and a tool to do so, while not trivial, is easily #generated using yacc and lexx. # #A crude line counter has negative worth. It will assess programs #that use white space to clarify logic as "more complicated" than #programs that look like a burst of line noise. # #I'm not "on the guy's back". But if you're going to measure, do #it right. This whole discussion shows why Dijkstra defines #software engineering as "computer science for people who can't #program." All the popular measures are highly correlated (over 90% positive correlation in the study I saw) so for most purposes it really doesn't matter what you use. Lou Kates, Teleride Sage Ltd., louk%tslwat@watmath.waterloo.edu
EGNILGES@pucc.Princeton.EDU (Ed Nilges) (03/26/91)
In article <350@tslwat.UUCP>, louk@tslwat.UUCP (Lou Kates) writes: >In article <12609@pucc.Princeton.EDU# EGNILGES@pucc.Princeton.EDU writes: >#In article <4816@berry19.UUCP#, crocker@motcid.UUCP (Ronald T. Crocker) writes: ># >## >##Get off of this guy's back already. He wants a simple C program to >##count C lines of code... ># >#A crude line counter has negative worth. It will assess programs >#that use white space to clarify logic as "more complicated" than >#programs that look like a burst of line noise. > >All the popular measures are highly correlated (over 90% positive >correlation in the study I saw) so for most purposes it really >doesn't matter what you use. This sounds impressive but it is quite vague. "Correlated" with what? Each other? But if any number of measures "correlate" with each other, what does THAT mean? Are they GOOD measures? Enquiring minds want to know. And the susceptibility of software systems, so emphasized by Tony Hoare in his critique a few years ago of Strategic Defense Initiative, to outlier events and chaotic behavior (captured in such gnomic statements as "ten percent of the code causes ninety percent of the bugs") means that a 90 percent correlation is not so good after all (assuming we know what it means.) Case in point: program A falls thru the cracks as a "noncomplicated program" because it has only ONE line. It fails. The programmer assigned to fix it NOW opens it up and finds that this "noncomplicated" program is one, 4096 byte, line of C code containing a thousand statements. The programmer is unable to get it working and is sacked for not being competent enough to fix "simple" code (don't laugh: this stuff happens.) The Exchange opens up on the following Monday, the one-line program starts a wild run on cocoa which translates into a generalized financial panic. So much for "metrics". +--------------------------------+ Edward G. Nilges | Child support, tax-deductible | Princeton University | to payer AND receiver: an idea | Information Center | whose time has come. | Bitnet: EGNILGES@PUCC +--------------------------------+ (609) 258-2985
larryd@toby.bnr.ca (Larry Dunkelman) (03/27/91)
Once a metric is published, programmers can easily find ways (if they want to) to fool the system. If management wants to measure bugs/lines of code, then programmers could just write lengthy programs. If management wants to measure productivity as lines of code/man month, then again, programmers could "pad" their programs. In my experience this does not happen. We have been using a simple "C" code line counter which does exactly that -- counts lines of code (not statements). A source line is either blank, a comment or a line of code. That's it. If a programmer splits an assignment statement over three source lines, then it will be counted as three lines of code. If he writes 5 statements on the same line, then they will be counted as one line of code. In practice this works fine since *most* programmers are reasonable about how they "format" their code. Our counter has an option to ignore lines with just a "{" or "}", but in the final analysis, even that option doesn't make much difference. Maybe a way to rid ourselves of the degenerate cases being discussed lately is to run the programs through pretty printers before counting. I am not a great defender of Lines of Code (or Kilo Lines Of Code - KLOC) as a good metric. There are cases where a designer has had a negative productivity since he managed to reduce the size of the program and add functionality at the same time. BUT, until someone comes up with a better metric, LOCs are what management will ask for. Larry larryd!bnrmtl@larry.mcrcim.mcgill.edu
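A counter of the kind described above, where every source line is blank, a comment, or a line of code, with an option for lines holding only "{" or "}", might look roughly like this. It is a sketch and not the counter the poster uses: the struct and function names are hypothetical, only block comments are recognized, and a comment opener inside a string literal is not detected. With the option on, brace-only lines are simply not counted as code.

```c
#include <ctype.h>

/* Hypothetical result type: how many lines fell into each class. */
struct line_counts { int blank, comment, code; };

/* Classify each line of `src` as blank, comment-only, or code.
   If ignore_braces is nonzero, a line whose only code characters are
   '{' or '}' is not counted as code. Simplification: a comment opener
   inside a string literal is not recognized as such. */
struct line_counts classify_lines(const char *src, int ignore_braces)
{
    struct line_counts lc = {0, 0, 0};
    int in_comment = 0;                 /* block comment spans lines */
    int saw_code = 0, saw_comment = 0, only_braces = 1, len = 0;
    const char *p = src;

    for (;;) {
        char c = *p;
        if (c == '\n' || c == '\0') {
            if (c == '\n' || len > 0) { /* skip empty tail after last newline */
                if (saw_code && !(ignore_braces && only_braces))
                    lc.code++;
                else if (saw_comment)
                    lc.comment++;
                else
                    lc.blank++;
            }
            if (c == '\0')
                break;
            saw_code = 0; saw_comment = in_comment; only_braces = 1; len = 0;
            p++;
            continue;
        }
        len++;
        if (in_comment) {
            saw_comment = 1;
            if (c == '*' && p[1] == '/') { in_comment = 0; p++; }
        } else if (c == '/' && p[1] == '*') {
            in_comment = 1; saw_comment = 1; p++;
        } else if (!isspace((unsigned char)c)) {
            saw_code = 1;
            if (c != '{' && c != '}')
                only_braces = 0;
        }
        p++;
    }
    return lc;
}
```

As in the post, an assignment split over three source lines is reported as three lines of code, and five statements written on one line are reported as one.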
louk@tslwat.UUCP (Lou Kates) (03/27/91)
In article <12632@pucc.Princeton.EDU# EGNILGES@pucc.Princeton.EDU writes: #In article <350@tslwat.UUCP#, louk@tslwat.UUCP (Lou Kates) writes: # ##In article <12609@pucc.Princeton.EDU# EGNILGES@pucc.Princeton.EDU writes: ###In article <4816@berry19.UUCP#, crocker@motcid.UUCP (Ronald T. Crocker) writes: ### #### ####Get off of this guy's back already. He wants a simple C program to ####count C lines of code... ### ###A crude line counter has negative worth. It will assess programs ###that use white space to clarify logic as "more complicated" than ###programs that look like a burst of line noise. ## ##All the popular measures are highly correlated (over 90% positive ##correlation in the study I saw) so for most purposes it really ##doesn't matter what you use. # #This sounds impressive but it is quite vague. "Correlated" with #what? Each other? But if any number of measures "correlate" #with each other, what does THAT mean? # For every pair of measures X and Y in the group under study the correlation of X and Y exceeds 90%. Actually the minimum might have been higher than this. I'm just going by memory. Lou Kates, Teleride Sage Ltd., louk%tslwat@watmath.waterloo.edu
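The pairwise claim above can be checked mechanically once each measure has been taken over the same set of programs. The sketch below assumes the usual Pearson correlation coefficient, which the post does not actually name, and both function names are hypothetical. It compares squared quantities so that no square root is needed, which holds for thresholds between 0 and 1.

```c
#include <stddef.h>

/* Return 1 if the Pearson correlation of x[0..n-1] and y[0..n-1]
   exceeds `threshold` (assumed 0 <= threshold < 1), else 0.
   Tests r > t as: covariance > 0 and cov^2 > t^2 * var_x * var_y. */
int correlation_exceeds(const double *x, const double *y, size_t n,
                        double threshold)
{
    double mx = 0.0, my = 0.0, sxy = 0.0, sxx = 0.0, syy = 0.0;
    size_t i;

    if (n < 2)
        return 0;
    for (i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
    mx /= (double)n;
    my /= (double)n;
    for (i = 0; i < n; i++) {
        double dx = x[i] - mx, dy = y[i] - my;
        sxy += dx * dy; sxx += dx * dx; syy += dy * dy;
    }
    if (sxx == 0.0 || syy == 0.0)       /* constant series: undefined */
        return 0;
    return sxy > 0.0 && sxy * sxy > threshold * threshold * sxx * syy;
}

/* Return 1 if every pair among the k measures (each of length n)
   correlates above `threshold`, else 0. */
int all_pairs_exceed(const double *const series[], size_t k, size_t n,
                     double threshold)
{
    size_t a, b;

    for (a = 0; a < k; a++)
        for (b = a + 1; b < k; b++)
            if (!correlation_exceeds(series[a], series[b], n, threshold))
                return 0;
    return 1;
}
```

Here each series would hold one measure (lines, statements, tokens and so on) for the same programs in the same order. A result of 1 says only that the measures agree with each other, not that any of them is a good measure.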
Lee Sailer <UH2@psuvm.psu.edu> (03/28/91)
Did the original poster ever get a C LOC counter? You could probably build one based on the GNU indent program. Indent is a C pretty printer, like cb only more so. lee
frank@grep.co.uk (Frank Wales) (03/28/91)
In article <12630@pucc.Princeton.EDU> EGNILGES@pucc.Princeton.EDU writes: >Case in point: program A falls thru the cracks as a "noncomplicated >program" because it has only ONE statement. It fails. The >programmer assigned to fix it NOW opens it up and finds that >this "noncomplicated" program is one, 4096 byte, line of C >code containing a thousand statements. 4096 bytes on a line? Feh, kid's stuff. :-) Cases like this (fictitious or not) seem to me to highlight the problems of metrics like "lines of code" or "statement counts", because these are predicated on the notion that statements or lines are uniformly complex, fundamental pieces from which programs are built. It's hard for me to accept such a notion for anything other than assembly language. Quite apart from the difficulties associated with inter-language comparisons, is there any work on enumerating programs at the token level, a sort of "component count" metric? This, at least, would be an honest assessment of the fundamental syntactic pieces, whatever that might be worth. (Yes, I'm a "metric sceptic".) -- Frank Wales, Grep Limited, [frank@grep.co.uk<->uunet!grep!frank] Kirkfields Business Centre, Kirk Lane, LEEDS, UK, LS19 7LX. (+44) 532 500303
EGNILGES@pucc.Princeton.EDU (Ed Nilges) (03/30/91)
In article <1991Mar28.145235.14313@grep.co.uk>, frank@grep.co.uk (Frank Wales) writes: > >Cases like this (fictitious or not) seem to me to highlight >the problems of metrics like "lines of code" or "statement >counts", because these are predicated on the notion that statements >or lines are uniformly complex, fundamental pieces from which >programs are built. It's hard for me to accept such a notion >for anything other than assembly language. Even if you counted lexemes (these are the things that the "lexical analyzer" recognizes, including identifiers, constants, operators and so on), you'd not measure complexity fully. There are two reasons for this: one psychological and one mathematical. The psychological reason? Different lexemes have different intrinsic complexity, considering complexity as a property of our minds. The "+" sign is less complex, perhaps, than the "-" sign. The trouble with this is that psychological measurements typically ignore enormous individual differences in perception of complexity. Niklaus Wirth has pointed out that whenever you add two positive numbers (a seemingly simple operation) you are in danger of overflow: does this mean that addition of two positive integers is more complex than subtraction of two positive integers, where neither overflow nor underflow can occur (0-(2**31-1) in twos-complement is the worst case)? The mathematical reason? The total number of lexemes increases complexity, but so do the ways in which those lexemes are COMBINED. The structured statement a+b is simply a more complex artifact than the unordered, unstructured SET of lexemes {a,b,+}. Both have the "metric" 3. +--------------------------------+ Edward G. Nilges | Child support, tax-deductible | Princeton University | to payer AND receiver: an idea | Information Center | whose time has come. | Bitnet: EGNILGES@PUCC +--------------------------------+ (609) 258-2985
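The lexeme count discussed above can be approximated with a very small scanner. This is a sketch under simplifying assumptions: the function name is hypothetical, only a short list of two-character operators is recognized, numeric literals are handled crudely, and the preprocessor is ignored. It is not a full C lexer.

```c
#include <ctype.h>
#include <string.h>

/* Count lexemes in a C fragment (hypothetical helper; simplified lexer).
   Identifiers, numbers and literals each count once; block comments are
   skipped; any other character is one token unless it starts one of the
   two-character operators listed below. */
int count_tokens(const char *s)
{
    static const char *two[] = { "==", "!=", "<=", ">=", "&&", "||", "++",
                                 "--", "->", "+=", "-=", "*=", "/=", "<<", ">>" };
    int n = 0;

    while (*s) {
        unsigned char c = (unsigned char)*s;
        if (isspace(c)) { s++; continue; }
        if (c == '/' && s[1] == '*') {                 /* skip comment */
            s += 2;
            while (*s && !(s[0] == '*' && s[1] == '/')) s++;
            if (*s) s += 2;
            continue;
        }
        n++;
        if (isalpha(c) || c == '_') {                  /* identifier/keyword */
            while (isalnum((unsigned char)*s) || *s == '_') s++;
        } else if (isdigit(c)) {                       /* crude number */
            while (isalnum((unsigned char)*s) || *s == '.') s++;
        } else if (c == '"' || c == '\'') {            /* literal */
            char quote = *s++;
            while (*s && *s != quote) {
                if (*s == '\\' && s[1]) s++;
                s++;
            }
            if (*s) s++;
        } else {                                       /* operator/punctuator */
            size_t i, len = 1;
            for (i = 0; i < sizeof two / sizeof two[0]; i++)
                if (strncmp(s, two[i], 2) == 0) { len = 2; break; }
            s += len;
        }
    }
    return n;
}
```

The scanner gives the same count for "a+b" and for the scrambled "+ b a", which is the point made above: a bare lexeme count records how many pieces there are, not how they are combined.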
locke@paradyne.com (Richard Locke) (04/02/91)
In article <1991Mar28.145235.14313@grep.co.uk> frank@grep.co.uk (Frank Wales) writes: >In article <12630@pucc.Princeton.EDU> EGNILGES@pucc.Princeton.EDU writes: There has been a lot of work done in software metrics that addresses the topics & concerns that have been discussed (extensively) in this thread. >>Case in point: program A falls thru the cracks as a "noncomplicated >>program" because it has only ONE statement. It fails. The ... >...is there any work on enumerating programs at the token >level, a sort of "component count" metric? This, at least, would >be an honest assessment of the fundamental syntactic pieces, >whatever that might be worth. (Yes, I'm a "metric sceptic".) The Halstead "Software Science" family of metrics is based on the number of operators and operands. They give you some sense of a body of code's "size". The other major metrics family, McCabe's "Cyclomatic Complexity" metrics, assesses complexity based on a program's flow of control. Both these classes of metrics help evaluate "size" and "complexity", and address the "but what about *this* program!" cases that have been kicked around. The best introduction to these metrics of which I'm aware is in "A Software Metrics Tutorial", which is included with the UX-Metric or PC-Metric tools that Set Labs sells. Unfortunately, I don't believe it's available anywhere else. These XX-Metric tools read a body of code and spit out various metrics including McCabe and Halstead stuff, LOC, semi-colons, comment lines, estimated number of errors, estimated time to develop, etc. They are cheap, too -- cheaper to your company than having a programmer mess around for a couple of days coding and perfecting a line counter ;-) Anyway, I suggest that people interested in metrics, but unfamiliar with the McCabe and Halstead metrics, do some investigation. If you're more interested in software productivity than software complexity, I further suggest that you get and read Jones' "Programming Productivity". This has an excellent discussion as to why LOC is bad for productivity measures. -dick "just a satisfied customer of UX-Metric" p.s. Send me mail if you want pointers or references to the stuff I mention...
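To give a feel for the control-flow family mentioned above, McCabe's number for a single function is commonly estimated as the count of decision points plus one. The sketch below takes that shortcut rather than building a flow graph. The function name is hypothetical, and the choice of what counts as a decision (if, while, for, case, &&, || and ?) is an assumption that real tools vary on.

```c
#include <ctype.h>
#include <string.h>

/* Estimate cyclomatic complexity of one function body as
   (number of decision points) + 1. Hypothetical helper; comments and
   literals are skipped so keywords inside them are not counted. */
int cyclomatic_estimate(const char *s)
{
    static const char *kw[] = { "if", "while", "for", "case" };
    int decisions = 0;

    while (*s) {
        unsigned char c = (unsigned char)*s;
        if (c == '/' && s[1] == '*') {                 /* skip comment */
            s += 2;
            while (*s && !(s[0] == '*' && s[1] == '/')) s++;
            if (*s) s += 2;
        } else if (c == '"' || c == '\'') {            /* skip literal */
            char quote = *s++;
            while (*s && *s != quote) {
                if (*s == '\\' && s[1]) s++;
                s++;
            }
            if (*s) s++;
        } else if (isalpha(c) || c == '_') {           /* whole identifier */
            const char *start = s;
            size_t len, i;
            while (isalnum((unsigned char)*s) || *s == '_') s++;
            len = (size_t)(s - start);
            for (i = 0; i < sizeof kw / sizeof kw[0]; i++)
                if (strlen(kw[i]) == len && strncmp(start, kw[i], len) == 0)
                    decisions++;
        } else if (isdigit(c)) {                       /* skip number */
            while (isalnum((unsigned char)*s)) s++;
        } else if ((c == '&' && s[1] == '&') || (c == '|' && s[1] == '|')) {
            decisions++;
            s += 2;
        } else {
            if (c == '?') decisions++;
            s++;
        }
    }
    return decisions + 1;
}
```

Unlike a line or semicolon count, this estimate is unchanged when the same code is reformatted onto one line or spread over many.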
rlandsma@bbn.com (Rick Landsman) (04/05/91)
I have mentioned this before but for those that missed it, there is a "C" package called metrics in volume 20 of the comp.sources.unix archive available via anonymous ftp from uunet.uu.net that generates both McCabe and Halstead metrics from "C" files. We have found it useful as a way to determine POTENTIAL software functions for re-design to reduce ERROR PRONENESS :-) due to overly complex control and syntactical complexity. This is one of the tools we use for defect reduction by targeting error prone modules. There is documentation included with the package that discusses some of the Halstead volume and other metrics used by the author. Rick Landsman address: uunet.uu.net!bbn.com!rlandsman OR rlandsman@bbn.com
klb@unislc.uucp (Keith L. Breinholt) (04/05/91)
In article <1991Mar28.145235.14313@grep.co.uk> frank@grep.co.uk (Frank Wales) writes: >4096 bytes on a line? Feh, kid's stuff. :-) > >Cases like this (fictitious or not) seem to me to highlight >the problems of metrics like "lines of code" or "statement >counts", because these are predicated on the notion that statements >or lines are uniformly complex, fundamental pieces from which >programs are built. It's hard for me to accept such a notion >for anything other than assembly language. Which is why a combination of measures is important. Lines of code is only meant to measure complexity due to size. Some other measures of size complexity are the number of unique identifiers, keyword statements, # of operators, ..... Measures of control flow complexity are McCabe's cyclomatic complexity or essential complexity, branch statement size (operators not characters) and so on. >Quite apart from the difficulties associated with inter-language >comparisons, is there any work on enumerating programs at the token >level, a sort of "component count" metric? This, at least, would >be an honest assessment of the fundamental syntactic pieces, >whatever that might be worth. (Yes, I'm a "metric sceptic".) Key word statements and unique or total identifier counts are in the ballpark of what you're talking about. Key word statements are if, then, else, while, do... with assignments (=) and function calls thrown in to round out other cases. Identifier counts follow the same idea but try to measure more than the language specific identifiers. I have some tools to measure the above; they're in lex and took about a week to write (search for identifiers and look them up in a table, or just count them). Keith -- ___________________________________________________________________________ Keith L. Breinholt hellgate.utah.edu!uplherc!unislc!klb or Unisys, Unix Systems Group kbreinho@peruvian.utah.edu