swonk@ccicpg.UUCP (Glen Swonk) (08/10/89)
Does anyone have a program or a method of determining
the number of C source lines in a source file?
My assumption is that comments don't count as source
lines unless the comment is on a line with code.

Are there any other tools to measure the complexity
of a source file?

Thanks
--
Glenn L. Swonk
CCI Computers           (714)458-7282
9801 Muirlands Boulevard
Irvine, CA 92718
uunet!ccicpg!swonk
flint@gistdev.UUCP (Flint Pellett) (08/11/89)
Comment lines don't count? What are you going to use the count for when you
get it? Every time I've wanted to do this, I've wanted to count every line
except blank lines, which is easy. If you're counting for purposes of
measuring productivity, then comment lines certainly do count, otherwise
you're going to be encouraging people to not document their code.
--
Flint Pellett, Global Information Systems Technology, Inc.
1800 Woodfield Drive, Savoy, IL 61874 (217) 352-1165
INTERNET: flint%gistdev@uxc.cso.uiuc.edu
UUCP: {uunet,pur-ee,convex}!uiucuxc!gistdev!flint
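The "every line except blank lines" count described above can be sketched in a few lines of C. This is only an illustration and not code from the thread: the function name count_nonblank_lines is made up here, and it assumes the whole source text is already in memory as a NUL-terminated string.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count the lines that contain at least one non-whitespace character.
   A final line without a trailing newline still counts.
   (Hypothetical helper, not part of any posted program.) */
long count_nonblank_lines(const char *text)
{
    long count = 0;
    int seen_ink = 0;   /* 1 once the current line has a visible character */

    assert(text != NULL);
    for (; *text != '\0'; text++) {
        if (*text == '\n') {
            count += seen_ink;
            seen_ink = 0;
        } else if (!isspace((unsigned char)*text)) {
            seen_ink = 1;
        }
    }
    return count + seen_ink;
}
```

Comment lines are counted like any other non-blank line here, which is the point of this variant.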
gwyn@smoke.BRL.MIL (Doug Gwyn) (08/12/89)
In article <35120@ccicpg.UUCP> swonk@ccicpg.UUCP (Glen Swonk) writes:
-Does anyone have a program or a method of determining
-the number of C source lines in a source file?
-My assumption is that comments don't count as source
-lines unless the comment is on a line with code.
What precisely is this supposed to measure?
johnk@opel.UUCP (John Kennedy) (08/14/89)
In article <10707@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <35120@ccicpg.UUCP> swonk@ccicpg.UUCP (Glen Swonk) writes:
>-Does anyone have a program or a method of determining
>-the number of C source lines in a source file?
>-My assumption is that comments don't count as source
>-lines unless the comment is on a line with code.
>
>What precisely is this supposed to measure?

It is not uncommon to generate NCSL (Non-Commentary Source Lines) for
purposes of productivity. No, this does not encourage programmers not to
comment their files. NCSL estimates have a relationship to size and
execution time predictions. Comments do not.

John
--
John Kennedy                     johnk@opel.UUCP
Second Source, Inc.
Annapolis, MD
kazua-u@ascii.JUNET (Kazuaki Ueno) (08/14/89)
In article <35120@ccicpg.UUCP> swonk@ccicpg.UUCP (Glen Swonk) writes:
|Does anyone have a program or a method of determining
|the number of C source lines in a source file?
|My assumption is that comments don't count as source
|lines unless the comment is on a line with code.
|

How about trying this:

    grep -v '^#[ ]*include' <filename> | /lib/cpp -P | grep -v '^[ ]*$' | wc -l

    ( Inside []'s are a SPACE and a TAB. )

You will get the number of 'effective' C source lines with this list of
commands. Of course I do not care what you do with it. :-)
--
Kazuaki Uyeno
ASCII Corporation, Tokyo, Japan
also a student of Univ. of Tokyo
kazua-u@ascii.JUNET
reggie@dinsdale.nm.paradyne.com (George W. Leach) (08/14/89)
In article <10707@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <35120@ccicpg.UUCP> swonk@ccicpg.UUCP (Glen Swonk) writes:
>-Does anyone have a program or a method of determining
>-the number of C source lines in a source file?
>-My assumption is that comments don't count as source
>-lines unless the comment is on a line with code.

>What precisely is this supposed to measure?

 I also want to know just what you are going to measure with this number?

 A much simpler approach would be to use awk to strip out what you
don't want and pipe to wc.

 Then again, a much simpler way to measure code size with equally worthwhile
scientific precision is to just measure the thinkness of your printouts with
a ruler :-)

George W. Leach                           AT&T Paradyne
(uunet|att)!pdn!reggie                    Mail stop LG-133
Phone: 1-813-530-2376                     P.O. Box 2826
FAX: 1-813-530-8224                       Largo, FL USA 34649-2826
linda@rtech.rtech.com (Linda Mundy) (08/17/89)
In article <6500@pdn.paradyne.com> reggie@dinsdale.paradyne.com (George W. Leach) writes:
> [deleted discussion of what's being measured when counting lines of code...]
>
> Then again, a much simpler way to measure code size with equally worthwhile
>scientific precision is to just measure the thinkness of your printouts with
                                             ^^^^^^^^^
>a ruler :-)

Aha! so that's what we're trying to measure here... I don't think a ruler
will suffice! :-)

>
>George W. Leach                           AT&T Paradyne
>(uunet|att)!pdn!reggie                    Mail stop LG-133
>Phone: 1-813-530-2376                     P.O. Box 2826
>FAX: 1-813-530-8224                       Largo, FL USA 34649-2826

--
"Who are you to tell me to question authority?"
Linda Mundy    {ucbvax,decvax}!mtxinu!rtech!linda
alanm@cognos.UUCP (Alan Myrvold) (08/17/89)
In article <35120@ccicpg.UUCP> swonk@ccicpg.UUCP (Glen Swonk) writes:
>-Does anyone have a program or a method of determining
>-the number of C source lines in a source file?

Ok. First off, sources don't really belong in comp.software-eng ... so I
feel a bit guilty, but here's a reasonably portable C program to count:

    NCSL     - non-commentary source lines
    LINES    - source lines
    COMMENTS - C comments
    NCC      - non-contiguous comments

It will even run on systems where (heaven forbid) the argv[] list isn't as
convenient to use as Unix's. And it should compile with either a K&R or
ANSI-style compiler.

One known bug in the program is that VAX CC (and others) allow:

#include "foo.c""

which confuses the string parsing part of my program. Obfuscated C contest
winners may also foul the program.

Flames and comments to alanm@cognos.uucp, please.

- Alan

--- cut here ---
/* LOC.C   count C lines of code, comments */

/* For each c file, produces
       NCSL     - non-commentary source lines
       LINES    - source lines
       COMMENTS - C comments
       NCC      - non-contiguous comments

   If invoked with no arguments and the file "cfiles.lis" does not exist,
   input is taken from stdin, output goes to stdout.

   If invoked with no arguments and "cfiles.lis" does exist, the filenames
   are assumed to be in "cfiles.lis", and the output is written to BOTH
   stdout and "cfiles.out".

   If invoked with arguments, the args are taken as filenames, and output
   is written to stdout.
*/

/* Alan Myrvold          3755 Riverside Dr.   uunet!mitel!sce!cognos!alanm */
/* Cognos Incorporated   P.O. Box 9707        alanm@cognos.uucp            */
/* (613) 738-1440 x5530  Ottawa, Ontario                                   */
/*                       CANADA  K1G 3Z4                                   */

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define NORM    0
#define COMMENT 1
#define STRING  2
#define CHAR    3
#define ID      4
#define SPECIAL 5
#define WHITE   6

static long LINES_OF_CODE,LAST_LINE,CURRENT_LINE,COMMENTS,NCC,IS_CONTIG;

#define id1(c) (isalpha(c) || ((c) == '_'))
#define id2(c) (id1(c) || (('0' <= (c)) && ((c) <= '9')) || ((c) == '$'))
#define is_white(c) (((c) == ' ') || ((c) == '\t') || ((c) == '\n'))

void echo_fn(k,s)
int k;
char *s;
{
    fputs(s,stdout);
}

void dump_white(s)
char *s;
{
    for (; *s; s++) {
        switch (*s) {
            case '\t' : printf("\\t"); break;
            case '\n' : printf("\\n"); break;
            case ' '  : printf("_"); break;
            default   : putchar(*s);
        }
    }
}

void dump_fn(k,s)
int k;
char *s;
{
    switch (k) {
        case ID      : printf("ID %s\n",s); break;
        case COMMENT : printf("COMMENT %s\n",s); break;
        case SPECIAL : printf("SPECIAL %s\n",s); break;
        case STRING  : printf("STRING %s\n",s); break;
        case CHAR    : printf("CHAR %s\n",s); break;
        case WHITE   : printf("WHITE "); dump_white(s); putchar('\n'); break;
        default      : printf("unknown %s\n",s);
    }
}

void count_fn(k,s)
int k;
char *s;
{
    switch (k) {
        case ID :
        case SPECIAL :
        case STRING :
        case CHAR :
            if (CURRENT_LINE != LAST_LINE) {
                LINES_OF_CODE++;
                LAST_LINE = CURRENT_LINE;
            }
            IS_CONTIG = 0;
            break;
        case COMMENT :
            COMMENTS++;
            if (!IS_CONTIG) {
                IS_CONTIG = 1;
                NCC++;
            }
            break;
    }
}

/* Beware trespassers of this code ... it is rather obtuse... */
/* but it SEEMS to work */
void tokenize(f,t)
FILE *f;
void (*t)();
{
    int skip_next,in_id,in_white,bptr,mode,c,old_c,retain;
    static char buffer[8000];

    IS_CONTIG = NCC = LINES_OF_CODE = COMMENTS = CURRENT_LINE = 0;
    LAST_LINE = -1;
    bptr = 0;
    mode = NORM;
    old_c = ' ';
    skip_next = in_id = in_white = 0;

    while (old_c != EOF) {
        c = getc(f);
        if (c == '\n') CURRENT_LINE++;
        retain = 0;

        /* Now, in NORM mode, we read one too many characters before
           deciding to start a new token */
        if (mode == NORM) {

            /* already in id mode */
            if (in_id) {
                if (id2(c)) {
                    /* stay in mode */
                    retain = 1;
                    buffer[bptr++] = c;
                } else {
                    /* send off identifier */
                    buffer[bptr] = 0;
                    t(ID,buffer);
                    in_id = bptr = 0;
                }
            }

            /* already in white mode */
            if (in_white) {
                if (is_white(c)) {
                    /* stay in mode */
                    retain = 1;
                    buffer[bptr++] = c;
                } else {
                    /* send off white space */
                    buffer[bptr] = 0;
                    t(WHITE,buffer);
                    in_white = bptr = 0;
                }
            }

            /* Check if we are going to change modes now */
            if (!in_white && is_white(c)) {
                /* start white mode */
                retain = 1;
                buffer[bptr++] = c;
                in_white = 1;
            }
            if (!in_id && id1(c)) {
                /* start id mode */
                retain = 1;
                in_id = 1;
                buffer[bptr++] = c;
            }

            /* start other modes */
            switch (c) {
                case '/' :
                    /* look ahead 1 character */
                    if (ungetc(getc(f),f) == '*') {
                        retain = 1;
                        mode = COMMENT;
                    }
                    break;
                case '\'' :
                    retain = 1;
                    mode = CHAR;
                    break;
                case '\"' :
                    retain = 1;
                    mode = STRING;
                    break;
            }
        }

        /* Now deal with the modes where we know when we are done */
        switch (mode) {
            case COMMENT :
                retain = 1;
                buffer[bptr++] = c;
                if ((c == '/') && (old_c == '*')) {
                    mode = NORM;
                    buffer[bptr] = 0;
                    t(COMMENT,buffer);
                    bptr = 0;
                }
                break;
            case CHAR :
                retain = 1;
                buffer[bptr++] = c;
                if (skip_next) {
                    skip_next = 0;
                } else {
                    skip_next = (c == '\\');
                    if ((bptr > 1) && (c == '\'')) {
                        mode = NORM;
                        buffer[bptr] = 0;
                        t(CHAR,buffer);
                        bptr = 0;
                    }
                }
                break;
            case STRING :
                retain = 1;
                buffer[bptr++] = c;
                if (skip_next) {
                    skip_next = 0;
                } else {
                    skip_next = (c == '\\');
                    if ((bptr > 1) && (c == '\"')) {
                        mode = NORM;
                        buffer[bptr] = 0;
                        t(STRING,buffer);
                        bptr = 0;
                    }
                }
                break;
        }

        /* one-character token */
        if (!retain) {
            buffer[0] = c;
            buffer[1] = 0;
            if (c != EOF) t(SPECIAL,buffer);
        }

        /* save previous character */
        old_c = c;
    }
}

int count_main(argc,argv)
int argc;
char *argv[];
{
    int i,ier;
    FILE *fout,*mas,*f;
    char fbuf[80];
    int exit();

    ier = 0;
    if ((argc < 2) || ((argc == 2) && (strcmp(argv[1],"-") == 0))) {
        mas = fopen("cfiles.lis","rt");
        if (mas) {
            fout = fopen("cfiles.out","wt");
            if (!fout) exit(0);
            while (fscanf(mas,"%s",fbuf) == 1) {
                f = fopen(fbuf,"rt");
                if (!f) {
                    ier = 1;
                } else {
                    tokenize(f,count_fn);
                    printf("%s %ld %ld %ld %ld\n",fbuf,
                           LINES_OF_CODE,CURRENT_LINE,COMMENTS,NCC);
                    fprintf(fout,"%s %ld %ld %ld %ld\n",fbuf,
                            LINES_OF_CODE,CURRENT_LINE,COMMENTS,NCC);
                    fclose(f);
                }
            }
            fclose(mas);
            fclose(fout);
        } else {
            tokenize(stdin,count_fn);
            printf("%ld %ld %ld %ld\n",LINES_OF_CODE,CURRENT_LINE,COMMENTS,NCC);
        }
    } else {
        for (i = 1; i < argc; i++) {
            f = fopen(argv[i],"rt");
            if (!f) {
                ier = 1;
            } else {
                tokenize(f,count_fn);
                printf("%s %ld %ld %ld %ld\n",argv[i],
                       LINES_OF_CODE,CURRENT_LINE,COMMENTS,NCC);
                fclose(f);
            }
        }
    }
    return ier;
}

int main(argc,argv)
int argc;
char *argv[];
{
#if VAX
    return !count_main(argc,argv);
#else
    return count_main(argc,argv);
#endif
}
--- cut here ---
rcd@ico.ISC.COM (Dick Dunn) (08/18/89)
swonk@ccicpg.UUCP (Glen Swonk) writes:
> Does anyone have a program or a method of determining
> the number of C source lines in a source file?
> My assumption is that comments don't count as source
> lines unless the comment is on a line with code.

If you're on a UNIX system or have comparable tools, a simple awk script
can do this much. However, you don't learn much from it. In particular,
given the question:

> Are there any other tools to measure the complexity
> of a source file?

it's clear you're off on the wrong foot. A count of source lines is NOT a
useful measure of program size or complexity. Incidentally, be careful
about the difference between size and complexity!

As noted by flint@gistdev.UUCP (Flint Pellett):
> Comment lines don't count? What are you going to use the count for when you
> get it? ... If you're counting for purposes of
> measuring productivity, then comment lines certainly do count, otherwise
> you're going to be encouraging people to not document their code.

Pellett is correct about the effect of not counting comment lines.
However, if you go off counting lines as a measure of work, you'll see
a useful comment like:

/* lexcom - scan (a piece of) a comment
 *      Return either T_COM if end of comment found or T_NULL if end of
 *      line found first.
 *      Also handles instate and comment counting.
 */

turn into a baroque display like:

/************************************************************************/
/*                                                                      */
/*      FUNCTION NAME:  lexcom                                          */
/*                                                                      */
/*      RESULT TYPE:    int                                             */
/*                                                                      */
/*      ARGUMENTS:      (none)                                          */
/*                                                                      */
/*      PURPOSE:        blather babble...                               */
/*                                                                      */
[etc., ad nauseam...no sense wasting netbandwidth on it...]
/*                                                                      */
/************************************************************************/

The same thing will happen if you associate some reward or figure of merit
with source-line count, or identifier length, etc...you'll see:

        for (p = s; *p; p++) {
                [stuff]
        }

turn into:

        for (string_search_pointer = target_string;
             *string_search_pointer != STRING_TERMINATOR;
             string_search_pointer++)
        {
                [stuff]
        }

When I've tried to measure C source-file size and complexity, I've used a
program which does a simple analysis of the source but gives several
measures, including the following:

        blank lines
        lines containing only comment text
        lines containing only code
        lines containing comment and code
        average comment length or histogram of lengths
        average number of tokens per line, per nonblank line
        average identifier length or histogram of lengths
        average nesting level (requires tedious explanation)
        count of occurrences of each keyword
        count of occurrences of literal constants, by type

The result, of course, does NOT reduce program size or complexity to a
single number. The token count is far more useful than a line count if you
want to know "how much code" you've got, but it's still woefully
inadequate.

I offer two rules about measuring program size/complexity:

1. Any variant of "source line count" is useless as a measure of the
program.
        I've heard countless times the rationalization that "Well, it may
        not be good, but it's the best we can do." This is WRONG! It's
        worse than no measure at all. It implies that you have information
        you don't really have. If it's used as a measure of productivity,
        it's particularly bad, because there are obvious ways to pervert
        any obvious measures--and all of them make for worse programs.

2. Programs are supposed to be good, not big.
        A program should be measured against what it is supposed to do.
        Sheer size is often unrelated to apparent complexity, and both
        may be unrelated to actual complexity (in terms of programming
        effort).

Talking among various people I know, we've all come up with a joke about
"negative productivity". You start the day with, say, a thousand lines of
crappy code and end the day with 300 lines of clean code--thereby having
produced -700 lines of code for the day.
--
Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
   ...Are you making this up as you go along?
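The first four measures in the list above (blank, comment-only, code-only, and mixed lines) can be sketched with a small state machine. This is an illustration under stated assumptions, not the analysis program described in the post, which was never shown: the names line_counts and classify_lines are invented here, the source is assumed to be in memory as a string, only /* */ comments are recognized, preprocessor lines count as code, and a blank line inside a comment counts as blank.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

struct line_counts {
    long blank;         /* nothing but white space           */
    long comment_only;  /* comment text, no code             */
    long code_only;     /* code, no comment text             */
    long mixed;         /* both code and comment on the line */
};

/* File one finished line under the right heading. */
static void end_line(struct line_counts *n, int code, int comment)
{
    if (code && comment) n->mixed++;
    else if (code)       n->code_only++;
    else if (comment)    n->comment_only++;
    else                 n->blank++;
}

struct line_counts classify_lines(const char *s)
{
    struct line_counts n = {0, 0, 0, 0};
    int in_comment = 0;   /* inside an open comment                  */
    char quote = 0;       /* 0, or the quote char of an open literal */
    int code = 0, comment = 0, pending = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        char c = *s;
        if (c == '\n') {                 /* line ends in any state */
            end_line(&n, code, comment);
            code = comment = pending = 0;
            continue;
        }
        pending = 1;
        if (in_comment) {
            if (!isspace((unsigned char)c)) comment = 1;
            if (c == '*' && s[1] == '/') { in_comment = 0; s++; }
        } else if (quote) {
            code = 1;
            if (c == '\\' && s[1] != '\0' && s[1] != '\n') s++; /* skip escaped char */
            else if (c == quote) quote = 0;
        } else if (c == '/' && s[1] == '*') {
            in_comment = 1; comment = 1; s++;
        } else if (c == '"' || c == '\'') {
            quote = c; code = 1;
        } else if (!isspace((unsigned char)c)) {
            code = 1;
        }
    }
    if (pending) end_line(&n, code, comment);  /* last line lacked a newline */
    return n;
}
```

With these four numbers, the count asked for at the top of the thread (comments don't count unless they share a line with code) is code_only + mixed. Nothing here addresses the objection that such a count says little about the program.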
rcd@ico.ISC.COM (Dick Dunn) (08/18/89)
johnk@opel.UUCP (John Kennedy) writes:
> It is not uncommon to generate NCSL (Non-Commentary Source Lines) for
> purposes of productivity.

I assume this was intended to read "productivity measurement". And sure,
it's not uncommon...but it's still wrong. A count of non-commentary source
lines says nothing about the amount of actual code produced, nor about
productivity. In fact, it's worse than that: If you tell people their
productivity will be judged (in part) on NCSL, their coding style will
change--probably for the worse. An experienced eye can often tell whether
code was written with an eye to NCSL as a productivity measure!

> ...NCSL estimates have a relationship to
> size and execution time predictions...

They have some relationship to code size--not very good, but possibly
slightly useful if you compensate for density of code (number of tokens per
line). They have little or no relationship to execution time predictions;
in too many cases the relationship will be inverted. Consider replacing an
algorithm which is O(n^2) in time with a more complicated O(n*log(n))
algorithm. Or consider manual inline expansion of time-critical code
sequences instead of using a procedure. Both increase the program text
size in order to decrease execution time.
--
Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
   ...Are you making this up as you go along?
jgn@nvuxr.UUCP (Joe Niederberger) (08/18/89)
In article <6500@pdn.paradyne.com> reggie@dinsdale.paradyne.com (George W. Leach) writes:
>In article <10707@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>>In article <35120@ccicpg.UUCP> swonk@ccicpg.UUCP (Glen Swonk) writes:
>>-Does anyone have a program or a method of determining
>>-the number of C source lines in a source file?
>>-My assumption is that comments don't count as source
>>-lines unless the comment is on a line with code.
>
>>What precisely is this supposed to measure?
>
> I also want to know just what you are going to measure with this number?

I'm often surprised at these sorts of statements. He obviously wants to
(precisely) measure the number of C source lines in a source file,
disregarding lines that contain only comments (and blank lines too, I
presume). It seemed perfectly obvious to me 8^).

I suppose the question on some people's minds is really "what are you going
to do with this measurement?" Perhaps he wants to look for a correlation
with some other measurements. Now, there's not much chance of doing that if
he can't obtain this measurement in the first place, is there?

If these observations seem obvious, then just maybe they are correct.
This wasn't a flame, just a flicker.

Joe Niederberger
hallett@shoreland.uucp (Jeff Hallett x4-6328) (08/19/89)
In article <16018@vail.ICO.ISC.COM> rcd@ico.ISC.COM (Dick Dunn) writes:
>swonk@ccicpg.UUCP (Glen Swonk) writes:
>> Does anyone have a program or a method of determining
>> the number of C source lines in a source file?
>> My assumption is that comments don't count as source
>> lines unless the comment is on a line with code.

In my former job, we came up with a way to measure C lines in a way
that suited us. The basic approach was to

1. Remove all comments.
2. Ensure that there is only one "statement" of code per textual line
   (a statement here may be a curly brace or a null statement, i.e. a
   solitary ;).
3. Remove all blank lines, and all braces and semicolons that stand
   alone on a line.
4. Remove all 'do' keywords (they do no work).
5. Pull all broken function calls together on one line (i.e. where a
   newline was inserted between parameters to make the call prettier).
6. Count the lines which are left.

Granted, this implies some "sanity" on the part of the programmer not
to do some really weird things (like put the ; for a statement on the
line below the statement), but on the whole this procedure (done
mostly with sed scripts) produced what we would have done by hand.

>it's clear you're off on the wrong foot. A count of source lines is NOT a
>useful measure of program size or complexity. Incidentally, be careful
>about the difference between size and complexity!
>

Excellent point about size vs. complexity. However, "size" is a
nebulous term (more below).

>
>I offer two rules about measuring program size/complexity:
>
>1. Any variant of "source line count" is useless as a measure of the
>program.
>       I've heard countless times the rationalization that "Well, it may
>       not be good, but it's the best we can do." This is WRONG! It's
>       worse than no measure at all. It implies that you have information

I agree that LOC really is a bad measure of productivity, but so are
most of the items listed by Dick in his earlier posting. Measuring the
productivity of a coder is difficult, and most methods I've heard of
are really inadequate, since I think that writing code is still more
an art than a science or a manufacturing system.

However, LOC is still a good estimator of cost. I say this with the
caveat that different s/w houses will have different correlations and
that it is still strongly linked to complexity. This is why I like
methods like COCOMO, which attempt to relate lines produced to various
drivers, both about the nature of the code and the programmers involved,
to produce estimates of cost and time. Also, most of these methods can
be modified to reflect a particular production site.

How one defines "size" I don't think is as important as how
consistently and accurately it can be measured and what it is used
for. To judge the quality of ANY system based on its size alone is
foolhardy, and systems that encourage programmers to bloat their code
are especially destructive (as Dick points out).

I encourage Glen to check out not only various software economics
books, but also managerial evaluation and operations research texts to
determine useful ways to utilize what is collected.

--
Jeffrey A. Hallett, PET Software Engineering
GE Medical Systems, W641, PO Box 414
Milwaukee, WI 53201
(414) 548-5173 : EMAIL - hallett@postron.gemed.ge.com
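For readers who have not met COCOMO, the basic form of Boehm's model is reproduced below as a reminder of how a size figure feeds a cost estimate. The coefficients are the commonly quoted "organic mode" values (small team, familiar problem) and are shown only for illustration; as the post says, a site would normally calibrate its own, and the intermediate model further multiplies the effort by a product of cost-driver ratings.

```latex
E = a \,(\mathrm{KLOC})^{b} \quad \text{person-months}, \qquad
T = c \, E^{d} \quad \text{months}
% organic mode: a = 2.4,\; b = 1.05,\; c = 2.5,\; d = 0.38
% worked example for 10 KLOC:
%   E \approx 2.4 \times 10^{1.05} \approx 27 \text{ person-months}
%   T \approx 2.5 \times 27^{0.38} \approx 8.7 \text{ months}
```

The exponent b is slightly above 1, so estimated effort grows a little faster than size. That is one reason a consistent definition of a "line" matters more to such a model than which definition is chosen.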
garye@hp-ptp.HP.COM (Gary_Ericson) (08/19/89)
> ...measuring lines of code to indicate effective programming...
I agree with other posters. PLEASE, PLEASE, PLEASE don't encourage the idea
that counting lines of code provides *any* indication of good programming,
productivity, effectiveness, whatever.
I came from a group working on a small mini-computer used in real-time
applications. The keys to good programming were "small" and "fast". There was
limited logical memory space, so fewer instructions were (almost) always
better. And the very critical factor of speed was often affected by code size
(execution time, how long it took to pull the program into and out of main
memory, etc.). Some of the most incredible coding examples I've seen were
little pieces of code with only 3 or 4 assembly language instructions. The
creator may have spent days designing it. Is that productive? Very, because
they may have been the most significant parts of the system.
Instead of counting lines of code, maybe we need to find a way to measure
"intelligence density". I don't know of any such measure...
Gary Ericson - Hewlett-Packard, Workstation Systems Division
phone: (408)746-5098 mailstop: 101N email: gary@hpdsla9.hp.com
EGNILGES@pucc.Princeton.EDU (Ed Nilges) (08/19/89)
In article <895@mrsvr.UUCP>, hallett@shoreland.uucp (Jeff Hallett x4-6328) writes:
>In article <16018@vail.ICO.ISC.COM> rcd@ico.ISC.COM (Dick Dunn) writes:
>>swonk@ccicpg.UUCP (Glen Swonk) writes:
>>> Does anyone have a program or a method of determining
>>> the number of C source lines in a source file?

Turns out this unambiguous measure is ambiguous. I modified a compiler
a while back to take comments out of its count of source lines. I ended
up counting, not "lines" (which is an incoherent concept in any language
which allows multiple statements per line or multiple lines per
statement) but syntactical "statements". That meant that in the following
code

     DO while foo
        foo := bar+1
     END

there are three lines but two statements, one of which is contained in
the other! I believe that this is the best of a not too good collection
of complexity measures.

************************************************************************
Edward Nilges

"If the universe were perfect, it wouldn't exist" - Yogi Berra
jdc@naucse.UUCP (John Campbell) (08/20/89)
From article <6500@pdn.paradyne.com>, by reggie@dinsdale.nm.paradyne.com (George W. Leach):
> In article <10707@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>>In article <35120@ccicpg.UUCP> swonk@ccicpg.UUCP (Glen Swonk) writes:
>>-Does anyone have a program or a method of determining
>>-the number of C source lines in a source file?
>>-My assumption is that comments don't count as source
>>-lines unless the comment is on a line with code.
>
>>What precisely is this supposed to measure?
>
> I also want to know just what you are going to measure with this number?

Who knows? I, for one, think the comments are the only valid part of the
code--what if the language changes or a better algorithm is found? Hope
Glen's employer isn't trying to create some sweatshop.

Anyway, here's a lex goodie I use to count comments, *exactly* what he
wanted, right? Note that the output is in lines of 'C' code, so you could
look very productive if you counted those lines of code instead!

OBTW, this comment recognizer works well enough for my style of commenting.
It does not solve the general problem of recognizing ANSI 'C' comments with a
regular expression. A solution to that problem was posted a while back, but
it's pretty ugly...

=====cut here for comments.l====
%{
FILE *ifp;
int lineno=0, incom=0, com_bytes=0, cod_bytes=0,comments=0, code=0, scom = 0;
#define yyin ifp
%}
%%
\/\*    { incom = 1; scom = 1; com_bytes +=2; }
\*\/    { incom = 0; com_bytes +=2; }
\n      { lineno++;
          if (incom) com_bytes++; else cod_bytes++;
          if (scom) comments++; else code++;
          if (!incom) scom = 0;
        }
.       { if (incom) com_bytes += yyleng; else cod_bytes += yyleng; }
%%
main(argc, argv)
int argc;
char *argv[];
{
    int i;
    FILE *fopen();

    if (argc < 2) {
        fprintf (stderr, "usage: %s in_file [in_file2] [...]\n", argv[0]);
        exit(1);
    }
    for (i = 1; i < argc; i++) {
        if ((ifp = fopen (argv[i], "r")) == NULL) {
            fprintf (stderr, "%s: can't open %s\n", argv[0], argv[i]);
            exit(1);
        }
        /* reset the counters so each file is reported on its own */
        lineno = incom = com_bytes = cod_bytes = comments = code = scom = 0;
        yylex();
        fclose (ifp);
        if (argc > 2) printf ("\n");
        printf ("%-14s lines:%5d, comment:%5d, code:%5d, comment/lines: %5.3f\n",
            argv[i], lineno, comments, code, ((float )comments)/lineno);
        printf ("               bytes:%5d, comment:%5d, code:%5d, comment/total: %5.3f\n",
            com_bytes+cod_bytes, com_bytes, cod_bytes,
            ((float )com_bytes)/(com_bytes+cod_bytes));
    }
/*
test.c         lines:   10, comment:    4, code:    6, comment/lines: 0.400
               bytes:  201, comment:   81, code:  120, comment/total: 0.403
*/
}
yywrap() { return(1); }
--
John Campbell               ...!arizona!naucse!jdc
                            CAMPBELL@NAUVAX.bitnet
unix? Sure send me a dozen, all different colors.
lowell@tc.fluke.COM (Lowell Skoog) (08/21/89)
> A count of non-commentary source lines says nothing about the amount
> of actual code produced, nor about productivity. In fact, it's worse
> than that: If you tell people their productivity will be judged (in
> part) on NCSL, their coding style will change--probably for the
> worse.

Then that's your own fault for mis-applying the metric. If you set out to
use metrics to rate programmer productivity, then you're asking for trouble.
Instead you should use metrics to characterize your software development
process so that you can improve the process and predict future performance.
When used consistently in this manner, even simple NCSL metrics can be quite
valuable. By dismissing metrics completely you are throwing the baby out
with the bath water.

---------------------------------------------------------------------------
Lowell Skoog - John Fluke Mfg. Co. Inc., P.O. Box C9090, Everett, WA 98206
lowell@tc.fluke.COM | {uw-beaver,microsoft,sun}!fluke!lowell | 206/356-5283
hughes@math.berkeley.edu (Eric Hughes) (08/22/89)
In article <1658@naucse.UUCP>, jdc@naucse (John Campbell) writes:
>Anyway, here's a lex goodie I use to count comments, *exactly* what he
>wanted, right? Note that the output is in lines of 'C' code, so you could
>look very productive if you counted those lines of code instead!
>
>OBTW, this comment recognizer works well enough for my style of commenting.
>It does not solve the general problem of recognizing ANSI 'C' comments with a
>regular expression. A solution to that problem was posted a while back, but
>it's pretty ugly...

Flex, the lex replacement by Vern Paxson, has a wonderful capability to
recognize comments that does not require a large ugly regexp and will not
overflow the input buffer. One makes an exclusive start condition which
represents the predicate "the input pointer is inside a comment." Then the
start and end of comment markers can be recognized separately.

This technique can also be used to recognize string and character constants,
and should be in a general-purpose program, to eliminate the possibility
that a comment start marker appears inside a string.

Eric Hughes
hughes@math.berkeley.edu   ucbvax!math!hughes

------------cut here-------------
/* Small flex program to recognize C-style comments in text. */
%x COMMENT
%%
"/*"                    BEGIN( COMMENT ) ;
.                       ECHO ;
<COMMENT>"*/"           BEGIN( 0 ) ;
<COMMENT>"*"            |
<COMMENT>[^*\n]+        |
<COMMENT>\n             ;
%%
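[Editorial sketch, not part of the original post.] The suggestion to treat string and character constants as their own states can also be written as a small hand-coded scanner. The C sketch below is one possible reading of that idea, not the poster's code: `strip_comments` and `enum scan_state` are hypothetical names, and unlike the flex program above it keeps newlines found inside comments so that line numbers are preserved.

```c
#include <stddef.h>

/* One state per lexical context, mirroring the start-condition idea. */
enum scan_state { IN_CODE, IN_COMMENT, IN_STRING, IN_CHAR };

/* Copy src to dst with comments removed.
   dst must have room for strlen(src) + 1 bytes.
   Returns the number of bytes written, not counting the final NUL. */
size_t strip_comments(const char *src, char *dst)
{
    enum scan_state st = IN_CODE;
    size_t i = 0, o = 0;

    while (src[i] != '\0') {
        char c = src[i];
        switch (st) {
        case IN_CODE:
            if (c == '/' && src[i + 1] == '*') {
                st = IN_COMMENT;
                i += 2;
                continue;            /* skip the opening delimiter */
            }
            if (c == '"')
                st = IN_STRING;
            else if (c == '\'')
                st = IN_CHAR;
            dst[o++] = c;
            break;
        case IN_COMMENT:
            if (c == '*' && src[i + 1] == '/') {
                st = IN_CODE;
                i += 2;
                continue;            /* skip the closing delimiter */
            }
            if (c == '\n')
                dst[o++] = c;        /* keep line structure intact */
            break;
        case IN_STRING:
        case IN_CHAR:
            dst[o++] = c;
            if (c == '\\' && src[i + 1] != '\0')
                dst[o++] = src[++i]; /* copy the escaped character as-is */
            else if ((st == IN_STRING && c == '"') ||
                     (st == IN_CHAR && c == '\''))
                st = IN_CODE;
            break;
        }
        i++;
    }
    dst[o] = '\0';
    return o;
}
```

Because a comment start marker is only looked for in the `IN_CODE` state, text such as "/* no */" inside a string constant passes through untouched.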
joannz@halley.UUCP (Joann Zimmerman) (08/22/89)
There are indeed other things to count non-commented source lines for than
just attempts to measure productivity or estimate time-to-completion. In a
former life, a company for which I worked was involved in an effort to
characterize the efficiency of the software QA process, and to estimate
required testing time. By the time I left, a value for the number of
non-commented lines changed was beginning to look as though it would prove
very useful.

This required somewhat more sophisticated difference measures than could be
supplied by just running diff, so I invented a multi-language (C, C++,
Pascal and Apollo Aegis Shell/Mentor Graphics Userware) line-count program
allowing counts of both commented and non-commented lines, and inclusion
(once only) or exclusion of include files. You could operate it over a
single program or over a given DSEE library.

Yes, we were going for some complexity measurements as well, but lines made
a very good start.

--
"A woman seldom writes her mind but in her postscript" - Richard Steele
Joann Zimmerman   Tandem Computers   Austin, TX
...!{rutgers,harvard,gatech,uunet}!cs.texas.edu!halley!joannz
rcd@ico.ISC.COM (Dick Dunn) (08/22/89)
In article <1658@naucse.UUCP>, jdc@naucse.UUCP (John Campbell) writes:
[discussion about the purpose of counting, significance of comments, etc.
deleted]
> ...I, for one, think the comments are the only valid part of the
> code--what if the language changes or a better algorithm is found...

The BS alarm just went off! When we talk about software engineering, it's
easy to get so wrapped up in platitudes that we say things that only make
sense if we don't think about them. Good comments are right up there with
motherhood and apple pie, but it's the code that gets compiled and shipped!

The code is, after all, the part that describes the algorithm precisely;
comments are an aid to understanding it. The code is also the part that
gets checked for consistency, and gets tested.

--
Dick Dunn   rcd@ico.isc.com   uucp: {ncar,nbires}!ico!rcd   (303)449-2870
...Are you making this up as you go along?
jcgs@wundt.harlqn.uucp (John Sturdy) (08/23/89)
garye@hp-ptp.HP.COM (Gary_Ericson):
>Instead of counting lines of code, maybe we need to find a way to measure
>"intelligence density". I don't know of any such measure...

For the same system, time taken to compile might reflect this fairly
closely.

--
__John
When asked to attend a court case, Father Moses took with him a leaking
jug of water. Asked about it, he said: "You ask me to judge the faults of
another, while mine run out like water behind me."
jcgs@uk.co.harlqn (UK notation)   jcgs@harlqn.co.uk (most places)
...!mcvax!ukc!harlqn!jcgs (uucp - really has more stages, but ukc knows us)
John Sturdy   Telephone +44-223-872522
Harlequin Ltd, Barrington Hall, Barrington, Cambridge, UK