[comp.software-eng] C source lines in file

swonk@ccicpg.UUCP (Glen Swonk) (08/10/89)

Does anyone have a program or a method of determining
the number of C source lines in a source file?
My assumption is that comments don't count as source
lines unless the comment is on a line with code.

Are there any other tools to measure the complexity
of a source file?


thanks
-- 
Glenn L. Swonk		CCI Computers 
(714)458-7282		9801 Muirlands Boulevard
			Irvine, CA 92718
uunet!ccicpg!swonk

flint@gistdev.UUCP (Flint Pellett) (08/11/89)

Comment lines don't count?  What are you going to use the count for when you
get it?  Every time I've wanted to do this, I've wanted to count every line
except blank lines, which is easy.  If you're counting for purposes of
measuring productivity, then comment lines certainly do count, otherwise
you're going to be encouraging people to not document their code.
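
Counting every line except blank ones can be sketched in a few lines of C.
This is only an illustration, not code posted in the thread: the name
count_nonblank_lines is made up, and the sketch assumes the whole file has
already been read into a NUL-terminated buffer.

```c
#include <ctype.h>

/* Count the lines in text that contain at least one non-whitespace
   character.  text is assumed to be a NUL-terminated copy of the file. */
long count_nonblank_lines(const char *text)
{
    long count = 0;
    int seen_text = 0;   /* 1 once the current line has shown any text */

    for (; *text; text++) {
        if (*text == '\n') {
            count += seen_text;
            seen_text = 0;
        } else if (!isspace((unsigned char)*text)) {
            seen_text = 1;
        }
    }
    return count + seen_text;   /* the last line may lack a newline */
}
```

Comment lines are counted like any other non-blank line here, which is the
convention argued for above.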

-- 
Flint Pellett, Global Information Systems Technology, Inc.
1800 Woodfield Drive, Savoy, IL  61874     (217) 352-1165
INTERNET: flint%gistdev@uxc.cso.uiuc.edu
UUCP:     {uunet,pur-ee,convex}!uiucuxc!gistdev!flint

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/12/89)

In article <35120@ccicpg.UUCP> swonk@ccicpg.UUCP (Glen Swonk) writes:
-Does anyone have a program or a method of determining
-the number of C source lines in a source file?
-My assumption is that comments don't count as source
-lines unless the comment is on a line with code.

What precisely is this supposed to measure?

johnk@opel.UUCP (John Kennedy) (08/14/89)

In article <10707@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <35120@ccicpg.UUCP> swonk@ccicpg.UUCP (Glen Swonk) writes:
>-Does anyone have a program or a method of determining
>-the number of C source lines in a source file?
>-My assumption is that comments don't count as source
>-lines unless the comment is on a line with code.
>
>What precisely is this supposed to measure?

It is not uncommon to generate NCSL (Non-Commentary Source Lines) for
purposes of productivity.  No, this does not encourage programmers
not to comment their files.  NCSL estimates have a relationship to
size and execution time predictions.  Comments do not.

John
-- 
John Kennedy                     johnk@opel.UUCP
Second Source, Inc.
Annapolis, MD

kazua-u@ascii.JUNET (Kazuaki Ueno) (08/14/89)

In article <35120@ccicpg.UUCP> swonk@ccicpg.UUCP (Glen Swonk) writes:

 |Does anyone have a program or a method of determining
 |the number of C source lines in a source file?
 |My assumption is that comments don't count as source
 |lines unless the comment is on a line with code.
 |
	How about trying this:

grep -v '^#[ 	]*include' <filename>  | /lib/cpp -P | grep -v '^[ 	]*$' | wc -l 
( Inside []'s are a SPACE and a TAB. )

	You will get the number of 'effective' C source lines with this list of 
	commands.  Of course I do not care what you do with it. :-)
	
--
Kazuaki Uyeno	ASCII Corporation, Tokyo, Japan
also a student of Univ. of Tokyo
kazua-u@ascii.JUNET

reggie@dinsdale.nm.paradyne.com (George W. Leach) (08/14/89)

In article <10707@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <35120@ccicpg.UUCP> swonk@ccicpg.UUCP (Glen Swonk) writes:
>-Does anyone have a program or a method of determining
>-the number of C source lines in a source file?
>-My assumption is that comments don't count as source
>-lines unless the comment is on a line with code.

>What precisely is this supposed to measure?

   I also want to know just what you are going to measure with this number?

   A much simpler approach would be to use awk to strip out what you don't
want and pipe the result to wc.


   Then again, a much simpler way to measure code size with equally worthwhile
scientific precision is to just measure the thinkness of your printouts with
a ruler :-)




George W. Leach					AT&T Paradyne 
(uunet|att)!pdn!reggie				Mail stop LG-133
Phone: 1-813-530-2376				P.O. Box 2826
FAX: 1-813-530-8224				Largo, FL  USA  34649-2826

linda@rtech.rtech.com (Linda Mundy) (08/17/89)

In article <6500@pdn.paradyne.com> reggie@dinsdale.paradyne.com (George W. Leach) writes:

> [deleted discussion of what's being measured when counting lines of code...]
>
>   Then again, a much simpler way to measure code size with equally worthwhile
>scientific precision is to just measure the thinkness of your printouts with
                                             ^^^^^^^^^
>a ruler :-)

Aha!  so that's what we're trying to measure here...  I don't think a ruler
will suffice!  :-)
>
>George W. Leach					AT&T Paradyne 
>(uunet|att)!pdn!reggie				Mail stop LG-133
>Phone: 1-813-530-2376				P.O. Box 2826
>FAX: 1-813-530-8224				Largo, FL  USA  34649-2826


-- 
"Who are you to tell me to question authority?"

Linda Mundy	{ucbvax,decvax}!mtxinu!rtech!linda

alanm@cognos.UUCP (Alan Myrvold) (08/17/89)

In article <35120@ccicpg.UUCP> swonk@ccicpg.UUCP (Glen Swonk) writes:
>-Does anyone have a program or a method of determining
>-the number of C source lines in a source file?

Ok. First off, sources don't really belong in comp.software-eng ...
so I feel a bit guilty, but here's a reasonably portable C program
to count :

     NCSL - non-commentary source lines
     LINES - source lines
     COMMENTS - C comments
     NCC - non-contiguous comments

It will even run on systems where (heaven forbid) the argv[] list
isn't as convenient to use as Unix's, and it should compile with
either a K&R or ANSI-style compiler.

One known bug in the program is that VAX CC (and others) allow:

#include "foo.c""

This confuses the string-parsing part of my program.
Obfuscated C contest winners may also foul the program.

Flames and comments to alanm@cognos.uucp, please.

                                 - Alan
--- cut here ---
/* LOC.C count C lines of code, comments                                  */
/* For each c file, produces
     NCSL - non-commentary source lines
     LINES - source lines
     COMMENTS - C comments
     NCC - non-contiguous comments

If invoked with no arguments and the file "cfiles.lis" does not
exist, input is taken from stdin, output goes to stdout.

If invoked with no arguments and "cfiles.lis" does exist,
the filenames are assumed to be in "cfiles.lis", and the
output is written to BOTH stdout and "cfiles.out".

If invoked with arguments, the args are taken as filenames, and
output is written to stdout.

*/

/* Alan Myrvold          3755 Riverside Dr.  uunet!mitel!sce!cognos!alanm */
/* Cognos Incorporated   P.O. Box 9707       alanm@cognos.uucp            */
/* (613) 738-1440 x5530  Ottawa, Ontario                                  */
/*                       CANADA  K1G 3Z4                                  */

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define NORM 0
#define COMMENT 1
#define STRING 2
#define CHAR 3
#define ID 4
#define SPECIAL 5
#define WHITE 6

static long LINES_OF_CODE,LAST_LINE,CURRENT_LINE,COMMENTS,NCC,IS_CONTIG;

#define id1(c) (isalpha(c) || ((c) == '_'))
#define id2(c) (id1(c) || (('0' <= (c)) && ((c) <= '9')) || ((c) == '$'))
#define is_white(c) (((c) == ' ') || ((c) == '\t') || ((c) == '\n'))

void echo_fn(k,s)
int k;
char *s;
{
   fputs(s,stdout);
}

void dump_white(s)
char *s;
{
   for (; *s; s++) {
       switch (*s) {
          case '\t' : printf("\\t"); break;
          case '\n' : printf("\\n"); break;
          case ' '  : printf("_"); break;
          default   : putchar(*s);
       }
   }
}

void dump_fn(k,s)
int k;
char *s;
{
   switch (k) {
      case ID : printf("ID %s\n",s); break;
      case COMMENT : printf("COMMENT %s\n",s); break;
      case SPECIAL : printf("SPECIAL %s\n",s); break;
      case STRING : printf("STRING %s\n",s); break;
      case CHAR : printf("CHAR %s\n",s); break;
      case WHITE : printf("WHITE ");
                   dump_white(s); 
                   putchar('\n');
                   break;
      default : printf("unknown %s\n",s);
   }
}

void count_fn(k,s)
int k;
char *s;
{
   switch (k) {
      case ID : 
      case SPECIAL :
      case STRING : 
      case CHAR :
          if (CURRENT_LINE != LAST_LINE) {
             LINES_OF_CODE++;
             LAST_LINE = CURRENT_LINE;
          }
          IS_CONTIG = 0;
          break;
      case COMMENT : COMMENTS++; 
                     if (!IS_CONTIG) {
                        IS_CONTIG = 1;
                        NCC++;
                     }
                     break;
   }
}

/* Beware trespassers of this code ... it is rather obtuse... */
/* but it SEEMS to work */
void tokenize(f,t)
FILE *f;
void (*t)();
{
   int skip_next,in_id,in_white,bptr,mode,c,old_c,retain;
   static char buffer[8000];

   IS_CONTIG = NCC = LINES_OF_CODE = COMMENTS = CURRENT_LINE = 0;
   LAST_LINE = -1;
   bptr = 0;
   mode = NORM;
   old_c = ' ';
   skip_next = in_id = in_white = 0;
   while (old_c != EOF) {
      c = getc(f);
      if (c == '\n') CURRENT_LINE++;
      retain = 0;

      /* Now, in NORM mode, we read one too many
         characters before deciding to start a new
         token */
      if (mode == NORM) {

         /* already in id mode */
         if (in_id) {
            if (id2(c)) {
               /* stay in mode */
               retain = 1;
               buffer[bptr++] = c;
            } else {
               /* send off identifier */
               buffer[bptr] = 0;
               t(ID,buffer);
               in_id = bptr = 0;
            }
         }

         /* already in white mode */
         if (in_white) {
            if (is_white(c)) {
               /* stay in mode */
               retain = 1;
               buffer[bptr++] = c;
            } else {
               /* send off white space */
               buffer[bptr] = 0;
               t(WHITE,buffer);
               in_white = bptr = 0;
            }
         }

         /* Check if we are going to change modes now */

         if (!in_white && is_white(c)) {
            /* start white mode */
            retain = 1;
            buffer[bptr++] = c;
            in_white = 1;
         }

         if (!in_id && id1(c)) {
            /* start id mode */
            retain = 1;
            in_id = 1;
            buffer[bptr++] = c;
         }

         /* start other modes */
         switch (c) {
            case '/'  : 
               /* look ahead 1 character */
               if (ungetc(getc(f),f) == '*') {
                  retain = 1;
                  mode = COMMENT; 
               }
            break;
            case '\'' : retain = 1; mode = CHAR; break;
            case '\"' : retain = 1; mode = STRING; break;
        }
      }

      /* Now deal with the modes where we know when we are done */
      switch (mode) {
         case COMMENT : 
           retain = 1;
           buffer[bptr++] = c;
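           /* bptr > 3: the slash that opens a comment must not also be
              taken as the one that closes it (star-slash needs 4 chars) */
           if ((bptr > 3) && (c == '/') && (old_c == '*')) {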
              mode = NORM; 
              buffer[bptr] = 0;
              t(COMMENT,buffer);
              bptr = 0;
           }
         break;
         case CHAR :
           retain = 1;
           buffer[bptr++] = c;
           if (skip_next) {
              skip_next = 0;
           } else {
              skip_next = (c == '\\');
              if ((bptr > 1) && (c == '\'')) {
                 mode = NORM; 
                 buffer[bptr] = 0;
                 t(CHAR,buffer);
                 bptr = 0;
              }
           }
           break;
         case STRING :
           retain = 1;
           buffer[bptr++] = c;
           if (skip_next) {
              skip_next = 0;
           } else {
              skip_next = (c == '\\');
              if ((bptr > 1) && (c == '\"')) {
                 mode = NORM; 
                 buffer[bptr] = 0;
                 t(STRING,buffer);
                 bptr = 0;
              }
           }
           break;
      }

      /* one-character token */
      if (!retain) {
         buffer[0] = c;
         buffer[1] = 0;
         if (c != EOF) t(SPECIAL,buffer);
      }

      /* save previous character */
      old_c = c;
   }
}


int count_main(argc,argv)
int argc;
char *argv[];
{
   int i,ier;
   FILE *fout,*mas,*f;
   char fbuf[80];
   int exit();

   ier = 0;
   if ((argc < 2) || ((argc == 2) && (strcmp(argv[1],"-") == 0))) {
      mas = fopen("cfiles.lis","rt");
      if (mas) {
         fout = fopen("cfiles.out","wt");
         if (!fout) exit(0);
         while (fscanf(mas,"%79s",fbuf) == 1) {   /* fbuf holds 80 bytes */
            f = fopen(fbuf,"rt");
            if (!f) {
               ier = 1;
            } else {
               tokenize(f,count_fn);
               printf("%s %ld %ld %ld %ld\n",fbuf,
                      LINES_OF_CODE,CURRENT_LINE,COMMENTS,NCC);
               fprintf(fout,"%s %ld %ld %ld %ld\n",fbuf,
                      LINES_OF_CODE,CURRENT_LINE,COMMENTS,NCC);
               fclose(f);
            }
         }
         fclose(mas);
         fclose(fout);
      } else {
         tokenize(stdin,count_fn);
         printf("%ld %ld %ld %ld\n",LINES_OF_CODE,CURRENT_LINE,COMMENTS,NCC);
      }
   } else {
      for (i = 1; i < argc; i++) {
          f = fopen(argv[i],"rt");
          if (!f) {
             ier = 1;
          } else {
             tokenize(f,count_fn);
             printf("%s %ld %ld %ld %ld\n",argv[i],
                    LINES_OF_CODE,CURRENT_LINE,COMMENTS,NCC);
             fclose(f);
          }
      }
   }
   return ier;
}

int main(argc,argv)
int argc;
char *argv[];
{
#if VAX
   return !count_main(argc,argv);
#else
   return count_main(argc,argv);
#endif
}
--- cut here ---

rcd@ico.ISC.COM (Dick Dunn) (08/18/89)

swonk@ccicpg.UUCP (Glen Swonk) writes:
> Does anyone have a program or a method of determining
> the number of C source lines in a source file?
> My assumption is that comments don't count as source
> lines unless the comment is on a line with code.

If you're on a UNIX system or have comparable tools, a simple awk script
can do this much.  However, you don't learn much from it.  In particular,
given the question:

> Are there any other tools to measure the complexity
> of a source file?

it's clear you're off on the wrong foot.  A count of source lines is NOT a
useful measure of program size or complexity.  Incidentally, be careful
about the difference between size and complexity!

As noted by flint@gistdev.UUCP (Flint Pellett):

> Comment lines don't count?  What are you going to use the count for when you
> get it? ... If you're counting for purposes of
> measuring productivity, then comment lines certainly do count, otherwise
> you're going to be encouraging people to not document their code.

Pellett is correct about the effect of not counting comment lines.
However, if you go off counting lines as a measure of work, you'll see a
useful comment like:

/*	lexcom - scan (a piece of) a comment
 *	Return either T_COM if end of comment found or T_NULL if end of
 *	line found first.
 *	Also handles instate and comment counting.
 */

turn into a baroque display like:

/************************************************************************/
/*                                                                      */
/*    FUNCTION NAME:    lexcom                                          */
/*                                                                      */
/*    RESULT TYPE:      int                                             */
/*                                                                      */
/*    ARGUMENTS:        (none)                                          */
/*                                                                      */
/*    PURPOSE:          blather babble...                               */
/*                                                                      */
[etc., ad nauseam...no sense wasting netbandwidth on it...]
/*                                                                      */
/************************************************************************/

The same thing will happen if you associate some reward or figure of merit
with source-line count, or identifier length, etc...you'll see:

	for (p = s; *p; p++) {
		[stuff]
	}

turn into:

	for (string_search_pointer = target_string;
		*string_search_pointer != STRING_TERMINATOR;
		string_search_pointer++)
	{
		[stuff]
	}

When I've tried to measure C source-file size and complexity, I've used a
program which does a simple analysis of the source but gives several
measures, including the following:
	blank lines
	lines containing only comment text
	lines containing only code
	lines containing comment and code
	average comment length or histogram of lengths
	average number of tokens per line, per nonblank line
	average identifier length or histogram of lengths
	average nesting level (requires tedious explanation)
	count of occurrences of each keyword
	count of occurrences of literal constants, by type
The result, of course, does NOT reduce program size or complexity to a
single number.  The token count is far more useful than a line count if you
want to know "how much code" you've got, but it's still woefully
inadequate.
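
The first four measures in the list above (blank, comment-only, code-only,
and mixed lines) can be sketched as a small classifier.  This is not the
program described above, whose source was not posted; the names
struct line_counts and classify_lines are invented for illustration.  It
assumes the file is already in a NUL-terminated buffer, and it knows only
about /* */ comments and simple string and character literals.

```c
#include <ctype.h>

struct line_counts {
    long blank;         /* white space only */
    long comment_only;  /* comment text, no code */
    long code_only;     /* code, no comment text */
    long mixed;         /* both code and comment text */
};

/* Classify every line of text, a NUL-terminated copy of a C file. */
struct line_counts classify_lines(const char *text)
{
    struct line_counts n = {0, 0, 0, 0};
    int in_comment = 0;   /* inside a comment that is still open */
    int quote = 0;        /* the opening quote while inside a literal */
    int has_code = 0, has_comment = 0;
    const char *p;

    for (p = text; ; p++) {
        if (*p == '\n' || *p == '\0') {
            /* Do not count the empty "line" after a final newline. */
            int empty_tail = (*p == '\0') && (p == text || p[-1] == '\n');
            if (!empty_tail) {
                if (has_code && has_comment) n.mixed++;
                else if (has_code)           n.code_only++;
                else if (has_comment)        n.comment_only++;
                else                         n.blank++;
            }
            if (*p == '\0') break;
            has_code = has_comment = 0;
            continue;
        }
        if (in_comment) {
            if (p[0] == '*' && p[1] == '/') {
                in_comment = 0; has_comment = 1; p++;
            } else if (!isspace((unsigned char)*p)) {
                has_comment = 1;
            }
        } else if (quote) {
            has_code = 1;
            if (*p == '\\' && p[1] != '\0' && p[1] != '\n') p++;
            else if (*p == quote) quote = 0;
        } else if (p[0] == '/' && p[1] == '*') {
            in_comment = 1; has_comment = 1; p++;
        } else if (*p == '"' || *p == '\'') {
            quote = *p; has_code = 1;
        } else if (!isspace((unsigned char)*p)) {
            has_code = 1;
        }
    }
    return n;
}
```

Whitespace-only lines inside a multi-line comment are counted as blank
here; that is a choice made for this sketch, not something the list above
specifies.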

I offer two rules about measuring program size/complexity:

1.  Any variant of "source line count" is useless as a measure of the
program.
	I've heard countless times the rationalization that "Well, it may
	not be good, but it's the best we can do."  This is WRONG!  It's
	worse than no measure at all.  It implies that you have information
	you don't really have.  If it's used as a measure of productivity,
	it's particularly bad, because there are obvious ways to pervert
	any obvious measures--and all of them make for worse programs.

2.  Programs are supposed to be good, not big.
	A program should be measured against what it is supposed to do.
	Sheer size is often unrelated to apparent complexity, and both may
	be unrelated to actual complexity (in terms of programming effort).

Talking among various people I know, we've all come up with a joke about
"negative productivity".  You start the day with, say, a thousand lines of
crappy code and end the day with 300 lines of clean code--thereby having
produced -700 lines of code for the day.
-- 
Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
   ...Are you making this up as you go along?

rcd@ico.ISC.COM (Dick Dunn) (08/18/89)

johnk@opel.UUCP (John Kennedy) writes:
> It is not uncommon to generate NCSL (Non-Commentary Source Lines) for
> purposes of productivity.

I assume this was meant to read "productivity measurement".  And sure, it's not
uncommon...but it's still wrong.  A count of non-commentary source lines
says nothing about the amount of actual code produced, nor about
productivity.  In fact, it's worse than that:  If you tell people their
productivity will be judged (in part) on NCSL, their coding style will
change--probably for the worse.  An experienced eye can often tell whether
code was written with an eye to NCSL as a productivity measure!

> ...NCSL estimates have a relationship to
> size and execution time predictions...

They have some relationship to code size--not very good, but possibly
slightly useful if you compensate for density of code (number of tokens
per line).  They have little or no relationship to execution time
predictions; in too many cases the relationship will be inverted.

Consider replacing an algorithm which is O(n^2) in time with a more
complicated O(n*log(n)) algorithm.  Or consider manual inline expansion
of time-critical code sequences instead of using a procedure.  Both
increase the program text size in order to decrease execution time.
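
As a small illustration of the first case, here are two ways to ask
whether an array contains a duplicate.  The example and both function
names are invented for this sketch.  The longer version relies on qsort,
whose running time the C standard does not guarantee; typical
implementations are about O(n*log(n)), which is the assumption made here.

```c
#include <stdlib.h>
#include <string.h>

/* Short but O(n^2): compare every pair of elements. */
int has_duplicate_simple(const int *a, size_t n)
{
    size_t i, j;

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (a[i] == a[j]) return 1;
    return 0;
}

/* Comparison callback for qsort; avoids overflow from subtraction. */
static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;

    return (a > b) - (a < b);
}

/* More source lines: sort a private copy, then compare neighbours. */
int has_duplicate_sorted(const int *a, size_t n)
{
    int *copy, found = 0;
    size_t i;

    if (n < 2) return 0;
    copy = malloc(n * sizeof *copy);
    if (copy == NULL) return has_duplicate_simple(a, n);  /* fall back */
    memcpy(copy, a, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, cmp_int);
    for (i = 1; i < n && !found; i++)
        found = (copy[i - 1] == copy[i]);
    free(copy);
    return found;
}
```

The second version has roughly three times the non-commentary source
lines of the first, yet it is the one expected to run faster on large
inputs.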
-- 
Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
   ...Are you making this up as you go along?

jgn@nvuxr.UUCP (Joe Niederberger) (08/18/89)

In article <6500@pdn.paradyne.com> reggie@dinsdale.paradyne.com (George W. Leach) writes:
>In article <10707@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>>In article <35120@ccicpg.UUCP> swonk@ccicpg.UUCP (Glen Swonk) writes:
>>-Does anyone have a program or a method of determining
>>-the number of C source lines in a source file?
>>-My assumption is that comments don't count as source
>>-lines unless the comment is on a line with code.
>
>>What precisely is this supposed to measure?
>
>   I also want to know just what you are going to measure with this number?

I'm often surprised at these sorts of statements. He obviously wants to
(precisely) measure the number of C source lines in a source file,
disregarding lines that contain only comments (and blank lines too, I
presume). It seemed perfectly obvious to me 8^).

I suppose the question on some people's minds is really "what are you
going to do with this measurement?" Perhaps he wants to look for a
correlation with some other measurements. Now, there's not much chance of
doing that if he can't obtain this measurement in the first place, is
there?

If these observations seem obvious, then just maybe they are correct.

This wasn't a flame, just a flicker.

Joe Niederberger

hallett@shoreland.uucp (Jeff Hallett x4-6328) (08/19/89)

In article <16018@vail.ICO.ISC.COM> rcd@ico.ISC.COM (Dick Dunn) writes:
>swonk@ccicpg.UUCP (Glen Swonk) writes:
>> Does anyone have a program or a method of determining
>> the number of C source lines in a source file?
>> My assumption is that comments don't count as source
>> lines unless the comment is on a line with code.

In my former job, we came up with a way to measure C lines in a way
that suited us.  The basic approach was to 

	1. Remove all comments.
	2. Ensure that there is only 1 "statement" of code per textual line
	   (a stmt here may be a curly brace or null stmt (solitary ;)).
	3. Remove all blank lines, and all braces and ; with no text with them.
	4. Remove all 'do' keywords (they do no work).
	5. Pull all broken function calls together on one line (i.e.
	   where a newline was inserted between parameters to make the
	   call prettier).
	6. Count the lines which are left.

Granted, this implies some "sanity" on the part of the programmer not
to do some really weird things (like put the ; for a statement on the
line below the statement), but on the whole this procedure (done
mostly with sed scripts) produced what we would have done by hand.

>it's clear you're off on the wrong foot.  A count of source lines is NOT a
>useful measure of program size or complexity.  Incidentally, be careful
>about the difference between size and complexity!
>

Excellent point about size vs. complexity.  However, "size" is a
nebulous term (more below).

>
>I offer two rules about measuring program size/complexity:
>
>1.  Any variant of "source line count" is useless as a measure of the
>program.
>	I've heard countless times the rationalization that "Well, it may
>	not be good, but it's the best we can do."  This is WRONG!  It's
>	worse than no measure at all.  It implies that you have information

I agree that LOC really is a bad measure of productivity, but so are
most of the items listed by Dick in his earlier posting.  Productivity
of a coder is a difficult thing to measure, and most methods I've heard
of are really inadequate, since I think that writing code is still more
an art than a science or manufacturing system.  However, LOC is still
a good estimator of cost.  I say this with the caveat that different
s/w houses will have different correlations and that it is still
strongly linked to complexity.  This is why I like methods like COCOMO
which attempt to relate lines produced with various drivers, both
about the nature of the code and programmers involved, to produce
estimates of cost and time.  Also, most of these methods can be
modified to reflect a particular production site.
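
For reference, the basic form of Cocomo (Boehm, 1981) is usually written
as below, where KLOC is thousands of delivered source lines, E is effort
in person-months, and T is schedule in months.  The coefficients are the
published textbook values for the three project classes, and they are
exactly the sort of thing a site would recalibrate against its own
history.  The intermediate model additionally multiplies E by the product
of the cost-driver ratings; that factor is not shown here.

```latex
E = a \cdot (\mathrm{KLOC})^{b}, \qquad T = 2.5 \cdot E^{d}

\begin{array}{lccc}
\text{mode}          & a   & b    & d    \\
\text{organic}       & 2.4 & 1.05 & 0.38 \\
\text{semi-detached} & 3.0 & 1.12 & 0.35 \\
\text{embedded}      & 3.6 & 1.20 & 0.32
\end{array}
```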

How one defines "size" I don't think is as important as how
consistently and accurately it can be measured and what it is used
for.  To judge the quality of ANY system based on its size alone is
foolhardy, and measurement systems that encourage programmers to
bloat their code are especially destructive (as Dick points out).  I
encourage Glen to check out not only various software economics books,
but also managerial evaluation and operations research texts to
determine useful ways to utilize what is collected.

--
                Jeffrey A. Hallett, PET Software Engineering
                    GE Medical Systems, W641, PO Box 414
                            Milwaukee, WI  53201
           (414) 548-5173 : EMAIL -  hallett@postron.gemed.ge.com

garye@hp-ptp.HP.COM (Gary_Ericson) (08/19/89)

> ...measuring lines of code to indicate effective programming...

I agree with other posters.  PLEASE, PLEASE, PLEASE don't encourage the idea
that counting lines of code provides *any* indication of good programming,
productivity, effectiveness, whatever.

I came from a group working on a small mini-computer used in real-time
applications.  The keys to good programming were "small" and "fast".  There was
limited logical memory space, so fewer instructions were (almost) always
better.  And the very critical factor of speed was often affected by code size
(execution time, how long it took to pull the program into and out of main 
memory, etc.).  Some of the most incredible coding examples I've seen were
little pieces of code with only 3 or 4 assembly language instructions.  The
creator may have spent days designing it.  Is that productive?  Very, because
they may have been the most significant parts of the system.

Instead of counting lines of code, maybe we need to find a way to measure
"intelligence density".  I don't know of any such measure...

Gary Ericson - Hewlett-Packard, Workstation Systems Division
               phone: (408)746-5098  mailstop: 101N  email: gary@hpdsla9.hp.com

EGNILGES@pucc.Princeton.EDU (Ed Nilges) (08/19/89)

In article <895@mrsvr.UUCP>, hallett@shoreland.uucp (Jeff Hallett x4-6328) writes:

>In article <16018@vail.ICO.ISC.COM> rcd@ico.ISC.COM (Dick Dunn) writes:
>>swonk@ccicpg.UUCP (Glen Swonk) writes:
>>> Does anyone have a program or a method of determing
>>> the number of C source lines in a source file?

Turns out this unambiguous measure is ambiguous.  I modified a compiler
a while back to take comments out of its count of source lines.  I
ended up counting, not "lines" (which is an incoherent concept in any
language which allows multiple statements per line or multiple lines
per statement) but syntactical "statements".  That meant that in the
following code

     DO while foo
        foo := bar+1
     END

there are three lines but two statements, one of which is contained
in the other!  I believe that this is the best of a not too good
collection of complexity measures.

************************************************************************

Edward Nilges

"If the universe were perfect, it wouldn't exist" - Yogi Berra

jdc@naucse.UUCP (John Campbell) (08/20/89)

From article <6500@pdn.paradyne.com>, by reggie@dinsdale.nm.paradyne.com (George W. Leach):
> In article <10707@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>>In article <35120@ccicpg.UUCP> swonk@ccicpg.UUCP (Glen Swonk) writes:
>>-Does anyone have a program or a method of determing
>>-the number of C source lines in a source file?
>>-My assumption is that comments don't count as source
>>-lines unless the comment is on a line with code.
> 
>>What precisely is this supposed to measure?
> 
>    I also want to know just what you are going to measure with this number?
Who knows?  I, for one, think the comments are the only valid part of the
code--what if the language changes or a better algorithm is found?  Hope
Glen's employer isn't trying to create some sweatshop.

Anyway, here's a lex goodie I use to count comments, *exactly* what he
wanted, right?  Note that the output is in lines of 'C' code, so you could
look very productive if you counted those lines of code instead!

OBTW, this comment recognizer works well enough for my style of commenting.
It does not solve the general problem of recognizing ANSI 'C' comments with a
regular expression.  A solution to that problem was posted a while back, but
it's pretty ugly...
=====cut here for comments.l====
%{
   FILE *ifp;
   int lineno=0, incom=0, com_bytes=0, cod_bytes=0,comments=0, code=0, scom = 0;
#define yyin  ifp
%}
%%
\/\*		{ incom = 1;
		  scom = 1;
		  com_bytes +=2;
		}
\*\/		{ incom = 0;
		  com_bytes +=2;
		}
\n	{ lineno++;
          if (incom)
             com_bytes++;
          else
             cod_bytes++;
	  if (scom)
	     comments++;
	  else
	     code++;
	  if (!incom)
	     scom = 0;
	}
.	{ if (incom)
	     com_bytes += yyleng;
          else
             cod_bytes += yyleng;
	}
%%

main(argc, argv)
int argc;
char *argv[];
{
   int i;

   FILE *fopen();
   if (argc < 2) {
      fprintf (stderr, "usage: %s in_file [in_file2] [...]\n", argv[0]);
      exit(1);
   }
   for (i = 1; i < argc; i++) {
      if ((ifp = fopen (argv[i], "r")) == NULL)  {
         fprintf (stderr, "%s: can't open %s\n", argv[0], argv[i]);
         exit(1);
      }
      /* Reset the counters so each file is reported on its own. */
      lineno = incom = scom = 0;
      com_bytes = cod_bytes = comments = code = 0;
      yylex();
      fclose (ifp);

      if (argc > 2)
         printf ("\n");

      /* The ?: guards avoid dividing by zero on an empty file. */
      printf ("%-14s lines:%5d,  comment:%5d,  code:%5d,  comment/lines: %5.3f\n",
              argv[i], lineno, comments, code,
              lineno ? ((float )comments)/lineno : 0.0);
      printf ("               bytes:%5d,  comment:%5d,  code:%5d,  comment/total: %5.3f\n",
              com_bytes+cod_bytes, com_bytes, cod_bytes,
              (com_bytes+cod_bytes) ? ((float )com_bytes)/(com_bytes+cod_bytes) : 0.0);
   }

/*
test.c         lines:   10,  comment:    4,  code:    6,  comment/lines: 0.400
               bytes:  201,  comment:   81,  code:  120,  comment/total: 0.403
*/
}

yywrap()
{
   return(1);
}
-- 
	John Campbell               ...!arizona!naucse!jdc
                                    CAMPBELL@NAUVAX.bitnet
	unix?  Sure send me a dozen, all different colors.

lowell@tc.fluke.COM (Lowell Skoog) (08/21/89)

> A count of non-commentary source lines says nothing about the amount
> of actual code produced, nor about productivity.  In fact, it's worse
> than that:  If you tell people their productivity will be judged (in
> part) on NCSL, their coding style will change--probably for the
> worse.

Then that's your own fault for mis-applying the metric.  If you set
out to use metrics to rate programmer productivity, then you're
asking for trouble.  Instead you should use metrics to characterize
your software development process so that you can improve the process
and predict future performance.  When used consistently in this
manner, even simple NCSL metrics can be quite valuable.  By
dismissing metrics completely you are throwing the baby out with the
bath water.

---------------------------------------------------------------------------
Lowell Skoog - John Fluke Mfg. Co. Inc., P.O. Box C9090, Everett, WA  98206
lowell@tc.fluke.COM | {uw-beaver,microsoft,sun}!fluke!lowell | 206/356-5283

hughes@math.berkeley.edu (Eric Hughes) (08/22/89)

In article <1658@naucse.UUCP>, jdc@naucse (John Campbell) writes:
>Anyway, here's a lex goodie I use to count comments, *exactly* what he
>wanted, right?  Note that the output is in lines of 'C' code, so you could
>look very productive if you counted those lines of code instead!
>
>OBTW, this comment recognizer works well enough for my style of commenting.
>It does not solve the general problem of recognizing ANSI 'C' comments with a
>regular expression.  A solution to that problem was posted a while back, but
>it's pretty ugly...

Flex, the lex replacement by Vern Paxson, has a wonderful capability
to recognize comments that does not require a large ugly regexp and will
not overflow the input buffer.  One makes an exclusive start condition
which represents the predicate "the input pointer is inside a comment."
Then the start and end of comment markers can be recognized separately.

This technique can also be used to recognize string and character
constants, and should be for a general purpose program, to eliminate
the possibility that a comment start marker appears inside a string.

Eric Hughes
hughes@math.berkeley.edu   ucbvax!math!hughes

------------cut here-------------
/* Small flex program to recognize C-style comments in text.  */

%x COMMENT 
%%
"/*"			BEGIN( COMMENT ) ;
.			ECHO ;
<COMMENT>"*/"		BEGIN( 0 ) ;
<COMMENT>"*"		|
<COMMENT>[^*\n]+	|
<COMMENT>\n		;
%%
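
For readers without flex, the same idea can be sketched as a hand-written
state machine in plain C.  This is an illustration, not a translation of
the flex program above: the names (strip_comments, the state enum) are
made up for the example, and it adds states for string and character
constants so that a comment start marker inside a string is left alone.
Like the flex rules, it drops everything inside a comment, newlines
included.  The caller is assumed to supply an output buffer at least as
large as the input.

```c
#include <stddef.h>

/* One state per "start condition". */
enum state { CODE, COMMENT, STRING, CHARLIT };

/* Copy `in` to `out` with comments removed.  `out` must have room for
 * at least as many bytes as `in`, including the terminating NUL.
 * Returns the number of bytes written, not counting the NUL. */
size_t strip_comments(const char *in, char *out)
{
    enum state st = CODE;
    size_t n = 0;
    const char *p;

    for (p = in; *p != '\0'; p++) {
        switch (st) {
        case CODE:
            if (p[0] == '/' && p[1] == '*') {
                st = COMMENT;
                p++;                    /* skip the star as well */
                break;
            }
            if (*p == '"')
                st = STRING;
            else if (*p == '\'')
                st = CHARLIT;
            out[n++] = *p;
            break;
        case COMMENT:
            if (p[0] == '*' && p[1] == '/') {
                st = CODE;
                p++;                    /* skip the slash as well */
            }
            break;
        case STRING:
        case CHARLIT:
            out[n++] = *p;
            if (*p == '\\' && p[1] != '\0')
                out[n++] = *++p;        /* copy the escaped character */
            else if (*p == (st == STRING ? '"' : '\''))
                st = CODE;              /* closing quote */
            break;
        }
    }
    out[n] = '\0';
    return n;
}
```

To keep line numbers stable for counting, one could copy the newlines
seen in the COMMENT state instead of dropping them.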

joannz@halley.UUCP (Joann Zimmerman) (08/22/89)

There are indeed other reasons to count non-commented source lines than
just attempts to measure productivity or estimate time-to-completion. In a
former life, a company for which I worked was involved in an effort to
characterize the efficiency of the software QA process, and to estimate
required testing time. By the time I left, a value for the number of 
non-commented lines changed was beginning to look as though it would prove
very useful. This required somewhat more sophisticated difference measures
than could be supplied by just running diff, so I invented a multi-language
(C, C++, Pascal and Apollo Aegis Shell/Mentor Graphics Userware) line-count
program allowing counts of both commented and non-commented lines, and
inclusion (once only) or exclusion of include files. You could operate it
over a single program or over a given DSEE library.

Yes, we were going for some complexity measurements as well, but lines made
a very good start.  

-- 
"A woman seldom writes her mind but in her postscript" - Richard Steele

Joann Zimmerman            Tandem Computers        Austin, TX 
...!{rutgers,harvard,gatech,uunet}!cs.texas.edu!halley!joannz

rcd@ico.ISC.COM (Dick Dunn) (08/22/89)

In article <1658@naucse.UUCP>, jdc@naucse.UUCP (John Campbell) writes:
[discussion about the purpose of counting, significance of comments, etc.
deleted]
> ...I, for one, think the comments are the only valid part of the
> code--what if the language changes or a better algorithm is found...

The BS alarm just went off!  When we talk about software engineering, it's
easy to get so wrapped up in platitudes that we say things that only make
sense if we don't think about them.  Good comments are right up there with
motherhood and apple pie, but it's the code that gets compiled and shipped!
The code is, after all, the part that describes the algorithm precisely;
comments are an aid to understanding it.  The code is also the part that
gets checked for consistency, and gets tested.
-- 
Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
   ...Are you making this up as you go along?

jcgs@wundt.harlqn.uucp (John Sturdy) (08/23/89)

garye@hp-ptp.HP.COM (Gary_Ericson):
>Instead of counting lines of code, maybe we need to find a way to measure
>"intelligence density".  I don't know of any such measure...

For the same system, time taken to compile might reflect this fairly
closely.
--
__John            When asked to attend a court case, Father Moses took with him
          a leaking jug of water. Asked about it, he said: "You ask me to judge
               the faults of another, while mine run out like water behind me."

                jcgs@uk.co.harlqn (UK notation) jcgs@harlqn.co.uk (most places)
    ...!mcvax!ukc!harlqn!jcgs (uucp - really has more stages, but ukc knows us)
John Sturdy                                            Telephone +44-223-872522
                      Harlequin Ltd, Barrington Hall, Barrington, Cambridge, UK