[comp.software-eng] 50,000 lines: a lot or a little?

jon@june.cs.washington.edu (Jon Jacky) (06/26/89)

Stanley Todd Shebs <apple!shebs@bloom-beacon.mit.edu> asks,

> Incidentally, the "50K lines == small" statement is a familiar one, but 
> I've not seen any reliable and up-to-date statistics on program sizes.
> Is there a believable chart anywhere of sizes, numbers, and the numbers 
> of people involved?

Charts like the one Stan mentions turn up in trade magazines from time to time;
it's hard to say how believable or useful they are.  In an article on CASE products
in ELECTRONIC DESIGN, Jan 12 1989, p. 66, Fig. 1 is a bar chart labelled
``Program Length vs. Target Hardware''.  The chart contains this data:

	Computer type          Year   Lines of code (thousands)

	Mainframe              1985   10.2
	                       1989   11.9

	Minicomputer           1985   12.9
	                       1989   22.8

	Single-User            1985   10.7
	                       1989   18.5

	Internally developed   1985   14.0
	                       1989   36.8

The figure legend says, ``With system complexity on the rise, the size of 
software programs is swelling.  According to the Technology Research Group,
30% of next year's embedded systems will have over 75,000 lines of code.''
The text of the story gives a clue to the source: ``At the 1988 Design 
Automation Conference, Andy Rapaport, president of Boston's Technology 
Research Group, said that 30% of embedded systems ran over 75,000 lines
of code - up 14% from 1985'' (but *that* figure isn't in the chart).
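To put the chart's "swelling" in numbers, here is a small Python sketch that computes the percentage increase from 1985 to 1989 for each category. The arithmetic is mine, done on the four pairs of figures above; the article itself prints no growth rates.

```python
# Figures from the ELECTRONIC DESIGN chart, in thousands of lines of code:
# category -> (1985 value, 1989 value)
chart = {
    "Mainframe": (10.2, 11.9),
    "Minicomputer": (12.9, 22.8),
    "Single-User": (10.7, 18.5),
    "Internally-developed": (14.0, 36.8),
}

def percent_growth(old: float, new: float) -> float:
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0

for kind, (y1985, y1989) in chart.items():
    print(f"{kind:22s} {percent_growth(y1985, y1989):6.1f}%")
```

By this arithmetic internally developed software grew by roughly 163% over the four years, against about 17% for mainframe code, so the growth is far from uniform across categories.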

Similar tables, also attributed to ``Technology Research Group, 1987'',
appear in ELECTRONIC ENGINEERING TIMES, March 20, 1989, p. T17.  They
appear as follows, without any further explanation in the text:

	GEOMETRIC MEAN LENGTH OF CODE BY COMPANY SIZE

	Company   Base   1985     Increase   1987     Increase   1989
	size                      1985-87             1987-89

	Large     759    11,927   24%        14,774   25%        18,471
	Medium    387    12,059   25%        15,091   35%        20,405
	Small     205    11,035   47%        16,222   49%        24,208
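The printed "increase" figures can be checked against the raw means. This Python sketch recomputes them, treating the first as the 1985-to-1987 step and the second as the 1987-to-1989 step (an inference from the arithmetic, not something the article states), and also compounds the two steps into a 1985-to-1989 total.

```python
# Geometric mean length of code from the EE Times table:
# company size -> (1985, 1987, 1989)
rows = {
    "Large": (11_927, 14_774, 18_471),
    "Medium": (12_059, 15_091, 20_405),
    "Small": (11_035, 16_222, 24_208),
}

def pct_increase(old: int, new: int) -> int:
    """Percentage increase from old to new, rounded to a whole percent."""
    return round((new - old) / old * 100)

for size, (y85, y87, y89) in rows.items():
    print(size,
          pct_increase(y85, y87),   # compare with the first "increase" column
          pct_increase(y87, y89),   # compare with the second "increase" column
          pct_increase(y85, y89))   # overall 1985-89 growth (not in the table)
```

The recomputed steps match the printed percentages to the nearest percent, and compounding gives about 55% (large), 69% (medium), and 119% (small) growth from 1985 to 1989.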

	
	DEVELOPMENT OF VERY LARGE PROGRAMS BY COMPANY SIZE

	                                  Large   Medium   Small
	                          Base    759     387      205

	75,000 to 249,999 lines   1985    9%      10%      10%
	                          1987    11%     14%      9%
	                          1989    13%     19%      15%

	250,000 lines or more     1985    5%      4%       3%
	                          1987    7%      4%       5%
	                          1989    11%     6%       7%
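Adding the two size bands gives the share of programs at 75,000 lines or more. The Python sketch below assumes the bands are non-overlapping percentages of the same respondent base, which the headings suggest but the article does not state.

```python
# Percentages from the EE Times table, keyed by company size.
# Each entry: year -> (% at 75,000-249,999 lines, % at 250,000+ lines)
table = {
    "Large":  {1985: (9, 5),  1987: (11, 7), 1989: (13, 11)},
    "Medium": {1985: (10, 4), 1987: (14, 4), 1989: (19, 6)},
    "Small":  {1985: (10, 3), 1987: (9, 5),  1989: (15, 7)},
}

def share_over_75k(size: str, year: int) -> int:
    """Percent of programs at 75,000 lines or more, ASSUMING the two
    bands are disjoint shares of the same respondent base."""
    mid_band, top_band = table[size][year]
    return mid_band + top_band

for size in table:
    print(size, [share_over_75k(size, y) for y in (1985, 1987, 1989)])
```

On that assumption the 1989 shares come to 24% (large), 25% (medium), and 22% (small companies), all below the 30% quoted for embedded systems, though the two sources may not be describing the same population.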

And then there are occasional reports of really big projects, in the million-
lines-and-up category; these are usually from the aerospace or military 
C3I fields.

- Jonathan Jacky, University of Washington

jon@june.cs.washington.edu (Jon Jacky) (06/27/89)

I just ran across the following table in the textbook, SOFTWARE ENGINEERING
CONCEPTS, by Richard Fairley, 1985, McGraw-Hill.  It is table 1.1 on p. 11.
Fairley says he adapted it from Edward Yourdon, TECHNIQUES OF PROGRAM 
STRUCTURE AND DESIGN, Prentice-Hall, 1975:

	SIZE CATEGORIES FOR SOFTWARE PRODUCTS

Category          Programmers   Duration        Size (lines)

Trivial           1             1 - 4 wks       500
Small             1             1 - 6 months    1K - 2K
Medium            2 - 5         1 - 2 years     5K - 50K
Large             5 - 20        2 - 3 years     50K - 100K
Very large        100 - 1000    4 - 5 years     1M
Extremely large   2000 - 5000   5 - 10 years    1M - 10M
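One way to read this table is to divide size by staff times duration, giving the implied lines per programmer-year. The Python sketch below does this for the four categories whose durations are given in years. It treats head count as constant over the whole duration (a crude assumption, since real projects ramp up and down) and reads "1M" as exactly one million lines.

```python
# (size range in lines, programmer range, duration range in years),
# transcribed from the Fairley/Yourdon table; "1M" is read as (1M, 1M).
categories = {
    "Medium":          ((5_000, 50_000),         (2, 5),       (1, 2)),
    "Large":           ((50_000, 100_000),       (5, 20),      (2, 3)),
    "Very large":      ((1_000_000, 1_000_000),  (100, 1000),  (4, 5)),
    "Extremely large": ((1_000_000, 10_000_000), (2000, 5000), (5, 10)),
}

def implied_rate(size, staff, years):
    """Range of lines per programmer-year implied by the table's ranges:
    the smallest size over the most person-years gives the low end,
    the largest size over the fewest person-years gives the high end."""
    low = size[0] / (staff[1] * years[1])
    high = size[1] / (staff[0] * years[0])
    return low, high

for name, (size, staff, years) in categories.items():
    low, high = implied_rate(size, staff, years)
    print(f"{name:16s} {low:8.0f} to {high:8.0f} lines per programmer-year")
```

The implied rates span 500 to 25,000 for medium, about 830 to 10,000 for large, 200 to 2,500 for very large, and 20 to 1,000 for extremely large projects.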

- Jonathan Jacky, University of Washington

	

duncan@dduck.ctt.bellcore.com (Scott Duncan) (06/28/89)

In article <8587@june.cs.washington.edu> jon@june.cs.washington.edu (Jon Jacky) writes:
>I just ran across the following table in the textbook, SOFTWARE ENGINEERING
>CONCEPTS, by Richard Fairley, 1985, McGraw-Hill.  It is table 1.1 on p. 11.
>Fairley says he adapted it from Edward Yourdon, TECHNIQUES OF PROGRAM 
>STRUCTURE AND DESIGN, Prentice-Hall, 1975:
>
>	SIZE CATEGORIES FOR SOFTWARE PRODUCTS
>
>Category	N of Programmers	Duration	Size (lines)
>
>Trivial	1			1 - 4 wks	500 
>Small		1			1 - 6 months	1K - 2K
>Medium		2 - 5			1 - 2 years	5K - 50K
>Large		5 - 20			2 - 3 years	50K - 100K
>Very Large	100 - 1000		4 - 5 years	1M
>Extremely large	2000 - 5000		5 - 10 years	1M - 10M

I've read both of these books but can't remember the context for these figures.
However, I am struck by the large gap between 100K and 1M as well as by the
number of programmers listed for the two higher categories.

The former is a problem because many of the projects I have looked into fall
in that span, with staffs of 50-100 people.  I cannot imagine why such a gap
exists, since this is such a common size for "larger" programs.  The 2-3 year
duration is about right for this category, I'd say.

The latter is a problem for me since industry averages for lines per person
per year run anywhere from 500 to 7000, depending on the kind of application.
(This does NOT count "reused" code and generated code volumes, which would
drive the numbers higher for traditional business data processing
applications.)  If we use 3000 LOC/yr as an average, then the numbers of
programmers and years seem incredibly large to me.
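Turning that arithmetic around: at a fixed rate, a 1,000,000-line product built over the table's 4-5 years needs roughly size / (rate * years) programmers. This Python sketch uses the same crude constant-head-count model, with the 500, 3000, and 7000 LOC/yr figures above as the rates.

```python
import math

def programmers_needed(size_loc: int, loc_per_person_year: int, years: int) -> float:
    """Average head count needed to write size_loc lines in `years` years
    at a constant loc_per_person_year (a deliberately crude model)."""
    return size_loc / (loc_per_person_year * years)

SIZE = 1_000_000  # the "Very Large" category in the Fairley/Yourdon table
for rate in (500, 3000, 7000):   # low end, assumed average, high end
    for years in (4, 5):         # the table's duration for "Very Large"
        n = programmers_needed(SIZE, rate, years)
        print(f"{rate:5d} LOC/yr, {years} yrs: {math.ceil(n):4d} programmers")
```

At 3000 LOC/yr this comes to about 67-84 programmers; only the 500 LOC/yr end of the range (400-500 programmers) reaches into the table's 100-1000.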

Perhaps some aspect of the life-time support and maintenance is being taken
into account here?  Like I said, I do not recall the context of the numbers.
I do believe context is vital whenever such numbers are being discussed!

Speaking only for myself, of course, I am...
Scott P. Duncan (duncan@ctt.bellcore.com OR ...!bellcore!ctt!duncan)
                (Bellcore, 444 Hoes Lane  RRC 1H-210, Piscataway, NJ  08854)
                (201-699-3910 (w)   609-737-2945 (h))

schow@bnr-public.uucp (Stanley Chow) (06/29/89)

In article <17077@bellcore.bellcore.com> duncan@ctt.bellcore.com (Scott Duncan) writes:
 [Commenting that the number of programmers and man-years is very high
  for a >1-million-line project.]
>
>The latter is a problem for me since an industry average of lines per person
>per year is anywhere from 500 to 7000 based on the kind of application. (This
>does NOT count "reused" code and generated code volumes which would drive the
>numbers higher for traditional business data processing applications.)  If we
>use 3000 LOC/yr as an average, then the number of programmers and years seem
>to me to be incredibly large.
>
>Perhaps some aspect of the life-time support and maintenance is being taken
>into account here?  Like I said, I do not recall the context of the numbers.
>I do believe context is vital whenever such numbers are being discussed!
>

I would think that after several years, modules start to get rewritten
and lots of changes are made. This kind of maintenance activity contributes
to the "productivity" KLOC count but not to the "project size" count.
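A toy model of Stanley's point, with purely hypothetical parameters: each year a team adds some new lines and rewrites some existing ones. Rewritten lines count toward lines written (the "productivity" count) but not toward the size of the product. A Python sketch:

```python
def simulate(years: int, new_per_year: int, rewritten_per_year: int):
    """Toy model (hypothetical parameters): each year the team adds
    new_per_year lines and replaces up to rewritten_per_year existing
    lines. Replaced lines count as work done but do not grow the product.
    Returns (final product size, total lines written)."""
    size = 0
    written = 0
    for _ in range(years):
        # Code that doesn't exist yet can't be rewritten.
        rewrite = min(rewritten_per_year, size)
        written += new_per_year + rewrite
        size += new_per_year
    return size, written

size, written = simulate(years=5, new_per_year=200_000, rewritten_per_year=100_000)
print(size, written)
```

With these made-up numbers, five years yield a 1,000,000-line product while 1,400,000 lines are counted as written, so staffing estimated from lines written looks large relative to the product.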



Stanley Chow        BitNet:  schow@BNR.CA
BNR		    UUCP:    ..!psuvax1!BNR.CA.bitnet!schow
(613) 763-2831		     ..!utgpu!bnr-vpa!bnr-fos!schow%bnr-public
Me? Represent other people? Don't make them laugh so hard.

duncan@dduck.ctt.bellcore.com (Scott Duncan) (06/30/89)

In article <684@bnr-fos.UUCP> schow%BNR.CA.bitnet@relay.cs.net (Stanley Chow) writes:
>In article <17077@bellcore.bellcore.com> duncan@ctt.bellcore.com (Scott Duncan) writes:
> [Commenting that the number of programmers and man-years is very high
>  for a >1-million-line project.]
>>
>>The latter is a problem for me since an industry average of lines per person
>>per year is anywhere from 500 to 7000 based on the kind of application. (This
>>does NOT count "reused" code and generated code volumes which would drive the
>>numbers higher for traditional business data processing applications.)  If we
>>use 3000 LOC/yr as an average, then the number of programmers and years seem
>>to me to be incredibly large.
>>
>>Perhaps some aspect of the life-time support and maintenance is being taken
>>into account here?  Like I said, I do not recall the context of the numbers.
>>I do believe context is vital whenever such numbers are being discussed!
>>
>
>I would think that after several years, modules start to get rewritten
>and lots of changes are made. This kind of maintenance activity contributes
>to the "productivity" KLOC count but not to the "project size" count.

Well, projects do not tend to get smaller over time in terms of total KLOC,
so project size would still increase.  And most maintenance yields lower KLOC
counts in productivity numbers, since there is significant comprehension time,
debugging time, and regression testing added to the normal design/code/test
effort.  Thus, productivity usually drops over time while size increases.

I do not believe the typically reported metrics tell us much about the context,
which is why I made my last statement.  For example, while most organizations
seem to count more than just "coders" in their staff numbers when they compute
productivity, not all seem to include management.  Some include administrative
support people and central data center staff because they are "in the budget,"
while others do not.  For defect counts, some start counting when the software
first goes into use outside the development organization; others start after a
significant beta test period.
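Scott's point about denominators is easy to see with made-up numbers. The Python sketch below divides the same output by three different definitions of "staff"; the project size and every head count in it are hypothetical.

```python
# Hypothetical project: 300,000 lines delivered in one year.
# The head counts below are illustrative only, not data from this thread.
LINES = 300_000
staff = {
    "programmers": 60,
    "first-line managers": 8,
    "administrative support": 6,
    "data center staff": 10,
}

def loc_per_person_year(lines: int, years: int, groups) -> float:
    """Productivity counting only the named staff groups in the denominator."""
    heads = sum(staff[g] for g in groups)
    return lines / (heads * years)

narrow = loc_per_person_year(LINES, 1, ["programmers"])
with_mgmt = loc_per_person_year(LINES, 1, ["programmers", "first-line managers"])
everyone = loc_per_person_year(LINES, 1, list(staff))
print(round(narrow), round(with_mgmt), round(everyone))
```

The same 300,000 lines come out at 5000, about 4412, or about 3571 lines per person-year depending on who is counted, a drop of roughly 29% from the narrowest to the broadest count with no change in the work itself.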


