[comp.software-eng] COCOMO

samir@cvrc.med.upenn.edu (No good deed goes unpunished) (10/23/90)

I am currently reading the sections on COCOMO (the COnstructive COst MOdel)
in Barry Boehm's "Software Engineering Economics". Can you suggest any
other text that discusses this model? What other software cost models are
popular? I am particularly interested in a discussion of the relative pros and
cons of various cost models.

-- 
(`  _      .          |\  /| . (`     _    
_) (_\ |V| | |]       | \/ | | _) |] (_\   samir@cvrc.med.upenn.edu
_______| |___|\_      |    |______|\____

mcevoy@dev1g.mdcbbs.com (10/26/90)

In article <145@cvrc.med.upenn.edu>, samir@cvrc.med.upenn.edu (No good deed goes unpunished) writes:
> I am currently reading the sections on COCOMO (the COnstructive COst MOdel)
> in Barry Boehm's "Software Engineering Economics". Can you suggest any
> other text that discusses this model? What other software cost models are
> popular? I am particularly interested in a discussion of the relative pros and
> cons of various cost models.

There's a brief discussion of various models in "Software Metrics" by
Robert Grady and Deborah Caswell, pub. Prentice Hall, 1987.  The book
describes the authors' experiences in setting up a metrics program for
Hewlett Packard, and has some excellent practical advice, not to mention an
excellent bibliography.


  =======================================================================
 |    Brian F. McEvoy       |    Voice: (714) 952-6778                   |
 |    K34-C688-3E           |     UUCP: uunet!mdcbbs!mcevoy              |
 |    Software Tools Dev.   |      PSI: PSI%31060099980019::MCEVOY       |
 |    McDonnell Douglas M&E | Internet: mcevoy@mdcbbs.com                |
 |    Cypress, CA  90630    |           mcevoy%mdcbbs.com@uunet.uu.net   |
  =======================================================================
    

jmi@dac.mdcbbs.com (JM Ivler) (10/27/90)

In article <145@cvrc.med.upenn.edu>, samir@cvrc.med.upenn.edu (No good deed goes unpunished) writes:
> I am currently reading the sections on COCOMO (the COnstructive COst MOdel)
> in Barry Boehm's "Software Engineering Economics". Can you suggest any
> other text that discusses this model? What other software cost models are
> popular? I am particularly interested in a discussion of the relative pros and
> cons of various cost models.

If you are interested (and running on a VAX/VMS system) I can forward a copy of 
a COCOMO modeling program that I coauthored about 7 years ago. Unfortunately the
only thing available is the .EXE, as the source was left on a machine I worked
on a few years back... so if you have MFTU... let me know. BTW the program uses
Barry's model.

jmi     jmi@dac.mdcbbs.com
My opinions not DAC or MDC

ahl@technix.oz.au (Tony Landells) (06/15/91)

I'm looking for opinions of COCOMO.  I was on a course about software
quality assurance and they mentioned it as a methodology which
produces a lot of useful figures, but the comments were accompanied by
the disclaimer "I haven't used it, but people that do seem to think
it's pretty good".

I believe it's completely described in "Software Engineering Economics"
by B.W. Boehm, Prentice-Hall, 1981; but the book isn't readily available
here in Australia (lead time for order is 12-14 weeks) and it is
extremely expensive, so...

I'm looking at this almost purely for a UNIX/C environment, since
that's the field I work in, in case this affects people's comments.

Thanks kindly,
Tony Landells <ahl@technix.oz.au>

crm@duke.cs.duke.edu (Charlie Martin) (06/16/91)

In article <AHL.91Jun15134716@saussure.technix.oz.au> ahl@technix.oz.au (Tony Landells) writes:
>I'm looking for opinions of COCOMO.  I was on a course about software
>quality assurance and they mentioned it as a methodology which
>produces a lot of useful figures, but the comments were accompanied by
>the disclaimer "I haven't used it, but people that do seem to think
>it's pretty good".
>
>I believe it's completely described in "Software Engineering Economics"
>by B.W. Boehm, Prentice-Hall, 1981; but the book isn't readily available
>here in Australia (lead time for order is 12-14 weeks) and it is
>extremely expensive, so...
>
>I'm looking at this almost purely for a UNIX/C environment, since
>that's the field I work in, in case this affects people's comments.
>
>Thanks kindly,
>Tony Landells <ahl@technix.oz.au>

At this point I've done some pretty extensive work with COCOMO as a
model, and a fair bit of practical work using it to make estimates
to help my wife, who manages a big project (1+ million SLOC).  My
experience is that it isn't a bad model at all, with a little
calibration, and that simply choosing the right parameters for the basic
COCOMO model is pretty predictive, probably within the accuracy of
estimation at the beginning of a project.  In other words, your error in
estimating SLOC is probably the driver in estimation of total staff
months.

Here's the basic scheme:

(1) Empirically, in any organization, man-months per 1000 lines of code
(K SLOC) is roughly constant, no matter what language or environment is
used.  So, we can always assume that effort in man-months is
proportional to size in KSLOC.

(2) All of the COCOMO effort models have the basic form

	MM = P * K^d

	where MM is man-months
		  P is productivity in man-months per KSLOC (ignore that
		  exponent)
	and	  d is an exponential-fit value found empirically, that 
		  represents the way that bigger projects are harder.  Call
		  this thing the 'diseconomy exponent'.  If you can keep this to
		  exactly 1, you can be rich.  If you can get it below 1, you
		  can be really rich.

For most monolithic non-systems programs, I have found that you get
good fits with Boehm's Basic COCOMO values, P=2.4, d=1.05.  Be careful
that you remember K is size in *thousands*, so using a test value like
K=1 tells you that MM = 2.4 man-months for 1000 lines of code (fully
documented and tested).
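
To make that arithmetic concrete, here's a minimal sketch in C.  The
code is mine, not Boehm's; the constants are just the Basic
organic-mode values above, and should be recalibrated to your own
shop's data before you trust the output:

    #include <math.h>
    #include <stdio.h>

    /* Basic COCOMO: MM = P * K^d, with K in thousands of source
       lines (KSLOC).  P and d default to the Basic "organic"
       values quoted above. */
    static double cocomo_mm(double ksloc, double p, double d)
    {
        return p * pow(ksloc, d);
    }

    int main(void)
    {
        /* Sanity check: K = 1 should print about 2.4 man-months. */
        printf("1 KSLOC:   %7.1f MM\n", cocomo_mm(1.0, 2.4, 1.05));
        printf("100 KSLOC: %7.1f MM\n", cocomo_mm(100.0, 2.4, 1.05));
        return 0;
    }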

(3) If you're doing something intrinsically hard, like a real-time
program, you need to use different values.  For real-time or embedded
programs, reasonable values are in the neighborhoods of P=3 and d=1.3;
if you are doing anything much hard-core you should get Boehm's book for
all the weighting factors etc.  Applying the weighting factors is what
makes the model Intermediate or Advanced COCOMO.
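
To show how the weighting enters, here's a hedged sketch of the
Intermediate form: nominal effort gets scaled by an effort adjustment
factor (EAF), the product of the rated cost-driver multipliers.  The
shape is right, but the actual driver values you must look up in
Boehm's tables -- anything plugged in without the book is a guess:

    /* Intermediate COCOMO shape (same includes as the sketch
       above).  drivers[] holds the cost-driver multipliers from
       Boehm's tables; their product is the EAF. */
    static double intermediate_mm(double ksloc, double p, double d,
                                  const double *drivers, int ndrivers)
    {
        double eaf = 1.0;
        int i;

        for (i = 0; i < ndrivers; i++)
            eaf *= drivers[i];
        return eaf * p * pow(ksloc, d);
    }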

(4) Some of the work I've been doing involves estimating the effect of
reuse.  I've got a whole technical report in preparation on this, but
preliminary results suggest that the COCOMO model is very predictive for
compiler-writing reuse (retargeting GNU CC).
-- 
	 Charlie Martin (...!mcnc!duke!crm, crm@cs.duke.edu)
	    13 Gorham Place/Durham, NC 27705/919-383-2256

styri@cs.hw.ac.uk (Yu No Hoo) (06/17/91)

In article <AHL.91Jun15134716@saussure.technix.oz.au> ahl@technix.oz.au (Tony Landells) writes:
>I'm looking for opinions of COCOMO.

I don't have my notes here, but I think I remember that some of the
parameters you feed into COCOMO are rather old-fashioned in a UNIX/C
environment. (They made me think of COBOL.) However, they force you to
reason about both your application and your resources.

>       [...]          they mentioned it as a methodology which
>produces a lot of useful figures, but the comments were accompanied by
>the disclaimer "I haven't used it, but people that do seem to think
>it's pretty good".

Above disclaimer goes for me, but I would restrict the first statement
to "a lot of figures". Some may not be useful, but the intelligent
manager will know which to use. I think it requires some practice to
get it right.

>I believe it's completely described in "Software Engineering Economics"
>by B.W. Boehm, Prentice-Hall, 1981; but the book isn't readily available
>here in Australia (lead time for order is 12-14 weeks) and it is
>extremely expensive, so...

Buy it! Barry Boehm is worth reading.

If your field is QA (or you are a project manager) I would also recommend
an IEEE-CS tutorial edited by Mr. Boehm: "Software Risk Management".


----------------------
Haakon Styri
Dept. of Comp. Sci.              ARPA: styri@cs.hw.ac.uk
Heriot-Watt University          X-400: C=gb;PRMD=uk.ac;O=hw;OU=cs;S=styri
Edinburgh, Scotland

omahony@swift.cs.tcd.ie (06/17/91)

In article <AHL.91Jun15134716@saussure.technix.oz.au>, ahl@technix.oz.au (Tony Landells) writes:
> I'm looking for opinions of COCOMO.  I was on a course about software
> quality assurance and they mentioned it as a methodology which
> produces a lot of useful figures, but the comments were accompnied by
> the disclaimer "I haven't used it, but people that do seem to think
> it's pretty good".
> 
The COCOMO model is based on working backwards from measurements taken on
a variety of projects (in ASM, FORTRAN, etc.) carried out many years ago, to
get some formulae that can then be used to work forwards.

In "Software Engineering Economics", Boehm did some tests of it's accuracy
when working forwards and claimed that it is "accurate to within a factor
of 2   60% of the time".

Based on experience within your organisation, it is possible to more accurately
calibrate the formulae, and more precise estimates may be obtained after that.
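
One way to do that calibration (my sketch, not from the book): take
logs, so MM = P * K^d becomes the straight line log MM = log P +
d * log K, and fit P and d to your completed projects by ordinary
least squares:

    #include <math.h>

    /* Fit P and d from past projects.  ksloc[] and mm[] are the
       actual sizes and efforts; needs n >= 2 projects with
       distinct sizes or the denominator goes to zero. */
    static void calibrate(const double *ksloc, const double *mm, int n,
                          double *p_out, double *d_out)
    {
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        int i;

        for (i = 0; i < n; i++) {
            double x = log(ksloc[i]);
            double y = log(mm[i]);
            sx += x; sy += y; sxx += x * x; sxy += x * y;
        }
        *d_out = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        *p_out = exp((sy - *d_out * sx) / n);
    }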

One of its problems is that before the model can be used, the person doing
the costing must make a (guess)timate of how many lines of source code the
finished software product will contain.  Since the output is only as good as
the input, the accuracy of the final result is again dependent on the
experience of the person conjuring up this figure.  Naturally, it is
possible to fudge this figure to yield what Boehm et al call the
"Cost to Win!"

Still, it has to be better than picking ALL the figures off the top of 
your head!

Donal O'Mahony
Computer Science Dept
Trinity College
Dublin, Ireland

jls@netcom.COM (Jim Showalter) (06/18/91)

>(1) Empirically, in any organization, man-months per 1000 lines of code
>(K SLOC) is roughly constant, no matter what language or environment is
>used.  So, we can always assume that effort in man-months is
>proportional to size in KSLOC.

The primary complaint I and others have with the COCOMO model is the
above claim. To assert that a person writing in some homebrew
dialect of FORTRAN using a line editor on an IBM mainframe with a circa 1962
debugger is as productive (or even within two orders of magnitude as
productive) as a person using the latest-greatest software development
environment and one of the modern software engineering oriented languages
(e.g. Ada, Eiffel, C++, etc.) is prima facie absurd, claims of empiricism
notwithstanding. I have empirically observed exactly the opposite:
that productivity varies wildly between different software development
organizations (and that those who are more productive have a significant
competitive edge), and I believe any estimation model that fails to take
this into account is inherently inaccurate. At a minimum I suggest that
estimation models should contain a factor for an organization's SEI rating.
-- 
*** LIMITLESS SOFTWARE, Inc: Jim Showalter, jls@netcom.com, (408) 243-0630 ****
*Proven solutions to software problems. Consulting and training on all aspects*
*of software development. Management/process/methodology. Architecture/design/*
*reuse. Quality/productivity. Risk reduction. EFFECTIVE OO usage. Ada/C++.    *

crm@duke.cs.duke.edu (Charlie Martin) (06/18/91)

In article <1991Jun18.033606.1362@netcom.COM> jls@netcom.COM (Jim Showalter) writes:
>>(1) Empirically, in any organization, man-months per 1000 lines of code
>>(K SLOC) is roughly constant, no matter what language or environment is
>>used.  So, we can always assume that effort in man-months is
>>proportional to size in KSLOC.
>
>The primary complaint I and others have with the COCOMO model is the
>above claim. To assert that a person writing in some homebrew
>dialect of FORTRAN using a line editor on an IBM mainframe with a circa 1962
>debugger is as productive (or even within two orders of magnitude as
>productive) as a person using the latest-greatest software development
>environment and one of the modern software engineering oriented languages
>(e.g. Ada, Eiffel, C++, etc.) is prima facie absurd, claims of empiricism
>notwithstanding. 

I always hate it when someone says something is "prima facie absurd"
like this.  First of all, notice the claim is not about *everyone*, but that
*within an organization* the productivity is roughly constant.  One
reason that what you suggest doesn't apply is that it is pretty unlikely
that within the same organization one programmer will be using Wylbur
and the other will be using a highly-integrated environment.

But secondly, "claims of empiricism" cannot be just unceremoniously
dumped.  The fact is that this relationship held over something like 400
projects, in dozens of different environments, with languages from
assembler to the best HLL's of the time.  Where is your counter
evidence?  How was it measured?

Third, the COCOMO model actually suggests that none of those factors may
really influence productivity in SLOC delivered on big projects.  Recall
the basic equation:

	MM = P K^d

that is basically saying that effort MM is productivity constant times
size to the diseconomy exponent.  Clearly, then,

	MM = O(K^d)

and we can see that at some point the effects of increasing size
dominate the effects of anything that affects just the constant (so long
as d > 1).  One supposition I've made is that the difference between
programming-in-the-small and programming-in-the-large is that large
scale programming is when scale dominates in this equation.

[ This is my Discovery Of the Week.  I can't decide if I think it's
significant or not.]
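
A quick worked example of that dominance, with my own numbers, using
the embedded-mode values P=3 and d=1.3 from my earlier post:

	MM(100 KSLOC)  = 3 * 100^1.3  = 3 * 10^2.6 = about  1,200 MM
	MM(1000 KSLOC) = 3 * 1000^1.3 = 3 * 10^3.9 = about 23,800 MM

A tenfold growth in size costs roughly twentyfold effort, so past some
size the exponent swamps any constant-factor advantage from tools or
languages.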

>... I have empirically observed exactly the opposite:
>that productivity varies wildly between different software development
>organizations (and that those who are more productive have a significant
>competitive edge), 

Absolutely, but that sorta says the same thing I said above.  I
question your supposition that productivity gives that much of a
competitive edge -- it looks to me like non-technical factors dominate
-- but it's probably more of an edge in the current environment than
low defect rates in the field.

>and I believe any estimation model that fails to take
>this into account is inherently inaccurate. At a minimum I suggest that
>estimation models should contain a factor for an organization's SEI rating.

You didn't read down far enough.  The relation has not one but two
factors that can be set or chosen to suit differences in the
environment.  In the Intermediate and Advanced COCOMO models there are a
number of factors that model things like language chosen, environment,
use of a methodology, etc.  Basic COCOMO does not take these into
account, and as you say is inherently inaccurate.  But it works
surprisingly well as a predictor -- Basic COCOMO is the one that is
correct "to within a factor of two 60 percent of the time" applied post
facto to big projects, and there aren't many other psychological
properties that can be predicted all that well.


-- 
	 Charlie Martin (...!mcnc!duke!crm, crm@cs.duke.edu)
	    13 Gorham Place/Durham, NC 27705/919-383-2256

jls@netcom.COM (Jim Showalter) (06/19/91)

crm@duke.cs.duke.edu (Charlie Martin) writes:

>>>(1) Empirically, in any organization, man-months per 1000 lines of code
>>>(K SLOC) is roughly constant, no matter what language or environment is
>>>used.  So, we can always assume that effort in man-months is
>>>proportional to size in KSLOC.

[I respond that this is absurd, to which Mr. Martin responds:]

>I always hate it when someone says something is "prima facie absurd"
>like this.  First of all, notice the claim is not *everyone*, but that
>*within an organization* the productivity is roughly constant.

I think I see the problem here. I'm probably not parsing your statement
the way you are writing it. When I read the original post, it seems
to read--to my parser at least--as follows: "It doesn't make any 
difference what language or environment your organization uses, because
you'll always produce the same amount of code in the same amount of time,
regardless". This was the statement I was claiming was absurd, because,
of course, it IS (is there anybody out there who thinks this statement
is NOT absurd?). In your followup response, you rephrase things so that
I get the idea you are actually saying something else entirely, in which
case we probably aren't arguing, just failing to communicate.

>But secondly, "claims of empiricism" cannot be just unceremoniously
>dumped.  The fact is that this relationship held over something like 400
>projects, in dozens of different environments, with languages from
>assembler to the best HLL's of the time.  Where is your counter
>evidence?  How was it measured?

I've witnessed organizations producing code at anywhere from 5 lines per
week per programmer to about five hundred lines per week per programmer, which
is a two order of magnitude spread. But since I'm not sure what your
original claim was, I'm not sure if this refutes it or is completely
unrelated to it.

>and we can see that at some point the effects of increasing size
>dominate the effects of anything that affects just the constant (so long
>as d > 1.)  One supposition I've made is that the difference between
>programming-in-the-small and programming-in-the-large is that large
>scale programming is when scale dominates in this equation.

>[ This is my Discovery Of the Week.  I can't decide if I think it's
>significant or not.]

I agree that this is a good Discovery of the Week, and it is the fact
that we so strongly agree here that leads me to believe that we probably
agree on the earlier stuff too, if we can just get our communications
ungarbled.

>You didn't read down far enough.  The relation has not one but two
>factors that can be set or chosen to suit differences in the
>environment.  In the Intermediate and Advanced COCOMO models there are a
>number of factors that model things like language chosen, environment,
>use of a methodology, etc.  Basic COCOMO does not take these into
>account, and as you say is inherently inaccurate.

Ah, see, then we DO agree. So much for this thread...
-- 
*** LIMITLESS SOFTWARE, Inc: Jim Showalter, jls@netcom.com, (408) 243-0630 ****
*Proven solutions to software problems. Consulting and training on all aspects*
*of software development. Management/process/methodology. Architecture/design/*
*reuse. Quality/productivity. Risk reduction. EFFECTIVE OO usage. Ada/C++.    *

rh@smds.UUCP (Richard Harter) (06/19/91)

In article <1991Jun18.033606.1362@netcom.COM>, jls@netcom.COM (Jim Showalter) writes:
> >(1) Empirically, in any organization, man-months per 1000 lines of code
> >(K SLOC) is roughly constant, no matter what language or environment is
> >used.  So, we can always assume that effort in man-months is
> >proportional to size in KSLOC.

> The primary complaint I and others have with the COCOMO model is the
> above claim. To assert that a person writing in some homebrew
> dialect of FORTRAN using a line editor on an IBM mainframe with a circa 1962
> debugger is as productive (or even within two orders of magnitude as
> productive) as a person using the latest-greatest software development
> environment and one of the modern software engineering oriented languages
> (e.g. Ada, Eiffel, C++, etc.) is prima facie absurd, claims of empiricism
> notwithstanding.

Sorry Jim, I just don't believe it.  I was programming in 1962 -- using
punched cards (how many people were actually using line editors and 
terminals in those days?)  Modern tools make a difference, but nothing
like what you claim -- I will grant you a factor of three, no more.  In
ye olde days you worked the code all out before you keypunched it, listed
it after keypunching, desk checked it, and made the revisions before
ever compiling.  The old way of working from a listing had its advantages;
the window of visibility for a listing was greater than that of a terminal
screen.  My personal observation is that I could crank out about 20,000
lines of production code per year then, and I do about the same now.
The language (almost) doesn't matter.  Nor is this surprising.

Writers (you know, people who write books) all use fancy word processors
these days, but they don't, on average, produce that much more than they
did with manual typewriters.  There is a good reason for this.  The
amount of thought, of intellectual effort, and the time that it takes
for that intellectual effort is the controlling factor.  The tools make
the physical effort easier and more pleasant.  They make certain mechanical
tasks (spell checking for the writer) faster and more reliable.

On the other hand I will grant you that modern software and hardware
technology does mean that you can do more in the same number of lines
of code, and that it is much easier to build large programs.

One further side note:  Debuggers, no matter how good, are the symptom
of a disorderly approach to software development.
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.

mcgregor@hemlock.Atherton.COM (Scott McGregor) (06/20/91)

In article <677256043@macbeth.cs.duke.edu>, crm@duke.cs.duke.edu
(Charlie Martin) writes:


>   One supposition I've made is that the difference between
> programming-in-the-small and programming-in-the-large is that large
> scale programming is when scale dominates in this equation.

> [ This is my Discovery Of the Week.  I can't decide if I think it's
> significant or not.]

A causal basis for this change is the point at which the individual problems
of cognitive overload of short term memory, faulty long term memory and
limited reasoning ability become dominated by the communication problems
of insufficient, incomplete and confusing communications between
individuals in a complex organization.  Both are results of increasing
complexity, but manifested in different ways, and thus having different
effects on the organizational productivity equation.

>   Basic COCOMO does not take these into
> account, and as you say is inherently inaccurate.  But it works
> surprisingly well as a predictor -- Basic COCOMO is the one that is
> correct "to within a factor of two 60 percent of the time" applied post
> facto to big projects, and there aren't many other psychological
> properties that can be predicted all that well.

One of the things I like about the COCOMO model is that it is presented
with appropriate variance measures and sensitivity analysis.  One of the
things that is frustrating about it is that many managers focus only on
the predicted MEAN estimate and not on the size of the variance.  They
treat the mean as if it were more precise than it is, and then damn the
metric, or the manager, when actual development time varies significantly
from the estimate, but still well within its accuracy.  A factor of two
is often considered a large amount of error (the statistical inability
to give better predictions is too often ignored!)  Imagine the quarterback
who tells his end: "go out for a pass, I'll throw it out about ten yards
give or take a factor of two!"  I had one general manager who was asking
for differences between actual and estimate to vary no more than 8% (i.e.
one month spread for every year of effort).  The tool given to project
managers for estimation was Basic COCOMO.  The focus only on the mean
estimate and ignoring the variance ensured frequent "failures" of this
estimation technique that probably should have been regarded as successes.
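
To put numbers on that (mine, for illustration): an estimate of 100
man-months that is good "to within a factor of two" means the actual
effort can plausibly land anywhere from 50 to 200 man-months.  Demanding
actuals within 8% of that estimate -- 92 to 108 man-months -- asks the
model for an order of magnitude more precision than it claims to offer.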


Scott McGregor
Atherton Technology
mcgregor@atherton.com

jls@netcom.COM (Jim Showalter) (06/20/91)

>On the other hand I will grant you that modern software and hardware
>technology does mean that you can do more in the same number of lines
>of code, and that it is much easier to build large programs.

Well, this wasn't my original objection, but as long as you've brought
it up, this is the OTHER thing I find absurd about the COCOMO approach:
the focus on SLOC. You cranked out 20 KSLOC of code in 1962. What language
was it? Assembler? FORTRAN? In 1990 I cranked out 20 KSLOC of Ada,
complete with tasking, genericity, exception handlers, constraint checking,
etc. To achieve this same level of functionality in a 1962 language, I
might well have had to write 10-100 times as many lines of code. How
much assembler would you have to write to provide dynamic binding and
inheritance a la C++? My guess is: considerably more than you could write
in a year (ask Stroustrup!).

As with all such metrics, the baselines should be calculated in terms of
COMPLEXITY. How many complexities could you write per year in 1962 vs
1991? THAT'S the critical issue, and I believe there is a wealth of
evidence that modern languages and environments have greatly increased
productivity when it is measured in this way. SLOC obscures the truth.
-- 
*** LIMITLESS SOFTWARE, Inc: Jim Showalter, jls@netcom.com, (408) 243-0630 ****
*Proven solutions to software problems. Consulting and training on all aspects*
*of software development. Management/process/methodology. Architecture/design/*
*reuse. Quality/productivity. Risk reduction. EFFECTIVE OO usage. Ada/C++.    *

crm@duke.cs.duke.edu (Charlie Martin) (06/20/91)

Okay, I *think* Jim and I are *largely* in agreement, but I think there
are a couple of essential points to be clarified.  Here goes:

In article <1991Jun18.214011.17765@netcom.COM> jls@netcom.COM (Jim Showalter) writes:
>crm@duke.cs.duke.edu (Charlie Martin) writes:
>
>>>>(1) Empirically, in any organization, man-months per 1000 lines of code
>>>>(K SLOC) is roughly constant, no matter what language or environment is
>>>>used.  So, we can always assume that effort in man-months is
>>>>proportional to size in KSLOC.
>
>[I respond that this is absurd, to which Mr. Martin responds:]
>
>>I always hate it when someone says something is "prima facie absurd"
>>like this.  First of all, notice the claim is not *everyone*, but that
>>*within an organization* the productivity is roughly constant.
>
>I think I see the problem here. I'm probably not parsing your statement
>the way you are writing it. When I read the original post, it seems
>to read--to my parser at least--as follows: "It doesn't make any 
>difference what language or environment your organization uses, because
>you'll always produce the same amount of code in the same amount of time,
>regardless". This was the statement I was claiming was absurd, because,
>of course, it IS (is there anybody out there who thinks this statement
>is NOT absurd?). 

Okay, here goes:

(1) Given the exact same environment -- language, operating system, etc.
-- the average productivity of a group of programmers is likely to
remain nearly constant.

(2) Across language environments, productivity in source lines of code
tends to remain constant.  I know this seems counter-intuitive, but
it's well supported empirically: people average around 10 source lines
of code per total man-day worked on the project.  Your mileage may
vary, but the basic relationship does not.  Also, the constant there
does vary between organizations and within organizations -- factors of
2 or more average aren't uncommon -- but given the wide variation
between programmers it isn't clear whether this is the effect of
differences in management and environment, or just the effect of
having randomly assigned several high-speed programmers to one
project.  Unfortunately, it's also the sort of thing that's hard to
control for and that makes experimentation expensive.

This also looks like I am indeed insisting on the statement you say is
absurd.  I don't have any trouble with that -- the empirical evidence is
on my side.

(3) The last sentence says that effort in man-months is proportional to
size in KSLOC.  This statement carefully avoids saying anything about the
constant of proportionality -- but it does say that no matter how fast
you code, the time you take is proportional to the size of the final
code.

>In your followup response, you rephrase things so that
>I get the idea you are actually saying something else entirely, in which
>case we probably aren't arguing, just failing to communicate.
>
>>But secondly, "claims of empiricism" cannot be just unceremoniously
>>dumped.  The fact is that this relationship held over something like 400
>>projects, in dozens of different environments, with languages from
>>assembler to the best HLL's of the time.  Where is your counter
>>evidence?  How was it measured?
>
>I've witnessed organizations producing code at anywhere from 5 lines per
>week per programmer to about five hundred lines per week per programmer, which
>is a two order of magnitude spread. But since I'm not sure what your
>original claim was, I'm not sure if this refutes it or is completely
>unrelated to it.

We need to control for a bunch of things before I'd know either, for
example the effects of size and problem domain.  That's why we need more
than anecdotal statements here.  

If you are seeing people coding new products in an embedded program
that is in the area of 1 million SLOC and getting a productivity of
around 12.5 SLOC per man-hour (=500 SLOC/man-week), I'd like to know
what the environment is like.  On the other hand, if they are writing
separated COBOL programs that communicate only through accessing a
well-defined collection of files, and the programs average around
10,000 SLOC, but the productivity is around 0.125 SLOC per man-hour (=5
SLOC/man-week), then I think something is wrong.

>
>>and we can see that at some point the effects of increasing size
>>dominate the effects of anything that affects just the constant (so long
>>as d > 1.)  One supposition I've made is that the difference between
>>programming-in-the-small and programming-in-the-large is that large
>>scale programming is when scale dominates in this equation.
>
>>[ This is my Discovery Of the Week.  I can't decide if I think it's
>>significant or not.]
>
>I agree that this is a good Discovery of the Week, 
Thanks!
>and it is the fact
>that we so strongly agree here that leads me to believe that we probably
>agree on the earlier stuff too, if we can just get our communications
>ungarbled.

I guess that's something for you to decide.

>
>>You didn't read down far enough.  The relation has not one but two
>>factors that can be set or chosen to suit differences in the
>>environment.  In the Intermediate and Advanced COCOMO models there are a
>>number of factors that model things like language chosen, environment,
>>use of a methodology, etc.  Basic COCOMO does not take these into
>>account, and as you say is inherently inaccurate.
>
>Ah, see, then we DO agree. So much for this thread...

Don't despair, there's still room for argument.

The part of what I said that you deleted is what I think is the most
interesting part: not that basic COCOMO without all the weighting
factors is inaccurate, but that it is surprisingly *accurate* given its
limitations.  That makes me think the basic relationship is *very*
strong, since it is good within a factor of two 60% of the time with
very sloppy constants.  Given a well-instrumented environment and some
time working one's own regressions for the various constants, I think it
can be made *quite* accurate.  Even without this, it's not a bad
approximation for back-of-the-envelope calculations.

And the DOTW above suggests that no matter the choice of language and
environment, the effects of scale will eventually dominate as the
program gets bigger.


-- 
	 Charlie Martin (...!mcnc!duke!crm, crm@cs.duke.edu)
	    13 Gorham Place/Durham, NC 27705/919-383-2256

crm@duke.cs.duke.edu (Charlie Martin) (06/20/91)

In article <1991Jun19.174716.6861@netcom.COM> jls@netcom.COM (Jim Showalter) writes:
>>On the other hand I will grant you that modern software and hardware
>>technology does mean that you can do more in the same number of lines
>>of code, and that it is much easier to build large programs.
>
>Well, this wasn't my original objection, but as long as you've brought
>it up, this is the OTHER thing I find absurd about the COCOMO approach:
>the focus on SLOC. You cranked out 20 KSLOC of code in 1962. What language
>was it? Assembler? FORTRAN? In 1990 I cranked out 20 KSLOC of Ada,
>complete with tasking, genericity, exception handlers, constraint checking,
>etc. To achieve this same level of functionality in a 1962 language, I
>might well have had to write 10-100 times as many lines of code. How
>much assembler would you have to write to provide dynamic binding and
>inheritance a la C++? My guess is: considerably more than you could write
>in a year (ask Stroustrup!).

It's a good point in a way -- the productivity in function realized with
a high-level language is higher.  So what you get for your 20 KSLOC is
more stuff in some sense that is awfully hard to measure precisely.

On the other hand, we still tend to get 20 KSLOC of it per man-year.

I think the answer is that we can't treat SLOC as the
*only* measurement of interest, but in terms of effort and costing
models, the fact that SLOC/man-hour remains relatively constant across
environments makes COCOMO and related models very attractive.

>
>As with all such metrics, the baselines should be calculated in terms of
>COMPLEXITY. How many complexities could you write per year in 1962 vs
>1991? THAT'S the critical issue, and I believe there is a wealth of
>evidence that modern languages and environments have greatly increased
>productivity when it is measured in this way. SLOC obscures the truth.

Sure, and I hope you'll forgive me if I point out it's also sort of old
news.  The paradoxes of SLOC counts between HLL's and assembler were
being pointed out at least in the late sixties.  Another point is one
that Dijkstra harps on a bit -- SLOC measures a kind of bulk
productivity that might give a disincentive to good and elegant
programming.  You must be careful not to *reward* people on the basis of
SLOC per hour at a desk, because you'll get crap -- big crap.

But all the other metrics that are predictive -- Halstead volume, McCabe
cyclomatic complexity, function points -- also turn out to predict
effort about as well as Intermediate COCOMO.  (This is less of a
surprise than it sounds: since Intermediate COCOMO is predictive and the
other ones are also predictive, they *must* correlate.)  Again, the
correlation is closer than *any* method's accuracy; there is no
empirical reason to prefer the others.
-- 
	 Charlie Martin (...!mcnc!duke!crm, crm@cs.duke.edu)
	    13 Gorham Place/Durham, NC 27705/919-383-2256

jls@netcom.COM (Jim Showalter) (06/20/91)

>Another point is one
>that Dijkstra harps on a bit -- SLOC measures a kind of bulk
>productivity that might give a disincentive to good and elegant
>programming.  You must be careful not to *reward* people on the basis of
>SLOC per hour at a desk, because you'll get crap -- big crap.

Agreed! I have often pointed out when discussing reuse that the issue
is not how much code a person wrote this week--it's how much code the
person AVOIDED writing this week while still providing the required
functionality. If a project estimates N SLOC is required to satisfy
the requirements and some reuse guru manages to satisfy the requirements
by only writing N/100 SLOC of trivial glue code, something significant
has been achieved, and a hell of a lot of money has been saved. Perhaps
programmers should get a percentage of those savings as a way to encourage
thinking before typing...
-- 
*** LIMITLESS SOFTWARE, Inc: Jim Showalter, jls@netcom.com, (408) 243-0630 ****
*Proven solutions to software problems. Consulting and training on all aspects*
*of software development. Management/process/methodology. Architecture/design/*
*reuse. Quality/productivity. Risk reduction. EFFECTIVE OO usage. Ada/C++.    *

rh@smds.UUCP (Richard Harter) (06/20/91)

In article <1991Jun19.174716.6861@netcom.COM>, jls@netcom.COM (Jim Showalter) writes:

> Well, this wasn't my original objection, but as long as you've brought
> it up, this is the OTHER thing I find absurd about the COCOMO approach:
> the focus on SLOC. You cranked out 20 KSLOC of code in 1962. What language
> was it? Assembler? FORTRAN? In 1990 I cranked out 20 KSLOC of Ada,
> complete with tasking, genericity, exception handlers, constraint checking,
> etc. To achieve this same level of functionality in a 1962 language, I
> might well have had to write 10-100 times as many lines of code. How
> much assembler would you have to write to provide dynamic binding and
> inheritance a la C++? My guess is: considerably more than you could write
> in a year (ask Stroustrup!).

It all depends on what you are doing.  Scientific analysis, physical
modelling, numerical analysis, and so on require much the same amount
of code whether you are doing it in FORTRAN-II or Ada -- the bulk of the
code is the expression of numerical calculations.  I am still not going
to buy your 10-100 times as much code.  I recollect that circa 1972 we
did the real time emulation for the ABM PAR radar in 75,000 lines of
FORTRAN-IV (on top of an 'off the shelf' task and interrupt manager
written in assembler) -- emulation meant that the data was coming from
a different type of radar, so we had to recast the data to make it appear
that it was coming from a phased array radar.  That little gem did an
awful lot of work -- quite frankly I rather doubt that a modern team of
programmers using Ada would do it in as few lines of code.  [The total
number of lines of production code was higher than 75,000 because of the
extensive number of auxiliary data reduction, editing, and analysis
programs.]  If anything I think that in the old days we coded tighter
and delivered more functionality per line of code than is the practice
today -- we had to because of the limitations of the machines available.

When I say 'we' I really mean those who worked hard at producing 
efficient well organized code -- then, as now, there was a lot of
bad code being written.  What we didn't have were a lot of packages
and utilities and functionality that weren't available because they
weren't written yet.  More importantly the problems we are solving
today aren't the problems we were solving then. 

> As with all such metrics, the baselines should be calculated in terms of
> COMPLEXITY. How many complexities could you write per year in 1962 vs
> 1991? THAT'S the critical issue, and I believe there is a wealth of
> evidence that modern languages and environments have greatly increased
> productivity when it is measured in this way. SLOC obscures the truth.

I try to write as few complexities as I can. :-).  However I have to
admit that I have been told that my notion of simple is somewhat
daunting.  :-)  Seriously, I don't think that complexity is the right
thing to talk about.  What is true is that the objectives and scope
of what one does with computers today is much broader and much different
than they were 20 years ago.  A lot of this has to do with the increased
capability of the machines themselves.  A lot has to do with the fact
that yesterdays problems were solved yesterday -- I don't need to write
a data analysis and display package -- vendor X will sell me one right
out of the box, complete with near-instantaneous color graphics for the
cost of a month of programming time.

But if you are simply saying that language X enhances my functionality
enormously because it has all sorts of nifty software engineering principles
embedded in the language I think you are overstating the case  -- I
already know those nifty principles and can use them in whatever crufty
3GL I happen to be stuck with at the moment.  [If you mean to argue
that Joe Blow with his still damp CS diploma will be better served by
using language X I shan't argue with you.]  This is not to say that I
won't use language X -- I will if it lets me write cleaner, simpler
code. 
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.

hawksk@lonex.radc.af.mil (Kenneth B. Hawks) (06/20/91)

IMHO there is no universal solvent.  Having used COCOMO to cost software
developments from both the government and industry sides, I can say it has
its uses.  Note there are numerous implementations of the basic COCOMO
formula.  I have used 4 or 5 of them.  COCOMO is _very_ people-skill
oriented.  The experience variables selected cause the largest swings in
the final result.

In contrast, RCA Price S, is very end use complexity sensitive.  

Both models have uses.  As a previous poster pointed out, calibration of
any costing model is a *must*.  Performing a sensitivity analysis of _your_
project is also a must.  My personal approach was to calibrate the models,
spend a couple of days honestly evaluating the team make-up I could expect,
the various complexities of the functions to be developed and then input
to the models.  If the answers came out within 5 - 10%, I bid it and was
comfortable that I had bid enough manpower.  I have done this on complex
assembly language and Ada jobs (the first was utilizing the very first
validated Ada compiler!)  Every job was on schedule and under budget.  One
job was 3 man-days over (out of 1809 costed), but the $ were under because
I was able to move the high salary guys off the job early :-)

My  $0.02....

Kenneth B. Hawks                                   |\   /|   "Fox Forever"
Rome Laboratory, Griffiss AFB, NY                   ^o.o^         BSA
hawksk@lonex.radc.af.mil                            =(v)=
Disclaimer:  There is no one else here who thinks like I do; therefore....

oman@med.cs.uidaho.edu (06/21/91)

>As with all such metrics, the baselines should be calculated in terms of
>COMPLEXITY. How many complexities could you write per year in 1962 vs
>1991? THAT'S the critical issue, and I believe there is a wealth of
>evidence that modern languages and environments have greatly increased
>productivity when it is measured in this way. SLOC obscures the truth.

I think this is also an inappropriate way to think about productivity.
Instead of thinking in these terms, we should be looking at success in
terms of completeness and user satisfaction.  Hence, productivity is the
highest number of requirements (user needs) implemented with the fewest
lines of code and least complexity.  Therefore, if I can meet all of my
customer's needs with the simplest and smallest program, then I am the
most productive.  Thus, a ratio of needs/size or needs/complexity (or,
for that matter, size/needs and complexity/needs) comes to mind with the
goal of attaining an optimal ratio of 1.
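
To illustrate the ratio idea with made-up numbers: if two programs each
satisfy the same 40 requirements, one in 4,000 SLOC and one in 8,000
SLOC, the needs/size ratios are 0.010 and 0.005 requirements per line,
and the smaller program is the more productive by this measure.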

-----------------------------------------------------------------------------
--   Paul W. Oman, Ph.D., C.S. Dept., Univ. of Idaho, Moscow, ID, 83843    --
--     em: oman@cs.uidaho.edu    ph: 208-885-6589    fx: 208-885-6645      --
-----------------------------------------------------------------------------

alan@tivoli.UUCP (Alan R. Weiss) (06/23/91)

In article <AHL.91Jun15134716@saussure.technix.oz.au> ahl@technix.oz.au (Tony Landells) writes:
>I'm looking for opinions of COCOMO.  I was on a course about software
>quality assurance and they mentioned it as a methodology which
>produces a lot of useful figures, but the comments were accompanied by
>the disclaimer "I haven't used it, but people that do seem to think
>it's pretty good".

Sigh.  Another worthless anecdote is coming ... beware!
In my experience using a product called SLIM (Software Life Cycle
Management) that was based upon a COCOMO model, COCOMO a priori
estimates are not very good for Unix (tm)/C development because
they are based around a COBOL and mainframe environment.  COCOMO
tools that use the model are useful AFTER you have developed your
OWN measurements and have seen the results of your own experience.
The ol' chicken and the egg story.  The really useful part is getting
you to think in terms of schedules, probability, and resource
weighting. 

Their address from the users manual:
SLIM -- Quantitative Software Management, Inc. 
1057 Waverley Way
McLean, Virginia, USA  22101
(703) 790-0055


>I believe it's completely described in "Software Engineering Economics"
>by B.W. Boehm, Prentice-Hall, 1981; but the book isn't readily available
>here in Australia (lead time for order is 12-14 weeks) and it is
>extremely expensive, so...

Ah, cough up for it :-)  Just kidding.  Dr. Barry Boehm is well-known
in the States as a S/W engineering guru who is very good in
IV&V/military/100% quality environments.  Nice lecturer, too.
He really knows his stuff.   Barry's books are great.

>I'm looking at this almost purely for a UNIX/C environment, since
>that's the field I work in, in case this affects people's comments.
>
>Thanks kindly,
>Tony Landells <ahl@technix.oz.au>

Hope this helps, mate.
I'm afraid you've just joined the Software Project Management
profession, and have hit the first problem:  how to estimate!

 

_______________________________________________________________________
Alan R. Weiss                           TIVOLI Systems, Inc.
E-mail: alan@tivoli.com                 6034 West Courtyard Drive,
E-mail: alan@whitney.tivoli.com	        Suite 210
Voice : (512) 794-9070                  Austin, Texas USA  78730
Fax   : (512) 794-0623
_______________________________________________________________________

seb1@druhi.ATT.COM (Sharon Badian) (06/24/91)

In article <830@tivoli.UUCP>, alan@tivoli.UUCP (Alan R. Weiss) says:
> 
> Sigh.  Another worthless anecdote is coming ... beware!
> In my experience using a product called SLIM (Software Life Cycle
> Management) that was based upon a COCOMO model, COCOMO a priori
> estimates are not very good for Unix (tm)/C development because
> they are based around a COBOL and mainframe environment. 

Not true! SLIM is not based on the COCOMO model. SLIM is a proprietary
model developed by Larry Putnam, Sr. Also, the COCOMO model is not based
on COBOL developments. It is based largely on scientific development at
TRW. They were probably mainframe developments, but I don't think this
is what you meant. SLIM is based on a large database of projects collected
by QSM. Some percentage of the projects are business applications, but
I believe the largest percentage are system and scientific developments.

Sharon Badian
seb1@druhi.att.com