[comp.software-eng] OOP in the "real world"

bglenden@colobus.cv.nrao.edu (Brian Glendenning) (06/14/91)

We have a large software system (~600k lines of code - Fortran and C)
which is starting to show its age (it was started before VAXes). We
have decided to rewrite it, and the OO paradigm seems to us to be a
good one.

Some people are skeptical about the value of OO (data encapsulation
and abstract data types are unquestioned). Unfortunately most of the
articles you see about OO describe fairly small systems - a few 10's
of thousands of lines of code.

So, can anyone point me to any articles, or even have any anecdotal
evidence, on OO in large software systems? Thank you.

Brian
--
       Brian Glendenning - National Radio Astronomy Observatory
bglenden@nrao.edu          bglenden@nrao.bitnet          (804) 296-0286

mark@hermesa.uucp (Mark McWiggins) (06/15/91)

bglenden@colobus.cv.nrao.edu (Brian Glendenning) writes:


>So, can anyone point me to any articles, or even have any anecdotal
>evidence, on OO in large software systems? Thank you.

There have been a couple of OSes written in C++ -- one (Choices) was
billed as being 78000 lines in 1987.  I also exchanged email with someone
whose company had a 250K line C++ system -- I think it was a mail-handling
thing.  Can't remember who it was.

-- 
Mark McWiggins
mark@hermesa.uucp 
...uw-beaver!amc-gw!hermesa!mark
Box 40357, Bellevue WA  98004 / +1 206 455 2786 (24 hrs.)

herkimer@tigercat.den.mmc.com (Don Herkimer) (06/22/91)

In article <BGLENDEN.91Jun14111338@colobus.cv.nrao.edu>, bglenden@colobus.cv.nrao.edu (Brian Glendenning) writes:
|> 
|> We have a large software system (~600k lines of code - Fortran and C)
|> which is starting to show its age (it was started before VAXes). We
|> have decided to rewrite it, and the OO paradigm seems to us to be a
|> good one.
|> 
|> Some people are skeptical about the value of OO (data encapsulation
|> and abstract data types are unquestioned). Unfortunately most of the
|> articles you see about OO describe fairly small systems - a few 10's
|> of thousands of lines of code.
|> 
|> So, can anyone point me to any articles, or even have any anecdotal
|> evidence, on OO in large software systems? Thank you.
|> 
|> Brian
|> --
|>        Brian Glendenning - National Radio Astronomy Observatory
|> bglenden@nrao.edu          bglenden@nrao.bitnet          (804) 296-0286


In my group at Martin Marietta we have been (re)using a large OO commercial package called the Analyst, from Xerox Special Information Systems (XSIS).  It is written in Smalltalk-80 and consists of

	~3.8M bytes of source code
	~661 classes
	~14K methods

If we estimate ~25 "lines of code" per method, this application weighs in at about 350 KSLOC.  It is not as big as the cited system, but the difference can be attributed to substantial code re-use from the OO paradigm.

The Analyst is a mature product (6 years old) and has over 2500 copies distributed.  XSIS estimates that their productivity has been about 37 lines of code per day, which is anywhere from 3.6 to 7.2 times the "normal" production of "average" programmers.  Indeed, we have found that we are also very productive using the Smalltalk-80 programming environment augmented with the Analyst classes.

-- 
Don Herkimer
303-977-9580
Martin Marietta Space Launch Systems
herkimer@tigercat.den.mmc.com

bytor@ctt.bellcore.com (Ross Huitt) (06/27/91)

Sorry this is a little late. New mailer...

In article <1991Jun21.200139.14639@den.mmc.com>,
herkimer@tigercat.den.mmc.com (Don Herkimer) writes:
|> In article <BGLENDEN.91Jun14111338@colobus.cv.nrao.edu>,
|> bglenden@colobus.cv.nrao.edu (Brian Glendenning) writes:
|> |> 
|> |> We have a large software system (~600k lines of code - Fortran and C)
|> |> which is starting to show its age...
[...stuff...]
|> |> --
|> |>        Brian Glendenning - National Radio Astronomy Observatory
|> |> bglenden@nrao.edu          bglenden@nrao.bitnet          (804) 296-0286

|> In my group at Martin Marietta we have been (re)using a large OO
|> commercial package called the Analyst, from Xerox Special Information
|> Systems (XSIS).  It is written in Smalltalk-80 and consists of
|> 
|> 	~3.8M bytes of source code
|> 	~661 classes
|> 	~14K methods
|> 
|> If we estimate ~25 "lines of code" per method, this application
|> weighs in at about 350 KSLOC. 
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Let's try some basic math:
	3.8 meg of source / 14k methods = 271 bytes per method
	271 bytes per method / 25 lines per method = 11 bytes per line
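
The same check as a short Python sketch, taking "meg" as 10^6 bytes and "14k" as 14,000 (both are assumptions about the rounding used above; the variable names are only illustrative):

```python
# Approximate figures quoted for the Analyst
source_bytes = 3_800_000          # "~3.8M bytes of source code"
methods = 14_000                  # "~14K methods"
claimed_lines_per_method = 25     # the estimate being questioned

bytes_per_method = source_bytes / methods
bytes_per_line = bytes_per_method / claimed_lines_per_method

print(f"{bytes_per_method:.0f} bytes per method")
print(f"{bytes_per_line:.0f} bytes per line")
```

Changing claimed_lines_per_method shows how quickly the implied line length grows as the per-method estimate drops.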

I don't buy it.  I've been doing some preliminary metrics gathering
on OO systems. The two largest commercial/public systems I've looked
at were the NIHCL C++ class libraries and the Smalltalk/V images. 
The numbers I've been getting are typically on the order of 3 lines 
of code per method. (The C++ LOCs were executable statements as 
defined in the BNF, the Smalltalk LOCs were Non-Comment/Non-Blank 
Source Lines.) This is also true of smaller systems I've seen. 
If we use 3 lines per method with your 14k methods this would yield
about 42K 'statements'.

If you triple this to get an estimate of the Raw Source Code Lines (a la wc -l)
then you get a system that's about 126KLOC (everything included). This is
roughly half of your estimate. Yes, it's a big system, but I don't think
that 350KSLOC is correct by any definition of LOC.
(
11 bytes pe
r line woul
d be a bit 
small for a
system in a
ny language
,even Small
talk.
:-)

|> Don Herkimer
|> 303-977-9580
|> Martin Marietta Space Launch Systems
|> herkimer@tigercat.den.mmc.com

Anybody else have any numbers they would like to share?

Ross Huitt
Bell Communications Research
(908) 699-2973
bytor@ctt.bellcore.com

randy@tigercat.den.mmc.com (Randy Stafford) (06/27/91)

In article <1991Jun26.221144.6532@bellcore.bellcore.com>, bytor@ctt.bellcore.com (Ross Huitt) writes:
|> Sorry this is a little late. New mailer...

Maybe you should get another new mailer.  To wit:

|> 11 bytes pe
|> r line woul
|> d be a bit 
|> small for a
|> system in a
|> ny language
|> ,even Small
|> talk.
|> :-)

Now the rest:

|> In article <1991Jun21.200139.14639@den.mmc.com>,
|> herkimer@tigercat.den.mmc.com (Don Herkimer) writes:
|> |> [... stuff ...]
|> |> In my group at Martin Marietta we have been (re)using a large OO
|> |> commercial package called the Analyst, from Xerox Special Information
|> |> Systems (XSIS).  It is written in Smalltalk-80 and consists of
|> |> 
|> |> 	~3.8M bytes of source code
|> |> 	~661 classes
|> |> 	~14K methods
|> |> 
|> |> If we estimate ~25 "lines of code" per method, this application
|> |> weighs in at about 350 KSLOC. 
|>                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|> 
|> Let's try some basic math:
|> 	3.8 meg of source / 14k methods = 271 bytes per method
|> 	271 bytes per method / 25 lines per method = 11 bytes per line
|> 
|> I don't buy it.  I've been doing some preliminary metrics gathering
|> on OO systems. The two largest commercial/public systems I've looked
|> at were the NIHCL C++ class libraries and the Smalltalk/V images. 
|> The numbers I've been getting are typically on the order of 3 lines 
|> of code per method. (The C++ LOCs were executable statements as 
|> defined in the BNF, the Smalltalk LOCs were Non-Comment/Non-Blank 
|> Source Lines.) This is also true of smaller systems I've seen. 
|> If we use 3 lines per method with your 14k methods this would yield
|> about 42K 'statements'.
|> 
|> If you triple this to get an estimate of the Raw Source Code Lines (a la wc -l)
|> then you get a system that's about 126KLOC (everything included). This is
|> roughly half of your estimate. Yes, it's a big system, but I don't think
|> that 350KSLOC is correct by any definition of LOC.
|>
|> [... stuff ...]
|> 
|> Anybody else have any numbers they would like to share?
|> 
|> Ross Huitt
|> Bell Communications Research
|> (908) 699-2973
|> bytor@ctt.bellcore.com


Yes, Mr. Huitt, I do have some numbers I'd like to share.

First of all, there is no way in hell you're going to get an average of 3 SLOC
per method in any language, especially C++.  Dr. Tom Love (who has been in the
business for years and was behind the development of the Objective-C ICPAKs)
estimates 10 to 15 SLOC/method for a prototype method and 25 to 30 SLOC/method
for a commercial-quality, production method.  Let's look at it another way.

The size in bytes of the Analyst (ASCII) source code is 3,423,423 (the size of the
Analyst.sources file for Analyst V3.2).  The number of methods in the Analyst is
13,721 (the number of CompiledMethod instances).

3,423,423 / 13,721 = 249 bytes per method.
249 bytes per method / 40 bytes per SLOC = 6.225 SLOC per method.

Similarly, for Objectworks for Smalltalk-80 V2.5, the sources file size is
2,222,520 bytes.  The number of methods is 6784.

2,222,520 / 6784 = 328
328 / 40 = 8.2 SLOC per method.

The metric 40 (ASCII source) bytes per SLOC comes from some Analyst product literature.  In
any case, these calculations of number of SLOC per method are two to three times
greater than the estimate you used.
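
A minimal Python sketch of this calculation, for anyone who wants to rerun it with their own figures. The function name and the default of 40 bytes per SLOC are illustrative choices, and the Objectworks size used is the 2,222,520 bytes stated for its sources file. The arithmetic above rounds bytes per method to a whole number before dividing, so the unrounded results differ slightly from 6.225 and 8.2.

```python
def sloc_per_method(source_bytes: int, methods: int, bytes_per_sloc: float = 40.0) -> float:
    """Estimate average SLOC per method from total source size and method count."""
    bytes_per_method = source_bytes / methods
    return bytes_per_method / bytes_per_sloc

# Figures quoted in the post
analyst = sloc_per_method(3_423_423, 13_721)       # Analyst V3.2
objectworks = sloc_per_method(2_222_520, 6_784)    # Objectworks for Smalltalk-80 V2.5

print(f"Analyst:     {analyst:.2f} SLOC per method")
print(f"Objectworks: {objectworks:.2f} SLOC per method")
```

The whole estimate scales inversely with bytes_per_sloc, which is why that figure is the one worth verifying.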

Using yours, Dr. Love's, and the calculated estimates, we get the following
SLOC estimates for the Analyst:

13,721 methods * 3 SLOC per method = 41,163 SLOC.
13,721 methods * 6.225 SLOC per method = 85,413 SLOC.
13,721 methods * 25 SLOC per method = 343,025 SLOC.

Tripling these, as you suggest, yields anywhere from 123KSLOC to 1.03MSLOC in the
Analyst.  Given the order of magnitude difference, I contend that it is naive to
"buy it" or "don't buy it" based on one single estimate of SLOC per method.
Perhaps you could focus your energies on statistically verifying the 40 bytes
per SLOC average for ST80 methods.  Then you could come up with a believable
estimate of SLOC per method in ST80.  You could also fine-tune the total source
size (the size of the sources file might be a little high, because it contains
other junk besides pure method code). 
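
To see how strongly the total depends on the assumed SLOC per method, the three estimates can be tabulated in a few lines of Python. This is only a sketch: the dictionary labels are illustrative, and the numbers are the ones quoted above.

```python
METHODS = 13_721  # CompiledMethod instances in the Analyst, per the post

# SLOC-per-method assumptions discussed in the thread
sloc_per_method = {
    "Huitt (3)": 3,
    "calculated (6.225)": 6.225,
    "Love (25)": 25,
}

estimates = {label: METHODS * rate for label, rate in sloc_per_method.items()}

for label, sloc in estimates.items():
    # "tripled" applies the factor of three suggested for raw source lines
    print(f"{label:20s} {sloc:>10,.0f} SLOC   tripled: {3 * sloc:>12,.0f}")

spread = max(estimates.values()) / min(estimates.values())
print(f"highest / lowest estimate: {spread:.1f}x")
```

The spread comes entirely from the per-method assumption, since the method count is the same in every row.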


Randy Stafford
randy@tigercat.den.mmc.com


P.S.  How's my "basic math"??

bytor@grumpy.Berkeley.EDU (Ross Huitt) (06/28/91)

I decided to take this off line until we clear this up. If you think
things are clear, then post this back to the net.

I find calculating metrics on Smalltalk code a little troublesome.
In C and other procedural languages, and to a lesser extent C++,
counting statements does make some sense. But counting statements
in Smalltalk is dubious at best. You tend to see these very large
cascades of expressions that contain a lot of functionality. So,
when it came time to count Smalltalk methods I used the following
rules:
1) Don't count blank lines.
2) Don't count lines with just comments.
3) Count remaining newlines in the source of the method as LOCs.
I hate counting newlines for metrics for any reason, but right now
I'll live with these rules for Smalltalk. Please note, however,
that this is not your definition of SLOC.
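
The three rules above can be sketched in Python roughly as follows. This is not the tool actually used for these measurements. It assumes standard Smalltalk lexical conventions (comments in double quotes, string literals in single quotes, character literals introduced by $), and it counts the method's selector line as a line of code, which the rules do not specify. The name count_loc is hypothetical.

```python
def count_loc(method_source: str) -> int:
    """Count non-blank, non-comment-only lines in Smalltalk method source."""
    kept = []              # source text with comments removed
    in_comment = False     # inside "..." (Smalltalk comment)
    in_string = False      # inside '...' (Smalltalk string literal)
    i = 0
    while i < len(method_source):
        ch = method_source[i]
        if in_comment:
            if ch == '"':
                in_comment = False
            elif ch == "\n":
                kept.append(ch)      # keep line structure inside comments
        elif in_string:
            kept.append(ch)
            if ch == "'":
                in_string = False    # a doubled '' simply re-enters the string
        elif ch == '"':
            in_comment = True
        elif ch == "$" and i + 1 < len(method_source):
            kept.append(method_source[i:i + 2])   # character literal such as $" or $'
            i += 1
        else:
            kept.append(ch)
            if ch == "'":
                in_string = True
        i += 1
    # Rules 1 and 2: blank and comment-only lines are now empty; rule 3: count the rest
    return sum(1 for line in "".join(kept).splitlines() if line.strip())


example = '''at: index put: value
    "Store value at index, growing if needed."

    index > self size
        ifTrue: [self growTo: index].   "rare path"
    ^contents at: index put: value
'''
print(count_loc(example))
```

A line that carries both code and a trailing comment still counts once, since only the comment text is dropped.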

Looking at the Smalltalk/V image and a couple of medium (100+class)
applications indicated averages around 3 lines of code per method.
I didn't count bytes but I would venture a guess that the lines
were around 30-40 bytes as you suggested.

Metrics for C++ are quite a bit easier. I count executable statements,
in particular all statements as defined in the ARM grammar except
labeled-statements and compound-statements. Metrics for the NIH libraries,
several publicly available systems as well as a couple of production
systems had averages in the three to five statement per method range.
Also, the more 'object-oriented' the system is the lower the average will
be. If you triple these stmt-per-method numbers for C++ it provides
a fair approximation of the number of raw lines of source code.

The tripling I suggested was for the 42K LOC number. It may (or may not)
provide a rough indicator for the number of actual lines (newlines/SLOC)
in the source code of the method. Doubling may be more accurate
as suggested by your 85K number. So, maybe we just misunderstood
each other's definition of LOC.

I like the idea of trying to estimate the LOC per method based on
the image size and method count, but I don't do metrics full-time
so checking this out will have to wait. I don't know if it will
work, but I don't think anybody else does either.

My main point is that the number of executable statements per method
in an object-oriented C++ system will be very low, especially if the
Law of Demeter is adhered to. My assertion is that 'very low' will be
less than 4 statements for most systems. The number of statements per
function in a C system is typically greater than 10 for the systems I
have looked at. I think this difference in statements per function/method
is significant and will have a very great impact on maintenance.

So, it appears that a better estimate for the SLOC (where SLOC is the
number of actual physical lines of raw source code) of the Analyst is
around 85-100KSLOC. This assumes, of course, that you, Dr. Love and I
are defining SLOC in the same manner. (I still assert that there is no
definition of SLOC that would yield 350KSLOC for that system, which
is the reason I posted in the first place.)

I hope this clears things up for now.

Ross Huitt
bytor@ctt.bellcore.com

holen@netcom.COM (Victor Holen) (06/29/91)

In article <1991Jun26...> bytor@ctt.bellcore.com (Ross Huitt) writes:
>
>Anybody else have any numbers they would like to share?
>
Sure.  1,2,3,4...  :)
Also, in a large CAD system, using Mainsail (OOL):
 350 K lines source (comments included - sorry :( ) = 5,503 K bytes compiled,
 which gives ~16 bytesCompiled/lineSource on average,
 and differed from one large sub-project to another
 from 10 bytesCompiled/lineSource to 32 bytesCompiled/lineSource.
 Some reasons for wide variations:
   Coding styles: number of comment lines, 
                  number of statements per line (sic)
   Use of macros versus functions.
   Large use of common utility macros (interactive graphics project)
       versus much less in algorithms projects.
To make more sense, one could write a program to count executable
statements after macros or inline procedures are expanded,... but one didn't
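
A quick Python check of the ratio quoted above, as a sketch only. It takes "K" as 1000 for both lines and bytes, which is an assumption; the result is the same whenever both use the same multiplier.

```python
source_lines = 350_000       # "350 K lines source", comments included
compiled_bytes = 5_503_000   # "5,503 K bytes compiled"

bytes_per_line = compiled_bytes / source_lines

# Range reported across the large sub-projects
low, high = 10, 32

print(f"{bytes_per_line:.1f} compiled bytes per source line")
print("within reported sub-project range:", low <= bytes_per_line <= high)
```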
-- 
 e-mail:   holen@netcom.com