[comp.software-eng] Code inspections: very long comment on topic

duncan@ctt.bellcore.com (Scott Duncan) (02/07/91)

(To avoid looking like I'm picking on just one or two people, since their
posts formed the basis for many comments made in this vein, I'm dropping all
names from references I make below.  Perhaps this is only symbolic given what
folks know about the history of this thread, but I want to focus on the con-
tent of the material, not the author.  Now where have I heard that before? :-))

This thread began with a statement that

>It seems that the problem with code inspections is largely emotional.

They also require a substantial commitment of time and effort on the part of
participants, so that has some effect.  This commitment also comes at a point
in the software lifecycle when there may be less encouragement to take time
from coding, unit test, and debugging to pursue explicit (manual) defect de-
tection activity.  I believe published studies and people's experience indicate
how this time is more than made up for in the evolution of the system but not
always in the first release cycle.  Thus technology which could assist with
inspection effort would be useful in making it more acceptable.  Another
person also noted that "The major problem is in scheduling if the process model
does not include inspections.  ...there are limits to how many anyone can go
through per week (about 2 max...)  ...they are very hard to add in to an exist-
ing schedule."

>Though there is plenty of evidence that code inspections are cost
>effective, I believe they would tend to be boring and stressful.
>Boring because they are a time consuming and non-creative activity --
>current issue of IEEE Software recommends 150 lines of code reviewed
>per man-day as a good figure.  I know I would not want to do this, and
>who would?

I guess it depends upon the spirit in which it is done, the interest level of
peers in one another's work, the benefits of cross-training that people feel
also come from inspections, etc.  If one defines "creative" as producing
something new, original, novel, etc., then inspections can be criticized for
not being "creative," I guess.  I always felt that part of the creative pro-
cess, for me, was how others reacted to my work.  Clients can react to the
external manifestations of a software system, but it's also nice to have one's
peers react to implementation and design.

>            Stressful because it is out of the programmer's control,
>and because criticism is involved.  People identify closely with their
>creations and find criticism painful.

If one goes into inspections -- be they the person whose code is being reviewed
or one of the reviewers -- with the idea that this is an exercise in criticism
rather than a quality improvement activity, I can see where pain is almost
guaranteed.  Inspection is not simply a process by which peers sit together and
blow holes in one another's work.  There is criticism, but I think people, in
general (and I am no exception), have difficulty giving and receiving criticism
effectively.

>Not only that, but your average programmer was very likely attracted to
>programming in order to avoid social interaction and to create
>something under his/her personal control without anyone else watching.
>He/she is likely to be on the low end of the social tact scale and
>singularly unqualified to deal with this delicate situation.

Most of the evidence I see for this position comes from studies done many years
ago.  I am not sure this is accurate any longer.  The Couger and Zawacki study
reported in Datamation in 1978 is often cited in this regard.  One may still be
able to find such folks (and many of them in certain environments); however, I
am less likely to believe this is the "average" profile of a programmer these
days.  There are many more people in programming now than a decade ago and the
environments in which development occurs have broadened too much.  I see too
many exceptions to this characterization to accept it as an indicator of what
is "average."  Another person also said that though "'the average programmer'
can handle a low level of social interaction [it] doesn't mean it's sought out.
Don't confuse correlation with causality."

>                                                              Again,
>this may very well have attracted them to programming: it doesn't
>matter whether anyone likes their personality, all that counts is
>whether the program works.

If one of the goals of the "average" programmer is to "create something under
his/her personal control without anyone else watching," perhaps they should
work in a more secretive line of employment? :-)  And their desire to work
"without anyone else watching" may not matter much if inspections produce a
software system with fewer delivered defects.  And as another person noted:
"inspections are not nearly as painful as shipping bugs to customers."

>In order to reduce these problems the following has been suggested:
>1) The author not be present at the inspection

This may or may not work depending on how well-documented the inspection com-
mentary might be (or how much effort people expend to make it well-documented).
There has been investigation into using more automated tools to do some of the
initial inspection effort such that the role of the "scribe" is reduced in
importance, i.e., many of the comments are captured on-line (both by source
analysis tools and then by reviewers in preparation for the actual inspection
session).  I rather think excluding the author is a bad idea (as others have
noted) and feel that it just avoids the issue of giving/taking criticism
effectively.

>2) Only errors are communicated to the author.  No criticism of style allowed.

Sometimes "style" can also mean readability, etc. to people.  Even if it is not
a specific implementation error, "clever" code may still be undesirable
unless exceedingly well justified and documented.  Most of the lifetime of a
system is spent in maintenance (evolution beyond the first release or modifica-
tion to a large system in the midst of getting the first release out).  If the
original author of a piece of code is not guaranteed to be its maintainer for
life, then there has to be some standard of readability and comprehension in
effect.  Some would argue that this is an aspect of "style," hence I am not
in favor of eliminating "style" criticisms unless that is defined very clearly.

Another person noted that programmers "*must* push themselves to write so that
they can be understood; if readers/reviewers/inspectors don't understand, that
is (in general) a sign that the author needs to rewrite (most often: comment
more clearly) what he or she has written."  Research on code comprehension has
suggested that the lack of well-commented or clearly written code is
more than just a contributor to errors.  It also contributes to the desire to
rewrite/redesign code in order to make it more clear to the programmer current-
ly at work on it.  This tends to erode design integrity over time, making the
job of enhancements and defect repairs more costly.

>                               It also might succeed in a large
>paternalistic organization as these would be more likely to attract
>group oriented engineers.  Note that the classic studies of code
>inspection occurred at mammoth IBM.

On the other hand, others have used them in less "paternalistic" situations,
as more recent experiences reported in print and at conferences and workshops
indicate.  Check out some of the NASA experiences at Goddard and some that JPL
has had recently.  And someone else also noted that "often
even IBM does not do inspections (I speak from personal experience)."  My own
experience, based on talking to folks, suggests that while inspections are not
always held -- especially for smaller pieces of code which can be addressed
with comprehensive testing -- design reviews continue to be used.

>In spite of all this, I think code inspections would be accepted in any
>application where there is a clear need such as the space shuttle
>program where reliability is crucial and interfaces are complex.

Space shuttle examples are also life critical ones.  I don't think we should
limit the issue of reliability (and complexity's impact) to only life threaten-
ing examples.  I think it would be more useful to specify examples where reli-
ability is NOT crucial, then let people who think otherwise respond.  Who can
argue about the importance of reliability in life critical situations?

>                                                                  In
>such cases code inspections are clearly a necessity, and engineers
>might welcome --or at least, tolerate -- them as essential to getting
>the job done.  On the other hand in routine applications with a good
>deal of boiler plate code they could be a "real drag", exacerbating the
>humdrum nature of the task.

Is an application on a personal computer to keep tax records important enough
to need reliability?  to keep a date book?  etc?  What specific kind of appli-
cation is so routine that reliability is less significant than avoiding "real
drag" situations?

Another person noted that

>                           A halfway-decent programmer can produce several
>times that 150 l/d figure...proceeding through anything at 20 lines/hour
>(that's 3 minutes per line, effectively???) is too slow to feel productive.

But someone else suggested that

>One can write several hundred lines of code in one session.  However that is
>exceptional.  Typical industry figures are 5-10 thousand lines of delivered
>code per year which is 25-50 lines/day.  Programmers who can average 100
>lines/day are quite exceptional.

It was also noted that lines of code are not a great measure of productivity
(just lines generated).  So perhaps the issue is to focus on some other produc-
tivity measure rather than how much code can be produced.  Indeed, it is prob-
ably the wish of the client that the system be implemented in as little code as
possible.  Being able to demonstrate that one used the least amount of code
(by judicious reuse efforts rather than tricky coding) might be the virtue one
wishes.  I'd prefer to see the industry move toward measures of functionality,
however.
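
Incidentally, the arithmetic behind the lines-of-code figures quoted above is
easy to check.  A throwaway sketch (the hours-per-day and working-days values
are my own assumptions, picked to match the quoted rates):

/* Quick sanity check on the productivity figures quoted above.  The
 * hours-per-day and working-days values are my own assumptions. */
#include <stdio.h>

int main(void)
{
    double review_loc_per_day = 150.0;    /* IEEE Software inspection figure */
    double hours_per_day      = 7.5;      /* assumed working hours per day   */
    double loc_per_year_low   = 5000.0;   /* "typical industry" low figure   */
    double loc_per_year_high  = 10000.0;  /* "typical industry" high figure  */
    double work_days_per_year = 200.0;    /* assumption behind "25-50/day"   */

    printf("inspection rate: %.0f LOC/hour, or %.0f minutes per line\n",
           review_loc_per_day / hours_per_day,
           60.0 * hours_per_day / review_loc_per_day);
    printf("delivered code:  %.0f to %.0f LOC per working day\n",
           loc_per_year_low / work_days_per_year,
           loc_per_year_high / work_days_per_year);
    return 0;
}

Which is only to say the quoted numbers are internally consistent; it says
nothing about whether lines are the right thing to count.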

The comment was also made that

>Think of it this way:  Code inspection is a tool.  You don't use every tool
>for every job.

I think that's an important point.  Inspections, more specifically, are a tool
to DETECT defects.  There are also defect analysis approaches that lead to
recommendations for process-related changes (rather than product-related ones)
which seek to PREVENT defects by changing the conditions that seem to result
in specific kinds of defect occurrences.  The output of inspections and other
defect detection approaches (e.g., various levels of testing, client reports)
is fed into such analysis.

Someone asked about measuring quality by saying
>                                           What other job where a product
>(software in this case) is produced do you find people not being judged on
>the quality of their output?  People absolutely have to be judged on how well
>they perform their jobs and if putting out high quality software is their
>job then their manager should be able to measure their job performance and
>react accordingly.

And someone else replied that
>                                                      Programmers need to
>realize that their job is producing high quality software, not just a
>program that "works".  They need to be held accountable for their work.
>I have found that true professionals don't mind code inspections and
>walkthroughs, because they are confident in their ability and proud of
>their work.

And another said "Of course people must be judged on the quality of their work
- but define software quality for me."  To which someone said "It meets
customer expectations."  And another said "Conformance to standards.  Less than
target Find Rates.  The cost of doing it over again.  Etc."

There are internal measures of quality and there are external ones.  Internal
quality is measured from the perspective of the person/organization creating
the software, but the client of that effort has the final say on what is
"quality" from the external view.  In some cases this can be the result of
negotiation around some known measures of quality (internal measures become
agreed upon external ones).  In most cases, this negotiation does not occur,
and the client gets to say what they feel constitutes "quality."

An organization that takes a customer perspective comes to care about external
quality issues.  Organizations that view the recipients of their products as
(captive) users of one sort or another often focus more on internally optimizing
some aspect of their own productivity concerns (e.g., cost, schedule).
(This is based on industry interviews over 2-3 years, here and abroad, with
over 90 large (e.g., systems of 100k to 14M lines of code) software producers.)

It is interesting that the Japanese often negotiate software contracts with
their clients based on a predicted level of quality.  If the client wants
exceptionally high quality, they pay for it, but the Japanese development
organization seems to be able to deliver software to meet that requirement.

>                                          Certainly a programmer needs
>to be evaluated based on their output, but process meters cannot be the
>sole source for evaluating the programmer or else the process won't
>be followed (or it may be "tricked" into showing things that are not
>real.)

Some make a distinction between line-of-code and defect measurements as being
"product" measures, while "process" measures mean other things, usually supported
by those same line-of-code and defect measurements.  For example, the defect
analysis
approach I noted above is based on trying to eliminate error prone activity in
the process of creating software rather than eliminating individual defects on
a one-by-one basis (a product concern).

>Another argument is that process meters usually show "faults leaked
>from one stage to another."  These indicators evaluate the process
>and the team's ability to run it.  No single programmer should be held
>accountable for a buggy module in the field -- many have been involved
>in getting that module out there.

That's certainly one way to look at it.  However, that sidesteps the issue of
whether an individual's performance is measured at all.  Rather, it says that
individual performance should not be measured, at least not by current measures
of software quality and productivity.  The question still remains: what measures
of performance will one use then?

Commenting on a personal experience, one person said that

>           When I have had my code inspected, it was to happen before
>any testing of the code took place, (by definition, after the first
>clean compile).

This is what I have experienced as the recommended approach.  I have seen
this changed to "before unit test," "before integration test," etc.  I feel
the objective is to avoid too much "tinkering" with code after initial ef-
forts at automated debugging have shown flaws.  By "tinkering" I mean changes
made in private which may move the code away from initial design specification
in the name of getting the bug out quickly.  Perhaps this is some of what
was meant by the suggestion that "programmers could spot errors and suggest
small-scale design improvements before testing occurred.  (Major design sug-
gestions should have already been received at design reviews)."

>Now, if you wish to examine my (or anyone else's code) for 
>judging the quality of work, I'd rather it was done after the 
>testing cycle.  That way, you are judging the completed code,
>as opposed to a first draft.  

This is probably a reasonable approach if defect metrics are to be used as a
measure of individual effectiveness in coding.  And as another has said "It
depends what you see as the output.  In my view, that's the code that makes it
past inspection, integration, and test, and goes out to the customer." And that
"If you make errors caught at an early stage 'evil', then people will do every-
thing they can to avoid catching them there...."

But it was also stated that

>From my experience, I think that code inspections should be started
>after a certain amount of (sanity) testing is done. This prevents
>wasting the inspector's time in uncovering (obvious) defects.

and that

>                          From empirical data, one can determine
>how many defects are detected per person-hour of testing and code
>inspection.  When less defects are detected in one hour of testing
>than there would be during an inspection, it is time to start in-
>specting.

I guess the real lesson to draw is to determine, locally, what kind of confi-
dence measures/evidence one needs to feel that the "private" debugging really
reflects what the "public" testing/field experience will be.  One view of
inspections has been that they are used to define the point at which the pro-
grammer says "I have confidence that this code is correct."  This is extended
to trying to encourage that point in time to be much earlier in the lifecycle
than during some testing phase (even unit testing for some).  Thus, the 'clean-
room' approach to development says that a programmer should develop in such a
way that they do not need to even compile the code to feel confident about it.
(Whether one agrees with this position or not, that position then means that
inspections will occur before the first compile.)

Thus, when to do inspections seems to depend a lot on what philosophy is adopt-
ed regarding the "process" of development one wishes to encourage.  Inspecting
early in development pushes for more formal review and analysis at the source
code level (i.e., the designed system), while inspecting later in the lifecycle
pushes toward more automated effort directed at the object code (i.e., the
executable system).
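
The "defects per person-hour" rule quoted earlier is simple enough to spell out
mechanically.  A contrived sketch (all of the rates here are invented; real
values would have to come from local empirical data):

/* Illustration of the rule of thumb quoted above: keep testing while an
 * hour of testing finds more defects than an hour of inspection would;
 * once it finds fewer, start inspecting.  All rates are invented. */
#include <stdio.h>

int main(void)
{
    double inspect_rate = 1.2;  /* defects found per person-hour inspecting */
    double test_rate[]  = { 3.0, 2.1, 1.4, 1.0, 0.7 };  /* week by week     */
    int weeks = (int)(sizeof(test_rate) / sizeof(test_rate[0]));
    int w;

    for (w = 0; w < weeks; w++) {
        if (test_rate[w] < inspect_rate) {
            printf("week %d: testing finds %.1f/hour, inspection ~%.1f/hour"
                   " -- start inspecting\n", w + 1, test_rate[w], inspect_rate);
            break;
        }
        printf("week %d: testing still finds %.1f/hour -- keep testing\n",
               w + 1, test_rate[w]);
    }
    return 0;
}

The point is not the particular numbers, only that the crossover can be stated
(and tracked) explicitly once the local rates are known.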

>                         The idea of a review (whether it be a 
>design review, or a code review) is to allow peers the oportunity
>to comment on the proposed solution to a technical problem.  The
>reviews should remain just that.

I think this is a very well stated definition of what the inspection process
should accomplish.

>                                  Now, if you wish to measure a 
>programmer's performance based on the quality of code that the 
>programmer has agreed is ready to be released, that is another
>matter entirely.

And this is a separate issue:  an important issue, but separate.

On the subject of the "engineering" nature of software development someone
said that

>                    Right now, there is no one correct way to write a
>program, unlike other engineering disciplines, which may have a single 
>answer (Does this bridge support X lbs of weight? A simplified example...).
>How to define quality for software is still nebulous right now. 

If one removes the "nebulous" aspect of that "rating" by being specific, then
there is less objection.  On the other hand, what behavior do you wish to en-
courage through ANY system of performance evaluation/rating?  What do you want
programmers to do?  Define measures that encourage them to do that.  I believe
the problem is less with the measures themselves than with their misapplica-
tion given a "nebulous" definition of what is expected of programmers (and other
computing professionals).

>Do you base it on how well the program works? How efficiently it runs?
>How well it is commented? All the above?

Yes, if that's how you define what's important to achieve.  Also these could
be whether it was created on schedule, within budget, etc.  What matters?

>                                         Quality will mean different
>things to different people, depending upon what their needs are.

No argument there.  I am advocating defining "needs" better.  If one is not
clear about what one is measuring or why, there are certainly going to be
problems in how people interpret that measurement.

And on the subject of process vs product someone asked

>                             isn't it about time that we started breaking
>our enfatuation with the *process* of building software (source code,
>style rules, programming language, lifecycle, methodology, software development
>process, CASE, etc, etc) and started concentrating on the *product*
>itself.

Actually, I've always thought most of the attention HAS been on the product
being created.  Certainly most of the current measures (lines of code, defect
reports, etc.) are product-focused.  They are not direct measures of the pro-
cess which resulted in the software.  They are indirect implications about
that process, i.e., that it did or did not create a product of acceptable
quality or with acceptable productivity and, by implication, that the process
can be considered good or bad because of that.

Some of the things you mention (source code, style rules, programming language,
CASE, and maybe things in your "etc, etc") are aspects or artifacts of the
product of software (or the tools used to create it).  I consider methodology
(and the methods) and lifecycle to be process related issues.

>To me, the paradigm shift that we're facing is figuring out how to comprehend
>software products, which unlike manufactured things like firearm parts, 
>are intangible...undetectable by the natural senses.

A good point, but I think people have always been faced with trying to do
that.  I think very little attention, until this past decade, has really been
paid to the process.  The fact that the Waterfall Model has been around for
some 20+ years means that "a" process has been focused on, but that is not the
same, to me at least, as being concerned about process itself.

>I envision tools to assist in understanding the static and dynamic properties
>of a piece of code the way physicists study the universe, not by asking 
>how it was built (a process question), but by putting it under test to 
>determine what it does.

I think this is a very legitimate approach to the product side of software.  Is
it that you advocate ignoring the process side, or are you just arguing for more
effective measures and study of the product side?  If it is the latter, as I
assume it might be, then I have no argument with it.  On the other hand, if you
are suggesting that "breaking the enfatuation" means forgetting about process,
I think the problem of having had just one process (the Waterfall) for so many
years is more the issue.

>I'm proposing another view from the *outside*. This view ignores the process
>whereby it was constructed. This involves specifying its static (does it
>provide methods named push and pop?) and dynamic properties (does pushing 
>1,2,3 cause pop to return 3,2,1?)

I think you have a different definition of "process" than I am used to using
if you associate process with knowing about source code implementation detail.
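
To make the push/pop example concrete: that "view from the outside" amounts to
an executable check of the stated property.  A toy sketch (the stack type and
functions here are made up just to have something to test against):

/* Black-box check of the dynamic property quoted above: pushing 1,2,3
 * should make pop return 3,2,1.  The stack implementation is invented;
 * the check itself uses only push and pop. */
#include <assert.h>
#include <stdio.h>

#define STACK_MAX 100

struct stack { int item[STACK_MAX]; int top; };

static void push(struct stack *s, int v) { s->item[s->top++] = v; }
static int  pop(struct stack *s)         { return s->item[--s->top]; }

int main(void)
{
    struct stack s;
    s.top = 0;

    push(&s, 1); push(&s, 2); push(&s, 3);
    assert(pop(&s) == 3);
    assert(pop(&s) == 2);
    assert(pop(&s) == 1);
    printf("dynamic property holds: push 1,2,3 then pop yields 3,2,1\n");
    return 0;
}

Nothing in the check depends on how the stack is implemented, which I take to
be the point of the "outside" view.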

Commenting on the effort needed to conduct inspections, one person said that

>(There have been some studies indicating that you can get by with
>the author, a combined reader/moderator, and one other inspector,
>or similar "reduced" inspections, without letting too many more
>errors get by.  Assuming the same effort per individual, that
>increases the inspection productivity to about 250 LOC per person-
>day.)

At the NASA Software Engineering Lab's annual workshop at Goddard, JPL reported
that they felt code inspections needed fewer people than design or requirements
inspections (partly because fewer people are involved in the artifact being re-
viewed, of course).  As I recall, about 3 was the number they suggested for a
code review.  Of interest here also, based on questions some have asked, is
that JPL started with requirements, moved into design, and are just now (or
have been over the past year) moving into more code inspections.

Then it was asked

>I can see how inspection works for new, highly modular code.  How does
>it work when your maintaining dinosaurs?  ("Rewrite the dinosaur" is
>not a solution I could have gotten past management.)

I know of companies with the policy that ALL new/changed code is inspected.
Others have policies that require inspections if a certain percentage of an
existing module is changed.  Thus, for maintenance, it does not seem any
different than for "new" development, depending on how one defines maintenance.
If it includes evolution of existing systems, then new AND changed code is
likely to exist.  If it is restricted to defect repairs, then you may never hit
a percentage level threshold required for inspections.  In the latter instance,
the view that ALL new/changed code should be inspected is probably the only
one that makes any sense if you really intend to do inspections at all.
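
As a concrete (and invented) comparison of the two policies, here is a small
sketch; the 25% threshold, the function names, and the numbers in main() are
mine, not anyone's actual policy:

/* Sketch of the two inspection policies mentioned above: inspect all
 * new/changed code, versus inspect only when a module changes by some
 * percentage.  The 25% threshold and the sizes are invented. */
#include <stdio.h>

static int inspect_all_changed(int changed_lines)
{
    return changed_lines > 0;
}

static int inspect_over_threshold(int changed_lines, int module_lines)
{
    return 100.0 * changed_lines / module_lines >= 25.0;
}

int main(void)
{
    int changed = 40, module = 1000;    /* a small defect repair */

    printf("all-changed policy: %s\n",
           inspect_all_changed(changed) ? "inspect" : "skip");
    printf("threshold policy:   %s (only %.0f%% of the module changed)\n",
           inspect_over_threshold(changed, module) ? "inspect" : "skip",
           100.0 * changed / module);
    return 0;
}

Under the threshold policy the small repair above never gets inspected, which
is exactly the situation described in the previous paragraph.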
 
Speaking only for myself, of course, I am...
Scott P. Duncan (duncan@ctt.bellcore.com OR ...!bellcore!ctt!duncan)
                (Bellcore, 444 Hoes Lane  RRC 1H-210, Piscataway, NJ  08854)
                (908-699-3910 (w)   609-737-2945 (h))