[comp.software-eng] our experience with btool

Dean_Thompson@TRANSARC.COM (09/20/90)

Brian Marick, at the University of Illinois, has recently announced the
beta release of a testing tool called "btool".  This tool instruments
target code to find out which branch outcomes are exercised in a test
run.  We have been using btool in a commercial environment for several
months now, and it just struck me that I should post a description of
our experiences.

This post is divided into sections:
- Who I am
- Why we used btool
- How btool fits into our process
- What we had to change in btool
- How our developers reacted to btool
- Summary

=== Who I am

I am the manager of software engineering for Transarc's transaction
processing development group.  We are developing on the order of 100,000
lines of new C code for Unix, divided into "components" of roughly 5,000 to
35,000 lines.  Most of the components are libraries; a few are servers
with RPC interfaces.  Each component has a well-documented interface,
and is developed by 1 to 5 people.

=== Why we used btool

We have a lot of people working on testing, and most of them don't
report to me.  I needed to ensure that every component was tested
thoroughly, but I would have triggered (well-justified) rebellion if I
had imposed standards that were complex or not defensible in every
detail.  I saw a post from Brian about btool, asked him for a copy,
and brought it up in our environment.  I was pleased by btool's design
and documentation.  I decided that branch coverage was a simple,
objective measure of thoroughness that would be easy to apply and very
useful.
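
To make the measure concrete, here is a trivial C fragment (my own
illustration, not code from our product or from btool's
documentation).  Every condition has two "branch outcomes", the true
case and the false case, and btool counts an outcome as covered only
if some test run actually takes it:

    #include <stddef.h>

    /* Illustration only.  The 'for' condition and the 'if' condition
       each contribute two branch outcomes, four in all. */
    int count_positive(const int *a, size_t n)
    {
        size_t i;
        int count = 0;

        for (i = 0; i < n; i++) {   /* outcomes: condition true, condition false */
            if (a[i] > 0)           /* outcomes: condition true, condition false */
                count++;
        }
        return count;
    }

A test that passes only positive numbers executes every statement in
this function, yet it never takes the false outcome of the "if".
Branch coverage reports that gap immediately; statement coverage would
not.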

=== How btool fits into our process

Branch coverage is not a complete measure of thoroughness.  The only
thing a high branch coverage figure tells you is that your test
program has executed a lot of your target code.  Unless you know
whether the target code has executed correctly, the branch coverage is
completely useless.  If you set standards for branch coverage, don't
allow them to be met by test programs that don't thoroughly check
their results!
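
As a rough sketch of what I mean by a self-checking test program
(hypothetical cases of my own, reusing the count_positive() fragment
above; our real suites are of course much larger):

    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    int count_positive(const int *a, size_t n);   /* target under test */

    int main(void)
    {
        int mixed[] = { 3, -1, 0, 7 };

        /* Each call verifies its result; merely calling the target to
           drive up the coverage number proves nothing. */
        assert(count_positive(mixed, 4) == 2);   /* basic case */
        assert(count_positive(mixed, 0) == 0);   /* boundary case */

        printf("count_positive tests passed\n");
        return 0;
    }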

If you attain high branch coverage with a test program that checks its
results carefully, you can have some confidence that the target code
doesn't do incorrect things.  You still have to make sure the target
code does everything it is supposed to do.  We enumerate everything we
want to make sure the target code does in a concise "test checklist".

Here are two excerpts from a marketing document that describe our
testing standards in more detail:

---[ from the section on testing ]---

Transarc builds a full suite of self-checking tests for each system
component and for the integrated system.  Each test suite is planned
against a "test checklist", which itemizes the inputs and scenarios
that will be tested and the results that will be confirmed.  The
checklist is created early in the component's development and
augmented over time by both testers and developers.  It helps focus
the test effort on efficient coverage of the entire component
interface and scenarios that seem prone to failure.

After the test suite has been developed, a "branch coverage tool" is
used to instrument the target code.  The test suite is expected to
cover 80% of the possible branch outcomes in the target.  The test
suite is reviewed for satisfactory branch coverage and checklist
coverage before the target code is released.

---[ from the section on reviews ]---

Transarc's managers, developers, and testers also hold
release-readiness reviews before the "alpha," "beta," and "product"
releases of each component and of the integrated system.  The
participants in a release-readiness review confirm that all
functionality planned for the release has been implemented and that no
new functionality has been added since the Alpha review.  They check
to see that the planned tests have been completed, that sufficient
testing has been performed for the release, and that testing has not
uncovered an excessive number of defects.  They also make sure the
component or system has met Transarc's standards for the specific
release stage:

- ALPHA. The test suite has a measured total branch coverage of at
  least 50%.  All "basic" entries in the test checklist have been
  covered (other checklist categories include "error", "recovery",
  "stress", "performance", and "probe").  All code has been reviewed.
  The design document is current.  An NLS strategy, trace strategy,
  and fatal error handling strategy are included in the design
  document.  The fatal error handling strategy has been implemented.

- BETA. The test suite has a measured total branch coverage
  of at least 80%.  All "basic" and "error" entries in the test
  checklist have been covered.  All reference documents have been
  reviewed by writers.  All user documents have been reviewed by
  developers.  The NLS strategy and trace strategy have been implemented.

- PRODUCT. All "recovery", "stress", and "performance"
  entries in the test checklist have been covered.  The product
  support plan has been reviewed.

-------------------------------------

=== What we had to change in btool

We had two significant problems with btool:

 - It didn't work well for large bodies of code because it wasn't able
   to handle recompilation of a single file.

 - It didn't give us a way to get separate sets of output from two
   instrumented components in the same test run.

We fixed these problems in our own copy of btool by changing it to
generate a separate ".map" file for each ".c" file.  We also extended
the reporting facility to report on the subset of an output log that
applies to a specified set of ".map" files.

We also modified btool to compute our "percent branch coverage"
figure, and to summarize results by file.
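
To give a feel for the arithmetic behind that per-file summary, here
is a sketch of my own with made-up numbers.  It is not btool's actual
code, and the ".map" and log formats are not shown:

    #include <stdio.h>

    /* Hypothetical per-file counts; btool's real data comes from its
       ".map" files and output logs. */
    struct file_coverage {
        const char *file;
        int total;     /* branch outcomes instrumented in this .c file */
        int covered;   /* outcomes exercised by the test run */
    };

    static double percent(int covered, int total)
    {
        return total ? 100.0 * covered / total : 100.0;
    }

    int main(void)
    {
        struct file_coverage files[] = {
            { "log.c",   212, 187 },
            { "lock.c",  340, 255 },
            { "recov.c", 118,  41 },
        };
        int i, n = sizeof files / sizeof files[0];
        int total = 0, covered = 0;

        for (i = 0; i < n; i++) {
            total += files[i].total;
            covered += files[i].covered;
            printf("%-10s %5.1f%%\n", files[i].file,
                   percent(files[i].covered, files[i].total));
        }
        printf("%-10s %5.1f%%\n", "TOTAL", percent(covered, total));
        return 0;
    }

The "percent branch coverage" figure we report is simply covered
outcomes divided by total outcomes; the per-file breakdown tells a
developer where to look next.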

=== How our developers reacted to btool

At first, developers resisted mandatory branch coverage levels because
they feared that high branch coverage wouldn't correlate well with
good testing.  I told them that good testing required great creativity
and would always be an art.  I argued that we would always have to
subjectively evaluate the quality of a test, but that measuring branch
coverage would help us know how thorough the test was.  We seem to
have mostly stopped worrying about this.  Perhaps my explanation had
some effect, but I think people mostly learned from experience that an
unexercised branch usually did represent something concrete that
wasn't tested.

The next big worry was whether the 80% coverage requirement was too
high.  We still don't know the answer to this.  We have achieved just
under 80% coverage for one component of about 7,000 lines, and I don't
feel we put excessive amounts of effort into testing that component.
We have just broken 50% for another component about the same size, and
those testers are saying that 70% might possibly be reasonable but
that 80% will be absolutely infeasible.  We will know in a month or
so.

One difficulty is that many branches represent only a few lines of code
and are executed only when extremely unusual errors occur.  I have
told everyone that I am comfortable counting specific branches as
"covered" if the developers can compellingly argue that these branches
have been confirmed correct by inspection.  I think it is extremely
beneficial to be able to decide that some branches are worth testing
by execution and others should be confirmed by very careful
inspection.  Sometime I'd like to try taking a component for which
testing was complete and reviewing all branches that weren't covered.
I wonder whether we would find substantial numbers of bugs.
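
A typical example of the kind of branch I mean (a made-up fragment
with hypothetical status codes, but representative of our code):

    #include <stdio.h>
    #include <stdlib.h>

    #define TX_OK         0    /* hypothetical status codes */
    #define TX_NO_MEMORY  1

    int make_request_buffer(size_t len, char **out)
    {
        char *buf = malloc(len);

        if (buf == NULL) {      /* taken only when malloc() fails */
            fprintf(stderr, "out of memory allocating request buffer\n");
            return TX_NO_MEMORY;
        }
        *out = buf;
        return TX_OK;
    }

Forcing the NULL outcome from a test program means interposing on
malloc() or otherwise exhausting memory, so a careful inspection of
those three lines is often a better use of the effort.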

=== Summary

- We require 50% branch coverage for alpha code and 80% coverage for
  beta code.

- Too much emphasis on branch coverage is dangerous.  You must also
  make sure that you are testing everything in the specification, and
  that your test programs carefully confirm they are getting
  correct results.  Beyond this, you must remember that creating
  deviously malicious and thorough test programs will always be an art.
  Branch coverage just tells you what target code you have exercised.

- We have found branch coverage to be very useful as part of a balanced
  testing program.

Dean Thompson
Transarc Corporation
dt@transarc.com

marick@m.cs.uiuc.edu (09/24/90)

One addendum to Dean's note: I am at the University of Illinois, but
I'm a Motorola employee, here to do research in software testing.
Although the branch tool was not funded by Motorola (it came about
because two students needed an independent study project), it would
not have happened without Motorola's support for me.  I think the
company deserves credit.

Brian Marick
Motorola @ University of Illinois
marick@cs.uiuc.edu, uiucdcs!marick
(Standard disclaimer:  my opinions are my own, etc.)