Dean_Thompson@TRANSARC.COM (09/20/90)
Brian Marick, at the University of Illinois, has recently announced the beta release of a testing tool called "btool". This tool instruments target code to find out which branch outcomes are exercised in a test run. We have been using btool in a commercial environment for several months now, and it just struck me that I should post a description of our experiences.

This post is divided into sections:
- Who I am
- Why we used btool
- How btool fits into our process
- What we had to change in btool
- How our developers reacted to btool
- Summary

=== Who I am

I am the manager of software engineering for Transarc's transaction processing development group. We are developing on the order of 100,000 lines of new C code for Unix, divided into "components" of roughly 5,000 to 35,000 lines. Most of the components are libraries; a few are servers with RPC interfaces. Each component has a well-documented interface and is developed by 1 to 5 people.

=== Why we used btool

We have a lot of people working on testing, and most of them don't report to me. I needed to ensure that every component was tested thoroughly, but I would have triggered (well-justified) rebellion if I had imposed standards that were complex or not defensible in every detail. I saw a post from Brian about btool, asked him for a copy, and brought it up in our environment. I was pleased by btool's design and documentation. I decided that branch coverage is a simple, objective measure of thoroughness, easy to apply and very useful.

=== How btool fits into our process

Branch coverage is not a complete measure of thoroughness. The only thing a high branch coverage figure tells you is that your test program has executed a lot of your target code. Unless you know whether the target code has executed correctly, the branch coverage is completely useless. If you set standards for branch coverage, don't allow them to be met by test programs that don't thoroughly check their results!
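For readers who haven't seen a coverage tool, here is a minimal sketch in C of the idea behind branch instrumentation. To be clear, this is my own illustration, not btool's actual instrumentation or ".map" format; the clamp function and the HIT macro are hypothetical.

```c
/* Hypothetical sketch of branch instrumentation -- NOT btool's actual
 * output.  The idea: give every branch outcome its own counter (a
 * two-way "if" has two outcomes, taken and not taken), bump the
 * counter when that outcome occurs, and report the fraction of
 * outcomes hit at least once. */

#define N_OUTCOMES 4                    /* 2 ifs x 2 outcomes each */
static unsigned long outcome_hits[N_OUTCOMES];

/* HIT(i, v) bumps counter i and yields v, so it can be spliced into
 * a condition without changing the condition's value. */
#define HIT(i, v) (outcome_hits[(i)]++, (v))

/* Target code as an instrumenter might rewrite it.  The original was:
 *     if (x < lo) return lo;
 *     if (x > hi) return hi;
 *     return x;
 */
int clamp(int x, int lo, int hi)
{
    if (x < lo ? HIT(0, 1) : HIT(1, 0))
        return lo;
    if (x > hi ? HIT(2, 1) : HIT(3, 0))
        return hi;
    return x;
}

/* The "percent branch coverage" figure: outcomes exercised at least
 * once, divided by outcomes possible. */
double percent_branch_coverage(void)
{
    int covered = 0, i;
    for (i = 0; i < N_OUTCOMES; i++)
        if (outcome_hits[i] > 0)
            covered++;
    return 100.0 * covered / N_OUTCOMES;
}
```

In this toy example, calling clamp(5, 0, 10) exercises only the two "not taken" outcomes, for 50% coverage; adding clamp(-1, 0, 10) and clamp(99, 0, 10) brings it to 100%.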
If you attain high branch coverage with a test program that checks its results carefully, you can have some confidence that the target code doesn't do incorrect things. You still have to make sure the target code does everything it is supposed to do. We enumerate everything we want to make sure the target code does in a concise "test checklist".

Here are two excerpts from a marketing document that describe our testing standards in more detail:

---[ from the section on testing ]---

Transarc builds a full suite of self-checking tests for each system component and for the integrated system. Each test suite is planned against a "test checklist", which itemizes the inputs and scenarios that will be tested and the results that will be confirmed. The checklist is created early in the component's development and augmented over time by both testers and developers. It helps focus the test effort on efficient coverage of the entire component interface and on scenarios that seem prone to failure.

After the test suite has been developed, a "branch coverage tool" is used to instrument the target code. The test suite is expected to cover 80% of the possible branch outcomes in the target. The test suite is reviewed for satisfactory branch coverage and checklist coverage before the target code is released.

---[ from the section on reviews ]---

Transarc's managers, developers, and testers also hold release-readiness reviews before the "alpha," "beta," and "product" releases of each component and of the integrated system. The participants in a release-readiness review confirm that all functionality planned for the release has been implemented and that no new functionality has been added since the alpha review. They check to see that the planned tests have been completed, that sufficient testing has been performed for the release, and that testing has not uncovered an excessive number of defects.
They also make sure the component or system has met Transarc's standards for the specific release stage:

- ALPHA. The test suite has a measured total branch coverage of at least 50%. All "basic" entries in the test checklist have been covered (other checklist categories include "error", "recovery", "stress", "performance", and "probe"). All code has been reviewed. The design document is current. An NLS strategy, trace strategy, and fatal error handling strategy are included in the design document. The fatal error handling strategy has been implemented.

- BETA. The test suite has a measured total branch coverage of at least 80%. All "basic" and "error" entries in the test checklist have been covered. All reference documents have been reviewed by writers. All user documents have been reviewed by developers. The NLS strategy and trace strategy have been implemented.

- PRODUCT. All "recovery", "stress", and "performance" entries in the test checklist have been covered. The product support plan has been reviewed.

-------------------------------------

=== What we had to change in btool

We had two significant problems with btool:
- It didn't work well for large bodies of code because it wasn't able to handle recompilation of a single file.
- It didn't give us a way to get separate sets of output from two instrumented components in the same test run.

We fixed these problems in our own copy of btool by changing it to generate a separate ".map" file for each ".c" file. We also extended the reporting facility to report on the subset of an output log that applies to a specified set of ".map" files. We also modified btool to compute our "percent branch coverage" figure, and to summarize results by file.

=== How our developers reacted to btool

At first, developers resisted mandatory branch coverage levels because they feared that high branch coverage wouldn't correlate well with good testing. I told them that good testing required great creativity and would always be an art.
I argued that we would always have to subjectively evaluate the quality of a test, but that measuring branch coverage would help us know how thorough the test was. We seem to have mostly stopped worrying about this. Perhaps my explanation had some effect, but I think mostly people learned with experience that an unexercised branch usually did represent something concrete that wasn't tested.

The next big worry was whether the 80% coverage requirement was too high. We still don't know the answer to this. We have achieved just under 80% coverage for one component of about 7,000 lines, and I don't feel we put excessive amounts of effort into testing that component. We have just broken 50% for another component of about the same size, and those testers are saying that 70% might possibly be reasonable but that 80% will be absolutely infeasible. We will know in a month or so.

One difficulty is that many branches represent only a few lines of code and are executed only when extremely unusual errors occur. I have told everyone that I am comfortable counting specific branches as "covered" if the developers can compellingly argue that those branches have been confirmed correct by inspection. I think it is extremely beneficial to be able to decide that some branches are worth testing by execution and others should be confirmed by very careful inspection.

Sometime I'd like to try taking a component for which testing was complete and reviewing all branches that weren't covered. I wonder whether we would find substantial numbers of bugs.

=== Summary

- We require 50% branch coverage for alpha code and 80% coverage for beta code.
- Too much emphasis on branch coverage is dangerous. You must also make sure that you are testing everything in the specification, and that your test programs carefully confirm they are getting correct results. Beyond this, you must remember that creating deviously malicious and thorough test programs will always be an art. Branch coverage just tells you what target code you have exercised.
- We have found branch coverage to be very useful as part of a balanced testing program.

Dean Thompson
Transarc Corporation
dt@transarc.com
marick@m.cs.uiuc.edu (09/24/90)
One addendum to Dean's note: I am at the University of Illinois, but I'm a Motorola employee, here to do research in software testing. Although the branch tool was not funded by Motorola (it came about because two students needed an independent study project), it would not have happened without Motorola's support for me. I think the company deserves credit.

Brian Marick
Motorola @ University of Illinois
marick@cs.uiuc.edu, uiucdcs!marick
(Standard disclaimer: my opinions are my own, etc.)