sharam@munnari.oz.au (Sharam Hekmatpour) (10/12/89)
A number of people have commented about D. Knuth's "The Errors of TEX".
I read the paper and I think that he should be praised for such extreme
honesty. I find some of his suggestions very useful, but there are also ones
which I find inappropriate, if not shocking. This one more than any other:
"I found that structured programming greatly increased my confidence in the
correctness of the code, while the code still existed on paper. Therefore
I could wait until the whole program was written, before trying to debug any
of it. This saved a lot of time, because I did not have to prepare 'dummy'
versions of non-existent modules while testing modules that were already
written. I could test everything in its final environment..."
-- D. Knuth, The Errors of TEX, Software P&E, Vol 19(7).
This is the worst testing technique I have ever heard of.
Recently I finished writing a 40,000 line event-driven C program. I used
object-oriented design + structured programming. I agree that these methods
increase one's confidence in the 'correctness' of a program (and I should
also point out that OOD does much more in this respect than SP), but to
suggest that these methods justify the use of the Big Bang testing approach
is naive.
For my own program, I tested most modules first in isolation and then during
incremental integration. Of the 2 years that I spent on this program, about 6
months went into solid, systematic testing. Had I used Knuth's approach
I could perhaps have cut this time down to 3 months, but I doubt very
much that the whole thing would have worked in the end.
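(To make concrete what I mean by testing a module in isolation: a stub plus
a small driver is usually all it takes. The sketch below is purely
illustrative; the names dispatch and log_event are invented for this
posting, not taken from my program.)

    #include <stdio.h>
    #include <assert.h>

    /* Interface of a module that doesn't exist (or isn't trusted) yet. */
    void log_event(int event);

    /* The module under test: maps an event code to an action code and
       reports the event to the logging module. */
    int dispatch(int event)
    {
        if (event < 0)
            return -1;              /* reject invalid events */
        log_event(event);
        return event % 4;           /* pretend event -> action mapping */
    }

    /* Stub standing in for the real logging module. */
    static int log_calls = 0;

    void log_event(int event)
    {
        (void) event;               /* the real logger would do more */
        log_calls++;                /* just record that we were called */
    }

    /* A small driver that exercises dispatch() in isolation. */
    int main(void)
    {
        assert(dispatch(-1) == -1 && log_calls == 0);  /* invalid input rejected */
        assert(dispatch(0)  ==  0 && log_calls == 1);  /* boundary case */
        assert(dispatch(7)  ==  3 && log_calls == 2);  /* ordinary case */
        printf("dispatch: all isolation tests passed\n");
        return 0;
    }

The half hour this costs per module feels like a waste right up until
integration, when it starts paying for itself.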
That Knuth's testing approach worked for TEX may be due to other factors:
the nature of the application, its size, its complexity???
But let's not forget that he decided to scrap the original version and
rewrite it from scratch. I wonder if Big Bang testing had something to
do with this.
I must admit that I have done Big Bang testing in the past. Perhaps that's
why I'm so much against it. It invariably led to rewriting everything
from scratch.
A Big Bang scenario is usually something like this:
[1] Design and code with care and get more and more confident.
[2] Defer testing until everything can be put together. "Why should I waste
30 minutes writing stubs and testing 1 module when I can test 800 modules
in one go in 5 minutes?"
[3] Get real excited when the system is complete. "This is gonna work first
time."
[4] "Shit! It doesn't even compile."
[5] "At last! Compiled. Now it will run."
[6] "Damn! What could I have done wrong? I think I need a debugger cause I
have no idea what's causing these segmentation faults."
In the many weeks/months that the programmer spends wrestling with
the bugs, confidence goes downhill and impatience reaches its peak.
He will try anything just to see it working. His impatience
forces him to debug in front of a terminal, redesigning and recoding
many of the faulty pieces there and then. He has to see it working.
Tomorrow won't do. It has to be tonight. But one fix leads to
another and another and another...
[7] "At last! It runs."
But not for long. New bugs emerge every day and it becomes increasingly
difficult to patch them. What started as a good SP exercise is
now a beast. The design has become so unstable that everyone is scared
to touch it.
[8] "No, this won't do. I think now I have a pretty good idea how I should
have written it in the first place. Let's rewrite the whole thing."
The worst aspect of Big Bang is that it gives you no time to reconsider
your design. When it comes to testing, a whole army of design faults hit
you dead. You'll have no idea where to begin. The bugs are so many and
so interdependent that you don't know what's causing what. From this point
on it's all hacking. This is where false confidence gets you.
Systematic testing, however, gives you justified confidence. When it comes
to integration, at least you have a good idea how reliable each component
is. There will be fewer errors and they can be quickly located and corrected.
To sum up my argument: no matter how good your design and coding techniques,
NEVER have confidence in a component until you have fully tested it.
Good design and good testing go hand-in-hand. Trying to cut corners with
one damages the other.
+---
| Sharam Hekmatpour <sharam@munnari.oz.au> |
Melbourne University ---+
eugene@eos.UUCP (Eugene Miya) (10/13/89)
Is any one sending this thread of software engineering discussion to Don?

Another gross generalization from

  --eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
    resident cynic at the Rock of Ages Home for Retired Hackers:
    "You trust the `reply' command with all those different mailers out there?"
    "If my mail does not reach you, please accept my apology."
    {ncar,decwrl,hplabs,uunet}!ames!eugene
    Support the Free Software Foundation (FSF)
mjl@cs.rit.edu (10/13/89)
In article <2402@munnari.oz.au> sharam@munnari.oz.au (Sharam Hekmatpour)
writes:
[Comments about Don Knuth's version of Big Bang testing the whole TEX
program at the end.]

>The worst aspect of Big Bang is that it gives you no time to reconsider
>your design. When it comes to testing, a whole army of design faults hit
>you dead. You'll have no idea where to begin. The bugs are so many and
>so interdependent that you don't know what's causing what. From this point
>on it's all hacking. This is where false confidence gets you.

At the risk of oversimplification, this "Big Bang" approach has been
packaged by Harlan Mills et al. under the name of Cleanroom Software
Engineering. Software products are never tested, or even compiled, by the
designers and implementers; instead, these folks spend their time on
rigorous verification of the design and implementation. Testing is done,
but by a separate group, and with the goal being reliability prediction and
statistical quality control rather than defect detection and removal.

The claim is that large software systems with defect rates way below the
industrial average have been developed this way. But it does require a
rigorous approach, and of course testing is still performed.

Reference: "Cleanroom Software Engineering", Mills, Dyer, and Linger,
IEEE Software, September 1987.

Mike Lutz
Rochester Institute of Technology, Rochester NY
UUCP:     {rutgers,cornell}!rochester!ritcv!mjl
CSNET:    mjl%rit@relay.cs.net
INTERNET: mjl@cs.rit.edu
edschulz@cbnewsj.ATT.COM (edward.d.schulz) (10/14/89)
In article <2402@munnari.oz.au>, sharam@munnari.oz.au (Sharam Hekmatpour)
writes:
>
> A number of people have commented about D. Knuth's "The Errors of TEX".
> I read the paper and I think that he should be praised for such extreme
> honesty. I find some of his suggestions very useful, but there are also ones
> which I find inappropriate, if not shocking. This one more than any other:
>
> "I found that structured programming greatly increased my confidence in the
> correctness of the code, while the code still existed on paper. Therefore
> I could wait until the whole program was written, before trying to debug any
> of it. This saved a lot of time, because I did not have to prepare 'dummy'
> versions of non-existent modules while testing modules that were already
> written. I could test everything in its final environment..."
> -- D. Knuth, The Errors of TEX, Software P&E, Vol 19(7).
>
> This is the worst testing technique I have ever heard of.

This is not shocking; it's consistent with the experiences of people who use
one of the best testing techniques I have ever heard of. I suggest you read
the papers on Cleanroom Software Engineering (some references follow). I
spent a week this summer studying with Harlan Mills and Richard Linger, who
are at the leading edge of this technology. My organization has not yet used
Cleanroom Engineering, but I might be able to answer any questions beyond my
notes here.

The Cleanroom process does not call for a "big-bang" test of the entire
system (more like a few medium bangs?), but it does emphasize the prevention
of errors to begin with, rather than removing them later. Using formal
design methods and mathematics-based functional verification by humans,
people have demonstrated the ability to create nearly defect-free software
before any execution or debugging: fewer than five defects per thousand
lines of code. Several significant software systems at IBM Federal Systems
Division (one example: 80 KLOC) have been developed in the Cleanroom
discipline, with no debugging before usage testing and reliability
certification. Smaller student Cleanroom projects have been successful at
the Universities of Maryland, Tennessee, and Florida.

> To sum up my argument: no matter how good your design and coding techniques,
> NEVER have confidence in a component until you have fully tested it.

"Testing can show the presence of bugs, not their absence."
                                                - E. W. Dijkstra

"No matter how intelligently conceived, reliability evidence of coverage
testing is entirely anecdotal, not scientific."
                                                - Harlan Mills

The Cleanroom method employs statistically based independent testing of each
increment of the program, where an increment is typically 5K to 20K new
source lines. The interfailure times during such testing are fed into a
reliability model to track the reliability growth of the program during
development. You can never "fully test" any interesting piece of software.
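To make the interfailure-time idea a bit more concrete, here is a toy sketch
of my own (in C; it is NOT the certification model described in the
references below, and the numbers are invented):

    /* Toy sketch only: illustrates turning interfailure times observed
       during statistical testing into a crude reliability-growth trend.
       This is not the Cleanroom certification model; the data are made up. */
    #include <stdio.h>

    #define NFAIL 8

    int main(void)
    {
        /* Hours of statistical testing between successive failures. */
        double interfail[NFAIL] = { 2.0, 3.5, 3.0, 6.0, 9.5, 14.0, 22.0, 40.0 };
        double total = 0.0;
        int i;

        printf("failure  hours since last  running MTTF estimate\n");
        for (i = 0; i < NFAIL; i++) {
            total += interfail[i];
            /* Crude estimate: mean interfailure time so far. A real
               reliability model would fit a growth curve and extrapolate,
               but the upward trend is the point. */
            printf("%7d  %16.1f  %21.2f\n",
                   i + 1, interfail[i], total / (i + 1));
        }
        return 0;
    }

The real Cleanroom work uses a proper statistical model and certified test
samples; the sketch only shows the kind of trend the independent testers are
watching.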
Here are some references for those who wish to learn more:

[1] H. D. Mills, M. Dyer, and R. C. Linger, "Cleanroom Software Engineering,"
    IEEE Software, September 1987, pp. 19-25.
[2] R. W. Selby, V. R. Basili, and F. T. Baker, "Cleanroom Software
    Development: An Empirical Evaluation," IEEE Transactions on Software
    Engineering, Vol. SE-13, No. 9, September 1987, pp. 1027-1037.
[3] H. D. Mills and J. H. Poore, "Bringing Software Under Statistical
    Quality Control," Quality Progress, November 1988, pp. 52-55.
[4] R. C. Linger and H. D. Mills, "A Case Study in Cleanroom Software
    Engineering: The IBM COBOL Structuring Facility," Proceedings of
    COMPSAC '88, IEEE Computer Society Press, 1988.

Does anyone else on the net have any experience, impressions, opinions, etc.
on this topic? I'd like to hear some discussion...
--
Ed Schulz, AT&T, Room 2P276
200 Laurel Ave., Middletown, NJ 07748
+1 201 957 3899
e_d_schulz@att.com or eds@mtdcb.att.com
rcd@ico.ISC.COM (Dick Dunn) (10/14/89)
sharam@munnari.oz.au (Sharam Hekmatpour) writes about D. Knuth's
"The Errors of TEX":
> ...I find some of his suggestions very useful, but there are also ones
> which I find inappropriate, if not shocking. This one more than any other:
>...I could wait until the whole program was written, before trying to debug any
> of it. This saved a lot of time, because I did not have to prepare 'dummy'
> versions of non-existent modules... I could test everything in its
> final environment..."
>...This is the worst testing technique I have ever heard of.

I tend to agree that it sounds pretty awful...and I've experienced it. I was
once called in to fire-fight on a project where a large program had been
going for months and months of debugging without getting to even the
rudimentary stages of doing bits of useful work. The debugging approach was
"bang on it 'til it breaks, then find it and fix it." What was happening,
quite predictably, was that there were bugs at all levels of the code, down
to the most basic utility routines (like I/O and storage management). When
something caved in, there was no easy way to isolate it. There was no way to
make a decent probability-guess on where to look, because ALL of the code
was suspect. Multiple errors at different levels masking one another were
the rule, not the exception.

Still, Knuth is not a fool, and TeX is a serious program. I think it would
be better to look at WHY this technique seemed to work for him, instead of
just recoiling in horror. (It's OK to recoil in horror first...but then look
beyond it. :-)

> But let's not forget that he decided to scrap the original version and
> rewrite it from scratch. I wonder if Big Bang testing had something to
> do with this.

I tend to doubt it. Usually if you decide to rewrite from scratch, it's to
do a major reworking of the program structure. Program structure isn't (or
shouldn't be) much affected by testing technique; the structure is all set
by the time testing comes around.

Most programs of any size need to be scrapped and rewritten from scratch.
The first version teaches you the things you needed to know but couldn't
find out just by thinking about the problem. With reasonable luck, the first
version also keeps you going with enough breathing room to write the second
version.
--
Dick Dunn    rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd    (303)449-2870
   ...No DOS.  UNIX.
gibsong@calvin-klein.UUCP (Greggo) (10/14/89)
In article <2402@munnari.oz.au>, sharam@munnari.oz.au (Sharam Hekmatpour)
writes (among other things):
>
> The worst aspect of Big Bang is that it gives you no time to reconsider
> your design. When it comes to testing, a whole army of design faults hit
> you dead. You'll have no idea where to begin. The bugs are so many and
> so interdependent that you don't know what's causing what. From this point
> on it's all hacking. This is where false confidence gets you.
>
> Systematic testing, however, gives you justified confidence. When it comes
> to integration, at least you have a good idea how reliable each component
> is. There will be fewer errors and they can be quickly located and corrected.
>
> [ and from earlier in the posting: ]
>
> For my own program, I tested most modules first in isolation and then during
> incremental integration. Of the 2 years that I spent on this program, about 6
> months went into solid, systematic testing. Had I used Knuth's approach
> I could perhaps have cut this time down to 3 months, but I doubt very
> much that the whole thing would have worked in the end.

It sounds to me as if your method and Knuth's achieve much the same end, but
that your "redesign of the system" occurs little by little as you test each
piece, find a problem, correct the code, retest, etc. Whereas Knuth used
more of a strict waterfall model of "design all, then test all, then
redesign all", your method is more iterative: "design a little, test a
little, redesign that, design some more, ...". That's why your estimate of
testing with Knuth's method was 1/2 of what you actually spent; you were
indeed testing 2 systems (one original, one redesigned) instead of one.

I personally prefer the iterative method, mainly because my tolerance for
doing the same activity (design, test, whatever) for a long time (several
months) is much lower than if I mix up my work and keep some variety. Note,
however, that to meet time-to-market and competitive demands (as discussed
in other postings), the iterative method can mean longer initial development
intervals (although with cheaper maintenance), so many companies still stick
to the waterfall model to get a product out and make a little moola (at the
cost of greater maintenance expense later).
cweir@richsun.UUCP (Charles Weir) (10/19/89)
In article <2402@munnari.oz.au> sharam@munnari.oz.au (Sharam Hekmatpour)
writes:
>
>A number of people have commented about D. Knuth's "The Errors of TEX".
> [...] Therefore
> I could wait until the whole program was written, before trying to debug any
> of it. This saved a lot of time, because I did not have to prepare 'dummy'
> versions of non-existent modules while testing modules that were already
> written. I could test everything in its final environment..."
> -- D. Knuth, The Errors of TEX, Software P&E, Vol 19(7).
>
>This is the worst testing technique I have ever heard of.
> [....] That's why I'm so much against it.
> It invariably leads to rewriting everything from scratch.

Yes, and why?

Part of the problem is just our own human limitations. We can keep at most
about 9 (nine) things in our minds at one time. A well-designed module with
a limited number of interfaces will have on the order of that many things
(interfaces, globals, whatever) to think about, match, and test. So we can
cope with modules.

A big bang - everything at once - debug session requires the engineer to
keep dozens, if not thousands, of things in mind at once.

Impossible.

So it doesn't work.

Charles Weir
Own Opinions...
robert@isgtec.UUCP (Robert Osborne) (10/23/89)
In article <624@richsun.UUCP> cweir@richsun.UUCP (Charles Weir) writes:
>Part of the problem is just our own human limitations. We can keep at most
>about 9 (nine) things in our minds at one time. A well-designed module with
>a limited number of interfaces will have on the order of that many things
>(interfaces, globals, whatever) to think about, match, and test. So we can
>cope with modules.

Debugging usually consists of finding undesired behaviour and thinking
"where could that have happened?". Being able to track 7-9 things in our
short-term memory shouldn't really affect that process.

>A big bang - everything at once - debug session requires the engineer to
>keep dozens, if not thousands, of things in mind at once.
>
>Impossible.
>
>So it doesn't work.

I have found that systems with a large percentage of user interface are
best tested with the "big bang" method. Since interaction of the various
user-controlled subsystems is the source of most of the bugs, it is best to
get all the subsystems running together as soon as possible. Having each
subsystem working perfectly on its own doesn't really get you that much
closer to the end of the project.

An advantage of not using the "big bang" method is that you get a constant
mix of development and debug, and (hopefully) won't get bored with either
one.

Rob.
--
Robert A. Osborne   ...uunet!mnetor!lsuc!isgtec!robert
(Nice sig Bruce mind if I steal it :-)   ...utzoo!lsuc!isgtec!robert
ISG Technologies Inc. 3030 Orlando Dr. Mississauga, Ont. Can. L4V 1S8