[comp.software-eng] The Errors of TEX

sharam@munnari.oz.au (Sharam Hekmatpour) (10/12/89)

A number of people have commented about D. Knuth's "The Errors of TEX".
I read the paper and I think that he should be praised for such extreme
honesty. I find some of his suggestions very useful, but there are also ones
which I find inappropriate, if not shocking. This one more than any other:

  "I found that structured programming greatly increased my confidence in the
   correctness of the code, while the code still existed on paper. Therefore
   I could wait until the whole program was written, before trying to debug any
   of it. This saved a lot of time, because I did not have to prepare 'dummy'
   versions of non-existent modules while testing modules that were already
   written. I could test everything in its final environment..."
	-- D. Knuth, The Errors of TEX, Software P&E, Vol 19(7).

This is the worst testing technique I have ever heard of.
Recently I finished writing a 40,000 line event-driven C program. I used
object-oriented design + structured programming. I agree that these methods
increase one's confidence in the 'correctness' of a program (and I should
also point out that OOD does much more in this respect than SP), but to
suggest that these methods justify the use of the Big Bang testing approach
is naive.

For my own program, I tested most modules first in isolation and then during
incremental integration. Of the 2 years that I spent on this program about 6
months of it was solid, systematic testing. Had I used Knuth's approach
I could have perhaps cut down this time to 3 months, but I doubt very
much if the whole thing would have worked in the end.
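
To make "testing a module in isolation" concrete, here is a minimal sketch
in C (all the names are invented for illustration): the module under test
is linked against a hand-written stub instead of its real dependency, and
a small driver exercises it with asserts.

    /* Stub-based unit test sketch.  lookup_symbol() normally lives in a
     * separate symbol-table module; a stub stands in for it here so that
     * accept_token() can be tested in isolation. */
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    int lookup_symbol(const char *name);    /* the dependency's interface */

    /* Stub: pretend the only defined symbol is "known". */
    int lookup_symbol(const char *name)
    {
        return strcmp(name, "known") == 0;
    }

    /* Module under test: accept a token only if its symbol is defined. */
    int accept_token(const char *token)
    {
        return lookup_symbol(token);
    }

    int main(void)
    {
        assert(accept_token("known") == 1);     /* defined symbol accepted */
        assert(accept_token("missing") == 0);   /* undefined one rejected  */
        printf("accept_token: unit tests passed\n");
        return 0;
    }

During incremental integration the stub is simply replaced by the real
module and the same driver is run again.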

That Knuth's testing approach worked for TEX may be due to other
factors: the nature of the application, its size, its complexity ???
But let's not forget that he decided to scrap the original version and
rewrite it from scratch. I wonder if Big Bang testing had something to
do with this.

I must admit that I have done Big Bang testing in the past. Perhaps that's
why I'm so much against it. It invariably led to rewriting everything
from scratch.

A Big Bang scenario is usually something like this:

[1] Design and code with care and get more and more confident.

[2] Defer testing until everything can be put together. "Why should I waste
    30 minutes writing stubs and testing 1 module when I can test 800 modules
    in one go in 5 minutes?"

[3] Get real excited when the system is complete. "This is gonna work first
    time."

[4] "Shit! It doesn't even compile."

[5] "At last! Compiled. Now it will run."

[6] "Damn! What could I have done wrong? I think I need a debugger cause I
     have no idea what's causing these segmentation faults."
     In the many weeks/months that the programmer spends wrestling with
     the bugs, confidence goes downhill and impatience reaches its peak.
     He would try anything just to see it working. His impatience
     forces him to debug in front of a terminal, redesign and recode
     many of the faulty pieces there and then. He has to see it working.
     Tomorrow won't do. It has to be tonight. But one fix leads to
     another and another and another...

[7] "At last! It runs."
     But not for long. New bugs emerge every day and it becomes increasingly
     difficult to patch them. What started as a good SP exercise is
     now a beast. The design has become so unstable that nobody dares
     to touch it.

[8] "No, this won't do. I think now I have a pretty good idea how I should
     have written it in the first place. Let's rewrite the whole thing."

The worst aspect of Big Bang is that it gives you no time to reconsider
your design. When it comes to testing, a whole army of design faults hit
you dead. You'll have no idea where to begin. The bugs are so many and
so interdependent that you don't know what's causing what. From this point
on it's all hacking. This is where false confidence gets you.

Systematic testing, however, gives you justified confidence. When it comes
to integration, at least you have a good idea how reliable each component
is. There will be fewer errors, and these can be quickly located and corrected.

To sum up my argument: no matter how good your design and coding techniques,
NEVER have confidence in a component until you have fully tested it.
Good design and good testing go hand-in-hand. Trying to cut corners with
one damages the other.

+---
| Sharam Hekmatpour <sharam@munnari.oz.au> |
  Melbourne University			---+

eugene@eos.UUCP (Eugene Miya) (10/13/89)

Is anyone sending this thread of software engineering discussion to Don?

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "You trust the `reply' command with all those different mailers out there?"
  "If my mail does not reach you, please accept my apology."
  {ncar,decwrl,hplabs,uunet}!ames!eugene
  		Support the Free Software Foundation (FSF)

mjl@cs.rit.edu (10/13/89)

In article <2402@munnari.oz.au> sharam@munnari.oz.au (Sharam Hekmatpour) writes:

[Comments about Don Knuth's version of Big Bang testing the whole TEX program
at the end.]

>The worst aspect of Big Bang is that it gives you no time to reconsider
>your design. When it comes to testing, a whole army of design faults hit
>you dead. You'll have no idea where to begin. The bugs are so many and
>so interdependent that you don't know what's causing what. From this point
>on it's all hacking. This is where false confidence gets you.
>

At the risk of oversimplification, this "Big Bang" approach has been
packaged by Harlan Mills et al. under the name of Cleanroom
Software Engineering.  Software products are never tested, or even
compiled, by the designers and implementers; instead, these folks spend
their time on rigorous verification of the design and implementation.
Testing is done, but by a separate group, and with the goal being
reliability prediction and statistical quality control rather than
defect detection and removal.  The claim is that large software systems
with defect rates well below the industry average have been developed this
way.  But it does require a rigorous approach, and of course testing is
still performed.

Reference: "Cleanroom Software Engineering", Mills, Dyer, and Linger,
IEEE Software, September 1987.

>
>+---
>| Sharam Hekmatpour <sharam@munnari.oz.au> |
>  Melbourne University			---+


Mike Lutz	Rochester Institute of Technology, Rochester NY
UUCP:		{rutgers,cornell}!rochester!ritcv!mjl
CSNET:		mjl%rit@relay.cs.net
INTERNET:	mjl@cs.rit.edu

edschulz@cbnewsj.ATT.COM (edward.d.schulz) (10/14/89)

In article <2402@munnari.oz.au>, sharam@munnari.oz.au (Sharam Hekmatpour) writes:
> 
> A number of people have commented about D. Knuth's "The Errors of TEX".
> I read the paper and I think that he should be praised for such extreme
> honesty. I find some of his suggestions very useful, but there are also ones
> which I find inappropriate, if not shocking. This one more than any other:
> 
>   "I found that structured programming greatly increased my confidence in the
>    correctness of the code, while the code still existed on paper. Therefore
>    I could wait until the whole program was written, before trying to debug any
>    of it. This saved a lot of time, because I did not have to prepare 'dummy'
>    versions of non-existent modules while testing modules that were already
>    written. I could test everything in its final environment..."
> 	-- D. Knuth, The Errors of TEX, Software P&E, Vol 19(7).
> 
> This is the worst testing technique I have ever heard of.

This is not shocking; it's consistent with the experiences of people who
use one of the best testing techniques I have ever heard of.

I suggest you read the papers on Cleanroom Software Engineering (some
references follow).  I spent a week this summer studying with Harlan
Mills and Richard Linger, who are at the leading edge of this
technology.  My organization has not yet used Cleanroom Engineering, but
I might be able to answer any questions beyond my notes here.

The Cleanroom process does not call for a "big-bang" test of the entire
system (more like a few medium bangs?), but does emphasize the
prevention of errors to begin with, rather than removing them later.
Using formal design methods and mathematics-based functional
verification by humans, people have demonstrated the ability to create
nearly defect-free software before any execution or debugging: fewer than
five defects per thousand lines of code.  Several significant software
systems at IBM Federal Systems Division (one example: 80 KLOC) have been
developed in the Cleanroom discipline, with no debugging before usage
testing and reliability certification.  Smaller student Cleanroom
projects have been successful at the Universities of Maryland,
Tennessee, and Florida.
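
As a rough illustration of what "functional verification by humans" can
look like (a toy only, not the actual Cleanroom notation), one states the
intended function of a piece of code and argues the code correct against
it on paper, before it is ever compiled or run:

    /* Intended function: sum(a, n) == a[0] + a[1] + ... + a[n-1], n >= 0. */
    #include <assert.h>

    long sum(const int a[], int n)
    {
        long s = 0;
        int i;

        /* Invariant: before each test of (i < n), s == a[0] + ... + a[i-1].
         * It holds initially (i == 0, s == 0, empty sum); each iteration
         * adds a[i] and then increments i, preserving it.  At exit i == n,
         * so s equals the intended sum.  Termination: n - i decreases by
         * one on every iteration. */
        for (i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* In the Cleanroom discipline this execution step would be left to
     * the independent test group; it is shown only as a sanity check. */
    int main(void)
    {
        int a[] = { 3, 1, 4, 1, 5 };
        assert(sum(a, 5) == 14);
        assert(sum(a, 0) == 0);
        return 0;
    }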

> To sum up my argument: no matter how good your design and coding techniques,
> NEVER have confidence in a component until you have fully tested it.

"Testing can show the presence of bugs, not their absence." - E. W.
Dijkstra

"No matter how intelligently conceived, reliability evidence of coverage
testing is entirely anecdotal, not scientific." - Harlan Mills

The Cleanroom method employs statistically based independent testing of
each increment of the program, where an increment is typically 5K to 20K
new source lines. The interfailure times during such testing are fed
into a reliability model to track the reliability growth of the program
during development. You can never "fully test" any interesting piece of
software.
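
To give a feel for what "fed into a reliability model" involves, here is a
minimal sketch (the interfailure times are invented, and a running MTBF is
a far cruder statistic than the certification models the Cleanroom work
actually uses): as statistical testing proceeds, the estimate should grow
if the increment's reliability is improving.

    /* Crude reliability-growth tracking: after each observed failure,
     * print the cumulative test time and a running MTBF estimate. */
    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical hours of statistical testing between successive
         * failures of one increment. */
        double t[] = { 1.5, 2.0, 4.5, 6.0, 11.0, 19.5 };
        int n = sizeof t / sizeof t[0];
        double total = 0.0;
        int i;

        for (i = 0; i < n; i++) {
            total += t[i];
            printf("failure %d: %6.1f h total, running MTBF %6.2f h\n",
                   i + 1, total, total / (i + 1));
        }
        /* A steadily rising MTBF suggests reliability growth; a flat or
         * falling one suggests the increment is not ready to certify. */
        return 0;
    }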

Here are some references for those who wish to learn more:

[1] H. D. Mills, M. Dyer, and R. C. Linger, "Cleanroom Software
    Engineering," IEEE Software, September 1987, pp. 19-25.

[2] R. W. Selby, V. R. Basili, and F. Terry Baker, "Cleanroom Software
    Development: An Empirical Evaluation," IEEE Transactions on Software
    Engineering, Vol. SE-13, No. 9, September 1987, pp. 1027-1037.

[3] Harlan D. Mills and J. H. Poore, "Bringing Software Under Statistical
    Quality Control," Quality Progress, November 1988, pp. 52-55.

[4] R. C. Linger and H. D. Mills, "A Case Study in Cleanroom Software
    Engineering: The IBM COBOL Structuring Facility," Proceedings of
    COMPSAC '88, IEEE Computer Society Press, 1988.

Does anyone else on the net have any experience, impressions, opinions,
etc. on this topic?  I'd like to hear some discussion...
-- 
--
Ed Schulz, AT&T, Room 2P276 200 Laurel Ave., Middletown, NJ 07748
+1 201 957 3899     e_d_schulz@att.com    or    eds@mtdcb.att.com

rcd@ico.ISC.COM (Dick Dunn) (10/14/89)

sharam@munnari.oz.au (Sharam Hekmatpour) writes about D. Knuth's "The
Errors of TEX":
> ...I find some of his suggestions very useful, but there are also ones
> which I find inappropriate, if not shocking. This one more than any other:
>...I could wait until the whole program was written, before trying to debug any
>    of it. This saved a lot of time, because I did not have to prepare 'dummy'
>    versions of non-existent modules... I could test everything in its
>    final environment..."

>...This is the worst testing technique I have ever heard of.

I tend to agree that it sounds pretty awful...and I've experienced it.  I
was once called in to fire-fight on a project where a large program had
been going for months and months of debugging without getting to even the
rudimentary stages of doing bits of useful work.  The debugging approach
was "bang on it 'til it breaks, then find it and fix it."  What was hap-
penning, quite predictably, was that there were bugs at all levels of the
code down to the most basic utility routines (like I/O and storage manage-
ment).  When something caved in, there was no easy way to isolate it.
There was no way to make a decent probability-guess on where to look,
because ALL of the code was suspect.  Multiple errors at different levels
masking one another was the rule, not the exception.

Still, Knuth is not a fool, and TeX is a serious program.  I think it would
be better to look at WHY this technique seemed to work for him, instead of
just recoiling in horror.  (It's OK to recoil in horror first...but then
look beyond it.:-)

> But let's not forget that he decided to scrap the original version and
> rewrite it from scratch. I wonder if Big Bang testing had something to
> do with this.

I tend to doubt it.  Usually if you decide to rewrite from scratch, it's
to do a major reworking of the program structure.  Program structure isn't
(or shouldn't be) much affected by testing technique; the structure is all
set by the time testing comes around.

Most programs of any size need to be scrapped and rewritten from scratch.
The first version teaches you the things you needed to know but couldn't
find out just by thinking about the problem.  With reasonable luck, the
first version also keeps you going with the breathing room to write the
second version.
-- 
Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
   ...No DOS.  UNIX.

gibsong@calvin-klein.UUCP (Greggo) (10/14/89)

In article <2402@munnari.oz.au>, sharam@munnari.oz.au (Sharam Hekmatpour) writes
(among other things):
> 
> The worst aspect of Big Bang is that it gives you no time to reconsider
> your design. When it comes to testing, a whole army of design faults hit
> you dead. You'll have no idea where to begin. The bugs are so many and
> so interdependent that you don't know what's causing what. From this point
> on it's all hacking. This is where false confidence gets you.
> 
> Systematic testing, however, gives you justified confidence. When it comes
> to integration, at least you have a good idea how reliable each component
> is. There will be fewer errors, and these can be quickly located and corrected.
> 
> [ and from earlier in the posting: ]
>
> For my own program, I tested most modules first in isolation and then during
> incremental integration. Of the 2 years that I spent on this program about 6
> months of it was solid, systematic testing. Had I used Knuth's approach
> I could have perhaps cut down this time to 3 months, but I doubt very
> much if the whole thing would have worked in the end.
> 

It sounds to me as if your method and Knuth's achieve much the same end, but
that your "redesign of the system" occurs little by little as you test each
piece, find a problem, correct the code, retest, etc.  Whereas Knuth used
more of a strict waterfall model of "design all, then test all, then redesign
all", your method is more iterative: "design a little, test a little, redesign
that, design some more, ...".  That's why your estimate of testing with
Knuth's method was 1/2 of what you actually spent; you were indeed testing 2
systems (one original, one redesigned) instead of one.

I personally prefer the iterative method, mainly because my tolerance for doing
the same activity (design, test, whatever) for a long time (several months)
is much lower than when I mix up my work and keep some variety.  Note, however,
that to meet time-to-market and competitive demands (as discussed in other
postings), the iterative method can mean longer initial development intervals
(although with cheaper maintenance), so many companies still stick to the
waterfall model to get a product out and make a little moola (at the cost
of greater maintenance expense later).

cweir@richsun.UUCP (Charles Weir) (10/19/89)

In article <2402@munnari.oz.au> sharam@munnari.oz.au (Sharam Hekmatpour) writes:
>
>A number of people have commented about D. Knuth's "The Errors of TEX".
> [...] Therefore
>   I could wait until the whole program was written, before trying to debug any
>   of it. This saved a lot of time, because I did not have to prepare 'dummy'
>   versions of non-existent modules while testing modules that were already
>   written. I could test everything in its final environment..."
>	-- D. Knuth, The Errors of TEX, Software P&E, Vol 19(7).
>
>This is the worst testing technique I have ever heard of.
> [....]  That's why I'm so much against it. 
> It invariably leads to rewriting everything from scratch.

Yes, and why?

Part of the problem is just our own human limitations.   We can keep
at most about 9 (nine) things in our minds at one time.   A
well-designed module with a limited number of interfaces will have about
that many things (interfaces, globals, whatever) to think about,
match, and test.   So we can cope with modules.

A big bang - everything at once - debug session requires the engineer to
keep dozens, if not thousands, of things in mind at once.

Impossible.

So it doesn't work.


Charles Weir

Own Opinions...

robert@isgtec.UUCP (Robert Osborne) (10/23/89)

In article <624@richsun.UUCP> cweir@richsun.UUCP (Charles Weir) writes:
>Part of the problem is just our own human limitations.   We can keep
>at most about 9 (nine) things in our minds at one time.   A
>well-designed module with a limited number of interfaces will have about
>that many things (interfaces, globals, whatever) to think about,
>match, and test.   So we can cope with modules.
Debugging usually consists of finding undesired behaviour and thinking
"where could that have happened".  Being able to track 7-9 things in
our short term memory shouldn't really affect that process.

>A big bang - everything at once - debug session needs the engineer to
>keep dozens, if not thousands of things in mind at once.
>
>Impossible.
>
>So it doesn't work.
I have found that systems with a large percentage of user interface are
best tested with the "big bang" method.  Since interactions of the various
user-controlled subsystems are the source of most of the bugs,  it is best to
get all the subsystems running together as soon as possible.  Having each
subsystem working perfectly on its own doesn't really get you that much
closer to the end of the project.

An advantage of not using the "big bang" method is that you get a constant
mix of development and debugging, and (hopefully) won't get bored of either
one.

Rob.
-- 
Robert A. Osborne                  ...uunet!mnetor!lsuc!isgtec!robert
(Nice sig Bruce mind if I steal it :-)    ...utzoo!lsuc!isgtec!robert
ISG Technologies Inc. 3030 Orlando Dr. Mississauga. Ont. Can. L4V 1S8