RISKS@CSL.SRI.COM (RISKS FORUM, Peter G. Neumann -- Coordinator) (09/08/86)
RISKS-LIST: RISKS-FORUM Digest,  Sunday, 7 September 1986  Volume 3 : Issue 50

        FORUM ON RISKS TO THE PUBLIC IN COMPUTER SYSTEMS
  ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Contents:
  Enlightened Traffic Management (Alan Wexelblat)
  Flight Simulator Simulators Have Faults (Dave Benson)
  Re: Flight Simulators and Software Bugs (Bjorn Freeman-Benson)
  Always Mount a Scratch Monkey (Art Evans)
  Re: supermarket crashes (Jeffrey Mogul)
  Machine errors - another point of view (Bob Estell)
  Human Behv. & FSM's (Robert DiCamillo)

The RISKS Forum is moderated.  Contributions should be relevant, sound, in
good taste, objective, coherent, concise, nonrepetitious.  Diversity is
welcome.  (Contributions to RISKS@CSL.SRI.COM, Requests to
RISKS-Request@CSL.SRI.COM)
(Back issues Vol i Issue j available in CSL.SRI.COM:<RISKS>RISKS-i.j.
Summary Contents in MAXj for each i; Vol 1: RISKS-1.46; Vol 2: RISKS-2.57.)

----------------------------------------------------------------------

Date: Thu, 4 Sep 86 09:58:49 CDT
From: Alan Wexelblat <wex@mcc.com>
To: risks@csl.sri.com
Subject: Enlightened Traffic Management

The Austin rag carried the following brief item off the AP wire:

  NEW DELHI, India (AP) - The computer lost the battle with the commuter.
  "Enlightened traffic management" was the term for New Delhi's new
  computerized bus routes, but four days of shattered windows, deflated
  tires and protest marches convinced the bus company that its computer
  was wrong.

  The routes dictated by the computer proved exceedingly unpopular with
  passengers, who claimed that they were not being taken where they wanted
  to go.  Bowing to demand, the New Delhi Transport Corp. scrapped the new
  "rationalized" routes and restored 114 old routes.

  "The computer has failed," shouted thousands of victorious commuters in
  eastern New Delhi Tuesday night after transport officials drove around
  in jeeps, using loudspeakers to announce the return of the old routes.

COMMENTS: At first, I thought this was pretty amusing; deflated tires is a
computer risk I hadn't heard of before.  But the whole attitude of the
article (and seemingly of the people) annoyed me.  The machine is taking
the rap, and I'll bet that the idiot who programmed it to produce "optimal"
routes will get off scot-free.  Not to mention the company execs who failed
to understand their customer base and allowed the computer to "dictate" new
routes.  ARGH!

Alan Wexelblat
UUCP: {seismo, harvard, gatech, pyramid, &c.}!ut-sally!im4u!milano!wex

------------------------------

Date: Wed, 3 Sep 86 17:01:17 pdt
From: Dave Benson <benson%wsu.csnet@CSNET-RELAY.ARPA>
To: risks%csl.sri.com@CSNET-RELAY.ARPA
Subject: Flight Simulator Simulators Have Faults

|I developed flight simulators for over 7 years and could describe many such
|bizarre incidents.

Might be interesting for RISKS if these suggest problems in developing
risk-free software...

|To point out a failure during testing (or more likely development) seems
|meaningless.  Failures that make it into the actual product are what should
|be of concern.

I do not agree.  We need to understand that the more faults found at any
stage of engineering software, the less confidence one has in the final
product.  The more faults found, the higher the likelihood that faults
remain.  I simply mentioned this one because it appears to demonstrate
that, for all the claims made for careful analysis and review of
requirements and design, current practice in fact leaves such obvious
faults to be found by testing.
|As for the effectiveness of simulators...

Simulators are wonderful.  Surely nothing I wrote suggested otherwise.

Upon further inquiry, the blank sky was in a piece of software used to
simulate the flight simulator hardware.  The software specs essentially
duplicated the functions proposed for the hardware.  So the hardware was
going to take the trigonometric tangent of the pitch angle.  The software
simulator of the flight simulator indeed demonstrated that one ought not
take the tangent of 90 degrees.

So somebody with presumably a good background in engineering mathematics
simply failed to think through the most immediate consequences of the
trigonometric tangent function.  Nobody noticed this in any kind of review;
nobody THOUGHT about it at all.  Since nobody bothered to think, the fault
was found by writing a computer program and then observing the obvious.

I suggest that this inability to think bodes ill for the practice of
software engineering and the introduction of "advanced techniques" such as
fault-tree analysis.  I suggest that such examples of pronounced
inattention to well-known mathematics are part of the reason for the
lengthy testing sequences the military requires.  And I suggest that the
fact that it appears necessary to mention all this yet once again suggests
that there are many people doing "software engineering" who have failed to
grasp what a higher education is supposed to be about.  I certainly do not
expect perfection, but the trigonometric tangent is an example of an
elementary function.
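[A minimal sketch, in C, of the hazard described above.  This is not the
simulator code in question; the names naive_tan_of_pitch and
guarded_tan_of_pitch, the 89-degree guard band, and the clamping strategy
are assumptions made only for illustration.  It shows the tangent blowing
up as the pitch angle approaches 90 degrees, and one conventional guard.]

    /* Sketch only: illustrates the tan(90 degrees) hazard.  Names, guard
     * band, and clamping strategy are assumed, not taken from the actual
     * simulator software. */
    #include <stdio.h>
    #include <math.h>

    #define PI            3.14159265358979323846
    #define DEG_TO_RAD    (PI / 180.0)
    #define MAX_PITCH_DEG 89.0  /* arbitrary guard below the singularity */

    /* Naive version: near +/-90 degrees the result is astronomically
     * large (not a true infinity only because pi/2 is not exactly
     * representable in floating point). */
    double naive_tan_of_pitch(double pitch_deg)
    {
        return tan(pitch_deg * DEG_TO_RAD);
    }

    /* Guarded version: clamp the pitch angle away from the singularity
     * before taking the tangent. */
    double guarded_tan_of_pitch(double pitch_deg)
    {
        if (pitch_deg >  MAX_PITCH_DEG) pitch_deg =  MAX_PITCH_DEG;
        if (pitch_deg < -MAX_PITCH_DEG) pitch_deg = -MAX_PITCH_DEG;
        return tan(pitch_deg * DEG_TO_RAD);
    }

    int main(void)
    {
        double angles[] = { 0.0, 45.0, 89.0, 89.999, 90.0 };
        int i;

        for (i = 0; i < 5; i++)
            printf("pitch %8.3f deg:  naive = %12.4e  guarded = %12.4e\n",
                   angles[i], naive_tan_of_pitch(angles[i]),
                   guarded_tan_of_pitch(angles[i]));
        return 0;
    }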
------------------------------

Date: Fri, 5 Sep 86 10:02:38 PDT
From: bnfb@uw-june.arpa (Bjorn Freeman-Benson)
To: RISKS@CSL.SRI.COM
Subject: Re: Flight Simulators and Software Bugs

In RISKS-3.48, Gary Whisenhunt talks about how he developed flight
simulators and that he "..seriously doubt[s] that the sky went blank in
the B-1 simulator when it was delivered to the government."  And then he
goes on to point out all the specs it had to pass.  I don't know one way
or the other, but I want to point out that the sky going blank indicates
either a design problem or an implementation problem.  If it is a design
problem, who knows how many other serious (sky-blanking serious) problems
exist?  Will the MIL standards catch them all?  If it is an implementation
error, who knows how many other similar coding errors that
sloppy/tired/etc. engineer made?  If it's a sign problem, what happens
when you back the plane up?  Will it go into an infinite-speed reverse?
The point I'm trying to make is that bugs are not independent, and if one
shows up, others like it usually exist.

Bjorn N Freeman-Benson
U of Washington, Comp Sci

------------------------------

Date: Wed 3 Sep 86 16:46:31-EDT
From: "Art Evans" <Evans@TL-20B.ARPA>
Subject: Always Mount a Scratch Monkey
To: Risks@CSL.SRI.COM

In another forum that I follow, one correspondent always adds the comment

    Always Mount a Scratch Monkey

after his signature.  In response to a request for explanation, he replied
somewhat as follows.  Since I'm reproducing without permission, I have
disguised a few things.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

My friend Bud used to be the intercept man at a computer vendor - the
person who took the call when an irate customer phoned.  Seems one day Bud
was sitting at his desk when the phone rang.

  Bud:   Hello.
  Voice: YOU KILLED MABEL!!
  B:     Excuse me?
  V:     YOU KILLED MABEL!!

This went on for a couple of minutes and Bud was getting nowhere, so he
decided to alter his approach to the customer.

  B: HOW DID I KILL MABEL?
  V: YOU PM'ED MY MACHINE!!

Well, to avoid making a long story even longer, I will abbreviate what had
happened.  The customer was a biologist at the University of Blah-de-blah,
and he had one of our computers that controlled gas mixtures that Mabel
(the monkey) breathed.  Now Mabel was not your ordinary monkey.  The
University had spent years teaching Mabel to swim, and they were studying
the effects that different gas mixtures had on her physiology.  It turns
out that the repair folks had just gotten a new Calibrated Power Supply
(used to calibrate analog equipment), and at their first opportunity
decided to calibrate the D/A converters in that computer.  This changed
some of the gas mixtures, and poor Mabel was asphyxiated.  Well, Bud then
called the branch manager for the repair folks:

  Manager: Hello.
  B: This is Bud.  I heard you did a PM at the University of Blah-de-blah.
  M: Yes, we really performed a complete PM.  What can I do for You?
  B: Can You Swim?

The moral is, of course, that you should always mount a scratch monkey.

~~~~~~~~~~~~~~~~~~~~~~

There are several morals here related to risks in the use of computers.
Examples include "If it ain't broken, don't fix it."  However, the
cautious philosophical approach implied by "always mount a scratch monkey"
says a lot that we should keep in mind.

Art Evans
Tartan Labs

------------------------------

From: mogul@decwrl.DEC.COM (Jeffrey Mogul)
Date: 4 Sep 1986 1614-PDT (Thursday)
To: risks@csl.sri.com
Subject: Re: supermarket crashes

One of the nearby Safeway supermarkets is open 24 hours and is quite
popular with late-night shoppers (it's known by some as the "Singles
Safeway").  Smart shoppers, however, used to avoid visiting just before
midnight, because that's when all the cash registers simultaneously went
out of operation for some sort of ritual (daily balances or somesuch).

I also discovered that this market, at least, is not immune to power
failures; I was buying a quart of milk one evening when a brief blackout
hit the area.  The lights were restored within minutes, but the computer
was dead, and the cashiers "knew" it would be a long time before it would
be up.  They weren't about to waste their fortuitous coffee break adding
things up by hand, perhaps because they couldn't even tell the price of
anything (or indeed, what it was, in the case of produce) without the
computer.

I don't often shop at that market, partly because the markets I do use
have cashiers who know what things are rather than relying on the
computer.  Some day, just for fun, I might mark a pound of pecans with the
code number for walnuts and see if I can save some money.

------------------------------

Date: Thu, 4 Sep 1986 21:27 EDT
From: LENOIL@XX.LCS.MIT.EDU
To: "SEFB::ESTELL" <estell%sefb.decnet@NWC-143B.ARPA>
Cc: risks <risks@CSL.SRI.COM>
Subject: Machine errors - another point of view

    A "machine" as seen by the applications programmer is already several
    layers [raw hardware, microcode, operating system kernel, run-time
    libraries, compiler]; and each layer is perhaps nearly a million
    pieces [IC's, lines of (micro)code] that may interact with nearly a
    million other pieces in other layers.

Interaction between one million pieces of a system is more than just an
exaggeration; it is horrendous engineering practice that should never be
seen.  Flow-graphs, dependency diagrams, top-down design - all are ways of
reducing interaction between system components to a small, manageable size
- the smaller the better.
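[A back-of-the-envelope sketch of the combinatorics behind this point.
The "bounded fan-out" model below, and the fan-out limit of 10, are
invented purely for illustration, not taken from the original message: a
fully connected set of n components has n(n-1)/2 potential pairwise
interactions, while a design that limits each component to a handful of
neighbors grows only linearly in n.]

    /* Sketch only: counts potential pairwise interactions for a fully
     * connected design versus one with a bounded fan-out.  The fan-out
     * model is an assumed illustration of what disciplined layering buys. */
    #include <stdio.h>

    /* Fully connected: every component may interact with every other. */
    double fully_connected_pairs(double n)
    {
        return n * (n - 1.0) / 2.0;
    }

    /* Bounded fan-out: each component interacts with at most k others,
     * so the number of interacting pairs grows only linearly with n. */
    double bounded_fanout_pairs(double n, double k)
    {
        return n * k / 2.0;
    }

    int main(void)
    {
        double n = 1.0e6;   /* one million components, as in the discussion */

        printf("fully connected:       %.2e interacting pairs\n",
               fully_connected_pairs(n));
        printf("fan-out limited to 10: %.2e interacting pairs\n",
               bounded_fanout_pairs(n, 10.0));
        return 0;
    }

(The two figures differ by about five orders of magnitude, which is the
scale of the discrepancy being argued about here.)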
The probability of designing a working system of one million
fully-connected components is near zero.  Furthermore, you seem to imply
that component interconnects can transcend abstraction boundaries (e.g.,
microcode <-> run-time libraries); this again is poor engineering
practice.  I don't disagree that rising system complexity is a problem
today, but you are several orders of magnitude off in your statement of
the problem.

Robert Lenoil

------------------------------

Date: Fri, 5 Sep 86 16:27:45 EDT
From: Robert DiCamillo <rdicamil@cc2.bbn.com>
Subject: Human Behv. & FSM's
To: risks@csl.sri.com
Cc: rdicamil@cc2.bbn.com

Comments on Bob Estell's "Machine Errors", RISKS Vol. 3, #49
(FSM's need friends too)

I have often felt the same way Bob Estell does - that the full scope of
(software) engineering is too vast for a mere mortal to comprehend.
However, I usually reassure myself with a good dose of computational
theory:

   * "... for all these reasons 'machines make errors' in much the same
   * sense that people mispronounce words or make mistakes driving."

I agree with the apparent analogy, but still cringe at the actual usage of
the word "error".  Webster's Ninth New Collegiate Dictionary defines error
as an "act involving unintentional deviation from truth or accuracy".  If
truth or accuracy for computers or finite state automata is defined to be
the mapping of all possible input states to output states, then
theoretically the only *unintentional* deviation from such truth (tables
or such) is the failure to map or correlate all possible input strings to
known or desired output states.

I have participated in a situation where the adoption of a non-standard
arbitration scheme did not take into account cycle stealing, and assembly
code actually had the values of operands corrupted, so that a branch
occurred on the opposite condition to the true data.  This was a bug that
only a logic analyzer could find, and it sent the hardware engineers back
to their drawing board.  You have no idea how strange it feels to tell
someone that the code actually took a branch wrong: prior to the branch
the data was true, but it always branched to the false address.  The
high-level DDT would never show the data to be false because of the
particular timing coincidences involved in using an in-circuit emulator;
it is very disturbing when even your debugger says all is well and tests
still fail operationally in the real system.

In the case of bus arbitration, an entire realm of undesirable input
strings should be eliminated if the timing constraints between competing
processes are properly enforced in hardware.  If they are not,
"unintentional deviation" from the arbitration scheme will occur, but that
"deviation" is really only another set of output states that serves no
desirable function.  However, you could sit down with a logic analyzer and
painfully construct a mapping of all possible input timing states to a bus
arbitration scheme, and map the output.  Hopefully the design engineers
did this when they made the specifications, even if they were not
exhaustive in testing every possible input string.

I believe it is improper to ascribe human behavior, especially
*unpredictability*, to the results of input strings that fall outside the
desired function of a finite state automaton.  In theory, an FSM can have
an undefined output for a given input, but in practice the definition of
this output usually depends upon the resolution of your measuring
instruments.
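[To make the "incomplete characteristic function" point concrete, here is
a toy sketch of a two-requester bus arbiter whose transition table leaves
one input combination unspecified.  The arbiter, its states, and its
inputs are invented for illustration and describe no real hardware; the
point is only that the "unpredictable" case is simply the table entry
nobody wrote down.]

    /* Sketch only: a toy two-requester bus arbiter as a finite state
     * machine.  States, inputs, and the deliberately unspecified table
     * entry are assumed for illustration. */
    #include <stdio.h>

    enum state { IDLE, GRANT_A, GRANT_B, UNSPECIFIED };

    static const char *state_name[] =
        { "IDLE", "GRANT_A", "GRANT_B", "UNSPECIFIED" };

    /* The characteristic function: next state given the current state and
     * the two request lines.  The (IDLE, req_a=1, req_b=1) case is left
     * out on purpose, standing in for an input the designer never
     * considered. */
    static enum state next_state(enum state s, int req_a, int req_b)
    {
        switch (s) {
        case IDLE:
            if (req_a && !req_b)  return GRANT_A;
            if (!req_a && req_b)  return GRANT_B;
            if (!req_a && !req_b) return IDLE;
            return UNSPECIFIED;   /* simultaneous requests: never defined */
        case GRANT_A:
            return req_a ? GRANT_A : IDLE;
        case GRANT_B:
            return req_b ? GRANT_B : IDLE;
        default:
            return UNSPECIFIED;   /* once unspecified, always unspecified */
        }
    }

    int main(void)
    {
        int a, b;

        /* Enumerate every input reachable from IDLE; the gap in the table
         * shows up immediately, with nothing "unpredictable" about it. */
        for (a = 0; a <= 1; a++)
            for (b = 0; b <= 1; b++)
                printf("IDLE with req_a=%d req_b=%d -> %s\n",
                       a, b, state_name[next_state(IDLE, a, b)]);
        return 0;
    }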
If an arbitration scheme appears to yield an indeterminate output when all
inputs are still within spec (proper input strings), then the
characteristic function of the FSM is not complete (well defined).
Practically, this could mean that a timing situation arose that the
designers couldn't or didn't see - maybe their analyzer didn't have the
resolution?  But it is still ultimately, and sometimes easily,
attributable to human oversight.  How much of the FSM's characteristic
function do you know about?  The part you never dealt with is not
necessarily "unpredictable".

Many important computational theories hinge on the conception that any
"solvable" problem can be realized in an arbitrarily complex FSM.  While
it may not be practical to build the machine, no one has yet been able to
disprove such assertions as Church's thesis with current silicon
architectures.  Computational theory still clings to this viewpoint, which
I see practically as: if output states seem indeterminate, you still
haven't found the correct way to cast inputs in a reliably measurable
form.

   * "But can we examine the millions of lines of code that comprise the
   * micro-instructions, the operating system, and the engineering
   * applications on a multi-processor system, and hope to understand
   * ALL the possible side-effects?"

Goals of good software/hardware design are to make it easy to categorize
all possible input strings, especially when they are countably infinite.
This is not the same as viewing the machine as somehow irrational and
unpredictable.  Good designs may have an ease to the completeness of their
characteristic function (CF).  This does not mean bad designs are
unpredictable, just perhaps too complex to realize or measure.
Anthropomorphizing is all too tempting.

Systems with many architectural layers have complex interactions.  Recent
discussion in RISKS has highlighted the small percentage of total
execution paths that are ever actually traced, but perhaps in
well-characterized FSM's such exhaustive testing can be cautiously
minimized.  If in fact the range of the CF is countably infinite, then
some method of limited testing is usually mandatory.  It's the part of the
FSM you don't know that you tend to ascribe human behavior to!

Maybe it does take some exposure to developing systems with both complete
and incomplete characteristic functions to get an intuition about how
closed the FSM has to be to give satisfactory performance for a specific
application.  Bus arbitration is a relatively critical control function in
most architectures and should be given a high priority.  I'm sure there
are many systems out there that work just on the verge of catastrophe as
sloppily implemented FSM's, at numerous levels.

Writing microcode, I tend to look at design issues architecturally;
however, some experts believe that new architectures may be invented that
will not be encompassed by contemporary computational theory.  In the
August 1986 SPECTRUM (from IEEE), the series of articles on optical
computing addresses this problem:

   * "In C. Lee Giles' view (a program manager at the Air Force Office of
   * Scientific Research in Washington, D.C.), theoretical computer
   * science has 'stuck its neck out' by saying that computational models
   * define anything that is computable, since it is unknown whether
   * there are tasks these models cannot perform that the human brain
   * can." .....
   *
   * (from the author, Trudy E. Bell) "It remains to be seen whether
   * (optical) neural network architectures represent a new computational
   * model."

I would love to prove some philosophers wrong about how "computable tasks"
can ultimately be cast in the form of FSM's.  The dawn of the
general-purpose optical computer architecture may well introduce new
models that require a new breed of non-FSM computational theory.  However,
I think that computer engineering will focus on getting good "old
fashioned" FSM's to work in the real world for a long time, and even at
this level of complexity there will always be bugs from human behavior,
not "machine behavior".

------------------------------

End of RISKS-FORUM Digest
************************