RISKS-LIST: RISKS-FORUM Digest,  Monday, 1 September 1986  Volume 3 : Issue 47

           FORUM ON RISKS TO THE PUBLIC IN COMPUTER SYSTEMS 
   ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Contents:
  Flight Simulators Have Faults (Dave Benson)
  Re: QA on nuclear power plants, the shuttle, and beer (Henry Spencer)
  Acts of God vs. Acts of Man (Nancy Leveson -- two messages)
  Computer Literacy (Mike McLaughlin)
  Another supermarket crash (Ted Lee)
  A supermarket does not grind to a halt (Brint Cooper)

The RISKS Forum is moderated.  Contributions should be relevant, sound, in good
taste, objective, coherent, concise, nonrepetitious.  Diversity is welcome. 
(Contributions to RISKS@CSL.SRI.COM, Requests to RISKS-Request@CSL.SRI.COM)
  (Back issues Vol i Issue j available in CSL.SRI.COM:<RISKS>RISKS-i.j.
  Summary Contents in MAXj for each i; Vol 1: RISKS-1.46; Vol 2: RISKS-2.57.)

----------------------------------------------------------------------

Date: Sat, 30 Aug 86 23:08:47 pdt
From: Dave Benson <benson%wsu.csnet@CSNET-RELAY.ARPA>
To: risks%csl.sri.com@CSNET-RELAY.ARPA
Subject:  Flight Simulators Have Faults

I mentioned the F-16 RISKS contributions to my Software Engineering class
yesterday.  After class, one of the students told me the following story about
the B-1 Flight Simulator. The student had been employed over the summer to
work on that project, thus having first-hand knowledge of the incident.

Seems that when a pilot attempts to loop the B-1 Flight Simulator,
the (simulated) sky disappears.  Why?  Well, the simulated aircraft
pitch angle was translated by the software into a visual image by
taking the trigonometric tangent somewhere in the code.  With the
simulated aircraft on its nose, the angle is 90 degrees and the
tangent routine just couldn't manage the infinities involved.  As I
understand the story, the monitors projecting the window view went blank.
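
For illustration only (the actual simulator code is not available here,
and all names below are invented), a minimal C sketch of how a
pitch-to-horizon mapping built on the tangent blows up as the nose
approaches vertical, and of the simple clamp that would keep the picture
on the screen:

    #include <math.h>
    #include <stdio.h>

    #define PI          3.14159265358979323846
    #define MAX_OFFSET  1.0e4     /* arbitrary clamp, for illustration */

    /* Hypothetical mapping from pitch angle to the vertical offset of
     * the drawn horizon.  tan() grows without bound as pitch nears 90
     * degrees, so an unguarded call yields a uselessly huge offset
     * (and a blank "sky"); clamping keeps the image well-defined.    */
    static double horizon_offset(double pitch_deg)
    {
        double t = tan(pitch_deg * PI / 180.0);

        if (t >  MAX_OFFSET) t =  MAX_OFFSET;  /* nose near straight up   */
        if (t < -MAX_OFFSET) t = -MAX_OFFSET;  /* nose near straight down */
        return t;
    }

    int main(void)
    {
        printf("%g\n", horizon_offset(45.0));  /* ordinary attitude        */
        printf("%g\n", horizon_offset(90.0));  /* looping through vertical */
        return 0;
    }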

Ah, me.  The B-1 is the first aircraft with the capability to loop?  Nope,
it's been done for about 70 years now...  The B-1 Flight Simulator is the
first flight simulator with the capability to allow loops?  Nope, seems to
me I've played with a commercially available Apple IIe program in which a
capable player could loop the simulated Cessna 180.  $$ to donuts that
military flight simulators with all functionality in software have been
allowing simulated loops for many years now.

Dick Hamming said something to the effect that while physicists stand on one
another's shoulders, computer scientists stand on one another's toes.  At
least standing on toes is better than this failure to do as well as a game
program...  Maybe software engineers dig one another's graves?

And this company wants to research Starwars software...  Jus' beam me up,
Scotty, there's no intelligent life here.

------------------------------

Date: Sun, 31 Aug 86 01:35:29 edt
From: decwrl!decvax!LOCAL!utzoo!henry@ucbvax.Berkeley.EDU
To: LOCAL!CSL.SRI.COM!RISKS
Subject: Re: QA on nuclear power plants, the shuttle, and beer

Equipment failures and human errors are common enough in any human endeavor;
the question is not whether they happen, but whether they present actual or
potential risks of serious consequences.  In this context the lack of
publicity is not at all surprising: one form of serious consequence is public 
hysteria over insignificant trivia.  When an attempt to reach a vacationing
brewery programmer gets blown up into stories of a total production shutdown
and impending beer shortage -- this, mind you, in an industry which is *not*
the focus of hostile propaganda campaigns and widespread irrational fears --
the people involved with nuclear plants have every reason to be very quiet
about even routine, unexciting, non-hazardous problems.

				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry

------------------------------

Date: 30 Aug 86 17:26:20 PDT (Sat)
From: Nancy Leveson <nancy@ICSD.UCI.EDU>
To: risks@csl.sri.com
Subject: Acts of God vs. Acts of Man

   >> From: Nancy Leveson <nancy at ICSD.UCI.EDU>
   >>  ...  My real quibble is with the term "computer errors"...

 > From: Herb Lin <lin at XX.LCS.MIT.EDU>
 >While I agree with the above sentiment for the most part, it strikes
 >me that there is a sense in which it is not true...

I would describe it as a mistake on the part of the designers, not the
computer.  It probably did exactly what the designers told it to do.
You certainly could not expect it somehow magically to come up with
a fix for the software bug that the designers put in.  Only under those
assumptions could the computer have made a mistake by not coming up
with such a fix.  

 >We can describe this as a design (and thus human) error.  But given
 >the circumstances, it is not implausible to argue that an "Act of God"
 >occurred.  

But unless I am working very fast, am distracted, or have taken drugs, you
have described all situations in which I make mistakes.  By this argument,
as long as I work slowly, pay attention, and remain sober, then I will never
make a mistake again -- I can attribute any stupid thing I do to God and
never have to accept responsibility.  I will also never have to do anything
to improve my performance since it is God's fault (or the computer's fault
or the fault of the complexity of the problem I am working on), and
therefore there is nothing I can do about it.  There is a difference between
responsibility and liability.  I can be responsible without being negligent
(although I argue below that the designers could be considered negligent
under the above hypothesized circumstances).

By Herb's definition, no accident can ever be ascribed to human error since:

 >It was a rare happening; 

All accidents are rare happenings.  And unless the human is extremely
incompetent, most human mistakes are rare happenings.  

 >we didn't understand it very well;

I rarely make mistakes when I understand things well (unless I am working
fast, not paying attention, or stoned).  

 >we could not predict that it would happen.  

If I could predict that I will make a mistake, then I either don't do
the thing or I fix the mistake I made.  I have yet to write a program
which I did not think was correct (although a few actually had mistakes
in them).  In fact, it is precisely the stochastic, hardware-type
failures which we can predict and for which we can  provide probabilities.  
It is human mistakes which are difficult to predict.

BUT it IS possible to predict that human-made flaws will occur in complex
designs (although the particular mistakes may not be predictable) and,
it IS possible to provide mechanisms to protect against them.  This is
precisely what engineers do when they put interlocks into potentially
dangerous systems.  Software and computers are not unique in this
respect.  One can argue that the humans were responsible for the
original design flaw hypothesized above (although not negligent) and
that they were responsible and liable for the mistake they made in not
doing something to avert disaster in case their assumptions were wrong.
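
As a generic illustration (not drawn from any of the systems under
discussion), a software interlock is just an independent check that
refuses a hazardous action when its safety precondition does not hold,
whatever the rest of the design assumes.  A minimal C sketch:

    #include <stdio.h>

    /* Minimal sketch of a software interlock: the hazardous action is
     * performed only if an independent safety check passes, so a flaw
     * elsewhere in the design cannot by itself produce the hazard.   */

    static int door_closed = 1;      /* stand-in for a hardware sensor */

    static int energize_beam(void)
    {
        if (!door_closed)
            return -1;               /* interlock: refuse the command  */
        /* ... actually energize the beam here ... */
        return 0;
    }

    int main(void)
    {
        door_closed = 0;             /* door opens unexpectedly        */
        printf("%s\n",
               energize_beam() == 0 ? "fired" : "blocked by interlock");
        return 0;
    }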

Most readers will agree that the Russian explanation for the
accident at Chernobyl implies human error and not an act of God.
But certainly, the events were rare, the operators did not understand
what they were doing, and they could not predict the consequences
(or they certainly would not have done it).  Again, computer
software is not needed for these conditions to hold.

My real fear of this type of thinking is that it will be used to
justify not doing anything about software safety.  If all software design 
flaws are "acts of God," then we need not spend a lot of money for safety 
because nothing can be done about programmer mistakes which are not the
result of negligence or maliciousness.  This is, however, untrue.  System
safety engineers have developed scientific and engineering principles
to identify and control hazards (caused by human design flaws or
hardware failures) in complex systems through analysis, design, 
and management procedures.  We need to do the same for software before 
catastrophes result.

     Nancy Leveson
     Information and Computer Science Dept.
     University of California, Irvine

     [There is a lengthy response to this message from Herb, but since
      he mostly agrees with Nancy or reemphasizes earlier points, it is not
      included in this issue -- except for a paragraph that prompted a
      further message from Nancy.  I am somewhat amused that this entire 
      debate began when Nancy responded to my original statement -- which
      enumerated a few causative factors, but which in no way intended to
      imply that those were the ONLY factors in those cases.  I have 
      maintained consistently throughout RISKS that a holistic approach is
      absolutely essential; any factor individually, or various factors in
      combination, could be devastating.  Herb concludes that he and Nancy
      really seem to agree, so I presume this exchange will now end.  PGN]

------------------------------

Date: 30 Aug 86 23:30:36 PDT (Sat)
From: Nancy Leveson <nancy@ICSD.UCI.EDU>
To: lin@xx.lcs.mit.edu, risks@csl.sri.com
Subject: Acts of God vs. Acts of Man, Round n+1 (eastbound)

   >From Herb Lin:
   >But the number of assumptions that
   >designers must make is enormously large, and it is essentially
   >impossible to even articulate ALL of one's assumptions.  

Agreed.  But there are ways to determine which are the critical
assumptions with regard to particular hazards.  This is exactly what
some of my techniques, e.g. software fault tree analysis, attempt to
do.  In the Firewheel example that I published, we identified a critical
assumption whose violation could have resulted in the satellite being
destroyed.  That is, if two sun pulses were detected within 64
milliseconds of each other, the microprocessor interrupt system hung,
which could result in destruction of the sensor booms (and thus the loss
of the satellite's usefulness).  We found this assumption by working
backward through
the software from the hazardous condition.  The solution, once the
critical assumption had been determined, was a simple blocking of the
second sun pulse interrupt.
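
A rough sketch of the sort of fix involved (names and structure are
hypothetical; the actual Firewheel code is not reproduced here): the
handler simply discards a second sun pulse arriving within 64
milliseconds of the last one it accepted.

    #include <stdio.h>
    #include <stdint.h>

    #define MIN_PULSE_SPACING_MS 64

    /* Hypothetical guard for the sun-pulse interrupt: a pulse arriving
     * within 64 ms of the previously accepted one is blocked, so that
     * closely spaced pulses cannot wedge the interrupt system.        */
    static uint32_t last_pulse_ms;
    static int      have_pulse;

    static int accept_sun_pulse(uint32_t now_ms)
    {
        if (have_pulse && now_ms - last_pulse_ms < MIN_PULSE_SPACING_MS)
            return 0;                   /* block the premature pulse   */
        have_pulse    = 1;
        last_pulse_ms = now_ms;
        return 1;                       /* pulse accepted; handle it   */
    }

    int main(void)
    {
        /* Pulses at 0, 40, and 130 ms: the 40 ms pulse is rejected.   */
        printf("%d %d %d\n", accept_sun_pulse(0),
               accept_sun_pulse(40), accept_sun_pulse(130));
        return 0;
    }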

I don't know for what size systems these backward analysis approaches are
practical.  It took Peter Harvey (my student) two days to analyze the
Firewheel software (which is about 1600 lines long) by hand.  Obviously, it
would be possible to analyze larger software, but we do not yet know how
much this will scale up practically.  We are working on a software tool to
automate as much as possible.  These techniques are, of course, no more
perfect than other more traditional software engineering techniques.  And
better ones may be found.  I am just not ready to say it is impossible
without first trying.

Backwards analysis, verification of safety, software interlocks,
software fault tolerance, fail-safe design, ... -- there are possible
solutions which we should be examining.
                                               Nancy Leveson

------------------------------

Date: Mon, 1 Sep 86 11:49:10 edt
From: mikemcl@nrl-csr (Mike McLaughlin)
To: risks@csl
Subject: Computer Literacy

From THE WASHINGTON POST, Monday, 1 Sept 86, page A14, Letters to the
Editor [ "..." indicates omissions].  While I do not entirely agree with
Mr. Jordan, much of what he says is directly applicable to Risks.

        [Before responding to this, please recall that this topic has
         already been discussed at some length in RISKS-2.36 and 37, 
         and in RISKS-3.17, 19, 20, and 21.  PGN]

        +++++++++++++++++++++++++++++++++++++++++++++++++++++++

                         COMPUTER LITERACY

     Although I earn my living as a consultant in computerized data bases,
I strongly oppose the view... that computer literacy should be mandatory in
the secondary school curriculum.

     Computers are devices for performing some task that either is already
performed by other means or first must be understood in other terms,
usually a mathematical equation.  Learning how to operate a computer, or
program one, is not going to improve a student's knowledge of languages,
mathematics, history or political science.

     Alfred North Whitehead observed that civilization increases the number
of things that we can do without thinking, i.e., that we can take for
granted.  This is evident in the development of computers, which
increasingly are becoming like automobiles; anyone can drive them.
Learning the technology of computers has as much relevance to everyday life
as learning the technology of auto engines.

     Unquestionably there are tasks for which computers are indispensable,
but individuals will learn those functions as they become involved in the
task itself, whether it be medical diagnosis, controlling the flow of
electric power over a grid or determining the authorship of a 16th-century
poem.

     What students need to know is how to think, especially about the human
condition.  As more and more college students flock to "practical" majors,
the secondary schools should be concentrating on the liberal arts.  In this
perspective, "computer literacy" may be just another form of a larger
illiteracy.

          - John S. Jordan,  Washington, D.C.

------------------------------

Date:  Sat, 30 Aug 86 23:26 EDT
From:  TMPLee@DOCKMASTER.ARPA
Subject:  Another supermarket crash
To:  Risks@CSL.SRI.COM

Same thing happened here (Minneapolis-St. Paul) a couple of years ago -- I
was in a major discount store (Target), during a normal busy time --
Saturday morning, I think -- only to find that all the cash registers were
down because the central computer was down.  Don't know how long it lasted,
but at least long enough that by the time I got there the cashiers were
using paper and pencil.  Really stupid -- (so he says as a so-called
computer expert) -- given that those registers probably had Z80s or 6502s
or such in them, a printed record, etc., so they could just as well have
worked off-line (except for not knowing the price on instant sale items,
which I assume would have been in the central machine).
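
To make the off-line suggestion concrete (everything below is
hypothetical -- the item codes, the cache layout, and the host query are
invented for illustration), a register could fall back to a locally
cached price table when the central machine is unreachable, leaving the
centrally priced sale items as the only failures:

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical off-line fallback for a point-of-sale register:
     * prices come from the central machine when it is reachable,
     * otherwise from a small table cached in the register itself.    */

    struct item { const char *code; long cents; };

    static const struct item local_cache[] = {  /* cached locally     */
        { "bread", 89 }, { "milk", 112 }, { NULL, 0 }
    };

    static int central_up = 0;              /* pretend the host is down */

    static long central_price(const char *code)   /* stub host query  */
    {
        (void)code;
        return -1;
    }

    static long price_of(const char *code)
    {
        long p;
        const struct item *i;

        if (central_up && (p = central_price(code)) >= 0)
            return p;                       /* normal, on-line path    */
        for (i = local_cache; i->code != NULL; i++)
            if (strcmp(i->code, code) == 0)
                return i->cents;            /* off-line: cached price  */
        return -1;                          /* unknown: needs the host */
    }

    int main(void)
    {
        printf("milk: %ld cents\n", price_of("milk"));
        return 0;
    }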

------------------------------

Date:     Mon, 1 Sep 86 13:29:45 EDT
From:     Brint Cooper <abc@BRL.ARPA>
To:       risks@csl.sri.com
cc:       mnetor!lsuc!dave@seismo.css.gov
Subject:  A supermarket does not grind to a halt

Last month, as I awaited checkout in a "Giant" supermarket in Bel Air,
MD, an area-wide power outage lasting several minutes occurred.  The
following was the sequence of events:

	1.  All lights, outside and inside, instantly out.
	1A. The cash-register-cum-terminal RETAINED its display!
	2.  Some of the same lights came back almost immediately (seemed
            to be back-up power).
	3.  Several minutes passed; additional lights began to come on.
	4.  One-by-one, the terminals "beeped" and became functional.

No market employee with whom I spoke seemed to understand what actually
happened.  But the computer system obviously was protected from such
random power outages (which occur FREQUENTLY here).
                                                          Brint Cooper
                    UUCP:  ...{seismo,unc,decvax,cbosgd}!brl-smoke!abc

------------------------------

End of RISKS-FORUM Digest
************************