RISKS@CSL.SRI.COM (RISKS FORUM, Peter G. Neumann -- Coordinator) (09/08/86)
RISKS-LIST: RISKS-FORUM Digest,  Sunday, 7 September 1986  Volume 3 : Issue 50

        FORUM ON RISKS TO THE PUBLIC IN COMPUTER SYSTEMS
  ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Contents:
  Enlightened Traffic Management (Alan Wexelblat)
  Flight Simulator Simulators Have Faults (Dave Benson)
  Re: Flight Simulators and Software Bugs (Bjorn Freeman-Benson)
  Always Mount a Scratch Monkey (Art Evans)
  Re: supermarket crashes (Jeffrey Mogul)
  Machine errors - another point of view (Bob Estell)
  Human Behv. & FSM's (Robert DiCamillo)

The RISKS Forum is moderated.  Contributions should be relevant, sound, in
good taste, objective, coherent, concise, nonrepetitious.  Diversity is
welcome.  (Contributions to RISKS@CSL.SRI.COM, Requests to
RISKS-Request@CSL.SRI.COM)
(Back issues Vol i Issue j available in CSL.SRI.COM:<RISKS>RISKS-i.j.
Summary Contents in MAXj for each i; Vol 1: RISKS-1.46; Vol 2: RISKS-2.57.)

----------------------------------------------------------------------

Date: Thu, 4 Sep 86 09:58:49 CDT
From: Alan Wexelblat <wex@mcc.com>
To: risks@csl.sri.com
Subject: Enlightened Traffic Management

The Austin rag carried the following brief item off the AP wire:

  NEW DELHI, India (AP) - The computer lost the battle with the commuter.
  "Enlightened traffic management" was the term for New Delhi's new
  computerized bus routes, but four days of shattered windows, deflated
  tires and protest marches convinced the bus company that its computer
  was wrong.

  The routes dictated by the computer proved exceedingly unpopular with
  passengers, who claimed that they were not being taken where they wanted
  to go.  Bowing to demand, the New Delhi Transport Corp. scrapped the new
  "rationalized" routes and restored 114 old routes.

  "The computer has failed," shouted thousands of victorious commuters in
  eastern New Delhi Tuesday night after transport officials drove around
  in jeeps, using loudspeakers to announce the return of the old routes.

COMMENTS: At first, I thought this was pretty amusing; deflated tires is a
computer risk I hadn't heard of before.  But the whole attitude of the
article (and seemingly of the people) annoyed me.  The machine is taking
the rap, and I'll bet that the idiot who programmed it to produce "optimal"
routes will get off scot-free.  Not to mention the company execs who failed
to understand their customer base and allowed the computer to "dictate" new
routes.  ARGH!

Alan Wexelblat
UUCP: {seismo, harvard, gatech, pyramid, &c.}!ut-sally!im4u!milano!wex

------------------------------

Date: Wed, 3 Sep 86 17:01:17 pdt
From: Dave Benson <benson%wsu.csnet@CSNET-RELAY.ARPA>
To: risks%csl.sri.com@CSNET-RELAY.ARPA
Subject: Flight Simulator Simulators Have Faults

|I developed flight simulators for over 7 years and could describe many such
|bizarre incidents.

Might be interesting for RISKS if these suggest problems in developing
risk-free software...

|To point out a failure during testing (or more likely development) seems
|meaningless.  Failures that make it into the actual product are what should
|be of concern.

I do not agree.  We need to understand that the more faults found at any
stage of engineering software, the less confidence one has in the final
product.  The more faults found, the higher the likelihood that faults
remain.  I simply mentioned this one because it appears to demonstrate
that, for all the claims made for careful analysis and review of
requirements and design, current practice in fact leaves such obvious
faults to be found by testing.
|As for the effectiveness of simulators...

Simulators are wonderful.  Surely nothing I wrote suggested otherwise.

Upon further inquiry, the blank sky was in a piece of software used to
simulate the flight simulator hardware.  The software specs essentially
duplicated the functions proposed for the hardware.  So the hardware was
going to take the trigonometric tangent of the pitch angle.  The software
simulator of the flight simulator indeed demonstrated that one ought not
take the tangent of 90 degrees.

So somebody with presumably a good background in engineering mathematics
simply failed to think through the most immediate consequences of the
trigonometric tangent function.  Nobody noticed this in any kind of review;
nobody THOUGHT about it at all.  Since nobody bothered to think, the fault
was found by writing a computer program and then observing the obvious.

I suggest that this inability to think bodes ill for the practice of
software engineering and the introduction of "advanced techniques" such as
fault-tree analysis.  I suggest that such examples of pronounced
inattention to well-known mathematics are part of the reason for the
lengthy testing sequences the military requires.  And I suggest that the
fact that it appears necessary to mention all this yet once again suggests
that there are many people doing "software engineering" who have failed to
grasp what a higher education is supposed to be about.  I certainly do not
expect perfection, but the trigonometric tangent is an example of an
elementary function.
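[A minimal sketch, in C, of the hazard described above.  This is not the
simulator code in question; the names naive_tan_of_pitch and
guarded_tan_of_pitch, the 89-degree guard band, and the clamping strategy
are assumptions made only for illustration.  It shows the tangent blowing
up as the pitch angle approaches 90 degrees, and one conventional guard.]

    /* Sketch only: illustrates the tan(90 degrees) hazard.  Names, guard
     * band, and clamping strategy are assumed, not taken from the actual
     * simulator software. */
    #include <stdio.h>
    #include <math.h>

    #define PI            3.14159265358979323846
    #define DEG_TO_RAD    (PI / 180.0)
    #define MAX_PITCH_DEG 89.0  /* arbitrary guard below the singularity */

    /* Naive version: near +/-90 degrees the result is astronomically
     * large (not a true infinity only because pi/2 is not exactly
     * representable in floating point). */
    double naive_tan_of_pitch(double pitch_deg)
    {
        return tan(pitch_deg * DEG_TO_RAD);
    }

    /* Guarded version: clamp the pitch angle away from the singularity
     * before taking the tangent. */
    double guarded_tan_of_pitch(double pitch_deg)
    {
        if (pitch_deg >  MAX_PITCH_DEG) pitch_deg =  MAX_PITCH_DEG;
        if (pitch_deg < -MAX_PITCH_DEG) pitch_deg = -MAX_PITCH_DEG;
        return tan(pitch_deg * DEG_TO_RAD);
    }

    int main(void)
    {
        double angles[] = { 0.0, 45.0, 89.0, 89.999, 90.0 };
        int i;

        for (i = 0; i < 5; i++)
            printf("pitch %8.3f deg:  naive = %12.4e  guarded = %12.4e\n",
                   angles[i], naive_tan_of_pitch(angles[i]),
                   guarded_tan_of_pitch(angles[i]));
        return 0;
    }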
------------------------------

Date: Fri, 5 Sep 86 10:02:38 PDT
From: bnfb@uw-june.arpa (Bjorn Freeman-Benson)
To: RISKS@CSL.SRI.COM
Subject: Re: Flight Simulators and Software Bugs

In RISKS-3.48, Gary Whisenhunt talks about how he developed flight
simulators and that he "..seriously doubt[s] that the sky went blank in
the B-1 simulator when it was delivered to the government."  And then he
goes on to point out all the specs it had to pass.  I don't know one way
or the other, but I want to point out that the sky going blank indicates
either a design problem or an implementation problem.  If it is a design
problem, who knows how many other serious (sky-blanking serious) problems
exist?  Will the MIL standards catch them all?  If it is an implementation
error, who knows how many other similar coding errors that
sloppy/tired/etc. engineer made?  If it's a sign problem, what happens
when you back the plane up?  Will it go into an infinite-speed reverse?
The point I'm trying to make is that bugs are not independent, and if one
shows up, others like it usually exist.

Bjorn N Freeman-Benson
U of Washington, Comp Sci

------------------------------

Date: Wed 3 Sep 86 16:46:31-EDT
From: "Art Evans" <Evans@TL-20B.ARPA>
Subject: Always Mount a Scratch Monkey
To: Risks@CSL.SRI.COM

In another forum that I follow, one correspondent always adds the comment

    Always Mount a Scratch Monkey

after his signature.  In response to a request for explanation, he replied
somewhat as follows.  Since I'm reproducing without permission, I have
disguised a few things.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

My friend Bud used to be the intercept man at a computer vendor - the
person who took the call when an irate customer phoned.  Seems one day Bud
was sitting at his desk when the phone rang.

  Bud:   Hello.
  Voice: YOU KILLED MABEL!!
  B:     Excuse me?
  V:     YOU KILLED MABEL!!

This went on for a couple of minutes and Bud was getting nowhere, so he
decided to alter his approach to the customer.

  B: HOW DID I KILL MABEL?
  V: YOU PM'ED MY MACHINE!!

Well, to avoid making a long story even longer, I will abbreviate what had
happened.  The customer was a biologist at the University of Blah-de-blah,
and he had one of our computers that controlled gas mixtures that Mabel
(the monkey) breathed.  Now Mabel was not your ordinary monkey.  The
University had spent years teaching Mabel to swim, and they were studying
the effects that different gas mixtures had on her physiology.  It turns
out that the repair folks had just gotten a new Calibrated Power Supply
(used to calibrate analog equipment), and at their first opportunity
decided to calibrate the D/A converters in that computer.  This changed
some of the gas mixtures, and poor Mabel was asphyxiated.  Well, Bud then
called the branch manager for the repair folks:

  Manager: Hello.
  B: This is Bud.  I heard you did a PM at the University of Blah-de-blah.
  M: Yes, we really performed a complete PM.  What can I do for You?
  B: Can You Swim?

The moral is, of course, that you should always mount a scratch monkey.

~~~~~~~~~~~~~~~~~~~~~~

There are several morals here related to risks in the use of computers.
Examples include "If it ain't broken, don't fix it."  However, the
cautious philosophical approach implied by "always mount a scratch monkey"
says a lot that we should keep in mind.

Art Evans
Tartan Labs

------------------------------

From: mogul@decwrl.DEC.COM (Jeffrey Mogul)
Date: 4 Sep 1986 1614-PDT (Thursday)
To: risks@csl.sri.com
Subject: Re: supermarket crashes

One of the nearby Safeway supermarkets is open 24 hours and is quite
popular with late-night shoppers (it's known by some as the "Singles
Safeway").  Smart shoppers, however, used to avoid visiting just before
midnight, because that's when all the cash registers simultaneously went
out of operation for some sort of ritual (daily balances or somesuch).

I also discovered that this market, at least, is not immune to power
failures; I was buying a quart of milk one evening when a brief blackout
hit the area.  The lights were restored within minutes, but the computer
was dead, and the cashiers "knew" it would be a long time before it would
be up.  They weren't about to waste their fortuitous coffee break adding
things up by hand, perhaps because they couldn't even tell the price of
anything (or indeed, what it was, in the case of produce) without the
computer.

I don't often shop at that market, partly because the markets I do use
have cashiers who know what things are rather than relying on the
computer.  Some day, just for fun, I might mark a pound of pecans with the
code number for walnuts and see if I can save some money.

------------------------------

Date: Thu, 4 Sep 1986 21:27 EDT
From: LENOIL@XX.LCS.MIT.EDU
To: "SEFB::ESTELL" <estell%sefb.decnet@NWC-143B.ARPA>
Cc: risks <risks@CSL.SRI.COM>
Subject: Machine errors - another point of view

    A "machine" as seen by the applications programmer is already several
    layers [raw hardware, microcode, operating system kernel, run-time
    libraries, compiler]; and each layer is perhaps nearly a million
    pieces [IC's, lines of (micro)code] that may interact with nearly a
    million other pieces in other layers.

Interaction between one million pieces of a system is more than just an
exaggeration; it is horrendous engineering practice that should never be
seen.  Flow-graphs, dependency diagrams, top-down design - all are ways of
reducing interaction between system components to a small, manageable size
- the smaller the better.
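[A back-of-the-envelope sketch of the combinatorics behind this point.
The "bounded fan-out" model below, and the fan-out limit of 10, are
invented purely for illustration, not taken from the original message: a
fully connected set of n components has n(n-1)/2 potential pairwise
interactions, while a design that limits each component to a handful of
neighbors grows only linearly in n.]

    /* Sketch only: counts potential pairwise interactions for a fully
     * connected design versus one with a bounded fan-out.  The fan-out
     * model is an assumed illustration of what disciplined layering buys. */
    #include <stdio.h>

    /* Fully connected: every component may interact with every other. */
    double fully_connected_pairs(double n)
    {
        return n * (n - 1.0) / 2.0;
    }

    /* Bounded fan-out: each component interacts with at most k others,
     * so the number of interacting pairs grows only linearly with n. */
    double bounded_fanout_pairs(double n, double k)
    {
        return n * k / 2.0;
    }

    int main(void)
    {
        double n = 1.0e6;   /* one million components, as in the discussion */

        printf("fully connected:       %.2e interacting pairs\n",
               fully_connected_pairs(n));
        printf("fan-out limited to 10: %.2e interacting pairs\n",
               bounded_fanout_pairs(n, 10.0));
        return 0;
    }

(The two figures differ by about five orders of magnitude, which is the
scale of the discrepancy being argued about here.)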
The probability of designing a working system of one million
fully-connected components is near zero.  Furthermore, you seem to imply
that component interconnects can transcend abstraction boundaries (e.g.,
microcode <-> run-time libraries); this again is poor engineering
practice.  I don't disagree that rising system complexity is a problem
today, but you are several orders of magnitude off in your statement of
the problem.

Robert Lenoil

------------------------------

Date: Fri, 5 Sep 86 16:27:45 EDT
From: Robert DiCamillo <rdicamil@cc2.bbn.com>
Subject: Human Behv. & FSM's
To: risks@csl.sri.com
Cc: rdicamil@cc2.bbn.com

Comments on Bob Estell's "Machine Errors", RISKS Vol. 3, #49
(FSM's need friends too)

I have often felt the same way Bob Estell does - that the full scope of
(software) engineering is too vast for a mere mortal to comprehend.
However, I usually reassure myself with a good dose of computational
theory:

   * "... for all these reasons 'machines make errors' in much the same
   * sense that people mispronounce words or make mistakes driving."

I agree with the apparent analogy, but still cringe at the actual usage of
the word "error".  Webster's Ninth New Collegiate Dictionary defines error
as an "act involving unintentional deviation from truth or accuracy".  If
truth or accuracy for computers or finite state automata is defined to be
the mapping of all possible input states to output states, then
theoretically the only *unintentional* deviation from such truth (tables
or such) is the failure to map or correlate all possible input strings to
known or desired output states.

I have participated in a situation where the adoption of a non-standard
arbitration scheme did not take into account cycle stealing, and assembly
code actually had the values of operands corrupted, so that a branch
occurred on the opposite condition to the true data.  This was a bug that
only a logic analyzer could find, and it sent the hardware engineers back
to their drawing board.  You have no idea how strange it feels to tell
someone that the code actually took a branch wrong: prior to the branch
the data was true, but it always branched to the false address.  The
high-level DDT would never show the data to be false because of the
particular timing coincidences involved in using an in-circuit emulator;
it is very disturbing when even your debugger says all is well and tests
still fail operationally in the real system.

In the case of bus arbitration, an entire realm of undesirable input
strings should be eliminated if the timing constraints between competing
processes are properly enforced in hardware.  If they are not,
"unintentional deviation" from the arbitration scheme will occur, but that
"deviation" is really only another set of output states that serves no
desirable function.  However, you could sit down with a logic analyzer and
painfully construct a mapping of all possible input timing states to a bus
arbitration scheme, and map the output.  Hopefully the design engineers
did this when they made the specifications, even if they were not
exhaustive in testing every possible input string.

I believe it is improper to ascribe human behavior, especially
*unpredictability*, to the results of input strings that fall outside the
desired function of a finite state automaton.  In theory, an FSM can have
an undefined output for a given input, but in practice the definition of
this output usually depends upon the resolution of your measuring
instruments.
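[To make the "incomplete characteristic function" point concrete, here is
a toy sketch of a two-requester bus arbiter whose transition table leaves
one input combination unspecified.  The arbiter, its states, and its
inputs are invented for illustration and describe no real hardware; the
point is only that the "unpredictable" case is simply the table entry
nobody wrote down.]

    /* Sketch only: a toy two-requester bus arbiter as a finite state
     * machine.  States, inputs, and the deliberately unspecified table
     * entry are assumed for illustration. */
    #include <stdio.h>

    enum state { IDLE, GRANT_A, GRANT_B, UNSPECIFIED };

    static const char *state_name[] =
        { "IDLE", "GRANT_A", "GRANT_B", "UNSPECIFIED" };

    /* The characteristic function: next state given the current state and
     * the two request lines.  The (IDLE, req_a=1, req_b=1) case is left
     * out on purpose, standing in for an input the designer never
     * considered. */
    static enum state next_state(enum state s, int req_a, int req_b)
    {
        switch (s) {
        case IDLE:
            if (req_a && !req_b)  return GRANT_A;
            if (!req_a && req_b)  return GRANT_B;
            if (!req_a && !req_b) return IDLE;
            return UNSPECIFIED;   /* simultaneous requests: never defined */
        case GRANT_A:
            return req_a ? GRANT_A : IDLE;
        case GRANT_B:
            return req_b ? GRANT_B : IDLE;
        default:
            return UNSPECIFIED;   /* once unspecified, always unspecified */
        }
    }

    int main(void)
    {
        int a, b;

        /* Enumerate every input reachable from IDLE; the gap in the table
         * shows up immediately, with nothing "unpredictable" about it. */
        for (a = 0; a <= 1; a++)
            for (b = 0; b <= 1; b++)
                printf("IDLE with req_a=%d req_b=%d -> %s\n",
                       a, b, state_name[next_state(IDLE, a, b)]);
        return 0;
    }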
If an arbitration scheme appears to yield an indeterminate output when all
inputs are still within spec (proper input strings), then the
characteristic function of the FSM is not complete (well defined).
Practically, this could mean that a timing situation arose that the
designers couldn't or didn't see - maybe their analyzer didn't have the
resolution?  But it is still ultimately, and sometimes easily,
attributable to human oversight.  How much of the FSM's characteristic
function do you know about?  The part you never dealt with is not
necessarily "unpredictable".

Many important computational theories hinge on the conception that any
"solvable" problem can be realized in an arbitrarily complex FSM.  While
it may not be practical to build the machine, no one has yet been able to
disprove such assertions as Church's thesis with current silicon
architectures.  Computational theory still clings to this viewpoint, which
I see practically as: if output states seem indeterminate, you still
haven't found the correct way to cast inputs in a reliably measurable
form.

   * "But can we examine the millions of lines of code that comprise the
   * micro-instructions, the operating system, and the engineering
   * applications on a multi-processor system, and hope to understand
   * ALL the possible side-effects?"

Goals of good software/hardware design are to make it easy to categorize
all possible input strings, especially when they are countably infinite.
This is not the same as viewing the machine as somehow irrational and
unpredictable.  Good designs may have an ease to the completeness of their
characteristic function (CF).  This does not mean bad designs are
unpredictable, just perhaps too complex to realize or measure.
Anthropomorphizing is all too tempting.

Systems with many architectural layers have complex interactions.  Recent
discussion in RISKS has highlighted the small percentage of total
execution paths that are ever actually traced, but perhaps in
well-characterized FSM's such exhaustive testing can be cautiously
minimized.  If in fact the range of the CF is countably infinite, then
some method of limited testing is usually mandatory.  It's the part of the
FSM you don't know that you tend to ascribe human behavior to!

Maybe it does take some exposure to developing systems with both complete
and incomplete characteristic functions to get an intuition about how
closed the FSM has to be to give satisfactory performance for a specific
application.  Bus arbitration is a relatively critical control function in
most architectures and should be given a high priority.  I'm sure there
are many systems out there that work just on the verge of catastrophe as
sloppily implemented FSM's, at numerous levels.

Writing microcode, I tend to look at design issues architecturally;
however, some experts believe that new architectures may be invented that
will not be encompassed by contemporary computational theory.  In the
August 1986 SPECTRUM (from IEEE), the series of articles on optical
computing addresses this problem:

   * "In C. Lee Giles' view (a program manager at the Air Force Office of
   * Scientific Research in Washington, D.C.), theoretical computer
   * science has 'stuck its neck out' by saying that computational models
   * define anything that is computable, since it is unknown whether
   * there are tasks these models cannot perform that the human brain
   * can." .....
   *
   * (from the author, Trudy E. Bell) "It remains to be seen whether
   * (optical) neural network architectures represent a new computational
   * model."

I would love to prove some philosophers wrong about how "computable tasks"
can ultimately be cast in the form of FSM's.  The dawn of the
general-purpose optical computer architecture may well introduce new
models that require a new breed of non-FSM computational theory.  However,
I think that computer engineering will focus on getting good "old
fashioned" FSM's to work in the real world for a long time, and even at
this level of complexity there will always be bugs from human behavior,
not "machine behavior".

------------------------------

End of RISKS-FORUM Digest
************************