[mod.risks] RISKS-2.29

RISKS@SRI-CSL.ARPA (RISKS FORUM, Peter G. Neumann, Coordinator) (03/18/86)
RISKS-LIST: RISKS-FORUM Digest,  Monday, 17 Mar 1986  Volume 2 : Issue 29

           FORUM ON RISKS TO THE PUBLIC IN COMPUTER SYSTEMS 
   ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Contents:
  Commission vs. Omission (Martin J. Moore plus an example from Dave Parnas)
  A Stitch in Time (Jagan Jagannathan)
  Clockenspiel (Jim Horning)
  Cordless phones (Chris Koenigsberg)
  Money talks (Dirk Grunwald, date correction from Matthew Kruk)
  [Non]computerized train wreck (Mark Brader)
  On-line Safety Database (Ken Dymond)

The RISKS Forum is moderated.  Contributions should be relevant, sound, in good
taste, objective, coherent, concise, nonrepetitious.  Diversity is welcome. 
(Contributions to RISKS@SRI-CSL.ARPA, Requests to RISKS-Request@SRI-CSL.ARPA.)
(Back issues Vol i Issue j stored in SRI-CSL:<RISKS>RISKS-i.j.  Vol 1: MAXj=45)

----------------------------------------------------------------------

Received: from eglin-vax.ARPA ... Mon 17 Mar 86 17:26:49-PST
Date: 0  0 00:00:00 CDT
From: "MARTIN J. MOORE" <mooremj@eglin-vax>
Subject: Commission vs. Omission
To: "risks" <risks@sri-csl>

Dave Parnas's points regarding the shuttle destruct system are well taken.
The policy, stated informally, was that "it better work if we need it --
but it absolutely better NOT 'work' when we DON'T need it" which generated the
extreme emphasis on preventing what Dave calls "risks of commission."  I feel
that the risk of commission on the destruct system is extremely small, while 
the risk of omission is somewhat higher, although still small.  During 
validation testing and in every pre-launch checkout, we performed "exhaustive" 
checks -- "exhaustive" meaning that we tried every combination of 
  [(2 central computers) * (6 remote sites) * (2 computers per site) 
      * (2 transmitters per site) * (2 comm paths to each site) 
      * (2 possible commands in various sequences)].  
Yeah, this takes a *LONG* time (with practice, we got it down 
to several hours if everything went smooth.)  On one occasion during 
validation testing, we did find a software error which only manifested on a
particular (central computer/comm path/remote computer/unusual command
sequence) combination.  Exhaustive tests *are* necessary.

I have often wondered why the emphasis was to prevent errors of commission
over errors of omission (not to say that we wanted either kind, but errors
of commission were definitely considered to be worse!).  An erroneous
destruct would cost the lives of the flight crew, loss of the Orbiter, and
possibly damage on the ground if it occurred early in the flight (e.g.,
windows blown out, etc.)  An erroneous non-destruct, in the worst case (if
the ET were to detonate near the crowded spectator area on the NASA
causeway), could cause the loss of TENS OF THOUSANDS of lives.  Certainly
this is worse than an erroneous destruct.  I believe there may be a
subconscious feeling that an erroneous destruct means the difference between
a success and a disaster, while an erroneous non-destruct means the
difference between a disaster and a worse disaster.  Subjectively, that
difference is not as great as the first, although objectively it may be much
greater.
                                     Martin Moore

<The usual disclaimers.  I'm too tired to type in the whole silly thing.>

        [By the way, Dave Parnas suggested the following example to
         illustrate his message in RISKS-2.28:]

         "Consider elevators.  Consider how much easier it is to prevent the
         floor indicator from saying "13" than to assure that the floor
         indicator will always give the actual floor that the elevator is
         on.  The risk of indicating "13" can be gotten acceptably low by
         eliminating "13" from the set of indicator lights.  The risk of
         indicating an incorrect floor or not indicating the current floor
         is much harder to eliminate."  [Dave Parnas]

------------------------------

Date: Mon 17 Mar 86 11:43:53-PST 
From: JAGAN@SRI-CSL.ARPA 
Subject: A Stitch in Time
To: Neumann@SRI-CSL.ARPA

   [As it now turns out, the reboot occurred just moments BEFORE I logged
    in Sunday night.  Here are some further details.  PGN]

This is the probable sequence of events that led us back in time on CSLA:
1. A power glitch (late night SUNDAY) caused the F4 to hard boot.
2. During a hard boot, the TIME is retrieved from eleven independent sources
   (which are assumed to be correct!)
3. One of these sources had the incorrect time of some warm day in 1972
   causing the average to be wrongly computed resulting in Dec 6th/1985.

Suggestion:
1. Change the statistical measure from MEAN to something less sensitive to
   one or two abnormal times; for example the average of the 5th, 6th, and 7th
   largest times.  

       [IT IS ABSOLUTELY INCREDIBLE THAT UNSAFE ALGORITHMS continue to
        be used.  This problem is as old as the hills.  Statisticians
        routinely throw out the absurd values before computing the
        mean.  Dorothy Denning pointed out the pun in their terminology
        (applicable to Byzantine agreement algorithms, where you don't
        trust anyone): the OUT-LIERS are really the OUT-LIARS.  

        EVEN WORSE, Jagan points out that if the clock had been accidentally
        set INTO THE FUTURE, things could also get very sticky.  We also
        have a problem of nonunique clock readings during the hour at 2AM when
        Daylight Savings Time ends.  A good time to be asleep.  PGN]

[Here is some more background.]

  Date: Mon 17 Mar 86 12:37:37-PST
  From: Mark Lottor <MKL@SRI-NIC.ARPA>
  Subject: [Louis A. Mamakos <louie@trantor.UMD.EDU>: time]
  To: Jagan@SRI-CSL.ARPA

  This was just to verify that the problem was on the
  remote system and not some local problem...
                ---------------

  Date: Mon, 17 Mar 86 15:31:14 EST
  From: Louis A. Mamakos <louie@trantor.UMD.EDU>
  To: MKL@sri-nic.ARPA
  In-Reply-To: Mark Lottor's message of Mon 17 Mar 86 11:34:41-PST
  Subject: time

  Yes, I can verify that it was indeed the clock (actually the host the clock
  was on) that was screwed up.  It it unfortunate that there is no way to get
  the current year out of the WWVB clock.  There was work being done in the
  computer room, which reset our LSI-11/73 host, which subsequently got
  confused.  Sorry about the problem.

  Louis A. Mamakos  WA3YMH    Internet: louie@TRANTOR.UMD.EDU
  University of Maryland, Computer Science Center - Systems Programming

------------------------------

From: horning@decwrl.DEC.COM (Jim Horning)
Date: 17 Mar 1986 1436-PST (Monday)
To: RISKS@SRI-CSL.ARPA
Subject: Clockenspiel

Your errant clock reminds me of something that happened at Stanford in
the mid-sixties. I was apparently one of the first users of the 360/67
to run a job that started on one day and finished on the next. When the
statement for my account arrived in the mail, I had quite a job
convincing my wife that the huge figure (to a graduate student couple)
was nothing to worry about: It was a CREDIT resulting from a job that
was charged for minus 23 hours and 58 minutes!

The Xerox Alto operating system had a compiled-in reasonableness check
on the date and time. When it started up, if the local clock wasn't
"reasonable," it sent a request over the Ethernet and put "Date and
Time Unknown" in the banner. Well, you guessed it: The day came when
the (correct) time from the time server was no longer "reasonable," and
therefore couldn't be corrected by appeal to the time server....

Jim H.

------------------------------

Date: 17 Mar 1986 11:48-EST 
From: Chris.Koenigsberg@G.CS.CMU.EDU
To: RISKS FORUM    (Peter G. Neumann, Coordinator) <RISKS@SRI-CSL.ARPA>
Subject: RISKS re: Cordless phones

My roommate has a cordless phone and it goes on the blink every few weeks.
All the phones in the house stop working. When you pick any one up, all you
get is a very loud static sound and you can't dial out. I have learned that
I can fix this problem by sneaking in his room and unplugging the cradle for
his cordless phone. A visitor in the house was very frightened one night
when she was left alone and though someone had cut the phone lines or
something. It was the cordless phone on the blink again.

Chris Koenigsberg
ckk@g.cs.cmu.edu , or ckk%andrew@pt.cs.cmu.edu
{harvard,seismo,topaz,ucbvax}!g.cs.cmu.edu!ckk
(412)268-8526 office, (412)362-6422 home
Center for Design of Educational Computing
Carnegie-Mellon U.
Pgh, Pa. 15213

------------------------------

Date: Mon, 17 Mar 86 16:44:15 CST
From: grunwald@b.CS.UIUC.EDU (Dirk Grunwald)
To: risks@sri-csl.arpa
Subject: re: money talks

I read the 'money talks' article with great amusement. One of the risks to
society which is worth talking about is the risk of using inappropriate, or
downright silly, technology.

Talking money would appear to be such a waste of resources. Certainly some
other method of denomination descrimination could be devised for the
visually impared. Rasied lettering, coinage instead of paper money, different
sized paper money, different paper stock. But talking money?

dirk grunwald
university of illinois

------------------------------

Date: Mon, 17 Mar 86 09:07:32 PST
From: Matthew_Kruk%UBC.MAILNET@MIT-MULTICS.ARPA
To: RISKS@SRI-CSL.ARPA
Subject: Money Talks

Correction to my previous message: The date of the article should be
March 14th (Friday).

------------------------------

Date: Mon, 17 Mar 86 19:04:03 EST
From: ihnp4!utzoo!lsuc!msb@seismo.CSS.GOV
To: ihnp4!seismo!sri-csl.arpa!RISKS@seismo.CSS.GOV
Subject: [Non]computerized train wreck

Me:
> Not only was this NOT a case of computer malfunction, but indeed, a more
> fully computerized system (with cab signalling and automatic train stopping)
> would probably have prevented the accident.

PGN:
> [... Thus, even though it appears NOT to be a computer problem, 
> we discover that the computer could have done better!  But, of course,
> don't blame the computer system.  Blame the people who specified, 
> designed, and implemented it -- not JUST the train operator(s).  PGN]

You sound more critical than I meant to be.  The cost of equipping all major
railways with cab signalling and the like would be considerable, to say the
least.  While such installations certainly do exist, especially on busy
high-speed lines, the "centralized traffic control" in use on the route in
question is probably much more common.  Are you calling on all railways to
upgrade their signaling systems long before they are life-expired, every
time something somewhat better comes along?

Mark Brader  (ihnp4!utzoo!lsuc!msb and ...!dciem!msb are both me.)

   [One would hope that new improvements do not always require everything
    to be thrown out.  Long ago we discovered the advantages of software
    solutions over hardware solutions.  But when human lives are at stake,
    safer systems may be worth the price of upgrading equipment.  I think
    that the incredible escalation of law-suit awards and of rates for 
    malpractice and liability insurance may provide some new incentives.  PGN]

------------------------------

Date: 17 Mar 86 15:14:00 EST
From: "DYMOND, KEN" <dymond@nbs-vms.ARPA>
Subject: On-line Safety Database
To: "risks" <risks@sri-csl.ARPA>

Our Library Bulletin (and as a frequent user I'd have to say that the NBS 
has one of the best technical libraries going) for February contained a 
notice that Pergamon Infoline (evidently a supplier of such services) is
offering a new online database service, SAFETY: "SAFETY, produced by
Cambridge Scientific Abstracts, provides broad interdisciplinary
coverage of safety, including industrial, transportation, environmental,
and medical safety.  This database indexes journals, books, reports,
patents, and proceedings published in 1981 or later."  If someone on
the list uses this database, please let us know how well it covers
computer and software safety.

Ken Dymond
National Bureau of Standards

------------------------------

End of RISKS-FORUM Digest
************************
-------