[sci.space.shuttle] why 5 computers?

cbwalton@cs.utexas.edu (Chris Walton) (10/03/88)
Summary: Hardware and Software redundancy

For a summary of the design goals that led to the Shuttle having
5 computers, as well as a detailed account of the bug that delayed
STS-1, see "The 'Bug' Heard 'Round the World" by John R. Garman
in Software Engineering Notes vol. 6, no. 5 (October 1981). At the
time Garman wrote this article he was Deputy Chief of the Spacecraft
Software Division at Johnson Space Center.

Here is a slightly condensed version of Garman's explanation of why at least
four (4) computers are needed:

   in order to satisfy a general shuttle goal of "Fail Operational -
   Fail Safe" ("FO/FS") ... most components are replicated 4-deep ...
   Four is the magic number for a very logical and intuitively obvious
   reason: FO/FS requires full operational capability after one failure,
   and a safe return capability after a second. It takes three to vote -
   so it initially takes four to still be able to vote after the first
   failure.
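
The counting argument in the quote can be sketched in a few lines of
Python. This is only a toy illustration, not the Shuttle's actual
redundancy management: the function name majority_vote, the unit
numbers, and the output values are all made up for the example.

```python
from collections import Counter

def majority_vote(outputs):
    """Vote among redundant units (hypothetical helper, not NASA code).

    `outputs` maps a unit id to the value that unit computed.
    Returns (agreed_value, dissenting_unit_ids), or (None, []) when no
    strict majority exists.
    """
    if not outputs:
        return None, []
    value, count = Counter(outputs.values()).most_common(1)[0]
    if count * 2 <= len(outputs):
        # A tie or worse: voting cannot say which unit is right.
        return None, []
    dissenters = sorted(u for u, v in outputs.items() if v != value)
    return value, dissenters

# Four units, one bad output: the bad unit is outvoted 3 to 1.
print(majority_vote({1: 10, 2: 10, 3: 10, 4: 99}))  # (10, [4])

# Unit 4 dropped, then a second bad output: still outvoted 2 to 1.
print(majority_vote({1: 10, 2: 10, 3: 77}))         # (10, [3])

# Only two units left and they disagree: no majority is possible.
print(majority_vote({1: 10, 2: 77}))                # (None, [])
```

Starting with four units, a vote is still possible after the first
failure, because three remain. Starting with three, the first failure
would leave two, and a disagreement between two cannot be settled by
voting.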
   
With 4 computers, we now have good protection against *hardware* failure.
But all four have the same software (programming). If there is a
catastrophic logical error (bug) in that software, it could halt all
four computers simultaneously and transform the shuttle into an inert
object. 

To guard against this contingency, a different program was written by
a separate contractor (but for the same hardware). This program is
designed in such a way that it will not be corrupted by bugs in the
primary software; if a Shuttle crew suspects a software failure, they
can activate this backup system. The alternate software resides on the
fifth computer; it offers *software* redundancy.
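
To see why identical copies do not help against a software bug, here
is a toy Python sketch. Everything in it is invented for illustration
(the functions primary_software, backup_software and run_redundant_set,
and the divide-by-zero bug); it says nothing about how the real flight
software is written.

```python
def primary_software(x):
    # Toy stand-in for the primary program, with a deliberate latent
    # bug: it fails whenever x == 0.
    return 100 // x

def backup_software(x):
    # Toy stand-in for an independently written program that happens
    # to handle x == 0 as a special case.
    return 0 if x == 0 else 100 // x

def run_redundant_set(x, copies=4):
    """Run identical copies of the primary program on the same input.

    Returns one result per copy; None marks a copy that halted.
    """
    results = []
    for _ in range(copies):
        try:
            results.append(primary_software(x))
        except ZeroDivisionError:
            results.append(None)
    return results

print(run_redundant_set(5))   # [20, 20, 20, 20]
print(run_redundant_set(0))   # [None, None, None, None]
print(backup_software(0))     # 0
```

Because all four copies run the same code on the same input, the bug
takes them all down together, and there is nothing left to vote with.
A separately written program is unlikely to share the same bug, which
is the point of the fifth computer.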

Non-CS types may find Garman's account a bit technical, but it does
give a good illustration of the pitfalls in writing and testing a large
real-time software system. While much of NASA's current equipment may
be 'old-fashioned', it has been in use long enough to make one confident
that any remaining bugs are minor. There's no substitute for testing a
complete hardware-software system under real-world conditions.

 Chris Walton
 Dept. of Computer Sciences
 University of Texas at Austin

 Internet: cbwalton@cs.utexas.edu
 UUCP:     { ... }!cs.utexas.edu!cbwalton  [I think]