cbwalton@cs.utexas.edu (Chris Walton) (10/03/88)
Summary: Hardware and Software redundancy
For a summary of the design goals that led to the Shuttle having
5 computers, as well as a detailed account of the bug that delayed
STS-1, see "The 'Bug' Heard 'Round the World" by John R. Garman
in Software Engineering Notes vol. 6, no. 5 (October 1981). At the
time Garman wrote this article he was Deputy Chief of the Spacecraft
Software Division at Johnson Space Center.
Here is a slightly condensed version of Garman's explanation of why at
least four computers are needed:
in order to satisfy a general shuttle goal of "Fail Operational -
Fail Safe" ("FO/FS") ... most components are replicated 4-deep ...
Four is the magic number for a very logical and intuitively obvious
reason: FO/FS requires full operational capability after one failure,
and a safe return capability after a second. It takes three to vote -
so it initially takes four to still be able to vote after the first
failure.
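
To make the counting argument concrete, here is a minimal sketch in Python.
It is only an illustration of "it takes three to vote" and is not a
description of the Shuttle's actual redundancy management; the function
names vote() and can_outvote_one_fault() are hypothetical, made up for this
example.

```python
from collections import Counter

def vote(outputs):
    """Return the strict-majority value among outputs, or None if none exists."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority means more than half of the units agree.
    return value if count > len(outputs) / 2 else None

def can_outvote_one_fault(n_units):
    """True if n_units healthy-or-not units can outvote a single bad one."""
    if n_units < 1:
        return False
    # Assumption for illustration: exactly one unit reports a wrong value.
    outputs = ["ok"] * (n_units - 1) + ["bad"]
    return vote(outputs) == "ok"

if __name__ == "__main__":
    # Start with 4 units; each failed unit is dropped from the voting set.
    for remaining in (4, 3, 2):
        print(remaining, "units: can outvote one fault =",
              can_outvote_one_fault(remaining))
```

With four or three units a single bad answer is outvoted; with two, the
units can only disagree, which is why the second failure leaves "fail safe"
rather than "fail operational".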
With 4 computers, we now have good protection against *hardware* failure.
But all four have the same software (programming). If there is a
catastrophic logical error (bug) in that software, it could halt all
four computers simultaneously and transform the shuttle into an inert
object.
To guard against this contingency, a different program was written by
a separate contractor (but for the same hardware). This program is
designed in such a way that it will not be corrupted by bugs in the
primary software; if a Shuttle crew suspects a software failure, they
can activate this backup system. The alternate software resides on the
fifth computer; it offers *software* redundancy.
Non-CS types may find Garman's account a bit technical, but it does
give a good illustration of the pitfalls in writing and testing a large
real-time software system. While much of NASA's current equipment may
be 'old-fashioned', it has been in use long enough to make one confident
that any remaining bugs are minor. There's no substitute for testing a
complete hardware-software system under real-world conditions.
Chris Walton
Dept. of Computer Sciences
University of Texas at Austin
Internet: cbwalton@cs.utexas.edu
UUCP: { ... }!cs.utexas.edu!cbwalton [I think]