cbwalton@cs.utexas.edu (Chris Walton) (10/03/88)
Summary: Hardware and Software redundancy

For a summary of the design goals that led to the Shuttle having 5 computers, as well as a detailed account of the bug that delayed STS-1, see "The 'Bug' Heard 'Round the World" by John R. Garman in Software Engineering Notes vol. 6, no. 5 (October 1981). At the time Garman wrote this article he was Deputy Chief of the Spacecraft Software Division at Johnson Space Center.

Here is a slightly condensed version of Garman's explanation of why at least four (4) computers are needed in order to satisfy a general shuttle goal of "Fail Operational - Fail Safe" ("FO/FS"):

... most components are replicated 4-deep ... Four is the magic number for a very logical and intuitively obvious reason: FO/FS requires full operational capability after one failure, and a safe return capability after a second. It takes three to vote - so it initially takes four to still be able to vote after the first failure.

With 4 computers, we now have good protection against *hardware* failure. But all four run the same software (programming). If there is a catastrophic logical error (bug) in that software, it could halt all four computers simultaneously and transform the shuttle into an inert object. To guard against this contingency, a different program was written by a separate contractor (but for the same hardware). This program is designed in such a way that it will not be corrupted by bugs in the primary software; if a Shuttle crew suspects a software failure, they can activate this backup system. The alternate software resides on the fifth computer; it offers *software* redundancy.

Non-CS types may find Garman's account a bit technical, but it does give a good illustration of the pitfalls in writing and testing a large real-time software system. While much of NASA's current equipment may be 'old-fashioned', it has been in use long enough to make one confident that any remaining bugs are minor.
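To make the "it takes three to vote" argument concrete, here is a minimal sketch of majority voting among redundant units. It is only an illustration of the counting argument, not the Shuttle's actual redundancy-management logic; the function names `vote` and `faulty_units` and the integer outputs are hypothetical.

```python
from collections import Counter


def vote(outputs):
    """Return the value held by a strict majority of units, or None if there is none."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


def faulty_units(outputs):
    """Return indices of units that disagree with the majority (empty if no majority)."""
    winner = vote(outputs)
    if winner is None:
        return []
    return [i for i, out in enumerate(outputs) if out != winner]


# Four units, one fails: the odd one out is identified and can be dropped.
four = [7, 7, 7, 9]
print(vote(four), faulty_units(four))

# Three units left, a second one fails: a 2-to-1 vote still isolates it.
three = [7, 7, 9]
print(vote(three), faulty_units(three))

# Two units left: a disagreement cannot be resolved by voting.
two = [7, 9]
print(vote(two), faulty_units(two))
```

Starting with four units, a vote is still possible after the first failure (three remain), which is the "Fail Operational" half; once only two remain, a disagreement shows that something is wrong but not which unit is at fault.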
There's no substitute for testing a complete hardware-software system under real-world conditions.

Chris Walton
Dept. of Computer Sciences
University of Texas at Austin
Internet: cbwalton@cs.utexas.edu
UUCP: { ... }!cs.utexas.edu!cbwalton [I think]