[net.space] Chicago Tribune Article

gd%sri-spam@sri-unix.UUCP (06/29/84)

From:  gd@sri-spam (Greg DesBrisay)




	Today's Chicago Tribune (Tues., 6/26) has an article
	about the replacement of Discovery's backup computer, 
	described as costing $1.2 million, and weighing 57
	tons.  No wonder it won't fly.



I remember seeing news pictures of a technician picking up and carrying
a box that the reporter said was the backup computer that failed on the
first launch try; it sure didn't look like 57 tons!

By the way, does the Chicago Tribune mean to imply that NASA is going to
throw that $1.2 million computer on the scrap heap?!  Even the
government tries to fix such expensive pieces of equipment and use them
again, doesn't it?


Greg DesBrisay
gd@sri-spam

lmc@denelcor.UUCP (Lyle McElhaney) (07/10/84)

No, the computer will not go into the scrap heap (yet, anyway). Much more
important (at least in the NASA way of doing things) is to discover exactly
WHY the computer failed. It will most probably be sent back to the
manufacturer to be tested to death in order to find out whether the fault
was a fluke (and why) or whether it is a design error. Unresolved failures
in a piece of man-rated hardware are very serious. If the problem is found
to be a single error in that computer alone, and can be fixed without
stressing other components, and the computer can be requalified, then it
may join the available spare parts list. Testing may well stress it beyond
its useful life, in which case it will be scrapped, but not before the secrets of
its problems have been squeezed out of it.

This is one of the major reasons why the exploration of space has been so
expensive. In a commercial operation, that computer would be tested, and if
it failed, it would be scrapped and replaced. Another $1.2 million burned.
NASA will more than likely expend far more than the $1.2 million tracing
the error. The largest part of that cost per computer, in the first place,
is the paperwork that the manufacturer has to generate on each piece of
hardware in order that a fault can be traced to the exact step where it
was introduced. Yes, this probably kept a few astronauts from being killed
in space; but it has made space exploration a very expensive proposition,
indulged in only by governments.
-- 
		Lyle McElhaney
		(hao,brl-bmd,nbires,csu-cs,scgvaxd)!denelcor!lmc

al@ames.UUCP (Al Globus) (07/13/84)

>>>>>>>>>>>>>>>>>>>>>>>>>>>
No, the computer will not go into the scrap heap (yet, anyway). Much more
important (at least in the NASA way of doing things) is to discover exactly
WHY the computer failed. It will most probably be sent back to the
manufacturer to be tested to death in order to find out whether the fault
was a fluke (and why) or whether it is a design error....

This is one of the major reasons why the exploration of space has been so
expensive. In a commercial operation, that computer would be tested, and if
it failed, it would be scrapped and replaced. Another $1.2 million burned.
NASA will more than likely expend far more than the $1.2 million tracing
the error. The largest part of that cost per computer, in the first place,
is the paperwork that the manufacturer has to generate on each piece of
hardware in order that a fault can be traced to the exact step where it
was introduced. Yes, this probably kept a few astronauts from being killed
in space; but it has made space exploration a very expensive proposition,
indulged in only by governments.
-- 
		Lyle McElhaney
>>>>>>>>>>>>>>>>>>>>>>>>>>>

As a recent spate of in-flight failures
has shown, extreme caution is needed in space to make things work.  The
margins for error are tiny and the consequences of mistakes are in the
hundred-million-dollar range or more.  Insurance money for spacecraft is drying up
and getting very expensive due to failures by PAM-D, Ariane and other upper
stages.  NASA is extremely careful because that is what it takes to make
spacecraft work.  Even the vast documentation requirements failed to note
a critical pin on Solar-Max that almost caused the mission to fail.  The
paperwork could be replaced by computer work at lower cost and greater
reliability, but leaving out the tests and documentation is asking for
megabucks down the tubes.

henry@utzoo.UUCP (Henry Spencer) (07/17/84)

> As a recent spate of in-flight failures
> has shown, extreme caution is needed in space to make things work.  The
> margins for error are tiny and the consequences of mistakes are in the
> hundred-million-dollar range or more.  Insurance money for spacecraft is drying up
> and getting very expensive due to failures by PAM-D, Ariane and other upper
> stages.  NASA is extremely careful because that is what it takes to make
> spacecraft work.  Even the vast documentation requirements failed to note
> a critical pin on Solar-Max that almost caused the mission to fail.  The
> paperwork could be replaced by computer work at lower cost and greater
> reliability, but leaving out the tests and documentation is asking for
> megabucks down the tubes.

As several projects have demonstrated, vast documentation systems are *not*
necessary for the (rare) projects that are run *right*.

A good example of this is the SR-71 Blackbird.  It's still the world's
fastest aircraft (if you don't count the Shuttle's brief reentry), and
25 years ago it was a formidable challenge.  New ground had to be broken
in a dozen areas, including metallurgy.  [I mention this because tracking
every last piece of metal is one of the reasons frequently advanced for
needing bales of paper for everything.]  Nevertheless, it got by with
several orders of magnitude less documentation than "ordinary" aircraft
projects needed, even then.  "Do not confuse effort with work."

The basic problem with the space business right now is not the lack of
still-more-detailed documentation.  It is the "everything is required to
work right the first time" attitude.  Now, don't get me wrong.  There is
nothing wrong with "we will do our best to make sure it works the first
time"; it's definitely the only way to go.  The problem is when you
start insisting that failures are not just undesirable, but unacceptable.
This means that it is impossible to do meaningful experiments, because
they might fail.  *OF COURSE* it is expensive to build, say, a Space
Shuttle, when the roof falls in if the tiniest thing goes wrong.  How
many aircraft are required to be perfect after only a handful of test
flights?  Yet the Shuttle program not only organized things this way,
it based the whole viability of the program on the notion that the
Shuttle would be fully operational almost instantly.  This is madness,
and awesomely expensive madness too.

Even in military aircraft programs, not noted for being well-managed,
it's common for the first dozen aircraft to be allocated solely to test
work, with no expectation that they will ever be useful otherwise.
Where are the test shuttles?  Please don't tell me that the orbiters
are too expensive to be used this way; this is known as "painting
yourself into a corner", and does not connote good design to me.  In
retrospect, it is clear that the Shuttle was too ambitious a project
trying to meet too many needs simultaneously.  The US would be much better
off with a large fleet of much smaller reusable spacecraft, plus big
expendable boosters for heavy-lift work.  Oh, true, the heavy-lift jobs
ought to be done with reusables, too -- EVENTUALLY.  But one must learn
to crawl before one can walk, and NASA is now paying the price for trying
to take shortcuts.  "Of course it'll work."  Sure.

Of course "extreme caution is needed... to make things work", of course
"margins for error are tiny", of course the consequences of mistakes are
severe -- because the whole system is organized on the assumption that
mistakes will never happen!  The margins for error should never have been
allowed to get that small, because Murphy's Law really does apply here,
as everywhere else.  "Even the vast documentation ... failed to note a
critical pin on Solar-Max that almost caused the mission to fail...",
and as we all know, it's a good thing for the Shuttle's credibility
that the Solar Max repair worked.  This sort of cliffhanger should not
be allowed to happen.  It's a travesty to design a spacecraft to be
repaired in-orbit by the shuttle and then forget to include an emergency
de-spin system, which would permit the thing to be despun for repair
in the presence of attitude-control failure.  It's ridiculous to set up
a repair mission which cannot adapt to the smallest problem.  My
understanding was that the docking failure was because of a spike of
fiberglass sticking up; why didn't the astronaut have clippers on hand
for coping with such things?  (Yes, I know, because the spacesuits
are too clumsy for such fine work in tricky conditions... please don't
set me off about the wretched misdesign of current spacesuits...)  It's
a credit to the cleverness of the astronauts and the people on the ground
that they managed successful completion of such a zero-defects mission
after the inevitable defects showed up.

I hope the rescue mission for the PAMmed satellites is indeed mounted.
It would be another small step towards a system that is somewhat tolerant
of unexpected difficulties.  Unless, of course, the mission is a failure
because NASA, once again, assumes that the plan is perfect and nothing
will go wrong...

I realize that I am, to some degree, slandering NASA unfairly.  They do
put a lot of attention into contingency plans and such.  But this is all
to meet *expected* troubles; building in enough flexibility to meet the
*unexpected* problems is a subtly different thing.  Sometimes NASA
pulls this off, sometimes not.  It was fortunate for the Apollo 13 crew
that some smart people insisted on making the LM computer identical to
the CM one, rather than specializing it for the lunar landing only.  It
was a potentially-disastrous inconvenience to them that nobody thought to
apply the same philosophy to the lithium-hydroxide air-purifier cartridges;
fortunately they managed to improvise around that one.

This same phenomenon has been noted in other contexts, notably military
aircraft projects:  lots of attention to known problem areas, but a firm
subconscious assumption that everything else will work, because it's
required to.  The only real solution to this is a firm emphasis on getting
real working hardware -- not computerized guesswork and theoretical
pontifications -- going *early*, so that the inevitable mistakes can be
found and fixed.  Testing must be thorough, and must be done on whole
systems, not just components!  The tests, and preferably the operational
service thereafter, must not be structured on the assumption that there
will be no failures:  failure-tolerance must be built into the plans,
not just the hardware.  Note that this implies designing the whole system
so that a single failure is neither disastrous nor astronomically expensive.
(I don't even want to *think* about the results of a Shuttle crashing.)
Everyone, especially Congress and the media, should be clearly told that
trouble is expected and is not cause for panic.  ["You say your program
still needs debugging, because you didn't write it correctly the very
first time?  Unacceptable.  You're fired."]

I know, it's easier said than done.  Especially for a US government
bureaucracy.  Best argument I've heard yet for private industry in space...
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

mcgeer%ucbkim%Berkeley@sri-unix.UUCP (07/19/84)

From:  Rick McGeer (on an h19-u) <mcgeer%ucbkim@Berkeley>

	Damn right you're being unfair.  Remember the flap in the late
fifties and early sixties when NASA did have a series of tests and did have
busted launches?  There was a howl from the public and the Congress that
could be heard from Moscow.  Given that, and the Mondale/Proxmire bills in
the seventies to kill NASA, it's not surprising that NASA feels that it
can't test and can't have any failures -- the lawyers in Congress, who don't
understand engineering design and don't want to, would cut funds in a minute
if, say, an orbiter blew up on the pad.

	Also, isn't it always the case that prototypes are more expensive
than the production version?

				Rick.