[comp.ai.neural-nets] Fault-Tolerance of NN

omlinc@cs.rpi.edu (Christian Omlin) (05/06/91)

Hi !

A few papers have appeared recently dealing with retraining (using
backpropagation) as a strategy by which feedforward NN's can recover
from faults such as neuron stuck-at faults. A few questions come to
my mind:
 
 1. Often, retraining a network is claimed to be easier (i.e. faster)
    than training the original, flawless network with small random initial
    weights. My experiments show that a network is not guaranteed to
    relearn the intended I/O mapping, i.e. a network may get trapped
    in a local minimum. Is relearning inherently easier than learning
    assuming there are enough units in the hidden layer ?

2. Suppose we can retrain a network, we are not guaranteed that the
   network exhibits the same characteristics (e.g. generalization)
   which may have been one of the criteria during the design of the NN.
   Wouldn't it be more reasonable to detect structural damage to the
   NN before it is used in an application and repair the damage ? 
   (This would require some method for detecting such faults.)

3. Giving a NN a retraining capability certainly requires
   additional hardware and information about the training set. How
   big is the additional hardware cost of a NN with retraining
   capability as opposed to a non-retrainable NN ?

4. It seems fault-tolerance is not an inherent property of NNs; rather,
   they have to be designed with fault-tolerance in mind. There seem
   to be two possibilities for improving the fault-tolerant behavior:
   changes in the architecture and changes in the training procedure.
   Which of the two is more effective ? 
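For concreteness, the setup behind question 1 can be sketched as follows
(a minimal numpy sketch of a hypothetical experiment, not my actual code:
a small sigmoid net trained on XOR, then one hidden unit forced to
stuck-at-0 and the surviving weights retrained; as noted above, neither
phase is guaranteed to converge):

```python
import numpy as np

# Hypothetical experiment: train a 2-3-1 sigmoid network on XOR with plain
# backpropagation, then force a stuck-at-0 fault on one hidden unit and
# retrain the surviving weights. Neither phase is guaranteed to converge.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(W1, b1, W2, b2, stuck=None, epochs=20000, lr=0.5):
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)
        if stuck is not None:
            h[:, stuck] = 0.0              # neuron stuck-at-0 fault
        y = sigmoid(h @ W2 + b2)
        dy = (y - T) * y * (1 - y)         # output-layer delta
        dh = (dy @ W2.T) * h * (1 - h)     # hidden-layer delta
        if stuck is not None:
            dh[:, stuck] = 0.0             # dead unit passes no gradient
        W2 -= lr * h.T @ dy; b2 -= lr * dy.sum(0)
        W1 -= lr * X.T @ dh; b1 -= lr * dh.sum(0)
    return 0.5 * np.sum((y - T) ** 2)      # final sum-squared error

W1 = rng.normal(0, 0.5, (2, 3)); b1 = np.zeros(3)
W2 = rng.normal(0, 0.5, (3, 1)); b2 = np.zeros(1)
e_clean = train(W1, b1, W2, b2)            # train the flawless network
e_fault = train(W1, b1, W2, b2, stuck=0)   # retrain after the fault
print(e_clean, e_fault)
```

Comparing e_clean and e_fault over many random seeds is one way to ask
whether relearning is easier than learning.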

Any comments are appreciated.

Christian

----------------------------------------------------------------------------
Christian W. Omlin			

office:                                 home:
Computer Science Department             Foxberry Farm
Amos Eaton 119                          Box 332, Route #3
Rensselaer Polytechnic Institute        Averill Park, NY 12018
Troy, NY 12180 USA                      (518) 766-5790
(518) 276-2930                        

e-mail: omlinc@turing.cs.rpi.edu
----------------------------------------------------------------------------

arms@cs.UAlberta.CA (Bill Armstrong) (05/07/91)

omlinc@cs.rpi.edu (Christian Omlin) writes:

>Hi !

>A few papers have appeared recently dealing with retraining (using
>backpropagation) as a strategy by which feedforward NN's can recover
>from faults such as neuron stuck-at faults. A few questions come to
>my mind:
> 
> 1. Often, retraining a network is claimed to be easier (i.e. faster)
>    than training the original, flawless network with small random initial
>    weights. My experiments show that a network is not guaranteed to
>    relearn the intended I/O mapping, i.e. a network may get trapped
>    in a local minimum. Is relearning inherently easier than learning
>    assuming there are enough units in the hidden layer ?

I think one could answer the question in each instance by trying to
determine whether the damaged network even has the capacity for
repair.  Let us suppose that the stuck-at signal has to be eliminated
(since it is now useless); then the question is: can some other input
replace it?  If there are several inputs feeding into the same
elements as it does, then maybe some of them have almost zero weight,
and could be recruited to perform the correction.  To do that, they
would have to already be connected at least indirectly to the network
inputs that the damaged part needs.  That seems like a fairly easy way
to correct.

Now, if there were only two inputs to the node the damaged one feeds
into, then the task which was previously done by two inputs must be
done by one.  That one may not even have connections to the right
inputs.  Hence the correction would likely have to come at a higher
level in the tree.  But then, the original stuck-at signal will have
passed through a sigmoid in a weighted combination with other inputs,
so the correction is no longer a "linear" matter, where one just
subtracts the erroneous signal and adds a correct one.  In fact, if
the stuck-at value shifts the value going into a sigmoid far away from
the centre, ALL of the inputs to that sigmoid will have a much smaller
dynamic range, and hence themselves would have to be "corrected"!  So in
this case, the correction might require a complete reorganization of
the computation.
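Numerically, the saturation effect is easy to check: the sigmoid's slope
collapses once the pre-activation is pushed away from the centre (the
shift of +5 below is illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsigmoid(z):                 # slope of the sigmoid at pre-activation z
    s = sigmoid(z)
    return s * (1.0 - s)

print(dsigmoid(0.0))   # 0.25, the maximal slope, at the centre
print(dsigmoid(5.0))   # ~0.0066: a +5 stuck-at shift kills the dynamic range
```

So all inputs sharing that sigmoid lose roughly a factor of forty in
effective gain, which is what forces the reorganization.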

>2. Suppose we can retrain a network, we are not guaranteed that the
>   network exhibits the same characteristics (e.g. generalization)
>   which may have been one of the criteria during the design of the NN.
>   Wouldn't it be more reasonable to detect structural damage to the
>   NN before it is used in an application and repair the damage ? 
>   (This would require some method for detecting such faults.)

The generalization could be approached by looking at the product of
the weights and derivatives of sigmoids through which the stuck-at
signal passes.  The greater this product, the greater is the
distortion of the network output caused by the stuck-at fault.  Hence,
I think you are right about trying to detect structural damage before
use.
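A sketch of that sensitivity measure (all names here are illustrative,
not from any particular paper): multiply, along the path from the faulty
unit to the output, each weight by the derivative of the sigmoid the
signal passes through.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def path_sensitivity(weights, preacts):
    """weights[k]: the weight on the path at layer k; preacts[k]: the
    pre-activation of the sigmoid the signal passes through there."""
    sens = 1.0
    for w, z in zip(weights, preacts):
        s = sigmoid(z)
        sens *= w * s * (1.0 - s)   # weight times sigmoid derivative
    return abs(sens)

# A path through two layers: weights 3.0 and 2.0, both sigmoids near centre
print(path_sensitivity([3.0, 2.0], [0.1, -0.2]))
```

The larger this product, the more a stuck-at fault on that path distorts
the output, so it is one candidate for a pre-deployment damage check.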

>3. Giving a NN a retraining capability certainly requires
>   additional hardware and information about the training set. How
>   big is the additional hardware cost of a NN with retraining
>   capability as opposed to a non-retrainable NN ?

It would appear from the above that having a lot of redundant
connections could make retraining easier.  Unfortunately, this might
not lead to as good generalization in the first place!  If the weights
of the connections are not zero, then this would also slow down the
network in its undamaged state.  So it is not only the cost of
hardware, but the damage to the speed of execution.

>4. It seems fault-tolerance is not an inherent property of NNs; rather,
>   they have to be designed with fault-tolerance in mind. There seem
>   to be two possibilities for improving the fault-tolerant behavior:
>   changes in the architecture and changes in the training procedure.
>   Which of the two is more effective ? 

It seems to me that fault-tolerance is about the same as insensitivity
to input data which contains a small number of values that are way out
of line.  In the case of NNs, a value which is way out of line can still
have an effect because of large weights along paths to the output.

I can contrast that with the adaptive logical networks which I work
on: a stuck-at logical signal x either gets through an AND gate (or an
OR gate) or it doesn't.  For example, if a different input y to the
said AND gate is 0, then the signal x will have absolutely no effect
on the output of the AND, which is still 0.
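In code, the masking is trivial to see: with the other input held at 0,
the healthy and the faulty signal produce identical outputs.

```python
def and_gate(x, y):
    return int(bool(x) and bool(y))

y = 0                          # the other input to the AND gate is 0
for x_true in (0, 1):
    x_stuck = 1                # signal x stuck at 1
    print(and_gate(x_true, y), and_gate(x_stuck, y))   # both 0: fault masked
```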

If one could limit the size of the weights in a
backpropagation-trained network, it is possible one could get
some fault tolerance; and if one used, instead of sigmoids,
functions that are constant outside the mid-range, that would
localize the damage.
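A sketch of both ideas (the weight bound and the piecewise-linear
squashing function below are illustrative choices, not anything
standard):

```python
import numpy as np

W_MAX = 2.0                        # illustrative weight bound

def clip_weights(W):               # apply after each weight update
    return np.clip(W, -W_MAX, W_MAX)

def hard_sigmoid(z):
    # constant outside the mid-range: 0 for z <= -1, 1 for z >= 1
    return np.clip(0.5 * (z + 1.0), 0.0, 1.0)

z = np.array([-10.0, -0.5, 0.0, 0.5, 10.0])
print(hard_sigmoid(z))             # 0, 0.25, 0.5, 0.75, 1
```

A wild stuck-at value then saturates the unit it feeds, instead of
dragging the whole downstream computation out of range.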

>Any comments are appreciated.

A lot of the above is conjecture, not based on experience.  But I
think it tends to help explain your experience.


--
***************************************************
Prof. William W. Armstrong, Computing Science Dept.
University of Alberta; Edmonton, Alberta, Canada T6G 2H1
arms@cs.ualberta.ca Tel(403)492 2374 FAX 492 1071

aam9n@helga2.acc.Virginia.EDU (Ali Ahmad Minai) (05/07/91)

In article <lx6gjll@rpi.edu> omlinc@cs.rpi.edu (Christian Omlin) writes:
>Hi !
>
>A few papers have appeared recently dealing with retraining (using
>backpropagation) as a strategy by which feedforward NN's can recover
>from faults such as neuron stuck-at faults. A few questions come to
>my mind:
> 
> 1. Often, retraining a network is claimed to be easier (i.e. faster)
>    than training the original, flawless network with small random initial
>    weights. My experiments show that a network is not guaranteed to
>    relearn the intended I/O mapping, i.e. a network may get trapped
>    in a local minimum. Is relearning inherently easier than learning
>    assuming there are enough units in the hidden layer ?

I have not done any experiments, but it seems to me that relearning should
not *always* be easier after neuron faults. Intuitively, it is a question
of how much difference the fault makes to the error surface. Now, this will
vary depending on how close the faulty neuron is to the output. If it is in
the last hidden layer, it will typically have a more direct (and probably
greater) impact on the error surface (which is defined in terms of network
outputs). If the faulty neuron is in a lower layer, the effect of its failure
will be statistically averaged out to some extent, especially if the network
is large. Even if the neuron does belong to the last hidden layer, the impact
of its failure will depend on its "relevance" to the output layer, which, to
some extent, is a direct function of the magnitude of weights linking it to
the output neurons (scaled by the links from other neurons in the last hidden
layer). If the failed neuron was strongly linked to the network outputs, its
failure will cause a great change in the error surface (somewhat analogous to
a "flattening out" along all directions corresponding to the input weights
of the failed neuron). While this might create local minima, it is likelier
to turn minima into ravines (or so it seems to me, because now changing some
of the weights has no impact on the error).
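The flat ("ravine") directions can be seen directly: once a hidden unit
is stuck at 0, the error gradient with respect to that unit's input
weights vanishes. A small numpy sketch (shapes and data are illustrative;
the output layer is kept linear for simplicity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 4))              # illustrative inputs
T = rng.normal(size=(8, 1))              # illustrative targets
W1 = rng.normal(size=(4, 3))             # input-to-hidden weights
W2 = rng.normal(size=(3, 1))             # hidden-to-output weights

h = sigmoid(X @ W1)
h[:, 0] = 0.0                            # hidden unit 0 stuck at 0
y = h @ W2                               # linear output layer
dh = ((y - T) @ W2.T) * h * (1 - h)      # backpropagated hidden delta
dh[:, 0] = 0.0                           # stuck unit transmits no gradient
grad_W1 = X.T @ dh                       # gradient of the squared error
print(grad_W1[:, 0])                     # all zeros: flat directions
```

Changing W1[:, 0] cannot affect the output at all, so the error surface
is exactly flat along those directions.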


>2. Suppose we can retrain a network, we are not guaranteed that the
>   network exhibits the same characteristics (e.g. generalization)
>   which may have been one of the criteria during the design of the NN.
>   Wouldn't it be more reasonable to detect structural damage to the
>   NN before it is used in an application and repair the damage ? 
>   (This would require some method for detecting such faults.)

In a fault-tolerance sense, it would indeed be desirable to locate and repair
the fault. However, one of the reasons why neural nets are so attractive is
the promise of *immunity* to errors in the presence of faults, i.e. the
system can continue to function adequately even with undetected, unrepaired
faults. And, of course, as you have larger and larger networks, detecting and
repairing faults will become trickier (though the impact of individual faults
will also diminish). There has been some interesting work on networks that
automatically detect faults and reconfigure. Offhand, I can refer you to two
papers:

"Fault-Tolerant Neural Networks with Hybrid Redundancy", by Lon-Chan Chu
and B.W. Wah, IJCNN-90, San Diego, vol II, pp 639-649

"Trellis Codes, Receptive Fields, and Fault-Tolerant, Self-Repairing Neural
Networks", by T. Petsche and B.W. Dickinson, IEEE Trans. on Neural Networks,
vol. 1, no. 2, pp 154-166

I'm sure there are other papers too, and I would certainly appreciate any
references etc. that others might have.

>3. Giving a NN a retraining capability certainly requires
>   additional hardware and information about the training set. How
>   big is the additional hardware cost of a NN with retraining
>   capability as opposed to a non-retrainable NN ?

I'll leave that to others who have thought about the problem, but ideally,
retraining should only require more time. The data, presumably, is either
stored or continually coming in.


>4. It seems fault-tolerance is not an inherent property of NNs; rather,
>   they have to be designed with fault-tolerance in mind. There seem
>   to be two possibilities for improving the fault-tolerant behavior:
>   changes in the architecture and changes in the training procedure.
>   Which of the two is more effective ? 

I would say: modifying training methods to force the emergence of robust
architectures. I have some half-baked ideas along these lines, but would
like to try them out before opening my mouth. Briefly, I think that networks
with well-distributed representational responsibility should be more
inherently fault-tolerant than those with localized responsibility. We
can add constraints during the training process that force the emergence
of distributed representations, thus achieving greater fault-tolerance.
Also, adding faults during learning could help, as evidenced by:

"Fault-Tolerance in Artificial Neural Networks", by Carlo H. Sequin and
Reed D. Clay, IJCNN-90, San Diego, vol. I, pp 703-708.
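A sketch of fault injection during training, in the spirit of that paper
(the details below are my own illustrative choices, not theirs): on each
pass, one randomly chosen hidden unit is stuck at 0, which pushes the
network toward a more distributed representation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
n_hidden = 6
W1 = rng.normal(0, 0.5, (2, n_hidden))
W2 = rng.normal(0, 0.5, (n_hidden, 1))

lr = 0.5
for epoch in range(5000):
    dead = rng.integers(n_hidden)      # inject one stuck-at-0 fault
    h = sigmoid(X @ W1)
    h[:, dead] = 0.0
    y = sigmoid(h @ W2)
    dy = (y - T) * y * (1 - y)         # output delta
    dh = (dy @ W2.T) * h * (1 - h)     # hidden delta
    dh[:, dead] = 0.0                  # no gradient through the dead unit
    W2 -= lr * h.T @ dy
    W1 -= lr * X.T @ dh
```

Since no single hidden unit can be relied upon during training, the
learned solution should survive any single stuck-at-0 fault better.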

Regards,

Ali Minai
aam9n@Virginia.EDU