[comp.ai.neural-nets] back propagation problems....

rho@maccs.dcss.mcmaster.ca (Raymond Ho) (11/10/89)

Mr. Lau,

	Have you received my message through e-mail? I was just wondering
if I sent it to the right address.

							Raymond Ho
							McMaster University

ps2x+@andrew.cmu.edu (Peter John Skelly) (11/10/89)

I performed similar experiments and had similar problems with the
network forgetting previously learned data.  I was able to make it
perform differently (sometimes better, sometimes not) by changing the
transfer function.  I was at first using just a basic step function, but
better results were obtained with a sigmoid, or a sigmoid
approximation.
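The usual reason the step function gives backprop trouble is that its
derivative is zero everywhere it is defined, so no error signal can
propagate back through it; the sigmoid is smooth and has a convenient
derivative. A minimal Python sketch (illustrative, not my actual code):

```python
import math

def step(x):
    # Non-differentiable at 0, derivative 0 everywhere else:
    # backprop gets no gradient through this unit.
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    # Smooth, differentiable everywhere, output in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(y):
    # Derivative expressed in terms of the output y = sigmoid(x),
    # which is what makes it cheap inside backprop.
    return y * (1.0 - y)
```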
-Pete Skelly

cyusta@taux01.UUCP ( Yuval Shahar) (11/12/89)

>-------------------------------------------------- Lik Alaric Lau writes:
> I have written a neural network simulation program in Pascal. This
> simulation uses back propagation as training algorithm. However, when the
> network is trained to recognize more than 1 training pairs, it tends to
> "forget" the previous training sets.

   This is something I am experiencing myself right now. The problem seems to
me to arise from the nature of the gradient descent: the error terms are
calculated from the gradient of the error function Ep(W), where W is the set
of weights and Ep is the error for the presentation of the p'th exemplar to
be learned. The total error E for a set of P exemplars is therefore sum(Ep)
for p=1..P. To perform a gradient descent in E you therefore may not update
the weights after each presentation: if you do, you actually perform a
gradient descent in each Ep separately, and so each exemplar is learned but
then forgotten as you make the corrections for the next one.
   The PDP book (chapter 8, I think) comments that the weights may be changed
after each presentation if the learning factor, mu, is small enough, as this
will be "close enough" to a gradient descent in E.
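To make the distinction concrete, here is a minimal sketch (in Python
rather than the original Pascal) of true gradient descent in E for a
single sigmoid unit: the gradient is accumulated over all exemplars
before any weight moves. The OR patterns, learning rate, and epoch count
are illustrative, not taken from anyone's actual program:

```python
import math
import random

def train_batch(patterns, lr=0.5, epochs=5000):
    """One sigmoid unit trained by gradient descent in E = sum(Ep):
    the gradient is summed over ALL exemplars before any update."""
    n = len(patterns[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]  # w[-1] is the bias
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for x, t in patterns:
            net = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            y = 1.0 / (1.0 + math.exp(-net))
            delta = (t - y) * y * (1.0 - y)   # delta-rule error term for Ep
            for i in range(n):
                grad[i] += delta * x[i]       # accumulate -- do not update yet
            grad[-1] += delta
        for i in range(n + 1):
            w[i] += lr * grad[i]              # one step down the gradient of E
    return w

def predict(w, x):
    net = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-net))
```

Moving the weight update inside the inner loop (w[i] += lr * delta * x[i]
after each exemplar) turns this into a descent in each Ep separately,
which is exactly where the learn-and-forget behaviour comes from unless
lr is kept small.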
   The results I'm seeing are disappointing to me. I have tried updating the
weights after each presentation, after presenting a set of exemplars, and even
after each exemplar but with a delta-rule which is updated so that it still
performs a true gradient descent in E. The net does learn a set of exemplars
sometimes, but more often than not, it converges to a really bad local minimum
for a set (the equivalent of learning and forgetting), and I am not talking
about big sets here (actually, a set with more than one exemplar is enough :-)).
   Is this the true nature of backprop, or is there more to it?