rho@maccs.dcss.mcmaster.ca (Raymond Ho) (11/10/89)
Mr. Lau, have you received my message through e-mail? I was just wondering if I sent it to the right address. Raymond Ho, McMaster University
ps2x+@andrew.cmu.edu (Peter John Skelly) (11/10/89)
I performed similar experiments and had similar problems with the network forgetting previously learned data. I was able to make it perform differently (sometimes better, sometimes not) by changing the transfer function. I was at first using just a basic step function, but better results were obtained by using a sigmoid, or a sigmoid approximation. -Pete Skelly
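The point about the transfer function can be sketched as follows (a minimal illustration, not code from anyone's actual simulator): the step function has zero derivative almost everywhere, so the back-propagated error carries no gradient information, while the sigmoid is smooth and has a simple derivative.

```python
import math

def step(x):
    # Hard threshold: non-differentiable at 0 and flat elsewhere, so the
    # gradient used by backprop is zero almost everywhere.
    return 1.0 if x >= 0.0 else 0.0

def sigmoid(x):
    # Smooth transfer function: f(x) = 1 / (1 + e^-x).
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(y):
    # Derivative expressed in terms of the output y = sigmoid(x):
    # f'(x) = y * (1 - y). This factor scales the back-propagated error.
    return y * (1.0 - y)
```

The derivative `y * (1 - y)` is largest near y = 0.5 and shrinks toward the saturated ends, which is one reason sigmoid units train more gracefully than hard thresholds.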
cyusta@taux01.UUCP ( Yuval Shahar) (11/12/89)
Lik Alaric Lau writes:
> I have written a neural network simulation program in Pascal. This
> simulation uses back propagation as the training algorithm. However, when
> the network is trained to recognize more than one training pair, it tends
> to "forget" the previous training sets.

This is something I am experiencing myself now. The problem seems to me to arise from the nature of the gradient descent: the error terms are calculated from the gradient of the error function Ep(W), where W is the set of weights and Ep is the error for the presentation of the p'th exemplar to be learned. The total error E for a set of P exemplars is therefore E = sum(Ep) for p = 1..P.

In order to perform a gradient descent in E, it is clear you may not update the weights after each presentation. If you do, you actually perform a gradient descent in each Ep separately, and thus each exemplar will be learned but then forgotten as you perform the corrections for the next exemplar. PDP (chapter 8, I think) comments that the weights may be changed after each presentation if the learning factor mu is small enough, as this will be "close enough" to a gradient descent in E.

The results I'm seeing are disappointing. I have tried updating the weights after each presentation, after presenting the whole set of exemplars, and even after each exemplar but with a delta rule modified so that it still performs a true gradient descent in E. The net does sometimes learn a set of exemplars, but more often than not it converges to a really bad local minimum for the set (the equivalent of learning and forgetting), and I am not talking about big sets here (actually, a set with more than one exemplar is enough :-)). Is this the true nature of backprop, or is there more to this??
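The distinction drawn above between per-pattern and true batch descent can be sketched for a single linear unit with squared error (all names are illustrative; this is not the original Pascal program):

```python
# Batch vs. per-pattern gradient descent for one linear unit trained on
# squared error Ep = (out - target)^2 / 2. "mu" is the learning factor.

def batch_update(weights, patterns, mu):
    # True gradient descent in E = sum(Ep): accumulate the gradient over
    # the whole exemplar set, then apply one weight change.
    grad = [0.0] * len(weights)
    for inputs, target in patterns:
        out = sum(w * x for w, x in zip(weights, inputs))
        err = out - target
        for i, x in enumerate(inputs):
            grad[i] += err * x
    return [w - mu * g for w, g in zip(weights, grad)]

def online_update(weights, patterns, mu):
    # Per-pattern descent: each exemplar pulls the weights toward its own
    # minimum of Ep, which can undo the previous exemplar's learning
    # unless mu is small.
    for inputs, target in patterns:
        out = sum(w * x for w, x in zip(weights, inputs))
        err = out - target
        weights = [w - mu * err * x for w, x in zip(weights, inputs)]
    return weights
```

In the batch version, one sweep through the exemplars produces a single step down the gradient of the summed error E; in the online version, each exemplar takes its own step, which only approximates descent in E when mu is small, as the PDP chapter notes.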