al@gtx.com (Alan Filipski) (12/08/89)
Suppose a net has been trained (e.g., using backprop) on several thousand samples and a few more samples come along. One could retrain the net using all (several thousand + few) samples, or one could somehow modify the weights using only the few new samples. The risk of the latter approach is that the gain in learning about the few new samples may be outweighed by degraded performance on the old samples. The advantage, of course, is that all the old samples do not have to be retained.

To look at a simple analog, suppose we have an estimate of the mean of 1000 numbers, and then 3 more numbers come in. It is easy to compute the mean of all 1003 numbers without seeing the first 1000 again -- their mean, along with the count of how many there were, provides a sufficient statistic for updating the mean.

Does anyone know of any work on updating neural nets to account for new samples without completely retraining them?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
( Alan Filipski, GTX Corp, 8836 N. 23rd Avenue, Phoenix, Arizona 85021, USA )
(    {decvax,hplabs,uunet!amdahl,nsc}!sun!sunburn!gtx!al      (602)870-1696 )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
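The running-mean analog above can be made concrete. A short sketch of the update (function name and example values are my own, for illustration):

```python
# Incremental mean update: (mean, count) is a sufficient statistic,
# so the first 1000 numbers never need to be seen again.
def update_mean(old_mean, old_count, new_samples):
    """Fold new samples into a running mean without the old data."""
    total = old_mean * old_count + sum(new_samples)
    new_count = old_count + len(new_samples)
    return total / new_count, new_count

# Example: the mean of 1000 numbers was 5.0; three new numbers arrive.
mean, n = update_mean(5.0, 1000, [6.0, 7.0, 8.0])
# mean is (5.0 * 1000 + 21.0) / 1003
```

The open question in the post is precisely whether the weights of a trained net admit any comparably compact sufficient statistic.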
danforth@riacs.edu (Douglas G. Danforth) (12/09/89)
]From: al@gtx.com (Alan Filipski)
]Subject: updating neural nets
]
]Does anyone know of any work on updating neural nets to account for new
]samples without completely retraining them?

Consider a 3-layer network where the weights between layers 1 & 2 (input to hidden) are frozen (this is the case for certain forms of associative memory). All of the subsequent learning then takes place in the adjustment of the hidden-to-output weights. If the task at hand is one of CLASSIFICATION, then either a 1-out-of-n or orthogonal encoding of the class labels can be used.
For any input pattern a subset of the hidden nodes will be activated (assuming a step function, or almost a step function, for the activation rule). The code for the class label written into each of these activated hidden nodes (increment or decrement the weights leading to the activated hidden node commensurate with the label encoding scheme) will then simply be a COUNT of the number of times that class label occurred with the activation of that hidden node.

With the above view we see that hidden nodes can be interpreted as LOCAL estimators of the conditional probability density for each class, where we are conditioning on some region of the input space. The pooled and summed weights from the activated nodes for a given input pattern then give the best estimate of which class is dominant in that region of the input space. This is very close to a form of Bayes estimation. Note that with a fixed increment or decrement scheme for the weights during training, the order of presentation does not matter, since only frequency counts of class labels are being retained.

The problem of destroying the state of the system by further training is avoided by the above approach; it is governed by the same conditions that govern statistical sampling (large samples give more robust estimators). Systems that use approaches similar to this can be found in the work of James Albus (CMAC) and Pentti Kanerva (Sparse Distributed Memory), and the approach is related to some aspects of the cerebellum, as pointed out by David Marr.

--
Douglas G. Danforth               danforth@riacs.edu
NASA Ames Research Center
Moffett Field, CA 94035
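A minimal sketch of the counting scheme described above, assuming a frozen random input-to-hidden layer with step activations (the layer sizes, random projection, and function names are my own assumptions, not from CMAC or SDM specifically):

```python
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_HIDDEN, N_CLASSES = 4, 64, 3  # sizes are arbitrary for the sketch

# Frozen random input-to-hidden weights: never trained.
W_in = rng.standard_normal((N_HIDDEN, N_IN))

def active_nodes(x):
    """Step activation: the subset of hidden nodes 'on' for input x."""
    return (W_in @ x) > 0.0

# Hidden-to-output weights are just per-(node, class) frequency counts.
counts = np.zeros((N_HIDDEN, N_CLASSES))

def train(x, label):
    """Increment the label's count at every activated hidden node.
    Counts are commutative, so order of presentation does not matter
    and new samples fold in without revisiting the old ones."""
    counts[active_nodes(x), label] += 1.0

def classify(x):
    """Pool (sum) the counts of the activated nodes; the dominant
    class in that region of the input space wins."""
    return int(np.argmax(counts[active_nodes(x)].sum(axis=0)))
```

Because `train` only increments counters, the few new samples in the original question can be absorbed with no access to the old ones and no risk of unlearning them, which is exactly the property being asked for.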
chrisley@kanga.uucp (Ron Chrisley UNTIL 10/3/88) (12/09/89)
From a previous message:

> Does anyone know of any work on updating neural nets to account for new
> samples without completely retraining them?
> ( Alan Filipski, GTX Corp, Phoenix, Arizona )
______________________________________________

Yes. This was one of the things Geoff Hinton was trying to do with fast and slow weights. I think he has a paper, "Using Fast Weights to Deblur Old Memories," that appeared in the Proceedings of the Cognitive Science Society, 1987. Is there a better reference for this paper?

Ron Chrisley
chrisley.pa@xerox.com
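For readers unfamiliar with the idea: a very loose sketch of the fast/slow-weight decomposition (the decomposition only, not Hinton's actual training procedure; all names and learning rates here are my own illustrative assumptions):

```python
# Each connection carries w = w_slow + w_fast. The fast component
# learns with a large step and decays toward zero, so recent samples
# dominate temporarily while the slow component retains the
# long-term solution learned from the old samples.
class FastSlowWeight:
    def __init__(self, slow_lr=0.01, fast_lr=0.5, fast_decay=0.9):
        self.w_slow = 0.0
        self.w_fast = 0.0
        self.slow_lr = slow_lr        # small: long-term knowledge
        self.fast_lr = fast_lr        # large: absorbs new samples quickly
        self.fast_decay = fast_decay  # fast weights fade between updates

    @property
    def w(self):
        return self.w_slow + self.w_fast

    def update(self, gradient):
        """One gradient step: fast weight decays, then both move."""
        self.w_fast = self.fast_decay * self.w_fast - self.fast_lr * gradient
        self.w_slow -= self.slow_lr * gradient
```

The point of the decomposition for the original question: a burst of training on a few new samples lands mostly in the fast weights, which later decay, leaving the slow weights (and hence the old memories) largely intact.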