mderksen@sci.kun.nl (M. Derksen) (05/14/91)
Dear neural netters,

Can someone explain to me why an additional noisy input feature results in
better prediction performance (generalization)?

I've added an extra input unit to a multilayer feedforward network.

1. In the training phase, a noisy signal (Gaussian distribution with zero
   mean and variable standard deviation) is fed into the extra input unit.
2. In the test phase, a signal value of zero is fed into this unit.

I've done some experiments and obtained significantly better prediction
performance. I came upon this 'improvement' a few months ago, when there
was a discussion on the net about adding a noise vector to the patterns in
the training set.

Marco.

###############################################################################
#                                                                             #
# Catholic University of Nijmegen      Ing. M.W.J. Derksen                    #
# Laboratory for Analytical Chemistry  Tel: 080-653158                        #
# Faculty of Science                   Fax: 080-652653                        #
# Toernooiveld 1                       Telex: 48228 wina nl                   #
# 6525 ED Nijmegen, the Netherlands    E-mail: mderksen@sci.kun.nl            #
#                                                                             #
###############################################################################
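A minimal sketch of the scheme Derksen describes, assuming a one-hidden-layer
network trained by backpropagation on a toy two-class problem. The data, layer
sizes, learning rate, and noise level here are illustrative assumptions, not
his actual setup:

    # Sketch: extra input carries N(0, sigma) noise in training, zero at test.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: two Gaussian clusters in 2-D, labels 0/1.
    X = np.vstack([rng.normal(-1, 0.5, (100, 2)), rng.normal(1, 0.5, (100, 2))])
    y = np.concatenate([np.zeros(100), np.ones(100)]).reshape(-1, 1)

    n_in, n_hid, sigma, lr = 3, 8, 0.3, 0.5    # input 3 is the noise unit
    W1 = rng.normal(0, 0.5, (n_in, n_hid)); b1 = np.zeros(n_hid)
    W2 = rng.normal(0, 0.5, (n_hid, 1));    b2 = np.zeros(1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for epoch in range(2000):
        # Training phase: the extra unit carries fresh N(0, sigma) noise.
        Xa = np.hstack([X, rng.normal(0, sigma, (len(X), 1))])
        h = sigmoid(Xa @ W1 + b1)               # 1st hidden layer outputs
        out = sigmoid(h @ W2 + b2)              # network output
        d_out = (out - y) * out * (1 - out)     # MSE + sigmoid derivative
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out / len(X);  b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * Xa.T @ d_h / len(X);   b1 -= lr * d_h.mean(axis=0)

    # Test phase: the extra unit is clamped to zero.
    Xt = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(1, 0.5, (50, 2))])
    yt = np.concatenate([np.zeros(50), np.ones(50)]).reshape(-1, 1)
    Xta = np.hstack([Xt, np.zeros((len(Xt), 1))])
    pred = sigmoid(sigmoid(Xta @ W1 + b1) @ W2 + b2) > 0.5
    print("test accuracy:", (pred == (yt > 0.5)).mean())

Varying sigma and watching test accuracy reproduces the kind of experiment
described above.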
dyes@convex.convex.COM (Tim Dyes) (05/15/91)
In article <3559@wn1.sci.kun.nl>, mderksen@sci.kun.nl (M. Derksen) writes:
|> Dear neural netters,
|>
|> Can someone explain to me why an additional noisy input feature results
|> in better prediction performance (generalization)?

Try this on for size...

Adding a noisy input feature in effect adds a noise term to the input of
each node within the 1st hidden layer. Successful training forces the
network to adjust its weights to overcome this noise level. That will
result in more distance between the 1st hidden layer's output vectors that
correspond to different network outputs. This in effect raises the 3rd
layer's ability to discriminate among the 2nd layer's outputs, and so
allows it to classify an input pattern more correctly.

- Tim Dyes
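Dyes's first observation can be checked directly from the usual weighted-sum
hidden unit: the extra input contributes an additive term w_noise * eps to
every first-hidden-layer pre-activation. A small numerical illustration,
where all names and sizes are assumptions:

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(0, 1, 4)              # ordinary input features
    W = rng.normal(0, 1, (5, 4))         # hidden weights on real inputs
    w_noise = rng.normal(0, 1, 5)        # hidden weights on the noise unit
    eps = rng.normal(0, 0.3)             # extra unit's value during training

    pre_clean = W @ x                    # pre-activations without the unit
    pre_noisy = W @ x + w_noise * eps    # every unit shifted by w_noise*eps
    print(pre_noisy - pre_clean)         # == w_noise * eps, per hidden unit

Training that succeeds despite this per-unit shift must place the hidden
representations of different classes far enough apart to survive it.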
jones@nprdc.navy.mil (David Ryan-Jones) (05/15/91)
In article <1991May14.182932.21193@convex.com> dyes@convex.convex.COM (Tim Dyes) writes:
[stuff deleted]
>Adding a noisy input feature in effect adds a noise term to the input of
>each node within the 1st hidden layer. Successful training forces the
>network to adjust its weights to overcome this noise level.

The effect of adding noise to the input upon generalization is very
interesting. I have noticed in my own research that I can improve
generalization to the test set by about 10% if I add a very small amount
of Gaussian noise to each of the input variables.

Has anyone else reading this newsgroup been able to improve generalization
to a greater degree (say 30-40%) by this technique? Does this technique
work by reducing the degree to which the network "learns" the data in the
training set instead of the underlying relationship between input and
output? Are there any studies, published as tech reports or journal
articles, that have investigated this effect?

Thanks,
David Ryan-Jones
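A hedged sketch of the jittering Ryan-Jones describes: a fresh, small
Gaussian perturbation of every input variable on each pass through the
training set, with clean inputs at test time. The model here (plain logistic
regression trained by gradient descent), the toy data, and the noise level
are illustrative assumptions, not his setup:

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy data: two overlapping Gaussian clusters, labels 0/1.
    X = np.vstack([rng.normal(-1, 1.0, (100, 2)), rng.normal(1, 1.0, (100, 2))])
    y = np.concatenate([np.zeros(100), np.ones(100)])

    w, b, lr, sigma = np.zeros(2), 0.0, 0.1, 0.1

    for epoch in range(500):
        Xn = X + rng.normal(0.0, sigma, X.shape)   # jitter every feature
        p = 1.0 / (1.0 + np.exp(-(Xn @ w + b)))    # predicted P(class 1)
        grad = p - y                               # logistic-loss gradient
        w -= lr * Xn.T @ grad / len(X)
        b -= lr * grad.mean()

    # Test on clean (un-jittered) inputs.
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    print("clean-input accuracy:", ((p > 0.5) == (y > 0.5)).mean())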
arms@cs.UAlberta.CA (Bill Armstrong) (05/16/91)
In article <3559@wn1.sci.kun.nl>, mderksen@sci.kun.nl (M. Derksen) writes:
> Dear neural netters,
>
> Can someone explain to me why an additional noisy input feature results
> in better prediction performance (generalization)?

I think the effect of adding noise to the input variables is really not an
effect on the learning system, but rather just a change to the training
set. The new one has some points in it obtained by "extrapolating" from a
given input vector to one near it (the "noisy" one). The original vector
and the noisy one are given the same output.

From there, one can modify the procedure to be even more general: look at
small groups of training points in close proximity and generate a new
vector by linear regression, for example. One could also perform a
K-nearest-neighbors (KNN) classification and use that to train the network.

The above all generate new, better training sets. In the case of KNN, the
work of creating a good decision boundary is thus removed from the
network, which just has to learn the training data well. The network is
still useful in feedforward mode because of its high speed.
--
***************************************************
Prof. William W. Armstrong, Computing Science Dept.
University of Alberta; Edmonton, Alberta, Canada  T6G 2H1
arms@cs.ualberta.ca   Tel(403)492 2374   FAX 492 1071
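One reading of Armstrong's KNN suggestion, sketched under stated assumptions:
perturb each training vector to obtain new points, label them by a
k-nearest-neighbors vote over the original set, and train the network on the
enlarged set. The data, k, and the perturbation scale are illustrative:

    import numpy as np

    rng = np.random.default_rng(2)

    # Original training set: two Gaussian clusters, labels 0/1.
    X = np.vstack([rng.normal(-1, 0.6, (50, 2)), rng.normal(1, 0.6, (50, 2))])
    y = np.concatenate([np.zeros(50), np.ones(50)])

    def knn_label(points, X, y, k=5):
        """Majority label of the k nearest original training points."""
        labels = []
        for p in points:
            idx = np.argsort(((X - p) ** 2).sum(axis=1))[:k]
            labels.append(round(y[idx].mean()))   # majority vote, 0/1 labels
        return np.array(labels, dtype=float)

    # Generate 3 perturbed copies of each point and label them by KNN.
    new_pts = np.repeat(X, 3, axis=0) + rng.normal(0, 0.3, (3 * len(X), 2))
    X_big = np.vstack([X, new_pts])
    y_big = np.concatenate([y, knn_label(new_pts, X, y)])
    print(X_big.shape, y_big.shape)   # enlarged training set for the network

The network then only has to fit (X_big, y_big); the shape of the decision
boundary has already been set by the KNN vote, which matches Armstrong's
point that the boundary-making work is removed from the network.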
jdm5548@tamsun.tamu.edu (James Darrell McCauley) (05/21/91)
In article <arms.674351516@spedden>, arms@cs.UAlberta.CA (Bill Armstrong) writes:
[stuff deleted]
|> The above all generate new, better training sets. In the case of KNN,
|> the work of creating a good decision boundary is thus removed from the
|> network, which just has to learn the training data well.

Forgive my ignorance (maybe this is a stupid question), but if I'm willing
to sacrifice the ability to generalize, what methods are available to
pre-process training data to bring similar inputs closer together?

I've tried to use BP for what I thought would be a simple classification,
but it seems that the data was often "too noisy" and I struggled with
convergence. (I'm expecting the end-use data to be less noisy/more
consistent.)
--
James Darrell McCauley, Grad Res Asst, Spatial Analysis Lab
Dept of Ag Engr, Texas A&M Univ, College Station, TX 77843-2117, USA
(jdm5548@diamond.tamu.edu, jdm5548@tamagen.bitnet)