[comp.ai.neural-nets] Backprop with additional noisy input feature

mderksen@sci.kun.nl (M. Derksen) (05/14/91)

Dear neural netters,

Can someone explain to me why an additional noisy input feature results in
better prediction performance (generalization)?

I've added an extra input unit to a multilayer feedforward network.

1. In the training phase, a noisy signal (Gaussian distribution with
   zero mean and variable standard deviation) is fed into the extra
   input unit.

2. In the test phase, a signal value of zero is fed into this unit.
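Roughly, the scheme looks like this (a small Python/numpy sketch with
stand-in data and a made-up helper name, just to show the idea, not my
actual code):

import numpy as np

rng = np.random.default_rng(0)

def add_noise_feature(X, sigma, training):
    # Append one extra input column: N(0, sigma) samples during training,
    # a constant zero during testing.
    n = X.shape[0]
    extra = rng.normal(0.0, sigma, size=(n, 1)) if training else np.zeros((n, 1))
    return np.hstack([X, extra])

# stand-in data: 4 "real" input features
X_train = rng.normal(size=(100, 4))
X_test  = rng.normal(size=(20, 4))

X_train_aug = add_noise_feature(X_train, sigma=0.5, training=True)   # 5 columns, last one noisy
X_test_aug  = add_noise_feature(X_test,  sigma=0.5, training=False)  # 5th column all zeros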

I've done some experiments and obtained significantly better prediction
performance.

I came to this 'improvement' a few months ago, when there was a discussion
on the net about adding a noisy vector to the patterns in the training set.

Marco.

###############################################################################
#                                                                             #
#     Catholic University of Nijmegen         Ing. M.W.J. Derksen             #
#     Laboratory for Analytical Chemistry     Tel: 080-653158                 #
#     Faculty of Science                      Fax: 080-652653                 #
#     Toernooiveld 1                          Telex: 48228 wina nl            #
#     6525 ED Nijmegen, the Netherlands       E-mail: mderksen@sci.kun.nl     #
#                                                                             #
###############################################################################

dyes@convex.convex.COM (Tim Dyes) (05/15/91)

In article <3559@wn1.sci.kun.nl>, mderksen@sci.kun.nl (M. Derksen) writes:
|> Dear neural netters,
|> 
|> Can someone explain to me why an additional noisy input feature results in
|> better prediction performance (generalization)?
|> 

Try this on for size...
Adding a noisy input feature in effect adds a noise term to the net input of
each node in the 1st hidden layer.  Successful training forces the network
to adjust its weights to overcome this noise level.  That results in more
distance between the 1st hidden layer's output vectors that correspond to
different network outputs.  This in effect raises the discrimination
capability of the output layer with respect to the hidden layer's outputs,
and so allows the network to classify an input pattern more reliably.
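To make that concrete, here is a small numpy sketch (my own illustration
with made-up numbers, not Marco's network) showing that the extra input
contributes exactly (its weight) times (the noise sample) to each hidden
node's net input:

import numpy as np

rng = np.random.default_rng(1)

n_in, n_hidden = 4, 3
W = rng.normal(size=(n_hidden, n_in + 1))   # last column = weights from the noise input

x = rng.normal(size=n_in)                   # an ordinary input pattern
noise = rng.normal(0.0, 0.5)                # sample fed into the extra unit

net_clean = W[:, :n_in] @ x                 # net input without the extra unit
net_noisy = W @ np.append(x, noise)         # net input with the extra unit

# the difference is exactly W[:, -1] * noise: a per-hidden-node noise term
print(net_noisy - net_clean)
print(W[:, -1] * noise)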

- Tim Dyes

jones@nprdc.navy.mil (David Ryan-Jones) (05/15/91)

In article <1991May14.182932.21193@convex.com> dyes@convex.convex.COM (Tim Dyes) writes:
>In article <3559@wn1.sci.kun.nl>, mderksen@sci.kun.nl (M. Derksen) writes:
>|> Dear neural netters,
>|> 
>|> Can someone explain to me why an additional noisy input feature results in
>|> better prediction performance (generalization)?
>|> 
>
>Try this on for size...
>Adding a noisy input feature in effect adds a noise term to the net input of
>each node in the 1st hidden layer.  Successful training forces the network
>to adjust its weights to overcome this noise level.  That results in more
>distance between the 1st hidden layer's output vectors that correspond to
>different network outputs.  This in effect raises the discrimination
>capability of the output layer with respect to the hidden layer's outputs,
>and so allows the network to classify an input pattern more reliably.
>
>- Tim Dyes

The effect on generalization of adding noise to the inputs is very
interesting.  I have noticed in my own research that I can improve
generalization to the test set by about 10% if I add a very small amount
of Gaussian noise to each of the input variables.  Has anyone else reading
this newsgroup been able to improve generalization to a greater degree
(say 30-40%) with this technique?  Does the technique work by reducing the
degree to which the network "learns" the data in the training set instead
of the underlying relationship between input and output?  Are there any
studies, published as tech reports or journal articles, that have
investigated this effect?
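For concreteness, the jittering I mean looks roughly like this (a numpy
sketch with stand-in data and a hypothetical helper, not my actual setup):

import numpy as np

rng = np.random.default_rng(2)

def jitter(X, sigma=0.05):
    # Return a copy of the training inputs with small Gaussian noise added
    # to every variable; a fresh noise sample is drawn on each call.
    return X + rng.normal(0.0, sigma, size=X.shape)

# stand-in training data: 6 input variables
X_train = rng.normal(size=(200, 6))

# each epoch one presents jitter(X_train) instead of X_train, drawing new
# noise every pass; the test set is left untouched.
X_epoch = jitter(X_train, sigma=0.05)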

Thanks, 

David Ryan-Jones
 

arms@cs.UAlberta.CA (Bill Armstrong) (05/16/91)

In article <3559@wn1.sci.kun.nl>, mderksen@sci.kun.nl (M. Derksen) writes:
> Dear neural netters,
> 
> Can someone explain to me why an additional noisy input feature results in
> better prediction performance (generalization)?
> 

I think the effect of adding noise to the input variables is really not an
effect on the learning system, but rather just a change to the training set.
The new set contains some points obtained by "extrapolating" from a given
input vector to one near it (the "noisy" one).  The original vector and the
noisy one are given the same output.

From there, one can modify the procedure to be even more general: look
at small groups of training points in close proximity and generate a
new vector by linear regression, for example. One could also perform a
K-nearest-neighbors (KNN) classification and use that to train the
network.

The above all generate new, better training sets.  In the case of using
KNN, the work of creating a good decision boundary is thus removed from
the network, which just has to learn the training data well.  The network
is still useful in feedforward mode because of its high speed.
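As a rough illustration of the KNN variant (stand-in data and names, not a
reference implementation): perturb existing training vectors and label each
new vector by a k-nearest-neighbours vote over the original set.

import numpy as np

rng = np.random.default_rng(3)

def knn_label(X_train, y_train, x, k=5):
    # Label a point by majority vote among its k nearest training points.
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(d)[:k]]
    return np.bincount(nearest).argmax()

def augment_with_knn(X_train, y_train, n_new=100, sigma=0.1, k=5):
    # Create new training vectors near existing ones and give each the
    # label assigned by KNN on the original set.
    idx = rng.integers(0, len(X_train), size=n_new)
    X_new = X_train[idx] + rng.normal(0.0, sigma, size=(n_new, X_train.shape[1]))
    y_new = np.array([knn_label(X_train, y_train, x, k) for x in X_new])
    return np.vstack([X_train, X_new]), np.concatenate([y_train, y_new])

# stand-in data: two classes in the plane
X = rng.normal(size=(60, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_aug, y_aug = augment_with_knn(X, y, n_new=100, sigma=0.1, k=5)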
--
***************************************************
Prof. William W. Armstrong, Computing Science Dept.
University of Alberta; Edmonton, Alberta, Canada T6G 2H1
arms@cs.ualberta.ca Tel(403)492 2374 FAX 492 1071

jdm5548@tamsun.tamu.edu (James Darrell McCauley) (05/21/91)

In article <arms.674351516@spedden>, arms@cs.UAlberta.CA (Bill Armstrong) writes:
[stuff deleted]
|> The above all generate new, better training sets.  In the case of using
|> KNN, the work of creating a good decision boundary is thus removed from
|> the network, which just has to learn the training data well.  

Forgive my ignorance (maybe this is a stupid question), but if I'm willing
to sacrifice the ability to generalize, what methods are available to
pre-process training data to bring similar inputs closer together?  I've
tried to use backprop for what I thought would be a simple classification,
but it often seemed the data was "too noisy" and I struggled with
convergence.  (I'm expecting the end-use data to be less noisy/more
consistent.)
-- 
James Darrell McCauley, Grad Res Asst, Spatial Analysis Lab 
Dept of Ag Engr, Texas A&M Univ, College Station, TX 77843-2117, USA
(jdm5548@diamond.tamu.edu, jdm5548@tamagen.bitnet)