[comp.ai.neural-nets] Input scaling alters my results; why?

ling@array.UUCP (Ling Guan) (08/11/90)

I am currently using a neural network (NN) to do recognition of objects in
x-ray images. The NN I use is a standard backpropagation net with
the cumulative generalized delta training rule. I found that the
classification results depend on how the inputs are scaled. For
example, scaling the inputs to between -1 and 1 gives better
results than scaling to between 0 and 1. I can't find any explanation
for this outcome. Can anybody give me comments or explanations?

Thanks in advance.

Ling

al@gmdzi.UUCP (Alexander Linden) (08/13/90)

In article <473@array.UUCP>, ling@array.UUCP (Ling Guan) writes:
> ... scaling the inputs to between -1 and 1 gives better
> results than scaling to between 0 and 1. I can't find any explanation
> for this outcome. Can anybody give me comments or explanations?

I see the main reason for quicker convergence in the fact that weights
leading out of the input units can be updated more often. In the update
rule $$w_{ji} = w_{ji} + \eta \, \delta_j \, a_i$$ the factor $a_i$
gates learning: with sparse coding containing many zeros, this factor
is zero most of the time, so those weights receive no update. But if
you use -1 instead of 0, every weight can learn on each update.

Another thing is of course that you alter the semantics of the
activations: -1 has the opposite effect of +1, while 0 has no effect
at all. In many cases this semantics seems more plausible.
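The gating effect described above can be illustrated with a short sketch (mine, not from the post; the matrix shapes and values are made up for illustration). It applies one delta-rule update and shows that under 0/1 coding the weights fed by a zero input do not move, while under -1/1 coding every weight is updated.

```python
# Hypothetical sketch of one delta-rule update w_ji <- w_ji + eta*delta_j*a_i,
# comparing 0/1 input coding with -1/1 input coding.
import numpy as np

def delta_rule_update(w, delta, a, eta=0.1):
    """One update of the weight matrix w (outputs x inputs)."""
    return w + eta * np.outer(delta, a)

rng = np.random.default_rng(0)
w = rng.normal(size=(2, 4))              # 2 output units, 4 input units
delta = np.array([0.5, -0.3])            # error terms at the output units

a_01 = np.array([1.0, 0.0, 0.0, 1.0])    # sparse 0/1 coding
a_pm = np.array([1.0, -1.0, -1.0, 1.0])  # same pattern in -1/1 coding

w_01 = delta_rule_update(w, delta, a_01)
w_pm = delta_rule_update(w, delta, a_pm)

# Columns fed by a zero input are unchanged under 0/1 coding...
print(np.allclose(w_01[:, 1], w[:, 1]))  # True
# ...but every column moves under -1/1 coding.
print(np.any(w_pm[:, 1] != w[:, 1]))     # True
```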

Alexander Linden                    | TEL. (49 or 0) 2241/14-2537
Research Group for Adaptive Systems | FAX. (49 or 0) 2241/14-2618 or -2889
GMD                                 | TELEX 889469 gmd d
P. O. BOX 1240                      |            /  al@gmdzi.uucp
D-5205 St. Augustin 1               |     e-mail<   al@zi.gmd.dbp.de
Federal Republic of Germany         |            \  unido!gmdzi!al@uunet.uu.net
-------------------------------------------------------------------------------

bill@wayback.unm.edu (william horne) (08/14/90)

In article <473@array.UUCP> ling@array.UUCP (Ling Guan) writes:
>I am currently using a neural network (NN) to do recognition of objects in
>x-ray images. The NN I use is a standard backpropagation net with
>the cumulative generalized delta training rule. I found that the
>classification results depend on how the inputs are scaled. For
>example, scaling the inputs to between -1 and 1 gives better
>results than scaling to between 0 and 1. I can't find any explanation
>for this outcome. Can anybody give me comments or explanations?
>

We have found, by looking at three-dimensional plots of the error
surface, that if the data is not centered about the origin, there is
often a steep "wall" that the weight trajectory bounces off during
learning.  However, when the data is centered about the origin, the
walls become less steep and the surface is easier to search.  This
suggests that all input data should be normalized to be centered
about the origin.
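Centering the training data about the origin, as suggested above, amounts to subtracting the per-feature mean. A minimal sketch (my own; the array values are invented for illustration):

```python
# Hypothetical sketch: center each input feature about the origin
# by subtracting its mean over the training set.
import numpy as np

def center(X):
    """Shift each column (feature) of X to have zero mean."""
    return X - X.mean(axis=0)

X = np.array([[0.2, 0.9],
              [0.4, 0.1],
              [0.6, 0.5]])
Xc = center(X)
print(Xc.mean(axis=0))   # approximately [0. 0.]
```

In practice the means would be computed on the training set only and reused to shift any test inputs the same way.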

-bill

kingsley@hpwrce.HP.COM (Kingsley Morse) (08/14/90)

I remember reading a paper years ago on symmetrically scaling activations
between -1 and 1 instead of 0 and 1. Unfortunately, I don't remember for
sure where I saw it, but it may have been in the proceedings of the IEEE
1st conference on Neural Networks.

grange@brillig.cs.umd.edu (Granger Sutton) (08/15/90)

I looked at this phenomenon briefly a while ago. In addition to what other
people have already said, the lengths of the input vectors (the activations
of the input nodes viewed as a vector) seemed to have an impact on
learning. Specifically, if all the input vectors have the same length,
learning times seem to be shorter and more uniform (smaller variance). For
binary input vectors, -1/1 coding gives you input vectors that are all the
same length.
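The equal-length property is easy to check numerically: every n-bit pattern in -1/1 coding has Euclidean norm sqrt(n), whereas 0/1 norms vary with the number of ones. A small sketch (mine, with made-up patterns):

```python
# Hypothetical sketch: vector lengths under 0/1 versus -1/1 binary coding.
import numpy as np

patterns_01 = np.array([[0, 0, 1, 1],
                        [1, 0, 0, 0],
                        [1, 1, 1, 1]], dtype=float)
patterns_pm = 2.0 * patterns_01 - 1.0   # map 0 -> -1, 1 -> +1

norms_01 = np.linalg.norm(patterns_01, axis=1)
norms_pm = np.linalg.norm(patterns_pm, axis=1)

print(norms_01)  # varies with the number of ones
print(norms_pm)  # all equal to sqrt(4) = 2
```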

Granger Sutton
grange@brillig.cs.umd.edu

al@gtx.com (Alan Filipski) (08/16/90)

In article <3430004@hpwrce.HP.COM> kingsley@hpwrce.HP.COM (Kingsley Morse) writes:
->I remember reading a paper years ago on symmetrically scaling activations
->between -1 and 1 instead of 0 and 1. Unfortunately, I don't remember for
->sure where I saw it, but it may have been in the proceedings of the IEEE
->1st conference on Neural Networks.


Yes, it is a paper by Stornetta & Huberman in the Proceedings of the
First International Conference on Neural Networks.


  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ( Alan Filipski, GTX Corp, 8836 N. 23rd Avenue, Phoenix, Arizona 85021, USA )
 ( {decvax,hplabs,uunet!amdahl,nsc}!sun!sunburn!gtx!al         (602)870-1696 )
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~