[comp.ai.neural-nets] Backprop Training

dfausett@zach.fit.edu ( Donald W. Fausett) (08/13/90)

	The reason that bipolar (-1,+1) data is better for training than
binary (0,1) data is that no learning occurs on a connection when its input
signal is zero.  It is easy to see the reason for this.  During the
backpropagation phase, the delta error term for a unit is multiplied by
the input signal to that unit in order to compute the update for the
weight on that connection. If the input signal is zero, then the weight
update is zero => the value of the weight does not change => no learning
occurs.  When using backpropagation, it is always better to convert binary
input patterns to bipolar form before training the network.
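
To make the argument concrete, here is a minimal sketch of one delta-rule step for a single sigmoid unit. The function name, the starting weights, and the learning rate are illustrative assumptions, not taken from any particular simulator. It only shows that the update on a connection is the delta term times the input on that connection, so a zero input gives a zero update.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def weight_updates(inputs, weights, target, lr=0.5):
    """One delta-rule step for a single sigmoid unit (hypothetical helper).

    Returns the update for each weight: lr * delta * input.
    """
    net = sum(w * x for w, x in zip(weights, inputs))
    out = sigmoid(net)
    # Delta error term for the unit (squared error, sigmoid activation).
    delta = (target - out) * out * (1.0 - out)
    return [lr * delta * x for x in inputs]

weights = [0.3, -0.2]  # arbitrary starting weights (assumption)

# Same pattern coded two ways: "off" is 0 in binary form, -1 in bipolar form.
binary_updates = weight_updates([0.0, 1.0], weights, target=1.0)
bipolar_updates = weight_updates([-1.0, 1.0], weights, target=1.0)

print("binary :", binary_updates)   # first update is exactly zero
print("bipolar:", bipolar_updates)  # both updates are nonzero
```

With the binary coding the first weight never moves on this pattern; with the bipolar coding it does.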

pjhamvs@cs.vu.nl (Summeren van Peter) (08/14/90)

In article <1331@winnie.fit.edu>,
	dfausett@zach.fit.edu ( Donald W. Fausett) writes:
> If the input signal is zero, then the weight
> update is zero => the value of the weight does not change => no learning
> occurs.  When using backpropagation, it is always better to convert binary
> input patterns to bipolar form before training the network.

Except when zero means "unknown".
Greetings

eeandrew@cybaswan.UUCP (e c andrews) (08/14/90)

In article <1331@winnie.fit.edu> dfausett@zach.fit.edu ( Donald W. Fausett) writes:
>	The reason that bipolar (-1,+1) data is better for training than
>binary (0,1) data is that no learning occurs on a connection when its input
>signal is zero.  It is easy to see the reason for this.  During the
>backpropagation phase, the delta error term for a unit is multiplied by
>the input signal to that unit in order to compute the update for the
>weight on that connection......

Also, the magnitude of the input values can influence your training: if
your non-linearities go 0->1 but your inputs are 0->100 (or -50->+50),
then the weight space is warped so that adaptation will occur mainly in
the first layer of weights.
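
A rough sketch of the effect, assuming a toy network with one input, one sigmoid hidden unit, and one sigmoid output (the function name and all numeric values are made up for illustration). The first-layer gradient carries a factor of the raw input, while the second-layer gradient carries a factor of the hidden activation, which is bounded by 0->1. The weights are chosen small so that the hidden unit does not saturate; with large weights and large inputs the sigmoid derivative shrinks and the picture changes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_gradients(x, w1, w2, target):
    """Gradient magnitudes for a 1-1-1 sigmoid net (hypothetical helper).

    Returns (|dE/dw1|, |dE/dw2|) for squared error on one pattern.
    """
    h = sigmoid(w1 * x)          # hidden activation, always in 0->1
    o = sigmoid(w2 * h)          # output activation
    delta_o = (target - o) * o * (1.0 - o)
    delta_h = delta_o * w2 * h * (1.0 - h)
    # First-layer gradient scales with the raw input x;
    # second-layer gradient scales with the bounded activation h.
    return abs(delta_h * x), abs(delta_o * h)

g1_small, g2_small = layer_gradients(x=1.0, w1=0.001, w2=0.5, target=1.0)
g1_large, g2_large = layer_gradients(x=100.0, w1=0.001, w2=0.5, target=1.0)

ratio_small = g1_small / g2_small  # first-layer vs second-layer, input ~1
ratio_large = g1_large / g2_large  # same comparison, input ~100

print("input   1: first/second gradient ratio =", ratio_small)
print("input 100: first/second gradient ratio =", ratio_large)
```

In this setup the first-layer gradient dominates far more strongly when the input is 100 than when it is 1, which is one way to read the "warped weight space" remark.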

I must admit that in my work (speech processing) we haven't seen much
difference in performance between mono- and bi-polar non-linearities,
but my inputs are reals in the range -1->+1, so I use that range for my
thresholding function. I haven't seen anything published either way.

Eddy Andrews.