dfausett@zach.fit.edu ( Donald W. Fausett) (08/13/90)
The reason that bipolar (-1,+1) data is better for training than binary (0,1) data is that no learning occurs on a connection when its input signal is zero. It is easy to see the reason for this. During the backpropagation phase, the delta error term for a unit is multiplied by the input signal to that unit in order to compute the update for the weight on that connection. If the input signal is zero, then the weight update is zero => the value of the weight does not change => no learning occurs. When using backpropagation, it is always better to convert binary input patterns to bipolar form before training the network.
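The argument above can be sketched in a few lines of Python. This is only an illustration of the update rule as described in the post (weight change = learning rate * delta * input signal); the pattern, the delta value, the learning rate, and the helper names `to_bipolar` and `weight_updates` are all assumptions made for the example, not anything from the original poster.

```python
def to_bipolar(pattern):
    """Map binary {0, 1} values to bipolar {-1, +1} using x -> 2x - 1."""
    return [2 * x - 1 for x in pattern]


def weight_updates(inputs, delta, learning_rate=0.5):
    """Per-connection update as described above: learning_rate * delta * input."""
    return [learning_rate * delta * x for x in inputs]


# Hypothetical input pattern and error term, chosen only for illustration.
binary = [1, 0, 1, 0]
bipolar = to_bipolar(binary)
delta = 0.2

binary_updates = weight_updates(binary, delta)
bipolar_updates = weight_updates(bipolar, delta)

# Connections fed by a 0 input receive a zero update for this pattern;
# after conversion, every connection receives a nonzero update.
print("binary :", binary, "->", binary_updates)
print("bipolar:", bipolar, "->", bipolar_updates)
```

Note that the zero update applies per pattern: a weight whose input is 0 for one pattern can still change on another pattern where that input is 1.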
pjhamvs@cs.vu.nl (Summeren van Peter) (08/14/90)
In article <1331@winnie.fit.edu>, dfausett@zach.fit.edu ( Donald W. Fausett) writes:
> If the input signal is zero, then the weight
> update is zero => the value of the weight does not change => no learning
> occurs. When using backpropagation, it is always better to convert binary
> input patterns to bipolar form before training the network.

Except when zero means unknown.

Greetings
eeandrew@cybaswan.UUCP (e c andrews) (08/14/90)
In article <1331@winnie.fit.edu> dfausett@zach.fit.edu ( Donald W. Fausett) writes:
> The reason that bipolar (-1,+1) data is better for training than
>binary (0,1) data is that no learning occurs on a connection when its input
>signal is zero. It is easy to see the reason for this. During the
>backpropagation phase, the delta error term for a unit is multiplied by
>the input signal to that unit in order to compute the update for the
>weight on that connection......

Also, the magnitude of the value can have an influence on your training: if your non-linearities go 0->1 but your inputs are 0->100 (or -50->+50), then the weight space is warped so that adaptation will occur mainly in the first layer of weights.

I must admit that in my work (speech processing) we haven't seen much difference between performance using either mono- or bi-polar non-linearities, but my inputs are reals -1->+1, so I use that range for my thresholding function. I haven't seen anything published either way.

Eddy Andrews.
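One way to read the point about input magnitude is sketched below. It assumes a hypothetical 1-input, 1-hidden-unit, 1-output network with sigmoid (0->1) non-linearities; the weights, target, and function names are made up for the illustration. Under these assumptions the first-layer update is multiplied by the raw input, while the second-layer update is multiplied by a hidden activation that stays between 0 and 1, so large inputs tilt the adaptation toward the first layer.

```python
import math


def sigmoid(z):
    """Logistic non-linearity with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def update_terms(x, target, w1, w2):
    """Return (first-layer term, second-layer term) for one pattern.

    Each term is delta * input-to-that-weight, without the learning rate.
    """
    h = sigmoid(w1 * x)                        # hidden activation
    y = sigmoid(w2 * h)                        # output activation
    delta_out = (target - y) * y * (1.0 - y)   # output error term
    delta_hid = delta_out * w2 * h * (1.0 - h) # error term passed back
    return delta_hid * x, delta_out * h


# Hypothetical small initial weights and target, for illustration only.
w1, w2, target = 0.01, 0.5, 1.0

g1_small, g2_small = update_terms(1.0, target, w1, w2)
g1_large, g2_large = update_terms(100.0, target, w1, w2)

ratio_small = g1_small / g2_small
ratio_large = g1_large / g2_large
print("input   1: first/second layer update ratio =", ratio_small)
print("input 100: first/second layer update ratio =", ratio_large)
```

With much larger weights the hidden unit would saturate and the picture changes, so this shows the effect only for the small-weight starting point assumed here.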