esrmm@warwick.ac.uk (Denis Anthony) (05/23/91)
I attended a seminar yesterday, at which it was stated that adding 0.1
to the logistic function in back prop speeds up learning by 50% (in one
application anyway).

Is this a known phenomenon, and if so, is there any reason for it?

Denis
smagt@fwi.uva.nl (Patrick van der Smagt) (05/23/91)
esrmm@warwick.ac.uk (Denis Anthony) writes:

>I attended a seminar yesterday, at which it was stated that adding 0.1
>to the logistic function in back prop speeds up learning by 50% (in one
>application anyway).

>Is this a known phenomenon, and if so, is there any reason for it?

Well, maybe this is not so hard to explain. When the initial weights are
very small, the input to each hidden unit (i.e., the argument of the
logistic function) is situated around 0. The behaviour of the network is
then almost linear, since the logistic function is almost linear around 0.
The network will not be able to solve a non-linear problem with linear
hidden units, and the weights will tend to 0. Adding a small value to the
input of a hidden unit will, of course, shift its value into a less linear
region, and thus the initial phase of training will be faster.

Patrick van der Smagt                          email: smagt@fwi.uva.nl
Faculty of Mathematics & Computer Science
University of Amsterdam, Kruislaan 403,
NL-1098 SJ Amsterdam, The Netherlands
Phone: +31 20 525 7524    Fax: +31 20 525 7490

``The opinions expressed herein are the author's only and do not
necessarily reflect those of the University of Amsterdam.''
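A minimal sketch of the near-linearity Patrick describes, assuming the
standard logistic 1/(1 + exp(-x)); the variable names here are purely
illustrative and not from either post. Around a net input of 0 the
logistic is well approximated by its tangent line 0.5 + x/4, so hidden
units fed by tiny initial weights contribute essentially no non-linearity:

    import math

    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Compare the logistic with its tangent line at 0 for small net inputs.
    for x in (-0.2, -0.1, 0.0, 0.1, 0.2):
        print(x, logistic(x), 0.5 + x / 4.0)

For net inputs of magnitude 0.2 or less the two columns agree to about
three decimal places, which is the nearly-linear regime the post refers to.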
rr2p+@andrew.cmu.edu (Richard Dale Romero) (05/24/91)
In response to Patrick's statement about pushing the logistic towards a
more non-linear section, I think he was slightly off about what the 0.1
was being added to. It would make more sense to add 0.1 to the output of
the logistic, not its input; the bias parameter already takes care of any
movements along the logistic curve that you need to make.

A possible reason why adding this 0.1 would speed up learning has been
brought up before on this group, I believe, or something along those
lines. The two things I do remember are subtracting 0.5 from the logistic
so that it is centered around 0, and adding 0.1 to the sigmoid-prime
function. Both are discussed in Fahlman's 'An Empirical Study of Learning
Speed in Back-Propagation Networks', CMU-CS-88-162. The reason for adding
0.1 to the sigmoid-prime function is to keep it from going to 0 when the
input is at an extreme. The symmetric sigmoid is discussed in Stornetta
and Huberman's 'An Improved Three-Layer Back-Prop Algorithm', Proceedings
of the IEEE International Conference on Neural Networks, pages 637-644,
1987.

-rick
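A rough sketch of the two tricks Rick mentions, assuming a plain logistic
unit; the function names, the 0.1 offset parameter, and the output-unit
delta below are written out for illustration only and are not taken
verbatim from either paper:

    import math

    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    def sigmoid_prime(output, offset=0.1):
        # The usual derivative output*(1 - output) vanishes as the unit
        # saturates towards 0 or 1; adding 0.1 keeps the error signal
        # from dying out (Fahlman, CMU-CS-88-162).
        return output * (1.0 - output) + offset

    def symmetric_sigmoid(x):
        # Subtracting 0.5 centres the output on 0 (Stornetta & Huberman).
        return logistic(x) - 0.5

    def output_delta(target, output):
        # Back-prop delta for an output unit, using the offset derivative.
        return (target - output) * sigmoid_prime(output)

With the offset, a unit stuck near an output of 1.0 still receives a delta
of roughly 0.1 times its error instead of essentially zero, which is why
the flat-spot trick helps on problems where units saturate early.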
len@retina.mqcs.mq.oz.au (Len Hamey) (05/24/91)
In article <1991May23.141446.28619@fwi.uva.nl> smagt@fwi.uva.nl (Patrick van der Smagt) writes:

>esrmm@warwick.ac.uk (Denis Anthony) writes:
>
>>I attended a seminar yesterday, at which it was stated that adding 0.1
>>to the logistic function in back prop speeds up learning by 50% (in one
>>application anyway).
>
>>Is this a known phenomenon, and if so, is there any reason for it?
>
>Well, maybe this is not so hard to explain. When initial weights are

Adding 0.1 to the logistic function is discussed in Fahlman's paper,
"An Empirical Study of Learning Speed in Back-Propagation Networks".
It is available from neuroprose.

Len Hamey                          len@retina.mqcs.mq.oz.au
Lecturer in Computing
Macquarie University