sfp@mars.ornl.gov (Phil Spelt) (12/07/90)
I remember seeing a reference to an article which dealt with the effects of
random initialization of weights in a backpropagation network. I believe the
article reported that it *does* make a difference how the weight matrix is
initialized. If anyone knows what this reference is (or any others addressing
this problem), I would appreciate hearing from you, either by email or by
posting to this BB.

Thanks, Phil Spelt
kolen-j@retina.cis.ohio-state.edu (john kolen) (12/07/90)
In article <1990Dec6.161422.5314@cs.utk.edu> sfp@mars.ornl.gov (Phil Spelt) writes:
>I remember seeing a reference to an article which dealt with the effects
>of random initialization of weights in a backpropagation network. I believe
>the article reported that it *does* make a difference how the weight matrix
>is initialized.
Jordan Pollack and I have a paper appearing in the current issue of Complex
Systems (Vol. 4, No. 3) titled "Backpropagation is Sensitive to Initial
Conditions," in which we demonstrate how small changes in the initial weights
can lead to dramatic differences in convergence time and final solution weights.
The take-home message was: save your initial weights if you want your work
replicated.
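[The take-home advice above can be made concrete with a minimal sketch. This is not code from the paper; the 2-3-1 network shape and the weight range are invented for illustration. The point is to record the seed so the exact initial weight matrices can be regenerated.]

```python
import numpy as np

def init_weights(seed, n_in=2, n_hidden=3, n_out=1):
    # Hypothetical 2-3-1 network: draw small random initial weights
    # from a generator with an explicitly recorded seed.
    rng = np.random.RandomState(seed)
    w1 = rng.uniform(-0.5, 0.5, size=(n_in, n_hidden))
    w2 = rng.uniform(-0.5, 0.5, size=(n_hidden, n_out))
    return w1, w2

# Recording the seed lets anyone regenerate the identical starting
# point, which is what replication of a backprop run requires.
w1a, w2a = init_weights(seed=12345)
w1b, w2b = init_weights(seed=12345)
assert np.array_equal(w1a, w1b) and np.array_equal(w2a, w2b)
```

(Saving the weight matrices themselves is the safer habit: a later change to the initialization code would silently alter what a given seed produces.)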
John Kolen
--
John Kolen (kolen-j@cis.ohio-state.edu)|computer science - n. A field of study
Laboratory for AI Research |somewhere between numerology and
The Ohio State University |astrology, lacking the formalism of the
Columbus, Ohio 43210 (USA) |former and the popularity of the latter
bellido@aragorn.world (Ignacio Bellido) (12/07/90)
I saw that poster also at the last NIPS, but I don't believe this is very
important. I think all of us who work with backpropagation have had this
experience and have noted this defect (yes, it's not just a virtue). My own
simulator keeps track of the random seed used to initialize each process.

What I think is really important is to find some way in which backpropagation
can be made almost independent of this initial configuration, and to reduce
the number of epochs to the least possible. This can be done: I have found a
way to reduce the number of epochs by changing the learning rate and momentum
factor (two other variables that change the network's behavior).

Where the final weights end up is not really important if the network realizes
the function it has been trained on. It only matters if you want to extract
knowledge from the network, and that is really difficult with backpropagation.

Ignacio Bellido
--
--------------------------------------------------------------------------
Ignacio Bellido Fernandez-Montes
Visiting Scholar at Stanford University   e-mail: bellido@psych.stanford.edu
Psychology Department
Graduate Student, Madrid University of Technology
Department of Telematic Engineering       e-mail: ibellido@dit.upm.es
--------------------------------------------------------------------------
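[The claim above, that tuning the learning rate and momentum factor reduces the number of epochs, can be illustrated on a toy error surface. This sketch is not from Ignacio's simulator; the quadratic surface and all constants are invented for illustration, using the standard momentum update v <- m*v - lr*grad, w <- w + v.]

```python
import numpy as np

def final_error(steps=200, lr=0.015, momentum=0.0):
    # Gradient descent with momentum on an ill-conditioned quadratic
    # error surface E(w) = 0.5*(w0^2 + 100*w1^2), a stand-in for the
    # kind of surface backpropagation descends.
    curvature = np.array([1.0, 100.0])
    w = np.array([1.0, 1.0])       # starting weights
    v = np.zeros(2)                # momentum "velocity"
    for _ in range(steps):
        grad = curvature * w                 # dE/dw for this quadratic
        v = momentum * v - lr * grad         # momentum smooths the step
        w = w + v
    return float(np.max(np.abs(w)))          # distance from the minimum

plain = final_error(momentum=0.0)
with_momentum = final_error(momentum=0.9)
```

With these invented constants, the momentum run reaches a far smaller error in the same 200 epochs, which is the effect being described: the shallow direction is traversed much faster without destabilizing the steep one.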
markh@csd4.csd.uwm.edu (Mark William Hopkins) (12/07/90)
In article <BELLIDO.90Dec6135649@aragorn.world> bellido@psych.stanford.edu
writes:
>What I think is really important is to find some way in which
>backpropagation can be made almost independent of this initial
>configuration...

I know this may not sound like much, but one sure bet is to initialize the
weights close to their final values... :)

Other than that, it really sounds like an impossible problem. Think of what
would happen if the error function looked like a lunar surface with billions
of craters, deep and shallow, all over the place.

Doing backpropagation on it would be like trying to find a really deep crater
on the moon by riding a lunar rover constantly downhill from where the lunar
lander set down. You really have to be near the crater to find it, or else it
could be on the other side of the moon...
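[The crater picture can be seen even in one dimension. The following toy sketch is not from the thread; the error function is invented so that plain gradient descent lands in a shallow or a deep minimum depending only on where it starts.]

```python
def error(x):
    # A one-dimensional "lunar surface": a shallow crater near
    # x ~ +1.13 and a deeper one near x ~ -1.30.
    return x**4 - 3*x**2 + x

def gradient(x):
    return 4*x**3 - 6*x + 1

def descend(x, lr=0.01, steps=500):
    # Ride the rover downhill from wherever the lander set down.
    for _ in range(steps):
        x -= lr * gradient(x)
    return x

shallow = descend(2.0)   # starts on the right, rolls into the shallow crater
deep = descend(-2.0)     # starts on the left, rolls into the deep crater
```

Both runs use the identical learning rule; only the starting point differs, and error(deep) ends up far below error(shallow).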
ins_atge@jhunix.HCF.JHU.EDU (Thomas G Edwards) (12/07/90)
In article <BELLIDO.90Dec6135649@aragorn.world> bellido@psych.stanford.edu
writes:
>>What I think is really important is to find some way in which
>>backpropagation can be made almost independent of this initial
>>configuration...

Any parallel, non-linear system will probably be chaotic with respect to its
initial values. However, methods which reduce the total learning time may
make such differences in learning time less important. I am curious how
independent Cascade-Correlation learning time is of initial weight conditions
(although that is a tougher question, since new tiers of initial weights are
being added all the time).

-Tom
mderksen@sci.kun.nl (M. Derksen) (12/08/90)
> I remember seeing a reference to an article which dealt with the effects
> of random initialization of weights in a backpropagation network. I believe
> the article reported that it *does* make a difference how the weight matrix
> is initialized.
> If anyone knows what this reference is (or any others addressing this
> problem), I would appreciate hearing from you, either by email or posting
> to this BB.
> Thanks, Phil Spelt

Look, for example, in:

D.G. Lee, Jr., "Preliminary results of applying neural networks to ship
image recognition," Proc. Int'l. Joint Conf. on Neural Networks,
Washington D.C., II:576 (1989).

Marco Derksen.
###############################################################################
# University of Nijmegen                    Ing. M.W.J. Derksen               #
# Laboratory for Analytical Chemistry       Tel: 080-653158                   #
# Faculty of Science                        Fax: 080-652653                   #
# Toernooiveld 1                            Telex: 48228 wina nl              #
# 6525 ED Nijmegen, the Netherlands         E-mail: mderksen@sci.kun.nl       #
###############################################################################
berg@cs.albany.edu (George Berg) (12/08/90)
In article <BELLIDO.90Dec6135649@aragorn.world> bellido@psych.stanford.edu
writes:
[In response to John Kolen's posting]
>I saw that poster [Kolen and Pollack: "Back-propagation is Sensitive to
>Initial Conditions"] also at the last NIPS, but I don't believe this is
>very important. I think all of us who work with backpropagation have had
>this experience and have noted this defect (yes, it's not just a virtue).
>My own simulator keeps track of the random seed used to initialize each
>process.

It may not fit your agenda, but a blanket dismissal of this work is utterly
inappropriate. Since backpropagation is a widely-used technique, the *fact*
that it is very sensitive to slight changes in initial conditions *is*
important. Whether you view this property of bp as a "defect" or a "virtue"
is irrelevant.

G.B.
-------------------------------------------------------------------------------
| George Berg        | Computer Science Dept. | "No one owes you;             |
| berg@cs.albany.edu | SUNY at Albany, LI 67A |  You owe you."                |
| (518) 442 4267     | Albany, NY 12222 USA   |                               |
-------------------------------------------------------------------------------
bellido@elrond.world (Ignacio Bellido) (12/08/90)
In article <268@daedalus.albany.edu> Berg@daedalus.albany.edu.UUCP (George
Berg) replies to me:
>>I saw that poster [Kolen and Pollack: "Back-propagation is Sensitive to
>>Initial Conditions"] also at the last NIPS, but I don't believe this is
>>very important. I think all of us who work with backpropagation have had
>>this experience [...]
> It may not fit your agenda, but a blanket dismissal of this work is utterly
>inappropriate. Since backpropagation is a widely-used technique, the *fact*
>[...]

OK, maybe I wrote something wrong. This work [Kolen and Pollack:
"Back-propagation is Sensitive to Initial Conditions"] IS important. What I
was trying to say is that the sensitivity itself is obvious to anyone who has
worked with backpropagation, or even just studied it.

Also, markh@csd4.csd.uwm.edu (Mark William Hopkins) says:
>I know this may not sound like much, but one sure bet is to initialize the
>weights close to their final values... :)
>
>Other than that, it really sounds like an impossible problem. Think of what
>would happen if the error function looked like a lunar surface with billions
>of craters deep and shallow all over the place.
>
>Doing backpropagation on it would be like trying to find a real 'deep' crater
>on the moon by riding a lunar rover constantly downhill from where the lunar
>lander set down. You really have to be near the crater to find it, else it
>could be on the other side of the moon...

I like this analogy about the moon surface. More than that, it's worse,
because we search on a different moon each time we begin a search. But I
believe we cannot choose the landing point nearest our objective unless we
have a global view of the space, and what we are trying to do is find a way
that works without that kind of global view; that is, we are searching in a
local environment (we have little elements looking only at their own space).
How can we always find the right hole with only a local view? That's the
question. I'd like to have the answer, but I'm sorry, I haven't.
My point of view is that if you fall into a local hole, what you have to do
is change your position in the space and hope to fall better the next time.
Another question is how to know (locally) that you are in a local minimum.

Ignacio
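["Change your position and hope to fall better next time" is essentially the random-restart strategy. A toy sketch of it follows; the one-dimensional surface, restart count, and seed are all invented for illustration, reusing a two-crater function with a shallow minimum near x ~ +1.13 and a deep one near x ~ -1.30.]

```python
import numpy as np

def error(x):
    # Invented two-crater surface: shallow minimum near x ~ +1.13,
    # deep minimum near x ~ -1.30.
    return x**4 - 3*x**2 + x

def descend(x, lr=0.01, steps=500):
    for _ in range(steps):
        x -= lr * (4*x**3 - 6*x + 1)   # gradient step on error(x)
    return x

# Restart from several random points and keep the best landing spot.
rng = np.random.RandomState(0)
starts = rng.uniform(-2.0, 2.0, size=8)
finals = [descend(x0) for x0 in starts]
best = min(finals, key=error)
```

Note that this does not answer the harder question of *detecting* a shallow basin locally; the restarts simply make it likely that at least one run falls into the deep one.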
manjunat@aludra.usc.edu (bsm) (12/09/90)
In article <BELLIDO.90Dec7195531@elrond.world> bellido@psych.stanford.edu
writes:
>
>Also markh@csd4.csd.uwm.edu (Mark William Hopkins) says:
>[... Mark Hopkins' moon-surface analogy quoted ...]
>
>I like this analogy about the moon surface. More than that, it's
>worse because we search on a different moon each time we begin a search.
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

No! We start at a different location on the SAME moon (unless, of course,
you start on a different problem, meaning a different surface, a different
moon).

Manjunath
lhamey@sunb.mqcc.mq.oz.au (Len Hamey) (12/14/90)
In article <BELLIDO.90Dec6135649@aragorn.world> bellido@psych.stanford.edu
writes:
[In response to John Kolen's posting]
>I saw that poster [Kolen and Pollack: "Back-propagation is Sensitive to
>Initial Conditions"] also at the last NIPS, but I don't believe this is
>very important. I think all of us who work with backpropagation have had
>this experience [...]

If you have not seen Kolen and Pollack's paper, then you should have a look
at it (fetch it from cheops). Everyone knows that BP is sensitive to the
starting point, but Kolen and Pollack show that the sensitivity has an
unexpected fractal nature. There are some really pretty pictures to look at,
too.

It is not fair to dismiss a piece of work based only on a casual view of the
title (although one is often forced to choose which sessions to attend at a
conference on that basis alone :-( ).

Disclaimer: I have no relationship with Kolen or Pollack except as a
satisfied reader :-).