[comp.ai.neural-nets] Backprop Weight Initialization

sfp@mars.ornl.gov (Phil Spelt) (12/07/90)

I remember seeing a reference to an article which dealt with the effects
of random initialization of weights in a backpropagation network.  I believe
the article reported that it *does* make a difference how the weight matrix
is initialized.

If anyone knows what this reference is (or any others addressing this problem),
I would appreciate hearing from you, either by email or by posting to this BB.

Thanks, Phil Spelt

kolen-j@retina.cis.ohio-state.edu (john kolen) (12/07/90)

In article <1990Dec6.161422.5314@cs.utk.edu> sfp@mars.ornl.gov (Phil Spelt) writes:

   I remember seeing a reference to an article which dealt with the effects
   of random initialization of weights in a backpropagation network.  I believe
   the article reported that it *does* make a difference how the weight matrix
   is initialized.


Jordan Pollack and I have a paper appearing in the current issue of Complex
Systems (Vol 4, Num 3) titled "Backpropagation is Sensitive to Initial
Conditions", in which we demonstrate how small changes in the initial weights
can produce dramatic differences in convergence time and in the final
solution weights.
The take-home message was: save your initial weights if you want your work
replicated.
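
For anyone setting this up, here is a minimal sketch of what "save your
initial weights" can look like in practice.  The layer sizes, file name, and
uniform range below are placeholder choices, not anything from the paper; the
point is only that both the seed and the exact starting matrices get written
out before training touches them.

import numpy as np

# Placeholder layer sizes for a small feed-forward net (not from the paper).
SEED = 12345
LAYER_SIZES = [4, 3, 1]               # inputs -> hidden -> outputs

rng = np.random.default_rng(SEED)

# Draw the initial weights once, in a conventional small uniform range.
initial_weights = [
    rng.uniform(-0.5, 0.5, size=(n_out, n_in + 1))    # extra column for the bias
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])
]

# Record the seed and the exact starting weights so the run can be replicated.
np.savez("initial_weights.npz", seed=SEED,
         **{f"W{i}": W for i, W in enumerate(initial_weights)})

# Training then starts from copies of these arrays, never from a fresh draw.
weights = [W.copy() for W in initial_weights]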

John Kolen
==========================================================================

--
John Kolen (kolen-j@cis.ohio-state.edu)|computer science - n. A field of study
Laboratory for AI Research             |somewhere between numerology and
The Ohio State University              |astrology, lacking the formalism of the
Columbus, Ohio	43210	(USA)	       |former and the popularity of the latter

bellido@aragorn.world (Ignacio Bellido) (12/07/90)

I saw that poster at the last NIPS too, but I don't believe this is very
important; I think all of us who work with backpropagation have had this
experience and have noted this defect (yes, it's not just a virtue).  My
own simulator keeps track of the random seed used to initialize each
process.

What I think is really important is to find some way in which
backpropagation can be made almost independent of this initial
configuration and to reduce the number of epochs to the least possible.
This can be done; I have found a way to reduce the number of epochs by
changing the learning rate and momentum factor (two other variables that
change the network's behavior).
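
For readers who have not implemented this themselves, the learning rate and
momentum enter the weight update as in the sketch below.  This is only the
standard rule plus one common adaptation heuristic (halve the rate when the
error goes up); it is not the particular scheme I referred to above, and the
toy problem and constants are arbitrary.

import numpy as np

rng = np.random.default_rng(0)

# Toy problem (arbitrary): fit y = 3*x1 - 2*x2 with a single linear unit.
X = rng.normal(size=(100, 2))
y = X @ np.array([3.0, -2.0])

w = rng.uniform(-0.5, 0.5, size=2)     # random initial weights
velocity = np.zeros_like(w)
lr, momentum = 0.05, 0.9
prev_err = np.inf

for epoch in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(X)        # gradient of the mean squared error
    velocity = momentum * velocity - lr * grad   # momentum smooths the descent direction
    w = w + velocity
    err = np.mean((X @ w - y) ** 2)
    if err > prev_err:                           # crude adaptation: back off when the
        lr = lr * 0.5                            # error rises instead of falling
    prev_err = err

print("final weights:", w, "  final error:", err)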

Where the final weights end up is not really important if the network
realizes the function it has been trained for.  It only matters if
you want to extract knowledge from the network, and that is really
difficult with backpropagation.

Ignacio Bellido
--
--------------------------------------------------------------------------
Ignacio Bellido Fernandez-Montes
Visiting Scholar at		
Stanford University			e-mail: bellido@psych.stanford.edu
Psychology Department

Graduate Student
Madrid University of Technology
Department of Telematic Engineering	e-mail: ibellido@dit.upm.es
--------------------------------------------------------------------------

markh@csd4.csd.uwm.edu (Mark William Hopkins) (12/07/90)

In article <BELLIDO.90Dec6135649@aragorn.world> bellido@psych.stanford.edu writes:
>
>What I think is really important is to find some way in which
>backpropagation can be made almost independent of this initial
>configuration...

I know this may not sound like much, but one sure bet is to initialize the
weights close to their final values... :)

Other than that, it really sounds like an impossible problem.  Think of what
would happen if the error function looked like a lunar surface with billions
of craters, deep and shallow, all over the place.

Doing backpropagation on it would be like trying to find a really 'deep' crater
on the moon by riding a lunar rover constantly downhill from where the lunar
lander set down.  You really have to be near the crater to find it, else it
could be on the other side of the moon...
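
To make the picture concrete, here is a toy version of that cratered surface
(my own made-up function, nothing more): plain gradient descent ends up in
whichever dip the starting point happens to sit over.

import numpy as np

# A made-up "cratered" error surface: a broad bowl with ripples on top.
def error(w):
    return 0.01 * w**2 + np.sin(w)

def grad(w):
    return 0.02 * w + np.cos(w)

def descend(w, lr=0.1, steps=1000):
    for _ in range(steps):
        w = w - lr * grad(w)       # always ride downhill from wherever we are
    return w

# Starting points only a few units apart settle into different craters.
for w0 in (-6.0, -3.0, 0.0, 3.0, 6.0):
    wf = descend(w0)
    print(f"start {w0:+.1f} -> stops at {wf:+.2f}, error {error(wf):+.3f}")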

ins_atge@jhunix.HCF.JHU.EDU (Thomas G Edwards) (12/07/90)

>In article <BELLIDO.90Dec6135649@aragorn.world> bellido@psych.stanford.edu writes:

>>What I think is really important is to find some way in which
>>backpropagation can be made almost independent of this initial
>>configuration...

Any parallel, non-linear system will probably be chaotic with
respect to its initial values.  However, methods which reduce the
total learning time may make such differences in learning time less
important.  I am curious how independent Cascade-Correlation's
learning time is of initial weight conditions (although that is
a tougher question, since you are continually adding new tiers of
freshly initialized weights all the time).
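
One way to put a number on that kind of sensitivity, whatever the learning
algorithm: train the same task from many seeds and look at the spread of
epochs needed to reach a fixed error criterion.  A rough sketch, where
train_to_criterion is a hypothetical stand-in for whichever trainer
(backprop, Cascade-Correlation, ...) is being measured:

import numpy as np

def measure_seed_sensitivity(train_to_criterion, n_seeds=25):
    """Train the same task from many seeds; summarize the spread of learning times.

    train_to_criterion(seed) is assumed to return the number of epochs the
    learner needed to reach some fixed error criterion (hypothetical interface).
    """
    epochs = np.array([train_to_criterion(seed) for seed in range(n_seeds)])
    return {
        "min": int(epochs.min()),
        "median": float(np.median(epochs)),
        "max": int(epochs.max()),
        "std": float(epochs.std()),
    }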

-Tom

mderksen@sci.kun.nl (M. Derksen) (12/08/90)

> I remember seeing a reference to an article which dealt with the effects
> of random initialization of weights in a backpropagation network.  I believe
> the article reported that it *does* make a difference how the weight matrix
> is initialized.

> If anyone knows what this reference is (or any others addressing this problem),
> I would appreciate hearing from you, either by email or by posting to this BB.

> Thanks, Phil Spelt

Look for example in:

  D.G. Lee, Jr., "Preliminary results of applying neural networks to ship
  image recognition," Proc. Int'l Joint Conf. on Neural Networks,
  Washington D.C., II:576 (1989).

Marco Derksen.

###############################################################################
#                                                                             #
#     University of Nijmegen                  Ing. M.W.J. Derksen             #
#     Laboratory for Analytical Chemistry     Tel: 080-653158                 #
#     Faculty of Science                      Fax: 080-652653                 #
#     Toernooiveld 1                          Telex: 48228 wina nl            #
#     6525 ED Nijmegen, the Netherlands       E-mail: mderksen@sci.kun.nl     #
#                                                                             #
###############################################################################

berg@cs.albany.edu (George Berg) (12/08/90)

In article <BELLIDO.90Dec6135649@aragorn.world> bellido@psych.stanford.edu
writes: [In response to John Kolen's posting]

>I saw that poster [Kolen and Pollack: "Back-propagation is Sensitive to
> Initial Conditions"] at the last NIPS too, but I don't believe this is very
>important; I think all of us who work with backpropagation have had this
>experience and have noted this defect (yes, it's not just a virtue).  My
>own simulator keeps track of the random seed used to initialize each
>process.

  It may not fit your agenda, but a blanket dismissal of this work is utterly
inappropriate. Since backpropagation is a widely-used technique, the *fact*
that it is very sensitive to slight changes in initial conditions *is*
important. Whether or not you view this property of bp as a "defect" or
"virtue" is irrelevant.

                                               G.B.

-------------------------------------------------------------------------------
| George Berg        | Computer Science Dept.    |     "No one owes you;      |
| berg@cs.albany.edu | SUNY at Albany, LI 67A    |        You owe you."       |
| (518) 442 4267     | Albany, NY 12222 USA      |                            |
-------------------------------------------------------------------------------

bellido@elrond.world (Ignacio Bellido) (12/08/90)

In article <268@daedalus.albany.edu> Berg@daedalus.albany.edu.UUCP
(George Berg) replies to me:

>>I saw that poster [Kolen and Pollack: "Back-propagation is Sensitive to
>> Initial Conditions"] at the last NIPS too, but I don't believe this is very
>>important; I think all of us who work with backpropagation have had this

>  It may not fit your agenda, but a blanket dismissal of this work is utterly
>inappropriate. Since backpropagation is a widely-used technique, the *fact*

OK, maybe I wrote something wrong.  This work [Kolen and Pollack:
"Back-propagation is Sensitive to Initial Conditions"] IS important.
What I was trying to say is that it is obvious to anyone who has
worked with backpropagation, or has just studied it.

Also markh@csd4.csd.uwm.edu (Mark William Hopkins) says:

>I know this may not sound like much, but one sure bet is to initialize the
>weights close to their final values... :)
>
>Other than that, it really sounds like an impossible problem.  Think of what
>would happen if the error function looked like a lunar surface with billions
>of craters, deep and shallow, all over the place.
>
>Doing backpropagation on it would be like trying to find a really 'deep' crater
>on the moon by riding a lunar rover constantly downhill from where the lunar
>lander set down.  You really have to be near the crater to find it, else it
>could be on the other side of the moon...

I like this analogy about the moon's surface. More than that, it's
worse, because we search on a different moon each time we begin a search.
But I believe that if we try to land at the point nearest to our
objective, we cannot do that unless we have a global view of our space,
and what we are trying to do is to find a way without this kind of
global view; that is, we are searching in a local environment (we have
little elements that look only at their own space).

How can we always find the right hole with only a local view?  That's the
question; I'd like to have the answer, but I'm sorry, I haven't.  My point
of view is that if you end up in a local hole, what you have to do is
change the starting point and pray that you fall better the next time.
Another question is how to know (locally) that you are in a local minimum.
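
In practice my "change the starting point and pray" is just a random restart,
and the only local clue that you may be stuck is that the error has stopped
improving.  A rough sketch of that loop; init_weights, train_one_epoch, and
all the thresholds are hypothetical hooks, not my simulator:

import numpy as np

def train_with_restarts(init_weights, train_one_epoch, target_error=0.01,
                        patience=50, max_epochs=5000, max_restarts=10, seed=0):
    """Restart from fresh random weights whenever the error stops improving.

    init_weights(rng) and train_one_epoch(weights) -> (weights, error) are
    hypothetical hooks for whatever network and trainer are in use.
    """
    rng = np.random.default_rng(seed)
    for restart in range(max_restarts):
        weights = init_weights(rng)
        best_error, stalled = np.inf, 0
        for epoch in range(max_epochs):
            weights, error = train_one_epoch(weights)
            if error < target_error:
                return weights, restart, epoch      # good enough: stop here
            if error < best_error - 1e-6:
                best_error, stalled = error, 0      # still making progress
            else:
                stalled += 1                        # no improvement this epoch
            if stalled >= patience:                 # looks like a local minimum or plateau
                break                               # give up on this moon, re-roll the dice
    return weights, restart, epoch                  # best effort after all restarts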

					Ignacio
--
--------------------------------------------------------------------------
Ignacio Bellido Fernandez-Montes
Visiting Scholar at		
Stanford University			e-mail: bellido@psych.stanford.edu
Psychology Department

Graduate Student
Madrid University of Technology
Department of Telematic Engineering	e-mail: ibellido@dit.upm.es
--------------------------------------------------------------------------

manjunat@aludra.usc.edu (bsm) (12/09/90)

In article <BELLIDO.90Dec7195531@elrond.world> bellido@psych.stanford.edu writes:
 >
 >
 >Also markh@csd4.csd.uwm.edu (Mark William Hopkins) says:
 >
 >>I know this may not sound like much, but one sure bet is to initialize the
 >>weights close to their final values... :)
 >>
 >>Other than that, it really sounds like an impossible problem.  Think of what
 >>would happen if the error function looked like a lunar surface with billions
 >>of craters, deep and shallow, all over the place.
 >>
 >>Doing backpropagation on it would be like trying to find a really 'deep' crater
 >>on the moon by riding a lunar rover constantly downhill from where the lunar
 >>lander set down.  You really have to be near the crater to find it, else it
 >>could be on the other side of the moon...
 >
 >I like this analogy about the moon's surface. More than that, it's
 >worse, because we search on a different moon each time we begin a search.
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 No!  We start at a different location on the SAME moon (unless of course
 you start on a different problem, meaning a different surface, a different moon).

 >
 >					Ignacio
 >--

 Manjunath

lhamey@sunb.mqcc.mq.oz.au (Len Hamey) (12/14/90)

In article <BELLIDO.90Dec6135649@aragorn.world> bellido@psych.stanford.edu
writes: [In response to John Kolen's posting]
>
>I saw that poster [Kolen and Pollack: "Back-propagation is Sensitive to
> Initial Conditions"] at the last NIPS too, but I don't believe this is very
>important; I think all of us who work with backpropagation have had this

If you have not seen Kolen and Pollack's paper, then you should have a look
at it (fetch it from cheops).  Everyone knows that BP is sensitive to
the starting point, but Kolen and Pollack show that the sensitivity has
an unexpected fractal nature.  There are some really pretty pictures to
look at too.  It is not fair to dismiss a piece of work based only on a
casual view of the title (although one is often forced to choose which
sessions to attend at a conference on that basis alone  :-(  ).
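
If you cannot fetch the paper, the kind of experiment behind those pictures
is easy to sketch for yourself (this is my own toy version, not the authors'
code): fix all the initial weights except two, scan those two over a grid,
train from each grid point, and record how long convergence takes.  The
network, problem, and which two weights get scanned are arbitrary choices
here; a much finer grid is what makes the structure visible.

import numpy as np

# XOR with a 2-2-1 network, the usual toy problem.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def epochs_to_converge(w_a, w_b, lr=0.5, max_epochs=2000, criterion=0.05):
    """Backprop on XOR with all initial weights fixed except the two scanned ones."""
    rng = np.random.default_rng(0)                   # everything else held constant
    W1 = rng.uniform(-0.5, 0.5, size=(2, 3))         # hidden layer (last column = bias)
    W2 = rng.uniform(-0.5, 0.5, size=(1, 3))         # output layer (last column = bias)
    W1[0, 0], W1[1, 1] = w_a, w_b                    # the two weights being scanned
    Xb = np.hstack([X, np.ones((4, 1))])             # inputs with a bias column
    for epoch in range(max_epochs):
        H = sigmoid(Xb @ W1.T)                       # hidden activations
        Hb = np.hstack([H, np.ones((4, 1))])
        Y = sigmoid(Hb @ W2.T)                       # network outputs
        if np.mean((Y - T) ** 2) < criterion:
            return epoch
        d_out = (Y - T) * Y * (1 - Y)                # output deltas
        d_hid = (d_out @ W2[:, :2]) * H * (1 - H)    # hidden deltas (bias weight dropped)
        W2 -= lr * d_out.T @ Hb
        W1 -= lr * d_hid.T @ Xb
    return max_epochs                                # did not converge in time

# Coarse scan of two initial weights; each entry is epochs to convergence.
grid = np.linspace(-4.0, 4.0, 9)
times = np.array([[epochs_to_converge(a, b) for b in grid] for a in grid])
print(times)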

Disclaimer:  I have no relationship with Kolen or Pollack except
as a satisfied reader :-).