[comp.ai.neural-nets] Back-Propagation Weight Initialization

news@helens.Stanford.EDU (news) (12/08/90)

I have a paper titled "Improving the learning speed of two-layer
networks by choosing the initial values of the adaptive weights" in
the proceedings of the IJCNN, June 1990, San Diego, page III-21.  The
co-author is Dr. B. Widrow.  The paper is applicable to networks with one
hidden layer (such networks have been proven to be universal approximators,
given a sufficient number of hidden units).  It describes a technique that
divides the input space into small regions and assigns each unit in the
hidden layer to a region.  The algorithm itself is fairly simple.
(I get a factor of 4 or 5 improvement in learning time over random
initial weights.)
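Roughly, the initialization can be sketched as below (NumPy notation; the
function name is mine, the 0.7 scaling constant and the uniform bias range
are the values commonly used for this kind of scheme, and inputs are assumed
scaled to [-1, 1], so check the paper for the exact prescription):

import numpy as np

def region_init(n_in, n_hidden, rng=None):
    # Sketch of a region-based initialization for one hidden layer.
    # Assumes inputs scaled to [-1, 1].
    rng = np.random.default_rng() if rng is None else rng

    # Each hidden unit should cover roughly 1/n_hidden of the input
    # range along each dimension.
    beta = 0.7 * n_hidden ** (1.0 / n_in)

    # Random directions, rescaled so every hidden unit's weight
    # vector has magnitude beta.
    w = rng.uniform(-0.5, 0.5, size=(n_hidden, n_in))
    w *= beta / np.linalg.norm(w, axis=1, keepdims=True)

    # Spread the biases so the units' active regions tile the input space.
    b = rng.uniform(-beta, beta, size=n_hidden)
    return w, b

# Example: 2 inputs, 10 hidden units.
W, b = region_init(n_in=2, n_hidden=10)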
Derrick Nguyen

egel@neural.dynas.se (Peter Egelberg) (12/11/90)

In article <1325@helens.Stanford.EDU> news@helens.Stanford.EDU (news) writes:

>I have a paper titled "Improving the learning speed of two-layer
>networks by choosing the initial values of the adaptive weights" in
>the proceedings of the IJCNN, June 1990, San Diego, page III-21.
> .
> .
> .
>(I get a factor of 4 or 5 improvement in learning time over random
>initial weights.)

The improvement in learning speed sounds fine. But what about generalization?
Does weight initialization improve generalization?

In most applications learning time is not a major problem; the end user
is not going to train the network. I don't mind waiting if I know that I'll
get a network that solves my problem. Generally I think there is too much
focus on learning speed. When neural networks move to hardware, learning
speed is not going to be a problem. But generalization will still be a problem!
I'm not saying that learning speed is unimportant. I'm saying that
generalization is a greater problem when using neural networks in real-world
applications.

Thanks,
Peter Egelberg
-- 
Peter Egelberg			E-mail:	egel@neural.dynas.se
Neural AB			Phone:	+46 46 11 00 90
Otto Lindbladsv. 5
223 65 LUND, SWEDEN

pluto@beowulf.ucsd.edu (Mark Plutowski) (12/12/90)

egel@neural.dynas.se (Peter Egelberg) writes:

>The improvement in learning speed sounds fine. But what about generalization?
>Does weight initialization improve generalization?
   			. . .
>I'm not saying that learning speed is unimportant. I'm saying that
>generalization is a greater problem when using neural networks in real-world
>applications.

True, many people are more concerned with the quality of the 
fit (that is, the accuracy of generalization, in your terms)
than with learning time.  

IMHO, it is probably the case that the initial choice of weights 
has a significant effect upon the quality of the fit achieved 
for a particular learning run, unless we use a learning rule 
which is insensitive to the initial parameterization of the 
network function.  Gradient descent is not such a learning rule,
in and of itself; if complemented with a global search mechanism
for searching the parameter space (say, via genetic algorithms or
even a grid search), it can be.
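
To make that concrete, here is a minimal sketch (NumPy; the function names
are mine, not from any particular package) of one such complement: restart
plain gradient descent from several random initializations drawn over a
coarse grid of weight scales, and keep whichever net generalizes best on
held-out data.

import numpy as np

def train_once(X, y, X_val, y_val, n_hidden, init_scale, seed,
               lr=0.01, epochs=2000):
    # Batch gradient descent on a one-hidden-layer tanh net with a
    # linear output and squared-error loss.  y and y_val are (N, 1)
    # column vectors of targets.
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W1 = rng.uniform(-init_scale, init_scale, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.uniform(-init_scale, init_scale, (n_hidden, 1))
    b2 = np.zeros(1)

    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)          # hidden activations
        out = h @ W2 + b2                 # linear output
        err = out - y                     # residuals, shape (N, 1)
        # Back-propagate the squared-error gradient.
        dW2 = h.T @ err / len(X)
        db2 = err.mean(axis=0)
        dh = (err @ W2.T) * (1 - h ** 2)
        dW1 = X.T @ dh / len(X)
        db1 = dh.mean(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    h_val = np.tanh(X_val @ W1 + b1)
    val_err = np.mean((h_val @ W2 + b2 - y_val) ** 2)
    return val_err, (W1, b1, W2, b2)

def restart_search(X, y, X_val, y_val, n_hidden=8,
                   scales=(0.1, 0.5, 1.0), restarts=5):
    # Coarse grid over initial weight scales x random restarts;
    # keep whichever run generalizes best on the validation set.
    best = (np.inf, None)
    for scale in scales:
        for seed in range(restarts):
            val_err, params = train_once(X, y, X_val, y_val,
                                         n_hidden, scale, seed)
            if val_err < best[0]:
                best = (val_err, params)
    return best

Whether this actually buys a better fit on a given problem is, of course,
an empirical question.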

-=-=
M.E. Plutowski,  pluto%cs@ucsd.edu 
UCSD,  Computer Science and Engineering 0114
La Jolla, California 92093-0114