news@helens.Stanford.EDU (news) (12/08/90)
I have a paper titled "Improving the learning speed of two-layer networks by choosing the initial values of the adaptive weights" in the proceedings of the IJCNN, June 1990, San Diego, page III-21. The co-author is Dr. B. Widrow.

The paper is applicable to networks with one hidden layer (such networks have been proven to be universal approximators, given a sufficient number of hidden units). It describes a technique that divides the input space into small regions and assigns each unit in the hidden layer to a region. The algorithm itself is fairly simple. (I get a factor of 4 or 5 improvement in learning time over random initial weights.)

Derrick Nguyen
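For readers without the proceedings at hand, here is a minimal Python/NumPy sketch of the general idea described above: give each hidden unit a random direction, then scale the weight magnitudes and spread the biases so that each unit's active region covers only a small slice of the input space. The particular scaling factor and the assumption of inputs normalized to [-1, 1] are illustrative choices, not taken from the paper.

import numpy as np

def region_based_init(n_inputs, n_hidden, rng=np.random.default_rng(0)):
    # Sketch of region-based initialization for a one-hidden-layer net.
    # ASSUMPTION: inputs are normalized to [-1, 1] on every dimension,
    # and the scale factor below is illustrative rather than quoted
    # from the Nguyen-Widrow paper.

    # Random directions for each hidden unit.
    W = rng.uniform(-1.0, 1.0, size=(n_hidden, n_inputs))

    # Scale factor: grows with the number of hidden units so that,
    # taken together, the units tile the input space in small regions.
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)
    W = beta * W / np.linalg.norm(W, axis=1, keepdims=True)

    # Spread the biases so the regions are distributed across the
    # input space rather than all centered at the origin.
    b = rng.uniform(-beta, beta, size=n_hidden)
    return W, b

# Example: 2 inputs, 10 hidden units.
W, b = region_based_init(2, 10)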
egel@neural.dynas.se (Peter Egelberg) (12/11/90)
In article <1325@helens.Stanford.EDU> news@helens.Stanford.EDU (news) writes:
>I have a paper titled "Improving the learning speed of two-layer
>networks by choosing the initial values of the adaptive weights" in
>the proceedings of the IJCNN, June 1990, San Diego, page III-21.
> .
> .
> .
>(I get a factor of 4 or 5 improvement in learning time over random
>initial weights.)

The improvement in learning speed sounds fine, but what about generalization? Does weight initialization improve generalization?

In most applications learning time is not a major problem; the end user is not going to train the network. I don't mind waiting if I know that I'll get a network that solves my problem. Generally, I think there is too much focus on learning speed. When neural networks move to hardware, learning speed is not going to be a problem, but generalization will still be a problem!

I'm not saying that learning speed is unimportant. I'm saying that generalization is a greater problem when using neural networks in real-world applications.

Thanks,
Peter Egelberg
--
Peter Egelberg          E-mail: egel@neural.dynas.se
Neural AB               Phone:  +46 46 11 00 90
Otto Lindbladsv. 5
223 65 LUND, SWEDEN
pluto@beowulf.ucsd.edu (Mark Plutowski) (12/12/90)
egel@neural.dynas.se (Peter Egelberg) writes:
>The improvement in learning speed sounds fine, but what about
>generalization? Does weight initialization improve generalization?
> .
> .
> .
>I'm not saying that learning speed is unimportant. I'm saying that
>generalization is a greater problem when using neural networks in
>real-world applications.

True, many people are more concerned with the quality of the fit (that is, the accuracy of generalization, in your terms) than with learning time. IMHO, it is probably the case that the initial choice of weights has a significant effect upon the quality of the fit achieved for a particular learning run, unless we use a learning rule which is insensitive to the initial parameterization of the network function. Gradient descent is not such a learning rule in and of itself; complemented with a global mechanism for searching the parameter space (say, via genetic algorithms or even a grid search), it can be.

-=-=
M.E. Plutowski,  pluto%cs@ucsd.edu
UCSD, Computer Science and Engineering 0114
La Jolla, California 92093-0114
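[Moderator's note: as a concrete illustration of the point above, here is a toy Python/NumPy sketch of one such global-search complement to gradient descent: restart plain gradient descent from several random initializations and keep the best fit. The tiny tanh network, the learning rate, and the number of restarts are all placeholder choices, not anything proposed in the thread; a grid over starting weights or a genetic algorithm could play the same role as the random restarts.]

import numpy as np

def mse_and_grads(params, X, y):
    # Squared error and gradients for a tiny one-hidden-layer tanh net.
    # params = (W1, b1, w2); this stands in for any gradient-trained net.
    W1, b1, w2 = params
    H = np.tanh(X @ W1 + b1)          # hidden activations
    err = H @ w2 - y                  # linear output unit
    loss = np.mean(err ** 2)
    d_out = 2 * err / len(y)          # backpropagate the squared error
    g_w2 = H.T @ d_out
    d_H = np.outer(d_out, w2) * (1 - H ** 2)
    return loss, (X.T @ d_H, d_H.sum(axis=0), g_w2)

def train(params, X, y, lr=0.05, steps=2000):
    # Plain gradient descent; the fit it reaches depends on the start.
    for _ in range(steps):
        loss, grads = mse_and_grads(params, X, y)
        params = tuple(p - lr * g for p, g in zip(params, grads))
    return params, loss

def multi_start(X, y, n_hidden=5, n_starts=20, rng=np.random.default_rng(0)):
    # Crude global search over initializations: restart gradient descent
    # from several random starting points and keep the best fit.
    best, best_loss = None, np.inf
    for _ in range(n_starts):
        params = (rng.normal(size=(X.shape[1], n_hidden)),
                  rng.normal(size=n_hidden),
                  rng.normal(size=n_hidden))
        params, loss = train(params, X, y)
        if loss < best_loss:
            best, best_loss = params, loss
    return best, best_loss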