zt@reef.cis.ufl.edu (Zaiyong Tang) (12/31/90)
Does anyone know of a proof of Back-Propagation convergence? Hecht-Nielsen has a proof for batch (epoch) training, where convergence is achieved as the sample size goes to infinity. (See Neurocomputing by Hecht-Nielsen, Addison-Wesley Publishing Company, p. 133.) He mentions that Morris Hirsch provided a similar proof for interactive (pattern) training. Could someone let me know Dr. Hirsch's email address, and/or point me to some related references? Your help is highly appreciated.

-Zaiyong Tang
zt@beach.cis.ufl.edu
tang@math.ufl.edu (Zaiyong Tang) (01/13/91)
In article <26115@uflorida.cis.ufl.EDU> zt@reef.cis.ufl.edu (Zaiyong Tang) writes:
>Does anyone know of a proof of Back-Propagation convergence?
>Hecht-Nielsen has a proof for batch (epoch) training, where convergence is
>achieved as the sample size goes to infinity. (See Neurocomputing
>by Hecht-Nielsen, Addison-Wesley Publishing Company, p. 133.) He mentions
>that Morris Hirsch provided a similar proof for interactive (pattern) training.
>Could someone let me know Dr. Hirsch's email address, and/or point me to some
>related references?
>Your help is highly appreciated.

Dr. Hirsch did not prove BP convergence for pattern training (adjusting the weights after each pattern, which is the most commonly used approach, I believe). But he suggested:

"In this case there is a general averaging theorem which implies that if the learning rate is small enough, then the orbits of this procedure will approximate the orbits of the gradient of the global error function, or in other words, adjusting the weights after all patterns have been input. This result is used very frequently in neural networks, but is hardly ever proved."

Back-Prop with pattern training is NOT a gradient descent algorithm, because the search on the error surface is NOT guided by the global error function. Many people say that if the learning rate is small enough, BP will at least reach a local minimum, but that is not necessarily true here unless convergence is proven.

I'd like to hear any information about a proof of BP with pattern training. If there is such a proof, please direct me to the reference; and if there is no such proof, can you give me a reason why the algorithm (BP) often works in practice? Thanks in advance.

-Zaiyong
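[Editor's note: Hirsch's averaging remark can be illustrated with a small numerical sketch. This is an editorial illustration, not part of the original posts; the model, data, and learning rate are all illustrative assumptions. For a one-weight linear model with squared error, per-pattern ("interactive") updates with a small learning rate track the batch (epoch) gradient-descent orbit and settle within O(learning rate) of the minimum of the global error.]

```python
# Illustrative sketch (assumed data and learning rate, not from the posts):
# compare per-pattern weight updates with true gradient descent on the
# global error E(w) = sum_i 0.5*(w*x_i - t_i)**2 for a one-weight model.

patterns = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0)]  # (input x, target t) pairs

def grad(w, x, t):
    """Gradient of the per-pattern error 0.5*(w*x - t)**2 w.r.t. w."""
    return (w * x - t) * x

def batch_step(w, lr):
    """One epoch of true gradient descent on the global error."""
    return w - lr * sum(grad(w, x, t) for x, t in patterns)

def pattern_epoch(w, lr):
    """One epoch of pattern training: update after each pattern in turn."""
    for x, t in patterns:
        w -= lr * grad(w, x, t)
    return w

lr = 0.01  # small learning rate, per Hirsch's condition
w_batch = w_pattern = 0.0
for _ in range(2000):
    w_batch = batch_step(w_batch, lr)
    w_pattern = pattern_epoch(w_pattern, lr)

# Minimum of the global error (least squares): w* = sum(x*t) / sum(x*x)
w_star = sum(x * t for x, t in patterns) / sum(x * x for x, _ in patterns)
print(w_batch, w_pattern, w_star)
```

With lr = 0.01, the batch orbit converges to the global minimum, and the pattern-training orbit ends up close to it (the small residual gap shrinks with the learning rate). This is consistent with the averaging theorem but, as noted above, it is an empirical illustration rather than a proof.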