[comp.ai.neural-nets] Back-Propagation Convergence

zt@reef.cis.ufl.edu (Zaiyong Tang) (12/31/90)

Does anyone know of a proof of Back-Propagation convergence?
Hecht-Nielsen has a proof for batch (epoch) training, in which convergence is 
achieved as the sample size goes to infinity. (See Neurocomputing 
by Hecht-Nielsen, Addison-Wesley Publishing Company, p. 133.) He mentioned 
that Morris Hirsch provided a similar proof for interactive (pattern) training. 
Could someone let me know Dr. Hirsch's email address, and/or point me to some
related references?
Your help is highly appreciated.

-Zaiyong Tang
zt@beach.cis.ufl.edu

tang@math.ufl.edu (Zaiyong Tang) (01/13/91)

In article <26115@uflorida.cis.ufl.EDU> zt@reef.cis.ufl.edu (Zaiyong Tang) writes:
>Does anyone know of a proof of Back-Propagation convergence?
>Hecht-Nielsen has a proof for batch (epoch) training, in which convergence is 
>achieved as the sample size goes to infinity. (See Neurocomputing 
>by Hecht-Nielsen, Addison-Wesley Publishing Company, p. 133.) He mentioned 
>that Morris Hirsch provided a similar proof for interactive (pattern) training. 
>Could someone let me know Dr. Hirsch's email address, and/or point me to some
>related references?
>Your help is highly appreciated.

Dr. Hirsch did not prove BP convergence for pattern training 
(adjusting the weights after each pattern, which is, I believe, the most
commonly used approach). But he suggested:

      "In this case there is a general averaging theorem which implies 
  that if the learning rate is small enough, then the orbits of this 
  procedure will approximate the orbits of the gradient of the global 
  error function, or in other words, adjusting the weights after all patterns 
  have been input.  
       This result is used very frequently in neural networks, but is hardly ever
  proved."
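
The averaging claim is easy to see numerically. A minimal sketch (entirely illustrative, not from the thread): compare one epoch of per-pattern updates against one full-batch gradient step on a tiny least-squares problem with a single linear unit; for a small enough learning rate the two end points nearly coincide.

```python
# Illustrative sketch: per-pattern (online) updates vs. one full-batch
# gradient step, on a made-up least-squares problem (data assumed here).

patterns = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(10)]  # y = 2x + 1

def grad(w, b, x, y):
    # gradient of the squared error (w*x + b - y)^2 w.r.t. (w, b)
    e = w * x + b - y
    return 2.0 * e * x, 2.0 * e

lr = 0.01

# Batch (epoch) training: one step along the gradient of the GLOBAL error.
gw = sum(grad(1.0, 0.0, x, y)[0] for x, y in patterns)
gb = sum(grad(1.0, 0.0, x, y)[1] for x, y in patterns)
w_batch, b_batch = 1.0 - lr * gw, 0.0 - lr * gb

# Pattern training: update after EVERY pattern, one pass over the data.
w, b = 1.0, 0.0
for x, y in patterns:
    dw, db = grad(w, b, x, y)
    w, b = w - lr * dw, b - lr * db

# For small lr the two end points nearly coincide (the averaging theorem);
# the gap shrinks on the order of lr**2.
print(w_batch, b_batch, w, b)
```

The per-pattern trajectory drifts away from the batch step only by second-order terms in the learning rate, which is the content of the averaging theorem Hirsch refers to.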

Back-Prop with pattern training is NOT a gradient descent algorithm, because
the search in the error surface is NOT guided by the global error function.
Many people say that if the learning rate is small enough, BP will at least
find a local minimum; that is not necessarily true here unless the
convergence is actually proven.
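
The point that an individual pattern update need not descend the global error can be shown with a toy counterexample (the numbers here are illustrative, not from the thread): start at the exact minimum of the total error; a single-pattern update then moves the weight away from it, so the global error increases.

```python
# Toy counterexample (assumed data): two conflicting patterns at x = 1.
patterns = [(1.0, 0.0), (1.0, 1.0)]

def total_error(w):
    # global (summed) squared error over all patterns
    return sum((w * x - y) ** 2 for x, y in patterns)

w = 0.5          # the exact global minimum of total_error
lr = 0.1

# One per-pattern update using ONLY the first pattern:
x, y = patterns[0]
w_new = w - lr * 2.0 * (w * x - y) * x   # 0.5 -> 0.4

print(total_error(w))      # 0.5  at the minimum
print(total_error(w_new))  # 0.52 -- the single-pattern step went uphill
```

So the per-pattern step is a descent direction for that pattern's error, but not for the global error function, which is exactly why the gradient-descent guarantee does not carry over automatically.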
I'd like to hear any information about a proof of BP with pattern training.
If such a proof exists, please direct me to the reference; if there is none,
can anyone give a reason why the algorithm (BP) often works so well in
practice?
Thanks in advance.


-Zaiyong