georgiou@rex.cs.tulane.edu (George Georgiou) (04/03/91)
For those who have worked with Back-Propagation: have you noticed any chaotic behavior in the graph of the (usual) error function vs. epochs? Specifically, during the first 2 or 3 epochs the value of the error would jump all over the place, but afterwards it becomes smooth.

Only once have I seen this behavior in the literature. It was in a graph in a paper in a respected publication, but it was ignored.

Is this symptomatic of gradient descent procedures? I'll appreciate comments.

-------------------------
1) Alternative title of this posting: The first few moments of the Big Bang.
2) Seen on a bumper sticker (in reference to the earth and the environment): "Think globally, act locally." It would make a nice motto for some Neural Net society.
------------------------
George Georgiou                    georgiou@rex.cs.tulane.edu
Computer Science Department        +---------------------------+
Tulane University                  | Fiat Lux                  |
New Orleans, LA 70118              +---------------------------+
greenba@gambia.crd.ge.com (ben a green) (04/04/91)
In article <6882@rex.cs.tulane.edu> georgiou@rex.cs.tulane.edu (George Georgiou) writes:
> For those who have worked with Back-Propagation: have you noticed any
> chaotic behavior in the graph of the (usual) error function vs. epochs?
> Specifically, during the first 2 or 3 epochs the value of the error
> would jump all over the place, but afterwards it becomes smooth.
>
> Only once have I seen this behavior in the literature. It was in a graph in
> a paper in a respected publication, but it was ignored.
>
> Is this symptomatic of gradient descent procedures?
It is not characteristic of all gradient-descent procedures, but it is
characteristic of the common back-prop practice of updating the weights
before the errors have been collected over the whole training set.
It is also characteristic of the usual back-prop technique of using a
constant learning rate.
To get a numerically stable procedure, collect the errors over the whole
training set, compute the gradient direction of the error in weight
space, and do a line search along that direction to find a minimum.
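In concrete terms, a minimal sketch of that recipe might look like the
following. The 2-2-1 sigmoid network, the XOR data, and the simple
backtracking step rule are assumptions chosen only for illustration:

# A minimal sketch: batch error, batch gradient, and a crude line search.
# The 2-2-1 sigmoid net on XOR and the backtracking rule are illustrative
# assumptions, not anything prescribed above.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # training inputs
T = np.array([[0.], [1.], [1.], [0.]])                    # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(w):
    # Flat parameter vector -> (W1, b1, W2, b2) for a 2-2-1 network.
    return w[:4].reshape(2, 2), w[4:6], w[6:8].reshape(2, 1), w[8:]

def error(w):
    # Total squared error over the WHOLE training set (batch error).
    W1, b1, W2, b2 = unpack(w)
    Y = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return 0.5 * np.sum((Y - T) ** 2)

def gradient(w):
    # Batch gradient of the error via back-propagation.
    W1, b1, W2, b2 = unpack(w)
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    dY = (Y - T) * Y * (1 - Y)            # output-layer deltas
    dH = (dY @ W2.T) * H * (1 - H)        # hidden-layer deltas
    return np.concatenate([(X.T @ dH).ravel(), dH.sum(0),
                           (H.T @ dY).ravel(), dY.sum(0)])

w = rng.normal(scale=0.5, size=9)
for epoch in range(200):
    g = gradient(w)
    d = -g / (np.linalg.norm(g) + 1e-12)  # steepest-descent direction
    step, e0 = 1.0, error(w)
    while step > 1e-12 and error(w + step * d) >= e0:
        step *= 0.5                       # backtracking: halve until error drops
    if error(w + step * d) < e0:
        w = w + step * d                  # accept the step only if it helps
    print(epoch, error(w))

In this sketch the error can never go up from one epoch to the next, since
a step is only accepted when it lowers the batch error; that is one simple
way to get the numerically stable behavior described above.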
Ben
--
Ben A. Green, Jr.
greenba@crd.ge.com
Speaking only for myself, of course.
sfp@mars.ornl.gov (Phil Spelt) (04/04/91)
In article <6882@rex.cs.tulane.edu> georgiou@rex.cs.tulane.edu (George Georgiou) writes:
> For those who have worked with Back-Propagation: have you noticed any
> chaotic behavior in the graph of the (usual) error function vs. epochs?
> Specifically, during the first 2 or 3 epochs the value of the error
> would jump all over the place, but afterwards it becomes smooth.
>
> Only once have I seen this behavior in the literature. It was in a graph in
> a paper in a respected publication, but it was ignored.
>
> Is this symptomatic of gradient descent procedures?
>
> George Georgiou                    georgiou@rex.cs.tulane.edu
> Computer Science Department        +---------------------------+
> Tulane University                  | Fiat Lux                  |
> New Orleans, LA 70118              +---------------------------+

I cite the following:

"Chaos and the Step-Size Dilemma in the Back-Prop Learning Algorithm" by
R. Chris Lacher and Michael E. Manausa, Dept. of Computer Science, Florida
State University, in: Proceedings of the Second Workshop on Neural Networks:
Academic/Industrial/Defense, WNN-AIND 91; held at Auburn University, AL,
11-13 Feb. 1991.

It makes for VERY interesting reading!

=============================================================================
MIND. A mysterious form of matter secreted by the brain. Its chief activity
consists in the endeavor to ascertain its own nature, the futility of the
attempt being due to the fact that it has nothing but itself to know itself
with.  -- Ambrose Bierce
=============================================================================
Phil Spelt, Cognitive Systems & Human Factors Group       sfp@epm.ornl.gov
============================================================================
Any opinions expressed or implied are my own, IF I choose to own up to them.
============================================================================
kolen-j@retina.cis.ohio-state.edu (john kolen) (04/05/91)
> For those who have worked with Back-Propagation: have you noticed any
> chaotic behavior in the graph of the (usual) error function vs. epochs?
> Specifically, during the first 2 or 3 epochs the value of the error
> would jump all over the place, but afterwards it becomes smooth.

This phenomenon arises from an interaction between the shape of the error
function and the initial weight selection. The error surface near the origin
is relatively bumpy, which gives rise to the "chaotic" appearance of the
error measure. This can be attributed to the step sizes taken by backprop
being relatively large compared to those bumps as it traverses the divots in
error space. As back-propagation continues to change the weights, they tend
to move away from the origin into a region that is relatively smoother (with
respect to the step size). The same sort of behavior can be seen when
backprop is started in a region of weight space where small changes can make
a drastic change in network functionality.

To see this phenomenon in action, see:

J. Kolen and J. Pollack (1990). Backpropagation is Sensitive to Initial
Conditions. Complex Systems, 4:269-280.

--
John Kolen (kolen-j@cis.ohio-state.edu)|computer science - n. A field of study
Laboratory for AI Research             |somewhere between numerology and
The Ohio State University              |astrology, lacking the formalism of the
Columbus, Ohio 43210 (USA)             |former and the popularity of the latter
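A rough way to look for this kind of effect is to run plain per-pattern
backprop from initial weights drawn at different scales and compare the
early per-epoch error traces. The sketch below is only an illustration in
that spirit, not the experimental setup from the paper above; the 2-2-1
network, the XOR data, the learning rate, and the weight scales are all
assumptions chosen for the demo.

# Plain per-pattern (online) backprop with a constant learning rate, started
# from initial weights drawn at two different scales. All choices here are
# illustrative assumptions.
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # inputs
T = np.array([[0.], [1.], [1.], [0.]])                    # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run(scale, epochs=30, lr=2.0, seed=1):
    # Online backprop; returns the total error after each epoch.
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=scale, size=(2, 2)); b1 = np.zeros(2)
    W2 = rng.normal(scale=scale, size=(2, 1)); b2 = np.zeros(1)
    errors = []
    for _ in range(epochs):
        total = 0.0
        for x, t in zip(X, T):
            h = sigmoid(x @ W1 + b1)
            y = sigmoid(h @ W2 + b2)
            total += 0.5 * float(np.sum((y - t) ** 2))
            dy = (y - t) * y * (1 - y)        # output delta
            dh = (dy @ W2.T) * h * (1 - h)    # hidden delta
            W2 -= lr * np.outer(h, dy); b2 -= lr * dy
            W1 -= lr * np.outer(x, dh); b1 -= lr * dh
        errors.append(total)
    return errors

# Compare early error traces for weights started near the origin vs. farther
# out; per the post above, the runs started near the origin are where the
# early "chaotic" appearance is expected.
for scale in (0.1, 2.0):
    print(scale, [round(e, 3) for e in run(scale)])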
uh311ae@sunmanager.lrz-muenchen.de (Henrik Klagges) (04/08/91)
Having a gradient-search minimizer jump around a little during start-up is absolutely normal - you should not give a damn about it. You should rather give a damn when it STOPS jumping later, because it is usually just heading straight into a local minimum ...

Cheers! Rick@vee.lrz-muenchen.de
Henrik Klagges, STM group at U of Munich, FRG