[comp.ai.neural-nets] The first few epochs in BP

georgiou@rex.cs.tulane.edu (George Georgiou) (04/03/91)

For those who have worked with Back-Propagation: Have you noticed any
chaotic behavior in the graph of the (usual) error function vs. epochs?
Specifically, during the first 2 or 3 epochs the value of the error
would jump all over the place, but afterwards it becomes smooth.

Only once have I seen this behavior in the literature, in a graph in
a paper in a respected publication, and there it went unremarked.

Is this symptomatic of gradient descent procedures?

I'd appreciate comments.

-------------------------
1) Alternative title of this posting: The first few moments of the Big
Bang.

2) Seen on a bumper sticker (in reference to the earth and the
environment): "Think globally, act locally." It would make a nice
motto for some Neural Net society.
------------------------

George Georgiou                       georgiou@rex.cs.tulane.edu
Computer Science Department           +---------------------------+
Tulane University                     |       Fiat Lux            |
New Orleans, LA 70118                 +---------------------------+

greenba@gambia.crd.ge.com (ben a green) (04/04/91)

In article <6882@rex.cs.tulane.edu> georgiou@rex.cs.tulane.edu (George Georgiou) writes:

   For those who have worked with Back-Propagation: Have you noticed any
   chaotic behavior in the graph of the (usual) error function vs. epochs?
   Specifically, during the first 2 or 3 epochs the value of the error
   would jump all over the place, but afterwards it becomes smooth.

   Only once I saw this behavior in the literature. It was in a graph in
   a paper in a respected publication, but it was ignored.

   Is this symptomatic of gradient descent procedures?

It is not characteristic of all gradient descent procedures, but it is
characteristic of the common back-prop practice of updating the weights
after each pattern, before the errors have been collected over the
whole training set.

And it is characteristic of the usual back-prop technique of using a
constant learning rate.

To get a numerically stable procedure, collect the errors over the whole
training set, compute the gradient direction of the error in weight
space, and do a line search along that direction to find a minimum.
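
Roughly, as a toy sketch (Python on an arbitrary single-layer sigmoid
problem; the step-size grid and names are made up for illustration,
not a prescription):

# Toy sketch: full-batch gradient descent with a crude line search.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sse(w, X, t):
    # Sum-squared error over the WHOLE training set (no per-pattern updates).
    return 0.5 * np.sum((sigmoid(X @ w) - t) ** 2)

def grad(w, X, t):
    y = sigmoid(X @ w)
    return X.T @ ((y - t) * y * (1.0 - y))

# Toy problem: 4 patterns, 2 inputs plus a bias input; target is OR.
X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])
t = np.array([0., 1., 1., 1.])
w = 0.1 * np.random.default_rng(0).standard_normal(3)

for epoch in range(50):
    d = -grad(w, X, t)                  # steepest-descent direction
    # Crude line search: try a grid of step sizes along d and keep the
    # one with the lowest batch error (0.0 means "stay put" if none help).
    steps = [0.0] + [2.0 ** k for k in range(-10, 4)]
    best = min(steps, key=lambda a: sse(w + a * d, X, t))
    w = w + best * d
    print(epoch, sse(w, X, t))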

Ben

--
Ben A. Green, Jr.              
greenba@crd.ge.com
  Speaking only for myself, of course.

sfp@mars.ornl.gov (Phil Spelt) (04/04/91)

In article <6882@rex.cs.tulane.edu> georgiou@rex.cs.tulane.edu (George Georgiou) writes:
>For those who have worked with Back-Propagation: Have you noticed any
>chaotic behavior in the graph of the (usual) error function vs. epochs?
>Specifically, during the first 2 or 3 epochs the value of the error
>would jump all over the place, but afterwards it becomes smooth.
>
>Only once I saw this behavior in the literature. It was in a graph in
>a paper in a respected publication, but it was ignored.
>
>Is this symptomatic of gradient descent procedures?
>
>George Georgiou                       georgiou@rex.cs.tulane.edu
>Computer Science Department           +---------------------------+
>Tulane University                     |       Fiat Lux            |
>New Orleans, LA 70118                 +---------------------------+

I cite the following:

"Chaos and the Step-Size Dilemma in the Back-Prop Learning Algorithm"  by
R. Chris Lacher and Michael E. Manausa, Dept Comp. Sci., Fla. State Univ.
in:  Proceedings of the Second Workshop on Nerual Networks:  Academic/
Industrial/Defense,  WNN-AIND 91;  held at Auburn University, AL, 11-13 Feb, 1991.

It makes for VERY interesting reading!
=============================================================================
MIND.  A mysterious form of matter secreted by the brain.  Its chief activity
consists in the endeavor to ascertain its own nature, the futility of the
attempt being due to the fact that it has nothing but itself to know itself
with.   -- Ambrose Bierce
=============================================================================

Phil Spelt, Cognitive Systems & Human Factors Group  sfp@epm.ornl.gov
============================================================================
Any opinions expressed or implied are my own, IF I choose to own up to them.
============================================================================

kolen-j@retina.cis.ohio-state.edu (john kolen) (04/05/91)

  For those who have worked with Back-Propagation: Have you noticed any
  chaotic behavior in the graph of the (usual) error function vs. epochs?
  Specifically, during the first 2 or 3 epochs the value of the error
  would jump all over the place, but afterwards it becomes smooth.

This phenomenon arises from an interaction between the shape of the error
surface and the initial weight selection.  The error surface near the origin
is relatively bumpy, giving rise to the "chaotic" appearance of the error
measure.  This can be attributed to the step sizes taken by backprop being
relatively large compared to the bumps it traverses in error space.  As
back-propagation continues to change the weights, they tend to move away
from the origin into a region that is relatively smooth with respect to the
step size.  The same sort of behavior can be seen when backprop is started
in a region of weight space where small weight changes can make a drastic
change in network functionality.
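
A quick way to watch this (a toy Python sketch, not the experiment from
the paper; the 2-2-1 net, seed, and learning rate are arbitrary choices
for illustration) is to run plain on-line backprop on XOR from small
initial weights and print the summed error each epoch:

# On-line backprop on XOR with a 2-2-1 sigmoid net, constant learning
# rate, and small initial weights (a starting point near the origin).
# With per-pattern updates and a fairly large rate the early error
# trace is often noisy, then smooths out as the weights drift away
# from the origin.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([0., 1., 1., 0.])              # XOR targets

W1 = 0.1 * rng.standard_normal((2, 2)); b1 = np.zeros(2)
W2 = 0.1 * rng.standard_normal(2);      b2 = 0.0
eta = 2.0                                   # constant learning rate

for epoch in range(200):
    err = 0.0
    for x, t in zip(X, T):
        h = sigmoid(W1 @ x + b1)            # hidden layer
        y = sigmoid(W2 @ h + b2)            # output
        err += 0.5 * (y - t) ** 2
        dy = (y - t) * y * (1.0 - y)        # output delta
        dh = dy * W2 * h * (1.0 - h)        # hidden deltas
        W2 -= eta * dy * h;  b2 -= eta * dy # update after EVERY pattern
        W1 -= eta * np.outer(dh, x);  b1 -= eta * dh
    print(epoch, float(err))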

To see this phenomenon in action, see

J. Kolen and J. Pollack. (1990) Backpropagation is Sensitive to Initial
Conditions.  Complex Systems, 4:269-280.


--
John Kolen (kolen-j@cis.ohio-state.edu)|computer science - n. A field of study
Laboratory for AI Research             |somewhere between numerology and
The Ohio State University	       |astrology, lacking the formalism of the
Columbus, Ohio	43210	(USA)	       |former and the popularity of the latter

uh311ae@sunmanager.lrz-muenchen.de (Henrik Klagges) (04/08/91)

Having a gradient-descent minimizer jump around a little bit during
start-up is absolutely normal - you should not give a damn about it.
You should rather give a damn when it STOPS jumping later, because it
is usually just heading straight into a local minimum ...

Cheers ! Rick@vee.lrz-muenchen.de

Henrik Klagges, STM group at U of Munich, FRG