[comp.ai.neural-nets] Do I Understand Conjugate Gradients Now?

loren@tristan.llnl.gov (Loren Petrich) (09/18/90)

	After reading _Numerical Recipes_, I think I understand what
Conjugate Gradients are now.

	Let us say one wants to minimize a function f(x). Given a
starting point x, one knows that the negative of the local gradient, g
= - grad f(x), will lead one to the minimum. So one finds f(x+g*l),
where l is some scalar greater than zero, and tries to minimize it as
a function of l. This is a one-dimensional problem, which is a lot
easier than a multi-dimensional one. Alternatively, if one knows only
the gradient, one can look for when g*(- grad f(x+g*l)) changes sign
as one increases l.

	One repeats this step as many times as necessary to achieve
convergence. I presume that it can be shown that it will always find a
local minimum. Is that correct?
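
	In code, the procedure I have in mind looks something like
this (a rough sketch in Python; the quadratic test function and the
tolerances are just made up for illustration):

import numpy as np

def line_minimize(grad_f, x, d, tol=1e-8):
    # find where the slope d . grad f(x + l*d) changes sign, as
    # described above, then bisect down to the 1-D minimum along d
    slope = lambda l: d @ grad_f(x + l * d)
    lo, hi = 0.0, 1.0
    while slope(hi) < 0:          # expand until the sign change is bracketed
        hi *= 2.0
    while hi - lo > tol:          # bisect on the sign of the slope
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if slope(mid) < 0 else (lo, mid)
    return x + 0.5 * (lo + hi) * d

def descend(grad_f, x, n_steps=50):
    # repeat: minimize along the negative of the local gradient
    for _ in range(n_steps):
        g = -grad_f(x)
        if np.linalg.norm(g) < 1e-10:
            break                 # (numerically) at a local minimum
        x = line_minimize(grad_f, x, g)
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # made-up test: f = x.A.x/2 - b.x
b = np.array([1.0, 1.0])                 # so grad f(x) = A x - b
print(descend(lambda x: A @ x - b, np.zeros(2)))   # -> A^-1 b = [0.2, 0.4]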


$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Loren Petrich, the Master Blaster: loren@sunlight.llnl.gov

Since this nodename is not widely known, you may have to try:

loren%sunlight.llnl.gov@star.stanford.edu

pluto@beowulf.ucsd.edu (Mark Plutowski) (09/19/90)

In article <68144@lll-winken.LLNL.GOV> loren@tristan.llnl.gov (Loren Petrich) writes:
>
>	After reading _Numerical Recipes_, I think I understand what
>Conjugate Gradients are now.
>
>	Let us say one wants to minimize a function f(x). Given a
>starting point x, one knows that the negative of the local gradient, g
>= - grad f(x), will lead one to the minimum. 

We know it will lead downhill, which is an improvement over where 
we are.  We are seeking a global minimum.  The technique we discuss 
here will assuredly lead us to a local minimum if done correctly.  
This is gradient descent, a popular version of which is our beloved 
backpropagation.
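
In update form, the backprop version is just a fixed small step
downhill (a sketch; the learning rate is an arbitrary choice):

def backprop_step(grad_f, w, lr=0.01):
    # one backprop-style update: a fixed small step downhill, with no
    # guarantee of reaching the 1-D minimum along the gradient direction
    return w - lr * grad_f(w)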

>So one finds f(x+g*l),
>where l is some scalar greater than zero, and tries to minimize it as
>a function of l. This is a one-dimensional problem, which is a lot
>easier than a multi-dimensional one. Alternatively, if one knows only
>the gradient, one can look for when g*(- grad f(x+g*l)) changes sign
>as one increases l.

This is still gradient descent.  However, instead of doing backprop,
we are minimizing along a direction until a local minimum is found.
If we do this repeatedly, we must ensure that each new direction
doesn't wipe out the gains attained by previous line searches.
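
For a quadratic f(x) = 1/2 x.A.x - b.x, "not wiping out gains" is
exactly conjugacy: after an exact line minimization along d1, a
further step along d2 leaves the slope along d1 at zero precisely
when d2.A.d1 = 0.  A small numeric check (the matrix and the
directions here are made-up values):

import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # made-up quadratic: grad f = A x - b
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b

d1 = np.array([1.0, 0.0])
x = np.zeros(2)
l = d1 @ (b - A @ x) / (d1 @ A @ d1)     # exact 1-D minimizer along d1
x = x + l * d1
print(d1 @ grad(x))                      # ~0: we are minimal along d1

d2_bad  = np.array([0.0, 1.0])           # NOT conjugate: d2.A.d1 = 1
d2_good = d2_bad - (d2_bad @ A @ d1) / (d1 @ A @ d1) * d1   # conjugate to d1
print(d1 @ grad(x + 0.5 * d2_bad))       # nonzero: gain along d1 wiped out
print(d1 @ grad(x + 0.5 * d2_good))      # still ~0: gain preserved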

>	One repeats this step as many times as necessary to achieve
>convergence. I presume that it can be shown that it will always find a
>local minimum. Is that correct?

We repeat the step, but each line minimization is in a direction 
conjugate to the previous line minimizations.  If you have "p" weights,
it is possible to search in "p" mutually conjugate directions.  Due to 
nonlinearities, we are not assured of finding the global minimum after
p line minimizations, so after p directions the method is restarted:
take the gradient, do a line minimization, then find the next direction
conjugate to the previous one, do a line minimization, and so on.
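
A rough sketch of that restarted procedure, using the Fletcher-Reeves
formula to generate each next conjugate direction (the line search and
the quadratic test problem are illustrative assumptions):

import numpy as np

def line_min(grad_f, x, d, tol=1e-10):
    # bracket, then bisect, the zero of the slope d . grad f(x + l*d)
    slope = lambda l: d @ grad_f(x + l * d)
    lo, hi = 0.0, 1.0
    while slope(hi) < 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if slope(mid) < 0 else (lo, mid)
    return x + 0.5 * (lo + hi) * d

def conjugate_gradient(grad_f, x, n_restarts=10):
    p = x.size                            # restart after p directions
    for _ in range(n_restarts):
        g = -grad_f(x)                    # restart from steepest descent
        d = g
        for _ in range(p):
            x = line_min(grad_f, x, d)
            g_new = -grad_f(x)
            if np.linalg.norm(g_new) < 1e-8:
                return x
            beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves formula
            d = g_new + beta * d          # conjugate to previous direction
            g = g_new
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])    # made-up quadratic test problem
b = np.array([1.0, 1.0])                  # grad f(x) = A x - b
print(conjugate_gradient(lambda x: A @ x - b, np.zeros(2)))  # -> A^-1 b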

The method of conjugate directions gives you a set of directions 
which are mutually conjugate.  Hence, line minimizations along 
any of these directions will not undo gains obtained by searching 
along any of the other directions in the set previously.  However, 
I am not convinced that searching in conjugate directions will 
necessarily get you to the bottom fastest in all situations.

A counterexample which illustrates this:
You are walking around in a hilly area.
Imagine that the first direction (the one given by the gradient)
is down a straight road.  While travelling downhill on this road,
you pass sections of the road where there are steep dropoffs to 
the sides, i.e., very steep shoulders.  You cannot explore these
downhill directions, since you are destined to travel in a straight
line down the path you are on, until you find you cannot go further 
without going uphill.  Proceeding downhill, the shoulders of the 
road flatten out - no longer steep dropoffs - and then start to 
rise on either side of you, placing you in a canyon.  You continue 
into the canyon. 
Finally, you stop when the road reaches a minimum.

The next direction you must take (since this is a two-dimensional 
problem, there are only 2 directions) is conjugate to the road,
i.e., in this case, orthogonal.  You look to either side of you, 
and realize that you cannot travel to either side, since you are 
already at a local minimum along that direction.  

You remember back to the time when you had the opportunity to 
explore the steep dropoffs earlier in the journey, which, if 
taken, would surely have taken you lower than you now are.   

That is the essence of conjugate gradient.  Some users have the 
technique "look to the side" once in a while, to see if they are 
on a ridge with steep dropoffs.  Also, the line minimization need 
not stop at a local minimum.  It could proceed an indeterminate 
length past the local minimum, searching for lower minima.  Either 
one of these is a kludge - but then again, anything goes when seeking 
global minima in a nonlinear domain.  Whatever gets you to a lower 
spot than the next method does is fair game.  
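
For instance, the second kludge might look like this sketch: sample
the line at increasing distances and keep the lowest point seen, even
past the first 1-D minimum (the sampling schedule is arbitrary):

import numpy as np

def greedy_line_probe(f, x, d, n_probes=20, l0=1e-3):
    # sample f along x + l*d at doubling distances and keep the lowest
    # point seen, even if it lies past the first local minimum on the line
    best_x, best_f = x, f(x)
    l = l0
    for _ in range(n_probes):
        xt = x + l * d
        ft = f(xt)
        if ft < best_f:
            best_x, best_f = xt, ft
        l *= 2.0
    return best_x

# a 1-D example where a lower valley lies past the first one:
f = lambda x: np.sin(5.0 * x[0]) + 0.02 * (x[0] - 3.0) ** 2
print(greedy_line_probe(f, np.array([0.0]), np.array([1.0])))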

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
M.E. Plutowski, UCSD 
CSE Dept. Mail code C-014		   INTERNET: pluto%cs@ucsd.edu 
La Jolla, California 92093   		
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-