loren@tristan.llnl.gov (Loren Petrich) (09/18/90)
	After reading _Numerical Recipes_, I think I understand what
Conjugate Gradients are now.

	Let us say one wants to minimize a function f(x). Given a
starting point x, one knows that the negative of the local gradient,
g = - grad f(x), will lead one to the minimum. So one finds f(x+g*l),
where l is some scalar greater than zero, and tries to minimize it as
a function of l. This is a one-dimensional problem, which is a lot
easier than a multi-dimensional one. Alternately, if one knows only
the gradient, one can look for when g*(- grad f(x+g*l)) changes sign
as one increases l.

	One repeats this step as many times as necessary to achieve
convergence. I presume that it can be shown that it will always find a
local minimum. Is that correct?

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Loren Petrich, the Master Blaster: loren@sunlight.llnl.gov
Since this nodename is not widely known, you may have to try:
loren%sunlight.llnl.gov@star.stanford.edu
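[The scheme described above — steepest descent with a line search that
watches for the sign change of g*(- grad f(x+g*l)) — can be sketched as
follows. The quadratic test function, step sizes, and tolerances are my
own illustrative assumptions, not anything from the posting.]

```python
import numpy as np

def grad_f(x):
    # Illustrative objective: f(x) = x0^2 + 10*x1^2, so the gradient
    # is analytic. Any differentiable f would do.
    return np.array([2.0 * x[0], 20.0 * x[1]])

def line_search_sign_change(x, g, l_step=1e-3, l_max=10.0):
    # Increase l until g . (-grad f(x + l*g)) changes sign, as the
    # posting describes, then bisect the bracket to locate the 1-D minimum.
    lo, hi = 0.0, l_step
    while np.dot(g, -grad_f(x + hi * g)) > 0 and hi < l_max:
        lo, hi = hi, 2.0 * hi          # march forward until the sign flips
    for _ in range(60):                # bisect the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if np.dot(g, -grad_f(x + mid * g)) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def steepest_descent(x, n_iter=100):
    for _ in range(n_iter):
        g = -grad_f(x)                 # downhill direction
        if np.linalg.norm(g) < 1e-10:  # converged
            break
        l = line_search_sign_change(x, g)
        x = x + l * g
    return x
```

On this quadratic bowl, repeated steepest-descent steps with an exact
line search do converge to the (local = global) minimum at the origin,
though they zig-zag — which is exactly the inefficiency that conjugate
directions are meant to cure.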
pluto@beowulf.ucsd.edu (Mark Plutowski) (09/19/90)
In article <68144@lll-winken.LLNL.GOV> loren@tristan.llnl.gov (Loren Petrich) writes:
>
>	After reading _Numerical Recipes_, I think I understand what
>Conjugate Gradients are now.
>
>	Let us say one wants to minimize a function f(x). Given a
>starting point x, one knows that the negative of the local gradient, g
>= - grad f(x) will lead one to the minimum.

We know it will lead downhill, which is an improvement over where we
are. We are seeking a global minimum; the technique we discuss here
will assuredly lead us to a local minimum if done correctly. This is
gradient descent, a popular version of which is our beloved
backpropagation.

>So one finds f(x+g*l),
>where l is some scalar greater than zero, and tries to minimize it as
>a function of l. This is a one-dimensional problem, which is a lot
>easier than a multi-dimensional one. Alternately, if one knows only
>the gradient, one can look for when g*(- grad f(x+g*l)) changes sign
>as one increases l.

This is still gradient descent. However, instead of doing backprop, we
are minimizing along one direction until a local minimum along it is
found. If we do this repeatedly, we must ensure that each new direction
doesn't wipe out the gains attained by previous line searches.

>	One repeats this step as many times as necessary to achieve
>convergence. I presume that it can be shown that it will always find a
>local minimum. Is that correct?

We repeat the step, but each line minimization is in a direction
conjugate to the previous line minimizations. If you have "p" weights,
it is possible to search in "p" conjugate directions. Due to
nonlinearities, we are not assured of finding the global minimum after
p line minimizations, so after p directions the method is restarted:
take the gradient, do a line minimization, find the next direction
conjugate to the previous one, do a line minimization, and so on.
Conjugate directions is a technique that gives you a set of directions
which are mutually conjugate.
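[A minimal sketch of the restart scheme just described — at most p line
minimizations in mutually conjugate directions, then restart from the
raw gradient. The Polak-Ribiere update for the next conjugate
direction, the toy quadratic, and the bisection line search are my own
assumptions for illustration; they are not from the posting.]

```python
import numpy as np

def grad_f(x):
    # Illustrative quadratic bowl: f(x) = x0^2 + 10*x1^2.
    return np.array([2.0 * x[0], 20.0 * x[1]])

def line_minimize(x, d):
    # 1-D minimization along direction d: bracket where the directional
    # derivative d . grad f(x + l*d) changes sign, then bisect.
    lo, hi = 0.0, 1e-3
    while np.dot(d, grad_f(x + hi * d)) < 0 and hi < 1e6:
        lo, hi = hi, 2.0 * hi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if np.dot(d, grad_f(x + mid * d)) < 0:
            lo = mid
        else:
            hi = mid
    return x + 0.5 * (lo + hi) * d

def conjugate_gradient(x, n_restarts=5):
    p = len(x)                      # "p" weights => p conjugate directions
    for _ in range(n_restarts):
        g = -grad_f(x)              # restart: begin with the gradient
        d = g
        for _ in range(p):          # at most p directions before a restart
            x = line_minimize(x, d)
            g_new = -grad_f(x)
            if np.linalg.norm(g_new) < 1e-12:
                return x            # gradient vanished: local minimum
            # Polak-Ribiere formula for the next conjugate direction.
            beta = np.dot(g_new, g_new - g) / np.dot(g, g)
            d = g_new + beta * d
            g = g_new
    return x
```

For a quadratic in p dimensions with exact line searches, this reaches
the minimum in at most p line minimizations; the restarts only matter
for nonlinear objectives, as the posting notes.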
Hence, line minimizations along any of these directions will not screw
up gains obtained by searching along any of the other directions in the
set previously.

However, I am not convinced that searching in conjugate directions will
necessarily get you to the bottom fastest in all situations. A
counterexample which illustrates this: you are walking around in a
hilly area. Imagine that the first direction (the one given by the
gradient) is down a straight road. While travelling downhill on this
road, you pass sections of the road where there are steep dropoffs to
the sides, i.e., very steep shoulders. You cannot explore these
downhill directions, since you are destined to travel in a straight
line down the path you are on, until you find you cannot go further
without going uphill. Proceeding downhill, the shoulders of the road
flatten off, no longer steep dropoffs - now they start to rise on
either side of you, placing you in a canyon. You continue into the
canyon. Finally, you stop when the road reaches a minimum.

The next direction you must take (since this is a two-dimensional
problem, there are only 2 directions) is conjugate to the road, i.e.,
in this case, orthogonal. You look to either side of you and realize
that you cannot travel to either side, since you are already at a local
minimum along that direction. You remember back to the time when you
had the opportunity to explore the steep dropoffs earlier in the
journey, which, if taken, would surely have brought you lower than you
now are. That is the essence of conjugate gradient.

Some users have the technique "look to the side" once in a while, to
see if they are on a ridge with steep dropoffs. Also, the line
minimization need not stop at a local minimum: it could proceed an
indeterminate length past the local minimum, searching for lower
minima. Either one of these is a kludge - but then again, anything goes
when seeking global minima in a nonlinear domain.
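[The second kludge mentioned above — letting the line search probe an
indeterminate length past the first local minimum — might look like
this. The probing strategy, the 1-D test function with two minima, and
all constants are my own hypothetical choices for illustration.]

```python
def probe_past_minimum(f, l_min, probe_step=1.0, n_probes=5):
    # After a line minimization stops at l_min, keep probing past it
    # in case a lower minimum lies farther along the search direction.
    best_l, best_f = l_min, f(l_min)
    for k in range(1, n_probes + 1):
        l = l_min + k * probe_step
        fl = f(l)
        if fl < best_f:
            best_l, best_f = l, fl
    return best_l

# Toy 1-D section with two minima: the nearer one (near l = 1.06) is
# shallower than the farther one (near l = 4).
f = lambda l: (l - 1.0) ** 2 * (l - 4.0) ** 2 - l
```

A plain line search stops in the first, shallower well; the extra
probes land in the deeper one. It is still a heuristic, of course -
nothing guarantees the probes straddle the better minimum.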
Whatever gets you to a lower spot than the next method does is fair game.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
M.E. Plutowski, UCSD CSE Dept.           Mail code C-014
INTERNET: pluto%cs@ucsd.edu              La Jolla, California 92093
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-