[comp.ai.neural-nets] What is Conjugate Gradient method???

akkh@mullian.ee.mu.OZ.AU (A. Hui) (09/10/90)

To all experts!

I recently came across the learning algorithm called "Conjugate Gradient
method" which is supposed to improve the performance of NN during the
learning phase.

I dug up a few references but was astounded by the huge volume of maths
involved in the method.

So, could someone give me, in simple terms, the general idea behind the method?
Or point me to a reference which you think is simple enough for a beginner in NNs?

Thanks in advance!

===============================================================================
	Alvaro Hui		|ACSnet		akkh@mullian.oz
    4th Year B.E.\ B.Sc.	|Internet &	akkh@mullian.ee.mu.OZ.AU
   University of Melbourne	|Arpanet	rcoahk@koel.co.rmit.OZ.AU 

ins_atge@jhunix.HCF.JHU.EDU (Thomas G Edwards) (09/10/90)

In article <5417@munnari.oz.au> akkh@mullian.ee.mu.OZ.AU (A. Hui) writes:
>To all experts!
>
>I recently came across the learning algorithm called "Conjugate Gradient
>method" which is supposed to improve the performance of NN during the
>learning phase.
>
>I dug up a few references but was astounded by the huge volume of maths
>involved in the method.

If we look at the history of neural networks since 1975, we see that the
lack of formal numerical-analysis experience among connectionist
researchers has been a continuing hindrance to the field.  But
enough of my philosophy.

Conjugate gradient is explained fairly well in _Numerical_Recipes_.  Yes,
it involves a lot of math to implement.  Basically, it works much like
steepest-descent function minimizers: find the negative gradient of
the function at a point, then perform a line minimization in the direction
of the negative gradient.  Line minimizations, being along a single direction,
are much easier than multidimensional minimization.  Once we have found
the minimum along the negative-gradient direction, we jump to that point,
find the new gradient direction, perform another line minimization,
and iterate.  Eventually, the function should be minimized.
Note that the line minimization lets us fly over vast plains
of near-zero gradient, where fixed-step-size gradient methods would get
trapped, helplessly crawling along.
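
To make that concrete, here is a rough sketch of steepest descent with line
minimization in Python.  The quadratic test function, the starting point, and
the crude grid-based line search are just my own illustrative choices, not
anything taken from a particular reference:

import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # a simple positive-definite quadratic bowl
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad_f(x):
    return A @ x - b

def line_minimize(f, x, d, t_max=2.0, n=200):
    # crude line search: sample f(x + t*d) on a grid and keep the best t
    ts = np.linspace(0.0, t_max, n)
    vals = [f(x + t * d) for t in ts]
    return ts[int(np.argmin(vals))]

x = np.array([5.0, 5.0])                 # starting point
for k in range(20):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-6:
        break
    d = -g                               # search direction = negative gradient
    t = line_minimize(f, x, d)           # 1-D minimization along that direction
    x = x + t * d                        # jump to the minimum along the line
print(x, f(x))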

Conjugate gradient is an improvement over steepest descent.  Basically,
it chooses each new search direction to be conjugate to the previous ones,
so that minimizing along the new direction does not undo the minimizations
already done along past directions.  The effect is difficult to picture, but
_Numerical_Recipes_ shows how steepest descent can criss-cross back and forth
along a narrow valley, while conjugate gradient avoids re-minimizing along
past directions.
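
Here is how the same loop looks with a conjugate-gradient direction update,
using the Polak-Ribiere formula as one common choice (Fletcher-Reeves is
another; both are in _Numerical_Recipes_).  This is only a sketch: it reuses
f, grad_f, and line_minimize from the steepest-descent sketch above, and a
real implementation would also restart the direction periodically:

x = np.array([5.0, 5.0])
g = grad_f(x)
d = -g                                   # first direction: plain steepest descent
for k in range(20):
    t = line_minimize(f, x, d)
    x = x + t * d
    g_new = grad_f(x)
    if np.linalg.norm(g_new) < 1e-6:
        break
    # Polak-Ribiere coefficient: mixes in the old direction so the new
    # direction stays (approximately) conjugate to the previous ones
    beta = max(0.0, g_new @ (g_new - g) / (g @ g))
    d = -g_new + beta * d
    g = g_new
print(x, f(x))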

Even if you just want to use simple steepest descent, it is a vast
improvement over fixed-step-size methods.  The hardest part, though, is
the line minimization, and you will have to do a little bit of
looking to find an appropriate algorithm.
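
For what it's worth, here is one workable line-minimization routine: a
golden-section search over a fixed bracket [a, b].  It assumes you already
know an interval containing the minimum along the line; _Numerical_Recipes_
shows how to bracket the minimum first and then refine with Brent's method,
which is what you would really want in practice:

import math

def golden_section(phi, a=0.0, b=1.0, tol=1e-6):
    # phi(t) is the 1-D function f(x + t*d) restricted to the search line
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0        # ~0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if phi(c) < phi(d):
            b = d                                  # minimum lies in [a, d]
        else:
            a = c                                  # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0

# usage along a search direction dvec from point x (f, x, dvec as in the
# sketches above):
#   t_best = golden_section(lambda t: f(x + t * dvec), a=0.0, b=2.0)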

-Thomas Edwards