akkh@mullian.ee.mu.OZ.AU (A. Hui) (09/10/90)
To all experts!

I recently came across the learning algorithm called the "Conjugate Gradient method", which is supposed to improve the performance of NNs during the learning phase. I dug up a few references but was astonished by the huge volume of maths involved in the method. So, could someone give me, in simple terms, the general idea behind the method? Or a reference which you think is simple enough for a beginner in NN?

Thanks in advance!

===============================================================================
Alvaro Hui                     |ACSnet        akkh@mullian.oz
4th Year B.E.\ B.Sc.           |Internet &    akkh@mullian.ee.mu.OZ.AU
University of Melbourne        |Arpanet       rcoahk@koel.co.rmit.OZ.AU
ins_atge@jhunix.HCF.JHU.EDU (Thomas G Edwards) (09/10/90)
In article <5417@munnari.oz.au> akkh@mullian.ee.mu.OZ.AU (A. Hui) writes:
>To all experts!
>
>I recently came across the learning algorithm called the "Conjugate Gradient
>method", which is supposed to improve the performance of NNs during the
>learning phase.
>
>I dug up a few references but was astonished by the huge volume of maths
>involved in the method.

If we look at the history of neural networks since 1975, we see that connectionist researchers' lack of formal numerical-analysis experience has been a continuing hindrance to the field. But enough of my philosophy.

Conjugate gradient is explained fairly well in _Numerical Recipes_. Yes, it involves a lot of maths to implement. Basically, it is much like steepest-descent function minimizers: they find the negative gradient of the function at a point and perform a line minimization in the direction of the negative gradient. Line minimizations, being along a single direction, are much easier than multidimensional minimization. Once we have found the minimum along the negative gradient direction, we jump to that point, find the new gradient direction, perform another line minimization, and iterate again. Eventually, the function should be minimized. Note that using the line minimization lets us fly over vast plains of near-zero gradient, where fixed step-size gradient methods would get trapped, helplessly crawling along.

Conjugate gradient is an improvement on steepest descent. Basically, it chooses search directions that are mutually "conjugate", so that minimizing along a new direction does not undo the minimization already done along past directions. The effect is difficult to picture, but _Numerical Recipes_ shows how steepest descent can criss-cross back and forth along a valley, where conjugate gradient avoids re-minimizing along past directions.

Even if you just want to use simple steepest descent, it is a vast improvement over fixed step-size methods. The hardest part is the line minimization, though, and you will have to do a little bit of looking to find an appropriate algorithm.

-Thomas Edwards
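[Editor's note: to make the procedure above concrete, here is a minimal Python sketch of gradient descent with line minimization plus a conjugate-gradient direction update. It is not code from the posting or from _Numerical Recipes_: the backtracking line search is a crude stand-in for the Brent-style line minimization the book describes, and the Polak-Ribiere update and the quadratic test function are illustrative choices only.]

# Minimal sketch: steepest descent / conjugate gradient with a crude
# backtracking line search.  f and grad stand in for the network error
# and its gradient with respect to the weight vector.

import numpy as np

def line_minimize(f, x, d, t0=1.0, shrink=0.5, max_iter=40):
    """Crude backtracking search for a step t that reduces f along d."""
    f0 = f(x)
    t = t0
    for _ in range(max_iter):
        if f(x + t * d) < f0:
            return t
        t *= shrink
    return 0.0                       # no improving step found

def conjugate_gradient(f, grad, x0, n_iter=100, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                           # first direction: steepest descent
    for _ in range(n_iter):
        t = line_minimize(f, x, d)   # line minimization along d
        x = x + t * d
        g_new = grad(x)
        if np.dot(g_new, g_new) < tol:
            break
        # Polak-Ribiere coefficient (clipped at zero) keeps the new
        # direction roughly conjugate to the previous ones instead of
        # simply following the new negative gradient.
        beta = max(0.0, np.dot(g_new, g_new - g) / np.dot(g, g))
        d = -g_new + beta * d
        g = g_new
    return x

# Example: minimize a 2-D quadratic "valley"
if __name__ == "__main__":
    A = np.array([[3.0, 0.5], [0.5, 1.0]])
    f = lambda x: 0.5 * x @ A @ x
    grad = lambda x: A @ x
    print(conjugate_gradient(f, grad, np.array([4.0, -3.0])))

[Setting beta to zero on every iteration turns this back into plain steepest descent with line minimization, which shows how small the step from one method to the other really is.]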