rcoahk@koel.co.rmit.oz (Alvaro Hui Kau) (08/24/90)
Hi, all experts:

In a recent experiment on Gaussian data classification using the backpropagation (BP) algorithm, I found that the higher-dimensional cases (which need more input neurons) converge much faster than the lower-dimensional ones. The difference is nearly 100-fold! I am wondering whether this is general behaviour of BP nets; can anyone verify this for me? Of course, I used the same number of vector pairs in every case.

Neural nets are fun if we all have supercomputers....

===============================================================================
Alvaro Hui               |ACSnet      akkh@mullian.oz
4th Year B.E.\ B.Sc.     |Internet &  akkh@mullian.ee.mu.OZ.AU
University of Melbourne  |Arpanet     rcoahk@koel.co.rmit.OZ.AU
                         |Arpa-relay  akkh%mullian.oz@uunet.uu.net
                         |Uunet       ....!munnari!mullian!akkh
                         |EAN         akkh@mullian.ee.mu.oz.au
===============================================================================
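[In modern Python/NumPy terms, a minimal sketch of the kind of experiment described above -- not the poster's actual code; the net size, learning rate, and convergence threshold are arbitrary choices for illustration.]

    # Toy BP experiment: two Gaussian classes, trained at several input
    # dimensionalities, reporting how many epochs each takes to converge.
    import numpy as np

    def make_data(dim, n_per_class, rng):
        """Two Gaussian clusters, one per class, separated along every axis."""
        a = rng.normal(loc=-1.0, scale=1.0, size=(n_per_class, dim))
        b = rng.normal(loc=+1.0, scale=1.0, size=(n_per_class, dim))
        x = np.vstack([a, b])
        y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
        return x, y

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def epochs_to_converge(dim, hidden=5, lr=0.1, target_mse=0.05,
                           max_epochs=20000, seed=0):
        rng = np.random.default_rng(seed)
        x, y = make_data(dim, n_per_class=50, rng=rng)
        w1 = rng.uniform(-0.5, 0.5, size=(dim, hidden))
        w2 = rng.uniform(-0.5, 0.5, size=(hidden, 1))
        for epoch in range(1, max_epochs + 1):
            h = sigmoid(x @ w1)              # hidden activations
            out = sigmoid(h @ w2)[:, 0]      # network output
            err = out - y
            if np.mean(err ** 2) < target_mse:
                return epoch
            # vanilla backprop: batch gradient descent, no momentum
            d_out = (err * out * (1 - out))[:, None]
            d_hid = (d_out @ w2.T) * h * (1 - h)
            w2 -= lr * h.T @ d_out / len(y)
            w1 -= lr * x.T @ d_hid / len(y)
        return max_epochs

    for d in (2, 8, 32):
        print(d, "inputs ->", epochs_to_converge(d), "epochs")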
ins_atge@jhunix.HCF.JHU.EDU (Thomas G Edwards) (08/25/90)
In article <5462@minyos.xx.rmit.oz> rcoahk@koel.co.rmit.oz (Alvaro Hui Kau) writes:
>In a recent experiment on Gaussian data classification
>using the BP algorithm, I found that the higher-dimensional
>cases (which need more input neurons) converge much faster
>than the lower-dimensional ones.

I have also noticed this. I believe that networks with few dimensions suffer very seriously from local-minima problems (e.g. XOR). Remember, it has been shown that BP nets are _very_ sensitive to initial weight conditions (sorry, my reference isn't at hand right now), and different initial weights can change convergence times by orders of magnitude (at least for small problems).

The solution? Well, you could use high-dimensional networks, but of course you then have to spend more time per epoch. I think the best idea is to use conjugate-gradient methods (see _Numerical_Recipes_, or the paper on efficient parallel learning methods in _Neural_Information_Processing_Systems_ I [ed. Touretzky]), or at least steepest descent with a line search. The line searches let you cross quickly the vast wastelands of nearly flat error surface that would take a very long time with vanilla BP. Then you won't need a supercomputer to do your neural networks (although a Sun would be nice...).

Try the conjugate-gradient learning program OPT, available via anonymous ftp from cse.ogi.edu in the /pub/nnvowels directory.

-Thomas Edwards
 The Johns Hopkins University / U.S. Naval Research Lab
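[A sketch of the same toy net trained by conjugate gradients rather than vanilla BP, to illustrate the idea Thomas describes. It uses SciPy's generic CG optimizer, not the OPT program he mentions; everything here is illustrative.]

    import numpy as np
    from scipy.optimize import minimize

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def unpack(w, dim, hidden):
        w1 = w[:dim * hidden].reshape(dim, hidden)
        w2 = w[dim * hidden:].reshape(hidden, 1)
        return w1, w2

    def loss_and_grad(w, x, y, dim, hidden):
        w1, w2 = unpack(w, dim, hidden)
        h = sigmoid(x @ w1)
        out = sigmoid(h @ w2)[:, 0]
        err = out - y
        loss = 0.5 * np.mean(err ** 2)
        # gradients, exactly as backprop would compute them
        d_out = (err * out * (1 - out))[:, None] / len(y)
        d_hid = (d_out @ w2.T) * h * (1 - h)
        return loss, np.concatenate([(x.T @ d_hid).ravel(),
                                     (h.T @ d_out).ravel()])

    rng = np.random.default_rng(0)
    dim, hidden = 8, 5
    x = np.vstack([rng.normal(-1, 1, (50, dim)), rng.normal(+1, 1, (50, dim))])
    y = np.concatenate([np.zeros(50), np.ones(50)])
    w0 = rng.uniform(-0.5, 0.5, dim * hidden + hidden)

    # conjugate-gradient minimization with the analytic gradient supplied
    res = minimize(loss_and_grad, w0, args=(x, y, dim, hidden),
                   jac=True, method='CG')
    print("final MSE:", 2 * res.fun, "after", res.nit, "CG iterations")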
alexis@oahu.cs.ucla.edu (Alexis Wieland) (08/27/90)
> In a recent experiment on Gaussian data classification
> using the BP algorithm, I found that the higher-dimensional
> cases (which need more input neurons) converge much faster
> than the lower-dimensional ones.
> The difference is nearly 100-fold!

Since BP neural nets work with a weighted sum of the inputs, and since the variance of an average of n independent Gaussian random variables shrinks like 1/n as n gets large, *any* classifier working on Gaussian data should perform better with more inputs. This is characteristic of Gaussian classifiers.

The behaviour you report is often even more pronounced for neural nets. It is simple to create examples (even inadvertently) where the noise-free (and effectively also the high-dimensional) case is linearly separable (i.e., a net will learn quickly) and the high-noise case is not (i.e., learning will be comparatively slow). A 100-fold difference is quite believable.

Actually, our experience in the past shows there's often more to it than that. Four or so years ago we (like everyone else) did a character recognition system (ours was independent of rotation). To make a long story short, 8x8 images took about 10x the wall-clock time to learn as 16x16 images. The difference was that the smaller images were so blurry that it really was hard to distinguish some characters, say a 'C' and a 90-degree-rotated 'A' (this is all in an INNS '87 paper). The moral is "know your data"....

(Re another discussion: Russ Leighton, the co-author of that work, later extended those techniques, used lots of limited receptive fields and a handful of tricks, and found objects in computer-generated composite images of up to 1024x1024.)

- alexis.

><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><
Alexis Wieland                       also part-time/on-call grad student at
lead scientist at                    UCLA CS Department
The MITRE Corporation, Washington    alexis@CS.UCLA.EDU
                                     (don't ask, it's a long commute).
><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><
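[A quick numerical check of the variance argument above -- illustrative only, assuming NumPy: an equal-weight average of the inputs separates the two Gaussian classes more cleanly as the dimension grows, because the standard deviation of the mean shrinks like 1/sqrt(d).]

    import numpy as np

    rng = np.random.default_rng(0)
    for d in (1, 4, 16, 64):
        a = rng.normal(-1.0, 1.0, size=(10000, d)).mean(axis=1)  # class 0 projection
        b = rng.normal(+1.0, 1.0, size=(10000, d)).mean(axis=1)  # class 1 projection
        error = 0.5 * (np.mean(a > 0) + np.mean(b < 0))          # sign-threshold error
        print(f"d={d:3d}  std of mean={a.std():.3f}  error rate={error:.4f}")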
demers@odin.ucsd.edu (David E Demers) (08/28/90)
In article <5462@minyos.xx.rmit.oz> rcoahk@koel.co.rmit.oz (Alvaro Hui Kau) writes:
>In a recent experiment on Gaussian data classification
>using the BP algorithm, I found that the higher-dimensional
>cases (which need more input neurons) converge much faster
>than the lower-dimensional ones.
>The difference is nearly 100-fold!
>I am wondering whether this is general behaviour of BP nets;
>can anyone verify this for me?
>Of course, I used the same number of vector pairs in every case.

This should not be a surprising result. The more degrees of freedom you allow your model, the easier it should be to reduce error. What you might find, however, is that your net does not generalize well to other inputs. What the net is doing is building a smooth function; and as we all know from function approximation, zero error on the data does not necessarily mean we have a good model!

Dave
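[A small illustration of Dave's point (mine, not his), assuming NumPy: a model flexible enough to drive training error to roughly zero can still model the underlying function poorly. A high-degree polynomial stands in for an over-parameterised net just to keep the code short.]

    import numpy as np

    rng = np.random.default_rng(1)
    f = lambda x: np.sin(x)                        # "true" function
    x_train = np.linspace(0, 3, 8)
    y_train = f(x_train) + rng.normal(0, 0.2, 8)   # 8 noisy samples
    x_test = np.linspace(0, 3, 200)                # held-out points

    for degree in (2, 7):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_err = np.mean((np.polyval(coeffs, x_test) - f(x_test)) ** 2)
        print(f"degree {degree}: train MSE={train_err:.4f}  test MSE={test_err:.4f}")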