kbesrl@uxe.cso.uiuc.edu (12/18/89)
I have been experimenting with back-prop neural nets for the past few
months, and I find that they are only as good as polynomial regression.
In fact, I ran a back-prop neural net on some continuous mapping
problems and found that it achieved the same performance as the `SAS'
statistical package.

I am wondering whether this is true of other neural models. If so, how
can one defend the use of neural nets as opposed to statistical
regression?  If someone can give me pointers to papers that discuss
these aspects, it would be appreciated.  I also request NN experts to
e-mail their comments.

I have these additional questions regarding the use of back-prop NNs.
(I should mention that I have been using small 3-layer networks
(5 nodes X {1 to 20 nodes} X 5 nodes) for learning continuous mappings.)

1. Is there a substantial benefit from using partial connections as
   opposed to fully connected NNs?  If so, in what situations is it
   advisable?

2. I found through experimentation that the number of hidden layers did
   not matter much; it is the total number of hidden nodes that mattered.

3. Is there a rule of thumb that limits the number of hidden nodes based
   on the number of examples/inputs/outputs?  I find that the best
   predictive accuracy (on unseen examples) comes when the number of
   connections is approximately equal to the number of examples.  Is
   this true in general?

I'd appreciate any responses at sudha@kbesrl.me.uiuc.edu

sudhakar y. reddy
mechanical and industrial engineering
university of illinois at urbana-champaign
urbana, il 61801
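[For concreteness, the kind of comparison described above can be
reproduced along the following lines.  This is a minimal sketch in
modern Python/numpy, not the poster's actual code: the 1-in/1-out test
function, hidden-layer size, learning rate, and training length are all
illustrative assumptions.]

# Fit the same smooth mapping with (a) a tiny one-hidden-layer backprop
# net and (b) polynomial regression, then compare held-out mean-squared error.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * x) + 0.05 * rng.standard_normal(x.shape)   # target mapping
x_tr, y_tr, x_te, y_te = x[:150], y[:150], x[150:], y[150:]

# (a) 1-H-1 backprop net with sigmoid hidden units, plain gradient descent
H, lr = 10, 0.5
W1, b1 = rng.normal(0, 0.5, (1, H)), np.zeros(H)
W2, b2 = rng.normal(0, 0.5, (H, 1)), np.zeros(1)
sig = lambda a: 1.0 / (1.0 + np.exp(-a))
for _ in range(5000):
    h = sig(x_tr @ W1 + b1)          # hidden activations
    out = h @ W2 + b2                # linear output layer
    err = out - y_tr
    dW2 = h.T @ err / len(x_tr)
    dh = err @ W2.T * h * (1 - h)    # backprop through the sigmoid
    dW1 = x_tr.T @ dh / len(x_tr)
    W2 -= lr * dW2; b2 -= lr * err.mean(0)
    W1 -= lr * dW1; b1 -= lr * dh.mean(0)
nn_mse = np.mean((sig(x_te @ W1 + b1) @ W2 + b2 - y_te) ** 2)

# (b) degree-5 polynomial regression (ordinary least squares via polyfit)
coef = np.polyfit(x_tr.ravel(), y_tr.ravel(), deg=5)
poly_mse = np.mean((np.polyval(coef, x_te.ravel()) - y_te.ravel()) ** 2)

print("net MSE: %.4f   polynomial MSE: %.4f" % (nn_mse, poly_mse))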
manj@brand.usc.edu (B. S. Manjunath) (12/19/89)
In article <220700005@uxe.cso.uiuc.edu> kbesrl@uxe.cso.uiuc.edu writes:
>
>I have been experimenting with back-prop neural nets for the past
>few months. I find that they are only as good as polynomial
>regression. Actually, I ran a back-prop neural net on some
>continuous mapping problems and found that they achieved the
>same performance as the `SAS' statistical package.
>
>I am wondering whether this is true of other neural models.
>If so, how can one defend the use of neural nets as opposed to
>statistical regression. If someone can give me pointers to any
>papers that discuss these aspects, it would be appreciated.
>
>sudha@kbesrl.me.uiuc.edu
>sudhakar y. reddy

You might be interested in a technical report by T. Poggio and
F. Girosi, "A Theory of Networks for Approximation and Learning",
AI Memo #1140, M.I.T. AI Lab, July 1989.

B. S. Manjunath
joshi@wuche2.wustl.edu (Amol Joshi) (12/19/89)
In article <220700005@uxe.cso.uiuc.edu> kbesrl@uxe.cso.uiuc.edu writes:
>
>I have been experimenting with back-prop neural nets for the past
>few months. I find that they are only as good as polynomial
>regression. Actually, I ran a back-prop neural net on some
>continuous mapping problems and found that they achieved the
>same performance as the `SAS' statistical package.
>

the techniques of multi-variable analysis are not suited to non-linear
phenomena, and many real problems are non-linear.  even though
non-linear regressions can treat non-linear phenomena, they require
that the structure of the math model be prefixed.  it is in these
cases that backprop nets would be more useful.

> 1. Is there a substantial benefit from using partial connections
>    as opposed to fully-connected NNs? If so, in what situations
>    is it advisable?

with fully connected NNs, it is difficult to decipher the dominant
relationships among input and output variables.  with fewer
connections, it would be possible to extract some "rules" more easily
when that is necessary.  the problem with using fewer connections is
that it typically becomes more difficult to obtain convergence.  this
is especially true if the number of nodes one is using is very near
the "minimum" needed to represent the function in question.  e.g., i
found it very difficult to get convergence when representing the XOR
function with just four nodes in three layers (i.e., the textbook
solution); it was easier to get convergence for a 5-node network.

in the last AI Expert, there is an article about NNs and statistics.

:amol
--
------------------------------------------------------
amol joshi
dept of chemical engrg
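[The XOR convergence experiment mentioned above can be sketched roughly
as follows.  This is an illustrative Python/numpy reimplementation, not
the original code; the learning rate, epoch count, number of restarts,
and the choice of 2 vs. 5 hidden units are assumptions made for the
example.]

# Train XOR with backprop for two hidden-layer sizes and count how often
# each configuration reaches a low training error across random restarts.
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])
sig = lambda a: 1.0 / (1.0 + np.exp(-a))

def train_xor(n_hidden, seed, epochs=20000, lr=2.0):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 1, (2, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 1, (n_hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        h = sig(X @ W1 + b1)
        y = sig(h @ W2 + b2)
        err = y - T
        d2 = err * y * (1 - y)                 # output-layer delta
        d1 = d2 @ W2.T * h * (1 - h)           # hidden-layer delta
        W2 -= lr * h.T @ d2; b2 -= lr * d2.sum(0)
        W1 -= lr * X.T @ d1; b1 -= lr * d1.sum(0)
    return np.mean((y - T) ** 2)               # final training MSE

for n_hidden in (2, 5):
    successes = sum(train_xor(n_hidden, s) < 0.01 for s in range(20))
    print("hidden units: %d   converged in %2d / 20 restarts" % (n_hidden, successes))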
slehar@bucasd.bu.edu (Lehar) (12/19/89)
HOW CAN ONE DEFEND THE USE OF NEURAL NETS WHEN BACKPROP DOES NO BETTER
THAN POLYNOMIAL REGRESSION?

There are two distinct reasons for studying neural nets.  The primary
reason is to gain insights into the mechanisms of natural intelligence.
The secondary reason is that SOMETIMES neural nets can solve problems
more elegantly.  When this is the case, then it's appropriate to use
them.  If polynomial regression does better for your problem, then use
polynomial regression; it's sure to be faster, and is probably easier
to understand.

Backprop, like all neural nets, works best (relatively) when the data
is ambiguous, incomplete or noisy.  Remember that backprop is not the
be-all and end-all of neural nets.  That honour (to date) goes to that
big blob of jelly in your head.  Whenever you wonder whether neural
nets are worth studying, think of what you are wondering with.
--
(O)((O))(((O)))((((O))))(((((O)))))(((((O)))))((((O))))(((O)))((O))(O)
(O)((O))(((            slehar@bucasb.bu.edu               )))((O))(O)
(O)((O))(((   Steve Lehar Boston University Boston MA     )))((O))(O)
(O)((O))(((   (617) 424-7035 (H)   (617) 353-6425 (W)     )))((O))(O)
(O)((O))(((O)))((((O))))(((((O)))))(((((O)))))((((O))))(((O)))((O))(O)
fishwick@fish.cis.ufl.edu (Paul Fishwick) (12/19/89)
In article <1989Dec18.210859.23621@wuche2.wustl.edu> joshi@wuche2.UUCP (Amol Joshi) writes:
>In article <220700005@uxe.cso.uiuc.edu> kbesrl@uxe.cso.uiuc.edu writes:
>>
>>I have been experimenting with back-prop neural nets for the past
>>few months. I find that they are only as good as polynomial
>>regression. Actually, I ran a back-prop neural net on some
>>continuous mapping problems and found that they achieved the
>>same performance as the `SAS' statistical package.
>>
> the techniques of multi-variable analysis are not suited to non-linear
> phenomena and many real problems are non-linear.
> even though non-linear regressions can treat non-linear phenomena, they
> require that the structure of the math model be prefixed.
> it is in these cases that backprop nets would be more useful.

You say that in regression the "structure of the math model" must be
prefixed; however, I will debate this assumption -- the structure of a
set of equations is no more prefixed than that of a neural network
model.  A neural network is a set of equations shown in a graphical
syntactic form.  It is just as easy to add and delete terms/equations
as it is to add/delete nodes, etc.  The equational equivalent of
removing a link is to set a parameter to zero.

We have also done some work on using neural networks for purposes of
simulation and time-series analysis (vs. the Box-Jenkins methodology).
Much more comparative work is necessary!!  Here is a recent article:

Fishwick, P. A., "Neural Network Models in Simulation: A Comparison
with Traditional Modelling Approaches", Winter Simulation Conference,
December 1989, Washington, D.C., pp. 702-710.

-paul f.
+------------------------------------------------------------------------+
| Prof. Paul A. Fishwick.... INTERNET: fishwick@bikini.cis.ufl.edu        |
| Dept. of Computer Science. UUCP: gatech!uflorida!fishwick               |
| Univ. of Florida.......... PHONE: (904)-335-8036                        |
| Bldg. CSE, Room 301....... FAX is available                             |
| Gainesville, FL 32611.....                                              |
+------------------------------------------------------------------------+
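[The "zero parameter = deleted link" point can be made concrete with a
small sketch, in modern Python/numpy and purely illustrative: the
network size and weights below are arbitrary.  The hidden layer is
literally the equation h_j = sigmoid(sum_i W[i,j]*x_i + b[j]), and
pruning the connection from input x2 to hidden unit h1 amounts to
pinning W[1,0] at zero, i.e. dropping that term from the equation.]

import numpy as np

sig = lambda a: 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(1)
x = rng.normal(size=(3, 2))                     # three sample inputs, two features
W, b = rng.normal(size=(2, 2)), rng.normal(size=2)
v, c = rng.normal(size=(2, 1)), rng.normal(size=1)

def net(x):
    # y = sum_j v[j] * sigmoid( sum_i W[i,j] * x[i] + b[j] ) + c
    return sig(x @ W + b) @ v + c

W[1, 0] = 0.0                                   # "remove" the link x2 -> h1 by zeroing its weight
h1 = sig(W[0, 0] * x[:, 0] + b[0])              # same equation written with that term simply absent
h2 = sig(x @ W[:, 1] + b[1])
y_by_hand = h1 * v[0, 0] + h2 * v[1, 0] + c[0]

print(np.allclose(net(x).ravel(), y_by_hand))   # True: zeroed weight == deleted term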
joshi@wuche2.wustl.edu (Amol Joshi) (12/20/89)
In article <21539@uflorida.cis.ufl.EDU> fishwick@fish.cis.ufl.edu (Paul Fishwick) writes:
>You say that in regression the "structure of the math model" must be
>prefixed; however, I will debate this assumption -- the structure of a
>set of equations is no more prefixed than that of a neural network
>model. A neural network is a set of equations shown in a graphical
>syntactic form. It is just as easy to add and delete terms/equations
>as it is to add/delete nodes, etc. The equational equivalent of
>removing a link is to set a parameter to zero.

by "structure" of a nonlinear model i also mean the nature of the
non-linearities.  when doing a non-linear least-squares fit, e.g., i
have to specify what exactly these terms look like (exponential,
hyperbolic, etc.).  so i would use regression for finding the "best"
parameters for an existing analytical model.  the BP-nn equivalent of
the complexity of the model is, i think, parameters like the number of
layers and the number of nodes in each layer -- and yes, you need to
fiddle with those.  the advantage of regression is that, if it works,
it provides insight into the physical system.  a BP-nn is more like a
black box, and extracting knowledge like the exponential dependencies
is impossible from information about the weights alone.  am i missing
something?

:amol
--
------------------------------------------------------
Amol Joshi
Department of Chemical Engineering
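[The contrast drawn above can be illustrated with an assumed example,
in modern Python with scipy; the data and the exponential model are
made up, not the poster's.  A nonlinear least-squares fit only works
after the modeller has committed to a particular functional form, and
in return the fitted parameters carry direct physical meaning.]

import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)
x = np.linspace(0, 2, 50)
y = 3.0 * np.exp(-1.5 * x) + 0.02 * rng.standard_normal(x.size)

# the modeller must commit to "y = a * exp(b * x)" before fitting
def model(x, a, b):
    return a * np.exp(b * x)

(a_hat, b_hat), _ = curve_fit(model, x, y, p0=(1.0, -1.0))
print("fitted a = %.3f, b = %.3f" % (a_hat, b_hat))   # parameters are directly interpretable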
dhw@itivax.iti.org (David H. West) (12/20/89)
In article <21539@uflorida.cis.ufl.EDU> fishwick@fish.cis.ufl.edu (Paul Fishwick) writes:
|In article <1989Dec18.210859.23621@wuche2.wustl.edu> joshi@wuche2.UUCP (Amol Joshi) writes:
|> the techniques of multi-variable analysis are not suited to non-linear
|> phenomena and many real problems are non-linear.
|> even though non-linear regressions can treat non-linear phenomena, they
|> require that the structure of the math model be prefixed.
|> it is in these cases that backprop nets would be more useful.
|
|You say that in regression the "structure of the math model" must be
|prefixed; however, I will debate this assumption -- the structure of a
|set of equations is no more prefixed than that of a neural network
|model. A neural network is a set of equations shown in a graphical
|syntactic form. It is just as easy to add and delete terms/equations
|as it is to add/delete nodes, etc. The equational equivalent of
|removing a link is to set a parameter to zero.

... and to reduce the rank of the model by one.  Variable-rank methods
are a relatively recent development in statistics and optimization
[1960s and later -- yes, that's "recent" :-( ], and they are not yet
part of the repertoire of many (perhaps most) practitioners and
software packages, even in their simpler linear form.  Nonlinear
variable-rank methods are still a research problem.

One advantage of viewing neural-net training in the light of statistics
and optimization is that it focuses attention on the fact that the
standard sigmoidal transfer function is no less a mere convention than
the Gaussian probability density, and equally a choice to be made
consciously rather than by default.

-David West        dhw@iti.org
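[One way to see the "convention, not necessity" point in code, as a
trivial illustrative Python/numpy sketch with made-up names: the
transfer function is just another argument to the model, as swappable
as the error model in a statistical fit.]

import numpy as np

def hidden_layer(x, W, b, transfer):
    # the transfer function is an explicit modelling choice, not a fixture
    return transfer(x @ W + b)

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
transfers = {"sigmoid": sigmoid, "tanh": np.tanh}

rng = np.random.default_rng(3)
x, W, b = rng.normal(size=(1, 4)), rng.normal(size=(4, 3)), np.zeros(3)
for name, f in transfers.items():
    print(name, hidden_layer(x, W, b, f))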
fishwick@fish.cis.ufl.edu (Paul Fishwick) (12/20/89)
In article <1989Dec19.172314.16051@wuche2.wustl.edu> joshi@wuche2.UUCP (Amol Joshi) writes:
> ...
> a BP-nn is more like a black box, and extracting knowledge like the
> exponential dependencies is impossible from information about the
> weights alone. am i missing something?
>:amol

I agree with Amol that extracting information directly from NN models
is more difficult.  This is a general problem when analyzing any
nonlinear system.  However, some NN model properties may be proved
directly by studying the set of equations that represents the neural
network.

On a slightly different note, I think it is important that we always
remember that a neural network, like a signal flow graph, is just a
convenient representation for a set of equations (unless one is
interested in the neurophysiological aspect, where the structure of the
network may represent biological structure).  Any thoughts?

-paul fishwick
+------------------------------------------------------------------------+
| Prof. Paul A. Fishwick.... INTERNET: fishwick@bikini.cis.ufl.edu        |
| Dept. of Computer Science. UUCP: gatech!uflorida!fishwick               |
| Univ. of Florida.......... PHONE: (904)-335-8036                        |
| Bldg. CSE, Room 301....... FAX is available                             |
| Gainesville, FL 32611.....                                              |
+------------------------------------------------------------------------+
bill@boulder.Colorado.EDU (12/20/89)
>
> There are two distinct reasons for studying neural nets.  The primary
> reason is to gain insights into the mechanisms of natural
> intelligence.  The secondary reason is that SOMETIMES neural nets can
> solve problems more elegantly.  When this is the case, then it's
> appropriate to use them.
>

The secondary reason (IMHO) is that neural nets are massively parallel.
When one has reached the limits of sequential speed, one must go to
parallelism in order to get greater power.  Neural nets are unlikely
ever to provide especially _elegant_ solutions to very many problems:
their virtue is that they provide a brutal and simplistic solution that
sometimes (surprisingly) actually works.

I don't expect neural network methods to be practical until massively
parallel VLSI neural network chips exist and are easily obtainable.  At
that point the advantages of parallelism will compensate for the
crudity of the method for some applications, and the revolution will
truly begin.  The day is not yet here, but it can't be too much longer
in coming.

Bill Skaggs
al@gtx.com (Alan Filipski) (12/27/89)
In article <21541@uflorida.cis.ufl.EDU> fishwick@fish.cis.ufl.edu (Paul Fishwick) writes:
> I think it is important that we always
>remember that a neural network, like a signal flow graph, is just
>a convenient representation for a set of equations
>

Depends on your point of view.  One might just as well say that a set
of equations is just a convenient representation for a neural network.
I don't see that one representation is necessarily more fundamental
than the other.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
( Alan Filipski, GTX Corp, 8836 N. 23rd Avenue, Phoenix, Arizona 85021, USA )
( {decvax,hplabs,uunet!amdahl,nsc}!sun!sunburn!gtx!al        (602)870-1696  )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~