[comp.ai.neural-nets] back-prop NNs and `SAS' regression!

kbesrl@uxe.cso.uiuc.edu (12/18/89)

I have been experimenting with back-prop neural nets for the past
few months. I find that they are only as good as polynomial
regression. Actually, I ran a back-prop neural net on some
continuous mapping problems and found that it achieved the same
performance as polynomial regression done with the `SAS' statistical package.
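
(To make the comparison concrete, here is a rough sketch, in modern
Python/NumPy, of the kind of experiment I mean. It is illustrative only --
the target function, network size, learning rate, and iteration count are
arbitrary choices, not my actual setup.)

# Compare a tiny 1-H-1 backprop net against polynomial regression on a
# simple continuous mapping.  Purely illustrative choices throughout.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(-np.pi, np.pi, 50)
y_train = np.sin(x_train) + 0.05 * rng.standard_normal(50)   # noisy samples
x_test  = np.linspace(-np.pi, np.pi, 200)
y_test  = np.sin(x_test)

# polynomial regression -- roughly what a statistics package would fit
coeffs = np.polyfit(x_train, y_train, deg=5)
poly_rmse = np.sqrt(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))

# tiny 1-H-1 sigmoid net trained by plain gradient-descent backprop
H, lr = 10, 0.05
W1, b1 = rng.standard_normal((H, 1)), np.zeros(H)
W2, b2 = rng.standard_normal((1, H)), np.zeros(1)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

X = x_train.reshape(-1, 1)
for _ in range(20000):
    h = sig(X @ W1.T + b1)              # hidden activations, shape (N, H)
    yhat = (h @ W2.T + b2).ravel()      # linear output unit
    err = yhat - y_train                # derivative of squared error
    gW2 = err @ h / len(X)              # gradients for the output layer
    gb2 = err.mean()
    dh  = np.outer(err, W2.ravel()) * h * (1 - h)   # backprop to hidden layer
    gW1 = dh.T @ X / len(X)
    gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

h_test  = sig(x_test.reshape(-1, 1) @ W1.T + b1)
nn_rmse = np.sqrt(np.mean(((h_test @ W2.T + b2).ravel() - y_test) ** 2))
print("polynomial RMSE", poly_rmse, "  backprop-net RMSE", nn_rmse)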

I am wondering whether this is true of other neural models.
If so, how can one defend the use of neural nets as opposed to
statistical regression? If someone can give me pointers to any
papers that discuss these questions, it would be appreciated.
I would also welcome e-mailed comments from NN experts.

I have these additional questions regarding the use of back-prop
NNs. (I should mention that I have been using small 3-layer
networks (5 nodes X {1 to 20 nodes} X 5 nodes) for learning continuous
mappings.)

    1. Is there a substantial benefit from using partial connections
       as opposed to fully-connected NNs? If so, in what situations
       is it advisable?
    2. I found through experimentation that the number of hidden layers
       did not matter much; it was the total number of hidden nodes
       that mattered. Is this consistent with others' experience?
    3. Is there a rule of thumb that limits the number of hidden nodes
       based on the number of examples/inputs/outputs?
       I find that for the best predictive accuracy (on unseen examples),
       the number of connections is approximately equal to the number
       of examples. Is this true in general? (A rough parameter count
       for my networks is sketched after this list.)
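
(To make question 3 concrete, the parameter count I have in mind for a
fully connected 5 x h x 5 network is just back-of-the-envelope arithmetic;
the helper below is a made-up illustration, and whether bias terms are
counted is an assumption.)

# Count the free parameters of a fully connected 5 x h x 5 network,
# to compare against the number of training examples.
def n_parameters(n_in=5, n_hidden=10, n_out=5, biases=True):
    weights = n_in * n_hidden + n_hidden * n_out      # connections
    bias    = (n_hidden + n_out) if biases else 0     # bias terms, if used
    return weights + bias

# "connections roughly equal to examples" for a 5 x h x 5 net means h ~ N/10
for h in (1, 5, 10, 20):
    print(h, n_parameters(n_hidden=h))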

I'd appreciate any responses at

sudha@kbesrl.me.uiuc.edu
sudhakar y. reddy
mechanical and industrial engineering
university of illinois at urbana-champaign
urbana, il 61801

manj@brand.usc.edu (B. S. Manjunath) (12/19/89)

In article <220700005@uxe.cso.uiuc.edu> kbesrl@uxe.cso.uiuc.edu writes:
 >
 >
 >I have been experimenting with back-prop neural nets for the past
 >few months. I find that they are only as good as polynomial
 >regression. Actually, I ran a back-prop neural net on some
 >continuous mapping problems and found that they achieved the
 >same performance as the `SAS' statistical package.
 >
 >I am wondering whether this is true of other neural models.
 >If so, how can one defend the use of neural nets as opposed to
 >statistical regression. If someone can give me pointers to any
 >papers that discuss these aspects, it would be appreciated.
 >
 >sudha@kbesrl.me.uiuc.edu
 >sudhakar y. reddy

  You might be interested in a technical report by T. Poggio and
F. Girosi, "A Theory of Networks for Approximation and Learning,"
AI Memo No. 1140, M.I.T. AI Lab, July 1989.

B.S. Manjunath

joshi@wuche2.wustl.edu (Amol Joshi) (12/19/89)

In article <220700005@uxe.cso.uiuc.edu> kbesrl@uxe.cso.uiuc.edu writes:
>
>
>I have been experimenting with back-prop neural nets for the past
>few months. I find that they are only as good as polynomial
>regression. Actually, I ran a back-prop neural net on some
>continuous mapping problems and found that they achieved the
>same performance as the `SAS' statistical package.
>
	the techniques of multi-variable analysis are not suited to non-linear
	phenomena and many real problems are non-linear.
	even though non-linear regressions can treat non-linear phenomena, they
	require that the structure of the math model be prefixed.
	it is in these cases that backprop nets would be more useful.

>    1. Is there a substantial benefit from using partial connections
>       as opposed to fully-connected NNs? If so, in what situations
>       is it advisable?
	
	with fully connected NNs, it is difficult to decipher the dominant
	relationships among the input and output variables. with fewer
	connections, it would be possible to extract some "rules" more
	easily when that is necessary.
	the problem with using fewer connections is that it typically becomes
	more difficult to obtain convergence. this is especially true if the
	number of nodes one is using is very near the "minimum" needed to
	represent the function in question. e.g. i found it very difficult
	to get convergence when representing the XOR function with just four
	nodes (in three layers - i.e. the textbook solution). it was easier
	to get convergence for a 5-node network (a rough sketch of that
	experiment follows).
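
	(a rough sketch of the kind of experiment i mean, in python -- plain
	backprop on XOR with a 2-2-1 net, i.e. the 5-node version. the learning
	rate, random seed and epoch count are arbitrary, and with some starting
	weights it simply does not converge.)

# plain backprop on XOR with a minimal 2-2-1 sigmoid net.  whether it
# converges depends heavily on the random initial weights and the
# learning rate -- both chosen arbitrarily here.
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0., 1., 1., 0.])
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1, b1 = rng.uniform(-1, 1, (2, 2)), np.zeros(2)   # 2 hidden units
w2, b2 = rng.uniform(-1, 1, 2), 0.0                # 1 output unit
lr = 0.5

for epoch in range(20000):
    h = sig(X @ W1.T + b1)                 # hidden layer
    y = sig(h @ w2 + b2)                   # output unit
    err = (y - t) * y * (1 - y)            # delta at the output
    dh  = np.outer(err, w2) * h * (1 - h)  # deltas at the hidden layer
    w2 -= lr * (err @ h);  b2 -= lr * err.sum()
    W1 -= lr * (dh.T @ X); b1 -= lr * dh.sum(axis=0)

print(np.round(y, 2))   # with luck, close to [0, 1, 1, 0]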

	in the latest issue of AI Expert, there is an article about NNs and statistics.

:amol

-- 

------------------------------------------------------

amol joshi			dept of chemical engrg

slehar@bucasd.bu.edu (Lehar) (12/19/89)

 HOW CAN ONE DEFEND THE USE OF NEURAL NETS WHEN BACKPROP DOES NO BETTER
 THAN POLYNOMIAL REGRESSION?

There are two distinct reasons for studying neural nets.  The primary
reason is to gain insight into the mechanisms of natural intelligence;
the secondary reason is that SOMETIMES neural nets can solve problems
more elegantly.  When that is the case, it's appropriate to use them.

If polynomial regression does better for your problem, then use
polynomial regression; it's sure to be faster, and it is probably easier
to understand.  Backprop, like all neural nets, works best (relatively
speaking) when the data are ambiguous, incomplete, or noisy.

Remember that backprop is not the be-all and end-all of neural nets.
That honour (to date) goes to that big blob of jelly in your head.
Whenever you wonder whether neural nets are worth studying, think of
what you are wondering with.

--
(O)((O))(((O)))((((O))))(((((O)))))(((((O)))))((((O))))(((O)))((O))(O)
(O)((O))(((              slehar@bucasb.bu.edu              )))((O))(O)
(O)((O))(((    Steve Lehar Boston University Boston MA     )))((O))(O)
(O)((O))(((    (617) 424-7035 (H)   (617) 353-6425 (W)     )))((O))(O)
(O)((O))(((O)))((((O))))(((((O)))))(((((O)))))((((O))))(((O)))((O))(O)

fishwick@fish.cis.ufl.edu (Paul Fishwick) (12/19/89)

In article <1989Dec18.210859.23621@wuche2.wustl.edu> joshi@wuche2.UUCP (Amol Joshi) writes:
>In article <220700005@uxe.cso.uiuc.edu> kbesrl@uxe.cso.uiuc.edu writes:
>>
>>
>>I have been experimenting with back-prop neural nets for the past
>>few months. I find that they are only as good as polynomial
>>regression. Actually, I ran a back-prop neural net on some
>>continuous mapping problems and found that they achieved the
>>same performance as the `SAS' statistical package.
>>
>	the techniques of multi-variable analysis are not suited to non-linear
>	phenomena and many real problems are non-linear.
>	even though non-linear regressions can treat non-linear phenomena, they
>	require that the structure of the math model be prefixed.
>	it is in these cases that backprop nets would be more useful.

You say that in regression the "structure of the model" must "be prefixed";
however, I will debate this assumption -- the structure of a set of
equations is no more prefixed than that of a neural network model. A neural
network is a set of equations shown in a graphical syntactic form.
It is just as easy to add and delete terms/equations as it is to add/delete
nodes, etc. The equational equivalent of removing a link is to set a
parameter to zero.
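
(A toy illustration of that last point, in Python; the weights and inputs
are made up. Zeroing one entry of the weight matrix gives exactly the same
equations as an explicitly partial connection graph.)

# a one-hidden-layer net written directly as its equations:
# "removing" a connection is the same as fixing that weight at zero.
import numpy as np

sig = lambda z: 1.0 / (1.0 + np.exp(-z))

x  = np.array([0.3, -1.2])
W1 = np.array([[0.5,  0.0],    # weight from input 2 to hidden unit 1 set to zero
               [1.1, -0.7]])
w2 = np.array([0.8, -0.4])

# fully connected equations, with one parameter held at zero
y_zeroed = w2 @ sig(W1 @ x)

# the explicitly partial graph: hidden unit 1 sees input 1 only
h = np.array([sig(0.5 * x[0]),
              sig(1.1 * x[0] - 0.7 * x[1])])
y_partial = w2 @ h

print(y_zeroed, y_partial)    # identical values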

We have also done some work on using neural networks for purposes of
simulation and time-series analysis (vs. the Box-Jenkins methodology).
Much more comparative work is necessary!! Here is a recent article:

Fishwick, P. A., "Neural Network Models in Simulation: A Comparison
with Traditional Modelling Approaches," Winter Simulation Conference,
December 1989, Washington, D.C., pp. 702-710.

-paul f.

+------------------------------------------------------------------------+
| Prof. Paul A. Fishwick.... INTERNET: fishwick@bikini.cis.ufl.edu       |
| Dept. of Computer Science. UUCP: gatech!uflorida!fishwick              |
| Univ. of Florida.......... PHONE: (904)-335-8036                       |
| Bldg. CSE, Room 301....... FAX is available                            |
| Gainesville, FL 32611.....                                             |
+------------------------------------------------------------------------+

joshi@wuche2.wustl.edu (Amol Joshi) (12/20/89)

In article <21539@uflorida.cis.ufl.EDU> fishwick@fish.cis.ufl.edu (Paul Fishwick) writes:
>You say that in regression that the "structure of the model be prefixed"
>however I will debate this assumption -- the structure of a set of
>equations is no more prefixed than a neural network model. A neural
>network is a set of equations shown in a graphical syntactic form.
>It is just as easy to add and delete terms/equations as it is to add/delete
>nodes, etc. The equational equivalent of removing a link is to make
>zero a parameter.

by "structure" of a nonlinear model i mean also the nature of the non-linearities.
when doing non-linear least sqaure fit,e.g., i have to specify what exactly these
terms look like (exponential, hyperbolic etc..). so, i would use regression for
finding out "best" parameters for an existing analytical model.
the BP-nn equivalent of the complexity of the model is, i think, the parameters
like the number of layers, number of nodes in each layer - and yes, you need to
fiddle with those. the advantage with regression is that, if it works, it provides
insight to the physical system. BP-nn is more like a black-box and extracting
knowledge like the exponential dependencies etc is impossible from the information
about weights alone. am i missing something?
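
(a small sketch of what i mean, using scipy's curve_fit; the exponential
model and the data are made up for illustration.)

# what i mean by "prefixing the structure": for a nonlinear least-squares
# fit i must write the functional form down first -- and in return the
# fitted parameters mean something physical.
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, k):            # the structure is written down in advance
    return a * np.exp(-k * x)

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 40)
y = 2.0 * np.exp(-0.8 * x) + 0.02 * rng.standard_normal(x.size)

(a_hat, k_hat), _ = curve_fit(model, x, y, p0=(1.0, 1.0))
print(a_hat, k_hat)            # directly interpretable: amplitude and decay rate

# a backprop net fitted to the same data might match the curve just as
# well, but nothing in its weight matrices reads off as "a" or "k".
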
:amol


-- 
------------------------------------------------------

Amol Joshi
Department of Chemical Engineering

dhw@itivax.iti.org (David H. West) (12/20/89)

In article <21539@uflorida.cis.ufl.EDU> fishwick@fish.cis.ufl.edu (Paul Fishwick) writes:
|In article <1989Dec18.210859.23621@wuche2.wustl.edu> joshi@wuche2.UUCP (Amol Joshi) writes:
|>	the techniques of multi-variable analysis are not suited to non-linear
|>	phenomena and many real problems are non-linear.
|>	even though non-linear regressions can treat non-linear phenomena, they
|>	require that the structure of the math model be prefixed.
|>	it is in these cases that backprop nets would be more useful.
|
|You say that in regression that the "structure of the model be prefixed"
|however I will debate this assumption -- the structure of a set of
|equations is no more prefixed than a neural network model. A neural
|network is a set of equations shown in a graphical syntactic form.
|It is just as easy to add and delete terms/equations as it is to add/delete
|nodes, etc. The equational equivalent of removing a link is to make
|zero a parameter.

... and to reduce the rank of the model by one.  Variable-rank
methods are essentially a recent development in statistics and
optimization [1960s and later - yes, that's "recent" :-(  ], and are
not yet part of the repertoire of many (perhaps most) practitioners
and software packages, even in the methods' simpler linear form.  
Nonlinear variable-rank methods are still a research problem.

One advantage of viewing neural-net training in the light of
statistics and optimization is to focus attention on the fact that
the standard sigmoidal transfer function is no less a mere
convention than is the Gaussian probability density, and equally a
choice to be made consciously rather than by default.
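
(A trivial illustration of that point; the alternative transfer functions
below are merely examples. The squashing function is one line of the model,
and nothing forces it to be the logistic sigmoid.)

# The transfer function is a modelling choice, not a law of nature: the
# same unit can be defined with the logistic sigmoid, tanh, or any other
# smooth function, and the choice should be made deliberately.
import numpy as np

transfer = {
    "logistic": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "tanh":     np.tanh,
    "gaussian": lambda z: np.exp(-z * z),    # e.g. a radial-basis-style unit
}

z = np.linspace(-3, 3, 7)
for name, f in transfer.items():
    print(name, np.round(f(z), 3))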

-David West       dhw@iti.org

fishwick@fish.cis.ufl.edu (Paul Fishwick) (12/20/89)

In article <1989Dec19.172314.16051@wuche2.wustl.edu> joshi@wuche2.UUCP (Amol Joshi) writes:

> ...
> BP-nn is more like a black-box and extracting
>knowledge like the exponential dependencies etc is impossible from the information
>about weights alone. am i missing something?
>:amol
>

I agree with Amol that extracting information from NN models directly
is more difficult. This is a general problem when analyzing any nonlinear
system. However, some NN model properties may be proved directly by
studying the set of equations that represent the neural network. On
a slightly different note, I think it is important that we always
remember that a neural network, like a signal flow graph, is just
a convenient representation for a set of equations (unless one is
interested in studying the neurophysiology aspect -- where the structure
of the network may represent biological structure). Any thoughts?

-paul fishwick

+------------------------------------------------------------------------+
| Prof. Paul A. Fishwick.... INTERNET: fishwick@bikini.cis.ufl.edu       |
| Dept. of Computer Science. UUCP: gatech!uflorida!fishwick              |
| Univ. of Florida.......... PHONE: (904)-335-8036                       |
| Bldg. CSE, Room 301....... FAX is available                            |
| Gainesville, FL 32611.....                                             |
+------------------------------------------------------------------------+

bill@boulder.Colorado.EDU (12/20/89)

>
> There are two distinct reasons for studying neural nets, the primary
> reason is to gain insights into the mechanisms of natural
> intelligence.  The secondary reason is that SOMETIMES neural nets can
> solve problems more elegantly.  When this is the case, then it's
> appropriate to use them.
> 

  The secondary reason (IMHO) is that neural nets are massively parallel.
When one has reached the limits of sequential speed, one must go to
parallelism in order to get greater power.  Neural nets are unlikely to
ever provide especially _elegant_ solutions to very many problems:  their
virtue is that they provide a brutal and simplistic solution that sometimes
(surprisingly) actually works.  
  
  I don't expect neural network methods to be practical until massively
parallel VLSI neural network chips exist and are easily obtainable.  At
that point the advantages of parallelism will compensate for the crudity
of the method for some applications, and the revolution will truly begin.
The day is not yet here, but it can't be too much longer in coming.

							Bill Skaggs

al@gtx.com (Alan Filipski) (12/27/89)

In article <21541@uflorida.cis.ufl.EDU> fishwick@fish.cis.ufl.edu (Paul Fishwick) writes:
>                           I think it is important that we always
>remember that a neural network, like a signal flow graph, is just
>a convenient representation for a set of equations
>

Depends on your point of view.  One might just as well say that a set
of equations is just a convenient representation for a Neural Network.
I don't see that one representation is necessarily more fundamental
than the other.


  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ( Alan Filipski, GTX Corp, 8836 N. 23rd Avenue, Phoenix, Arizona 85021, USA )
 ( {decvax,hplabs,uunet!amdahl,nsc}!sun!sunburn!gtx!al         (602)870-1696 )
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~