kbesrl@uxe.cso.uiuc.edu (12/27/89)
On December 17, I posted an article seeking opinions and answers on
some of the questions I had about "back-prop NNs and regression."
The main question was:
Q. What can back-prop NNs do that cannot be done by polynomial
regression? How, then, can one justify their use?
The discussion following this query resolved some of my questions but
left many unanswered. (Thanks to everyone who sent e-mail and to those
of you who posted replies.) I still feel the
following points need further clarification. If someone can provide
additional information, it would be appreciated.
1. It is claimed that neural nets perform well when the data is ambiguous
and noisy, and that they have good generalization. However, I have never
seen any evidence supporting these claims, not even comparative
empirical evidence. Any papers, or any personal experience? (Sketch 1
after this list shows the kind of head-to-head test I have in mind.)
2. Another frequent claim is that they do well on non-linear problems.
I haven't seen any paper comparing neural nets with polynomial
regression, and my own experience does not support the claim. Any
paper that compares the effectiveness of the different algorithms?
(Again, Sketch 1 below is the comparison I mean.)
I have seen a paper by Weiss (sp?) comparing the performance of NNs
with some AI and statistical (not poly reg) algorithms on some
classification problems; it found no clear evidence for the
superiority of one technique over the others.
3. Another reason proposed is that in NNs the structure of the model is
not fixed a priori. However, the number of nodes and the connectivity
do determine the structure, just as the degree determines a polynomial
model.
4. Another point often brought forth is that neural nets will do
wonders on parallel architectures. I have been told that the same
holds for matrix operations, and hence regression can be made even
faster! (Sketch 2 below spells out why regression is just matrix
algebra.)
Someone noted:
"He shows how to parallelize an ID3-like algorithm. Even when run on
a serial machine, ID3 is much, much faster than most neural-inspired
algorithms. Putting it on a parallel machine makes it faster
still."
5. Someone noted:
"However, for discrete mappings (eg pattern classification of binary
vectors into distinct classes) back-prop can *usually* separate
arbitrary non-convex decision surfaces, approximating ideal
Bayesian classification."
Any references? (Sketch 3 after this list illustrates the kind of
non-convex decision region I take this to mean.)
6. Finally, does anyone have references on work pertaining to the
interpretation of internal representations? I am aware of one paper
(Sejnowski and Rosenberg's NETtalk) that does hierarchical clustering
on weight vectors to look at the internal representations. (Sketch 4
below outlines that kind of analysis.)
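To make points 1 and 2 concrete, here is Sketch 1: a minimal head-to-head
test of a back-prop net against polynomial regression on a noisy
non-linear function. It is written in Python with numpy; the synthetic
problem, network size, learning rate, and polynomial degree are all my
own arbitrary choices, so treat it as a protocol for the comparison I am
asking about, not as evidence either way. (Note also, for point 3, that
the hidden-layer size h is fixed in advance.)

--- Sketch 1 ---
import numpy as np

rng = np.random.default_rng(0)

# Synthetic non-linear problem: y = sin(pi*x) plus Gaussian noise.
def make_data(n):
    x = rng.uniform(-1.0, 1.0, size=(n, 1))
    y = np.sin(np.pi * x) + 0.1 * rng.normal(size=(n, 1))
    return x, y

x_tr, y_tr = make_data(100)   # training set
x_te, y_te = make_data(500)   # held-out set: measures generalization

# Polynomial regression: one linear least-squares solve.
def poly_design(x, degree):
    return np.hstack([x ** k for k in range(degree + 1)])

deg = 7
beta, *_ = np.linalg.lstsq(poly_design(x_tr, deg), y_tr, rcond=None)
poly_mse = np.mean((poly_design(x_te, deg) @ beta - y_te) ** 2)

# One-hidden-layer net trained by plain back-propagation.
h = 20                                   # hidden units, fixed a priori
W1 = 0.5 * rng.normal(size=(1, h)); b1 = np.zeros(h)
W2 = 0.5 * rng.normal(size=(h, 1)); b2 = np.zeros(1)
lr = 0.05
for _ in range(20000):                   # full-batch gradient descent
    a = np.tanh(x_tr @ W1 + b1)          # hidden activations
    yhat = a @ W2 + b2
    err = yhat - y_tr                    # d(MSE)/d(yhat), up to a constant
    gW2 = a.T @ err / len(x_tr); gb2 = err.mean(0)
    da = (err @ W2.T) * (1 - a ** 2)     # back-propagate through tanh
    gW1 = x_tr.T @ da / len(x_tr); gb1 = da.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

a_te = np.tanh(x_te @ W1 + b1)
net_mse = np.mean((a_te @ W2 + b2 - y_te) ** 2)

print("degree-%d polynomial test MSE: %.4f" % (deg, poly_mse))
print("%d-hidden-unit net test MSE:   %.4f" % (h, net_mse))
--- end Sketch 1 ---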
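Sketch 2, for point 4: fitting a polynomial regression reduces to plain
matrix algebra, and the heavy steps (the products X^T X and X^T y) are
exactly the kind of operation that parallel or vector hardware speeds
up. The function name and the use of the normal equations here are my
own illustrative choices; in practice a QR or least-squares solve is
preferred for numerical stability.

--- Sketch 2 ---
import numpy as np

def fit_poly(x, y, degree):
    # Design matrix: one column per power of x.
    X = np.hstack([x ** k for k in range(degree + 1)])
    # Normal equations: (X^T X) beta = X^T y.  Both matrix products
    # parallelize; the final solve is only (degree+1) x (degree+1).
    return np.linalg.solve(X.T @ X, X.T @ y)
--- end Sketch 2 ---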
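Sketch 3, for point 5: a two-class problem whose positive region (a
ring) is non-convex, learned by a one-hidden-layer net with back-prop
on a cross-entropy loss. All the details are again my own choices, and
whether the network's outputs approximate the Bayes posterior is
exactly what I am asking for references on.

--- Sketch 3 ---
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1.5, 1.5, size=(400, 2))
r = np.sqrt((X ** 2).sum(axis=1))
y = ((r > 0.5) & (r < 1.0)).astype(float)[:, None]   # class 1 = inside ring

h = 16
W1 = 0.5 * rng.normal(size=(2, h)); b1 = np.zeros(h)
W2 = 0.5 * rng.normal(size=(h, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))  # clip avoids overflow

lr = 0.5
for _ in range(10000):
    a = np.tanh(X @ W1 + b1)
    p = sigmoid(a @ W2 + b2)          # estimated P(class 1 | x)
    err = p - y                       # cross-entropy gradient at the logits
    gW2 = a.T @ err / len(X); gb2 = err.mean(0)
    da = (err @ W2.T) * (1 - a ** 2)
    gW1 = X.T @ da / len(X); gb1 = da.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

acc = ((p > 0.5) == (y > 0.5)).mean()
print("training accuracy on the non-convex ring problem: %.2f" % acc)
--- end Sketch 3 ---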
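Sketch 4, for point 6: the NETtalk-style analysis I am referring to,
i.e. hierarchically clustering each hidden unit's incoming weight
vector and inspecting which units group together. It uses scipy's
standard hierarchical-clustering routines; the stand-in weight matrix
only makes the sketch run on its own, and one would substitute the
trained W1 from Sketch 3.

--- Sketch 4 ---
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Stand-in weights so this runs alone; replace with a trained
# (n_inputs x n_hidden) weight matrix such as W1 from Sketch 3.
W1 = np.random.default_rng(2).normal(size=(2, 16))

Z = linkage(W1.T, method="average")   # one row per hidden unit
dendrogram(Z)
plt.title("clustering of hidden-unit weight vectors")
plt.show()
--- end Sketch 4 ---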
sudha@kbesrl.me.uiuc.edu
sudhakar y. reddy
mechanical and industrial engineering
university of illinois at urbana-champaign
urbana, il 61801

sharma@hpihoed.HP.COM (Sanjeev Sharma) (12/28/89)
Would you please send me a copy of the responses that you got to your question on "back-prop vs regression"? Thanks. Sanjeev