kbesrl@uxe.cso.uiuc.edu (12/27/89)
On December 17, I posted an article seeking opinions and answers on
some of the questions I had about "back-prop NNs and regression."
The main question was:
Q. What can back-prop NNs do that cannot be done by polynomial
regression? How, then, can one justify their use?
The discussion following this query resolved some of my questions but
left many unanswered. (Thanks to everyone who sent e-mail and to those
of you who posted replies.) I still feel the
following points need further clarification. If someone can provide
additional information, it would be appreciated.
1. It is claimed that neural nets perform well when the data is ambiguous
and noisy, and that they have good generalization. However, I have never
seen any evidence supporting these claims, not even comparative
empirical evidence. Any papers, or any personal experience? (Sketch 1
after this list shows the kind of head-to-head test I have in mind.)
2. Another frequent claim is that they do well on non-linear problems.
I haven't seen any paper comparing neural nets with polynomial
regression, and my own experience does not support the claim. Any
paper that compares the effectiveness of the different algorithms?
(Again, Sketch 1 below is the comparison I mean.)
I have seen a paper by Weiss (sp?) comparing the performance of NNs
with some AI and statistical (not poly reg) algorithms on some
classification problems; it found no clear evidence for the
superiority of one technique over the others.
3. Another reason proposed is that in NNs the structure of the model is
not fixed a priori. However, the number of nodes and the connectivity
do determine the structure, just as the degree determines a polynomial
model.
4. Another point often brought forth is that neural nets will do
wonders on parallel architectures. I have been told that the same
holds for matrix operations, and hence regression can be made even
faster! (Sketch 2 below spells out why regression is just matrix
algebra.)
Someone noted:
"He shows how to parallelize an ID3-like algorithm. Even when run on
a serial machine, ID3 is much, much faster than most neural-inspired
algorithms. Putting it on a parallel machine makes it faster
still."
5. Someone noted:
"However, for discrete mappings (eg pattern classification of binary
vectors into distinct classes) back-prop can *usually* separate
arbitrary non-convex decision surfaces, approximating ideal
Bayesian classification."
Any references? (Sketch 3 after this list illustrates the kind of
non-convex decision region I take this to mean.)
6. Finally, does anyone have references on work pertaining to the
interpretation of internal representations? I am aware of one paper
(Sejnowski and Rosenberg's NETtalk) that does hierarchical clustering
on weight vectors to look at the internal representations. (Sketch 4
below outlines that kind of analysis.)
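To make points 1 and 2 concrete, here is Sketch 1: a minimal head-to-head
test of a back-prop net against polynomial regression on a noisy
non-linear function. It is written in Python with numpy; the synthetic
problem, network size, learning rate, and polynomial degree are all my
own arbitrary choices, so treat it as a protocol for the comparison I am
asking about, not as evidence either way. (Note also, for point 3, that
the hidden-layer size h is fixed in advance.)

--- Sketch 1 ---
import numpy as np

rng = np.random.default_rng(0)

# Synthetic non-linear problem: y = sin(pi*x) plus Gaussian noise.
def make_data(n):
    x = rng.uniform(-1.0, 1.0, size=(n, 1))
    y = np.sin(np.pi * x) + 0.1 * rng.normal(size=(n, 1))
    return x, y

x_tr, y_tr = make_data(100)   # training set
x_te, y_te = make_data(500)   # held-out set: measures generalization

# Polynomial regression: one linear least-squares solve.
def poly_design(x, degree):
    return np.hstack([x ** k for k in range(degree + 1)])

deg = 7
beta, *_ = np.linalg.lstsq(poly_design(x_tr, deg), y_tr, rcond=None)
poly_mse = np.mean((poly_design(x_te, deg) @ beta - y_te) ** 2)

# One-hidden-layer net trained by plain back-propagation.
h = 20                                   # hidden units, fixed a priori
W1 = 0.5 * rng.normal(size=(1, h)); b1 = np.zeros(h)
W2 = 0.5 * rng.normal(size=(h, 1)); b2 = np.zeros(1)
lr = 0.05
for _ in range(20000):                   # full-batch gradient descent
    a = np.tanh(x_tr @ W1 + b1)          # hidden activations
    yhat = a @ W2 + b2
    err = yhat - y_tr                    # d(MSE)/d(yhat), up to a constant
    gW2 = a.T @ err / len(x_tr); gb2 = err.mean(0)
    da = (err @ W2.T) * (1 - a ** 2)     # back-propagate through tanh
    gW1 = x_tr.T @ da / len(x_tr); gb1 = da.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

a_te = np.tanh(x_te @ W1 + b1)
net_mse = np.mean((a_te @ W2 + b2 - y_te) ** 2)

print("degree-%d polynomial test MSE: %.4f" % (deg, poly_mse))
print("%d-hidden-unit net test MSE:   %.4f" % (h, net_mse))
--- end Sketch 1 ---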
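Sketch 2, for point 4: fitting a polynomial regression reduces to plain
matrix algebra, and the heavy steps (the products X^T X and X^T y) are
exactly the kind of operation that parallel or vector hardware speeds
up. The function name and the use of the normal equations here are my
own illustrative choices; in practice a QR or least-squares solve is
preferred for numerical stability.

--- Sketch 2 ---
import numpy as np

def fit_poly(x, y, degree):
    # Design matrix: one column per power of x.
    X = np.hstack([x ** k for k in range(degree + 1)])
    # Normal equations: (X^T X) beta = X^T y.  Both matrix products
    # parallelize; the final solve is only (degree+1) x (degree+1).
    return np.linalg.solve(X.T @ X, X.T @ y)
--- end Sketch 2 ---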
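Sketch 3, for point 5: a two-class problem whose positive region (a
ring) is non-convex, learned by a one-hidden-layer net with back-prop
on a cross-entropy loss. All the details are again my own choices, and
whether the network's outputs approximate the Bayes posterior is
exactly what I am asking for references on.

--- Sketch 3 ---
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1.5, 1.5, size=(400, 2))
r = np.sqrt((X ** 2).sum(axis=1))
y = ((r > 0.5) & (r < 1.0)).astype(float)[:, None]   # class 1 = inside ring

h = 16
W1 = 0.5 * rng.normal(size=(2, h)); b1 = np.zeros(h)
W2 = 0.5 * rng.normal(size=(h, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))  # clip avoids overflow

lr = 0.5
for _ in range(10000):
    a = np.tanh(X @ W1 + b1)
    p = sigmoid(a @ W2 + b2)          # estimated P(class 1 | x)
    err = p - y                       # cross-entropy gradient at the logits
    gW2 = a.T @ err / len(X); gb2 = err.mean(0)
    da = (err @ W2.T) * (1 - a ** 2)
    gW1 = X.T @ da / len(X); gb1 = da.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

acc = ((p > 0.5) == (y > 0.5)).mean()
print("training accuracy on the non-convex ring problem: %.2f" % acc)
--- end Sketch 3 ---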
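Sketch 4, for point 6: the NETtalk-style analysis I am referring to,
i.e. hierarchically clustering each hidden unit's incoming weight
vector and inspecting which units group together. It uses scipy's
standard hierarchical-clustering routines; the stand-in weight matrix
only makes the sketch run on its own, and one would substitute the
trained W1 from Sketch 3.

--- Sketch 4 ---
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Stand-in weights so this runs alone; replace with a trained
# (n_inputs x n_hidden) weight matrix such as W1 from Sketch 3.
W1 = np.random.default_rng(2).normal(size=(2, 16))

Z = linkage(W1.T, method="average")   # one row per hidden unit
dendrogram(Z)
plt.title("clustering of hidden-unit weight vectors")
plt.show()
--- end Sketch 4 ---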
sudha@kbesrl.me.uiuc.edu
sudhakar y. reddy
mechanical and industrial engineering
university of illinois at urbana-champaign
urbana, il 61801

sharma@hpihoed.HP.COM (Sanjeev Sharma) (12/28/89)
Would you please send me a copy of the responses that you got to your question on "back-prop vs regression"? Thanks. Sanjeev