kbesrl@uxe.cso.uiuc.edu (12/27/89)
On December 17, I posted an article seeking opinions and answers on some of the questions I had about "back-prop NNs and regression." The main question was:

Q. What can back-prop NNs do that cannot be done by polynomial regression?
   How, then, can one justify their use?

The discussion following this query clarified some of the questions I had, but left many unanswered. (Thanks to all of you who sent e-mail or posted replies.) I still feel the following points need further clarification. If someone can provide additional information, it would be appreciated.

1. It is claimed that neural nets perform well when the data is ambiguous and noisy, and I hear claims that they generalize well. However, I have never seen any evidence supporting these claims, not even comparative empirical evidence. Any papers, or any personal experience?

2. Another common claim is that they do well on non-linear problems. I have not seen any paper comparing neural nets with polynomial regression, and my own experience does not support the claim (a small comparative sketch appears after this post). Is there a paper that compares the effectiveness of the different algorithms? I have seen a paper by Weiss (sp?) comparing the performance of NNs with some AI and statistical (not polynomial regression) algorithms on some classification problems, which found no clear evidence of the superiority of one technique over the others.

3. Another reason proposed is that in NNs the structure of the model is not fixed a priori. However, the number of nodes and the connectivity do determine a structure in advance.

4. Another point often brought forth is that neural nets will do wonders on parallel architectures. I have been told that the same holds for matrix operations, and hence regressions can be made even faster! (See the second sketch after this post.) Someone noted: "He shows how to parallelize an ID3-like algorithm. Even when run on a serial machine, ID3 is much, much faster than most neural-inspired algorithms. Putting it on a parallel machine makes it faster still."

5. Someone noted: "However, for discrete mappings (eg pattern classification of binary vectors into distinct classes) back-prop can *usually* separate arbitrary non-convex decision surfaces, approximating ideal Bayesian classification." Any references?

6. Finally, does anyone have references on work pertaining to the interpretation of internal representations? I am aware of one paper (NETtalk) that does hierarchical clustering on weight vectors to look at the internal representations (a third sketch after this post shows the mechanics).

sudha@kbesrl.me.uiuc.edu
sudhakar y. reddy
mechanical and industrial engineering
university of illinois at urbana-champaign
urbana, il 61801
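A minimal sketch of the comparison raised in points 1 and 2: fit the same noisy non-linear data with an ordinary polynomial least-squares fit and with a one-hidden-layer net trained by plain back-prop, then compare test error. Everything here (the sin(3x) target, the noise level, degree 7, ten hidden units, the learning rate, the iteration count) is an arbitrary choice for illustration, not anything taken from the literature. Written in Python with numpy.

    import numpy as np

    rng = np.random.default_rng(0)

    # Noisy non-linear target (arbitrary choice): y = sin(3x) + noise.
    x = rng.uniform(-1.0, 1.0, 200)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, x.shape)
    x_tr, y_tr = x[:150], y[:150]
    x_te, y_te = x[150:], y[150:]

    # Polynomial regression: ordinary least squares on powers of x.
    coef = np.polyfit(x_tr, y_tr, deg=7)
    poly_mse = np.mean((np.polyval(coef, x_te) - y_te) ** 2)

    # One-hidden-layer tanh net trained by plain batch back-prop.
    H, lr = 10, 0.05
    W1 = rng.normal(0.0, 1.0, (H, 1)); b1 = np.zeros((H, 1))
    W2 = rng.normal(0.0, 1.0, (1, H)); b2 = np.zeros((1, 1))
    X, Y = x_tr.reshape(1, -1), y_tr.reshape(1, -1)
    for _ in range(5000):
        A = np.tanh(W1 @ X + b1)            # hidden activations
        P = W2 @ A + b2                     # linear output
        dP = 2.0 * (P - Y) / Y.size         # gradient of mean squared error
        dW2 = dP @ A.T; db2 = dP.sum(axis=1, keepdims=True)
        dZ = (W2.T @ dP) * (1.0 - A ** 2)   # back through tanh
        dW1 = dZ @ X.T; db1 = dZ.sum(axis=1, keepdims=True)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    A_te = np.tanh(W1 @ x_te.reshape(1, -1) + b1)
    net_mse = np.mean((W2 @ A_te + b2 - y_te.reshape(1, -1)) ** 2)
    print("poly MSE: %.4f   net MSE: %.4f" % (poly_mse, net_mse))

On a toy problem like this neither method should be expected to dominate; the interesting comparisons are on higher-dimensional, noisier data, which is exactly what the question asks for references on.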
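On point 4, the claim that regression reduces to matrix arithmetic is easy to make concrete: a polynomial least-squares fit is two matrix products and one linear solve (the normal equations), all dense linear algebra that vectorizes and parallelizes well. A sketch, again in Python with numpy; the degree and the data are arbitrary choices:

    import numpy as np

    def poly_design(x, degree):
        # Design matrix [1, x, x^2, ..., x^degree] (a Vandermonde matrix).
        return np.vander(x, degree + 1, increasing=True)

    def poly_regress(x, y, degree):
        # Least squares via the normal equations (X'X) beta = X'y:
        # two matrix products and one solve, exactly the operations
        # that map well onto vector/parallel hardware.
        X = poly_design(x, degree)
        return np.linalg.solve(X.T @ X, X.T @ y)

    rng = np.random.default_rng(1)
    x = rng.uniform(-1.0, 1.0, 100)
    y = 1.0 - 2.0 * x + 3.0 * x ** 2 + rng.normal(0.0, 0.1, x.shape)
    print(poly_regress(x, y, 2))   # roughly [1, -2, 3]

In practice np.linalg.lstsq (QR/SVD based) is numerically safer than forming X'X explicitly; the normal equations are used here only to make the matrix structure visible.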
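On point 6, the NETtalk-style analysis is mechanically simple, whatever one makes of the interpretations it yields: treat each vector (the incoming weights of a hidden unit, or the hidden activations evoked by an input pattern) as a point and apply agglomerative hierarchical clustering. A sketch using scipy; the random matrix below is only a stand-in for vectors taken from a trained net:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Stand-in for a trained net: one row per input pattern, giving the
    # hidden-unit activation vector that pattern evokes (NETtalk clustered
    # such vectors to interpret the hidden layer).
    rng = np.random.default_rng(2)
    activations = rng.normal(size=(26, 10))   # e.g. 26 patterns, 10 hidden units

    # Agglomerative (hierarchical) clustering on Euclidean distances.
    Z = linkage(activations, method="average", metric="euclidean")
    print(fcluster(Z, t=4, criterion="maxclust"))  # cut into 4 clusters (arbitrary)

    # scipy.cluster.hierarchy.dendrogram(Z) would draw the full tree.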
sharma@hpihoed.HP.COM (Sanjeev Sharma) (12/28/89)
Would you please send me a copy of the responses you got to your question on "back-prop vs regression"? Thanks.

Sanjeev