burow@cernvax.cern.ch (burkhard burow) (01/21/91)
What are the performance and 'elegance' differences between neural net
software and statistical techniques, e.g. discriminant analysis, when used
to assign events to one of two or more populations? Seen by the user as a
black box, the two methods are identical: an understood training set of
events sets up the machinery, and the unknown events follow.

I certainly understand the performance advantage of hardware neural nets,
e.g. the brain, but what's the story when both of the above methods run on
'normal' computers? I'm looking for comments, facts, arguments, beliefs,
pointers to literature, postings, etc.

thanks,
burkhard

INTERNET: burow%13313.hepnet@csa3.lbl.gov
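A minimal sketch of the "black box" equivalence described above (an
editorial illustration, not part of the original post): a classical
discriminant rule and a one-unit neural net behind the same
train-then-classify interface. The class names, the toy events, and the
choice of a nearest-mean rule are all assumptions made for the demo.

    class NearestMeanDiscriminant:
        """Statistical box: assign each event to the population whose
        training-set mean it lies closest to (a simple discriminant)."""
        def train(self, events, labels):
            self.means = {}
            for lab in set(labels):
                pts = [e for e, l in zip(events, labels) if l == lab]
                self.means[lab] = [sum(c) / len(pts) for c in zip(*pts)]
        def classify(self, event):
            def dist2(mean):
                return sum((x - m) ** 2 for x, m in zip(event, mean))
            return min(self.means, key=lambda lab: dist2(self.means[lab]))

    class Perceptron:
        """Neural-net box: one linear unit trained by error correction.
        Labels are assumed to be +1 / -1."""
        def train(self, events, labels, passes=20, rate=0.1):
            self.w, self.b = [0.0] * len(events[0]), 0.0
            for _ in range(passes):
                for e, lab in zip(events, labels):
                    s = sum(wi * xi for wi, xi in zip(self.w, e)) + self.b
                    if (1 if s > 0 else -1) != lab:   # wrong: nudge weights
                        self.w = [wi + rate * lab * xi
                                  for wi, xi in zip(self.w, e)]
                        self.b += rate * lab
        def classify(self, event):
            s = sum(wi * xi for wi, xi in zip(self.w, event)) + self.b
            return 1 if s > 0 else -1

    # Seen from outside, the two boxes are used identically:
    train_events = [(0.0, 0.1), (0.2, 0.0), (1.0, 0.9), (0.9, 1.1)]
    train_labels = [-1, -1, 1, 1]
    for box in (NearestMeanDiscriminant(), Perceptron()):
        box.train(train_events, train_labels)
        print(type(box).__name__, box.classify((0.8, 0.8)))   # both: 1

Only the machinery inside the box differs; the calls, the training set,
and the flow of unknown events through it are the same in both cases.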
minsky@media-lab.MEDIA.MIT.EDU (Marvin Minsky) (01/22/91)
In article <3885@cernvax.cern.ch> burow@cernvax.cern.ch (burkhard burow) writes:
>What are the performance and 'elegance' differences between neural net
>software and statistical techniques, e.g. discriminant analysis, when used
>to assign events to one of two or more populations? Seen by the user as a
>black box, the two methods are identical: an understood training set of
>events sets up the machinery, and the unknown events follow.

The NN methods form a wider class of clustering procedures. Wider for
several reasons:

1. The result is not constrained to be unique. The same data can produce
different classifications, because the learning trajectory can depend on
the order in which the data is presented.

2. NN methods are still largely empirical. A method becomes popular if
enough researchers claim that it gives good results. In most cases, very
little has been proven about the range and reliability of the method.
Statistical methods, in contrast, are not considered "scientific" unless
they are accompanied by theorems about their behavior.

3. In order to prove a theorem about classification, the theorem's
antecedent must precisely describe a class of classification problems. In
the real world this is very hard to do, so mathematical statistics tends
to confine itself to idealized, well-defined cases. The AI and NN
researchers don't restrict themselves that way.

So the differences are substantial. In some cases, NN methods are known
that in fact turn out to compute well-known statistical functions. For
example, read section 12.4 of Perceptrons (Minsky and Papert, MIT Press,
1988). There you can see an iterative, NN-like process that computes
Bayesian statistics -- except that, because of the memory decay, the
variances do not converge to zero with increasing sample size, as is often
characteristic of NNs.
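Point 1 above is easy to demonstrate concretely. The following
self-contained toy (made-up events and learning rate, not from the post
or from Perceptrons) trains a single error-correcting linear unit for one
pass, and shows that presenting the same two events in opposite orders
leaves the unit with different final weights.

    def train_once(events, labels, rate=0.1):
        """One pass of perceptron-style error correction in 2-D."""
        w, b = [0.0, 0.0], 0.0
        for e, lab in zip(events, labels):
            out = 1 if w[0] * e[0] + w[1] * e[1] + b > 0 else -1
            if out != lab:                       # misclassified: correct
                w = [w[0] + rate * lab * e[0], w[1] + rate * lab * e[1]]
                b += rate * lab
        return w, b

    events, labels = [(1.0, 0.0), (0.0, 1.0)], [1, -1]
    print(train_once(events, labels))              # ([0.1, -0.1], 0.0)
    print(train_once(events[::-1], labels[::-1]))  # ([0.1, 0.0], 0.1)

Same data, different presentation order, different box -- which is the
non-uniqueness a classical statistical procedure rules out by definition.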
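The closing remark about memory decay can also be sketched. This toy is
only in the spirit of the passage, not the actual procedure of section
12.4: an incremental mean estimator with gain 1/n is exact sample
averaging, so its error shrinks as the sample grows, while the same update
with a fixed gain (a decaying memory) never settles, so its variance stays
bounded away from zero. The decay rate alpha is an assumed value.

    import random

    random.seed(1)
    samples = [random.gauss(5.0, 1.0) for _ in range(100000)]

    exact = 0.0     # running sample mean, gain 1/n
    decayed = 0.0   # leaky estimator with fixed gain alpha
    alpha = 0.01    # memory-decay rate (assumed for the demo)

    for n, x in enumerate(samples, start=1):
        exact += (x - exact) / n          # error -> 0 as n grows
        decayed += alpha * (x - decayed)  # tracks the mean, never settles

    print("exact  :", exact)    # very close to 5.0
    print("decayed:", decayed)  # near 5.0, but still wandering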