neuron-request@HPLABS.HP.COM (Neuron-Digest Moderator Peter Marvit) (12/02/88)
Neuron Digest   Thursday,  1 Dec 1988   Volume 4 : Issue 29

Today's Topics:
                        Administrivia
                neural net job opening at MCC
          Re: Learning arbitrary transfer functions
                 Radial basis functions etc.
                    Journal of Complexity
   car pooling from Denver Airport to NIPS conference hotel
                    NIPS Speech Workshop
        Flaming on Neural Nets and Transfer Functions
          RE: advantages of NNs over symbolic systems

Send submissions, questions, address maintenance and requests for old
issues to "neuron-request@hplabs.hp.com" or "{any
backbone,uunet}!hplabs!neuron-request"

------------------------------------------------------------

Subject: Administrivia
From:    Your local Moderator
Date:    Thu, 1 Dec 88 13:00:00 PST

[[ Editor's Preface: I've received several comments resulting from the
recent discussion of public conduct, information referencing, and my
editorial policy (such as it is). I'll report on those in the next
issue. I'm still open to suggestions and recommendations. As always,
though, I look for submissions -- especially those with good technical
foundations.

On another note, the Neuron Digest has grown to over 650 addresses,
many of which are redistribution points, and a very large number of
international sites. Please, if you are going to move or your site has
mail difficulties, let me know. I get a large number of (frustrating)
undeliverable returns and am often forced to delete addresses.

One of the aims of this Digest is to offer information and provide
conduits for people working in this field to communicate effectively.
To this aim, I'd like to start publishing short "biographies" --
especially of groups around the world. In my "welcome" message, I
usually ask about the new subscriber's interests. I now invite YOU
(the entire readership) to introduce yourselves to the Digest. Who are
YOU and what are YOU doing? I will publish responses appropriately in
future issues.
	-PM ]]

------------------------------

Subject: neural net job opening at MCC
From:    keeler@mcc.com (Jim Keeler)
Date:    Tue, 22 Nov 88 19:04:48 -0600

WANTED: CONNECTIONIST/NEURAL NET RESEARCHERS

MCC (Microelectronics and Computer Technology Corporation, Austin,
Texas) is looking for research scientists to join our newly formed
neural network research team. We are looking for researchers with
strong theoretical skills in Physics, Electrical Engineering or
Computer Science (Ph.D. level or above preferred). The research will
focus on (non-military), fundamental questions about neural networks,
including:

 -Scaling and improvement of existing algorithms
 -Development of new learning algorithms
 -Temporal pattern recognition and processing
 -Reverse engineering of biological networks
 -Optical neural network architectures

MCC offers competitive salaries and a very stimulating, academic-like
research environment. Contact Jim Keeler at jdk@mcc.com or Haran Boral
at haran@mcc.com, or contact Jim Keeler at the NIPS conference in
Denver.

------------------------------

Subject: Re: Learning arbitrary transfer functions
From:    djb@flash.bellcore.com (David J Burr)
Date:    Wed, 23 Nov 88 00:04:32 -0500

Based on arguments outlined in the report "Knowledge Representation in
Connectionist Networks", S. J. Hanson and D. J. Burr, Bellcore Tech.
Report, February 9, 1987, a layered neural net with "two" hidden
layers should be able to approximate any continuous function. We argue
that a continuous function is approximated by superposing individual
activation "hills" in the (input) space. Negative hills, or valleys,
can also be formed. In a network with conventional linear summation
neurons the hills are formed in the "first" hidden layer. However,
when special quadratic "spherical" neurons are used, they individually
form activation hills and the first hidden layer is not needed. These
hills are superposed (OR-ed) with linear summation neurons in the
outer hidden layer.
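[[ Editor's note: the "activation hills" argument above can be
illustrated with a small numerical sketch. This is NOT Hanson and
Burr's construction -- it is a hypothetical modern (NumPy)
illustration of mine: Gaussian hills, the kind a quadratic "spherical"
unit forms directly, are superposed with least-squares output weights
to approximate a continuous target. All parameter choices (grid size,
number of centers, hill width) are arbitrary. -PM ]]

```python
import numpy as np

# Superposing Gaussian "hills" to approximate a continuous function.
# Each column of Phi is one hill; the output weights w do the
# superposition step described in the message above.

xs = np.linspace(0.0, 1.0, 200)
target = np.sin(2 * np.pi * xs) * 0.5 + 0.5   # continuous target in [0, 1]

centers = np.linspace(0.0, 1.0, 25)           # one hill per center
width = 0.04
Phi = np.exp(-((xs[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

# Output-layer weights via least squares (the superposition)
w, *_ = np.linalg.lstsq(Phi, target, rcond=None)
approx = Phi @ w

print(np.max(np.abs(approx - target)))        # small worst-case error
```

[[ Adding more hills (finer centers) drives the error down further,
which is the intuition behind the approximation claim. -PM ]]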
Superposition can be quasi-linear depending on the values of the
output-layer weights. We actually demonstrated a one (hidden) layer
network learning a noisy 2-D Gaussian function in a later paper:
"Minkowski-r Back Propagation: Learning in Connectionist Models with
Non-Euclidean Error Metrics", S. J. Hanson and D. J. Burr, in D.
Anderson (ed.), Neural Information Processing Systems: Natural and
Synthetic, AIP, 1988.

D. J. Burr, 2B-397
Bellcore
445 South St.
Morristown, NJ 07960
djb@bellcore.com

------------------------------

Subject: Radial basis functions etc.
From:    "M. Niranjan" <niranjan%digsys.engineering.cambridge.ac.uk@NSS.Cs.Ucl.AC.UK>
Date:    Wed, 23 Nov 88 12:42:17 +0000

[[ Editor's Note: The original announcement of Niranjan's paper was in
Vol 4 #17. -PM ]]

Some recent comments on RBFs that you might find interesting!

niranjan

============================== ONE =====================================

Date: Wed, 16 Nov 88 09:41:13 +0200
From: Dario Ringach <dario@earn.techunix>
To: ajr <ajr%uk.ac.cambridge.engineering.digsys@uk.ac.ucl.cs.nss>
Subject: Comments on TR.25

Thanks a lot for the papers! I'd like to share a few thoughts on
TR.25, "Generalising the Nodes of the Error Propagation Network".

What are the advantages of choosing radial basis functions (or
Gaussian nodes) in *general* discrimination tasks? It seems clear to
me that the results presented in Table 1 are due to the fact that the
spectral distribution of steady-state vowels can be closely
represented by normal/radial distributions. If I have no a priori
information about the distribution of the classes, then how can I know
which kind of nodes will perform better? I think that in this case the
best we can do is to look at the combinatorial problem of how many
partitions of the n-dimensional Euclidean space can be obtained using
N (proposed shape) boundaries. This is closely related to obtaining
the Vapnik-Chervonenkis dimension of the boundary class.
In the case of n-dimensional hyperplanes and hyperspheres, both have
VC-dimension n+1, so I think there is really no difference in using
hyperplanes or hyperspheres in *general* discrimination problems.
Don't you agree?

Thanks again for the papers!

Dario

============================== TWO =====================================

From: M. Niranjan <niranjan>
Date: Mon, 21 Nov 88 14:03:16 GMT
To: dario@bitnet.techunix
Subject: RBF etc

With RBFs of the Gaussian type, the class-conditional density function
is approximated by a mixture of multiple Gaussians. But the parameters
of the mixture are estimated to maximise the discrimination rather
than to model the individual probability densities.

> If I have no a priori information about the distribution of the classes
> then how can I know which kind of nodes will perform better?

There is no way other than by a set of experiments. In small-scale
problems, we can probably plot cross-sections of the feature space, or
even projections of it onto a linear discriminant plane, and get some
rough idea.

> problem of how many partitions of the n-dimensional Euclidean space
> can be obtained using N (proposed shape) boundaries.

It is not how many different partitions; I think our problem in
pattern classification is dealing with breakpoints of the class
boundary. It is this capability that is the power in MLPs (and RBFs).
In a two-class problem, we still partition the input space into two
using N boundary segments (or splines), with N-1 break-points.

What I like about RBFs is that you can have a probabilistic
interpretation. With standard MLPs this is not very obvious, and what
happens is more like a functional interpolation.

> both have VC-dimension n+1, so I think there is really no difference

I don't know what the VC-dimension is. Any reference, please?

Best wishes

niranjan

============================ THREE =======================================

Date: Tue, 22 Nov 88 08:22:53 +0200
From: Dario Ringach <dario@EARN.TECHUNIX>
To: M. Niranjan <niranjan@UK.AC.CAM.ENG.DSL>
Subject: Re: RBF etc

Thanks for your Re!

[some stuff deleted]

> > I think that in this case the best we can do is to look at the
> > combinatorial problem of how many partitions of the n-dimensional
> > Euclidean space can be obtained using N (proposed shape) boundaries.
>
> It is not how many different partitions; I think our problem in pattern
> classification is dealing with breakpoints of class boundary. It is this
> capability that is the power in MLPs (and RBFs). In a two class problem,
> we still partition the input space into two using N boundary segments
> (or splines), with N-1 break-points.

Sure, I agree. But if you address the question of how many hidden
units of a given type you need to classify the input vector into one
of N distinct classes, and consider that a rough measure of the
complexity of the boundary class proposed for the units, then the
problem seems to be the one of partitioning the input space. Note that
I don't care about the nature of the class shapes in real-world
problems; in that case I must agree with you that the issue of
breakpoints of the class boundary becomes of real importance.

[...]

> I don't know what the VC-dimension is. Any reference, please?

An earlier draft is "Classifying Learnable Geometric Concepts with the
Vapnik-Chervonenkis Dimension" by D. Haussler et al., at FOCS '86, pp.
273-282. But if you don't know what Valiant's learnability model is,
take a look at "A Theory of the Learnable" by L. Valiant, CACM 27(11),
1984, pp. 1134-42. The original article by Vapnik and Chervonenkis is
"On the Uniform Convergence of Relative Frequencies of Events to their
Probabilities", Th. Prob. and its Appl., 16(2), 1971, pp. 264-80. More
up-to-date papers dealing with the VC-dimension can be found in the
Proc. of the First Workshop on Computational Learning Theory, COLT
'88, held at MIT last June.

- --Dario.
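[[ Editor's note: for readers who, like Niranjan, are new to the
VC-dimension, here is a tiny empirical sketch (my own hypothetical
illustration, in modern NumPy, not from the exchange above) of what
"VC-dimension n+1" means for the hyperplane case: every labeling of
three points in general position in the plane can be realized by a
line. A perceptron converges exactly when a labeling is linearly
separable, so we simply try all eight labelings. -PM ]]

```python
import numpy as np

# Three non-collinear points in R^2 can be shattered by halfspaces,
# consistent with VC-dimension n+1 = 3 for hyperplanes in R^n.

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # general position
X = np.hstack([pts, np.ones((3, 1))])                 # append bias term

def separable(y, epochs=1000):
    """Perceptron training; converges iff labeling y is separable."""
    w = np.zeros(3)
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:   # misclassified (or on boundary)
                w += yi * xi
                mistakes += 1
        if mistakes == 0:
            return True
    return False

labelings = [np.array([a, b, c])
             for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
print(all(separable(y) for y in labelings))  # True: all 8 labelings realized
```

[[ The analogous claim for hyperspheres (also VC-dimension n+1) is
stated in Dario's message above but not checked here. -PM ]]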
=========================== THE END =====================================

------------------------------

Subject: Journal of Complexity
From:    hirsch%math.Berkeley.EDU@cartan.berkeley.edu
Date:    Wed, 23 Nov 88 11:37:07 -0800

The issue on neural nets is Volume 4, Number 3, September 1988. The
address is 1 East First St., Duluth, MN 55802.

Professor Morris W. Hirsch
Department of Mathematics
University of California
Berkeley, CA 94720 USA
Phone: (415) 642-4318 (messages), (415) 642-5026
e-mail: hirsch@math.berkeley.edu

------------------------------

Subject: car pooling from Denver Airport to NIPS conference hotel
From:    john moody <moody-john@YALE.ARPA>
Date:    Fri, 25 Nov 88 17:01:03 -0500

I'm arriving at Denver Airport at 10:15 PM Monday night (after the
last shuttle leaves the airport for the hotel) and will probably have
to rent a Hertz car to get to the hotel. Would anyone out there
arriving Monday night like to car pool with me and possibly split the
cost of a one-day car rental? (Starving students are welcome to tag
along for free.) If interested, please reply ASAP.

- --John Moody (203) 432-6493

------------------------------

Subject: NIPS Speech Workshop
From:    John.Hampshire@SPEECH2.CS.CMU.EDU
Date:    Mon, 28 Nov 88 14:02:50 -0500

This is a preliminary outline for those planning to attend the speech
workshop following NIPS 88 in Keystone, CO. For answers to
questions/details, please contact Alex Waibel.

Speech Workshop
- ------------------

- ----Dec. 1, eve: Overview. All Groups meet. ---------------------------

- ----Dec. 2, Neural Nets (NN) and Hidden Markov Models (HMM)--------------

7:30 - 9:30   Introduction. Short Informal Presentations (15 mins each).
              Connectionist Speech. The HMM/NN Debate. (Alex Waibel, CMU)
              State of the Art in HMMs (Rich Schwartz, BBN)
              Links between HMMs and NNs (Herve Bourlard, ICSI)
              Commonalities, Differences, HMMs, NNs. (John Bridle, RSRE)
              NNs and HMMs (Richard Lippmann, Lincoln Labs)
              Brief Questions and Answers.

4:30 - 6:30   Discussion.
              NNs, HMMs: Strengths and Weaknesses, Commonalities,
              Comparisons. Performance, Computational Needs,
              Extensions. Hybrid Approaches.

Evening:      Highlights.

- ----Dec. 3, Directions for Connectionist Speech Understanding.-----------

7:30 - 9:30   Introduction. Phoneme Recognition. Word Recognition.
              Syntax. Semantics. Pragmatics. Integral System Design.
              Learning Algorithms. Computational Needs/Limitations.
              Large-Scale Neural System Design. Modularity.
              Instruction. Heuristic Knowledge.

4:30 - 6:30   Discussion. Extensions.

Evening:      Highlights. Summary.

------------------------------

Subject: Flaming on Neural Nets and Transfer Functions
From:    alexis%yummy@GATEWAY.MITRE.ORG
Organization: The Internet
Date:    18 Nov 88 15:18:37 +0000

I have to admit some surprise that so many people got this "wrong."
Our experience is that neural nets of the PDP/backprop variety are at
their *BEST* with continuous mappings. If you just want
classification, you might as well go with nearest-neighbor algorithms
(or, if you want the same thing in a net, try Nestor's Coulombic
stuff). If you can't learn x => sin(x) in a couple of minutes, you've
done something wrong and should check your code (I'm assuming you
thought to scale sin(x) to [0,1]). Actually, requiring a PDP net to
output 1's and 0's means your weights must be quite large, which takes
a lot of time and puts you way out on the tails of the sigmoids, where
learning is slow and painful.

What I do for fun (?) these days is try to make nets output sin(t)
{where t is time} and other waveforms with static or "seed" wave
inputs.

For those who like math, G. Cybenko (currently of U. Illinois and,
starting 12/10/88, of Tufts) has a very good paper, "Approximation by
Superpositions of a Sigmoidal Function", where he gives an existence
proof that you can uniformly approximate any continuous function with
support in the unit hypercube. This means a NN with one hidden layer
(1 up from a perceptron).
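[[ Editor's note: a minimal sketch of the experiment Alexis describes
-- one sigmoid hidden layer trained by plain batch backprop on
x => sin(x), with the target rescaled to [0,1]. The network size,
learning rate, and iteration count are my own arbitrary illustrative
choices, not anyone's published recipe. -PM ]]

```python
import numpy as np

# One-hidden-layer sigmoid net (linear output) learning sin(x),
# with sin(x) scaled into [0, 1] as the message above suggests.

rng = np.random.default_rng(1)
X = np.linspace(0.0, 2 * np.pi, 64)[:, None]
Y = (np.sin(X) + 1.0) / 2.0                      # scale into [0, 1]

H = 12                                           # hidden units
W1 = rng.normal(0, 1.0, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 1.0, (H, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.05
for _ in range(20000):
    h = sigmoid(X @ W1 + b1)                     # hidden activations
    out = h @ W2 + b2                            # linear output unit
    err = out - Y
    # Backprop: output-layer and hidden-layer gradients
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * h * (1 - h)
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse = float(np.mean((out - Y) ** 2))
print(mse)   # small, well below the 0.125 variance of the target
```

[[ Note that the output unit here is linear; forcing sigmoid outputs
to hit exactly 0 and 1 would require the large weights and saturated
sigmoids Alexis warns about. -PM ]]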
Certainly more layers generally give more compact and robust codings
... but the theory is *finally* coming together.

Alexis Wieland .... alexis%yummy@gateway.mitre.org

------------------------------

Subject: RE: advantages of NNs over symbolic systems
From:    kortge@psych.Stanford.EDU (Chris Kortge)
Date:    Wed, 23 Nov 88 14:41:31 -0800

>From: bradb@ai.toronto.edu (Brad Brown)
> Neural network-based systems have advantages over symbolic
> systems for the following reasons.
> [...]
> (2) Neural nets can adapt to changes in their environment.
>     For instance, a financial expert system implemented as
>     a NN could use new information to modify its
>     performance over time to reflect changing market
>     conditions. Symbolic systems are usually either static
>     or require re-training on a substantial fraction of the
>     dataset to adapt to new data.

I'm a connectionist, but I don't think this advantage typically holds.
The powerful existing learning procedures, those which can learn
distributed representations (e.g. back-prop), actually require that
the environment (i.e., the input distribution) remain _fixed_. If,
after learning, you change the environment a little bit, you can't
just train on the new inputs; rather, you must retrain on the entire
distribution. Otherwise, the NN happily wipes out old knowledge in
order to learn the new.

Roger Ratcliff at Northwestern has a new paper (unpublished as yet, I
believe) on this problem with regard to modeling recognition memory.
Also, Stephen Grossberg pointed the problem out long ago, and his ART
networks don't suffer from it (but they can't learn distributed
representations, either). His system has a learned attention
mechanism, which gates learning such that it only occurs in response
to novel inputs. Back-prop networks don't have such a mechanism, so it
is only natural that they don't treat new information any differently
from old.
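[[ Editor's note: the "wiping out old knowledge" effect is easy to
reproduce even in a linear network trained by gradient descent. This
hypothetical sketch (mine, not from the message above) trains on a
task A, then trains only on a different task B, and measures how the
error on A rebounds. A linear model is of course a simplification of
the distributed-representation case Kortge describes, but the
interference is the same in kind. -PM ]]

```python
import numpy as np

# Sequential training without interleaving: learn A, then only B,
# and watch performance on A degrade.

rng = np.random.default_rng(2)
wA, wB = rng.normal(size=4), rng.normal(size=4)      # task weights
XA, XB = rng.normal(size=(50, 4)), rng.normal(size=(50, 4))
yA, yB = XA @ wA, XB @ wB                            # noiseless targets

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

w = np.zeros(4)
for _ in range(500):                                 # learn task A
    w -= 0.05 * XA.T @ (XA @ w - yA) / len(XA)
err_A_before = mse(w, XA, yA)

for _ in range(500):                                 # now train ONLY on B
    w -= 0.05 * XB.T @ (XB @ w - yB) / len(XB)
err_A_after = mse(w, XA, yA)

print(err_A_before < 1e-3)        # True: A was learned
print(err_A_after > err_A_before) # True: then largely overwritten
```

[[ Retraining on the mixture of A and B restores both, which is
exactly the "retrain on the entire distribution" cost Kortge points
out. -PM ]]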
Chris Kortge
kortge@psych.stanford.edu

------------------------------

End of Neurons Digest
*********************