neuron-request@HPLABS.HP.COM ("Neuron-Digest Moderator Peter Marvit") (02/14/89)
Neuron Digest   Monday, 13 Feb 1989   Volume 5 : Issue 9

Today's Topics:
                              Weight Decay
                            Re: Weight Decay
                            Re: Weight Decay
                   Where to order NIPS proceedings?
                              Harmony theory
     wish to find out more about neural nets (Weather Simulation)
                  Presenting Enidata Research Group
                        Request for Information
                              summer-schools
                       Image and Data Compression
                Markov chains and Multi-layer perceptrons
                Speech recognition information requested
                               ART and BP

Send submissions, questions, address maintenance and requests for old issues
to "neuron-request@hplabs.hp.com" or
"{any backbone,uunet}!hplabs!neuron-request". ARPANET users can get old
issues via ftp from hplpm.hpl.hp.com (15.255.16.205).

------------------------------------------------------------

Subject: Weight Decay
From:    movellan%garnet.Berkeley.EDU@violet.berkeley.edu
Date:    Mon, 23 Jan 89 20:33:34 -0800

Referring to the compilation about weight decay from John: I cannot see the
analogy between weight decay and ridge regression.

The weight solutions in a linear network (ordinary least squares) are the
solutions to

   (I'I) W = I'T

where:

   I is the input matrix (rows = # of patterns in the epoch, columns = # of
     input units in the net);
   T is the teacher matrix (rows = # of patterns in the epoch, columns = # of
     teacher units in the net);
   W is the matrix of weights (the net is linear, with only one layer!).

The weight solutions in ridge regression would be given by

   (I'I + k<1>) W = I'T

where k is a "shrinkage" constant and <1> represents the identity matrix.
Notice that k<1> has the same effect as increasing the variances of the
inputs (the diagonal of I'I) without increasing their covariances (the rest
of the I'I matrix). The net effect is to bias the W solutions while reducing
the extreme variability to which they are subject when I'I is near singular
(multicollinearity). Obviously, collinearity may be a problem in nets with a
large # of hidden units. I am presently studying how and why collinearity in
the hidden layer affects generalization, and whether ridge solutions may help
in this situation. I cannot see, though, how these ridge solutions relate to
weight decay.

-Javier

------------------------------

Subject: Re: Weight Decay
From:    kanderso@BBN.COM
Date:    Tue, 24 Jan 89 13:54:04 -0500

[[ referring to the previous note ]]

Yes, I was confused by this too. Here is what the connection seems to be.
Say we are trying to minimize an energy function E(W) of the weight vector
for our network. If we add a constraint that also attempts to minimize the
length of W, we would add a term kW'W to our energy function. Taking your
linear least squares problem, we would have

   E = (T - IW)'(T - IW) + kW'W
   dE/dW = 2(I'IW - I'T + kW)

Setting dE/dW = 0 gives [I'I + k<1>] W = I'T, i.e. ridge regression:

   W = [I'I + k<1>]^-1 I'T

The covariance matrix of the weights is proportional to [I'I + k<1>]^-1, so
increasing k:

1. Makes the matrix better conditioned (more safely invertible).
2. Reduces the covariance, so that new training data will have less effect
   on your weights.
3. Loses some resolution in weight space.

I agree that collinearity is probably very important, and I'll be glad to
discuss that off line.

k
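
[[ Editor's Note: For readers who want to see the OLS/ridge contrast
numerically, here is a minimal sketch in Python/NumPy. The variable names
follow the notation above but are mine, not the correspondents'; treat it as
an illustration under those assumptions, not as anyone's actual code. -PM ]]

   import numpy as np

   rng = np.random.default_rng(0)
   P, N, M = 20, 5, 2                    # patterns, input units, teacher units
   I = rng.normal(size=(P, N))           # input matrix
   I[:, 1] = I[:, 0] + 0.01 * rng.normal(size=P)   # two near-collinear inputs
   T = rng.normal(size=(P, M))           # teacher matrix

   # Ordinary least squares: (I'I) W = I'T -- ill-conditioned with collinearity.
   W_ols = np.linalg.solve(I.T @ I, I.T @ T)

   # Ridge: (I'I + k<1>) W = I'T -- k inflates only the diagonal of I'I.
   k = 0.1
   W_ridge = np.linalg.solve(I.T @ I + k * np.eye(N), I.T @ T)

   # The ridge weights are shrunk: smaller norm, less sensitivity to the
   # near-singular directions of I'I.
   print(np.linalg.norm(W_ols), np.linalg.norm(W_ridge))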

------------------------------

Subject: Re: Weight Decay
From:    Yann le Cun <neural!yann@hplabs.HP.COM>
Date:    Wed, 25 Jan 89 15:13:58 -0500

Consider a single-layer linear network with N inputs. When the number of
training patterns is smaller than N, the set of solutions (in weight space)
is a proper affine subspace. Adding weight decay will select the minimum-norm
solution in this subspace (if the weight decay coefficient is decreased with
time). The minimum-norm solution happens to be the solution given by the
pseudo-inverse technique (cf. Kohonen), and the solution which optimally
cancels out uncorrelated, zero-mean additive noise on the input.

- Yann Le Cun
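
[[ Editor's Note: Yann's minimum-norm point can be checked numerically. A
minimal sketch (again Python/NumPy, my own illustrative names): with fewer
patterns than inputs, the ridge solution converges to the pseudo-inverse
(minimum-norm) solution as the decay coefficient k shrinks toward zero. -PM ]]

   import numpy as np

   rng = np.random.default_rng(1)
   P, N = 3, 5                    # fewer patterns than inputs: underdetermined
   I = rng.normal(size=(P, N))
   T = rng.normal(size=(P, 1))

   # Minimum-norm exact solution via the pseudo-inverse (cf. Kohonen).
   W_pinv = np.linalg.pinv(I) @ T

   # Ridge solutions approach the pseudo-inverse solution as k -> 0.
   for k in (1.0, 1e-3, 1e-6):
       W_k = np.linalg.solve(I.T @ I + k * np.eye(N), I.T @ T)
       print(k, np.linalg.norm(W_k - W_pinv))   # distance shrinks with k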

------------------------------

Subject: Where to order NIPS proceedings?
From:    Jose A Ambros-Ingerson (Dept of ICS, UC Irvine)
         <jose%harlie.ics.uci.edu@PARIS.ICS.UCI.EDU>
Date:    Sat, 21 Jan 89 20:46:58 -0800

Would someone be so kind as to send me the address from which to order the
proceedings of NIPS 87 and 88 (the IEEE Conference on Neural Information
Processing Systems)? Thanks in advance,

Jose A. Ambros-Ingerson                     email: jambros@ics.uci.edu
Dept. of Information and Computer Science   Phone: (714) 856-7310
University of California                           (714) 856-7473
Irvine CA, 92717

------------------------------

Subject: Harmony theory
From:    andrew@berlioz.NSC.COM (Andrew Palfreyman)
Date:    Tue, 24 Jan 89 19:34:05 -0800

Having just reviewed the simple model of Smolensky's harmony theory in
"Explorations in PDP" (disk programs), I fell to musing whether hard problems
describable by a set of nonlinear coupled equations might yield to such a
parallelised approach; weather and the gravitational many-body problem come
first to mind. Note that these have been attacked by supercomputers and, in
the latter case, by a special-purpose machine called the GF-11 with gigaflop
capability. Might there not be a cheaper way, was the muse... has anybody
done any serious exploration of problem domains like these with paradigms
such as harmony?

Andrew Palfreyman, MS D3969         PHONE: 408-721-4788 work
National Semiconductor                     408-247-0145 home
2900 Semiconductor Dr.              there's many a slip
P.O. Box 58090                      'twixt cup and lip
Santa Clara, CA 95052-8090
DOMAIN: andrew@logic.sc.nsc.com
ARPA:   nsc!logic!andrew@sun.com
USENET: ...{amdahl,decwrl,hplabs,pyramid,sun}!nsc!logic!andrew

------------------------------

Subject: wish to find out more about neural nets
From:    gerry@toadwar.UCAR.EDU (gerry wiener)
Date:    Tue, 24 Jan 89 23:07:28 -0700

[[ Editor's Note: Normally, I'd rather not carry too many "I'm just
beginning" messages, but this application is intriguing since it normally
uses copious amounts of compute time anyway. I'd like to hear of any
references as well. -PM ]]

I'm interested in finding out more about neural nets. I work at the National
Center for Atmospheric Research, and we're interested in seeing if neural
network ideas can be applied to weather forecasting and prediction. Any
useful information, such as a bibliography containing references, would be
helpful. Thank you very much.

Gerry Wiener
NCAR
P.O. Box 3000
Boulder, Co. 80307-3000

------------------------------

Subject: Presenting Enidata Research Group
From:    mcvax!enidbo.bo.enidata.it!daniele%bo.enidata@uunet.UU.NET
         (Daniele Montanari)
Date:    Wed, 25 Jan 89 08:34:48 -0800

[[ Editor's Note: Part of the raison d'etre of this Digest is for researchers
to make themselves and their projects known. I welcome more of these
biographical entries so others may learn of your work. -PM ]]

This message is a presentation of the group that has been formed at Enidata
and works on neural nets, classifier systems, and in general on systems with
complex dynamics. We are four people coordinated by Roberto Serra, with
backgrounds in physics, mathematics, and electronic engineering. Complex
dynamics was the original field of interest of the early members of the
group. The first work involving neural nets concerned modified Hopfield
models. Our major interests are currently in multilayer nets with back-prop,
and in classifier systems. Machine learning, genetic algorithms, complex
dynamics, pattern recognition, parallel distributed processing, and
higher-order neural nets are some of the areas we are interested in. Both
basic research and applications interest us.

Some of our work has been organized in the form of papers and/or internal
reports, which we are happy to distribute to anyone interested:

Compiani M., Montanari D., Serra R., and Valastro G., "Neural Nets and
  Classifier Systems", in Proceedings of the First Italian Workshop on
  Parallel Architectures and Neural Networks (E. Caianiello and
  R. Tagliaferri, Eds.), World Scientific Publishers, Singapore (in press).

Serra R., Zanarini G., and Fasano F., "Attractors, learning and recognition
  in generalised Hopfield networks", in Proceedings of Cognitiva 87,
  Volume 1, p. 459 (May 1987).

Serra R., Zanarini G., and Fasano F., "Cooperative phenomena in Artificial
  Intelligence", J. Molec. Liquids, Volume 39, pp. 207-231 (1988).

Serra R., Zanarini G., and Fasano F., "Generalised Hopfield learning rules",
  in Chaos and Complexity (Livi R. et al., Eds.), World Scientific,
  Singapore (1988).

Serra R., Zanarini G., and Fasano F., "A theorem on complementary patterns
  in Hopfield-like networks", Enidata internal report SAP-2-88 SFZ (1988).

Serra R., "Dynamical systems and expert systems", in Proceedings of
  Connectionism in Perspective (R. Pfeifer, Ed.), Elsevier (in press).

Compiani M., Montanari D., Serra R., Simonini P., and Valastro G., "Dynamics
  of classifier systems", Enidata internal report (1988).

Our e-mail addresses are:

  Mario Compiani:      mc@enidbo.it.uucp
  Daniele Montanari:   daniele@enidbo.it.uucp
  Roberto Serra:       rse@enidbo.it.uucp
  Gianfranco Valastro: gv@enidbo.it.uucp

( {any backbone, mcvax!}i2unix!enidbo!<username> may also be used.)

Ciao
Daniele

------------------------------

Subject: Request for Information
From:    "Walter L. Peterson, Jr." <ucsdhub!calmasd!wlp@SDCSVAX.UCSD.EDU>
Organization: Prime-Calma, San Diego R&D, Object and Data Management Group
Date:    30 Jan 89 17:04:34 +0000

I have posted this request before, but after I did I realized that it was at
the beginning of semester break. Since many readers of this group are in
academia, I'm posting again now that everyone is back at school.

I am looking for references to recent work on learning rates in artificial
neural networks, particularly back-propagation networks. This is for
research that I am doing for my MS thesis in Computer Science. The most
recent references that I have are from the "Proceedings of the 1988
Connectionist Models Summer School" at CMU. Also, if anyone "out there" is
doing or has done work in this field, I would like to hear from you. (P.S.
Thanks for the couple of responses I did get the last time.)

Walter L. Peterson
Prime - Calma San Diego R&D (Object and Data Management Group)
email: wlp@calmasd.Prime.COM
snail mail: Calma - A Division of Prime Computer, Inc.
            9805 Scranton Rd.
            San Diego, CA 92121
...{ucbvax|decvax}!sdcsvax!calmasd!wlp

"The opinions expressed here are my own and do not necessarily reflect those
of Prime, Calma, nor anyone else."

------------------------------

Subject: summer-schools
From:    andreas herz <BY9%DHDURZ1.BITNET@CUNYVM.CUNY.EDU>
Date:    Thu, 02 Feb 89 00:55:29 +0700

Is there anybody who knows about interesting summer schools on neural
networks in the U.S.A.
or Canada this summer/fall, where the biological "roots" of the field are
treated as well as theoretical approaches? What about conferences or smaller
meetings? Thanks for helping me!

Andreas Herz, University of Heidelberg, FRG

------------------------------

Subject: Image and Data Compression
From:    Jon Ryshpan <jon@nsc.NSC.COM>
Date:    Wed, 01 Feb 89 14:02:01 -0800

I am interested in collating data on image (or other) data compression
techniques from the neural network research and development arena. Would you
kindly send your contributions (in any mode) to:

Andrew Palfreyman, MS D3969         PHONE: 408-721-4788 work
National Semiconductor                     408-247-0145 home
2900 Semiconductor Dr.              there's many a slip
P.O. Box 58090                      'twixt cup and lip
Santa Clara, CA 95052-8090
DOMAIN: andrew@logic.sc.nsc.com
ARPA:   nsc!logic!andrew@sun.com
USENET: ...{amdahl,decwrl,hplabs,pyramid,sun}!nsc!logic!andrew

------------------------------

Subject: Markov chains and Multi-layer perceptrons
From:    kruschke@cogsci.berkeley.edu (John Kruschke)
Date:    Wed, 08 Feb 89 18:40:44 -0800

[[ Re: a request for citation ]]

Herve Bourlard and C.J. Wellekens, "Links between Markov models and
multilayer perceptrons", Tech Report TR-88-008, 27 pages, $1.75.

Write to:   Librarian
            International Computer Science Institute
            1947 Center St., Suite 600
            Berkeley, CA 94704

info: info@icsi.berkeley.edu, (415) 643-9153

Hope that answers your question (of 21 January, Neuron Digest 5(8)).

--John.

------------------------------

Subject: Speech recognition information requested
From:    Christel Kemke
         <kemke%fb10vax.informatik.uni-saarland.dbp.de@RELAY.CS.NET>
Date:    09 Feb 89 10:09:00 -0100

I am interested in speech recognition using neural networks, especially in
combination with natural language processing components (using, e.g.,
syntactic information for disambiguation). I would be grateful for any
pointers to the literature and to existing systems. If possible, I will
write a small study and post it to the newsletter. Thanks in advance.

Christel Kemke
DFKI, Standort Saarbruecken
Stuhlsatzenhausweg
D-6600 Saarbruecken 11
dfn: kemke@fb10vax.informatik.uni-saarland.dbp.de

------------------------------

Subject: ART and BP
From:    tony bell <tony%ifi.unizh.ch@RELAY.CS.NET>
Date:    08 Feb 89 15:18:00 +0100

I'm sure that Andrew Palfreyman's provocative message comparing BP and ART
will attract a lot of feedback. I'd just like to add my piece about why BP
is more popular.

Looking at the two algorithms in terms of what they compute, ART is nothing
more than a clustering algorithm similar to Rumelhart & Zipser's competitive
learning (though predating it). Also, the metric for the clustering is not
Euclidean, but dependent on the order of presentation of the patterns. (See
Barbara Moore's paper in the Proc. Connectionist Summer School 1988 for
details.) But BP, using the richness of error information to optimise, can
produce arbitrary segmentations of the input space, given sufficient
computational units. Thus BP is, if nothing else, a more powerful
categoriser.

Grossberg's preoccupations with the stability of learning and the
construction of prototypes derive more from psychological than computational
requirements. The ART style of learning without a teacher is certainly very
interesting, but it has, in my view, been better analysed from a
'computational' point of view in the work of Linsker and Kohonen. The most
accessible examples of these people's work appear in IEEE Computer of March
1988.

Tony Bell, University of Zurich.
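
[[ Editor's Note: To make Tony's "clustering" characterization concrete, here
is a minimal Python/NumPy sketch of plain winner-take-all competitive
learning -- not ART itself, which is considerably more involved, and a
simplified Euclidean variant rather than Rumelhart & Zipser's normalized
dot-product form. Data and names are my own illustrations. Presenting the
patterns in a different order can yield different final weights, which is
the order-dependence Tony mentions. -PM ]]

   import numpy as np

   def competitive_learning(X, n_units=2, lr=0.1, epochs=5, seed=0):
       # Winner-take-all clustering: only the unit closest to each
       # pattern gets to learn.
       rng = np.random.default_rng(seed)
       W = rng.normal(size=(n_units, X.shape[1]))
       for _ in range(epochs):
           for x in X:
               winner = np.argmin(np.linalg.norm(W - x, axis=1))
               W[winner] += lr * (x - W[winner])   # move winner toward input
       return W

   X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
   print(competitive_learning(X))          # one presentation order
   print(competitive_learning(X[::-1]))    # reversed order: weights can differ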

------------------------------

End of Neurons Digest
*********************