neuron-request@HPLABS.HP.COM ("Neuron-Digest Moderator Peter Marvit") (07/08/89)
Neuron Digest   Friday,  7 Jul 1989   Volume 5 : Issue 29

Today's Topics:
    rochester connectionist simulator available on uunet.uu.net
    Two Problems with Back-propagation
    How to simulate Foreign Exchange Rates
    Re: How to simulate Foreign Exchange Rates
    References on learning and memory in the brain
    DARPA Neural Network Study
    ART and non-stationary environments
    Karmarkar algorithm
    NEURAL NETWORKS TODAY (a new book on the subject)
    Re: Neural-net marketplace
    RE: climatological data wanted
    Kohonen musical application of neural network
    Back Propagation Algorithm question...
    Back Propagation question... (follow up)
    Re: Back Propagation question... (follow up)
    Re: Back Propagation question... (follow up)
    Re: Back Propagation question... (follow up)
    Re: Back Propagation question... (follow up)
    Re: Back Propagation Algorithm question...
    Accelerated learning using Dahl's method
    Info on DYSTAL?
    3-Layer versus Multi-Layer
    Re: 3-Layer versus Multi-Layer
    Re: 3-Layer versus Multi-Layer
    Re: 3-Layer versus Multi-Layer

Send submissions, questions, address maintenance and requests for old issues
to "neuron-request@hplabs.hp.com" or "{any backbone,uunet}!hplabs!neuron-request"
ARPANET users can get old issues via ftp from hplpm.hpl.hp.com (15.255.16.205).

------------------------------------------------------------

Subject: rochester connectionist simulator available on uunet.uu.net
From: bukys@cs.rochester.edu (Liudvikas Bukys)
Organization: U of Rochester, CS Dept, Rochester, NY
Date: Fri, 21 Apr 89 14:39:56 +0000

A number of people have asked me whether the Rochester Connectionist
Simulator is available by uucp.  I am happy to announce that uunet.uu.net
has agreed to be a redistribution point of the simulator for their uucp
subscribers.  It is in the directory ~uucp/pub/rcs on uunet:

  -rw-r--r--  1 8  11    2829 Jan 19 10:07 README
  -rw-r--r--  1 8  11  527247 Jan 19 09:57 rcs_v4.1.doc.tar.Z
  -rw-r--r--  1 8  11    9586 Jul  8  1988 rcs_v4.1.note.01
  -rw-r--r--  1 8  11     589 Jul  7  1988 rcs_v4.1.patch.01
  -rw-r--r--  1 8  11    1455 Apr 19 19:18 rcs_v4.1.patch.02
  -rw-r--r--  1 8  11     545 Aug  8  1988 rcs_v4.1.patch.03
  -rw-r--r--  1 8  11  837215 May 19  1988 rcs_v4.1.tar.Z

These files are copies of what is available by FTP from the directory
pub/rcs on cs.rochester.edu (192.5.53.209).  We will still send you, via
U.S. Mail, a tape and manual for $150 or just a manual for $10.

If you are interested in obtaining the simulator via uucp, but you aren't a
uunet subscriber, I can't help you, because I don't know how to sign up.
Maybe a note to postmaster@uunet.uu.net would get you started.

Liudvikas Bukys
<simulator-request@cs.rochester.edu>

------------------------------

Subject: Two Problems with Back-propagation
From: Rich Sutton <rich@gte.com>
Date: Tue, 25 Apr 89 14:03:03 -0400

A recent posting referred to my paper that analyzes steepest descent
procedures such as back-propagation.  That posting requested the full
citation to the paper:

  Sutton, R.S. ``Two problems with backpropagation and other
  steepest-descent learning procedures for networks'', Proceedings of the
  Eighth Annual Conference of the Cognitive Science Society, 1986,
  pp. 823-831.

The paper is not really ``an attack'' on gradient descent, but an analysis
of its strengths and weaknesses with an eye to improving it.  The analysis
suggests several directions in which to look for improvements, but pursues
none very far.  Subsequent work by Jacobs (Neural Networks, 1988, p. 295)
and Scalettar and Zee (Complex Systems, 1987) did pursue some of the ideas,
but others remain unexplored.  Most of the discussion is still relevant
today, though I now have more doubts about simply adopting conventional
(non-steepest) descent algorithms for use in connectionist nets.

-Rich Sutton
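[[ For readers unfamiliar with the class of procedures the paper analyzes,
here is a minimal sketch of a fixed-step steepest-descent update, plus the
momentum variant studied by Jacobs.  The error surface, learning rate, and
momentum value are invented for illustration and are not taken from the
paper: ]]

    # Sketch only: fixed-step steepest descent on a toy quadratic error
    # surface, with an optional momentum term (alpha = 0 gives plain descent).
    import numpy as np

    def grad_E(w):
        # hypothetical elongated bowl: E(w) = 0.5 * (w[0]**2 + 25 * w[1]**2)
        return np.array([w[0], 25.0 * w[1]])

    w = np.array([1.0, 1.0])
    velocity = np.zeros_like(w)
    eta, alpha = 0.02, 0.9        # illustrative learning rate and momentum

    for step in range(200):
        velocity = alpha * velocity - eta * grad_E(w)
        w = w + velocity          # step against the gradient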
------------------------------

Subject: How to simulate Foreign Exchange Rates
From: harish@mist.CS.ORST.EDU (Harish Pillay)
Organization: Oregon State University, E&CE, Corvallis, Oregon 97331
Date: Wed, 26 Apr 89 05:32:51 +0000

I am taking a grad course on NN and am planning on doing a project trying to
predict foreign exchange rates, specifically the following:

  US$ vs British Pound vs Japanese Yen vs Singapore $ vs German Marks

I am using NeuralWorks and am thinking of using the backprop strategy.  So
far, all I've done is to gather the exchange rates reported in the WSJ from
March 17 to today.  I've normalized them to lie between 0 and 1, but my
problem is in trying to train the network.  Has anyone out there done
anything similar to this?  If so, what desired output values did you use to
train?  I understand that it is naive to just take the rates themselves and
try to get a pattern or correlation.  Should I be looking at other values
too?  What kind of transfer function should I use?  I think one hidden layer
may be sufficient.

I would really appreciate any suggestions, and will post something once I
get this project done.  Thanks.

Harish Pillay                          Internet: harish@ece.orst.edu
Electrical and Computer Engineering    MaBell: 503-758-1389 (home)
Oregon State University                        503-754-2554 (office)
Corvallis, OR 97331

------------------------------

Subject: Re: How to simulate Foreign Exchange Rates
From: andrew@berlioz (Andrew Palfreyman)
Organization: National Semiconductor, Santa Clara
Date: Wed, 26 Apr 89 09:06:30 +0000

One brute-force method, to separate the chicken from the egg, might be to
use the changes instead of the absolute values (especially since you're
using localised data which doesn't span a boom or a crash).  Maybe then you
could use 3 inputs in parallel (3 currencies) and 2 outputs, and just ring
the changes (5c3 = 10 ways) until the input deltas produce correct output
deltas.  An associative net might do this better.  Else, you could play with
recurrent nets (Jordan, etc.), whereby you try to predict tomorrow's
5-vector, given today's.

Andrew Palfreyman               USENET: ...{this biomass}!nsc!logic!andrew
National Semiconductor M/S D3969, 2900 Semiconductor Dr.,
PO Box 58090, Santa Clara, CA 95052-8090 ; 408-721-4788
                there's many a slip 'twixt cup and lip

[[ Editor's Note: This problem points out the more general one of
input/output representation; this is still a hot topic in AI circles and
even in many traditional computing fields.  The representation often
determines the architecture and outcome.  Strict and useful guidelines for
ANNs don't yet exist.  Modeling the changes, rather than the values, seems
like a neat solution amenable to many problems, however. -PM ]]
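[[ As a concrete illustration of the suggestion to model changes rather than
levels, here is a minimal sketch of one way to turn a table of daily rates
into normalized day-to-day deltas for training; the numbers and scaling
choices are invented, not taken from either posting: ]]

    # Sketch only: convert daily exchange rates into normalized day-over-day
    # changes, removing the slowly drifting absolute level.
    import numpy as np

    # Hypothetical rates against US$: columns GBP, JPY, SGD, DEM (made up).
    rates = np.array([[0.61, 132.4, 1.96, 1.87],
                      [0.62, 131.9, 1.95, 1.88],
                      [0.60, 133.1, 1.97, 1.86],
                      [0.61, 132.6, 1.96, 1.87]])

    deltas = np.diff(rates, axis=0) / rates[:-1]     # relative change per day
    lo, hi = deltas.min(axis=0), deltas.max(axis=0)
    scaled = (deltas - lo) / (hi - lo + 1e-12)       # squash into [0, 1]

    # One possible training pairing: today's scaled deltas in, tomorrow's out.
    inputs, targets = scaled[:-1], scaled[1:]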
------------------------------

Subject: References on learning and memory in the brain
From: honavar@goat.cs.wisc.edu (Vasant Honavar)
Organization: U of Wisconsin CS Dept
Date: Wed, 26 Apr 89 17:20:57 +0000

I am looking for papers (good reviews in particular) on learning and memory
mechanisms in the brain from the perspectives of neuroscience and
psychology.  Please e-mail me the lists of papers that you know of and I
will compile a bibliography and make it available to anyone that is
interested.  Thanks.

Vasant Honavar
Computer Sciences Dept.
University of Wisconsin-Madison
honavar@cs.wisc.edu

[[ Editor's Note: Hmmm, there are more papers and books on this subject than
I can remember.  However, one of the best recent (survey) books, complete
with a reasonable bibliography to get you started, is "Memory and Brain" by
Larry R. Squire (1987, Oxford University Press). -PM ]]

------------------------------

Subject: DARPA Neural Network Study
From: djlinse@phoenix.Princeton.EDU (Dennis Linse)
Organization: Princeton University, Princeton, NJ
Date: Thu, 27 Apr 89 01:57:43 +0000

I recently saw an advertisement for the complete report of the October 1987
to February 1988 DARPA study on U.S. national perspectives on neural
networks.  Has anyone seen/read this report?  Is it useful for a researcher,
or is it written more from the funding agency perspective?  Any information
would be useful.

And before I get inundated with requests, the publication information is:

  $49.95 casebound / over 600 pages
  Shipping/handling: $5.00 for the first copy, $1.50 for each additional
  copy shipped to a U.S. or Canada address; $10 per copy to all other
  addresses.

  AFCEA International Press
  4400 Fair Lakes Court, Dept. S1
  Fairfax VA 22033-3899
  (703) 631-6190
  (800) 336-4583 ext. 6190

Dennis (djlinse@phoenix.princeton.edu)

Found at the top of a looonnng homework assignment:
  "Activity is the only road to knowledge"  G.B. Shaw

------------------------------

Subject: ART and non-stationary environments
From: adverb@bucsb.UUCP (Josh Krieger)
Organization: Boston Univ Comp. Sci.
Date: Thu, 27 Apr 89 20:40:50 +0000

I think it's important to say one last thing about ART: ART is primarily
useful in a statistically non-stationary environment because its learned
categories will not erode with the changing input.  If your input
environment is stationary, then there may be little reason to use the
complex machinery behind ART; your vanilla backprop net will work just fine.

-- Josh Krieger

------------------------------

Subject: Karmarkar algorithm
From: andrew@berlioz (Lord Snooty @ The Giant Poisoned Electric Head)
Organization: National Semiconductor, Santa Clara
Date: Sat, 29 Apr 89 00:49:04 +0000

Does anybody have comparative data on the Karmarkar algorithm in respect of
neural-net implementations?  The algorithm is apparently quite efficient at
optimising constraints in large parameter spaces, an area where comparative
data on the neural approach would be very interesting.  In particular, does
anybody know of attempts to somehow encode this algorithm in a
neural/parallel fashion, or indeed if this is possible?

Finally, could anybody recommend a reference work on Karmarkar?  (For
novices, please!) - thanks in advance.

Andrew Palfreyman               USENET: ...{this biomass}!nsc!logic!andrew
National Semiconductor M/S D3969, 2900 Semiconductor Dr.,
PO Box 58090, Santa Clara, CA 95052-8090 ; 408-721-4788
                there's many a slip 'twixt cup and lip

------------------------------

Subject: NEURAL NETWORKS TODAY (a new book on the subject)
From: mmm@cup.portal.com (Mark Robert Thorson)
Organization: The Portal System (TM)
Date: Thu, 04 May 89 22:36:01 +0000

[[ Editor's Note: Caution!  Advertisement here.  Caveat emptor, especially
vis-a-vis the slight hyperbole below.  However, possibly quite useful to the
small audience who needs it.  I also saw a service advertised at IJCNN which
would mail you quarterly updates on what patents were filed in this field.
Sorry, I didn't pick up the literature. -PM ]]
I have just finished a book titled NEURAL NETWORKS TODAY, which is available
for $35 (plus $5 for postage and handling, and 7% sales tax if you live in
California).  It's 370 pages, not counting the title page, table of
contents, and the separators between chapters.  Velo-bound with soft vinyl
covers.

Its contents include descriptions of 14 hardware implementations of neural
networks, by Leon Cooper, John Hopfield, David Tank, Dick Lyon, and others.
These implementations come from Nestor Associates, AT&T, Synaptics, and
others.  The source material is the U.S. patents currently in force in the
field.  About 250 pages are copies of these patents; about 100 pages are my
commentary on these patents.  (Patents are somewhat difficult to read; my
commentary makes everything clear.)

This book should be of primary interest to researchers doing patentable work
in the field of neural networks.  It's like getting a patent search for a
fraction of the usual price.

This book should also be of interest to people new to the field of neural
networks.  My commentary is at an introductory level, while the source
material is at a very detailed and technical level.  My commentary can help
someone acquire expert knowledge in a short period of time.

I will accept checks and corporate or university purchase orders.

Mark Thorson
12991 B Pierce Rd.
Saratoga, CA 95070

------------------------------

Subject: Re: Neural-net marketplace
From: demers@beowulf.ucsd.edu (David E Demers)
Organization: EE/CS Dept. U.C. San Diego
Date: Sun, 14 May 89 20:51:52 +0000

In article <159@spectra.COM> eghbalni@spectra.COM (Hamid Eghbalnia) writes:
> This is purely a curiosity question.  How are the "NN-type"
> companies doing?  I suppose the underlying question is: Has
> anybody been able to use the technology to develop applications
> that have excited government or industry beyond just research?

I just read that SAIC received a $100 million contract to provide airports
with bomb-sensing luggage-scanning devices based on neural nets.  I believe
the order was from the FAA.  I don't have the article handy; it might have
been in EETimes - definitely a trade publication - within the past week or
so.

SAIC, of course, is not primarily a NN company, but $100MM is a big chunk of
business no matter who you are.

Dave

------------------------------

Subject: RE: climatological data wanted
From: Albert Boulanger <bbn.com!aboulang@BBN.COM>
Date: 20 May 89 19:54:50 +0000

maurer@nova2.Stanford.EDU (Michael Maurer) writes:

  Does anybody out there know of electronic sources for climatological data
  compiled by the US weather service?  The National Environmental Satellite
  Data and Information Service publishes thick books full of daily weather
  records from weather stations around the country, but I would prefer the
  data in machine-readable format.  I am doing a research project for an
  Adaptive Systems class and am interested in short-term weather prediction
  using a system that learns from past weather records.  Please e-mail any
  info you might have.

How do you plan to do short-term prediction from data that is long term (and
hence below the sampling resolution you want)?
You can get computer media of this from:

  National Climate Center, Environmental Data Information Service
  NOAA, Dept of Commerce
  Federal Building
  Asheville NC 28801
  (704) 258-2850

I got the initial pointer from a friend and the complete address from the
book:

  Information USA
  Matthew Lesko
  Viking Press 1983
  ISBN 0-670-39823-3 (hardcover)
  ISBN 0 14 046.564 2 (paperback)

For shorter term records (~several days), there are a couple of dozen
services that provide such information (WSI in MA being one).

Albert Boulanger
BBN Systems & Technologies Corp
aboulanger@bbn.com

------------------------------

Subject: Kohonen musical application of neural network
From: viseli@uceng.UC.EDU (victor l iseli)
Organization: Univ. of Cincinnati, College of Engg.
Date: Tue, 23 May 89 22:57:44 +0000

[[ Editor's Note: This subject was discussed at some length in previous
issues of Neuron Digest.  At IJCNN, Kohonen gave a "paper" describing some
of his latest experiments using a variation of an associative memory to
analyze pieces of music (by one composer or period) and remember sequences,
then generate "new" music by recursively recalling those sequences of notes
with some small variation.  The paper can be found in the Proceedings of
IJCNN 89.  My judgement?  As a musician, I found the music tedious and
lacking in substance; Mozart's musical dice do better!  The music we heard
owed more to its production (echoing organ-like synthesizer tones in various
registers and timbres) than to the notes.  As an engineer: a good start,
without much musical foundation.  I certainly hope that the scheduled
two-hour concert at Winter IJCNN will be more than an indulgence in a famous
man's toy. -PM ]]

I am looking for information regarding Kohonen's demonstration of a neural
network at the INNS conference last September (1988).  He apparently trained
a neural net to compose music or harmonize to a melody in the manner of
famous classical composers (??).  I am also interested in any other
information/references regarding the recent discussion on neural nets in
music.  Please send email.

Victor Iseli (viseli@uceng.uc.edu)
University of Cincinnati
Dept. Electrical Engr.
811K Rhodes Hall
Cincinnati, OH 45219

------------------------------

Subject: Back Propagation Algorithm question...
From: camargo@cs.columbia.edu (Francisco Camargo)
Organization: Columbia University Department of Computer Science
Date: Mon, 29 May 89 23:26:49 +0000

Can anyone shed some light on the following issue: how should one compute
the weight adjustments in BackProp?

From reading PDP, one gathers the impression that the DELTAS should be
accumulated over all INPUT PATTERNS, and only then is a STEP taken down the
gradient.  Robbins and Monro suggest a stochastic approximation algorithm
with proven convergence if one takes a step at each pattern presentation but
damps its effect by a factor 1/k, where "k" is the presentation number.
Other people (judging from code that I've seen flying around) seem to take a
STEP at each presentation and don't take any damping factors into account.
I've tried both approaches myself and they both seem to work.

So which is the correct way of adjusting the weights?  Accumulate the errors
over all patterns, or work towards the minimum as new patterns are
presented?  What are the implications?  Any light on this issue is extremely
appreciated.

Francisco A. Camargo
CS Department - Columbia University
camargo@cs.columbia.edu

PS: A few weeks ago, I requested some pointers to Learning Algorithms in NN
and promised a summary of the replies.  It is coming.  I have not forgotten
my responsibilities with this group.  Even though I got more requests than
really new info, I'll have a summary posted shortly.  And thanks to all who
contributed.
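[[ For concreteness, a minimal sketch (not from PDP or from any poster's
code) of the two update schemes being asked about, written for a single
linear unit so the bookkeeping stays visible; the data and learning rate are
invented: ]]

    # Sketch only: "batch" vs. "online" delta-rule updates for a linear unit.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))          # made-up input patterns
    t = X @ np.array([0.5, -1.0, 2.0])    # made-up targets
    eta = 0.01

    def batch_epoch(w):
        grad = np.zeros_like(w)
        for x, target in zip(X, t):       # accumulate deltas over ALL patterns
            grad += (target - x @ w) * x
        return w + eta * grad             # ...then take a single step

    def online_epoch(w):
        for x, target in zip(X, t):       # take a step after EACH pattern
            w = w + eta * (target - x @ w) * x
        return w

    w_batch, w_online = np.zeros(3), np.zeros(3)
    for epoch in range(200):
        w_batch, w_online = batch_epoch(w_batch), online_epoch(w_online)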
------------------------------

Subject: Back Propagation question... (follow up)
From: camargo@cs.columbia.edu (Francisco Camargo)
Organization: Columbia University Department of Computer Science
Date: Tue, 30 May 89 14:18:30 +0000

Hi there,

I'm re-posting my previous message together with a reply that I received
from Tony Plate and my reply to him.  I'd really appreciate comments on this
issue.  Thanks to all.

| There are two standard methods of doing the updates, sometimes called
| "batch" and "online" learning.
|
| In "batch" learning, all the changes are accumulated for one pass through
| all the examples.  At the end of the pass (or "epoch") the update is made.
| Thus, each link requires an extra storage field in which to accumulate
| the changes.
|
| In "online" learning, the change is made after seeing each example.
|
| Some people claim online is better, others claim batch is better.
|
| "damping" (you mean "weighting") each change by 1/k, where k is the number
| of the example (?) sounds really weird; do you mean that if you had four
| examples in your training set, changes from the fourth would be worth only
| a quarter as much as changes from the first?  Surely you don't mean this!
|
| Some people use a momentum term, and some change the learning rate during
| learning.  Using momentum seems to be generally a good thing, and it's
| easy to do.  Automatically changing the learning rate is much harder.
|
| .....
| ..... Connectionist Learning Algorithms by Hinton....
| .....
|
| tony plate

Hi Tony,

Sorry for my previous message being so unspecific.  What I meant is that the
damping occurs after each "epoch."  The idea is that the changes in the
weights tend to be of lesser and lesser importance.  Actually, the way the
algorithm is stated, one should damp (I really do mean damp) the step size
by a sequence of terms {a_k} with sum({a_k}^2) < infinity and
sum({a_k}) = infinity.  In any case, using {a_k} = 1/k for k = "epoch
number" satisfies both conditions.

My problem is that I can't find any (theoretical) justification for the
"online" method other than the Robbins-Monro algorithm (I don't have my
references nearby).  But then, the damping factor is required for guaranteed
convergence.  I tried the "online" method and it does seem to perform
better.  But WHY does it work?  How come it converges so well (despite
making {a_k} = 1)?

I am familiar with the use of "momentum" in the learning process, but I
really want to understand better the theoretical reasons for the "online"
method.  Having started my studies with the "batch" mode, it seems a little
like black magic that the "online" method works.

I have the paper by Hinton, "Connectionist Learning Procedures",
CMU-CS-87-115.  Is this the paper you referred to?  Any other improvements
to this work?

I appreciate your time and effort.  Thanks,

/Kiko.
camargo@cs.columbia.edu
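[[ A minimal sketch of the damped online update being described: the
per-pattern step size follows a_k = a_0/k, which satisfies the Robbins-Monro
conditions mentioned above (the sum of the a_k diverges while the sum of
their squares is finite); the data and constants are invented: ]]

    # Sketch only: online delta-rule updates with a decaying step size a_k.
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 4))                 # made-up patterns
    t = X @ np.array([1.0, -0.5, 0.0, 2.0])      # made-up targets
    w, a0 = np.zeros(4), 0.1

    for k in range(1, 51):                        # k counts epochs
        a_k = a0 / k                              # damping factor
        for x, target in zip(X, t):
            w += a_k * (target - x @ w) * x       # step after each presentation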
------------------------------

Subject: Re: Back Propagation question... (follow up)
From: demers@beowulf.ucsd.edu (David E Demers)
Organization: EE/CS Dept. U.C. San Diego
Date: Tue, 30 May 89 19:23:47 +0000

[Tony replied]
| In "batch" learning, all the changes are accumulated for one pass through
| all the examples.  At the end of the pass (or "epoch") the update is made.
| Some people use a momentum term, and some change the learning rate during
| learning.  Using momentum seems to be generally a good thing, and it's
| easy to do.  Automatically changing the learning rate is much harder.

[No it's not...]

>------------------------------------------------------------------------------
[Francisco tries to explain what he means by "damping", and the
Robbins-Monro algorithm...]

[[ Editor's Note: Most of the quotations deleted from above. -PM ]]

Sorry to quote so much of the prior postings, but I thought it worth it to
retain context.  I am not sure that I fully understand Francisco's question.
But I'll answer it anyway :-)

Essentially, what backpropagation is trying to do is to achieve a minimum
mean squared error by following the gradient of the error as a function of
the weights.  The "batch" method works well because you get a good picture
of the true gradient after seeing all of the input-output pairs.  However,
as long as corrections are made which go "downhill", we will converge
(possibly to a local rather than global minimum).  Making weight changes
after presentation of each training example will not necessarily follow the
gradient, but with a small learning rate, in the aggregate we will still be
moving downhill (reducing MSE).

Dave

------------------------------

Subject: Re: Back Propagation question... (follow up)
From: dhw@itivax.iti.org (David H. West)
Organization: Industrial Technology Institute
Date: Tue, 30 May 89 20:09:55 +0000

] My problem is that I can't find any (theoretical) justification for the
] "online" method other than the Robbins-Monro algorithm.
] But WHY does it work?  How come it converges so well (despite making
] {a_k} = 1)?

It's related to an old statistical hack for calculating the change in the
mean of a set of observations when another is added.  That formula takes 2
or 3 lines of algebra to derive, on a bad day.

-David   dhw@itivax.iti.org
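[[ The "old statistical hack" alluded to is presumably the incremental
update of a running mean, m_k = m_{k-1} + (x_k - m_{k-1})/k, which is
exactly an online update with a 1/k step size; a tiny check with invented
numbers: ]]

    # Sketch only: the running-mean recurrence is an "online" update with
    # step size 1/k, and it reproduces the "batch" mean exactly.
    import numpy as np

    x = np.array([3.0, 7.0, 1.0, 4.0, 10.0])    # made-up observations
    m = 0.0
    for k, x_k in enumerate(x, start=1):
        m += (x_k - m) / k                      # online update, step 1/k

    assert np.isclose(m, x.mean())              # matches the batch mean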
------------------------------

Subject: Re: Back Propagation question... (follow up)
From: mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel)
Organization: Princeton University, NJ
Date: Tue, 30 May 89 20:28:32 +0000

>But WHY does it work?  How come it converges so well (despite making
>{a_k} = 1)?
>
>I am familiar with the use of "momentum" in the learning process, but I
>really want to understand better the theoretical reasons for the "online"
>method.  Having started my studies with the "batch" mode, it seems a little
>like black magic that the "online" method works.

I have an intuitive explanation, but it's not rigorous by any means, and it
could even be completely wrong, but here goes...

In most problems, there is some underlying regularity that _all_ examples
possess that you're trying to learn.  Thus, if you update the weights after
each example, you get the benefit of learning from the previous examples,
but if you only update after a whole run through the training set, it takes
much longer to learn this regularity.

In my experiments, I've found that "online" learning works much better at
the beginning, when the network is completely untrained, presumably because
it's learning the general features of the whole set quickly.  Later on, when
trying to learn the fine distinctions among examples, "online" learning does
worse, because it tries to "memorize" each example in turn instead of
learning the whole mapping.  In this regime, you have to use batch learning.

For many problems, though, you never need this level of accuracy (I needed
continuous-valued outputs accurate to <1%), so "online" learning is good
enough, and often significantly faster, especially with momentum.  Momentum
smooths out the weight changes from a few recent examples.  (Actually, for
my stuff, I like conjugate gradient on the whole "batch" error surface.)

Matt Kennel
mbkennel@phoenix.princeton.edu  (6 more days only!!!)
kennel@cognet.ucla.edu          (after that)

------------------------------

Subject: Re: Back Propagation question... (follow up)
From: artzi@cpsvax.cps.msu.edu (Ytshak Artzi - CPS)
Organization: Michigan State University, Computer Science Department
Date: Tue, 30 May 89 23:51:56 +0000

As a general comment, you must be careful in choosing the particular
instance of the problem you try to solve.  If the initial state is close to
the correct solution, then both methods will work.  For any problem there
exists an instance for which convergence is not guaranteed for either
method.  Unfortunately, there is no good method available to detect such an
instance, given an arbitrary problem.

Now consider the following equation (n is the learning rate):

    DELTA_p w_ji  =  n (t_pj - O_pj) i_pi  =  n d_pj i_pi

This rule changes the weights following presentation of I/O pair p, where

    t_pj          is the target for the j-th component of output pattern p,
    O_pj          is the j-th element of the actual output pattern produced
                  by input p,
    i_pi          is the i-th input element,
    d_pj          = t_pj - O_pj, and
    DELTA_p w_ji  is the change to be made to the weight from the i-th to
                  the j-th unit after presentation of input p.

Hope it helps...

   Itzik.
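[[ For readers who prefer code to subscripts, a direct transcription of that
per-pattern rule for a single layer of linear output units; the shapes and
numbers are invented, and this is the plain delta rule, not full backprop: ]]

    # Sketch only: DELTA_p w_ji = eta * (t_pj - O_pj) * i_pi  in array form.
    import numpy as np

    eta = 0.1
    W = np.zeros((2, 3))                  # w_ji: weight from input i to unit j

    i_p = np.array([0.5, -1.0, 2.0])      # made-up input pattern p
    t_p = np.array([1.0, 0.0])            # made-up target pattern p

    O_p = W @ i_p                         # actual output for pattern p
    d_p = t_p - O_p                       # d_pj = t_pj - O_pj
    W += eta * np.outer(d_p, i_p)         # the weight change DELTA_p w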
------------------------------

Subject: Re: Back Propagation Algorithm question...
From: heumann@hpmtlx.HP.COM ($John_Heumann@hpmtljh)
Organization: HP Manufacturing Test Division - Loveland, CO
Date: Wed, 31 May 89 14:42:03 +0000

A few comments.

1) Note that if backprop is modified by the addition of a search (rather
than a fixed step size) in the minimization direction, it's simply a form of
gradient descent.  In light of this,

2) If you want to accumulate the gradient accurately over the entire
training set, you're forced to either a) accumulate over all samples before
altering any weights, or b) take a tiny (actually infinitesimal) step after
each sample is presented.  Doing a full step after each sample presentation
destroys the descent property of the algorithm.

3) If you're using backprop as originally presented (i.e. either fixed step
size or with a momentum term), I don't believe there is any general way to
establish that one method is superior to the other for all problems.  I've
seen search spaces on which one wanders rather aimlessly and the other
converges rapidly; the trouble is that which is good and which is bad
depends on the particular problem!  It's certainly true that doing something
which causes you to deviate from a true descent path (like adding a momentum
term or taking a step after each sample) can help you escape local minima in
selected cases and can lead to more rapid convergence on some problems.
Unfortunately, it can also lead to aimless wandering and poor performance on
others.

4) If you can find the full reference, I'd be interested in seeing the
Robbins-Monro paper, since I'm unaware of any backprop-like method with
proven convergence for non-convex search spaces.

5) Personally, my choice for optimizing NNs is to modify backprop to be a
true gradient descent method and then use either the Fletcher-Reeves or
Polak-Ribiere method for accelerated convergence.  Doing so means you WILL
have trouble with local minima if they're present in your search space, but
it avoids all the tweaky parameters in the original backpropagation
algorithm.  (Since there's no one set of parameters that appears applicable
across a wide range of problems, you can waste a huge amount of time trying
to tweak the learning rate or the size of the momentum term; to my mind this
is simply not practical for large problems.)

Hope this is of some help.

ps: Note that Rumelhart et al. are purposefully rather vague on whether the
weight adjustment is to be done after each sample presentation.  If you
carefully compare the chapter on backprop in PDP with that in their Nature
paper, you'll find that each paper uses a different tactic!

------------------------------

Subject: Accelerated learning using Dahl's method
From: csstddm@cc.brunel.ac.uk (David Martland)
Organization: Brunel University, Uxbridge, UK
Date: Tue, 06 Jun 89 16:34:28 +0000

Has anyone out there tried to implement the accelerated learning method
described by Dahl in ICNN87, vol. II, pp. 523-530?  It appears to work by
parabolic interpolation, but is not very clearly described.

Alternatively, does anyone have an email address for Dahl?

Thanks,
dave martland

------------------------------

Subject: Info on DYSTAL?
From: "Pierce T. Wetter" <wetter@CSVAX.CALTECH.EDU>
Organization: California Institute of Technology
Date: 14 Jun 89 05:34:24 +0000

In the new Scientific American, D.L. Alkon describes some work he has done
on biological learning and describes a program called DYSTAL which uses this
work to train artificial NNs.  Unfortunately, he doesn't describe the
algorithm.  Does anyone have any info on DYSTAL or its training method so
that I can include it in my NN software?

Pierce
wetter@csvax.caltech.edu | wetter@tybalt.caltech.edu | pwetter@caltech.bitnet

------------------------------

Subject: 3-Layer versus Multi-Layer
From: Jochen Ruhland <mcvax!unido!cosmo!jochenru%cosmo.UUCP@uunet.uu.net>
Organization: CosmoNet, D-3000 Hannover 1, FRG
Date: 20 Jun 89 01:19:09 +0000

During a local meeting here in Germany I heard somebody talking about a
theorem that a three-layer perceptron is capable of performing any given
In/Out function with a finite number of hidden units in the network.  I
forgot to ask where to look for the proof - so I'm asking here.

Response may be in German or English.  Thanks in advance,

Jochen

------------------------------

Subject: Re: 3-Layer versus Multi-Layer
From: merrill@bucasb.bu.edu (John Merrill)
Organization: Boston University Center for Adaptive Systems
Date: Tue, 27 Jun 89 18:56:01 +0000

One reference to such a result is

  Funahashi, K. (1989). "On the Approximate Realization of Continuous
  Mappings by Neural Networks", Neural Networks, 2, 183-192.

There are actually several different theorems which prove the same thing,
but Funahashi's is the first that I know of which does it with standard
sigmoid semi-linear nodes.

John Merrill                 | ARPA:  merrill@bucasb.bu.edu
Center for Adaptive Systems  |
111 Cummington Street        |
Boston, Mass. 02215          | Phone: (617) 353-5765
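[[ The flavor of these approximation results can be seen in a small
numerical experiment: a single hidden layer of sigmoid units can drive the
error on a smooth one-dimensional target down as the number of hidden units
grows.  This sketch only illustrates the statement of the theorem, not
Funahashi's proof; the target function, the random hidden weights, and the
least-squares fit of the output weights are all invented for the
demonstration: ]]

    # Sketch only: approximate a continuous function on [0, 1] with a sum of
    # sigmoid hidden units; hidden weights random, output weights by least
    # squares.  Illustrative, not a reconstruction of any published proof.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 200)
    target = np.sin(2 * np.pi * x) + 0.5 * x      # some smooth target mapping

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for n_hidden in (2, 5, 20, 50):
        a = rng.normal(scale=10.0, size=n_hidden)         # random input weights
        b = rng.uniform(-10.0, 10.0, size=n_hidden)       # random biases
        H = sigmoid(np.outer(x, a) + b)                   # hidden activations
        c, *_ = np.linalg.lstsq(H, target, rcond=None)    # output weights
        err = np.max(np.abs(H @ c - target))
        print(n_hidden, "hidden units -> max error %.3f" % err)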
------------------------------

Subject: Re: 3-Layer versus Multi-Layer
From: demers@beowulf.ucsd.edu (David E Demers)
Organization: EE/CS Dept. U.C. San Diego
Date: Wed, 28 Jun 89 18:57:45 +0000

For "perceptrons", there is no such proof, since multilayer linear units can
easily be collapsed into two layers.  See, e.g., Minsky & Papert,
"Perceptrons" (1969).

If, however, units can take on non-linear activations, then it can be shown
that a three-layer network can approximate any Borel-measurable function to
any desired degree of accuracy (though the number of units required may be
exponential!).  Hal White et al. have shown this, and have also shown that
the mapping is learnable.  This paper is going to appear this year in the
journal of the INNS, Neural Networks.

The source of this is frequently listed as the Kolmogorov superposition
theorem.  Robert Hecht-Nielsen has a paper about this theorem in the 1987
Proceedings of the First IEEE Conference on Neural Networks.  The theorem is
not constructive, however.  It shows that a function from R^m to R^n can be
represented by the superposition of {some number linear in m & n} bounded,
monotonic, non-linear functions of the m inputs.  However, there is no way
of determining these functions...

I am writing all of this from memory, all of my papers are elsewhere right
now... but I know that others have similar results.

Dave

------------------------------

Subject: Re: 3-Layer versus Multi-Layer
From: Matthew Kennel <mara!mickey.cognet.ucla.edu!kennel@LANAI.CS.UCLA.EDU>
Organization: none
Date: 03 Jul 89 18:06:42 +0000

I recently saw a preprint by some EE professors at Princeton who made a
constructive proof using something called the "inverse Radon transform", or
something like that.

What I think the subject needs is work on characterizing the "complexity" of
continuous mappings with respect to neural networks---i.e., how many hidden
units (free coefficients) are needed to reproduce some mapping with a
certain accuracy?  Obviously, this depends crucially on the functional basis
and architecture of the network---we might thus be able to evaluate various
network types on their power and efficiency in a practical way, and not just
formally (i.e., given infinitely many hidden neurons).

My undergrad thesis adviser, Eric Baum, has been working on this type of
problem, but for binary-valued networks, i.e. networks that classify the
input space into arbitrary categories.  The theory is quite mathematical---
as a "gut feeling" I suspect that for continuous-valued networks, only
approximate results would be possible.

Matt Kennel
kennel@cognet.ucla.edu

------------------------------

End of Neurons Digest
*********************