jochenru@cosmo.UUCP (Jochen Ruhland) (06/20/89)
During a local meeting here in Germany I heard somebody mention a
theorem that a three-layer perceptron can compute any given
input/output function with no more than some maximum number of
hidden units in the network. I forgot to ask where to look for the
proof, so I'm asking here. Responses may be in German or English.

Thanks in advance,
Jochen
merrill@bucasb.bu.edu (John Merrill) (06/28/89)
One reference to such a result is

Funahashi, K. (1989). "On the Approximate Realization of Continuous
Mappings by Neural Networks", {\bf Neural Networks} (2) 183-192.

There are actually several different theorems which prove the same
thing, but Funahashi's is the first I know of which does it with
standard sigmoid semi-linear nodes.

--
John Merrill                 | ARPA: merrill@bucasb.bu.edu
Center for Adaptive Systems  |
111 Cummington Street        |
Boston, Mass. 02215          | Phone: (617) 353-5765
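For concreteness, the class of networks in question computes finite
sums of the form

    g(x) = \sum_{i=1}^{N} c_i \phi(a_i \cdot x + b_i)

where \phi is a fixed sigmoid (bounded, continuous, monotone
increasing, non-constant). Stated from the usual formulation rather
than quoted from the paper, the theorem says such sums are dense in
the continuous functions on any compact subset of R^n under the sup
norm: for any continuous f and any \epsilon > 0 there exist N and
weights a_i, b_i, c_i with \sup_x |g(x) - f(x)| < \epsilon.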
demers@beowulf.ucsd.edu (David E Demers) (06/29/89)
In article <3417@cosmo.UUCP> jochenru@cosmo.UUCP (Jochen Ruhland) writes:
>During a local meeting here in Germany I heard somebody mention a
>theorem that a three-layer perceptron can compute any given
>input/output function with no more than some maximum number of
>hidden units in the network.

For "perceptrons" there is no such proof, since multiple layers of
linear units can easily be collapsed into two layers. See, e.g.,
Minsky & Papert, "Perceptrons" (1969).

If, however, units can take on non-linear activations, then it can be
shown that a three-layer network can approximate any Borel-measurable
function to any desired degree of accuracy (though the number of units
required may grow exponentially!). Hal White et al. have shown this,
and have also shown that the mapping is learnable. This paper is going
to appear this year in the journal of the INNS, Neural Networks.

This line of results is frequently traced to the Kolmogorov
superposition theorem. Robert Hecht-Nielsen has a paper about this
theorem in the 1987 Proceedings of the First IEEE Conference on Neural
Networks. The theorem is not constructive, however: it shows that a
function from R^m to R^n can be represented by the superposition of
{some number linear in m & n} bounded, monotonic, non-linear functions
of the m inputs, but it gives no way of determining these functions...

I am writing all of this from memory, all of my papers are elsewhere
right now... but I know that others have similar results.

Dave
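A concrete illustration of the three-layer picture, as a minimal
modern Python/numpy sketch. The target function, domain, unit count,
and weight ranges are all arbitrary illustrative choices, and fitting
only the output weights by least squares over random hidden units is a
convenience for the demo, not White's or Funahashi's construction:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Arbitrary target mapping on the compact interval [0, 1].
    f = lambda x: np.sin(2.0 * np.pi * x)
    x = np.linspace(0.0, 1.0, 200)

    # One hidden layer of sigmoid units with fixed random weights;
    # only the output weights c are fit, by linear least squares.
    n_hidden = 20
    a = rng.normal(scale=10.0, size=n_hidden)    # hidden weights
    b = rng.uniform(-10.0, 10.0, size=n_hidden)  # hidden biases
    H = sigmoid(np.outer(x, a) + b)              # shape (200, n_hidden)
    c, *_ = np.linalg.lstsq(H, f(x), rcond=None)

    print("max |net - f| on samples:", np.abs(H @ c - f(x)).max())

Fitting only the output layer is weaker than training all the weights
(as back-propagation would), but it is enough to watch the
approximation improve as hidden units are added.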
ychen@metasoft.UUCP (Yongdeng Chen) (07/02/89)
In article <6730@sdcsvax.UCSD.Edu>, demers@beowulf.ucsd.edu (David E Demers) writes:
> [...]
> This paper is going to appear this year in the
> journal of the INNS, Neural Networks.
> [...]

Could someone give me some info so I can subscribe to that journal?
Thanks for your response!
__________________________________________________________________
Yongdeng Chen             Meta Software Corporation
ychen%metasoft@bbn.com    150 Cambridge Park Drive
(617)576-6920             Cambridge, MA 02140
kennel@mickey.cognet.ucla.edu (Matthew Kennel) (07/04/89)
In article <MERRILL.89Jun27145601@bucasb.bu.edu> merrill@bucasb.bu.edu (John Merrill) writes:
>One reference to such a result is
>
>Funahashi, K. (1989). "On the Approximate Realization of Continuous
>Mappings by Neural Networks", {\bf Neural Networks} (2) 183-192.
>
>There are actually several different theorems which prove the same
>thing, but Funahashi's is the first I know of which does it with
>standard sigmoid semi-linear nodes.

I recently saw a preprint by some EE professors at Princeton who gave
a constructive proof using something called the "inverse Radon
transform", or something like that.

What I think the subject needs is work on characterizing the
"complexity" of continuous mappings with respect to neural networks,
i.e. how many hidden units (free coefficients) are needed to reproduce
some mapping with a certain accuracy. Obviously this depends crucially
on the functional basis and architecture of the network, so we might
be able to evaluate various network types on their power and
efficiency in a practical way, and not just formally (i.e., given
infinitely many hidden units).

My undergrad thesis adviser, Eric Baum, has been working on this type
of problem, but for binary-valued networks, i.e. networks that
classify the input space into arbitrary categories. The theory is
quite mathematical; as a "gut feeling" I suspect that for
continuous-valued networks only approximate results would be possible.

Matt Kennel
kennel@cognet.ucla.edu
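The units-versus-accuracy tradeoff is at least easy to probe
empirically. A sketch in the same spirit as the one after Demers'
post, with the same caveats: the target, domain, and random hidden
weights are arbitrary choices, and least-squares output weights are a
demo convenience, not anyone's published construction:

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    f = lambda x: np.sin(2.0 * np.pi * x)   # arbitrary test mapping
    x = np.linspace(0.0, 1.0, 500)

    # Sup-norm error on the samples vs. number of hidden units.
    for n_hidden in (2, 4, 8, 16, 32, 64):
        a = rng.normal(scale=10.0, size=n_hidden)
        b = rng.uniform(-10.0, 10.0, size=n_hidden)
        H = sigmoid(np.outer(x, a) + b)
        c, *_ = np.linalg.lstsq(H, f(x), rcond=None)
        print(n_hidden, float(np.abs(H @ c - f(x)).max()))

Of course this measures one fixed architecture on one fixed function;
characterizing the tradeoff in general, as the post asks, is the hard
part.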