[comp.ai.neural-nets] 3-Layer versus Multi-Layer

jochenru@cosmo.UUCP (Jochen Ruhland) (06/20/89)

During a local meeting here in Germany I heard somebody talking
about a theorem that a three-layer perceptron is capable of
performing any given in/out function with some maximum number of
hidden units in the network.
I forgot to ask where to look for the proof, so I'm asking here.
Responses may be in German or English.

Thanks in advance
       Jochen

merrill@bucasb.bu.edu (John Merrill) (06/28/89)

One reference to such a result is

Funahashi, K. (1989). "On the Approximate Realization of Continuous
Mappings by Neural Networks", {\bf Neural Networks} (2) 183-192.

There are actually several different theorems which prove the same
thing, but Funahashi's is the first that I know of which does it from
standard sigmoid semi-linear nodes.

--
John Merrill			|	ARPA:	merrill@bucasb.bu.edu
Center for Adaptive Systems	|	
111 Cummington Street		|	
Boston, Mass. 02215		|	Phone:	(617) 353-5765

demers@beowulf.ucsd.edu (David E Demers) (06/29/89)

In article <3417@cosmo.UUCP> jochenru@cosmo.UUCP (Jochen Ruhland) writes:
>During a local meeting here in Germany I heard somebody talking
>about a theorem that a three-layer perceptron is capable of
>performing any given in/out function with some maximum number of
>hidden units in the network.

For "perceptrons", there is no such proof, since multilayer
linear units can easily be collapsed into two-layers.  See, e.g.,
Minsky & Papert, "Perceptrons" (1969).  If, however, units can
take on non-linear activations, then it can be shown that a
three layer network can approximate any Borel-measurable
function to any desired degree of accuracy (exponential in
the number of units, however!).  Hal White et al have shown
this, and have also shown that the mapping is learnable.
This paper is going to appear this year in the
Journal of INNS, Neural Networks.
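
A minimal sketch of the flavor of this result (not White et al.'s
construction; the target function, number of units, and steepness k
are arbitrary illustrative choices, and it assumes Python with NumPy):
one hidden layer of sigmoid units approximates a 1-D function by
summing near-rectangular "bumps".

import numpy as np

def sigmoid(z):
    # Numerically safe logistic: 1/(1 + exp(-z)) == 0.5*(1 + tanh(z/2)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def hidden_layer_approx(x, target, n_units=50, k=200.0):
    # Each pair sigmoid(k*(x-a)) - sigmoid(k*(x-b)) is ~1 on [a, b] and ~0
    # elsewhere; weighting it by target((a+b)/2) gives a piecewise-constant
    # approximation realized entirely by sigmoid hidden units.
    edges = np.linspace(0.0, 1.0, n_units + 1)
    y = np.zeros_like(x)
    for a, b in zip(edges[:-1], edges[1:]):
        y += target(0.5 * (a + b)) * (sigmoid(k * (x - a)) - sigmoid(k * (x - b)))
    return y

x = np.linspace(0.0, 1.0, 1000)
f = lambda t: np.sin(2.0 * np.pi * t)
print("max error, 50 bumps:", np.max(np.abs(hidden_layer_approx(x, f) - f(x))))
# The maximum error shrinks as n_units grows, at the cost of more hidden units.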

The source of this is frequently listed as the Kolmogorov
superposition theorem.  Robert Hecht-Nielsen has a paper
in the 1987 Proceedings of the First IEEE Conference on
Neural Networks about this theorem.  The theorem is not
constructive, however.  It shows that a continuous function from
R^m to R^n can be represented by the superposition of
{some number linear in m & n} bounded, monotonic, non-linear
functions of the m inputs.  However, there is no way of
determining these functions...
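
For reference, the standard statement (for a real-valued f; a map into
R^n is handled one output component at a time, which is where the count
linear in m & n comes from) is that any continuous f on [0,1]^m can be
written as

  f(x_1,\ldots,x_m) = \sum_{q=0}^{2m} \Phi_q\Bigl(\sum_{p=1}^{m} \psi_{q,p}(x_p)\Bigr),

where the \psi_{q,p} are fixed continuous monotonic functions that do not
depend on f, and the \Phi_q are continuous functions that do.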

I am writing all of this from memory; all of my papers are
elsewhere right now...  but I know that others have similar
results.

Dave
 

ychen@metasoft.UUCP (Yongdeng Chen) (07/02/89)

In article <6730@sdcsvax.UCSD.Edu>, demers@beowulf.ucsd.edu (David E Demers) writes:
> [...]
> This paper is going to appear this year in the
> Journal of INNS, Neural Networks.
> [...]

Could someone give me some info so I can subscribe to that Journal?
Thanks for your response!

__________________________________________________________________
Yongdeng Chen                   Meta Software Corporation
ychen%metasoft@bbn.com          150 Cambridge Park Drive
(617)576-6920                   Cambridge, MA 02140

kennel@mickey.cognet.ucla.edu (Matthew Kennel) (07/04/89)

In article <MERRILL.89Jun27145601@bucasb.bu.edu> merrill@bucasb.bu.edu (John Merrill) writes:
>One reference to such a result is
>
>Funahashi, K. (1989). "On the Approximate Realization of Continuous
>Mappings by Neural Networks", {\bf Neural Networks} (2) 183-192.
>
>There are actually several different theorems which prove the same
>thing, but Funahashi's is the first that I know of which does it from
>standard sigmoid semi-linear nodes.
>


I recently saw a preprint by some EE professors at Princeton who
gave a constructive proof using something called the
"inverse Radon transform", or something like that.

What I think the subject needs is work on characterizing the "complexity"
of continuous mappings with respect to neural networks---i.e., how many
hidden units (free coefficients) are needed to reproduce a given mapping
to a certain accuracy?

Obviously, this depends crucially on the functional basis and
architecture of the network---we might thus be able to evaluate various
network types on their power and efficiency in a practical way, and not
just a formal one (i.e., given infinitely many hidden neurons).
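
One crude way to get a feel for this numerically (purely an illustration,
not anyone's published method; the target mapping and the random-sigmoid
construction are arbitrary choices, and it assumes Python with NumPy) is
to fix a target, sweep the number of hidden units, and watch the
worst-case error fall:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # Numerically safe logistic: 1/(1 + exp(-z)) == 0.5*(1 + tanh(z/2)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def fit_random_sigmoid_net(x, y, n_hidden):
    # Random input weights and biases; only the output weights are fitted,
    # by linear least squares on the hidden-layer activations.
    W = rng.normal(scale=10.0, size=n_hidden)
    b = rng.uniform(-10.0, 10.0, size=n_hidden)
    H = sigmoid(np.outer(x, W) + b)            # shape (n_samples, n_hidden)
    c, *_ = np.linalg.lstsq(H, y, rcond=None)  # output weights
    return W, b, c

x = np.linspace(0.0, 1.0, 400)
f = lambda t: np.sin(2.0 * np.pi * t) * np.exp(-t)
for n in (2, 4, 8, 16, 32, 64):
    W, b, c = fit_random_sigmoid_net(x, f(x), n)
    err = np.max(np.abs(sigmoid(np.outer(x, W) + b) @ c - f(x)))
    # The error generally falls as n grows (not necessarily monotonically,
    # since each n draws a fresh set of random hidden units).
    print(n, "hidden units -> max error", round(float(err), 4))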

My undergrad thesis adviser, Eric Baum, has been working on this type of
problem, but for binary-valued networks, i.e. networks that classify the
input space into arbitrary categories.  The theory is quite
mathematical---as a "gut feeling" I suspect that for continuous-valued
networks, only approximate results would be possible.


Matt Kennel
kennel@cognet.ucla.edu