[comp.ai.neural-nets] Minimum # of internal nodes to form

mehra@s.cs.uiuc.edu (10/13/89)

>
>/* ---------- "Minimum # of internal nodes to form" ---------- */
>
>  I've been told that Robert Hecht-Nielsen recently proved that any
>Boolean function of n inputs to n outputs can be realized with a neural
>net having n input nodes, n output nodes, and 2n-1 intermediate nodes
>(a total of 3 layers). Is there any truth to this statement? Please
>forgive me if this has been discussed here before.

See Hecht-Nielsen's paper in the ICNN 87 proceedings.
Briefly, the argument uses Kolmogorov's superposition theorem (1957) and
Sprecher's theorem (1972). Kolmogorov's paper solved the notorious
thirteenth problem of Hilbert: whether a continuous function of several
variables can be represented as a sum of (possibly composite) continuous
functions of only one variable. The trouble is that if we use Kolmogorov's
theorem, then there is no restriction on the form of the function computed
at each node.
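
For reference, Kolmogorov's superposition theorem (as usually stated) says
that any continuous f on [0,1]^n can be written *exactly* as

    f(x_1, ..., x_n) = \sum_{q=1}^{2n+1} \Phi_q ( \sum_{p=1}^{n} \psi_{pq}(x_p) )

where the \psi_{pq} are fixed continuous monotone functions and the \Phi_q
are continuous functions that depend on f. The 2n+1 inner sums are what
become the hidden units in Hecht-Nielsen's construction; the catch, as
noted, is that nothing constrains what the \Phi_q look like.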

Perhaps more relevant are the papers of George Cybenko. He proves that a
two-layer network (having only one hidden layer) of sigmoidal units, with
only a summation at the output, is sufficient. However, there is no limit
on the size of the hidden layer. An important distinction between
Hecht-Nielsen's and Cybenko's results is that the former is about exact
representation, whereas the latter is about arbitrarily accurate
approximation.
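
Here is a rough numpy sketch of the kind of network Cybenko's theorem is
about, g(x) = sum_j v_j * sigma(w_j*x + b_j). The random hidden weights and
the least-squares fit of the output weights are only my illustration of how
well such a sum can do; they are not Cybenko's (purely existential)
construction.

import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))                   # logistic sigmoid

rng = np.random.default_rng(0)
n_hidden = 200                                        # "no limit on the size of the hidden layer"
w = rng.normal(scale=5.0, size=n_hidden)              # hidden weights, fixed at random
b = rng.uniform(-5.0, 5.0, size=n_hidden)             # hidden biases, fixed at random

x = np.linspace(-np.pi, np.pi, 400)
target = np.sin(3.0 * x)                              # a continuous target to approximate

H = sigma(np.outer(x, w) + b)                         # hidden activations, one column per unit
v, _, _, _ = np.linalg.lstsq(H, target, rcond=None)   # output layer = plain weighted summation
print("max abs error:", np.max(np.abs(H @ v - target)))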

Abu-Mostafa and Eric Baum also have interesting results in this direction
but I won't get into that here.

Pankaj {Mehra@cs.uiuc.edu}

aam9n@uvaee.ee.virginia.EDU (Ali Minai) (10/14/89)

In article <218600001@s.cs.uiuc.edu>, mehra@s.cs.uiuc.edu writes:
# 
# >
# >/* ---------- "Minimum # of internal nodes to form" ---------- */
# >
# >  I've been told that Robert Hecht-Nielsen recently proved that any
# >Boolean function of n inputs to n outputs can be realized with a neural
# >net having n input nodes, n output nodes, and 2n-1 intermediate nodes
# >(a total of 3 layers). Is there any truth to this statement? Please
# >forgive me if this has been discussed here before.
# 
# See Hecht-Nielsen's paper in the ICNN 87 proceedings.
# Briefly, the argument uses Kolmogorov's superposition theorem (1957) and
# Sprecher's theorem (1972). Kolmogorov's paper solved the notorious
# thirteenth problem of Hilbert: whether a continuous function of several
# variables can be represented as a sum of (possibly composite) continuous
# functions of only one variable. The trouble is that if we use Kolmogorov's
# theorem, then there is no restriction on the form of the function computed
# at each node.
# 
# Perhaps more relevant are the papers of George Cybenko. He proves that a
# two-layer network (having only one hidden layer) of sigmoidal units, with
# only a summation at the output, is sufficient. However, there is no limit
# on the size of the hidden layer. An important distinction between
# Hecht-Nielsen's and Cybenko's results is that the former is about exact
# representation, whereas the latter is about arbitrarily accurate
# approximation.
#

Also of interest in this regard is the work of Halbert White at UCSD. He
and his co-workers have shown that any Borel measurable function from R^n
to R^m can be approximated with arbitrary accuracy using a single layer of
hidden nodes, finite in number and using any one of a wide range of
squashing functions. In fact, from what I remember of Hal White's
presentation at IJCNN, he has now extended that to include any neuron
transfer function as long as:

1: It is not identically zero!
2: Its integral from -infinity to +infinity is greater than 0.

Within those limits, neurons can have all kinds of squiggly transfer characteristics.
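
Just to make conditions 1 and 2 concrete, here is a toy numpy sketch with a
deliberately squiggly (but everywhere-positive, positive-integral) transfer
function. The random-feature, least-squares recipe is purely my own
illustration, not White's construction.

import numpy as np

def squiggle(z):
    return np.exp(-z**2) * (1.0 + 0.3 * np.sin(5.0 * z))   # > 0 everywhere, positive integral

rng = np.random.default_rng(1)
n_hidden = 300
w = rng.normal(scale=3.0, size=n_hidden)                    # random hidden weights
b = rng.uniform(-4.0, 4.0, size=n_hidden)                   # random hidden biases

x = np.linspace(-2.0, 2.0, 400)
target = np.abs(x) - 0.5 * x**2                             # an arbitrary continuous target

H = squiggle(np.outer(x, w) + b)                            # hidden activations
v, _, _, _ = np.linalg.lstsq(H, target, rcond=None)         # linear output layer
print("max abs error:", np.max(np.abs(H @ v - target)))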

Hecht-Nielsen presented a similar result at IJCNN. His argument was actually
quite simple to understand: overlapping sigmoids can approximate sinusoids
(a steep sigmoid is essentially a step, and a large number of steps can be
superposed to get an arbitrarily good "digitized" approximation of a
sinusoid), and we know that superposed sinusoids can approximate any other
function, so sigmoids can too. Of course, we are talking about LARGE NUMBERS
of sigmoids. Unfortunately, no one knows how many.
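
A quick numerical rendering of that staircase argument (the grid spacing and
the steepness factor are arbitrary choices of mine; finer and steeper gives
a better approximation):

import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))                          # logistic sigmoid

x = np.linspace(0.0, 2.0 * np.pi, 1000)
centers = np.linspace(0.0, 2.0 * np.pi, 60)                  # one steep sigmoid per grid point
steps = np.diff(np.sin(centers), prepend=0.0)                # height increment carried by each step
steep = 50.0                                                 # large slope -> sigmoid ~ unit step

approx = sum(s * sigma(steep * (x - c)) for s, c in zip(steps, centers))
print("max abs error vs sin(x):", np.max(np.abs(approx - np.sin(x))))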

One *extremely* important point to remember here: THAT AN APPROXIMATION EXISTS
DOES NOT MEAN THAT BACKPROPAGATION (or some such procedure) WILL NECESSARILY
FIND IT.

For the references to the work I mentioned above, see the proceedings of
IJCNN 89. Hal White's paper has references to his older ones. Also, there
was a paper with a similar proof in Neural Networks a few months ago. I
cannot recall the author's name, but I think he was Japanese.

# Abu-Mostafa and Eric Baum also have interesting results in this direction
# but I won't get into that here.

Could you please post the references, or send them by e-mail? Thanks.

Ali Minai

mehra@s.cs.uiuc.edu (10/17/89)

># Abu-Mostafa and Eric Baum also have interesting results in this direction
># but I won't get into that here.
>
>Can you please post the references

Abu-Mostafa's paper on "Random problems" suggests using the difference between
Kolmogorov Complexity and Shannon Entropy as a measure of "hidden complexity".

Baum and Haussler's paper is called "What size net gives valid generalization?"
They bring in several ideas from computational learning theory, including the
VC dimension of a 1-hidden-layer network of linear threshold units (LTUs),
i.e., the size of the largest set of inputs it can "shatter", and how that
relates to the number of training examples needed for valid generalization.
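
For what "shatter" means, concretely: a set of points is shattered if every
+/-1 labelling of it can be realized. Below is a small, purely illustrative
check (nothing taken from the paper itself) that a single LTU in the plane
shatters the three corners of a triangle, consistent with its VC dimension
of d+1 = 3; the perceptron rule converges exactly when a labelling is
linearly separable.

import itertools
import numpy as np

points = np.array([[0.0, 0.0, 1.0],                   # last coordinate = constant bias input
                   [1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0]])

def ltu_can_realize(labels, sweeps=1000):
    w = np.zeros(3)                                   # weight vector of one threshold unit
    for _ in range(sweeps):
        mistakes = 0
        for xi, yi in zip(points, labels):
            if yi * (w @ xi) <= 0:                    # misclassified (or on the boundary)
                w += yi * xi                          # classic perceptron update
                mistakes += 1
        if mistakes == 0:
            return True                               # converged: labelling is realizable
    return False

print(all(ltu_can_realize(np.array(lab))              # expect True: all 8 labellings work
          for lab in itertools.product([-1.0, 1.0], repeat=3)))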

You will probably find both papers in NIPS proceedings. I am not prepared to
answer any further questions on this matter.

Pankaj