[comp.ai.neural-nets] nn implementation of "max" function

finton@cambizola.cs.wisc.edu (David Finton) (11/22/89)

Can anyone describe to me what an implementation of the function
max{xi} looks like in a connectionist net?  I'm using linear-sum,
sigmoid-output units.  

What I want is a layer which outputs a "1" from the unit whose
input (from outside the layer) is greatest, with the other nodes
in the layer outputting a "0".  

Assume that each node in the layer has an input from outside the 
layer, and these inputs are clamped to their values.  Node 
activation is in the range [0,1], while link weights are in the 
range [-1, 1].

The obvious thing is to have mutual inhibition within the layer,
possibly with each unit also sending itself some activation (or,
equivalently, each unit retains some fraction of its previous
activation level).  But what I've come up with so far seems 
to give me a net which sends out "1" for *each* unit whose input
is above the *average*.

I bet this has been done before;  can someone show me where I'm 
going wrong?  

--David Finton ( finton@cs.wisc.edu )

lange@lanai.cs.ucla.edu (Trent Lange) (11/22/89)

In article <9202@spool.cs.wisc.edu> finton@cs.wisc.edu (David Finton) writes:
>Can anyone describe to me what an implementation of the function
>max{xi} looks like in a connectionist net?  I'm using linear-sum,
>sigmoid-output units.  
>
>What I want is a layer which outputs a "1" from the unit whose
>input (from outside the layer) is greatest, with the other nodes
>in the layer outputting a "0".  
>
>The obvious thing is to have mutual inhibition within the layer,
>possibly with each unit also sending itself some activation (or,
>equivalently, each unit retains some fraction of its previous
>activation level).  But what I've come up with so far seems 
>to give me a net which sends out "1" for *each* unit whose input
>is above the *average*.
>
>I bet this has been done before;  can someone show me where I'm 
>going wrong?  
>
>--David Finton ( finton@cs.wisc.edu )


You are describing a simple Winner-Take-All (WTA) network.

Your re-invention is conceptually correct: the nodes in the WTA should be
mutually inhibitory, and need to "retain a fraction of their previous
activation level".

A typical activation function used for units in WTA networks is the one
described by McClelland and Rumelhart on pages 71 and 72 of the PDP book,
Volume 1:

   if net > 0:
      a(t+1) = a(t) * (1 - decay_rate) + net * (max - a(t))
   otherwise:
      a(t+1) = a(t) * (1 - decay_rate) + net * (a(t) - min)

where net is the (weighted) net input to the unit, decay_rate is the
node's activation decay rate, max is the maximum desired activation
(generally 1), and min is the minimum desired activation (generally
0 or -1).
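
For concreteness, here is a minimal Python sketch of one step of that
update rule (the function name, parameter names, and default values are
my own illustrative choices, not anything taken from the PDP book):

   def update_activation(a, net, decay_rate=0.1, a_max=1.0, a_min=0.0):
       # One step of the update rule above: activation decays toward 0,
       # while positive net input pushes it toward a_max and negative
       # net input pushes it toward a_min.
       if net > 0:
           return a * (1 - decay_rate) + net * (a_max - a)
       else:
           return a * (1 - decay_rate) + net * (a - a_min)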

The most difficult part of using WTA networks is selecting the
exact values of the inhibitory weights between units.  If the
weights are too small, then they will end up having no effect.
If they are too large, then the competing units can end up constantly
clobbering each other.

An example of this problem is for a simple two node WTA:

		-0.1
	    A <------> B
            ^          ^
	    |          |
           iA         iB

The inhibitory weight (-0.1) between A and B will cause A to have
activation 1 and B to have activation 0 if, for example, A's normal
net input (iA) is 0.12 and B's (iB) is 0.08.  In this case, A's
activation will eventually climb close to 1.  When that happens,
B's net input becomes 0.08 + (-0.1 * 1), or -0.02.  B's activation is
thus driven down to its minimum, so the WTA functions as it should.

On the other hand, if both iA and iB are above 0.1, then the would-be
loser's net input will always be positive (since the greatest inhibition
it can receive from the winner is -0.1 * 1, or -0.1).  In this case both
A's and B's activations will eventually climb towards their maximum, and
the WTA will fail.
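
To make both cases concrete, here is a small, self-contained Python
simulation of the two-node WTA above.  It uses the same update rule,
with the decay rate set to zero for simplicity (that choice, and the
second pair of inputs, are my own illustrative assumptions; the first
pair of inputs and the -0.1 weight are the values from the example):

   def run_wta(i_a, i_b, w_inhib=-0.1, steps=500, decay_rate=0.0,
               a_max=1.0, a_min=0.0):
       # Two-unit WTA: each unit's net input is its clamped external
       # input plus the inhibition it receives from the other unit.
       def step(a, net):
           if net > 0:
               return a * (1 - decay_rate) + net * (a_max - a)
           return a * (1 - decay_rate) + net * (a - a_min)

       a = b = 0.0
       for _ in range(steps):
           net_a = i_a + w_inhib * b              # inhibition from B
           net_b = i_b + w_inhib * a              # inhibition from A
           a, b = step(a, net_a), step(b, net_b)  # synchronous update
       return round(a, 3), round(b, 3)

   print(run_wta(0.12, 0.08))  # A wins: roughly (1.0, 0.0)
   print(run_wta(0.20, 0.15))  # both inputs above 0.1: both end up near 1.0

Note that with a nonzero decay rate the winner settles at roughly
net / (net + decay_rate) rather than at 1, so the decay rate and the
inhibitory weights have to be chosen together.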

I suspect that the problem in your WTA (all of the nodes whose net input
is above average "win") is primarily due to the inhibitory weights
chosen for your network:  they are probably large enough to drive down
the activation of the weakest nodes, but too small to drive down every
node except the one winner.

An interesting example of how the inhibitory weights were selected for
the complex WTAs of a particular model can be found on pages 455-488 of
[Touretzky and Hinton, "A Distributed Connectionist Production System",
Cognitive Science, 12(3), 1988].  Their derivations of the weights
needed for the various WTA networks in their model to work correctly
are a good illustration of both how such weights should be calculated
and how complex those calculations can sometimes become.

Trent Lange

Artificial Intelligence Laboratory
Computer Science Department
University of California, Los Angeles
Los Angeles, CA 90024

Office Phone:  (213) 825-5199
E-Mail Address: lange@cs.ucla.edu
**********************************************************************
*      "UCLA:  National Men's Volleyball, Women's Softball, and      *
*              Computer Programming Champs"                          *
**********************************************************************

eric@jane.Jpl.Nasa.Gov (Eric Majani) (11/30/89)

In article <9202@spool.cs.wisc.edu> finton@cs.wisc.edu (David Finton) writes:
>Can anyone describe to me what an implementation of the function
>max{xi} looks like in a connectionist net?  I'm using linear-sum,
>sigmoid-output units.  
>
>What I want is a layer which outputs a "1" from the unit whose
>input (from outside the layer) is greatest, with the other nodes
>in the layer outputting a "0".  
>
>Assume that each node in the layer has an input from outside the 
>layer, and these inputs are clamped to their values.  Node 
>activation is in the range [0,1], while link weights are in the 
>range [-1, 1].
>
>The obvious thing is to have mutual inhibition within the layer,
>possibly with each unit also sending itself some activation (or,
>equivalently, each unit retains some fraction of its previous
>activation level).  But what I've come up with so far seems 
>to give me a net which sends out "1" for *each* unit whose input
>is above the *average*.
>
>I bet this has been done before;  can someone show me where I'm 
>going wrong?  
>
>--David Finton ( finton@cs.wisc.edu )

Check out the following paper, which I presented at the Neural Information
Processing Systems Conference in Denver last year (1988):

E. Majani, R. Erlanson, and Y. Abu-Mostafa, "On the K-Winners-Take-All
Network", in Advances in Neural Information Processing Systems
(D. Touretzky, ed.), Morgan Kaufmann, pp. 634-642.

PS: With your approach, I believe you get a 1 for each unit above the
median.

-eric