finton@cambizola.cs.wisc.edu (David Finton) (11/22/89)
Can anyone describe to me what an implementation of the function
max{xi} looks like in a connectionist net?  I'm using linear-sum,
sigmoid-output units.

What I want is a layer which outputs a "1" from the unit whose input
(from outside the layer) is greatest, with the other nodes in the
layer outputting a "0".

Assume that each node in the layer has an input from outside the
layer, and these inputs are clamped to their values.  Node activation
is in the range [0,1], while link weights are in the range [-1,1].

The obvious thing is to have mutual inhibition within the layer,
possibly with each unit also sending itself some activation (or,
equivalently, each unit retaining some fraction of its previous
activation level).  But what I've come up with so far seems to give me
a net which sends out "1" for *each* unit whose input is above the
*average*.

I bet this has been done before; can someone show me where I'm going
wrong?

--David Finton ( finton@cs.wisc.edu )
lange@lanai.cs.ucla.edu (Trent Lange) (11/22/89)
In article <9202@spool.cs.wisc.edu> finton@cs.wisc.edu (David Finton) writes:
>Can anyone describe to me what an implementation of the function
>max{xi} looks like in a connectionist net?  I'm using linear-sum,
>sigmoid-output units.
>
>What I want is a layer which outputs a "1" from the unit whose
>input (from outside the layer) is greatest, with the other nodes
>in the layer outputting a "0".
>
>The obvious thing is to have mutual inhibition within the layer,
>possibly with each unit also sending itself some activation (or,
>equivalently, each unit retains some fraction of its previous
>activation level).  But what I've come up with so far seems
>to give me a net which sends out "1" for *each* unit whose input
>is above the *average*.
>
>I bet this has been done before; can someone show me where I'm
>going wrong?
>
>--David Finton ( finton@cs.wisc.edu )

You are describing a simple Winner-Take-All (WTA) network.  Your
re-invention is conceptually correct: the nodes in the WTA should be
mutually inhibitory, and need to "retain a fraction of their previous
activation level".

A typical activation function used for units in WTA networks is the
one described by McClelland and Rumelhart on pages 71 and 72 of the
PDP book, Volume 1:

    if net > 0:
        a(t+1) = a(t) * (1 - decay-rate) + net * (max - a(t))
    otherwise:
        a(t+1) = a(t) * (1 - decay-rate) + net * (a(t) - min)

where net is the weighted net input to the unit, decay-rate is the
node's activation decay rate, max is the maximum desired activation
(generally 1), and min is the minimum desired activation (generally 0
or -1).

The most difficult part of using WTA networks is selecting the exact
values of the inhibitory weights between units.  If the weights are
too small, they will end up having no effect.  If they are too large,
the competing units can end up constantly clobbering each other.
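[Ed. note: the activation rule above can be written as a small function.
This is a sketch, not code from the posts; the default decay-rate, max,
and min values are illustrative choices.]

```python
def iac_update(a, net, decay=0.1, a_max=1.0, a_min=0.0):
    """One step of the McClelland & Rumelhart activation rule quoted above.

    a     -- the unit's current activation, a(t)
    net   -- the unit's weighted net input
    """
    if net > 0:
        # Positive net input pushes the activation toward a_max.
        a_next = a * (1 - decay) + net * (a_max - a)
    else:
        # Negative net input pushes the activation toward a_min.
        a_next = a * (1 - decay) + net * (a - a_min)
    # Keep the result inside [a_min, a_max].
    return min(a_max, max(a_min, a_next))

# With the default decay of 0.1:
iac_update(0.5, net=0.2)   # 0.5*0.9 + 0.2*(1 - 0.5) = 0.55
iac_update(0.5, net=-0.2)  # 0.5*0.9 + (-0.2)*(0.5 - 0) = 0.35
```

Note how both terms shrink as the activation approaches the relevant
bound, so the unit saturates smoothly rather than overshooting.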
An example of this problem is a simple two-node WTA:

         -0.1
     A <------> B
     ^          ^
     |          |
     iA         iB

The inhibitory weight (-0.1) between A and B will cause A to have
activation 1 and B to have activation 0 if, for example, A's normal
net input (iA) is 0.12 and B's (iB) is 0.08.  In this case, A's
activation will eventually climb to close to 1.  When this happens,
B's net input is then 0.08 + -0.1*1, or -0.02.  B's activation will
thus be driven down to its minimum, so the WTA functions as it should.

On the other hand, if both iA and iB are above 0.1, then the would-be
loser's net input will always be positive (since the greatest
inhibition it can receive from the winner is -0.1*1, or -0.1).  In
this case both A's and B's activations will eventually climb towards
their maximum, and the WTA will fail.

I suspect that the problem in your WTA (all of the nodes whose net
input is above average "win") is primarily because of the inhibitory
weights chosen for your network: they are probably big enough to drive
down the activation of the weakest nodes, but too small to drive down
all except for the one winner.

An interesting example of how the inhibitory weights were selected for
the complex WTAs of a particular model can be found in pages 455-488
of [Touretzky and Hinton, "A Distributed Connectionist Production
System", Cognitive Science, 12(3), 1988].  Their derivations of the
weights needed for the various WTA networks in their model to work
correctly give a good example of both how the weights should be
calculated and how complex those calculations can sometimes become.

Trent Lange
Artificial Intelligence Laboratory
Computer Science Department
University of California, Los Angeles
Los Angeles, CA 90024
Office Phone: (213) 825-5199
E-Mail Address: lange@cs.ucla.edu

**********************************************************************
*    "UCLA: National Men's Volleyball, Women's Softball, and         *
*            Computer Programming Champs"                            *
**********************************************************************
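[Ed. note: Lange's two-node example can be simulated directly with the
McClelland & Rumelhart rule.  This is a sketch, not code from the
posts; the decay rate of 0.01, the synchronous update scheme, the
iteration count, and min = 0 are my own assumptions.]

```python
def simulate_wta(ext, w_inhib=-0.1, decay=0.01, steps=600):
    """Run a mutually inhibitory layer to an approximate equilibrium.

    ext     -- clamped external input to each unit (iA, iB, ...)
    w_inhib -- the inhibitory weight between every pair of units
    """
    a = [0.0] * len(ext)
    for _ in range(steps):
        new_a = []
        for i, ai in enumerate(a):
            # Net input: external input plus inhibition from all other units.
            net = ext[i] + w_inhib * (sum(a) - ai)
            if net > 0:
                ai = ai * (1 - decay) + net * (1.0 - ai)
            else:
                ai = ai * (1 - decay) + net * ai   # min activation is 0
            new_a.append(min(1.0, max(0.0, ai)))
        a = new_a
    return a

# iA = 0.12, iB = 0.08: A climbs toward 1 while B is driven toward 0.
print(simulate_wta([0.12, 0.08]))
# Both inputs above 0.1: the -0.1 weight is too weak to suppress the
# loser, so both units settle at high activations and the WTA fails.
print(simulate_wta([0.15, 0.12]))
```

The second run illustrates Lange's failure mode: since the strongest
possible inhibition is -0.1, any unit whose external input exceeds 0.1
always has positive net input and can never be driven to zero.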
eric@jane.Jpl.Nasa.Gov (Eric Majani) (11/30/89)
In article <9202@spool.cs.wisc.edu> finton@cs.wisc.edu (David Finton) writes:
>Can anyone describe to me what an implementation of the function
>max{xi} looks like in a connectionist net?  I'm using linear-sum,
>sigmoid-output units.
>
>[...]
>
>I bet this has been done before; can someone show me where I'm
>going wrong?

Check out the following paper, which I presented at the Neural
Information Processing Systems Conference in Denver last year (88):

    On the K-Winners-Take-All Network
    by E. Majani, R. Erlanson, Y. Abu-Mostafa
    pp. 634-642, Advances in N.I.P.S.
    (David Touretzky, Editor), Morgan Kaufmann Pub.

PS: With your approach, I believe you get 1 for each unit above the
median.

-eric
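[Ed. note: the k-winners-take-all generalization referenced above
selects the k largest inputs rather than the single maximum.  The
sketch below shows only that input-output *function*, not the network
from the Majani et al. paper; the function name is mine.  With k equal
to half the layer size, the winners are exactly the units above the
median, which is the behavior Majani's PS describes.]

```python
def k_wta(inputs, k):
    """Output 1 for the k units with the largest inputs, 0 for the rest.

    Ties at the threshold value can produce more than k winners.
    """
    # The k-th largest input serves as the winning threshold.
    threshold = sorted(inputs, reverse=True)[k - 1]
    return [1 if x >= threshold else 0 for x in inputs]

k_wta([0.12, 0.08, 0.5, 0.3], k=2)  # -> [0, 0, 1, 1]
k_wta([0.12, 0.08, 0.5, 0.3], k=1)  # -> [0, 0, 1, 0]: ordinary WTA
```

Setting k = 1 recovers the ordinary winner-take-all function that
started the thread.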