danforth@riacs.edu (Douglas G. Danforth) (07/14/89)
Kai-Fu Lee writes ...
>I am interested in NN approaches to combine multiple knowledge sources in a
>classification task. Each knowledge source produces a score vector for the
>categories, and NN is trained with data containing these score vectors, and
>the correct answer.
>
>Please send any references to me, or post them on this newsgroup.
>
>Thanks,
>Kai-Fu Lee
>kfl@speech2.cs.cmu.edu

One approach is to use a Sparse Distributed Memory (SDM) (Kanerva, 1988),
where the total input information is mapped into a single large bit vector,
e.g. 256 bits or larger.  At this point two different techniques can be used:

(1) The "correct answer" is part of the bit vector and the memory is run in
auto-associative mode.  That is, for training, the input pattern is X and the
output pattern is X, where X contains both the correct answer and the score
vectors.  For recognition the input pattern is X(0) (where the correct-answer
bits are unknown and are set randomly).  The output is then X(1), where some
of the bits in the correct answer have now been "completed".  Cycle the output
back into the input until convergence, i.e. X(t) = X(t+1), or until
non-convergence (which can be detected for t > 4).  If X(0) was "sufficiently
close" to some previously stored X, then convergence will take place.  The
conditions for "close" are specified in Kanerva's book.  Note that if the
number of bits allocated to the "correct answer" is too large, then
convergence cannot be guaranteed.  This is usually not the case, since the
size of the score vectors dominates the correct-answer size.

(2) Operate the memory in hetero-associative mode, where the input is X and
contains only the score vectors (coded as binary vectors on, say, a
temperature scale).  The output is then Y, the "correct answer", also encoded
as a binary vector.  In this case only a one-step recognition process is used.
If the correct answer takes on only a small number of values (say 16 bits'
worth), then an error-correction scheme can be used by encoding the correct
answer in Y using, say, 256 bits.  This spreads or separates the data so that
any errors in the reconstruction of Y can be "corrected", i.e., Y' is in error
for some of the bits but is close to a previously written Y value.  This
correction can be performed by a second SDM running in auto-associative mode
that has been trained with Y as input and Y as output.  Again, iteration is
performed on the second memory until convergence or non-convergence
(t > 4, approximately).

Notice that nothing special has been said about the structure of the score
vectors.  Their encodings need only satisfy the condition that two vectors
that are "close" in their raw data form are still "close" in their binary
encoded form (Hamming distance metric).  (Rough code sketches of the memory,
the iterated readout of (1), and the encoding and two-stage recognition of (2)
are appended after my signature.)  This combining and blending of knowledge is
well described in chapter 10, "The Organization of an Autonomous Learning
System", of Kanerva's book:

    Kanerva, P. 1988. Sparse Distributed Memory. Cambridge, MA: MIT Press.

Douglas G. Danforth
RIACS M/S 230-5
NASA Ames Research Center
Moffett Field, CA 94035
danforth@riacs.edu
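
P.S.  In case a concrete picture helps, here is a minimal sketch of the
memory itself, written in Python.  This is my own illustrative sketch, not
code from Kanerva's book; the word length, number of hard locations, and
activation radius below are placeholder values, not recommendations.

    import numpy as np

    class SDM:
        """Minimal Sparse Distributed Memory: fixed random hard locations,
        one counter per bit per location, majority-rule readout."""
        def __init__(self, n_bits=256, n_locations=1000, radius=112, seed=0):
            rng = np.random.default_rng(seed)
            # Fixed random addresses of the hard locations.
            self.addresses = rng.integers(0, 2, size=(n_locations, n_bits))
            # Counters accumulate the written data words.
            self.counters = np.zeros((n_locations, n_bits), dtype=int)
            self.radius = radius

        def _active(self, x):
            # Locations whose address lies within Hamming distance `radius` of x.
            x = np.asarray(x)
            return np.sum(self.addresses != x, axis=1) <= self.radius

        def write(self, address, word):
            # +1 to a counter for a 1-bit, -1 for a 0-bit, at every active location.
            word = np.asarray(word)
            self.counters[self._active(address)] += 2 * word - 1

        def read(self, address):
            # Sum the counters of the active locations and threshold each bit.
            sums = self.counters[self._active(address)].sum(axis=0)
            return (sums > 0).astype(int)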
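
The pattern completion of (1) is then just a loop that feeds the readout back
in as the next address until a fixed point X(t) = X(t+1) is reached, or gives
up after a few steps (t > 4).  Same assumptions as above:

    def complete(mem, x0, max_steps=5):
        """Iterated auto-associative readout: returns (pattern, converged)."""
        x = np.asarray(x0)
        for _ in range(max_steps):
            x_next = mem.read(x)
            if np.array_equal(x_next, x):
                return x_next, True      # fixed point: X(t) = X(t+1)
            x = x_next
        return x, False                  # no convergence within max_steps

    # Training: write each full pattern X (score-vector bits + answer bits)
    # with itself as both address and data, i.e. mem.write(X, X).
    # Recognition: set the answer bits of X(0) randomly, keep the score bits,
    # call complete(mem, X0), and read the answer bits out of the result.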
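
For (2), the only requirement on the encoding of a score is that nearby raw
values stay nearby in Hamming distance.  One simple way to get that is a
"temperature scale" (thermometer) code; the number of bits per score is
arbitrary here.  The two-stage recognition then reads the hetero-associative
memory once and cleans up the answer code with the second, auto-associative
memory, reusing the complete() loop above:

    def thermometer(value, lo, hi, n_bits):
        """Encode a score so that a difference in value maps to a
        proportional Hamming distance between code words."""
        level = int(round((value - lo) / (hi - lo) * n_bits))
        level = max(0, min(n_bits, level))
        return np.array([1] * level + [0] * (n_bits - level), dtype=int)

    def recognize(score_bits, hetero_mem, auto_mem):
        """Technique (2): one hetero-associative read gives a noisy code Y';
        the second SDM, trained auto-associatively on the clean Y codes,
        pulls Y' toward the nearest previously written answer code."""
        y_noisy = hetero_mem.read(score_bits)
        y_clean, converged = complete(auto_mem, y_noisy)
        return y_clean, converged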