danforth@riacs.edu (Douglas G. Danforth) (07/14/89)
Kai-Fu Lee writes ...
>I am interested in NN approaches to combine multiple knowledge sources in a
>classification task. Each knowledge source produces a score vector for the
>categories, and NN is trained with data containing these score vectors, and
>the correct answer.
>
>Please send any references to me, or post them on this newsgroup.
>
>Thanks,
>Kai-Fu Lee
>kfl@speech2.cs.cmu.edu

One approach is to use a Sparse Distributed Memory (SDM) (Kanerva, 1988),
where the total input information is mapped into a single large bit vector,
e.g. 256 bits or larger.  At this point two different techniques can be used:

(1) The "correct answer" is part of the bit vector and the memory is run in
auto-associative mode.  That is, for training, the input pattern is X and the
output pattern is X, where X contains both the correct answer and the score
vectors.  For recognition the input pattern is X(0) (where the correct-answer
bits are unknown and are set randomly).  The output is then X(1), where some
of the bits in the correct answer have now been "completed".  Cycle the output
back into the input until convergence, i.e. X(t) = X(t+1), or until
non-convergence (which can be detected for t > 4).  If X(0) was "sufficiently
close" to some previously stored X, then convergence will take place.  The
conditions for "close" are specified in Kanerva's book.  Note that if the
number of bits allocated to the "correct answer" is too large, then
convergence cannot be guaranteed.  This is usually not the case, since the
size of the score vectors dominates the correct-answer size.

(2) Operate the memory in hetero-associative mode, where the input is X and
contains only the score vectors (coded as binary vectors on, say, a
temperature scale).  The output is then Y, the "correct answer", also encoded
as a binary vector.  In this case only a one-step recognition process is used.
If the correct answer takes on only a small number of values (say 16 bits'
worth), then an error-correction scheme can be used by encoding the correct
answer in Y using, say, 256 bits.  This spreads or separates the data so that
any errors in the reconstruction of Y can be "corrected", i.e., Y' is in error
for some of the bits but is close to a previously written Y value.  This
correction can be performed by a second SDM running in auto-associative mode
that has been trained with Y as input and Y as output.  Again, iteration is
performed on the second memory until convergence or non-convergence
(t > 4, approximately).

Notice that nothing special has been said about the structure of the score
vectors.  Their encodings need only satisfy the condition that two vectors
that are "close" in their raw data form are still "close" in their binary
encoded form (Hamming distance metric).  (Rough code sketches of the memory,
the iterated readout of (1), and the encoding and two-stage recognition of (2)
are appended after my signature.)  This combining and blending of knowledge is
well described in chapter 10, "The Organization of an Autonomous Learning
System", of Kanerva's book:

    Kanerva, P. 1988. Sparse Distributed Memory. Cambridge, MA: MIT Press.

Douglas G. Danforth
RIACS M/S 230-5
NASA Ames Research Center
Moffett Field, CA 94035
danforth@riacs.edu
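
P.S.  In case a concrete picture helps, here is a minimal sketch of the
memory itself, written in Python.  This is my own illustrative sketch, not
code from Kanerva's book; the word length, number of hard locations, and
activation radius below are placeholder values, not recommendations.

    import numpy as np

    class SDM:
        """Minimal Sparse Distributed Memory: fixed random hard locations,
        one counter per bit per location, majority-rule readout."""
        def __init__(self, n_bits=256, n_locations=1000, radius=112, seed=0):
            rng = np.random.default_rng(seed)
            # Fixed random addresses of the hard locations.
            self.addresses = rng.integers(0, 2, size=(n_locations, n_bits))
            # Counters accumulate the written data words.
            self.counters = np.zeros((n_locations, n_bits), dtype=int)
            self.radius = radius

        def _active(self, x):
            # Locations whose address lies within Hamming distance `radius` of x.
            x = np.asarray(x)
            return np.sum(self.addresses != x, axis=1) <= self.radius

        def write(self, address, word):
            # +1 to a counter for a 1-bit, -1 for a 0-bit, at every active location.
            word = np.asarray(word)
            self.counters[self._active(address)] += 2 * word - 1

        def read(self, address):
            # Sum the counters of the active locations and threshold each bit.
            sums = self.counters[self._active(address)].sum(axis=0)
            return (sums > 0).astype(int)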
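
The pattern completion of (1) is then just a loop that feeds the readout back
in as the next address until a fixed point X(t) = X(t+1) is reached, or gives
up after a few steps (t > 4).  Same assumptions as above:

    def complete(mem, x0, max_steps=5):
        """Iterated auto-associative readout: returns (pattern, converged)."""
        x = np.asarray(x0)
        for _ in range(max_steps):
            x_next = mem.read(x)
            if np.array_equal(x_next, x):
                return x_next, True      # fixed point: X(t) = X(t+1)
            x = x_next
        return x, False                  # no convergence within max_steps

    # Training: write each full pattern X (score-vector bits + answer bits)
    # with itself as both address and data, i.e. mem.write(X, X).
    # Recognition: set the answer bits of X(0) randomly, keep the score bits,
    # call complete(mem, X0), and read the answer bits out of the result.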
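
For (2), the only requirement on the encoding of a score is that nearby raw
values stay nearby in Hamming distance.  One simple way to get that is a
"temperature scale" (thermometer) code; the number of bits per score is
arbitrary here.  The two-stage recognition then reads the hetero-associative
memory once and cleans up the answer code with the second, auto-associative
memory, reusing the complete() loop above:

    def thermometer(value, lo, hi, n_bits):
        """Encode a score so that a difference in value maps to a
        proportional Hamming distance between code words."""
        level = int(round((value - lo) / (hi - lo) * n_bits))
        level = max(0, min(n_bits, level))
        return np.array([1] * level + [0] * (n_bits - level), dtype=int)

    def recognize(score_bits, hetero_mem, auto_mem):
        """Technique (2): one hetero-associative read gives a noisy code Y';
        the second SDM, trained auto-associatively on the clean Y codes,
        pulls Y' toward the nearest previously written answer code."""
        y_noisy = hetero_mem.read(score_bits)
        y_clean, converged = complete(auto_mem, y_noisy)
        return y_clean, converged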