[comp.ai.digest] Boltzmann Machine

amres@uvaee.ee.virginia.EDU (Ali Minai) (09/27/87)

While reading two different references about the Boltzmann Machine, I came
across something I did not quite understand. I am sure that there is a
perfectly reasonable explanation, and would be glad if someone could point
it out.

In chapter 7 of PARALLEL DISTRIBUTED PROCESSING (Vol 1), by Hinton and
Sejnowski, the authors define Pij+ as the probability of units i and j
being on when ALL visible units are being clamped, and Pij- as the
probability of i and j being on when NONE of the visible units are
being clamped (pp. 294, 296). They then present the expression for the
gradient of G with respect to the weight Wij as -1/T (Pij+ - Pij-).
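
(For concreteness, here is a minimal sketch, in Python, of the weight
update this gradient implies; the names epsilon, T, p_plus and p_minus
are my own for illustration, not taken from the chapter.)

    import numpy as np

    def update_weights(W, p_plus, p_minus, epsilon=0.1, T=1.0):
        # W        : (n, n) symmetric weight matrix
        # p_plus   : estimated co-occurrence probabilities Pij+ (clamped phase)
        # p_minus  : estimated co-occurrence probabilities Pij- (comparison phase)
        # Since dG/dWij = -(1/T)(Pij+ - Pij-), stepping against the
        # gradient increases Wij wherever Pij+ exceeds Pij-.
        dW = (epsilon / T) * (p_plus - p_minus)
        np.fill_diagonal(dW, 0.0)          # no self-connections
        return W + dW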

However, in the paper entitled LEARNING SYMMETRY GROUPS WITH HIDDEN
UNITS: BEYOND THE PERCEPTRON, by Sejnowski, Kienker and Hinton, in
Physica 22D (1986), pp. 260-275, it is explicitly stated that Pij+
is the probability when ALL visible units (input and output) are being
clamped, BUT Pij- is the probability of i and j being on when ONLY THE
INPUT UNITS ARE CLAMPED (p. 264). So there seems to be no concept of
FREE-RUNNING here.

Since the expression for dG/dWij is the same in both cases, the
definitions of Pij- must be equivalent. The only explanation I could
think of was that "clamping" the inputs ONLY was the same thing as letting
the environment have a free run of them, so the case being described is
the free-running one. If that is true, obviously there is no contradiction,
but the terminology sure is confusing. If that is not the case, will
someone please explain?

Also, can anyone point out any recent references to work on the Boltzmann
Machine?

Thanks,
  
       Ali.

---------------------------------------------------------------------------

       Ali Minai,
       Department of Electrical Engg.
       University of Virginia,
       Charlottesville, Va 22901.

       ARPANET: amres@uvaee.ee.Virginia.EDU

---------------------------------------------------------------------------

krulwich@giraffe..arpa (Bruce Krulwich) (10/01/87)

> Since the expression for dG/dWij is the same in both cases, the
> definitions of Pij- must be equivalent. The only explanation I could
> think of was that "clamping" the inputs ONLY was the same thing as letting
> the environment have a free run of them, so the case being described is
> the free-running one.


The point is that, for any given inputs, learning is done by comparing
the desired outputs with the outputs computed by the machine.  This is
called monitored learning, and in this sense it is similar to
back-propagation learning.  It is used for networks that perform a
computation based on some input being clamped on the input units.
When the output units are clamped, the P values are something like
what they "should" be, so comparing these to the P values for
unclamped output units lets you approximate the error between the
units in question and learn from it.
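
(A rough sketch of the two phases, in Python -- my own illustration,
not code from either paper; the simple Gibbs-style sampler and all the
names here are assumptions made just to show the idea.)

    import numpy as np

    def sample_free_units(W, state, free_idx, T=1.0, sweeps=50, rng=np.random):
        # Resample the units listed in free_idx while every other unit
        # stays clamped at its value in `state`.
        s = np.array(state, dtype=float)
        for _ in range(sweeps):
            for i in free_idx:
                net = W[i] @ s                         # total input to unit i
                p_on = 1.0 / (1.0 + np.exp(-net / T))
                s[i] = 1.0 if rng.random() < p_on else 0.0
        return s

    def cooccurrence(W, cases, free_idx, T=1.0):
        # Average of s_i * s_j over the training cases, i.e. an estimate
        # of Pij under whatever clamping regime `free_idx` encodes.
        n = W.shape[0]
        acc = np.zeros((n, n))
        for state in cases:
            s = sample_free_units(W, state, free_idx, T)
            acc += np.outer(s, s)
        return acc / len(cases)

    # Positive phase:  clamp inputs AND outputs, sample only the hidden units.
    # Negative phase:  clamp only the inputs, sample hidden AND output units.
    # Comparing the two estimates gives the error signal:
    #     delta_Wij  proportional to  (Pij+ - Pij-)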


Bruce Krulwich

ARPA:   krulwich@yale.arpa		      If you're right 95% of the time,
     or krulwich@cs.yale.edu		      why worry about the other 3% ??
Bitnet:	krulwich@yalecs.bitnet
UUCP:   {harvard, seismo, ihnp4}!yale!krulwich