amres@uvaee.ee.virginia.EDU (Ali Minai) (09/27/87)
While reading two different references on the Boltzmann Machine, I came across something I did not quite understand. I am sure there is a perfectly reasonable explanation, and I would be glad if someone could point it out.

In chapter 7 of PARALLEL DISTRIBUTED PROCESSING (Vol. 1), Hinton and Sejnowski define Pij+ as the probability of units i and j both being on when ALL visible units are clamped, and Pij- as the probability of i and j both being on when NONE of the visible units are clamped (pp. 294, 296). They then present the gradient of G with respect to weight Wij as -1/T (Pij+ - Pij-).

However, in the paper LEARNING SYMMETRY GROUPS WITH HIDDEN UNITS: BEYOND THE PERCEPTRON, by Sejnowski, Kienker and Hinton, Physica 22D (1986), pp. 260-275, it is explicitly stated that Pij+ is the probability when ALL visible units (input and output) are clamped, BUT Pij- is the probability of i and j being on when ONLY THE INPUT UNITS are clamped (p. 264). So there seems to be no concept of FREE-RUNNING here.

Since the expression for dG/dWij is the same in both cases, the definitions of Pij- must be equivalent. The only explanation I could think of was that "clamping" only the inputs is the same thing as letting the environment have a free run of them, so the case being described is the free-running one. If that is true, there is obviously no contradiction, but the terminology sure is confusing. If that is not the case, would someone please explain?

Also, can anyone point out recent references to work on the Boltzmann Machine?

Thanks,
Ali.

---------------------------------------------------------------------------
Ali Minai, Department of Electrical Engg.
University of Virginia,
Charlottesville, Va 22901.

ARPANET: amres@uvaee.ee.Virginia.EDU
---------------------------------------------------------------------------
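[For concreteness, here is a minimal Python sketch of the chapter 7 rule as I read it: estimate Pij+ with all visible units clamped, Pij- free-running, then step the weights along (Pij+ - Pij-). The network size, the single training pattern, and the sampling schedule are all made-up assumptions for illustration, not anything from the references.]

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up tiny network: 4 visible + 2 hidden stochastic binary units,
# symmetric weights, no self-connections (sizes are illustrative only).
n_vis, n_hid = 4, 2
n = n_vis + n_hid
W = np.zeros((n, n))   # symmetric weight matrix, zero diagonal
T = 1.0                # temperature

def gibbs_sweep(s, clamped):
    """One sweep of stochastic unit updates; clamped units are left alone."""
    for i in range(n):
        if clamped[i]:
            continue
        delta_e = W[i] @ s                          # energy gap for unit i
        p_on = 1.0 / (1.0 + np.exp(-delta_e / T))   # Boltzmann acceptance
        s[i] = 1.0 if rng.random() < p_on else 0.0

def estimate_p(clamped, s, burn_in=50, samples=100):
    """Estimate p_ij = <s_i s_j> at equilibrium under the given clamping."""
    s = s.copy()
    for _ in range(burn_in):
        gibbs_sweep(s, clamped)
    stats = np.zeros((n, n))
    for _ in range(samples):
        gibbs_sweep(s, clamped)
        stats += np.outer(s, s)
    return stats / samples

# One environmental pattern for the visible units (illustrative).
v = np.array([1.0, 0.0, 1.0, 0.0])
s0 = np.concatenate([v, rng.integers(0, 2, n_hid).astype(float)])

clamp_plus = np.array([True] * n_vis + [False] * n_hid)  # ALL visibles clamped
clamp_minus = np.zeros(n, dtype=bool)                    # free-running

p_plus = estimate_p(clamp_plus, s0)      # Pij+
p_minus = estimate_p(clamp_minus, s0)    # Pij-

eta = 0.1                                # learning rate
dW = eta * (p_plus - p_minus)            # descend G: dW_ij proportional to (Pij+ - Pij-)
np.fill_diagonal(dW, 0.0)                # no self-connections
W += dW                                  # p matrices are symmetric, so W stays symmetric
```

The same loop with a different clamp_minus mask would give the input-only-clamped variant from the Physica paper.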
krulwich@giraffe.arpa (Bruce Krulwich) (10/01/87)
> Since the expression for dG/dWij is the same in both cases, the
> definitions of Pij- must be equivalent. The only explanation I could
> think of was that "clamping" the inputs ONLY was the same thing as letting
> the environment have a free run of them, so the case being described is
> the free-running one.

The point is that for any given inputs, learning is done by comparing the desired outputs with the outputs computed by the machine. This is called monitored (supervised) learning, and in this sense it is similar to back-propagation learning. It is used for networks that perform a computation based on some input being clamped on the input units. When the output units are clamped, the P values are something like what they "should" be, so comparing them to the P values obtained with the output units unclamped lets you approximate the error between the units in question and learn from it.

Bruce Krulwich

ARPA: krulwich@yale.arpa                If you're right 95% of the time,
  or  krulwich@cs.yale.edu              why worry about the other 3% ??
Bitnet: krulwich@yalecs.bitnet
UUCP: {harvard, seismo, ihnp4}!yale!krulwich
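[The two conventions differ only in which units are held fixed during the negative phase, which is easy to state as clamping masks. The unit layout below (3 inputs, 2 outputs, 2 hidden) is a hypothetical example, not from either paper.]

```python
import numpy as np

# Hypothetical layout: units 0-2 are inputs, 3-4 outputs, 5-6 hidden.
n_in, n_out, n_hid = 3, 2, 2
n = n_in + n_out + n_hid
is_input = np.arange(n) < n_in
is_visible = np.arange(n) < n_in + n_out

# Positive phase (same in both papers): clamp ALL visible units.
clamp_plus = is_visible.copy()

# PDP chapter 7 negative phase: free-running, nothing clamped.
clamp_minus_pdp = np.zeros(n, dtype=bool)

# Physica 22D negative phase: only the inputs clamped, so the machine
# fills in its own outputs -- the supervised ("monitored") convention.
clamp_minus_physica = is_input.copy()
```

Plugging either negative-phase mask into the same sampling loop gives the corresponding Pij-; the weight-update expression is unchanged.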