dario@techunix.BITNET (Dario Ringach) (11/06/88)
Is it fair to assume a constant probability distribution Px on the space X during the learning process?  I mean, a *good* teacher would draw points of X so as to minimize the error between the current hypothesis and the concept to be learnt, so the distribution Px could change after each sample is presented (i.e., Px(n) is now a stochastic process).  Are these two models equivalent in the sense that they can learn the same classes of concepts?

Has anyone attempted to approach learning as a discrete-time Markov process on the hypothesis space H?  For instance, at any time k let h1=h(k) be the current hypothesis; then for any h2 in H there is a transition probability P(h(k+1)=h2 | h(k)=h1) that depends on the probability distribution Px and the learning algorithm A.
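A minimal sketch of the Markov view described above, under illustrative assumptions not in the post: X is a small finite set, H is a family of threshold hypotheses, and the learning algorithm A is a toy rule that nudges the threshold toward any misclassified sample.  The names (label, step, transition_counts) and the target concept are all hypothetical; the point is only that fixing Px and A induces empirical transition counts P(h(k+1)=h2 | h(k)=h1):

```python
import random

# Hypothetical setup: X = {0,1,2,3}; hypothesis t in H classifies x as 1 iff x >= t.
X = list(range(4))
H = list(range(5))
TARGET = 2          # the concept to be learnt: x >= 2

def label(t, x):
    return int(x >= t)

def step(h, x, y):
    # A toy learning algorithm A: if the current hypothesis misclassifies
    # (x, y), move the threshold one unit toward fixing the mistake.
    if label(h, x) == y:
        return h
    return h + 1 if y == 0 else h - 1

def transition_counts(Px, n_steps=10000, seed=0):
    # Run the chain h(k) -> h(k+1) and tally empirical transition counts,
    # which depend on both the sample distribution Px and the algorithm A.
    rng = random.Random(seed)
    counts = {(h1, h2): 0 for h1 in H for h2 in H}
    h = rng.choice(H)
    for _ in range(n_steps):
        x = rng.choices(X, weights=Px)[0]
        h_next = step(h, x, label(TARGET, x))
        counts[(h, h_next)] += 1
        h = h_next
    return counts
```

Note that in this toy chain the true concept (threshold 2) is an absorbing state: once h(k) agrees with the target on all of X, no sample can move it, so the chain's long-run behavior is dominated by how fast Px drives it there.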
bwk@mitre-bedford.ARPA (Barry W. Kort) (11/08/88)
In article <6083@techunix.BITNET> dario@techunix.BITNET (Dario Ringach) writes:
> Has anyone attempted to approach learning as a discrete-time Markov
> process on the hypothesis space H?  For instance, at any time k let
> h1=h(k) be the current hypothesis; then for any h2 in H there is a
> transition probability P(h(k+1)=h2 | h(k)=h1) that depends on the
> probability distribution Px and the learning algorithm A.

Look into Bayesian inference, Kalman filtering, and Kailath's innovations process.  In each of these approaches, a current best guess is updated as new information comes in.  I believe Widrow's adaptive networks also exhibit such behavior.

--Barry Kort
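The common shape of the approaches mentioned above can be sketched with the simplest case, a scalar Kalman filter estimating a constant from noisy measurements.  The function name and parameters here are illustrative, not from either post; the innovation term (z - x) is exactly the "new information" that updates the current best guess:

```python
def kalman_constant(measurements, r, p0, x0):
    # Scalar Kalman filter for a constant unknown state x.
    # Measurement model: z = x + noise, noise variance r.
    # x0, p0 are the prior estimate and its variance.
    x, p = x0, p0
    for z in measurements:
        k = p / (p + r)        # Kalman gain: trust in the new measurement
        x = x + k * (z - x)    # innovation (z - x) corrects the estimate
        p = (1.0 - k) * p      # uncertainty shrinks with each update
    return x, p
```

Bayesian inference with a Gaussian prior and Gaussian likelihood reduces to this same recursion, and the Widrow-Hoff (LMS) rule has the same error-driven form with the gain replaced by a fixed step size.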