cho@sol4.cs.psu.edu (Sehyeong Cho) (01/31/91)
Hi, netland. Has anyone built a system (or read a paper describing one) that begins with a very rough estimate of conditional probabilities, and then learns (updates) those probabilities from examples?

For instance, suppose I believed P(death|shoot SCUD) = 0.01, and then experienced a few "shoot SCUD" events along with their results (say, 5 SCUDs shot, 1 causing death). The 5 events are too few to conclude P = 0.2, so the estimate must end up somewhere between 0.01 and 0.2. Is there any theory (or heuristics, or psychological evidence) about how to do this? Thanks in advance.
--
             | Yesterday I was a student.
Sehyeong Cho | Today I am a student.
cho@cs.psu.edu | Tomorrow I'll probably still be a student.
             | Sigh.. There's so little hope for advancement.
gblee@maui.cs.ucla.edu (Geunbae Lee) (01/31/91)
I think the following book and Pearl's other works can answer your
question (sorry for the BibTeX format). Pearl's Bayesian networks can
formulate, propagate, and revise beliefs (similar to your conditional
probabilities) according to prior experience. The book also has a good
bibliography for this field.
@Book{pearl:probabilistic,
author = "Judea Pearl",
title = "Probabilistic Reasoning in Intelligent Systems:
Networks of Plausible Inference",
publisher = KAUF,
year = "1988",
address = KAUF-ADDR,
rem = "pearl.88b"
}
--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ Geunbae Lee, Artificial Intelligence Lab, Computer Science Dept, UCLA.     +
+ INTERNET: gblee@cs.ucla.edu, PHONE: 213-825-5199 (office)                  +
+ Sir, AI is the science that makes machines smart, but people dumb!!!       +
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rpg@cs.tulane.edu (Robert Goldman) (01/31/91)
In re Pearl's work on Bayesian networks: Spiegelhalter and Lauritzen have done some work on training these networks in a Bayesian way. The paper I have is titled "Sequential updating of conditional probabilities on directed graphical structures," in Networks, 1990.

Also, Geoff Hinton's Boltzmann machine is, if I remember correctly, a trainable Markov random field. Hidden Markov models are another class of trainable probability models.

I think maybe we need more information about the kinds of probabilities you want to learn. As far as I can tell, almost all of Bayesian estimation falls under the rubric of this question!

R
almond@lisbon.stat.washington.edu (Russell Almond) (02/01/91)
I would also be interested in information on learning probabilities, and I am willing to compile and post a reference list.

Let me recap the problem as I understand it. Sehyeong Cho asks about learning a conditional probability $P(A|B)$. The standard Bayesian model calls $P(A|B)$ some parameter $\theta$. {\it A priori\/}, that is, before any data are available, a probability distribution is used to express our state of knowledge (or ignorance) about the parameter $\theta$. This is called a "prior distribution."

There is some disagreement in the statistical community about which prior distribution correctly represents ignorance in this case. Bayes and Laplace advocated using a uniform distribution, which is also a beta distribution with parameters (1,1). Jeffreys advocates a beta(1/2,1/2). Others have advocated a beta(0,0), which is not really a probability distribution but is the limit of a series of probability distributions. There are other possibilities which are not beta distributions, but they lead to greater complexity.

Assume for the sake of simplicity that we have chosen as our prior distribution a beta distribution with (hyper)parameters $\alpha,\beta$. We then observe $n$ cases in which $B$ occurs, and in $x$ of them $A$ occurs as well. Note that we must make an additional assumption here that our observation is unbiased; that is, we have no reason to believe that if $(A,B)$ occurs we are more likely to have it brought to our attention than if $(\neg A,B)$ occurs. This might not hold in Sehyeong's original example if, say, newspapers were more likely to omit reporting on SCUD launches when no deaths occurred. Assuming the observations are unbiased, we are led to conclude that {\it a posteriori\/} our knowledge about the parameter has a beta distribution with (hyper)parameters $\alpha+x, \beta+n-x$.
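[Moderator's note: the conjugate beta update above can be sketched in a few lines of Python. The prior hyperparameters below are hypothetical, chosen only so the prior mean matches Sehyeong's 0.01 estimate; the total alpha + beta acts as an "equivalent sample size" controlling how fast data move the estimate.]

```python
# Conjugate beta-binomial updating: with a Beta(alpha, beta) prior on
# theta = P(A|B), observing x occurrences of A in n cases of B gives a
# Beta(alpha + x, beta + n - x) posterior.

def update_beta(alpha, beta, x, n):
    """Return posterior hyperparameters after x of n cases show A."""
    return alpha + x, beta + n - x

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical prior with mean 0.2 / (0.2 + 19.8) = 0.01.
alpha0, beta0 = 0.2, 19.8
a1, b1 = update_beta(alpha0, beta0, x=1, n=5)   # 5 SCUDs shot, 1 death
print(beta_mean(a1, b1))   # 1.2 / 25 = 0.048, between 0.01 and 0.2
```

With a weaker prior (a smaller equivalent sample size but the same mean 0.01), the same five observations would pull the estimate further toward 0.2.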
There is a slightly more complex belief function formulation of this problem (based loosely on Fisher's fiducial arguments) which yields, instead of an exact probability distribution for $\theta$, upper and lower bounds for $\theta$ in the form of a "bivariate beta" belief function. This is developed in Dempster [1966] and recapped in Almond [1989, 1991]. The bounds capture the posterior distributions corresponding to all three of the priors cited above. Using these bounds has the advantage of simpler assumptions, but the disadvantages of greater computational complexity and weaker decision-making power.

Robert (Goldman) brings up the next logical question, which is the one I am currently working on. Suppose we have built a probabilistic graphical model of the kind developed in Pearl [1988] or Lauritzen and Spiegelhalter [1988]. We jointly elicit the probabilities $\theta_1=P(A|B)$ and $\theta_2=P(A|\neg B)$, but there is uncertainty about those values. By the Bayesian paradigm, we should express that uncertainty by a (joint) probability distribution over the two parameters. This is the upshot of the 1990 Lauritzen and Spiegelhalter paper which Robert cites.

There are some non-trivial technical problems here of which L&S only scratch the surface. For example, L&S build a number of models which assume the independence of $\theta_1$ and $\theta_2$. This assumption is particularly suspect, even if only made for the sake of convenience. In many graphical models we may know with some certainty that $P(A|B) > P(A|\neg B)$, or vice versa. L&S also note that when they observe incomplete data (that is, observe $A$ but not $B$), $\theta_1$ and $\theta_2$ will be dependent {\it a posteriori\/}, even if they were independent {\it a priori\/}. David Madigan, Jeremy York and I have been noodling around with some alternative models, but we have not yet written anything up.
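[Moderator's note: the incomplete-data point can be checked numerically. In a toy setup (not L&S's own model): put independent uniform priors on $\theta_1$ and $\theta_2$, take $P(B)=1/2$ as known, and observe only that $A$ occurred. The likelihood $(\theta_1+\theta_2)/2$ does not factor, and the posterior covariance on a grid comes out negative, i.e. the parameters are dependent a posteriori.]

```python
# Independent uniform priors on theta1 = P(A|B) and theta2 = P(A|~B),
# with P(B) = 1/2 known.  We observe A but not B, so the likelihood is
# P(A | theta1, theta2) = (theta1 + theta2) / 2, which does not factor
# into a function of theta1 times a function of theta2.
m = 200
grid = [(i + 0.5) / m for i in range(m)]   # midpoint grid on (0, 1)

z = e1 = e2 = e12 = 0.0
for t1 in grid:
    for t2 in grid:
        w = (t1 + t2) / 2      # flat prior, so posterior weight = likelihood
        z += w
        e1 += t1 * w
        e2 += t2 * w
        e12 += t1 * t2 * w

e1, e2, e12 = e1 / z, e2 / z, e12 / z
cov = e12 - e1 * e2
print(cov)   # about -1/144: theta1 and theta2 are dependent a posteriori
```

The exact value is $-1/144$: seeing $A$ alone makes the two explanations compete, so a high $\theta_1$ makes a high $\theta_2$ less necessary.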
I would be eager to talk with anybody else who is working on the problem. --Russell Almond University of Washington, Department of Statistics, GN--22 Seattle, WA 98195 (206) 543-4302 almond@stat.washington.edu
kgo@iesd.auc.dk (Kristian G. Olesen) (02/04/91)
A short version of Spiegelhalter and Lauritzen's work on learning probabilities can be found in: Spiegelhalter and Lauritzen, "Techniques for Bayesian Analysis in Expert Systems," Annals of Mathematics and Artificial Intelligence, 2 (1990) 353-366.

As pointed out by Almond, there are some problems involved with, e.g., incomplete data. At Aalborg University we are currently working on a prototype implementation of the scheme, where uncertainty on conditional probabilities is modelled with Dirichlet distributions. A series of experiments with this prototype aims at clarifying the strengths and limits of the method. We are dealing with learning as well as adaptation of conditional probabilities as cases become known. Currently four factors are being investigated:

1. Significance of the precision of the original conditional probabilities.
2. Observational schemes (incomplete data).
3. Different types of prior distributions.
4. Learning schemes (learning, adaptation).

The results so far seem promising, but it is still too early to conclude. I do, however, feel confident that practically applicable methods will turn up. If successful, the method will be integrated in the HUGIN shell.

Kristian G. Olesen
Aalborg University
Institute of Electronic Systems
Frederik Bajers Vej 7 D
9220 Aalborg Ost
Denmark
Phone +45 98 15 85 22, 4960
E-mail: kgo@iesd.auc.dk
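[Moderator's note: for a variable with more than two states, the Dirichlet plays the same conjugate role that the beta plays above; updating one row of a conditional probability table is just adding counts to hyperparameters. A minimal sketch with hypothetical numbers, not HUGIN code:]

```python
# Conjugate Dirichlet updating for one row of a conditional probability
# table P(X | parents = some fixed configuration), where X has k states.
# The posterior hyperparameters are the prior hyperparameters plus the
# observed counts for that parent configuration.

def update_dirichlet(prior, counts):
    """Add observed state counts to the Dirichlet hyperparameters."""
    return [a + c for a, c in zip(prior, counts)]

def dirichlet_mean(params):
    """Posterior mean estimate of the probability of each state of X."""
    total = sum(params)
    return [a / total for a in params]

prior = [1.0, 1.0, 1.0]            # uniform prior over 3 states (hypothetical)
counts = [8, 1, 1]                 # observed cases for this parent configuration
posterior = update_dirichlet(prior, counts)
print(dirichlet_mean(posterior))   # [9/13, 2/13, 2/13]
```

The size of the prior hyperparameters relative to the counts is what the "precision of the original conditional probabilities" in point 1 above controls.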
kyoden@arpeggio.rl1.isl.mei.co.jp (Tatsuro Kyoden) (02/07/91)
gowj@novavax.UUCP (James Gow) (02/10/91)
I am interested in some references from the MIT Media Lab. Can anyone suggest access to this facility and these references?

Harel, I. (1990) (Ed.). Constructionist Learning: A 5th Anniversary Collection of Papers.
Papert, S. (1990). A Unified Computer Environment for Schools: A Constructionist/Cultural Approach.
Resnick, M. (1990). Logo: A Computational Environment for Exploring Self-Organizing Behavior. Ph.D. Thesis Proposal.

I have been told these items are not available through inter-library loan or other normal channels.

James