colonel%sunybcs@math.waterloo.EDU.UUCP (04/05/87)
The problem of Clyde the elephant brings up one of the biggest controversies in statistics, one which is starting to spill over into A.I. To recapitulate:

  1. 95% of elephants are grey;
  2. 40% of royal elephants are yellow;
  3. Clyde is a royal elephant.

But we know nothing about what percentage of elephants are royal. The distribution could look like this:

         | royal  common
  -------+---------------
  grey   |   15     175
  yellow |   10       0

or like this:

         | royal  common
  -------+---------------
  grey   |    0   95000
  yellow |    2       0
  red    |    3    4995

Can we assign a valid probability to "Clyde is grey" without knowing the likelihood of either distribution (or any other)? One school of thought says no -- the best we can do is follow Boole's suggestion of computing upper and lower bounds for the probability. Other schools, notably that led by A. P. Dempster, say yes. And this topic is philosophical enough to be discussed here in mod.ai!
--
Col. G. L. Sicherman
UU: ...{rocksvax|decvax}!sunybcs!colonel
CS: colonel@buffalo-cs
BI: colonel@sunybcs, csdsiche@ubvms

[SRI is a hotbed of Dempster-Shaferism, so I'll take a chance on clarifying this. Tom Garvey or other readers can correct me if I'm off base.

The Dempster-Shafer (D-S) approach is to track upper and lower bounds for probability. This is controversial in two ways: Dempster's rule for combining contradictory evidence, and the power/appropriateness/usefulness of the interval approach in general. (Conflicting evidence really doesn't enter into the Clyde problem.)

It is the Bayesians who generally assign probabilities, although they don't do it as blindly as their "loyal opposition" would imply -- while underlying uniform or even Gaussian distributions are typically assumed for predictive power under random sampling, Bayesians might choose a "pessimal" a priori distribution to model tricky situations such as this one. They can also do symbolic Bayesian analysis with free parameters in order to derive formulas that are valid for any state of the world.

Fuzzy logicians use a very similar theory, but are likely to assume that the underlying distributions are implied by the manner in which the problem is stated. A fourth group, perhaps led by Tversky and Kahneman, is more interested in the analogy-based reasoning of humans than in optimal decision theory. And others, e.g. Cohen and various expert systems researchers, are willing to consider any type of estimate as long as the justification is given (for use in further reasoning).

Intervals are nice because they make no unwarranted statements. (Disclaimer: the endpoints may themselves be subject to sampling errors. Logic-based methods, including D-S, can be very sensitive to errors in the initial evidence -- as can methods based on tightly constrained a priori distributions.) Upper and lower probabilities are also more informative than single point estimates, and can be interpreted as recording what is unknown as well as what is known. In cases where a parametric distribution is appropriate, however, the parameters of that distribution (or optimal estimates thereof) are the most powerful estimates of the state of the world. Intervals are not convenient for representing true Gaussian distributions, for instance, since the intervals must be infinite in extent. (One might want to use intervals for the mean and standard deviation, though.)
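To make the Clyde bounds concrete, here is a small sketch (modern Python; the table data and the two stated facts come from the posting above, while the helper names are my own illustration). It checks that each table satisfies the constraints and prints P(grey | royal) for each, giving the Boole-style interval [0, 3/5]: since 40% of royals are yellow, at most 60% of royals can be grey, and the second table shows that 0% is attainable.

# Sketch: Boole-style bounds on P(grey | royal) for Clyde.
# The tables are the two from the posting; helper names are hypothetical.
from fractions import Fraction

# Each table maps (color, class) -> count of elephants.
table_1 = {("grey", "royal"): 15, ("grey", "common"): 175,
           ("yellow", "royal"): 10, ("yellow", "common"): 0}
table_2 = {("grey", "royal"): 0, ("grey", "common"): 95000,
           ("yellow", "royal"): 2, ("yellow", "common"): 0,
           ("red", "royal"): 3, ("red", "common"): 4995}

def satisfies_facts(table):
    """Check the two stated facts: 95% of all elephants are grey,
    and 40% of royal elephants are yellow."""
    total = sum(table.values())
    grey = sum(n for (color, _), n in table.items() if color == "grey")
    royal = sum(n for (_, cls), n in table.items() if cls == "royal")
    return (Fraction(grey, total) == Fraction(95, 100) and
            Fraction(table.get(("yellow", "royal"), 0), royal)
                == Fraction(40, 100))

def p_grey_given_royal(table):
    royal = sum(n for (_, cls), n in table.items() if cls == "royal")
    return Fraction(table.get(("grey", "royal"), 0), royal)

for name, table in [("table 1", table_1), ("table 2", table_2)]:
    assert satisfies_facts(table)
    print(name, "P(grey | royal) =", p_grey_given_royal(table))
# Prints 3/5 and 0 -- the interval [0, 3/5].

Exact rationals (Fraction) are used so the 95% and 40% checks are not spoiled by floating-point round-off.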
I tend to believe that all sampled data is Gaussian unless there is evidence to the contrary (either a priori or from examination of the data), partly because that leads to point estimates and distributions thereof that are useful. I would not attempt to impose this assumption on Clyde, however, and there are many situations calling for non-Bayesian reasoning. -- KIL]
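[A minimal sketch of the point-estimate alternative described in the note above, assuming the sample really is Gaussian; the data below are invented for illustration. It reports the fitted mean and standard deviation together with an approximate 95% interval for the mean, i.e., intervals on the parameters rather than on a probability.

import math
import random

random.seed(1)
data = [random.gauss(10.0, 2.0) for _ in range(100)]  # made-up sample

n = len(data)
mean = sum(data) / n
# Unbiased sample standard deviation.
std = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# Approximate 95% interval for the true mean (normal approximation;
# for small n, Student's t would replace 1.96).
half_width = 1.96 * std / math.sqrt(n)
print(f"mean = {mean:.3f} +/- {half_width:.3f}, std = {std:.3f}")
]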