colonel%sunybcs@math.waterloo.EDU.UUCP (04/05/87)
The problem of Clyde the elephant brings up one of the biggest controversies in statistics, one which is starting to spill over into A.I. To recapitulate:

  1. 95% of elephants are grey;
  2. 40% of royal elephants are yellow;
  3. Clyde is a royal elephant.

But we know nothing about what percentage of elephants are royal. The distribution could look like this:

         | royal  common
  -------+---------------
  grey   |   15     175
  yellow |   10       0

or like this:

         | royal  common
  -------+---------------
  grey   |    0   95000
  yellow |    2       0
  red    |    3    4995

Can we assign a valid probability to "Clyde is grey" without knowing the likelihood of either distribution (or any other)? One school of thought says no -- the best we can do is follow Boole's suggestion of computing upper and lower bounds for the probability. Other schools, notably that led by A. P. Dempster, say yes. And this topic is philosophical enough to be discussed here in mod.ai!
--
Col. G. L. Sicherman
UU: ...{rocksvax|decvax}!sunybcs!colonel
CS: colonel@buffalo-cs
BI: colonel@sunybcs, csdsiche@ubvms

[SRI is a hotbed of Dempster-Shaferism, so I'll take a chance on clarifying this. Tom Garvey or other readers can correct me if I'm off base.

The Dempster-Shafer (D-S) approach is to track upper and lower bounds for probability. This is controversial in two ways: Dempster's rule for combining contradictory evidence, and the power/appropriateness/usefulness of the interval approach in general. (Conflicting evidence really doesn't enter into the Clyde problem.)

It is the Bayesians who generally assign probabilities, although they don't do it as blindly as their "loyal opposition" would imply -- while underlying uniform or even Gaussian distributions are typically assumed for predictive power under random sampling, Bayesians might choose a "pessimal" a priori distribution to model tricky situations such as this one. They can also do symbolic Bayesian analysis with free parameters in order to derive formulas that are valid for any state of the world.

Fuzzy logicians use a very similar theory, but are likely to assume that the underlying distributions are implied by the manner in which the problem is stated. A fourth group, perhaps led by Tversky and Kahneman, is more interested in the analogy-based reasoning of humans than in optimal decision theory. And others, e.g. Cohen and various expert systems researchers, are willing to consider any type of estimate as long as the justification is given (for use in further reasoning).

Intervals are nice because they make no unwarranted statements. (Disclaimer: the endpoints may themselves be subject to sampling errors. Logic-based methods, including D-S, can be very sensitive to errors in the initial evidence -- as can methods based on tightly constrained a priori distributions.) Upper and lower probabilities are also more informative than single point estimates, and can be interpreted as recording what is unknown as well as what is known. In cases where a parametric distribution is appropriate, however, the parameters of that distribution (or optimal estimates thereof) are the most powerful estimates of the state of the world. Intervals are not convenient for representing true Gaussian distributions, for instance, since the intervals must be infinite in extent. (One might want to use intervals for the mean and standard deviation, though.)
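To make the Clyde bounds concrete, here is a small sketch (modern Python; the table data and the two stated facts come from the posting above, while the helper names are my own illustration). It checks that each table satisfies the constraints and prints P(grey | royal) for each, giving the Boole-style interval [0, 3/5]: since 40% of royals are yellow, at most 60% of royals can be grey, and the second table shows that 0% is attainable.

# Sketch: Boole-style bounds on P(grey | royal) for Clyde.
# The tables are the two from the posting; helper names are hypothetical.
from fractions import Fraction

# Each table maps (color, class) -> count of elephants.
table_1 = {("grey", "royal"): 15, ("grey", "common"): 175,
           ("yellow", "royal"): 10, ("yellow", "common"): 0}
table_2 = {("grey", "royal"): 0, ("grey", "common"): 95000,
           ("yellow", "royal"): 2, ("yellow", "common"): 0,
           ("red", "royal"): 3, ("red", "common"): 4995}

def satisfies_facts(table):
    """Check the two stated facts: 95% of all elephants are grey,
    and 40% of royal elephants are yellow."""
    total = sum(table.values())
    grey = sum(n for (color, _), n in table.items() if color == "grey")
    royal = sum(n for (_, cls), n in table.items() if cls == "royal")
    return (Fraction(grey, total) == Fraction(95, 100) and
            Fraction(table.get(("yellow", "royal"), 0), royal)
                == Fraction(40, 100))

def p_grey_given_royal(table):
    royal = sum(n for (_, cls), n in table.items() if cls == "royal")
    return Fraction(table.get(("grey", "royal"), 0), royal)

for name, table in [("table 1", table_1), ("table 2", table_2)]:
    assert satisfies_facts(table)
    print(name, "P(grey | royal) =", p_grey_given_royal(table))
# Prints 3/5 and 0 -- the interval [0, 3/5].

Exact rationals (Fraction) are used so the 95% and 40% checks are not spoiled by floating-point round-off.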
I tend to believe that all sampled data is Gaussian unless there is evidence to the contrary (either a priori or from examination of the data), partly because that leads to point estimates and distributions thereof that are useful. I would not attempt to impose this assumption on Clyde, however, and there are many situations calling for non-Bayesian reasoning. -- KIL]
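[A minimal sketch of the point-estimate alternative described in the note above, assuming the sample really is Gaussian; the data below are invented for illustration. It reports the fitted mean and standard deviation together with an approximate 95% interval for the mean, i.e., intervals on the parameters rather than on a probability.

import math
import random

random.seed(1)
data = [random.gauss(10.0, 2.0) for _ in range(100)]  # made-up sample

n = len(data)
mean = sum(data) / n
# Unbiased sample standard deviation.
std = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# Approximate 95% interval for the true mean (normal approximation;
# for small n, Student's t would replace 1.96).
half_width = 1.96 * std / math.sqrt(n)
print(f"mean = {mean:.3f} +/- {half_width:.3f}, std = {std:.3f}")
]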