[comp.ai.digest] On the D/S Theory of Evidence

hummel@RELAXATION.NYU.EDU (Bob Hummel) (03/31/88)

     Reading the renewed discussion on the meaning of  various  calculi  of
uncertainty  reasoning  prompts  me to inject a note on the Dempster/Shafer
formalism.  I will summarize a few observations here, but direct interested
readers  to  a joint paper with Michael Landy, which appeared this month in
IEEE Pattern Analysis and Machine Intelligence [1].

     These observations were inspired, oddly enough, by reading  Dempster's
original  paper,  in which he introduces the now-famous combination formula
[2].  It seems logical that the original source should contain the  motiva-
tion  and  interpretation  of  the  formula.  But  what  is odd is that the
interpretation migrated over the years, and that the clear, logical founda-
tions  became obscure.  There have been numerous attempts to reconstruct an
explanation of the true meaning of the belief values and the  normalization
terms  and  the  combination  method  in the Dempster/Shafer work.  Some of
these attempts succeed reasonably well.  Along these  lines,  I  think  the
best  work  is  represented  by Kyberg's treatment in terms of extrema over
collections of opinions [3], and Ruspini's work connecting  Dempster/Shafer
formalism to Bayesian analysis [4].  Shafer also constructs what he calls a
"canonical example" which is supposed to be  used  as  a  scale  to  invoke
degrees  of  belief into general situations, based on "coded messages." The
idea, described for example in [5], is isomorphic to the observations  made
here and in [1] based on the foundations laid by Dempster [2].  The problem
is that none of these interpretations lead to generalizations  and  explain
the precise intent of the original formulation.

     Before giving the succinct interpretation, which, it turns out,  is  a
statistical  formulation,  I should comment briefly on the compatibility of
the various interpretations.  When lecturing on the  topic,  I  have  often
encountered  the  attitude  that  the  statistical  viewpoint is simply one
interpretation, limited in scope, and not very  helpful.   The  feeling  is
that  we  should  be willing to view the constructs in some sort of general
way, so as to be able to map the formalism onto more general  applications.
Here, I believe, is one source of the stridency:  that if I would only per-
mit myself to view certain values as subjective degrees of belief in  anal-
ogy  with  some  mystical  frame  of reference, then I will see why certain
arguments make perfectly logical sense.  Accordingly, our treatment of  the
statistical  viewpoint  is  introduced in the framework of algebraic struc-
tures, and our results are based  on  proving  an  isomorphism  between  an
easily  interpreted  algebraic  structure  and the structure induced by the
Dempster rule of combination acting on states of belief.  So when I use the
terms  "experts"  and  "opinions"  and  related  terms  below, an alternate
interpretation might easily use different concepts.  However, any interpre-
tation that truly captures the Dempster/Shafer calculus must necessarily be
isomorphic, under some mapping identifying corresponding concepts,  to  the
interpretation given here.


     Here is the  formulation.   Consider  a  frame  of  discernment,  here
denoted  S.   Instead  of giving a probability distribution over S, we con-
sider a collection of experts, say E, where each expert e  in  E  gives  an
opinion.   The opinions are boolean, which is to say that expert e declares
which labels in S are possible, and which are ruled out.

     For the combination  formula,  suppose  we  have  two  collections  of
experts,  E1  and E2.  Each expert in E1 and each expert in E2 expresses an
opinion, in a boolean fashion, over the labels in S.  (An  important  point
is  that  the  frame  of  discernment  S is the same for all collections of
experts).  We now wish to combine the two collections of experts.  We  con-
sider the cross product E1 X E2, which is the set of all committees of two,
with a pair of experts comprised of one expert from E1 and one expert  from
E2.  For any such committee, say (e1,e2), we define the committee's boolean
opinion to be the logical intersection of the two composing opinions.  Thus
the  committee says that a label is possible only if both committee members
say that the label is possible.  We regard the collection of all such  com-
mittees  and their opinions to be a new collection of experts E, with their
boolean opinions.

     We have defined an algebraic structure.   Call  it  the  the  "Boolean
opinions  of  Experts."  The elements of this space consist of pairs (E,f),
where E is a collection of experts, and f is their opinions,  formed  as  a
map  from  E  to a vector of boolean statements about the labels in S.  Now
define an equivalence relation.  We will say that  two  such  elements  are
equivalent  if  the statistics over the collections of experts, among those
experts giving at least one possibility, are the same.  By the  statistics,
we  mean  the  following.  Let E' be the subset of experts in E for whom at
least one label is possible.  For any given subset A of S, let m(A) be  the
percentage  of  experts  in  E' that designate precisely A as the subset of
possible labels.  Note that m of the empty set is 0, by the  definition  of
E', and that m forms a probability distribution over the power set of S.

     It turns out that m is a mass function, used to define a belief  state
on  S.   Further, when sets of experts combine, the statistics, represented
by the corresponding m functions, combine in exactly the Dempster  rule  of
combination.  (This is no accident.  This is the way Dempster defined it.)

     Accordingly, the set of equivalence classes in the space  of  "Boolean
opinions  of  Experts"  is  isomorphic  to  the  Dempster/Shafer formalism,
represented as a space of belief states formed by mass distributions m.


     Some people express disappointment in the Dempster/Shafer theory, when
it  is viewed this way.  For example, it should be noted that no where does
the theory make use of probabilities of the labels.  The  theory  makes  no
distinction  between  an expert's opinion that a label is likely or that it
is remotely possible.  This is despite the fact that the belief values seem
to give weighted results.  Instead, the belief in a particular subset A, it
can be shown, corresponds to the fraction of experts in E' who  state  that
every  label outside of A is impossible.  The weighted values come about by
maintaining multiple boolean opinions, instead of one single weighted opin-
ion.

     In the PAMI paper, Mike Landy and I suggest an extension [1], where we
track  the  statistics  of probabilistic opinions.  In this formulation, we
track the mean and covariance  of  the  log's  of  probabilistic  opinions.
Details are in Section 5 of the paper.

     In a follow-on paper [6], presented at the 1987 IJCAI, Larry  Manevitz
and I extend the formulation to weaken the necessary notion of independence
of information.  It is always true that  some  independence  assumption  is
necessary.   Larry  and  I  defined  a one-parameter measure of a degree of
dependence, and show how the formulas tracking means  and  covariances  are
transformed.  We also consider a case where we combine bodies of experts by
union, as opposed to cross product.

     To those who study these extensions, it will  become  clear  that  the
formulas bear some resemblance to treatments of uncertainty based on Kalman
filtering.  For specific applications involving the observation of data and
the  estimation  of parameters, the Kalman theory is certainly to be recom-
mended if it can be applied.
						Robert Hummel
						Courant Institute
						New York University

References

1.   Hummel, Robert A. and Michael S. Landy, "A  statistical  viewpoint  on
     the  theory  of  evidence,"  IEEE Transactions on Pattern Analysis and
     Machine Intelligence,  pp. 235-247 (1988).

2.   Dempster, A. P., "Upper and lower  probabilities  induced  by  a  mul-
     tivalued  mapping," Annals of Mathematical Statistics Vol. 38 pp. 325-
     339 (1967).

3.   Kyburg, Jr., Henry E., "Bayesian and  non-bayesian  evidential  updat-
     ing," University of Rochester Dept. of Computer Science Tech. Rep. 139
     (July, 1985).

4.   Ruspini, E., Proceedings of  the  International  Joint  Conference  on
     Artificial Intelligence, (August, 1987).  Also SRI Technical Note 408.

5.   Shafer, Glenn, "Belief functions and parametric  models,"  Journal  of
     the Royal Statistical Society B Vol. 44 pp. 322-352 (1982).  (Includes
     commentaries).

6.   Hummel, Robert and Larry  Manevitz,  "Combining  bodies  of  dependent
     information,"  Tenth  International  Joint  Conference  on  Artificial
     Intelligence, (August, 1987).