hummel@RELAXATION.NYU.EDU (Bob Hummel) (03/31/88)
Reading the renewed discussion on the meaning of various calculi of uncertainty reasoning prompts me to inject a note on the Dempster/Shafer formalism. I will summarize a few observations here, but direct interested readers to a joint paper with Michael Landy, which appeared this month in IEEE Pattern Analysis and Machine Intelligence [1]. These observations were inspired, oddly enough, by reading Dempster's original paper, in which he introduces the now-famous combination formula [2]. It seems logical that the original source should contain the motiva- tion and interpretation of the formula. But what is odd is that the interpretation migrated over the years, and that the clear, logical founda- tions became obscure. There have been numerous attempts to reconstruct an explanation of the true meaning of the belief values and the normalization terms and the combination method in the Dempster/Shafer work. Some of these attempts succeed reasonably well. Along these lines, I think the best work is represented by Kyberg's treatment in terms of extrema over collections of opinions [3], and Ruspini's work connecting Dempster/Shafer formalism to Bayesian analysis [4]. Shafer also constructs what he calls a "canonical example" which is supposed to be used as a scale to invoke degrees of belief into general situations, based on "coded messages." The idea, described for example in [5], is isomorphic to the observations made here and in [1] based on the foundations laid by Dempster [2]. The problem is that none of these interpretations lead to generalizations and explain the precise intent of the original formulation. Before giving the succinct interpretation, which, it turns out, is a statistical formulation, I should comment briefly on the compatibility of the various interpretations. When lecturing on the topic, I have often encountered the attitude that the statistical viewpoint is simply one interpretation, limited in scope, and not very helpful. The feeling is that we should be willing to view the constructs in some sort of general way, so as to be able to map the formalism onto more general applications. Here, I believe, is one source of the stridency: that if I would only per- mit myself to view certain values as subjective degrees of belief in anal- ogy with some mystical frame of reference, then I will see why certain arguments make perfectly logical sense. Accordingly, our treatment of the statistical viewpoint is introduced in the framework of algebraic struc- tures, and our results are based on proving an isomorphism between an easily interpreted algebraic structure and the structure induced by the Dempster rule of combination acting on states of belief. So when I use the terms "experts" and "opinions" and related terms below, an alternate interpretation might easily use different concepts. However, any interpre- tation that truly captures the Dempster/Shafer calculus must necessarily be isomorphic, under some mapping identifying corresponding concepts, to the interpretation given here. Here is the formulation. Consider a frame of discernment, here denoted S. Instead of giving a probability distribution over S, we con- sider a collection of experts, say E, where each expert e in E gives an opinion. The opinions are boolean, which is to say that expert e declares which labels in S are possible, and which are ruled out. For the combination formula, suppose we have two collections of experts, E1 and E2. Each expert in E1 and each expert in E2 expresses an opinion, in a boolean fashion, over the labels in S. (An important point is that the frame of discernment S is the same for all collections of experts). We now wish to combine the two collections of experts. We con- sider the cross product E1 X E2, which is the set of all committees of two, with a pair of experts comprised of one expert from E1 and one expert from E2. For any such committee, say (e1,e2), we define the committee's boolean opinion to be the logical intersection of the two composing opinions. Thus the committee says that a label is possible only if both committee members say that the label is possible. We regard the collection of all such com- mittees and their opinions to be a new collection of experts E, with their boolean opinions. We have defined an algebraic structure. Call it the the "Boolean opinions of Experts." The elements of this space consist of pairs (E,f), where E is a collection of experts, and f is their opinions, formed as a map from E to a vector of boolean statements about the labels in S. Now define an equivalence relation. We will say that two such elements are equivalent if the statistics over the collections of experts, among those experts giving at least one possibility, are the same. By the statistics, we mean the following. Let E' be the subset of experts in E for whom at least one label is possible. For any given subset A of S, let m(A) be the percentage of experts in E' that designate precisely A as the subset of possible labels. Note that m of the empty set is 0, by the definition of E', and that m forms a probability distribution over the power set of S. It turns out that m is a mass function, used to define a belief state on S. Further, when sets of experts combine, the statistics, represented by the corresponding m functions, combine in exactly the Dempster rule of combination. (This is no accident. This is the way Dempster defined it.) Accordingly, the set of equivalence classes in the space of "Boolean opinions of Experts" is isomorphic to the Dempster/Shafer formalism, represented as a space of belief states formed by mass distributions m. Some people express disappointment in the Dempster/Shafer theory, when it is viewed this way. For example, it should be noted that no where does the theory make use of probabilities of the labels. The theory makes no distinction between an expert's opinion that a label is likely or that it is remotely possible. This is despite the fact that the belief values seem to give weighted results. Instead, the belief in a particular subset A, it can be shown, corresponds to the fraction of experts in E' who state that every label outside of A is impossible. The weighted values come about by maintaining multiple boolean opinions, instead of one single weighted opin- ion. In the PAMI paper, Mike Landy and I suggest an extension [1], where we track the statistics of probabilistic opinions. In this formulation, we track the mean and covariance of the log's of probabilistic opinions. Details are in Section 5 of the paper. In a follow-on paper [6], presented at the 1987 IJCAI, Larry Manevitz and I extend the formulation to weaken the necessary notion of independence of information. It is always true that some independence assumption is necessary. Larry and I defined a one-parameter measure of a degree of dependence, and show how the formulas tracking means and covariances are transformed. We also consider a case where we combine bodies of experts by union, as opposed to cross product. To those who study these extensions, it will become clear that the formulas bear some resemblance to treatments of uncertainty based on Kalman filtering. For specific applications involving the observation of data and the estimation of parameters, the Kalman theory is certainly to be recom- mended if it can be applied. Robert Hummel Courant Institute New York University References 1. Hummel, Robert A. and Michael S. Landy, "A statistical viewpoint on the theory of evidence," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 235-247 (1988). 2. Dempster, A. P., "Upper and lower probabilities induced by a mul- tivalued mapping," Annals of Mathematical Statistics Vol. 38 pp. 325- 339 (1967). 3. Kyburg, Jr., Henry E., "Bayesian and non-bayesian evidential updat- ing," University of Rochester Dept. of Computer Science Tech. Rep. 139 (July, 1985). 4. Ruspini, E., Proceedings of the International Joint Conference on Artificial Intelligence, (August, 1987). Also SRI Technical Note 408. 5. Shafer, Glenn, "Belief functions and parametric models," Journal of the Royal Statistical Society B Vol. 44 pp. 322-352 (1982). (Includes commentaries). 6. Hummel, Robert and Larry Manevitz, "Combining bodies of dependent information," Tenth International Joint Conference on Artificial Intelligence, (August, 1987).