Peeling algorithm on Zadeh’s Example

Claude Boivin1

2022-03-03

Summary

The criticism of Dempster-Shafer Theory (DST) by L. A. Zadeh 2 has generated a lot of discussions and articles on the subject of conflicting evidence. In 2005, R. Haenni 3 showed that the surprising result obtained by applying Dempster’s rule to Zadeh’s example was more due to the modelling of the situation than Dempster’s rule not working. I add my grain of salt to the debate on Zadeh’s example by showing a formulation of the problem as a small belief network and using Dempster’s rule of combination to obtain a realistic result. This belief network gives the same results as the combination of the two evidences with the disjunctive rule of combination 4. At the same time, I show how to do the calculations using my R package dst5.

Zadeh’s Example

We suppose that a patient is examined by two doctors, A and B. A’s diagnosis is that P has either meningitis (M), with probability 0.99, or brain tumor (T), with probability 0.01. B agrees with A that the probability of a brain tumor is 0.01, but believes that it is the probability of concussion (C) rather than meningitis that is 0.99.

Zadeh considers the same space of diseases {M, T, C} for the two experts. Hence, after the combination of the two pieces of evidence by Dempster’s rule, we find as a result that the belief of a brain tumor is certain.

Space of possibilities and Basic Chance Assignment of Expert 1
$Diagnosis1
[1] "M" "T" "C"
  Expert1 specnb mass
1       M      1 0.99
2       T      2 0.01
Space of possibilities and Basic Chance Assignment of Expert 2
$Diagnosis2
[1] "M" "T" "C"
  Expert2 specnb mass
1       T      1 0.01
2       C      2 0.99
Combination of the two experts by Dempster's rule
      M T C mass bel disbel      unc     plau    rplau
M     1 0 0    0   0      1 -1.1e-13 -1.1e-13 -1.1e-13
T     0 1 0    1   1      0 -1.1e-13  1.0e+00 -9.1e+12
C     0 0 1    0   0      1 -1.1e-13 -1.1e-13 -1.1e-13
frame 1 1 1    0   1      0 -1.1e-13  1.0e+00 -9.1e+12

Indeed, the result does not reflect the opinions of the two experts. Before rejecting Dempster’s rule as inappropriate to this situation, let’s look more closely at the problem at hand.

A correct solution using Dempster’s rule of combination

Diagnosis of the two experts

Let’s take Expert one. Expert number one distributes the whole mass between the two singletons {M} and {T}. Expert number one does not consider {C} as a possibility. Hence we conclude that the space of possibilities of Expert number one cannot be {M, T, C}. We can say that Expert number one has restricted the space of possibilities of his/her diagnostic to the set {M, T}:

\(F(D1) = \{M, T\}\). For simplicity, we write \(D1 =\{M, T\}\).

The same line of reasoning is applied to Expert number two. The whole mass of one is allotted to the set {T, C}, and the third possibility (M) is not considered at all. Hence \(D2 = \{T, C\}\).

I show the coding of these two pieces of evidence with the function bca of the package dst.

Space of possibilities and Basic Chance Assignment of Expert 1
$D1
[1] "M" "T"
  e1 specnb mass
1  M      1 0.99
2  T      2 0.01
Space of possibilities and Basic Chance Assignment of Expert 2
$D2
[1] "C" "T"
  e2 specnb mass
1  C      1 0.99
2  T      2 0.01

Linking the Experts to… the Patient

The two experts are reasoning in two different spaces of possibilities. To be able to combine their diagnosis, we need a common ground. This can be done if we introduce a third person, the patient, with a variable of interest, his/her disease (D). Then it is natural to take the union of the space of possibilities of Expert number one and Expert number two as the space of possibilities of the patient:

\(D = \{M, T\} \cup \{C, T\} = \{M, T, C\}\).

Thus, the diagnosis of the patient’s disease involves pooling the assessments of the two experts, using the “or” operator. This situation is described by a relation of implication between experts and patient:

r1: \(D1 \cup D2 \rightarrow D\).

The relation r1 is represented in the product space \(\prod(D1, D2, D)\) by one focal set of mass one:

\(m(M C M + M C C + M T M + M T T + T C T + T C C + T T T) = 1\) (for simplicity, the “+” sign is used as the \(\vee\) disjunctive operator in the functions of the package dst). Now I use the function bcaRel to code this relation.

 The relation r1
                                                     r1 specnb mass
1 M C M + M C C + M T M + M T T + T C T + T C C + T T T      1    1

The belief network

We now have all the elements of a small network made of one relation (r1) between three variables: Disease (D), Diagnosis1 (D1), Diagnosis2 (D2), and two pieces of evidence coming from Expert one (e1) and Expert two (e2).

The three variables D, D1 and D2 are the nodes of the graph. The edges (hyperedges) are given by the relation r1 and the two pieces of evidence e1 and e2.

Using the igraph package, 6 a bipartite graph corresponding to the hypergraph can be obtained.

Calculations on the Belief Network: The peeling algorithm

Our goal is the calculation of the belief function of the variable of interest “D” (Disease of the patient). We apply an algorithm called “Peeling” to the belief network. This is a process of successive elimination of variables (peeling) until only the variable of interest (D here) remains. The elimination of a variable has the effect of integrating its contribution to the reduced graph.

Four parameters are necessary to trigger the algorithm. The first three are already defined when constructing the hypergraph. The fourth parameter is an order of elimination of the variables that we have to set.

  1. Identification of variables and their space of possibilities \[F(D1) = \{M, T\}\] \[F(D2) = \{M, C\}\] \[F(D) = \{M, T, C\}\]

  2. Incidence matrix of the graph (nodes and edges)

Row names are variables names (nodes).
Column names are for pieces of evidence and relations (edges).
   e1 e2 r1
D1  1  0  1
D2  0  1  1
D   0  0  1
  1. The names of data specifications (evidence and relations between variables)
[1] "e1" "e2" "r1"
  1. Variable numbers are used to fix the order of elimination. Here we eliminate D1 first, then D2.
   varnb size V3
D1     1    2 D1
D2     2    2 D2
D      3    3  D

The calculations involved follow the principles of the valuation language of Shenoy 7; see also 8. The variables are linked to functions (called valuations). A function can be a piece of evidence attached to a variable or a relation between two or more variables.

Three kinds of operations are involved in the process of variable elimination: a) the minimal (vacuous) extension of a mass function to a larger space of possibilities; b) the combination of two mass functions by Dempster’s rule; c) the marginalization of a mass function, i.e. eliminating a variable to reduce the function to a smaller space of possibilities. Let’s do it.

First step: Eliminate variable D1 (Diagnosis1). The mass function e1 is extended to the space \(\prod(D1, D2, D)\); then e1 extended is combined with r1 by Dempster’s rule; finally, D1 is eliminated by marginalizing the result of the combination to \(\prod(D2, D)\). The mass function obtained is named rel_2.

Second step: Eliminate variable D2 (Diagnosis2). Evidence e2 is extended to the space \(\prod(D2, D)\); Then e2 extended is combined with rel_2 by Dempster’s rule; the result of the combination is marginalized to D to produce the final result.

The result

i = : 1 . Eliminating variable no  1 : D1 
rels numbers to elim 1 3 
i = : 2 . Eliminating variable no  2 : D2 
rels numbers to elim 2 4 
Peeling ended 
      M T C   mass    bel disbel  unc plau rplau
T     0 1 0 0.0001 0.0001 0.9801 0.02 0.02  0.02
M     1 0 0 0.0000 0.0000 0.0100 0.99 0.99  0.99
C     0 0 1 0.0000 0.0000 0.0100 0.99 0.99  0.99
T + C 0 1 1 0.0099 0.0100 0.0000 0.99 1.00  1.01
M + C 1 0 1 0.9801 0.9801 0.0001 0.02 1.00 50.25
M + T 1 1 0 0.0099 0.0100 0.0000 0.99 1.00  1.01
frame 1 1 1 0.0000 1.0000 0.0000 0.00 1.00   Inf

The plausibility ratio column shows that the odds of \(M \vee C\) against T are 50:1. hence, the disease of the patient must be M or C. We also see that each single hypothesis, M and C, remains highly plausible (0.99). Although there is some support for T, its plausibility is very weak at 0.019.

Finally, we use the plausibility transformation 9 to look at the results from the point of view of probability distribution. We see again that the odds for M against T or for C against T are very similar.

  M T C  trplau
M 1 0 0 0.49502
T 0 1 0 0.00995
C 0 0 1 0.49502

  1. Retired Statistician, Stat.ASSQ↩︎

  2. L. A. Zadeh. A mathematical theory of evidence (book review). AI Magazine, 55(81—83), 1984↩︎

  3. R. Haenni. Shedding New Light on Zadeh’s Criticism of Dempster’s Rule of Combination. Conference: Information Fusion, 2005 8th International Conference↩︎

  4. P. Smets (1993). Belief Functions: The Disjunctive Rule of Combination and the Generalized Bayesian Theorem. IRIDIA - Université Libre de Bruxelles, Brussels, Belgium↩︎

  5. https://cran.r-project.org/package=dst↩︎

  6. Csardi G, Nepusz T: The igraph software package for complex network research, InterJournal, Complex Systems 1695. 2006. https://igraph.org↩︎

  7. P. P. Shenoy. A Valuation-Based Language for Expert systems. International Journal of Approximate Reasoning 1989, 3 383–411↩︎

  8. P. P. Shenoy. Valuation-Based Systems. Third School on Belief Functions and Their Applications, Stella Plage, France. September 30, 2015↩︎

  9. Cobb, B. R. and Shenoy, P.P. (2006). On the plausibility transformation method for translating belief function models to probability models. Journal of Approximate Reasoning, 41(3), April 2006, 314–330↩︎

mirror server hosted at Truenetwork, Russian Federation.