When do Numbers Really Matter?

Abstract:Common wisdom has it that small distinctions in the probabilities (parameters) quantifying a belief network do not matter much for the results of probabilistic queries. Yet, one can develop realistic scenarios under which small variations in network parameters can lead to significant changes in computed queries. A pending theoretical question is then to analytically characterize parameter changes that do or do not matter. In this paper, we study the sensitivity of probabilistic queries to changes in network parameters and prove some tight bounds on the impact that such parameters can have on queries. Our analytic results pinpoint some interesting situations under which parameter changes do or do not matter. These results are important for knowledge engineers as they help them identify influential network parameters. They also help explain some of the previous experimental results and observations with regards to network robustness against parameter changes.Extracted Text

Journal of Artificial Intelligence Research 17 (2002) 265-287 Submitted 10/01;
published 9/02
When do Numbers Really Matter? Hei Chan hei@cs.ucla.edu Adnan Darwiche darwiche@cs.ucla.edu
Computer Science Department University of California, Los Angeles Los Angeles,
CA 90095, USA
Abstract Common wisdom has it that small distinctions in the probabilities
(parameters) quan-tifying a belief network do not matter much for the results
of probabilistic queries. Yet,
one can develop realistic scenarios under which small variations in network
parameters canlead to significant changes in computed queries. A pending
theoretical question is then to analytically characterize parameter changes
that do or do not matter. In this paper, westudy the sensitivity of probabilistic
queries to changes in network parameters and prove some tight bounds on the
impact that such parameters can have on queries. Our analyticresults pinpoint
some interesting situations under which parameter changes do or do not matter.
These results are important for knowledge engineers as they help them identifyinfluential
network parameters. They also help explain some of the previous experimental
results and observations with regards to network robustness against parameter
changes.
1. Introduction A belief network is a compact representation of a probability
distribution (Pearl, 1988;Jensen, 2001). It consists of two parts, one qualitative
and the other quantitative. The
qualitative part of a belief network (called its structure) is a directed
acyclic graph inwhich nodes represent domain variables and edges represent
direct influences between these variables. The quantitative part of a belief
network is a set of conditional probability tables(CPTs) that quantify our
beliefs in such influences. Figure 1 depicts the structure of a belief network
and Figure 2 depicts its CPTs.1
Automated reasoning systems based on belief networks have become quite popular
re-cently as they have enjoyed much success in a number of real-world applications.
Central to
the development of such systems is the construction of a belief network (hence,
a probabil-ity distribution) that faithfully represents the domain of interest.
Although the automatic synthesis of belief networks--based on design information
in certain applications and basedon learning techniques in others--has been
drawing a lot of attention recently, mainstream methods for constructing
such networks continue to be based on traditional knowledge en-gineering
(KE) sessions involving domain experts. One of the central issues that arise
in such KE sessions is the assessment of impact that changes in network parameters
may haveon probabilistic queries of interest.
Consider for example the following common method for constructing belief
networks inmedical diagnosis applications (Coup'e, Peek, Ottenkamp, & Habbema,
1999). First, the
1. This specific network and its CPTs are distributed with the evaluation
version of the commercial HUGIN
system at http://www.hugin.com/.
cr^2002 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.
Chan & Darwiche
Fire Smoke
Tampering
Leaving
Alarm
Report
Figure 1: A belief network structure. Fire t,x|u
true .01 false .99
Fire Smoke t,x|u
true true .9 true false .1 false true .01 false false .99
Alarm Leaving t,x|u
true true .88 true false .12 false true .001 false false .999
Tampering t,x|u
true .02 false .98
Fire Tampering Alarm t,x|u
true true true .5 true true false .5 true false true .99 true false false
.01 false true true .85 false true false .15 false false true .0001 false
false false .9999
Leaving Report t,x|u
true true .75 true false .25 false true .01 false false .99
Figure 2: The CPTs of the belief network shown in Figure 1.
network structure is developed. Next, parameters are estimated by non-experts
using acombination of statistical data and qualitative influences available
from textbook materials. Finally, medical experts are brought in to evaluate
the network and fine-tune its parameters.One method of evaluation is to pose
diagnostic scenarios to the network, and compare the results of such queries
to those expected by the experts. For example, given some set ofsymptoms
e, and two potential diagnoses y and z, the network may give us the conclusionthat
Pr ( y | e)/Pr (z | e) = 2, while a domain expert may believe that the ratio
shouldbe no less than 4. Assuming that the network structure is correct,
a central question is
then: which network parameters should be changed to give us the correct ratio,
and by howmuch?
To automate the task of identifying such parameter changes, we have recently
devel-oped a belief network tool, called
SamIam (Sensitivity Analysis, Modelling, Inference And
266
When do Numbers Really Matter? Figure 3: A screen shot of SamIam performing
sensitivity analysis on the belief networkshown in Figure 1. More)2. One
of its feature is sensitivity analysis, which allows domain experts to fine-tunenetwork
parameters in order to enforce constraints on the results of certain queries.
Users can specify the constraint that they want to enforce, and SamIam will
automatically de-cide whether a given parameter is relevant to this constraint,
and if it is, will compute the minimum amount of change to that parameter
which is needed to enforce the constraint.The technical details of our approach
to sensitivity analysis are the subject of Section 2.
As we experimented with SamIam, we ran into scenarios that we found to be
surprisingat first glance. Specifically, there were many occasions in which
queries would be quite sensitive to small variations in certain network parameters.
Consider the scenario in Figure 3for one example, which corresponds to the
network detailed in Figures 1 and 2. Here, we have evidence e = report, smoke:
people are reported to be evacuating a building, but thereis no evidence
for any smoke. This evidence should make tampering more likely than fire,
and the given belief network does indeed reflect this with Pr (tampering
| e) = .50 andPr (fire |
e) = .03. We wanted, however, the probability of tampering to be no less
than .65.Hence, we asked
SamIam to identify parameter changes that can enforce this constraint,and
it made two recommendations:
1. either decrease the probability of a false report, Pr (report | leaving),
from its currentvalue of
.01 to <= .0047,
2. SamIam is developed by the UCLA Automated Reasoning Group. Its web page
is at
http://reasoning.cs.ucla.edu/.
267
Chan & Darwiche 2. or increase the prior probability of tampering from its
current value of .02 to >= .036. Therefore, the distinctions between .02
and .036, and the one between .01 and .0047, doreally matter in this case
as each induces an absolute change of
.15 on the probabilistic queryof interest. Note also that implicit in SamIam's
recommendations is that the parametersof variables Fire, Smoke, Leaving,
and Alarm are irrelevant to enforcing this constraint, i.e.
no matter how much we change any of these parameters, we would not be able
to enforceour desired constraint.
This example shows that the absolute change in a query can be much larger
than theabsolute change in the corresponding parameters. Later, we will show
an example where an infinitesimal change to a network parameter leads to
a change of .5 to a correspondingquery. We also show examples in which the
relative change in the probability of a query is larger than the corresponding
relative change in a network parameter. One wonders thenwhether there is
a different method for measuring probabilistic change (other than absolute
or relative), which allows one to non-trivially bound the change in a probabilistic
query interms of the corresponding change in a network parameter.
To answer this and related questions, we conduct in Section 3 an analytic
study of thepartial derivative of a probabilistic query Pr (
y | e) with respect to some network parameter t,x|u. Our study leads us to
three main results:
1. a bound on the derivative in terms of Pr (y | e) and Pr (x | u) only,
which is indepen-dent of any other aspect of the given belief network;
2. a bound on the sensitivity of queries to infinitesimal changes in network
parameters; 3. a bound on the sensitivity of queries to arbitrary changes
in network parameters. The last bound in particular shows that the amount
of change in a probabilistic query canbe bounded in terms of the amount of
change in a network parameter, as long as change is
understood to be the relative change in odds. This result has a number of
practical impli-cations. First, it can relieve experts from having to be
too precise when specifying certain parameters subjectively. Next, it can
be important for approximate inference algorithmsthat pre-process network
parameters to eliminate small distinctions in such parameters, in order to
increase the efficiency of inference (Poole, 1998). Finally, it can be used
to showthat automated reasoning systems based on belief networks are robust
and, hence, suitable for real-world applications (Pradhan, Henrion, Provan,
Del Favero, & Huang, 1996).Section 4 is indeed dedicated to exploring the
implications of the above bounds, where we provide an analytic explanation
of why certain parameter changes don't matter. Wefinally close in Section
5 with some concluding remarks. Proofs of all theorems are given in Appendix
A.
2. The Tuning of Network Parameters We report in this section on a tool that
we have been developing, called SamIam, for fine-tuning network parameters
(Laskey, 1995; Castillo, Guti'errez, & Hadi, 1997; Jensen, 1999;
Kjaerulff & van der Gaag, 2000; Darwiche, 2000). Given a belief network,
some evidence e, which is an instantiation of variables E in the belief network,
and two events y and z ofvariables
Y and Z respectively, where Y, Z 62 E, our tool can efficiently identify
parameterchanges needed to enforce the following types of constraints:
268
When do Numbers Really Matter? Difference: Pr (y | e) - Pr (z | e) >= s^;
Ratio: Pr (y | e)/Pr (z | e) >= s^. These two constraints often arise when
we debug belief networks. For example, we canmake event
y more likely than event z, given evidence e, by specifying the constraint,Pr
( y | e) - Pr (z | e) >= 0, or we can make event y at least twice as likely
as event z, givenevidence
e, by specifying the constraint, Pr (y | e)/Pr (z | e) >= 2. We will discuss
next howone would enforce the two constraints, but we need to settle some
notational conventions
and technical preliminaries first.
Variables are denoted by upper-case letters (A) and their values by lower-case
letters (a).Sets of variables are denoted by bold-face upper-case letters
(
A) and their instantiationsare denoted by bold-face lower-case letters (
a). For a variable A with values true and false,we use a to denote A = true
and a to denote A = false. The CPT for variable X withparents
U defines a set of conditional probabilities of the form Pr (x | u), where
x is a valueof variable
X, u is an instantiation of parents U, and Pr (x | u) is a probability known
as anetwork parameter and denoted by
t,x|u. We finally recall a basic fact about belief networks.The probability
of some instantiation
x of all network variables X equals the productof all network parameters
that are consistent with that instantiation. For example, the
probability of instantiation fire, tampering, smoke, alarm, leaving, report
in Figure 1 equals .01 * .98 * .9 * .99 * .12 * .01, which is the product
of network parameters (from Figure 2)that are consistent with this instantiation.
2.1 Binary Variables We first consider the parameters of a binary variable
X, with two values x and x and,hence, two parameters
t,x|u and t,x|u for each parent instantiation u. We assume that foreach variable
X and parent instantiation u we have a meta parameter #x|u, such that t,x|u
= #x|u and t,x|u = 1 - #x|u. Therefore, our goal is then to determine the
amount ofchange to the meta parameter
#x|u which would lead to a simultaneous change in both t,x|uand t,x|u. We
use the meta parameter #x|u because it is not meaningful to change only t,x|uor
t,x|u without changing the other since t,x|u + t,x|u = 1.
First we observe that the probability of an instantiation e, Pr (e), is a
linear functionin any network parameter
t,x|u in a belief network (Russell, Binder, Koller, & Kanazawa,1995; Castillo
et al., 1997). In fact, the probability is linear in any meta parameter
#x|u.
Theorem 2.1 The derivative of Pr (e) with respect to the meta parameter #x|u
is given by:
@Pr (e)
@#x|u =
Pr (e, x, u)
t,x|u -
Pr (e, x, u)
t,x|u , (1)
when t,x|u 6= 0 and t,x|u 6= 0.3 We will designate the derivative as constant
o""e.
In Theorem 2.1, o""e = Pr (e, x, u)/t,x|u - Pr (e, x, u)/t,x|u is a constant
in terms of both t,x|u and t,x|u (and consequently, #x|u) since Pr (e, x,
u) = Kxt,x|u and Pr (e, x, u) = Kxt,x|u,
3. If either of the previous parameters is zero, we can use the differential
approach by Darwiche (2000) to
compute the derivative directly.
269
Chan & Darwiche where Kx = Pr (u)Pr (e | x, u) and Kx = Pr (u)Pr (e | x,
u) are constants in terms of both t,x|u and t,x|u. By substituting y, e and
z, e for e in Theorem 2.1, we get:
o""y,e = @Pr (y, e)@#
x|u =
Pr (y, e, x, u)
t,x|u -
Pr (y, e, x, u)
t,x|u ; (2)
o""z,e = @Pr (z, e)@#
x|u =
Pr (z, e, x, u)
t,x|u -
Pr (z, e, x, u)
t,x|u . (3)
Now, if we want to enforce the Difference constraint, Pr (y | e) - Pr (z
| e) >= s^, itsuffices to ensure that Pr (
y, e) - Pr (z, e) >= s^Pr (e). Suppose that the previous constraintdoes not
hold, and we wish to establish it by applying a change of
s' to the meta parameter #x|u. Such a change leads to a change of o""es'
in Pr (e). It also changes Pr (y, e) and Pr (z, e)by
o""y,es' and o""z,es', respectively. Hence, to enforce the Difference constraint,
we need tosolve for
s' in the following inequality:
[Pr (y, e) + o""y,es'] - [Pr (z, e) + o""z,es'] >= s^[Pr (e) + o""es']. Rearranging
the terms, we get the following result. Corollary 2.1 To satisfy the Difference
constraint, we need to change the meta param-eter
#x|u by s', such that:
Pr (y, e) - Pr (z, e) - s^Pr (e) >= s'[-o""y,e + o""z,e + s^o""e], where
the o"" constants are defined by Equations 1, 2 and 3.
We can similarly solve for parameter changes s' that enforce the Ratio constraint,Pr
(
y | e)/Pr (z | e) >= s^, in the following inequality:
[Pr (y, e) + o""y,es']/[Pr (z, e) + o""z,es'] >= s^. Rearranging the terms,
we get the following result. Corollary 2.2 To satisfy the Ratio constraint,
we need to change the meta parameter #x|uby
s', such that:
Pr (y, e) - s^Pr (z, e) >= s'[-o""y,e + s^o""z,e],
where the o"" constants are defined by Equations 2 and 3.
For both the Difference and Ratio constraints, the solution of s', if any,
is always inone of two forms:
* s' <= q, for some computed q < 0, in which case the new value of meta parameter
#x|umust be in the interval [0
, p + q].
* s' >= q, for some computed q > 0, in which case the new value of meta parameter
#x|umust be in the interval [
p + q, 1].
270
When do Numbers Really Matter? Note that p is the current value of meta parameter
#x|u (before the change). For manyparameters, these intervals are empty and,
therefore, there is no way we can change these meta parameters to enforce
the constraint.
The question now is how to solve these inequalities, efficiently, and for
all meta parame-ters. Note that there may be more than one possible parameter
change that would enforce
the given constraint, so we need to identify all such changes. With either
Corollary 2.1or 2.2, we can easily solve for the amount of change needed,
s', once we know the followingprobabilities: Pr ( e), Pr (y, e), Pr (z, e),
Pr (e, x, u), Pr (e, x, u), Pr (y, e, x, u), Pr (y, e, x, u),Pr ( z, e, x,
u), and Pr (z, e, x, u). This leads to the following complexity of our technique.
Corollary 2.3 If we have an algorithm that can compute Pr (i, x, u), for
a given instanti-ation
i, and all family instantiations x, u of every variable X, in time O(f ),
then we cansolve for Corollaries 2.1 and 2.2 for all parameters in time
O(f ). We do this by runningthe algorithm three times, once with i = e, and
then with i = y, e, and finally with i = z, e.
Recall that the family of a variable X is the set containing X, and its parents
U in thebelief network.
The join-tree algorithm (Jensen, Lauritzen, & Olesen, 1990) and the differential
ap-proach (Darwiche, 2000) can both compute Pr (
i, x, u), for a given instantiation i and allfamily instantiations x, u of
every variable X in O(n exp w) time. Here, n is the number ofvariables in
the belief network, and
w is the width of a given elimination order. SamIamuses the differential
approach, and thus its running time to identify all possible parameter
changes in a network is also O(n exp w). Note that this is also the time
needed to answerone of the simplest queries, that of computing the probability
of evidence
e.
2.2 Multi-Valued Variables Our results can be easily extended to multi-valued
variables, as long as we assume a modelfor changing co-varying parameters
when one of them changes (Darwiche, 2000; Kjaerulff &
van der Gaag, 2000). After the parameter t,x|u changes, we need to use a
scheme to changethe other parameters,
t,xi|u for all xi 6= x, in order to ensure the sum-to-one constraint.The
most common way to do this is to use the proportional scheme. In this scheme,
we change the other parameters so that the ratios between them remain the
same. Forexample, suppose we have three parameters
t,x1|u = .6, t,x2|u = .3 and t,x3|u = .1. After t,x1|u changes to .8, the
other two parameter values will be changed to t,x2|u = .3(.2/.4) = .15and
t,x3|u = .1(.2/.4) = .05 accordingly. We now define the meta parameter #x|u
such that itsimultaneously changes all parameters according to the proportional
scheme. We can then
obtain a linear relation between Pr (e) and #x|u, and the partial derivative
is given by:
@Pr (e)
#x|u =
Pr (e, x, u)
t,x|u - P
xi6=x Pr (e, xi, u)P
xi6=x t,xi|u .
This is very similar to the result in Theorem 2.1, in the way that we have
grouped all thevalues
xi 6= x into the value x. We can then use Corollaries 2.1 and 2.2 to solve
for the Difference and Ratio constraints.
We now present another example to illustrate how the results above are used
in practice.
271
Chan & Darwiche Example 2.1 Consider again the network in Figure 3. Here,
we set the evidence such thatwe have smoke, but no report of people evacuating
the building, i.e.
e = smoke, report. Wethen got the posteriors Pr (fire | e) = .25 and Pr (tampering
| e) = .02. We thought in thiscase that the posterior on fire should be no
less than
.5 and asked SamIam to recommendthe necessary changes to enforce the constraint,
Pr (fire |
e) - Pr (fire | e) >= 0. There werefive recommendations in this case, three
of which could be ruled out based on qualitative
considerations:
1. increase the prior on fire to >= .03 (from .01); 2. increase the prior
on tampering to >= .80 (from .02); 3. decrease Pr (smoke | fire) to <= .003
(from .01); 4. increase Pr (leaving | alarm) to >= .923 (from .001); 5. increase
Pr (report | leaving) to >= .776 (from .01). Clearly, the only sensible change
here is either to increase the prior on fire, or to decreasethe probability
of having smoke without a fire.
This example and other similar ones suggest that identifying such parameter
changesand their magnitudes is inevitable for developing a faithful belief
network, yet it is not trivial for experts to accomplish this task by visual
inspection of the belief network, often due toits size and complexity. Sensitivity
analysis tools such as
SamIam can help facilitate thisby identifying important parameters that need
to be fine-tuned in order to satisfy certain
constraints. Of course, if we are given multiple constraints, we need to
be cautious whenimplementing a recommendation made by
SamIam due to one constraint, because this mayresult in violating other constraints.
In this case, the parameter changes recommended
by SamIam should be used to help experts in focusing their attention on the
relevantparameters.
Moreover, the previous examples illustrate the need to develop more analytic
tools tounderstand and explain the sensitivity of queries to certain parameter
changes. There is also a need to reconcile the sensitivities exhibited by
our examples with previous experimen-tal studies demonstrating the robustness
of probabilistic queries against small parameter changes in certain application
areas, such as diagnosis (Pradhan et al., 1996). We addressthese particular
questions in the next two sections.
3. The Sensitivity of Probabilistic Queries to Parameters Changes Our starting
point in understanding the sensitivity of a query Pr (y | e) to changes in
ameta parameter
#x|u is to analyze the derivative @Pr (y | e)/@#x|u. In our analysis, weassume
that X is binary, but Y and all other variables in the network can be multi-valued.The
following theorem provides a simple bound on this derivative, in terms of
Pr (
y | e)and Pr ( x | u) only. We then use this simple bound to study the effect
of changes to metaparameters on probabilistic queries.
272
When do Numbers Really Matter?
0.2
0.4
0.6
0.8Pr(x| u)
0.2
0.4
0.6
0.8
Pr(y|e) 0 2 4 6 pd bound
0.2
0.4
0.6
0.8Pr(x| u)
Figure 4: The plot of the upper bound on the partial derivative @Pr (y |
e)/@#x|u, as givenin Theorem 3.1, against Pr (
x | u) and Pr (y | e).
Theorem 3.1 If X is a binary variable in a belief network, then:4r'r'r'r'
r'
@Pr (y | e)
@#x|u r'r'r'r'r' <=
Pr (y | e)(1 - Pr (y | e)) Pr (x | u)(1 - Pr (x | u)) .
The bound in Theorem 3.1 is tight, and we will show later an example for
which thederivative assumes the above bound exactly. The main point to note
about this bound is that it is independent of any given belief network.5
The plot of this bound against Pr (x | u) and Pr (y | e) is shown in Figure
4. A numberof observations are in order about this plot:
* For extreme values of Pr (x | u), the bound approaches infinity, and thus
a smallabsolute change in the meta parameter
#x|u can have a big impact on the queryPr ( y | e).
* On the other hand, the bound approaches 0 for extreme values of the query
Pr (y | e).Therefore, a small absolute change in the meta parameter
#x|u will have a small effecton the absolute change in the query.
One of the implications of this result is that if we have a belief network
where queriesof interest Pr (
y | e) have extreme values, then such queries will be robust against smallchanges
in network parameters. This of course assumes that robustness is understood
to
4. This theorem and all results that follow requires that #x|u 6= 0 and #x|u
6= 1, since we can only use the
expression in Equation 2.1 under these conditions. 5. Note that we have an
exact closed form for the derivative @Pr(y | e)/@#x|u (Darwiche, 2000; Greiner,
Grove, & Schuurmans, 1997), but that form includes terms which are specific
to the given belief network.
273
Chan & Darwiche
X Y
E
Figure 5: The network used in Example 3.1.
be a small change in the absolute value of the given query. Interestingly
enough, if y is adisease which is diagnosed by finding
e--that is, the probability Pr (y | e) is quite high--then it is not surprising
that such queries would be robust against small perturbations to
network parameters. This seems to explain some of the results by Pradhan
et al. (1996),where robustness have been confirmed for queries with Pr (
y | e) >= .9.Another implication of the above result is that one has to be
careful when changing
parameters that are extreme. Such parameters are potentially very influential
and onemust handle them with care.
Therefore, the worst situation from a robustness viewpoint materializes if
one has ex-treme parameters with non-extreme queries. In such a case, the
queries can be very sensitive to small variations in the parameters. Example
3.1 Consider the network structure in Figure 5. We have two binary nodes,
Xand
Y with respective parameters t,x, t,x and t,y, t,y. We assume that E is a
deterministicbinary node where the value of
E is e iff X = Y . This dictates the following CPT for E:Pr ( e | x, y) =
1, Pr (e | x, y) = 1, Pr (e | x, y) = 0 and Pr (e | x, y) = 0. The conditionalprobability
Pr (
y | e) can be expressed using the root parameters t,x and t,y as:
Pr (y | e) = t,xt,yt,
xt,y + t,xt,y .
Since @t,x/@#x = 1 and @t,x/@#x = -1, the derivative of Pr (y | e) with respect
to the metaparameter
#x is given by:
@Pr (y | e)
@#x =
(t,xt,y + t,xt,y)t,y - t,xt,y(t,y - t,y)
(t,xt,y + t,xt,y)2
= t,yt,y(t,
xt,y + t,xt,y)2 .
This is equal to the upper bound given in Theorem 3.1:
Pr (y | e)(1 - Pr (y | e))
Pr (x)(1 - Pr (x)) =
(t,xt,y)(t,xt,y) t,xt,x(t,xt,y + t,xt,y)2
= t,yt,y(t,
xt,y + t,xt,y)2 .
Now, if we set t,x = t,y, the derivative becomes:
@Pr (y | e)
@#x =
1 4t,xt,x ,
274
When do Numbers Really Matter? and as t,x (or t,x) approaches 0, the derivative
approaches infinity. Finally, if we set t,x = t,y = s^, we have Pr (y | e)
= .5, but if we keep t,y and t,y constant and change #x from s^ to0, we get
the new result Pr (
y | e) = 0.
Example 3.1 then illustrates three points. First, it shows that the bound
in Theorem 3.1is tight, i.e. we can construct a belief network that assumes
the bound. Second, it gives an example network for which the derivative @Pr
(y | e)/@#x|u tends to infinity, and therefore wecannot bound the derivative
by any constant. Finally, it shows that an infinitesimal absolute change
in a meta parameter (changing #x from s^ to 0) can induce a non-infinitesimal
absolutechange in some query (Pr (
y | e) changes from .5 to 0). The following theorem, however,shows that this
is not possible if we consider a relative notion of change.
Theorem 3.2 Assume that #x|u <= .5 without loss of generality.6 Suppose that
Delta #x|u is aninfinitesimal change applied to the meta parameter
#x|u, leading to a change of Delta Pr (y | e)to the query Pr ( y | e). We
then have:r'r'r'r'
Delta Pr (y | e)
Pr (y | e) r'r'r'r' <= 2 r'r'r'r'r'
Delta #x|u
#x|u r'r'r'r'r' .
For a function f (x), the quantity:
lim(x-x
0)!0
(f (x) - f (x0))/f (x0)
(x - x0)/x0 ,
is typically known as the sensitivity of f to x at x0. Therefore, Theorem
3.2 shows that thesensitivity of Pr (
y | e) to #x|u is bounded.As an example application of Theorem 3.2, consider
Example 3.1 again. The change
of #x from s^ to 0 amounts to a relative change | - s^/s^| = 1. The corresponding
change ofPr (
y | e) from .5 to 0 amounts to a relative change of | - .5/.5| = 1. Hence,
the relativechange in the query is not as great from this viewpoint.
7
The relative change in Pr (y | e) may be greater than double the relative
change in #x|ufor non-infinitesimal changes because the derivative
@Pr (y | e)/@#x|u depends on the valueof #x|u (Darwiche, 2000; Jensen, 1999).
Going back to Example 3.1, if we set t,x = .5 and t,y = .01, we obtain the
result Pr (y | e) = .01. If we now increase #x to .6, a relative changeof
20%, we get the new result Pr (
y | e) = 0.0149, a relative change of 49%, which is morethan double of the
relative change in
#x.The question now is: Suppose that we change a meta parameter
#x|u by an arbitraryamount (not an infinitesimal amount), what can we say
about the corresponding change in
the query Pr (y | e)? We have the following result. Theorem 3.3 Let O(x |
u) denote the odds of x given u: O(x | u) = Pr (x | u)/(1 - Pr (x | u)),
and let O(y | e) denote the odds of y given e: O(y | e) = Pr (y | e)/(1 -
Pr (y | e)).Let
O0(x | u) and O0(y | e) denote these odds after having applied an arbitrary
change to
6. For a binary variable X, if #x|u > .5, we can instead choose the meta
parameter #x|u without loss of
generality. 7. If we consider the meta parameter #x = 1 - s^ instead, the
relative change in #x will then amount to
s^/(1 - s^). But Theorem 3.2 will not be applicable in this case (assuming
that s^ is close to 0) since the theorem requires that the chosen meta parameter
be no greater than .5.
275
Chan & Darwiche the meta parameter #x|u where X is a binary variable in a
belief network. If the change ispositive, then:
O(x | u) O0(x | u) <=
O0(y | e)
O(y | e) <=
O0(x | u)
O(x | u) ; or if it is negative, then:
O0(x | u)
O(x | u) <=
O0(y | e)
O(y | e) <=
O(x | u) O0(x | u) .
Combining both results, we have:
| ln(O0(y | e)) - ln(O(y | e))| <= | ln(O0(x | u)) - ln(O(x | u))|. Theorem
3.3 means that the relative change in the odds of y given e is bounded by
therelative change in the odds of
x given u, if X is a binary variable.8 Note that the resultmakes no assumptions
whatsoever about the structure of the given belief network.
To illustrate this theorem, we go back to Example 2.1. We intend to increase
theposterior Pr (fire |
e) from .25 to .5, for e = smoke, report . The log-odds change for thequery
is thus Delta lo(Pr (y | e)) = | ln(O0(y | e)) - ln(O(y | e))| = 1.1. There
were fiverecommendations made by
SamIam and we can calculate the log-odds change, Delta lo(#x|u) =| ln( O0(x
| u)) - ln(O(x | u))| for each parameter change:
1. increase the prior on fire to >= .03 (from .01): Delta lo(#x|u) = 1.1;
2. increase the prior on tampering to >= .80 (from .02): Delta lo(#x|u)
= 5.3; 3. decrease Pr (smoke | fire) to <= .003 (from .01): Delta lo(#x|u)
= 1.2; 4. increase Pr (leaving | alarm) to >= .923 (from .001): Delta lo(#x|u)
= 9.4; 5. increase Pr (report | leaving) to >= .776 (from .01): Delta lo(#x|u)
= 5.8. Therefore, we can see that all the recommended parameter changes satisfy
Theorem 3.3,i.e. the log-odds change of the query is bounded by the log-odds
change of the parameter.
An interesting special case of Theorem 3.3 is when X is a root node and X
= Y . Frombasic probability theory, we have:
O(x | e) = O(x) Pr (e | x)Pr (e | x) . As the ratio Pr (e | x)/Pr (e | x)
is independent of Pr (x), the ratio O(x | e)/O(x) is alsoindependent of this
prior. Therefore, we can conclude that:
O0(x | e)
O(x | e) =
O0(x)
O(x) . (4)
This means we can find the exact amount of change needed for a meta parameter
#x inorder to induce a particular change on the query Pr (
x | e). There is no need to use themore expensive technique of Section 2
in this case.
8. We recently expanded our results to multi-valued variables, where we arbitrarily
change parameters
t,x|u to new values t,0x|u, for all values x. The resulting bound is: | ln(O0(y
| e)) - ln(O(y | e))| <=
ln(maxx t,0x|u/t,x|u) - ln(minx t,0x|u/t,x|u) (Chan & Darwiche, 2002).
276
When do Numbers Really Matter? Example 3.2 Consider the network in Figure
3. Suppose that e = report, smoke. Cur-rently, Pr (tampering) =
.02 and Pr (tampering | e) = .50. We wish to increase the condi-tional probability
to .65. We can compute the new prior probability Pr 0(tampering) usingEquation
4:
.65/.35 .50/.50 =
Pr 0(tampering)/(1 - Pr 0(tampering))
.02/.98 ,
giving us Pr 0(tampering) = .036, which is equal to the result we obtained
using SamIamin Section 1. Both the changes to Pr (tampering) and Pr (tampering
|
e) bring a log-oddsdifference of .616.
Theorem 3.3 has a number of implications. First, given a particular query
Pr (y | e) anda meta parameter
#x|u, it can be used to bound the effect that a change in #x|u will have
onthe query Pr ( y | e). Going back to Example 3.2, we may wish to know what
is the impacton other conditional probabilities if we apply the change making
Pr 0(tampering) =
.036.The log-odds changes for all conditional probabilities in the network
will be bounded by
.616. For example, currently Pr (fire | e) = .029. Using Theorem 3.3, we
can find the rangeof the new conditional probability value Pr 0(fire |
e):r'r'r'r'
r'ln A~
Pr 0(fire | e) 1 - P r0(fire | e) ! - ln t,
.029 .971 u""r'r'r'r'r' <= .616,
giving us the range .016 <= Pr 0(fire | e) <= .053. The exact value of Pr
0(fire | e), obtainedby inference, is
.021, which is within the computed bounds.Second, Theorem 3.3 can be used
to efficiently approximate solutions to the
Differenceand Ratio problems we discussed in Section 2. That is, given a
desirable change in thevalue of query Pr (
y | e), we can use Theorem 3.3 to immediately compute a lower boundon the
minimum change to meta parameter
#x|u needed to induce the change. This methodcan be applied in constant time
and can serve as a preliminary recommendation, as the
method proposed in Section 2 is much more expensive computationally.Third,
suppose that
SamIam was used to recommend parameter changes that wouldinduce a desirable
change on a given query. Suppose further that
SamIam returned anumber of such changes, each of which is capable of inducing
the necessary change. The
question is: which one of these changes should we adopt? The main principle
appliedin these situations is to adopt a "minimal" change. But what is minimal
in this case? As Theorem 3.3 reveals, a notion of minimality which is based
on the amount of absolute changecan be very misleading. Instead, it suggests
that one adopts the change that minimizes the relative change in the odds,
as other queries can be shown to be robust against such achange in a precise
sense.
For example, we are given two parameter changes, one from .1 to .15, and
another from .4 to .45. Both these changes give us the same absolute change
of .05. However, the firstchange has an log-odds change of
.462, while the second one has an log-odds change of .205.Therefore, two
parameter changes that give us the same absolute change can have different
amounts of log-odds change.On the other hand, two parameter changes that
give us the same relative change can also have different amounts of log-odds
change. For example, we are given two parameterchanges, one from
.1 to .2, and another from .2 to .4. Both these changes double the original
277
Chan & Darwiche parameter value. However, the first change has a log-odds
change of .811, while the secondone has a log-odds change of
.981.
Finally, the result can be used to obtain a better intuitive understanding
of parameterchanges that do or do not matter, a topic which we will discuss
in the next section.
4. Changes that (Don't) Matter We now return to a central question: When
do changes in network parameters matter andwhen do they not matter? As we
mentioned earlier, there have been experimental studies
investigating the robustness of belief networks against parameter changes
(Pradhan et al.,1996). But we have also shown very simple and intuitive examples
where networks can be very sensitive to small parameter changes. This calls
for a better understanding of theeffect of parameter changes on queries,
so one can intuitively sort out situations in which such changes do or do
not matter. Our goal in this section is to further develop suchan understanding
by looking more closely into some of the implications of Theorem 3.3. We
start first by highlighting the difference between this theorem and previous
results onsensitivity analysis.
4.1 Network-Specific Sensitivity Analysis One of the main differences between
our results and other sensitivity analysis approachesis that we do not need
to know the belief network, and hence, do not need to perform
inference. To clarify this difference, we compare it with the sensitivity
function approach(van der Gaag & Renooij, 2001), which computes the sensitivity
function that relates a query, f (x), and a parameter, x, in the form:
f (x) = a * x + bc * x + d , where a, b, c, d are constants that depend on
the given network and are computed byperforming inference as suggested by
van der Gaag and Renooij (2001).
Going back to Example 2.1, we can express the query Pr (fire | smoke, report)
as afunction of the parameter
x = Pr (smoke | fire). The function is given by:
f (x) = 0.0031650.9684 * x + 0.003165 , and we plot this function in Figure
6. We can see that at the current parameter value .01,the query value is
.25, but if we decrease it to .003, the query value increases to .5, whichis
one of the suggested parameter changes by
SamIam.
However, we can find a bound on the relations between the query and the parameterusing
Theorem 3.3, without doing inference on the network (and without knowing
the
network). For example, by changing the current parameter value from .01 to
.003, the newquery value will be within the bounds of
.09 and .53. On the other hand, if we want thequery value to increase to
.5, we have to at least decrease the parameter value from .01 to .003, or
increase it to .03.
278
When do Numbers Really Matter? 0.2 0.4 0.6 0.8 1 Pr(smoke,fire
___)
0.1
0.2 0.3 0.4 0.5
Pr(fire|smoke,report___ )
0.005 0.01 0.015 0.02 Pr(smoke,fire
___)
0.2
0.4 0.6 0.8
1 Pr(fire|smoke,report___ )
Figure 6: The plot of the query Pr (fire | smoke, report ) against the parameter
Pr (smoke |fire). The second graph shows a magnification of the first graph
for the region
where Pr (smoke | fire) is between 0 and .02.
4.2 Assuring Query Robustness One of the important issues we have yet to
settle is: "What does it mean for a parameterchange to not matter?" One can
think of at least three definitions. First, the absolute
change in the probability Pr (y | e) is small. Second, the relative change
in the probabilityPr (
y | e) is small. Third, relative change in the odds O(y | e) is small. The
first notion isthe one most prevalent in the literature, so we shall adopt
it in the rest of this section.
Suppose we have a belief network for a diagnostic application and suppose
we are con-cerned about the robustness of the query Pr (
y | e) with respect to changes in networkparameters. In this application,
y is a particular disease and e is a particular finding whichpredicts the
disease, with Pr ( y | e) = .9. Let us define robustness in this case to
be anabsolute change of no more than
.05 to the given query. Now, let X be a binary variable inthe network and
let us ask: What kind of changes to the parameters on
X are guaranteedto keep the query within the desirable range? We can use
Theorem 3.3 easily to answer this
question. First, if we are changing a parameter by s', and if we want the
value of the queryto remain <=
.95, we must ensure that:
| ln((p + s')/(1 - p - s')) - ln(p/(1 - p))| <= | ln(.95/.05) - ln(.9/.1)|
= .7472, where p is the current value of the parameter. Similarly, if we
want to ensure that the queryremains >=
.85, we want to ensure that:
| ln((p + s')/(1 - p - s')) - ln(p/(1 - p))| <= | ln(.85/.15) - ln(.9/.1)|
= .4626. Figure 7 plots the permissible change s' as a function of p, the
current value of theparameter. The main point to observe here is that the
amount of permissible change
depends on the current value of p, with smaller changes allowed for extreme
values of p.It is also interesting to note that it is easier to guarantee
the query to stay <=
.95 than toguarantee that it stays >= .85. In general, it is more likely
for a parameter change to reduce
279
Chan & Darwiche
0.2 0.4 0.6 0.8 1 p -0.15
-0.1 -0.05
0.05
0.1 0.15
d
Figure 7: The amount of parameter change s' that would guarantee the query
Pr (y | e) = .9to stay within the interval [
.85, .95], as a function of the current parameter value p. The outer envelope
guarantees the query to remain <= .95, while the innerenvelope guarantees
the query to remain >=
.85.
0.2 0.4 0.6 0.8 1 p -0.04 -0.02
0.02 0.04
d
Figure 8: The amount of parameter change s' that would guarantee the query
Pr (y | e) = .6to stay within the interval [
.55, .65], as a function of the current parameter value p. The outer envelope
guarantees the query to remain <= .65, while the innerenvelope guarantees
the query to stay in >=
.55.
the value of a query which is close to 1 (and to increase the value of a
query which is closeto 0). Finally, if we are increasing the parameter, then
a parameter value close to
.4 willallow the biggest absolute change. But if we are decreasing the parameter,
then a value
close to .6 will allow the biggest absolute change.
Now let us repeat the same exercise but assuming that the initial value of
the queryis Pr (
y | e) = .6, yet insisting on the same measure of robustness. Figure 8 plots
the
280
When do Numbers Really Matter?
0.2
0.4
0.6
0.8Pr(x| u)
0.2
0.4
0.6
0.8
Pr'(x| u) 0 2 4 6 8
Delta lo
0.2
0.4
0.6
0.8Pr(x| u)
Figure 9: The plot of the log-odd difference, Delta lo = | ln(O0(x | u))
- ln(O(x | u))|, againstPr (
x | u) and Pr 0(x | u).
0.2 0.4 0.6 0.8 1 p
c' 1
2 3 4 5 6
Delta lo
0.2 0.4 0.6 0.8 1 p
c'
1
2 3 4
Delta lo
0.2 0.4 0.6 0.8 1 p
c' 1
2 3 4 5 6
Delta lo
Figure 10: The plots of the log-odd difference, Delta lo = | ln(O0(x | u))
- ln(O(x | u))|, againstthe new parameter value
p0 = Pr 0(x | u). The figures correspond to differentinitial values of the
parameter,
p = Pr (x | u) = .1, .5, .9, respectively.
permissible changes s' as a function of p, the current value of the parameter.
Again, theamount of permissible change becomes smaller as the probability
p approaches 0 or 1. Theother main point to emphasize is that the permissible
changes are now much smaller than
in the previous example, since the initial value of the query is not as extreme.
Therefore,this query is much less robust than the previous one.
More generally, Figure 9 plots the log-odds difference, | ln(O0(x | u)) -
ln(O(x | u))|,against Pr (
x | u) = p and Pr 0(x | u) = p + s', and Figure 10 shows cross-sections of
Figure 9for three different values of
p. Again, the plots explain analytically why we can afford moreabsolute changes
to non-extreme probabilities (Pradhan et al., 1996; Poole, 1998).
281
Chan & Darwiche From Figure 10, we also notice that although the plot is
symmetric for p = .5, it is notfor both
p = .1 and p = .9, i.e. absolute changes of Delta p and -Delta p give us
different amountsof log-odds change. For example, changing the parameter
from
.1 to .05 give us a largerlog-odds change than changing the parameter from
.1 to .15. We also notice that the plotsfor p = .1 and p = .9 are mirror
images of each other. Therefore, the log-odds change is thesame for complementary
parameter changes on
t,x|u and t,x|u.
We close this section by emphasizing that the above figures identify parameter
changesthat guarantee keeping queries within certain ranges. However, if
the belief network has
specific properties, such as a specific topology, then it is possible for
the query to be robustagainst parameter changes that are outside the identified
bounds.
5. Conclusion In this paper, we presented an efficient technique for fine-tuning
the parameters of a beliefnetwork. The technique suggests minimal changes
to network parameters which ensure that certain constraints are enforced
on probabilistic queries. Based on this technique, wehave experimented with
some belief networks, only to find out that these networks are more sensitive
to parameter changes than previous experimental studies seem to suggest.This
observation leads us to an analytic study on the effect of parameter changes,
with the aim of characterizing situations under which parameter changes do
or do not matter. Wehave reported on a number of results in this direction.
Our central result shows that belief networks are robust in a very specific
sense: the relative change in query odds is boundedby the relative change
in the parameter odds. A closer look at this result, its meaning, and its
implications provides interesting characterizations of parameter changes
that door do not matter, and explains analytically some of the previous experimental
results and observations on this matter.
Acknowledgments A shorter version of this paper appeared in Proceedings of
the 17th Conference on Uncer-tainty in Artificial Intelligence (UAI-01),
pp. 65-74. This work has been partially supported by NSF grant IIS-9988543,
MURI grant N00014-00-1-0617, and by DiMI grant 00-10065.
Appendix A. Proofs Theorem 2.1 The derivative of Pr (e) with respect to the
meta parameter #x|u is givenby:
@Pr (e)
@#x|u =
Pr (e, x, u)
t,x|u -
Pr (e, x, u)
t,x|u ,
when t,x|u 6= 0 and t,x|u 6= 0.
282
When do Numbers Really Matter? Proof From Russell et al. (1995), the semantics
of the first derivative of Pr (e) with respectto parameter
t,x|u is given by:9
@Pr (e)
@t,x|u =
Pr (e, x, u)
t,x|u ,
if t,x|u 6= 0, and:
@Pr (e)
@t,x|u =
Pr (e, x, u)
t,x|u ,
if t,x|u 6= 0. Because t,x|u = #x|u and t,x|u = 1 - #x|u, we have:
@Pr (e)
@#x|u =
@Pr (e)
@t,x|u -
@Pr (e)
@t,x|u
= Pr (e, x, u)t,
x|u -
Pr (e, x, u)
t,x|u ,
if t,x|u 6= 0 and t,x|u 6= 0.2 Theorem 3.1 If X is a binary variable in a
belief network, then:r'r'r'r'r'
@Pr (y | e)
@#x|u r'r'r'r'r' <=
Pr (y | e)(1 - Pr (y | e)) Pr (x | u)(1 - Pr (x | u)) .
Proof From Darwiche (2000), the derivative @Pr (y | e)/@t,x|u is equal to:
@Pr (y | e)
@t,x|u =
Pr (y, x, u | e) - Pr (y | e)Pr (x, u | e)
Pr (x | u) .
Since: @Pr (y | e)
@#x|u =
@Pr (y | e)
@t,x|u -
@Pr (y | e)
@t,x|u ,
we have:
@Pr (y | e)
@#x|u
= Pr (y, x, u | e) - Pr (y | e)Pr (x, u | e)Pr (x | u) - Pr (y, x, u | e)
- Pr (y | e)Pr (x, u | e)Pr (x | u)
= Pr (y, x, u | e) - Pr (y | e)Pr (x, u | e) - Pr (x | u)(Pr (y, u | e) -
Pr (y | e)Pr (u | e))Pr (x | u)(1 - Pr (x | u)) .
In order to find an upper bound on the derivative, we would like to bound
the termPr (
y, x, u | e)-Pr (y | e)Pr (x, u | e). Since, Pr (y, x, u, e) <= Pr (y, u,
e) and Pr (y, x, u, e) <=Pr ( x, u, e), we have:
Pr (y, x, u | e) - Pr (y | e)Pr (x, u | e) <= Pr (y, x, u | e) - Pr (y |
e)Pr (y, x, u | e)
= Pr (y, x, u | e)Pr (y | e)<=
Pr (y, u | e)Pr (y | e).
9. We allow the notations @Pr(e)/@t,x|u and @Pr(e)/@t,x|u by assuming Pr(e)
as functions of t,x|u and t,x|u,
even though it is not allowed in belief networks to change only t,x|u or
t,x|u.
283
Chan & Darwiche Therefore, the upper bound on the derivative is given by:
@Pr (y | e)
@#x|u <=
Pr (y, u | e)Pr (y | e) - Pr (x | u)(Pr (y, u | e) - Pr (y | e)Pr (u | e))
Pr (x | u)(1 - Pr (x | u)) ,
which is equal to the following term:
Pr (y | e)Pr (y, u | e)
Pr (x | u) +
Pr (y | e)Pr (y, u | e)
1 - Pr (x | u)
= (1 - Pr (x | u))Pr (y | e)Pr (y, u | e) + Pr (x | u)Pr (y | e)Pr (y, u
| e)Pr (x | u)(1 - Pr (x | u))
= Pr (y, u | e)Pr (y | e) - Pr (x | u)(P r(y, u | e) - Pr (y | e)Pr (u |
e))Pr (x | u)(1 - Pr (x | u)) . Since Pr (y, u | e) <= Pr (y | e) and Pr
(y, u | e) <= Pr (y | e), the upper bound on thederivative is given by:
@Pr (y | e)
@#x|u <=
Pr (y | e)Pr (y, u | e)
Pr (x | u) +
Pr (y | e)Pr (y, u | e)
1 - Pr (x | u)
<= Pr (y | e)Pr (y | e)Pr (x | u) + Pr (y | e)Pr (y | e)1 - Pr (x | u)
= Pr (y | e)(1 - Pr (y | e))Pr (x | u)(1 - Pr (x | u)) . In order to find
a lower bound on the derivative, we note that Pr (y | e) = 1 - Pr (y | e),and
thus
@Pr (y | e)/@#x|u = -@Pr (y | e)/@#x|u. Therefore, we can get our lower bound
byfinding the upper bound on the derivative
@Pr (y | e)/@#x|u and multiplying by -1:
@Pr (y | e)
@#x|u >= -
Pr (y | e)(1 - Pr (y | e)) Pr (x | u)(1 - Pr (x | u))
= - Pr (y | e)(1 - Pr (y | e))Pr (x | u)(1 - Pr (x | u)) .
Combining the upper bound and the lower bound, we have:r'r'r'r'
r'
@Pr (y | e)
@#x|u r'r'r'r'r' <=
Pr (y | e)(1 - Pr (y | e)) Pr (x | u)(1 - Pr (x | u)) .2
Theorem 3.2 Assume that #x|u <= .5 without loss of generality. Suppose that
Delta #x|u is aninfinitesimal change applied to the meta parameter
#x|u, leading to a change of Delta Pr (y | e)to the query Pr ( y | e). We
then have:r'r'r'r'
Delta Pr (y | e)
Pr (y | e) r'r'r'r' <= 2 r'r'r'r'r'
Delta #x|u
#x|u r'r'r'r'r' .
284
When do Numbers Really Matter? Proof Because Delta #x|u is infinitesimal,
from Theorem 3.1:r'r'r'r'
r'
Delta Pr (y | e)
Delta #x|u r'r'r'r'r' ' r'r'r'r'r'
@Pr (y | e)
@#x|u r'r'r'r'r'
<= Pr (y | e)(1 - Pr (y | e))Pr (x | u)(1 - Pr (x | u)) .
Arranging the terms, we have:r'r'r'r'
Delta Pr (y | e)
Pr (y | e) r'r'r'r' <=
1 - Pr (y | e) 1 - Pr (x | u) r'r'r'r'r'
Delta #x|u
#x|u r'r'r'r'r'
<= 1.5 r'r'r'r'r' Delta #x|u#
x|u r'r'r'r'r'
= 2 r'r'r'r'r' Delta #x|u#
x|u r'r'r'r'r' ,
since Pr (x | u) = #x|u <= .5.2 Theorem 3.3 Let O(x | u) denote the odds
of x given u: O(x | u) = Pr (x | u)/(1-Pr (x | u)), and let O(y | e) denote
the odds of y given e: O(y | e) = Pr (y | e)/(1 - Pr (y | e)).Let
O0(x | u) and O0(y | e) denote these odds after having applied an arbitrary
change tothe meta parameter
#x|u where X is a binary variable in a belief network. If the change ispositive,
then:
O(x | u) O0(x | u) <=
O0(y | e)
O(y | e) <=
O0(x | u)
O(x | u) ;
or if it is negative, then:
O0(x | u)
O(x | u) <=
O0(y | e)
O(y | e) <=
O(x | u) O0(x | u) .
Combining both results, we have:
| ln(O0(y | e)) - ln(O(y | e))| <= | ln(O0(x | u)) - ln(O(x | u))|.
Proof We obtain this result by integrating the bound in Theorem 3.1. In particular,
ifwe change
#x|u to # 0x|u > #x|u, and consequently Pr (y | e) changes to Pr 0(y | e),
we canseparate the variables in the upper bound on the derivative in Theorem
3.1, integrate over
the intervals, and yield:Z
Pr0(y|e) Pr(y|e)
dPr (y | e) Pr (y | e)(1 - Pr (y | e)) <= Z
#0x|u #x|u
d#x|u #x|u(1 - #x|u) .
This gives us the solution:
ln(Pr 0(y | e)) - ln(Pr (y | e)) - ln(1 - Pr 0(y | e)) + ln(1 - Pr (y | e))<=
ln(# 0x|u) - ln(#x|u) - ln(1 - # 0x|u) + ln(1 - #x|u),
285
Chan & Darwiche and after taking exponentials, we have:
Pr 0(y | e)/(1 - Pr 0(y | e))
Pr (y | e)/(1 - Pr (y | e)) <=
# 0x|u/(1 - # 0x|u) #x|u/(1 - #x|u) ,
which is equivalent to:
O0(y | e)
O(y | e) <=
O0(x | u)
O(x | u) .
Similarly, we can separate the variables in the lower bound on the derivative
in Theo-rem 3.1, integrate over the intervals, and yield:Z
Pr0(y|e) Pr(y|e)
dPr (y | e) Pr (y | e)(1 - Pr (y | e)) >= - Z
#0x|u #x|u
d#x|u #x|u(1 - #x|u) .
This gives us the solution:
ln(Pr 0(y | e)) - ln(Pr (y | e)) - ln(1 - Pr 0(y | e)) + ln(1 - Pr (y | e))>=
-
ln(# 0x|u) + ln(#x|u) + ln(1 - # 0x|u) - ln(1 - #x|u),
and after taking exponentials, we have:
Pr 0(y | e)/(1 - Pr 0(y | e))
Pr (y | e)/(1 - Pr (y | e)) >=
#x|u/(1 - #x|u) # 0x|u/(1 - # 0x|u) ,
which is equivalent to:
O0(y | e)
O(y | e) >=
O(x | u) O0(x | u) .
Therefore, we have the following inequality if # 0x|u > #x|u:
O(x | u) O0(x | u) <=
O0(y | e)
O(y | e) <=
O0(x | u)
O(x | u) .
On the other hand, if we now change #x|u to # 0x|u < #x|u, we can instead
integrate from # 0x|u to #x|u. The integrals will satisfy these two inequalities:Z
Pr(y|e) Pr0(y|e)
dPr (y | e) Pr (y | e)(1 - Pr (y | e)) <= Z
#x|u #0x|u
d#x|u #x|u(1 - #x|u) ;Z Pr(y|e)
Pr0(y|e)
dPr (y | e) Pr (y | e)(1 - Pr (y | e)) >= - Z
#x|u #0x|u
d#x|u #x|u(1 - #x|u) .
We can solve for the inequalities similarly and get the result:
O0(x | u)
O(x | u) <=
O0(y | e)
O(y | e) <=
O(x | u) O0(x | u) .
Combining the results for both # 0x|u > #x|u and # 0x|u < #x|u, we have:
| ln(O0(y | e)) - ln(O(y | e))| <= | ln(O0(x | u)) - ln(O(x | u))|.2
286
When do Numbers Really Matter? References Castillo, E., Guti'errez, J. M.,
& Hadi, A. S. (1997). Sensitivity analysis in discrete Bayesiannetworks.
IEEE Transactions on Systems, Man, and Cybernetics, 27, 412-423.
Chan, H., & Darwiche, A. (2002). A distance measure for bounding probabilistic
beliefchange. In Proceedings of the Eighteenth National Conference on Artificial
Intelligence
(AAAI), pp. 539-545. Coup'e, V. M. H., Peek, N., Ottenkamp, J., & Habbema,
J. D. F. (1999). Using sensitiv-ity analysis for efficient quantification
of a belief network. Artificial Intelligence in
Medicine, 17, 223-247. Darwiche, A. (2000). A differential approach to inference
in Bayesian networks. In Pro-ceedings of the 16th Conference on Uncertainty
in Artificial Intelligence (UAI), pp.
123-132. Greiner, R., Grove, A., & Schuurmans, D. (1997). Learning Bayesian
nets that performwell. In Proceedings of the 13th Conference on Uncertainty
in Artificial Intelligence
(UAI), pp. 198-207. Jensen, F. V., Lauritzen, S., & Olesen, K. (1990). Bayesian
updating in recursive graphicalmodels by local computation. Computational
Statistics Quarterly, 4, 269-282.
Jensen, F. V. (1999). Gradient descent training of bayesian networks. In
Proceedings of theFifth European Conference on Symbolic and Quantitative
Approaches to Reasoning
with Uncertainty (ECSQARU), pp. 190-200. Jensen, F. V. (2001). Bayesian Networks
and Decision Graphs. Springer-Verlag, Inc., NewYork.
Kjaerulff, U., & van der Gaag, L. C. (2000). Making sensitivity analysis
computationallyefficient. In Proceedings of the 16th Conference on Uncertainty
in Artificial Intelligence
(UAI), pp. 317-325. Laskey, K. B. (1995). Sensitivity analysis for probability
assessments in Bayesian networks.IEEE Transactions on Systems, Man, and Cybernetics,
25, 901-909.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks
of Plausible In-ference. Morgan Kaufmann Publishers, Inc., San Mateo, California.
Poole, D. (1998). Context-specific approximation in probabilistic inference.
In Proceedingsof the 14th Conference on Uncertainty in Artificial Intelligence
(UAI), pp. 447-454. Pradhan, M., Henrion, M., Provan, G., Del Favero, B.,
& Huang, K. (1996). The sensitivityof belief networks to imprecise probabilities:
an experimental investigation. Artificial
Intelligence, 85, 363-397. Russell, S., Binder, J., Koller, D., & Kanazawa,
K. (1995). Local learning in probabilisticnetworks with hidden variables.
In Proceedings of the Fourteenth International Joint
Conference on Artificial Intelligence (IJCAI), pp. 1146-1152. van der Gaag,
L. C., & Renooij, S. (2001). Analysing sensitivity data from probabilistic
net-works. In Proceedings of the 17th Conference on Uncertainty in Artificial
Intelligence
(UAI), pp. 530-537.
287