Bayesian inference

Summary

\(P(a|b) = P(b|a)P(a)/P(b)\) is Bayes' formula ("Bayes' rule", "Bayes'
theorem"); it is just a rewrite of the rules of probability: since
\(P(a \wedge b) = P(a|b)P(b) = P(b|a)P(a)\), dividing both sides by
\(P(b)\) gives the formula. It is required that \(P(b) \neq 0\).

Sometimes, we only want to know if \(P(h_1|e) > P(h_2|e)\)
(probability of hypothesis 1 is greater than probability of
hypothesis 2, given the evidence). Then we only have to compare
\(\alpha P(e|h_1)P(h_1)\) vs. \(\alpha P(e|h_2)P(h_2)\), where \(\alpha =
1/P(e)\), which we never need to calculate.
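For instance, here is a minimal sketch in Python (the numbers are
made up, just for illustration) of comparing two hypotheses without
ever calculating \(\alpha\):

```python
# Made-up priors and likelihoods for two competing hypotheses.
P_h1, P_h2 = 0.10, 0.05                 # priors P(h1), P(h2)
P_e_given_h1, P_e_given_h2 = 0.6, 0.9   # likelihoods P(e|h1), P(e|h2)

# Scores proportional to the posteriors: alpha * P(e|h) * P(h).
# The shared factor alpha = 1/P(e) cancels in the comparison.
score_h1 = P_e_given_h1 * P_h1
score_h2 = P_e_given_h2 * P_h2

print("h1" if score_h1 > score_h2 else "h2", "is more likely given e")
```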

\(P(h)\) is the "prior" of a hypothesis (cause/explanation) \(h\).

\(P(h|e)\) is the "posterior" of \(h\), given evidence \(e\) is observed.

Imagine building an expert system for medical diagnosis. You may
include a rule like,

hasToothache(X) :- hasCavity(X).

The problem is that not every toothache is caused by a cavity. You may
expand it thus,

hasToothache(X) :- hasCavity(X).
hasToothache(X) :- hasGumDisease(X).
hasToothache(X) :- hasAbscess(X).

Now there are three different possible causes of the toothache. Yet
still, some are missing. And cavities do not always cause
toothaches. And a person may have both a cavity and an abscess. How do
we deal with all these qualifications?

One answer is to use probabilistic reasoning. We will be able to say
that cavities cause toothaches only some percentage of the time, and
furthermore that having both a toothache and red, swollen gums makes
gum disease more likely and a cavity less likely (observing swollen
gums counts against the cavity diagnosis).

Russell and Norvig (from the textbook) provide three good reasons why
we might choose to use probabilistic reasoning rather than logic-based
reasoning for the medical domain:

Laziness: It is too much work to list the complete set of
antecedents or consequents needed to ensure an exceptionless rule
and too hard to use such rules.

Theoretical ignorance: Medical science has no complete theory
for the domain.

Practical ignorance: Even if we know all the rules, we might be
uncertain about a particular patient because not all the necessary
tests have been or can be run.

The basics

We'll use propositional logic to represent statements that can be true
or false. Then, with the \(P()\) notation, we'll be able to talk about
the probability that a statement is true or false.

| Notation | Meaning |
|----------|---------|
| \(P(a)\) | The probability that \(a\) (a proposition) is true |
| \(P(a \wedge b)\) | The probability that both \(a\) and \(b\) are true |
| \(P(\neg a)\) | The probability that \(a\) is false |
| \(P(a \vert{} b)\) | The probability that \(a\) is true if \(b\) is assumed to be true |

| Rule | Explanation |
|------|-------------|
| \(0 \leq P(a) \leq 1\) | A probability is always between \(0\) and \(1\). |
| \(P(a) = 1.0 - P(\neg a)\) | The probability that something is true and the probability that its opposite is true add up to \(1\). |
| \(P(a \wedge b) = P(a \vert{} b) P(b)\) | The probability that two statements are true simultaneously equals the probability that one is true given that the other is known to be true, times the probability that the other is true (no longer assuming it is). |
| \(P(a \vee b) = P(a) + P(b) - P(a \wedge b)\) | The probability that at least one of two statements is true equals the sum of their individual probabilities minus the probability that both are true simultaneously. |
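These rules are easy to check mechanically. Here is a minimal sketch
in Python, using a tiny hand-made joint distribution over two
propositions (the numbers are hypothetical):

```python
# P(a, b) for each truth assignment; the four entries sum to 1.
P = {
    (True, True): 0.20,
    (True, False): 0.30,
    (False, True): 0.10,
    (False, False): 0.40,
}

P_a = P[(True, True)] + P[(True, False)]   # marginal P(a)
P_b = P[(True, True)] + P[(False, True)]   # marginal P(b)
P_a_given_b = P[(True, True)] / P_b        # conditional P(a | b)

# Negation rule: P(a) = 1 - P(not a)
assert abs(P_a - (1.0 - (P[(False, True)] + P[(False, False)]))) < 1e-12

# Product rule: P(a and b) = P(a | b) P(b)
assert abs(P[(True, True)] - P_a_given_b * P_b) < 1e-12

# Inclusion-exclusion: P(a or b) = P(a) + P(b) - P(a and b)
P_a_or_b = 1.0 - P[(False, False)]
assert abs(P_a_or_b - (P_a + P_b - P[(True, True)])) < 1e-12

print("all rules check out")
```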

Causal graphs

Sometimes, like in medical diagnosis, we want to think about the
propositions as events or causes. For example,

| Proposition | Interpretation |
|-------------|----------------|
| \(t\) | This person has a toothache. |
| \(c\) | This person has a cavity. |
| \(g\) | This person has gum disease. |

We can specify how diseases cause symptoms with a causal graph; an
arrow points from each cause to its symptom, so here we have
\(c \rightarrow t\) and \(g \rightarrow t\).

This graph shows us that having a cavity somehow influences the chance
that a toothache is also present. This is what we expect (and that's
why I put the arrows in the graph).

This means that it should be the case that,

| Suppose that… | Interpretation |
|---------------|----------------|
| \(P(t) \neq P(t \vert{} c)\) | Knowing that a person has a cavity changes the probability that the person has a toothache. |

On the other hand, consider,

| Suppose that… | Interpretation |
|---------------|----------------|
| \(P(r) = P(r \vert{} c)\) | Knowing that a person has a cavity does not change the probability that the person has red hair. |

We will say that \(P(c|g) = P(c)\) and likewise \(P(g|c) = P(g)\); that
is, having a cavity does not change the chance of having gum disease,
and vice versa. We claim that they are independent events, which means
\(P(c \wedge g) = P(c)P(g)\).

Let's flesh out the probabilities for the toothache:

| \(T\) | \(C\) | \(G\) | \(P(T \vert{} C \wedge G)\) |
|-------|-------|-------|-----------------------------|
| \(t\) | \(c\) | \(g\) | \(P(t \vert{} c \wedge g) = 1.0\) (ouch!) |
| \(t\) | \(c\) | \(\neg g\) | \(P(t \vert{} c \wedge \neg g) = 0.6\) |
| \(t\) | \(\neg c\) | \(g\) | \(P(t \vert{} \neg c \wedge g) = 0.3\) |
| \(t\) | \(\neg c\) | \(\neg g\) | \(P(t \vert{} \neg c \wedge \neg g) = 0.05\) |
| \(\neg t\) | … | … | (just \(1.0\) minus the corresponding row above) |

We'll also need to know the chance of having a cavity and, separately,
the chance of having gum disease:

\(P(c) = 0.10\)

\(P(g) = 0.05\)

To calculate \(P(t)\), that is, the probability of having a toothache
for whatever reason, we have to "condition" across all the possible
causes:

\(P(t) = P(t \vert{} c \wedge g)P(c)P(g) + P(t \vert{} c \wedge \neg g)P(c)P(\neg g) + P(t \vert{} \neg c \wedge g)P(\neg c)P(g) + P(t \vert{} \neg c \wedge \neg g)P(\neg c)P(\neg g) = 0.11825\)

(each joint term factors into a product because \(c\) and \(g\) are
independent). Bayes' rule then gives the posterior for each possible
cause:

\(P(c \vert{} t) = \frac{P(t \vert{} c)P(c)}{P(t)} \qquad P(g \vert{} t) = \frac{P(t \vert{} g)P(g)}{P(t)}\)

Notice the common term, \(P(t)\). This means if we only want to figure
out which is more likely, cavity or gum disease, given that the person
has a toothache, we don't care how common toothaches are in general
(\(P(t)\)).

Thus, we often write \(\alpha\) for the denominator and just never
calculate it:

\(P(c \vert{} t) = \alpha P(t \vert{} c)P(c) \qquad P(g \vert{} t) = \alpha P(t \vert{} g)P(g)\), where \(\alpha = 1/P(t)\).
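Here is a minimal sketch of the arithmetic in Python, using the
numbers above (and the independence of \(c\) and \(g\)):

```python
P_c, P_g = 0.10, 0.05   # priors for cavity and gum disease

# P(t | C, G) from the table above, keyed by (cavity?, gum disease?).
P_t_given = {
    (True, True): 1.0,
    (True, False): 0.6,
    (False, True): 0.3,
    (False, False): 0.05,
}

def prior(value, p):
    """P(cause) if value is True, else P(not cause)."""
    return p if value else 1.0 - p

# Condition across all possible causes:
# P(t) = sum over c, g of P(t | c ^ g) P(c) P(g).
P_t = sum(P_t_given[(c, g)] * prior(c, P_c) * prior(g, P_g)
          for c in (True, False) for g in (True, False))
print(f"P(t) = {P_t:.5f}")   # ~0.11825

# Unnormalized posteriors alpha * P(t | h) P(h); alpha = 1/P(t)
# cancels when we only compare them.
score_c = sum(P_t_given[(True, g)] * prior(g, P_g) for g in (True, False)) * P_c
score_g = sum(P_t_given[(c, True)] * prior(c, P_c) for c in (True, False)) * P_g
print("cavity" if score_c > score_g else "gum disease", "is the more likely cause")
```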

Here's another example. This one models the causes of a possible
report of a fire alarm and a possible report of smoke. In the causal
graph, \(tampering\) and \(fire\) both point to \(alarm\); \(alarm\)
points to \(leaving\); \(leaving\) points to \(report\); and \(fire\)
also points to \(smoke\).

Table for \(report\) ("did somebody report an alarm?"):

| \(report\) | \(leaving\) | \(P(report \vert{} leaving)\) |
|------------|-------------|-------------------------------|
| true | true | \(0.75\) |
| true | false | \(0.01\) |

Table for \(leaving\) ("are people leaving the building?"):

| \(leaving\) | \(alarm\) | \(P(leaving \vert{} alarm)\) |
|-------------|-----------|------------------------------|
| true | true | \(0.88\) |
| true | false | \(0.0\) |

Table for \(alarm\) ("is there a fire alarm sounding?"):

| \(alarm\) | \(tampering\) | \(fire\) | \(P(alarm \vert{} tampering \wedge fire)\) |
|-----------|---------------|----------|--------------------------------------------|
| true | true | true | \(0.50\) |
| true | true | false | \(0.85\) |
| true | false | true | \(0.99\) |
| true | false | false | \(0.0\) |

Table for \(tampering\) ("did somebody tamper with the fire alarm?"):

| \(tampering\) | \(P(tampering)\) |
|---------------|------------------|
| true | \(0.02\) |

Table for \(fire\) ("is there a fire?"):

| \(fire\) | \(P(fire)\) |
|----------|-------------|
| true | \(0.01\) |

Table for \(smoke\) ("is there smoke?"):

| \(smoke\) | \(fire\) | \(P(smoke \vert{} fire)\) |
|-----------|----------|---------------------------|
| true | true | \(0.90\) |
| true | false | \(0.01\) |
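Before answering queries, it helps to see how these tables fit
together. Here is a minimal sketch in Python (assuming the tables
above) of encoding each table and computing a full joint probability
with the chain rule, \(P(ta, fi, al, le, re, sm) = P(ta)\,P(fi)\,P(al \vert{} ta \wedge fi)\,P(le \vert{} al)\,P(re \vert{} le)\,P(sm \vert{} fi)\):

```python
P_tampering = 0.02
P_fire = 0.01
P_alarm = {(True, True): 0.50, (True, False): 0.85,   # keyed by (tampering, fire)
           (False, True): 0.99, (False, False): 0.0}
P_leaving = {True: 0.88, False: 0.0}    # keyed by alarm
P_report = {True: 0.75, False: 0.01}    # keyed by leaving
P_smoke = {True: 0.90, False: 0.01}     # keyed by fire

def p(value, prob_true):
    """Probability a variable takes `value`, given P(variable is true)."""
    return prob_true if value else 1.0 - prob_true

def joint(ta, fi, al, le, re, sm):
    """Chain-rule product over the network's tables."""
    return (p(ta, P_tampering) * p(fi, P_fire) *
            p(al, P_alarm[(ta, fi)]) * p(le, P_leaving[al]) *
            p(re, P_report[le]) * p(sm, P_smoke[fi]))

# e.g., no tampering, a fire, alarm sounds, people leave, a report, smoke:
print(joint(False, True, True, True, True, True))
```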

Now, suppose there is a fire and the alarm was not tampered with. What
is the probability that somebody will report a fire? Notice that
people have to leave the building before somebody will report the
fire.
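Here is a sketch of that calculation, using only the numbers in the
tables above: with \(fire\) true and \(tampering\) false, the
probabilities chain forward through \(alarm\), then \(leaving\), then
\(report\).

```python
# Given: fire = true, tampering = false.
P_alarm = 0.99   # P(alarm | ~tampering ^ fire), from the alarm table

# Condition on whether the alarm sounds:
# P(leaving) = P(leaving | alarm) P(alarm) + P(leaving | ~alarm) P(~alarm)
P_leaving = 0.88 * P_alarm + 0.0 * (1 - P_alarm)

# Condition on whether people are leaving:
# P(report) = P(report | leaving) P(leaving) + P(report | ~leaving) P(~leaving)
P_report = 0.75 * P_leaving + 0.01 * (1 - P_leaving)

print(f"P(leaving) = {P_leaving:.4f}")   # ~0.8712
print(f"P(report)  = {P_report:.4f}")    # ~0.6547
```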