Bayes' theorem can be a muddle without a visual representation, as is so often the case in maths. Why not use probability squares or probability trees for Bayesian probabilities? When new data come in, they shut off part of the sample space (e.g. testing positive for a disease shuts off testing negative). The sample space then becomes only a subset of the original probabilities (those where the test was positive, perhaps), and one considers solely this. The difficulty I have is applying Bayes to probability distributions instead of discrete probabilities. The maths is hellishly terrible!
–
user56901Oct 3 '14 at 15:35

8 Answers

Bayes' theorem is a relatively simple, but fundamental result of probability theory that allows for the calculation of certain conditional probabilities. Conditional probabilities are just those probabilities that reflect the influence of one event on the probability of another.

Simply put, in its most famous form, it states that the probability of a hypothesis given new data (P(H|D), called the posterior probability) is equal to the following: the probability of the observed data given the hypothesis (P(D|H), called the likelihood), times the probability of the hypothesis being true prior to the new evidence (P(H), called the prior probability of H), divided by the probability of seeing that data at all (P(D), called the marginal probability of D).

Formally, the equation looks like this:

$$P(H|D) = \frac{P(D|H)\,P(H)}{P(D)}$$

The significance of Bayes' theorem is largely due to its proper use being a point of contention between schools of thought on probability. To a subjective Bayesian (who interprets probability as subjective degrees of belief), Bayes' theorem provides the cornerstone of theory testing, theory selection and other practices: plug your subjective probability judgments into the equation and run with it. To a frequentist (who interprets probability as a limiting relative frequency), this use of Bayes' theorem is an abuse; they strive instead to use meaningful (non-subjective) priors, as do objective Bayesians under yet another interpretation of probability.

Good answer. I have a small quibble: the use of the words "subjective" and "objective" is not quite appropriate, because no methods are "objective". I'd say rather that frequentists and "objective" Bayesians simply derive their probability distributions by using certain rules or standards. So rather than tailoring for the specific case at hand, a frequentist/objective Bayesian will apply "default" choices (thus hiding their subjectivity).
–
probabilityislogicMay 24 '11 at 12:02

If you're measuring something real-valued (say the height of children aged 6), then what is P(D)? Is it the pdf of the data? In which case do you just calculate the posterior point-wise, like this: $P(x|H|D) = \frac{P(x|D|H)P(x|H)}{P(x|D)}$?
–
naught101Aug 7 '12 at 4:30

I'm sorry, but there seems to be some confusion here:
Bayes' theorem is not at stake in the never-ending Bayesian-frequentist debate. It is a theorem consistent with both schools of thought, since it follows directly from Kolmogorov's probability axioms.

Of course, Bayes' theorem is the core of Bayesian statistics, but the theorem itself is universal. The clash between frequentists and Bayesians mostly pertains to whether and how prior distributions can be defined at all.

So, if the question is about Bayes' theorem (and not Bayesian statistics):

Bayes' theorem defines how one can calculate specific conditional probabilities. Imagine, for instance, that you know: the probability of somebody having symptom A given that they have disease X, p(A|X); the probability of somebody in general having disease X, p(X); and the probability of somebody in general having symptom A, p(A). With these three pieces of information you can calculate the probability of somebody having disease X given that they have symptom A, p(X|A).
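That calculation can be sketched in a few lines of Python; the numbers below are made up purely for illustration:

```python
# Hypothetical, made-up numbers for illustration only.
p_A_given_X = 0.9    # p(A|X): probability of symptom A given disease X
p_X = 0.01           # p(X): prevalence of disease X in general
p_A = 0.05           # p(A): overall probability of symptom A

# Bayes' theorem: p(X|A) = p(A|X) * p(X) / p(A)
p_X_given_A = p_A_given_X * p_X / p_A
print(p_X_given_A)  # 0.18
```

So even a symptom that is very likely under the disease only raises the probability of the disease to 18% here, because the disease itself is rare.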

I disagree in part with your initial paragraph, because the question asks about the concept of Bayes' theorem. The frequentist-Bayesian debate is relevant to that part of the question. The Kolmogorov axioms do not give Bayes' theorem the same conceptual importance as the "probability as extended logic" axioms do.
–
probabilityislogicApr 14 '11 at 3:31

Bayes' theorem is a way to rotate a conditional probability $P(A|B)$ to another conditional probability $P(B|A)$.

A stumbling block for some is the meaning of $P(B|A)$. Conditioning is a way to reduce the space of possible events by considering only those events where $A$ definitely happens (or is true). So, for instance, the probability that a thrown fair die lands showing six, $P(\mbox{die lands six})$, is 1/6; however, the probability that the die lands six given that it landed on an even number, $P(\mbox{die lands six}|\mbox{die lands even})$, is 1/3.

You can derive Bayes' theorem yourself as follows. Start with the ratio definition of a conditional probability:

$P(B|A) = \frac{P(AB)}{P(A)}$

where $P(AB)$ is the joint probability of $A$ and $B$ and $P(A)$ is the marginal probability of $A$.

Currently the formula makes no reference to $P(A|B)$, so let's write down the definition of this too:

$P(A|B) = \frac{P(BA)}{P(B)}$

The little trick for making this work is seeing that $P(AB) = P(BA)$ (since a Boolean algebra is underneath all of this, you can easily prove this with a truth table by showing $AB = BA$), so we can write:

$P(A|B) = \frac{P(AB)}{P(B)}$

Now to slot this into the formula for $P(B|A)$, just rewrite the formula above so $P(AB)$ is on the left:

$P(AB) = P(A|B)P(B)$

and hey presto:

$P(B|A) = \frac{P(A|B)P(B)}{P(A)}$
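The fair-die example from above can be used to check the derived formula numerically; a small sketch, using exact fractions:

```python
from fractions import Fraction

# Fair die: let A = "lands even", B = "lands six".
p_A = Fraction(1, 2)        # P(even) = 3/6
p_B = Fraction(1, 6)        # P(six)
p_A_given_B = Fraction(1)   # P(even | six) = 1, since six is even

# Bayes' theorem: P(B|A) = P(A|B) P(B) / P(A)
p_B_given_A = p_A_given_B * p_B / p_A
print(p_B_given_A)  # 1/3
```

This matches the 1/3 obtained directly by restricting the sample space to the even outcomes {2, 4, 6}.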

As for the point of rotating a conditional probability in this way, consider the common example of trying to infer the probability that someone has a disease given that they have a symptom: we know they have the symptom (we can just see it), but we cannot be certain whether they have the disease and must infer it. I'll start with the formula and work back.

So to work it out, you need to know the marginal probability of the symptom and the prior probability of the disease (i.e., how common or rare the symptom and the disease are), and also the probability that someone has the symptom given that we know they have the disease (obtained, e.g., via expensive, time-consuming lab studies).

It can get a lot more complicated than this, e.g., if you have multiple diseases and symptoms, but the idea is the same. Even more generally, Bayes' theorem often makes an appearance if you have a probability theory of relationships between causes (e.g., diseases) and effects (e.g., symptoms) and you need to reason backwards (e.g., you see some symptoms from which you want to infer the underlying disease).

Bayes' theorem can be seen as a way of understanding how the probability that a theory is true is affected by a new piece of evidence. This is an application of conditional probability, and you might want to look at a textbook treatment of conditional probability to get a handle on the math.

Let me give you a very intuitive picture. Suppose you toss a coin 10 times and get 8 heads and 2 tails. The question that comes to mind is whether this coin is biased towards heads or not.

Now, if you go by the conventional definition, the frequentist approach to probability, you might say that the coin is unbiased and that this was an exceptional occurrence. Hence you would conclude that the probability of getting a head on the next toss is still 50%.

But suppose you are a Bayesian. You would actually think that, since you have got an exceptionally high number of heads, the coin has a bias towards the heads side. There are methods to calculate this possible bias. You would calculate it, and then, when you toss the coin the next time, you would definitely call heads.

So, Bayesian probability is about the belief that you develop based on the data you observe. I hope that was simple enough.
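One standard way to quantify that updated belief (an illustrative sketch, not part of the answer above) is a Beta-Binomial conjugate model: start from a uniform Beta(1, 1) prior on the heads probability, then update on the 8 heads and 2 tails:

```python
# Beta-Binomial conjugate update (illustrative sketch).
# Prior: Beta(alpha, beta) with alpha = beta = 1, i.e. uniform over [0, 1].
alpha, beta = 1, 1
heads, tails = 8, 2

# Posterior after the data is again a Beta distribution:
# Beta(alpha + heads, beta + tails)
post_alpha = alpha + heads   # 9
post_beta = beta + tails     # 3

# Posterior mean: the Bayesian's updated probability of heads on the next toss
posterior_mean = post_alpha / (post_alpha + post_beta)
print(posterior_mean)  # 0.75
```

So the belief in heads has moved from 0.5 (the prior mean) towards the observed frequency 0.8, landing at 0.75; with more data it would move further.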

Of course, there is more data in a coin toss than just the result. A sensible Bayesian will probably still bet even, because of the weight of past data, and because the coin and the coin flip look fair. Unless, perhaps, you can't see the coin, or the coin being flipped. In which case you don't even know whether the data isn't just forged to start with, and you may as well toss your priors out the window...
–
naught101Aug 7 '12 at 4:27

Bayes' theorem relates two ideas: probability and likelihood. Probability says: given this model, these are the outcomes. So: given a fair coin, I'll get heads 50% of the time. Likelihood says: given these outcomes, this is what we can say about the model. So: if you toss a coin 100 times and get 88 heads (to pick up on a previous example and make it more extreme), then the likelihood that the fair coin model is correct is not so high.

One of the standard examples used to illustrate Bayes' theorem is the idea of testing for a disease: if you take a test that's 95% accurate for a disease that 1 in 10000 of the population have, and you test positive, what are the chances that you have the disease?

The naive answer is 95%, but this ignores the fact that the 5% error rate, applied to the 9999 out of 10000 people who don't have the disease, produces a flood of false positives. So your chance of having the disease is far lower than 95%.

My use of the vague phrase "what are the chances" is deliberate. To use the probability/likelihood language: the probability that the test is accurate is 95%, but what you want to know is the likelihood that you have the disease.
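The actual number can be worked out directly. A sketch, assuming (as a simplification) that "95% accurate" means both a 95% true-positive rate and a 95% true-negative rate:

```python
# Disease-test example: 95% accurate test, 1-in-10000 prevalence.
p_disease = 1 / 10000
p_pos_given_disease = 0.95   # sensitivity (true-positive rate)
p_pos_given_healthy = 0.05   # false-positive rate

# Marginal probability of testing positive (law of total probability)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(disease | positive test)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 4))  # 0.0019, i.e. about 0.19%
```

In other words, even after a positive result the disease is still very unlikely: the false positives from the healthy 9999 swamp the true positives from the 1.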

Slightly off topic: the other classic example that Bayes' theorem is used to solve in all the textbooks is the Monty Hall problem: You're on a quiz show. There is a prize behind one of three doors. You choose door one. The host opens door three to reveal no prize. Should you switch to door two, given the chance?

I like the rewording of the question (courtesy of the reference below): you're on a quiz show. There is a prize behind one of a million doors. You choose door one. The host opens all the other doors except door 104632 to reveal no prize. Should you change to door 104632?
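A quick simulation (an illustrative sketch, not from the answer itself) confirms that in the three-door version switching wins about two-thirds of the time:

```python
import random

def monty_hall(trials=100_000, seed=0):
    """Simulate the three-door Monty Hall game; return the win rate when switching."""
    rng = random.Random(seed)
    switch_wins = 0
    for _ in range(trials):
        prize = rng.randrange(3)
        choice = rng.randrange(3)
        # Host opens a door that is neither the contestant's choice nor the prize.
        opened = next(d for d in range(3) if d != choice and d != prize)
        # The contestant switches to the remaining unopened door.
        switched = next(d for d in range(3) if d != choice and d != opened)
        switch_wins += (switched == prize)
    return switch_wins / trials

print(monty_hall())  # close to 2/3
```

Switching wins exactly when the first choice was wrong, which happens with probability 2/3; the million-door version makes the same logic feel obvious.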

My favourite book which discusses Bayes' theorem, very much from the Bayesian perspective, is "Information Theory, Inference and Learning Algorithms ", by David J. C. MacKay. It's a Cambridge University Press book, ISBN-13: 9780521642989. My answer is (I hope) a distillation of the kind of discussions made in the book. (Usual rules apply: I have no affiliations with the author, I just like the book).

Bayes' theorem in its most obvious form is simply a restatement of two things:

the joint probability is symmetric in its arguments $P(HD|I)=P(DH|I)$

the product rule $P(HD|I)=P(H|I)P(D|HI)$

So by using the symmetry:

$$P(HD|I)=P(H|I)P(D|HI)=P(D|I)P(H|DI)$$

Now if $P(D|I) \neq 0$ you can divide both sides by $P(D|I)$ to get:

$$P(H|DI)=P(H|I)\frac{P(D|HI)}{P(D|I)}$$

So this is it? How can something so simple be so awesome? As with most things, "it's the journey that's more important than the destination". Bayes' theorem rocks because of the arguments that lead to it.

What is missing from this derivation is that the product rule and the sum rule, $P(H|I)=1-P(\overline{H}|I)$, can themselves be derived using deductive logic from axioms of consistent reasoning.

Now, the "rule" in deductive logic is that if you have a relationship "A implies B", then you also have "not B implies not A". So we have "consistent reasoning implies Bayes' theorem", which means "not Bayes' theorem implies not consistent reasoning". That is, if your result isn't equivalent to a Bayesian result for some prior and likelihood, then you are reasoning inconsistently.

This result is called Cox's theorem; it was proved by R. T. Cox in the 1940s and presented in "The Algebra of Probable Inference". A more recent derivation is given in "Probability Theory: The Logic of Science".

The essence of the Bayesian approach is to provide a mathematical rule explaining how you should change your existing beliefs in the light of new evidence. In other words, it allows scientists to combine new data with their existing knowledge or expertise.

The canonical example is to imagine that a precocious newborn observes his first sunset, and wonders whether the sun will rise again or not. He assigns equal prior probabilities to both possible outcomes, and represents this by placing one white and one black marble into a bag. The following day, when the sun rises, the child places another white marble in the bag. The probability that a marble plucked randomly from the bag will be white (i.e., the child's degree of belief in future sunrises) has thus gone from a half to two-thirds. After sunrise the next day, the child adds another white marble, and the probability (and thus the degree of belief) goes from two-thirds to three-quarters. And so on. Gradually, the initial belief that the sun is just as likely as not to rise each morning is modified to become a near-certainty that the sun will always rise.
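The marble scheme is Laplace's rule of succession in disguise: after n observed sunrises (and no failures), the child's probability of another sunrise is (n + 1)/(n + 2). A small sketch:

```python
from fractions import Fraction

def belief_in_sunrise(sunrises_seen):
    """Laplace's rule of succession via the marble scheme: start with one
    white and one black marble, add a white marble per observed sunrise."""
    white = 1 + sunrises_seen
    total = 2 + sunrises_seen
    return Fraction(white, total)

print(belief_in_sunrise(0))  # 1/2
print(belief_in_sunrise(1))  # 2/3
print(belief_in_sunrise(2))  # 3/4
```

As sunrises_seen grows, the belief approaches (but never quite reaches) certainty, matching the near-certainty described above.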