
Abusing Bayes

I am hoping that some members here are familiar with Bayes’ Theorem and willing to share their knowledge or at the very least interested enough in the topic to do some research and share their opinions.

– What is Bayes Theorem
– What can it tell us
– How does it work
– Can Bayes’ Theorem be abused and if so how

As evidence that Bayes’ Theorem can be abused I offer the following book:

The blurb on Amazon says: …a math equation developed more than 200 years ago by noted European philosopher Thomas Bayes can be used to calculate the probability that God exists.

This book is written by an author with a PhD in theoretical physics, but I think this book is nonsense and an abuse of Bayes’ Theorem and I would like to know what others think and why.

And closer to home I offer the following:

At least one prominent poster here believes that if we observe an eye we can form an opinion about how easy it is to evolve an eye, and that if we observe many eyes in different lineages that we can conclude that it is easy to evolve an eye, and that Bayes’ Theorem supports that conclusion. I think this is nonsense and an abuse of Bayes’ Theorem and I would like to know what others think and why.

I’ve done some reading on Bayes’ Theorem and hope to use this thread as both means and motivation to learn more about it and to expand on the OP as we go along.

79 Replies to “Abusing Bayes”

This book is written by an author with a PhD in theoretical physics, but I think this book is nonsense and an abuse of Bayes’ Theorem and I would like to know what others think and why.

Yes, on its face it appears to be nonsense.

In all honesty, I don’t have any interest in reading the book so I cannot point to the exact error.

However, abuses of Bayes’ theorem are a dime a dozen. What is far harder to find is a proper use of Bayes’ theorem, though I’m sure you can find some in books on mathematical statistics.

At least one prominent poster here believes that if we observe an eye we can form an opinion about how easy it is to evolve an eye, and that if we observe many eyes in different lineages that we can conclude that it is easy to evolve an eye, and that Bayes’ Theorem supports that conclusion.

Well, yes, that is nonsense. But I doubt that anyone here actually believes that. What a person here more likely believes is:

If we observe an evolved eye we can form an opinion about how easy it is to evolve an eye, and that if we observe many evolved eyes in different lineages that we can conclude that it is easy to evolve an eye, and that Bayes’ Theorem supports that conclusion.

This belief is not so unreasonable. But whether the person has actually collected the data and gone through the calculation — that, I do not know.

I’ve studied Bayes’ Theorem in connection with the idea that it can be used to explain how brains process information — an idea called “predictive processing.”

The fly in the ointment is the prior probabilities or priors. It’s easy to tweak the priors so you get the conclusion you want. That’s a big problem with trying to apply Bayes’ Theorem to neuroscience.

In general, it’s probably pretty easy to abuse any theorem — just assume that the world is the way the starting conditions of the theorem say it is, and you have all the advantages of theft over honest toil.

Mung: My opinion is that it does not, since it can be derived from standard probability theory.

No, but …

Traditionally, we construct a mathematical model of the problem, then gather the statistics, and make an inference. These days, that is often based on the Neyman-Pearson lemma. If you tried to use Bayes’ rule instead, it would not change much.

However, there’s a different approach used with Bayesian methods. A probability is treated as a degree of belief, and each data item is used with Bayes’ rule to recompute what should be the degree of belief. It uses whatever data is available, instead of constructing a mathematical model. And there is a great deal of skepticism about using such methods.

Mung: Are the prior probabilities available independently of Bayes’ Theorem? Does Bayes’ Theorem tell us what the prior probabilities are, or how to derive them?

Bayes’ Theorem doesn’t tell us anything beyond what we already know, it simply provides a way to answer a different question given what we know.

Thoughts?

Interestingly, Rev. Thomas Bayes’s application of his theorem in the 1760s was also one of those does-God-exist calculations.

Anyway, it is a simple and easily proven theorem about conditional probabilities. It gives the probability of one particular value of a random outcome B, given an event A, in terms of the probabilities of event A given each of the possible outcomes of B, together with the prior probabilities of those outcomes of B. Those are the notorious prior probabilities.
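For readers who prefer symbols, the verbal statement above can be written as follows. The indexing of the possible outcomes of B as B_1, B_2, … is notation assumed here for illustration, not something the comment itself uses.

```latex
P(B_i \mid A) \;=\; \frac{P(A \mid B_i)\,P(B_i)}{\sum_{j} P(A \mid B_j)\,P(B_j)}
```

Here the P(B_j) are the priors and the P(A | B_j) are the conditional probabilities of A that have to be supplied from outside the theorem.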

The theorem shows you how to do the calculation. So yes, it does not provide you with the priors or with the conditional probabilities of A; you need to supply them. There is a good explanation in Wikipedia (in general, statistical and probability concepts are well explained there, as the statisticians organized themselves to make sure that was done).

Bayes’ Theorem is not at all controversial. How you then use it in statistical inference, and where you get the priors from, well, that’s another matter, subject to much debate.

The fly in the ointment is the prior probabilities or priors. It’s easy to tweak the priors so you get the conclusion you want. That’s a big problem with trying to apply Bayes’ Theorem to neuroscience.

If you are referring to Friston’s stuff, the math is more about Variational Bayes. If you are interested, I can give some links to tutorials which I found helpful; they do use some math.

On Bayesian inference and priors in general, I would say that having to be explicit about your priors is a good thing. For scientific work, there would often be an existing set of research programs and associated priors, so any Bayesian inference could and should include an analysis of the sensitivity of the results to the choice of priors.

If you take a Peircean approach to the scientific community of enquirers, then you would expect their priors to converge.

However, there’s a different approach used with Bayesian methods. […]And there is a great deal of skepticism about using such methods.

I’m curious as to why you say that last sentence. I suppose by definition frequentists are skeptical (!), but my impression is that Bayesian inference is becoming more popular, both in new fields like Machine Learning and as an alternative to the problems with NP or Fisherian traditional hypothesis testing that have become hot topics in, e.g., psychology. It may vary by field, I suppose.

In terms of the basic questions in the OP, I often see Base Rate Fallacy discussions used to introduce Bayes’ theorem. Besides the examples in the linked Wiki article, you’ll often see it used to work out how likely it is that you have a disease if a test comes back positive. It’s also used in explanations of the Monty Hall problem.

As other posters have noted, Bayes’ theorem is elementary stuff. The bigger issues are in frequentist versus Bayesian inference. You mentioned the issues with choosing priors, but there are also difficult assumptions in frequentist statistics; they are just more hidden. See the discussions here and here.

Finally, there is a related difference in the interpretation of probability. Bayesianism tends to be associated with subjective understandings of probability; frequentism with objective. But this is a philosophical issue. The inference issue applies to the details of how one does science and statistics.

Are the prior probabilities available independently of Bayes’ Theorem? Does Bayes’ Theorem tell us what the prior probabilities are, or how to derive them?

Bayes’ Theorem doesn’t tell us anything beyond what we already know, it simply provides a way to answer a different question given what we know.

Thoughts?

Bayes’ theorem and Bayesian inference in general are about how to update your priors in the light of observations. How should we update what we know in the light of experiment?

Prior probabilities come from the science. One can argue they are always available, that it is an advantage for them to be explicitly shown in Bayesian inference, and that it is a disadvantage of “traditional” frequentist (NP/Fisherian) approaches that this background knowledge is ignored. Further, Bayesian inference allows you to test the sensitivity of your conclusions to the assumed prior knowledge.

I don’t know the details of the eye example you refer to in the OP. I’ve seen popularizations which argue that the evolution of eyes is not as surprising as one might think, given the mechanisms of evolution. I have not seen those formalized as Bayesian inferences.

But perhaps the person making the eye argument does not want to assume evolutionary mechanisms, but rather is trying to use Bayesian intuitions to somehow argue that the appearance of eyes in many species with no recent common ancestors shows that evolutionary mechanisms work. I’m not sure how that one goes.

But any formal Bayesian analysis would have to be able to specify the probability of observations on the alternative hypotheses, and that difficulty for evolutionary alternatives has often been discussed in posts at TSZ. I have a vague, possibly wrong, memory of one by she-who-is-taking-a-hike.

One issue with publishing results from Bayesian inference is whether the priors that you use in the study are also the priors your readership would use. Specifying your priors is a good thing, but assuming that they are noncontroversial is not always warranted.

So, with published studies, readers may want to look hard at the priors and may end up updating their own instead.

His most recent book is on Ockham’s Razors.
Mung, I saw a post by you a week or so ago where you claimed that many people at TSZ misunderstood or misapplied that principle. But you did not say how, and I did not notice any followup.

What I previously tried to get across to Mung on several occasions is that new data you come into contact with should cause you to change your priors.

Suppose you have knowledge that indicates that some event is unlikely to happen. Say you have been informed that a coin you have been given is unfair and will come up heads only 10% of the time (let’s ignore that it’s probably very hard to really bias a coin in this way). It’s from a person that doesn’t have any history of dishonesty, so you trust the coin is biased. That’s your background information. Or your priors.

Using those as your priors, you can plug them into Bayes’ theorem and compute the probability of getting a particular sequence of flips using a coin that is biased 90/10. But that of course assumes the coin really is biased 90/10.

But suppose you now go and start flipping the coin, and over the course of several hundred coin flips it seems to be much closer to 50/50. You now have new information you didn’t have before, your background knowledge has changed, and you should update your priors. Your new priors should be altered to reflect the observed behavior of the coin. Every new flip you make with the coin is another piece of data you must put into your “background knowledge” of the coin. If you keep your previous priors (the 90/10), any extended sequence of coinflips with a coin that is actually much closer to 50/50 will look like a “statistical miracle” because the coin seems to keep defying the odds.

But the whole point is you should update your priors, and the priors you have to begin with should be based on evidence (and sometimes that evidence is weak, like if you base your priors on only 10 flips of the coin). You can’t just make them up and then use Bayes’ theorem to prove that some particular piece of data must be evidence of miracles because you have priors that say the data is virtually impossible.

If you keep your priors (90/10), despite having observed that the coin does not conform to that expectation over the course of several hundred flips, then you are being irrational. That’s the message I tried to get across to Mung.
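The coin argument above can be made concrete with a minimal sketch. It assumes only two candidate values for the chance of heads (the claimed 10% and a fair 50%), a hypothetical 99% initial trust in the claim, and a hypothetical run of 300 flips; none of these numbers come from the thread, and the function name is made up for illustration.

```python
from math import exp, log

def posterior_over_biases(prior, heads, tails):
    """Posterior over candidate values of P(heads) after observing the flips.

    `prior` maps each candidate P(heads), strictly between 0 and 1,
    to its prior probability.
    """
    # Work in log space so long flip sequences do not underflow to zero.
    log_post = {
        p: log(w) + heads * log(p) + tails * log(1.0 - p)
        for p, w in prior.items()
    }
    top = max(log_post.values())
    unnorm = {p: exp(v - top) for p, v in log_post.items()}
    total = sum(unnorm.values())
    return {p: u / total for p, u in unnorm.items()}

# Hypothetical numbers: 99% initial trust that the coin lands heads 10% of the time.
prior = {0.1: 0.99, 0.5: 0.01}
posterior = posterior_over_biases(prior, heads=150, tails=150)
print(posterior)
```

Under these assumptions, even a prior heavily weighted toward the biased coin is overwhelmed by a few hundred roughly even flips, which is the sense in which holding on to 90/10 would be irrational.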

If you have some background knowledge that says evolving eyes must be extremely improbable and you therefore consider eyes to be a “miracle”, yet you keep finding evidence that eyes have evolved multiple times, then you should change your priors in light of your new evidence. At some point, the idea you have that eyes are miraculous* because.. well because that’s what your priors say, that idea comes into contact with new evidence, and that new evidence should cause you to change your priors. Finding that eyes have evolved 50 times is not evidence that 50 miracles occurred, it’s evidence that eyes aren’t actually miracles.

* Assuming you define a miracle to be something that is very unlikely to happen.

I’d like to thank you all for your comments. I’m dedicating myself to some consistent study of Bayes’ Rule/Theorem over the next couple weeks. I have a significant amount of material and have just never buckled down and gone through it in any sort of systematic way. I’ll probably be asking for help if there’s something I don’t understand.

Bruce, thanks for reminding me of the Sober book on Evolution. I have that one and will be sure to see what it has to say about Bayes.

I also have his book Ockham’s Razors. Still haven’t gotten around to more than a cursory look yet (about 10% through it). I think the issue of parsimony and Ockham’s razor comes up often enough here that it would be a good topic to cover. I do have it on my list of OPs that I intend to create. Some day.

As I understand it: A single experiment analysed with Bayes theorem would yield a posterior. If you were going to do future experiments with the same model, then you would set the prior for the next one to that posterior.
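A small sketch of that posterior-becomes-prior chaining, assuming a hypothetical coin whose chance of heads is one of three values and two made-up experiments. It also assumes the flips are independent given the coin's bias, which is what makes the chained result agree with a single analysis of the pooled data.

```python
def update(prior, likelihood):
    """One application of Bayes' rule over a discrete set of hypotheses."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnorm.values())
    return {h: u / total for h, u in unnorm.items()}

# Hypothetical model: the coin's P(heads) is one of three values, equally likely a priori.
prior = {0.25: 1 / 3, 0.5: 1 / 3, 0.75: 1 / 3}

def flips_likelihood(heads, tails):
    """Probability of one particular sequence with these counts, per hypothesis."""
    return {p: p**heads * (1 - p) ** tails for p in prior}

# Experiment 1, then experiment 2 using the first posterior as the new prior.
after_first = update(prior, flips_likelihood(heads=7, tails=3))
after_second = update(after_first, flips_likelihood(heads=6, tails=4))

# The same 20 flips analysed in one go.
pooled = update(prior, flips_likelihood(heads=13, tails=7))

print(after_second)
print(pooled)
```

The two printed distributions should agree up to floating-point rounding.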

Can you people please stop abusing Byers? Don’t you get that he must have been abused? Why are you calling it a Byers Theorem …Anything Byers writes, if one sentence hAs soMe logic aT AlL is a theorem… Leave the poor bustard alone!!! 😉

Mung:
I’m dedicating myself to some consistent study of Bayes’ Rule/Theorem over the next couple weeks. I have a significant amount of material and have just never buckled down and gone through it in any sort of systematic way.

If you want more to add to that material….

This 2018 paper includes an intro for psychologists. The paper also talks a bit about likelihood ratios, which my notes on Sober’s book say he prefers there to Bayes (because it avoids priors). But it doesn’t help with analysis of design hypothesis versus evolution, since you still need a denominator for the design alternative for that ratio.

I think you work in IT, so possibly you may find this Wiki article on Naive Bayes classifiers in machine learning helpful.

Good one, J-Mac.
Did you see my post about the novel The Gone World? Right up your alley if you like SF: quantum multiverses, quantum-related time travel, and Penrose-Hameroff consciousness theory are all used in the plot.

BruceS: Good one, J-Mac.
Did you see my post about the novel The Gone World? Right up your alley if you like SF: quantum multiverses, quantum-related time travel, and Penrose-Hameroff consciousness theory are all used in the plot. I thought of you and your posts while I read the novel.

Where the hell is it??? link please!!!
ETA: You have become my favorite blogger…after OMagain of course … 😉

Mung: Rumraket, I understand you were trying to teach me a principle, but I don’t think how easy it is to evolve an eye was a good choice of example.

Probability is concerned with events and I can’t even conceive of “how easy it is to evolve an eye” as an event to which we could begin to assign probabilities.

Perhaps you could address that concern.

You know I actually agree with that.

Without being able to give the actual probabilities, what we can say is that with evolution (and of course the conditions under which the sense of sight is useful), whatever the probability of eyes is, it goes up. And for every new, independent evolutionary origin of eyes we get evidence of, the higher that probability must be.

Fair Witness: I recommend “The Theory that Would Not Die” by McGrayne.

I’ve actually come across that book before but did not purchase it. This little exercise reminds me of how many books I already have on the subject. If the big one ever hits here look for me under a huge pile of books.

BruceS: I think you work in IT, so possibly you may find this Wiki article on Naive Bayes classifiers in machine learning helpful.

I don’t actually work in IT but as a sometime developer I brush up against it. But thanks for the tip to check out my machine learning books. I also have a couple books in the Texts in Statistical Science series and one on Bayesian Computation with R.

As far as straight through reading I am going to go through Bayes’ Rule: A Tutorial Introduction to Bayesian Analysis.

It provides a good background of Bayesian versus frequentist approaches to probability.

I THINK that’s the one I bought because it had a couple of chapters on the workers’ comp connection. If it is, I didn’t like it very much. I actually got better answers to my questions from somebody willing to answer stuff about it at the amazon site for the book.

Rumraket: If you have some background knowledge that says evolving eyes must be extremely improbable and you therefore consider eyes to be a “miracle”, yet you keep finding evidence that eyes have evolved multiple times, then you should change your priors in light of your new evidence.

I don’t see what that has to do with Bayes’ Theorem. Are you perhaps confusing Bayes’ Rule with Bayesian inference?

I guess that leads to the question, is there a difference between Bayes’ Theorem and Bayesian inference and if so what is it.

I guess that leads to the question, is there a difference between Bayes’ Theorem and Bayesian inference and if so what is it.

Bayesian inference builds on Bayes rule, so you need to understand the rule first. If you understand the concepts of joint and conditional probabilities, then Bayes rule has a one line proof. Once you have that idea, I think it’s a good idea to see some simple numerical examples; if these are not included in whatever tutorial you settle on, the base rate fallacy Wiki topic I linked is good.

Bayesian inference treats the parameters of probability distributions, such as the mean, as random variables. Frequentists do not treat parameters as random.

Take the coin flip example that has come up in this thread. If you flip a coin N times, the number of heads you get is a random variable with a binomial distribution which depends on the probability that the coin comes up heads when tossed.

That probability is the parameter of the binomial distribution. The frequentist would treat that parameter as fixed; the Bayesian would allow it too to be a random variable.

The Bayesian would assume some prior distribution for the parameter, say that it was uniformly distributed over the interval [0,1], which would be used in the case where there was no knowledge of the coin’s fairness. Bayes rule, with the binomial and uniform distributions plugged in, would then use the experimental results to update that prior into a posterior distribution over the values of the parameter. If one got a lot of heads, then the shape of the parameter’s graphed probability distribution would change to show more probability above 0.5 than below.
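Here is a rough numerical sketch of that update, approximating the continuous parameter by an evenly spaced grid over [0,1]. The flip counts (14 heads, 6 tails), the grid size, and the function name are hypothetical choices for illustration. For a uniform prior the exact posterior is known to be a Beta distribution with mean (heads + 1) / (N + 2), which gives something to check the grid against.

```python
def grid_posterior(heads, tails, n_grid=1001):
    """Posterior over P(heads) on an evenly spaced grid, starting from a uniform prior."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    # Uniform prior: every grid value gets the same weight, so the
    # unnormalised posterior is just the binomial likelihood p^h * (1-p)^t.
    unnorm = [p**heads * (1 - p) ** tails for p in grid]
    total = sum(unnorm)
    return grid, [u / total for u in unnorm]

# Hypothetical experiment: 14 heads in 20 flips.
grid, post = grid_posterior(heads=14, tails=6)

mass_above_half = sum(w for p, w in zip(grid, post) if p > 0.5)
posterior_mean = sum(p * w for p, w in zip(grid, post))

print(f"P(parameter > 0.5 | data) is about {mass_above_half:.3f}")
print(f"posterior mean is about {posterior_mean:.3f}")
```

With mostly heads observed, most of the posterior probability ends up above 0.5, as described above.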

I’m glad you enjoy my posts, but I wish I could persuade you to look at stuff by Carroll or Greene on consciousness and the “collapse of the wave function” in addition to the videos you post in other threads. These two explain the view that one has nothing to do with the other, which I think is the consensus view of philosophers and others working in the area of interpreting QM math.

Thanks Bruce. Another reason I’d been holding off on creating more OPs is that I am awaiting the one that is upcoming from Vincent on the resurrection. I hope he will break it up and make it a series though, rather than one long OP.

So back to Bayes’ Theorem, and opening to all.

Let’s say there is an urn and you know there is one red marble and one black marble in the urn. You draw a marble from the urn but do not look at it. You then assign a probability to the event – draw a red marble from the urn. You would probably put it at 50/50.

Now I draw a marble from the urn and tell you that it is red.

Does Bayes’ Theorem tell you that you ought to change the probability you originally assigned to the first event?

Does it merely describe the way to perform the probability calculation in such cases (or similar but more complex cases)?

How can your knowledge of the result of the second event change the probability of the first event, and is that what the debate with the frequentists is about?

What is the simplest application of the theorem that folks can come up with?

Does Bayes’ Theorem tell you that you ought to change the probability you originally assigned to the first event?

[…]

How can your knowledge of the result of the second event change the probability of the first event, and is that what the debate with the frequentists is about?

I’m afraid I don’t see how Bayes Rule would apply to this situation. Can you explain how you think it does? Or if this situation has come up before at TSZ, point to a post where someone says it does? (See note ** on how to extend to involve Bayes rule in the analysis).

Rather, I think the experiment highlights the difference between ontological and epistemic interpretations of probability. The ontological view is that probabilities are out there in the world independent of us; it is usually associated with frequentist statistical inference. The epistemic view is that probabilities reflect our knowledge and is usually associated with Bayesian inference. (ETA: in one sense all probability except possibly QM is epistemic. That is, if we knew all the exact parameters of how the balls were mixed and you would move your hand to pick one, we would know which ball would be chosen. I’ll leave that issue for future posts, if anyone wants to discuss.)

So when you say that one would put the probability that the ball you drew was red at 50/50, that only applies if you take an epistemic view. If you take an ontological view, then the probability only applies before you draw one ball from the urn at random. Once the ball is drawn and in your hand, it is either red or it is not. There is no probability involved.

Under the epistemic view, once someone tells you the color of the other ball, then your probability estimate collapses*. At that point, you know it is either red or black and there is no probability at issue.

I am not sure what it means to say that you should change the probability assigned to the first event. I can interpret this as a counterfactual — if you knew the future, would it change what you assigned to the probability? (ETA: changed some questionable stuff in first version of rest of paragraph). Under epistemic view, there would be no probabilities. Under ontological view, I don’t know what one should say. The answer is possibly related to God’s omniscience and whether that means there are no ontological probabilities.

——————————
* If “collapses” reminds you of QM, that was intentional. But that’s a different post for a different thread, I think. If you cannot take the suspense, take a look at the introductory page or two of Bertlmann’s socks and the nature of reality, replacing urn balls by socks.

** Bayes rule can be introduced if you say that some third party puts the balls in the urn and that (say) 1/4 of the time they are both red, 1/4 both black, and 1/2 different colors.
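One way to read that extended setup, sketched under assumptions not spelled out in the note: you hold one unseen marble, the other person draws the remaining marble and reports that it is red, and you ask how likely it is that yours is red too. The function name and the second set of fill probabilities are hypothetical.

```python
from fractions import Fraction as F

def prob_mine_red_given_other_red(p_rr, p_bb, p_mixed):
    """P(my unseen marble is red | the other marble turned out red)."""
    prior = {"RR": p_rr, "BB": p_bb, "mixed": p_mixed}
    # Likelihood of "the other marble is red" under each urn composition.
    likelihood = {"RR": F(1), "BB": F(0), "mixed": F(1, 2)}
    evidence = sum(prior[h] * likelihood[h] for h in prior)
    posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}
    # My marble can only be red alongside a red one if the urn held two reds.
    return posterior["RR"]

# The 1/4, 1/4, 1/2 split from the note.
print(prob_mine_red_given_other_red(F(1, 4), F(1, 4), F(1, 2)))

# A hypothetical third party who usually mixes the colors.
print(prob_mine_red_given_other_red(F(1, 10), F(1, 10), F(8, 10)))
```

With the 1/4, 1/4, 1/2 split the answer stays at 1/2, because that split is the same as coloring each marble independently; with the second, made-up split the report of a red marble lowers the probability to 1/5.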

I’m glad you enjoy my posts, but I wish I could persuade you to look at stuff by Carroll or Greene on consciousness and the “collapse of the wave function” in addition to the videos you post in other threads. These two explain the view that one has nothing to do with the other, which I think is the consensus view of philosophers and others working in the area of interpreting QM math.

Thanks Bruce!
I’ve ordered the book, since you recommended it 🙂
I’m familiar with both Carroll’s and Greene’s views, I think….
Is there anything particular that you like or dislike about their views?

Suppose that a test for using a particular drug is 99% sensitive and 99% specific. That is, the test will produce 99% true positive results for drug users and 99% true negative results for non-drug users. Suppose that 0.5% of people are users of the drug. What is the probability that a randomly selected individual with a positive test is a drug user?

The linked Wiki article then shows how to use Bayes rule to answer the question.
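The arithmetic for the drug test question can be sketched in a few lines, using only the three numbers stated in the question. The function name is made up for illustration.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(user | positive test) via Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)

ppv = positive_predictive_value(sensitivity=0.99, specificity=0.99, prevalence=0.005)
print(f"{ppv:.3f}")  # prints 0.332
```

So only about one positive test in three belongs to an actual user, because non-users vastly outnumber users and their 1% false positives swamp the true positives.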
But more fun is the Monty Hall problem:

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

The answer is yes, it is always to your advantage. There are many explanations of why at Wiki; one uses Bayes rule.
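A minimal sketch of the Bayes rule version, for the case described where you pick door 1 and the host opens door 3. It assumes the usual reading of the puzzle: the host never opens your door or the car's door, and picks at random when he has two goat doors to choose from.

```python
from fractions import Fraction

# You picked door 1; the host then opened door 3 to reveal a goat.
# Prior: the car is equally likely to be behind any door.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# P(host opens door 3 | car is behind door d), under the assumptions above.
likelihood = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}

evidence = sum(prior[d] * likelihood[d] for d in prior)
posterior = {d: prior[d] * likelihood[d] / evidence for d in prior}

print(posterior[1], posterior[2])  # stay vs. switch: prints 1/3 2/3
```

If the host behaves differently (for example, opens a door at random and just happens to show a goat), the likelihoods change and so does the answer.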

J-Mac: Thanks Bruce!
I’ve ordered the book, since you recommended it
I’m familiar with both Carroll’s and Greene’s views, I think….
Is there anything particular that you like or dislike about their views?

I learn from them but I don’t know enough to disagree with them. Not sure how “liking” would apply with their views. I like Carroll’s podcast Mindscape as well as his latest book. I like Greene’s videos and really enjoyed a course on SR he did for World Science U. In finding that link, I see he has some Q&A on QM where the answers are videos.

I hope you enjoy the Gone World book. Let me know what you think of it.

BruceS: I learn from them but I don’t know enough to disagree with them. Not sure how “liking” would apply with their views. I like Carroll’s podcast Mindscape as well as his latest book. I like Greene’s videos and really enjoyed a course on SR he did for World Science U. In finding that link, I see he has some Q&A on QM where the answers are videos.

I hope you enjoy the Gone World book. Let me know what you think of it.

Here is pretty much a short summary of Carroll’s views on Meaning and Consciousness.
He lost me at 0.20 min mark when he said that “…fish climbed out of water…”

And at 0.45 min mark when he said how evolutionary pressure caused the ability to develop separate choices of action…

Can you imagine what would happen if I were, at that meeting, and asked Dr. Maybe by what means the fish climbed out of the water and when was the last time he has seen the fish “climbing” out of water?

One humble but wise man once said that:
“Even the stupidest idea will find its supporters”.

You see Bruce, science fiction has to be maintained by the wildest assumptions…
Fish climbing out of the water using a quantum ladder is just as good assumption as any other…
The illusion has to look real… 😉

Does it really matter which one it was? Anyone in the right frame of mind knows that fish don’t “climb” out of the water by any means unless it is a random accident…
Why? Because it means death no matter how much evolutionists would like you to believe that evolutionary pressure performs miracles….

I asked my teenagers about it… You don’t want to know what they thought of this bright idea…

They did? I guess that’s how the fish retained the ability of climbing out of the water… The bones the fins are attached to used to be legs of a kangaroo like land creature… so hopping out of the water for thousands of generations was a piece of cake… It’s funny how evolutionary pressures work… You don’t see the fish climbing out of water much… flying seems to be the evolutionary preferential ability…

I’ve read Carrier’s books using this theorem, and he does actually start with a strong prior of historicity. Then, chapter by chapter, he examines a category of extant information, assessing it for provenance, for probability, and for degree of support of competing models. These assessments are then plugged into the theorem, the result becomes the new prior, and the process repeats.

Carrier emphasizes that in most cases, extant information is sparse, spotty (in suspicious ways), and often from dubious sources (there are a good many holes, inconsistencies, forgeries, interpolations, lacunae, etc.) All of which leads to wildly conditional conclusions. I recall that Carrier concludes that historicity’s likelihood ranges from moderately likely to vanishingly unlikely, depending.

So if nothing else in this realm, Bayes’ theorem makes it crystal clear and fully explicit that garbage in, garbage out.

My kids questioned this very notion when they did a simple experiment using our old pool table…

Yes, Siegel is reliable and his columns interesting. I try to read them all.

I think the theme of this one is that visualizations may help introduce you to a theory like GR, but to really understand it you have to do the math. (And even Einstein needed Grossmann to help him with the math of GR!).

The same need to understand the math to really understand (let alone criticize!) a theory applies to QM, as I think we’ve touched on in the past.

I don’t know what your children did with a pool table, but Siegel’s column would be a good way of describing the limitations of that kind of visualization for understanding GR.

I’ll leave discussions of evolution for others and am happy to restrict my Sean stuff to physics. Here is one of his for you. You posted elsewhere about the relation between time and QM entanglement (through entropy); here Sean describes some recent work of his on how it might be that space emerges from entanglement.