The Old Evidence Problem

I’m in the middle of writing up a post sketching some ideas I have about Bayesian inference in order to stir up a hornet’s nest – in particular to prod the hornet queen, David Chapman. In the process, I ran across this old blog post by Andrew Gelman discussing this (pdf) paper by Bandyopadhyay and Brittan criticizing one form of Bayesianism – in particular the form espoused by E.T. Jaynes. One of the issues they bring up is called the old evidence problem:

Perhaps the most celebrated case in the history of science in which old data have been used to construct and vindicate a new theory concerns Einstein. He used Mercury’s perihelion shift (M) to verify the general theory of relativity (GTR). The derivation of M is considered the strongest classical test for GTR. However, according to Clark Glymour’s old evidence problem, Bayesianism fails to explain why M is regarded as evidence for GTR. For Einstein, Pr(M) = 1 because M was known to be an anomaly for Newton’s theory long before GTR came into being. But Einstein derived M from GTR; therefore, Pr(M|GTR) = 1. Glymour contends that given equation (1), the conditional probability of GTR given M is therefore the same as the prior probability of GTR; hence, M cannot constitute evidence for GTR.

Oh man, do I have some thoughts on this problem. I think I even wrote a philosophy paper in undergrad that touched on it after reading Jaynes. But I’m going to refrain from commenting until after I finish the main post because I think the old evidence problem illustrates several points that I want to make. In the meantime, what do *you* think of the problem? Is there a solution? What do you think of the solution Bandyopadhyay and Brittan propose in their paper?

Edit: Here’s a general statement of the problem. Suppose we have some well-known piece of evidence E. Everyone is aware of this evidence and there is no doubt about it, so P(E)=1. Next, suppose someone invents a new theory T that perfectly accounts for the evidence – it predicts it with 100% accuracy, so that P(E|T)=1. Then by Bayes’ rule we have P(T|E)=P(E|T)P(T)/P(E) = P(T), so the posterior and prior are identical and the evidence doesn’t actually tell us anything about T.
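To make the arithmetic concrete, here is a minimal Python sketch of the update described above. The function name `posterior` and the sample priors are illustrative choices, not anything taken from the paper.

```python
def posterior(prior_t, likelihood_e_given_t, prob_e):
    """Bayes' rule: P(T|E) = P(E|T) * P(T) / P(E)."""
    return likelihood_e_given_t * prior_t / prob_e


# Old evidence: P(E) = 1 and P(E|T) = 1, so the update changes nothing,
# whatever prior we started with. The priors below are arbitrary.
priors = [0.01, 0.3, 0.9]
posteriors = [posterior(p, 1.0, 1.0) for p in priors]

for p, q in zip(priors, posteriors):
    print(f"prior {p} -> posterior {q}")
```

Whatever prior goes in comes straight back out, which is the whole problem in miniature.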

15 thoughts on “The Old Evidence Problem”

Isn’t the correct take simply “M is evidence against not-GTR”? All previous theories were falsified by M, GTR was not. Thus what M shows is just that GTR is Not Disproven. And in fact GTR is wrong (bla bla quantum mechanics bla); M is evidence that GTR is superior to other theories. That’s all.

Bayes’ Law says flatly that the new evidence doesn’t change your prior in this case. Now, there’s no hidden flaw in Bayes. Bayes tells you, correctly, how to update probabilities in the light of new evidence. That’s a theorem.

But in this situation, Bayes tells you nothing. The correct conclusion–indeed, an immediate easy lemma–is that your original premise is wrong: there’s no probability theory here. Whatever you’re doing when you evaluate confidence in a general theory of this kind, it cannot be described probabilistically.

The reason that M constitutes evidence for GTR is because physical theories that give different predictions are exclusive; at most only one of them can be ‘correct’ and representative of reality (or so we trust). Assuming that there is a ‘correct’ theory somewhere in hypothesis-space, the sum over all hypotheses H of the probability that H is ‘correct’ is 100%. So the probability that any theory is ‘correct’ is normalized by how many competing theories it is up against, and they are weighed by how well each one’s predictions are supported by the evidence, as well as by their prior probability of being true (based on Occam’s razor).

If any previously-probable theory gets ruled out by new data, then that data also boosts the probability of correctness of all remaining theories which survived.

In this case, P(Newton|M) was very small, even before GTR was developed — basically, it would be the probability that the measurements had a systematic error (small), plus the probability that Newton’s theory maybe actually predicts M but was being misinterpreted (due to unconsidered celestial dynamics).

So that makes M evidence for every theory that is not Newton’s, eliminating the front runner at the time and distributing a lot of probability mass to all the other hypotheses in hypothesis-space.
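As a rough illustration of this normalization argument, here is a small Python sketch. The hypothesis names `rival_A` and `rival_B` and every number in it are made up for the example; only the structure (mutually exclusive hypotheses whose probabilities sum to 1) comes from the argument above.

```python
def update(priors, likelihoods):
    """Posterior over mutually exclusive, exhaustive hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())  # P(M), via the law of total probability
    return {h: joint[h] / total for h in joint}


# Hypothetical priors: Newton is the front runner before M is taken into account.
priors = {"Newton": 0.90, "rival_A": 0.06, "rival_B": 0.04}

# Hypothetical P(M | H): Newton only yields M through systematic error
# or misapplied celestial dynamics, so its likelihood is tiny.
likelihoods = {"Newton": 0.001, "rival_A": 0.5, "rival_B": 0.5}

post = update(priors, likelihoods)
for h in priors:
    print(f"{h}: {priors[h]:.3f} -> {post[h]:.3f}")
```

With these made-up numbers, Newton's share collapses and the freed probability mass is split among the survivors in proportion to prior times likelihood.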

Now, it’s a different question whether GR’s explanation of M also serves as evidence against Newton. If nobody could, after centuries, come up with any consistent theory that predicted M, I might be tempted to increase my estimate of the probability that the measurement of M itself had a systematic error, or that some quirk of solar wind was responsible, rather than gravitation. But GR’s mere existence in hypothesis space makes it easier to doubt Newton. In a twisted way, that makes GR act as evidence for itself. I am not sure if this is a reasonable conclusion or a result of some kind of irrational way of thinking.

For example, when faster-than-light neutrinos were detected, many scientists immediately assumed that it was due to an experimental error and that the reason would eventually be found, and the results withdrawn. (It was, and they were.) But if there had been a strong contending theory which predicted faster-than-light neutrinos, I wonder if the CERN result would have been received with so much skepticism? The competing theory would have made those results more believable, and scientists may have been faster to discard the previous theory. The experiment may have been scrutinized less, and the error never found. Then the competing theory, by its own existence, may have caused itself to become adopted (until the next generation fails to replicate the experiment, at least).

It is important to distinguish between 3 issues: (1) what scientists do; (2) what one should do; (3) how to formally justify what one should do. I have many times heard the claim that scientists do not allow old evidence in general and that, in particular, they were not convinced by the precession of Mercury, but by Eddington’s (falsified) eclipse data. My understanding of the history of science is that this is completely wrong, that everyone considered Mercury better evidence than the eclipse. At first glance, I think that Glymour, Bandyopadhyay, and Brittan are in favor of old evidence, and are just not sure about (3). But I’m not sure, and I think that is something to get cleared up.

To elaborate on strevdrrev, Glymour seems to equivocate on whether he is assuming logical omniscience. I think that is B&B’s point.

1) If one commits to the probability framework, it seems more appropriate to assign Pr(M) = 1 - eps, for an eps that is really, really small but > 0, representing, for example, the possibility that all the measuring devices were wrong, or that we are being manipulated in a simulation.

2) Along the lines that strevdrrev mentioned, updating the probability of a theory based on M will not bring it down for GTR, but it will do so for other candidates – I’d identify that as the main sense in which M provides evidence for GTR.

3) What I see as the main obstacle to applying Bayes here is that setting up the prior within the space of possible theories is very non-trivial – e.g. one needs to avoid being overwhelmed by uncountably infinite grue-like theories, likely by using some Ockham’s-razor-like criterion.
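A quick numerical check of point (1), as a sketch only: if Pr(M|GTR) = 1 and Pr(M) = 1 - eps, Bayes’ rule gives Pr(GTR|M) = Pr(GTR) / (1 - eps). The function name `posterior_gtr` and the prior of 0.2 are assumptions made for illustration.

```python
def posterior_gtr(prior_gtr, eps):
    """P(GTR|M) when P(M|GTR) = 1 and P(M) = 1 - eps."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    if prior_gtr > 1.0 - eps:
        # If GTR entails M, coherence requires P(GTR) <= P(M).
        raise ValueError("prior_gtr cannot exceed 1 - eps")
    return prior_gtr / (1.0 - eps)


prior = 0.2  # hypothetical prior for GTR
boosts = {eps: posterior_gtr(prior, eps) - prior for eps in (1e-2, 1e-4, 1e-6)}

for eps, boost in boosts.items():
    print(f"eps={eps:g}: posterior exceeds prior by {boost:.3g}")
```

The boost is roughly prior times eps, so an eps that is really small yields a correspondingly tiny update. On this sketch, allowing a small doubt about M removes the exact equality between posterior and prior but does not by itself make M strong evidence.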

I don’t think that point (2) is actually possible. The probability of the other candidates, assuming a perfect Bayesian reasoner, is equal to Pr(~GTR). If Pr(~GTR) goes down, Pr(GTR) goes up. But Pr(GTR|M) doesn’t go up, for the reasons outlined in the post. Therefore Pr(~GTR) can’t go down. Obviously our ideal Bayesian is impossible, but we can be confident of this part of its reasoning.

Your point (3) captures the real issue. Here’s how I think I should adjust for the evidence provided by Mercury, as an actual physically-possible Bayesian –

I know I can’t assign priors over the entirety of the hypothesis space, so I have a list of likely theories produced by the heuristic reasoning used in theoretical physics. Einstein hasn’t revealed GTR yet, so it’s not on the list.

Hypotheses inconsistent with M, including Newtonian gravity, get assigned a probability smaller than epsilon. (In real life the probability of a measurement error isn’t really epsilon-small.)

The remaining hypotheses on my list get assigned probabilities by Bayes where possible, and approximations otherwise. I leave a space, say 5%, for hypotheses I haven’t considered.

Einstein reveals GTR, which explains M. I add it to my list, distributing probability primarily from the hypotheses that do explain M. All of the M-explaining, non-GTR theories have dumb fudge factors, so GTR gets a lot of probability mass from them.

Further experiments, using new evidence, confirm GTR. It becomes the dominant hypothesis, until new evidence shows it doesn’t work. It still hangs around as a model for how objects behave in gravitational fields.

In the future, I probably assign a higher probability to the hypothesis that the real theory isn’t on my list.

At no point am I deviating from a Bayesian and probabilistic framework, but I’m also not updating via Bayes Theorem. Instead I’m changing my approximation of an ideal reasoner, which I’m forced to use because of the uncountable infinity of grue-like theories. And at the end of the day, the fact that GTR explains M still leads me to assign a higher probability to GTR.
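The bookkeeping in the steps above might look something like the following Python sketch. Everything concrete in it is an assumption made for illustration: the theory names `fudge_A` and `fudge_B`, the weights, the value of `EPS`, and especially the rule that the new theory takes a fixed fraction of each donor’s mass. Only the 5% reserve for unconsidered hypotheses comes from the description.

```python
EPS = 1e-6      # assumed mass left on each theory that contradicts M
RESERVE = 0.05  # mass held back for hypotheses not yet considered


def initial_beliefs(explains_m, contradicts_m):
    """Build the list before the new theory exists.

    explains_m: dict of theory -> relative plausibility weight.
    contradicts_m: list of theories inconsistent with M.
    """
    beliefs = {h: EPS for h in contradicts_m}
    remaining = 1.0 - RESERVE - EPS * len(contradicts_m)
    total_weight = sum(explains_m.values())
    for h, w in explains_m.items():
        beliefs[h] = remaining * w / total_weight
    beliefs["<unconsidered>"] = RESERVE
    return beliefs


def add_theory(beliefs, name, donors, fraction):
    """Add a newly proposed theory, moving `fraction` of each donor's mass to it.

    This is a change of approximation, not a Bayes update on new data.
    """
    new = dict(beliefs)
    new[name] = 0.0
    for h in donors:
        moved = fraction * beliefs[h]
        new[h] -= moved
        new[name] += moved
    return new


beliefs = initial_beliefs(
    explains_m={"fudge_A": 2.0, "fudge_B": 1.0},
    contradicts_m=["Newton"],
)
beliefs = add_theory(beliefs, "GTR", donors=["fudge_A", "fudge_B"], fraction=0.8)

for h, p in beliefs.items():
    print(f"{h}: {p:.6f}")
```

The total stays at 1 throughout, so the reasoner remains probabilistically coherent at each step even though the step that introduces GTR is not an application of Bayes’ theorem.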

On a separate line of attack, let me suggest that the place to apply probability (if you really must) is to M, not GTR. Attack the question: if GTR is wrong in general, what’s the likelihood of it getting M correct by accident? The “credibility update” ought to be proportional to 1 / P( M | “random theory” ).

Note that I’m saying “credibility” and not probability, because confidence still isn’t representable as a probability. But you can reasonably talk about P( M | “random theory” ), just by treating M as a small interval in the real line and applying a Benford’s Law-style argument. Note however that even this heuristic is dependent on a lot of qualitative extra context: we don’t increase our confidence in a theory of cannon fire if the shell hits the target for the 10,000th time. A critical aspect of the situation here is that M is a novel observation, and that’s an extramathematical criterion.

Let me stress that last bit: What is special about M as a piece of evidence? What distinguishes the significance of “GTR predicts M” from the significance of “GTR predicts the 10^100th rock will fall down”? It’s not in the math.

My first intuition is that this would be a correct objection if the fact that GTR predicts M was connected to the prior knowledge of M, as if someone did a massive best-fit analysis of many competing theories and found that GTR matched M the best. Whereas, in reality, Einstein carried out a long string of reasoning from a set of hypothetical first principles, then did some calculations and M popped out.

Thus, M is dependent on GTR but GTR -> M is independent of the prior knowledge of M, and so this constitutes new evidence for GTR.

Reading this later, I think I need to clarify that last bit, because GTR -> M is always true, which is the problem. Rather, the fact that GTR is the theory under consideration is independent of the prior knowledge of M, so the discovery that GTR -> M is new evidence for GTR.

I think it is clear that M is not new evidence and so cannot be used to update the prior.

Every scientific theory is proposed because there are some known facts that need explaining, and the theory is designed to explain those facts. Consequently, those facts cannot be used to test the theory; actual new data is needed.

I don’t think the fact that P(M) = 1 in reality is relevant to anything here.

You are working within a finite hypothesis space of models, and what matters is P(M|model1) versus P(M|model2), not the “absolute” probability of M. Indeed when you apply Bayes’ Rule you should not set P(M) = 1 because P(M) is the sum of terms: P(M|model1)P(model1) + P(M|model2)P(model2) etc. and some models under consideration do not predict M (notably Newtonian gravity).
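Written out for the two-model case, the point is as follows. As an illustrative assumption, suppose GTR predicts M with certainty and Newtonian gravity yields M only with some small probability δ (a symbol introduced here, not in the original discussion):

```latex
P(M) = P(M \mid \mathrm{GTR})\,P(\mathrm{GTR}) + P(M \mid \mathrm{Newton})\,P(\mathrm{Newton})
     = P(\mathrm{GTR}) + \delta\,\bigl(1 - P(\mathrm{GTR})\bigr)

P(\mathrm{GTR} \mid M) = \frac{P(M \mid \mathrm{GTR})\,P(\mathrm{GTR})}{P(M)}
                       = \frac{P(\mathrm{GTR})}{P(\mathrm{GTR}) + \delta\,\bigl(1 - P(\mathrm{GTR})\bigr)}
```

For any δ < 1 and 0 < P(GTR) < 1 the denominator is less than 1, so P(GTR|M) > P(GTR). Within this two-model space, M does raise the probability of GTR.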

The whole name of the problem, the “old evidence problem”, is kind of weird. Who cares whether evidence is old or new?

P does not equal NP, therefore while searching through a database of known facts we can still “learn” surprising connections between ideas. Einstein used one process to come up with the idea for GR and then used a totally different process in checking whether GR explained the facts; the fact that these processes matched is not something he could have anticipated in advance. I’m probably articulating this thought badly, but I hope it makes sense anyway.