Probability Theory and the Undefinability of Truth

In 1936 Tarski proved a fundamental theorem of logic: the undefinability of truth. Roughly speaking, this says there’s no consistent way to extend arithmetic so that it talks about ‘truth’ for statements about arithmetic. Why not? Because if we could, we could cook up a statement that says “I am not true.” This would lead to a contradiction, the Liar Paradox: if this sentence is true then it’s not, and if it’s not then it is.

This is why the concept of ‘truth’ plays a limited role in most modern work on logic… surprising as that might seem to novices!

However, suppose we relax a bit and allow probability theory into our study of arithmetic. Could there be a consistent way to say, within arithmetic, that a statement about arithmetic has a certain probability of being true?

We can’t let ourselves say a statement has a 100% probability of being true, or a 0% probability of being true, or we’ll get in trouble with the undefinability of truth. But suppose we only let ourselves say that a statement has some probability greater than and less than , where Is that okay?

But there’s a catch, or two. First there are many self-consistent ways to assess the probability of truth of arithmetic statements. This suggests that the probability is somewhat ‘subjective’ . But that’s fine if you think probabilities are inherently subjective—for example, if you’re a subjective Bayesian.

A bit more problematic is this: their proof that there exists a self-consistent way to assess probabilities is not constructive. In other words, you can’t use it to actually get your hands on a consistent assessment.

Fans of game theory will be amused to hear why: the proof uses Kakutani’s fixed point theorem! This is the result that John Nash used to prove games have equilibrium solutions, where nobody can improve their expected payoff by changing their strategy. And this result is not constructive.

In game theory, we use Kakutani’s fixed point theorem by letting each player update their strategy, improving it based on everyone else’s, and showing this process has a fixed point. In probabilistic logic, the process is instead that the thinker reflects on what they know, and updates their assessment of probabilities.

The statement

I have not yet carefully checked the proof of Barasz, Christiano, Herreshoff and Yudkowsky’s result. Some details have changed in the draft since I last checked, so it’s probably premature to become very nitpicky. But just to encourage technical discussions of this subject, let me try stating the result a bit more precisely. If you don’t know Tarski’s theorem, go here:

The context of this work is first-order logic. So, consider any language in first-order logic that lets us talk about natural numbers and also rational numbers. Let be the language with an additional function symbol thrown in. We require that be a rational number whenever is a natural number. We want to stand for the probability of the truth of the sentence whose Gödel number is This will give a system that can reflect about probability that what it’s saying is true.

So, suppose is some theory in the language How can we say that the probability function has ‘reasonable opinions’ about truth, assuming that the axioms of are true?

The authors have a nice way of answering this. First they consider any function assigning a probability to each sentence of They say that is coherent if there is a probability measure on the set of models of such that is the measure of the set of models in which is satisfied. They show that is coherent iff these three conditions hold:

1) for all sentences

2) for each tautology.

3) for each contradiction.

(By the way, it seems to me that 1) and 2) imply and thus 3). So either they’re giving a slightly redundant list of conditions because they feel in the mood for it, or they didn’t notice this list was redundant, or it’s not and I’m confused. It’s good to always say a list of conditions is redundant if you know it is. You may be trying to help your readers a bit, and it may seem obvious to you, but it you don’t come out and admit the redundancy, you’ll make some of your readers doubt their sanity.)

(Also by the way, they don’t say how they’re making the set of all models into a measurable space. But I bet they’re using the σ-algebra where all subsets are measurable, and I think there’s no problem with the fact that this set is very large: a proper class, I guess! If you believe in the axiom of universes, you can just restrict attention to ‘small’ models… and your probability measure will be supported on a countable set of models, since an uncountable sum of positive numbers always diverges, so the largeness of the set of these models is largely irrelevant.)

So, let’s demand that be coherent. And let’s demand that whenever the sentence is one of the axioms of

At this point, we’ve got this thing that assigns a probability to each sentence in our language. We’ve also got this thing in our language, such that is trying to be the probability of the truth of the sentence whose Gödel number is But so far these two things aren’t connected.

To connect them, they demand a reflection principle: for any sentence and any rational numbers

Here is the Gödel number of the sentence So, this principle says that if a sentence has some approximate probability of being true, the thinker—as described by —knows this. They can’t know precise probabilities, or we’ll get in trouble. Also, making the reflection principle into an if and only if statement:

is too strong. It leads to a contradictions, very much as in Tarski’s original theorem on the undefinability of truth! However, in the latest draft of the paper, the authors seem to have added a weak version of the converse to their formulation of the reflection principle.

Anyway, the main theorem they’re claiming is this:

Theorem (Barasz, Christiano, Herreshoff and Yudkowsky). There exists a function assigning a probability to each sentence of such that

1) is coherent,

2) whenever the sentence is one of the axioms of

and

3) the reflection principle holds.

Related

This entry was posted on Sunday, March 31st, 2013 at 8:34 pm and is filed under mathematics. You can follow any responses to this entry through the RSS 2.0 feed.
You can leave a response, or trackback from your own site.

By the way, I believe a function that’s coherent and gives probability 1 to all the axioms in a theory also gives probability 1 to all the theorems of theory . So, even if we should interpret these probabilities as ‘subjective’, this is the probability distribution for a very very smart mathematician: someone who can see all the logical consequences of any bunch of axioms!

For example, if extends Peano arithmetic, and Goldbach’s conjecture is provable in Peano arithmetic, then must assign probability 100% to Goldbach’s conjecture… even if my best guess of the probability is something like 99.99%.

But don’t axioms and their consequences exhaust all true statements in a theory? Unless there can be true statements that are not a consequence of the axioms it would seem to me that such function would define truth.

For any consistent finitely axiomatizable theory including arithmetic, there are lots of statements that can neither be proved nor disproved. This is Gödel’s first incompleteness theorem. Some of these statements are rather clearly true: for example, the unprovable statements that say “This statement is unprovable”. So truth is different from provability. Tarski’s theorem is about truth.

However, you’re touching on a good point: I believe it’s only the statements that are neither provable nor disprovable that can be assigned probabilities other than 0 or 1 by a probability function meeting the conditions of the big theorem at the bottom of my post.

I wish the paper stated this more clearly—or else said why it’s not true.

I believe it’s only the statements that are neither provable nor disprovable that can be assigned probabilities other than 0 or 1 by a probability function meeting the conditions of the big theorem at the bottom of my post.

Let me say why I think this.

Given a theory there’s a (large) set consisting of all models of this theory. This becomes a measurable space if we say all subsets are measurable. Then I believe coherent probability assignments assigning probability 1 to all the axioms of are in 1-1 correspondence with probability measures on The probability of a sentence is the measure of the subset of consisting of models in which is satisfied.

The soundness and completeness theorems of first-order logic say is a theorem of if and only if it’s satisfied by every model in .

The measure of itself is 1. So, if is satisfied in every model of , we get

Putting these together, we see any theorem of the theory must have for every coherent probability assignment Similarly, if is a theorem, So it’s only the sentences that are neither provable nor disprovable that can have probabilities

So, if we interpret these probabilities in a subjective Bayesian way, we’re assuming a rational agent of infinite powers, who is 100% sure of any statement that can proved from an axiom system, given that it’s sure of the axioms.

Gödel’s first incompleteness theorem and Tarski’s theorem are different but related.

Tarski’s theorem says you can’t define truth of arithmetic statements within arithmetic—since if you could, you could write a statement saying “I am false”, and get a contradiction. If it’s true, it’s false. If it’s false, it’s true. Very bad.

Gödel knew this before Tarski, but he noticed something else: you can define provability of arithmetic statements within arithmetic. So you can write a statement saying “I am unprovable”. But you don’t get a contradiction! If it’s unprovable, it’s true. If it’s provable, it’s false. Not so bad, but still very interesting: either arithmetic contains true but unprovable statements, or false but provable ones.

Gödel then restated this in a way that avoids mentioning the concept of “truth”.﻿

They are researching this subject because they believe it to be a problem that needs to be solved in order to save the world:

“So you want to save the world. As it turns out, the world cannot be saved by caped crusaders with great strength and the power of flight. No, the world must be saved by mathematicians, computer scientists, and philosophers.” (http://lukeprog.com/SaveTheWorld.html)

Any mathematician who wants to contribute to that charitable cause should consider researching those issues.﻿

I’m not sure how helpful this work will be in developing artificial intelligence. In my article, I mentioned that their proof that well-behaved assignments of probabilities to mathematical statements exist is nonconstructive. In other words, it doesn’t provide an algorithm to actually find probabilities.

In an earlier email to Yudkowsky I asked whether any well-behaved assignment of probabilities is actually uncomputable: in other words, there’s no algorithm to compute the probabilities to arbitrary accuracy. The new draft of the paper says that indeed any well-behaved assignment is uncomputable… but it doesn’t (yet) provide a proof, and I don’t see why this must be true.

But anyway:

Even if the ideas here can’t be implemented algorithmically, they could lay the groundwork for ideas that can. And I think the idea of blending probability theory and logic this way is a good one. ﻿

By the way, that quote neglected to mention that the world can be saved by caped mathematicians with the power of flight.

OK, so I skimmed this very quickly, but … can we Godel-number all the various statements about probability (e.g. as suggested by the reflection principle, as stated for rationals, not reals)? If so, isn’t there a problem? That is, if we can ‘arithmetize’ this theorem itself, and all the elements of its proof, we come back to Tarski’s thm. I see only two ways out: (a) my observation above has no bearing on the theorem, or (b) some kind of magic happens in going from a countable variant of probability/measure-theory, to one set up on a proper class.

I suspect (a) holds: the theorem is not really defining truth, but a concept of the probability of truth.

And this concept is weak enough that we can’t say a statement is true if and only if its probability of being true equals 1… because if we could, we’d get a paradox. We’d be able to make a statement S that says

“S has a probability 0 of being true”

and this would imply that negation of S has a probability 1 of being true, so the negation of S would be true, so S would be false… so it would be true! In short, we’d get a paradox.

You might be interested in Section 3.4 of Christiano et al‘s paper, currently the last section, which is called “Going Meta”. They talk about systems of arithmetic that “know” the reflection principle is true. They don’t currently talk about what happens when you arithmetize their whole argument. That could have some interesting consequences. But I don’t think they’d be “problems”, exactly.

These are not as revolutionary ideas. There are decades old works on many valued logic and on its applications to paradoxes. See for example Petr Hajek’s book Methamatematics of Fuzzy Logic. These are well-known to experts in many value and philosophical logic.

I have modest familiarity with metamathematics of fuzzy logics. You can’t carry out this construction in fuzzy logic, because the handling of quantifiers is different. In general, the goals and philosophy of fuzzy logic seem significantly different, and so we diverge on many implementation details.

I should clarify because this may be an important point, and does deserve more serious discussion. There will definitely be a more extensive discussion in the paper itself, which should be forthcoming soon.

In fuzzy logic, we can introduce a truth predicate Tr that returns “true” or “false,” in the usual way, and the truth of self-referential sentences involving Tr may simply be fuzzy. However the logic cannot make statements about Tr(phi)’s fuzzy value, it can simply include it in a proposition whose truth value will then be fuzzy.

At face value this would be a modestly limited but satisfactory way to make use of a truth predicate. But for such a fuzzy truth predicate, you might have Tr(A) & Tr(B) != A & B, which is problematic for this usage. These issues are well-understood; if there have been significant advances in the last few years however then I may be ignorant of them.

In our system we can directly discuss the values of P. This is the difference between P being a procedure that returns a number, and P being a routine that returns “True” with the appropriate probability. Again, the latter usage would be fine, except that in the case of the fuzzy logic truth predicate we must admit arbitrary correlations between this return probability and the actual state of the world (such that e.g. when P(phi) is run on a statement with probability 1/2, it returns “True” half of the time as desired, but it returns true in precisely those worlds where phi itself is false).

Our system has an analogous problem to the fuzzy truth predicate, in that we might find that P(phi) 1/2 in worlds where phi is false (e.g. if phi := P(phi) < 1/2). But now we have isolated the problem in an infinitesimal error in the value of P, which is not an issue for any system that makes careful use of it (whereas arbitrary correlations between the return value of a stochastic truth predicate and the actual truth of the world, would be a quite serious problem).

If we take to be the language of set theory and take to be the axioms of ZFC (Zermelo-Fraenkel set theory including the axiom of choice), we can state the Barasz, Christiano, Herreshoff and Yudkowsky theorem in and prove it using So, fire away!

(ZFC sounds like overkill, but their theorem relies on the Kakutani fixed point theorem, which is nonconstructive, so the axiom of choice could easily be lurking in there someplace.)

This reminds me my ancient thesis in Physics (I had ro read it again) regarding the fuzzy logic.
There are some value associated to adjectives (colors,temperature,etc.), that are not probability: for example the colors of the rainbow; a sentence can have a fuzzy propositional value in a chain of logic:

because of the proposition have different truths.

I use the functional calculus to associate the optimal value to sentence:

where the p are the truth value, and I use the Lagrange multipliers to obtain the optimal probability to associate cluster data to the value.

This choice divides the space into clusters of propositions.
The output of a sentence is:

and this value can be optimized for some value of the proposition (that are not boolean): I use they only for function approximation, not for a boolean-like logic.

I am thinking today to the true liar.
I think that logic, and mathematics, can give us the truth only if are falsifiable, like the physics.
I think that a true liars are the proposition that applied to the reality give us no information: the probability of true, and false, are equal (p=0.5) and the series of probability have not correlation (we cannot extract information by the true liar).
An example of true liar can be a proposition: a random point on a fractal surface has positive curvature (each measure gives a different curvature, and the proposition don’t gives we nothing).
If logic, and mathematics, are applied iteratively to the reality, then we cannot give truths (like the physics), but we have nearly-true proposition.

Sorry about the confusing conditions 1-3; condition 3 seems to be necessary; you can’t derive it from 1 and 2 because e.g. you don’t know a priori that the probability of and the probability ( AND TRUE) are the same (this can be derived from condition 3, or you could use some different axioms).

We mention in the paper that the consistency of a principle (e.g. the reflection principle) can be relevant to constructing practical systems even if the existence proof is non-constructive. For example, the principle could be used as a rule of inference (allowing a system to infer that is likely to be true once it learns that is likely to be close to 1), in which case the consistency statement shows that adopting it is unproblematic.

Intuitively, our P represents “truth” not “knowledge.” As you say, it would be a smart mathematician indeed! We imagine the reflection principle as being a rule which constrains our beliefs, in the same way that a formal correctness criterion for a truth predicate would do so.

To see that no coherent P can be computable, consider two recursively enumerable, provably disjoint, but recursively inseparable sets and (e.g. Turing machines that halt and output 0 and Turing machines that halt and output 1). Then if , and if , so you could use to separate and .

Finally, the point of this research program is to contribute as much as possible to our understanding of reasoning; the aim is to (as a society) ensure we have an understanding of intelligence which matches our ability to build intelligences. This is an instance of the general principle that if you are going to build something which might have negative effects, it is worthwhile to understand it well first. I think it is fair to say that both Tarski’s result and our paper contribute (at least modestly) to our understanding of semantics and reflective reasoning. It will take many such steps before we have amassed something really useful, but the point is to indicate that there is a real research program here which is currently being neglected.

Thanks for your clarifications! I’m sorry for giving your paper a public blogging before you were done writing it, but it was too interesting for me to resist thinking about it, and the way I think is by writing… and these days, the way I write is by blogging.

I like your proof that any coherent has to be uncomputable. I was looking in the wrong places for a proof of this! (I guess this is only for ‘s that assign probability 1 to the axioms in some sufficiently powerful theory, like Peano arithmetic or Robinson arithmetic, right?)

Since I like logic and probability theory, I think it’s great to combine them this way even if it’s not immediately relevant to designing rational agents with limited powers.

Over on G+, Richard Elwes wrote:

There is other work in uncertain reasoning which blends logic and probability in a slightly different way – by replacing cast iron rules of deduction (“from A deduce B”) with probabilistic variants (“if P(A)=0.9 then P(B)=0.75” sort of thing – obviously I’m butchering it somewhat).

Jeff Paris has done a lot of work in this area (after earlier establishing a wonderful classical incompleteness result with Leo Harrington).

What I don’t really know is how useful this has been to AI researchers thus far – but that is certainly a goal.

The use of the Kakutani fixed-point theorem demonstrates that there is a fixed point – an enumeration of probabilities for sentences in L’. Do we know that such an enumeration is unique? How do we know this?

It seems to me that much of the value of having probabilities for statements analogous to the use of the predicate True relies upon uniqueness, and not just consistency.

For any axiomatization of arithmetic with finitely many axioms, there are infinitely many statements that can be neither proved nor disproved using these axioms. In the setup discussed here, we have no choice at all about assigning probabilities to statements that can be proved: they have to get probability 100%. And we have no choice at all about assigning probabilities to statements that can be disproved: they have to get probability 0%. But we have quite a bit of freedom in assigning probabilities to statements that can neither be proved nor disproved. The reflection principle lessens this freedom, but it doesn’t eliminate it.

What’s more problematic than the nonuniqueness, in my opinion, is that in this setup, no choice of probabilities obeying the rules listed is computable. That is, you can’t write a program to compute these probabilities, no matter what choice you make.

My own interest in this question is whether this result casts light upon the bivalency of the Continuum Hypothesis (CH), and a number of other questions. A ‘fixed’ probability would suggest strongly that CH is bivalent. As things stand, some progress seems to have been made on this question, although I’m not quite sure what.

As you probably know, the Continuum Hypothesis can neither be proved nor disproved from ZFC (Zermelo-Fraenkel set theory plus the Axiom of Choice). So, by the completeness theorem, there are models of ZFC in which the Continuum Hypothesis holds and models in which it doesn’t. Arguments about which of these models is ‘true’ must invoke mathematical taste, or some other axioms which one is willing to declare ‘true’. Personally I think it’s best to explore both kinds of models and learn what they’re like. And indeed that’s what set theorists have been doing.

I bet that if you take ZFC plus the refleciton principle governing as the theory you ‘know’, the one I’d been calling you can find probability assignments meeting all the conditions of the Barasz-Christiano-Herreshoff-Yudkowsky theorem that assign whatever probability you want to the Continuum Hypothesis. However, my attempted proof of this has a hole.

I’m having trouble replying to your latest reply, so I’m responding to an earlier reply. The bivalency of CH is a really slippery problem, and I was interested to see if this probabilistic approach could make any traction in resolving it. At first blush, I would expect that any probability could be attributed to CH, but then it doesn’t surprise me if even that is hard, or impossible to establish.

I should be clear that I am gathering data for writing an essay in logic on the continuum hypothesis as part of my philosophy Masters, so I’m not looking for homework help, but only pointers at best.

The question of bivalency is a philosopher’s question, as much as it is a mathematician’s. In classical first order logic, for any X, (X v ¬X) is true, ie. every X is bivalent. Indeed, you can enter (X v ¬X) as a line in a formal proof. The trouble with this is that in second order logic, sentences such as the liar sentence L used in Gödel’s theorem cause serious problems for this rule.

So possibly some sentences are bivalent, and others are not. Intuitionists hold that you cannot perform certain operations that rely on the law of the excluded middle, although others are allowed, or else some basic mathematics couldn’t be done. Naïve intuitionism has a problem dealing with the Goldback Conjecture (GC), since it seems pretty clear that either there is a counterexample, or there isn’t, so it could conceivably have no counterexamples and yet no proof – yet surely in such a case it would be true, so (GC v ¬GC) holds?

In the light of this, the bivalency of CH could conceivably be resolved without answering the question as to whether it is in fact true or false. Also, CH could turn out to be resolvable with (say) a large cardinality axiom, or else a strengthening of the logic being used. I’ve listed some relevant papers below:

December 13-21, 2013 – I’ll be attending Workshop on Probability, Logic and Reflection at the Machine Intelligence Research Institute at 2721 Shattuck Avenue in Berkeley, with Eliezer Yudkowsky, Paul Christiano and others.

I thought that each mathematical demonstration can be translate in boolean algebra, the boolean net can be transformed in an electronic circuit, if the truth table of the circuit is constant then the demonstration have not inconsistencies; but it exist the flip flop, in the electronic, that have not a constant value, so that the boolean algebra can be inconsistent.
If the value of the truth of a sentence is a spin value (spin up and down), then with a quantum computer there is not more inconsistence, because there is a quantum value for each node of the net, and it is possible deduce theorem from flip flop outputs: there is an overlap between boolean net (without incongruence) and quantum spin net (the truth tables are the same).
The boolean net have two state of truth, with an oscillation between the two state, but I am thinking that the quantum computer have a single quantum equilibrium state for a boolean net, and there is not more incongruence (intermediate values).

It is a my great mistake: I don’t know if each mathematical theorem can be represented like a boolean net (this is the muddle), with nodes of propositions and axioms, so that the Godel’s incompleteness theorems can be solved with a quantum computer.
Thank you.

Here’s another way to show that is uncomputable in general: suppose is the sentence that expresses “the Turing machine halts on empty input”. Then if if not zero or one, the machine does not halt! This is clearly undecidable in general.

I think this work is interesting! My question is this: what is the difference with working with a probabilistic assignment of truth and the more “standard” provability predicate? It seems that both assign a value to each sentence into some (boolean) lattice. Is there a way of connecting both points of view?

Probabilities (or rather, events in a probability space) really are elements of a boolean lattice as are most spaces stable under intersection, union and complement, but I’m not sure my suggestion isn’t junk, as I don’t think the “classical” point of view brings much more to the table than the usual analysis of modal logic in S4.

How To Write Math Here:

You need the word 'latex' right after the first dollar sign, and it needs a space after it. Double dollar signs don't work, and other limitations apply, some described here. You can't preview comments here, but I'm happy to fix errors.