Friday, January 21, 2011

Subjective Probability

The system I described in the previous post doesn't directly lend itself to "subjective" higher-order probabilities. Specifically, we can't just have a degree of belief in an arbitrary statement. To have a probability, we need a variable whose value is chosen at random: a completely specific statement (one with no free variables) can't have a probability other than 1 or 0. Probabilities are reduced to a sort of counting (specifically, what's known as a "measure") rather than being a way of representing uncertainty.
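As a rough illustration of probability-as-counting, here is a minimal Python sketch. It assumes a finite domain in which every element gets equal weight; the helper `measure` is hypothetical and is not the actual system from the previous post. A statement with a free variable gets a genuine ratio, while a closed statement gets the same truth value for every choice of the variable, so its measure can only be 0 or 1.

```python
from fractions import Fraction

def measure(domain, predicate):
    """Hypothetical helper: the fraction of a finite, uniformly weighted
    domain whose elements satisfy the predicate."""
    domain = list(domain)
    hits = sum(1 for x in domain if predicate(x))
    return Fraction(hits, len(domain))

numbers = range(1, 11)

# A statement with a free variable x: the measure is a real ratio.
p_even = measure(numbers, lambda x: x % 2 == 0)

# Closed statements ignore x, so every element agrees and we get 1 or 0.
p_closed_true = measure(numbers, lambda x: 2 + 2 == 4)
p_closed_false = measure(numbers, lambda x: 2 + 2 == 5)

print(p_even, p_closed_true, p_closed_false)  # 1/2 1 0
```

On a finite uniform domain, measure 1 holds exactly when every element satisfies the predicate, which is one way to see probability as a generalization of the universal quantifier.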

There are some ways we can try to interpret subjective probabilities as measures. We can introduce "possible worlds" (or "possible situations"), which we treat as being chosen randomly. If we're uncertain about whether it will rain or snow tomorrow, that is because in some possible worlds it rains, and in some it snows; the probability of each is obtained by counting the possibilities. However, prompted by Lukasz's comment on my previous post, I took a much closer look at how higher-order subjective probability measures should be represented, and I found myself re-inventing the standard higher-order probability theory that I complained about in the first place. That theory can be represented within my framework, but it isn't clear that it should be. Each theory can stand on its own, and each is more convenient for different purposes.

One problem with my proposed system is that it does not directly endorse the idea of generalizing from examples. Knowing more examples gives us more knowledge of the probability involved, but only because the probability is defined by literally counting examples. If we have 100 buttons (each given equal measure) and we know only that each button is either red or blue, seeing 98 of those buttons will pin the probability down to within a range of 0.02, but that tells us nothing about the last two buttons! 98 blue buttons are no indication that the last two are blue, unless we add a possible-world framework so that we can measure the probabilities of the last two colors as a function of randomly chosen worlds.
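The arithmetic of the button example can be sketched in a few lines of Python. The function name `blue_probability_bounds` is hypothetical; the only assumption is the one in the text, that all 100 buttons carry equal measure. The two unseen buttons can go either way, so the counting measure is only bounded, and nothing in the count favors one completion over the other.

```python
from fractions import Fraction

TOTAL = 100  # number of buttons, each carrying equal measure

def blue_probability_bounds(seen_blue, seen_red, total=TOTAL):
    """Lowest and highest possible measure of 'blue' given partial observation."""
    unseen = total - seen_blue - seen_red
    low = Fraction(seen_blue, total)            # every unseen button is red
    high = Fraction(seen_blue + unseen, total)  # every unseen button is blue
    return low, high

low, high = blue_probability_bounds(seen_blue=98, seen_red=0)
print(low, high, high - low)  # 49/50 1 1/50
```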

Frequentism is a bit better here: probabilities are not just arbitrary measures, but limiting frequencies. That is, the probability of an event is defined as the ratio one would get after an infinite number of experiments. It seems more justifiable to use a limiting frequency as a generalization; if we have a limiting frequency, then we know that if we run experiments long enough, we'll get close to that ratio. This in-the-long-run statement is potentially quite weak with respect to the immediate future, but at least it tells us something about the future!
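To make "the ratio after more and more experiments" concrete, here is a small Python sketch that tracks the running frequency of heads in simulated coin flips. Note the assumption baked into the simulation: the flips come from a fair, independent pseudo-random coin, which is itself a probability model rather than something the frequency definition provides. The seed is arbitrary and is only there for reproducibility.

```python
import random

def running_frequencies(outcomes):
    """Yield the ratio of successes (1s) after each trial."""
    successes = 0
    for n, outcome in enumerate(outcomes, start=1):
        successes += outcome
        yield successes / n

rng = random.Random(2011)  # arbitrary fixed seed for a reproducible run
flips = [1 if rng.random() < 0.5 else 0 for _ in range(100_000)]
freqs = list(running_frequencies(flips))

# Print the running ratio at a few checkpoints; it tends to settle near 0.5.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(freqs[n - 1], 4))
```

No finite prefix of the run fixes the limit, though; the early checkpoints can sit well away from 0.5.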

There are some problems with limiting frequencies, however. One problem is that there is no mathematical guarantee that limiting frequencies exist! Mathematically speaking, the limiting frequency will typically depend on such things as the order in which we perform the experiments; some orderings will have different limits, and some will have none at all (i.e., the ratio keeps swinging up and down forever). We have to assume that the actual order in which we perform experiments will have a limit. Another problem is how we might get knowledge of limiting frequencies. A limiting frequency is not something we can directly use as a degree of belief concerning another limiting frequency: limiting frequencies require something called a reference class (meaning that a probability for a specific event is only defined when we think of that event in the context of a specific sequence of random experiments). Furthermore, a ratio from a finite series of experiments does not necessarily tell us anything about the ratio after an infinite number of experiments; we need additional assumptions to try to make this connection.
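The order-dependence can be seen with a few deterministic 0/1 sequences, sketched below in Python. The generator names are hypothetical. `alternating` and `two_ones_per_zero` both contain infinitely many 1s and infinitely many 0s, so one is a reordering of the other, yet their frequencies settle at 1/2 and 2/3 respectively. `doubling_blocks` uses runs whose lengths double, so its frequency never settles: it returns to exactly 1/3 at the end of every run of 0s and climbs back above 2/3 at the end of every run of 1s.

```python
from fractions import Fraction
from itertools import islice

def alternating():
    """1, 0, 1, 0, ...: frequency of 1s tends to 1/2."""
    while True:
        yield 1
        yield 0

def two_ones_per_zero():
    """The same kind of infinite supply of 1s and 0s, reordered as 1, 1, 0, ...
    The frequency of 1s now tends to 2/3."""
    while True:
        yield 1
        yield 1
        yield 0

def doubling_blocks():
    """Runs of 1s and 0s whose lengths double: 1, 00, 1111, 00000000, ..."""
    value, length = 1, 1
    while True:
        for _ in range(length):
            yield value
        value, length = 1 - value, 2 * length

def frequency(make_sequence, n):
    """Exact fraction of 1s among the first n outcomes of a fresh sequence."""
    return Fraction(sum(islice(make_sequence(), n)), n)

print(frequency(alternating, 60_000))        # 1/2
print(frequency(two_ones_per_zero, 60_000))  # 2/3

# Ends of runs in doubling_blocks fall at positions 2**(k + 1) - 1.
for k in range(10, 16):
    n = 2 ** (k + 1) - 1
    print(n, float(frequency(doubling_blocks, n)))
```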

This brings in the idea that we have to start with some probabilities in order to get more (effectively, we need a Bayesian prior). Taken to its extreme, this gives us the theory of subjective Bayesian probabilities, in which all probabilities are interpreted as personal degrees of belief. New information provides evidence for hypotheses via Bayes' Law, so we can generalize from examples by performing Bayesian updates on our probabilistic models of the world.
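Returning to the 100 buttons, here is a minimal Python sketch of what a prior buys us. The assumption, not taken from the post, is a uniform prior over K, the total number of blue buttons (0 through 100), with the observed buttons treated as a sample drawn without replacement, so the likelihood is hypergeometric. The function names are hypothetical. Unlike the bare count, the posterior now says something about the two unseen buttons.

```python
from fractions import Fraction
from math import comb

def posterior_over_blue_count(total, seen_blue, seen_red):
    """Posterior over K = number of blue buttons, assuming a uniform prior on K
    and a hypergeometric likelihood for the observed sample."""
    seen = seen_blue + seen_red
    weights = {}
    for k in range(total + 1):
        # comb(a, b) is 0 when b > a, so impossible K values get zero weight.
        likelihood = Fraction(comb(k, seen_blue) * comb(total - k, seen_red),
                              comb(total, seen))
        if likelihood:
            weights[k] = likelihood  # uniform prior: same factor for every K
    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}

def prob_next_is_blue(total, seen_blue, seen_red):
    """Predictive probability that the next unseen button is blue
    (requires at least one unseen button)."""
    post = posterior_over_blue_count(total, seen_blue, seen_red)
    unseen = total - seen_blue - seen_red
    return sum(p * Fraction(k - seen_blue, unseen) for k, p in post.items())

post = posterior_over_blue_count(100, 98, 0)
print(prob_next_is_blue(100, 98, 0))  # 99/100
print(post[100])                      # 99/101 (both remaining buttons blue)
```

Under this particular prior, the predictive probability coincides with Laplace's rule of succession, (98 + 1) / (98 + 2). A different prior would give a different answer, which is exactly the dependence on starting probabilities described above.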

Really, all three of these options are valid applications of probability theory. One does not have to be a purist; the different types of probability can be mixed. In particular, it seems useful to take Bayesian-style degrees of belief as a way of estimating the other two kinds of probability.

However, that sort of inclusiveness does not settle the issue of how probabilities should be used in particular applications. For my curiosities about higher-order probabilities, the system I presented in the previous post offers a fairly nice, tight connection between higher-order probabilities and first-order logic (probabilities being a generalization of universal quantifiers). This may be appealing for certain issues in probabilistic logic. On the other hand, the Bayesian theory of higher-order probabilities has a nicer way of allowing a degree of belief for any statement (though higher-order belief distributions force us to fall back on expected probabilities when making bets, as I complained in the previous post).

I'll end here. Next time, I hope to talk a bit about the structure of possible-world semantics for purely subjective probabilities, and what it provides in the way of a theory of truth and a foundation for mathematics.