http://en.wikipedia.org/wiki/Standard_probability_space would be a place to start, although it’s a bit heavy. Chapter 7 of Introductory Real Analysis by Kolmogorov and Fomin would be a good introduction to measure theory. I think Halmos in his book Measure Theory touches upon the idea of the metric structure on the measurable subsets of a space with measure.

He defines the metric by d(X,Y) = m(X-Y) + m(Y-X), where m is the measure and X-Y denotes set difference. Sets of measure zero are considered negligible, etc.
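Specialized to finite sets under the counting measure, that distance is just the size of the symmetric difference. A minimal sketch of the idea (my own illustration, not taken from Halmos):

```python
def d(X, Y):
    """Distance between measurable sets, m(X - Y) + m(Y - X),
    specialized to finite sets under the counting measure."""
    return len(X - Y) + len(Y - X)

A = {1, 2, 3}
B = {2, 3, 4, 5}
print(d(A, B))  # one element in A - B plus two in B - A, so distance 3
```

Identifying sets at distance zero (i.e., sets differing by a null set) is what turns this into a genuine metric space.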

But the intuitive idea is simple. The unit interval [0,1] with Lebesgue measure is approximated by the sequence of discrete measure spaces of n-digit decimal numbers, where each number has measure 10^(-n), as n goes to infinity.

If you use the binary system instead of decimal, you get an approximation of infinite sequences of coin tosses by finite sequences.
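To make the approximation concrete, here is a small sketch (the function name is mine) that measures a subinterval of [0,1] by counting the n-digit grid points falling inside it; with base=2 the grid points correspond exactly to finite coin-toss sequences:

```python
def discrete_measure(a, b, n, base=10):
    # Fraction of the base**n grid points k / base**n lying in [a, b):
    # each n-digit number carries measure base**(-n), as described above.
    N = base ** n
    return sum(1 for k in range(N) if a <= k / N < b) / N

# Converges to the Lebesgue measure b - a as n grows.
print(discrete_measure(0.25, 0.75, 3))           # decimal grid: 0.5
print(discrete_measure(0.25, 0.75, 10, base=2))  # coin-toss grid: 0.5
```

The discrete answer agrees with Lebesgue measure whenever the endpoints sit on the grid, and converges to it in general.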

Of course, all these infinite series of independent trials, as well as real numbers, are examples of mathematical mythology. In applications we deal only with finite series of trials, finite-accuracy approximations, a finite number of choices, etc. That’s why the approach indicated by Strogatz is so good. It brings us from mathematical abstractions back to common sense, so to speak. I wish he would do the same with calculus.

“To those of us, like me or Mr. Price, who make that jump across the algebraic curtain without a moment’s thought, this seems bizarre (so indeed, as Mr. Price says, Prof. Strogatz has indeed done precisely the calculations prescribed by Bayes’ Theorem). But obviously there’s something going on in crossing the algebraic curtain that is a real challenge to many people, and perhaps it would be interesting to find out what researchers in cognitive psychology and mathematics education have to say about that.”

Phil Price is right about Strogatz’s excellent column being precisely an application of Bayes’ Theorem, but it appears that both he and Strogatz miss a key point, which Sam L. Savage (who is mentioned in another context in Michael Schrage’s comment) discovered and named the “algebraic curtain”: there are lots of useful mathematical concepts and algorithms that people can understand clearly in concrete example after concrete example, but those same people are incapable of performing the exact same reasoning in the abstract, with the kinds of symbols that make it possible to state the relevant concept or algorithm in generality. (Whether that incapability is surmountable by good teaching, and if so by what methods of good teaching, I don’t know, and it isn’t relevant to what I’m saying right now.)

A very good point indeed. This is about reification: on some cognitive substrates, reification is easier than on others. In the case of teaching probability, it is a good idea to exploit the low-level (and, in effect, visual by its nature) intuition of natural frequencies for as long as practically possible.

The trouble is that people do not have “natural” intuition of probability but they have intuition of natural frequencies in the discrete domain. As soon as people start to deal with magnitudes (as opposed to sizeable discrete quantities), their “probabilistic” intuition is actually about something that is closer to mathematical expectation than to “mathematical” probability.

But isn’t mathematical probability supposed to be a mathematical model for natural frequencies, at least for people who use it for practical purposes? Too bad it became shrouded in opaque formalism to the point that it is incomprehensible to many who would benefit the most from understanding it. I must say this is the case with many other mathematical theories.

The trouble is that (especially “pure”) mathematicians are preoccupied with the ideal world of their theoretical constructs, where they live and work, to the extent that they forget about the real world where “normal” people live. So we have two parallel universes with very little communication between them. Unfortunately, it’s these visitors from the parallel universe who teach many mathematical courses in colleges and universities. The results are predictable, especially considering the total mathematical immaturity of their students.

“The trouble is that people do not have “natural” intuition of probability but they have intuition of natural frequencies in the discrete domain.”

I would argue that people do not have a natural intuition for probability because there is no natural intuition for it. (I think it better to use the more neutral term, “uncertainty”, since “probability” can refer to one specific theory for formal modeling of uncertainty.)

European mathematicians in France, Italy, Britain, Germany and the Netherlands first started to formalize uncertainty in one exciting decade around 1665. I think it no coincidence that coffee and coffee shops also arrived in Europe in that decade. Three hundred and fifty years later, mathematically adept experts in Mathematics, Statistics, Philosophy, Physics and Artificial Intelligence who model uncertainty formally still cannot agree on what the semantics of probability statements are, nor on the scope of natural-language statements to which probabilities may validly be applied. This is not because these people are stupid; on the contrary, it is because these questions are hard, and because some formalisms for modeling uncertainty are more appropriate for some applications than others.

For an introduction to the arguments here, see the nice book by Donald Gillies [2000]: “Philosophical Theories of Probability”. (London, UK: Routledge)

“Before going on vacation for a week, you ask your spacey friend to water your ailing plant. Without water, the plant has a 90 percent chance of dying. Even with proper watering, it has a 20 percent chance of dying. And the probability that your friend will forget to water it is 30 percent. (a) What’s the chance that your plant will survive the week? (b) If it’s dead when you return, what’s the chance that your friend forgot to water it? (c) If your friend forgot to water it, what’s the chance it’ll be dead when you return?”

A large part of the confusion in examples such as these is that the problem is stated in a way in which time is modeled implicitly, rather than explicitly. Question (b) is asked at the end of the week, about events that happened during the week (and given an event that happened at the end of the week). Question (c) is asked at the end of the week about an event at the end of the week (and given events that took place during the week).

The first objection to make is that Question (b) seems to be ill-posed, since it is asking about events in reverse chronological order. I think that question (b) is retrievable (i.e., it is possible to ask it in a way which does not seem to force us to accept a reversal of time), but this possibility is not at all obvious when you encounter one of these problems initially.

In my experience, people have difficulty with these types of problems so long as time is implicit. Explicitly assuming a model of time (e.g., a branching linear model) makes the problems easier to state, and their answers easier to construct.
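With the branching model spelled out, the plant problem becomes a mechanical tree computation. A sketch (the variable names are mine; the numbers come from the quoted problem):

```python
# Branch 1 (during the week): friend forgets (0.3) or waters (0.7).
# Branch 2 (end of week): plant dies or survives, conditional on branch 1.
p_forget = 0.3
p_die = {"watered": 0.2, "forgot": 0.9}

# Joint probability of each of the four leaves of the tree.
leaf = {}
for care in ("watered", "forgot"):
    p_care = p_forget if care == "forgot" else 1 - p_forget
    for fate in ("dead", "alive"):
        p_fate = p_die[care] if fate == "dead" else 1 - p_die[care]
        leaf[(care, fate)] = p_care * p_fate

p_dead = leaf[("watered", "dead")] + leaf[("forgot", "dead")]

survive = 1 - p_dead                                   # (a) = 0.59
forgot_given_dead = leaf[("forgot", "dead")] / p_dead  # (b) = 27/41, about 0.659
dead_given_forgot = p_die["forgot"]                    # (c) = 0.9
```

Notice that once the tree is explicit, the “reverse-time” question (b) is just a ratio of leaf probabilities, while (c) reads directly off a branch.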

It is to the ongoing shame of the statistics profession that they continue to present problems to students in this ill-formed, ambiguous manner, while blaming the students for having poor understanding.