17 August 2011

Academic Exercises and Real World Commitments

My exchange last week with James Annan, a climate modeler, was interesting for several reasons, not least, for his suggestion that the projections of the IPCC could not be judged to be wrong because of their probabilistic nature. This view strains common sense, to put it mildly.

Here I argue that our disagreement lies not in different views about the nuts and bolts of probabilistic forecasting, but rather our views on whether the IPCC is engaged in providing guidance to decision makers about the probable course of the future, or instead, is engaged in an academic exercise. This would seem to be a natural point of disagreement between an academic involved in modeling and a policy scholar. James does the climate science community no favors by personalizing the debate, so here I'll stick to the issues, which are worth a discussion.

Let’s start with an analogy to make this distinction clearer. Please note that I reject James's choice of a die roll as an analogy because it begins with a presumption that the probabilities for various future outcomes are in fact known with certainty. While such an assumption is highly complimentary to climate modelers, the fact is that the actual probability distributions for the future are unknown. That is why evaluation of probabilistic statements is necessary.

Let’s say that I make the following statement:

[A] It is very likely that a team from the AFC will win the 2012 Super Bowl.

You, being a Green Bay Packer fan, take issue with my statement and ask me if I want to bet, but first you ask me what I mean by “very likely.” I explain that by “very likely” I mean at least a 90% chance. You then ask if I will give 9 to 1 odds on a $1 bet on the Super Bowl. Since I am confident in my statement (and I believe that the bet gives me good odds as I think that the chances could even be higher than 90%), I agree to bet and we shake hands.
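The arithmetic behind that handshake can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `expected_value` and the alternative probabilities of 85% and 95% are made up for the example; only the 90% figure and the 9-to-1 odds on a $1 bet come from the text.

```python
def expected_value(p_win, amount_won, amount_lost):
    """Expected profit for a bettor who wins amount_won with probability p_win
    and otherwise loses amount_lost."""
    return p_win * amount_won - (1.0 - p_win) * amount_lost

# Giving 9-to-1 odds on a $1 bet: win $1 if the AFC wins, lose $9 otherwise.
# The 0.85 and 0.95 cases are hypothetical, included only for comparison.
for p in (0.85, 0.90, 0.95):
    ev = expected_value(p, 1, 9)
    print(f"P(AFC wins) = {p:.2f}: expected profit = ${ev:+.2f}")
```

At exactly 90% the bet breaks even on average, so giving 9-to-1 odds is only attractive to someone who believes the true chance is above 90%.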

Consider the following way that events might play out -- Let’s flash forward to the Super Bowl in 2012. Imagine that the Denver Broncos (hey, it is my example;-) beat the Packers. In that case, I would win the bet and you would pay me $1. We would both agree that the expectation expressed in my statement [A] was correct in a common sense understanding of the term. Had the Packers won the game, I would part with $9 and we’d agree that my expectation was incorrect.

Now let’s consider an alternative ending to this scenario. Let’s assume that the Packers win the game. I immediately protest to you that I actually did not lose the bet -- at least not yet, because my statement was probabilistic in the sense that it referred to the entire set of potential Super Bowl games in which I judged that the AFC would win 90% or more of them. It just so happens that this was one of those cases that falls in the 10% of possible outcomes. In order to judge the winner of the bet we would have to replay the Super Bowl many times, and I suggest that we do so on Madden 2012, a computer-game simulation.

You would reply, “Huh? Whatever, you lost the bet, where’s my $9?”

The difference between the two endings lies in whether one views a probabilistic statement as a basis for committing resources for a future that will play out in only one way or as an academic exercise to be explored using simulations of the real world.

On the one hand, in the first case, a Packers’ victory would mean that the expressed judgment in [A] turned out to form the basis for a losing bet. It is important to understand that the judgment [A] may have been perfectly sound and defensible at the time that it was made – perhaps it was based on my expertise or perhaps I actually ran a Madden 2012 simulation 10,000 times under various scenarios to create an ensemble projection space as the basis for my judgment. Perhaps then the outcome was just bad luck, meaning that the 10% is realized 10% of the time. Actually, we can never know the answer to whether the expectation was actually sound or not, only that a commitment based on the judgment led to a loss of $9.

On the other hand, if I actually meant statement [A] simply as an academic exercise, perhaps in relation to some simulation of the real world, I should not be betting in the real world.

Let’s continue with the analogy and say I was a sports handicapper and you wanted to get a sense of my skill in giving odds on outcomes. One way that you might evaluate my performance is to look at all of my Super Bowl predictions over time. But that doesn’t offer much of an experiential basis with which to judge probabilistic projections. Another way that you might judge my performance is to look across my handicapping of many sports and see how the collective outcome compares with what I have projected. For instance, if I make 100 projections that are judged to be “very likely” you will have a lot of confidence in my skill if 90 of those 100 projections are actually realized, even if they are across 100 different, incommensurable events. But even then there would be problems in assessing my skill (e.g., what if there are 10,000 sports handicappers and you come to me because my record is better than the others, is that because I am good or just lucky? But I digress).
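The "good or just lucky" worry in that parenthetical can be made concrete with a rough calculation. The sketch below assumes, purely for illustration, that each of the 10,000 handicappers picks 20 coin-flip games with no skill at all and that a record of 18 or more correct counts as impressive; the number of picks, the threshold, and the helper name `prob_at_least` are assumptions for the example, while the 10,000 comes from the text.

```python
from math import comb

def prob_at_least(k, n, p):
    """Binomial tail: probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical setup: 20 picks each, every pick a 50/50 guess (no skill).
n_picks, n_handicappers = 20, 10_000
p_one = prob_at_least(18, n_picks, 0.5)        # one guesser gets 18+ of 20 right
p_any = 1 - (1 - p_one) ** n_handicappers      # at least one of the 10,000 does

print(f"One unskilled handicapper with 18+ correct: {p_one:.6f}")
print(f"At least one such record among {n_handicappers:,}: {p_any:.3f}")
```

Under these assumptions a standout record somewhere in the crowd is more likely than not, so picking the handicapper with the best record says little by itself about skill.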

The IPCC says: “It is very likely that hot extremes, heat waves and heavy precipitation events will continue to become more frequent.”

If decision makers commit resources (let’s assume intelligently, fully recognizing and understanding a 90% probability) based on this projection, and it turns out that the actual future experienced on planet Earth is one of constant or declining hot extremes, heat waves and heavy precipitation events, then the common sense view will no doubt be that the IPCC got this one wrong.

Climate modelers who protest such a common sense view based on what they describe as the impossibility of verification of probabilistic statements generated from ensemble projections from modeled simulations of the real world will be laughed off as out-of-touch academics, and rightly so. Infallibility doesn't even work for the Pope.

68 comments:

If modeling results cannot be judged wrong because they are probabilistic, then that's just another way to say that they are not falsifiable = not scientific. Interesting claim to make, coming from a scientist.

That is so much football analogy that I'm not even sure what you are talking about anymore. Anywhoo. You and Annan can have your meaningless debate all year if you like. The issue is with your headline. It is false and misleading, but of course you meant it to be provocative and it was. So why are you now crying about it? It perfectly accomplished your objective. You won, Roger.

Given the close analogy between what you are trying to do and the well-documented problem of verifying probabilistic weather forecasts, it's surprising that you seem to have such a hard time understanding the principle:

"An important property of probability forecasts is that single forecasts using probability have no clear sense of "right" and "wrong." That is, if it rains on a 10 percent PoP forecast, is that forecast right or wrong? Intuitively, one suspects that having it rain on a 90 percent PoP is in some sense "more right" than having it rain on a 10 percent forecast. However, this aspect of probability forecasting is only one aspect of the assessment of the performance of the forecasts. In fact, the use of probabilities precludes such a simple assessment of performance as the notion of "right vs. wrong" implies. This is a price we pay for the added flexibility and information content of using probability forecasts. Thus, the fact that on any given forecast day, two forecasters arrive at different subjective probabilities from the same data doesn't mean that one is right and the other wrong! It simply means that one is more certain of the event than the other. All this does is quantify the differences between the forecasters."

Roger - This was a very clear explanation and appropriate analogy for explaining your position. Annan and some others that commented on the last thread didn't get the subtle but important distinction between probability of an event occurring and quantified estimates of our certainty of findings/predictions.

Thanks, but in your example you are not offering a prediction, but rather you are simply providing a definition of a well understood PDF. You have thus provided a tautology, not a prediction.

This same question was asked on the previous thread and I answered as follows:

"On your question about a die, it is a bad analogy.

A die has a known PDF. So if you say something like "a die has 6 sides with each side having equal probability of turning up", that is by definition a correct statement."

On this post I explained why a die is not a good analogy to climate prediction:

"Please note that I reject James's choice of a die roll as an analogy because it begins with a presumption that the probabilities for various future outcomes are in fact known with certainty. While such an assumption is highly complimentary to climate modelers, the fact is that the actual probability distributions for the future are unknown. That is why evaluation of probabilistic statements is necessary."

I posit that "There is a 60% chance the AFC will win the 2012 Super Bowl".

If NFC wins, was my statement more or less correct?

We clearly can't answer this question based only on the outcome. For all we know my initial statement was perfectly correct. We need more information in order to evaluate the correctness of my probabilistic statement. Furthermore, lack of additional information does not give one a license to make an evaluation of the statement anyway.

Roger, the issue of the "known" pdf is a complete red herring, but since you are so hung up about it (or perhaps, keen to use it as an excuse to avoid answering an uncomfortable question?), let's consider the case where the die is a handmade cube, a bit rough around the edges. And it has not been previously rolled (at least, neither of us knows of such experiments).

Even though the long-run frequency of 6s is unknown, I still use my skill and judgement (maybe from experiments with a number of approximate copies, maybe from some theories) to say a 6 is "unlikely".

Thanks for the comment ... If you predict a 1-5 on a roll of an irregularly shaped cube with 83.3% probability (let's say you give me 5-1 odds on a bet) and it turns up 6 on the roll, then you would unambiguously lose the bet, and you would owe me $5, i.e., your wager would have been incorrect with respect to what happened, because you lost money.

Your judgment about the probabilities of outcomes on the irregular cube may have been sound and you were just unlucky. Or your judgment may have been mistaken. I wouldn't be able to resolve these possibilities for you, but with the one data point I'd lean just a bit towards the latter.

What I would know for 100% certain is that I am $5 richer because your bet turned out wrong.

Is the wager *incorrect* precisely because it lost? Ie, is this the grounds you are using for the word "incorrect" in these posts? If not, please explain what else "incorrect" means.

As for your #6, I don't know what you mean by "more correct". I can hazard a guess in which your answer holds, but I'd rather hear your definition, as it doesn't seem standard terminology to me. Also, I don't know how you can reconcile your use of *degrees of correctness* here, with your assertion that a losing wager is simply "incorrect". Perhaps you meant to say that the losing wager was "less correct", or perhaps the degree of correctness of a statement depends on how much you stake? It doesn't make much sense to me either way.

I'm sure you agree with the quote from Doswell and Brooks, "single forecasts using probability have no clear sense of "right" and "wrong."" so I need some explanation of how "wrong" and "incorrect" differ in your world-view. In my dictionary they are synonymous, in fact their definitions seem amusingly circular:

This gives a rather clear explanation of probabilistic climate predictions:

http://ukclimateprojections.defra.gov.uk/content/view/1989/500/

“It is very important to understand what a probability means in UKCP09. The interpretation of probability generally falls into two broad categories. The first type of probability relates to the expected frequency of occurrence of some outcome, over a large number of independent trials carried out under the same conditions: for example the chance of getting a five (or any other number) when rolling a dice is 1 in 6, that is, a probability of about 17%. This is not the meaning of the probabilities supplied in UKCP09, as there can only be one pathway of future climate. In UKCP09, we use the second type (called Bayesian probability) where probability is a measure of the degree to which a particular level of future climate change is consistent with the information used in the analysis, that is, the evidence. In UKCP09, this information comes from observations and outputs from a number of climate models, all with their associated uncertainties. The methodology which allows us to generate probabilities is based on large numbers (ensembles) of climate model simulations, but adjusted according to how well different simulations fit historical climate observations in order to make them relevant to the real world. The user can give more consideration to climate change outcomes that are more consistent with the evidence, as measured by the probabilities. Hence, Figure 8(a) does not say that the temperature rise will be less than 2.3ºC in 10% of future climates, because there will be only one future climate; rather it says that we are 10% certain (based on data, current understanding and chosen methodology) that the temperature rise will be less than 2.3ºC. One important consequence of the definition of probability used in UKCP09 is that the probabilistic projections are themselves uncertain, because they are dependent on the information used and how the methodology is formulated.”

Which makes me think of Richard Tol’s comment on the previous thread here:

“Roger and James are both right. Roger is right if one assumes that the IPCC predicts events, James is right if one assumes that the IPCC predicts probability density functions.”

Could it be that the argument is really about what the IPCC actually predicts?

This also relates to the difference in the IPCC lingo between likelihood and confidence. Another hornet’s nest. It seems that the quote by Doswell and Brooks conflates the two where it says that “It simply means that one is more certain of the event than the other.” Being (un)certain relates to confidence; not likelihood (in IPCC lingo at least). It’s rather blurry at this point though and perhaps I’m utterly confused here.

Roger: I think your American Football analogy works. As would a reference to your Premier League and Bundesliga predictions. Bookies make odds based upon how the punters are betting (which in part reflects the ability of punters to accurately or inaccurately integrate the available correct and incorrect information about a team or horse, the opposition and the playing conditions). When you bet on the roll of a die, you are betting largely on the nature of the die. IPCC predictions appear to be a mysterious combination of the two.

There remains, however, the question of whether the punter, i.e., the IPCC, effectively integrates the available valid information, avoids being swayed by what they would like to happen or places bets on dice that have been tampered with in some as yet unknown way.

In Dan Gardner's Future Babble he describes (p250-251) a similar process to the one in your paper that was used by a Canadian Intelligence group to assess its own ability to predict complex future events. Hopefully your database of IPCC predictions can be used in a similar fashion.

On your semantic point, I suggest that we use the term "correct" as the IPCC did in its uncertainty guidance for AR4 and subsequently applied in the report.

The AR4 WGI SPM provides some more detail:

"the following terms have been used to indicate the assessed likelihood, using expert judgement, of an outcome or a result: Virtually certain > 99% probability of occurrence, Extremely likely > 95%, Very likely > 90%, Likely > 66%, More likely than not > 50%, Unlikely < 33%, Very unlikely < 10%, Extremely unlikely < 5%"

and

"the following levels of confidence have been used to express expert judgements on the correctness of the underlying science: very high confidence represents at least a 9 out of 10 chance of being correct; high confidence represents about an 8 out of 10 chance of being correct"

So there are two uses here, one having to do with the correctness of a prediction/projection and the other having to do with the correctness of the "underlying science".

2. This point is important "because there will be only one future climate; rather it says that we are 10% certain"

A statement of uncertainty is thus not simply a statement about the future climate, it is a statement about our confidence in our expectations about the future climate.

The statement from Brooks and Doswell about the meaning of different likelihood estimates for the same single event helps to close the loop:

"It simply means that one is more certain of the event than the other."

Thus, one way to interpret this is that there is very little practical distinction between the IPCC's "likelihood" and "degree of confidence" -- e.g., we have more confidence in a 99% likelihood than a 50% likelihood.

Perhaps this distinction helps to explain my ongoing debate with James Annan -- he seems to think that IPCC statements are saying something about the climate, whereas I see them as saying something about what we think about the climate.

Surely to meaningfully predict a PDF you need to have a very clear and persuasive understanding of what determines the outcome that you are predicting and a set of very precise/unambiguous measures? If you don't, then you are no better than the life-long Bolton Wanderer supporter who started the season betting on Bolton Wanderers to win the English Premier League and who has now bet his house based on the fact that after the first of 38 games they now lead the league.

I would argue that IPCC got the prediction wrong if and only if they computed the probability incorrectly. The statement we are discussing is not the prediction itself, it is that the prediction has a 90% chance of being correct. Such an a priori probability cannot be falsified by one or another outcome. If the NWS says there is a 60% chance it will rain in the next 24 hours, that probability will rapidly swing to 100% at the first raindrop, or 0% at the expiration of the period. But in neither case was the original 60% probability estimate necessarily incorrect, even though the a posteriori probability after a single prediction will never be 60%.

If we compared 100 NWS predictions against outcomes, now, we would obtain a frequency of the particular outcome which we could compare against their probability, and we could then determine a probability that the experimental frequency is consistent with the declared probability, given some declared distribution function.

So if the IPCC made 100 predictions each of which had an assigned probability of >90%, and 30 turned out to be wrong, we could state it is very unlikely (though not certain) their original estimate of probabilities (as a group) was in fact accurate.
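How unlikely that outcome would be can be estimated with a binomial tail probability. The sketch below is a rough check, not a formal verification method: it assumes the 100 predictions are independent and that each had exactly a 10% chance of failing (the most generous reading of ">90%"), and the function name `prob_at_least` is made up for the example.

```python
from math import comb

def prob_at_least(k, n, p):
    """Binomial tail: probability of at least k events in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumptions for illustration: 100 independent predictions, each with a
# 10% chance of being wrong, and 30 or more of them turn out wrong.
n_predictions, p_miss, n_wrong = 100, 0.10, 30
tail = prob_at_least(n_wrong, n_predictions, p_miss)

print(f"P(at least {n_wrong} wrong out of {n_predictions}) = {tail:.2e}")
```

If the stated probabilities were accurate as a group, 30 misses would be extremely improbable, which is the sense in which a set of probabilistic predictions can be tested even though a single one cannot. Real predictions are unlikely to be fully independent, so the actual figure would differ.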

If the point of IPCC predictions is that they do not necessarily shed light on the likelihood of a particular event actually occurring in the real world, then the IPCC has very little value to those of us who have to make decisions about how to allocate resources for different issues.

And I think that's the point Roger is making with respect to Annan's argument.

Either the IPCC is actually trying to say something about the real world and getting some stuff wrong (who could hope for a 100% correct track record anyway) or the IPCC is engaged in some kind of forecaster's gamesmanship against others that they keep out of the process.

Since I believe they actually care about informing people about the future of the planet, I'm going to side with Roger on this one and feel that Annan and others should just accept the fact that some predictions are wrong.

I mean, admitting you're wrong can be a release sometimes. Just ask my wife...

If the IPCC only identified findings that they were 100% certain of, we could quickly assess how effective they were about making predictions. Why couldn't we assess their effectiveness when they make predictions that they attach >99%, >95%, >66% probabilities to? If not, let's ask them to only make predictions that they are 100% certain about. I do not care whether they are right or wrong on a particular prediction, but the degree to which they are more correct than a naive prognosticator.

I'm not so sure. I think the statement that it's >90% probable that AGW will bring about an increase in the frequency of extreme weather events is wrong, not because I know the answer one way or another, but because my reading of the literature says that in fact the evidence is sparse, inconclusive and discordant. Isn't the question whether one can rely on the IPCC to give one a valid estimate of certainty an important one? Maybe more important than the truth of any single prediction?

(And, FWIW, I do a lot of analysis of noisy data, and I think the IPCC in trying to attach quantitative estimates of probability to its predictions is facing a hopeless task. If Roger's point is that it's likely mere gamesmanship, I agree.)

You have an ensemble, the entire collection of predictions. To simplify, if there are 10 predictions that different things will happen with > 60 % probability and 6-10 happen, the set of predictions is validated.

To which Eli (27) should have added, "And that does NOT mean that the predictions were 40% wrong."

RP (6), Being Even Older Than Eli gives the Climate Capitalist the perspective to realize that life is too frickin' short to parse vernacular notions of probability tarted up in the guise of rigorous statistics. On the other hand, "it is very likely" that no one ever got poor serving up confusion to the willingly befuddled.

You should avoid the sports betting and crap shooting and pay a little more attention to poker.

It makes perfect sense for a poker player to say, "Well, I lost a lot of money on that hand, but I played the cards right."

In this case, the poker player correctly assessed the odds and acted rationally to maximize her expectation of winning -- even though her opponent happened to beat the odds and hit his flush.

Now, in this case should we say that the poker player was "incorrect" (to use the word from your title that started this)?

It seems to me that we should say that as long as she recognized the cards, calculated the odds correctly, and bet appropriately, then it would be silly to say that her "finding" was "incorrect". Yes, she lost, but she knows she'll lose, say 40% of the time in that situation; in the long run she's going to come out ahead.

Applying this to climate change, the relevant point is that we need to act rationally based on the probabilities. It might be a mistake to bet that everything that the IPCC says is very likely will happen; because if the IPCC is right, 10% of the time those very likely things won't happen. (It's important to remember that if 100% of the events designated "likely" occur, then -- given enough such events -- the IPCC assessment of likelihood will be proven wrong.)

We need to figure out how to maximize our winnings given our best assessment of the odds -- that's what rationality is all about. So it's counterproductive to treat losing money in a poker hand as evidence of an "incorrect finding." Instead we need to concentrate on (1) making sure our probability assessments are as good as they can be, and (2) developing a strategy that maximizes our chances of getting what we want given those odds.

Roger: With reference to your question at #6: You write: I posit that "It is very likely that a team from the AFC will win the 2012 Super Bowl."

Now here are two possible outcomes:

A. An AFC team wins the Super Bowl

B. An NFC team wins the Super Bowl

Which of the two outcomes would make the forecast more correct, A or B? I would say A. You?

I agree. Indeed if you said: "It is more likely than not that a team from the AFC will win the 2012 Super Bowl." then A would still make the forecast more correct.

The IPCC's ability to make predictions, like any expert or prognosticator or weather bureau, needs to be evaluated as to the accuracy of their forecasts. The Wall Street Journal used to run a competition on stock pickers versus a dartboard. Perhaps it is time to construct a similar benchmark test.

The problem with risk probabilities by themselves is that they don't really help very much. One also needs the range of likely costs associated with the risk and the cost of mitigation or avoidance. The problem isn't so much with the probabilities assigned in the WG-I report, it's the perceived softness of the reports from WG-II and -III.

The poker analogy would be knowing the odds of winning the hand, but not knowing the size of the pot. If you don't know how much you could win, you can't know how much you can bet and come out ahead in the long run.

Your statement has a fatal flaw: that poker player only played the cards right based on what he "knew". Well, it turned out that what he "knew" was wrong. Making a "correct" decision based on flawed assumptions does not make your decision right; you are still wrong.

And that is what is being talked around, especially by James Annan. He knows perfectly well any model's output is only as good as the assumptions it is based on. Wrong assumptions in = wrong prediction, or whatever you want to term the output.

OK, I'll handle #6. The forecast is not more correct. The forecast is more likely to be correct, although a single positive event won't raise the probability that it is correct as much as you might like.

This is my amateur attempt at a Bayesian analysis. Your assertion is 'There is a >90% chance the AFC will win'. In the absence of prior information, we will assign a 50% probability your assertion is correct. If it is correct, the probability of AFC win is between 90% and 100%. In the absence of other information, we will set it at 95%. If it is wrong, the probability is between 0% and 90%. In the absence of other information, we will set it at 45%. We can now construct a 2 X 2 matrix of probabilities

         AFC wins    NFC wins
Right    .95*.5      .05*.5
Wrong    .45*.5      .55*.5

Now, taking the outcome the AFC wins, it appears we have raised the probability you are correct from 50% to .95/(.95+.45) x 100% ~ 68%. Or about 2 in 3.

This exhausts my knowledge, or perhaps my misconception, of Bayesian statistics, and I am open to correction from any expert in the subject who cares to weigh in.
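The update described in this comment can be reproduced with a short Python sketch. It uses only the numbers stated above (a 50% prior, a 95% chance of an AFC win if the assertion is right, and 45% if it is wrong); the function name `posterior_right` is an assumption made for the example, and the midpoint values are the commenter's own simplification rather than anything derived from data.

```python
def posterior_right(prior_right, p_afc_if_right, p_afc_if_wrong, afc_won):
    """Bayes' rule for the hypothesis 'the >90% assertion is right'."""
    # Likelihood of the observed outcome under each hypothesis.
    like_right = p_afc_if_right if afc_won else 1 - p_afc_if_right
    like_wrong = p_afc_if_wrong if afc_won else 1 - p_afc_if_wrong
    joint_right = prior_right * like_right
    joint_wrong = (1 - prior_right) * like_wrong
    return joint_right / (joint_right + joint_wrong)

# Values from the comment: prior 50%, P(AFC wins) 95% if right, 45% if wrong.
print(f"After an AFC win: {posterior_right(0.5, 0.95, 0.45, True):.3f}")
print(f"After an NFC win: {posterior_right(0.5, 0.95, 0.45, False):.3f}")
```

Under the same assumptions an NFC win would cut the probability that the assertion was right to roughly 8%, so in this toy setup a miss is considerably more informative than a hit.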

You've already had the standard viewpoint explained to you from Doswell and Brooks: "single forecasts using probability have no clear sense of "right" and "wrong.""

Further, it is of course clear that the IPCC quote only applies the term "correct" to the underlying science, not the probabilistic statement. It is your mistaken application to probabilistic statements that is the issue here.

What we have here is a disagreement, which happens among people from time to time on highly complex issues.

I fully understand that you think that adding a probability expression to an IPCC finding then means that the finding cannot be evaluated as right or wrong based on how things turn out. This makes sense in the context of a die roll where all you have is aleatory uncertainty. I find this nonsensical in the context of football games and climate prediction where you also have a large dose of epistemic uncertainty.

You have dodged my question in #6 above, I see;-)

Let me also repeat myself:

The IPCC says: "It is very likely that hot extremes, heat waves and heavy precipitation events will continue to become more frequent."

If the actual future experienced on planet Earth is one of constant or declining hot extremes, heat waves and heavy precipitation events, then the common sense view will no doubt be that the IPCC got this one wrong. Good luck arguing the other side of that one.

My view is that the IPCC use of "likelihood" and "confidence" are not nearly as clear cut as you are positing, and in practice are both statements about the strength of belief, not about some unverifiable, hypothetical PDF of future events.

As scientists and politicians continue to assert that ongoing climate events show that the IPCC was right in its predictions, I look forward to your dressing them down for making such claims. Somehow I doubt that'll happen;-)

Roger, this is not a complex issue, it's an utterly trivial one. You are merely dodging with sophistry and rhetoric.

I have not dodged your question, I have explained that the concept of (partial) correctness does not apply to the verification of probabilistic statements (except in trivial cases). Doswell and Brooks are quite explicit on the subject.

Of course the probabilities are statements about strength of belief and I have never suggested otherwise. That is another complete red herring that you are throwing up out of nowhere. Please stop trying to throw up a smokescreen.

I'll give you another chance to explain clearly and unambiguously what you mean by "incorrect" as applied to probabilistic statements.

Consider statement S: "We assign probability P to event E"

Can you please state clearly under what conditions you consider S to be "correct" versus "incorrect". From what you have written, you seem to be saying that S is incorrect precisely when event E does not take place. Is this actually your view?

If not, then what does make S correct or incorrect (in your interpretation)?

I am happy to reply, but first, can I respectfully ask that you just leave the attitude behind? You may think the world of yourself and the opposite of me, but comments like "You are merely dodging with sophistry and rhetoric" are just juvenile and add nothing to the exchange. (And in doing so it'll reduce the number of nasty comments directed at you that I have to delete;-) You can do whatever you want at your blog, but here I'd ask that you treat me with the same respect that I treat you with. Fair enough?

Now to your question:

----------Consider statement S: "We assign probability P to event E"

Can you please state clearly under what conditions you consider S to be "correct" versus "incorrect".----------

ANSWER: In general, the larger that P is the more incorrect S is if E does not occur. If P = 1.0 and E does not occur then the statement is 100% incorrect.

If P is 0.999999999999 and E does not occur, I am perfectly comfortable holding the view that S was incorrect (or pick a larger P if you'd like, I am sure that there is some value for P < 1.0 that you'd agree with me on this).

In fact, a conventional threshold for statistical significance would probably be comfortable for most people to make such a judgment (it sure is in hypothesis testing).

In the example above with heat waves, I'd guess that most people would judge a statement incorrect at a 0.9 threshold.

Importantly, I am not referring to dice or poker games where PDFs are perfectly known and statements of probability are merely definitional. I am talking about the real world where probability judgments themselves are uncertain to an unknown degree.

In addition to the context of P, the context of E matters too. If you say that you are 50% certain of an asteroid impact in Boulder tomorrow (a low P), and it does not occur, I would have no problem saying that the S in this case was "incorrect."

I believe that you are taking simplistic textbook cases related to known PDFs like dice and applying them to situations full of (epistemic) uncertainties, uncertainty and ignorance about uncertainties and subjective judgements.

Again, to express a view that the findings of the IPCC cannot be judged right or wrong based on the future is just silly -- we make such judgments all the time in the real world.

Well, that looked like it was going to help a little, but then you threw in the asteroid example which seems clearly inconsistent with the rest. If instead of your version, I make the predicted probability 0.9 that there is no impact, would you say my prediction was correct? This prediction is obviously incorrect (in your interpretation) if there *is* an impact! Or is it just wrong irrespective of the outcome (in which case, you must surely accept that some probabilistic predictions are correct irrespective of their outcome)?

Why is it important that your usage is not applied to dice and poker games? What would happen to your theory if it *was* applied in these situations?

If you predict a 10% chance of an asteroid impact tomorrow, and there is one, I'd say that forecast is correct (and I'd guess so too would the global public). Why? Because the difference between the baseline expectation (~0% chance) and the prediction (10% chance) is huge. I know you don't like it, but this is subjective and the context matters.

Dice and poker are subject only to aleatory uncertainties -- randomness, and context does not matter. Thus, the probabilities are not uncertain, i.e., we don't need experience in order to evaluate them. This is distinctly different from football games or climate prediction, where evolving experience shapes our view of the probabilities of different outcomes.

-43- Roger, when you said that context does not matter when it comes to poker, you were wrong.

There is more than one type of "Poker", and the way you play 5 Card Draw is very different from the way you play Texas Hold'em. In 5 Card Draw the only cards you see are the ones in your hand; in Texas Hold'em there are up to 5 community cards visible to everyone, in addition to each player's 2 hole cards.

So the amount of starting information available to calculate "probabilities" is different, and as the Texas Hold'em hand continues you keep gathering more information on which cards are in play and which are not. The way betting works in Texas Hold'em also factors into the "probabilities" of who has the winning hand. Is the person you're playing against a pro with a known "style" or an unknown? That also changes the "probabilities".

5 Card Draw is different: you only know how many cards the other players discard. This means that, unlike in Texas Hold'em, you have no idea what the other players are betting on; you know only what you have and the fixed odds of getting certain types of hands. In Texas Hold'em, if a guy doesn't bet or raise when the first 3 community cards are shown but bets or raises after seeing a 10 of spades on the fourth card, that raises the probability that he has at least one 10 as a hole card.

And to make things even more complicated (or easier, depending on your POV), in Stud poker it is even easier to figure the probabilities. http://en.wikipedia.org/wiki/Stud_poker

So the "Context" is very important to poker.

Also do not forget that in Poker you can "Bluff" so randomness is very much in play in poker.

Anyone who tries to play Poker based only on the fixed "Odds" of attaining a certain type of hand should not play for money. Pros call those types of players "Suckers" and "Marks", but the Casinos love you!

How does the "baseline expectation" affect your evaluation of climate predictions? And how is someone supposed to know what the "baseline expectation" is? It must surely vary depending on the audience....so are you evaluating the IPCC from a US-centric viewpoint, or a global one? I'm not at all sure what my baseline expectation should be for a US football game...

I'll try the dice game again: it's a hand-made rough cube that has not been rolled (to our knowledge). I say it is "likely" that the first roll will come up 1-5, and "unlikely" to be 6. We roll once, 6 turns up, and you evaluate my statement as "incorrect" - right?

The catch now is, we can subsequently roll as many times as we want, and accurately determine the long-run frequency of a 6 to be 20% (plus or minus 1%, say). Was my original statement still incorrect?

The reason for presenting these simple idealised cases is not to trick you out but rather to demonstrate the fallacies and inconsistencies in your approach. If your method does not work for simple textbook cases where the answer is known, then what makes you think it is useful for cases where the answer is not known?

"I'm not at all sure what my baseline expectation should be for a US football game"

Try this, if I offer a 30% probability that the Bolton Wanderers win the EPL this year, and they do so, that will have been an excellent prediction. Why? Because I could go to Ladbrokes and get 1000 to 1 odds against their winning. If I put forward the exact same probability on Manchester United and they win the league, I could certainly make a case that my prediction was correct, but most people would see it as being less correct than the first case. The baseline expectation matters.

In meteorology see the Finley case in tornado forecasting. Surely you'd agree that a correct and skillful prediction is more correct than a correct and unskillful prediction?

Dice -- the strangest thing about this conversation is that you think such textbook cases are relevant to climate prediction ;-)

In response to your question -- let's say we are betting. After that first roll you pay me out $1. Suppose that after many rolls we determine the probabilities and judge that you had them well calibrated (I assume that you meant 13.3% not 20%;-). Do you then get back the $1 that you lost on the first roll? No.

You are confusing the (a) calibration of uncertainties with (b) the outcome of a single instance under those uncertainties.

Dice cases are irrelevant because there (a) and (b) are independent; in climate forecasting (a) is related to (b).

If you have a complaint about civility here, please make it. However, disagreements about words used in blog post headlines don't make the cut. That said, you are welcome to explain why you'd have used a different word.

I think the question you pose as to whether the IPCC, or any other organization (scientific or not), has enough information about what the future holds to make a meaningful prediction is an excellent point. And I agree that most of their predictions will be 'wrong' in the sense that most will not happen as stated in the AR4 or will be stated in the AR5.

And I think that's perfectly acceptable. I have to go to work and try to make a poorly behaving laser do its job. I predict I can do that, but only time will tell. There is a great lunch special at the local burger joint. And they serve good beer...

I think the broader point that Roger and James are arguing about via some basic statistical inferences is whether the IPCC sees itself as a political actor that cannot now turn away from posing these types of predictions from a position of authority. Roger's position seems to be (and it seems you would agree; either of you can feel free to correct me if I'm wrong) that the IPCC should back down from this type of tactic because it makes it very easy for political enemies to use overly confident assertions about the future to undermine the possible political solutions to combating climate change, whether those are mitigation, adaptation or what have you.

I think it's a worthy point to make at this juncture. The political positioning of the IPCC has not led to more/better climate policy and we're still as ignorant as ever, if not more so, about what the climate has in store.

James' point is funny for another, unrelated reason, however. Weather forecasters don't usually like to tell people what they do for a living...

If everything is probabilistic, why have the IPCC continue to look at models? Certainly it's not going to move the ball forward for policy purposes. From a policy perspective it doesn't seem things have changed much in 20 years. We've known for a very long time (Arrhenius 1896) that CO2 warms and that there may be feedbacks as a result (e.g., increased water vapor). It seems to me that we need to monitor the climate and test the models and move away from prediction.

The IPCC makes overly confident announcements in its press releases and in a lot of its conclusions. Most people do not look at exactly how much underlying information the IPCC admits it doesn't know. An example of this is in the SPM of WG1, where they provide a chart of forcing components. Of the 9 "natural" forcings, the IPCC says it has high scientific knowledge of only 2, low knowledge of 4, Med-Low knowledge of 2, and Med knowledge of 1.

So the underlying basic information is made up mostly of assumptions, and that is what the models are based on. So saying it's very likely that the results of your model runs are "Correct" is very dubious at best, since the "odds" that they are not are higher.

Another reason modelers like James Annan play this silly game is that if the models are proven not to be very good at "Projecting" the future, then you might as well chuck WG II and WG III into the trash. Since the conclusions of WG II start by assuming the projections from WG I are right, and WG III starts by assuming the projections of WG I and II are right, showing the WG I projections are wrong ripples right through them.

Basically the IPCC report when boiled down is this statement: Assuming all cows are spherical, then all steaks are round. From there we got a working group coming up with how and why we need only circular packaging for steaks. Your conclusion is only as good as the assumptions it's based on.

"Your statement has a fatal flaw, that poker player only played the cards right based on what he "knew". Well it turned out that what he "knew" was wrong."

I don't see a flaw in my statement (let alone a fatal one), except perhaps a failure to be completely clear about the suggested scenario.

When I said that the poker player "correctly assessed the odds" even though her opponent "hit his flush," I intended to convey a situation in which she has correctly surmised her opponent's hole cards (I had a hold 'em game in mind) and is therefore only ignorant about the community cards that have not yet been turned up. For definiteness, let's say that on the turn she correctly infers that her opponent is drawing to a flush and correctly calculates that the odds will be in her favor if she goes all in.

When the flush comes on the river, she has made no error. She does lose, but -- contrary to your claim -- she made no false assumptions.

Now one could say (and I take it Roger Pielke Jr would want to say) that she erroneously assumed that the river card wouldn't complete the flush (otherwise she wouldn't have put her money in). But our poker player could quite reasonably respond, "No. I never said that I'd win the pot; I just said that the odds were three-to-two in my favor, and that's a good position to bet."

Note too (and this speaks to some of RP Jr's more recent points) that the rational assessment of probability can (and should) be iterated to account for the likelihood that one or another probability distribution "actually" obtains. (The scare quotes are there because all the probabilities we're interested in here are relative to some set of information. Quantum indeterminacy, for example, is irrelevant.)

So, for example, our poker player might correctly judge that her opponent will bluff in the current circumstances about 25% of the time. If she's right about this likelihood, and if she can develop a strategy for increasing her odds of winning based on it, then it would be foolish to say that she was "incorrect" when she acts on this rational strategy and ends up losing a hand.

Of course I agree with you that if her decisions are based on flawed assumptions (e.g., she thinks her opponent is drawing to a flush, but he actually has a made full house) then she is just wrong. (She thought she had a 60% chance of winning and really she was drawing dead.)

But my point (which point I gather is also James Annan's and William Connolley's) is that losing a single poker hand (or even several) does not indicate that any of her assumptions were wrong.

And it seems to me that Roger Pielke Jr.'s language serves to obscure the question of what evidence would indicate an error in the grounding assumptions. (I found Gerard Harbison's Bayesian sketch in -35- to be a helpful starting point for a discussion of the real issue.)

Thanks ... but be careful: when you say "an error in the grounding assumptions", that is very different from an incorrect forecast. If I fold and you are bluffing, I may have played the odds correctly, but if I had called your bluff I'd have won the hand. In that case I may have been going against the odds (but perhaps I didn't trust your poker face) and made a better (more correct) decision.

The player was neither right nor correct; if they were, they wouldn't have lost. See, your player did not know the river card would complete the flush, now did they? So again, they made an assumption, based on knowing the theoretical odds, that the river card would not complete it. Well, reality showed their assumption was wrong. In reality the true odds of the river card filling the flush were 100%, not the theoretical ones. That is why your player lost.

What you and James Annan do not want to admit is that the final outcome is all that matters. No matter how logical your assumptions are for projecting the outcome, you are either right or wrong in the end; there is no in between.

Remember, a logic train built on an assumption is a great way to be spectacularly wrong with great certainty.

To make things simple for those who are overthinking this, let's discuss a coin flip.

Tomorrow there is going to be a coin flip where you get to pick heads or tails. The outcome of this flip will determine if you get $1 million or not.

Now, before the coin flip you can theoretically figure out the probabilities of whether any given flip will be heads or tails, and you find that there is a 49.9999999% chance of each. Coin flips are not a 50/50 proposition: coins have a 3rd side and have landed on their edge before (I have personally had this happen on a flip).

So going into the flip you have a slightly less than 50% chance of being right.

The next day at the flip you call heads and the coin comes up tails. Now was your call right or wrong? Correct or Incorrect?

Obviously you were wrong however the interesting thing is: What is the probability of getting a heads on that just completed coin toss?

If you say 49.9999% you are wrong. That percentage is only for any future coin toss not for one already completed. For completed ones you already know the outcome and for the one in the example the probability of head is 0% and tails 100%.

As shown, reality, unlike a model, has no reset switch: you are either right/correct or wrong/incorrect. So when dealing with GCMs, we will know what the real temps and CO2 will be in 10, 20 and 100 years. Those numbers, based in reality, will be compared to the projections of the GCMs, and if the projections do not match then they are incorrect/wrong.

No matter how elegant your theory is, no matter how many simulation runs you make, if they do not match reality then you are wrong.

Thanks, and I agree that caution is called for. However, I'm inclined to think that the phrase "incorrect forecast" is horribly vague, and that it’s more likely to be misinterpreted than it is to be understood correctly.

Consider today’s weather forecast: The paper told me there was a 30% chance of precipitation, so I brought my umbrella with me. In fact, it did rain. Now, was that forecast correct or incorrect? It was certainly useful to me; I would have gotten wet if I’d thought there was a negligible chance of rain. Let’s assume, for the sake of argument, that in the past 1,000 times the weather service has told us there was a 30% chance of rain, it actually rained 300 times. Would you want to say some number of those forecasts were “incorrect”? Was today’s forecast incorrect (or even “more incorrect than correct”)?
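For anyone who wants the calibration idea in concrete terms, here is a minimal Python sketch. It assumes exactly the hypothetical record described above (1,000 forecasts of 30%, with rain on 300 of those days); the function name `observed_frequency` is made up for illustration and is not anything a weather service actually uses.

```python
def observed_frequency(outcomes):
    """Fraction of forecast occasions on which the event actually happened."""
    return sum(outcomes) / len(outcomes)

# Hypothetical record from the comment: 1,000 forecasts of "30% chance of rain",
# with rain on 300 of those days (1 = rain, 0 = no rain).
forecast_probability = 0.30
outcomes = [1] * 300 + [0] * 700

frequency = observed_frequency(outcomes)

# Calibration compares the stated probability with the long-run frequency.
# It says nothing about whether any single day's forecast "came true".
calibration_gap = abs(frequency - forecast_probability)

print(f"forecast {forecast_probability:.0%}, observed {frequency:.0%}")
print(f"calibration gap: {calibration_gap:.3f}")
```

On this assumed record the forecasts are well calibrated as a set, even though it rained on only 300 of the 1,000 days, which is why no single wet or dry day settles the question.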

It seems to me that when most people hear that someone made an “incorrect forecast” they take it to mean that some mistake was made. Either there was some information that could have been taken into account that wasn’t, or some calculational error was made, or some other cognitive error was committed. The implication of an “incorrect forecast” is that some more correct forecast could and should have been made.

Of course, one might mean something else by the term. Obviously we do need some way to describe whether a particular event under discussion occurs or not. And you are correct that typically when we say, “Event X is likely to occur,” we often mean something like “Even though I’m not completely certain, I am assuming that X will happen, and I’m willing to act on that assumption (and you should too).”

The problem with this reading in the current context is that it falls apart when one is making more fine-grained probability statements. Let’s suppose X happens. Which of the following forecasts are “correct” and which are “incorrect” (or even “more or less correct”)? “There is a 90% chance of X occurring.” “There is a 70% probability of X occurring.” “. . . 50% . . .” “. . . 30% . . .” “. . . 10% . . .”

When I offer probabilities like this, there is no implicit commitment to the claim that X will happen. Of course, the forecast is about X, and X either happens or it doesn’t. But it’s misleading to say that the forecast is “incorrect” when X doesn’t occur.

From your above comments, I gather that you’re inclined to say that we have to rely on context to decide whether some event counts as evidence in favor of a forecast or against it. It seems to me that what you have in mind here is some sort of Bayesian analysis. We have to consider the prior probabilities in order to decide whether some event confirms or disconfirms a hypothesis. If I think there’s a negligible chance of a meteor hitting Boulder, and you say there’s a 10% chance, then if the meteor strikes, we’ll say your forecast was correct (even though you said the meteor is “unlikely” to hit). But if this is right, then you should put forward a Bayesian argument; it’s a red herring to appeal to the common language of football bets.

Coming back to the real world and the IPCC, it seems that the relevant point is just that we should make sure that people understand what the IPCC is saying and what they aren’t saying. I haven’t read your paper, but I gather that you’re calculating how many events discussed by the IPCC could fail to occur consistent with the report itself. Is that correct?

As I read your abstract, then, you’re saying, “If the IPCC is correct in its assessment, then up to 28% of the events that it discusses will not occur.” Now, this is an important point to communicate in order for us to act appropriately in the face of various degrees of certainty and uncertainty. But it seems to me that this is not well communicated by saying “28% of the findings of the IPCC are incorrect.”

I know it’s hard to imagine, but there could be people who would take this latter statement to mean that the IPCC has made a large number of errors. However, you are not pointing to any errors in IPCC assessment; your claims are perfectly compatible with the IPCC reports’ having captured precisely the actual probability distributions of all the events they discuss given all the knowledge that we human beings could possibly gather. (Of course, no one is making this claim, but such a possibility is in no way undermined by your treatment of the so-called “incorrect findings” of the IPCC).

Two final points:

First, your not trusting my poker face is an example of bringing more information into your assessment of the situation. The odds change based on that information. So, for example, suppose you fold because you think the odds are against you. We can imagine a pro poker player telling you, “No, you didn’t play that hand right. You should have noticed how he was sitting in his chair; that made it pretty likely that he was bluffing.” There you missed some information that was available, so your assessment of the relevant odds was off.

Of course, what we’re doing in both poker and in science is trying to get as much information as we can to allow us to assign probabilities that will get us what we want.

Second, let me reiterate that your OP is misleading in suggesting that generating accurate assessments of probability is a mere “academic exercise” that is irrelevant to “real world commitments.” Of course action does require that we place bets on events actually occurring, but we cannot reasonably decide which actions to take unless we first take stock of the probabilities. The probability distributions have (or should have) important real-world consequences. They should shape our policy.

Thus we should be very interested in whether our assessments of those probabilities are correct or not. But you only undermine this effort if you claim that a completely accurate assessment of these probabilities implies that “28% of the findings are incorrect.”

As I see it there are two main points of disagreement here. One is a (perhaps trivial) difference about how to use words, the other is (I think) of practical importance.

First the trivial issue. Does betting on something mean that we’re “assuming” that it’s going to happen? I suppose there’s a way of using the word such that this is analytically true; after all, one might say that if I were assuming I’d lose, then I wouldn’t place the bet.

But on the whole, it seems to me that this isn’t what we usually mean when we say that someone “assumes” that X will happen. Consider my taking my umbrella today because there was a 30% chance of rain. Did I assume that it would rain? I’d be more inclined to say that I thought there was a significant chance of rain and that it was worth taking an umbrella. I didn’t assume anything except that the weather forecast was roughly accurate.

Or let’s suppose that our poker player is in a 10-person hand where everyone before her has bet (to keep it simple, let’s say that they’ve gone all in, so there are no further betting possibilities to consider). She calculates that she only has a 20% chance of hitting the winning hand, but since she’s getting nine-to-one on her money, she goes ahead and calls the bet.

Now, is she “assuming” that she’ll win? While I’ve admitted that we could understand the words this way, it seems much more reasonable to say that she’s making no such assumption. If asked, she’d say it’s likely that she’ll lose; the odds are four to one against her, so it would be more natural to say that she’s assuming that she’ll lose.

What she is committed to is that there’s a 20% chance of her winning, and if she does win, then she’ll get nine times the money she bet. She is committed to the claim that it’s a reasonable bet for her to make, and that if she follows this strategy in the long run, she’ll come out ahead.
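The arithmetic behind that claim can be sketched in a few lines of Python. This is only an illustration under the numbers given above (a 20% chance of winning, a payout of nine times the stake); the function name `expected_profit` and the one-unit stake are assumptions of mine.

```python
def expected_profit(p_win, payout_per_unit, stake=1.0):
    """Average profit per hand: win payout_per_unit * stake with
    probability p_win, otherwise lose the stake."""
    return p_win * payout_per_unit * stake - (1.0 - p_win) * stake

p_win = 0.20   # her estimate of hitting the winning hand
payout = 9.0   # she is getting nine-to-one on her money

ev = expected_profit(p_win, payout)

# Break-even win probability at these odds: p * payout = (1 - p)
break_even = 1.0 / (payout + 1.0)

print(f"expected profit per unit staked: {ev:.2f}")
print(f"break-even win probability: {break_even:.0%}")
```

She expects to lose four hands out of five and still come out ahead on average, because 20% is well above the break-even probability at these odds.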

This brings us to the more substantial point of disagreement. You say that “the final outcome is all that matters” and that I and James Annan fail to recognize this.

It is true that whether we win or lose depends on the actual outcomes. But what your gloss misses is the fact that the final outcome is irrelevant to our deciding – beforehand – what we should do.

Let’s change our poker example again to a case where the opponent gets incredibly lucky and wins the pot (he “sucks out”). The opponent goes all in on the flop with a backdoor straight flush draw, and our player calls with four of a kind. Then the opponent beats incredible odds and makes the straight flush with the turn and river cards, and wins the pot.

Now, is it the case here that “the final outcome is all that matters”? It’s obviously true that the opponent wins the pot, but it’s also true that he reasoned very poorly – his action was ignorant and/or irrational.

The important point, though, is that if we’re trying to figure out which player to emulate – whose advice to listen to – we should recognize that our poker player did the right thing and the opponent did the wrong thing. The fact that he won the money doesn’t mean that he acted rationally – for the simple fact that he did not (and could not) know that he would win.

Given that we cannot know the future, we should be very very interested in figuring out what procedures to follow given our ignorance. Yes, there is a sense in which the poker player was “incorrect” in that she lost the money. But this sense completely misses the most important point, which is that her strategy was correct and her assessment of the odds was as accurate as it could possibly be.

This is all that matters from a practical, decision-making point of view.

And it seems to me that you obscure this very important point if you insist on saying that the player’s judgment is often “incorrect” when she loses precisely the number of hands that she expects to lose.

One final point on this: “What is the probability of getting a heads on that just completed coin toss?. . . For completed ones you already know the outcome and for the one in the example the probability of head is 0% and tails 100%.”

The probabilities we’re dealing with here are all epistemic; they depend essentially on the information that is available. (There may be ontologically fundamental uncertainties in quantum mechanics, but that’s irrelevant to the issue at hand.)

The probability of getting heads on a fair toss (landing on the edge means it doesn’t count as a toss) is 50% whether the toss is in the future or the past. Of course, given the information about the outcome, that probability changes to 0 or 1. Likewise, the odds of my getting a particular card on the river will depend on the information that is available. The commentator on TV may calculate a percentage that takes account of all the players’ hole cards. The player herself, of course, doesn’t have that information and will come up with a slightly different number (which will change if she’s able to infer someone else’s cards). If someone has all the relevant information (i.e., if he looks in the deck) then he can be certain of the outcome.

But again, the important issue is how we deal with uncertainty. Because that’s the situation we’re in. Looking back with the certainty of hindsight is completely useless, unless that can be translated into looking forward and making the best decisions possible given our ignorance.

Plainly the question about a probabilistic forecast is not whether the event judged more likely is the one that happens, but rather whether it was a reasonable estimate of the likelihood. And for multiple such forecasts, we want to know about their reliability - that is, the quality of the forecaster.

For repeated events we know how to evaluate this. That applies to the football bet as well. If I only have one case to observe, and two forecasters who disagreed about which team was more likely to win, I would after the first game be inclined by "Bayesianism" (I will explain another time why I use that in quotes) to think that the one who "got it right" was the better forecaster. But I would be a complete ninny to conclude with high confidence that he was the better forecaster; indeed if the other forecaster got the next one "right", I'd be right back at considering it a toss-up.
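A rough Python sketch of that update, with made-up numbers: suppose forecaster A gives the eventual winner of game one a 60% chance and forecaster B gives it 40%, and in game two the roles are reversed. Treating the question as "which of the two has the right probabilities" is a simplification of mine, as are the 50/50 starting point and the function name `update`.

```python
def update(prior_a, p_a, p_b):
    """Posterior weight on forecaster A after an outcome to which
    A had assigned probability p_a and B had assigned p_b."""
    weight_a = prior_a * p_a
    weight_b = (1.0 - prior_a) * p_b
    return weight_a / (weight_a + weight_b)

belief_a = 0.5  # start with no reason to prefer either forecaster

# Game 1: A "got it right" (gave the winner 60%, B gave it 40%).
belief_a = update(belief_a, p_a=0.6, p_b=0.4)
after_first = belief_a

# Game 2: B "got it right" (B gave the winner 60%, A gave it 40%).
belief_a = update(belief_a, p_a=0.4, p_b=0.6)
after_second = belief_a

print(f"after game 1: {after_first:.2f}")
print(f"after game 2: {after_second:.2f}")
```

Under these assumed numbers one game nudges the weight on A only from 0.5 to 0.6, and one game the other way brings it straight back to 0.5.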

Some of the other questions in this thread deserve an answer as well. Clearly the question of what the IPCC thinks it's doing is important, and whether they have evidence to back up their levels of confidence on particular findings is actually an important concern. I think that the IPCC does in fact believe that its statements should be guides for action - what we should act as if we believe the probability to be. With that AND an estimate of the consequences you can make a well justified decision.

I predict that it is likely that at least 60% of people who read this thread will conclude that James is right. I am disappointed that I can't be more confident in that.

In my statistics class, I ask my students "what is the probability that when I flip this coin, it will land heads." (And yes, assume it's a fair coin.)

Of course they answer 50% or some equivalent.

Then I flip it and hold it covered on the back of my hand. Then I ask, "What is the probability that this coin is heads." There's usually some puzzlement. Someone says "50%". And I say, "but either it's heads or it isn't. How can there be a fifty percent chance it's heads?"

Then I ask "what odds would you give me if I bet that it's not heads?" Eventually those who know what betting odds mean understand the point. Even when something has happened (like, the deck has been shuffled and the card that will be dealt could be known under some epistemic conditions DIFFERENT FROM OURS) we have to ACT as if the odds are, well, what we think they are.

Poor analogies and debate about the meaning of a word: All as per usual in climate debates. And all, as usual, irrelevant because the initial assumption is provably wrong; something can't continue to increase if it isn't increasing now. No need to wait any time at all to declare the prediction 100 per cent false. Assumption led conclusions are the bane of this field of study.

Here is some more fun, maybe worth a blog post eventually. From the IPCC:

"The uncertainty guidance provided for the Fourth Assessment Report draws, for the first time, a careful distinction between levels of confidence in scientific understanding and the likelihoods of specific results. This allows authors to express, as well as high confidence that an event is about as likely as not (e.g., a tossed coin coming up heads). Confidence and likelihood as used here are distinct concepts but are often linked in practice."http://www.ipcc.ch/publications_and_data/ar4/wg1/en/ch1s1-6.html

Now this statement within that passage seems completely wrong to me:

"high confidence that an event is extremely unlikely (e.g., rolling a dice twice and getting a six both times)"

In numbers this means that the IPCC is about 80% confident that the probability of rolling two sixes is less than 5%.
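For reference, the dice arithmetic can be checked in a couple of lines of Python. This is just a sketch assuming a fair six-sided die and independent rolls, and taking "extremely unlikely" to mean less than 5% as stated above.

```python
from fractions import Fraction

p_six = Fraction(1, 6)         # assumed: a fair six-sided die
p_two_sixes = p_six * p_six    # independent rolls, so probabilities multiply

threshold = Fraction(5, 100)   # "extremely unlikely" taken as < 5%

print(f"P(two sixes) = {p_two_sixes} = {float(p_two_sixes):.4f}")
print(f"below the 5% threshold: {p_two_sixes < threshold}")
```

For a fair die the probability is known exactly, which may be what makes the example read oddly when paired with a confidence level.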

IMHO, [S1] was incorrect before [S2] and therefore remains incorrect after [S2]. The [S1] statement needs to be qualified; otherwise it is an incomplete and, therefore, incorrect statement. If the student says [S3] "Since I do not know the nature of the coin or other factors influencing the probability of heads appearing, I do not know the probability of heads", then the student is correct. I do not think that putting "if" statements as conditionals will create a correct answer except for what amounts to a true explicit statement as to the probability of a head, i.e., you give the answer at the same time as asking the question. In these terms, Statement [S2] is also incomplete since it assumes that the instructor is telling the truth. The equivalent correct answer is [S4] "Since I do not know whether you are telling the truth or not, I do not know the probability of a head". If we do not know, we do not know. It all feels a bit solipsist, but that may be the price of dealing with the forecasting of future events.

To continue the football example, asking me the odds of Bolton Wanderers winning the Premier League is an impossible question to answer "correctly" except by saying something that is equivalent to [S3] above. However, we know that people act as if they have an answer to this question by placing bets but their actions do not change the accuracy of the "I do not know" answer.

If you insist on torturing the text with your strained parsing, could you please have the courtesy to learn how to copy and paste accurately? Unlike the fractured version above, the actual text is the following:

"The uncertainty guidance provided for the Fourth Assessment Report draws, for the first time, a careful distinction between levels of confidence in scientific understanding and the likelihoods of specific results. This allows authors to express high confidence that an event is extremely unlikely (e.g., rolling a dice twice and getting a six both times), as well as high confidence that an event is about as likely as not (e.g., a tossed coin coming up heads). Confidence and likelihood as used here are distinct concepts but are often linked in practice."

"As there is no direct way of assessing the predictive skill of today’s climate models, researchers instead tend to talk in terms of uncertainty in the models and predictions. In contrast with skill, the concept of uncertainty in climate science has neither such an unambiguous definition nor a clear mathematical framework"http://www.jamstec.go.jp/frsgc/research/d3/jules/2010%20Hargreaves%20WIREs%20Clim%20Chang.pdf

From such a perspective, please recognize that you will find people who choose to discuss the topic in various ways from various disciplinary perspectives. This is called intellectual exchange. Climate scientists who express infallibility in an effort to close down such discussions are no doubt 100% incorrect ;-) (That last bit was a joke)