Tuesday, April 22, 2014

Perhaps this will provide a clearer demonstration of the limitations of Nic's method. In his post, he conveniently provided a simple worked example, which in his view demonstrates how well his method works. A big advantage of using his example is that hopefully no-one can argue I've misapplied his method :-) This is Figure 2 from his post:

This example is based on carbon-14 dating, about which I know very little, but hopefully enough to explain what is going on. The x-axis in the above is real age with 0 corresponding to the "present day", which I think is generally defined as 1950 (so papers don't need to be continually reparsed as time passes). The y-axis is "carbon age" which is basically a measure of the C14 content of something under investigation, typically something organic (plant or animal). The basic idea is that the plant or animal took up C14 as it grew, but this C14 slowly decays so the proportion in the sample declines after death according to the C14 half-life. So in principle you would think that the age (at death) can be determined directly from measurement of the proportion of carbon that is C14. However, the proportion of C14 in the original organism depends on the ambient concentration of C14 which has varied significantly in the past (it's created by cosmic rays and the like), so there's quite a complicated calibration curve. The black line in the above is a simplified and stylised version of what a curve could look like (Nic's post also has a real calibration curve, but this example is clearer to work with).

So in the example above, the red Gaussian represents a measurement of radiocarbon which corresponds to a "carbon age" of about 1000y, with some uncertainty. This is mapped via the calibration curve into a real age distribution on the x-axis, and Nic has provided two worked examples using a uniform prior and his favoured Jeffreys prior.

As some of you may recall, I lived in Japan until recently. Quite by chance, my home town of Kamakura was the capital of Japan for a brief period roughly 7-800y ago. Lots of temples date from that time, and there are numerous wooden artefacts which are well-dated to the Kamakura Era (let's assume, carved out of contemporaneous wood, though of course wood is generally a bit older than the date of the tree felling). Let's see what happens when we try to carbon-date some of these artefacts using Nic's method.

Well, one thing that Nic's method will say with certainty is "this is not a Kamakura-era artefact"! The example above is a plausible outcome, with the carbon age of 1000y covering the entire Kamakura era. Nic's posterior (green solid curve) is flatlining along the axis over the range 650-900y, meaning zero probability for this whole range. The obvious reason for this is that his prior (dashed line) is also flatlining here, making it essentially impossible for any evidence, no matter how strong, to overturn the prior presumption that the age is not in this range.

It is important to recognise that the problem here is not with the actual measurement itself. In fact the measurement shown in the figure indicates very high likelihood (in the Bayesian sense) of the Kamakura era. The problem is entirely in Nic's prior, which ruled out this time interval even before the measurement was made - just because he knew that a measurement of carbon age was going to be made!
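To make the mechanism concrete, here is a minimal Python sketch. The calibration curve, the 50y measurement error and the near-zero plateau slope are all made-up values for illustration, not the ones behind Nic's figure, and the function names are my own. It assumes a Gaussian measurement error of fixed width, in which case the Jeffreys prior is proportional to the absolute slope of the calibration curve.

```python
import math

def calib(t):
    """Hypothetical stylised calibration curve: real age -> carbon age.
    It is almost flat between 650y and 900y (an assumed plateau)."""
    if t < 650.0:
        return t + 350.0
    if t <= 900.0:
        return 1000.0 + 0.001 * (t - 650.0)
    return 1000.25 + (t - 900.0)

def slope(t):
    """Derivative of the hypothetical calibration curve."""
    return 0.001 if 650.0 <= t <= 900.0 else 1.0

def uniform_prior(t):
    return 1.0

def jeffreys_prior(t):
    # For a Gaussian error of fixed width on the carbon age, the
    # Jeffreys prior on real age is proportional to |d(calib)/dt|.
    return abs(slope(t))

def posterior(prior, measured=1000.0, sigma=50.0, t_max=2000):
    """Normalised posterior on a 1-year grid of real ages (midpoints)."""
    ts = [i + 0.5 for i in range(t_max)]
    weights = [
        prior(t) * math.exp(-0.5 * ((measured - calib(t)) / sigma) ** 2)
        for t in ts
    ]
    total = sum(weights)
    return ts, [w / total for w in weights]

def mass_in(ts, post, lo, hi):
    """Posterior probability that the real age lies in [lo, hi]."""
    return sum(p for t, p in zip(ts, post) if lo <= t <= hi)

if __name__ == "__main__":
    for name, prior in [("uniform", uniform_prior), ("Jeffreys", jeffreys_prior)]:
        ts, post = posterior(prior)
        print(name, "P(650 <= age <= 900) =", round(mass_in(ts, post, 650, 900), 4))
```

In this toy setup the likelihood is at its maximum all along the plateau, yet the slope-based prior leaves almost no posterior probability there, while the uniform prior puts most of the probability there.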

Nic uses the emotionally appealing terminology of "objective probability" for this method. I don't blame him for this (he didn't invent it) but I do wonder whether many people have been seduced by the language without understanding what it actually does. You can see Richard Tol insisting that the Jeffreys prior is "truly uninformative" in a comment on my previous post, for example. Well, that might be true, but only if you define "uninformative" in a technical sense not equivalent to common English usage. If you then use it in public, including among scientists who are not well versed in this stuff, then people are going to get badly misled. Frame and Allen went down this rabbit hole a few years ago, and I'm not sure if they ever came out. It seems to work for many as an anchoring point: when you discuss it in detail, they acknowledge that yes, it's not really "uninformative" or "ignorant", but then they quickly revert to this usage, and the caveats somehow get lost.

I propose that it would be better to use the term "automatic" rather than "objective". What Nic is presenting is an automatic way of generating probabilities, though it remains questionable (to put it mildly) whether they are of any value. Nic's method insists that no trace remains of the Kamakura era, and I don't see any point in a probabilistic method that generates such obvious nonsense.

Long time no post, but I've been thinking recently about climate sensitivity (about which more soon) and was provoked into writing something by this post, in which Nic Lewis sings the praises of so-called "objective Bayesian" methods.

Firstly, I'd like to acknowledge that Nic has made a significant contribution to research on climate sensitivity, both through identifying a number of errors in the work of others (eg here, here and most recently here) and through his own contributions in the literature and elsewhere. Nevertheless, I think that what he writes about so-called "objective" priors and Bayesian methods is deeply misleading. No prior can encapsulate no knowledge, and underneath the use of these bold claims there is always a much more mealy-mouthed explanation in terms of a prior having "minimal" influence, and then you need to have a look at what "minimal" really means, and so on. Well, such a prior may or may not be a good thing, but it is certainly not what I understand "no information" to mean. I suggest that "automatic" is a less emotive term than "objective" and would be less likely to mislead people as to what is really going on. Nic is suggesting ways of automatically choosing a prior, which may or may not have useful properties.

[As a somewhat unrelated aside, it seems strange to me that the authors of the corrigendum here, concerning a detail of the method, do not also correct their erroneous claims concerning "ignorant" priors. It's one thing to let errors lie in earlier work - no-one goes back and corrects minor details routinely - but it is unfortunate that when actually writing a correction about something they state does not substantially affect their results, they didn't take the opportunity to also correct a horrible error that has seriously misled much of the climate science community and which continues to undermine much work in this area. I'm left with the uncomfortable conclusion that they still don't accept that this aspect of the work was actually in error, despite my paper which they are apparently trying to ignore rather than respond to. But I'm digressing.]

All this stuff about "objective priors" is just rhetoric - the term simply does not mean what a lay-person might expect (including a climate scientist not well-versed in statistical methodology). The posterior P(S|O) is equal to the (normalised) product of prior and likelihood - it makes no more sense to speak of a prior not influencing the posterior than it does to talk of the width of a rectangle not influencing its area (= width x height). Attempts to get round this by then footnoting a vaguer "minimal effect, relative to the data" are just shifting the pea around under the thimble.

In his blog post, Nic also extols the virtue of probabilistic coverage as a way of evaluating methods. This initially sounds very attractive - the idea being that your 95% intervals should include reality, 95% of the time (and similarly for other intervals). There is however a devil in the detail here, because such a probabilistic evaluation implies some sort of (infinitely) repeated sampling, and it's critical to consider what is being sampled, and how. If you consider only a perfect repetition in which both the unknown parameter(s) and the uncertain observational error(s) take precisely the same values, then any deterministic algorithm will return the same answer, so the coverage in this case will be either 100% or 0%! Instead of this, Nic considers repetition in which the parameter is fixed and the uncertain observations are repeated. Perfect coverage in this case sounds attractive, but it's trivial to think of examples where it is simply wrong, as I'll now present.

Let's assume Alice picks a parameter S (we'll consider her sampling distribution in a minute) and conceals it from Bob. Alice also samples an "error" e from the simple Gaussian N(0,1). Alice provides the sum O=S+e to Bob, who knows the sampling distribution for e. What should Bob infer about S? Frequentists have a simple answer that does not depend on any prior belief about S - their 95% confidence interval will be (O-2,O+2) (yes I'm approximating negligibly throughout the post). This has probabilistically perfect coverage if S is held fixed and e is repeatedly sampled. Note that even this approach, which basically every scientist and statistician in the world will agree is the correct answer to the situation as stated, does not have perfect coverage if instead e is held fixed and S is repeatedly sampled! In this case, coverage will be 100% or 0%, regardless of the sampling distribution of S. But never mind about that.
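For anyone who wants to check the two coverage claims, here is a rough Python sketch. It takes the frequentist interval to be centred on the observation, O ± 2 (using the same "2 sigma" approximation as the text); the function names, sample size and seed are my own choices for illustration.

```python
import random

def coverage_fixed_s(s, n=20000, seed=0):
    """Fraction of intervals (O-2, O+2) containing s when s is held
    fixed and the error e ~ N(0,1) is resampled n times."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        o = s + rng.gauss(0.0, 1.0)
        if o - 2.0 < s < o + 2.0:
            hits += 1
    return hits / n

def coverage_fixed_e(e, s_values):
    """Fraction of intervals (O-2, O+2) containing S when the error e
    is held fixed and S ranges over s_values."""
    hits = sum(1 for s in s_values if (s + e) - 2.0 < s < (s + e) + 2.0)
    return hits / len(s_values)

if __name__ == "__main__":
    print("fixed S=2, repeated e:", coverage_fixed_s(2.0))
    some_s = [-3.0, 0.0, 1.5, 10.0]
    print("fixed e=0.5, varying S:", coverage_fixed_e(0.5, some_s))
    print("fixed e=2.5, varying S:", coverage_fixed_e(2.5, some_s))
```

With e held fixed, whether the interval contains S depends only on whether |e| is less than 2, so the values chosen for S make no difference to the result.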

As for Bayesians, well they need a prior on S. One obvious choice is a uniform prior and this will basically give the same answer as the frequentist approach. But now let's consider the case that Alice picks S from the standard Normal N(0,1), and tells Bob that she is doing so. The frequentist interval still works here (i.e., ignoring this prior information about S), but Bayesian Bob can do "better", in the sense of generating a shorter interval. Using the prior N(0,1) - which I assert is the only prior anyone could reasonably use - his Bayesian posterior estimate for S is the Normal N(O/2,0.7) (0.7 being the standard deviation), giving a 95% probability interval of (O/2-1.4,O/2+1.4). It is easy to see that for a fixed S, and repeated observational errors e, Bob will systematically shrink his central estimates towards the prior mean 0, relative to the true value of S. Let's say S=2, then (over a set of repeated observations) Bob's posterior estimates will be centred on 1 (since the mean of all the samples of e is 0) and far more than 5% of his 95% intervals (including the roughly 21% of cases where e is more negative than -0.8) will fail to include the true value of S. Conversely, if S=0, then far too many of Bob's 95% intervals will include S. In particular, all cases where e lies in (-2.8,2.8) - which is about 99.5% of them - will generate posteriors that include 0. So coverage - or probability matching, as Nic calls it - varies from far too generous, when S is close to 0, to far too rare, for extreme values of S.
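Bob's coverage can also be worked out without any sampling. The Python sketch below (function names are mine) uses the fact that the interval O/2 ± 1.4 contains S exactly when e lies between S-2.8 and S+2.8. As an extra check that is not spelled out above, it also computes the coverage averaged over Alice's own sampling of S from N(0,1), assuming S and e are independent.

```python
import math

HALF_WIDTH = 1.4  # half-width of Bob's interval O/2 +/- 1.4, as in the text

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bob_coverage(s):
    """Probability, over e ~ N(0,1), that O/2 +/- 1.4 contains a fixed s.
    With O = s + e, |O/2 - s| < 1.4 is the same as s-2.8 < e < s+2.8."""
    return phi(s + 2.0 * HALF_WIDTH) - phi(s - 2.0 * HALF_WIDTH)

def bob_average_coverage():
    """Coverage when S ~ N(0,1) is resampled along with e ~ N(0,1).
    Then e - S is Normal with mean 0 and variance 2, and the interval
    contains S when |e - S| < 2.8."""
    return 2.0 * phi(2.0 * HALF_WIDTH / math.sqrt(2.0)) - 1.0

if __name__ == "__main__":
    for s in (0.0, 1.0, 2.0, 3.0):
        print("S =", s, "coverage =", round(bob_coverage(s), 4))
    print("averaged over S ~ N(0,1):", round(bob_average_coverage(), 4))
```

The design choice here is simply to use the closed form rather than a simulation, so the numbers do not depend on a random seed.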

I don't think that any rational Bayesian could possibly disagree with Bob's analysis here. I challenge Nic to present any other approach, based on "objective" priors or anything else, and defend it as a plausible alternative to the above. Or else, I hope he will accept that probability matching is simply not (always) a valid measure of performance. These Bayesian intervals are unambiguously and indisputably the correct answer in the situation as described, and yet they do not provide the correct coverage conditional on a fixed value for S.

Just to be absolutely clear in summarising this - I believe Bayesian Bob is providing the only acceptable answer given the information as provided in this situation. No rational person could support a different belief about S, and therefore any alternative algorithm or answer is simply wrong. Bob's method does not provide matching probabilities, for a fixed S and repeated observations. Nothing in this paragraph is open to debate.

Therefore, I conclude that matching probabilities (in this sense, i.e. repeated sampling of obs for a fixed parameter) is not an appropriate test or desirable condition in general. There may be cases where it's a good thing, but this would have to be argued for explicitly.

Tuesday, April 08, 2014

There is a weird divide between paleoclimate modellers and other model developers, that I have never understood. My impression is that the paleoclimate group at JAMSTEC was one of the larger and more integrated, but even there paleoclimate runs were always completed, without much interest from others, using the "spare CPU" left after the main CMIP runs were completed. However, in 2006, when we published a paper looking at different constraints on climate sensitivity, the Last Glacial Maximum seemed like it should provide a strong constraint on the high end. Since then I have always been of the opinion that the LGM should be a mandatory CMIP run. If a model has not been shown to reasonably reproduce a climate with a different atmospheric CO2 level (among other forcings), then of what use is it for projecting future climate change?

But no one in any position of power seems to share this opinion, and I was getting used to boring myself with always asking people to do paleoclimate runs if they wanted to increase confidence in their ensembles, or to check the importance of some correlation between present and future climate that they had found in the ensemble. For the first time in CMIP5, the paleoclimate runs were officially included (although as non-essential runs). Although not everyone has done the runs yet, at least this time they are being completed with the same model versions as those used for the other CMIP models. Because of this, I agreed, a year ago, at the last PMIP meeting, to start the Past to Future working group to aid work into using paleoclimate information to directly improve predictions. For this project to remain well focussed, I think it needs some input from the futurists (those who want to know about the future).

I was delighted, therefore, when I started getting some emails from CFMIP, the Cloudy Futurists MIP. I wrote some sufficiently stupid things in my replies to them, that I gained an invitation to the WCRP Grand Challenge workshop on Clouds, Circulation and Climate Sensitivity, previously blogged as the cloudy lock-in. There is certainly no doubt that finding out something useful about clouds from paleo is a very grand challenge (although Sandy Harrison did make a brave attempt at promoting a cloudiness proxy). This is because, compared to information gained today, paleoclimate information is much more sparse, more uncertain, and more indirect (i.e. you are usually measuring things like isotope ratios or amounts of pollen). Despite this gulf, and my complete ignorance about clouds or atmospheric circulation, it was the most inspiring meeting I have attended for several years. They came up with a focus for the Grand Challenge, which is changing patterns in future climate, with four sub-questions, related to particular changes. There also seems to be quite a strong impetus to increase connections between CFMIP and PMIP, which should be a thoroughly good thing. Maybe the barrier between the paleo and future scientists can be dissolved.

As the four questions were being presented on the last day I realised that I was not, actually, in a room of futurists, but of atmospheric physicists. Rather than asking questions about how the climate will change, the questions were written in terms of whether changes in X are important for climate change, where X is a thing of interest to atmospheric scientists (storm tracks, ITCZ, convective aggregation and convective mixing respectively). Using paleoclimate data to help answer these questions is what they call a "cross-cutting" theme, which means they hope it will help all of them. But this could be confusing to people who know about paleoclimate, because, although paleoclimate climate changes may be partly caused by clouds and circulation changes, they are much more likely to be whole "earth system" phenomena (including vegetation feedbacks, ocean circulation and carbon cycle changes, sea-level changes etc.). So, now I am thinking that it is me and James who are in fact the futurists, who actually want to use all the information we have available to predict future climate. I hope we are not the only ones!

I do not think I am very good at taking photos of people. I think this is partly because I am a bit strange and so people tend to look anguished when they look at me. However, this isn’t the pickturs blog, and some people seemed to be begging to be blogged… so here goes…

A breakout group – my least favourite thing at meetings

Working in splendid surroundings.

Official fun was a walk to the lake during which we were supposed to discuss science

Official fun – walk to lake – many took the instruction to talk about science very seriously – all the real plans were hatched that afternoon.

Collaboration continued well into the night

Isaac trying to make it on to the blog.

last breakout group, al fresco – it was actually too sunny!

caffeine fueled – with people from Japan, Australia and California there was plenty of jetlag to go round

Wednesday, April 02, 2014

Regular readers will have noticed that I follow the goings-on at EGU journals with some interest. So in that vein I'd like to point out there have been some recent changes at GMD. Perhaps most notably, our Dear Leader Dan Lunt has stepped down from the position of Chief Executive Editor, which he has held since the journal's inception about 6 years ago. Jules is the incoming chief. (Chief doesn't actually have any extra powers that I'm aware of, but is expected and trusted to take the lead on many decisions with or without discussion.) Bob Marsh has been added to the list of execs - this happened last year actually - having been a topical editor for some time. And...drum roll...I am no longer on the list of execs, though I'll remain a topical editor. All the execs feel that the journal (indeed all EGU journals) should be regarded as community assets rather than personal fiefdoms. So although it made sense to stick with a core team who shared a clear vision through the early years, we realised some time ago that it was time to bring in new ideas and let things evolve a bit. This feeling has been informally formalised through a rough plan to swap execs off the board on an every-two-years basis - Bob's induction was the start of this, staggered with my resignation to allow a bit of settling in time - and also rotate the chief exec position among the board members. I'm happy to leave the journal in the capable hands of the new board.

Incidentally, it is rumoured that the new Impact Factor for the journal will be approaching 6, up from 5 last year. That should put us even closer to the top of the list for journals in the geosciences! I'm sure that GMD, and all the other EGU journals, will continue to go from strength to strength as the open access movement continues to gain momentum.