Marcott’s Zonal Reconstructions

I’m going to do a detailed post on my diagnosis of the Marcott uptick, but before I do so, I want to comment on the reconstructions for the NH and SH extratropics, neither of which has attracted sufficient notice, though both are remarkable. Because orbital changes have different effects on the NH and SH, the difference between NHX and SHX proxies is of substantive interest and is what ought to have been reported on.

In the running text, Marcott et al had emphasized that their reconstruction was “indistinguishable within uncertainty”. This was illustrated in their Figure 1E, which showed the Mann08 GLB and NH reconstructions (which of several versions is not denoted in the caption), Moberg 2005 (NH) and the Wahl-Ammann version of MBH98 (NH).

Our global temperature reconstruction for the past 1500 years is indistinguishable within uncertainty from the Mann et al. (2) reconstruction… This similarity confirms that published temperature reconstructions of the past two millennia capture long-term variability, despite their short time span (3, 12, 13).

The zonal (NHX, SHX and tropics) reconstructions were illustrated in Marcott Figure 2I,J,K, but Marcott et al conspicuously did not compare their zonal reconstructions with previous NH and SH reconstructions, instead comparing them to proxies often considered to be precipitation proxies.

Their decision not to compare their reconstruction to popular hemispherical versions was a curious one and I’ve remedied this apparent oversight below.

Northern Hemisphere
First, here is a comparison of the Marcott NH extratropics reconstruction with the Moberg 2005 reconstruction (which Marcott et al showed in connection with their GLB reconstruction). I suspect that some readers will not find these two series to be “remarkably similar” (TM-climate science). (For greater certainty, Marcott et al did not say that these two series were “remarkably similar”; they said that the Moberg NH reconstruction was similar to their GLB reconstruction, but omitted any direct comparison with their NHX reconstruction.)

Next, here is a comparison of the Marcott NHX reconstruction with two Mann reconstructions: the Mann 2008 NH EIV iHAD variation (which in effect splices instrumental temperature from 1850 on) and the MBH NHX version (extracted from the Briffa et al 2001 SI). I am not persuaded that Marcott’s NHX reconstruction is “indistinguishable within uncertainty” from the corresponding NH reconstructions, making one wonder why the GLB reconstruction is supposedly “indistinguishable within uncertainty”.

Southern Hemisphere
In the figure below, I’ve first compared the Marcott et al SHX reconstruction to the SH reconstruction of Mann and Jones 2003. After readers recover from their awe at the remarkable similarity (TM-climate science) between the two reconstructions, they will note that the Marcott reconstruction began a pronounced and dramatic increase in the 18th century and reached its maximum in AD1900 which, in their Northern Hemisphere reconstruction, was the coldest year since the LGM. Marcott’s estimated AD1900 temperatures in the SHX were much higher than even recent temperatures (1990-2010 HadCRU SH average denoted by the red dot).

Figure 3. Marcott SHX vs Mann and Jones 2003 SH.

Finally, here is a comparison of the Mann 2008 SH EIV-CRU and CPS-CRU versions against Marcott. There are important inconsistencies between the Mannian EIV and CPS results, improving the odds of matching at least one of them. (Why anyone would regard Mannian EIV as an “improvement” on anything is a continuing source of puzzlement to me, but that is another day’s story.) Even with doubled odds of a match, I find the correspondence between the Mannian and Marcottian versions to be underwhelming, though perhaps not so underwhelming as to preclude them being “remarkably similar” (TM-climate science).

I will return to details of the uptick later. For now, I’ll close with this graphic showing the NHX and SHX reconstructions against the Marcott GLB reconstruction as a base. While one expects a difference between NHX and SHX in the Holocene, the remarkable difference between NHX and SHX not just in the 20th century but also in the 19th century is a source of considerable interest. According to Marcott, NHX temperatures increased by 1.9 deg C between 1920 and 1940, a surprising result even for the most zealous activists. But for the rest of us, given the apparent resiliency of our species to this fantastic increase over a mere 20 years, it surely must provide a small measure of hope for resiliency in the future.

Figure 4. SHX (blue) and NHX (red) shown against global. The third zone (tropics) is omitted only to emphasize the contrast; a full spaghetti plot would include it.

Comments

According to Marcott, NHX temperatures increased by 1.9 deg C between 1920 and 1940, a surprising result even for the most zealous activists. But for the rest of us, given the apparent resiliency of our species to this fantastic increase over a mere 20 years, it surely must provide a small measure of hope for resiliency in the future.

But I don’t know how many more of these irony-laden exposes I can survive. A man has limits.

fyi, the thesis advisor, mentor, and also co-author for both Marcott (Marcott et al. 2013, Science) and Shakun (Shakun et al., 2012, Nature) is Peter Clark at Oregon State…. who just happens to be a CLA (one of only 8 CLAs from the USA) for the IPCC’s AR5:

So even if that sea level chapter is not in the target zone for these two papers, the IPCC’s AR5 process is certainly a potential topic of discussion for co-authors who seemed to get the Marcott et al. (2013) paper in just under the wire for consideration.

Looks like “Shake and Bake” to me: an apparent uncritical compiling of proxies and no systematic comparisons to other reconstructions. Was there no one at OSU to ask the tough questions? Were there no reviewers to raise these issues earlier?

SteveM, like I said in the previous thread, the hope is that when you average out these proxies all the bad stuff is removed, and as any reconstructionist worth her salt would note, here it is not the individual proxies (and in this case the hemispheres) but rather the final averaged result that is of paramount importance. Sensitivity tests be hanged.

Perhaps Nick Stokes will be along shortly to explain all this better than I.

NiV: I also noticed the discrepancy with the yellow grand aggregate line. I think it is because SM did not show the tropics zone. The y-axis is a form of anomaly, which suggests that the tropics zone will turn out to be relatively cooler. However, there may also be some kind of weighting going on that is not based on area.

You don’t understand, the whole point of academic research is to have it published so you can list it in your CV. The greater the number of publications, the better the chance of getting tenure. Surely you’re not naive enough to think that anyone actually wants to read this stuff. It’s getting published that counts. The only thing that counts. As far as actually reading, analyzing, understanding and trying to reproduce the results, are you kidding? To do that would be doing real science! If Steve does the analysis and stats that should have been done all along, then he should be attacked. Any good climate “scientist” knows that. Just ask Mann for advice on how to become famous for acting petulant and thin-skinned.

After reading Steve’s analysis do you honestly believe he’s going to just label an entire paper “crap”? That kind of approach is not his style; indeed it is more closely associated with his detractors.

I showed the NHX, SHX, and tropical proxies themselves … you’ve done the important step, showing the differences between the extratropical proxy stacks and the other “indistinguishable within uncertainty” proxy reconstructions.

Very, very well done. My hat is off to you. And I’m still working on becoming more Canadian, but I’m not sure it’s working …

Perhaps “Our global temperature reconstruction for the past 1500 years is indistinguishable within uncertainty…” should really read “Our global temperature reconstruction for the past 1500 years is indistinguishable from uncertainty.”

If the proxy reconstructions are not robust during the calibration period, how is it explained that they are robust for the previous 10K-years?
Steve: Their answer would be: there are more series in the mid-Holocene than in the 20th century. But it’s a fair question.

Thanks Nick and Steve. According to the Marcott thesis, he used the calibrations from the original proxy sources. It seems that there are “standard” calibration curves for the various proxy types (Mg/Ca, alkenone, TEX86, etc.), so individual proxies may not have been calibrated. This makes sense for the proxies that don’t extend to the modern period. I’m sure running the actual calibration analyses is very costly and is considered impractical for every proxy. However, given what is known about siting issues with Stevenson screens, it seems very important to use individually calibrated proxies where possible. Fig. C4 in the thesis shows the correlation between the number of series and GMST. The curve is pretty flat, indicating that if they used only 20 proxies instead of 78, the correlation would only decline by ~5%. This indicates to me that the quality of the proxies is much more important than the number of proxies used.

The temperature trends that the team identified for the past 2,000 years are statistically indistinguishable from results obtained by other researchers in a previous study, says Marcott. “That gives us confidence that the rest of our record is right too,” he adds.

Is there any explanation possible that would prevent this article from being pulled?

It is hard to believe that a comment such as “indistinguishable within uncertainty” could be made without some sort of analysis/justification. Some “test” must have been done. You are showing the averages; when Monte Carlo methods are applied, do the uncertainty bands of this study overlap the uncertainty bands of the other reconstructions? Wide uncertainties make apparently dissimilar averages become indistinguishable.

That phrase “indistinguishable within uncertainty” jumped out at me, too. What kind of a standard is that, when the wider the error bars, the better the correlation? Leaving the math off to the side for a minute, that’s a pointless standard.
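The point can be made concrete with a toy check (all numbers invented; this is not whatever test Marcott et al actually applied): under a simple band-overlap criterion, the same two series flip from distinguishable to “indistinguishable” purely by widening the error bars.

```python
# Toy illustration: two clearly different trends are "indistinguishable
# within uncertainty" whenever their error bands overlap at every point,
# so wider bands make matches easier, not harder.

def indistinguishable(a, b, err_a, err_b):
    """True if the two series' uncertainty bands overlap everywhere."""
    return all(abs(x - y) <= ea + eb
               for x, y, ea, eb in zip(a, b, err_a, err_b))

series1 = [0.0, 0.1, 0.2, 0.3, 0.4]
series2 = [0.0, 0.3, 0.6, 0.9, 1.2]   # three times the trend of series1

narrow = [0.1] * 5
wide   = [0.5] * 5

print(indistinguishable(series1, series2, narrow, narrow))  # False
print(indistinguishable(series1, series2, wide, wide))      # True
```

With narrow bands the trends are plainly different; quintuple the error bars and the “test” passes, which is exactly the backwards standard being complained about.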

Regarding the NH reconstructions, using the same reasoning as above, we do not think this increase in temperature in our Monte-Carlo analysis of the paleo proxies between 1920 − 1940 is robust given the resolution and number of datasets. In this particular case, the Agassiz-Renland reconstruction does in fact contribute the majority of the apparent increase.

Marcott’s comment was that the 1920-1940 NH increase was 1.93 deg C, and that Agassiz-Ren “does in fact contribute the majority” of the increase. The actual Agassiz-Renland 1920-1940 increase was 0.40 deg C. The response to that should be ‘Houston we have a problem.’

Even if we take the entire NH “blade” period – 1900-1940 – Marcott shows a 2.14 deg C increase, while Agassiz-Renland shows just a 1.38 deg C increase. We still have the problem. And that problem exists before factoring in the other post-1900 temps – including a number that were decreasing.

Steve: Agassiz doesn’t contribute enough to account for the uptick as I observed at the time. There is another explanation that I’ll get to.

A possible clue might be in the rest of Marcott’s reply – and I suspect that might be what Steve is on to:

The reason is that the small age uncertainties for the layer-counted ice core chronology relative to larger uncertainties for the other lake/ocean records. The Monte Carlo analysis lets the chronology and analytical precision of each data series vary randomly based on their stated uncertainties in 1000 realizations. In this analysis method, the chronologically well constrained Agassiz-Renland dataset remains in the 1920 − 1940 interval in all the realizations, while the lake/ocean datasets do not (and thus receive less weight in the ensemble).

I think that says they weight the data used in the ensemble based on accuracy of the dating – and/or there is age shifting that occurs on the non-ice core data.

That still would not address the missing warmth, though, it would seem. I can’t think of any way around it: to get the higher NH ensemble increase shown, we still need data other than Agassiz-Renland showing a considerably higher increase from 1900 or 1920 to 1940 than Agassiz-Renland itself.

A. Scott, I discussed that issue earlier. They don’t directly weight the data based on anything. Instead, they modify series in certain ways based on uncertainty levels for those series. That includes introducing time-shifts.

To explain the effect, suppose you had a series that covers the 1920-1940 period. Now suppose you do 1,000 runs, and in 200 of them you shift that series “left” far enough it ends before that period. That would mean the series would have an effect on the 1920-1940 period in only 80% of the runs. If another series had an effect for 100% of the runs, it would have a larger effect on the ensemble.

From what the authors say, the Agassiz-Renland series covers the 1920-1940 period in 100% of their runs. The rest don’t. That could cause the ensemble values to rise even if the Agassiz-Renland series was constant in the period as long as it was constant at an above average value.
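That mechanism can be sketched in a few lines. This is a hedged toy model, not Marcott’s actual code: one well-dated series sits at a constant, above-average value and covers the final interval in every realization, while five poorly dated series (all constant at a below-average value) are randomly time-shifted and sometimes fall out of it.

```python
import random

random.seed(0)

N_RUNS = 1000
YEARS = list(range(1900, 1945, 5))     # 1900, 1905, ..., 1940

def run_once():
    # one random dating error per poorly dated series, in years
    shifts = [random.randint(-4, 4) * 5 for _ in range(5)]
    ensemble = []
    for yr in YEARS:
        vals = [1.0]   # well-dated series: present in every year, every run
        # a poorly dated series (originally ending in 1940) still covers
        # year yr only if its shifted end, 1940 + shift, is not before yr
        vals += [-1.0 for s in shifts if yr <= 1940 + s]
        ensemble.append(sum(vals) / len(vals))
    return ensemble

runs = [run_once() for _ in range(N_RUNS)]
mean = [sum(r[i] for r in runs) / N_RUNS for i in range(len(YEARS))]
print([round(m, 2) for m in mean])
# Every input series is perfectly flat, yet the ensemble mean climbs
# toward 1940 as shifted series drop out and the constant +1.0 series
# carries more and more of the average.
```

The numbers (+1.0, -1.0, shift range, five series) are arbitrary; the point is only that a constant above-average series plus dating-driven dropout produces an end-of-record uptick.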

Brandon … but even in that instance – where Agassiz-Renland is covered in 100% of the runs – wouldn’t it be correct that the effect of Agassiz-Renland on the ensemble values cannot be higher than the actual increase in Agassiz-Renland?

Nope. As I mentioned, even if the stationary series doesn’t increase at all, the effect I described can cause a rise. The reason is shown in McIntyre’s latest post. I’ll quote John C from it to explain:

As you can see, even if a series doesn’t change, it can be the source of a rise in averages when other series drop out. To demonstrate the effect I describe, imagine if John C’s example was a single run of the Marcott algorithm, and in it, Series 1 and Series 2 had been time-shifted to the “left” one space. That would make the original data:

What happens if you average the two results together to get an “ensemble”?

average – 2 2 2 2 2.5

The ensemble has increased. Now imagine you did a thousand runs based on that data. In each one, Series 3 was kept stationary, but Series 1 and Series 2 were not. Sometimes all three series would cover the final point, giving an average of 2. Sometimes Series 1 would drop out and give an average of 2.5. Sometimes Series 2 would drop out and give an average of 2. Sometimes Series 1 and 2 would drop out and give you an average of 3.

That’d give you a thousand runs with a final value ranging from 2 to 3. The average (ensemble) of those thousand runs would be some value above two, determined by how often Series 1 and Series 2 each dropped out.

If you want to play around for yourself with this, try using somewhat larger values, and allow time-shifting of more than one space. The effect should jump out at you.
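Here is one way to play with it. The final-point values below are hypothetical, chosen only so that the four possible dropout cases reproduce the run averages described above (all three present gives 2, Series 1 out gives 2.5, Series 2 out gives 2, both out gives 3):

```python
# Hypothetical final-point values consistent with the averages described
# above; Series 3 is the stationary, above-average series.
s1, s2, s3 = 1.0, 2.0, 3.0

def avg(vals):
    return sum(vals) / len(vals)

all_in    = avg([s1, s2, s3])  # 2.0  all three cover the final point
drop_s1   = avg([s2, s3])      # 2.5  Series 1 time-shifted out
drop_s2   = avg([s1, s3])      # 2.0  Series 2 time-shifted out
drop_both = avg([s3])          # 3.0  both shifted out; Series 3 alone
print(all_in, drop_s1, drop_s2, drop_both)

# Any mix of these four cases across a thousand runs gives an ensemble
# final value between 2 and 3 (above the all-series average of 2), even
# though no individual series increased at all.
```

Changing the values or allowing larger shifts changes the size of the effect but not its direction: dropout of below-average series can only pull the ensemble up.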

Brandon:
I can see why one might shift the time scale for a proxy a bit to offset an error in dating, but what is the physical justification for what I understand is the wholesale perturbation of a large subset of your proxies? It seems to me that it is the equivalent of a PCA where you grab a PC that fits your purpose.

That is not the right way to handle dating error. I discuss this in my paper:
Loehle, C. 2005. Estimating Climatic Timeseries from Multi-Site Data Afflicted with Dating Error. Mathematical Geology 37:127-140

Matt Skaggs, glad to! Writing out an explanation of it helped me too as it forced me to be clear/sure of each individual step. That helped as I had some trouble with the idea when I first stumbled upon it.*

bernie1815, it’s important to understand something. The authors didn’t pick values to perturb series by, and they didn’t pick results they “liked.” What they did is create a thousand perturbed versions of each series to “see what happened.” The idea was to see how dating errors could manifest. From that, you determine what range of values the series could have. The authors may or may not have messed up in the process, but the underlying idea is sound.

Craig Loehle, be very careful when you say it “is not the right way to handle dating error.” There is very rarely a “right way” to do things. There are often several different options, and none are “quite right.”

You may have found a “right” way to handle dating error, but don’t assume it’s the only one. And if your way is not the only right way, pointing people to your approach won’t help them see what’s wrong with Marcott et al’s approach. That makes it seem like nothing but self-promotion. I know that probably wasn’t your intention, but that is the effect.

*I’m still chewing on a secondary issue tied to this one. Time-shifting series in a Monte Carlo experiment like the authors did will reduce the signal of any given series. However, there are a lot of things that impact the extent of this effect. It’s difficult for me to estimate what effect it’d have on the overall results. It seems like it’d be a smooth with weird coefficients determined by dating uncertainty, but… it’s weird.

Second, even IF you can increase an average above the highest data point value in the group, to me the result would no longer represent reality. It physically did not get warmer than the highest temp achieved during the period?

I know it looks like the culprit may more likely be in the redating, but I’d still like to try and learn about this issue as well.

A. Scott, the average isn’t higher than the highest data point. That’d be impossible. You can’t have an average larger than the largest data point. However, the average isn’t above Agassiz-Renland. What you’re looking at is the change in Agassiz-Renland’s data. What matters is the value of Agassiz-Renland.

Imagine if you had a dozen series with values of -100 in 1920 and 1940. What would happen if you combined them with Agassiz-Renland using the authors methodology? Every time a series with -100 drops out, you’d have a huge increase in your average. It doesn’t matter how much Agassiz-Renland changes. As long as it is larger than the values dropping out, the average will increase. That’s true whether Agassiz-Renland increases from 1920 to 1940 or stays constant. In fact, it could even happen if it decreased in that period.
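A minimal arithmetic sketch of that thought experiment (the Agassiz-Renland values here are invented, and chosen to decrease; only the -100 filler values and the “dozen series” come from the description above):

```python
# A dozen series sit at -100 in 1920; suppose half have dropped out of
# the 1940 mean, while "Agassiz-Renland" (hypothetical values) actually
# declines slightly from 1920 to 1940.
agassiz_1920, agassiz_1940 = 1.0, 0.5      # invented: a small decline
others_1920 = [-100.0] * 12                # all twelve cover 1920
others_1940 = [-100.0] * 6                 # six have dropped out by 1940

mean_1920 = (agassiz_1920 + sum(others_1920)) / 13
mean_1940 = (agassiz_1940 + sum(others_1940)) / 7

print(round(mean_1920, 2))   # -92.23
print(round(mean_1940, 2))   # -85.64
# The 1920-to-1940 average "increases" by about 6.6 deg even though no
# series warmed and the stationary series actually cooled.
```

As the comment says, what matters is the level of the surviving series relative to the series dropping out, not its trend.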

Second, even IF you can increase an average above the highest data point value in the group, to me the result would no longer represent reality. It physically did not get warmer than the highest temp achieved during the period?

Again, I want to stress you’re focusing on how much a series changes in the period. What matters is the value of the series, not how much it changes. However, you are right on the idea of this showing the methodology is bad. It does not work anytime series start to drop out (primarily, the endpoints).

I know it looks like the culprit may more likely be in the redating, but I’d still like to try and learn about this issue as well.

I don’t agree that is a likely culprit. It certainly is an odd issue with the paper, but redating proxies cannot create the effect we see. It could contribute by changing the lineup of proxies at the end (thus changing which series might be given undue weight), but it could not speak to the methodology.

Steve McIntyre may correct me, but I think what I’ve discussed is the methodological cause of their uptick. I know it can create upticks like that; I’ve done so with synthetic data. It’d be weird to me if I found something in the authors’ methodology that can create them but didn’t in this case. Still, I’m open to new ideas.

But again, let me stress this. What matters for this issue is not the change in a series. What matters is the value of that series.

I kinda understand what you are saying … let’s say we have 5 proxies – with Agassiz’s 1.38 deg C one of them, and the rest each at -5 deg C. And that we have 5 “periods” of 5 years each, starting 1920 and ending 1940, with one proxy dropping out each period, so only Agassiz-Renland is left standing in 1940.

In 1920 the coverage includes all 5 proxy sets – so: [-5 -5 -5 -5 +1.38]
That “average” would be -3.72.

In 1925 the coverage includes 4 proxy sets – so: [-5 -5 -5 +1.38]
That “average” would be -3.41, and the “change” from the prior period “average” would be +0.32.

In 1930 the coverage includes 3 proxy sets – so: [-5 -5 +1.38]
That “average” would be -2.87, and the “change” from the prior period “average” would be +0.53.

In 1935 the coverage includes 2 proxy sets – so: [-5 +1.38]
That “average” would be -1.81, and the “change” from the prior period “average” would be +1.06.

Last, in 1940 the coverage includes just the 1 proxy set – Agassiz [+1.38]
The “average” would be +1.38, and the “change” from the prior period “average” would be +3.19.

I think that effect is what you are describing. And can see how you can get a number larger than one individual proxy.

But that is in no way remotely representative of the real world – and I simply cannot believe a legitimate scientist would ever publish it.

To try to better understand, I tried some random numbers, a little more representative of the range of values in the data. If we use the same method above but start with [+1 -0.6 +2.14 -1.5 +1.38] and then remove one each period, by the time we get to the last (1940) period the change is +1.44.

It appears that, in order to get a “change” number in the last step that is significantly higher than any one remaining data point, the proxies remaining in the prior period have to be strongly opposite – negative in this case. However, even if that did occur, it still would not be a remotely accurate representation of the real data, and should have no place in a professional paper.
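The arithmetic above can be checked in a few lines (this sketches only the dropout effect, not Marcott’s actual procedure):

```python
# One proxy set drops out per five-year period, so the "average" climbs
# even though no individual series changes at all.
def dropout_averages(vals):
    """Average of the series still covering each period, dropping the
    leading series one per period."""
    return [sum(vals[i:]) / (len(vals) - i) for i in range(len(vals))]

# First example: Agassiz-Renland at +1.38, the other four at -5.
first = dropout_averages([-5, -5, -5, -5, 1.38])
for got, expected in zip(first, [-3.72, -3.41, -2.87, -1.81, 1.38]):
    assert abs(got - expected) < 0.006
print(round(first[-1] - first[-2], 2))    # final-step change: 3.19

# Second example: the more representative spread of values.
second = dropout_averages([1, -0.6, 2.14, -1.5, 1.38])
print(round(second[-1] - second[-2], 2))  # final-step change: 1.44
```

Both of the worked examples come out as stated: a +3.19 final-step “change” in the first case and +1.44 in the second, despite every input series being constant.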

A. Scott, I don’t think you’re confused at all. Series dropping out causes problems for any number of methodologies, especially for ones that use (relatively) basic averaging to combine series. This is just another example of an old problem.

That last chart is interesting to me. If I am understanding the footnote correctly, the line showing the Tropics was excluded from the graph for readability. If you added that line to the chart, though, the lines for the Tropics, NorthX and SouthX should average to the yellow line shown, correct?

For that to be true over most of the chart, the Tropics line would have to be about halfway between the NorthX and SouthX lines. (To the limits of my ability to visually interpolate the data, anyway.) That middle position seems plausible to me. The interesting bit is about 9000 BP where, if I’m reading the chart correctly, the Tropics line would have to drop well below both NorthX and SouthX for the average to stay at the yellow line.

Am I reading that correctly? And if it’s true, is there any reason why that might be true?

“Marcott et al conspicuously did not compare their zonal reconstructions with previous NH and SH reconstructions”

They also gave reasons for this being not such a good idea, eg:“The general pattern of high-latitude cooling in both hemispheres opposed by warming at low latitudes is consistent with local mean annual insolation forcing associated with decreasing orbital obliquity since 9000 years ago (Fig. 2C). The especially pronounced cooling of the Northern Hemisphere extratropics, however, suggests an important role for summer insolation in this region, perhaps through snow-ice albedo and vegetation feedbacks (21, 22).”

The contrast they are making is between tropical and extra-tropical. Yet this post just mixes up NH and NHX (and SH, SHX) in the comparisons.

They do compare with their own global recon (in S10).
Steve: Nick, they can make whatever arguments they want. However, if they wish to compare their NHX to prior data, then surely it is reasonable to compare it against the MBH99 NHX (which was calculated). I did not “mix up” NH and NHX. I noted the difference. If it is valid to compare MBH99 NH against GLB, surely it is valid to compare MBH99 NHX against Marcott NHX. As Briffa caustically pointed out against Mann, just because Mann called his MBH reconstructions “NH” or “SH” didn’t make it. If NHX/SHX are extracted from M08, the point made here will still hold.

first, hahahahahahahahha this is outrageous! Hilarious! After adding this to the canon any possible curve will be “remarkably similar” (TM-climate science) to one of the recons!
Second: move over Sherlock Holmes, SM is back!
third: I now have to go clean up the coffee I spilled all over my keyboard…

Interesting. If one is to, for the sake of argument, accept the claims/conclusions of the paper, one must doubt the validity of the concept of a single global temperature; clearly, northern and southern hemisphere temperatures may vary wildly and independently.