MBH98 Source Code: Cross-validation R2

The newly-archived source code and the data archive made public in June-July 2004 (mirrored at Nature) ftp://holocene.evsc.virginia.edu/pub/MANNETAL98/ are clearly connected with MBH98, but do not match it. At present, it is impossible to do a run-through and get results; I’ll discuss this in an upcoming post. In Mann’s shoes, I’d have tried to ensure that everything matched. The new source code shows how various statistics were calculated and confirms our surmise that MBH calculated cross-validation R2 statistics and that these statistics were withheld, presumably because they were adverse. A Summary Table here shows a probable reconciliation between the statistics as calculated in the computer program and the statistics reported in the original Supplementary Information, with the caveat that the identifications in this table are based on the structure of the calculations rather than on a run-through of Mann’s Fortran program. (The Supplementary Information link shown here is to the FTP site at the University of Massachusetts, rather than to Nature. The mirror version at Nature was deleted earlier this year; I don’t know how often Nature does this. The University of Massachusetts directory was deleted temporarily in November 2003 after the publication of MM03, but was restored after the late John Daly complained. It’s lucky that it’s still extant.)

In the original SI, the cross-validation R2 statistic was not reported. You can see columns for the calibration beta (which is equivalent to the calibration period R2) and for the verification beta, plus some r^2 and g^2 statistics pertaining to Nino, but, if you look closely, there is no verification R2 statistic. We remarked on this in MM05a and MM05b. We had previously speculated that it seemed inconceivable that the cross-validation R2 statistic would not have been calculated (and thus that it had been withheld), but without source code, we were then unable to show this conclusively. However, the newly-archived source code demonstrates clearly that MBH did calculate the cross-validation R2 statistic (pages 28-29 in my printout).

Accordingly, I can now assert that this information was withheld from the original SI. At this point, we also know that the values of the cross-validation R2 were insignificant (~0.0) for the controversial 15th century reconstruction. One can reasonably surmise that this information would have been very detrimental to widespread acceptance of the MBH98 reconstruction had it been disclosed. The IPCC assertion that the MBH98 reconstruction

“had significant skill in independent cross-validation tests”

is obviously not true for the withheld cross-validation R2 statistic.

I previously discussed this inaccurate disclosure by IPCC as illustrating the potential conflict of interest between an author in his capacity as an IPCC review author and in his capacity as the author of the underlying study. While I anticipated that the code would demonstrate the actual calculation of the cross-validation R2 statistic, there was a bit of a surprise in the form of another discrepancy between statistics calculated in the program and statistics reported in the original SI. The program shows that a verification period RE statistic was calculated for the Nino index; however, the original SI only reported a verification period R2 statistic, reversing the reporting pattern for the NH temperature index. In this case, I presume that the verification RE statistic for the Nino calculation will be adverse.

However, I have not attempted to replicate the MBH98 Nino calculations and this is merely a surmise at present. I strongly believe that the authors had a responsibility to report adverse statistics, such as the cross-validation R2, and were not entitled to withhold this information. This also applies to Wahl and Ammann, who similarly do not report a cross-validation R2 statistic. In their case, their code as published does not even include the calculation of cross-validation R2 statistics, but I would be astonished if they had not calculated these values at some point and later edited the step out of their code. Mann has begun the process of trying to justify withholding the R2 statistic in one of his answers to the House Committee letters. In my opinion, this attempted justification is very unsatisfactory.

If the authors had wished to argue (as they are now attempting to do at this late stage) that the RE statistic is “preferred”, this should have been done at the time, after ensuring that the reader was in possession of the statistics that the authors had calculated, thereby permitting the reader to come to his own conclusion on these matters. The selective omission of the cross-validation R2 statistic is a material distortion of the record. It’s late in the day to be arguing these matters after positions have been taken and locked in.

I have no doubt, as I’ve mentioned recently, that, if the IPCC had reported that the MBH98 reconstruction had a cross-validation R2 of ~0.0 (rather than claiming that it had “significant skill in independent cross-validation tests”), the MBH98 hockey stick graph would not have been featured in IPCC. If it had been reported in the original publication, it’s possible that the original article would not have been published in the first place. It will be interesting to see what the various learned societies and individuals will make of this.

One thing to note is that what Mann says in his response is, strictly speaking, correct.

He claims on page 9 of his letter:

The Committee inquires about the calculation of the R2 statistic … My colleagues and I did not rely on this statistic in our assessments of “skill” (i.e., the reliability of a statistical model, based on the ability of a statistical model to match data not used in constructing the model) because, in our view, and in the view of other reputable scientists in the field, it is not an adequate measure of “skill.”

And later on the same page:

The linear correlation coefficient [r] is not a sufficient diagnostic of skill, …

What is not said here is that although a high cross-validation r2 is not sufficient as a measure of statistical skill, it is necessary.

A cross-validation statistic, in general, is one that assesses how well a model predicts data that were not used to fit it. The usual method is to train the model with a portion of the calibration data removed, and then compare the values it predicts with the withheld observations.
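In the MBH98 setting, the comparison was between the reconstruction and instrumental data withheld from calibration (1854-1901 withheld, 1902-1980 used for fitting). A minimal Python sketch of that holdout idea, using entirely synthetic data and a simple linear fit rather than anything from the actual MBH98 algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "temperature" driven by a single proxy plus noise
# (purely illustrative; not the MBH98 data or method).
years = np.arange(1854, 1981)
proxy = rng.normal(size=years.size)
temp = 0.6 * proxy + rng.normal(scale=0.5, size=years.size)

calib = years >= 1902   # calibration period: the model is fit here
verif = ~calib          # verification period: data withheld from fitting

# Fit a linear model on the calibration period only
slope, intercept = np.polyfit(proxy[calib], temp[calib], 1)
pred = slope * proxy + intercept

# Cross-validation: compare predictions against the withheld data
verif_r = np.corrcoef(pred[verif], temp[verif])[0, 1]
print(f"verification r2 = {verif_r**2:.2f}")
```

A model with genuine skill scores well on the withheld years; a model that merely fits the calibration period does not.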

From MBH99:

MBH98 performed extensive cross-validation experiments to verify the reliability of the reconstruction using global temperature data from 1854-1901 withheld from (1902-1980) calibration, and, further back, by the small number of instrumental temperature series available back through the mid-18th century.

The r2 statistic measures the degree of linear correlation between data sets; that is, if one set of data (say the temperature) is a linear function of another (say the prediction of a multiproxy climate model) then the r2 will be 1.0. For cross-validation purposes the idea is the same: with some data withheld from calibration, the r2 measures how well the temperatures predicted by the model are linearly correlated with the withheld observations.

As Mann says, there are known issues with the r2 statistic (in that an r2 of .99 alone does not mean that you have a good fit). Still, the point of having multiple statistics is that each one confirms some portion of the expected correlation. Steve, it might be interesting to see a scatterplot of the predicted temperatures based on the withheld cross-validation models; usually a bad r2 is easily visible to the naked eye.
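A quick way to see the point numerically. This is a synthetic illustration (nothing here comes from MBH98); the `r2` helper is just the squared Pearson correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
observed = rng.normal(size=200)

good_pred = observed + rng.normal(scale=0.3, size=200)   # tracks the data
bad_pred = rng.normal(size=200)                          # unrelated to the data

def r2(pred, obs):
    """Squared Pearson correlation between predictions and observations."""
    return np.corrcoef(pred, obs)[0, 1] ** 2

print(f"good model: r2 = {r2(good_pred, observed):.2f}")  # high
print(f"bad model:  r2 = {r2(bad_pred, observed):.2f}")   # near zero
```

A scatterplot of `bad_pred` against `observed` would show a shapeless cloud, which is why a bad r2 is usually visible to the naked eye.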

Re #2. Further to your comment that both r2 and RE are necessary, Mann himself demonstrates this in some supplemental material here: http://fox.rwu.edu/~rutherfo/supplements/jclim2003a/miscsupp.pdf
where he gives some synthetic examples showing how r2 and RE (and another measure, CE) are affected by various differences between calibration and validation data sets.
My question, after looking at this, is: how do you get a step change in mean temperature?
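For concreteness, the definitions usually used in this literature are RE = 1 - SSE/sum((x - mean_cal)^2) and CE = 1 - SSE/sum((x - mean_ver)^2), where SSE is the sum of squared prediction errors over the verification period and the means are the observed means of the calibration and verification periods. A toy Python example (synthetic numbers, not from any of these papers) showing how a step change in mean between the two periods can produce a high RE while r2 and CE stay near zero:

```python
import numpy as np

rng = np.random.default_rng(2)

calib_obs = rng.normal(loc=0.0, scale=0.2, size=80)   # calibration period
verif_obs = rng.normal(loc=1.0, scale=0.2, size=50)   # verification period: mean shifted up

# A "prediction" that captures the mean shift but none of the variability
pred = np.full(50, 1.0) + rng.normal(scale=0.01, size=50)

sse = np.sum((verif_obs - pred) ** 2)
re = 1 - sse / np.sum((verif_obs - calib_obs.mean()) ** 2)
ce = 1 - sse / np.sum((verif_obs - verif_obs.mean()) ** 2)
r2 = np.corrcoef(pred, verif_obs)[0, 1] ** 2

print(f"RE = {re:.2f}, CE = {ce:.2f}, r2 = {r2:.2f}")
```

Because RE benchmarks against the calibration mean, getting the level right scores well even when the prediction tracks none of the year-to-year variation; r2 and CE are not fooled by the shift.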

I can’t see anyone being moved unless what you (Steve, others on a par perhaps) think is wrong can be spelt out in simple English – or at least English that those of us who have at least some grasp of maths and a hold on climatology, and are interested in both, can cope with. RE and R2 mean nothing to 99.9% of the population.

You needn’t be an expert in statistics to understand what happened. MBH calculated statistics but buried ones which didn’t support the desired conclusions. Not only is this reprehensible, it is also inexcusable that peer-reviewers ignored the lack of these statistics. As I’ve said several times before, the first question I was often asked (or heard asked) in graduate school for any experiment results, presentation, lecture, etc, dealt with the statistics of my results. There wasn’t even a possibility of trying to publish anything without a thorough statistical analysis. If the statistics were good, then it was supportive of the results. If the statistics were bad, then it was a caveat that had to be mentioned and taken into account when it came to conclusions (along with a signal for where future work was needed). But these statistics needed to be present for reviewers and readers to judge for themselves.

Can you explain the IPCC’s claim that MBH “had significant skill in independent cross-validation tests” without attacking either intelligence or ethics?

Michael, so insults are part of the mechanism of science? Simply put – NO THEY ARE NOT.

As you must know, I cannot answer your question because I don’t have the expertise – it is for scientists to decide these things.

At the moment I think Steve’s view doesn’t prevail. It might yet, but I doubt it. That is not to say I don’t think Steve expert – he clearly is (in his disciplines), and in spades. But I also respect M B H and all the rest. Some contributors here get from me as good as they give, but I’m not going to insult Steve!

If I’m asked then I still think that now is as warm as it’s been for at least 2000 years, that it’s warming fast, and that we’d better stop being distracted and at least curb our emissions, else the warming will continue for some time. A quick 2C+ is not a good idea.

….so insults are part of the mechanism of science? Simply put – NO THEY ARE NOT.

They are part of Hearnden Science, that strange branch of knowledge where bizarre strident assertion is equal to truth and anything that contradicts it (like facts) is an “ad hom”

I think Larry Hulden was expressing a valid point – like certain West Country farmers, most of the IPCC took for granted that which they took to be true because they lacked the knowledge to check it for themselves.

If I’m asked then I still think that now is as warm as it’s been for at least 2000 years, that it’s warming fast, and that we’d better stop being distracted and at least curb our emissions, else the warming will continue for some time. A quick 2C+ is not a good idea.

The above statement is based on no evidence whatsoever.

1. There is no good evidence that it’s as warm as it has been for at least 2000 years.
2. There is no good evidence that curbing emissions will make any difference to the climate.
3. There is no good evidence that “a quick 2C” is happening, or is about to happen.

Why should anyone believe your assertions? Because you feel you must be right? Because you follow a “scientific consensus” that you don’t understand?

To me there is at least one very interesting result being exposed from Steve’s studies. It seems as if tree rings do not reflect the temperature as has been expected. Recent warming is not visible in the tree rings. One possibility is that surface temperature record is an artefact. However, if we accept the recent warming, then we can’t use the hockey stick as a reliable reconstruction. “Many independent reconstructions” claimed to support the hockey stick do not help if they depend on included anomalous data sets.
I doubt the CO_2 effect on tree rings in some of the anomalous series. If there had been a geographically large-scale increase in tree-ring growth, I think it would have been attributed to the warming.

Forget the maths and stats for a bit; let’s try to look at it logically. Let’s assume that the MBH ‘hockey-stick’ reconstruction is correct and that it is an accurate representation of the NH climate for the past 1000 years.

The last IPCC report (see link) shows the reconstruction as it appeared in the Summary For Policymakers section.

Note the sharp increase in temperature just after 1900. The rise is without precedent in the previous 900 years – either in rate of increase or actual amount. In fact here is another reconstruction by Mann and Jones which goes back almost 2000 years.

Re 11, John ‘A’, is it part of whoever you are to insult others who have been open and honest with this place about who they are, what they know (and don’t know), and what they think? The answer is yes…

Look, I put my view and you don’t like it. OK, either put up with it, censor me, or carry on insulting – I can take it🙂. Indeed, I suspect I’ve not much option but to put up with it, or be intimidated into silence.

Btw, your phrase ‘no good evidence’ misses some vital words, they are ‘in the opinion of John ‘A”.

It’s important to point out that the MBH graph clearly shows its error bars, so you need to define what you mean by accurate. I think it’s, like all the other recons, basically telling the truth. But I don’t think it’s spot on, and I’ve never thought it is (else I’d dismiss all the other recons – I NEVER have).

As to the rise, well, it’s also clearly present in the surface record. So, you either have to accept the rise as valid (again, within the errors of the graph) or you don’t. I do, that’s my view (and shoot me down if you like). OK, why does it rise? As I understand it, much of this early rise is put down to an increase in the activity of the sun. So, that means, I don’t think much of it was due to our activities (yes, that’s right, a ‘warmer’ like me…).

Btw, your phrase “no good evidence” misses some vital words, they are “in the opinion of John ‘A’”

Or rather the statements made by Peter Hearnden should have been marked “true only to Peter Hearnden and no-one else”. It’s a remarkable gift Peter has to keep repeating the same stuff without, apparently, any verifiable evidence for any of it.

For example, earlier in this thread you explained that you weren’t able to understand the mathematical concepts being discussed and yet you still manage to claim

It’s surely clear Steve’s view doesn’t prevail? We wouldn’t have Climate Audit if it did. It’s also true that his view ‘might yet’ prevail. Finally, it’s true I don’t think it will. That is *my opinion*. As I’ve said, I know (from long experience) you don’t agree. OK, Do one of the things I suggested then.

As I understand it, much of this early rise is put down to an increase in the activity of the sun.

Yes – an increase in solar activity with possibly a decrease in volcanic activity. It’s certain that GHGs contribute little or nothing as atmospheric concentrations were not appreciably different to what they were 100 or 150 years earlier.

So – the increase in or around 1900 is pretty much entirely due to NATURAL forcings. Doesn’t this strike you as slightly odd? This is not some gentle upward curve but an abrupt and sharp inflection – completely different in character to the previous 900 years (in MBH) or 1700 years (Mann and Jones). It represents a major shift in NH climate yet there’s relatively little literature on the subject.

It’s not proof but it certainly suggests (to me at least) that both reconstructions seriously under-estimate pre-1900 climate variability. I take your point about the error bars and it’s quite possible that the ‘true’ values lie somewhere within them, but the H-S shape is misleading – particularly to non-scientists.

I’ve followed the ‘investigation’ by Steve McIntyre and Ross McKitrick and it seems to me that they have shown that the H-S, with its anomalous early 20th century ‘kink’, is a result of the methodology (and data) used by Mann et al.

As you must know, I cannot answer your question because I don’t have the expertise – it is for scientists to decide these things.

This is a load of baloney, and given your earlier statement about putting it into English even the layman can understand, you are now backpedaling. Either it is something for everybody or it isn’t. Make up your mind. Further, the idea of not reporting statistics that were calculated is indeed curious. Why calculate them if they are deemed unworthy? If they aren’t unworthy and they were calculated, why not report them? These are two interesting questions that Mann et al. should answer. Their silence is not helping them.

If I’m asked then I still think that now is as warm as it’s been for at least 2000 years, that it’s warming fast, and that we’d better stop being distracted and at least curb our emissions, else the warming will continue for some time. A quick 2C+ is not a good idea.

That’s great, but if it is natural (at least in part) then curbing emissions might not do anything. Further, it could seriously hamper our ability to deal with other problems and issues.

It’s important to point out that the MBH graph clearly shows its error bars, so you need to define what you mean by accurate.

So what? An inaccurate calculation, for example, will likely have inaccurate error bars. The presence of error bars, while important, does not ensure that the underlying calculation is correct.

Note the sharp increase in temperature just after 1900. The rise is without precedent in the previous 900 years – either in rate of increase or actual amount.

Please correct me if I’m wrong (and forgive me if this issue has been raised before), but this graph appears to show dissimilar data grafted together at the hockey stick ankle. The lower “flat” part of the graph shows results from the proxy data that is smoothed using a “40-year Hamming-weights lowpass filter”. The upper part of the graph appears to be made from surface measurements which was smoothed using a “21-point binomial filter giving near-decadal averages”.

One of the points of the graph is to show a decrease in stability of temperatures during recent times. However, a milder smoothing algorithm for the recent data would incorrectly give this impression even if the two underlying data sets had identical variability. Also, this would likely exaggerate an increasing trend if it existed. Can anyone comment on how the two smoothing algorithms would compare? How well does a “40-year Hamming-weights lowpass filter” approximate a decadal average?
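One rough way to compare the two filters, assuming the weights are a normalized 40-point Hamming window and the 21 coefficients of row 20 of Pascal's triangle (the papers' exact implementations may differ), is to compute how much white-noise variance each lets through, which is the sum of the squared weights:

```python
import numpy as np

# 40-point Hamming window, normalized to sum to 1 (assumed form of the
# "40-year Hamming-weights lowpass filter"; the actual weights may differ)
hamming = np.hamming(40)
hamming /= hamming.sum()

# 21-point binomial filter: row 20 of Pascal's triangle, normalized
binom = np.array([1.0])
for _ in range(20):
    binom = np.convolve(binom, [0.5, 0.5])

# For white noise, filtered variance = (sum of squared weights) x input variance
print(f"Hamming-40 variance factor:  {np.sum(hamming**2):.3f}")
print(f"Binomial-21 variance factor: {np.sum(binom**2):.3f}")
```

On these assumed weights the 40-year Hamming filter passes roughly a quarter to a third of the white-noise variance that the 21-point binomial filter does (about 0.035 vs 0.125), so a series smoothed the second way would indeed retain visibly more short-term variance, consistent with the concern above.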

Also, I have some concerns over the apparent truncation of proxy data and its replacement with surface temperature data for recent times. Maybe there is a good reason for this that I am not aware of. Do we have proxy reconstructions covering the last 50 years? How do they compare to surface data and MSU data?

Without the surface temperature portion of the graph (red line), the reported rise in temperature during the 19th century appears to be within the error bars for the early half of the chart. If the MM reconstruction is more accurate, then recent behavior is even more unremarkable. Please correct me if I’ve missed something.

John, well, of course one of the graphs is global, the other NH. There’s much more sea in the SH, so temperature changes are more muted there. You would expect to see more variation, and a quicker response to forcings, in the NH, I think, since it has more landmass.

I don’t think MBH ‘misleads’. To understand a graph you have to, well, understand it. The error bars are clear, it’s no secret there are other recons, there will be another IPCC report, and the page you linked to also shows the surface record. The recons show a NH somewhat warmer 1000 years ago, colder 500 years ago, and at its warmest now. Unless you want to quote older recons (like Lamb’s), that’s, for me, the best we have. Remember, Steve is only offering a critique, not a recon.

I don’t think Ross and Steve have ‘shown’. I do think they are trying to ‘show’.

It’s interesting how you say, “I’m not expert, so I defer to the scientists,” yet you select RSS over UAH with no explanation on another thread, think Steve’s arguments will be defeated, etc. I haven’t seen ONE scientist attack Steve’s position concerning this particular thread, yet you’ve already taken their side. I’m unaware of any group of scientists who present valid reasons for preferring RSS over UAH, either (that doesn’t mean they don’t exist), yet you’ve stated on other threads that you believe RSS over UAH. And you readily dismiss the importance of statistics because 99.9% of people don’t care about or understand them, yet the position of practically every scientist is that statistics are important. So when exactly do you “defer to the scientists?”

Michael, so insults are part of the mechanism of science? Simply put – NO THEY ARE NOT.

No, but fraud, lack of due diligence, and lack of disclosure are not parts of the mechanism of science, either. Burying statistics because they do not support your conclusions falls at best under the latter items. It’s time either to question why MBH forgot to include these statistics (and why Nature overlooked that during peer review) or why MBH chose not to include them. Either line of questioning is insulting by nature – there’s no way around it.

It’s important to point out the MBH graph clearly shows its error bars, so you need to define what you mean by accurate.

Do you even know how those error bars were calculated? In various places on this website, plenty of questions have arisen about error bar irregularities (for example, why the error bars in the early portion of Mann’s 2000-yr reconstruction are narrower than those of more recent years). Keep in mind that those error bars are based on the data processing and methodology, which in some instances have been called into question. Regardless, let’s assume for the sake of discussion that those error bars are proper and acceptable for the data used in the reconstructions, and that the proxy data themselves (yet another piece of the puzzle whose accuracy of interpretation is called into question) are acceptable. So tell me this – how much should those error bars be expanded to reflect an average global temperature, as opposed to simply reflecting a composite of processed proxies which don’t come close to spanning the globe? Take a look at the map of proxies used for the 2000-yr reconstruction (http://www.ncdc.noaa.gov/paleo/pubs/mann2003b/mann2003b.html – currently down as I check it). Do you really think the error bars from processing data from those proxies can be extrapolated to the whole world?

What I find particularly interesting in your case is that Spencer and Christy present an error range for their decadal satellite trend (still lower than the surface measurements at the high end of the range), yet on another thread here you dismissed that entirely and judged the satellite record based on some 0.2 deg C error you came up with off the top of your head that you thought you’d read about somewhere. Why do you have one rule for Mann’s error range and another for S&C’s? I’m curious to know how accurate you think the proxy temps are, too.

As to the rise, well, it’s also clearly present in the surface record.

Yes, but it’s much higher (usually 2x relative to UAH) than the satellite record of the last 25+ yrs (RSS or UAH – even with the latest UAH adjustments), and GCMs and greenhouse theory say the satellite trend should be at least as high as surface trend. And before you sell your soul to the surface record (since I know you think UHIs are fully accounted for), take a look at this http://ccc.atmos.colostate.edu/pdfs/BAMS_Davey&Pielke_Apr05.pdf . If that’s the state of climate records in Colorado, how accurate do you think they are in other places around the world?

To understand a graph you have to, well, understand it. The error bars are clear

Peter understands the error bars with clarity? The guy who doesn’t understand or care about the maths and statistics of Mann et al. understands the maths and statistics surrounding Mann et al.’s error bars?

John Daly, no less, claimed the satellite record was ‘accurate to 1/100 of a degree’ – so he was wrong too? I have to be honest, I’ve not seen the error range of S&C on any graph I’ve seen, where is there such a graph?

You seem to think that if I say I’ll defer to the scientists I must therefore not have views. This is not the case. I’m happy with the scientific consensus, but, if you push me, you are right that the scientific jury is still out re satellite temperatures. So I accept that criticism.

Be careful about accusing people of fraud in a public place. The law deals with fraud; if you have evidence of it, take it to your local law enforcement agency, else it’s just defamation.

I didn’t ‘dismiss statistics’ in post 4, I said ‘I can’t see anyone being moved unless what you (Steve, others on a par perhaps) think is wrong can be spelt out in simple English – or at least English those of us who have at least some grasp of maths and a hold on climatology and are interested in both can cope with. RE and R2 mean nothing to 99.9% of the population.’ Apart from it perhaps being 99% rather than 99.9%, I stand by that. Please don’t put words in my mouth, and do try to understand the context of the post in a thread where the initial post is highly mathematical.

Aside from the fact that this looks like a scripting session for the TV show Numbers – all we need is the FBI to intercede and arrest “climate change” – it does demonstrate one thing. Name-calling is indeed a province of scientists!

My own experience is 20 years of spacecraft thermal design and engineering. While Mr. Mann’s analysis and graphs are interesting conceptually, and can be entertaining in books in the local Barnes & Noble paleontology section, they should not be taken for anything more than that. If the political agenda of environmentalists had not grabbed it up, it would be just that: mildly interesting conjecture.

It does not really mean anything since it has no useful application as a tool at all. It can’t make predictions, it can’t solve problems. It has so few proxies in it that it can’t even quantify the relationship of one or two variables. It is just a curiosity that has spun out of control because people with political agendas spun it up in the press and got everybody excited.

Since my tax money pays for it I do hope that Congress does hold hearings and tears it apart. Then it will fade away like cold fusion and ozone holes.

Steve: Mike, it strikes me that there should be some form of due diligence on the climate models used for forecasting equivalent to “red teams” or “tiger teams” in aerospace engineering. I’d be pretty surprised if there’s any really adequate due diligence on the climate models and it strikes me that climate modelers are not very forthcoming about problems with their models, because they are so anxious to promote their conclusions. The models are pretty complicated in one sense, but Wigley and Raper were able to emulate big models with a very simple model, so I suspect that the models are being driven more simply than people think. If one set as a target that the climate models should meet some sort of engineering standards, do you have any ideas on how one might set such standards or go about checking them?

S&C often report trend errors. I’m not sure I’ve seen one greater than 0.05 deg C/decade (and I recall seeing an error in the surface trend calculated to be 0.06 deg C/decade somewhere). See Christy and Norris’ “What May We Conclude About Global Tropospheric Temperature Trends” for example (you may be able to google it and find it…I had a link to a PDF version, but it’s now broken). I’m not sure how accurate individual readings are. Raw satellite readings may be sensitive to 0.01 degree as you say Daly said (I don’t know), but the processing of the data can probably introduce errors larger than that. S&C also calculate their errors based partly on the difference with radiosonde readings.

With regard to error bars, many studies use +/- one standard deviation (or a multiple thereof). This can provide a false sense of security. For example, I can take a box of 100 shoes that are supposed to be the same size, measure them with a ruler, and find the average size and standard deviation. That standard deviation may look good enough to represent the error range in shoe size for that box. However, if I were using a wooden ruler that is warped, or if I were reading the ruler improperly (reading cm instead of inches or something like that), then +/- std dev is a very incomplete assessment of true error.
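The warped-ruler point can be made concrete with a toy calculation (entirely hypothetical numbers):

```python
import numpy as np

rng = np.random.default_rng(3)

true_sizes = np.full(100, 27.0)                  # 100 identical shoes, truly 27 cm
random_err = rng.normal(scale=0.1, size=100)     # reading scatter per measurement
measured = true_sizes + random_err + 1.5         # +1.5 cm bias from a warped ruler

mean, sd = measured.mean(), measured.std()
print(f"mean = {mean:.2f} cm, +/- 1 sd = {sd:.2f} cm")

# The +/- 1 sd band (about 0.1 cm) entirely misses the 1.5 cm systematic error:
# every measurement is wrong by far more than the quoted "error bar" suggests.
```

The standard deviation only quantifies the scatter; a systematic error shifts every value in the same direction and is invisible to it.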

I don’t see where I accused anyone of fraud. I said that burying statistics which don’t support your argument is at least “lack of disclosure.” I don’t equate that with fraud. Fraud would require falsification.

Steve: In securities law, under a “full true and plain disclosure” regime, a material omission is an offence. I tried to compare this to falsification in scientific codes of conduct a little while ago – it seems to me that a material omission can rise to falsification.

It would be very helpful if people would stop rising to Peter’s provocations. The resulting dialogues are just a distraction, and Peter’s comments aren’t worthy of replies. Just tell him to get lost, and make room for those who may have something to contribute.

To John Finn: you do not find unprecedented 20th century increases when you look at Moberg’s reconstruction. The problem is your starting premise that the hockey stick might be right. The almost straight shaft of the hockey stick is simply a statistical construct that has nothing to do with reality.

As for people who do not understand arcane statistical analysis, I would agree that most IPCC reviewers are in the 99+% that don’t. I have only encountered 2 cases of IPCC reviewers who really tried to reproduce results using alternative but equally valid analysis (i.e., tried to really do the peer review they were asked to do), and both of them found the IPCC work they were reviewing wanting. The two were Peter Dietz and Jarl Ahlbeck, both of whom are experts, well qualified to carry out the analyses they did. Both cases can be found at John Daly. Now a third expert has done true peer review on another piece of IPCC evidence and again has found it wanting. The AGW community has argued with all three, but I have seen nothing in the arguments that invalidates the criticism in any of the three cases. Interestingly, the letter replies from M, B, and H to Rep. Barton don’t stand much scrutiny either. Murray

Per your comment, I would think that the truth is that these models are single point curves, as you suggested. It would surprise me if they were much more than complex exercises in extrapolating single variables, with lots of correction factors to match current average temperatures in the places the proxies come from. That is all they can ever be.

It is common sense that simplistic single point solutions are all you can do in a desktop FORTRAN routine. Otherwise you are talking about something like IBM’s Deep Blue running for weeks to come up with a one-hour model of just the fluid and temperature mechanics of the air over 1 square mile of the earth. And that does not even consider atmospheric chemistry, chemistry between life and the atmosphere, or discrete chemical diffusion through layers of the atmosphere – to name a few. But then you get into what people like the National Weather Service, NASA, NOAA, the Navy, the Air Force, etc. have been trying to do for 40 years.

Human activity does indeed affect the atmosphere, and if the climate is changing for that reason or any other, it is a valid thing to try to understand. An independent review of these data and models would be worthwhile. It would need a government sponsor, and somebody like the Hewlett Packard Foundation to fund it, so that it would draw on both sides of the political fence. Just building a clearinghouse of vetted data and reports would go a long way, since all that exists now are political action groups, newspapers trying to sell stories, and scientists who get grant money to study arcane bits of the world.

Steve: I wasn’t thinking about trivial models like Mann’s, but about the big climate models like the UCAR, Hadley Centre or ECHAM models, which appear to have developed out of weather forecasting models and which generate terabytes of output. I get the impression that, for the purpose of forecasting (say) NH average temperature for the next 50 years, the big models are being driven by fairly simple components and there’s a lot of work being done which essentially cancels out, but that’s just an impression.

A huge complex model can be very misleading, especially when you are trying to reduce the output to your idea of an average earth temperature. It is, as you say, very likely that much of the calculation is redundant, since any single perturbation in the atmosphere will dissipate and damp out locally and has no bearing on the long term. The drivers have to be either a huge energy source, the sun, or something that changes how its energy is absorbed: eruptions, clouds, El Nino, solar variations, whatever. In the long term those things damp out too, to some extent, so that leaves just something like CO2 and trying to drive the model to account for it based on the physical properties of gas samples.

The only way out is to correlate the model with measured data at the extremes of the range. That means you have to have accurate measured data over decades, around the world, and up and down through the atmosphere. You also have to have all the chemical data and reactions so you can predict what happens to the variable, like CO2, that you are playing with.

In the end you are simply performing parametric studies around single variables you select and trying to compare the results to almost no actual data. You can prove just about anything with that approach.

The hard part for people who have to vet a model like this is that you either spend an inordinate amount of time arguing over the validity of the physical property measurements you put in, or you force the modeler to match actual data. Catch-22, I think it is called.

I have seen models I worked on for six months fail, while a simple curve fit did the same job for a discrete component that was failing in orbit. The curve fit the actual data and worked.

I agree. Without accurate data to model and make predictions against, big models go in circles and become expensive parametric studies of single variables using sample measurements of physical properties.

The models have to be correlated externally with data at the extremes of the range. Internal correlation using the accepted “peer” approach to the methods proves nothing.

I agree. Big models without accurate data to compare against end up being expensive parametric studies based on physical property measurements of single variables, like CO2. You tend to get into endless arguments that go nowhere about the validity of the approach unless the modeler is forced to match actual measurements and the model is correlated with measured data at the extremes of the range.

I would expect the big models to be just complex exercises in showing how they accurately replicated CO2 optical properties in the analysis and not much else.

If I do say so, you use very strong language for someone in a locality with such punitive laws of defamation.

I looked up guidelines on research misconduct from the NSF and COPE; for the NSF, you would have to rely on:

(2) Falsification means manipulating research materials, equipment, or processes, or changing or omitting data or results such that the research is not accurately represented in the research record.

Given MBH’s fig-leaf that they prefer the RE statistic (which I think they state in the paper), I would hesitate to use defamatory language. I would nonetheless quite enjoy widely publicising the issue, and asking people if they think this is good practice!
yours
per

Steve: Per, did you see my note on falsification here? It’s pretty hard for me to see why the withholding of the R2 statistic without notice to the reader would not rise to “omitting data or results such that the research is not accurately represented in the research record”. Maybe one of the learned societies presently weighing in would be prepared to investigate this and advise Barton accordingly.

It’s pretty hard for me to see why the withholding of the R2 statistic without notice to the reader would not rise to “omitting data or results such that the research is not accurately represented in the research record”.

Just for instance, if MBH argued that they had excluded the r2 values because they believed that only RE was appropriate, this might be (for argument’s sake) unsustainable or bad science. But bad science is not misconduct! To make a falsification charge stick, you would need to be able to show that they excluded the r2 statistic with the intention of misleading. [/legalistic pedantry]
As I intimated above, making their peer group aware of this and inviting their opinions would likely reveal that many people have a similar attitude to yours!
cheers
per

Steve: The issue comes with the disclosure as much as the act. My question is how they can assert (as in IPCC) that it had significant skill in cross-validation statistics while knowing of an abject failure in the cross-validation R2 statistics. Surely that’s different from the omission itself.
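For readers following the RE-versus-r2 argument, a toy numerical sketch may help. This is illustrative only: it is not taken from the archived Fortran, and all numbers are invented. A reconstruction that captures a mean shift but none of the year-to-year variation can score a high RE while its verification r2 sits near zero, which is why reporting one statistic without the other can mislead.

```python
# Toy illustration (not MBH's actual code): the RE statistic and the
# verification r^2 can tell opposite stories about the same reconstruction.
# RE = 1 - SSE/SS_ref, where SS_ref measures the "observed" series against
# the calibration-period mean; r^2 is the squared Pearson correlation.
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Verification-period "observed" temperatures: a level shift plus wiggles.
obs = 0.5 + 0.3 * np.sin(np.arange(n)) + 0.1 * rng.standard_normal(n)

# A reconstruction that captures the mean shift but none of the wiggles.
recon = np.full(n, 0.5) + 0.1 * rng.standard_normal(n)

cal_mean = 0.0  # assumed calibration-period mean

sse = np.sum((obs - recon) ** 2)
ss_ref = np.sum((obs - cal_mean) ** 2)
re_stat = 1.0 - sse / ss_ref

r2 = np.corrcoef(obs, recon)[0, 1] ** 2

print(f"RE  = {re_stat:.2f}")  # high: the mean shift is captured
print(f"r^2 = {r2:.2f}")       # near zero: the wiggles are not
```

With these invented numbers the RE comes out strongly positive while the verification r2 is negligible, so an audience shown only the RE would draw a very different conclusion from one shown both.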

Briffa published in 2000 a long discussion of what kind of trends tree rings are showing. In that article he presents a graph (which is actually corrected on a separate sheet) of 400 data sets from the northern hemisphere. It represents the means of all material from 1400-1990 (or -1992) without advanced “maths”. In that graph there is NO SIGN of a warming trend in the latter half of the 20th century; rather, we can see a cooling in the most recent 50 years. This graph should be presented here for discussion (if of interest, I can scan it). These data are usually truncated at 1980 by IPCC (or even earlier). In his conclusions Briffa does not give any particular answers, but he speaks about an unusually warm 20th century. If we look at the graph we can only see the well-known warm peak of the 1930s, which IPCC has attributed to solar effects. The mean for 1900-1990 looks lower than the mean for 1400-1990. If we compensate for the solar effect of the 1930s, the actual mean value is even lower.

Some of the odd growth rates noted on this site might also depend on precipitation, but as far as I know the precipitation effect is to some degree known for the particular tree species in question. In Finland, spruce indicates precipitation more than temperature, but for pine it is mainly temperature. The Finnish pine series behave much like the northern hemisphere mean presented by Briffa.

It is obvious that there are some unknown factors influencing tree ring growth. Most of the claimed upward trends look like hocus-pocus manipulation of the basic data. Steve’s and Ross’s findings of invalid statistics in Mann’s works are very logical.

The most dramatic explanation could be that the surface temperatures are exaggerated in recent decades. I wouldn’t completely exclude this possibility.

Please correct me if I’m wrong (and forgive me if this issue has been raised before), but this graph appears to show dissimilar data grafted together at the hockey stick ankle.

As I understand it, the MBH reconstruction is complete up to 1980 – others, e.g. Steve Mc, will know if this is the case. There may be some confusion, though, because the surface temperature record is invariably superimposed on the reconstruction.

This is the Central England Temperature record – the longest measured record in existence. This is only one relatively small area, of course, but notice the big changes that occur around 1700. Recent years are warmer than in the past, but there is nothing particularly remarkable happening in the early 1900s.

Also note that the vertical scale is in WHOLE DEGREES, so the 10-year running mean increased 2 degrees (and ranged more than 3 degrees) between the 1690s and 1730s. Obviously this may be due to local conditions, but there are a multitude of studies which indicate there was a widespread Little Ice Age, including this one

Published temperature anomalies are all fractions of a “whole degree”. A mining example to explain: assume a sampling operation that assayed samples for gold to a level of 1 ppm Au, with variation from 0 to 200 ppm (0 being no gold in a sample).

A statistical analysis of the data is performed, to enormous accuracy, and gold anomalies of 0.1 ppm are created.

This statistical analysis can be rejected on the simple basis that the anomalies are below the detection limit of the analytical technique. Furthermore, if one statistically produces these “subtle” anomalies, one cannot go into the field to test them scientifically, because they are below the threshold of analysis.

In a similar fashion, temperature anomalies of magnitude less than the instrumental resolution (especially historical) are anomalies of the same class: ones which are below the threshold of measurement, and are therefore specious and, usually, rejected.
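The assay analogy can be put in numbers (the figures below are purely illustrative, not real assay data): individual readings quantized at the 1 ppm detection limit cannot distinguish a 10.0 ppm background from a 10.1 ppm “anomaly”; only heavy statistical averaging of many noisy readings could claim to, which is exactly the move being questioned.

```python
# Sketch of the assay analogy with made-up numbers: readings are quantized
# to the 1 ppm detection limit, so a 0.1 ppm "anomaly" sits an order of
# magnitude below what any individual measurement can resolve.
import numpy as np

rng = np.random.default_rng(1)

def measure(true_ppm, n):
    """Noisy readings rounded to the instrument's 1 ppm resolution."""
    return np.round(true_ppm + 0.5 * rng.standard_normal(n))

background = measure(10.0, 20)  # true grade 10.0 ppm
anomalous = measure(10.1, 20)   # true grade 10.1 ppm

# Both sets of readings land on the same handful of whole-ppm values;
# no single field assay of this resolution can confirm the 0.1 ppm anomaly.
print(sorted(set(background.tolist())))
print(sorted(set(anomalous.tolist())))
```

Only by averaging a very large number of noisy readings could a sub-resolution shift be teased out statistically, and whether that is legitimate is precisely the dispute here.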

On this basis the whole GW issue could be rejected without further ado, but I am reluctant to do so because this severe data rejection also eliminates the MWP, at least according to the statistics used by the IPCC and its co-workers. This suggests that the analyses are picking up some signal in the noisy background which we do not yet understand, and which climate scientists stumbled onto by accident, lacking the necessary theoretical background to make scientific sense of the result.

On the other hand, these subtle trends might well be the imposition of preconceptions onto data collection and processing by biased methodologies. Put simply: if one is looking for ducks, and one’s kit is optimised for detecting ducks, then often phantom ducks are discovered; pathological science, in other words.

R2 – is this the coefficient of determination, or the square of the correlation coefficient? Just a picky thing, but I sense some terminological hassles in this thread. My texts (mainly Koch and Link, and Agterberg) use conventional terms.
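On the terminology question: for an in-sample OLS fit with an intercept the two definitions coincide, which is why they get used interchangeably, but in a verification period they come apart. A minimal sketch on invented data:

```python
# In-sample, for OLS with an intercept, the coefficient of determination
# 1 - SSE/SST equals the squared Pearson correlation, so the two terms are
# used interchangeably. Out of sample (verification) they differ:
# 1 - SSE/SST can even go negative, while a squared correlation cannot.
import numpy as np

rng = np.random.default_rng(2)
x = np.arange(30, dtype=float)
y = 2.0 * x + rng.standard_normal(30)

# In-sample OLS fit.
slope, intercept = np.polyfit(x, y, 1)
fitted = slope * x + intercept

sst = np.sum((y - y.mean()) ** 2)
cod = 1 - np.sum((y - fitted) ** 2) / sst       # coefficient of determination
r2 = np.corrcoef(y, fitted)[0, 1] ** 2          # squared correlation
print(cod, r2)  # equal (up to rounding) in-sample

# Verification-style case: predictions biased by a constant offset still
# correlate perfectly, but 1 - SSE/SST collapses.
pred = fitted + 50.0
cod_v = 1 - np.sum((y - pred) ** 2) / sst
r2_v = np.corrcoef(y, pred)[0, 1] ** 2
print(cod_v, r2_v)  # cod_v is strongly negative; r2_v stays near 1
```

So which definition is meant matters a great deal in a verification context, even though in-sample the distinction is invisible.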

Re 36. It’s an interesting record. Notice how a difference of only around 1C was enough for it to be labelled a little ice age. So you agree that changes of a degree, locally, ARE significant climate changes – right? Perhaps this century will become the little hot age?

I don’t dispute that temperatures were lower around the time called the LIA than now; I do think this lowering probably wasn’t as pronounced globally (or in the NH) as some claim, or at the same times in the same places, because that’s what the evidence I’ve seen suggests.

Michael (37), re the 1740s: do we know it didn’t cause changes? I’m not an expert on that decade; are you? More specifically, re temperature changes: in summer the temperature at my home can often vary by 10C over a day, and more over a week; species don’t become extinct because of this, do they? It’s prolonged changes that change things for species, or massive sudden ones; all the changes have to be outside the envelope within which the species operates for them to be harmed. So *average* temperature changes over differing timescales reflect different kinds of changes. The longer the timescale, the more profound the changes going on, right?

The point is that MBH displays none of the variability seen in the CET. It shows a few gentle fluctuations between 1000 and 1900 (including the LIA), then a sudden jump around 1900 (or possibly 1902).

I’m convinced the reconstruction is unable to replicate past climate variability (a conclusion also reached by Hans von Storch, incidentally), probably because of the unsuitability of tree-ring proxies.

I also believe the sudden 1900-ish jump is almost certainly an artifact of the MBH methodology in combination with the use of a certain data series (as revealed by Steve and Ross).

So let’s try a little thought experiment, Peter. A drop of a degree or so in temperature results in a little ice age. Well, what is the average temperature difference between different earth surface areas? I think if you look you’ll see it’s measured in tens of degrees. So how likely is it that a degree or two more overall will be a big problem for organisms? A typical organism will be found most abundantly in an area close to its ideal average temperature. So the upper limit in those areas will likely be quite a number of degrees higher. Further, while it might have to retreat from parts of its range which were formerly OK but become too hot, it will also expand into areas which were formerly too cold, and will become more common in areas which were formerly suboptimal.

And even in areas where an organism becomes patchy from the generally warmer temperatures, this very patchiness will cause a break-out from the Hardy-Weinberg equilibrium and allow genetic selection to favor genes with a higher temperature “setting”. This would restore the fitness of the organism in at least some areas which might otherwise be lost.

Re 40. John, look, for the nth time here, I’ve said MBH isn’t the be-all and end-all of it. It’s nearly 6 years old; time passes, other recons are made. Do you seriously think MBH is the last word, or that anyone claims it is? When the TAR was written it was up to date; in the next report it won’t be. That’s OK, things move on; few now think Lamb’s recon was right, but that doesn’t mean its author was pilloried and accused of scientific fraud. However, amongst all this Mann hunt, what doesn’t disappear is AGW and the evidence for continuing warming, nor does any evidence emerge that the globally averaged temperature has been as warm as it is now (with AGW just really getting going) for the last millennium or two. Why don’t you put together your recon of the last 1000 years (think, you could be the first AGW sceptic so to do, destruction rather than construction normally being the watchword in sceptic circles these days…)?

Would you expect the CET to be the same as the NH temps? I wouldn’t. We’re next to an ocean for a start; places within continents *must* see different climate variations.

Michael, “prolonged” depends upon the time versus the magnitude. And no, I don’t think, on a time scale of 10 years, the warming of the 1740s in the *CET* was massive, but it was fairly sudden. Quite interesting really; not sure anyone knows why it happened.

Re: #43
Plants can move much faster than you think, as long as you realize that, except for some large trees and shrubs, they move by spreading their seed far and wide. Most plants and animals have generation times much shorter than humans’. Most plants in the temperate regions are annuals or biennials, which means that in just a decade or two you can have 10 or more generations. A lot of plants also maintain a presence for long periods of time via underground rhizomes, etc., but at the edges, or with small populations, they can spread amazingly fast as long as there is ground of the proper sort available.

There may be a few plants which have boxed themselves in, in terms of location vs. nearest suitable new habitat, but not many. And even where they have, humans tend to collect specimens of plants and grow them far from their native habitat. There was a paper recently about a bunch of plants which would likely go extinct, but most of them were flowering cactus-like plants in South Africa, exactly the plants thousands of plant lovers in the US, Europe and many other places would love to have, either as house plants or in their gardens. Here in the desert SW of the US, for instance, such plants would likely go native in many areas. Saving plants which are at risk is actually a lot easier than saving animals.

Because it’s a mostly futile exercise. I don’t think we have the data out there to do such a thing with the necessary accuracy on a global scale to support the conclusions we’d like. I also don’t think the multi-proxy studies have enough spatial coverage to be anything more than educated guesses. And some, or many, of the proxies themselves have issues which can’t be addressed, either.

Even if you were to assume there is enough spatial coverage to estimate average global temps, that the proxies are accurate, and that you process the data properly, you get statistically unsupported results.

“Why don’t you put together your recon of the last 1000 years (think, you could be the first AGW sceptic so to do, destruction rather than construction normally being the watchword in sceptic circles these days…)?”

This is a familiar argument (right, Dave D?). I think most sceptics would agree that there is insufficient data to collate into a single reconstruction. Soon and Baliunas put together some disparate information that was breezily dismissed by Mann et al.

As I posted somewhere recently, it’s like a fortune teller who’s been exposed as a fake asking the exposer to foretell the future or shut up. Actually, come to think of it, this is closer to a simple statement of what’s going on than an analogy.

Re 46 and 47. Jeff, Michael, where is your sense of inquiry? Do you think Columbus would have got to the Americas if he’d listened to the people who said it couldn’t be done? Of course we can find out about the past; if geologists can do it, so can climatologists (humm, perhaps you don’t favour geology either?). Perhaps you just lack ‘can do’ spirit😉.

In spite of the “can do” spirit of perhaps thousands of people, no one has yet invented a perpetual motion machine. The “can do” spirit cannot overcome that which is impossible. Peter’s Columbus example is a logical fallacy.

RE # 47 “I think most skeptics would agree that there is insufficient data to collate into a single reconstruction.”

I would argue that the data is insufficient, and reconstruction impossible, as it fails the requirements of Shannon’s sampling theorem.

Greg, so many contradictory answers. Some sceptics, like Hans (hi Hans), say it can be done, some, you, it can’t. Oh well.

I’m not sure that climate reconstructions are trying to overturn the laws of physics – not a good parallel imo. Otoh I do think it’s more like you saying you dismiss recons of the geological past. After all, there are only relatively few fossils and big gaps in the record, and it was, obviously, a long time ago – “how can geological recons be right?” you might ask. Perhaps you think the geological record IS open to other interpretations? Perhaps it’s not possible to interpret that past from the rocks, IYO?

Okay – as a geologist I have studied Ted Bryant’s (Wollongong University) evidence for a large tsunami on the eastern coast of Australia, assumed to be from a bolide impact between NZ and Australia circa the early 15th century. The Korean Choson Annals mention severe climatic conditions at the same time. Gavin Menzies, in his 1421, adds further data on collapsing cultures around the Pacific at that time, too, and the collapse of the Ming Dynasty.

Geologically, I have a strong suspicion that the earth interacted with a meteorite swarm and was careened slightly to a new rotational axis by this cosmic encounter. I suspect Greenland was then in more temperate latitudes and is now closer to the North Pole. I am not invoking plate tectonics or crustal shifts.

This is a highly speculative idea, of course, and upsets a few paradigms, but as a student of the electric universe, or plasma universe, as described by Lerner, Peratt, Alfven and others, gravity is an irrelevancy in this issue.

The main problem is that most geologists are totally unconcerned with the last 1000 years, because we reckon nothing happened globally geologically. It is not even Quaternary studies!

And if you think I might be a geological catastrophist: correct, and religion has nothing to do with it. I am just picking up from the old geologists of the late 18th and early 19th centuries, who were deflected from the science by that Whig lawyer Charles Lyell. Strange how the English manage to affect science: then geology, today the Hadley Centre and climate science.

I add that the evidence is there in the historical record, if only it were read in the first place.

So the cause of the LIA which truncated the MWP was most likely a cosmic interaction during the 15th Century. This possibility would not have been factored into any climate model.

Peter, in theory it can be done; in practice you run into the difficulties of noisy samples in the wrong places. According to Nyquist you’ll need at least one GOOD sample for every thousand kilometers; this works well in Europe and the US but fails miserably elsewhere.
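Hans’s point about sampling density is just the spatial version of the Nyquist criterion, and it is easy to see what goes wrong when the network is too sparse. Here is a one-dimensional toy sketch (the 1000 km and 1250 km figures are made up for illustration and are not taken from any real proxy network): sparse samples of a short-wavelength field are fit exactly by a much longer wave, so the sparse network cannot tell the two apart.

```python
# 1-D sketch of spatial aliasing: a field with 1000 km structure, sampled
# every 1250 km, is indistinguishable at the sample points from a 5000 km
# wave (illustrative numbers only).
import numpy as np

wavelength = 1000.0  # km, true spatial scale of the field

def field(x):
    return np.sin(2 * np.pi * x / wavelength)

def alias(x):
    # the long-wavelength impostor: 5000 km
    return np.sin(2 * np.pi * x / 5000.0)

dense = np.arange(0.0, 20000.0, 100.0)    # adequately sampled grid
sparse = np.arange(0.0, 20000.0, 1250.0)  # undersampled "network"
# Nyquist would demand a sample at least every wavelength/2 = 500 km.

print(np.max(np.abs(field(sparse) - alias(sparse))))  # ~0: identical at the samples
print(np.max(np.abs(field(dense) - alias(dense))))    # large: very different fields
```

The undersampled network reports values consistent with a smooth, large-scale pattern even though the underlying field varies on a much shorter scale; the same logic applies in the time domain for coarsely resolved proxies.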

RE # 52 “Greg, so many contradictory answers. Some sceptics, like Hans (hi Hans), say it can be done, some, you, it can’t.”

Ummm … no. Hans’s conditions for being able to do a reconstruction are hypothetical. Hans states, “It starts with proper data sampling”. Data sampling requirements are defined by the sampling theorem. Tree rings fail to satisfy the theorem (the data points are undersampled and aliased in the time domain). While I was writing this, Hans beat me to the spatial-domain aliasing problem. In one case we’re talking about amplitude vs. time, in the other amplitude vs. distance, the math being essentially the same (spatial being a bit more complex, since it involves two dimensions).

“I’m not sure that climate reconstructions are trying to overturn the laws of physics – not a good parallel imo.”

And what parallel are you referring to? This looks like another one of your strawman arguments.

“Otoh I do think it’s more like you saying you dismiss recons of the geological past.”

I made no comment on those reconstructions. You are changing the subject and building a strawman.

“Perhaps you think the geological record IS open to other interpretations?”

Perhaps I have not looked at those reconstructions in enough detail to develop an informed position. Peter, please try to stay focused on the subject we are addressing.

No, my birth name, David Earl Dardinger is what I use online and have always used either as David Dardinger, Dave Dardinger or DED for short. You could google for my name if you want further information.

I try to use the “tit for 2 tats” style of reaction to ad homs, but since few people are willing to insult me only once, after a bit people have difficulty telling who started it. Of course this site has a bit different focus than a more general site, so I tend to throw back what’s thrown at ‘skeptics’ in general, and Steve / John A in particular, rather than just at me. Likewise you, being clearly identified with the Mann camp, are likely to get blow-back from what they do in addition to your own interpersonal transgressions.

Failure to disclose the adverse (or at least adverse-looking) Rsq is a big ethical lapse. As it is now, the best Mann can do is say that he withheld it to “prevent confusion”. But that sort of “confusion” (debate over potential limitations) is exactly what science dictates. Share the potentially adverse results and let the debate go forward. Don’t hide things.

One Trackback

[…] criticized Mann for not publishing r2 verification scores which were practically 0 (very bad). Mann calculated these scores, but he never published the adverse results. When a committee formed by the United […]