BEST, Menne Slices

A couple of years ago, Matthew Menne of NOAA applied a form of changepoint algorithm in USHCN v2. While changepoint methods do exist in conventional statistics, Menne’s use of these methods to introduce thousands of breaks in noisy and somewhat contaminated data was novel. BEST’s slicing methodology, in effect, implements a variation of Menne’s methodology on the larger GHCN data set. (It also introduces a curious reweighting scheme discussed by Jeff Id here.) With a little reflection, I think that it can be seen that the mathematics of Mennian methods will necessarily slightly increase the warming trend in temperature reconstructions from surface data, an effect previously seen with USHCN and now seen with GHCN.

Mennian changepoint methods break series into segments at breakpoints and the segments are then introduced into the averaging machine rather than the longer series. BEST describe their slicing as follows:

Our method has two components: 1) Break time series into independent fragments at times when there is evidence of abrupt discontinuities, and 2) Adjust the weights within the fitting equations to account for differences in reliability.

The second aspect of this method was discussed by Jeff Id here. I limit myself in this post to the first aspect.
BEST’s variation of Mennian changepoint methods applied to GHCN resulted in thousands of slices:

This empirical technique results in approximately 1 cut for every 12.2 years of record, which is somewhat more than the changepoint occurrence rate of one every 15-20 years reported by Menne et al. 2009.

I invite readers to momentarily reflect upon the properties of slicing methodology.

If Mennian changepoint methods remove more negative steps than positive steps, it will increase the trend of the final temperature series (and vice versa). This is a simple and fundamental point about slicing methods that is not made in the articles, but is worth holding on to.

If there is an overall warming trend in the data (the mix between climatic and urbanization effects doesn’t seem to matter for this point), it seems highly plausible to me that the changepoint method will be more likely to pick up a negative step and miss a positive step. (I realize that the size of the effect would need to be established in the situations at hand. My guess is that a more artful mathematician than myself could prove the point from first principles, but the existence and direction of the effect seems self-evident.)

A couple of years ago, I observed that the introduction of Mennian methods to USHCN appeared to impact GISS US (resulting in warming relative to USHCN v1): the difference between the 1930s and the early 2000s increased by 0.3 deg C between 2007 and 2011, an increase that I postulated to arise from Mennian methodology, though I did not further analyse these methods at the time.

The fact that BEST is also running somewhat hotter than NOAA or CRU using the same GHCN data indicates to me that a similar phenomenon is at work here.

BEST do not directly reflect on this problem. They state that the introduction of unnecessary breakpoints “should be trend neutral”, though increasing uncertainty somewhat:

The addition of unnecessary breakpoints (i.e. adding breaks at time points which lack any real discontinuity), should be trend neutral in the fit as both halves of the record would then be expected to tend towards the same b value; however, unnecessary breakpoints can amplify noise and increase the resulting uncertainty in the record (discussed below).

This argument, as far as it goes, seems fair enough to me. But it doesn’t deal directly with a potential bias towards detecting negative breaks over positive breaks.
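For what it’s worth, BEST’s trend-neutrality claim is easy to illustrate on synthetic data. The sketch below is my own, not BEST’s code: it uses a common-slope, free-offset-per-segment least-squares fit as a stand-in for their fitting equations, and cuts a pure-noise series at a point with no real discontinuity. The slope estimate stays unbiased, but its variance roughly doubles:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(100, dtype=float)
slopes_whole, slopes_cut = [], []
for _ in range(2000):
    y = rng.normal(size=100)                     # pure noise: no trend, no break
    slopes_whole.append(np.polyfit(t, y, 1)[0])  # fit to the intact series
    # spurious break at t=50: one common slope, a free offset per half
    X = np.column_stack([t, (t < 50).astype(float), (t >= 50).astype(float)])
    slopes_cut.append(np.linalg.lstsq(X, y, rcond=None)[0][0])

print(np.mean(slopes_cut))                        # ≈ 0: trend neutral
print(np.std(slopes_cut) / np.std(slopes_whole))  # ≈ 2: but the slope is noisier
```

This matches the quoted claim as far as it goes: an unnecessary break does not bias the fitted trend, but it amplifies the noise in it.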

From the perspective proposed in this post, the result that Mennian slicing methods applied to GHCN data yield a slightly warmer trend than CRU and NOAA obtain from unsliced data should not be viewed as unexpected. Indeed, rather than classifying the result as unexpected, I think that it is better described as “trivial” (as this term is used in mathematics to denote something simple rather than unimportant).

We already knew the results from (more or less) averaging unsliced GHCN without allowance for urbanization and landscape changes. So the results from sliced data also without allowance for urbanization and landscape changes are unsurprising.

The issue is, as it always has been, the contribution, if any, of urbanization or landscape changes to the trend. Here BEST provide a large loophole, as they warn that their methods will not cope with “large scale systematic biases”:

however, we can’t rule out the possibility of large-scale systematic biases. Our reliability adjustment techniques can work well when one or a few records are noticeably inconsistent with their neighbors, but large scale biases affecting many stations could cause such comparative estimates to fail.

Unfortunately, “large scale systematic biases” from urbanization and landscape changes are precisely what’s at issue, and BEST’s slicing methodology does not directly bear on this problem. They purport to address this issue in a companion paper on urbanization, which I will discuss in a forthcoming post, along with the large discrepancy between BEST and satellite data, a point not touched on in the articles themselves.

65 Comments

BEST — “we can’t rule out the possibility of large-scale systematic biases. Our reliability adjustment techniques can work well when one or a few records are noticeably inconsistent with their neighbors, but large scale biases affecting many stations could cause such comparative estimates to fail.”

Indeed, Steve should get paid for this stuff. The fact he gives so much away for free puts him on the order of Tim Berners-Lee himself, who invented the Web that made blogs possible and never profited from it – what Stephen Fry rightly says is one of the great altruistic acts of our generation. But Fry may not recognise McIntyre in the same bracket, and he should.

He was on contract at CERN it’s true. Compared to what, say, a young Marc Andreessen made from his invention a few years later it may not have seemed much. And he’s been paid since at the W3C. Compared to what the founders of Google etc … But I delight in correction, as always 🙂

“If there is an overall warming trend in the data (the mix between climatic and urbanization effects doesn’t seem to matter for this point), it seems highly plausible to me that the changepoint method will be more likely to pick up a negative step and miss a positive step.”

Given that part of the purpose of the whole exercise is to determine if, and to what extent, there is an “overall warming trend in the data,” are you referring to an overall warming trend in the portion of the data that is not affected by changepoint, or does initial application of the changepoint method contribute to this overall warming trend (iow, is it self-reinforcing/circular)? Is there a programming decision point/parameters made up front in determining how to apply the changepoint method that itself relies on a preliminary assessment of whether there is an “overall warming trend”?

Wouldn’t a simple test be to run the whole BEST analysis without the slicing, or at least the “empirical” portion of it? I don’t know if they did this (if they did I missed it in the papers) but it seems like it wouldn’t be too hard for them. Time to download that matlab code and renew my $5 subscription…

To Eric’s question “Is there a programming decision point/parameters made up front in determining how to apply the changepoint method that itself relies on a preliminary assessment of whether there is an “overall warming trend”?”

From the paper:
“For the present paper, we follow NOAA in considering the neighborhood of each station and identifying the most highly correlated adjacent stations. A local reference series is then constructed by a weighted average of the neighboring stations. This is compared to the station’s records, and a breakpoint is introduced at places where there is an abrupt shift in mean larger than 4 standard deviations.”
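For readers who want to experiment, the quoted procedure can be sketched in a few lines. Everything below is my own illustrative reconstruction – the window length, the noise estimate from first differences, and the clustering of flags are assumptions, not BEST’s actual implementation:

```python
import numpy as np

def detect_breakpoints(station, neighbors, weights, window=60, nsigma=4.0):
    # Weighted average of the most correlated neighbors forms the
    # local reference series, as in the quoted procedure.
    reference = np.average(neighbors, axis=0, weights=weights)
    diff = station - reference
    # Noise level estimated from first differences, so that a single
    # genuine step inflates the estimate only slightly.
    sigma = np.std(np.diff(diff)) / np.sqrt(2.0)
    breaks = []
    for t in range(window, len(diff) - window):
        shift = diff[t:t + window].mean() - diff[t - window:t].mean()
        if abs(shift) > nsigma * sigma:
            breaks.append(t)   # flags cluster around the true break
    return breaks
```

On a synthetic station with a single 2-degree step against quiet neighbors, this flags a cluster of points around the true break; a production version would collapse each cluster to one breakpoint and iterate.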

The most obviously excluded systematic bias would seem to be related to urban sprawl…??

From the paper:
“For the present paper, we follow NOAA in considering the neighborhood of each station and identifying the most highly correlated adjacent stations. A local reference series is then constructed by a weighted average of the neighboring stations. This is compared to the station’s records, and a breakpoint is introduced at places where there is an abrupt shift in mean larger than 4 standard deviations.”

Without reading the paper, the question that goes begging is: are “corrected” stations used to check “uncorrected” stations? If so, then the order stations are evaluated may matter, and it may also introduce a bias – the first corrected station would bias its neighbours in the direction of its correction relative to its own correction. I seem to recall that NASA/NOAA has done this before…

One way to check for such biases would be to run the correction repeatedly on its own output – the output should, I would suggest, asymptotically approach its local average. Any other result should be a “dig here” marker.
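A toy version of that fixed-point check, with a simple row-stochastic weighting matrix standing in for the neighbor-averaging correction (my own construction, purely illustrative):

```python
import numpy as np

# One correction step: each station is replaced by a weighted average
# of itself and its neighbours (each row of W sums to 1).
W = np.array([[0.6, 0.2, 0.2],
              [0.2, 0.6, 0.2],
              [0.2, 0.2, 0.6]])
x = np.array([10.0, 12.0, 17.0])
for _ in range(200):
    x = W @ x            # iterate the correction on its own output
print(x)                 # all three stations converge to the local mean, 13.0
```

For an unbiased averaging correction the iteration converges to the local mean, as the commenter suggests; a directional bias in the correction step would show up as convergence to something else.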

Bruce
I’m not sure why you’re focussing on CRUTEM3. It’s not clear that BEST even had access to it, and it is far smaller than BEST. In recent times their data is dominated (in station numbers) by GSOD. Unlike GHCN, GSOD does not really try to be geographically representative, which explains the NH bias.

I’m pointing out that BEST correlates best with CRUTEM3 NH. That info led me to believe that BEST may have favored NH over SH and you helped by noting the ratio of NH to SH stations was 10 to 1 for the monthly anomaly I found most unbelievable – January 2007.

As far as I know, CRUTEM3 is not NH biased. I would suggest BEST is since it matches up with CRUTEM3 NH , not CRUTEM3 global.

I would infer if Mennian methods would favor stations that trend positive, then SH would be ignored. Or BEST doesn’t care where a station is.

Hemispheric oceanic and atmospheric circulation systems are largely independent from one another. So even if seasonal factors are totally removed (unlikely), sites in the same hemisphere will correlate better than those in different hemispheres.

If the dataset has more NH than SH records, then a method that weights by correlation will disadvantage the minority partner. The under-represented SH will find itself further downgraded.

If there is an overall warming trend in the data (the mix between climatic and urbanization effects doesn’t seem to matter for this point), it seems highly plausible to me that the changepoint method will be more likely to pick up a negative step and miss a positive step.

You seem to be assuming that their changepoint detection would be more likely to ignore warming changes; that it is biased in favor of excluding them on an a priori assumption that they are more likely to be real. If this is not the case, if the change detection is unbiased, then you would expect a larger total number of warming changes to be dropped (assuming they are a greater proportion of the overall changes).

Steve: I am postulating that their method is more likely to presume the positive steps are real; I am not “assuming” it. Evidence in favor of this surmise is that sliced data in the two cases discussed here runs hotter than unsliced data.

I have no idea why the effect wouldn’t bias the result or how you would correct for it.

The algorithm has to look in the mean of something for a sudden shift. If stuff is drifting up in trend, a down shift would appear bigger than a sudden upshift. It’s almost better to just let them cancel except that many sudden shifts are going to be caused by construction which will also probably bias high.

Actually, I found the argument that they couldn’t detect UHI in the data a referendum on the poor quality of the UHI method they used. It still probably isn’t a huge effect but we don’t really know how much. The non-detection though kinda proves that the method failed.

Let’s suppose that you have a station originally in a smallish city which increases in population and that the station moves in two discrete steps to the suburbs. Let’s suppose that there is a real urbanization effect and that the “natural” landscape is uniform. When the station moves to a more remote suburb, there will be a downward step change. E.g. the following:

The Menne algorithm removes the downward steps, but, in terms of estimating “natural” temperature, the unsliced series would be a better index than concatenating the sliced segments.
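This sawtooth effect is easy to reproduce numerically. In the sketch below (my own, with made-up numbers: a flat “natural” climate, a 0.03 deg/yr urban drift, and station moves every 20 years that reset the accumulated urban warming), a common-slope fit with a free offset per slice – a stand-in for what scalpel-and-suture methods effectively estimate – recovers the full urban drift, while ordinary least squares on the unsliced record does not:

```python
import numpy as np

# 60 years of monthly data: flat "natural" climate, a steady urban
# drift of 0.03 deg/yr, and station moves at years 20 and 40 that
# each reset the accumulated urban warming to zero (a sawtooth).
t = np.arange(720) / 12.0
drift = 0.03
temp = np.zeros_like(t)
for start in (0, 240, 480):
    temp[start:start + 240] = drift * (t[start:start + 240] - t[start])

# Unsliced: ordinary least squares over the whole record.
trend_unsliced = np.polyfit(t, temp, 1)[0]

# Sliced: one common slope plus a free offset per segment, which is
# effectively what a scalpel-and-offset fit estimates.
seg = np.repeat([0, 1, 2], 240)
X = np.column_stack([t] + [(seg == k).astype(float) for k in (0, 1, 2)])
trend_sliced = np.linalg.lstsq(X, temp, rcond=None)[0][0]

print(trend_unsliced, trend_sliced)   # ~0.003 vs 0.030 deg/yr
```

The unsliced series, whose long-term trend is near zero, is the better index of the “natural” temperature; slicing at the station moves converts the within-segment urban drift into a full 0.03 deg/yr trend.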

It’s probably a good description of the actual mechanism. Case studies in regions with long time series show that the bias is about 0.5 °C in the twentieth century. It is not possible to infer directly the value of perturbations by urbanization but at least it shows that the effect is important and not limited to big cities.

Thanks, I hadn’t considered it from that angle. Considering this, it doesn’t seem possible to correct any of the data for steps even when stations are moved. Perhaps data with steps should just be chucked out but that lends itself to a bias in step detection in the direction of trend.

I have a fundamental problem with the use of any scalpel and suture technique in the context of determining long term temperature trends. My objections are based upon Fourier analysis and information content. My argument is summarized in these bullet points.
1. The GW climate signal is extremely low frequency, less than a cycle per decade.
2. A fundamental theorem of Fourier analysis is that the frequency resolution is df = 1/(N*dt), where dt is the sample time and N*dt is the total length of the digitized signal.
3. The GW climate signal, therefore, is found in the very lowest frequencies which can only come from the longest time series.
4. Any scalpel technique destroys the lowest frequencies in the original data.
5. Any suture technique recreates a long term digital signal.
6. Sutured signals have in them very low frequency data, low frequencies which could NOT exist in the individual splices. Therefore the low frequencies, the most important stuff for the GW analysis, must be derived totally from the suture and the surgeon wielding it.
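Point 2 can be checked with a few lines of numerical code. For monthly data, a 12.2-year fragment (the mean slice length quoted above) cannot resolve any nonzero frequency below about 0.08 cycles/year, whereas a century-long record resolves down to 0.01 cycles/year – comfortably below a hypothetical 60-year cycle:

```python
import numpy as np

dt = 1.0 / 12.0                       # monthly sampling, in years
for years in (12.2, 100.0):
    n = int(round(years / dt))
    freqs = np.fft.rfftfreq(n, d=dt)  # available frequencies, cycles/year
    print(years, freqs[1])            # lowest nonzero frequency resolved
```

The lowest nonzero frequency in any fragment is 1/(N*dt): a 60-year cycle (about 0.017 cycles/year) simply does not exist in the spectrum of a 12.2-year slice.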

Perhaps it can be argued, demonstrated, and proved, that somehow the low frequencies were extracted, saved, and returned to the signal intact. Statements like the following from Muller (WSJ Eur 10/20/2011) make me believe that most people do not appreciate this problem.

Many of the records were short in duration, … statisticians developed a new analytical approach that let us incorporate fragments of records. By using data from virtually all the available stations, we avoided data-selection bias. Rather than try to correct for the discontinuities in the records, we simply sliced the records where the data cut off, thereby creating two records from one.

“Avoided data-selection bias” – and embraced high frequency selection bias and bias against low frequencies. There is no free lunch here. Look at what is happening in the Fourier domain. You are throwing away signal and keeping the noise. How can you possibly be improving the signal/noise ratio?

If the low frequency is important – and it sure is – why cut it out of the frequency spectrum at all? I feel the only way to evaluate the GW signal is to do analysis on completely uncut temperature records. Yes, there will be no adjustment for station movements. Yes, UHI will be guaranteed to exist in the signal, but it can be in there only once! The danger of the scalpel-suture process is that UHI can be applied fractionally multiple times, once from each tooth in the saw-tooth signal. Was UHI removed? Or was half a UHI signal applied six times in six splices from five station moves?

It is quite possible I have made a fundamental mistake in my thinking. If so, please cite chapter and verse where I missed the preservation of the original low frequencies that existed before the scalpel was applied. I’ve posted this argument at WUWT April 2, Climateetc Oct 19, and wmbriggs Oct 22. Thank you for allowing me to attempt to rephrase the argument here.

Hmmm, there’s a very influential article in dendro reconstructions entitled the “segment length curse”, which laments the problems of creating a long term index from short segments.

I confess that I always worry about interpreting what we see in these records as manifestations of different “frequencies”. I know that the data can be factored this way mathematically, but I worry that people will reify the decomposition,

Thanks for the followup. I do not doubt that the long term reconstruction LOOKS credible. After all, it is just what a lot of people were expecting.

My point is that the process as I see it effectively destroys the data we seek. Evidence that it is authentically recreated in the suturing process is scant. There is a low frequency returned in the final product, but it is not original. It is counterfeit.

Based upon what must be happening to the data in the Fourier Domain, I am having the same reaction to the BEST process as most of you would have to a process that seems to violate the 2nd Law of Thermodynamics.

Their Fig. 4 effectively shows how sequentially adding the low frequency components approaches the long term character of the 2489 year sequence.

I bring these graphs to your attention only as a good illustration of how LOW FREQUENCY content in the data is essential to the study of multi-decade, nay multi-century, temperature trends.

To recap, BEST’s methodology of the suture is to trust the short term trends, and pay little attention to absolute temperature values. If their mean segment length is 12 years (as per P.Solar above, Nov. 4), then it is as if BEST took Lui’s Fig. 2 and cut away EVERYTHING LEFT of 0.08, leaving little but white noise scraps to suture back together for a century+ “signal”. BEST has some low frequency content in its final sutured product, but since all the original low frequency was destroyed by the scalpel, what low frequency reappears is highly suspect.

I compliment Lui et al. on a study that highlights Fourier analysis of the data. I cannot speak to the quality of the Lui 2011 paper because I have not and cannot validate HOW THEY retained their low frequency data from “tree ring to computer”. I do not know how they retained the DC shift and the DC trend prior to the Fourier analysis. I’m also a little puzzled how they got a 1324 year frequency component from a 2485 year time series, since the analysis yields only integer numbers of cycles over the processed window. Perhaps a variable window size was applied…

The Reference also has a Table 1 of other proxies and paleo-temperature records that show periodicities of between, “55-76 years.”

So, the pregnant question, and admittedly my hobby-horse: can/does the BEST methodology preserve a hypothetical 60 year frequency in data that is chopped up by the scalpel? I do not see how it can on a theoretical basis, and from P.Solar’s plots above, there seems little evidence that it does.

This is exactly what happens in reality, therefore GISS does not correct these discontinuities. These biases are about 0.5 °C for the twentieth century. I would make one reservation about the undisturbed curve. It seems difficult to know whether this curve is actually undisturbed. In my opinion the only way to do this is to use proxies. You can see such a comparison in this graph:

If I understand BEST’s algorithm correctly, renormalization after slicing should be expected to reduce long-term cycles and fortify long-term trends. Since all linear trends are perfectly correlated, it should be apparent that trending records have higher correlation than non-trending ones. Correlation weighting of stations clearly favors trends.

For what it’s worth, the BEST GST yearly anomaly series is highly (R^2 = .87) correlated (at zero lag) with a world-wide index of century-long records that incorporates capital cities with obvious UHI trends. It correlates rather poorly (R^2 = .30), however, with a similar index that excludes capital cities.

That’s a good observation. Perhaps more specifically all straight lines with a time shift will still have corr=1 whereas two 12m sinusoidals with a 3 mth delay but otherwise identical in form will have corr=0.
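Both halves of that observation are quickly verified numerically (a sketch with arbitrary illustrative series):

```python
import numpy as np

t = np.arange(120)                           # ten years of monthly data
line1 = 0.02 * t                             # two trending series:
line2 = 0.02 * (t - 36) + 5.0                # same slope, shifted in time/level
r_lines = np.corrcoef(line1, line2)[0, 1]    # exactly 1

annual1 = np.sin(2 * np.pi * t / 12)         # 12-month cycle
annual2 = np.sin(2 * np.pi * (t - 3) / 12)   # same cycle, 3-month lag
r_sines = np.corrcoef(annual1, annual2)[0, 1]
print(r_lines, r_sines)                      # 1.0 and ~0
```

Any two upward-sloping lines correlate perfectly regardless of shift, while identical annual cycles a quarter-period apart are uncorrelated – so correlation weighting rewards trend and punishes phase-shifted cyclical agreement.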

Very basic question: Why are weather stations treated equally? I don’t mean just siting issues or UHI; I’m concerned with land vs. sea. Just glancing at the previous article – Detroit Lakes – all that is apparent is noise going from -20C to +20C. Why is such data even taken seriously? Compare it to any sea buoy data (5K or so annual diff). Compare it to the station situated at the pole (no day-night time variation):
http://www.nerc-bas.ac.uk/icd/gjma/amundsen-scott.ann.trend.pdf
Isn’t the absence of noise there remarkable?

I note with interest the increasing emphasis on actual (or imagined) abrupt changes in climate data records put forward by BEST. Especially intriguing is the language used to describe (albeit somewhat vaguely) how choices of segmentation of long records were made.

Having examined – over the last 16 years – large amounts of data (time series) for signs of abrupt changes I have no difficulty at all in accepting BEST’s hypothesis that climate data appear to be suffused with abrupt changes. The problem is how to provide a convincing case demonstrating inferential statistics that will be accepted as being “significant” by the mainstream climate community.

In many cases abrupt shifts are so obvious that I wonder why traditional climatologists appear not to have noticed them. The only one that I’m aware of that is widely accepted as being real is the PDO event in 1977. Can anyone point to other accepted step changes, and if so, where are they to be found?

I don’t see why the large-scale systematic urban bias issue isn’t best addressed by an estimate in the style of McKitrick – looking for residual correlation between regional economic activity and regional temperature anomaly – even for those who object to the specific implementation in that paper. Messing around with breakpoints and regional correlations between stations seems like a beside-the-point morass.

Re: srp (Oct 31 18:35), we read that 30% of stations show a cooling trend. Even without recourse to any station metadata, it ought to be possible to investigate the importance of UHI via the spatial structure of the cooling stations. If they are spatially correlated, this suggests regional climate trends. If approaching randomly distributed, that would be interpretable as UHI.

One would need a Monte Carlo-style analysis to judge the significance of this, though it would not be easy – I guess would require generated data as well as randomization of existing.

Sometimes a statistical analysis seems disconnected from the climate data. BEST quotes thus “For the present paper, we follow NOAA in considering the neighborhood of each station and identifying the most highly correlated adjacent stations. A local reference series is then constructed by a weighted average of the neighboring stations. This is compared to the station’s records, and a breakpoint is introduced at places where there is an abrupt shift in mean larger than 4 standard deviations.”

In the climate context, what criteria are used for “most highly correlated” – something meaningful like R>+0.8? How are “neighboring stations” defined – as closer than something like 5 km? Is R calculated on daily, monthly or annual data, smoothed or not, homogeneous or heterogeneous, Tmean, Tmax or Tmin? …..

The following trivial essay is purposely naive, but it needs consideration before one can have confidence in the techniques under discussion. Especially, what significance can be placed on R for periods between break points of 12.2 years or shorter? (Sorry, my web hyperlink is shot today, will have to paste into address bar.)
http://www.geoffstuff.com/GHS%20on%20chasing%20R%20-%20extended.pdf

I’ve noted in posts and in discussions with friends that the satellite data don’t agree with any of these land based records, and that the issue still needs to be addressed. So I await Steve’s thoughts.

And, parenthetically, I agree with srp’s comment of 9:36 PM — I have also made this point in discussions with friends:

“I don’t see why the large-scale systematic urban bias issue isn’t best addressed by an estimate in the style of McKitrick – looking for residual correlation between regional economic activity and regional temperature anomaly – even for those who object to the specific implementation in that paper. Messing around with breakpoints and regional correlations between stations seems like a beside-the-point morass.”

What’s wrong with the following
1) Apply Mennian method to data and get result R1
2) Multiply data by -1 and apply Mennian method to that and get result M1.
3) Multiply M1 by -1 to get result R2
4) Calculate (R1+R2)/2

What difference in result does one get vs R1? How significant is it vs R1?
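As a sketch of that four-step diagnostic: wrap any homogenization routine in a sign-symmetry test. The `homogenize` argument below is a placeholder, and the clipping routine is a deliberately asymmetric toy used only to show the test firing – neither is the Mennian method itself:

```python
import numpy as np

def sign_symmetry_test(homogenize, data):
    # A sign-symmetric method satisfies homogenize(-x) == -homogenize(x),
    # so R1 and R2 coincide and the average changes nothing.
    r1 = homogenize(data)            # step 1
    r2 = -homogenize(-data)          # steps 2 and 3
    return r1, r2, (r1 + r2) / 2.0   # step 4

# Toy example: a routine that clips only *downward* excursions
# treats the two signs unequally, and the test exposes it.
def asymmetric(x):
    return np.maximum(x, -1.0)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
r1, r2, avg = sign_symmetry_test(asymmetric, x)
print(np.allclose(r1, r2))   # False: direction-dependent behaviour detected
```

Whether R1 and R2 differ significantly for the actual slicing algorithm is exactly the empirical question the commenter poses.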

If biases in the process are a result of the methodology giving undue influence to specific temperature series or series with particular traits, then R1 and R2 will basically be the same and nothing is gained in applying your procedure. In particular, if you calculate the correlation between two series, then multiply each by -1 and recalculate the correlation, your two answers will be identical. Adjustments based on correlations will then affect R1 and R2 similarly.

The type of adjustment that could produce unequal R1 and R2 would be one where the changes in the data are made in a specific direction, e.g. those based on the metadata of the individual stations.

One obvious diagnostic that BEST did not provide – presumably because of their undue haste in the answer – is a histogram of the thousands of steps eliminated in the slicing. If they provide this, I predict that the histogram will show the bias that I postulate. (This is pretty much guaranteed since the BEST trend is greater than CRU trend on GHCN data.)
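Producing that histogram is cheap once the break times are known. A minimal sketch (the `break_steps` helper and the toy series are mine, purely illustrative):

```python
import numpy as np

def break_steps(series, breaks, window=24):
    """Mean shift at each break: after-window minus before-window."""
    return np.array([series[b:b + window].mean()
                     - series[max(0, b - window):b].mean() for b in breaks])

# Toy series with two downward moves; a real run would use every
# slice point the algorithm found in GHCN.
series = np.concatenate([np.zeros(100), np.full(100, -1.5), np.full(100, -3.0)])
steps = break_steps(series, [100, 200])
counts, edges = np.histogram(steps, bins=np.arange(-4.0, 4.5, 0.5))
# A histogram skewed toward negative steps would confirm the postulated bias.
```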

SM,
One of the things which puzzles me here is that if Mennian slicing is introducing a bias in step removal associated with direction of trend, then we would expect to see a bias towards removal of positive steps over a period when trends were predominantly negative. Yet BEST shows a flattish temperature profile during the 1950 – 1976 period when other land-only profiles show a distinct negative trend. This is the opposite from what I would expect.

If the test triggers off the magnitude of any abrupt change it would be more likely to trigger on positive changes during a positive trend. Hence, in removing large +ve steps during warming and -ve steps during cooling it would tend to attenuate variation.

This indeed seems to be what is happening in BEST. One of the first things that struck me on seeing their graph was the lack of variation that you also comment on.

This does not seem to be the same as is shown in Steve’s figure 1 of GISS adjustments, where the new adjustment plays down early 20th c. warming and boosts late 20th c.

It’s a shame Steve did not do a deeper audit on that change when he noticed it. It looks like yet another “adjustment” tailor-made to adapt the data to the AGW hypothesis.

Anyway , thanks to Steve for posting that graph again. It is a very telling example of adjustments.

>>
This empirical technique results in approximately 1 cut for every 12.2 years of record,
>>

which instantly suggests to me that a primary factor in the trigger is the solar cycle. Since this is the dominant variation this _should_ be expected.

Since solar variations are not symmetrical, I would not expect the result to be symmetrical. How do they deal with that? ….

>>
The addition of unnecessary breakpoints (i.e. adding breaks at time points which lack any real discontinuity), should be trend neutral in the fit as both halves of the record
>>

“Should be”? WTF? These world class scientists and statisticians did not even _look_ to see what the effects of their arbitrary, ad hoc methods were.

Even if this was an established method, I would expect them to have tested what the effects were and whether they were reasonable as part of due diligence.

Still, I’m sure all this will get picked up by the world class reviewers that the journal has sent the draft paper to. And that Muller’s strategy of global distribution of the draft will ensure it gets picked up before publication.

If there is an upward slope in the data, the anomaly method used by all other “pro” analyses will underestimate the trend. The least squares method used by BEST, tamino, Jeff/Roman, Nick Stokes, Chad and maybe others will produce the true (truer) trend contained within the data.

I think that your point about Lampasas, TX is more pertinent: does the algorithm identify times at which the physical circumstances of the location (or perhaps physical integrity of the thermometer) changed? More properly, enough of them accurately enough.

I think your conjecture that the algorithm is biased to enhance or suppress the warming effect depends on whether the mean size of the upward jumps is the same as the mean size of the downward jumps (i.e., the mean absolute sizes of jumps) and whether their standard deviations are equal. If the upward jumps are greater in number than the downward jumps, but they have equal means and standard deviations, then I don’t see how the algorithm introduces a bias.

Another look at figure 1 and I note that the 20th century portion is not far off a noisy parabola centred on 1940. That is the same as a linear increase in dT/dt. That in turn is the result of the forcing produced by an exponentially increasing level of CO2 starting around 1940.

This “adjustment” is AGW in flesh and blood.

This is exactly what comments like “the fingerprint of human effects on climate” refer to.

Well it certainly resembles a human climate fingerprint. The trouble is, someone has been interfering with the crime scene!

Now if your model does not fit the data , there are two things you can do: change your model or change your data to fit the model.

Maybe I should explain some background to the above statement. It’s probably a lot less obvious than it seems to me having studied it. Apologies.

1.
Radiative forcings cause a _rate of change_ of temperature, not a simple temperature change. Many people do not seem to realise this.

Temperature is a measure of heat energy; a change in temperature is a change in energy and is measured in energy units, joules or whatever.

“Forcings” are power terms (W/m2 etc); a watt is a joule/second. It’s a rate of change, not a simple change in energy.

2.
To a rough approximation CO2 levels have been rising exponentially ( a*exp(b*t)+c ). However, the spectral absorption at a particular wavelength increases, not linearly, but with the log of the gas concentration. This in effect undoes the exponential and leaves a linear increase in “greenhouse” effect ( ln(a) + b*t ).

3.
A linear increase in rate of change of temp when integrated over time gives a quadratic increase in temperature. ( ln(a)*t + b/2 * t**2 ). In fact the linear term seems to be fairly small and the parabolic term dominates.

4.
If industrial pollution is deemed to have really kicked in after WWII the low point of the parabola would be around that point or earlier.
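The chain in points 1-4 can be checked numerically. The growth rate and the 5.35 W/m2 log-forcing coefficient below are illustrative numbers only, and the “temperature” is in arbitrary units since no heat capacity is applied – the point is the shape, not the magnitude:

```python
import numpy as np

t = np.linspace(0.0, 70.0, 71)           # years since ~1940
co2 = 310.0 * np.exp(0.005 * t)          # point 2: roughly exponential CO2
forcing = 5.35 * np.log(co2 / co2[0])    # log of an exponential -> linear in t
# Point 3: if dT/dt is proportional to forcing, temperature is its time
# integral, here accumulated with the trapezoid rule -> a parabola.
temp = np.concatenate([[0.0], np.cumsum((forcing[1:] + forcing[:-1]) / 2.0)])
coeffs = np.polyfit(t, temp, 2)
print(coeffs)   # quadratic term dominates; linear and constant terms ~0
```

The fitted quadratic coefficient matches the analytic value 5.35*b/2 for forcing 5.35*b*t, confirming that an exponential concentration plus log absorption plus time integration yields a parabola centred near the start of the growth.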

So what Steve’s figure 1 shows is that, by accident or by design, the new adjustments introduced in 2007 by the Menne method cause an additional signal which is almost EXACTLY like the “fingerprint” of a CO2 induced warming. And the magnitude of 0.3C is very significant over a 50y period.

Now, the pretentiously named BEST project is using similar methods but has not done (or at least has not made public) any evaluation of the effects of this technique on their data.

It’s a recent paper by Menne looking at how the PHA responds to various synthetic inhomogeneities. The interesting aspect is that it’s blinded, with a third party knowing the “real” results and comparing the Menne and Williams model outputs to them.

That’s an interesting presentation. It’s good to see this sort of verification being done. Although five years after the fact seems a bit late to start, better late than never.

How long before BEST actually evaluate what effect their methods are having on the data?

However, I do not see anything in the presentation’s application of their method to the real data that would correspond to the downward adjustments of the earlier 20th century shown in Steve’s figure.

It does not appear that this is the same algorithm being applied to USHCN as examined by Steve.

Now it could be that an unfortunate combination of T-obs, instrumentation and siting changes has conspired to produce a profile that is the antithesis of AGW, and their method has skilfully corrected this improbable conjunction of events.

While I can see the value of these statistical methods in identifying discontinuities that ought to be corrected, I don’t see how you can determine the proper correction unless you identify the underlying cause of the discontinuity. For example, if a discontinuity is caused by moving a station to a different elevation, then the appropriate correction is to adjust the entire history before or after the move by the amount of the discontinuity.

However, if the discontinuity was caused by an old Liquid-in-Glass thermometer (which had experienced zero creep) being replaced with a new thermometer, then the appropriate correction would be to assume that the new thermometer was correct and the old thermometer was correct when installed, and to scale the old thermometer readings down proportionally over the life of the old thermometer. If this type of discontinuity were treated the same as a station elevation change, it would introduce false warming by the amount of the discontinuity.
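The arithmetic of the two cases can be sketched with made-up numbers; this illustrates the point only, and is not any actual adjustment code:

```python
import numpy as np

# Made-up numbers: true temperature is flat at 0 C. An old thermometer
# drifts upward (creep) by 0.5 C over its life, then is replaced at
# index brk by an accurate one, leaving an abrupt drop in the raw record.
n, brk, drift = 100, 60, 0.5
true_temp = np.zeros(n)
raw = true_temp.copy()
raw[:brk] += drift * np.arange(brk) / brk   # creeping old instrument

# Appropriate fix: scale the old readings back down proportionally over
# the old thermometer's life (correct at installation, off by `drift`
# at replacement). This recovers the true series exactly.
good = raw.copy()
good[:brk] -= drift * np.arange(brk) / brk

# Inappropriate fix: treat it like a station move and shift the entire
# early segment down by the step size. The early record becomes too
# cold, manufacturing a spurious warming of `drift` across the record.
bad = raw.copy()
bad[:brk] -= drift

print(np.allclose(good, true_temp))   # True
print(bad[0])                         # -0.5: early record falsely cooled
```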

That’s a very valid argument, Eric. It is similar to another problem that occurred to me earlier: how would such an automated method deal with a volcanic event?

An abrupt downward change would likely be interpreted as an anomaly (in the true sense of the word) and “corrected”. The gradual recovery over the following decade would not trigger a correction in the other direction.

Unless this is studied and accounted for volcanoes are likely to result in spurious warming “corrections”.

The effects of all these methods need full auditing. The authors should be doing this as a matter of course before even writing their draft papers. This is not an option to be looked into 5 years later as “further research”.

Any net positive or negative correction needs to be identified. If it can be quantifiably attributed to a known cause, fine.

All corrections should be logged with time, magnitude and location, and the resulting data studied. If there are clusters of corrections in 1963, for example, you have issues that need correcting before publication.
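That audit step is cheap to implement. In this hypothetical sketch the log entries are entirely made up; the point is only that a year-by-year tally makes a cluster jump out:

```python
from collections import Counter

# Hypothetical log: each correction recorded as (year, magnitude in C,
# station id). These entries are invented purely for illustration.
corrections = [
    (1962, -0.1, "A"), (1963, -0.4, "A"), (1963, -0.5, "B"),
    (1963, -0.3, "C"), (1964, -0.2, "B"), (1971, 0.2, "C"),
]

# Count corrections per year and sum their net magnitude; a year like
# 1963 standing out in the table is the sort of cluster that should be
# investigated before publication.
per_year = Counter(year for year, _, _ in corrections)
net = Counter()
for year, mag, _ in corrections:
    net[year] += mag

for year in sorted(per_year):
    print(year, per_year[year], round(net[year], 2))
```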

The current position of B-est that their adjustments “should be trend neutral” is absolutely unsatisfactory. This needs to be studied, documented and the results published.

One way to check if BEST is interpreting volcanic eruptions as station anomalies to be “sliced” out of the record would be to check if there is a cluster of negative slices around June 1991, at the time of the Pinatubo eruption. (I’ll leave this to P Solar or others.) This wouldn’t be a problem if the supposed anomalies were checked against nearby stations, but that doesn’t seem to have been the case with the BEST slicing algorithm, at least not according to Steve’s description above.

Another factor that could cause spurious anomalies of the type Steve discusses in his 10/31 324PM comment above is repainting of the old Stephenson Screens/Cotton Region Shelters. As Anthony noted when Surface Stations was just getting started, these screens were supposed to be repainted periodically to maintain constant long-run whiteness. However, if they darken for a few years and then suddenly become white again, the BEST algorithm might detect an anomaly and thereby eliminate the correction!

As Steve points out, it would be very useful to see an analysis of the BEST slice points, both by size and by time.

[…] 1 – Chopping of data is excessive. They detect steps in the data, chop the series at the steps and reassemble them. These steps wouldn’t be so problematic if we weren’t worrying about detecting hundredths of a degree of temperature change per year. Considering that a balanced elimination of up and down steps in any algorithm I know of would always detect more steps in the opposite direction of trend, it seems impossible that they haven’t added an additional amount of trend to the result through these methods. Steve McIntyre discusses this here. […]

[…] stations to better-quality rural stations through biased detection of changepoints. In a comment on the Berkeley study, which used a similar method, I noted their caveat that the methodology had […]