Friday, March 17, 2017

A few days ago I posted an extensive ANOVA-type analysis of the successive reduction of variance as the spatial behaviour of global temperatures was more finely modelled. This is basically a follow-up to show how the temperature field can be partitioned into a smooth part with known reliable interpolation, and a hopefully small residue. Then the size of the residue puts a limit on the coverage uncertainty.

I wrote about coverage uncertainty in January. It's the uncertainty about what would happen if one could measure in different places, and is the main source of uncertainty in the monthly global indices. A different and useful way of seeing it is as the uncertainty that comes with interpolation. Sometimes you see sceptic articles decrying interpolation as "making up data". But it is the complement of sampling, which is how we measure. You can only measure anything at a finite number of places. You infer what happens elsewhere by interpolation; that can't be avoided. Just about everything we know about the physical world, or the economic one for that matter, is deduced from a finite number of samples.

The standard way of estimating coverage uncertainty was used by Brohan et al 2006. They took a global reanalysis and sampled at sets of places corresponding to possible station distributions. The variability of the resulting averages was the uncertainty estimate. The weakness is that the reanalysis may have different variability from that of the real world.
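The subsampling idea can be sketched in a few lines. This is only an illustration, not Brohan et al's procedure: the field is a synthetic stand-in for a reanalysis, the "stations" are random grid cells, and the function name `coverage_spread` and its parameters are my own inventions.

```python
import numpy as np

def coverage_spread(field, weights, n_stations, n_trials=200, seed=0):
    """Std dev of weighted means over random station subsets of a known field."""
    rng = np.random.default_rng(seed)
    field = np.asarray(field, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.empty(n_trials)
    for k in range(n_trials):
        # Pick a hypothetical station set: distinct grid cells, chosen at random.
        idx = rng.choice(field.size, size=n_stations, replace=False)
        means[k] = np.average(field[idx], weights=weights[idx])
    return means.std()

# Synthetic stand-in for a reanalysis field on a lat/lon grid (not real data).
lat = np.linspace(-87.5, 87.5, 36)
lon = np.linspace(0.0, 355.0, 72)
lat2d, lon2d = np.meshgrid(lat, lon, indexing="ij")
field = (np.cos(np.radians(lat2d)) * np.sin(np.radians(2 * lon2d))).ravel()
weights = np.cos(np.radians(lat2d)).ravel()  # area weight of each grid cell

sparse = coverage_spread(field, weights, n_stations=100)
dense = coverage_spread(field, weights, n_stations=1000)
print(sparse, dense)
```

The spread shrinks as the station set grows, and vanishes when every cell is sampled. The estimate is only as good as the variability of the field you sample from, which is the weakness noted above.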

I think analysis of residuals gives another way. If you have a temperature anomaly field T, you can try to separate it into a smoothed part s and a residual e:
T = s + e
If s is constructed in such a way that you expect much less uncertainty of interpolation than T, then the uncertainty has been transferred to e. That residual is more intractable to integrate, but you have an upper bound based on its amplitude, and that is an upper bound to coverage uncertainty.
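The arithmetic behind that bound is just linearity of the weighted average plus the triangle inequality: Ave T = Ave s + Ave e, and |Ave e| can be no larger than Ave |e|, which in turn is no larger than the maximum of |e|. A small numerical sketch, with made-up weights and made-up s and e (none of this is real temperature data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
w = rng.uniform(0.5, 1.5, n)            # hypothetical area weights
w /= w.sum()                            # normalise so w @ f is a weighted average
x = np.linspace(0.0, 1.0, n)
s = 0.7 + 0.3 * np.sin(2 * np.pi * x)   # stand-in smooth part
e = rng.normal(0.0, 0.2, n)             # stand-in residual
T = s + e

ave_T, ave_s, ave_e = w @ T, w @ s, w @ e
bound = w @ np.abs(e)                   # weighted average of |e|

print(ave_T, ave_s + ave_e)             # the two agree: the average is linear
print(abs(ave_e), bound, np.abs(e).max())
```

In practice cancellation usually makes |Ave e| much smaller than Ave |e|, so the amplitude bound is pessimistic.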

So below the jump, I'll show how I used a LOESS type smoothing for s. This replaces points by a low-order polynomial weighted regression, and the weighting is by a function decaying with distance, in my case exponentially, with characteristic distance r (ie exp(-|x|/r)). With r very high, one can be very sure of interpolation (of s), but the approximation will not be very good, so e will be large, and will contain a lot of "signal" - ie what you want to include in the average, which will then be inaccurate. If the distance is very small, the residual will be small too, but there will be a lot of noise still in s. I seek a compromise where s is smooth enough, and e is small enough. I'll show the result of various r values for recent months, focussing on Jan 2017. I'll also show WebGL plots of the smooths and residuals.
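Here is a minimal sketch of that kind of smoother, to fix ideas. It is not the code used for the results in this post: it works in one dimension rather than on the sphere, uses every point in each local fit, and the name `loess_smooth` is hypothetical. What it shares with the description above is the local polynomial regression and the exp(-|x|/r) weighting.

```python
import numpy as np

def loess_smooth(x, y, r, degree=2):
    """Local polynomial fit at each point, with weights exp(-|dx|/r)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    s = np.empty_like(y)
    for i, x0 in enumerate(x):
        d = x - x0                       # offsets from the evaluation point
        w = np.exp(-np.abs(d) / r)       # exponential distance weighting
        # Columns 1, d, d^2, ...; the constant coefficient is the fit at x0.
        A = np.vander(d, degree + 1, increasing=True)
        sw = np.sqrt(w)                  # weighted least squares via row scaling
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        s[i] = coef[0]
    return s

x = np.linspace(0.0, 10.0, 50)
T = 1.0 + 2.0 * x - 0.3 * x**2 + 0.1 * np.sin(5.0 * x)  # smooth curve plus fine wiggle
s = loess_smooth(x, T, r=2.0)
e = T - s                                # residual, so that T = s + e
print(e.var())
```

A quadratic local fit reproduces exactly quadratic data with zero residual, whatever r is; anything finer than the polynomial can follow within distance r ends up in e.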

I should add that the purpose here is not to get a more accurate integral by this partition. Some of the desired integrand is bound to end up in e. The purpose is to get a handle on the error.
I'll use quadratic LOESS; the reason is that for SST at least there are regions which are otherwise smooth but have curvature on the desired r range which the quadratic can fit. I am forming the integrals with weights from TempLS mesh; these weights depend on geometry only, not on the integrand.
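To illustrate what "weights that depend on geometry only" means, here is one common way to get node weights from a triangular mesh: give each node a third of the area of every triangle it belongs to. This is a planar sketch under my own assumptions (the function name `node_weights` is hypothetical, and the real mesh lives on the sphere), not a copy of the TempLS code.

```python
import numpy as np

def node_weights(points, triangles):
    """Each node gets one third of the area of every triangle it belongs to."""
    points = np.asarray(points, dtype=float)
    w = np.zeros(len(points))
    for a, b, c in triangles:
        u = points[b] - points[a]
        v = points[c] - points[a]
        area = 0.5 * abs(u[0] * v[1] - u[1] * v[0])  # planar triangle area
        for node in (a, b, c):
            w[node] += area / 3.0
    return w

# Unit square split into two triangles along the diagonal from node 0 to node 2.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangles = [(0, 1, 2), (0, 2, 3)]
w = node_weights(points, triangles)

f = np.array([p[0] for p in points])  # integrand f(x, y) = x at the nodes
print(w, w @ f)                       # weights sum to the area; integral of x is 0.5
```

The same weights w are then applied to T, s and e alike, which is why the averages of the parts add up to the average of the whole.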

I'll first show the results for January 2017 as a table:

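Jan 2017

| r km | Ave T | Ave s | Ave e | Var s | Var e |
|------|-------|-------|-------|-------|-------|
| 100 | 0.772 | 0.7623 | 0.0097 | 1.339 | 0.809 |
| 200 | 0.772 | 0.7544 | 0.0176 | 0.991 | 0.536 |
| 400 | 0.772 | 0.7387 | 0.0333 | 0.715 | 0.653 |
| 800 | 0.772 | 0.7114 | 0.0606 | 0.528 | 0.942 |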

r is the decay constant of the LOESS weighting; the averages are the respective averages of T and its partitioned components. I have also included the variances of s (not very meaningful, since s is not random) and of e. The table shows a minimum variance of e at 200 km, with 400 km not far behind. This level of smoothing seems to give the best fit. Below that, it just gets noisier; the LOESS makes more noise than it saves. However, the integral ("Ave e") continues to go down, since e becomes more random, so there is more cancellation. But the important thing is that in the mid-range it is quite small, and s approximates the integral well. This is encouraging, because 400 km is quite a good smoothing range - for most of the land plot, you would not expect s to have much interpolation error. This would not be true for some extremes, say mid-Africa.
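As a quick consistency check on the January 2017 figures, the numbers can be typed in and tested: Ave s and Ave e should add up to Ave T on every row, and the smallest Var e should be at 200 km. The array names are mine; the values are copied from the table.

```python
import numpy as np

r_km  = np.array([100, 200, 400, 800])
ave_T = np.array([0.772, 0.772, 0.772, 0.772])
ave_s = np.array([0.7623, 0.7544, 0.7387, 0.7114])
ave_e = np.array([0.0097, 0.0176, 0.0333, 0.0606])
var_e = np.array([0.809, 0.536, 0.653, 0.942])

# The partition T = s + e must carry over to the averages (to rounding).
assert np.allclose(ave_s + ave_e, ave_T, atol=5e-4)

best_r = r_km[np.argmin(var_e)]
print(best_r)  # 200
```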

I'll now show the averages over the 37 months from Jan 2014:

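Averages 2014-Jan 2017

| r km | Ave T | Ave s | Ave \|e\| | Var s | Var e |
|------|-------|-------|---------|-------|-------|
| 100 | NA | NA | 0.03 | 1.509 | 0.807 |
| 200 | NA | NA | 0.017 | 1.1 | 0.515 |
| 400 | NA | NA | 0.018 | 0.844 | 0.529 |
| 800 | NA | NA | 0.029 | 0.583 | 0.72 |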

I have rubbed out Ave T and Ave s, since they aren't meaningful for this analysis. But Ave |e| is the important figure, and says that, if you accept that LOESS smoothing will remove at least a large part of the interpolation error from T, and transfer it to e, then that error is small, of order 0.02°C. This is quite an interesting result, because error on a monthly reading is normally reckoned to be about 0.1°C. That includes other things, but still, on this basis it seems quite a bit lower.

I'll show the WebGL plot of smooths and residuals for Jan 2017, and then the full table of the 37 months.