Wednesday, June 29, 2011

This is a sequel to the previous post, where the list of data sets taken from the NOAA file was described. I have extended the time scale to 2000 years. This time I have used 41-year triangular smoothing, and I have experimented with including and omitting the instrumental temperatures.

This is the plot over two millennia, with HADCRUT3NH (with 11-year smoothing) in light grey in the background.

More below the jump...

And here is the same plot, but without HADCRUT3 or the other reconstructions with instrumental components.

Monday, June 27, 2011

This is the next in what may be a series of posts plotting multiple time series with animated GIFs, in the hope of greater visual clarity.

I found at NOAA a large table of proxy-based temperature reconstruction data. The background is described in this Wiki article. For the moment, I've plotted just the Northern Hemisphere reconstructions for the last millennium. For those who like that sort of thing, I've currently omitted the instrumental curve at the end. More plots may appear here later.

Update - I misspoke here. The Mann2008g and 2008h curves are a composite including instrumental temperatures. Crowley2003b also includes instrumental, but only up to 1993.

Unrelated - I have replaced the plot with one including a few more series from the same resource. I have used a consistent anomaly base period of 1936-1965 (the most recent 30-year period common to all). The Oerlemans set is based on glaciers and is described as global rather than NH.

Here is the plot, unsmoothed. Below the jump is a summary of the data sets.

Update: In the listing below, I have added a number beside each bolded name. This is the number in the NOAA table as extended with HADCRUT etc. It will be useful for anyone running the code and wanting to vary the subsets chosen.

Wednesday, June 22, 2011

In my posts so far in trying to find more readable ways of presenting multiple series plots, I've tried multi-colors, and varying spectral maps. With multi-colors the lines may be easier to distinguish but harder to follow.

Eli, in a comment on the first post, suggested making curves respond to the pointer rolling over them. This needs Javascript programming, which I can't do. But it occurred to me that an animated gif would achieve something of the same effect.

If each curve becomes, for a period, a continuous black line, then its path can be traced easily. Between times, the dots will separate out the local features. I'll switch to this more topical example - JAXA Ice extent:
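The frame logic can be sketched in R (the data, the file names and the final GIF-assembly step are my assumptions for illustration, not the actual script):

```r
# One frame per series: the highlighted series becomes a solid black line,
# while the others remain as colored dots. The frames can then be assembled
# into an animated gif, e.g. with ImageMagick.
set.seed(1)
y <- matrix(cumsum(rnorm(300)), 100, 3)      # stand-in for the real series
cols <- rainbow(ncol(y))
for (k in seq_len(ncol(y))) {
  png(sprintf("frame%02d.png", k))
  matplot(y, type = "p", pch = 20, cex = 0.6, col = cols,
          xlab = "time", ylab = "value")
  lines(y[, k], lwd = 2)                     # trace this curve in solid black
  dev.off()
}
```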

In my previous post I described the use of alternating colors to improve the readability of "spaghetti" plots of time series, especially for readers who had trouble distinguishing fine shades of color. I updated several times, so if you read it a while ago, you might like to check it again.

There was feedback, here and at Lucia's, from readers concerned about color-blindness, especially red-green. That got me thinking more about appropriate color schemes.

The benefit of three alternating colors, as I had, is that one can hope that most people could distinguish at least two of them, since they come from different parts of rainbow(). But maybe that can be reinforced.

The downside to all this is that alternating colored lines are harder to follow by eye than lines of a single color.

Anyway, I've looked more into the R function rainbow(). It is just scanning the hue spectrum in the hsv() function. I'll talk more about HSV and RGB color numberings below. For the moment, this just makes possible a more flexible approach to the spectrum, which may help with color difficulties.

I've plotted here the same TSI example using different spectral ranges. You can click on any plot to see it enlarged. The top blue mini-graph shows the part of the spectrum that is enhanced - more colors are chosen from the region where that function is higher. Below it is a bar with the uniform spectrum, and below that, the spectrum actually chosen. The rest is as before. Below the jump are some thick-line versions.

I'll be interested to hear if any of these color selections seem to be more easily discriminated.

Technical details - RGB, HSV etc

This gets into how colors are represented by numbers. The simplest model is RGB - three numbers representing the amounts of red, green and blue. A notation common to many graphics platforms is the string "#rrggbb", where rr, gg and bb are pairs of hex digits. So "#ff0000" is pure red, "#bbbb00" is gold, and "#444444" is dark grey. Often two more digits are added to represent transparency - FF for fully opaque.

HSV is intended to be more in line with the way we perceive colors. Again it's a triple of numbers (in R on a scale of 0-1). H represents hue - like the familiar spectrum. S is saturation, and pretty much represents the amount of color at a given brightness. The brightness (as opposed to darkness) is given by the value (black=0, bright=1).

One thing to note in R is that HSV emulates the spectrum rather than generates it. The violet colors are generally created using reds with blue, so the high end doesn't help if you have trouble with red.

The following plot should make this clearer. Each bar shows the effect of varying one of h, s or v from 0 to 1.

R has various routines that support rgb etc. rgb(r,g,b) turns a triple (range 0-1) into a string for color. hsv(h,s,v) likewise creates a string. Then there are rgb2hsv etc. To make all these plots, I just used the hsv() function.
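A minimal illustration of these routines (the particular colors are just examples):

```r
# rgb() and hsv() both return "#rrggbb" strings usable wherever R wants a color
red1   <- rgb(1, 0, 0)                          # "#FF0000"
gold   <- rgb(187, 187, 0, maxColorValue = 255) # "#BBBB00"
red2   <- hsv(0, 1, 1)                          # hue 0 = red, full saturation and value
opaque <- rgb(1, 0, 0, alpha = 1)               # "#FF0000FF" - alpha digits appended
back   <- rgb2hsv(255, 0, 0)                    # the reverse mapping: h=0, s=1, v=1
```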

To get the modified spectrum, I decide on a function f(x) as shown in the top plots above. Then I invert and sum, and normalize to a 0:1 range. That gives a mapping vector i (length N) that moves rapidly through the unwanted colors. So hsv(i,1,1) gives the N colors to be used.
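In code, the mapping might look like this (the weighting function f here is a made-up example, not the one shown in the plots above):

```r
# Choose N hues, concentrated where the weighting f is high.
N <- 16
x <- seq(0, 1, length.out = N)
f <- function(x) 0.2 + exp(-20 * (x - 0.3)^2)  # hypothetical bump near hue 0.3
w <- 1 / f(x)                    # invert: unwanted (low-f) hues get big steps
i <- cumsum(w)                   # sum: cumulative position along the spectrum
i <- (i - i[1]) / (i[N] - i[1])  # normalize to the 0:1 hue range
cols <- hsv(i, 1, 1)             # N colors, closely spaced where f is high
```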

Friday, June 17, 2011

Over at the Blackboard, Lucia was looking at how to get a good color scheme in R to show multiple time series. It's quite hard to get a set with good contrast.

I've been wondering about that too. It's a personal problem - my ability to distinguish color shades has decreased.

I've been dabbling with an alternative idea - stripy lines. Or at least alternating color segments. Then you don't have to rely on shades to make the distinction.

Lucia illustrated with some solar data from Leif Svalgaard. She used different dot-dash line styles to help make contrasts. I thought it would be really good to make these in alternating colors. You can do this by over-writing.
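The over-writing trick is simple; here is a minimal sketch with made-up data (not Leif's file):

```r
# Draw the whole curve in one color, then overwrite every second block
# of points in a contrasting color.
set.seed(2)
x <- 1:200
y <- cumsum(rnorm(200))
blk <- 10                            # points per colored block
grp <- ((x - 1) %/% blk) %% 2        # 0,1,0,1,... block labels
plot(x, y, type = "l", col = "red")  # base line, all red
idx <- which(grp == 1 & x > 1)       # overwrite the alternate blocks in blue
segments(x[idx - 1], y[idx - 1], x[idx], y[idx], col = "blue")
```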

So here's what I came up with. Some may like it, some not. The lines are in principle more distinctive, but it's harder to see where they are going. Single contrasting colors are certainly better, if you can get enough of them.

Anyway, here's my plot. The R code is below the jump, and I'll put a zip file (TSIcolors.zip) with data on the doc repository. As Lucia noted, Leif's file just has blanks for missing data, so I edited NA entries in. The colors are automatically and randomly chosen.

Update:

Peter O'Neill (oneillp) in comments suggested using R-supplied palettes. I think this is better, specifically rainbow(). He also suggested a way to fix the line segments in legend, using seg.len. I found my legend() function would not take that as an argument. I also found that the problem with lines only applied when in jpeg or png mode. I couldn't find the bug, so I wrote my own legend routine - using a subset of the regular arguments.

Update. Replacing the above update: I've redone it in the spirit of Peter's second comment. Instead of a new legend function, I use the values returned by the standard one to overwrite the line segments. I then don't need seg.len.

Revised pictures and code below.

Here's a plot with thicker lines. I couldn't get the legend lines right here:
And here is the (revised) code:

# Program written by Nick Stokes to make multi-colored curves
# File from https://www.leif.org/research/TSI%20%28Reconstructions%29.txt
# Blanks have been converted to NA

Wednesday, June 8, 2011

The Wegman Report was a report to Congress, invited by Rep Barton, Chair of the House Energy and Commerce Committee. The report has recently been revealed as heavily plagiarised. It was the centerpiece of hearings directed at Michael Mann's "hockey-stick" papers (MBH98, Nature 1998, and MBH99).

However, this post is about the science. The thrust of the WR scientific criticism of MBH is that they used an inappropriate mean to normalize the proxy data - the mean for the calibration period, rather than the full period. This would tend to produce hockey-stick results.

The WR report was based on papers by McIntyre and McKitrick, particularly MM05b GRL. Wegman used their code, archived here. An important claim, frequently cited, is that the MBH algorithm would generate results of hockey-stick appearance, even if the data consisted of red noise with no such tendency. To this end, they showed three figures based on red noise simulations:

Fig 4.1 compared the first PC generated from such a simulation with the MBH reconstruction.

Fig 4.2 showed a histogram of "hockey-stick index" (a difference of means as a measure of HS shape) for 10,000 simulations using the limited and the full mean. It showed a normal unimodal distribution for the full mean ("centered"), and a bimodal distribution for the partial mean ("decentered").
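The HS index itself is easy to compute. Here is a sketch for AR(1) red noise, using a difference-of-means definition (closing calibration-period mean minus the full mean, in units of the standard deviation); the window length and sample sizes are my choices for illustration:

```r
set.seed(123)
N <- 581                    # years, e.g. 1400-1980
MK <- 79                    # closing calibration window, e.g. 1902-1980
hs.index <- function(x) {
  i <- (N - MK + 1):N
  (mean(x[i]) - mean(x)) / sd(x)
}
# red noise: AR(1) with parameter 0.2, as in the Fig 4.4 caption
sims <- replicate(100, as.numeric(arima.sim(list(ar = 0.2), N)))
hsi <- apply(sims, 2, hs.index)
# with no selection, the indices scatter around zero, with either sign
```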

Fig 4.4 came with this caption: "One of the most compelling illustrations that McIntyre and McKitrick have produced is created by feeding red noise [AR(1) with parameter = 0.2] into the MBH algorithm. The AR(1) process is a stationary process meaning that it should not exhibit any long-term trend. The MBH98 algorithm found ‘hockey stick’ trend in each of the independent replications." It showed twelve HS-like PC1's generated from an MBH algorithm.

Deep Climate did a thorough investigation of these graphs and their provenance, to complement the work he and John Mashey did on the plagiarism. Regarding these plots he found:

the HS PC's shown were anything but random samples. In fact, the 10,000 simulations had been pre-sorted by HS index, and the top 100 selected. A choice was then made from this top 100.

Although Wegman had said that "We have been able to reproduce the results of McIntyre and McKitrick (2005b)", the PC in Fig 4.1 was identical to one in MM05b. Since the noise is randomly generated, this could not have happened from a proper re-run of the code. Somehow, the graph was produced from MM05 computed results.

The red noise used in the program was very different to that described in the caption of Fig 4.4.

In this post, I mainly want to concentrate on the first issue. How much of the HS shape of the PC's that they showed was due to the MBH selection process (and there is some), and how much to the artificial selection from the top 1% of sorted HS shapes? To this end, I tried running the same algorithm with the same red noise, but using correct centering.

This post arises partly from a thread at Climate Audit. A commenter, oneuniverse, undertook the task of re-running the MM05 code. You can find his comments near here. His results are here.

I should first point out that Fig 4.2 is not affected by the selection, and oneuniverse correctly points out that his simulations, which do not make the HS index selection, return essentially the same results. He also argues that these are the most informative, which may well be true, although the thing plotted, HS index, is not intuitive. It was the HS-like profiles in Figs 4.1 and 4.4 that attracted attention.

However, it is also clear that the selection process did make the plots in Figs 4.1 and 4.4 more HS-like.

My own results may differ slightly from those of oneuniverse, in that I took the view that the MM05 code was mixing multiple issues. They noted that MBH also standardised twice by dividing by sd. So they did an explicit svd calc for the MBH emulations, but a standard R prcomp for the centered emulations. I took the view that since it is the decentering that is being studied, it is better to compare the effect of changing just that. I don't believe the other differences matter much, but I think it is better practice to vary one thing at a time.
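Varying just the centering can be sketched like this (the proxy count, series length and AR parameter are illustrative assumptions, and a plain svd() stands in for the MBH emulation):

```r
# Standardize each proxy by either the calibration-period mean (decentered,
# MBH-style) or the full mean (centered), then take PC1 from the same svd.
set.seed(4)
N <- 581; MK <- 79; nprox <- 50
X <- replicate(nprox, as.numeric(arima.sim(list(ar = 0.2), N)))
std <- function(x, full) {
  i <- if (full) seq_along(x) else (length(x) - MK + 1):length(x)
  (x - mean(x[i])) / sd(x[i])
}
pc1 <- function(full) {
  Y <- apply(X, 2, std, full = full)
  svd(Y)$u[, 1]               # first left singular vector = PC1 shape
}
p.decentered <- pc1(FALSE)    # only the centering differs between the two
p.centered   <- pc1(TRUE)
```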

I'll focus on Fig 4.4, since the PC shown in Fig 4.1 might as well be taken from that selection. Here is the original from the Wegman Report:

And here is my corresponding emulation, using the same selection procedures. It isn't identical to the WR version, but shows the same features.

Now here is what you get if you use proper centering in the same program. I did this by replacing the calibration mean by the full sample mean, but leaving everything else the same (there's a bit more to it - see update below).

Clearly, there is also a strong appearance of HS shape. But this has nothing to do with the decentered mean. It is the result of the prior selection for HS shape that Wegman used.
You'll notice that the scaling is also somewhat different. This is due to MBH dividing twice (in effect) by the sd in normalising, at least in the MM05 version. I'm not sure that this is a good idea, but it shouldn't make much difference in PCA. The second denominator in the MBH case is larger, because it is calculated relative to the deviant mean. Incidentally, the scaling in the original MM05 code was very different again.

And here is a properly representative sample of the decentered PC's. I simply took a consecutive block (actually 9001-9100) and made the same selection from those, instead of the sorted subsample. I didn't change the numbers within the 100 - because the data is randomly generated, it should be possible to then choose a subsample arbitrarily, rather than regenerate a random selection.

Now we see that there is still some tendency to HS shape, but much less. It can go either way, as expected. In the PCA analysis, sign doesn't matter, so the sign variations don't cancel.

Finally, here is the corresponding randomly chosen centered version. There is essentially no HS tendency.

The last two plots are a fairer indication of the HS tendency than is seen in Fig 4.4 of the Wegman report. It isn't nothing, but it isn't as neat as portrayed there.

Update: I should clarify "leaving everything else". The "mannomatic" transform in MM05 is:

mannomatic = function(x) {
  N = length(x)
  i = (N - MK):N
  xstd = (x - mean(x[i])) / sd(x[i])
  sdprox = sd.detrend(xstd[i])
  xstd / sdprox
}

I've modified the MM05 version for clarity. MK was set to 78, the number of years in the calibration region. It determined the range i, which is used both for the mean and standard deviation, and for the further normalisation. My "centered" version sets MK = N-1 (all years) rather than 78.

Update: Conclusion.

Wegman, using the code of MM05b, claimed that the technique of MBH (decentering) would yield hockey-stick shaped PC1's even from red noise input (1st fig).

The 2nd fig confirms this. However, the effect is partly due to MBH, and partly due to a very artificial selection in the MM05 code, where a subsample (100 from 10,000) was selected for HS shape prior to display.

The third fig shows that this artificial selection will itself create HS shapes without decentering - no MBH effect.

The fourth fig shows how Wegman's Fig 4.4 should have looked, without the artificial selection. Some HS effect, but not nearly as much.

The final fig just confirms that with no selection and no decentering, the HS goes away.

Following the suggestion of oneuniverse, I have recalculated the effect of selection with a centered algorithm, done according to the original MM05b algorithm instead of my adaptation of the MBH algorithm. It corresponds to the third plot above. It is of course from a different red noise instance, since the program was re-run. The prcomp() algorithm returns a very different scaling.

It appears also to have a very strong HS appearance - but judge for yourself. Again, I emphasise that this is done with the centered MM05 algorithm, as used by Wegman, and the HS derives simply from the artificial selection, not from anything unusual in MBH.