Thursday, October 18, 2012

This is my third post on the new beta release of the ISTI temperature database. In the first post, a Google Maps display, I noticed a number of stations which appeared to be duplicates. So I thought I'd check more comprehensively.

I first ordered the inventory alphabetically by name. A complication here is that 430 stations have no name. Some still showed up as duplicates.

The next step was to collect pairs of alphabetically adjacent stations whose data began in the same or adjacent years. Then I did a rough distance check and retained pairs for which the sum of the latitude and longitude differences (in absolute value) was less than 1°. Near the equator that allows a separation of at most about 110 km (one degree of latitude), and it requires greater closeness near the poles. In fact most pairs at this stage have near-identical coordinates.
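
The screening steps can be sketched in a few lines of Python. This is a minimal illustration, not the code I actually ran: the `Station` record and the function name are hypothetical, and the wrap-around of longitude at the dateline is an assumption I've added.

```python
from dataclasses import dataclass


@dataclass
class Station:
    name: str          # may be empty: 430 ISTI stations have no name
    lat: float         # degrees north
    lon: float         # degrees east
    first_year: int    # first year with data


def candidate_duplicates(stations, max_deg=1.0):
    """Flag alphabetically adjacent pairs that may be duplicates (for checking only)."""
    # Unnamed stations cannot be placed in the alphabetical order, so skip them.
    named = sorted((s for s in stations if s.name), key=lambda s: s.name.upper())
    pairs = []
    for a, b in zip(named, named[1:]):
        # Data must begin in the same or an adjacent year.
        if abs(a.first_year - b.first_year) > 1:
            continue
        # Rough distance check: |dlat| + |dlon| < max_deg degrees.
        dlon = abs(a.lon - b.lon)
        dlon = min(dlon, 360.0 - dlon)  # assumption: wrap across the dateline
        if abs(a.lat - b.lat) + dlon < max_deg:
            pairs.append((a.name, b.name))
    return pairs
```

Because only neighbours in the sorted list are compared, a list containing VIENNA, VILNIUS and WIEN never compares VIENNA with WIEN, which is the Vienna/Wien kind of miss.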

There will be some missing. I suspect Vienna and Wien are duplicates, but they are missed because they are not adjacent alphabetically. The two Trondheims I noticed are assigned coordinates too far apart. And of course, my test doesn't prove duplication; it just flags pairs for checking.

Again, the dataset is large, and takes a few seconds to load, so I have put it below the fold. It is a globe that shows individual station trends with shading on a triangular mesh. The shading color is accurate at the stations themselves. You can display the stations and mesh, and click to pick up the numerical information. There is a little navigator map that lets you reorient the globe as you wish. Maps are available for periods of 30, 45 and 60 years to present. You can magnify 2x, 4x or 8x.

I should emphasise that these are trends for individual stations - there is no modelling or smoothing, except for the triangle shading interpolation. I find that valuable in that it shows the spatial correlation (or lack of it). I was interested to see if the larger ISTI set gave a similar result to GHCN.

I think it does. A notable feature of all these plots, whether for period averages or trends, is that the US seems more of a patchwork than the rest of the world (ROW). This is partly due to the higher density of stations, but I think there really is less coherence. Perhaps there is a quality issue associated with the large number of stations.

How the trends are calculated

The trends are calculated using monthly data over whole calendar years (so I didn't use the 2012 months). There is a tricky issue with seasonality. If you take the monthly trend over a year running from autumn to autumn, you'll get an uptrend even if the year ends as it began. The calendar year runs mostly from winter to winter (or summer to summer in the Southern Hemisphere), so the effect is much diminished.
I allowed for this using the method from TempLS. Instead of just one intercept, I fit (by OLS) twelve monthly offsets (means) as well as the slope. This subtracts out the seasonal variation and gives the underlying trend.
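
The fit with twelve monthly offsets and one slope can be written as an ordinary least-squares problem. Here is a minimal numpy sketch of the idea, not the TempLS code itself; the function name and the argument layout (decimal time, month index 0..11, NaN for missing months) are assumptions for illustration.

```python
import numpy as np


def trend_with_monthly_offsets(t_years, month_index, temps):
    """OLS fit of temps = offset[month] + slope * t; returns the slope (per year).

    t_years: decimal time of each month, month_index: 0..11, temps: NaN if missing.
    """
    t = np.asarray(t_years, dtype=float)
    m = np.asarray(month_index, dtype=int)
    y = np.asarray(temps, dtype=float)
    ok = ~np.isnan(y)                      # drop missing months
    t, m, y = t[ok], m[ok], y[ok]
    # Design matrix: 12 indicator columns (one offset per calendar month)
    # plus a centred time column for the slope. No separate intercept is needed.
    X = np.zeros((len(y), 13))
    X[np.arange(len(y)), m] = 1.0
    X[:, 12] = t - t.mean()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[12]
```

A plain `np.polyfit(t, y, 1)` on a series that starts and ends in different seasons would pick up part of the seasonal cycle as trend; the twelve offset columns absorb that cycle instead.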

Sea Temperatures

As usual, I've added SSTs by taking a published gridded set and putting artificial stations at the grid centres. Previously I've used ERSST; this time I used HADSST2. The emphasis is on the land data, but the mesh goes haywire without ocean nodes. The HADSST2 grid is coarser, so the ocean trends are more patchy.
Because stations are placed arbitrarily at the centres of 5°x5° grid cells, some of them can appear on land. Please excuse this.
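
Placing artificial stations at grid centres amounts to converting cell indices to coordinates. This is a minimal numpy sketch; the grid orientation (row 0 southernmost, column 0 starting at 180W) is an assumption for illustration and is not necessarily the layout of the HADSST2 files, and the function name is hypothetical.

```python
import numpy as np


def sst_pseudo_stations(grid, step=5.0):
    """Turn a gridded SST field into artificial 'stations' at cell centres.

    grid: 2-D array (n_lat x n_lon); row 0 is the southernmost band and column 0
    starts at 180W (an assumed layout). NaN marks cells with no SST data.
    Returns a list of (lat, lon, value) tuples.
    """
    rows, cols = np.nonzero(~np.isnan(grid))
    lats = -90.0 + step * (rows + 0.5)
    lons = -180.0 + step * (cols + 0.5)
    return list(zip(lats.tolist(), lons.tolist(), grid[rows, cols].tolist()))
```

A coastal cell that has any SST data gets a node at its centre, even when that centre lies on land, which is why some ocean nodes appear inland.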

Update: There are some odd hotspots in the southern oceans for the longer time periods. This is an artefact. It is caused by an arrangement that lets me use the same mesh for all three time intervals (meshes take time to download). I use a single mesh with a node for every station that has an admissible trend for any period; a station needs 80% of months reporting to be assigned a trend over a period. Where a station has no trend for a given period, I assign an interpolated value for coloring purposes. The station is not shown, so the effect should only be a minor upset in the shading. However, in those regions there are some stations which have no connected stations with trends for that time interval (this is worst at 60 years). Then the interpolation goes wrong, and the shading shows artificial heat. I don't think it happens anywhere on land.
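
The update describes a fill-in rule without giving the interpolation formula. Here is a minimal Python sketch, assuming the fill value is the plain mean of mesh-connected nodes that do have a trend; the function names and the data layout (dicts keyed by node) are hypothetical. It shows the 80% admissibility test, and why a node whose neighbours all lack trends has nothing to interpolate from.

```python
import math


def admissible(monthly, min_fraction=0.8):
    """A station gets a trend only if at least 80% of months report (None = missing)."""
    n = len(monthly)
    return n > 0 and sum(v is not None for v in monthly) >= min_fraction * n


def fill_for_shading(trends, neighbours):
    """trends: node -> trend or None; neighbours: node -> mesh-connected nodes.

    Nodes without a trend get the mean of connected nodes that have one
    (an assumed rule). If no neighbour has a trend, the value is undefined
    (NaN here): the case that produced the artificial hotspots.
    """
    filled = {}
    for node, tr in trends.items():
        if tr is not None:
            filled[node] = tr
        else:
            vals = [trends[n] for n in neighbours[node] if trends[n] is not None]
            filled[node] = sum(vals) / len(vals) if vals else math.nan
    return filled
```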

Monday, October 15, 2012

A few days ago, Peter Thorne of NOAA noted in a comment that a new initiative, the International Surface Temperature Initiative (ISTI), with NOAA involvement, has released a large new database of surface temperatures. It's on a similar scale to BEST, and I'll do comparisons in due course. But as an initial step, I thought it would be useful to put together a Google Maps presentation, as I did for GHCN.

The beta release is here. I used the recommended merge (3 Oct version). The data combines TMIN, TMAX and TAVG (and is big!); I extracted the TAVG. There were 39430 stations in the inventory.

This is a big set for the GM application, so I've divided it into 8 regions with about 5000 each. You can look at them all at once if you like, but it will run very slowly. Selecting one or two regions is much better. There is a little map at the right of the display showing where they are. Because the data takes several seconds to download, I've put the map beneath the fold.

The idea of the map is that it shows stations as tags, with information that you can pop up by clicking. But the main use is that you can filter by categories. You can choose ranges of start date (of data), end date, duration and altitude. The mechanics are that you make these selections, select regions, and then press one of the colors (for the tag). What you've asked for will appear in that color, in addition to what was there before. A special (and useful) color is invisible. The range choices combine with "or" logic, so you get "and" by making what you don't want go away. Because it is "or", you need to suppress the "All" button to make other choices. The buttons toggle. The middle columns with gt and lt signs also toggle.
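
The "or, then make things go away" logic can be modelled in a few lines. This is a toy Python model of the selection rules as described above, not the map's actual Javascript; the function name, the field names and the station records are hypothetical.

```python
def apply_request(markers, stations, ranges, color):
    """One press of a color button in a toy model of the map's filter.

    markers: station id -> current color (updated in place and returned).
    ranges: list of (field, low, high); a station qualifies if ANY range matches.
    Stations that don't qualify keep whatever color they already had.
    """
    for sid, st in stations.items():
        if any(low <= st[field] <= high for field, low, high in ranges):
            markers[sid] = color
    return markers
```

To get "start before 1850 and altitude above 1000 m", you would first show everything in yellow, then send "start after 1850" to invisible, then "altitude below 1000" to invisible; what remains yellow satisfies both conditions.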

I've included GHCN stations for comparison. There is a checkbox for each database. To compare you'll probably want to display in one color with one box ticked, and in another with the other box ticked (only one at a time). The green pin is useful here, in case of overlaps.

The map starts out blank, waiting for you to choose a region (or two). It's below the jump:

Usage

The map initially shows no markers. You need to select regions. I'd recommend starting with just one. If you click one of the marker colors, you'll then see a mass of markers in that region. You can filter some out with the invisible button - remember to toggle the "All" button when making selections. You could filter out, say, all stations starting after 1850. The selections are only operative if the left radio button is on.

You can ask for a different selection and a different color. It is the color request that creates actions. This second request doesn't erase markers already showing, though it will change the colors of those that qualify.

Note that choosing a small number of regions helps with performance, but other choices which reduce tags on screen do not help (the tags are there but invisible).

Worked example

I was curious about whether ISTI had more really old data than GHCN. As I found earlier, BEST v1 has a similar number of stations overall, but little new before 1850. So I did this:

Set Region 6

Unset All, set StartYr, and change the textbox to 1800.

Under Actions, click Yellow.

Unset ISTI, set GHCN

Under Actions, click Pin.

So I see a lot of yellow tags in Europe, some with green pins, and some green pins on their own. Clicking on the pins brings up info including start dates.
Some things I notice:

In the UK, GHCN has Gordon Castle, Greenwich and Manchester. ISTI doesn't, but has a Central England record, which may cover the last two.

Both have a Trondheim in the right place, with similar dates (ISTI 1761-2012, GHCN 1761-1981). But ISTI also has TRONDHEIM_VAERNES, about 2.5° further E, from 1762 to 2011. Duplicate?

GHCN has Lund, 1753-1773. ISTI not.

ISTI has two Budapest records, both starting in 1780. GHCN has one.
ISTI has San Fernando, from 1786, GHCN not.

ISTI has two Prague records (PRAHA-KLEMENTINUM, PRAHA-RUZNY), one starting in 1771, the other in 1775. GHCN has Praha-Ruzyne.

ISTI has a record for Vienna and one for Wien. Both start in 1775.

ISTI has OBERSCHLEISSHEIM and UCCLE (Belg), GHCN not.

So ISTI has a few extras in that period in Europe, some of which may be duplicates.

Thursday, October 11, 2012

Over the last two years I've been exploring various ways of using interactivity to make climate data more accessible and attractive. I've been trying to maintain a gallery, but that takes time, and the effort has been lagging.

So I want to write a post which just summarizes the techniques used, to act mainly as a catalogue with pointers.

KML, Google Earth

The first thing I tried was writing Google Earth files, using its API language KML (of the XML family). This was really just to organise the data provided by the various surface temperature collectors (GHCN, GSOD, CRUTEM, BEST etc.). I could show the stations with various colored and sized pins; summary information would appear in balloons on clicking. Then I found an even more useful capability with folders. GE lets you turn the folders on and off. So if I list stations in folders according to their decade of commencement, for example, the user can show a plot with any range of decades he wants.
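
Grouping stations into folders by decade is straightforward to generate. Here is a minimal sketch using Python's standard `xml.etree.ElementTree` and the KML 2.2 elements `Document`, `Folder`, `Placemark` and `Point`; my own files were written differently, and the function name and the tuple layout of the input are hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def stations_to_kml(stations):
    """stations: iterable of (name, lat, lon, start_year).

    One Folder per decade of commencement, so a Google Earth user can
    switch whole decades on and off.
    """
    ET.register_namespace("", KML_NS)          # write KML as the default namespace
    q = lambda tag: f"{{{KML_NS}}}{tag}"       # qualified tag name
    kml = ET.Element(q("kml"))
    doc = ET.SubElement(kml, q("Document"))
    folders = {}
    for name, lat, lon, start in sorted(stations, key=lambda s: s[3]):
        decade = start // 10 * 10
        if decade not in folders:
            folder = ET.SubElement(doc, q("Folder"))
            ET.SubElement(folder, q("name")).text = f"{decade}s"
            folders[decade] = folder
        pm = ET.SubElement(folders[decade], q("Placemark"))
        ET.SubElement(pm, q("name")).text = name
        ET.SubElement(pm, q("description")).text = f"Data from {start}"
        point = ET.SubElement(pm, q("Point"))
        # KML coordinates are written lon,lat (longitude first).
        ET.SubElement(point, q("coordinates")).text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")
```

Sorting by start year first means the folders appear in chronological order in the Google Earth sidebar.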

The downside is that the user needs to download the KML files, or KMZ (a zipped equivalent). They are not excessively big. I have a collection here.

The user could, by hovering the mouse, highlight individual components of the plot in contrasting black. Another such plot here included a general R program for generating code from a CSV data file. And this ice extent plot doesn't pick out curves but lets you focus on time ranges.

Trend plots

This series started with an ordinary graphic which showed, as a colored triangle, how all the possible trends of a single time series, over every time interval, could be visualised. This was then adorned with JS facilities for choosing different datasets. Then options were added so that the time series would be plotted at the same time, with the interval in question and its trend displayed. This display would be updated by clicking on the colored triangle, or by using controls on the time series graph. Numerical information, including (in later versions) confidence limits, could be displayed. Here is a list:
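
The data behind a trend triangle is simply the OLS slope for every (start, end) pair. Here is a minimal numpy sketch; the minimum interval length `min_len` is an assumption (the plots don't state one here), and the function name is hypothetical.

```python
import numpy as np


def trend_triangle(years, values, min_len=10):
    """OLS trend for every (start, end) interval of a time series.

    Returns an n x n array T with T[i, j] = trend from years[i] to years[j]
    (per unit of time); NaN where j - i + 1 < min_len, so only a triangle is filled.
    """
    x = np.asarray(years, dtype=float)
    y = np.asarray(values, dtype=float)
    n = len(x)
    T = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(i + min_len - 1, n):
            # Shift x to start at 0 for numerical conditioning; the slope is unchanged.
            T[i, j] = np.polyfit(x[i:j + 1] - x[i], y[i:j + 1], 1)[0]
    return T
```

Coloring `T` with start year on one axis and end year on the other gives the triangle; clicking a cell corresponds to picking one interval.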

Earth projections

I've never been happy with the various projections which attempt to render the whole earth on a page. I have preferred spherical projections, even though they require several views. But JS creates a Google Earth-like possibility: a spherical projection in which you can adjust the viewpoint.

Every month recently I have published a flat projection of a spherical harmonics fit to the temperature anomalies, so this was a clear candidate. The globe views were typically from the corners of a surrounding cube. Examples are here:
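
A spherical view with an adjustable viewpoint is an orthographic projection centred on the chosen point. This Python sketch uses the standard orthographic formulas; it is an illustration, not the Javascript used in the plots, and the function name is hypothetical.

```python
import math


def orthographic(lat, lon, lat0, lon0):
    """Project (lat, lon) in degrees onto the plane seen from above (lat0, lon0).

    Returns (x, y, visible) on a unit-radius globe; visible is False for
    points on the far hemisphere, which should not be drawn.
    """
    phi, lam = math.radians(lat), math.radians(lon)
    phi0, lam0 = math.radians(lat0), math.radians(lon0)
    dlam = lam - lam0
    x = math.cos(phi) * math.sin(dlam)
    y = math.cos(phi0) * math.sin(phi) - math.sin(phi0) * math.cos(phi) * math.cos(dlam)
    # Cosine of the angular distance from the view centre.
    cos_c = math.sin(phi0) * math.sin(phi) + math.cos(phi0) * math.cos(phi) * math.cos(dlam)
    return x, y, cos_c >= 0
```

Views from the corners of a surrounding cube correspond to view centres at latitude ±35.26° and longitudes ±45° and ±135°.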

HTML 5 and triangle meshes

HTML 5 canvases are relatively new. They offer huge flexibility for interactive graphing. This was developed further in the Climate Plotter described below, but the first application was to shading global plots. The canvas lets you prescribe colors at the corners of a triangle and will then shade continuously between them, or at least between two of them. Getting all three right is harder to manage, unlike with Gouraud shading, which does it automatically.
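
What Gouraud shading does automatically is interpolate the three vertex colors linearly, using barycentric weights. Here is a minimal Python sketch of that formula; it is not the canvas API, and the function name is hypothetical.

```python
def gouraud_color(p, tri, colors):
    """Linear (barycentric) interpolation of vertex colors at point p.

    tri: three (x, y) vertices; colors: three (r, g, b) tuples.
    At each vertex the result is exactly that vertex's color.
    """
    (x1, y1), (x2, y2), (x3, y3) = tri
    x, y = p
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    # zip(*colors) groups the r, g and b components of the three vertices.
    return tuple(w1 * c1 + w2 * c2 + w3 * c3 for c1, c2, c3 in zip(*colors))
```

A canvas linear gradient varies along a single direction, so it can match at most two corners exactly; the barycentric weights match all three.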

The big merit of this is that I can show temperature anomalies with no model fitting, so that colors will be correct at the stations. With the above caveat, it may be that only two thirds will be exactly right, but that's enough to see what is going on. When that is done, the mesh and stations can be shown, and the user can click to bring up station names and details. Again, a spherical projection is used, and can be user-rotated to an arbitrary view. I provide a navigator map (flat projection) on which you can click to choose the point to focus on. Examples are here:

WebGL

This new facility, described in a recent post, makes available the full power of GL, and is potentially an improvement on HTML5 with spherical triangle meshing. However, support from browsers is still patchy, and the JavaScript that implements it is more difficult to modify - GL has its own style that it is perilous to disrupt. Some day it will be the best option.

Google Maps

This is a different style of using Javascript. It provides the facilities of GE with KML, but in a browser window. Google makes its API available, and in the usual Maps style the globe can be traversed and color-coded flags shown, with balloons of information appearing in response to clicks. Control is by the familiar JS gadgets such as radio buttons.

Climate Plotter

I think the most advanced use of Javascript and HTML 5 is in the Climate Plotter. This interactively draws plots from a store of annual climate data. You can form regressions (of just about anything with anything); you can plot against an image background. You can interactively rescale, show different axes with different units, and adjust anomaly basis intervals. The history is here:
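
Adjusting the anomaly basis interval means re-expressing a series relative to its mean over a chosen span of years. Here is a minimal numpy sketch of that step; the function name is hypothetical and this is not the plotter's Javascript.

```python
import numpy as np


def rebase(years, values, base_start, base_end):
    """Express a series as anomalies relative to its mean over [base_start, base_end]."""
    years = np.asarray(years)
    values = np.asarray(values, dtype=float)
    in_base = (years >= base_start) & (years <= base_end)
    if not in_base.any():
        raise ValueError("base interval has no data")
    return values - np.nanmean(values[in_base])
```

Two indices that differ only by a constant offset (for example, because they were published on different base periods) become identical once both are rebased to the same interval.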

Probably in conjunction with my introducing the Captcha system, my blog started imposing moderation on comments. I didn't realise this, and so the queue built up. My apologies here, especially to Girma. I've released all the comments. I've also removed the Captcha for the moment. I'll try to make sure moderation is not re-imposed.

Wednesday, October 10, 2012

This is an advance on the HTML 5 methods I've been using for some previous dynamic temperature plots. Some Web browsers now support WebGL, a Javascript version of OpenGL for 3D plotting. I've been playing with it for monthly temperature maps - here's September.

What you'll notice is a globe that responds to mouse dragging just like Google Earth - in fact I believe GE, at least initially, used a version of WebGL. The main advantage for presentation is better shading: Gouraud shading instead of the rather kludgy HTML5 canvas version.

A downside is that support for WebGL is quite patchy, and the implementation also depends on your graphics card. I've found Chrome is fine; Firefox produces a fragment of the picture and then gives an error message, and of course IE is way behind. Some other browsers have the capability but disable it by default. I understand the reason is that it creates a vulnerability to denial-of-service (DoS) attacks which send dynamic but very slow pictures.

I don't think WebGL will replace my HTML 5 versions for a while. I don't have as much control, so I can't, for example, allow you to click on the picture to bring up local station info.

The other downside is that the files are fairly large, and take a few seconds to download. So I've put them below the jump. There is the WebGL version and a snapshot of the HTML5 version. The color scheme is the same - I haven't figured out yet how to put a bar on the WebGL version. It's a direct plot from the TempLS September station anomalies - exact for each station, and shaded elsewhere.

I've added a technical update describing the methods I used here.

So here is the globe. Give it a spin! The left button rotates, the middle button/wheel enlarges, and the right button changes field of view (which can also enlarge).

Here is a snapshot from the HTML 5 version. You can see that the shading is more ragged.

Technical update

I should say more about how this is done. I use my R program which does the interactive monthly presentations, in conjunction with an excellent R package, rgl, in which Duncan Murdoch has a big role. This enables me to show in an R GUI the spinnable globe as you see it (browser permitting).

Then there is an rgl routine, writeWebGL, which generates an HTML page with Javascript. I could use that directly, but it is bulky (about 2.5 MB, mostly data). The JS is beautifully written, but the data layout is extravagant. I've been able to cut it down to about 500 KB, with some added Javascript. I show that via an iframe, because I need to run the program when loading is complete. Normally that would be done with an onload handler in the body element, but Blogger controls that, hence the iframe.

Tuesday, October 9, 2012

The TempLS analysis, based on GHCN V3 land temperatures and the ERSST sea temps, showed a monthly average of 0.53°C for September, up from 0.49°C in August. Last month also showed a higher rise with late data. There are more details at the latest temperature data page.

Below is the graph (lat/lon) of temperature distribution for September. I've also included a count and map of the stations that have reported to this date.

This spherical harmonics plot is done with the GISS colors and temperature intervals, and as usual I'll post a comparison when GISS comes out.

And here, from the data page, is the plot of the major indices for the last four months:

Monday, October 1, 2012

Sceptics complain a lot about adjustments made in indexing temperatures. Rarer is an acknowledgement of the argument for the adjustments. The fact is that if an adjustment is appropriate, then it is required. It's not optional.

This post will set out the quantitative basis for one of the larger adjustments to USHCN, a frequent object of this complaint. This is TOBS, the time of observation adjustment. It arises because USHCN gets its data from a wide variety of observers, many of them volunteers. The time at which min/max thermometers are read and reset is recommended but not mandated; it is, however, on record. For many stations it has changed, and this matters.

In this post I take a USCRN station, Boulder, Colorado, with hourly data from 2009-2011. I calculate the effect of varying the notional reading time of a min/max thermometer. There is a positive bias of about 1.3°F if it is read in mid-afternoon, tapering to nearly nil around midnight. There is potentially a cooling bias in the morning, though for this site it was small.

But firstly, a discussion of why temperature measurement is relevant to the climate debate, and what kind of measure should be used.

The role of temperature measurement

The climate debate is about potential global warming caused by the addition of carbon dioxide and other greenhouse gases to the atmosphere. Sometimes the impression is created that the basis for worry is in fact the observation of rising temperatures, and if doubt can be cast on this observation, the worry goes away.

This is not true. The case for AGW is now, and always has been, based on the physics of the greenhouse effect. Addition of GHGs to the atmosphere leads to warming. The amount of warming (climate sensitivity) is not perfectly known, but there are reasonable estimates.

GHGs have increased, so a rise in temperature should be discernible. If there were none, then AGW would be in doubt. That is why a study of historical measurements is important. But history shows that temperatures have risen. There are of course uncertainties about all measurements, and there are short-term fluctuations which can obscure trends. But the rise is consistent with AGW. It is not the proof of AGW.

Measuring daily temperature

Every now and then a post like this appears, in which someone discovers that the measure of daily temperature commonly used (Tmax+Tmin)/2 is not exactly what you'd get from integrating the temperature over time. It's not. But so what? They are both just measures, and you can estimate trends with them.
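
To illustrate the "so what": if the shape of the daily cycle doesn't change, (Tmax+Tmin)/2 and the hourly (integrated) mean differ by a fixed offset, so both measures give the same trend. This numpy sketch uses a made-up, deliberately lopsided diurnal shape; the function name is hypothetical.

```python
import numpy as np


def daily_measures(hourly):
    """hourly: array of shape (days, 24).

    Returns (minmax_mean, integrated_mean), one value per day.
    """
    h = np.asarray(hourly, dtype=float)
    minmax = (h.max(axis=1) + h.min(axis=1)) / 2
    integrated = h.mean(axis=1)
    return minmax, integrated


# Invented example: a cool night with a sharp afternoon peak, warming 0.01 per day.
shape = np.array([0, 0, 0, 0, 0, 1, 3, 6, 9, 11, 12, 12,
                  11, 9, 6, 3, 1, 0, 0, 0, 0, 0, 0, 0], dtype=float)
days = np.arange(100.0)
minmax, integrated = daily_measures(shape + 0.01 * days[:, None])
```

Here `minmax - integrated` is 2.5 on every day, while `np.polyfit(days, minmax, 1)` and `np.polyfit(days, integrated, 1)` give the same slope.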

The reason (Tmax+Tmin)/2 is used is that a very long history is available. In pre-electronic days, observers used min-max thermometers like the one on the right. The pins are pushed up as the mercury rises, and do not descend until reset. Note that the minimum scale is reversed. Typically, once a day the position of the pins is read and the pins are moved (e.g. with a magnet) back to sit on the current position of the mercury. The reading gives the max and min of the previous day, but the time when they occurred is not shown. Typically an observer would record the location of the pins and the temperature at the time of reading.

Regular hourly readings are only widely available since the introduction of MMTS a couple of decades ago.

Note that with the min-max thermometer, if you reset the max when the temperature is falling, the temperature may not return to that level for the whole of the next day. In that case, the next max you read will be the value that you set it to. This, as I'll show, is why time of observation matters.
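
The carry-over can be made concrete with a toy model of the thermometer's pins. This Python class is a hypothetical illustration, not a model of any particular instrument, and the temperatures in the usage lines are invented.

```python
class MinMaxThermometer:
    """Toy model of a min-max thermometer: the pins track extremes until reset."""

    def __init__(self, current):
        self.reset(current)

    def reset(self, current):
        # After reading, both pins are moved back to the current temperature.
        self.max_pin = current
        self.min_pin = current

    def observe(self, temp):
        # The mercury pushes a pin only when it goes beyond it.
        self.max_pin = max(self.max_pin, temp)
        self.min_pin = min(self.min_pin, temp)


thermo = MinMaxThermometer(70.0)                # reset at 5 pm while it is 70°F and falling
for temp in [65, 58, 50, 45, 52, 60, 62, 55]:   # invented readings over the next 24 hours
    thermo.observe(temp)
print(thermo.max_pin, thermo.min_pin)           # prints 70.0 45
```

The recorded max of 70°F is the reset value, not anything reached during the following day.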

USHCN Adjustments

Discussions of USHCN adjustments often refer to this plot:
Note that this is V1, and so out of date, but it does show the TOBS effect. This paper of Vose et al. has more details, and describes the underlying cause thus:

[3] The majority of weather stations in the U.S. Cooperative Observing Network (and therefore in HCN) are staffed by volunteers. Consequently, the network has no mandatory time at which daily measurements must be taken. Most individuals prefer observing times other than midnight, resulting in an observation day that differs from the standard calendar day. For example, at a station where the volunteer reads the thermometers at 0800 LST, the observation day extends from 0800 LST the previous day to 0800 LST on the current day. At a station where the volunteer reads the thermometers at 1700 LST, the observation day starts and ends 9 hours later. Nevertheless, the observations at both stations are recorded for the same calendar day.

[4] When the observation day differs from the calendar day, a ‘‘carry over’’ bias of up to 2.0°C is introduced into monthly mean temperatures. This bias occurs when atmospheric conditions cause a temperature from one day to be ascribed to the following day. For instance, suppose an observer reads the maximum and minimum thermometers at 1700 LST on April 1, then a cold front passes through the area overnight. If the temperature on April 2 never exceeds the value at 1700 LST on April 1 (when the thermometers were last reset), then the recorded maximum will actually be the temperature at 1700 LST on April 1. This temperature will be higher than if the 24-hour measurement ended at midnight, and because the monthly mean is computed by averaging the daily maximums and minimums, the mean for April will likewise be artificially high. In general, this carryover phenomenon results in a warm bias for observation days ending in the afternoon and a cool bias for those ending in the morning.

Station observations - Boulder Colorado

As mentioned, I was looking around for a station with a good set of hourly readings for some years, with few missing values. I first looked at Washington, DC, but there were lots of gaps. So I thought the USCRN station at Boulder was promising, and indeed it had 2009-2011 with only 38 missing hours (which I interpolated). To simplify, I used MST (Mountain Standard Time) only, no daylight saving. All temperatures are in Fahrenheit.
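
The post doesn't say how the 38 missing hours were filled. A minimal numpy sketch, assuming simple linear interpolation in time between the neighbouring valid readings; the function name is hypothetical.

```python
import numpy as np


def fill_gaps(temps):
    """Linearly interpolate missing hourly values (NaN) from their neighbours in time."""
    y = np.asarray(temps, dtype=float).copy()
    bad = np.isnan(y)
    idx = np.arange(len(y))
    # np.interp holds the end values constant if a gap is at either end of the record.
    y[bad] = np.interp(idx[bad], idx[~bad], y[~bad])
    return y
```

With only 38 gaps in three years of hourly data, the choice of fill method should make little difference to the results.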

Diurnal pattern

The diurnal pattern varies through the year, but here is a graph of the hourly averages (°F) over all three years. As expected, there is an afternoon maximum and a minimum in the early morning.
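
The hourly averages are just the record folded into days. This numpy sketch assumes the series is continuous, gap-filled and starts at 00:00 MST; the function name is hypothetical.

```python
import numpy as np


def diurnal_profile(hourly):
    """hourly: 1-D array of consecutive hourly temps starting at 00:00 MST.

    Returns the 24 hour-of-day averages over the whole record
    (any incomplete final day is dropped).
    """
    h = np.asarray(hourly, dtype=float)
    days = len(h) // 24
    return h[:days * 24].reshape(days, 24).mean(axis=0)
```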

Time of observation effect

This is simulated, supposing that we took the max and min of 24-hour blocks. Often the max measured is the afternoon max of each day, and the min is the early morning min. Then the time of observation doesn't matter.

But sometimes the reset value is not reached again in the next 24 hours. Then the "max" recorded is not a real max - it reflects the warmth of the previous 24-hr period, rather than the cold of the next. So it is a warm bias. If the same thing happens with the min, it is a cold bias.
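
The simulation amounts to cutting the hourly record into 24-hour blocks that start at the reset hour. This numpy sketch is one way to do it, not necessarily the code behind the plots; the function name is hypothetical. It assumes the reset happens just before the stated hour, so that hour's reading opens the new block, and it flags days where the recorded extreme is simply the value at reset.

```python
import numpy as np


def simulated_readings(hourly, reset_hour):
    """Simulate daily min-max readings from hourly temps (record starts at 00:00).

    Each observation day is the 24 hours starting at reset_hour.
    Returns (tmax, tmin, carried_max, carried_min); carried_max is True where
    the recorded max is the temperature at reset time, i.e. it was never
    exceeded during the following 24 hours (likewise carried_min).
    """
    h = np.asarray(hourly, dtype=float)
    n_days = (len(h) - reset_hour) // 24
    blocks = h[reset_hour:reset_hour + n_days * 24].reshape(n_days, 24)
    tmax = blocks.max(axis=1)
    tmin = blocks.min(axis=1)
    # argmax/argmin return the first occurrence, so ties count as carry-over.
    carried_max = blocks.argmax(axis=1) == 0
    carried_min = blocks.argmin(axis=1) == 0
    return tmax, tmin, carried_max, carried_min
```

`carried_max.mean()` then gives the fraction of observation days on which the max is just the reset value.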

The following tableau shows the frequency distribution of the times at which the recorded max occurred, subject to this notional reset. Here and later it is assumed that the reset occurred just before the time stated. The reset times are 0:00, 6:00, 9:00, 14:00, 17:00 and 21:00 MST. Any of the plots can be expanded - just right click and View.

If you look at the first plot, with reset at midnight, you see the expected afternoon peak of maxima, but also a peak at midnight. This says that about 80 times in 3 years the maximum of a calendar day occurred at midnight. This implies that the weather turned cold after midnight. The max doesn't reflect how cold it became. There is a smaller peak at 11pm, reflecting the fewer occasions on which a warm front came through.

Resetting at 6 am, there are still a few days where the measured max occurs at that time. But moving on to 5 pm, there is now a very marked peak. In fact, for more than a fifth of days, the 5 pm temperature is higher than any temperature over the next 24 hours. This is significant because the NWS recommendation had been to reset at 5 pm.

A more sensitive histogram is of the durations between maxima, shown below for the same reset times. "Normal" is about 24 hours. A short interval, or a long one of near 48 hours, indicates that the same peak is effectively being counted twice. This is very marked at 2pm, but also strong at 5 pm (and significant at 9 am).

I'll show this side-lobe effect as a daily cycle by plotting the variance of the histogram. This increases with the size of the side lobes:
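
The interval histogram and its variance can be computed from the hour at which each recorded max occurred. A minimal numpy sketch under the same reset convention as above; the function names are hypothetical, and ties are broken by taking the earliest hour.

```python
import numpy as np


def max_intervals(hourly, reset_hour):
    """Hours between the times of successive recorded daily maxima.

    Readings are reset just before reset_hour; hourly data start at 00:00.
    Normal is about 24 h; values near 0 or 48 h indicate a peak counted twice.
    """
    h = np.asarray(hourly, dtype=float)
    n_days = (len(h) - reset_hour) // 24
    blocks = h[reset_hour:reset_hour + n_days * 24].reshape(n_days, 24)
    # Absolute hour index at which each observation day's max occurred.
    t_max = reset_hour + 24 * np.arange(n_days) + blocks.argmax(axis=1)
    return np.diff(t_max)


def side_lobe_variance(intervals):
    """Variance of the interval distribution; it grows with the side lobes."""
    return float(np.var(intervals))
```

Evaluating `side_lobe_variance(max_intervals(hourly, r))` for each reset hour `r` from 0 to 23 gives the daily cycle of the side-lobe effect.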

Minima behave somewhat similarly, though with the afternoon effects replaced by morning. Here is the variance plot for the difference between minima. In fact doubling of minima is rarer, perhaps because the minima themselves vary less, or because the daily minimum is less peaky.

Temperature bias

So here is a plot, as a function of reset time, of

the three year average max, Tmax

the three year average min, Tmin

(Tmax+Tmin)/2, the min/max used in indices (in black)

Each is plotted relative to its mean

Each temperature should be adjusted to restore it to a standard reset time. Vose et al., quoted above, say this should be midnight. This adjustment is only really important if the time of observation changes, introducing an apparent trend. Again, Vose et al. explain how such changes did occur in USHCN. I've also plotted each individual year, just for the min/max, to show that the pattern is fairly reproducible from year to year:
So there is every reason to expect that adjustments calculated from the present hourly obs can be applied to past readings where we only know the min/max and time of obs.
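
The bias curve and the resulting adjustment can be sketched as follows. This is a minimal numpy illustration under the same reset convention as the simulations above, not the exact code used for the plots; the function names are hypothetical. Each reset hour uses a slightly different span of the record (the first few hours and the last partial day are dropped), which matters little over three years.

```python
import numpy as np


def minmax_mean_by_reset(hourly):
    """Record-average (Tmax+Tmin)/2 for each reset hour 0..23 (data start at 00:00),
    returned relative to the mean over all 24 reset hours."""
    h = np.asarray(hourly, dtype=float)
    means = []
    for r in range(24):
        n_days = (len(h) - r) // 24
        blocks = h[r:r + n_days * 24].reshape(n_days, 24)
        means.append(((blocks.max(axis=1) + blocks.min(axis=1)) / 2).mean())
    means = np.array(means)
    return means - means.mean()


def tobs_adjustment(bias, old_hour, new_hour):
    """Amount to add to min/max means read at old_hour to make them
    comparable with readings at new_hour (new_hour=0 for a midnight standard)."""
    return bias[new_hour] - bias[old_hour]
```

With `bias = minmax_mean_by_reset(hourly)`, a station that moved from a 5 pm to a 9 am reading would need `tobs_adjustment(bias, 17, 9)` added to its earlier readings to make them consistent with the later ones.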

Adjustment

The range shown here is quite large (for this one station), about 1.5°F, while the TOBS adjustment in practice was only about 0.3°F over a century. Not all stations changed their time of observation, and those that did typically changed from about 5 pm to 9 am, which has only a fraction of the full effect.