Monday, June 29, 2015

A short time ago, while writing about the discrepancy between the NCEP/NCAR index and TempLS mesh, I made a temporary modification to the daily WebGL plot to show monthly averages. I've now formalised this. There is a black bar at the top of the date selector, and if you mouseover that, it will offer monthly averages. They are all there now, including the current month (to date). I expect this will be kept up to date.

Friday, June 26, 2015

In my previous post, I began a three part description of the new version of TempLS. That part was devoted to the collection of land and SST data into a temperature data array.

The next part is about forming the other basic array for the calculation - the weighting array, with one entry for each temperature measurement. I've described the principles involved in various posts (best summarized here). A spatial average (for a period of time, usually a month) requires numerical spatial integration. The surface is divided into areas where the temperature anomaly can be estimated from data. If the subdivisions do not cover the whole area, the remainder will be assumed (by the arithmetic) to have the average anomaly of the set that does have data. The integral is formed by adding the products of the estimates and their areas. That makes a weighted sum of the data.
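As a minimal illustration of the principle (hypothetical areas and anomalies, not TempLS code):

```r
# Three regions with known areas and anomaly estimates. The spatial average
# is the area-weighted sum divided by the total covered area; regions with
# no data are simply omitted, which implicitly assigns them the average
# anomaly of the covered regions.
area    <- c(2.1, 1.4, 0.8)   # areas of subdivisions with data
anomaly <- c(0.5, 0.3, 0.7)   # estimated anomaly in each
avg <- sum(area * anomaly) / sum(area)
```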

Traditionally, this was done with a lat/lon grid, often 5°×5°. An average is formed for each cell with data, in each month, and the averages are summed weighted by the cell area, which is proportional to the cosine of latitude. That is not usually presented as an overall weighted average, but it is. HADCRUT and NOAA use that approach. TempLS originally did that, so that the weight for each station was the cell area divided by the number of stations reporting that month in that cell. Effectively, the stations are allocated a share of the area in the cell.

Of course, this leaves some cells with no data. Cowtan and Way explored methods of interpolating these areas (discussed here). It makes a difference, because, as said, if nothing is done, those areas will be assigned the global average behaviour, from which they may in fact deviate.

Recently, I have been using in preference an irregular triangular mesh (as shown right), of which the stations are the nodes. The weighting is made by assigning to each station a third of the area of each triangle that they are part of (see here). This has complete area coverage. I have continued to publish results from both the grid and mesh weighting.

The code below shows how to form both kinds of weights. Note that the weighting does not use the actual measurements, only their existence. Stations with no result in a month get zero weight. There is no need for the weights to add to 1, because any weighted sum is always divided by a normalising sum.

The stage begins by loading saved data from the previous part. At this stage the land and SST data are concatenated, and a combined inventory made, from which the lat/lon of each station will be needed. R[] are filenames of saved data, consistent with the job control.

It's just a matter of assigning a unique number to each cell, making a vector with an entry for each of those numbers, and going through each month/year, counting. In R it is important to use fairly coarse-grained operations, because each command has an interpretive overhead.
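A minimal sketch of that counting (my own illustration, not the actual TempLS code): each cell gets a unique number, and for each month a single tabulate() call counts the reporting stations per cell, avoiding a slow station-by-station loop.

```r
cellnum  <- c(1, 1, 2, 3, 3, 3)   # cell index of each of 6 stations
cellarea <- c(1.0, 0.8, 0.6)      # areas of the 3 cells
# did station i report in month j?
ok <- cbind(c(T,T,T,F,T,T), c(T,F,F,T,T,T))
w <- matrix(0, nrow = 6, ncol = 2)
for (j in 1:2) {
  cnt <- tabulate(cellnum[ok[,j]], nbins = 3)    # stations per cell, month j
  # each reporting station gets its cell area shared among reporters
  w[ok[,j], j] <- (cellarea / cnt)[cellnum[ok[,j]]]
}
```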

This is more complicated. The mesh is actually the convex hull of the points on the sphere, which the R geometry package calculates via convhulln(). This is quite time consuming (about 20 mins on my PC), so I save the results in a "w_.." file. The program reads the most recent such file, if any, and for months where the pattern of reporting stations matches, uses the stored weights, which is much faster. Also the lat/lon of stations are converted to points on a unit sphere.

So we enter the ii loop over months with x and a w array waiting to be filled. If the stored w doesn't match, the convex hull is formed. Then A, the vector of triangle areas, is found. Note that convhulln does not consistently orient triangles, so the abs() is needed. The areas are in fact determinants, again calculated in a way to minimise looping.
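A sketch of this step (my own minimal version: random points rather than stations, and convhulln() from the geometry package; the cross-product norm used here is equivalent to the absolute determinant, so triangle orientation doesn't matter):

```r
# install.packages("geometry")  # provides convhulln(), a Qhull wrapper
library(geometry)
set.seed(1)
p <- matrix(rnorm(60), 20, 3)
p <- p / sqrt(rowSums(p^2))   # project 20 random points onto the unit sphere
m <- convhulln(p)             # convex hull: each row indexes one triangle

# Flat (chordal) triangle areas: half the norm of the cross product of two
# edge vectors, computed for all triangles at once, with no looping.
e1 <- p[m[,2],] - p[m[,1],]
e2 <- p[m[,3],] - p[m[,1],]
cr <- cbind(e1[,2]*e2[,3] - e1[,3]*e2[,2],
            e1[,3]*e2[,1] - e1[,1]*e2[,3],
            e1[,1]*e2[,2] - e1[,2]*e2[,1])
A <- 0.5 * sqrt(rowSums(cr^2))   # one area per triangle
```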

Then it's just a matter of assigning area sums to nodes. This isn't so easy without fine looping, since each column of the m array of triangles has multiple occurrences of nodes. So in the jj loop, I identify a unique set of occurrences, add those areas to the node score, remove them, and repeat until all areas have been allocated. The last part just saves the newly calculated weights for future reference.
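The allocation can be sketched like this (hypothetical triangle list and areas; the pattern of repeatedly taking one occurrence of each node per pass mirrors the jj loop described above):

```r
# Each node receives a third of the area of every triangle containing it.
# Rather than looping over triangles, each pass takes at most one occurrence
# of each node per column, adds those areas, removes them, and repeats.
m <- rbind(c(1,2,3), c(1,3,4), c(2,3,4))   # 3 triangles on 4 nodes
A <- c(0.9, 0.6, 0.3)                      # their areas
w <- numeric(max(m))
for (j in 1:3) {                # the three columns of m
  col <- m[, j]; a <- A / 3
  while (length(col) > 0) {
    u <- !duplicated(col)       # one occurrence of each node this pass
    w[col[u]] <- w[col[u]] + a[u]
    col <- col[!u]; a <- a[!u]
  }
}
# every triangle's area is fully allocated, so sum(w) equals sum(A)
```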

The last code in this step just saves the result for the calculation step 4.

Wednesday, June 24, 2015

TempLS is my R program that I use for calculating global average temperatures. A recent review of the project is here. A few weeks ago I posted an iterative calculation sequence which is the basis of the program. I noted that the preliminaries for this were simple and mechanical. It requires the data just organised in a large array, and a set of weights, derived from station locations.

I've never actually released version 2.2 of TempLS, which is what I have been using. I did a documented release of v2.1 here. Meanwhile, the program continued to accumulate options, and got entwined with my automatic data acquisition system. So I thought it would be useful to create a simplified version with the basic functionality. I did post a simplified version once before, but that was land only, and with more primitive weighting.

So I'm planning to post a description with three parts. The first will be data gathering, the second weighting, and third is the calculation, which has already been covered. But there may be results to discuss. This is the first, which is about the data gathering. There is no fancy maths, and it is pretty boring, but useful for those who may use it.
The program is in four stages - GHCN, SST, weights and calc. The first two make up the data organising part, and are described here. Each stage stores its output, which the next reads, so can be run independently. I have a simple job control mechanism. You set a variable with four characters, the default is
job="UEGO"

There is one character for each stage; the current options, by position, are:

U,A   Unadjusted or Adjusted GHCN
D,E   ERSST Ver 3b or V4
G,M   Grid or mesh-based weighting
O,P   Output, without or with graphics

Each character can be lower case, which means skip that stage, but use the appropriate stored result. The point of this brevity is that the letters are used to name stored intermediate files, so the process can be re-started at any intermediate stage.
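The job string might be decoded along these lines (my own sketch; only job and the R[] filenames come from the actual program, the other names are mine):

```r
job <- "UEGO"
ch <- strsplit(job, "")[[1]]        # one character per stage
do_stage <- ch == toupper(ch)       # upper case: run the stage; lower: skip
                                    # it and reuse the stored result
R <- paste0(tolower(ch), ".sav")    # stored-file names, e.g. "u.sav"
```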

The program should be run in a dedicated directory - it will create subdirectories as needed. It starts with a section that defines some convenience functions, interprets the job code to make the names of the intermediate files (I use .sav for binary saved files), and defines the year range yr. This is hard-coded, but you can alter it.

It uses R facilities to download, unzip and untar. Then it finds the resulting files, data and inventory, which have a date-dependent name, and copies them into the current directory as ghcnu.dat etc. Then it removes the now-empty directory (else they accumulate). Download takes me about 30 secs. If you are happy with the data from before, you can comment out the call to this.

The next routine reads the data file into a big matrix with 14 rows. These are the station number, year, and then 12 monthly averages. It also reads the flags, and removes all data with a dubious quality flag. It returns the data matrix, converted to °C.
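The flag step can be sketched as follows (a tiny hypothetical example, not the actual reader; GHCN v3 stores monthly means in hundredths of a degree, with -9999 for missing):

```r
# d: 14 x nrec matrix (station number, year, 12 monthly values in 0.01 C);
# q: 12 x nrec matrix of quality flags, " " meaning no problem
d <- rbind(c(101, 102),                       # station numbers
           c(1990, 1990),                     # year
           matrix(c(-9999, 512, rep(100, 22)), 12, 2))
q <- matrix(" ", 12, 2)
q[1, 2] <- "S"                 # flag one dubious value
v <- d[3:14, ]
v[q != " "] <- NA              # remove data with a dubious quality flag
v[v == -9999] <- NA            # missing-value marker
d[3:14, ] <- v / 100           # convert to degrees C
```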

The inventory is read, and the station numbers in d are replaced by integers 1:7280, being the corresponding rows of the inventory. iv is a reduced inventory dataframe; lat and lon are the first two columns, but other data is there for post-processing.

Then it is just a matter of making a blank x matrix and entering the data. The matrix emerges ns × nm, where ns is number of stations, nm months. Later the array will be redimensioned (ns,12,ny). Finally d and iv are saved to a file which will be called u.sav or a.sav.
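A minimal sketch of the fill-and-reshape (station ids and values are hypothetical):

```r
# Replace station ids by inventory row numbers, fill an ns x nm matrix,
# then redimension to ns x 12 x ny (station x month x year).
iv_id <- c(50136, 50200, 50353)        # inventory station ids
ns <- length(iv_id); yr <- 1990:1991; ny <- 2
x <- matrix(NA, ns, 12 * ny)
# one data record: station 50200, year 1991, twelve monthly values
st <- match(50200, iv_id)              # corresponding row of the inventory
x[st, (1991 - yr[1]) * 12 + 1:12] <- 1:12
dim(x) <- c(ns, 12, ny)                # redimension to (ns, 12, ny)
```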

ki is d,e for 3b,4; this is just making filenames for the cases. ERSST comes as a series of 89 x 180 matrices (2°), 10 years per file for 3b, and 1 year for v4. This is good for downloading, as you only have to download a small file for recent data. That is a grid of 16020 (some land), which is more than we need for averaging. SST does not change so rapidly. So I select just every second lat and lon. In v2.2 I included the missing data in a local average, but I'm not now sure this is a good idea. It's messy at boundaries, and the smoothness of SST makes it redundant. So here is the loop over input files

The files, when downloaded, are stored in a subdirectory, and won't be downloaded again while they are there. So to force a download, remove the file. I use cURL to download when there is a new version, but that requires installation. The first part of this code gathers each file contents into an array v. The -9999 values are converted to NA, and also anything less than -1°C. ERSST enters frozen regions as -1.8°C, but this doesn't make much sense as a proxy for air temperature. I left a small margin, since those are probably months with part freezing (there are few readings below -1 that are not -1.8).
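A sketch of the masking, with hypothetical values:

```r
# Anything below -1 C is removed: this catches both the -9999 missing
# marker and the -1.8 C that ERSST reports for frozen ocean, with a small
# margin for part-frozen months.
v <- c(12.3, -9999, -1.8, -0.5, 3.1)
v[v < -1] <- NA
```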

Now to wrap up. A lat/lon inventory is created. Locations with less than 12 months data in total are removed, which includes land places. Then finally the data array x and inventory are stored.

iv=cbind(rep(i1*2-90,90),rep((i2*2+178)%%360-180,each=45)) # make lat/lon -88:88, 2:358
m=length(x)/90/45
dim(x)=c(90*45,m)
rs=rowSums(!is.na(x))
o=rs>12 # select only locations with more than 12 months of data in total
iv=iv[o,] # the matching lat/lon
x=x[o,] # discard land and unmeasured sea
str(x)
ns=sum(o) # number of locations kept
rem=ncol(x)%%12
if(rem>0)x=cbind(x,matrix(NA,ns,12-rem)) # pad final year to 12 months
str(x)
dim(x)=c(ns,12*((m-1)%/%12+1)) # redimension to whole years
save(x,iv,file=R[2])
} # End of stage 2

So I'll leave it there for now. The program has the potential to create a.sav, u.sav, d.sav and e.sav, each with a data array x and an iv. A simple task, but in lines it is more than half the program. Stage 3, next, will combine two of these and make a weight array.

Sunday, June 21, 2015

NOAA is now using their new V4 ERSST, and so should I. It has come along at a time when I am preparing my own V3 of TempLS (more below), so I will make a double transition.

But first I should mention an error in my earlier processing of V3b, noted in comments here. ERSST3b came in decade files; that helped, because decades before the present did not change and did not require downloading. So I collected them into a structure, and with each update to the current decade (every month) merged that with the earlier decades.

However, I misaligned them. Most climate data uses strict decade numbering (1991-2000 etc), but the ERSST files run 2000-2009. I got this wrong in the earlier data, creating a displacement of one year. Fixing this naturally gave better alignment with other datasets.

I'll describe the new TempLS version in detail in coming days. V 2.2 of TempLS had become unwieldy due to accumulation of options and special provisions for circumstances that arose along the way. I have made a new version with simplified controls, and using a simple iterative process described here. It should in principle give exactly the same answers; however I have also slightly modified the way in which I express ERSST as stations.

The new version is an R program of just over 200 lines. Of course, when simplifying there are some things that I miss, and one is the ability to make spherical harmonics plots (coming). So for a while the current reports will lag behind the results tabled above. That table, and the various graphs on the latest temperature page, will now be using ERSST v4.

Below the fold, I'll show just a few comparison results.
I've used running means to smooth out monthly noise, and focussed on the period since 1990. At some stage I might say more on the V4 changes, but that has been covered extensively elsewhere. I show the old results before fixing the missing year problem, then the V3b results after fixing, still using V2.2. Then there are results using the new ERSST v4 and new TempLS V3. Ideally I would separate the effects of the new code and new ERSST, but getting V3b working with ERSST v4 would take a while, and the combined effect makes very little change.

So here are the grid weighted results. The effect of the year error is clear, but there is little change going from ERSST 3b and TempLS 2.2 to ERSST 4 and TempLS 3.

Friday, June 19, 2015

NOAA has a new version, based on the new ERSST4. This has prompted me to make a change in the NOAA data that I use. They publish two sets; I have been using MLOST, which is the traditional one based on 1961-90 anomalies. But they also publish a version based on 1901-2000, and this is what they use for their climate summary announcements. Updates to MLOST have been lagging, so it is time to change. The different basis shouldn't affect the usual plots on the latest data page, or the trendviewer, but will affect the tabulated numbers.

So the May number was considerably higher than the previous May, and as I expected, higher than April, unlike GISS. In fact, it was quite in line with the NCEP/NCAR index.

I see there is a post at WUWT suggesting that the new record is somehow an artefact of the new version. As I commented here, the versions differ very little in year to year changes, by month. It is very unlikely that the old version would not have shown a similar rise, and in both versions, May 2014 was hotter than any previous May.

Thursday, June 18, 2015

In my previous post, I noted that comparing data on different anomaly bases brought in discrepancies because of vagaries of monthly averages over the base period. These are actually significant enough to detract from the use of the NCEP/NCAR index (base 1994-2013) in predicting GISS (1951-80) and NOAA (1961-90) indices.

So I have added to that table, below the NCEP monthly averages, recent monthly values adjusted to the earlier basis periods by adding the 1994-2013 month averages of GISS and NOAA anomalies. This expresses NCEP on those scales, although strictly, it is unique to those indices. Adding HADCRUT averages would give slightly different 1961-90 values.
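In code, the adjustment is a single addition per month. Using May values back-calculated from the tables in this post:

```r
# NCEP anomaly (1994-2013 base) plus the 1994-2013 mean of GISS May
# anomalies expresses NCEP on the GISS 1951-80 base.
ncep_may      <- 0.275    # NCEP/NCAR May anomaly on its own 1994-2013 base
giss_base_may <- 0.498    # GISS May average over 1994-2013
giss_adj_may  <- ncep_may + giss_base_may   # 0.773, the tabulated "GISS adj"
```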

I have also shown the actual GISS/NOAA values in the right column. The current tables are here:

GISS adj
Month   NCEP    GISS
Jun     0.835   NA
May     0.773   0.71
Apr     0.702   0.71
Mar     0.855   0.84
Feb     0.818   0.82

NOAA adj
Month   NCEP    NOAA
Jun     0.618   NA
May     0.558   NA
Apr     0.461   0.469
Mar     0.573   0.564
Feb     0.545   0.575

Except for GISS May, the correspondence is very good.

Update. I calculated a whole lot more values, in a table below the fold. It looks as if the last few months aren't typical. The correspondence is still reasonable, but discrepancies as observed April-May are not unusual.

It's easy to think of changing the base period for calculating temperature anomalies as having the effect of simply shifting the values up or down by a fixed amount. And for annual averages, this is true. But for monthly averages, it isn't, quite. Suppose, like HADCRUT, you express anomalies to a base 1961-90. That means that all Januarys are expressed relative to the January average for that time, etc. If you shift to a different period, those monthly bases will shift relative to each other. That is one reason for using a longish period (30 years). A long anomaly average will fluctuate less from month to month.

This effect was noted by commenter Olof, in relation to the April-May discrepancy between NCEP/NCAR and GISS of about 0.1°C. My NCEP index uses a base period 1994-2013 (where data is good); GISS uses 1961-90. If you extract GISS averages over 1994-2013, you get this:

Base period  Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
1994-2013    0.507  0.547  0.568  0.533  0.498  0.515  0.504  0.519  0.528  0.548  0.58   0.503

There is an expected drop from 0.568°C to 0.533°C of GISS relative to NCAR/NCEP on the latter's base period. In shifting to the later base, you would need to subtract these values for each month. That would raise May relative to April, and even more relative to March. This means that, because it seems Mays were actually rather cool during this time, the NCEP index is always going to tend to give a higher anomaly.
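A small R check of this effect, using the April and May entries from the tables in this post:

```r
# Shifting anomalies to a new base subtracts each month's average over the
# new base period; because those averages differ by month, months move
# relative to one another.
base <- c(Apr = 0.533, May = 0.498)          # GISS averages over 1994-2013
anom_1951_80   <- c(Apr = 0.71, May = 0.71)  # GISS on its native base
anom_1994_2013 <- anom_1951_80 - base        # Apr 0.177, May 0.212
# May rises by 0.035 relative to April on the 1994-2013 base
```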

This clearly explains only part of the discrepancy. But I thought it was worthwhile calculating all of the common indices on all base periods likely to be used, where possible (ie they have data). There is a table below the fold.
Here it is. You can generally recognise the native base period by a string of zeroes (for those periods I zeroed rounding discrepancies). BEST uses 1951-80, but for some reason doesn't give exactly zero. RSS uses 1979-98, which I haven't included.

Wednesday, June 17, 2015

For the first time since I have been tracking both NCEP/NCAR and surface temperatures (TempLS), there has been a substantial disagreement. The TempLS Mesh result showed May about the same temperature as April, and this was backed up by GISS. But NCEP/NCAR suggested a warmer month, comparable to February.

There was a new factor in the mix - GHCN has a new version 3.3. But its changes are mainly methodological, and made little difference to previous months. Both TempLS and GISS use that version. There is also a new version of ERSST, v4, but neither TempLS nor GISS (AFAIK) use that. TempLS will soon.

The difference does not seem very robust. TempLS Grid did show a significant rise, of about 0.05°C, which does indeed match February for that measure. This is working from the same data but with different (grid-based) weighting with poorer coverage. Since NOAA and HADCRUT share that characteristic, I think they might show May somewhat warmer.

HADSST3 also showed a rise in SST. So I wondered: if so much is rising, there must be something one could identify in going from April to May which would particularly affect the GISS/TLS indices. So, below the fold, I'll show a map and table.
Here is an active WebGL plot of the differences, May anomaly - April, by stations (red dots) with readings in both months. The spacing between is linearly interpolated and shaded. The color scale is highly non-linear, with the extremes being quite large.

The cool spots are in the Arctic, Antarctic, and particularly NE Siberia. The ocean warms by a small, fairly uniform amount. The colors don't give a particularly good quantification of the temperature differences at the extremes. Here is a table of the top 10 cooler GHCN locations, by anomaly difference:

May-Apr   Latitude   Longitude   Name
-8.19     80.62      58.05       GMO IM.E.T.
-8.08     79.5       76.98       OSTROV VIZE
-6.26     81.6       -16.67      NORD ADS
-5.82     77.72      104.3       GMO IM.E.K. F
-4.82     -68.58     77.97       DAVIS
-4.42     49.65      94.4        BARUUNTURUUN
-4.38     36.32      -119.64     HANFORD 1 S
-4.24     73.5       80.4        OSTROV DIKSON
-4.24     -67.6      62.87       MAWSON
-4.16     -78.45     106.87      VOSTOK

It's heavily weighted toward N and NE Siberia. I've added a facility, which I'll formalise, to the NCEP/NCAR map. The table on the right allows you to choose a day, by mousing over and tracking what it says just NE of the globe - click when you have what you want. It now allows you to choose the 0'th of the month, for Jan-May 2015, and you get the map of the monthly average anomaly. Selecting 2015-5-0, I get the May average map. Here's a plot of the Arctic region:

The reanalysis certainly shows cold anomaly in the Arctic, but nothing special in NE Siberia, and it was cold there in April too. That seems to be the main difference between GHCN and reanalysis.

Update. Here's another interesting table. This time I've weighted the anomaly differences to show their contribution to the TempLS sum. So when it says Qiqihar -0.00149, that means Qiqihar contributed that amount to the May-April anomaly difference. The weighted order is different to the absolute order, and the amounts are fairly small. The Antarctic stations now look more significant than the Siberian ones. The discrepancy between NCEP and GISS/TLS is of order 0.05°C, so none of these, if changed, would make more than 1/30 of that.

Tuesday, June 16, 2015

GISS has reported an anomaly average of 0.71°C for May 2015, the same as for April. This is in agreement with TempLS mesh, but in contrast to the rise in the troposphere measures RSS and UAH.

I was surprised by the TempLS result, since the NCEP/NCAR index had indicated a much warmer result. HADSST3 also rose in May. But the GISS result essentially confirms it. It coincides with the new version 3.3 of GHCN, which would have influenced both indices. But that change made little difference to previous months. I am planning to post a study of the anomaly differences between April and May.

TempLS Grid also rose by about 0.05°C, so I think May rises in HADCRUT and NOAA are likely.

Below the fold is the customary comparison between GISS and TempLS.
Here is the GISS map for May:

Wednesday, June 10, 2015

GHCN Data was delayed slightly for May by the release of Version 3.3. When data became available, it was fairly complete. It showed a slight drop of 0.01°C from April (0.628 to 0.618). Satellite measures rose considerably.

Update 12 June. I'm still uncertain of this result. GHCN seems unsettled; according to my count, the most recent file has about 300 fewer stations for May than the day before.

This is at variance with the reanalysis data, which suggested a much warmer month. I wondered if V3.3 had an effect, but previous months were much as before. I'll watch the next few days for any changes, but as I said, most of the data is in. So May is looking more like April (cool for 2015) than February.

Details are in the report here. N Russia and Alaska had very warm spots; the poles were cool. The Arctic coolness was reflected in the reanalysis results. Generally the coolness was due to land; SST rose, as reflected by HADSST3.

Thursday, June 4, 2015

I posted about a gadget for displaying NCEP/NCAR based average temperatures for user-selected Arctic regions. I have now embedded it in the maintained data page, so it will update whenever the other NCEP data does - usually daily. Operating guidance is at the original post.

Commenter Nightvid asked about including past years, and I'll do that sometime. It would overload the page to include the data by default, so I need a user-initiated download scheme.

I'm thinking of a similar gadget for the whole Earth. The current one is for absolute temperatures, which is OK for special regions like the Arctic. And it would be OK for, say, the NINO regions, but less good for continents. The alternative is anomalies, which don't show the seasonal effect. It's a dilemma.