Tuesday, December 18, 2012

Even though we documented changes from the beta1 release to
beta2, did we actually change the overall results of the databank? Well, there
was a noticeable drop in the number of stations over time, especially over the past 50 years:

However, this does not mean that we are losing important
stations. It was discovered through Nick Stokes’ blog that there may have been
duplicates within the beta1 release. After some analysis, we tweaked the
algorithm to remove these duplicates. As a result the station count has
dropped, but we still have many more stations than the current
operational product, GHCN-M version 3. In the end, the
changes from beta1 to beta2 did not make a major difference to the annual
global anomalies.

We are still in beta, but we are pushing forward toward a
version 1.0.0 release soon!

Monday, December 10, 2012

If you were paying attention to an earlier post on characterizing the first beta release, you will have noted that the databank timeseries behavior is subtly different from that of the 'raw' GHCNv3.

The early period record is slightly cooler than the estimates from GHCNv3 while the last decade is warmer than GHCNv3. The net impact is to increase the apparent trend. This pattern is present in all the merge variants to a greater or lesser degree. This raises the logical question of why this difference arises. Is it because the databank's larger number of stations samples areas of the globe previously unsampled in GHCNv3, areas which behaved differently from the restricted GHCNv3 sample? Or is it down to additional station sampling in areas already sampled by GHCNv3? And if so, why? The two graphs below do the obvious thing and split it out simply by averaging over gridboxes present in both and those present only in the databank. (There is a much smaller population of gridboxes present in v3 but not in the databank, which is far too small to have a material impact on the global estimates being considered here.)

With GHCNv3 gridbox sampling (concentrate on (spot the?) difference between red and blue)

New gridboxes.

So, most of the difference appears to reflect better sampling of regions already sampled. The question of why, and what impact it has on homogenization efforts, is 'future work' ... and is why we now need multiple groups to take up the challenge of creating new data products from the databank.
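
For readers who want to try a similar split on their own gridded series, here is a minimal Python sketch of the comparison described above: one average over gridboxes present in both datasets, and one over gridboxes present only in the databank. It is not the code used to produce the graphs. The function names, the dictionary layout (gridbox centre as a `(lat, lon)` key, anomaly as the value, for a single time step), the cosine-latitude weighting, and the toy numbers are all assumptions made for illustration.

```python
import math

def weighted_mean(anoms, boxes):
    """Cosine-latitude weighted mean of anomalies over the given gridboxes.

    The weighting scheme is an assumption, not taken from the databank code.
    """
    boxes = list(boxes)
    if not boxes:
        return None  # no gridboxes in this category
    weights = [math.cos(math.radians(lat)) for lat, _lon in boxes]
    total = sum(w * anoms[box] for w, box in zip(weights, boxes))
    return total / sum(weights)

def split_by_sampling(databank, ghcn):
    """Average separately over shared gridboxes and databank-only gridboxes."""
    common = databank.keys() & ghcn.keys()    # sampled by both datasets
    new_only = databank.keys() - ghcn.keys()  # sampled only by the databank
    return {
        "databank_common": weighted_mean(databank, common),
        "ghcn_common": weighted_mean(ghcn, common),
        "databank_new": weighted_mean(databank, new_only),
    }

# Made-up anomalies (deg C) keyed by gridbox centre (lat, lon).
databank = {(0.0, 2.5): 0.4, (60.0, 2.5): 0.8, (0.0, 7.5): 0.1}
ghcn = {(0.0, 2.5): 0.2, (60.0, 2.5): 0.6}

result = split_by_sampling(databank, ghcn)
for name, value in result.items():
    print(name, round(value, 3))
```

Comparing `databank_common` with `ghcn_common` isolates the effect of extra stations inside gridboxes that both datasets sample, while `databank_new` shows the behavior of the newly sampled gridboxes on their own.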

Thursday, December 6, 2012

Today, we have released our second beta version of the global land surface databank. This update includes some changes that were made in response to comments on this very blog, along with a few minor tweaks.

The beta2 release can be found here: ftp://ftp.ncdc.noaa.gov/pub/data/globaldatabank/monthly/stage3/. Within that directory one can find all the data and code used, along with some graphics depicting the results of all the merge variants. A technical description of the merge program (similar to beta1) is also provided, along with a new file documenting changes from beta1 to beta2.

Updated lookup table to determine whether a candidate station is merged, unique, or withheld after a data comparison is made

The original merging methodology can be found here, and a description of the changes from beta1 to beta2 here.

The deadline has passed for new data to be added for an official version 1 release. However, there is still plenty of time to provide feedback on all the methodologies used in constructing the databank. Your comments have helped us so far, and we welcome any more that may arise.