The recommendations are reproduced below:

• To use daily maximum/minimum temperature as the ‘base’ data set to which adjustments are made, with data at monthly and longer timescales derived from the daily data (adjusted where appropriate) rather than adjusted separately.

• To ensure that all detection and adjustment of inhomogeneities is fully documented, allowing reassessments to be made in the future (e.g. if new techniques are developed or previously unknown data or metadata become available).

• To carry out an objective evaluation of known methods for homogenisation/adjustment, in collaboration with the COST action;

• To establish a testbed of data for this purpose (see white paper 9);

• To seek to ensure that all sources of uncertainty are well quantified and defined.

8 comments:

Some reanalyses do utilize near-surface temperature data. Even if they do not, it is not just upper-air temperature data that influence the near-surface temperature analysis; the analysis depends on observations of any variable that affects surface temperature: clouds, humidity, wind, soil moisture, snow cover, aerosol (coming soon) and so on. So writing "upper-air temperatures (on which reanalyses are based)" is a bit misleading.

It should also be noted that homogenization methods that utilize reanalysis data tend to be based on time series of differences between the observations and the background forecasts from the reanalysis system, not on differences between the observations and the actual analyses. Because the background model tends to disperse and dilute the effect of assimilating isolated biased observations, homogenisation corrections can be derived sensibly even for observations that were among those assimilated.
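The observation-minus-background idea above can be sketched as follows. This is an illustrative toy, not any operational reanalysis-based method: a station with a constant bias before an instrument change shows up as a step in its O-B departure series, and the adjustment is the difference of mean departures across the break. The station values and break position are invented for the example.

```python
# Sketch (assumption-laden toy, not an operational method): deriving a
# homogenisation adjustment from observation-minus-background (O-B)
# departures. The background forecast dilutes the influence of any one
# biased station, so its bias survives in the O-B series as a step.

def ob_series(obs, background):
    """Time series of observation-minus-background departures."""
    return [o - b for o, b in zip(obs, background)]

def step_adjustment(diffs, break_idx):
    """Adjustment for data before a known break: mean O-B departure
    after the break minus mean O-B departure before it."""
    before = diffs[:break_idx]
    after = diffs[break_idx:]
    return sum(after) / len(after) - sum(before) / len(before)

# Hypothetical station with a +1.5 K bias before an instrument change
# at index 5; the background forecasts are assumed unbiased here.
background = [10.0] * 10
obs = [11.5] * 5 + [10.0] * 5
diffs = ob_series(obs, background)
adj = step_adjustment(diffs, 5)
print(adj)  # -1.5: add this to the pre-break observations
```

Adding the adjustment to the pre-break segment aligns it with the post-break (presumed correct) instrument, which is the sense in which corrections can be derived even for assimilated observations.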

1) "Annual data series could be too short to detect change-points with acceptable uncertainty." - In my experience, for changes whose characteristic time is at least 3 years, the use of annual variables is clearly the most effective (provided the time series is long enough to examine 3-year sections, but for surface temperature datasets this condition is always met).

2) "Potential approaches include... the use of different detection and/or adjustment methods on the same dataset." - Problems: a) The sources of errors in detection/correction are often common to different methods (e.g. an unfavourable noise configuration around the timing of an artificial inhomogeneity); b) When two or more methods give different results, there is no key for choosing the best among them. When, for example, only one result differs from two or more others, it is still not certain that the exceptional one is not the closest to the real world. Therefore I believe that arbitrary combinations of methods cannot be recommended. Instead, as a base rule, one method with the best theoretical properties and practical performance (on test datasets) should be selected. A combination of methods can be useful only when i) the joint application is well justified by the complementary characteristics of the participating methods, and ii) tests of the joint performance prove the improvement.
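Point ii) above, measuring performance on test datasets before trusting any method or combination, can be sketched as below. The detector is a deliberately simple stand-in (maximum difference of segment means), not any published OMID, and all parameters (shift size, noise level, tolerance) are illustrative assumptions.

```python
# Sketch of the testing principle: score a detection method by its hit
# rate on synthetic series with a known break position. A combination
# of methods would be scored the same way, and adopted only if its
# hit rate beats the best single method. The toy detector below is an
# assumption for illustration, not an operational algorithm.
import random

def make_series(n=100, break_at=50, shift=1.0, noise=0.5, rng=None):
    """Noise series with one step change of size `shift` at `break_at`."""
    rng = rng or random.Random(0)
    return [rng.gauss(shift if i >= break_at else 0.0, noise)
            for i in range(n)]

def detect_max_shift(series):
    """Toy detector: the split point maximising the absolute difference
    of segment means (no significance test applied)."""
    best, best_idx = 0.0, None
    for k in range(5, len(series) - 5):
        d = abs(sum(series[k:]) / (len(series) - k) - sum(series[:k]) / k)
        if d > best:
            best, best_idx = d, k
    return best_idx

def hit_rate(detector, trials=200, tol=5):
    """Fraction of trials where the detected break falls within
    `tol` positions of the true break at index 50."""
    rng = random.Random(42)
    hits = 0
    for _ in range(trials):
        idx = detector(make_series(rng=rng))
        if idx is not None and abs(idx - 50) <= tol:
            hits += 1
    return hits / trials

rate = hit_rate(detect_max_shift)
print(rate)
```

The same harness, run on a combined detector versus each component alone, gives the "tests of the joint performance" that the comment asks for.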

On lines 90-91, again I would urge for data to be flagged-and-tagged rather than ever being deleted.

On the second recommendation, of course all the code for detecting and adjusting inhomogeneities must be fully version-controlled and published, and the exact version and configuration of code, and the version of any input datasets, used to produce any derived dataset, must be recorded in the metadata of the derived dataset. For the same reason, the input datasets must always be retained, see white paper 6.
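A minimal sketch of the provenance record this comment calls for might look like the following. All field names, the repository URL, and the version strings are invented placeholders, not any existing standard; the point is only that code version, configuration, and input-dataset versions travel with the derived dataset.

```python
# Illustrative provenance metadata for a derived (homogenised) dataset.
# Every field name and value here is a placeholder assumption, not a
# standard schema: the substance is *what* is recorded, per the comment.
provenance = {
    "code_repository": "https://example.org/homogenisation-code",  # placeholder URL
    "code_version": "v2.1.3",              # exact released/tagged version
    "configuration": {                      # exact run configuration
        "detection_method": "pairwise_comparison",
        "min_segment_years": 5,
    },
    "input_datasets": [                     # retained inputs, with versions
        {"name": "raw_daily_tmax", "version": "2011-06-01"},
    ],
}
print(sorted(provenance))
```

With such a record, and the retained inputs, any derived dataset can be regenerated or reassessed when methods improve.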

There should probably be more links between this "white paper" and number 9. In particular, it seems important to test the methods thoroughly on simulated data (as described in detail there) before they are used in practice.

Also, the white paper should perhaps address the issue of continuous updating for homogenised datasets. In particular, with homogenised data (unlike raw data), the homogeneity adjustments (and therefore the data) prior to time T will, in general, change once new data arrive for some interval from T to T+dT. This is because homogenisation is nonlocal in time and it gradually becomes easier to detect heterogeneities that occurred recently before T as more data after T accumulate. What dataset updating procedure will then be followed? This should probably also be referenced in White Paper #4.
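The nonlocality described above can be made concrete with a toy example. This is not any operational update procedure: it only shows that the adjustment for a break shortly before time T is estimated from post-break data, so the homogenised values *before* T change as the post-break segment grows. The series and break position are invented.

```python
# Toy illustration of why homogenised data before time T change when
# new raw data arrive: the step-size estimate at a recent break
# sharpens (and shifts) as post-break values accumulate.

def adjustment_estimate(series, break_idx):
    """Estimated step size at a known break:
    mean after the break minus mean before it."""
    before = series[:break_idx]
    after = series[break_idx:]
    return sum(after) / len(after) - sum(before) / len(before)

# Hypothetical station series with a break at index 20.
raw = [10.0] * 20 + [10.8, 11.4, 11.0, 11.1, 10.9, 11.0]

early = adjustment_estimate(raw[:22], 20)  # only two post-break values yet
later = adjustment_estimate(raw, 20)       # after four more values arrive
print(early, later)  # the two estimates differ, so pre-break
                     # homogenised values would be revised
```

Any updating procedure for homogenised datasets therefore has to say how (and how often) such pre-T revisions are propagated, which is the question the comment raises.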

Seconding Nick here: lines 85-90. There should be no question whatsoever about deleting "data". Data should be preserved "as received" and archived. Any processing on that data has to be documented and replicable. Of course, in that process subsets of the data will be tagged and marked as unreliable. One important aspect of retaining bad data is to assist in the process of error reduction and to monitor process improvement. Some people actually work with bad data to characterize what went wrong and design systems to fix it. So, keep all the data and document the steps you use to quality control it.
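The flag-and-tag principle both commenters urge can be sketched in a few lines. The record layout and the range limits are illustrative assumptions, not any archive's actual format: the point is that QC attaches flags while the as-received value is never modified or deleted.

```python
# Minimal flag-and-tag QC sketch: the raw value is kept verbatim and a
# flag is appended, so bad data remain available for error analysis.
# Field names and thresholds are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class Observation:
    station: str
    value: float                              # as received, never modified
    flags: list = field(default_factory=list) # QC annotations accumulate here

def qc_range_check(obs, lo=-80.0, hi=60.0):
    """Tag (do not delete) temperatures outside a plausible range in deg C."""
    if not (lo <= obs.value <= hi):
        obs.flags.append("range_check_failed")
    return obs

bad = qc_range_check(Observation("TEST001", 999.9))
ok = qc_range_check(Observation("TEST002", 21.5))
print(bad.value, bad.flags)  # 999.9 ['range_check_failed']
print(ok.value, ok.flags)    # 21.5 []
```

Downstream products simply filter on the flags, while anyone studying what went wrong still has the original 999.9.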

1) In 2003 one of my colleagues asked me to test change-points in the series of Hess-Brezowsky macrocirculation types [of central Europe]. At that time I found that there is hardly any reliable information in the climatological literature about the performance of detection methods, due to the lack of well-elaborated test examinations. I then worked in Hungary and possessed the 20th-century monthly temperature series from some fifteen Hungarian sites. I used that observed database to develop an efficiency test for the objective (= statistical) methods of inhomogeneity detection [OMIDs]. Fifteen OMIDs were involved in the tests, which were carried out in the following way:

i) Each OMID was run on the observed dataset, and the statistical characteristics of the detected inhomogeneities [IHs] were retained. The total number of statistical characteristics was 204.

ii) Test datasets were constructed, each with 10 thousand artificial series of 100 elements. The OMIDs were applied to them in the same way as for the observed dataset. The set of 204 characteristics of IHs was calculated for each artificial dataset, and the properties of these 204 characteristics were matched to those of the observed dataset through a large number of iterative modifications of the simulation process for the artificial time series.

Ultimately I found that the real properties of IHs can be reconstructed only when a large number of short-term, platform-like IHs are included in the test series. The results were first presented in 2004 at the ECAC (European Conference on Applied Climatology) in Nice. A detailed description of the method can be found on the CD of the 5th Seminar for Homogenisation and Quality Control (Domonkos, 2006), but, unfortunately, it has not yet been published in a peer-reviewed journal.
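The kind of artificial series described above, containing both a lasting break and short-term "platform-like" IHs (temporary shifts that later revert), can be sketched as follows. All magnitudes, durations, and positions are illustrative assumptions, not the parameters of Domonkos (2006).

```python
# Sketch of an artificial 100-element test series with one lasting
# break plus one platform-like inhomogeneity (a temporary shift that
# reverts). Parameters are invented for illustration only.
import random

def artificial_series(n=100, seed=0):
    rng = random.Random(seed)
    series = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Lasting break: step change from a random position to the end.
    pos = rng.randrange(10, n - 10)
    step = rng.choice([-1.0, 1.0])
    for i in range(pos, n):
        series[i] += step

    # Platform-like IH: a short temporary shift (3-9 elements) that
    # reverts afterwards - the feature found essential in the tests.
    start = rng.randrange(5, n - 15)
    for i in range(start, start + rng.randrange(3, 10)):
        series[i] += rng.choice([-1.5, 1.5])

    return series

s = artificial_series()
print(len(s))  # 100
```

In the efficiency test described above, the 204 IH characteristics detected in such simulated series would be compared with those detected in the observed dataset, and the generator iterated until they match.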

2) Later, the COST ES0601 adopted neither my methodology nor its results, likely because the above-described examination has two shortcomings: a) the test datasets directly contain relative time series (i.e. differences between an imaginary candidate series and an imaginary reference series), so the problems related to time series comparison in homogenisation procedures are completely excluded from the examination; b) the examination was made for a selected small region (Hungary), so the spatial validity of the results is questionable. Notwithstanding, here and now I suggest again that comparing the characteristics of detected IHs between observed and artificial datasets is a necessary element of obtaining a realistic picture of the performance of OMIDs in real (observed) datasets.

3) In 2009 the COST ES0601 announced a comparison of OMIDs using its own purpose-built benchmark dataset. I decided to develop my own OMID, relying on my experience with the efficiency-test results from the 15 OMIDs. The new method is ACMANT (Adapted Caussinus-Mestre detection Algorithm for Networks of Temperature series). As its name suggests, the method is not entirely new, but a further development of the Caussinus-Mestre method. This development was successful: ACMANT performed among the best OMIDs. In the last half year I have modified some segments of the procedure, and according to my own calculations with the COST benchmark, ACMANT now performs better than any other OMID examined so far. (Obviously, this should be checked with blind tests.)

I agree with the idea described on page 6, lines 256-281. In my experience of creating a fine-resolution daily gridded precipitation dataset over Asia, quality control (QC) was harder than negotiating the data collection. So we are collaborating with local researchers and trying to work together. NMSs of developing countries often require capacity building, so it was a worthwhile activity to give seminars and to pass them simple source code (or executables) for gridding/quality control.

Currently available databases, such as GSOD, GHCN-D and NCAR-dds, are an excellent treasure for many researchers. However, their quality control is not perfect. We are happy to give feedback to them (NCDC or NCAR) if we find any inconsistency with other data sources, especially those we could fortunately obtain from NHMs. There are many unit-of-measure errors (factor 10, mm/inch conversion), and those are relatively easy for local specialists to detect. Hence, I would be happy if such a data center, or the data bank we are discussing now, would receive such inputs. We do not want to blame the devoted data centers and data providers, but our detections should not be neglected.
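The two easy-to-detect error classes mentioned above (a misplaced decimal and an inch/mm mix-up) can be screened mechanically when a trusted comparison value exists. The function below is a hedged sketch: the tolerance, the flag names, and the idea of a single reference value are assumptions for illustration, not any data centre's actual QC.

```python
# Sketch of a unit-of-measure screen: a value reported in inches
# instead of mm is roughly a factor 25.4 off; a misplaced decimal is
# a factor 10 off. Comparing against a trusted reference value (e.g.
# a nearby record obtained from the NMS) exposes both. Tolerance and
# flag names are illustrative assumptions.

def unit_error_flag(value, reference, tol=0.15):
    """Return a flag naming the likely unit error, or None."""
    if reference == 0:
        return None
    ratio = value / reference
    for factor, flag in ((25.4, "inch_as_mm"), (10.0, "factor_10"),
                         (1 / 25.4, "mm_as_inch"), (0.1, "factor_0.1")):
        if abs(ratio / factor - 1.0) < tol:
            return flag
    return None

print(unit_error_flag(254.0, 10.0))  # inch_as_mm  (25.4x too large)
print(unit_error_flag(120.0, 12.0))  # factor_10   (misplaced decimal)
print(unit_error_flag(12.0, 12.3))   # None        (within tolerance)
```

Flags like these are exactly the kind of input a data bank could accept from local specialists and feed back to the source archives.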