Abstract:
Analysis of temperature extremes over time requires daily maximum and minimum temperature data from stations with records of sufficient length, quality, completeness, and temporal homogeneity. Homogeneity of the daily temperature record is an especially difficult challenge due to stations experiencing varying degrees of change over time in location, instrumentation, observing practices, and siting ... conditions.

The DayRec interface uses daily maximum temperature (Tmax) and minimum temperature (Tmin) observations from the National Climatic Data Center's (NCDC) Global Historical Climatology Network-Daily (GHCN-Daily) database (Durre et al. 2010; Menne et al. 2012). As the name implies, GHCN-Daily contains data from countries around the globe, including thousands of stations in the U.S. A special subset of these is the 1218 stations of the U.S. Historical Climatology Network (USHCN) (Menne et al. 2009), which has been used as the main dataset for monitoring U.S. climate since the 1980s. The periods of record for USHCN stations vary somewhat, but most extend from the early 1900s through the current year. Most USHCN stations are located in non-urbanized areas and are operated by unpaid cooperative observers as part of the National Weather Service's Cooperative Observer Program (COOP). Compared to city or airport stations (often referred to as "first-order stations"), these COOP stations have experienced fewer significant station moves and are considered more homogeneous over time, although not perfectly so. Still, thanks to rigorous quality assurance efforts at NCDC, these daily station records are considered to be the best available for analyzing and monitoring changes in extremes for the U.S. Station history information (metadata) can be obtained from NCDC's Historical Observing Metadata Repository (HOMR).

In deciding which stations were suitable for DayRec, the first step was to identify all Tmax and Tmin observations in the 1218 GHCN-Daily station records that carried any of the quality flag assignments described on the quality control page of the GHCN documentation (see also Durre et al. 2010). These flags indicate that the accuracy and reliability of the data are questionable, so these observations were set to the missing indicator ("-999"), the same one already used in the database for observations that are actually missing. Next, the amount of missing data was assessed. Rather than simply applying an acceptable percentage threshold, the aim was for the allowed volume of missing data to be quite small and for any missing observations to be spread relatively evenly over time (both seasonally and over the full period of record), so as not to impart a time-dependent bias that could by itself give a misleading impression of the distribution of record-setting temperatures. This was done in three steps:

1. for each day of the year (1-365; Feb. 29 data from leap years was discarded), determine the number of years having data for each decade over 1911-2010;

2. flag any day of the year that had one or more decades with fewer than 8 observations; and

3. discard any station with more than 5 such flagged days over any month for Tmax or Tmin values.
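
The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used to build DayRec: the input layout (a dictionary keyed by (year, month, day)), the names `flagged_days` and `station_passes`, and the reading of step 3 as "more than 5 flagged days within any one calendar month, for either element" are assumptions introduced here.

```python
from collections import defaultdict
from datetime import date, timedelta

MISSING = -999                      # missing indicator named in the text
FIRST_YEAR, LAST_YEAR = 1911, 2010  # ten decades: 1911-1920 ... 2001-2010

# The 365 calendar days of a non-leap year as (month, day) pairs.
_START = date(2001, 1, 1)
CALENDAR_DAYS = [
    ((_START + timedelta(days=i)).month, (_START + timedelta(days=i)).day)
    for i in range(365)
]


def flagged_days(obs, min_per_decade=8):
    """Steps 1 and 2: return the calendar days that have at least one
    decade with fewer than `min_per_decade` valid observations.

    `obs` is an assumed layout: {(year, month, day): value}, where a
    value of MISSING (or an absent key) means no usable observation.
    """
    counts = defaultdict(int)  # (month, day, decade index) -> valid years
    for (year, month, day), value in obs.items():
        if (month, day) == (2, 29) or value == MISSING:
            continue  # Feb. 29 is discarded; missing values do not count
        if FIRST_YEAR <= year <= LAST_YEAR:
            counts[(month, day, (year - FIRST_YEAR) // 10)] += 1

    n_decades = (LAST_YEAR - FIRST_YEAR + 1) // 10
    return {
        (m, d)
        for (m, d) in CALENDAR_DAYS
        if any(counts[(m, d, k)] < min_per_decade for k in range(n_decades))
    }


def station_passes(tmax, tmin, max_flagged_per_month=5):
    """Step 3 (as interpreted here): reject the station if either element
    has more than `max_flagged_per_month` flagged days in any month."""
    for obs in (tmax, tmin):
        per_month = defaultdict(int)
        for month, _day in flagged_days(obs):
            per_month[month] += 1
        if any(n > max_flagged_per_month for n in per_month.values()):
            return False
    return True
```

Counting per decade rather than over the whole record is what enforces the even spread in time: a station can be missing very little data overall and still be rejected if the gaps cluster in one decade and one month.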

This assessment initially retained 200 stations for use in the interface, leaving several areas (especially the Southwest) underrepresented. We will address this soon by iteratively relaxing the missing-data criteria so as to retain more stations that can be used over roughly the last 100 years and still give a relatively reliable picture of changes in record-setting Tmax and Tmin occurrences. In addition, some stations do not have century-scale records; these will also be made available starting with later decades, a few as late as 1951-1960.