Historical Station Distribution

In his comment to How much Estimation is too much Estimation?, Anthony Watts suggested I create a scatter plot showing station distribution with latitude/longitude. It turned out not to be the ordeal I thought it might be, so I have posted some of the results in this thread. I started with 1885 and created a plot every 20 years, ending in 2005. I deliberately ended with 2005 because this is the final year in the GHCN record prior to the US station die-off of 2006.

Every dot on a plot represents a station, not a scribal record. A station may comprise multiple scribal records. A blue dot represents a station with an annual average that was fully calculated from existing monthly averages. A red dot represents a station that had missing monthly averages for that year, so the annual average had to be estimated. Stations that had insufficient data to estimate an annual average are not shown.

In the case where multiple scribal records exist for a station in the given year, I assigned a blue dot if all records were fully calculated from existing averages, a red dot if at least one record was estimated, and no dot if none of the records could produce an estimate. I believe this errs in the direction of assigning more blue dots than is deserved. Hansen’s bias method mathematically forces estimation to occur during the period of scribal record overlap.
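The dot-assignment rule above can be sketched in a few lines of code (a hypothetical illustration of the rule as stated; the function name and status labels are mine, not from the actual processing script):

```python
# Hypothetical sketch of the dot-assignment rule for a station in a given
# year. Each scribal record's annual value is either 'calculated' (all 12
# monthly averages present), 'estimated' (some months missing), or
# 'missing' (no annual value could be produced).

def dot_color(record_statuses):
    """Return 'blue', 'red', or None for a station's list of record statuses."""
    usable = [s for s in record_statuses if s != 'missing']
    if not usable:
        return None    # no dot: no record could produce an annual average
    if any(s == 'estimated' for s in usable):
        return 'red'   # at least one record's annual average was estimated
    return 'blue'      # every usable record was fully calculated
```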

The first plot shows coverage in 1885, five years into the GHCN record.

1905 shows improved coverage across the continental US, Japan and parts of Australia. A few stations have appeared in Africa.

1925 shows increased density in the western US, southern Canada, and the coast of Australia.

At the end of WWII, not a lot of change is noticeable other than improved coverage in Africa and South America as well as central China and Siberia.

In 1965 we see considerable increases in China, parts of Europe, Turkey, Africa, and South America.

A decline in quality seems to be apparent in 1985, as many more stations show as red, indicating their averages are estimated due to missing monthly data.

A huge drop in stations is visible in the 2005 plot, notably in Australia, China, and Canada. 2005 was the warmest year in over a century. Not surprising, as the Earth hadn’t seen station coverage like that in over a century.

Update (Steve Mc): USHCN station information gets added into GHCN with a lag of almost a year (noted in comments below). Jerry Brennan, who’s followed this for some time, reports the following update schedule in the past:

USHCN station data for the year 2002 were published in the USHCN website by May 2003, and added to GHCN between November 8, and December 10, 2003.

USHCN station data for the year 2003 were added to GHCN between April 10, and May 6, 2004, and published in the USHCN website by January 2005.

USHCN station data for the years 2004, 2005, and the first three months of 2006, were added to GHCN between August 13, and September 11, 2006, and published in the USHCN website, with data through October 2006 by March 2007. The additional months of data were not added to GHCN.

USHCN station data through May 2007 were published in the USHCN website in October 2007, but the “new” data have not been added to GHCN as of this date.

By the end of February of each year, GHCN will usually have data for the full previous year from only 120 (non USHCN) stations in the 48 contiguous USA states.

I might add that there are two locations for USHCN data, one at NOAA and one at CDIAC. The NOAA version is more up to date than the CDIAC version – perhaps there are other differences as well. I personally confirmed that the NOAA version (Oct 2007 edition) is updated to May 2007 for most USHCN stations. There was a USHCN update in May or June 2007 which updated to late 2006 for most USHCN stations.

I personally confirmed that the most recent GHCN version (ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2 Feb 2008 edition), as noted by Jerry, contains USHCN updates only to March 2006 or so. Thus, GHCN is two USHCN updates behind at present (May 2007, Oct 2007) and its USHCN version is at least 14 months stale relative to what it could be using.

114 Comments

Nice job, John. What are the details on how you made the maps? Not that I’m asking you to do it, but a time lapse movie showing the stations winking on and off would add the time dimension in a revealing way. Somebody who visits here could do it, I’m sure. The rapid decline in the 90s would really be apparent.

Technical question: how do they take into account the non-uniform distribution of the stations when calculating the “global” mean temperature? Regions like the Arctic, Siberia, large parts of South America and Africa have little or no coverage. Surely this introduces an error in the global mean. What is the estimate for this error?

If most of the warming has occurred in the Arctic (including Greenland) and Siberia, as the data seem to show, surely the lack of stations there introduces a large uncertainty into the warming trend. In other words, if the regions where the trend is the strongest have the poorest coverage, then the error on the trend is larger than if the trend were uniformly distributed across all stations.

Assuming that the satellite data now provides more complete coverage, is there a way of determining how well the satellite data may be calibrated against the ground station data? Is there enough overlap and stability in the satellite methods with sufficient numbers of ground stations?

Francois–
The averages are supposed to be surface-area weighted. That must mean that in areas with low thermometer density, each thermometer measurement contributes a lot to the average. To the extent that such a thermometer is inaccurate or temperature varies spatially, this can add quite a bit of uncertainty to the GMST.

In places like the US, which until recently was entirely covered with dots, each individual thermometer would contribute less to the average. In these areas, uncertainties in individual thermometer measurements are likely to cancel, and uncertainties due to real spatial variations will be smaller.

Ideally, to minimize error, you have thermometers distributed more or less evenly by area. ( Though, if there are large regions that are climatically similar, you could deviate from this a bit.)

It’s interesting to see Canada never quite jumped into the temperature measurement frenzy that appeared well under way in the US by 1885. I can’t help wondering what caused us to sprinkle thermometers everywhere.

I’m squinting – it looks like Guatemala City had a station in 1905; that general area had many in 1965 (around when my family lived there).

Yes, I have lists now of the entire world, by year, and whether or not the station reported an average and if it was estimated. I also have summary lists by continent / by month, and can create them easily enough by country if I need to.

#1 Gary

I have considered creating an AVI or some other movie file. If I have time I will do it. The movie is easy enough. It is generating each graph that can be tedious.

It’s interesting to see Canada never quite jumped into the temperature measurement frenzy that appeared well under way in the US by 1885. I can’t help wondering what caused us to sprinkle thermometers everywhere.

…

Squinting again– are there no stations in Greenland?

My recollection is that much of the network of weather stations was associated with agriculture. Hence the more temperate regions of Canada have coverage comparable to the US lower 48, but once you get outside the areas where there is substantial farming, there isn’t much need for the coverage.

Although John’s map doesn’t seem to have any Greenland stations, GISS has several. Here is one.

I understand the idea of weighting by surface area. The point I was trying to make is that if you’re looking for a trend (up or down), and the trend is not the same everywhere, the extent of the coverage will affect the error on the trend differently for different locations.

Say there’s no warming in the tropics, where you’ve got one station every kilometer, but there is a presumably strong trend in the Arctic, where you’ve only got a handful of stations. All those stations you have in the tropics are basically useless for determining what the trend is. You might as well not use them. So if you only use the Arctic stations, you will find that your uncertainty on the trend is mighty large. So having a large number of stations globally can give you a false sense of confidence in the actual uncertainty of the trend itself.

For example, some data seem to show that Greenland was as warm in the 1930’s as it is today. But how are we to know: there were precious few stations, if any at all, during that period.

I’m not too concerned about the data after, say, 1960. But I’d say the error on the first half of the 20th century may be larger than a simple estimate based on the sheer number of stations would give. I’m not quite sure how you would take that into account in making your error estimate. That’s basically the question I am asking to this board!

This is important, not so much because the 1930’s, or the MWP or whatever other period may have been as warm as today. But if you’re going to calibrate your GCM based on the past, and our knowledge of the forcings (solar, volcanic, GHG), you need to take into account that uncertainty. Larger “natural” variability may mean that solar forcing, for example, has a larger effect than currently estimated.

The averages are supposed to be surface-area weighted. That must mean that in areas with low thermometer density, each thermometer measurement contributes a lot to the average. To the extent that such a thermometer is inaccurate or temperature varies spatially, this can add quite a bit of uncertainty to the GMST.

The gridded data is in fixed 5×5 lat/long blocks. This implies that a much smaller surface area near the poles would have a more significant impact on the global average. Do you know if they really use surface area, or is the GMST a simple average of the gridded dataset?

Steve: They say that they area-weight and I have no reason to think otherwise.
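For what it’s worth, area weighting on a fixed 5×5 lat/long grid is straightforward: cell area is proportional to cos(latitude), so an area-weighted mean looks something like this (a sketch with made-up anomaly data, not the actual GISS or CRU code):

```python
import numpy as np

# 5x5 degree grid: 36 latitude bands by 72 longitude bands.
lats = np.arange(-87.5, 90.0, 5.0)                      # cell-center latitudes
anom = np.random.default_rng(0).normal(size=(36, 72))   # stand-in anomalies

# On a sphere, a cell of fixed angular size has area proportional to
# cos(latitude), so polar cells get correspondingly small weights.
weights = np.cos(np.radians(lats))
w2d = np.broadcast_to(weights[:, None], anom.shape)

gmst = (anom * w2d).sum() / w2d.sum()   # area-weighted global mean
naive = anom.mean()                     # simple average, overweights the poles
```

The gap between `gmst` and `naive` grows with how unrepresentative the polar rows happen to be.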

Raven–
Well, I’m not certain. But it isn’t necessarily either/or. They could easily attribute an average temperature to each block, and then, later, when finding the average for the planet, weight by the area of the block.

I’d be a bit surprised if computations of GMST weighted temperatures like a distorted flat map. That makes no sense physically, and doesn’t strike me as the sort of thing either GISS or Hadley would be likely to do. (I guess we could break down and read the papers? There are links at each entity’s site.)

13, until fairly recently, someone had to live there to have a weather station. That’s why most of Canada and all of Greenland are empty. Even now, if it’s unmanned, a source of power and a communication link of some sort is necessary. That can be done way out in the boonies with solar panels and a satellite uplink, but that’s a bunch of money.

The GHCN station collation (used in turn by CRU and NASA-GISS) has a delay in incorporating the USHCN network. It looks like the 2006 list lacks the USHCN stations, but they will get into the 2006 list at some point fairly soon.

Also, and I’ve mentioned this many times, GHCN collated a lot of historical data in the early 1990s that it promised to update “irregularly”. For the most part, the “irregular update” hasn’t occurred since then, and subsequent data is mostly airport data. It’s my surmise that data is actually available in local services for many stations that “disappear” in the early 1990s and that the reduction shown here mainly reflects the fact that GHCN collation has been very lackadaisical. There may also be station closures, but I know for a fact that some Russian and Chinese stations unavailable after the GHCN last date in the early 1990s continued to function.

Francois – Yes. It’s the large areas with no stations that are a problem with respect to uncertainty. (That is, unless we can assume temperatures vary much less spatially in those regions than in other regions for some reason. Can we? I think there are spatial temperature variations in northern Canada. No reason to think otherwise.)

“A grid of 8000 grid boxes of equal area is used. Time series are changed to series of anomalies. For each grid box, the stations within that grid box and also any station within 1200km of the center of that box are combined using the reference station method.

A similar method is also used to find a series of anomalies for 80 regions consisting of 100 boxes from the series for those boxes, and again to find the series for 6 latitudinal zones from those regional series, and finally to find the hemispheric and global series from the zonal series.”

It appears they measure area with lat/long and attempt to compensate for the different physical areas by computing the zonal averages. However, they effectively negate this step by using the zonal means to calculate the global means. This implies the polar data makes up 1/3 of the GMST if they calculate a simple average of the zones.

I can’t help wondering what caused us to sprinkle thermometers everywhere.

Initially, war needs; the ‘wx services’ of the time were originally in the ‘Signal Service’ (think ‘telegraph services’ back then) branch of the US Army with subsequent reporting changes since then:

… starting in 1873, reports on river conditions and flood warnings that same year, and frost warnings for Louisiana sugar growers in 1880, with subsequent reports targeted at growers of cotton, rice, tobacco, corn, and wheat. By 1883, the Signal Service budget was close to $1 million, considerably exceeding the sum that Joseph Henry had recommended.

Over the years, the Signal Service proved to be something of a bureaucratic rolling stone: it was upgraded to the Signal Corps in 1880; in 1891, after a good deal of wrangling between civilian weather personnel and their military superiors, its ties with the army were severed and it was handed over to the Department of Agriculture as the U.S. Weather Bureau; then in 1940 it was passed on to the Department of Commerce, where it has remained ever since, under the jurisdiction of the Environmental Science Services Administration (ESSA) from 1965 to 1970 and since then under the National Oceanic and Atmospheric Administration (NOAA). (Ref. 13)

David Laskin’s book, “Braving the elements: the stormy history of American weather” pg 143

It appears they measure area with lat/long and attempt to compensate for the different physical areas by computing the zonal averages. However, they effectively negate this step by using the zonal means to calculate the global means. This implies the polar data makes up 1/3 of the GMST if they calculate a simple average of the zones.

They don’t compute a simple average, and I haven’t been able to come up with a weighting scheme that gets me from the zonal averages to the global averages.

Hansen says the adjustments he quietly made because of Steve’s discovery of an error in the post-2000 data was restricted to the US. These maps indicate the extent to which US stations dominate the global total. It is hard to imagine the correction didn’t have some effect on the global number. We also need to know how many stations Hansen uses for his global annual average are from outside the US.

The other issue to consider is that in 1885 most stations were close to big cities. It is likely they would have experienced the greatest Urban Heat Island effect, which raises questions about which stations P. D. Jones used to create his 130+ year claim of an unnatural rise in the annual average temperature.

RE 31. Dr. Ball, Hansen’s method won’t overweight the US. He divides the world into equal-area chunks and then estimates each chunk based on the stations within it. So, densely sampled areas get one number with a small variance and less dense areas get a number with a bigger variance.

Still, the lack of any semblance of uniformity in either the spatial or the temporal dimension should raise some legitimate doubt.

Yes. It’s the large areas with no stations that are a problem with respect to uncertainty. (That is, unless we can assume temperatures vary much less spatially in those regions than in other regions for some reason. Can we? …

Grant me some literary licence in the following: with ‘cold air’ being produced in the polar regions and warm, humid air in the tropics (we will ignore oceans for the moment), under the influence of the global ‘three-cell’ general circulation (GCM as in General Circulation Model, as the term is introduced to those studying meteorology), with a sufficiently long average (multi-year averaging) I as an engineer would be somewhat content that we would be measuring climate-scale temperatures (and any changes) of that region and not simply weather variability.

#33 Not sure if it is on topic, but from what I gather you are suggesting the use of time averages to make up for the lack of ensemble runs. I think this is not a hopeless suggestion, but it clearly has its problems.

13, until fairly recently, someone had to live there to have a weather station. That’s why most of Canada and all of Greenland are empty. Even now, if it’s unmanned, a source of power and a communication link of some sort is necessary. That can be done way out in the boonies with solar panels and a satellite uplink, but that’s a bunch of money.

Of course, in the Arctic and Antarctic much of the winter is without sunlight to power a solar-powered device.

3. SPATIAL AVERAGING: BIAS METHOD
Our principal objective is to estimate the temperature change of large regions. We would like to incorporate the information from all of the relevant available station records. The essence of the method which we use is shown schematically in Figure 5 for two nearby stations, for which we want the best estimate of the temperature change in their mutual locale. We calculate the mean of both records for the period in common, and adjust the entire second record by the difference (bias) . The mean of the resulting temperature records is the estimated temperature change as a function of time. The zero point of the temperature scale is arbitrary.

A principal advantage of this method is that it uses the full period of common record in calculating the bias between the two stations. Determination of is the essence of the problem of estimating the area-average temperature change from data at local stations. A second advantage of this method is that it allows the information from all nearby stations to be used provided only that each station have a period of record in common with another of the stations. An alternative method commonly used to combine station records is to define by specifying the mean temperature of each station as zero for a specific period which had a large number of stations, for example, 1950-1980; this alternative method compares unfavorably to ours with regard to both making use of the maximum number of stations and defining the bias between stations as accurately as possible.

A complete description of our procedure for defining large-area temperature change is as follows. We divide each of the 80 equal-area boxes of Figure 2 into a 10 by 10 array of 100 equal-area subboxes. (The east-west and north-south dimensions of a subbox are about 200 km; in the polar boxes the equal area requirement for the subboxes causes the polemost subboxes to be noticeably elongated in the north-south direction.) For each subbox we use all stations located within 1200 km of the subbox center to define the temperature change of that subbox. The N stations within 1200 km are ordered from the one with the greatest number of years of temperature record to the one with the least number. The temperature changes for the first two stations are combined, then the third with the first two, and so on, using

T̄_n(t) = [ W̄_{n−1} T̄_{n−1}(t) + W_n ( T_n(t) + δT_n ) ] / ( W̄_{n−1} + W_n ), with W̄_n = W̄_{n−1} + W_n, for t with T_n(t) available, and T̄_n(t) = T̄_{n−1}(t) for t without T_n(t) available. T represents temperature change, t is time, and n identifies the station. T̄_n is an intermediate estimate of the temperature change based on stations 1 through n; these equations are applied repeatedly until T̄_N is obtained, where N is the total number of stations within 1200 km of the subbox center. Here d_n is the distance of the nth station from the subbox center, and is used to calculate the weight W_n = 1 − d_n/D by which the nth station temperature change is weighted. W_n decreases linearly from 1 at the subbox center to 0 at a distance D, where we have taken D = 1200 km as a representative direction-independent distance over which the temperature changes exhibit strong correlation.
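A minimal sketch of the two-record case of this bias method (my own illustration, assuming records are arrays aligned on the same years with NaN for missing values; this is not Hansen's code):

```python
import numpy as np

def combine_bias(t1, t2):
    """Combine two overlapping station records by the bias method.

    The second record is shifted by the mean difference (bias) over the
    period the two records have in common; the records are then averaged
    where both exist, with the single available value used elsewhere.
    """
    both = ~np.isnan(t1) & ~np.isnan(t2)
    if not both.any():
        raise ValueError("records have no period in common")
    bias = np.mean(t1[both] - t2[both])       # delta-T over the common period
    t2_adj = t2 + bias                        # shift the whole second record
    return np.where(both, (t1 + t2_adj) / 2.0,
                    np.where(np.isnan(t1), t2_adj, t1))
```

Extending to N stations amounts to repeating this against the running combination, with each station weighted by its distance from the subbox center as in the excerpt.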

OT: I’m a bit late in joining the conversation again, as I just spent most of the weekend developing confirmation that a volcano near the Ross Ice Shelf, once thought to be extinct, is now becoming active again. I welcome any thoughts anyone might have.

The Greenland stations are there. Any station on a coast will be hard to pick out, because the color will be hidden by the grayscale of the coastal outline. I toyed with larger station marks, but they gave the false impression of really great global coverage. I tried with different color schemes, but I am sensitive to those that are color-blind. Rest assured, though, the stations are there.

#45 How about some kind of contour plot to indicate station density? Given that stations occupy discrete points, some kind of low-pass filter would need to be used – perhaps a Gaussian-shaped low-pass filter. I’m not sure how you would decide on the best spatial width of the filter.

RE 47, yes, lots of it, but please carry on the conversation over there rather than OT here. I just wanted to advise people here who are good at pointing out flaws, so I could strengthen the presentation or go down in flames gracefully.

Between 1890 and 1940 the Weather Bureau was a part of the Dept. of Agriculture. Temperatures were important because the daily max and min temps affected crops, especially around 32°F and in the 100+°F ranges. Because of the need for detailed and frequent observations for aircraft operations, the Weather Bureau moved into the Dept. of Commerce (FAA), and the interest in parameters such as temperature and quantitative precipitation was of low priority as compared to wind direction and speed, ceiling heights, and horizontal visibilities. None of the parameters most important to agriculture and today’s climate studies was of much concern to the FAA observers. The switch from manned observations by specially trained NWS employees to automated observations, notably ASOS, and the replacement of CRS platforms with ASOS and MMTS equipment occurred in the late 70s and early 80s. The reliability and precision of thermometry in the automated systems was sharply lower than the certified error range of the mercury-in-glass thermometers used in the CRS screens. Certified mercury-in-glass thermometers were calibrated to about 0.1°C. The acceptable range of calibration for ASOS instrumentation was plus or minus 1.2°C.

The development of the North American radar defense network, usually referred to as the DEW (Distant Early Warning) Line, brought manned weather observations along the Arctic Circle in Canada. Some 58 DEW Line stations were constructed during the 1954-1957 period. These were usually the only regular observations available from these remote locations. The end of the Cold War saw a reduction and essentially an elimination of most of these observations. I suspect that similar installations in the FSU fell by the wayside as well. A great history and a map of the DEW Line stations, with other informative goodies, can be found here.

With today’s observations mostly taken from airports in or close to urban areas, there are very few continuous observations taken from rural locations left to provide reliable climate data. Urban areas amount to well under 2% (some estimates put it at 0.2%) of the Earth’s surface. It would be surprising if, as these urban areas continue to expand, there were not upward trends in the average of currently observed land temperatures. To refer to these averages as representing global average temperatures is patently absurd.

About 1200 US stations in GHCN are USHCN stations, and their data are handled differently than those of other stations.

USHCN station data for the year 2002 were published in the USHCN website by May 2003, and added to GHCN between November 8 and December 10, 2003.

USHCN station data for the year 2003 were added to GHCN between April 10 and May 6, 2004, and published in the USHCN website by January 2005.

USHCN station data for the years 2004, 2005, and the first three months of 2006 were added to GHCN between August 13 and September 11, 2006, and published in the USHCN website, with data through October 2006, by March 2007. The additional months of data were not added to GHCN.

USHCN station data through May 2007 were published in the USHCN website in October 2007, but the “new” data have not been added to GHCN as of this date.

By the end of February of each year, GHCN will usually have data for the full previous year from only 120 (non-USHCN) stations in the 48 contiguous USA states.

#52 JerryB
Thanks. This confirms what Steve has been saying about the missing US data (what I like to refer to as “die-off”).
When the data for 2007 and 2006 appear, it will be fun to see if / how they change the results for those two years here. Right now those two years’ anomalies are calculated without the USHCN stations.

NCDC has a USHCN version 2 in the works, and the next addition of USHCN data to GHCN may not occur until USHCN v2 gets published. The to-be-published date has already passed (July 2007), so it’s anyone’s guess as to when it will be.

I might add that there are two locations for USHCN data, one at NOAA and one at CDIAC. The NOAA version is more up to date than the CDIAC version – perhaps there are other differences as well. I personally confirmed Jerry’s observation that the current NOAA version of USHCN (Oct 2007 edition) is updated to May 2007 for most USHCN stations. In addition, there was a USHCN update in May or June 2007 which updated to late 2006 for most USHCN stations.

I believe the delay is due to the expectation that v2 will be published real soon. In the late summer of 2002 I spoke with someone working on USHCN v2 at NCDC, and he then expressed the opinion that it would be published by early 2003.

BTW, the two developers of GHCN v2, Peterson and Vose, do not participate in the month-to-month updating process.

It’s my surmise that data is actually available in local services for many stations that “disappear” in the early 1990s and that the reduction shown here mainly reflects the fact that GHCN collation has been very lackadaisical.

Indeed. When I checked Czech stations, some of these also disappeared, although they are run by the Hydrometeorological Institute, properly stationed, and alive and well.

If the updating lag is part of the appearance of recent station loss, it might be useful to go back a couple of years before 2005, say 2000, when reporting is presumably as complete as it is ever going to be.

John’s graphs (and many other such climate graphs) are equirectangular, meaning that the vertical axis is proportional to latitude. This eliminates the distance distortion of the Mercator projection in the vertical direction, but leaves a horizontal distortion proportional to secant(latitude). As a consequence, polar areas like Greenland and Antarctica still have too much area, though not as badly as in Mercator.

This area distortion can be eliminated by an appropriate compression of the vertical axis as the poles are approached. I believe making the vertical axis proportional to sin(latitude) would have this effect, since the derivative of this function is cos(latitude), the reciprocal of the secant(latitude) horizontal distortion. This projection would doubly mash shapes near the poles, but would show relative areas at a glance, without the N-S directional distortion of ovoid projections.

Has anyone in Climatology ever used such an equiareal rectangular projection? Is it available on canned mapping routines? The standard equirectangular projection maps make a little bit of local Arctic warming look like the whole earth is on fire!
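For what it’s worth, the suggested mapping amounts to plotting longitude against sin(latitude), which is the Lambert cylindrical equal-area projection. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def to_equal_area(lat_deg, lon_deg):
    """Map (latitude, longitude) in degrees to equal-area plot coordinates.

    Equal areas on the sphere map to equal areas on the plot, since the
    element of spherical area is proportional to d(lon) * d(sin(lat)).
    """
    x = np.asarray(lon_deg, dtype=float)
    y = np.sin(np.radians(np.asarray(lat_deg, dtype=float)))
    return x, y
```

Under this mapping, the band from 60°N to the pole occupies (1 − sin 60°)/2 ≈ 6.7% of the plot height, matching its share of the Earth’s surface area.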

The real question of interest in USHCN v2 is how they’re going to handle UHI effects. Their UHI corrections as applied in part 4 of version one are totally ridiculous. See the NYC Central Park version of this, which is completely untenable. Despite this, except for a brief presentation of the NYC data on this site and a few other blog sites, the silence from the AGW community has been not only deafening, but also revealing. The numbers are so outrageous that any attempt to support them would justifiably provoke ridicule. On the other hand, the raw data has been generally faithful to the actual observations. Only in the world of climate science could this be considered a novel approach.

I understand that they have made some sort of “adjustment” to the raw data in v2 to provide a uniform UHI adjustment, but they may be spending some time figuring out how to justify these changes before the data is released to the masses. Whatever; when the data finally is released, look for the adjusted version to become the focus of reinvigorated controversy.

#62. Equi-area projections are used in many contexts. In terms of graphic presentations, you’re 100% right and anything other than equi-areal should be banned from scientific journals.

In practical terms at my level, the only issue is the availability of routines in R (and for someone else Matlab). Doug Nychka of NCAR maintains the R-package fields. He’s very responsive to inquiries. I’ll check the manual and otherwise check with him.

Steve – A very important add-on to this analysis would be to quantify the actual overlap in the GISS, NCDC and CRU data sets in terms of the raw station data that they draw from. Phil Jones told us several years ago that about 90-95% of the sites are from the same raw data, but this has not been confirmed by quantitative analysis. If the data sets are so interdependent, this means that it is misleading to present the trends from the separate analyses as independent assessments. Roger

Long time reader, first time commenter. Kudos for all your, et al, work.

I’ve not seen any discussion in the great debate that deals with the accuracy of the sensor that records the temperature. I’m not talking about external interference, such as rooftop or parking lot locations, but the inherent accuracy of the gauge. Suffice it to say that thermometers have gotten better since biblical times.

#67. I agree entirely that they are not “independent”. My guess right now is that GISS has more smoothing at a gridcell level than CRU3. For example, the CRU3 gridcell containing Barabinsk can be tracked to Barabinsk, but the GISS gridcell will be a complicated blend.

The allocation of critical enterprise between land and SST is totally skewed. Hardly any critical effort has been expended on SST, and IMO it’s a greater priority.

A good example of the distortion of impression caused by the equirectangular projection is Hansen’s graphs in the 2/3 thread “Hansen in Antarctica” (#2658). His “Fig. 3.19” copied there is somewhat like Mollweide, though I don’t think it’s quite equal-area.

#32
I appreciate the averaging approach, but the problem is what do you do when you have no station in an area? In fact, there are many areas with many rectangles and no stations at all. As I recall, they used to take averages for the four corners and then produce an average for the entire area. Again as I recall, there were several problems with this, because many rectangles had no stations but were various proportions of land and ocean or freshwater. I also recall questions about differences in elevation. One debate I had with a modeler involved a rectangle that was half flat prairie and half high mountains, with no stations in the mountainous half. As I have noted many times, the surface data is inadequate as the basis for determining a global annual average temperature, let alone as the basis for models, and it is made even less adequate by the lack of above-surface data.
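The empty-rectangle problem described above can be made concrete with a small sketch. This is not the method any of the groups actually use; it is a minimal illustration, assuming 5×5 degree cells and simple cos(latitude) area weights, of how cells without stations end up simply skipped, and how little of the globe the remaining cells may cover:

```python
import math

def gridded_mean(stations, cell_deg=5.0):
    """Average station anomalies into lat/lon cells, then combine the
    non-empty cells with cos(latitude) area weights.  Empty cells are
    skipped entirely, which is one (crude) answer to the no-station
    problem; the returned coverage fraction shows how much that costs.
    stations: iterable of (lat, lon, anomaly)."""
    cells = {}
    for lat, lon, anom in stations:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells.setdefault(key, []).append(anom)

    total_w = total = 0.0
    for (ilat, _), anoms in cells.items():
        cell_mid = (ilat + 0.5) * cell_deg        # cell-centre latitude
        w = math.cos(math.radians(cell_mid))      # area weight
        total += w * sum(anoms) / len(anoms)
        total_w += w
    n_cells = int(360 / cell_deg) * int(180 / cell_deg)
    coverage = len(cells) / n_cells
    return total / total_w, coverage
```

With only two stations, the "global" mean is just their weighted average and the coverage fraction is a fraction of a percent, which is the point several commenters are making.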

Discussion about the Arctic and the equirectangular problem is exacerbated by the lack of data. Take a look at the map in the Arctic Climate Impact Assessment Report (ACIA) (the main source of information for the IPCC) here:

RE 74. Hi Dr. Ball. I’m not sure about HadCRU, but Hansen’s method is to average over a 1200 km radius (after carving the globe into equal-area tiles).

I think Hansen 87 has the relevant graphs and analysis. In essence, Hansen studied the correlation in temperature between some 50 or so sites and found that at 1200 km the correlation was 0.6.

0.6 being a nice number and 1200 km being a nice round number (in metric; in the English system it’s downright ugly), the science was settled. So, thou shalt average stations within 1200 km of each other, even when said 1200 km crosses oceans, mountains, deserts, and polar ice caps.
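The 1200 km averaging described above can be sketched in a few lines. The linear fall-off of weight from 1 at the centre to 0 at the radius is how the GISS scheme is usually described; treat this as an illustration of that description, not as the actual GISTEMP code:

```python
import math

EARTH_R = 6371.0  # mean Earth radius, km

def hav_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_R * math.asin(math.sqrt(a))

def weighted_anomaly(target, stations, radius_km=1200.0):
    """Distance-weighted mean of station anomalies within radius_km of
    target=(lat, lon), weight falling linearly to zero at the radius.
    Returns None when no station is in range.
    stations: iterable of (lat, lon, anomaly)."""
    num = den = 0.0
    for lat, lon, anom in stations:
        d = hav_km(target[0], target[1], lat, lon)
        if d < radius_km:
            w = 1.0 - d / radius_km
            num += w * anom
            den += w
    return num / den if den else None
```

Note that nothing in the distance test knows about oceans, mountains, or ice caps, which is exactly the complaint above.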

One possible reason for some of the reduction in stations is that the FAA had an automation program in the 1980s that involved closing Flight Service Stations. They were mostly consolidated into 64 AFSS facilities. One of the FSS Specialists’ jobs was taking local weather observations, so that may have caused some of the US reduction.

Something intuitively bothers me about lumping such diverse climates into these tiles. I can’t quite put into words what seems so wrong about, for example, lumping all of California and its surrounding land and ocean into one tile, but it seems awfully coarse. Can anyone show either why I’m wrong, or why I’m right?

0.6 being a nice number and 1200 km being a nice round number (in metric; in the English system it’s downright ugly), the science was settled. So, thou shalt average stations within 1200 km of each other, even when said 1200 km crosses oceans, mountains, deserts, and polar ice caps.

Next time you need to leave your sarcasm on long enough to note that a correlation coefficient of 0.6 is equivalent to an R-squared of 0.36. IOW, the 1200 km radius explains 36% of the temperature variation. Science settled. Moving on.

John #80,
I think the reason they don’t use satellite data is that the satellite and GISS have diverged lately. Current January 2008: RSS -.629, UAH -.588, GISS +.12. That is the problem. As for coverage, I see the tropics and deserts seem to have few stations. That is where the AGW group says we should have changes, along with the well-known Arctic and Antarctic coverage problems. GISS has yet to explain the divergence from the satellite data. Some say it’s the polar regions, but whether you include them or not, it still doesn’t seem to match anything.

Here is a graph with much higher station-to-station correlation at high latitudes; the correlation is worst near the equator, which is logical, as the value of annual average temperature is dominated by winter temperature.

which is actually not bad for interpolation to 1200 km at high latitudes. A similar interpolation is done with global heat flow measurements.

RE: Greenland data – From my review, there are two stations with long-term data (at least 1930 – present), Godthab Nuuk and Angmagssalik, in the GISS station data. For both of these sites, the temp anomaly in the 1930s/40s was about the same as the past ten years or so. The two long-term sites in Iceland (Reykjavik and Akureyri) as well as one (Jan Mayen) on the big island north of Norway and Sweden show the same pattern. So for this part of the Arctic, the current temperatures are not much different than historic temps.

Re: Central Park, NYC – If you look at the pre-homogenization data from this site (i.e., not corrected for UHI), there is very little, if any, increase in temperatures since the 1940s. For this specific site, UHI may not be that big a deal in the latter half of the 20th century, since the entire area around Central Park was already built up by that time. Since we are looking at anomalies instead of actual temperatures, I don’t know that it would be that big a deal in locations that have been heavily urbanized throughout the temperature record.

Re: lumping different climatic zones into one grid – Again – since what they are looking at are the temperature anomalies for each station, I don’t think that lumping the sites together makes that big a difference.

The easiest way to evaluate the question of whether there is long-term global warming would actually be to calculate the long-term temperature anomaly trend for a spatially representative set of stations (probably 1000 would be enough) with the most reliable data and determine if there is a statistically significant increasing trend at each location. If 90% of these stations showed a statistically significant increase, some showed no change, and less than 5% showed cooling, we could be pretty damn certain that there has been world-wide warming. I am sure it would be possible to develop specific cutoff percentages (e.g., 70% warming, 20% no trend, 10% cooling) to determine the answer. Of course, you don’t get a global average temperature anomaly to worry about with this method.
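The per-station tally proposed here is easy to sketch. The |t| > 2 cutoff below is a rough stand-in for a proper 95% significance test (no correction for autocorrelation, for instance), and the function names are illustrative:

```python
def trend_class(years, anoms, t_cut=2.0):
    """OLS trend for one station's annual anomalies; classify as
    'warming', 'cooling', or 'no trend' using a rough |t| > t_cut cutoff."""
    n = len(years)
    mx, my = sum(years) / n, sum(anoms) / n
    sxx = sum((x - mx) ** 2 for x in years)
    slope = sum((x - mx) * (y - my) for x, y in zip(years, anoms)) / sxx
    resid = [(y - my) - slope * (x - mx) for x, y in zip(years, anoms)]
    se = (sum(r * r for r in resid) / ((n - 2) * sxx)) ** 0.5
    if se == 0:  # perfect fit: sign of the slope decides
        return "warming" if slope > 0 else ("cooling" if slope < 0 else "no trend")
    t = slope / se
    if t > t_cut:
        return "warming"
    if t < -t_cut:
        return "cooling"
    return "no trend"

def tally(station_series):
    """Count classifications over a set of (years, anoms) series,
    for comparison against cutoff percentages like 90/5."""
    counts = {"warming": 0, "no trend": 0, "cooling": 0}
    for years, anoms in station_series:
        counts[trend_class(years, anoms)] += 1
    return counts
```

As noted in the comment, this gives a yes/no answer about widespread warming without ever producing a global average anomaly.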

Re: lumping different climatic zones into one grid – Again – since what they are looking at are the temperature anomalies for each station, I don’t think that lumping the sites together makes that big a difference.

Then why grid the earth at all? Why not one big lump? I can’t see any more rationale for lumping the Mojave desert in with the Pacific ocean than lumping the Andes in with the Indian ocean.

Very interesting! However, these correlations must be sensitive to the time interval. At a daily frequency, 1200 km is much too large, though it evidently works well at annual or even monthly frequencies.

Incidentally, Hans’s maps appear to be based on the 1805 Mollweide projection, which I’m coming to appreciate more and more as I think about this. The 1772 Lambert projection is a big improvement over the 200x Hansen projection, but it makes NAm and Eurasia look like they have been worked over with a meat tenderizer, while SAm is horribly crippled. Mollweide is just as equi-areal, yet much more elegant.
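For anyone who wants to experiment, the Mollweide forward transform praised here is only a few lines: solve 2θ + sin 2θ = π·sin φ for θ by Newton’s method, then scale. A minimal sketch (the closed-form branch handles the poles, where Newton’s denominator vanishes; latitudes very close to the poles may need damping in serious use):

```python
import math

def mollweide(lat_deg, lon_deg, R=1.0):
    """Forward Mollweide (equal-area) projection.
    Solves 2*theta + sin(2*theta) = pi*sin(lat) by Newton's method, then
    x = R*(2*sqrt(2)/pi)*lon*cos(theta),  y = R*sqrt(2)*sin(theta)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    if abs(abs(lat) - math.pi / 2) < 1e-12:
        theta = lat  # poles: closed form (Newton's denominator is zero here)
    else:
        theta = lat  # reasonable starting guess
        for _ in range(50):
            f = 2 * theta + math.sin(2 * theta) - math.pi * math.sin(lat)
            theta -= f / (2 + 2 * math.cos(2 * theta))  # derivative of LHS
    x = R * (2 * math.sqrt(2) / math.pi) * lon * math.cos(theta)
    y = R * math.sqrt(2) * math.sin(theta)
    return x, y
```

Unlike the equirectangular plot, equal areas on the ground map to equal areas on the page, so high-latitude station density is not visually exaggerated.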

The grids that are a mix of land and water are factored to remove one or the other, supposedly. My beef is that if I’m tracking a grid and the variation between coldest and warmest in the year is 100F, why do I care if the global anomaly is off by 1.2 or so, when you can probably pick any day in a year and have much much more of a spread between the high and low of the day? What’s 1.2 if the temperature changes 40 in a day? Plus, do they track and adjust for heat flow from one 2×2 or 5×5 grid and another? I just don’t see it.

My question is why isn’t the GISS, and other similar temperature data, compared to the RSS and UAH data over the time period Jan 1979 to Jan 2008. RSS and UAH are probably the best and most consistent (within themselves, and reasonably between them) temperature data that we have. The problems with these satellite data sets are probably well known.

I realize that there are different baselines etc, etc, but surely the statistical geniuses here can compensate for that (no slight intended). It seems to me that too much time is spent trying to fix the GISS type of data. Lets just compare the two and see what comes out of it. It might be interesting.

For example, I came across this site http://mclean.ch/climate/Tropos_temps.htm and what struck me about the bar graphs was what a different impression they gave about the period 1979 to 1997 and how dissimilar the different transects were. It seemed to me that, to a visual approximation, the years between 1979 and 1998 showed very little temperature increase. So I asked William Briggs http://wmbriggs.com/blog/ what he thought. His rather complete reply was very interesting:

Most graphs of temperature show a simple regression like this:

Dr. Briggs looked into the trends more deeply, and one of the results was this:

Note: No warming from 1979 to 1996 (or 98 depending on the transects). Warming to 2004 and then cooling again.

He also indicates some cycles, and other patterns of unknown origin, in need of more study to see if they mean anything. Dr. Briggs wisely makes no conclusions (except one) from this analysis, but a layperson like myself sure finds them interesting. I would like to see more comment on them, and the same type of analysis done on GISS-type data to see if it is at all the same, somewhat the same, or completely different. After all, I would think that at least the trends between the two types of data should be similar.

Thanks Steve for the clarification. The “cool parks” theory doesn’t seem relevant unless you only analyze stations where data is collected in cool parks.

The change-point analysis is designed to find inflection points in a temporal data stream rather than the trend itself. It can only be analyzed well after the fact and perhaps that’s its attraction; it can’t be used for real-time tracking. It’s also a little cumbersome to use, even with large data streams, unless one presumes that the large data stream is in some way systematically correlated.

Re #84 and NYC data:

That’s my point. The raw data, or even the TOB corrected or Filnet data shows little warming over the past several decades. However, the UHI corrected data in that same data file differs from the raw data by as much as 7°F (cooler) for some of the months between 1960-1990, but this “correction” is gradually reduced to less than 2°F by the early 2000s. The effect of this “correction” is to create a hockey-stick trace in the past 40+ years where no such curve is detectable in the raw data. In other words, it’s an absurd manipulation of the data base under the guise of being an urban heat island correction in a city that has been urbanized for over 100 years. Still, I hear nothing but the sound of crickets from the AGW crowd on this particular USHCNv1 data series.

For a quick station coverage animation you can use Microsoft PowerPoint and paste each one of John’s station coverage GIF images onto a separate slide. You can then step forward or back through the years using the up and down arrow keys. Down and dirty, but it works well.

Re: grids, equirectangular projections, etc. Some years ago two solar physicists had an angry debate (through regular mail and scientific magazines, no internet at that time). One of them had published a paper showing a 14-day-period solar oscillation. The other complained that the detected oscillation was half the Sun’s rotation period, so what the first one was actually measuring was a harmonic not filtered by the instruments or the data analysis. They finally agreed on building a computer model (CM) simulating the rotation of the Sun to avoid instrumental bias. They built a grid for the star inside the CM, fed each cell of the grid with the actual luminosity (or whatever they were measuring), and then made the Sun rotate using the CM to see the average result for the whole Sun. They discovered everything was useless: the method used to map the grid within the CM had too much influence on the final results, and afaik they finally switched, side by side, to a different strategy. Sorry if too much off topic. Best

This has been an excellent thread. Despite the widespread issues with the land-based data, I do think we can say that there has been a warming trend over many parts of the world in the latter half of the 20th century. However, as shown in #96 above, the current temperatures do not appear to be wildly different than in many periods in the past, at least based on available instrument records (which probably have issues of their own). What gets me is why don’t the GISS folks just come clean with the many issues surrounding the temperature record and put some much bigger error bars on the trend estimates? Also, why do so many insist on keeping hundreds of records of questionable quality, yet when it comes to the pre-instrumental records, they are willing to rely on very sparsely distributed proxies?

Well, I asked the question (in the Estimation thread) about whether we were to believe there has been 80+% coverage of the Southern Hemisphere, but I already knew the answer.

I cannot post it, because it’s on paper, but somewhere I have a page from Science around 1999 of a study claiming to tease out a human component of GW. It is accompanied by a map, which shows white for areas with not enough coverage to assess.

As I recall, in addition to the Roaring ’40s, areas not assessed included all of Africa except the coasts, all of southwest Asia, almost all of the Amazon basin, all of the Tibetan plateau and Sinkiang, and all of Antarctica.

In other words, this ‘global’ average disregarded the three hottest and the three coldest parts of the globe. I call shenanigans.

Sam post #102, that was exactly the gist of my post that was removed. When I posted “correct me if I am wrong” I was expecting to be corrected and learn, not shut down.
In the end, this thread and the previous one display the complexity of the disparate temperature data and of the statistical treatment the data is submitted to. Simply put, my question is: is the signal really better than the noise? Is that why this blog is so critical in making scientists who are not statisticians understand the temperature data quality problem in relation to the climate change issue? Isn’t that the reason why, as suggested by Leroux, adding or subtracting temperatures is meaningless at explaining climate change? Thank you for an answer.

Bob North (84), I am not sure you are aware of all the issues for the Central Park, NYC site. Yes, very little, if any, increase in unadjusted temperatures since the 1940s. However, in the seventies and eighties, the HCN adjusted temperature was about 7 degrees less than the actual reading. Now, the HCN adjusted data is 2 degrees less. The decreased differential adds about 5 degrees (F) of reported warming. Meanwhile, the GHCN adjusted temperature (apparently used for global temperature) is almost seven degrees warmer than the HCN adjusted data, yet both come from the same thermometer. Moreover, it is not clear that the UHI effect is no stronger today than it was 30 years ago. While the number of inhabitants may not have increased, it seems to me from my trips to the area that the number of workers has increased, visitors to the park have increased, there are more offices in the area, and traffic has increased. (Again, personal observation, no empirical study.) It would not surprise me if there had been some increased UHI effect even if the population has not increased.

Nevertheless, even if NYC has not had a warming effect, I am comfortable with the conclusion that on average temperatures are higher than they were two or three hundred years ago. Higher temperatures would explain glacial melt and extended growing seasons. Yet, because of proposed regulations, it is important to get a handle on whether current trends match the projections of Global Climate Models, if these Models are the basis for adopting regulations.

The USHCN v2 announcement appears to have been updated, deleting mention of when the data are to be made available.

Steve: Interesting. It used to say that the data would be available in July 2007. It’s hard not to think that the delays are related to their awareness that scrutiny of this release is likely to be more substantial than the prior release. Maybe they’ll even have proper documentation and technical reports accompanying the release.

At CO2 Science you can look by state and record the latitudes. If I remember correctly, all the sites in Vermont, save one or two which show some kind of warming, actually show cooling. At CO2 Science, the state of Vermont has escaped all the controversy!
Then I looked over at the GISS data. It is impossible to find most of these sites: they have been eliminated. But the one or two which remain are the warming ones. One station, Enosburg, looks very different at GISS than at CO2 Science. Most troubling.

Various organizations have collected temperature data from meteorological organizations around the world. Some continue to do so. Some pass on much of that data in more or less convenient manners. One such organization is the US NCDC, a part of NOAA.

Among their collections of data is GHCN V2, a combination of previously collected (i.e. mostly old) data from numerous sources, and relatively recent data from some US locations and some non-US locations that are MCDW locations, i.e. many fewer locations than were among the collections of old data. The apparent “drop” in the numbers of GHCN stations was largely due to the large number of non-MCDW stations that were included among the collections of old data.
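One can check that drop directly by counting reporting stations per year in the v2.mean file. The fixed-width layout assumed below (an 11-character station ID, a 12th duplicate/record digit, a 4-digit year, then twelve 5-character monthly values in tenths of a degree with -9999 for missing) is the commonly described format; verify it against the NCDC readme before relying on it:

```python
MISSING = -9999

def stations_per_year(lines):
    """Count distinct stations (not duplicate/scribal records) reporting
    at least one valid month in each year, from v2.mean-style lines.
    Assumes the fixed-width layout described in the lead-in."""
    years = {}
    for line in lines:
        station = line[0:11]            # country code + WMO id + modifier
        year = int(line[12:16])         # char 12 is the duplicate number
        vals = [int(line[16 + 5 * i: 21 + 5 * i]) for i in range(12)]
        if any(v != MISSING for v in vals):
            years.setdefault(year, set()).add(station)
    return {y: len(s) for y, s in years.items()}
```

Because duplicates share the first 11 characters, multiple scribal records for one station in one year count once, matching how the dots in John's plots are assigned.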

I am going to use this sequence of graphs by John Goetz for a presentation in a couple of days but don’t really understand the following:

“In the case where multiple … records exist for a station in the given year, I assigned a blue dot if all records were fully calculated from existing averages, a red dot if at least one record was estimated, and no dot if none of the records could produce an estimate.“

Even if one of the multiple records at a location exists continuously without breaks, surely the station deserves a blue dot?
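For what it’s worth, writing the quoted rule out explicitly makes the ambiguity in this question visible. In the sketch below, min_months is an assumed threshold for when an annual average can still be estimated from partial data (John Goetz’s actual criterion may differ); note that a station with one complete record and one hopeless record comes out red, not blue, under a literal reading of the rule:

```python
def station_dot(records, min_months=9):
    """Classify a station for one year from its scribal records.
    Each record is a list of 12 monthly values (None = missing).
    'blue'  - every record fully calculated from existing averages
    'red'   - not all complete, but at least one annual value obtainable
    None    - no record can produce an annual value at all."""
    complete = [all(v is not None for v in r) for r in records]
    estimable = [sum(v is not None for v in r) >= min_months for r in records]
    if records and all(complete):
        return "blue"
    if any(c or e for c, e in zip(complete, estimable)):
        return "red"
    return None
```

So under this reading, a station with one continuous, break-free record still gets a red dot whenever any sibling record needed estimation, which seems to be exactly what the question is probing.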