GISS Step 2

Here are some notes and functions from some work that I did last fall trying to benchmark the GISS Step 2 adjustment at a non-US site. My first efforts to follow the written prescription have been unsuccessful. I’m placing some tools and a benchmark case (Wellington NZ) online and perhaps people who are trying to decode Step 2 from the Fortran angle can push this peanut along a little. All discussion is in R.

Step 2A
This is an optional step that I would encourage people to ignore for present purposes since there is nothing at issue here and I’ve saved the files from this step in ASCII form online. This step finds the “rural” stations within 1000 km and then collects their GISS dset1 histories. For completeness, I’m showing the work here.

First here are commands to download two files of GISS dset1 and dset2 versions (scraped Feb 2008 vintage).

This step leaves us with three data sets for onward use: an information set of 4 stations (here – HOKITIKA AERO, WHENUAPAI, KAITAIA and CHATHAM ISLAND); a data set holding the dset1 and dset2 versions of Wellington NZ; and a data set holding the dset1 versions of the 4 rural comparanda – all located at http://data.climateaudit.org/data/giss/step2/.

STEP 2
Starting from this step, first we read in the comparandum series, the target series and the information.

Now we count the number of rural stations with at least 3 values. This is done by counting availability and setting the count at NA for values less than 3. Then the range is determined. In this case a range of 1951-1984 is obtained. In this case, dset2 is calculated for 1939 to 1988. Don’t know why this doesn’t reconcile.

count = ts(apply(!is.na(chron), 1, sum), start = tsp(chron)[1])
count.adj = count; count.adj[count < 3] = NA   # if fewer than 3 stations, not calculated
M0 = range(time(count)[!is.na(count.adj)])     # range of available years

Then I calculated the fraction of this range in which there are at least 3 stations (some cases go in and out.) This is not an issue in this example. I haven’t implemented this test yet as there are other conundrums at hand, but will at some point.
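The fraction test just described can be sketched in a few lines. This is a hypothetical illustration on a toy `count` series (the names `count` and `M0` follow the snippet above), not the implemented version:

```r
# Sketch of the not-yet-implemented fraction test, using the same names
# as the snippet above but on a toy count series (values are invented).
count = ts(c(1, 2, 3, 4, 3, 2, 3, 3, 1), start = 1950)  # stations per year
count.adj = count; count.adj[count < 3] = NA
M0 = range(time(count)[!is.na(count.adj)])       # 1952-1957 on the toy data
in.range = window(count, start = M0[1], end = M0[2])
frac3 = sum(in.range >= 3) / length(in.range)    # here 5/6: some years dip below 3
frac3 >= 2/3                                      # Hansen's stated 2/3 criterion
```

On the toy series the test passes (5 of the 6 in-range years have at least 3 stations); in the Wellington case, as noted above, the stations don’t go in and out, so the test is moot there.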

#sequentially adds in the rural series
for (k in 2:K) {
  j0 = order1[k]                       # chooses the series to be added
  delta[j0] = hansen_delta(chron[, j0], reference)   # calculates Hansen delta
  weights2 = W[, j0]
  weights1 = apply(as.matrix(W[, fixed]), 1, sum, na.rm = TRUE)
  # g = function(x) weighted.mean(x, weights1, na.rm = TRUE)
  # ... (remainder of the loop omitted)
}
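For orientation, here is a self-contained stand-in for what the fragment above is doing: series are taken in order, each incoming series is shifted by its mean offset from the running reference over the overlap (a Hansen-style delta), then folded in with distance weights. This is my sketch under those assumptions; `combine.rural` and its arguments are hypothetical names, not the `hansen_delta` machinery or the GISS code.

```r
# Stand-in for the sequential rural combination sketched above. Assumed
# behavior: longest series first; each new series is shifted by its mean
# overlap offset, then merged by weighted average. Hypothetical names.
combine.rural = function(chron, w) {   # chron: years x stations; w: station weights
  ord = order(colSums(!is.na(chron)), decreasing = TRUE)
  ref = chron[, ord[1]]                          # start from the longest record
  wt  = ifelse(is.na(ref), 0, w[ord[1]])
  for (j in ord[-1]) {
    delta = mean(chron[, j] - ref, na.rm = TRUE) # Hansen-style bias offset
    adj   = chron[, j] - delta                   # shift the incoming series
    num   = ifelse(is.na(ref), 0, wt * ref) + ifelse(is.na(adj), 0, w[j] * adj)
    den   = wt + ifelse(is.na(adj), 0, w[j])
    ref   = ifelse(den > 0, num / den, NA)       # weighted running reference
    wt    = den
  }
  ref
}
```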

Now calculate the two-legged adjustment over the period in which adjusted values have been reported (this particular period selection has not been replicated in this example, so this is a restricted test.) The two-legged adjustment here is done from the difference between the dset1 version for Wellington and the comparandum series (in column 3), using the following implementation of the two-legged procedure as described in the underlying texts:
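For concreteness, a minimal broken-line fit in the same spirit can be written as follows. This is only a sketch of the generic two-legged idea (least squares over candidate knee years, continuous at the knee) – not the `twoleg` function itself or the Fortran:

```r
# Generic two-legged (broken-line) fit: for each candidate knee year,
# regress on the two half-lines and keep the knee with smallest RSS.
# Illustration only -- not the CA "twoleg" function or the GISS code.
twoleg.sketch = function(y) {
  yr = as.numeric(time(y)); n = length(y)
  best = NULL
  for (k in seq(5, n - 5)) {            # keep the knee >= 5 years from each end
    x1 = pmin(yr - yr[k], 0)            # left leg
    x2 = pmax(yr - yr[k], 0)            # right leg
    fm = lm(y ~ x1 + x2)                # continuous at the knee by construction
    if (is.null(best) || deviance(fm) < best$rss)
      best = list(knee = yr[k], rss = deviance(fm), fit = fitted(fm))
  }
  best
}
# Recovers a planted knee exactly on noise-free data:
y = ts(0.1 * pmax(1950:1999 - 1975, 0), start = 1950)
twoleg.sketch(y)$knee   # 1975
```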

The results are illustrated below. The first panel shows the dset1 version for Wellington and the rural reference series calculated above (green). The second panel shows the two-legged fit (green) to the difference between the two series (dashed), compared to the actual adjustment series (solid). Obviously not at all the same. The bottom panel compares the dset2 version to the emulated dset2 using the above adjustment series.

In this particular case, the adjustment is not at all close. Of course, there are other issues here that we’ve visited previously: like why NASA has been unable to locate data for Wellington NZ for nearly 20 years, but that’s a different story.

In choosing this site, I wanted to stay away from sites that had dozens of comparanda in case clues arose from the versions. I got stuck last fall. What I’d like from anybody that’s been able to get GISTEMP to work through this step is to extract working files for Wellington NZ (50793436001) and we’ll see if we can decode the intermediate steps.

I’ve also collated various scripts and programs from GISTEMP Step 2, in the order that I think they are implemented, in one file here, but have been unable to get much of a foothold in understanding the actual implementation of the calculations.

101 Comments

Are they seriously using the Chatham Islands to ‘adjust’ Wellington? Wellington is one of the windiest cities in the world – does average windspeed affect the UHI effect? I would have expected wind to provide cooling, and hence reduce the UHI effect. And the Chatham Islands are very isolated.

Surely there is a clear difference between correcting for UHI by using a station in a nearby and similar geographic environment, and using one in a completely different environment. Do they only care about them having a similar elevation, or what?

Folks, for now, let’s not discuss whether any of this makes sense, please limit remarks to what Hansen is actually doing. There will be plenty of time to discuss things, once we’ve confirmed what he’s done.

Also remember that some of this to-ing and fro-ing probably won’t matter much, in that it’s just making weird weightings of different series – which only matters if some of the series are “bad”. If the data is OK, you can have a pretty crummy method and still not screw weighted averages all that much.

The foolishness of his coding and methodology annoys me as much or more than any of you. But no need to pile on right now.

Could you have a look at the GISS temperature data for southwestern Australia? I have been looking at it to answer a criticism of me on Tamino’s site. It would seem that almost all rural stations in the region were “destroyed” in 1992. Apart from Perth (the capital of Western Australia) and a couple of other major towns, that’s where the records stop. Of course Perth, a major city, has records through to 2008. I am sure data are still being collected in the rural parts of the region (after all, our Bureau of Meteorology is first class), so why aren’t they in the GISS dataset?

Further to my previous post. Nearly all rural stations for Western Australia cease in 1992. The major towns have more current data. The question is: why is there 16 years of missing data for Western Australian rural stations? Is the station data on the GISTEMP website (http://data.giss.nasa.gov/gistemp/station_data/) the same data as that used by GISS for their temperature anomaly calculations? If it is, why are more current data for rural stations in Western Australia not available?

#8. We’ve talked about the 1992 issue on many occasions. GHCN (and thus NASA) hasn’t collected fresh data from rural stations in the ROW since the early 1990s, not just in Australia, but in Russia, China, South America. You can get data on the internet that NASA hasn’t been able to locate. It’s pathetic.

Steve “Folks, for now, let’s not discuss whether any of this makes sense, please limit remarks to what Hansen is actually doing.”

I am really trying to keep on this point, as I know you hate diversions.

Can I just ask why he is doing this process at all?

Statistically, what is the effect of transforming a dataset, in this manner, in terms of the size of the SD? Can a transformation of a dataset really ‘improve’ the quality? If it can, it should be possible to estimate, by triangulation, the temperature of one rural site, from three other rural sites.
Has Hansen’s method been tested in this manner? How close is Hansen’s method to the ‘real’ value?

The content here for the last few weeks has been nothing short of exceptional (not that I pretend to understand what is going on under the hood of the code). There is a lot of bloviating on the topic all over the web, but who is doing any real work in a public forum open to all for critique and collaboration? CA and no one else. The unique experiment that is this blog will be remembered long after we figure out the real delta associated with doubling CO2.

I’m trying to run the R code, but have run into a couple of problems. It seems to be related to the following:

On line 74 in …/step2/functions.txt there is a line ” rbind(info[,c(“id”,”name”,” …. ” that doesn’t seem to belong there. I just removed it from my local copy, and thus got the R program to run a bit further.

My next problem is related to the emulate_dset2(…) function. It seems that when it is called with method=”base” it creates the dset1.monthly data structure. That does not seem to happen when using method=”read”. So, in the “step2.txt” file, it is called with method=”read”, and I get the following:

I’m not really sure how to resolve that one. Seems to help by setting method=”base”, but I’m not sure I will then produce the intended result. And, if I do that, I run into a problem where M0 is not defined a few lines further down, on the line with: temp=(time(Y)>=M0[1])

I still need to familiarize myself with this code AND with the fortran code. Wow, fortran77, now those were the days. I’m not at all ruling out that I’m doing something wrong. I’m using R 2.7.0

Hansen’s compilation is very far from comprehensive and no reason is given why. Some authorities demand confidentiality and/or a fee. For New Zealand, he seems to have access to airports but not to others. Is access to airports some sort of international obligation?

The 2-legged ROW UHI adjustment for Wellington looks the same as John G described in the previous post for the US. So is the difference between US and ROW in this respect just how stations are classified into Urban and Rural?

However, for Wellington, the Urban series itself (dset1) and the emulated rural comparison series overlap for most of c.1880-1985. Wouldn’t the adjustment have been fit to this entire period, with a knee anywhere in its interior except within 5 years of each end, and then applied to dset1 for the entire period dset1 is available? Instead, the actual and emulated adjustments are shown as beginning c.1940 and 1950, resp.

The methodology in my “twoleg” function is far simpler and thus more reliable for what they say they are trying to do (and it works fine on this template), but I’m having difficulty seeing that Hansen’s function actually does what it’s supposed to do. To be fair, I don’t have a Fortran compiler and would appreciate results on this template from someone with such a compiler.

The steps that look odd to me are the calculations for xnum1, xnum2 and denom.

The slope and intercept of the last step entitled “linear regression” seem to work OK although the method is a bit roundabout and old-fashioned. But I can’t get the results from the segmented calculations to line up with the construction.

look incorrect to me. Wherever these formulae come from and whatever their purpose, they don’t seem to yield the correct answer, as illustrated by the plot provided in this script. This is something that I’ve just noticed and it’s possible that I’ve incorrectly transliterated something or missed something, but the formulae themselves look weird.

#25. Thor, thanks for this. That reconciles the subroutine. Hansen’s method is a very roundabout method of doing things, but with the corrected transliteration matches the simpler routine that I had posted as the function “twoleg”. Here was the form of my calculation:

So Hansen’s implementation could be done in a couple of lines but, given his objective, for this step his formula can be replicated.

The flip side of this is that my function “twoleg” is a valid and simpler implementation of this subroutine. This is therefore not the reason why my Step 2 implementation does not replicate Hansen’s Step 2; the difference lies elsewhere.

Now we count the number of rural stations with at least 3 values. This is done by counting availability and setting the count at NA for values less than 3. Then the range is determined. In this case a range of 1951-1984 is obtained. In this case, dset2 is calculated for 1939 to 1988. Don’t know why this doesn’t reconcile.

So I gather, re #20, that the green line in the top panel of the graph is the weighted average of the available rural stations in the 4-station set that was identified, whether there are 3 or not. There are at least 3 only during the period 1951-84 (though not always the same 3), whence this is the emulation period in the 3rd panel.

If Hansen in fact went back to 1939 with this algorithm, he must be picking up another station going back to 1939. Could this starting date be a clue as to which station this might be?

I see you’re using the arcsine to compute great circle distance, while John G mentioned that GISS in fact approximated this with the straight-line (“as the worm burrows”) distance, to avoid the once-significant computational expense of an arcsine. Is there a station just outside 1000 km that might be picked up this way, and that starts in 1939?

#27 Hansen also uses 6375 km as the earth’s radius. Another small adjustment, but perhaps enough to matter in terms of bringing another station into play. I haven’t found it yet, but I also have a bug in my calculation of distances. I get Chatham wrong but everything else right. Go figure.
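The two distance conventions are easy to compare. In the sketch below, the 6375 km radius follows the comment above, the chord option is the “as the worm burrows” shortcut, and the station coordinates are approximate values I’ve filled in for illustration:

```r
# Great-circle (arcsine) vs chord distance, R = 6375 km as noted above.
# Coordinates below are approximate, for illustration only.
R.earth = 6375
dist.km = function(lat1, lon1, lat2, lon2, chord = FALSE) {
  torad = pi / 180
  p = function(lat, lon) c(cos(lat * torad) * cos(lon * torad),
                           cos(lat * torad) * sin(lon * torad),
                           sin(lat * torad))
  ch = sqrt(sum((p(lat1, lon1) - p(lat2, lon2))^2))  # chord on the unit sphere
  if (chord) R.earth * ch else R.earth * 2 * asin(ch / 2)
}
dist.km(-41.3, 174.8, -44.0, -176.5)                # Wellington-Chatham, arc
dist.km(-41.3, 174.8, -44.0, -176.5, chord = TRUE)  # always slightly shorter
```

At roughly 770 km, the Wellington-Chatham pair is comfortably inside 1000 km under either convention; the choice would only matter for a station near the cutoff.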

#2 Hu McCullough, the GISS program doesn’t need a string of consecutive years with at least 3 rural stations in order to compute the 2-piece linear regression. It looks to me like the first year has to have 3 rural stations and the total number of years with at least 3 must be more than 2/3 of the total number of years in the record starting from that first year with 3. In addition, it has to have at least 20 years with 3 rural stations. I hope this is understandable!

#30. Your commment is understandable but I’m having trouble locating where these tests are implemented in the code.

For reference, Hansen et al 1999 says:

An adjusted urban record is defined only if there are at least three rural neighbors for at least two thirds of the period being adjusted.

Hansen et al 2001 says:

The urban adjustment in the current GISS analysis is a similar two-legged adjustment, but the date of the hinge point is no longer fixed at 1950, the maximum distance used for rural neighbors is 500 km provided that sufficient stations are available,

#31 Steve Mc, the program that combines the rural stations within range keeps a count of the number of stations used in each period. This is called IWT(n). The variable NRURM is set to 3 in a parameter statement up front. After the program is done combining all of the rural records in range, it starts a loop at statement 191 to see if the combined record meets certain criteria. You’ll see in this loop several checks against NRURM, i.e. 3. They check for at least 20 periods with at least 3 rural records in range and then for the 2/3 criterion before trying to fit the broken line.

The “N3” variables are all associated with keeping count of records with three or more rural stations. N3L is the last one, N3F the first, and N3 a count of all of them.

However, “NXY” is an index into the time series “TS” created when the urban record is subtracted from the combined record. This time series is created when the combined rural record has one or more stations. TS is immediately copied into “F”, and F and NXY are passed to trend2.

I see no check in getfit or trend2 that limits the curve fitting to those years with 3 or more rural stations. It seems to me that, so long as there are enough years with 3 or more rural stations (NCRIT or more), and they meet the XCRIT temporal density requirement, then the curve fitting proceeds with the entire combined series.
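As a strawman, the acceptance test described in the last few comments might be paraphrased as below. The names NCRIT and XCRIT come from the Fortran discussion; the exact denominator of the 2/3 ratio (N3F to N3L, or N3F to the end of the record) is precisely what is still being pinned down, so treat this as a sketch:

```r
# Paraphrase of the acceptance test discussed above; sketch only.
# Assumes the 2/3 density is taken over the N3F..N3L span.
NCRIT = 20; XCRIT = 2/3
accept.rural = function(iwt) {             # iwt: rural station count per year, IWT(n)
  yrs3 = which(iwt >= 3)
  if (length(yrs3) < NCRIT) return(FALSE)  # need at least 20 years with 3 stations
  span = max(yrs3) - min(yrs3) + 1         # first to last year with 3 stations
  length(yrs3) / span >= XCRIT             # the 2/3 temporal density test
}
accept.rural(rep(3, 25))                             # TRUE: 25 straight years
accept.rural(c(rep(3, 10), rep(1, 30), rep(3, 10)))  # FALSE: too sparse
```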

In the Wellington case that I’m examining, assuming that the identification of rural stations is correct, N3F is 1951, but the dset2 series starts in 1939. 1939 is the first year that there are 2 rural comparanda. Off the top of my head, I can think of two possibilities: my collation of rural stations is missing a candidate or perhaps the target station is included in counting N3.

#33 John, I think you’re right about not needing 3 records at the beginning. And I didn’t mean to imply that all of the data points in TS had to have at least 3 rural stations. You’re right that GETFIT and Trend2 don’t check for the 3-rural-station condition. All that is done in PApars.

#34 I hate to bring this up because I don’t yet understand what’s going on, but there is a final step after GETFIT is run that “finds an extended range.” Perhaps that’s what you are seeing. Maybe John Goetz has figured this part already.

IYXTND = 17 for Wellington, so the code will find LXEND. We know N3L is the last year of the combined record (offset from 1880 I believe), IYOFF is 1880 (I believe), and
IYU2 is IYU1+ILSU(NURB)-I1SU(NURB)+2
where
IYU1=MFSU(NURB)+IYOFF-1
If wkkruse can help me figure out what IYU2 is for Wellington then I think we can find out how Hansen extends the range back to 1939.

For Wellington, the N3 range in PAPars.list, one of the working files, is 1951-1984 with the “Extended range” shown as 1939-1989. So we’ve probably got the right rural comparanda in this example and it’s a matter of decoding how the extended range is determined in order to push through this little portion of the code.

#37. What about this? IYU2 is probably the last year available in the target urban U-series, which in this case is 1988, since NASA doesn’t know where Wellington NZ is. So they allocate the extension first to lengthen the record into the present and then assign the remainder of the 17 years to the early portion.

So it’s down to this. Is there an undocumented fudge-factor operation, manually applied? Is that the stumbling block to replication?

Steve: Please don’t editorialize or jump to unwarranted conclusions at this point. All we know right now is that the code is a mess. However they get results, so it operates. I have no reason to surmise manual fudging, tho there are many other issues with this.

Clearly it looks like GISS tries to make sure the combined series fits exactly their “2/3 rule”, which wkkruse and I have been discussing here. The rule might be summarized as follows:

During the period in common between the combined rural series and the urban series, 2/3 of the years in common must have three or more rural stations represented in the combined rural series.

To this point we have been looking at instances where GISS trims the combined series to meet the rule. For example, if the combined record starts in 1891 and ends in 1990, but the years 1901 to 1940 are made up of fewer than three stations, then the early years of 1891 to 1900 are dropped, and 1941 to 1990 are used to do the adjustment. At least that is my initial guess.

Wellington seems to be a case where they expand the series to meet the rule. In the case of Wellington, there are 34 years in common with three or more rural stations in the combined series. Thus, the rule says the comparison can be made against up to 34/(2/3) = 51 years, so long as the 34 with 3 or more stations fall in that 51 year range. The code I mentioned in #37 is responsible for extending the range.
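The arithmetic in that last step is worth writing down, since it reproduces the IYXTND = 17 reported earlier (names follow the Fortran discussion; the rounding is my assumption):

```r
# Extension arithmetic for Wellington: N3 years with >= 3 stations over
# the span N1X..N2X; stretch until N3 is exactly 2/3 of the range length.
N3 = 34; N1X = 1951; N2X = 1984
N3O = N2X - N1X + 1              # current overlap length: 34 years
IYXTND = round(N3 / (2/3)) - N3O # years available to add: 51 - 34 = 17
IYXTND
```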

As we all know, this particular Fortran program is painful to follow, but I have narrowed things down some. For Wellington, some key values are:

For Wellington, the N3 range in PAPars.list, one of the working files, is 1951-1984 with the “Extended range” shown as 1939-1989. So we’ve probably got the right rural comparanda in this example and it’s a matter of decoding how the extended range is determined in order to push through this little portion of the code.

1951-84 (34 years) is exactly 2/3 of 1939-89 (51 years), so perhaps this is where the 2/3 factor mentioned by wkkruse in #30 and Hansen in #31 above comes in. Other adjustment periods like 1938-88 would still have 2/3 of its years with 3 comparanda, but perhaps there was also an unstated minimum of 2 stations, whence 1939?

Would the regression be run with the full “extended range”, or just with the 3-station range?

# 43 John, I’ve been out having some fun, so I’ve missed a lot here. I think you’ve got the logic correct for the 2/3 rule. That’s why they do that “drop early years” step. But remember they still want at least 20 periods with at least 3 rural stations too. I only have the Fortran code to work from, and I’m not familiar with the data. So I think I’ll just sit back and enjoy this and hopefully learn something.

Taking the three unknown variables from #43 one at a time, beginning with MFSU(ISU), the code develops the value as follows (leaving out a lot of unrelated steps in between):

MF=INFO(1)
MFCUR=MF
MFSU(ISU)=MFCUR

The value for INFO(1) is generated by the toANNanom program, and I believe it is an offset to the first year of the record for Wellington.

So wkkruse, if you don’t mind looking at toANNanom and figuring out if the INFO(1) it writes for Wellington has a value of 0, 1, or 2, that would be great. Keep in mind the first year of record for Wellington is 1881, and that toANNanom does recognize that a GISS year begins in December (although it may not matter here). I am guessing the value is 2.

Let N3 be the total number of years in which three or more rural stations can be combined, with N1X the first year and N2X the last year. N3O = N2X – N1X + 1 is the length of the overlap period. Not all years in the overlap period necessarily have three or more stations.

If N3 is greater than 2/3 of N3O, the algorithm will extend the beginning and/or the end of the overlap period so that N3 is 2/3 of N3O. IYXTND is the number of years that must be added to the overlap period to make N3 2/3 of N3O.

If N2X is earlier than the last year of the urban record, the end of the rural overlap period is extended toward the urban record’s last year. If additional extension years remain, the extension years are applied to the beginning of the combined rural record.
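Under those assumptions the allocation can be sketched as follows. This reproduces the Wellington “extended range” of 1939-1989 if the urban record is taken to run through 1989; whether the code’s endpoint is 1988 or 1989 is one of the loose ends, so the helper and its endpoint are hypothetical:

```r
# Sketch of the assumed allocation: extend the end toward the urban
# record's last year first, then put the remainder at the beginning.
extend.range = function(n1x, n2x, iyxtnd, urban.last) {
  add.end = min(iyxtnd, max(urban.last - n2x, 0))  # lengthen toward the present
  add.beg = iyxtnd - add.end                       # remainder to the early side
  c(n1x - add.beg, n2x + add.end)
}
extend.range(1951, 1984, 17, 1989)   # 1939 1989 -- the Wellington extended range
```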

John G, there are epicycles upon epicycles in this calculation by Hansen

From examination of the logs from the run that Arthur E has archived, we can be sure that the 4 rural reference stations are correctly selected here. The difference between the reference composite and Wellington dset1 looks like it’s going to be pretty close to what’s shown below (solid = reference minus Wellington dset1) for the “extended range” 1939-1989 (3 stations only available for 1951-1984). This would not appear to yield the actual dset2 adjustment shown in dashed, using a two-legged adjustment.

So what’s going on? It looks like a couple of things. There is an override in which one-leg is substituted for two legs and this can be confirmed for the Wellington case. Also, and I’m just checking this, it looks like pre-extended range information is used to create the adjustment.

In the plot below, I’ve shown two-leg and one-leg adjustments – using reference information from before the “extended range” start of 1939 and before the 3-station range start of 1951. What we can see here is that the actual dset2 adjustment (dashed) is one-legged despite a two-legged fit.

The subroutine in the padjust program contains an override in which the two-legged slopes are overwritten by the one-legged slopes depending on the value of iflag. The iflag value appears to be in the log PAPars.list, which shows a flag of 1211 for Wellington NZ, a value which will provoke an override. The iflag value appears to be generated in the program flags.f, which tests for short knees (limit 7 years), the absolute value of slope 1 (limit 1), the absolute value of slope 2 (limit 1), the absolute value of the slope difference (limit 0.5) and a test which seems to combine (a) different slope signs and (b) the absolute values of both slopes (limit 0.2).
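Reading those tests back into R gives something like the following. The thresholds are as read above, but the flag logic and the function name are my paraphrase of flags.f, not a transliteration:

```r
# Paraphrase of the flags.f override tests described above; any one
# failing test is assumed to force the one-legged adjustment. Sketch only.
needs.oneleg = function(knee1.len, knee2.len, sl1, sl2) {
  short.knee = min(knee1.len, knee2.len) < 7   # knee within 7 years of an end
  big.sl1    = abs(sl1) > 1                    # |slope 1| over its limit
  big.sl2    = abs(sl2) > 1                    # |slope 2| over its limit
  big.diff   = abs(sl1 - sl2) > 0.5            # slopes too different
  sign.flip  = sign(sl1) != sign(sl2) && abs(sl1) > 0.2 && abs(sl2) > 0.2
  short.knee || big.sl1 || big.sl2 || big.diff || sign.flip
}
needs.oneleg(20, 14, 0.25, -0.21)  # TRUE: opposite signs, both |slope| > 0.2
needs.oneleg(20, 14, 0.10,  0.20)  # FALSE: passes every test
```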

This heretofore elusive calculation is now looking within reach. Arthur E’s logs have been very helpful. It should be possible from available code to determine if pre-extended range results are used in creating the slope.

I might add the 1.5 deg C step in the 1920s that appears to drive the slope adjustment is precisely the sort of step that is used as a homogeneity adjustment in USHCN. So one can see quite different methods at work here.

Steve, nice catch in flags.f, which to this point I had ignored. I feel like STEP 2 is a Ronco commercial – “but wait, that’s not all!!”

I would be interested in the rationale behind the conditions in which a one-legged adjustment is favored over two-legged. The 7-year limit is interesting, given that five years are already factored into the beginning and end of the overlap period in PApars.

I presume that the y-values are all in units of 10 times deg C as otherwise the values don’t make sense. Below is a plot on that basis, plus values in the early 1980s are centered at 0 to match the dset2 adjustment. On this basis, the one-legged adjustments more or less match over the 1951-1984 period with 3 stations. The slope here is determined not over the 3-station period but includes values back to 1880 with only one station. Also the values in the “extended range” are not those from the slope, but are simply the 1951 and 1984 values extended. I expect that this is what further parsing of the code will show, but the code is like tangled fishing line.
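The construction described in this comment – a one-legged fit whose 1951 and 1984 values are simply carried flat into the extended range – can be sketched as below. The function is hypothetical; it is my reading of the emerging picture, not the GISS code:

```r
# One-legged fit with flat endpoint extension, per the reading above.
# diff.series: reference minus dset1 difference, as an annual ts.
oneleg.extend = function(diff.series, fit.start, fit.end, ext.start, ext.end) {
  yr  = as.numeric(time(diff.series))
  fm  = lm(diff.series ~ yr)          # one-legged (straight line) fit, full record
  yrs = ext.start:ext.end
  # clamp years outside fit.start..fit.end so the end values extend flat
  adj = predict(fm, data.frame(yr = pmin(pmax(yrs, fit.start), fit.end)))
  ts(adj, start = ext.start)
}
```

On a noise-free linear toy series, the 1939-1950 portion sits at the 1951 fitted value and 1985-1989 at the 1984 value – the flat-shouldered shape visible in the plots.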

The holding directory at ftp://data.giss.nasa.gov/pub/gistemp/ for GISTEMP working files was created for the first time on June 10, 2008, just after John G initiated the present attention on June 8 http://www.climateaudit.org/?p=3169 . In the past, we’ve noticed NASA responding to CA posts – needless to say without linking here – and the present “coincidence” seems like another example.

This is an annual anomaly calculation. I extracted Chatham Island which is used in the Wellington NZ calculation and compared it to the annual anomaly (Hansen style) that I had calculated from the GISS dset1 version that I scraped in Feb 2008. Most values were identical, with some being 0.1 different presumably due to mysteries of Hansen rounding.

But there were (and are) some puzzling differences. The binary version has values for 1880, while the Feb 2008 version commenced in 1881. I re-downloaded the current dset1 version and it also started only in 1881. Where did the 1880 values appear from? Who knows? This is NASA.

The only other differences were in 1989, where the differences were as much as 3 deg C! I re-downloaded the current dset0 version and compared it to the versions scraped a few months ago and compared the dset0 versions (there were 7 of them.) dset0 series 1-5 for Chatham Island were file identical. dset0 series 6 had a few more months of data in 2008 and added some months that were previously NA in 2007, but otherwise was file-identical.

So they’ve obviously located a transcription error in this version and corrected it between Feb 2008 and June 2008 either at the GHCN stage or GISS stage. Fair enough. But this error has been corrected nearly 20 years after the collection of the data and only after Climate Audit commenced examination of this particular calculation. Needless to say, they didn’t issue an error notice nor advise us of the change.

The impact on dset1 values is attenuated but still noticeable (indeed, that’s what I originally noticed.) Here are the monthly dset1 anomaly values calculated from the Feb 2008 scrape:

Because Hansen erases his history, I can’t go back and double-check against actual Feb 2008 values at NASA, only against what I scraped, and it’s not beyond the realm of possibility that I’ve been wrongfooted somewhere in trying to extract results from the nasty NASA formats. But in this case, I’ve got independent scrapes of dset0 and dset1 versions showing an error in the same place, so I’m pretty sure that I’ve not introduced the error as a scraping artifact.

So why were QC efforts at GHCN and NASA incapable of identifying the huge summer (Jan 1988) errors in one of the Chatham Island versions, especially given the existence of multiple independent versions, most without the transcription error? How could a summer value of 1 deg C survive even the most cursory quality control?
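For what it’s worth, even the most cursory QC of the sort being asked about is only a few lines of R. This is a generic climatological outlier screen – a sketch, not anything GHCN or GISS actually runs:

```r
# Generic screen: flag monthly values more than k standard deviations
# from that calendar month's mean. Sketch only, not GHCN's actual QC.
flag.outliers = function(x, k = 4) {   # x: matrix of years (rows) x 12 months
  z = scale(x)                          # per-month anomalies in SD units
  which(abs(z) > k, arr.ind = TRUE)     # (year, month) positions of outliers
}
# A planted "1 deg C in summer" style error is caught immediately:
x = matrix(rep(seq(14.5, 15.5, length.out = 30), 12), 30, 12)
x[5, 1] = 1                             # transcription error: year 5, month 1
flag.outliers(x)                        # flags row 5, column 1
```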

It would also be interesting to know how the error was picked up between Feb 2008 and now? Who picked it up? Who made the correction?

When I was young, there was an annoying novelty song about a crazy toy – it had lyrics like It went zing when it spinned, …., with the narrator trying to guess the purpose of the crazy toy, ending with a comment that “I guess we never will” [know the purpose]. Now that it’s in my head, I can’t get rid of it, but I can’t quite remember it either. Argggh.

GISS calculations seem like that most of the time. Lights popping, gizmos moving, binary files appearing, disappearing, re-appearing. Pointless calculations. At the end of the day, moving about 2 inches. Sort of like the crazy toy.

I have two copies of v2.mean from GHCN. One is from February of this year and the other is from August 2007. In both cases the Chatham record ends with 5079398700055, meaning there are six versions. What is the source then of this seventh version GISS uses?

I figured out where the seventh version of Chatham Island comes from. It is not GHCN. In Step 0 GISS combines in a bunch of Antarctica station records. In the input files for Step 0 one can find the station lists and their records. antarc2.txt contains the mysterious scribal record for Chatham.

Here is where it gets interesting. Looking at the Chatham record on the UK site, it still contains the errors described by Steve in comment #61. The version on the GISS website contains the corrections / changes Steve noted.

The timestamp on the antarc2.txt file buried in the latest GISTEMP sources (as of today) is June 9 at 12:35 PM. I have a version of the GISTEMP sources I downloaded on June 2. In that version antarc2.txt has a timestamp of February 28 at 6:57 PM, and in that version Chatham matches the record presently on the UK site, errors and all. It is pretty obvious then that GISS made the changes.

Clearly the record on the UK site has some errors, and it makes sense GISS would want to correct them. However, I find it interesting that GISS made the changes without seeming to involve the source of the data. I wonder how the new values were derived?

#68 Steve…they are manually editing, which is fine, but what are the edits based on? How do they know that a 1.0 in one case should be 14.1, and in another it should be 16.3? It is good that they noted a manual change occurred, but it would be better if they noted how the values were derived.

I had noticed last year that a number of the data points flagged in the GHCN quality control file (and hence removed from the v2 dataset) were easily recoverable transcription errors, such as an incorrect sign or an inadvertent shift of several months worth of data. Easily recoverable and easy to document the reason. A lot of the Meteo data fit the same bill – before it mysteriously disappeared last month.

If the missing and incorrect Chatham Islands data were so easy to recover by hand, then document the methodology.

Regarding #65, I do find it a perfect description! It is this crazy toy with thousands of fancy features, none of which have a meaningful purpose. I also think my analogy to Stephen King’s The Jaunt is apt as well. It is a science fiction / horror story, evoking the haunting phrase “It’s eternity in there…”

#71. Doncha feel exactly like the kid in the video – seeing two strange green GISTEMP eyes, or pressing on GISTEMP buttons and watching it go whrrr and disappear under a chair. And Hansen says that he disdains jesters.

RE: 68 and 69. If GISS is continuing to edit, correct, and obfuscate online data in real time in response to the diligence of outsiders without posting reasons why or methodologies, I would think that Michael Griffin should be notified. It has really gotten out of hand.

I’ve had a bit more luck decoding this sucker. The black jiggly series is the difference between the “reference” composite of 4 “rural” stations and the Wellington dset1 series. The dashed red line is a fit over the 1951-1984 period in which there are 3 stations; and the black dashed is the difference between dset2 and dset1. It looks like the adjustment in this case is coerced back to a one-legged adjustment and that the end values of the 1951-1984 adjustment are extended to the “extended” dset2 range of 1939-1988. I’ll pull these together in a longer note.

Given the black series, does the black dashed line represent any sort of logical estimate of required adjustment? In this case, the procedure makes no sense whatever. I didn’t pick this data set in order to embarrass them; as readers here know, I picked this one in advance because it had enough texture to illustrate the procedure and not so many comparanda as to make reverse engineering too difficult.
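The apparent extension rule can be sketched in R. This is my guess at the behavior, not the GISS code itself: fit the adjustment over the years with enough rural stations (1951-1984 here), then carry the end values of the fit out flat to the wider dset2 range (1939-1988).

```r
# Hypothetical sketch of the apparent extension rule (names are mine, not GISS's):
# hold the fitted adjustment's end values constant outside the fit period.
extend.adjustment <- function(fit, fit.years, out.years) {
  # rule = 2 in approx() extrapolates by repeating the end values,
  # matching the apparent extension of 1951-1984 out to 1939-1988
  approx(fit.years, fit, xout = out.years, rule = 2)$y
}

# toy example: a linear trend fitted over 1951-1984, extended to 1939-1988
fit.years <- 1951:1984
fit <- 0.01 * (fit.years - 1951)   # 0 to 0.33 deg C over the fit period
out <- extend.adjustment(fit, fit.years, 1939:1988)
```

Under this rule the 1939-1950 values all equal the 1951 fitted value and the 1985-1988 values all equal the 1984 fitted value, which is what the black dashed line appears to do.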

Steve, getting very close. Any thoughts on why there is a displacement between your red adjustment and the GISS adjustment?

It would be good to know from our New Zealand friends how the rural stations evolved in the period 1939 to 1988. It looks like Hokitika Aero started as a grass field and now has two paved runways. No idea though on the exact location of the station then and now. Looking at where the coordinates for Whenuapai land me in Google Earth, that station is probably at the airport as well.

It looks like Hansen re-zeros the adjustment at the closing values, though I have to confirm this. Here’s a plot of the deltas between the observed dset2 adjustment and the emulated adjustment. For this station, the calculation is matched up to Hansen-rounding, which I’m not going to worry about right now. I can only take so much of this dreck at one go.

I spent a number of hours today making a coherent data frame that didn’t fail with some of the bizarre NAs in the dset1 data base. There are sure a lot of land mines in this data set.

Steve Mc, some of the craziness you observe is probably because a bunch of this was written for nine track tape.

Ninth track – wasn’t that used for parity (the parity bit)?

I believe it was in the case of our Pertec drives on the 990/12s (the 990 series was a “16-bit” machine with 8-bit byte addressability). The Pertec tape drive was rack-mounted and used vacuum-technology tape ‘accumulators’: tape was looped off the supply and take-up reels into ‘wells’ with vacuum applied at one end, reducing the inevitable inertia of the reels (sometimes full of tape) during rapid back-and-forth tape movement past the read/write heads. A quick bit of googling shows the ninth track was used for parity on other machine series (incl. VAX) as well.

#88. No, Hansen’s pattern is to have the last adjustment at 0. So if the emulation ends at a non-zero value, deduct the closing value from the series as a whole. I’ve also done the re-scaling back to the dset2 monthly version (Centigrade, not anomaly) and have that up to 0.1 deg C in sporadic years, with most at 0.0. The adjustment ends in 1989 rather than 1988. Merely having a couple of values in 1989 triggers the extension algorithm. I should be able to get the last squeak on this case.
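The re-zeroing described above can be sketched in a few lines of R. This is my naming and my reading of the pattern, not code from GISS: subtract the closing value so the adjustment ends at 0, then round to the 0.1 deg C precision of the dset2 monthly version.

```r
# Sketch of the apparent re-zeroing (my naming, not the GISS code):
# force the final available adjustment to 0 by deducting the closing value.
rezero <- function(adj) {
  last.ok <- max(which(!is.na(adj)))   # index of the last available value
  adj - adj[last.ok]
}

adj <- c(0.45, 0.32, NA, 0.18, 0.12)
round(rezero(adj), 1)                  # dset2 is reported to 0.1 deg C
# last value becomes 0.0; earlier values shift down by the closing 0.12
```

Under this reading, the sporadic 0.1 deg C discrepancies would come from where this shift interacts with Hansen-rounding.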

I’m not sure what records Hansen uses for NZ, presumably the raw numbers? Because the NZ Met service carries out its own adjustments. The raw temperatures for most NZ sites have tended to trend downwards; it is only the adjustments that give them a positive slope. I’ll try to find some further info on these adjustments.

Here is a short primer on aspects of surveying of the Earth. Its purpose is to show that as technology advances, so do corrections to past data.

Hu McCulloch might find interest in map projection maths.

Steve Mc., there might be reasons within this unpublished paper why GISS data are changed frequently by small amounts. I am not saying that the amounts are often significant values, only that they do cause recalculation especially for satellite and military work. You might be able to correlate some past dates with adjustments to the surface temperature record.

There are more papers available, showing how some of the equations in this paper are derived; and more on satellite accuracy.

It remains the case that absolute distance measurements do not exist in this type of work. For example, the acceptable answer might depend on the number of terms used in a series expansion like Taylor’s.

It’s not the value of a change that I’m referring to, it’s the fact of a change. I do not know the answer to this, but if a new calculation for (say) satellite calibration adjusts a distance, does this mean that the whole GISS system is adjusted? There has to be some reason for the frequency of adjustments.

Re #96 Steve McI, let me know if you already have the public CD of some 1200+ Australian BoM stations. The CDs are the best means I know for getting ROW data from here.

#91 I think you are confusing the Chathams with some other island (Stewart?). The Chatham Islands are on the Chatham Rise about 800 km almost due east of Christchurch, i.e. just north of the middle of the South Island. A fairly good analogue would be using data from Bermuda to correct the record for Washington DC.

Rain in Seine … I think I have read all CA posts for the last 2 years (E&OE) and enjoyed Maine. That is, if it is possible to enjoy bad science, errors going neglected and the rest of the lessons therein.

[…] together with a revised “credit” here. One of the 6 stations is Chatham Island, where I noticed a problem on June 13, 2008 when I was trying to replicate GISS methodology using Wellington NZ as an example. John Goetz […]