MBH98 Source Code: Status Report

I reported recently that the newly archived (July 2005) source code multiproxy.f shows that the cross-validation R2 statistic was calculated but not reported.

Today, I’m merely going to summarize some collation details from my inspection of the code, listing input files and output files and providing a lexicon of variables. The source code requires a variety of input files which do not exist in the current data archive. Under the circumstances, I would have expected punctilious archiving, but this hasn’t happened. In addition, if one crosschecks the steps documented in the new source code against the replication issues listed here, rather few of them are covered in this current code dump.

I’ll return to this in a future post, with a particular consideration of Preisendorfer’s Rule N.

Table 1 is a collation of all input calls in multiproxy.f, showing the directory and file name. There are 3 types of files: temperature, proxy and information (rosters, locations, etc.). Comments on these directories and files follow the table.

Table 1. Input Files to MBH98 Source Code multiproxy.f

| No. | Page | Directory | File | Contents |
|---|---|---|---|---|
| 1 | 7 | DATA/JONESBRIFFA/MONTHLY | glb-train-month**.int | Gridcell temperature |
| 2 | 8 | DATA/JONESBRIFFA/MONTHLY | globe-1902.dat | Gridcell locations |
| 3 | 26 | JONESBRIFFA/1854-1993/ | globe-1854.dat | Roster |
| 4 | 26 | JONESBRIFFA/1854-1993 | globe-1854.mask | Roster |
| 5 | 26 | JONESBRIFFA/1854-1993 | glb-long-all*.int | Gridcell temperature series |
| 6 | 26 | JONESBRIFFA/1854-1993 | glb-long-cold*.int | Gridcell temperature series |
| 7 | 26 | JONESBRIFFA/1854-1993 | glb-long-warm*.int | Gridcell temperature series |
| 8 | 35 | DATA/PROXY-ANNUAL/ | names-longtemp | Roster |
| 9 | 35 | DATA/PROXY-ANNUAL/ | temp-1820.loc | Roster |
| 10 | 35 | DATA/PROXY-ANNUAL/ | nome2(j) | Temperature annual |
| 11 | 14 | MULTIPROXY/DATA/nome0/ | nome(j) | Proxy data |
| 12 | 13 | MULTIPROXY/DATA/ | multiproxy.dat | Roster |
| 13 | 13 | MULTIPROXY/DATA/ | multiproxy-proxy.dat | Roster |
| 14 | 13 | MULTIPROXY/DATA/ | multiproxy-instr.dat | Roster |
| 15 | 31 | | quinn.dat | Nino index |

None of the temperature or information files can be matched in the FTP/MANNETAL directory or the FTP/MBH98 directory. First, the directory nomenclature is inconsistent with the directory nomenclature in both archives. Second, nearly all of the file names lack matches (a couple of output file names do match).
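The cross-check described above can be sketched in a few lines of Python. This is only an illustration of the matching step, not the procedure actually used: the archive listing is a hypothetical stand-in containing just the file names mentioned in this post, the function name is made up, and the `*` characters in the input names are assumed to behave like shell-style wildcards.

```python
from fnmatch import fnmatch

# Input file names (or patterns) read by multiproxy.f, taken from Table 1.
INPUT_FILES = [
    "glb-train-month**.int", "globe-1902.dat", "globe-1854.dat",
    "globe-1854.mask", "glb-long-all*.int", "glb-long-cold*.int",
    "glb-long-warm*.int", "names-longtemp", "temp-1820.loc",
    "multiproxy.dat", "multiproxy-proxy.dat", "multiproxy-instr.dat",
    "quinn.dat",
]

# HYPOTHETICAL archive listing: only names mentioned in this post,
# not a real dump of FTP/MANNETAL98 or FTP/MBH98.
ARCHIVE_FILES = [
    "anomalies-new", "gridpoints.loc", "temp.loc",
    "multiproxy.inf", "datalist1400.dat", "redsea-o18.dat",
]

def unmatched_inputs(inputs, archive):
    """Return the input names/patterns that match no file name in the archive."""
    return [pattern for pattern in inputs
            if not any(fnmatch(name, pattern) for name in archive)]

missing = unmatched_inputs(INPUT_FILES, ARCHIVE_FILES)
print(f"{len(missing)} of {len(INPUT_FILES)} input files have no match")
```

A real check would replace `ARCHIVE_FILES` with an actual recursive listing of both FTP directories and would also compare directory names, which this sketch ignores.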

1. The monthly gridcell temperature files glb-train-month**.int are presumably derived from the file FTP/MANNETAL98/INSTRUMENTAL/anomalies-new (archived for the first time in July 2004) by taking a subset and carrying out interpolations. However, the code for this step is not provided. I have been unable to replicate Mann’s selection of 1082 gridcells using the criteria reported in the Corrigendum SI, and the code for making this selection is not provided either. The file globe-1902.dat looks like it might be the same as the archived file FTP/MANNETAL98/INSTRUMENTAL/gridpoints.loc, but this is only a guess.

2. The file globe-1854.dat is a roster of latitude/longitude gridcell identifiers. It is not archived anywhere.

3. The file globe-1854.mask presumably identified the 219 gridcells said to have “nearly continuous” records from 1854 and illustrated in a diagram in MBH98, but it is not archived anywhere.

5-7. The files glb-long-all*.int, glb-long-cold*.int and glb-long-warm*.int are presumably derived from the file FTP/MANNETAL98/INSTRUMENTAL/anomalies-new mentioned above. However, they are not archived in this form. The file names-longtemp cannot be identified in any current archive.

8. The file temp-1820.loc cannot be identified in any current archive. It is a list of 10 “long” series. There will probably be some connection to the file FTP/MBH98/INSTR/TEMP/temp.loc, but this has more than 10 series.

9. The files PROXY-ANNUAL/nome2(j) are presumably 10 “long” temperature series. Again this is perhaps connected to the long series in FTP/MBH98/INSTR/TEMP, but the precise series are unidentified.

10. The file multiproxy.dat does not exist in any archive. Nothing remotely like it exists in FTP/MANNETAL98. The older archive FTP/MBH98 contains a roster file, FTP/MBH98/PCS/multiproxy.inf, which has the same sort of structure as is contemplated for the file multiproxy.dat.

The file multiproxy.inf has directory calls corresponding exactly to directories in FTP/MBH98 (e.g. CORAL/MISC), and the file names correspond to file names in the FTP/MBH98 directories (e.g. redsea-o18.dat). For the non-PC series, I can see how multiproxy.inf works. For the PC series, it’s still unclear. Here MBH98 calculated PC series in steps, something that was not mentioned in MBH98 itself. Although the Corrigendum says that PC series were re-calculated in each step, there is no evidence of this in the data archives: PC series were calculated fresh for some steps and some networks but not for others, and no rationale has ever been provided.

In order to find the right directory, an extra subdirectory level has to be included; for example, TREE/VAGANOV/BACKTO_1750 is one of the PC directories in multiproxy.inf. This method can be used to pick out PC series from different BACKTO subdirectories, but you’d need to have a separate multiproxy.dat for each calculation step (11 in all). None of these are provided.
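To make the extra subdirectory level concrete, here is a minimal Python sketch of how one roster entry might resolve to a file path in each calculation step. Everything here is an assumption for illustration: the function name `proxy_path`, the file name `pc01.dat` and the idea of passing the step year as a parameter are hypothetical, and the actual read-in format of multiproxy.dat is not known from the archives. Only the directory names CORAL/MISC and TREE/VAGANOV/BACKTO_1750, the file redsea-o18.dat and the MULTIPROXY/DATA root come from the material discussed here.

```python
def proxy_path(directory, filename, step_year=None, root="MULTIPROXY/DATA"):
    """Build the path for one roster entry.

    For PC series, an extra BACKTO_<year> subdirectory level is inserted
    (naming assumed from the TREE/VAGANOV/BACKTO_1750 example).
    """
    parts = [root, directory]
    if step_year is not None:
        parts.append(f"BACKTO_{step_year}")
    parts.append(filename)
    return "/".join(parts)

# Non-PC series: directory and file name are used exactly as in the roster.
print(proxy_path("CORAL/MISC", "redsea-o18.dat"))

# PC series: the same roster entry points somewhere different in each step.
# "pc01.dat" is a HYPOTHETICAL file name used only for illustration.
print(proxy_path("TREE/VAGANOV", "pc01.dat", step_year=1750))
```

Under this reading, a single roster plus a step year would be enough to generate the per-step rosters, which is why the absence of the step-specific multiproxy.dat files matters for replication.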

I think that there may be different rosters with names like backto1820.dat (a file emailed to me in April 2003 by Scott Rutherford, which is identical to the file multiproxy.inf). This suggests that there would be corresponding files for other periods, which have not been archived.

In the new archive FTP/MANNETAL98, there is a set of files somewhat like this (FTP/MANNETAL98/datalist1400.dat etc.), but the read-in form of these files is not consistent with the read-in format of multiproxy.dat, as the directory information is not in these files.

12-14. The individual proxy series are read in from files with the form MULTIPROXY/DATA/subdirectory/filename.dat. This form is inconsistent with a read-in from FTP/MANNETAL98, since the proxy series there have been collated into matrices in each calculation step. It is consistent with the forms in FTP/MBH98 as long as the multiproxy.dat files are specified correctly.

The table here shows the various output files written from the program multiproxy.f. The few files that have been archived under the multiproxy.f filename are indicated below with their location: for the temperature principal components calculations, the eigenvectors, eigenvalues and re-standardized and annualized PC series were archived at UMASS in 1999 and are also at FTP/MANNETAL98. The gridcell standard deviations were archived in 2004 at FTP/MANNETAL98. There are 5 rpc series archived in both locations, but these appear to be spliced and the staging is hard to reconcile. The files betas*- and corrs*- are statistics which are collated in stats-supp.htm, subject to withholding discussed yesterday.

5 Comments

Is it possible that Mann originally believed in his hockey stick so much that he made the calculations quickly to get the results published, expecting that they would be more thoroughly verified later? Because nobody has since been able to verify what he did, he doesn’t dare to disclose everything, since the original reconstruction was not properly done. I feel pretty sure that he will never completely disclose the methods and data used in MBH98 & 99.

Warning: Subprogram CSVD argument usage mismatch at position 6:
Dummy arg IP in module CSVD line 2749 file multiproxy.f is used before set
Actual arg IP0 in module %MAIN line 1138 file multiproxy.f is not set

************************** END ftnchek output ********************

I really would be happier if decisions concerning the future of
the world economy did not depend on whether the compiler sets
uninitialized variables to zero.

Steve: Could some of the diagnostics be compiler-specific? He’s produced results, so at some level I presume that his program must at least compile.

The ftnchek program is designed to check the syntax of a program
against the strict Fortran standard. So programs which fail the
checks can potentially give different results depending on the
compiler/CPU mix.

Uninitialized variables are usually set to zero, and most
compilers do have an option to enforce this. But the people
who write programs with uninitialized variables generally do
not bother fiddling with compiler options.

The 470 or so such mixed REAL*4/REAL*8 mode expressions
mean the program is more susceptible to round-off error
creeping in and possibly polluting the quality of any
computation. I noticed that the code had an SVD in there.
If I were doing an SVD, I would generally like to maintain
the integrity of all 15 digits of REAL*8. It should
be noted that there are compilers with options that can
force everything to be REAL*8.

For the record, I like to know what is happening to all
15 digits of my floating point numbers as I add, subtract,
multiply and divide them. Mixed-mode arithmetic and/or
uninitialized variables do not constitute good programming
practice.
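As a small illustration of the mixed-mode point, here is a Python sketch (not taken from multiproxy.f) of what happens when a value passes through a single-precision variable on its way into a double-precision expression. The helper name `to_real4` is made up for this example; it uses the standard `struct` module to round a double to the nearest single-precision value, on the assumption that REAL*4 and REAL*8 correspond to IEEE single and double precision.

```python
import struct

def to_real4(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

x8 = 0.1             # held in double precision throughout
x4 = to_real4(0.1)   # passed through a single-precision variable

# Promoting the single-precision value back to double does not restore
# the digits that were lost when it was stored in single precision.
error = abs(x4 - x8)
print(f"REAL*8: {x8:.17f}")
print(f"REAL*4: {x4:.17f}")
print(f"difference: {error:.3e}")
```

Single precision carries only about 7 significant decimal digits, so every such expression can carry an error of this relative size into the rest of a double-precision calculation.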