UC on CCE

I then compared verification statistics for the different reconstructions as shown below. OLS yielded much the “best” fit in the calibration period, but the worst fit in the verification period.

If OLS is equivalent to ICE, it actually finds the best fit (minimizes calibration residuals), and in proxy-temperature case makes the most obvious overfit. Let me try to get an understanding of the methods applied. As we know, core of MBH9x is the double pseudoinverse, in Matlab

RPC=P*pinv(pinv(TPC(1:79,T_select))*P(903:end,:));

where P is standardized (calibration mean to zero, calibration de-trended std to 1) proxy matrix, TPC target (temperature PCs) and T_select is that odd selector of target PCs. After this, RPCs are variance matched to TPCs, and brought back to temperatures via matrix multiplication. As you can see, I can replicate MBH99 reconstruction almost exactly with this method:

Differences in 1400-1499 are related problems with archived data, and in 1650-1699 they are due to unreported step in MBH procedure. Steve and Jean S have noted these independently, so I’m quite confident that my algorithm is correct.

I’ve considered this method as a variation of classical calibration estimator (CCE) and Steve’s made a point that this is one form of PLS. These statements are not necessarily in conflict. Original CCE is (with standardized target)

where matrices and are ML estimates of and , obtained from the calibration experiment with a model

By setting , I get exactly the same result as with double pinv. Which verifies my observation that MBH9x is CCE with special assumption about proxy noise and with incorrect variance matching step after this classical estimation.

Back to OLS (ICE) estimate, which is obtained by regressing directly X on Y,

this is justified only with a special prior distribution for , which we don’t have. Thus OLS is out of the question. Yet, it is interesting to observe that OLS is actually a matrix weighted average between CCE and zero-matrix (Brown82, Eq 2.21) :

It would be interesting to compute MBH99 results with real CCE, but S does not have inverse for AD1700 and later steps. But results for up to AD1700 are here, CCE:

ICE:

As you can see, variability of ICE is lower, as it is weighted towards zero. But hey, where did that hockeystick go ?

87 Comments

In MBH98, Mann says that “conventional approaches” were “relatively ineffective”:

There is a rich tradition of multivariate statistical calibration approaches to palaeoclimate reconstruction, particularly in the field of dendroclimatology where the relative strengths and weaknesses of various approaches to multivariate calibration have been well studied 14,15. Such approaches have been applied to regional dendroclimatic networks to reconstruct regional patterns of temperature 16,17 and atmospheric circulation (1820) or specific climate phenomena such as the Southern Oscillation 21. Largely because of the inhomogeneity of the information represented by different types of indicators in a true multiproxy network, we found conventional approaches (for example, canonical correlation analysis, CCA, of the proxy and instrumental data sets) to be relatively ineffective.

This is one of those posts which makes me want to go off and study statistics, but I don’t know when I’ll get a chance. I was able to follow it enough to decide it’s probably an important one and perhaps easily converted into a journal article. Will there be feedback from real climate people? Who knows but from previous history, probably not.

BTW: how do those track-back rats glom on to new posts here so quickly? Are they subscribed to some RSS feed you could block them from?

Largely because of the inhomogeneity of the information represented by different types of indicators in a true multiproxy network, we found conventional approaches (for example, canonical correlation analysis, CCA, of the proxy and instrumental data sets) to be relatively ineffective.

ineffective= no hockey stick ? 🙂

Dave,

the reason why I think this is important:

1) If you diverge from conventional approach, you have millions of degrees of freedom to derive any conclusion you please.

2) Statisticians have made quite an effort to calculate the CIs with the conventional approach. All that is gone when you apply these modified approaches.

I have to admit that when the snake oil salesman came to town (i.e when MBH98 was published), I bought in rather uncritically. I took them at their word that they had developed a new and better method capable of extracting signals that traditional methods could not. This goes to illustrate the cost of assuming other people know what they’re doing. Most of all it proves the value of transparency of calculations and of accountability for choices made in analysis.

MBH should have published their method in the statistical literature before publishing the Nature paper. And the Nature Editor who let this slip should also be accountable for this major error. In the future Nature should stick to nature and leave the math to the mathematical journals. Never again will I trust another mathematical argument in Nature, unless it is 100% clear that the mathematical methods have have had time to season in the peer-reviewed mathematics literature.

#6. When you think about it (and I’d be interested in UC and Jean S’ thoughts on this), it’s very difficult for a statistical method to be so incorrect that the algorithm itself can be said to be “wrong”.

When you add the withdrawal of the Rutherford et al 2005 RegEM code in June 2007 to MBH98, you now have two out of two cases up to 2006 where Mann has introduced his own statistical program to great fanfare and where defects that do not rise above programming errors have been identified. As Mann said to NAS, he is “not a statistician”.

As with MBH98, Rutherford et al 2005 was not published in a statistical journal, but in J Climate, once again without establishing the statistical properties of the method. It had several programming defects, any one of which would have been fatal.

Mann et al 2007 is a third go, where Mann is once again publishing a complicated and unknown statistical method in a non-mathematical journal. From a statistical point of view, the lacunae in the presentation are astonishing: the method is tested against a climate model with a signal but not against other methods and not against data sets that are contaminated.

Given this track record, it’s amazing that JEG and Kim Cobb seemingly view Mann et al 2007 as some sort of resolution or vindication of the HS controversy rather than as road kill waiting to happen – especially when he hasn’t even bothered fixing his PC methods.

Which verifies my observation that MBH9x is CCE with special assumption about proxy noise and with incorrect variance matching step after this classical estimation.

I would also add a rather special assumption that the targets are PCs. This is rather strange if you think it in terms of assumed calibration experiment model! For instance, tree rings are assumed to be determined by a linear combination of (global) temperature PCs plus noise. Talking about teleconnections…

#7 (pk): UC is essentially arguing that MBH9x can be viewed as classical calibration estimator (CCE) with some rather strange additional assumptions and additions. Hence, there is not really anything very novel about the MBH method unlike the team is arguing. Then he proceeds to show what happens if one of those additional strange assumptions (noise covariance S=I) is removed.

#10 Because then manuscripts heavy in experimental statistical methodology that are sent to, say, the Journal of Rubbish Climatology could be re-directed (in whole or in part) by an Editor or reviewer to a venue that is more appropriate: a Journal of Statistical Climatology, published by, say, the American Statistical Association. Yes, authors with a novel idea may prefer to submit to JRC, but they do not have the only say as to how novel ideas are routed. The existence of an alternative would free the climatological reviewer from the responsibility of having to review the novel statistical portions of a paper. He could reject out of hand as “inappopriate for JRC; please submit to JSC“. And, guaranteed, JSC would have a rigorous standard that certain folks discussed at CA could probably never reach. And finally, methodologically flawed papers that somehow managed to get published in JRC could be criticized openly in JSC.

I actually had that idea brought up when I was at Pielke’s conference in August I had lunch with Robert Lund, of Clemson, who presented “Change point detection in periodic and correlated settings.”, which was about detecting station moves in temperature data. Curiously he started the presentation with detecting a changepoint in Mauna Loa CO2 data.

Lund worked on helping NCDC setup USHCN V2, and then promptly told me when I asked his opinion of it that “they did it wrong”. He opined that a “Journal of Statistical Climatology” was something he was thinking about starting to address such concerns. I agreed and encouraged him to do so and mentioned Steve M’s work.

I think there is no time better than the present. In fact, I think this entire idea could be bootstrapped with a minimal cost. It would provide a venue to promote new analytical ideas, encourage review of methods, and provide a forum for review.

I feel this is very important, and that it can offer much in the way of contributing to the quality of Climate Science.

Then these kinds of conversations must have been occurring at the ASA workshop. One wonders what the results were. Surely there are backroom conversations going on about putting together a strategic NSF grant application to develop a working group on statistical climatology? Yes, an online journal would be cheap to set up and run, but cash always helps.

RE16 Bender, I’m thinking of a business plan here, not for profit, but to get the journal up and running to a point of self-sustainability. How much cash do you think it would take?

As I see the steps in the simplest form they would be:

1) website, with mission and standards,
2) a call for editors and reviewers
3) a call for papers
4) a call for subscribers, and sustaining members
5) publish first journal online, in PDF, and in printed form
6) rinse and repeat

#20: There were a few posts back complaints about important comments being buried in the flood of comments. So now that Steve has raised few comments (originally meant to be comments like this one by UC), which I suppose he felt were interesting, as separate posts, we hear complaints about posts being incomprehensive. If Steve had asked UC to make a real post about the topic, I’m sure we would hear complaints about his anonymity like happened last time. If anonymity issue could be solved, I’m sure someone would complain about English. If that could be solved, I’m sure there would be other things to complain about.

C’mon people, what do you expect? Steve, UC, me, or other people are not paid anything to do this. This is a blog, not a scientific journal nor a popular science magazine. UC had probably spent quite a few hours of his spear time to give you the above information. Steve has better things to do than start expanding acronyms and explaining background. Kind of bugs me that all they get for their hard work is a collection of complaints.

UC, I’ve got an idea on what might be happening in the 1400-1450 and 1650+ interval. We’ve been assuming that Mann’s description of the number of TPCs used is accurate. Nothing else is accurate – why should we assume that this is an accurate description. If you look at this post http://www.climateaudit.org/?p=776 , the impact of adding a Temperature PC to the reconstruction can be very large. Maybe he uses 2 temperature PCs in the AD1400 step?

M. Jeff, why dont you stop suggesting and giving advice to other people what to do, and do it yourself? What prevents you updating the acronyms page? Just send the ones you want to add to Steve, Im sure hell add them. If you want to open a parallel web site, open one. If you want better posts, write better ones to your parallel web site. If you want a ClimateAudit wiki, open one. If you want more articles to your ClimateAudit wiki, write more.

I think Ive had enough of these complainers for a awhile.
[lurking mode on]

I agreed with “braddles”, who had what seemed like a constructive suggestion and I also asked some questions. Steve McIntyre seems to be a perfectionist when it comes to statistics and has an amazing memory and grasp for the issues, and that is one of the reasons that I voted daily for this as the Best Science Blog site, so don’t prejudge my motives. I am in awe of the amount of work that Steve does here, and could not imagine that he could do more. I am not qualified to assist in the implementation of any of the tweaks that have been suggested.

Other’s have suggested that more of Steve’s work should be published in peer reviewed publications. I agree, but that does not mean that I am complaining, such a thought is a compliment. I will continue to try to understand the gist of the issues.

UC, Ive got an idea on what might be happening in the 1400-1450 and 1650+ interval. Weve been assuming that Manns description of the number of TPCs used is accurate. Nothing else is accurate – why should we assume that this is an accurate description. If you look at this post http://www.climateaudit.org/?p=776 , the impact of adding a Temperature PC to the reconstruction can be very large. Maybe he uses 2 temperature PCs in the AD1400 step?

RPC5 is here and starts 1650. Might be possible to reverse-engineer this out.

Try looking at the link below for data1400.txt and data1450.txt. I can’t find 1650.txt, but I bet there is one. Comparing the proxies used in 1600.txt vs. 1700.txt and the dataall.dat file, there is data used in 1700.txt that goes back to 1650, but not to 1600.

#31. We know that data set. Either Mann’s data archive is not what he used in MBH98 or there’s a still unreported methodology.

You know, it’s funny, but the inconsistency that UC plotted here is what led to the Barton Committee and the NAS Panel. When I was interviewed by the WSJ in early 2005, despite identifying several major errors, I said that I still wanted to see his code and data because there were still discrepancies. Any auditor in the world would refused to sign off with discrepancies such as the ones shown here. The WSJ asked Mann about it and he said that he would not show his algorithm as that would be giving into intimidation tactics – as though me, a private citizen in Canada – had any authority or jurisdiction over him. The WSJ reporter wrote Mann’s answer up and, to anyone other than a real climate scientist, anyone that actually had a job where they had to report to people, it looked bad. It caught the eye of the House Energy and Commerce Committee and one thing led to another.

But after all those folks looked at this, and even after Mann produced his source code, he produced incomplete source code that didn’t work with any data on archive and still no one knows how the exact 1400-1450 results were obtained. (However we reconcile perfectly to Wahl and Ammann’s version which was subsequent to ours.)

The number of PCs retained in various proxy sub-networks were determined for each independent interval of the stepwise reconstruction based on a combination of objective criteria (modified Preisendorfer rule N and scree test)

1) They do matter, with conventional statistical approach the hockey stick disappeared. And this is completely independent of BCPs or no BCPs issue. If someone wants to make the full reconstruction with singular S, I suggest to read and implement [1]. Maybe even Sundberg would be interested on this.

2) In science, you should move on after the error elimination phase. Why RegEM if there’s nothing wrong with MBH98 algorithm?

#34. The situation with Nature is actually much worse than simply accepting the above language. This is not from MBH98 but from the Corrigendum. The original article said (without explaining) things like:

An objective criterion was used to determine the particular set of eigenvectors which should be used in the calibration as follows. Preisendorfers25 selection rule rule N was applied to the multiproxy network to determine the approximate number Neofs of significant independent climate patterns that are resolved by the network, taking into account the spatial correlation within the multiproxy data set. Because the ordering of various eigenvectors in terms of their prominence in the instrumental data, and their prominence as represented by the multiproxy network, need not be the same, we allowed for the selection of non-contiguous sequences of the instrumental eigenvectors. We chose the optimal group of Neofs eigenvectors, from among a larger set (for example, the first 16) of the highest-rank eigenvectors, as the group of eigenvectors which maximized the calibration explained variance.

You’ve quoted a section not from the Corrigendum but from the Corrigendum SI:

The number of PCs retained in various proxy sub-networks were determined for each independent interval of the stepwise reconstruction based on a combination of objective criteria (modified Preisendorfer rule N and scree test).

In addition to being unintelligible as written, it also differs from the original statement and should have been in the body of the Corrigendum.

I have been able to determine that the Corrigendum was not externally peer reviewed (merely edited) and that the Corrigendum SI was not even reviewed by Nature editors. Nature said that their policies do not require them to peer review Corrigenda – a surprising policy when you think about it: you’d think that these are the situations that require particular care, rather than the opposite.

The Corrigendum SI was not even reviewed by Nature editors, much less peer reviewed.

Technometrics! Thank-you for mentioning my favorite statistics journal. Great Statisticians applying novel techniques to real-world problems. If there is to be a model for the proposed “Journal of Statistical Climatology,” Technometrics is it.

Now, off to my basement to see if I have that 1989 issue stuffed in a box somewhere.

I apologize in advance if this is way off and I expect Steve to delete it, but I followed the second link in #28 and reviewed the Algorithm discussion in the Methods section and found this closing sentence:

“It should be stressed that reconstructions that did not pass statistical
cross-validation (i.e., yielded negative RE scores) were deemed unreliable.”

Isn’t this basically saying MBH98 is a curve fit and any reconstructions that didn’t support the desired curve were discarded, based solely on the fact they had negative RE scores?

I’m finding this very interesting, but like pk (#13) had a hard time figuring out what was being discussed. It turns out that CCE and ICE are defined on UC’s Filtering Theory page, which is linked to his identifier on this blog. There was also some relevant discussion of PLS at thread # 1338 last April.

It turns out that CCE (Classical Calibration Estimation) is very clever — Suppose we have a bunch of proxies Y and a target variable X (global temperature), and a lot of observations on both during a calibration period. Ignoring the intercept for the moment, our model is
Y = XB + E,
where B is a vector of proxy-specific slopes. We can estimate this equation proxy-by-proxy across time by OLS or, perhaps more efficiently, by what econometricians call Seemingly Unrelated Regression (SUR), which takes into account the covariances of the errors across proxies. (This may be what UC means by ML estimation)

Then in each reconstruction period, we observe Y’ and want to know what X’ was. The above equation is still valid, but we can’t tell if X is multiplying B or vice-versa. So instead of taking the data as given and estimating the coefficients, we take the coefficients as given (as estimated during the calibration period) and just estimate the data! This is a separate cross sectional regression across proxies for each reconstruction date.

OLS will give unbiased estimates of X’, but Generalized Least Squares (GLS or Aitken’s formula) is more efficient, so CCE computes G, the covariance matrix of the E’s across proxies, or equivalently, UC’s S which is proportional to G, and uses it to do GLS.

Using XT to represent the transpose, the pseudoinverse (pinv) discussed is (according to wiki) simply inv(XT*X)*XT, the matrix that generates OLS results. So the MBH double PINV is apparently just doing OLS on OLS, and therefore is equivalent to CCE, but with OLS in the second step rather than GLS. This is not “incorrect” or “strange” as charged by UC and Jean S (#10), but at worst merely “lazy” and “inefficient.” Note that in GLS, setting S = I is equivalent to setting S = vI for some constant v, so that the double pinv formula does not really make a specific false assumption about the variances. GLS will estimate v and incorporate it into its standard error calculation.

Although second stage OLS is unbiased and not unreasonable, the usual OLS standard errors will be wrong, since they do not take into account the unequal variances and correlations between the proxies. This can be corrected using G (or S). But MBH don’t even attempt to compute the OLS standard errors of their reconstructions, as discussed on another recent thread here, so this point is moot at it stands.

However, a big problem I do see with CCE (or double pinv) is errors-in-variables bias that arises because B is in fact measured with a lot of error in the first stage regressions. This bias generally makes the coefficients too close to zero, so that the reconstructed temperatures will be too flat. I.e. no MWP, no LIA — sound familiar? (In fact, UC shows that CCE doesn’t even give a hockey stick…) This bias can be corrected with a formula that goes back to Milton Friedman’s Theory of the Consumption Function, but that’s getting to be a lot of work.

So at this point, ICE (Inverse Calibration Estimator) is starting to look good: Just run X on Y during the calibration period, even though the causality goes the other direction, and then use this to estimate X’ from Y’ during the reconstruction period. If there are more proxies than calibration observations, this regression won’t even run, but then it would actually make sense to extract principal components (PCs) from the proxies, and consider only the first few of those instead of the whole proxy set.

Of course, MBH’s “creative” construction of their PC’s (to put it charitably) raises a lot of issues. Mann on RC objected to the MM critique that if the standard calibration were used in place of their HS-generating one, their stripbark-dominated PC1 would still come through, as PC5. This raises some interesting data mining issues in addition to the stripbark matter, but I’ll leave that for now.

PS: Steve, when you delete or move inappopriate or off-topic comments, would it be possible to leave the comment number, so that cross-references remain intact for the record? Thanks!

#41: You are getting into the Mannian world 🙂 I generally agree what you say, a correction though:

So the MBH double PINV is apparently just doing OLS on OLS, and therefore is equivalent to CCE, but with OLS in the second step rather than GLS. This is not incorrect or strange as charged by UC and Jean S (#10), but at worst merely lazy and inefficient.

UC or I were not referring to that. UC referred to the “variance matching” which happens after the CCE estimation. That is, after the targets (instrumental PCs) have been estimated, MBH matches the variances of the estimated PCs to the true variance of the instrumental PCs. I referred (by “strange” in #10) to the same thing as UC plus the thing I mentioned in #9 and the asumption S = I (in reality you cannot even assume even that the proxies has equal variance noise).

My writings are mostly based on Brown’s Multivariate Calibration, excellent paper that I can send upon request.

So instead of taking the data as given and estimating the coefficients, we take the coefficients as given (as estimated during the calibration period) and just estimate the data!

Yep, leads not to ML estimate, but according to Brown it is quite close. However, in proxy-temp case I wouldn’t count on it. So much noise, u know 😉

This is not incorrect or strange as charged by UC and Jean S (#10), but at worst merely lazy and inefficient. Note that in GLS, setting S = I is equivalent to setting S = vI for some constant v, so that the double pinv formula does not really make a specific false assumption about the variances. GLS will estimate v and incorporate it into its standard error calculation.

Using S=I changes the result, though. In addition, at AD1700 step (and after that ), S would be singular. Not sure how good S=I assumption is after that, but note that same happens for ICE at AD1730 – there’s no S, but inversion problems with YT*Y. (and pinv(X) or (XTX)\XT are not equal then)

But MBH dont even attempt to compute the OLS standard errors of their reconstructions, as discussed on another recent thread here, so this point is moot at it stands.

I guess that occurs when there are more proxies than calibration observations? But couldn’t you still do CCE with Principal Components of the proxy data? How then do you get answers after 1700? By using a truncated pre-1700 proxy set?

How then do you get answers after 1700? By using a truncated pre-1700 proxy set?

Mann’s steps are 1000-1980 , 1400-1980, etc. I just stop after 1600-1980 step.

Uncertainty of B should not directly affect the CIs in the second step, just the point estimates. But thats bad enough.

It should affect the CIs. Simple example, assume that true B is a zero matrix. You’ll get non-zero in the first phase, quite high calibration R2 and low verification R2s. And calibration residuals are not informative at all. Errors in B act as are scaling errors in the second phase. Scaling errors are not independent of X’. For much better explanation, see Brown’s paper or,

The construction of confidence regions for X’ is of particular importance as analysis of errors associated with the various estimators of X’ has not generally received enough attention in the literature

cc. Editors of Nature and Science

Please let me know if you can find his proof. Meanwhile, I have found Browns paper on JSTOR and will take a look at it.

In general, p(X’|Y’) = p(Y’|X’)p(X’)/p(Y’). In the classic regression of Y on X with known variance, p(Y’|X’) is Gaussian (or Student t to be precise) with a mean that is linear in X’ and a variance (given by Brown) that depends on the standard errors of the regression coefficients from the calibration step as well as the error of Y’ about its population mean.

If the (X, Y) are bivariate Gaussian and pairwise independent, the prior p(X’) is the marginal of the joint distribution observed during the calibration period, and ICE gives the correct posterior. A Y’ say 5 sd outside calibration experience can give a point estimate of X’ that is outside experience as well, but it will be attenuated by a factor of R2, because of the strong pull of the Gaussian prior. But this prior is much too strong for most paleoclimatic questions.

If instead we let the prior for X’ be uninformative, i.e. uniform, the posterior for X’ looks just like p(Y’|X’), but as a function of X’ holding Y’ at its observed value. Unfortunately, this function integrates to infinity, so that after normalization (division by p(Y)), the posterior is everwhere zero, i.e. just as uninformative as the prior!

Brown cogently gets around this by advocating inverting the Y confidence intervals, i.e. finding those values of X’ for which the observed Y’ lies within say its 95% CI. This requires solving two quadratic equations, and gives an asymmetric CI, with the long tail extending outward from the calibration experience of X. The point estimate is then just the inversion of the regression line of Y on X.

For confidence levels above that with which a zero slope coeffient could be rejected, the Y-CI boundaries are not monotonic in X. In these cases, Brown interprets the CI for X’ to be the whole real axis. This makes sense given we don’t even know which way the slope goes at this confidence level.

I haven’t gotten to Brown’s MV section yet, but the logical extension of the above would be to invert the confidence ellipses for Y’, taking into account the covariance across proxies as well as the uncertainty of each proxy’s regression coefficients. This could be done numerically by checking a grid of X’s, and seeing which fail to reject the observed vector Y’. CCE would evidently give a valid point estimate, but would not tell us what the CI is, not even using the GLS coefficient covariance matrix.

In the MV case, there is likely to be a severe data mining issue, but that goes beyond the present problem of calibrating with a fixed set of proxies.

A Y say 5 sd outside calibration experience can give a point estimate of X that is outside experience as well, but it will be attenuated by a factor of R2, because of the strong pull of the Gaussian prior. But this prior is much too strong for most paleoclimatic questions.

Yes, and in [1] this is called loss of variance. But that’s another funny story.

I havent gotten to Browns MV section yet, but the logical extension of the above would be to invert the confidence ellipses for Y, taking into account the covariance across proxies as well as the uncertainty of each proxys regression coefficients. This could be done numerically by checking a grid of Xs, and seeing which fail to reject the observed vector Y. CCE would evidently give a valid point estimate, but would not tell us what the CI is, not even using the GLS coefficient covariance matrix.

See Brown’s Eq. 2.8. CCE comes into play at Eq. 2.16, confidence region degenerates to CCE.

I would also add a rather special assumption that the targets are PCs. This is rather strange if you think it in terms of assumed calibration experiment model! For instance, tree rings are assumed to be determined by a linear combination of (global) temperature PCs plus noise. Talking about teleconnections…

To obey mike’s order, I’ve re-read MBH9x. Maybe I have advanced a little bit in my ‘Learn MBH9x in few years’-goal. The basic model seems to be actually

where Y is proxy data matrix, matrix D downsamples (by averaging) monthly data to annual, X is the monthly grid-point data matrix, B is the matrix of unknown coefficients and E is noise matrix. So each proxy (at year i) responds to annual average of linear combination of grid-point data (at year i). B is huge matrix, so Mann compresses monthly data utilizing singular value decomposition (relation to principal components regression is obvious, see any textbook on subject), the model at this point is:

Now compress data by truncating, take only k column vectors of U, V (and corresponding largest singular values), and the model becomes

re-write as

where and . Now Z contains the U-vectors you can download from Nature pages pcxx.txt (with special scaling, but that’s not important now). Diagonal of S is in tcpa-eigenvalues.txt, and V-vectors are eofxx.txt .

In textbook principal components regression, takes simple form, as new variables are orthogonal. Here we don’t have that luxury, as Z is made by downsampling the original U vectors (and X-data has values up to 1993 unlike proxy data). I’d still like to know if there is some short form of CCE with unitary Z. Anyway, from this point we’ll just do CCE (with that noise assumption) as explained in above post, and obtain estimate of annual grid-point data, .

Some notes

1) This is just a pre-treatment before CCE, my criticism of variance matching and incorrect uncertainty estimation applies (until you refute them)

It’s part of 500 lines m-file experiment, and too ugly code to publish (at this point). The basic idea was to find that confidence region (year-by-year) by going through all sensible values of one component of X’, while keeping the others at ML-estimate value, i.e. try scanN times

X(:,X_index)=linspace(min_V,max_V,scanN); % other columns of X are filled with corresponding ML estimate

for j=1:scanN
R(j,X_index)=(Y(year,:)’-Bh*X(j,:)’)’*iS*(Y(year,:)’-Bh*X(j,:)’)/(1+1/NCal+X(j,:)*inv(Xc’*Xc)*X(j,:)’);
end

then find the values of R that are less than right-hand side of the equation,

Now first and last value of i95 define 95 % limit for that component, approximately, i.e region should be ellipsoid, and the ellipsoid axes should be parallel to the coordinate axes. Works well with noise simulations, so that’s enough for me. Don’t use in life-critical applications, though 🙂

And no complaints, please, this is voluntary work, there are some secret plans to publish OS MBH99 Matlab project, and even this explanation is better than the one for MBH99 CIs 😉

CIs were for reconstructed PCs (rpc0x.txt in here , in #51 ). rpc to NH mean is just a linear transformation, so I used normal error propagation procedure to obtain CIs for NH mean. Slightly fiducial, but better than Mann’s best effort. And then you need to assume that basic model holds, and then you read CA..

So what we have here in Figure 3 is a graphic showing cross-validation R2 statistics by gridcell for the AD1820 step which has 112 “proxy” series and very different success than the controversial 15th century step.

UC I’m trying to understand the CCE method at the moment but I was curious if you ever tried to apply it where X is several columns which include the grid point temperatures and precipitation levels along with the CO2 concentration. Then from estimating X, average the values of T to get the mean global temperature.

If we don’t know are the proxies responding to solar or temperature or precipitation or CO2, then we should include all those in X. But I have a feeling that this is not the right way to go. More physiology and local reconstructions. That would leave less weight to the statistical method applied.

#62, I think local reconstruction is the way to go because as you have stated adding more proxies increases the error bounds. I don’t think that the other extreme where one proxy is used to measure one temperature is desirable either. What should be done is a temperature point on the globe should be chosen, then enough available proxies near that point should be chosen so that the best balance between the size of the error bars and the size of the residual. My hunch is if you do this you won’t be using proxies from half way around the world.

UC and Jean S —
I take back what I said in #48 about forming multiproxy confidence intervals as in Brown’s 2.8 or UC’s post 50 above. In fact, these intervals are all wrong!

To see the problem, fast forward to p. 320 of Brown’s discussion of the comments immediately following his article. He writes,

Mr. Aitchison alludes to the strange properties of the sampling theory confidence intervals when q > p given in Section 2.2 under Theorem 1 (ii) ….
A simple example may serve to clarify the issue. Suppose all the regression parameters are known, as follows when n -> infinity in Section 2. Furthermore if q [the number of proxies] = 2 and p [the number of variables to be estimated] = 1, Gamma = I, and [omitting Brown’s primes for simplicity],
Y1 = x + e1
Y2 = x + e2,
then the predictive distribution of (Y1-x)^2 + (Y2-x)^2 under normal theory, is chi-squared on two degrees of freedom. The proposed sampling theory approach gives a 95% confidence interval for x as all x such that
(Y1-x)^2 + (Y2-x)^2 6.0, but that must be a typo],
where 6.0 is the upper 5 per cent point of a chi-squared on two degrees of freedom. The confidence interval is -sqrt(2) Y = (1,-1)^T and vanishes to the point x = 0 when Y = (sqrt(3),-sqrt(3))^T.
The likelihood and Bayes approaches do not behave in this way….

But in fact this is not the sampling theory approach. Define Chisq(x) = (Y1-x)^2 + (Y2-x)^2. This represents the “unlikelihood” as it were of Y1 and Y2 given x, and in fact is -2*log L(x) (+ const). Minimizing this unlikelihood, or equivalently, maximizing the likelihood, gives the sampling theory estimator Ybar = (Y1 + Y2)/2. Any Stats 101 student can tell you that Ybar-x ~ N(0, 1/2), whence t(x) = (Ybar-x)/sqrt(1/2) ~ N(0,1), or t(x)^2 ~ chi-squared(1DOF). The proper sampling-theory way to form a confidence interval for x about Ybar is using t(x) (or equivalently t(x)^2), not with Chisq(x).

All Chisq(x) tell us is whether (Y1, Y2) is ex ante likely to happen or not. What we want to know is, given that (Y1, Y2) did in fact happen, what must x have been? There is no point in shaking your fist at the storm just because your model told you that it was statistically impossible (or virtually impossible). Given that the storm is occurring anyway, you may as well come in out of the rain and try to figure out what the combination of state variables must have been that most likely allowed it to occur.

However, it turns out that t(x)^2 identically equals Chisq(x) – Chisq(Ybar), and thus, when everything is Gaussian, the LR statistic as well. Thus it is not Chisq(x) itself, but rather the change in Chisq(x) from its minimal value that should govern the confidence inverval on x! When the variances are estimated, the chisquared stats become F stats, and presumably it is the change in the F stat from its minimal value that should govern the confidence interval on x, not the F stat itself.

When p = 1, Ybar = Y1, so Chisq(Ybar) = 0, and the change in Chisq(x) equals Chisq(x) = t(x)^2 itself. So there really is no difference between p = 1 and p > 1.

If everything is Gaussian and the parameters and variances are perfectly known, the posterior density, likelihood function, and sampling theory all give the same answers under a diffuse prior for x. When the parameters are estimated, there is the problem that the posterior density for x doesn’t integrate for p = 1, so that inverting the confidence intervals for Y as a function of x must be used. However, for p > 2, this problem goes away, as Brown himself notes on p. 297. For p = 1, the posterior density has hyperbolic power tails that behave like |x|^(-1). But for p = 2, the tails behave like |x|^(-2), which integrates. This density is Cauchy-like in that it does not have a mean, let alone a variance, but it can still be used to compute good confidence intervals. In general, the tails are like |x|^(-p), so that the posterior mean is well defined for p > 2 and the variance exists for p > 3. It becomes a matter of taste whether to use the change in the F-stat as above or the posterior to form confidence intervals, but it wouldn’t hurt to compute both.

Of course, it is impossible that global temperatures at any time in the past 10000 years were either below absolute zero or hot enough to melt the Rockies, so that a completely diffuse prior is not really justified. Nevertheless, any reasonable limits are bound to be subject to criticism. It’s best to approach the data with a completely open mind about x (if not the model), so that no one can say that you pre-supposed your answer.

Brown in fact prefers ICE (regressing X on Y) for his examples, but in the first one, at least, he has randomly removed (X, Y) pairs from his calibration sample to be used for illustrative estimation. In doing, this, the prior for X’ becomes the distribution from which the calibration X’s were drawn, and so ICE really is appropriate. A diffuse (uninformative) prior for X’ requires CCE. (I haven’t figured out his second example yet).

In quoting Brown above, I forgot that you can’t use Less Than signs in HTML, since it thinks these are tag names. Using .LT. for Less Than etc, his criterion, with my parenthetical remark in brackets, should read

The proposed sampling theory approach gives a 95% confidence interval for x as all x such that
(Y1-x)^2 + (Y2-x)^2 .LT. 6.0 [Brown in fact writes .GT. 6.0, but that must be a typo],etc.

You’re right — I wasn’t minding my p’s and q’s correctly! Brown’s simple example involves two proxies, with a single response each. However, to keep it simple he is assuming that the estimated intercepts happen to come out to 0 and the estimated slopes happen to both come out to 1, so that it does look like two responses to a single x’ (and should therefore give the same CI!).

Here is a version that corrects this, as well as another HTML error that I didn’t catch:

UC and Jean S —
I take back what I said in #48 about forming multiproxy confidence intervals as in Brown’s 2.8 or UC’s post 50 above. In fact, these intervals are all wrong!

To see the problem, fast forward to p. 320 of Brown’s discussion of the comments immediately following his article. He writes,

Mr. Aitchison alludes to the strange properties of the sampling theory confidence intervals when q > p given in Section 2.2 under Theorem 1 (ii) ….
A simple example may serve to clarify the issue. Suppose all the regression parameters are known, as follows when n -> infinity in Section 2. Furthermore if q [the number of proxies] = 2 and p [the number of variables to be estimated] = 1, Gamma = I, and [omitting Brown’s primes for simplicity],
Y1 = x + e1
Y2 = x + e2,
then the predictive distribution of (Y1-x)^2 + (Y2-x)^2 under normal theory, is chi-squared on two degrees of freedom. The proposed sampling theory approach gives a 95% confidence interval for x as all x such that
(Y1-x)^2 + (Y2-x)^2 .LT. 6.0,
where 6.0 is the upper 5 per cent point of a chi-squared on two degrees of freedom. [Brown in fact writes .GT. 6.0, but that must be a typo.] The confidence interval is (-sqrt(2), sqrt(2)) when (Y1, Y2) = (1, -1) and vanishes to the point x = 0 when (Y1, Y2) = (sqrt(3), -sqrt(3)).
The likelihood and Bayes approaches do not behave in this way….

But in fact this is not the sampling theory approach. Define Chisq(x) = (Y1-x)^2 + (Y2-x)^2. This represents the “unlikelihood” as it were of Y1 and Y2 given x, and in fact is -2*log L(x) (+ const) when the errors are Gaussian. Minimizing this unlikelihood, or equivalently, maximizing the likelihood, gives the sampling theory estimator Ybar = (Y1 + Y2)/2. Any Stats 101 student can tell you that Ybar-x ~ N(0, 1/2), whence t(x) = (Ybar-x)/sqrt(1/2) ~ N(0,1), or t(x)^2 ~ chi-squared(1DOF). The proper sampling-theory way to form a confidence interval for x about Ybar is using t(x) (or equivalently t(x)^2), not with Chisq(x).

All Chisq(x) tell us is whether (Y1, Y2) is ex ante likely to happen or not. What we want to know is, given that (Y1, Y2) did in fact happen, what must x have been? There is no point in shaking your fist at the storm just because your model told you that it was statistically impossible (or virtually impossible). Given that the storm is occurring anyway, you may as well come in out of the rain and try to figure out what the combination of state variables must have been that most likely allowed it to occur.

However, it turns out that t(x)^2 identically equals Chisq(x) – Chisq(Ybar), and thus, when everything is Gaussian, the LR statistic as well. Thus it is not Chisq(x) itself, but rather the change in Chisq(x) from its minimal value that should govern the confidence inverval on x! When the variances are estimated, the chi-squared stats become F stats, and presumably it is the change in the F stat from its minimal value that should govern the confidence interval on x, not the F stat itself.

When q = 1, Ybar = Y1, so Chisq(Ybar) = 0, and the change in Chisq(x) equals Chisq(x) = t(x)^2 itself. So there really is no difference between q = 1 and q > 1.

If everything is Gaussian and the parameters and variances are perfectly known, the posterior density, likelihood function, and sampling theory all give the same answers under a diffuse prior for x. When the parameters are estimated, there is the problem that the posterior density for x doesn’t integrate for q = 1, so that inverting the confidence intervals for Y as a function of x must be used. However, for q > 2, this problem goes away, as Brown himself notes on p. 297. For q = 1, the posterior density has hyperbolic power tails that behave like |x|^(-1). But for q = 2, the tails behave like |x|^(-2), which integrates. This density is Cauchy-like in that it does not have a mean, let alone a variance, but it can still be used to compute good confidence intervals. In general, the tails are like |x|^(-q), so that the posterior mean is well defined for q > 2 and the variance exists for q > 3. It becomes a matter of taste whether to use the change in the F-stat as above or the posterior to form confidence intervals, but it wouldn’t hurt to compute both.

Of course, it is impossible that global temperatures at any time in the past 10000 years were either below absolute zero or hot enough to melt the Rockies, so that a completely diffuse prior is not really justified. Nevertheless, any reasonable limits are bound to be subject to criticism. It’s best to approach the data with a completely open mind about x (if not the model), so that no one can say that you pre-supposed your answer.

Brown in fact prefers ICE (regressing X on Y) for his examples, but in the first one, at least, he has randomly removed (X, Y) pairs from his calibration sample to be used for illustrative estimation. In doing, this, the prior for X’ becomes the distribution from which the calibration X’s were drawn, and so ICE really is appropriate. A diffuse (uninformative) prior for X’ requires CCE. (I haven’t figured out his second example yet).

[end of corrected earlier post]

I don’t pretend to understand how MBH actually computed their reconstructions from their complicated data base, but the above remarks would apply to CCE calibration in general, which I gather they sort of used. Of course, if they assumed their proxies were orthogonal and they in fact were not, that would further understate the CI’s.

I’m currently playing with trying to calibrate Thompson’s 2006 PNAS ice core data to HadCRUT3-GL, in order to find out what the calibrated temperatures and CI’s look like. This is a much simpler problem, or at least should be, were it not for the fact I have noted elsewhere that there are no fixed coefficients that will generate his Himalayan composite Z-mometer from the raw data for the 4 component sites…

However, to keep it simple he is assuming that the estimated intercepts happen to come out to 0 and the estimated slopes happen to both come out to 1, so that it does look like two responses to a single x’ (and should therefore give the same CI!)

IMO the distinction here is relevant. In Brown’s example, we know that the distribution of Chisq(x) statistic is Chi-squared on two degrees of freedom. In addition, as you show, we know that distribution of t(x) is standard normal. Now we can form confidence interval using both methods,

for the former case, and are the roots of polynomial

and for the latter case

and

There is nothing wrong with the former CI, except maybe that it is, on the average, longer interval than the latter. But that’s probably because in the example scales (B) were equal, and we use that information when forming the CI.

Yes, you are right that the chi-squared CI can sometimes be longer. This occurs when the proxies are relatively in accord, the most extreme case being Y1=Y2. Then, I was right that chisq(x) = t(x)^2. However, the CI’s are different, because chisq(x) is being given 2 DOF (with a 5% cv of 5.99), whereas t(x)^2 is being treated as chi-squared with only 1 DOF (cv = 1.96^2 = 3.84). In this extreme case, the chi-squared CI is Ybar +/-1.73, while the t-based CI is Ybar +/-1.39.

When Y1 and Y2 are typically divergent, as in Brown’s example (1, -1), both intervals are essentially (identically?) the same, namely Ybar +/-1.4. But when Y1 and Y2 clash, the chi-squared interval becomes smaller, and quickly (at -1.73, 1.73) shrinks to a single point and then disappears completely into ImaginaryNumberLand.

So yes, either could be larger, but no, chisq(x) never gives us the right CI except by accident. When the CI is a single point, that does not mean that we are any more certain about where x is than when the proxies are just by accident in perfect accord. And when the CI is a pair of complex numbers, that is not just “strange”, as Aitchison put it, but downright wrong.

Of course all this takes the parameter estimates as given (if only imprecisely as in the more realistic case in Brown’s Section 2). If Y1 and Y2 gave really divergent estimates of x, and we were more ambitious, we could use this information to revise our estimate of the regression variances and therefore of the standard errors of the coefficients. Or even more ambitiously, we might even consider a non-Gaussian leptokurtic distribution for the errors such as the Levy stable class (my favorite). But the standard simple case is just to assume the errors are appoximately Gaussian and then to use the calibration period (only) to estimate the variances.

If the estimated slope coefficients on the two proxies turned out to be different, the same considerations would still apply.

So yes, either could be larger, but no, chisq(x) never gives us the right CI except by accident. When the CI is a single point, that does not mean that we are any more certain about where x is than when the proxies are just by accident in perfect accord.

If the model is correct, that CI (based on roots of the polynomial) is, in classical sense valid CI. Sometimes that CI might be a single point, the classical approach accepts this ( we shall be right in about 95 % cases in the long run ). With fiducial approach and Bayesian approach the assertion is different. So, if we accept classical CI approach, there’s nothing wrong with Brown’s Eq. 2.8 (or am I missing something? )

Mike B, Thanks for the link, very interesting paper! The point that we have 10^7 computational advantage over Fisher is worth a thought. For example, to seek for an answer to the correctness of CIs in #69, I can run this code with 100000 repetitions, to obtain a result

UC, I located the Brown & Sundberg 1989 Techometrics paper referenced in your Re #35, but that paper doesn’t have an eq 2.8 or 2.16 and doesn’t cover your above discussions in Re #49 and #50. Are you referring to another one of his papers?

This shows empirically that both formulas give the right coverage for their respective questions, yet the answers are different.

Yes, and of course it makes sense to choose the one that gives the shortest interval (on the average). I have no doubt that the second method is better, maybe even uniformly most accurate CI. But it is obtained assuming perfect calibration ( * ), while the first method is a special case of Brown’s Eq 2.8. From time to time it gives empty sets for CI, and maybe then it is a good procedure to change the CI method post hoc ( conditional reference sets or whatever). But IMO it is incorrect to say that CIs by Method 1. are wrong (as you say in #68).

Re #68, 70, 76, 77, here’s my proposal for a Multiproxy generalization of the univariate CI that (unlike Brown’s!) never comes up empty or imaginary:

Simplifying Brown to the case p = 1, assume
Yi ~ alpha + beta xi + N(0, Gamma), i = 1, … n
in the calibration period, where Yi is the i-th (qX1) vector of proxies, xi is the i-th observation on temperature (or whatever), and Gamma is a qXq covariance matrix. (I’m too lazy to set subscripts and greek letters in tex, particularly without a previewer!) Let x = (x1, … xn)T be the nX1 vector of temperature observations, where T represents Transpose. For convenience, suppose x has been demeaned as in Brown.

Let a and b be the single equation OLS estimates of alpha and beta, and G be an appropriate estimate of Gamma (see below). Confirm that there is no significant serial correlation in the errors.

Since b ~ beta + N(0, Gamma/xTx), we may test beta = 0 (in all its components) with
(xTx)bT G^-1 b.
This is approximately chi-squared(q), and in some cases exactly qF(q,d) for some appropriate DOF d. (see below) If (and only if) we can reject beta = 0 at some sufficiently small test size pmin, we may proceed with prediction.

Let Y’ be the qX1 vector of proxies in a given prediction period, and x’ be the unobserved scalar temperature to be estimated. Then
Y’ ~ a + b x’ + N(0, sigma^2(x’)Gamma),
where sigma^2(x’) = 1 + 1/n + x’^2/xTx.

Furthermore, since the scalar bT G^-1 (Y’-a) is monotonically increasing in x’ (at least to the extent that we’re confident beta is not 0), we may use this scalar to form a non-empty (1-p) confidence interval for x’ that generalizes the univariate case for any p > pmin. Substituting G for Gamma, we have at least approximately
bT G^-1 (Y’-a) ~ (bT G^-1 b) x’ + sigma(x’) sqrt(bT G^-1 b) T(d),
where T(d) has at least approximately student t distribution with the same appropriate degrees of freedom d used for the F test above. If tc is the critical value for the desired confidence level, the confidence bounds xCI may be found by solving the quadratic equation
x’hat = xCI +/- sigma(xCI) tc / sqrt(bT G^-1 b)
for its two real roots.

If Gamma = gI for some scalar g, then g may be estimated by pooling all nq errors and the test and CI are exact with d = q(n-2) DOF. Furthermore, according to Brown (if I understand his rather complicated Wishart argument correctly), if Gamma is estimated without restriction by S/(n-q-1), where S = ETE and E is the nXq matrix of all the calibration errors, then d = (n-q-1) and the F-test and t-based CI are again exact.

If q >= n, S is not of full rank and we can’t use it to estimate Gamma without some additional restrictions. However, we don’t have to go all the way to Gamma = gI. For example, we might estimate the diagonal elements without restriction, but then model all the correlations with just a couple of free parameters. In this case I don’t think there is an exact distribution, but it is asymptotically normal, and probably the F and t distributions with d = n-2 or so would be a very good approximation.

I’m planning to use this to calibrate Lonnie Thompson’s 6-core data from CC 03 to temperature. Any input would be welcome!

The issues with multivariate calibration -based reconstruction uncertainties in MBH98, MBH99 and many more reconstructions, are independent of PC-stuff and other proxy problems discussed at CA. That is, if you spend many blog threads to explain that treeline11.dat and pc01.out are ok to use, you still need some 10 more blog threads (*) to explain why MBH9x does not use conventional multivariate calibration methods, well described in statistical literature.

Note how latest IPCC report deals with this issue (CH6 page 472):

The considerable uncertainty associated with individual reconstructions (2-standard-error range at the multi-decadal time scale is of the order of ±0.5°C) is shown in several publications, calculated on the basis of analyses of regression residuals (Mann et al., 1998; Briffa et al., 2001; Jones et al., 2001; Gerber et al., 2003; Mann and Jones, 2003; Rutherford et al., 2005; D’Arrigo et al., 2006). These are often calculated from the error apparent in the calibration of the proxies. Hence, they are likely to be minimum uncertainties, as they do not take into account other sources of error not apparent in the calibration period, such as any reduction in the statistical robustness of the proxy series in earlier times
(Briffa and Osborn, 1999; Esper et al., 2002; Bradley et al., 2003b; Osborn and Briffa, 2006).

Re #78 etc, these formulas have two important but easy special modifications.

First, if one or more proxies are missing for a particular reconstruction date, there is no need to recalibrate the whole system (as there would be with direct multiple regression of temperature on the proxies). All you need to do is omit the relevant elements of a and b (which, I neglected to mention are qX1 vectors to start with, as are alpha and beta), and also remove the corresponding rows and columns of G before inverting it. This will gradually widen the CI’s as proxies drop out, but there is no need to have a sudden change in the system every couple of centuries (as I gather MBH did).

And second, if you smooth the reconstruction by averaging over m periods, the reconstruction using the smoothed proxies is the same as smoothing the reconstruction using the unsmoothed proxies. However, the CI of the smoothed reconstruction is considerably smaller than what you would get if you just smoothed the CI bounds of the unsmoothed reconstruction. The coefficient uncertainty, measured by 1/n and x’^2/xTx in sigma^2(x’), is the same in both cases, but the lead term 1 that represents the disturbance uncertainty gets replaced by 1/m, assuming as in Brown’s benchmark case, that the disturbances are serially uncorrelated.

(Note that the X’s and Y’s can both be highly serially correlated without the disturbances being serially correlated. I don’t know about annual treerings, but this is the case with a few decadal ice core series I have looked at.)

Confirm that there is no significant serial correlation in the errors.

If (and only if) we can reject beta = 0 at some sufficiently small test size pmin, we may proceed with prediction.

See Mann07,

Under the assumption of moderate or low signal-to-noise ratios (e.g., lower than about SNR 0.5 or “80% noise”), which holds for the MBH98 proxy network as noted earlier, the value of p for the “noise” closely approximates that for the “proxy” (which represents a combination of signal and noise components).

Empty CI and infinite (or huge) CI are separate issues. Brown’s CI can come up empty because it asks the wrong question, namely whether the observation was possible or not in terms of the model. In this case, even the point estimate should be rejected if this is your question, IMHO.

My CI (in #78) instead asks how high or low could x have been, given the observation that somehow or other did occur. This can be huge as in the univariate case, and will even be unbounded on at least one side if you ask for a confidence level that exceeds your confidence that the relationship is invertible (ie beta /= 0).

Note also that p=1 does not hold in MBH9x, except for AD1000 and AD1400 steps.

In MBH9x, I can see that each proxy should correlate better with its own local temperature than with global (or NH) temperature, but does this step (with p > 1) really serve any purpose if all that is ultimately desired is an estimate of global (or NH) temperature?

I’m planning to try it out with Thompson’s 2006 CC data and will let you know what turns up. So far it looks like there really is a HS if you use his choice of core for each site, albeit a very noisy one.

That’s true. More correct way to represent those empty cases would be no CIs at all ( classical interpretation, #71 ). That post needs some updating, but the main point remains, calibration-residual based CIs are not valid. And IPCCs explanation with ‘considerable 0.5 deg’ and ‘minimum uncertainties’ is just silly.

In MBH9x, I can see that each proxy should correlate better with its own local temperature than with global (or NH) temperature,

Not necessarily local correlation, it can be teleconnected to anywhere.

but does this step (with p > 1) really serve any purpose if all that is ultimately desired is an estimate of global (or NH) temperature?

In fact, since b is random (and perfectly correlated with b itself), it’s not really legitimate to set c = bT g^-1, or at least when we do bT g^-1 (Y’-a) does not really have a normal distribution. Rather it the convolution of a non-central chi-squared distribution with q DOF with the normal distribution of the disturbance term, which is way to messy for me.

So I’ll retract my proposal in #78. However, I still think Brown’s sometimes empty CI answers the wrong question. The answer must lie in the change in the F statistic from its minimum (as in #68 above), the LR statistic (which may amount to the same thing as the change in F asymptotically), or the posterior distribution (calculable for q > 2).

In the multivariate calibration problem using a multivariate linear model, some conservative confidence regions are constructed. The regions are nonempty and invariant under nonsingular transformations
…

We see from Table 2 that the likelihood-based region due to Brown and Sundberg (1987) is the shortest, followed by our region (4.9) and Brown’s (1982) region (in cases where it is nonempty). Note that Brown’s region is exact and hence its coverage probability is the assumed confidence level of 95%.