Editor’s note: this is a technical post and comments will be tightly moderated for tone and content.

No circularity

It has been alleged that in Marotzke & Forster (2015) we applied circular logic. This allegation is incorrect. The important point is to recognise that, physically, radiative forcing is the root cause of changes in the climate system, and our approach takes that into account. Because radiative forcing over the historical period cannot be directly diagnosed from the model simulations, it had to be reconstructed from the available top-of-atmosphere radiative imbalance in Forster et al. (2013) by applying a correction term that involves the change in surface temperature. This correction removed, rather than introduced, from the top-of-atmosphere imbalance the very contribution that would cause circularity. We stand by the main conclusions of our paper: Differences between simulations and observations are dominated by internal variability for 15-year trends and by spread in radiative forcing for 62-year trends.

Specifics

Our paper relies on one piece of explicit (deterministic) physics, namely energy balance (conservation). We maintain that the best way to represent this deterministic physics during a period of warming caused by radiative forcing F is through the model properties α and κ (see final paragraph below), in addition to F. This well-established tenet leads us to formulate the Earth’s energy balance in the form:

ΔT = ΔF / (α + κ) (1)

where ΔT and ΔF are (linear) trends over a specified period of global-mean surface temperature and radiative forcing, respectively. We explain in our paper why equation (1) motivates the linear regression that quantifies the contributions of across-ensemble variations of ΔF, α and κ, and furthermore of internal variability, to ΔT in the CMIP5 ensemble. Internal variability is represented as an additional error term in (1). Our linear regression motivated by (1) and applied to all possible trends during the historical period is a major methodological innovation of our paper, irrespective of how the numbers on the right-hand side are obtained.
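One way to see how (1) can motivate a linear regression is to expand it to first order about the ensemble means. This is a sketch that assumes small across-ensemble departures, and the notation is illustrative, not necessarily the exact form used in the paper. Overbars mark ensemble means, primes mark run j's departure from them, and ε_j stands in for internal variability in run j:

```latex
\Delta T_j' \approx \frac{\Delta F_j'}{\bar\alpha + \bar\kappa}
  - \frac{\overline{\Delta F}}{(\bar\alpha + \bar\kappa)^2}\,\alpha_j'
  - \frac{\overline{\Delta F}}{(\bar\alpha + \bar\kappa)^2}\,\kappa_j'
  + \varepsilon_j
```

In a regression, the three coefficients are estimated from the ensemble rather than fixed at these values.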

The issue that was brought up relates to how we obtained the time series of simulated radiative forcing F from which to calculate the trends ΔF. Unfortunately, radiative forcing for the historical period is not straightforward to diagnose from a model. However, the top-of-atmosphere radiative imbalance N (downward minus upward radiation) is available for many models. This radiative imbalance arises originally from the radiative forcing from increasing greenhouse gas concentrations and other influences. But N and F are not equal, because the surface warms (say, by an amount T), and more radiation is emitted back to space. This amount of radiation is usually expressed as αT, such that we have:

N = F – αT (2)

Because N is readily available but F is not, Forster et al. (2013), from which the time series of F were taken, used the pre-determined model property α to obtain F by:

F = N + αT (3)

using the N and T that they diagnosed from simulations of the 20th century. This is a correction that needs to be applied to N so that one obtains the radiative forcing. On the right-hand side of (3), the two terms are of comparable magnitude over multi-decadal timescales, and the first term dominates over 15 years. Not correcting for the increased back radiation would amount to using N itself, which on physical grounds contains the very contribution from the surface response T that we must eliminate in our estimate of F.
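Because a least-squares trend is linear in the data, applying (3) year by year and then taking trends gives exactly ΔF = ΔN + αΔT. A minimal sketch on made-up series; the numbers, the 1951–2012 window and α = 1.2 are illustrative assumptions, not CMIP5 values:

```python
import numpy as np

def linear_trend(y, years):
    """Least-squares slope of y against years (units of y per year)."""
    slope, _intercept = np.polyfit(years, y, 1)
    return slope

# Hypothetical annual series standing in for one model run (not CMIP5 data).
rng = np.random.default_rng(0)
years = np.arange(1951, 2013)        # one 62-year window
alpha = 1.2                          # assumed feedback parameter, W m-2 K-1
T = 0.012 * (years - years[0]) + 0.1 * rng.standard_normal(years.size)
N = 0.008 * (years - years[0]) + 0.3 * rng.standard_normal(years.size)

F = N + alpha * T                    # equation (3), applied year by year

dF = linear_trend(F, years)
dN = linear_trend(N, years)
dT = linear_trend(T, years)
print(dF, dN + alpha * dT)           # equal to rounding error
```

This is why trends of the reconstructed forcing can be written directly in terms of ΔN and ΔT.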

Of course one could legitimately ask how accurate this correction is, and we would hope that in future generations of coordinated model simulations a better direct diagnostic of F is possible. But for the CMIP5 models used in our study and in Forster et al. (2013), applying equation (3) has been the only approach possible. Forster et al. (2013) performed a number of tests of their procedure and found it to be adequate to produce time series of radiative forcing. However, it is possible that owing to an imperfect correction, our attribution of temperature trends to either forcing trend or feedback contains some ambiguity. This ambiguity is not new. In fact there is no unambiguous way of splitting forcing and feedback, and this remains a problem that the climate research community has grappled with for some time (e.g., Hansen et al. 2005, Myhre et al. 2013).

For understanding the results and limitations of our paper, it is crucial to appreciate that this ambiguity only refers to the contributions by the different elements of deterministic physics that we identify, and not to the total of deterministic (regression) contributions. In particular, our conclusion about the magnitude of internal variability in surface-temperature trends (dominant for 15-year trends, substantial for 62-year trends) is insensitive to the ambiguity.

As we have shown in our paper, our novel estimates of internal variability in the CMIP5 ensembles are consistent with completely different approaches, for both 15-year and 62-year trends.

We note that the order of magnitude of our diagnosed α contributions to the spread in 62-year temperature trends is consistent with what is expected from equation (3) in Marotzke and Forster (2015). We note further from equation (1) here that ΔT varies proportionally with ΔF, whereas ΔT varies less than proportionally with either α or κ (unless κ becomes very small, close to a new equilibrium, in which case ΔT varies inversely proportionally with α); this provides a ready explanation for a lesser role of ensemble spread in α or κ over the historical period, compared to ensemble spread in radiative forcing. This counters the spurious argument outlined by Lewis for rejecting a small role for α on purely physical grounds.
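The contrast between proportional and less-than-proportional dependence can be checked numerically. The sketch below raises each input of (1) by 10% in turn. The values ΔF = 1, α = 1.0 and κ = 0.7 are illustrative assumptions only, not numbers from the paper:

```python
def delta_T(dF, alpha, kappa):
    """Equation (1): forced trend dT = dF / (alpha + kappa)."""
    return dF / (alpha + kappa)

def relative_response(dF, alpha, kappa, bump=0.10):
    """Fractional change in dT when each input alone is raised by `bump`."""
    base = delta_T(dF, alpha, kappa)
    return {
        "dF": delta_T(dF * (1 + bump), alpha, kappa) / base - 1,
        "alpha": delta_T(dF, alpha * (1 + bump), kappa) / base - 1,
        "kappa": delta_T(dF, alpha, kappa * (1 + bump)) / base - 1,
    }

# Illustrative magnitudes only (W m-2 K-1 for alpha and kappa).
for name, r in relative_response(1.0, 1.0, 0.7).items():
    print(f"+10% in {name}: {100 * r:+.1f}% in dT")
# +10% in dF: +10.0% in dT
# +10% in alpha: -5.6% in dT
# +10% in kappa: -4.0% in dT
```

With κ set to zero, the α response becomes 1/1.1 − 1 ≈ −9.1%, the inversely proportional limit.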

Additionally, Lewis claims that the values of the climate feedback parameter α and the ocean heat uptake efficiency κ are so uncertain as to render them useless. But the α and κ values we use were diagnosed previously using established methods, relying on strongly forced, idealized model simulations (Andrews et al. 2012; Kuhlbrodt & Gregory 2012; Vial et al. 2013; Flato et al. 2013). These approaches and simulations are defined such that α and κ can be viewed as model properties. By contrast, Lewis used historical simulations in trying to diagnose α and κ.

At the 2014 AGU Fall Meeting, Kyle Armour (MIT), Drew Shindell (Duke University) and Piers Forster independently showed that these quantities change over time during the historical period. Hence, their diagnosis from historical simulations is highly uncertain. This also supports the physical explanation, which Lewis did not understand, of why α and κ play a small role in determining model spread. The small spread supports the reasoning that unique values of α and κ do not characterise 20th-century trends well.

Flato, G., and Coauthors, 2013: Evaluation of Climate Models. Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change, T. F. Stocker, and Coauthors, Eds., Cambridge University Press, 741-866.

About Ed Hawkins

Climate scientist in the National Centre for Atmospheric Science (NCAS) at the University of Reading. IPCC AR5 Contributing Author. Can be found on twitter too: @ed_hawkins


148 thoughts on “Marotzke & Forster response”

The bit I got stuck on reading your paper was this: why do you need to estimate the multiple regression coefficients? According to your linear expansion of equation (3), can’t you calculate the beta values directly from the ensemble means of delta F, alpha and kappa?

Good point. We used equation (3) for conceptual guidance, suggesting which predictor variables to use in the regression, rather than relying on its quantitative correctness. And given what we’ve since learned about alpha and kappa during the 20th century (cf. the 2014 AGU meeting), this seems like a sensible choice. But there is certainly room for follow-on studies here.

I still don’t really understand what the linear expansion is doing. Wouldn’t the linear expansion of any function just give you the same multiple regression equation? I would think that the linear expansion you’ve used would at least suggest a restriction beta_2 = beta_3.
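The question of computing the coefficients directly from ensemble means can be tried on a synthetic ensemble that, by assumption, obeys equation (1) of the post exactly plus noise (all distributions here are hypothetical). The sketch compares the coefficients implied by a first-order expansion with those from ordinary least squares. The implied α and κ coefficients are equal, which is the β₂ = β₃ restriction raised above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200                              # hypothetical ensemble size, not CMIP5
dF = rng.normal(1.0, 0.2, n)         # forcing trends
alpha = rng.normal(1.1, 0.1, n)      # feedback parameters
kappa = rng.normal(0.7, 0.1, n)      # ocean heat uptake efficiencies
eps = rng.normal(0.0, 0.05, n)       # stand-in for internal variability
dT = dF / (alpha + kappa) + eps      # equation (1) of the post plus an error term

# Coefficients implied by a first-order expansion about the ensemble means.
s = alpha.mean() + kappa.mean()
implied = {"b": 1 / s, "c": -dF.mean() / s**2, "d": -dF.mean() / s**2}

# Ordinary least squares fit of dT = a + b*dF + c*alpha + d*kappa + e.
X = np.column_stack([np.ones(n), dF, alpha, kappa])
(a, b, c, d), *_ = np.linalg.lstsq(X, dT, rcond=None)

print({k: round(v, 3) for k, v in implied.items()})
print({"b": round(b, 3), "c": round(c, 3), "d": round(d, 3)})
```

For real CMIP5 runs, where α and κ appear to change over the historical period, there is no guarantee the implied values hold. That is one reason to estimate the coefficients rather than fix them.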

I thought Lewis was making two points:
– in the M&F earth energy balance equation dT = dF / (α + κ), dT is actually a hidden component of dF, i.e. it appears on both sides of the equation (like saying 3 = 10-10+3). So the equation is flawed from the start and any regression using it will be equally flawed such that no substantive conclusions can be reached based on the regressions; and
– the values for α and κ have been chosen by M&F arbitrarily and it’s possible to get wildly different values using historical simulations. So, again, no substantive conclusions can be reached based on the regressions.

On reading this response it seems to me:
– M&F have nothing to say about the both-sides-of-the-equation issue;
– they just prefer their arbitrary choice for α and κ.

“Because radiative forcing over the historical period cannot be directly diagnosed from the model simulations, it had to be reconstructed from the available top-of-atmosphere radiative imbalance in Forster et al. (2013) by applying a correction term that involves the change in surface temperature. This correction removed, rather than introduced, from the top-of-atmosphere imbalance the very contribution that would cause circularity.”

“in the M&F earth energy balance equation dT = dF / (α + κ), dT is actually a hidden component of dF, i.e. it appears on both sides of the equation (like saying 3 = 10-10+3). So the equation is flawed from the start and any regression using it will be equally flawed such that no substantive conclusions can be reached based on the regressions”

It’s not correct that dT is a hidden component of dF. The external forcings are imposed on the climate models – they’re external, they don’t depend on T by definition. However, it’s quite tricky to work out what dF actually is because it’s both self-consistently determined in the models (i.e., the models calculate the radiative influence of the different external effects) and the model is also responding to these external forcings (i.e., the external forcing is essentially defined as being the radiative effect in the absence of a temperature response).

So, how do you work out what dF actually is in each model? You just use energy conservation. If there is a change in external forcing of dF in a model with a climate feedback parameter alpha, then if the temperature response is dT, the change in the top-of-atmosphere energy imbalance will be dN, where

dN = dF – alpha dT.

So, if you know (from your model output) what dN and dT are, you can get dF from

dF = dN + alpha dT.

That may look as though dF depends on dT, but it doesn’t. If internal variability were to produce a small fluctuation in dT, then dN would also change in response to that and dF would remain fixed. So just because dN and dT are used to determine dF does not mean that dF is somehow dependent on dT. So the claim that the analysis is circular is – as far as I can tell – wrong.

Well, that’s my understanding. Those who know better than me are welcome to correct any of my misunderstandings.
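The energy-conservation argument above can be illustrated with a hypothetical toy model (not a GCM). Forcing is prescribed, temperature gets random internal variability, and N follows N = F – αT. Reconstructing F with the model’s own α recovers the forcing trend whatever the noise. Using a mismatched α (1.0 here, an arbitrary choice) lets part of the T fluctuation leak into the estimate, which is the imperfect-correction case the authors acknowledge:

```python
import numpy as np

def toy_run(seed, alpha_true, kappa=0.7, years=62):
    """Hypothetical toy model: prescribed F, noisy T, and N = F - alpha*T."""
    rng = np.random.default_rng(seed)
    t = np.arange(years, dtype=float)
    F = 0.03 * t                                   # prescribed forcing ramp, W m-2
    T = F / (alpha_true + kappa) + 0.15 * rng.standard_normal(years)  # forced + internal
    N = F - alpha_true * T                         # energy balance
    return t, F, N, T

def trend(t, y):
    """Least-squares linear trend of y per unit of t."""
    return np.polyfit(t, y, 1)[0]

alpha_true = 1.2
for seed in (1, 2):
    t, F, N, T = toy_run(seed, alpha_true)
    exact = trend(t, N + alpha_true * T)           # correction with the model's own alpha
    mismatched = trend(t, N + 1.0 * T)             # correction with a wrong alpha
    print(f"seed {seed}: dT = {trend(t, T):.4f}, "
          f"dF (alpha = 1.2) = {exact:.4f}, dF (alpha = 1.0) = {mismatched:.4f}")
```

The two realizations give different dT but the same correctly reconstructed dF, while the mismatched reconstruction inherits some of each realization’s noise.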

If it helps:
Nic is right that ΔT does appear on both sides; we are not arguing about this, we are arguing about the implications.

We see the method as a necessary correction to N to estimate the forcing, F. This is what we are looking for in the model spread, not the role of N – it would be more circular to use N, as this contains a large component of surface T response.

As Piers and others have said – the crucial aspect is that N (as calculated in the models) contains a component related to T. So, the correction made to N to obtain F is trying to remove that T influence, rather than adding it.

Ideally F would be diagnosed separately from the models, but it isn’t (generally), so we have to use the correction to get a best estimate of F – which doesn’t depend on T, even though it looks like it from the way the equations are written. This correction will not be perfect but has been shown to work well in more idealised situations where we have better estimates of F, as in the link that Piers provides.

Is it possible to illustrate the (lack of) circularity by showing the correlation between the estimated forcing, F, and the simulated temperature, T? If this correlation is small, then does that demonstrate the claim of circularity to be unimportant?

It might appear from

F = N + αT

that F and T will be correlated, but this isn’t necessarily the case if N and T are themselves (anti)correlated. Would the correlation between diagnosed F and simulated T help demonstrate this issue? Or am I barking up the wrong tree?

You’d think that if the climate models are conserving energy, that N and T would have to be (anti)correlated. Or, that for a fixed change in forcing F, N + αT is constant for all realistic values of T. However, it might be good to illustrate this.

Tim, through the energy budget all these terms are related, so you expect correlations. Therefore it is difficult to gauge the quality of our F estimate, although we did the best job we could in my earlier paper – and it seems robust.
Over 15-year periods, random fluctuations in N dominate, and this is not an issue.

Model N and T trends are positively correlated across the models over most 62-year periods during which trends are significant: the correlations are +0.62, +0.74, +0.56 and +0.42 for periods starting in 1921, 1931, 1941 and 1951 respectively. It would be surprising if they weren’t, as the kappa model assumes that dN = kappa*dT.

“Nic is right that ΔT does appear on both sides; we are not arguing about this, we are arguing about the implications.”

Whatever the physics rationale, this is a bad setup for a regression. If your results are stable, then you should be able to get similar results via Nic Lewis’ equation (6) of the CA article – an equation in which Nic attempted to cancel out factors occurring on both sides.

It seems to me that the usual (if not universal) statistical approach to this dataset (given its warts) would be to begin estimation via equation (6), cancelling what can be cancelled, rather than trying to rationalize estimations containing elements on both sides that jeopardize any meaning to the regression.

According to Nic’s calculations as presented, regressions using his equation (6) yield entirely different results – a result that seems to me to be fatal to the assertions of the article. Thus far, no one here has confronted this problem.

Nic’s equation (6) has some similarity with the formula I present in a comment higher up. The way I write the formula shows that the circularity does not make the regression unstable. All regression coefficients can be determined as well as in the absence of the circularity.

What may remain is a greater uncertainty in the reconstruction of the temperature from the results of the regression, because the coefficient of the temperature may happen to be very small.

I think Nic’s equation 6 not only makes it worse because dN depends on dT while dF does not, but also misses the whole point of the Marotzke & Forster analysis. The goal is to determine the externally forced trend and a residual due to internal variability.

If I have this right, then I also think Nic’s suggestion is actually wrong. If you use dN to give you dT in the way Nic suggests, then dT = dN/kappa is not the forced response; it’s the actual temperature change and includes both the forced response and the variability. Hence adding epsilon to this makes it wrong. It also doesn’t allow you to distinguish between the forced response and the variability, which was the point of Marotzke & Forster. At least, I think this is right.

Even if one dismisses Nic’s Eq. (6) [wherein he attempted to simplify the calculation in order to remove the purported circularity] is it the case that his Eq. (5) [from which he actually derives his regression] is also an incorrect analysis?

There obviously seems to be a difference between how a physicist and a statistician approach statistical analysis. It seems to me that the physicists are to some extent hypothesizing a can-opener.

But watch what happens (as I understand it and I haven’t parsed it) if you start from the data: in this case, what you have are the series N and T. ATTPhysics says: dN depends on dT while dF does not. Well, if dN depends on dT, that’s precisely the sort of thing that you want in a statistical relationship. Rather than being ill-suited to regression, isn’t it ideally suited to regression? ATTP’s comment seems to misunderstand the entire purpose of statistics.

From a physics point of view, you may want to add alpha*T to N get F, but from a data/statistics point of view, the two series: T and N+alpha*T, are going to be related by construction. Even if there is a real relationship somewhere, you won’t be able to disentangle it from the tautological relationship created by construction.
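The by-construction concern can be illustrated directly: even if ΔN and ΔT were completely unrelated, ΔF = ΔN + αΔT would still correlate with ΔT. A sketch with a synthetic 1000-member ensemble; all distributions are hypothetical and chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000                                   # synthetic ensemble size (hypothetical)
alpha = rng.uniform(0.6, 1.8, n)           # assumed spread in feedback parameter
dT = rng.normal(0.7, 0.15, n)              # temperature trends
dN = rng.normal(0.3, 0.15, n)              # imbalance trends, drawn independently of dT
dF = dN + alpha * dT                       # built exactly as in F = N + alpha*T

corr_NT = np.corrcoef(dN, dT)[0, 1]        # no relationship was put in
corr_FT = np.corrcoef(dF, dT)[0, 1]        # relationship created by construction
print(f"corr(dN, dT) = {corr_NT:+.2f}, corr(dF, dT) = {corr_FT:+.2f}")
```

Whether model N and T behave like this, or instead like an energy-balanced system in which N itself responds to T, is the empirical question argued over in this thread.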

Steve,
I’ve responded to this on your blog too, but it seems to me that you’ve suddenly turned this into a statistics problem. It isn’t. The goal was to determine the externally forced and internal variability contributions to the model trends for different time intervals. To do this you need to have estimates for the external forcings (which is done in Forster et al. 2013). You then take these forcings and use them to determine the forced trends and, by comparing with the actual trends, estimates for the contribution due to internal variability.

You appear to be suggesting that it would be better to use dN rather than dF because it suits the statistics better. Well, you can’t use dN to determine the forced trend so whether it suits the statistics better is really rather irrelevant.

HAS,
I wasn’t arguing against statistics. I was simply pointing out two things. The suggestion that dT appears on both sides because dF depends on dT is incorrect, because if climate models conserve energy, then the quantity

dN + alpha dT

does not depend on dT. It depends only on the change in external forcing, dF, which is external and, hence, by definition does not depend on dT.

I was also pointing out above, that replacing

dT = dF/(alpha + kappa)

with

dT = dN/kappa

doesn’t work because the goal of Marotzke & Forster is to determine the range for the forced trend and the range for internal variability. Since dN/kappa is not the forced trend, you can’t use it in place of dF/(alpha + kappa). The forced trend depends on F, so if you want to determine the range for the forced trend, you need to use F.

Of course you need to use some form of statistical analysis to do this kind of work. However, statistics does not trump energy conservation.

Sorry, you miss the point, and it is a point I and others were raising at CA. You are confusing the real world with what is being shown in these models, models that have been estimated using statistics.

As I said, whether these data sets exhibit the behavior you would like them to is an empirical issue – you can’t guarantee it a priori just because the physical world should behave that way.

The imperative to demonstrate this is doubly important because the way the model data was derived, on the face of it, violates the assumptions of the tools used.

Ross McK makes the same point to you at CA suggesting a Hausman endogeneity test to demonstrate your assertion to be true.

I saw Ross McKitrick’s comment and I don’t think what he is saying or you’re saying is correct. You seem to be suggesting that quantities like dN and dT are statistical. Well, they’re not. A climate model is a physically based model that uses the laws of physics to determine how our climate will respond to changes in anthropogenic forcing (for example). Therefore, at any instant in time (to within the accuracy of the method):

dN = dF – alpha dT

Therefore

dF = dN + alpha dT.

Since dF is an externally imposed quantity, it does not depend on dT and therefore neither does dN + alpha dT. Just because someone is using a statistical technique to extract information from a climate model does not make the above untrue.

Now, if you want to show that climate models violate energy conservation to an extent that would make it problematic, go ahead. That, however, would not change the point I’m making: if a climate model conserves energy (to a suitable accuracy), the term dN + alpha dT is independent of dT.

You need to think about how dN and dT are derived. They are statistical entities derived from the models. There is no guarantee they will behave as you say – particularly when you try and go on to partial out the dF from it.

Sorry but dN and dT have to be statistical as they are estimated trend terms. In fact they are trends estimated over quantities that are averaged up over other quantities that are themselves empirical approximations to unobservable quantities that might, at a very fine level of resolution, be variables in an actual law of physics. But at the level of usage here, dN and dT are statistics, not physical variables.

Exogeneity of dF cannot simply be asserted, but it can be tested. There are lots of cases in econometrics where endogeneity bias is suspected but can’t be established a priori. But it’s a big issue: if it is present, the regression results are absolutely meaningless. The prima facie case for endogeneity bias here is certainly strong enough that an econometrics reviewer and editor would have demanded it be tested.
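For readers unfamiliar with the test, a regression-based (control-function) form of the Durbin–Wu–Hausman test is sketched below. It needs an instrument: a variable correlated with the suspect regressor but not with the regression error. Whether any such variable exists for ΔF in CMIP5 is not established here, so the data are purely synthetic and all names are hypothetical:

```python
import numpy as np

def ols(X, y):
    """OLS coefficients with conventional (homoskedastic) standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

def dwh_control_function_t(y, x, z):
    """Regression-based Durbin-Wu-Hausman test for one suspect regressor x.

    Stage 1: regress x on the instrument z and keep the residuals v.
    Stage 2: regress y on x and v; a large |t| on v points to endogeneity of x.
    """
    Z = np.column_stack([np.ones_like(x), z])
    g, _ = ols(Z, x)
    v = x - Z @ g
    beta, se = ols(np.column_stack([np.ones_like(x), x, v]), y)
    return beta[2] / se[2]

# Purely synthetic data: z is a valid instrument, u is the regression error.
rng = np.random.default_rng(3)
n = 400
z = rng.normal(size=n)
u = rng.normal(size=n)
x_exog = z + rng.normal(size=n)          # unrelated to u
x_endo = z + rng.normal(size=n) + u      # shares u with the error term
t_exog = dwh_control_function_t(0.5 * x_exog + u, x_exog, z)
t_endo = dwh_control_function_t(0.5 * x_endo + u, x_endo, z)
print(f"t (exogenous case) = {t_exog:.2f}, t (endogenous case) = {t_endo:.2f}")
```

The conventional standard errors are a simplification; a careful application would also check instrument strength.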

It is a straightforward empirical question that needs to be answered: are the estimates of the T and N time series used in the analysis, and the F series derived from them, sufficiently well behaved that F can subsequently be used as desired?

On one side there are theoretical statistical arguments that suggest not, on the other practical responses that say they absolutely are or that it is close enough.

To start to resolve the issue, protagonists need to understand what each side is saying.

I’m still not sure that ATTP quite agrees that the chain between nature and the actual estimates in use (and the techniques used along that chain) is weak enough to warrant systematic analysis.

All the data is downloadable for anyone to do a systematic analysis. You can go to Climate Explorer and calculate the global TOA variables (to get N) and global temperature (T) for all the historical simulations. As far as I’ve seen, no-one has yet used the actual data to demonstrate that T and N are not well behaved enough.
cheers,
Ed.

I have understood the logic of the paper as follows. For simplicity I consider one single period of 15 or 62 years. The same process is repeated for every period independently.

We observe that model runs in the ensemble have produced a wide spectrum of temperature trends for the period being studied. We pose the question:

Is it possible to find simple characteristics of the model runs that can be used to predict the temperature trend ΔT that that model produces?

The hypothesis is that a suitable set of characteristics is α and κ as descriptors of the model and ΔF as descriptor of the external conditions, and that the model where these characteristics enter is a linear model.

ΔT = a + bΔF + c α + d κ

The coefficients of that linear model are to be determined by regression. One complication is that the CMIP5 database does not report forcings, but it does report TOA imbalances N. Therefore the substitution

ΔF = ΔN + α ΔT

is used as operative definition for ΔF in the regression.

Regression can now be performed as minimization of the sum of squared residuals given by the formula

e = (1 – α b)ΔT – a – bΔN – c α – d κ

for each model run of the ensemble.

Determining the coefficients is clearly a stable task that gives unique answers. The fact that ΔT occurs on both sides in one version of the formula does not make this step any weaker in the determination of the regression parameters.
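A quick numerical check of this point on a synthetic ensemble (hypothetical values, NumPy least squares): fitting ΔT = a + bΔF + cα + dκ directly and then evaluating the rewritten residual e = (1 – αb)ΔT – a – bΔN – cα – dκ gives the same residuals, and the design matrix has full rank, so the coefficients are unique:

```python
import numpy as np

# Hypothetical synthetic ensemble (not CMIP5 output).
rng = np.random.default_rng(4)
n = 50
alpha = rng.uniform(0.6, 1.8, n)
kappa = rng.uniform(0.4, 1.0, n)
dN = rng.normal(0.3, 0.1, n)
dT = rng.normal(0.6, 0.15, n)
dF = dN + alpha * dT                         # operative definition of dF

# Fit dT = a + b*dF + c*alpha + d*kappa by least squares.
X = np.column_stack([np.ones(n), dF, alpha, kappa])
(a, b, c, d), *_ = np.linalg.lstsq(X, dT, rcond=None)

# Residuals computed directly and via the rewritten formula with dN.
e_direct = dT - X @ np.array([a, b, c, d])
e_rewritten = (1 - alpha * b) * dT - a - b * dN - c * alpha - d * kappa
print(np.max(np.abs(e_direct - e_rewritten)))   # agrees to rounding error
print(np.linalg.matrix_rank(X))                 # full rank, so a unique solution
```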

We thus have a formula that can be used to predict ΔT given ΔF, α and κ of the model run. It’s true that the operative definition of forcing is not optimal, but that does not prevent the use of the regression model when the value of ΔF is taken as given. This can be considered one additional assumption on top of the others, and it is not the most problematic one.

Figures 2 and 3 of the M&F paper show the results from this regression for each of the periods considered.

I do not believe that any one of the tests of exogeneity or endogeneity can be applied to this analysis. We lack the consistent alternative needed for such a test.

The approach is based on strong and crude approximations, including the assumption that a model linear in the variables is good enough. Thus what’s really needed is an overall assessment of the validity of the model as a representation of the ensemble rather than formal tests of some part of the derivation. For the present purpose, the most essential point is that the determination of the regression coefficients is well behaved, and I think that there are no problems in that.

One calculation that can be considered part of a test has been done, the calculation of residuals, but we do not know how large the residuals would be if the model were a good representation of the ensemble. Other models could be developed for the same task and the residuals compared. That might tell something.

Yes, as M&F describe in the post, they make the assumption that their estimate of F is good enough. They also argue that this makes no difference on 15-year timescales as the other variability term dominates. For the 62-year timescales they acknowledge that this introduces some ambiguity, although the method to derive F has been tested in other situations in Forster et al. 2013.
Cheers,
Ed.

No, sorry, you miss the point. Accepting the way F is estimated, warts and all (as described by M&F), I’m asking if there is a problem using the calculated F time series rather than the original time series from which it was estimated.

Yes, N does not provide the necessary information alone to do the analysis that M&F wanted to do – i.e. to test the relative roles of forcing and variability. N is useful for other purposes.
Cheers,
Ed.

What we are talking about is how to do this statistically and whether M&F could just pick up the F as reported, or whether they needed to include the precursor time series as part of their analysis to make sure the error terms propagate through correctly.

HAS,
Whether this approach leads to some problems in exceptional situations or to no problems at all depends on the use of the resulting regression formula.

(1) There are no extraordinary problems in the determination of the coefficients of the regression formula.

(2) The regression formula is well behaved everywhere, when it’s used with ΔF as the independent energy flux variable.

(3) If the formula is used to estimate the temperature with ΔN as the externally given energy flux variable, the coefficients in the resulting expression for ΔT diverge when bα happens to be very close to one. Thus estimating the contributions of ΔN, α and κ to ΔT may be highly suspect. Some of the model runs may happen to meet such conditions for some of the intervals, because α and κ are fixed for each model while the regression coefficient b differs from interval to interval (but is the same for every model).

(4) The paper of M&F does not use the suspect operation of (3), but the safe one (2) in dividing the temperature change to three causal components and the residual. That’s the physically meaningful way of dividing it.

(3) may lead to larger problems for the same reason that makes TOA imbalance more variable also in the real Earth system. Looking at TOA is not a particularly useful approach, when we wish to determine the surface temperature, but it’s more useful in the estimation of the forcings. The dependencies (values of partial derivatives) work that way also in the real world.

But I am at a loss to understand comment (1) or (2) if applied to
ΔT = a + bΔF + c α + d κ
unless you are saying this relates to the particular data set as it came out in the wash, and, in the case of (2) in particular, that having done (1) the thing doesn’t explode.

While (1) may be true wrt ΔT = a + bΔF + c α + d κ (namely you can determine the coefficients) the question is in doing that have you violated the assumption required to draw useful conclusions in terms of the estimated parameters?

This is again a repost of a comment at aTTP. I would like to hear what the authors (or somebody else) think about this. It would also be nice to learn whether it’s possible to check the analysis to see whether the conditions that might be problematic occur.

—-

As far as I understand the paper correctly, the idea is based on the assumption that the temperature variations (or trends over 15 years and 62 years) that the CMIP5 models produce can be summarized by the formula

ΔT = a + b ΔF + c α + d κ + e

where
– ΔT is the temperature change or trend of the model (or CMIP5 run) considered
– ΔF is the change or trend in ERF (effective radiative forcing) over the same period as deduced from CMIP5 database in Forster et al (2013)
– α is the (constant) feedback parameter of that model as deduced from CMIP5 database and reported in Forster et al (2013)
– κ is the (constant) ocean heat uptake parameter of that model as deduced from CMIP5 database and reported in Forster et al (2013)

a, b, c, and d are regression coefficients that are determined by fitting the above formula to the CMIP5-based data of all model runs included in the study ensemble. e is the unexplained residual, whose sum of squares is minimized in the determination of the regression coefficients.

The same procedure is repeated independently for every 15 year and 62 year period that the overall period includes.

The next observation is that F was determined in Forster et al. (2013) as

F = N + α T

where N is the TOA imbalance included in the variables of CMIP5 data base, while F itself is not included. F is not directly an externally defined forcing used in model calculations but a model result defined by the above formula (lacking better ways of determining the forcing from CMIP5).

Thus the formula used in regression can be rewritten (as I did before)

(1 – b α) ΔT = a + b ΔN + c α + d κ + e

This formula is effectively used by M&F to determine the coefficients a, b, c, and d for each period. The resulting coefficients are supposed to indicate how strongly the temperature variations depend on ΔF, α, and κ according to the first formula of this comment.

The problem of circularity arises from the fact that N is actually used rather than an independent F, because the value of F is not determined directly but by the approximate formula given above. It’s expected that temperature variations grow with increasing ΔF and decrease with increasing α and κ, i.e. it’s expected that b > 0, c < 0, and d < 0. If b α > 1, the coefficient of ΔT in the regression formula is negative. Such cases might distort all the coefficients obtained in a strange way, as some of the models (those with the highest α) work against the others. This problem does not arise if the resulting b is always low enough that b α < 1 for all models. (If many of the values are close to 1, some lesser problems may arise.)
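This condition can be checked run by run once b is known for a period. A sketch, where the b value, the α values and the 0.2 margin for “close to 1” are all hypothetical choices rather than numbers from M&F:

```python
import numpy as np

def check_temperature_coefficient(b, alpha, margin=0.2):
    """Classify runs by the coefficient (1 - b*alpha) multiplying dT.

    Returns the coefficients plus boolean masks for runs where the
    coefficient is negative (b*alpha > 1) or small in magnitude.
    """
    alpha = np.asarray(alpha, dtype=float)
    coef = 1.0 - b * alpha
    negative = coef < 0
    near_zero = np.abs(coef) < margin
    return coef, negative, near_zero

# Hypothetical numbers: one period's fitted b and a few models' alpha values.
b_period = 0.5                       # assumed, K per (W m-2)
alphas = [0.6, 1.1, 1.7, 2.1]        # assumed, W m-2 K-1
coef, negative, near_zero = check_temperature_coefficient(b_period, alphas)
print(coef)
print(negative, near_zero)
```

Repeating this for every 15-year and 62-year period would show whether the problematic conditions occur in practice.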

Intuitively I would expect that something more dramatic would come out than seen in the results of M&F, if the problem is serious in this particular analysis, but that’s only the first intuitive guess. It’s also possible that this issue leads to cancelling effects that force the coefficients c and d to be always small (a surprising result of the M&F paper), but that’s not my first intuitive guess.

As you can see, the above is totally based on the formulas chosen by Forster et al. (2013) and M&F. It’s not necessary – or appropriate – to confuse this logic with physical arguments, which are not part of the actual calculation. I used earlier the expression pure mathematics; perhaps I should have used the more accurate expression pure computation. Starting values are from a database, formulas are given. Results follow from that. Variable F is not a real forcing; it’s a derived construct (ERF) defined in Forster et al. (2013), motivated by physics, but not an externally given forcing.

I think that I have found the resolution for the apparent problems myself. If I’m correct the whole issue is inconsequential.

Marotzke and Forster define a regression model to describe the behavior of the CMIP5 models for the set of variables ΔT, ΔF, α, and κ. Determining the coefficients of the regression model also brings in the variable ΔN and its connection to the other variables. This extended set of variables makes it possible to write the regression model in several different ways. It also makes it possible to calculate what happens when ΔT is changed without a change in ΔN in a model described by the regression. If we do that in the cases I have described as problematic, the result is a temperature feedback that's larger than the original change. Taken in that way, the model is not well behaved.

The above is not, however, a real problem, as we can restrict the application of the model to the four variables used by M&F allowing ΔN to vary freely as required by the given relationship. Under these conditions the regression models do not have singular behavior at all. That’s the way M&F thought about the models throughout.

The next question is whether the potentially problematic behavior of the model enters the determination of the regression coefficients when the goal of the analysis is to use the resulting model in the way M&F have used it. The answer seems to be that it does not. When the relationships are used in this direction, there are no singularities; there are only situations where the coefficient of ΔT is zero, and that's not a problem for the regression.

Thus it seems that the issue Nic presented is not really a problem. There’s a circularity, but that enters in the direction, where it is not a problem at all.

(It took me too long to understand the situation. Obviously I have worked with mathematics too little for years and even for decades.)

The interesting issue remains that it's difficult to understand why α and κ have very little influence on the 62-year temperature trends. TCR should strongly affect the most recent decades, and TCR = F/(α + κ). The authors give some arguments, but to me they are not sufficient to explain the observed behavior.

Either the result is caused by a property of the CMIP5 ensemble, which might mean that the ensemble is not representative of wider sets of models (perhaps due to a selective process), or the analysis of M&F lacks, for some other reason, the power to extract the influence of model sensitivity on the trends over the period of strongest warming from added CO2.

I remain convinced that the analysis carried out in the paper is flawed and does not justify the conclusions drawn, at least in respect of 62 year periods. As I wrote originally, it seems reasonable that internal variability dominates over most if not all 15 year periods.

I will be following this up with more specifics, but I have to spend time on other matters for a while.

Without handling the data in the precise form considered by M and F, I think that this discussion cannot proceed. I’ve written to Marotzke requesting that he provide his data as collated for statistical analysis, together with his script showing his statistical calculations. Hopefully, he will see the benefit of disseminating this information. I don’t propose to comment further without such data. I’ve made a similar request to Nic Lewis, but wish to start with Marotzke’s data and method first.

Hmmmm, sounds like you’re planning to audit the work of Marotzke & Forster, rather than redo the analysis. Ed, and others, can correct me if wrong, but I think all you need to redo this is in the public domain and relatively easily accessible.

Also, given that you've let Nic Lewis write a post, on your site, claiming to have found a schoolboy error and then insulting both the authors and the reviewers, if I were Jochem Marotzke any response to your email would be short and to the point.

I agree that reproducing results is a core aspect of the scientific method. No-one is arguing otherwise. The question is what is the best way to test a result or conclusion.

If I see a result that surprises me I will first try and reproduce the result myself starting from scratch – this is a far more powerful test of a conclusion than taking the same code and rerunning or examining the scripts. There are numerous choices when writing an algorithm to process several GB of data and only by writing your own will you better understand those choices.

For example, Nic chose to use the ensemble mean in his calculations, whereas M&F used individual ensemble members – this will make a difference, especially as different models have different numbers of ensemble members. There may be justifiable reasons for making either decision.
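The weighting difference can be made concrete. A minimal sketch with made-up trend values for two hypothetical models, one with five ensemble members and one with a single member: pooling individual members weights each model by its member count, while averaging ensemble means weights each model equally (and also suppresses part of the internal variability within each model).

```python
import numpy as np

# Hypothetical trends (K per decade) for two models with different ensemble sizes.
members = {
    "model_A": [0.10, 0.12, 0.14, 0.16, 0.18],  # five members
    "model_B": [0.30],                           # one member
}

# Pooling individual members: every run counts once, so model_A gets 5x the weight.
pooled = np.mean([t for runs in members.values() for t in runs])

# Ensemble mean per model first: every model counts once.
model_means = [np.mean(runs) for runs in members.values()]
per_model = np.mean(model_means)
```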

My personal view is that if someone wants to test results, then they should take the same data and implement their own version of the algorithm based on the described methods, asking for clarifications where necessary. They can then test the sensitivities to whatever seems important. Even better would be to reprocess the data from scratch, but that is often a serious amount of effort!

You can argue that your recommended way is the best way to test a result or conclusion, and you may or may not be correct. But why should one scientist demand that another use or not use a certain method? If M&F refuse to share their exact methods, that is what they would be doing.
If everyone believes AGW is potentially a very serious problem, should they not cooperate fully to quickly achieve the best possible understanding?

Hi Steve,
I have demanded nothing and I also said that methods should be clarified if there is any confusion. That is very clear.
But, I happen to think that to get the best possible understanding – which is what we all want – it is far better to spend the time and effort on trying different approaches to answer the same question, rather than trying to replicate exactly someone else’s approach.
cheers,
Ed.

Using the exact same data and methods as a starting point seems to me a “no-brainer” … why it is increasingly fought, often vociferously, by authors, seems to make little sense.

Because, as people other than me have also pointed out, it's the answer to the question that is interesting, not necessarily the method itself. What we'd like to know is whether the result in M&F is reasonable, not whether they really have made some kind of mistake.

If people have the expertise to actually address this question, then why would they waste their time trawling through M&F's data and code, rather than simply addressing the problem again and seeing what they get? If people don't have the expertise to address this question, then they're unlikely to understand the significance of an error in M&F if they were to find one.

Think of this more broadly. Most research is taxpayer funded to a certain extent. Do you really think the taxpayer would be better served if half the experts spent their time checking the other half’s work, or would it be better if they all just did their research semi-independently, but all aiming to improve our understanding of whatever is being studied?

We develop understanding through different people/groups in different universities/countries trying to answer the same questions and – over time – converging on a similar answer, not through checking whether or not you can find a mistake in someone else’s analysis.

To be clear, I’m not arguing that one should ignore mistakes if they become obvious, I’m suggesting that spending your time looking for them is not going to be time well spent.

I have indeed been trying to scour the public domain in order to recreate the data so that I can understand fully what has been done in the paper. Perhaps you can help me. The paper states that the regression uses 75 models for which the forcing estimates are available. The supplementary table lists a total of 72. I found 77 on Climate Explorer that might qualify. Which figure is correct?

There are other inconsistencies and procedures described which may have several reasonable interpretations as well. Have you ever tried to do what you suggest?

Yes, I have tried to do this type of reproducing on another paper. The lead author wanted an independent check of his results, so I took his described methods and implemented them in a different programming language. On another occasion a co-author checked my results separately. Both times it provided confidence in the results and improved our description of the methods!

You would have to ask M&F about which simulations and models were used. Data gets added to the CMIP5 database regularly and it will often depend on when the analysis was started as to which simulations have all the required data available at the time.

But, whether you use 72 or 75 members shouldn’t change the conclusions, which is what is being questioned. It is more important and valuable to test the sensitivities to such choices than reproduce precisely the same numerical values as M&F in my view.
Ed.

Pekka Pirilä wrote: The interesting issue remains that it’s difficult to understand, why α and κ have very little influence on the 62 year temperature trends.

I think this is an important point in view of the conclusion in the abstract of the M&F paper: The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded.

What if all the models have similar values of α and κ, and those values all correspond to greater climate sensitivities than in the real climate system? How would this influence the results of the linear regression model used:

ΔT = a + b ΔF + c α + d κ + ε

If the values of α and κ vary only within narrow intervals, how much possibility is there to reach a significant conclusion about their impact from such a regression?
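One way to gauge this is through the standard error of the α coefficient, which for ordinary least squares shrinks roughly in proportion to the across-model spread of α. A sketch on synthetic data; the linear "truth", the noise level, and the helper names `coef_standard_errors` and `alpha_se` are assumptions for illustration only:

```python
import numpy as np


def coef_standard_errors(X, y):
    """OLS coefficients and their classical standard errors (X includes a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


rng = np.random.default_rng(0)
n = 75  # number of simulations quoted for the regression


def alpha_se(alpha_spread):
    """Standard error of the alpha coefficient for a given across-model spread of alpha."""
    dF = rng.normal(1.0, 0.3, n)
    alpha = rng.normal(1.2, alpha_spread, n)  # hypothetical feedback parameters
    kappa = rng.normal(0.6, 0.1, n)           # hypothetical heat uptake efficiencies
    dT = 0.5 * dF - 0.3 * alpha - 0.2 * kappa + rng.normal(0.0, 0.1, n)
    X = np.column_stack([np.ones(n), dF, alpha, kappa])
    _, se = coef_standard_errors(X, dT)
    return se[2]


se_narrow = alpha_se(0.05)
se_wide = alpha_se(0.5)
```

With a tenfold narrower spread of α, the standard error of c grows by roughly a factor of ten, so the same true effect becomes much harder to distinguish from zero.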

Your formulation requires doing the full regression simultaneously. There is a difference between doing multiple linear regression and doing regression of the residuals of an initial fit, subsequently, on different parameters. Can anyone comment on the applicability of that here?

There is a significant difference – if you do individual, sequential regressions, you will invariably overfit your regression to the initial parameters (attempting to match all variations of the data) and underfit later parameters, in an order-dependent fashion. This can be seen in any regression with missing parameters, described as omitted variable bias.
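The order dependence is easy to reproduce. A minimal sketch on synthetic data with two correlated predictors (correlation about 0.8) and true coefficients +1 and −1; the data-generating choices are assumptions for illustration. Fitting x1 first absorbs part of x2's effect, so the sequential estimates come out near 0.2 and −0.36, while the simultaneous fit recovers the true values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x1 = rng.standard_normal(n)
x2 = 0.8 * x1 + 0.6 * rng.standard_normal(n)            # var(x2) = 1, corr(x1, x2) = 0.8
y = 1.0 * x1 - 1.0 * x2 + 0.1 * rng.standard_normal(n)  # true coefficients +1 and -1


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


ones = np.ones(n)
# Multiple regression: both predictors fitted at once.
b_multi = ols(np.column_stack([ones, x1, x2]), y)

# Sequential: fit x1 alone, then regress the residuals on x2.
X1 = np.column_stack([ones, x1])
b_step1 = ols(X1, y)
resid = y - X1 @ b_step1
b_step2 = ols(np.column_stack([ones, x2]), resid)
```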

That’s why you perform multiple regression – as the data you are trying to regress upon is (presumably, if you’ve correctly included the major physical inputs) the summed response of _all_ the parameters.

Ed,
I’ve collated CMIP5 data from KNMI and have some useful scripts in R for doing so efficiently. I’ve written some posts on such results.

I suspect that I could carry out my own collation quite quickly and, if I pursue the matter, might well do so at some point.

However, in statistics, specialists like to use exactly the same data for testing statistical methods, so that differences in results arise from how one handles the data rather than differences in collation or in the larger data set. Since the dispute is about the validity of the statistical methods, this is by far the most efficient method.

It baffles me that this should be an issue for the climate science community, particularly in light of the experience over the past few years. Let’s hope that it isn’t in the present case.

Steve,
The point is that the goal of Marotzke & Forster was to try to answer a question. The question was essentially “how much of the spread in climate model trends can be explained by forcing variability and how much by internal variability?” So, if someone wants to check this (and it would clearly be good if others did) the ideal thing would be for them to simply do their own analysis. It doesn’t even have to be using exactly the same input data, because this result should not be sensitive to, for example, the number of models considered.

Let’s say you do this and get a very different answer. That would be interesting and we could all then try and work out why. Let’s say you do this and get about the same result. Does it mean that it’s now right? Well, no, but it adds some more confidence. Let’s say, at the same time, someone points out an actual error in Marotzke & Forster; does this now matter? Sort of, but you’ve just got the same result, so maybe it wasn’t an important error. More people should check, but it might not matter.

Essentially we benefit more by other people doing their own analysis to answer the same question than we do by people simply trawling through the code used by another group checking for errors and mistakes. An argument between you and Marotzke & Forster over a potentially silly mistake would be much less interesting than a discussion about why your independent analysis produces a different result from theirs (assuming that it does).

It would indeed be good to try a different method and check the results. But the first step should be to duplicate the previous result. That is standard practice in any science: first duplicate the previous result, making everything exactly the same as much as possible. Then try varying something.
If you don’t do that, you can’t tell what caused a different result, because you don’t know what’s different. You can’t tell if your same result is really the same, because you don’t know if two things are different and cancelled out. http://neurotheory.columbia.edu/~ken/cargo_cult.html
Statistics isn’t the same thing as scientific experiments. But the nice thing about statistics is that you can repeat the calculation exactly, unlike regular science where you do the best you can. Skipping this step is a personal choice. Insisting that this step must be skipped makes no sense.

Anyhow, different people have different ways of studying things. If the easiest way for Steve McIntyre to understand exactly what someone did is to track the work in the code, I don’t see why others should be saying, Well, I find it more effective to do ___.

An even better idea: let Steve do his re-analysis of M&F using their exact code and methods AND have other people try to replicate their surprising results using their own methods. That way, we get to see if M&F’s methodology was flawed or robust (a useful learning exercise in itself) and we get to see if their results can be reproduced independently, which might inform us that, even though their method was suspect (or not), the results they got were reasonable (or not). Why exclude one or the other?

It seems that my earlier proposal may not explain the linear regression results discussed. But it is still puzzling that ΔT in ΔT = a + b ΔF + c α + d κ + ε does not vary substantially with the values of the climate feedback parameter α across climate models. A larger value of this parameter (a stronger restoring feedback) should mean that a given amount of forcing produces a smaller temperature rise. Why is this not seen as a significant effect in the linear regression results?

I have another question about an effect that might influence the regression results. The climate models are tuned to describe the historical temperature curve. Shouldn't this mean that a climate model with a higher α needs a higher ΔF to give the same ΔT over a given historical period than a model with a lower α?

As a consequence of this tuning there should be a dependence between α and ΔF. After linearization α would be approximated as a linear function of ΔF. How would such a linear dependence influence the linear regression results? Could this result in the observed absence of an effect from variations in the values of α?
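A quick synthetic check of how strong such a dependence could be: if every model were tuned to reach the same warming, energy balance would force ΔF ≈ (α + κ)·ΔT_target. The target warming, the α range, the common κ, and the small scatter below are all hypothetical. The variance inflation factor (VIF) measures how much this collinearity inflates the variance of the α coefficient when ΔF is also a predictor.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
T_target = 0.8                    # hypothetical common historical warming, K
alpha = rng.uniform(0.6, 1.8, n)  # hypothetical feedback parameters, W m^-2 K^-1
kappa = np.full(n, 0.6)           # identical kappa, as in the two-model example
# Perfect tuning to T_target under ΔT = ΔF/(α + κ) implies ΔF ≈ (α + κ)·T_target,
# plus a small hypothetical scatter.
dF = (alpha + kappa) * T_target + rng.normal(0.0, 0.05, n)

r = np.corrcoef(alpha, dF)[0, 1]
vif = 1.0 / (1.0 - r**2)  # VIF for alpha in a regression on alpha and dF (with intercept)
```

A VIF well above 1 means the data contain little independent information about α once ΔF is known, which would weaken, rather than remove, any detectable α effect.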

I don’t know if this resolves your first issue, but the regression model actually only involves the variation in each quantity (F, a, k) from the multi-model average for that quantity. i.e., it’s

ΔT’ = b₀ + b₁ ΔF’ + b₂ α’ + b₃ κ’ + ε.

So, in a sense, they’re assuming that there is an already known multi-model mean trend, and they’re then using the regression to try and determine how much of the variation in the model trends can be explained by variability in the forcings (and in a and k) and how much is explained by internal variability (epsilon).

I don’t think that your suggestion is an answer to my question. The linearization doesn’t change how the variables depend on each other.

Let us compare two models over the same historical period. Assume for simplicity that they have the same values for κ but different values for α and ΔF. If both models are tuned to show the same temperature change we have that ΔT’ = 0.

Thus ΔT’ = 0 = a + b ΔF’ + c α’ + d·0 + ε, so that, neglecting ε, ΔF’ = −(c/b) α’ − a/b.

Thus ΔF’ is a linear function of α’, considering that the same equation is valid for all pairs of models with the same κ. Substituting the definitions of ΔF’ and α’ shows that ΔF is also a linear function of α. I think that this reasoning may be generalized to the case with varying κ.

This may be getting too philosophical again, but F is imposed on the models; it’s external. The climate sensitivity is a model metric and indicates how much the climate would warm under a doubling of CO2, and includes both the Planck response and the feedbacks. From a physical perspective (yes, I know, physics again) there shouldn’t be a direct relationship between F and a.

There seem to be three possibilities. Climate models do a reasonable job of representing the physical climate and hence they tend to behave similarly with some amount of variability. Or, they’re explicitly tuned, as you seem to be suggesting. Or, a combination of the two in the sense that some models may be rejected for just matching observations poorly without necessarily a good reason for doing so.

I don’t know enough to know which it is or even if it could be something else. What I will say is that I don’t think it’s correct to say that F depends on a, since F is external. Could a somehow depend on F? Maybe, I guess, but I’m not quite sure how.

In connection with the dependence between ΔF and α in climate models, Figure 1 in Schwartz et al. (2014) may be of interest. It shows that for the whole period since preindustrial times, with a fixed observed value of ΔT, the CMIP5 climate model results cluster around a straight line in a log-log diagram of ECS vs. ΔF − ΔN. This straight line in the log-log diagram has the equation

ECS=3.7/α=3.7 ΔT/(ΔF-ΔN)

where ECS and ΔF are variables but ΔN and ΔT are constant.
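The line is straight because log ECS = log(3.7 ΔT) − log(ΔF − ΔN), so with ΔT and ΔN fixed its slope in log-log coordinates is exactly −1. A sketch with hypothetical values of ΔT, ΔN and the forcing range (3.7 W m⁻² is the forcing for doubled CO2 used in the equation):

```python
import numpy as np

F2X = 3.7  # W m^-2, forcing for doubled CO2, as in ECS = 3.7/α


def ecs_from_energy_budget(dT, dF, dN):
    """ECS = F2X/α with α = (ΔF − ΔN)/ΔT, i.e. ECS = F2X·ΔT/(ΔF − ΔN)."""
    return F2X * dT / (dF - dN)


dT, dN = 0.8, 0.6               # hypothetical fixed values (K, W m^-2)
dF = np.linspace(1.2, 3.0, 10)  # hypothetical range of model forcings, W m^-2
ecs = ecs_from_energy_budget(dT, dF, dN)
slope = np.polyfit(np.log(dF - dN), np.log(ecs), 1)[0]
```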

The following derivation shows how this equation follows from Equation (1), ΔT = ΔF/(α + κ), in the paper by M&F:

Pekka has proposed that the regression can be done in a restated form of the original equation. This is incorrect. The problems with the regression model adopted in M and F are due to the endogeneity of the situation and in no way do they depend on (nor does this comment address) the correctness of the specification of the model.

In order to understand the arguments on the effects of circularity on the regression used in M&F, it is necessary to look at the Least Squares methodology in a bit more detail.

The authors start with a mathematically based statistical model:

ΔT = a + b ΔF + c α + d κ + ε

In the model, the variables ΔF, α and κ are assumed to be independent of ε which accounts for the random variation of ΔT in the statistical model. The ε’s are assumed to be independent of each other and to have means equal to 0. In this case, the authors have implicitly assumed that the ε’s are also homoscedastic, i.e. each having the same variance. There is a further very important assumption that the ε’s also be independent from all of the predictors.

In LS, estimates of the coefficients and the variance of the ε’s are obtained by first forming a sum of squares of the residuals:

SSE = ∑ε² = ∑[ΔT − (a + b ΔF + c α + d κ)]²

and then minimizing SSE with respect to the parameters a, b, c and d. It should be noted that the parameter estimates are functions not only of the non-random variables, but of the ε’s as well so they are random variables within this structure. In this case, the minimization procedure is simple to carry out using easily calculated matrix algebra.
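For reference, the least-squares step itself is a few lines of linear algebra. A minimal sketch on synthetic data (the coefficients and noise level are hypothetical); the checks confirm that `np.linalg.lstsq` agrees with the explicit normal equations and that the residuals are orthogonal to every predictor column, the defining property of the minimizer:

```python
import numpy as np


def ols_fit(dT, dF, alpha, kappa):
    """Least-squares estimates of (a, b, c, d) in ΔT = a + b ΔF + c α + d κ + ε."""
    X = np.column_stack([np.ones_like(dF), dF, alpha, kappa])
    # Minimizing SSE is equivalent to solving the normal equations (XᵀX)β = Xᵀy;
    # np.linalg.lstsq reaches the same minimizer via an SVD, which is more stable.
    beta, *_ = np.linalg.lstsq(X, dT, rcond=None)
    resid = dT - X @ beta
    return beta, resid, X


rng = np.random.default_rng(3)
n = 75
dF = rng.normal(1.0, 0.3, n)
alpha = rng.normal(1.2, 0.3, n)
kappa = rng.normal(0.6, 0.1, n)
dT = 0.3 + 0.5 * dF - 0.2 * alpha - 0.1 * kappa + rng.normal(0.0, 0.1, n)

beta, resid, X = ols_fit(dT, dF, alpha, kappa)
beta_normal = np.linalg.solve(X.T @ X, X.T @ dT)  # explicit normal equations, for comparison
```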

Now what happens if ΔF is calculated from a previous relationship with two variables: ΔF = α ΔT’ + ΔN?

We substitute this relationship into the original equation to get:

ΔT = a + b(α ΔT’ + ΔN) + c α + d κ + ε

If ΔT’ is not the same as ΔT, then nothing is changed. The variables on the right hand side are still unrelated to ε and the entire procedure gives identical results to the previous case. However, if ΔT’ and ΔT are identical, the situation becomes radically different.

Now, ΔT has become a predictor of itself and the ε’s are present not only at the end of the regression equation, but also (invisibly) through the ΔT which is also on the right hand side. The predictors have violated a very important assumption that they must be independent of the ε’s. Hence, the usual simple regression procedure fails and all results from it are spurious. Estimates of the parameters, confidence intervals and p-values will be biased and therefore neither reliable nor scientifically meaningful. This violation occurs even if one uses ΔF in the regression procedure. Despite the fact that you can’t “see” ΔT in the equation, its effect is still present mathematically because it has been used in the calculation of ΔF.
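The general mechanism invoked here, a predictor that contains part of the error term (endogeneity), can be illustrated with a generic simulation. This is not a reconstruction of M&F's data: `gamma` is a hypothetical parameter controlling how much of ε leaks into the predictor, and whether such a leak exists in the diagnosed ΔF is exactly what is disputed in this thread. With gamma = 0 the slope is unbiased; with gamma = 1 it is pulled toward (0.5 + 0.25)/(1 + 0.25) = 0.6.

```python
import numpy as np


def ols_slope(x, y):
    """Simple-regression slope cov(x, y) / var(x), both with ddof=1."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


rng = np.random.default_rng(4)
n = 20000
x_true = rng.standard_normal(n)     # predictor unrelated to the error term, variance 1
eps = 0.5 * rng.standard_normal(n)  # error term ("internal variability"), variance 0.25
b_true = 0.5
y = b_true * x_true + eps

gamma = 1.0                         # hypothetical strength of the leak of eps into the predictor
x_leaky = x_true + gamma * eps

b_clean = ols_slope(x_true, y)   # population value: 0.5
b_leaky = ols_slope(x_leaky, y)  # population value: (0.5 + 0.25) / (1 + 0.25) = 0.6
```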

To produce a solution for this situation, the regression equation can be rewritten as Pekka suggests:

(1 – b α) ΔT = a + b ΔN + c α + d κ + ε

and the sum of squares becomes:

∑ε² = ∑[(1 – b α) ΔT – (a + b ΔN + c α + d κ)]²

Minimizing this with respect to the coefficients in the equation is not as simple as in the above cases, but can be done with a little bit of programming or by using available optimization techniques. But the story does not end here. From the regression, we need to form the decomposition:

ΔT = Predicted(ΔT) + Residuals(ΔT)

In the ordinary regression case, the predicted value is calculated by substituting the corresponding values of the predictor variables into the equation for each model. The residuals are then calculated by simple subtraction or taken directly from the minimizing process for SSE. For the circular case, the entire equation must be divided by (1 – b α). Note that α (and therefore 1 – b α) are in fact vectors whose elements have different values depending on which climate model the particular observation is from. This has some important consequences, not the least of which is the introduction of bias into the entire estimation process.
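One way to do the "little bit of programming" is to profile over b: for any fixed b the problem is linear in a, c and d, so ordinary least squares solves it, and b is then chosen on a grid. This sketch is an assumption, not Roman's or M&F's procedure; `fit_circular_form`, `predict_dT`, the grid, and the noise-free synthetic data are hypothetical. The prediction step divides by (1 − b α) model by model, which is where instability appears as that factor approaches zero.

```python
import numpy as np


def fit_circular_form(dT, dN, alpha, kappa, b_grid):
    """Minimize Σ[(1 − bα)ΔT − (a + bΔN + cα + dκ)]² by profiling over b.

    For each trial b the problem is linear in (a, c, d), so ordinary least squares
    solves it exactly; the b with the smallest sum of squares is kept.
    """
    X = np.column_stack([np.ones_like(alpha), alpha, kappa])
    best = None
    for b in b_grid:
        target = (1.0 - b * alpha) * dT - b * dN
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        sse = float(np.sum((target - X @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, b, coef)
    _, b_hat, (a_hat, c_hat, d_hat) = best
    return a_hat, b_hat, c_hat, d_hat


def predict_dT(a, b, c, d, dN, alpha, kappa):
    """Back out ΔT by dividing through by (1 − bα), one value per model."""
    return (a + b * dN + c * alpha + d * kappa) / (1.0 - b * alpha)


rng = np.random.default_rng(5)
n = 30
alpha = rng.uniform(0.6, 2.0, n)
kappa = rng.uniform(0.4, 0.9, n)
dN = rng.uniform(0.3, 0.9, n)
dT = predict_dT(0.3, 0.2, -0.05, -0.1, dN, alpha, kappa)  # noise-free synthetic "truth"
a_hat, b_hat, c_hat, d_hat = fit_circular_form(dT, dN, alpha, kappa, np.linspace(0.0, 0.4, 401))
```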

First, b (and therefore 1 – b α) is itself a function of the ε’s in the model. The distribution of the ratio of two random quantities is very complicated and can be unstable, particularly if the divisor is close to zero.

A second consequence is that the effective coefficients for the predictor variables will be different for each climate model in the regression. As noted above the divisor is a vector so the actual value will be different for every observation.

Finally, the residuals are no longer the ε’s themselves. Due to the division process, they have become ε/(1 – b α). Their independence has been destroyed due to the common presence of the estimate of b and they are now heteroscedastic with a variability depending on the sign of b as well as the magnitude of α.

The bottom line is that the regression done in the M and F publication is inappropriate, and their subsequent results are scientifically unreliable and difficult, if at all possible, to correct.

Roman,
You’re ignoring a couple of things. One is that Delta N also depends on Delta T through the equation

Delta N = Delta F – a Delta T

Therefore it seems to me that if you want to put all Delta T terms on the LHS, you also need to move Delta N.

You also seem to be ignoring that the regression in Forster et al. (2013) was done using the full Delta T and Delta N values for each model, while the regression in M&F was done using Delta T’, Delta F’, alpha’ and kappa’, which are the variations from the ensemble model averages.

It is my understanding that ΔT and ΔN are the model-simulated GMST and TOA radiative imbalance variables. These are separate measures taken from the climate model. On the other hand, ΔF is calculated from ΔT and ΔN. It does matter what is calculated from what.

I am not ignoring what form of the variables was used in what paper because it does not matter. Both forms produce ***exactly*** the same estimates for b, c and d. Only the a’s will be different.

The predicted values and the residuals are also identical. This is what we teach in an elementary course on regression.
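The equivalence is easy to verify: subtracting the ensemble mean from every variable leaves the slope estimates and the residuals unchanged and drives the intercept to zero. A sketch on synthetic data (coefficients and noise are hypothetical):

```python
import numpy as np


def fit(y, *cols):
    """OLS of y on a constant plus the given columns; returns (coefficients, residuals)."""
    X = np.column_stack([np.ones_like(y), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def centre(v):
    """Deviation from the ensemble mean (the primed variables)."""
    return v - v.mean()


rng = np.random.default_rng(6)
n = 75
dF = rng.normal(1.0, 0.3, n)
alpha = rng.normal(1.2, 0.3, n)
kappa = rng.normal(0.6, 0.1, n)
dT = 0.2 + 0.5 * dF - 0.2 * alpha - 0.1 * kappa + rng.normal(0.0, 0.1, n)

raw_beta, raw_resid = fit(dT, dF, alpha, kappa)
cen_beta, cen_resid = fit(centre(dT), centre(dF), centre(alpha), centre(kappa))
```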

In a climate model, Delta F is an externally imposed quantity. By definition it does not depend on any internal quantity. It is what is driving the changes in the model. However, it is a quantity that is not easy to extract from the models because it is calculated self-consistently.

However, it is possible to extract the Delta Fs because (and I hate to say it again) if the models are conserving energy, then

Delta F = Delta N + a Delta T

So, yes, Forster et al. used Delta T and Delta N from the models to determine Delta F. This doesn’t mean, though, that Delta F somehow depends on Delta T, because it is the externally imposed forcing that is driving the changes in Delta T and Delta N.

So, if you were to take your substitution to its logical conclusion, you would now recognise that Delta N also depends on Delta T through

You make the thing too complicated by not following, how the computation actually takes place. When you follow that, you see that the regression coefficients are determined in a straightforward way that’s not disturbed at all by the fact that the same ΔT occurs on both sides of the equation or multiplied by the sum of two terms that may be close to zero.

Potential complications appear only if the resulting regression model is used to calculate ΔT from ΔN, α, and κ, but I agree with the authors that such calculations are not needed in the analysis reported in the paper.

You don’t seem to understand that the ΔF in the sentence “In a climate model, Delta F is an externally imposed quantity” is NOT the same as the ΔF in the sentence “However, it is possible to extract the Delta Fs because (and I hate to say it again) if the models are conserving energy, then Delta F = Delta N + a Delta T”.

The first one may very well be an external existing (but unknown) quantity. The second one is an ***estimate*** of that quantity. The two would not typically be the same unless the system was completely deterministic.

If you knew the first one, then the second would not be necessary. But you don’t so you calculate the estimate from the variables you have available using the relationship between them. Yes, the estimated ΔF is strongly related to ΔT.

Roman,
Then it would seem that you need to show that estimated Delta F is not a good representation of the actual Delta F.

However, you state that the estimated Delta F is strongly related to Delta T. How do you know this? If the model is conserving energy, then this is not true. Just because Delta T appears on the RHS of the equation in which Delta F is determined, does not make it so. As Ed, and others, have pointed out, using that form of the equation is done to remove (as well as possible) any dependence of Delta F on Delta T. Given your claim that this is not the case, I assume that you can actually show this.

Given the equation dF=dN + a*dT, take the covariance of the RHS and LHS with respect to dT.
cov(dF, dT) = cov(dN, dT) + a*var(dT)
dF is independent of dT (in the sense of covariance) iff the two terms on the RHS cancel exactly. aTTP, is this your assertion?
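The covariance identity can be checked numerically, and the same sketch shows what the disagreement turns on. Everything below is hypothetical: a linear forcing ramp, a forced response dT_forced = dF/2, Gaussian internal variability, and a coefficient `a_prime` standing in for a TOA response to internal variability that differs from α. With exact energy conservation using the same α, the diagnosed forcing equals the imposed one; if `a_prime` differs from α, a term (α − a_prime)·ε leaks into the diagnosed forcing.

```python
import numpy as np

rng = np.random.default_rng(7)
n, a = 1000, 1.2                    # hypothetical sample size and feedback parameter α
dF_true = np.linspace(0.0, 2.0, n)  # hypothetical imposed forcing ramp, W m^-2
dT_forced = dF_true / 2.0           # hypothetical forced response, K
eps = 0.1 * rng.standard_normal(n)  # internal variability in dT
dT = dT_forced + eps


def diagnosed_forcing(dN, dT, a):
    """Forcing estimate dF = dN + a*dT, as in the energy-balance diagnosis."""
    return dN + a * dT


# Case 1: the TOA imbalance obeys the same α for all of dT (exact conservation).
dN = dF_true - a * dT
dF_est = diagnosed_forcing(dN, dT, a)
lhs = np.cov(dF_est, dT)[0, 1]
rhs = np.cov(dN, dT)[0, 1] + a * np.var(dT, ddof=1)  # the identity in the comment

# Case 2: dN responds to internal variability with a different coefficient.
a_prime = 0.5  # hypothetical
dN_mismatch = dF_true - a * dT_forced - a_prime * eps
dF_leaky = diagnosed_forcing(dN_mismatch, dT, a)  # equals dF_true + (a - a_prime) * eps
```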
Regards

I’m suggesting that if the models conserve energy, then that has to be true. Okay, we’re talking about globally averaged quantities, and it’s possible that there could be regional variations that mean that alpha doesn’t always represent the actual feedback response at all time instants. But averaged over a suitable time interval, if models conserve energy then, by definition,

dF = dN + a dT

and since dF is externally imposed and does not depend on dT – by definition – dN + a dT should not depend on dT.

ΔF used in the M&F paper is strongly affected by external input, but it depends also on the model and contains unknown contributions from internal variability.

The determination of the regression model from the CMIP5 data is a well-behaved process, or more precisely, all of its steps are well behaved.

The regression model is, however, a model for the variables defined implicitly by the way the variables are handled in the models and further processing. The relationship of these implicitly defined variables and the real physical variables of the same name cannot be fully understood. Therefore it’s also uncertain, how well the model variables satisfy the laws of real physics.

From reading the discussion here I get the following picture of the alleged circular reasoning:

The forcing F appears in the form of input variables to the climate models that are independent of the calculated temperature change ΔT. (But the calculated temperature change is of course dependent on the change in the forcing ΔF).

However, in the paper by M&F the real input forcing has not been used but instead the changes of the forcing have been estimated according to the equation:

ΔF*ᵢ = ΔNᵢ + αᵢ ΔTᵢ

Here * denotes an estimated value and the index i refers to the climate model simulation number. In this equation the values of ΔNᵢ, αᵢ and ΔTᵢ are calculated by the climate model.

The values of ΔTᵢ are then assumed to be calculable from the equation:

ΔT= ΔF/(α+κ)

Using this equation a linear regression model is derived where index e refers to the ensemble mean:

ΔTᵢ − ΔTₑ = ΔF*ᵢ/(αᵢ + κᵢ) − ΔF*ₑ/(αₑ + κₑ) ≈ a + b (ΔF*ᵢ − ΔF*ₑ) + c (αᵢ − αₑ) + d (κᵢ − κₑ)

The regression model may also be represented in the following way:

ΔTᵢ = a1 + b ΔF*ᵢ + c αᵢ + d κᵢ

where

a1 = ΔTₑ − (b ΔF*ₑ + c αₑ + d κₑ) + a

Although the actual input ΔF values may not depend on ΔT, the estimated ΔF* values do, because this follows from the equation used to estimate them.

Pehr,
I think this may explain some of the issues. Let me try and see if I can explain. There are others here more experienced than I am who can jump in and tell me where I’m going wrong if they wish.

So, if we start at the beginning, Forster et al. (2013) used basic energy balance to estimate the forcing time series for various climate models. Essentially they use that

dF = dN + adT.

Now, once you have your forcing time series, you can use it to determine how a model would respond to a change in external forcing. There are, however, two other things you need (for a simple model at least): the climate feedback parameter, alpha, which sets the climate sensitivity, and the ocean heat uptake efficiency, kappa. Essentially, alpha tells you the feedback response when T changes by dT, and kappa tells you how fast the oceans take up heat and is used to represent the planetary energy imbalance. If you apply a change in forcing dF and kappa is large (oceans take up heat quickly), the surface will warm more slowly than if kappa is small (oceans take up heat slowly).

So, the basic energy balance equation is now

dF = alpha dT + kappa dT,

which you can rewrite as

dT = dF/(alpha + kappa).

What I think you’re trying to do is argue that because kappa dT = dN, that you can replace kappa with dN/dT. This, however, isn’t quite correct, because the above equation is only true in the absence of variability. It’s not correct that kappa = dN/dT.

It’s fairly easy to see why. Consider the following: you have a forced response plus some variability about this forced response. What happens if dT suddenly jumps up? Well, dN has to go down in order to conserve energy. If dT goes down, dN goes up. Therefore, energy conservation tells you that the variation of dT about the forced response is anti-correlated with the variation of dN about the forced response. In other words,

kappa =/= dN/dT.

What one could write is

kappa = dN_forced/dT_forced.

What M&F did was to represent variability by an additional term, epsilon, giving

dT = dF/(alpha + kappa) + epsilon.

There are two ways to look at this now. Those who would like to re-introduce dT into the RHS need to remember that dN also depends on dT. Alternatively, the first term on the RHS includes only the forced dT and dN, not the full dT and dN; the rest is included through the residual, epsilon. If so, then dT does not appear explicitly on the RHS and there is no circularity.

I don’t know if that helps to clarify things. I would guess not, but it’s worth a shot.

Unfortunately, I have had trouble getting my equations right. My apologies for causing confusion.

In my comment FEBRUARY 11, 2015 AT 10:49 PM the equation for a1 should be:
a1= ΔTe-(b ΔF*e+c αe+d κe)+a

In FEBRUARY 11, 2015 AT 11:08 PM the second equation should be:
ΔF= ΔN+ α ΔT

Thanks for your comment. I have some views on your reasoning that
kappa =/= dN/dT.

In fact, according to the energy balance model for the climate system, we have:

ΔN= ΔF- α ΔT = κ ΔT => ΔT= ΔF/(α+κ)

This represents a steady-state energy balance expressing ΔN first as the input to the climate system at the top of the atmosphere and second as the output from this system to the deep sea.

You are right that those equations cannot be satisfied if the climate system is far from steady state, such as after a sudden change in ΔT due to internal variability. On the other hand, if internal variability could be described as a change in the value of κ, then the steady-state equations might remain approximately satisfied (a quasi-steady state) while both ΔT and ΔN adjust to their new values.

According to Forster et al. (2013), the values of the climate feedback parameter α for the climate models used by M&F were determined independently, from simulations in which the carbon dioxide mixing ratio was abruptly quadrupled, by linear regression of ΔN on ΔT using ΔN = ΔF – α ΔT as the regression model.

Values of α + κ, and hence κ once α has been determined, were also determined independently, from simulations in which the carbon dioxide mixing ratio was increased by 1% per year until it doubled. In that case, if I have not misunderstood, ΔT was regressed on ΔF using ΔT = ΔF/(α + κ) as the regression model.
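This second step can be sketched the same way. The Python example below assumes α is already known, generates a hypothetical 1%-per-year run up to CO2 doubling (about 70 years, with the forcing assumed to ramp linearly to 3.7 W/m2), fits ΔT = ΔF/(α + κ) through the origin, and backs out κ. All numbers are illustrative, and exactly how Forster et al. (2013) obtained κ may differ in detail.

```python
import random

ALPHA = 1.0               # assumed known from the abrupt-4xCO2 regression, W/m2/K
ALPHA_PLUS_KAPPA = 1.7    # hypothetical value used only to generate the fake data
rng = random.Random(2)

# Hypothetical 1%/yr run to CO2 doubling (~70 years). The forcing grows
# linearly with time because CO2 grows exponentially and forcing ~ log(CO2).
F = [3.7 * yr / 70.0 for yr in range(1, 71)]
T = [f / ALPHA_PLUS_KAPPA + rng.gauss(0.0, 0.1) for f in F]

# Least-squares fit through the origin of T = F / (alpha + kappa).
slope = sum(f * t for f, t in zip(F, T)) / sum(f * f for f in F)
alpha_plus_kappa_est = 1.0 / slope
kappa_est = alpha_plus_kappa_est - ALPHA
print(f"alpha + kappa ~ {alpha_plus_kappa_est:.2f}, kappa ~ {kappa_est:.2f} W/m2/K")
```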

However, to my understanding, both the ΔTi values and the estimated ΔFi* values used by M&F were obtained from climate model simulations of the historical period, basically the twentieth century. This raises two questions.

The first question is how the fact that ΔFi* depends on ΔTi according to ΔFi*= ΔNi + αi ΔTi influences their results.

The second question is how tuning the climate models to reproduce the temperature curve over this historical period will influence their results. This tuning may create a dependence between the models’ climate feedback parameter values αi and their ΔFi values, as I have discussed in previous comments.

These are, in my view, the background and the questions raised. But I am still far from having a well-founded opinion on what the final answers to those questions are.

Pehr,
“This represents a steady state energy balance expressing ΔN at first as the input to the climate system at the top of the atmosphere and secondly as the output from this system to the deep sea.”
The above is not quite right. The input to the system is actually dF. This produces a TOA imbalance that causes temperatures to rise, which reduces the imbalance by alpha dT. The remaining imbalance is dN, so that

dN = dF – alpha dT,

or

dF = dN + alpha dT.

If you add variability, then dT can bounce around, but this causes dN to bounce around too, so that it is still the case that

dF = dN + alpha dT.

So, if you use this energy conservation to determine dF, and then reverse-engineer it to determine the temperature response given a known dF time series, you lose the information about the variability, because all you have is the external forcing, which shouldn’t depend on internally driven variations in dT.
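This can be checked with a toy model, assuming exact energy conservation and a constant alpha. In this hypothetical Python sketch, internal variability is added to T and balanced by an opposite anomaly in N; diagnosing the forcing as dN + alpha dT then returns the imposed forcing to round-off, so the variability does not leak into the diagnosed forcing. Real models satisfy these two assumptions only approximately.

```python
import random

ALPHA, KAPPA = 1.2, 0.8      # hypothetical parameters, W/m2/K
rng = random.Random(3)

F_true = [0.02 * yr for yr in range(60)]          # hypothetical forcing ramp, W/m2
T_forced = [f / (ALPHA + KAPPA) for f in F_true]

# Energy-conserving internal variability: a surface anomaly eps comes with an
# anomaly of -ALPHA * eps in the TOA imbalance.
eps = [rng.gauss(0.0, 0.15) for _ in F_true]
T = [tf + e for tf, e in zip(T_forced, eps)]
N = [KAPPA * tf - ALPHA * e for tf, e in zip(T_forced, eps)]

# Diagnose the forcing as dF = dN + alpha * dT.
F_diag = [n + ALPHA * t for n, t in zip(N, T)]
max_err = max(abs(fd - ft) for fd, ft in zip(F_diag, F_true))
print(f"largest |F_diag - F_true| = {max_err:.1e} W/m2")
```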

“On the other hand, consider that internal variability could be described as a change in the value of κ”
I don’t think kappa can change, as it’s (on short timescales) a constant that tells you how energy diffuses into the ocean. If there is variability, that would typically be because of some kind of ocean cycle releasing energy, increasing dT and reducing dN, or vice versa. So, I do think you should think of it as

dT = dF/(alpha + kappa) + epsilon,

where the first term is the forced response, and the second is the variability.

I have another opinion on this. ΔF is a theoretical concept in climate science and may only be seen as an input to the climate system on certain occasions. One such occasion is if you look at the system when it is assumed to have changed from one equilibrium state to another.

Another occasion is a carbon dioxide doubling experiment. There ΔF = ΔN at time zero, but afterwards the input equals ΔN, which differs from ΔF because of climate feedback.

The theoretical basis for ΔF is a linearization of ΔN as a function of ΔT. That is the origin of the equation ΔN = ΔF – α ΔT, which defines ΔF as the temperature-independent term.

Considering the steady-state energy balance model discussed here, ΔN at the top of the atmosphere equals the difference between incoming solar radiation and outgoing radiation, in the form of longwave infrared radiation and reflected shortwave radiation (usually in units of W/m2).

At the bottom of the mixed ocean layer ΔN equals the heat transfer rate from the mixed layer to the deep sea. It is assumed that the average temperature difference between the mixed layer and the deep ocean is the driving force for this heat transfer. It is of course possible to imagine that the heat transfer conditions may change, that is that κ may change.

“ΔF is a theoretical concept in climate science and may only be seen as an input to the climate system on certain occasions.”
Except that what the climate models are doing is assuming that it exists (or, rather, they’re calculating the radiative influence of external factors like increasing CO2). You might dispute whether it exists in reality, but I’m not sure how you can dispute its existence in the models, since it’s imposed on them.

But I have not disputed the existence of ΔF as an input to climate models. Input data to a climate model is one thing, but what I discussed in connection with the steady state energy balance model was the input at the top of the atmosphere in the form of energy flux to the climate system. We seem to be referring to two different things.

“what I discussed in connection with the steady state energy balance model was the input at the top of the atmosphere in the form of energy flux to the climate system. We seem to be referring to two different things.”
Possibly, but dF is the input. In the absence of a temperature response, dF = dN. As the temperature rises, it then becomes

dF = dN + a dT.

So, dF is the constant, externally imposed forcing, and dN and dT are what vary.

It’s possible that we’re saying the same thing in different ways, though.

I disagree. In Andrews et al. (2012), the forcing ΔF is an output of the study, obtained by linear regression of ΔN on ΔT. They write in the first sentence of the abstract:

“We quantify forcing and feedbacks across available CMIP5 coupled atmosphere-ocean general circulation models (AOGCMs) by analysing simulations forced by an abrupt quadrupling of atmospheric carbon dioxide concentration.”
My point is that, although you know that the change in carbon dioxide forces the climate system, you must also determine what this means in terms of energy flux at time zero, in W/m2. This is found as output from the climate models, not as input.

Yes, I agree that climate models calculate the radiative influence of an external change. However, that still means that this change produces an external forcing that drives change in temperature, dT, and planetary energy imbalance, dN.

ATTP, I understand that dF is an external forcing and is exogenous. But my understanding of your explanation is that dT is an effect of dF, albeit confounded by the variable dN. Without a very specific assumption about dN, however, cov(dT, dN) should not be zero. You seem to be asserting a peculiar regression structure in which the two independent variables have no covariance with the dependent variable. This is not impossible, of course, but it seems unusual.
Regards

I’m not sure that I’m saying what you think I’m saying. In physics-speak 🙂: if climate models conserve energy and if the climate feedback parameter is time-independent, then if there is a change in forcing, dF, followed by a change in temperature, dT, the change in planetary energy imbalance should be

dN = dF – a dT.

Therefore, if one wants to determine the change in forcing in a climate model, one can do so using

dF = dN + a dT,

and as long as the conditions I specified at the beginning of this comment hold, this should return a reasonable estimate of the actual change in external forcing. Also, since dF is the radiative response to a change in something external (in the absence of a temperature response), it is, by definition, independent of dT, and therefore so is dN + a dT.

Of course, climate models don’t conserve energy exactly, and alpha isn’t completely time-independent, so a more interesting question might be how this would affect the analysis in M&F, but I don’t think this is sufficient to argue that the method in M&F is wrong. I also think that a lot of this has been covered before (in Forster et al. 2013, for example).

In this study both α and κ are assumed to be constants for each of the models. α gives the total feedback as seen in the particular model between two equilibrium states of the model, both represented by averages over long periods. When the models have internal variability, that may also lead to variability in the relationship between surface temperature and TOA imbalance, whose equilibrium average values are described by α. Thus the relationship

ΔF= ΔN+ α ΔT

is valid for the long-term averages (at least for the pair of states used to define α), but not necessarily over shorter periods. How well it holds for the 15-year trends (or 62-year trends) must be uncertain to some extent, perhaps to a quite significant extent. Leaving the correction out would, however, not be correct even on average. As we are discussing model properties, the deviations may differ from one model to another.

The calculation does not use separate values αi for each period. If such values were known, we would also know the actual ΔFi, but unfortunately that’s not the case. For this reason the values of ΔF used in the regression are not exactly the forcings, but an operationally defined substitute of somewhat unknown nature. It’s probably a better substitute for the 62-year trends than for the 15-year trends.

I would like to ask whether the ΔF derived from the models in Forster et al. (2013) is the same ΔF used in the 2015 paper, either on average or model by model (if any models were the same). Also, if they were the same models, did they all have the same internal values in both studies?
Thx

For those who want more information on this at a graduate-student level, I can recommend the following discussion of simple energy balance models by Isaac Held, where one of the special cases is the steady-state model discussed here: Transient vs equilibrium climate responses

Your paper refers to the primed variables below as the “cross-ensemble variation”. Does this mean that they are calculated as follows?

ΔF’ = ΔF_overbar – ΔF
α’ = α_overbar – α
κ’ = κ_overbar – κ

where ΔF was obtained as described above and α and κ were known from earlier work. The quantities with an overbar are the ensemble means, while the other quantities vary with the model (but not with the model run).

If not, a better explanation for cross-ensemble variation would be greatly appreciated.
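For reference, the conventional way to compute a cross-ensemble deviation is each member’s value minus the ensemble mean, x′ = x − x̄. That is the opposite sign to the guess above; flipping the sign of a regressor only flips the sign of its regression coefficient. The hypothetical Python sketch below computes this for invented per-model values. Whether M&F averaged over models or over individual runs is not established here and is an assumption.

```python
import statistics

# Hypothetical per-model values, for illustration only.
models = {
    "model_A": {"dF": 0.30, "alpha": 1.1, "kappa": 0.7},
    "model_B": {"dF": 0.22, "alpha": 0.9, "kappa": 0.6},
    "model_C": {"dF": 0.35, "alpha": 1.6, "kappa": 0.9},
}


def cross_ensemble_deviations(models, key):
    """Return x - x_bar for each model, where x_bar is the ensemble mean of key."""
    mean = statistics.fmean(m[key] for m in models.values())
    return {name: m[key] - mean for name, m in models.items()}


for key in ("dF", "alpha", "kappa"):
    print(key, cross_ensemble_deviations(models, key))
```

By construction the deviations for each quantity sum to zero across the ensemble.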

Hi Ed, I am puzzled by the result here that temperature trends do not depend much on ocean heat uptake and feedback parameters. That seems to me to be counterintuitive. Is there a simple explanation as to why that result makes sense?

If I’m understanding the bottom line, it’s that a test has been constructed of the behavior of the variables in the models, using ΔF as the independent variable and the First Law as the basis for the equation. The controversy is whether the test will give meaningful results if ΔF is pre-constructed from the variables being tested (from the same models or averages of them). The conclusion is partly ambiguous because of:

1) the degree of pre-construction contained in ΔF,
2) whether the equation itself, being an undeniable assumption (energy conservation), assures the cancellation of any circularity,
3) whether the assumption in 2 is invalid, given the lack of knowledge of the inner workings of the model being tested.

The conclusion for now is that all three of the above problems have been resolved to everyone’s satisfaction (except mine).

Pekka has written in several comments that “the calculations of M&F are stable (robust might be more accurate) and without significant problems from circularity”, and asserts (I think correctly) that Marotzke and Forster agree with him about this. He has also now clarified mathematically what he is arguing.

If I understand correctly, Pekka argues that the fact that ΔF is a linear function of ΔT does not involve a circularity since it is the actual model-simulated ΔT (ΔTs) that is used to calculate ΔF, not the purely forced, free-of-internal-variability-etc-error, version of ΔT (ΔTf), which is what the regression fit represents (if the regression model is appropriate). I will explain why Pekka’s argument does not support the conclusions of Marotzke and Forster in relation to 62 year periods.

Suppose that, over the 62-year period involved, simulated multidecadal internal variability leads to ΔTs exceeding ΔTf in some models and falling below it in others, without the simulated value of ΔN (ΔNs) being similarly affected. This seems both plausible and likely: many models exhibit substantial multidecadal internal variability and show little correlation between multidecadal ΔTs and ΔNs (after detrending).

In this situation, models with ΔTs > ΔTf will generally have a relatively high diagnosed value (ΔFs) for ΔF, since ΔFs = α ΔTs + ΔNs. (Note that although Marotzke and Forster write of α ΔT being a “correction” to ΔN, it is the larger of the two terms in most cases.) As a consequence of such internal variability, intermodel spread in ΔTs will be positively related to that in ΔFs, increasing the proportion of the intermodel spread in ΔTs that is “explained” by the ΔFs, or the “contribution to the regression by the ERF trend”, which Marotzke and Forster state is dominant for start years from the 1920s onward. This effect is what I refer to as circularity; it is not total and I did not claim that it was.

I consider a contribution to intermodel spread in ΔTs that arises purely from the same elements of internal variability appearing on both sides of the regression equation to be an artefact of an unsatisfactory method. Perhaps on reconsideration Pekka may also come to this view.

Whether the circularity element that exists in the regression method used is the largest source of error in this study is uncertain. I identified other potentially serious sources of error involved in it; they may be more important. Paul_K has set out further issues with the study’s methods.

Note that it would be unsurprising if Marotzke and Forster had just found that the ERF trend ΔFs has a considerably larger influence than model feedback strength and model ocean heat uptake efficiency over historical 62-year periods starting from the 1920s on. Aerosol forcing varies hugely between models (by over 1 W/m2). Up to the turn of the century, 62-year ΔTs trends have a correlation of 0.9 with diagnosed or estimated aerosol forcing levels for the models used by Marotzke and Forster. And over the entire Historical simulation period, 1860-2005, ΔTs trends have as high a correlation with aerosol forcing strength in models as with ΔFs.

However, that intermodel differences in the ERF trend have to date had a considerably larger influence than those in model feedback strength would not justify Marotzke’s claim: “The difference in sensitivity explains nothing really”. And even if variations in model sensitivity explain relatively little of the intermodel spread over the Historical period that would not justify his statement that “The claim that climate models systematically overestimate global warming caused by rising greenhouse gas concentrations is wrong”. It is entirely possible that systematically-excessive model sensitivities have until recently been largely offset by systematically-excessive aerosol forcing and/or obscured by a positive influence of actual multidecadal internal variability on observed GMST.

The problems arise in the way Nic describes above because he does not follow the actual steps taken in the analysis that starts with Forster et al. (2013) and continues in the paper discussed in this thread. The actual procedure is clear and robust, but there are some issues related to the interpretation and use of the resulting regression formulas.

I believe that the following correctly describes the essential steps of the analysis, but there may be some further details that I’m unaware of.

The regression model is defined in such a way that ΔF is operationally defined as

ΔF = ΔN + α ΔT

where all variables are from the diagnosed properties of the particular model and model run for the period considered. Based on that the residual of the regression formula is calculated as

e = ΔT – a – b ΔF – c α – d κ

or equivalently with the operational definition of ΔF

e = (1 – bα)ΔT – a – b ΔN – c α – d κ

(This is the point where Nic seems to be proposing different formulas that make the regression less robust.)

The regression coefficients are determined by searching for the coefficient values that minimize the sum of squares of the residuals, summing over all model runs for the period considered.
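As a rough sketch of this step, the Python below minimizes the sum of squared residuals e = ΔT − a − bΔF − cα − dκ over a set of runs by solving the normal equations. The data are synthetic and the helper names (`solve`, `fit_mf_regression`) are hypothetical. In practice a library routine such as `numpy.linalg.lstsq` would normally be used; hand-rolled elimination is chosen here only to keep the example dependency-free.

```python
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_mf_regression(dT, dF, alpha, kappa):
    """Least-squares fit of dT = a + b*dF + c*alpha + d*kappa + e.

    Returns ([a, b, c, d], residuals), where each residual is
    e = dT - a - b*dF - c*alpha - d*kappa.
    """
    X = [[1.0, f, al, k] for f, al, k in zip(dF, alpha, kappa)]
    # Normal equations: (X^T X) beta = X^T dT.
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(4)] for i in range(4)]
    Xty = [sum(row[i] * y for row, y in zip(X, dT)) for i in range(4)]
    beta = solve(XtX, Xty)
    resid = [y - sum(bi * xi for bi, xi in zip(beta, row)) for row, y in zip(X, dT)]
    return beta, resid


# Synthetic runs generated from hypothetical "true" coefficients plus noise.
rng = random.Random(5)
n_runs = 40
dF = [rng.uniform(0.5, 1.5) for _ in range(n_runs)]
alpha = [rng.uniform(0.6, 1.8) for _ in range(n_runs)]
kappa = [rng.uniform(0.5, 1.0) for _ in range(n_runs)]
TRUE = (0.1, 0.5, -0.2, -0.1)
dT = [TRUE[0] + TRUE[1] * f + TRUE[2] * al + TRUE[3] * k + rng.gauss(0.0, 0.01)
      for f, al, k in zip(dF, alpha, kappa)]

beta, e = fit_mf_regression(dT, dF, alpha, kappa)
print("a, b, c, d ~", [round(x, 3) for x in beta])
```

Note that ΔF enters here as fixed data for each run, which is the point made above about how the residual is defined.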

When the calculation is done as I describe, it’s robust, but as I note above, Nic seems to propose that a different approach should be used.

Now we have the regression formula

ΔT = a + b ΔF + c α + d κ

The coefficients are robustly determined, but we have a problem. The ΔF used in determining the formula was defined operationally, by a formula that does not correspond to the standard definition of forcing. As acknowledged by Marotzke and Forster in their response above:

“Of course one could legitimately ask how accurate this correction is, and we would hope that in future generations of coordinated model simulations a better direct diagnostic of F is possible. But for the CMIP5 models used in our study and in Forster et al. (2013), applying equation (3) has been the only approach possible.”

They continue to discuss arguments to support their approach and the significance of this issue in their response. (Read it, if you don’t remember the text.)

The problem that Nic is discussing reappears here, in that ΔN calculated from the formula

ΔN = ΔF – α ΔT

is no longer the same as in the model run when ΔF is kept the same. Thus the case described by the regression model is not the same as the corresponding original one. We might try to fix this by using the original ΔN, but that leads to a different case again, and is actually inconsistent with the way the regression was performed.
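The size of this mismatch follows directly from the operational definition ΔF = ΔN + α ΔT. If the regression’s fitted temperature is ΔT − e (with e the residual of the regression formula) and ΔF is held fixed, the implied imbalance differs from the model’s own ΔN by α e:

```latex
\begin{aligned}
\Delta N_{\mathrm{fit}} &= \Delta F - \alpha\,(\Delta T - e) \\
  &= (\Delta N + \alpha\,\Delta T) - \alpha\,\Delta T + \alpha\,e \\
  &= \Delta N + \alpha\,e
\end{aligned}
```

So the discrepancy grows with both the feedback parameter and the size of the residual for that run.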

My intuition tells me that the problems are quantitatively small if the product bα is significantly less than 1 for a dominant majority of the model runs in each period. Otherwise the results may be less significant.

Nic,
Maybe you should think about the physical meanings of the terms dF and dN. I can’t see why replacing something that is assumed to be known a priori (dF) with something that is not known a priori (dN) is a standard way to do linear regression. By a priori, what I mean is that it is possible to know the radiative influence of the external drivers before running the models, even though this isn’t what is actually done. This is not true for dN.

Also, if you do want to do both regressions in a single step, I don’t see how you can move a dT (which, together with dN, constrains dF) from the RHS, where it is given by the model output, to the LHS, where it is not. In doing so, it would seem that RomanM is assuming that temperature changes are being driven by dN, which is not correct; they’re being driven by dF.

I’m not sure that you are on the same page as Pekka Pirila in his comment below, since he seems to be criticising others for not precisely following the procedures used by M&F.

The fact that exogenous forcing, as a matter of physics, drives global temperature does not change the fact, also as a matter of physics, that feedbacks largely determine equilibrium temperature under any given forcing scenario. Therefore, to accurately model global mean surface temperature you must get the feedbacks “right.”

Yet this apparently has no bearing on the results of the M&F analysis of model simulations, despite their selected samples displaying a 300 percent range in feedback magnitude. They state:

…we find no substantive physical or statistical connection between simulated climate feedback and simulated GMST trends over the hiatus or any other period…

They calmly accept this extraordinary result and reject claims that climate models overestimate human contributions to radiative forcing. Even though they seem to be discounting the possibility that their required internal variability adjustment is actually (in whole or in part) a feedback parameter missing from the model inputs, it appears that no realistic magnitude of feedback would change their results. Given the profound implications of their conclusions, independent reproduction of their results is both a scientific and public policy imperative.

I have also discussed these issues with ATTP on his blog. We agree at least on the conclusion that the analysis of Marotzke and Forster is internally consistent, and that the claims of circularity are based on a different model invented by the critics. The choices of M&F are reasonable; those of that alternative are not.

Having said that, I think that the model is extremely crude, and that there are issues, acknowledged also by the authors, that may affect the conclusions. In my personal, mainly intuitive judgment, these effects may be substantial. More reliable results require at least significantly more work based on what’s known about the CMIP5 model runs, and possibly additional model runs that may be too expensive in resources to perform for these models.

When new inter-model comparisons are done, more information will surely be recorded. At that stage a better analysis will also be possible, but that’s of no help right now.

oplus,
I’m largely in agreement with Pekka. My argument is not that M&F has no problems (although I would argue that they address most of the possible issues in the paper); my point is simply that the circularity argument made by Nic Lewis ignores that this issue is largely removed by an assumption of energy conservation that allows one to estimate the external forcings. As Pekka points out, the criticism is also based on an alternative model, suggested by the critics, that would not actually allow you to address what M&F are trying to do.

It may be clarifying to look at the (counterfactual) case in which the relationship between ΔF and ΔN is exact for the normally defined physical parameters. In that case the same formulas could still be written, and so could the arguments of Nic and RomanM, which do not discuss the inaccuracy of the replacement. That leads, however, to a paradox:

When the relationship is exact, there’s no reason to consider ΔN in any way more primary information than ΔF. According to their argument, of two equally primary alternative choices, one leads to circularity while the other does not. That paradox shows that there’s something wrong with the argument.

What’s wrong is hinted at in one of Nic’s own comments. He notices that the statistical model of the residual noise changes in some of the transformations, but that’s not a minor detail; it’s the whole key to understanding where their conclusions originate.

The estimation depends on the statistical model of the residuals. The basic assumption of M&F is that the statistical properties of the residual are the same for each model run, when the residual is defined as they define it. I have emphasized in many of my comments that the regression depends on the formula for calculating the residuals, while the rest is ordinary linear regression.

The choice of how the residual is calculated is determined by the implied assumption that it’s due to internal variability that is correctly scaled in the choice they have made. The relationship between ΔN and ΔF is assumed to be given by the original formula, using the observed temperature there, not something corrected by the residual.

This is a physics based assumption. It may be right or wrong, but only the physics as it’s presented by each model can tell the full answer for that. If the difference ΔF-ΔN depends on the same internal variability that leads to the residual, then the formula should be modified to reflect that. There’s, however, little reason to think that replacing ΔT by (ΔT-e) would be the correct way of doing that, or at least no one has presented evidence for that.

When we consider this paper, we should recognize all the assumptions made in it. We may criticize the choices and argue that some other choices lead to better description of CMIP5 ensemble, or that no proposed parametrization is fit to the analysis presented in the paper. That’s all legitimate, when arguments are given to support each point.

The claim of circularity as presented by Nic and RomanM is, however, not correct. It is based on proposing a different model that leads to problems, and then claiming that the conclusions should be judged by that model and rejected because that model has circularity.

I acknowledge that CMIP5’s purpose, in essence, is to facilitate the understanding of complex climate feedbacks, an undeniably international concern, and that in doing so it has naturally sacrificed the adversarial rigor that we hope is apparent to, and appreciated by, everyone in this forum.

In light of the above, I respectfully ask the following:

1) Do the authors acknowledge the statistical uncertainties that have been brought to light, and do they agree or disagree that their approach should be recommended for future work?

2) Do the authors acknowledge that their work cannot be reported as having tested all CMIP5 models, as reported by Science News Daily and perhaps others?

3) Can the authors explain their approach to selecting 36 of the 56 coupled models, and identify the one model not listed in Nature? Can they also say which forcings derived in 2013 they chose not to use, which forcings they chose to derive afresh, and what other selective actions, if any, were not published?

4) Do the authors agree with the claim that this paper proved that “Climatologists have been fairly correct with their predictions”, as reported by Science News Daily and others?

5) Do the moderator or others have a view on whether a response from the authors is warranted?

The problem here is statistical in nature. It has nothing to do with physics or which variables are external or internal or when they are observable or what drives what. If there are flaws in the way the data reflects the “physics”, this is a different situation which should have been dealt with before ever submitting the publication.

The authors have provided a data set and a statistical model that underlies the data and the relationships between the variables; the analysis of that data is done within the context of that statistical model. From this juncture on, it is purely a mathematical and statistical problem.

The basic relationship in this model is given by the equation
ΔT = a + b ΔF + c α + d κ + ε

It contains several unknown parameters and a variable ε (usually termed the “error”) which accounts for the “random” variation in the model.

The intent of the analysis is to determine how much of the variable ΔT can be accounted for by a given set of other variables. In order to do this, the unknown parameters need to be estimated along with estimates of the values of ε. The authors have chosen to use Least Squares methodology to do this.

The starting point for this analysis was an error sum of squares. Its format is not an arbitrary choice by the authors, but rather based on certain optimal properties of the solution within the model structure.

SSE = ∑ε^2 = ∑[ΔT –(a + b ΔF + c α + d κ)] ^2

This quantity is minimized with respect to the unknown parameters a, b, c and d. From these we can estimate the values of ε and calculate the predicted values of ΔT along with the residuals = Observed(ΔT) – Predicted(ΔT).

Now, it turns out that in this data set, there is an identity relating three of the variables: ΔF = α ΔT + ΔN. If we substitute this identity into the above SSE, and rearrange terms, we get

SSE = ∑ε^2 = ∑[(1 – b α) ΔT – (a + b ΔN + c α + d κ)]^2

This is not the sum of squares of a “new model”. It is exactly the same SS as that above, with exactly the same unknown parameters, exactly the same ε’s and exactly the same relationships between variables in the data set. Describing it as “a different model invented by the critics” indicates a lack of understanding of statistical models and of the mechanics of least squares methodology in general.
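The algebraic claim here, that the two expressions are the same function of (a, b, c, d) whenever the identity holds, can be checked numerically. In this Python sketch the data are random, invented values constructed to satisfy ΔF = α ΔT + ΔN exactly, and the function names are hypothetical. It checks only the algebra, not which parameterization is physically appropriate.

```python
import random


def sse_original(params, dT, dF, alpha, kappa):
    """Sum of [dT - (a + b*dF + c*alpha + d*kappa)]^2."""
    a, b, c, d = params
    return sum((t - (a + b * f + c * al + d * k)) ** 2
               for t, f, al, k in zip(dT, dF, alpha, kappa))


def sse_rearranged(params, dT, dN, alpha, kappa):
    """Sum of [(1 - b*alpha)*dT - (a + b*dN + c*alpha + d*kappa)]^2."""
    a, b, c, d = params
    return sum(((1 - b * al) * t - (a + b * n + c * al + d * k)) ** 2
               for t, n, al, k in zip(dT, dN, alpha, kappa))


# Invented data constructed so that dF = alpha*dT + dN holds exactly.
rng = random.Random(6)
n_runs = 20
dT = [rng.uniform(0.1, 1.0) for _ in range(n_runs)]
dN = [rng.uniform(0.1, 0.8) for _ in range(n_runs)]
alpha = [rng.uniform(0.6, 1.8) for _ in range(n_runs)]
kappa = [rng.uniform(0.5, 1.0) for _ in range(n_runs)]
dF = [al * t + n for al, t, n in zip(alpha, dT, dN)]

# The two expressions agree (to round-off) for arbitrary parameter values.
for _ in range(3):
    params = [rng.uniform(-1.0, 1.0) for _ in range(4)]
    print(sse_original(params, dT, dF, alpha, kappa),
          sse_rearranged(params, dT, dN, alpha, kappa))
```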

Since the two sums of squares are just two representations of the same equation, the following principle seems to be quite evident. If the presence of the hidden relationship between ΔT and ΔF in the data has no effect, then minimizing the latter SS must produce the same estimates of the unknown parameters and ε’s as the former.

The two minimizations do not produce the same results. In particular, the residuals for the latter SS now depend on the individual climate model’s α: res = ε’/(1 – b’ α), where ’ denotes an estimated value. This clearly indicates that there is a systematic effect on the residuals due to α that has not been accounted for by the equation coefficient c. You will also note that, in this case, the residuals are in fact not the same as the estimated error terms.
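The relation quoted for the residuals follows, for any fixed parameter values, once the rearranged form is solved for ΔT. This Python sketch, on invented data satisfying the identity and with arbitrary hypothetical parameter values, checks that algebra: the ΔT residual of the rearranged form equals ε/(1 − bα). It illustrates where the α-dependent scaling comes from; it does not settle which form should be minimized, which is the point disputed in the replies.

```python
import random

rng = random.Random(7)
n_runs = 10
dT = [rng.uniform(0.1, 1.0) for _ in range(n_runs)]
dN = [rng.uniform(0.1, 0.8) for _ in range(n_runs)]
alpha = [rng.uniform(0.6, 1.8) for _ in range(n_runs)]
kappa = [rng.uniform(0.5, 1.0) for _ in range(n_runs)]
dF = [nn + al * t for nn, al, t in zip(dN, alpha, dT)]   # dF = alpha*dT + dN
a, b, c, d = 0.1, 0.3, -0.2, -0.1   # arbitrary fixed values with b*alpha != 1

# Error term of the original form: dT - (a + b*dF + c*alpha + d*kappa).
eps = [t - (a + b * f + c * al + d * k) for t, f, al, k in zip(dT, dF, alpha, kappa)]
# Prediction from the rearranged form, solved for dT.
pred = [(a + b * nn + c * al + d * k) / (1 - b * al) for nn, al, k in zip(dN, alpha, kappa)]
res = [t - p for t, p in zip(dT, pred)]

for e, r, al in zip(eps, res, alpha):
    print(f"{r:+.6f}  vs  eps/(1 - b*alpha) = {e / (1 - b * al):+.6f}")
```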

Roman,
Three questions.
1. Does the linear regression used by Forster et al. (2013) guarantee that dF = dN + a dT exactly? Your substitution would seem to require this.
2. If you move the term (1 – b alpha) back to the RHS of the form that you regard as equivalent, does the RHS represent a reasonable form for how external forcings plus internal variability will drive increases in temperature?
3. The actual goal of M&F was to determine how much of the spread in model trends could be due to the spread in model forcings and how much was due to something else (internal variability). How does your modified form address this question?

The assumption of M&F is that ΔF for each model run is obtained from ΔN and ΔT of that model run. All these values come from the CMIP5 database. They do not vary during the determination of the model. There’s explicitly no feedback.

When the model has been determined it’s taken to be a model that links ΔF to the estimates of ΔT.

No analysis is done related to the values of ΔN any more. ΔN is not part of the further analysis, and it cannot be part of a feedback equation.

The TOA imbalance is almost identical to the net energy flux into the ocean, because the heat capacity of the atmosphere is small. The net heat flux into the ocean varies rather strongly due to the El Niño -La Nina variability and other forms of variability that are present also in the models. Therefore N is not very stable. F is expected to be more stable. That’s possible, because surface temperatures vary due to the same processes that cause N to vary.

Whether the values of F calculated from the formula used are, indeed, more stable than the values of N can be checked. The authors write in their response:

Not correcting for the increased back radiation would, on physical grounds, imply using N, which contains the very contribution from the surface response T that we must eliminate in our estimate of F.

The paper Forster et al (2013) contains timeseries of the forcing obtained by this approach, but not those of TOA imbalance to compare with.

RomanM,
The model is different, because you assume that the new ΔT obtained from the model should be used to determine ΔF, while the M&F assumption is that the estimate of ΔF obtained from the original CMIP5 database values is the one that should be used at every later step.

Which of the two alternatives is the correct choice is an issue of physics, not of statistics.

Apologies for butting in, but I don’t think he’s doing anything of the sort. You wrote further upstream that the following formula for the residuals is “equivalent” to that used in the paper:

e = (1 – bα)ΔT – a – b ΔN – c α – d κ.

But this is not suitable for simple linear regression, which is what they claim to use in the paper. Roman has explained this already. For my part, I would say that ΔT cannot be written as a linear combination of the parameters, so another tool is needed.
If the above equation is not valid, then we’re in a different situation altogether. In my humble opinion it’s approaching the point where input from the authors is needed.

Although there is disagreement on whether a departure from statistical orthodoxy can be excused, as I believe some are saying, by the physics being represented, I think it is universally agreed to be important that the physics be accurately represented mathematically. It has been pointed out by Greg Goodman at http://climateaudit.org/2015/02/05/marotzke-and-forsters-circular-attribution-of-cmip5-intermodel-warming-differences/#comment-751840 that the feedbacks in the models are not linear relationships. The ocean surface heat flux is known to oscillate as well as to become less responsive as temperature equilibrium with the air is approached. In addition, the change in TOA imbalance (TOAI) is itself non-linear, as shown in the NASA graph at the link. Indeed, this complex relationship made deriving the forcing a difficult task, as the authors state. It is also a further source of uncertainty whether all the model assumptions and the authors’ interpretations were correct.

Do the authors not agree that the models, as complex as they have become, still fall short of the complexity of nature? In fact, the authors conclude that this unknown portion, dubbed “natural variability,” dominates over the models in the 15-year period. But isn’t it true that the models have been constructed with 56 groups of guesses based on trying to reproduce the behavior of GST over the past 150 years? And, as Nick Stokes pointed out, since most of the models have their ocean oscillations out of phase with each other, isn’t the result basically a linear guess of forcing mixed with an artificially generated amplitude of noise? Are we to understand that the purpose of this paper, and of all the work being done to analyze its validity, is in the end to see whether the models have had enough noise added or need more?

In Forster et al. (2013), F13, there is data for 23 CMIP5 climate models. In Table 1 we find data for the climate feedback parameter α and the ocean heat uptake efficiency κ. In Table 2, first column, we find data for the historical change in adjusted radiative forcing ΔF for the period 1860-2003 for each of the models. In Table 3 we find the corresponding temperature changes ΔT for the same period, which I refer to as ΔT_cmip5.

I made some calculations, FWIW, in order to compare the results of M&F for 62-year periods. This means that I calculated results for one 144-year period and compared with results for 62-year periods.

At first I calculated ΔT_eb for all 23 models according to the energy balance equation (1) in M&F

ΔT_eb= ΔF/(α+κ).

The internal variability ε according to the terminology of M&F and their equation (3) was calculated as

ε = ΔT_cmip5 – ΔT_eb.

Then I made a multiple linear regression using ΔT_eb as temperature data with the following regression model:

ΔT_eb=b0+b1 ΔF + b2 κ + b3 α

The regression result is

ΔT_eb,hat = 0.77 + 0.56 ΔF – 0.37 κ – 0.47 α with R2=0.98.
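The three steps above (equation (1), ε = ΔT_cmip5 – ΔT_eb, and the multiple regression) can be sketched in plain Python as follows. This is a minimal sketch that uses ordinary least squares via the normal equations; it is not necessarily the regression procedure used by M&F, the function names are hypothetical, and the F13 table values are not reproduced here.

```python
def energy_balance_dT(dF, alpha, kappa):
    """Deterministic warming from equation (1) of M&F: dT_eb = dF / (alpha + kappa)."""
    return [f / (a + k) for f, a, k in zip(dF, alpha, kappa)]


def internal_variability(dT_cmip5, dT_eb):
    """epsilon = dT_cmip5 - dT_eb, model by model."""
    return [t - e for t, e in zip(dT_cmip5, dT_eb)]


def solve(A, y):
    """Solve the square system A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [v] for row, v in zip(A, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= factor * M[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def ols(y, *regressors):
    """Ordinary least squares y = b0 + b1*x1 + ... via the normal equations.
    Returns (coefficients [b0, b1, ...], R^2)."""
    X = [[1.0] + list(row) for row in zip(*regressors)]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    coef = solve(XtX, Xty)
    fitted = [sum(bi * xi for bi, xi in zip(coef, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1.0 - ss_res / ss_tot
```

With the 23 F13 values loaded into lists `dF`, `alpha`, `kappa` and `dT_cmip5`, the regression above would correspond to `ols(energy_balance_dT(dF, alpha, kappa), dF, kappa, alpha)`, which returns the coefficients in the order b0, b1, b2, b3 together with R2.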

The unit of ΔT_eb,hat is K and its value refers to the temperature rise from 1860 to 2003. This must be remembered in comparing with the corresponding values for the 62-year periods of M&F given as trends with units K/decade.

This equation shows that a change in the climate feedback parameter α across the climate model ensemble, from its highest value of 1.79 W/(m2 K) to its lowest value of 0.63 W/(m2 K), corresponds to a temperature in 2003 that is higher by -0.47 × (0.63 – 1.79) = 0.55 K. A change in ΔF from its lowest to its highest value (0.8 to 2.5 W/m2) gives a temperature increase of 0.56 × (2.5 – 0.8) = 0.95 K.

That result suggests that a too-low climate feedback parameter, corresponding to a too-high equilibrium climate sensitivity, may have a considerable effect. Both forcing and climate sensitivity seem to be important for how large the temperature change is over the whole industrial period. Since the main part of the GHG-driven anthropogenic temperature rise is assumed to have occurred in 62-year periods ending around 2003, a similar pattern should perhaps be expected there.

To study the spread in the ΔT values further, standard deviations of ε, 0.56 ΔF, 0.37 κ and 0.47 α were calculated as measures of the spread in ΔT_cmip5 due to natural variability, adjusted radiative forcing ΔF, ocean heat uptake efficiency κ and climate feedback parameter α, respectively.

The calculated standard deviations (in K) for the spread in ΔT_cmip5 with all effects included, and for the spread due to ε, ΔF, κ and α, are:

all effects: 0.37; ε: 0.19; ΔF: 0.31; κ: 0.09; α: 0.15
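The spread measures can be sketched as below. This assumes the sample standard deviation (n – 1 in the denominator), since the comment does not say which estimator was used; the function name is hypothetical, and the coefficient magnitudes 0.56, 0.37 and 0.47 are passed in as arguments.

```python
import statistics


def spread_components(dT_cmip5, eps, dF, kappa, alpha, b_dF, b_kappa, b_alpha):
    """Sample standard deviations of the total spread in dT_cmip5 and of each
    contribution: epsilon, b_dF*dF, b_kappa*kappa and b_alpha*alpha."""
    return {
        "all effects": statistics.stdev(dT_cmip5),
        "epsilon": statistics.stdev(eps),
        "dF": statistics.stdev([b_dF * x for x in dF]),
        "kappa": statistics.stdev([b_kappa * x for x in kappa]),
        "alpha": statistics.stdev([b_alpha * x for x in alpha]),
    }
```

Because each contribution is a constant times a model property, its standard deviation is simply |coefficient| × stdev(property); the list comprehensions just make the correspondence with the text explicit.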

This result seems to me to show a different pattern from the 62-year results of M&F. ΔF, κ and α seem to produce more spread relative to ε than in M&F for 62-year periods ending around 2003. Also, κ and α seem to produce more spread relative to ΔF than in M&F. Note again that only ratios of two spreads can be compared, because M&F use different units.

I have done some further calculations with respect to the Nature paper by Marotzke and Forster. This was done in order to hopefully gain some better understanding about the strengths and weaknesses of the results by M&F, which have received considerable attention.

Here I present calculations of the maximal deterministic temperature changes in the CMIP5 ensemble, analyzed by Forster et al. (2013), F13, due to variations in α, κ and ΔF. M&F define the deterministic temperature change as the temperature change given by the energy balance equation (1) in their paper:

ΔT_eb= ΔF/(α+κ)

A description of the relevant data in F13 and the justification for using them is found in this comment.

I have used four different methods for the calculations.

First method:

I used directly the energy balance equation ΔT_eb= ΔF/(α+κ). First I calculated the ensemble means ΔFe, αe and κe. Then the maximal change due to α was calculated as follows, using the minimal and the maximal values α_min and α_max from the climate model ensemble data.

ΔT_eb[maxch, α]=ΔFe/(α_min+κe)-ΔFe/(α_max+κe).

Corresponding calculations were carried out for ΔF and κ. The maximal temperature changes due to α, κ and ΔF became:
α: 0.52 K; κ: 0.44 K; ΔF: 0.87 K
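The first method can be sketched as follows: one quantity is varied between its ensemble minimum and maximum while the other two are held at their ensemble means. The expression used for ΔF, (ΔF_max – ΔF_min)/(αe + κe), is an assumption about what the “corresponding calculations” were, and the function names are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def max_changes_direct(dF, alpha, kappa):
    """Maximal changes in dT_eb = dF/(alpha + kappa) due to alpha, kappa and dF,
    each varied between its ensemble min and max with the others at their means."""
    F_e, a_e, k_e = mean(dF), mean(alpha), mean(kappa)
    due_alpha = F_e / (min(alpha) + k_e) - F_e / (max(alpha) + k_e)
    due_kappa = F_e / (a_e + min(kappa)) - F_e / (a_e + max(kappa))
    due_dF = (max(dF) - min(dF)) / (a_e + k_e)
    return due_alpha, due_kappa, due_dF
```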

Second method:

Use of the linearized equation introduced by M&F:

ΔT_eb=ΔFe/(αe+κe) + ΔF/(αe+κe) – ΔFe/(αe+κe)^2 α – ΔFe/(αe+κe)^2 κ.
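The linearized equation is a first-order Taylor expansion of ΔF/(α+κ) about the ensemble means. Written out as a worked step (not taken from the paper), the constant terms –ΔFe/(αe+κe) and ΔFe(αe+κe)/(αe+κe)^2 cancel, leaving ΔFe/(αe+κe) as the intercept:

```latex
\begin{aligned}
\Delta T_{eb} &\approx \frac{\Delta F_e}{\alpha_e+\kappa_e}
  + \frac{\Delta F-\Delta F_e}{\alpha_e+\kappa_e}
  - \frac{\Delta F_e}{(\alpha_e+\kappa_e)^2}\,(\alpha-\alpha_e)
  - \frac{\Delta F_e}{(\alpha_e+\kappa_e)^2}\,(\kappa-\kappa_e) \\
&= \frac{\Delta F_e}{\alpha_e+\kappa_e}
  + \frac{\Delta F}{\alpha_e+\kappa_e}
  - \frac{\Delta F_e}{(\alpha_e+\kappa_e)^2}\,\alpha
  - \frac{\Delta F_e}{(\alpha_e+\kappa_e)^2}\,\kappa
\end{aligned}
```

As a consistency check on the numerical version, 1/(αe+κe) = 0.51 and ΔFe/(αe+κe) = 0.88 imply ΔFe/(αe+κe)^2 = 0.88 × 0.51 ≈ 0.45, the coefficient of both α and κ.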

After inserting the numerical values we find:

ΔT_eb = 0.88 + 0.51 ΔF – 0.45 α – 0.45 κ.

The maximal temperature change for α is calculated as 0.45 (α_max – α_min), and correspondingly for ΔF and κ. The maximal temperature changes due to α, κ and ΔF became:
α: 0.52 K; κ: 0.48 K; ΔF: 0.87 K
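The second method can be sketched in the same way: the coefficients of the linearized equation are computed from the ensemble means and multiplied by the ensemble range of each quantity. The function names are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def linearized_coefficients(F_e, a_e, k_e):
    """Intercept, dF coefficient and common alpha/kappa coefficient of the
    first-order expansion of dF/(alpha + kappa) about the ensemble means."""
    s = a_e + k_e
    return F_e / s, 1.0 / s, F_e / s ** 2


def max_changes_linearized(dF, alpha, kappa):
    """Maximal changes due to alpha, kappa and dF: |coefficient| * (max - min)."""
    _, coef_dF, coef_ak = linearized_coefficients(mean(dF), mean(alpha), mean(kappa))
    due_alpha = coef_ak * (max(alpha) - min(alpha))
    due_kappa = coef_ak * (max(kappa) - min(kappa))
    due_dF = coef_dF * (max(dF) - min(dF))
    return due_alpha, due_kappa, due_dF
```

Because α and κ share one coefficient in the linearized form, the ratio of their maximal changes is just the ratio of their ensemble ranges.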

Third method:

I made a linear regression using the values of ΔT_eb as described in this comment:

Regarding the choice of method for calculating the deterministic variations, the first method, which is also easy to apply, seems to work well. The second method also works well, and the results show that the temperature variations are almost linear over the range of changes in the independent variables.

Linear regression seems to complicate the calculations. Nevertheless, the third method gives reasonable results that deviate little from the first two methods. The fourth method, however, gives less accurate results with larger deviations. It is not completely clear which linear regression method M&F have used.

As to the results, the first two methods suggest that the variations in the deterministic temperature due to α, κ and ΔF are of similar magnitude. The variations due to ΔF are greater than the others, but by at most a factor of two. This seems to differ from the pattern of the results in M&F.

The linear regression according to the third method gave similar results. Furthermore, the regression results show that all regression parameters formally have significant values, including the coefficients of α and κ (I say formally since there is a considerable error in ΔF, and the evaluation of significance therefore seems questionable). In the fourth case, the less accurate regression gave results deviating more from the first two methods, and the parameters for α and κ were not formally significant.

Thus, the choice of calculation method seems to matter for whether conclusions can be drawn from the results. In the worst case, no formally significant temperature variations due to α and κ could be detected.

Pehr Björnbom says: “What if all the models have similar values of α and κ and that those values all correspond to greater climate sensitivities than in the real climate system? How would this influence the regression results with the linear regression model used:”

This is exactly the point I made above. The authors do not demonstrate that their method is capable of detecting the influence of α and κ at any scale.

They use an untested “innovative” method and simply assume that it works and that it can detect the influence of the relatively narrow range of α and κ in the CMIP5 selection, in the presence of considerable noise and further errors induced by the multiple linearisation assumptions.

Any new method must be tested before it is used and conclusions drawn.

It would be good to see them address that omission. Until they do so their study has no objective value.

It is also worth noting that the sliding “trend” analysis they are doing is actually a running mean of the rate of change. In fact it is applying two different low-pass filters to the rate of change of temperature.
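This can be checked numerically. For an ordinary least-squares trend over a window of N equally spaced points, summation by parts shows that the slope equals a weighted sum of the N – 1 first differences, with weights (k + 1)(N – 1 – k) / (2 Σ(x – x̄)²) that add up to one. Strictly, then, the sliding trend is a tapered (parabolic) weighted running mean of the rate of change rather than a flat one. A minimal sketch with hypothetical function names, not the authors' code:

```python
def ols_slope(y):
    """OLS slope of y against x = 0, 1, ..., n-1."""
    n = len(y)
    xbar = (n - 1) / 2
    sxx = sum((i - xbar) ** 2 for i in range(n))
    return sum((i - xbar) * yi for i, yi in enumerate(y)) / sxx


def difference_weights(n):
    """Weights w_k applied to the first differences y[k+1] - y[k], k = 0..n-2."""
    xbar = (n - 1) / 2
    sxx = sum((i - xbar) ** 2 for i in range(n))
    return [(k + 1) * (n - 1 - k) / (2 * sxx) for k in range(n - 1)]


def slope_from_differences(y):
    """The same OLS slope, computed as a weighted average of the rate of change."""
    w = difference_weights(len(y))
    d = [y[k + 1] - y[k] for k in range(len(y) - 1)]
    return sum(wk * dk for wk, dk in zip(w, d))


def sliding_trends(y, window):
    """OLS trend in every window of the given length (a low-pass filtered rate of change)."""
    return [ols_slope(y[i:i + window]) for i in range(len(y) - window + 1)]
```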

In the case of the 62y running mean, this seems to be adequate. However, the running mean is a particularly poor filter that can introduce serious distortion into the result because it inverts part of the data: its frequency response is negative in some frequency bands.

Looking at the 1992 spike in their ensemble mean in figure 2a, it is obvious that a lot of variability faster than the 15y filter they have applied remains.

This is clear evidence of the distortions typically produced by a running mean filter. This will be adding spurious noise to both model and HadCRUT “trends” that will be further degrading their results.

I don’t see obvious distortion in the 62y graphs (whether it matters depends upon the frequency content of the data), so the running mean is probably an acceptable choice in that case. I would strongly suggest they choose a less distorting filter, such as a Gaussian or Lanczos filter, for the 15y analysis.
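A minimal sketch of the distortion argument: the real gain of a symmetric smoothing kernel at frequency f is the sum of its weights times cos(2πf × offset). A 15-point running mean has a negative gain at 0.1 cycles per sample (a 10-sample period), so it flips the sign of components in that band, while a truncated Gaussian kernel keeps a positive gain there. The Gaussian width σ = 4 samples and half-width of 12 samples are arbitrary illustrative choices, not values proposed in the comment, and a Lanczos filter is not shown.

```python
import math


def boxcar_kernel(n):
    """n-point running mean."""
    return [1.0 / n] * n


def gaussian_kernel(sigma, half_width):
    """Sampled Gaussian truncated at +/- half_width samples, normalized to unit sum."""
    raw = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half_width, half_width + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def frequency_response(kernel, f):
    """Real gain of a symmetric FIR kernel at frequency f (cycles per sample).
    A negative value means components at that frequency come out inverted."""
    center = (len(kernel) - 1) / 2
    return sum(w * math.cos(2 * math.pi * f * (k - center)) for k, w in enumerate(kernel))


print(frequency_response(boxcar_kernel(15), 0.1))
print(frequency_response(gaussian_kernel(4.0, 12), 0.1))
```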

Some of the differences they find between their 15y and 62y analyses are likely due to the poor choice of filter.

A first step is to recognise explicitly that what is being done is a study of low-pass filtered rate of change in temperature. The need to choose an appropriate filter then becomes self evident.

All this notwithstanding, they need above all else to validate their method.

The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems not to be completely unfounded.

That is my take-home message from the Nature paper by Marotzke and Forster (2015) combined with the discussion here and on Climate Audit. My way of reasoning follows in summary.

Only a few model simulations show a temperature trend as low as observations during the last 15 years. Models with a low value of the climate feedback parameter α have a tendency to have lower values of the ERF trend. With a value of the ERF trend according to AR5 such a model should show an even higher temperature trend. On the other hand models with high values of α have a tendency to have higher values of the ERF trend. With an average value of the ERF trend from AR5 such a model should show an even lower temperature trend.

Thus, using the ERF trend from AR5 for the last 15 years should give an even wider spread of temperature trends, with some even higher and some even lower than the observed trend.

This suggests that there are likely several models that deviate from observations due to high climate sensitivity. This agrees with the evaluation of climate models by Stott et al. (2013) and with studies of climate sensitivity from observations, for example Skeie et al. (2014).

One thing the authors seem to have overlooked in their own graphs is the “predictions from regression” plot shown in figure 3b. There is a clear bifurcation into two groups at the end of the graph. The two groups are separated by white space indicating an interval of clear separation. They fail to report or diagnose this separation which is likely to be due to model sensitivity.

They discuss the “difference” between models and HadCRUT4 but never actually plot it, leaving the reader to compare the two overlaid curves visually.

Greg Goodman and others have shown, with straightforward methods, that models with higher TCS also deviate most from the observed and recorded historical GMST, on the whole as well as over the last 15 years. As Greg observed, a simple magnification of the authors’ Fig. 3b graphic also shows a bifurcation of the high-TCS models from the low-TCS ones, invalidating the conclusion [The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded.] unless by “systematically” the authors mean without exception. But even that was not proved by this paper.

This paper is an affront to the long-held scientific gold standard that demonstration of power to predict is the only acceptable method for validation. Predicting the past or model simulation, no matter how innovative the method, is not the same as predicting the less corruptible and tunable future.

In one fell swoop the authors have announced validation of the models to the world press and attempted to push back the time frame for a perhaps anticipated invalidation. Whether variability is “natural” or unforced has no bearing on predictability. It is not like locating an electron. Progress continues to be made in understanding the Pacific Decadal Oscillation, ENSO and glaciation cycles. With these advances, validation time as well as model simulation error bars will surely be tightened. Perhaps the “differences between simulated and observed trends are dominated by random internal variability over the shorter timescale…” of 15 years. This small (not widely contested) observation pertaining to CMIP5 hardly necessitates a global news blast from the Max Planck Institute: “Global warming slowdown: No systematic errors in climate models.”

Since even lay people assume science is supposed to be validated by empirical predictions, it’s no wonder science reporters assumed this had been done after reading the MPI headline: “Skeptics who still doubt anthropogenic climate change have now been stripped of one of their last-ditch arguments.” Did the authors write this? Did the Max Planck Institute allow Dr. Marotzke to review it? Surely they knew that millions of people would interpret that as the models tracking right on course.

This type of communication seems to be exactly what the late Richard Feynman spoke so passionately against. It is also of note that the paper is written so as to leave the impression that the values obtained from the CMIP5 archive came from the modelers or another independent source, rather than being the authors’ own work, unless the reader followed a footnote and researched further.

The models never created information that could be used to test themselves, with their own output or with the observed record, since that was used, though indirectly, as input. My earlier comments about data selectivity I now find irrelevant, as I find the paper’s foundation to be circular. Also, the argument about whether the First Law corrects circularity is moot; all values were derived from one variable, ΔT (the time component being removed), ensuring closure from identity. For the same reason I find Nic Lewis’s circular substitution and other arguments moot. Regardless of criticisms by Lewis and others, the authors’ attack above on Nic Lewis’s previous work is irrelevant to the present paper and thus was uncalled for.

Let’s just concentrate on the data rather than argue semantics. I have repeated the trend analysis for 15y and 60y trends for HadCRUT4. I then compared this with what would be expected from an underlying AGW term ~2.5 ln(C/C0) and a PDO/AMO term (see here), together with CMIP5 model trends taken from your Extended Data figure 2. I am now downloading all CMIP5 runs to do this trend analysis properly.

While it is true that the AGW signal emerges from the 60y trend, the results are not really compatible with CMIP5 models after a start date of 1935. I am convinced that the current hiatus in warming is due to a downturn in the net AMO/PDO since 1998. This implies that models really are running too hot and that underlying AGW has a TCR of ~1.7 °C, which is what the 2.5 ln(C/C0) term gives for a doubling of CO2 (2.5 ln 2 ≈ 1.73).