Wednesday, June 19, 2013

This is a follow-up to an earlier post, which has the details. I had run into some difficulty there using the R function nlm() to estimate both the regression parameters and the delay coefficients for each of the exogenous variables Vol, Sol and ENSO. The solar variable, which interacts most weakly, was apt to be assigned a zero or negative delay, which created constant or exponentially rising secular processes that the fit then exploited.

I could avoid this by constraining that parameter. But I think it is better to do as others have done and use a common delay for all three. There is reasonable physical justification for that, and it reduces overfitting.

The result is a much more stable trend pattern across the time intervals and data sets. The trends since 1997 are now mostly between 0.65 and 1.325 °C/century. This might still be seen as a slowdown, but surely a minor one. Oddly, the only exception is the case studied by SteveF, HADCRUT 4 with linear fit from 1950. The trend I got there was 0.117 °C/century, I think quite similar to his, as was the decay coefficient at 0.026 (cf his 0.031).

I have not normalised the units, so the actual magnitudes of the regression coefficients are not easy to interpret. I'd still discount the 1979 quadratic, although no problems are obvious.

Data      Start  Trend   1         Vol       Sol       ENSO      t         t^2       Delay     SS
HADCRUT4  1950   0.117    2e-04    -8.39135  -3e-05    0.42995   0.25141   NA        0.02584   10.079
HADCRUT4  1950   1.024   -0.07304  -2.27873   0.00048  0.14844   0.30742   0.21653   0.12729    8.254
HADCRUT4  1979   1.086   -0.01689  -2.72106   0.00069  0.13331   0.50485   NA        0.11706    3.909
HADCRUT4  1979   1.078   -0.00963  -2.82278   0.00069  0.13478   0.50155  -0.07812   0.11325    3.893
GISS      1950   0.787   -0.00116  -6.00112   5e-05    0.24419   0.34476   NA        0.04173   10.441
GISS      1950   1.246   -0.04836  -2.70426   0.00042  0.1288    0.37122   0.14273   0.11017    9.626
GISS      1979   1.325   -0.01623  -3.28644   0.00065  0.1334    0.49335   NA        0.09649    5.086
GISS      1979   1.315   -0.01142  -3.38601   0.00064  0.13513   0.48994  -0.05241   0.09342    5.079
NOAA      1950   0.65    -0.00121  -4.42051   0.00017  0.19036   0.33218   NA        0.06223    8.103
NOAA      1950   0.877   -0.04486  -2.39086   0.00042  0.13126   0.34807   0.13221   0.118      7.351
NOAA      1979   0.929   -0.01586  -2.81475   0.00059  0.12721   0.47137   NA        0.10492    3.581
NOAA      1979   0.913   -0.00629  -2.99252   0.00058  0.13034   0.46551  -0.10409   0.09849    3.554

Trend is in °C/century.

Images

The images may be scanned in the viewer below. There are 24; you can flip through them using the top buttons, or subselect using the selection boxes. For example, if you choose GISS, you will cycle through just the 8 GISS plots. If you also select plot type "components", you will cycle through the four component plots, and so on.


SteveF has had difficulty getting comments through the system - I'm trying to find out why. Anyway, he sent these comments by email, and I'll just follow with a few points I made in reply - I hope we can get comments working for him soon.

SteveF says:
I think I may have identified why the influence of the solar cycle increases in the 1979-to-present analyses compared to the 1950-to-present analyses: there is a coincidental congruence between the 1983 and 1991 volcanoes and the peak (or near peak) of the solar cycle. That is, the downward part of the solar cycle is aliased with the volcanic influence. In the longer series this influence is diluted in the regression, though for certain it is still influencing the results. The 1964 to 1970 eruptions, while weaker, do not coincide with the peak of the solar cycle. I have tried several things to straighten this out, so far without a lot of success. I will next try a regression from 1950 to 1975 to see if the solar cycle influence falls to almost nothing (or even negative!) as I suspect it will, and also substituting a regression-specified quadratic or cubic secular function.
I think a defensible argument is, absent a good physical rationale, that the best approach is to simply sum the two radiative terms (both with watts/M^2 units) and see what happens to the diagnosed "optimal lag". Any regression that reports a physically implausible lag constant for best fit seems to me very suspect. There have been multiple published reports with estimated lags for volcanoes; a tau value near 30-36 months for the decay of the response seems to be common, though the exact value depends weakly on the assumed climate sensitivity value (I have independently verified this is correct). Perhaps the best thing is to just try a few lags in the credible range and see what the best fit regression constants turn out to be. I also note that a very much higher solar than volcanic response (on a watt/M^2 basis) is exactly the opposite of what I would expect on physical grounds; the slower solar cycle response ought to be lower, on a degrees/watt/M^2 basis, than the volcanic, because the 11 year solar cycle 'sees' slower responding (deeper) parts of the oceans more than the shorter volcanic forcing.
If one accepts a large discrepancy between the measured changes in solar intensity over the cycle and the size of the temperature response, then some kind of 'amplification' of the solar cycle must be responsible (eg. more low clouds at the minimum), and evidence for that seems lacking. Of course, even if one assumes the true solar cycle forcing (in watts/M^2) is far higher than the measured changes in solar intensity, that doesn't mean there should not be considerable lag in the temperature response.
Finally, I tried a linear secular trend starting in 1964 (to include the earlier volcanoes); the fit is improved compared to starting in 1950, and the discrepancies between model and Hadley global temperatures are much reduced. Perhaps you could try that as well.
Nick says:
I can well imagine that solar cycles might get tangled with volcanoes. My experience was that the solar cycle, being weak, could be pushed around a lot by the optimisation process - both the linear and the lag fitting. It had a strong tendency to drift into zero or negative lag, which nlm() took advantage of to create spurious fitting functions. For the same reasons, I don't take too much notice of the amplitude that emerges from the regression. It could be just taking up a bit of the volcano signal.

My main concern at the moment is with the use of Nina34. I think it's a sensitive index, but it also includes the warming trend, and when you adjust with it, it takes that part out. I'm inclined now to switch to SOI, which may not be as good, but can't drift. A practical alternative is to detrend Nina34, but that is harder to justify, and the detrend would depend on the interval.

I'd be happy to try a 1964 start. Another option is to use GHG forcing as one of the regressor functions.

58 comments:

Nick, looks more stable. Obviously having same delay makes physical sense if all these are simplistically viewed as equivalent radiative forcings.

I don't know if you're still watching Steve's thread at Lucia's. I have been discussing the implications of the analytical solution to the linear model IDE with Paul_K and picked up a small but very important error.

The result confirms something that I have been pushing for a few years now, that this sort of fitting ( at least for short term series) should be fitted to dT/dt and NOT the time series T(t).

That is quite important obviously, but I think it is rigorous in relation to the linear model and the solution to the differential equation.

Check it out and see if you think it makes sense.

Basically, it is dT/dt that is a power variable in terms of physical units. It is incorrect to try to fit T (which is energy not power) to a power forcing. The units just don't work.

If you look at the thread I explain why the short term response should be fit to dT/dt in terms of the equations too.

The fit of dT/dt instead of T makes sense to me. The units are more consistent, and the equation based on a simple energy-conservation principle is sound: dT/dt = [Pin - Pout]/(m c), with Pin and Pout powers, and m and c the mass and specific heat capacity. It seems reasonable to me; I agree with you.
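For concreteness, that energy-balance equation can be integrated numerically. A minimal sketch follows (in Python for illustration, since the post's code is R; all constants are arbitrary illustrative values, not fitted ones), using a simple linear feedback Pout = lam*T so that T relaxes toward the equilibrium Pin/lam:

```python
# Euler-integrate dT/dt = (Pin - Pout)/(m*c) with Pout = lam*T.
# m_c, lam, Pin are illustrative values only.
m_c = 10.0       # effective heat capacity m*c (arbitrary units)
lam = 2.0        # linear feedback: Pout = lam * T
Pin = 1.0        # constant step forcing
dt = 0.01
T = 0.0
for _ in range(100_000):
    T += dt * (Pin - lam * T) / m_c
# T has relaxed toward the equilibrium Pin/lam = 0.5,
# with time constant tau = m_c/lam = 5 time units.
```

The time constant m_c/lam is the same tau that appears throughout the lag discussion below.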

BTW, would you like to post some R code of how to do the fit? I would not mind fiddling, but I can't find the two days it would take to decrypt the docs and the arcane hieroglyphics needed to define a model to fit in R.

Greg, it finished the HADCRUT sequence, and then for some reason nlm() failed to converge with GISS. It's right as far as it goes - the numbers printed are the SS. I've checked that it is the right data file, so I'm puzzled. Our R versions are the same.

Strange. I ran what I think is identical code and data, and indeed for HADCRUT the SS was identical, but it completed. If you want to get it to complete with just HAD, go to line 54:
for(J in 1:3){ #dataset
and just change the 3 to a 1. Or
for(J in c(1,3)){ #dataset
should skip GISS and go to NOAA.

Greg, I'm glad it worked - I'm still not clear why it failed, but anyway.

The line
y=yy[n,J]
takes the temp from the array. You can difference at this stage by adding a line following it:
y=diff(c(NA,y))
The initial NA is to keep the length of the vector right; if it gives trouble, substitute 0.
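The same length-preserving differencing step can be sketched in Python (toy data for illustration; numpy's prepend plays the role of the leading NA):

```python
import numpy as np

# Length-preserving first difference, analogous to R's y = diff(c(NA, y)):
# prepending NaN keeps the differenced series the same length as y,
# with an undefined first element.
y = np.array([0.1, 0.3, 0.2, 0.5])      # toy "temperature" series
dy = np.diff(y, prepend=np.nan)          # [nan, 0.2, -0.1, 0.3]
```

As in the R version, substituting 0 for the NaN is the fallback if downstream code cannot handle missing values.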

Where I'm hoping to go with all this is to combine the two, since the correct linear solution (as I'm discussing at Lucia's) includes both. A way to do the multivariate fit to both _may_ actually fit post 2000 without all the fudging.

To do it properly requires another method, since the ratio between the in-phase response you are fitting already and the dT/dt response I'm trying to add is not fixed. The ratio is a function of frequency.

However, if we "filter" the combined response and simplify things, we can say LF dominates the in-phase term and HF dominates dT/dt.

ie we need to fit c1*T(t)+c2*(dT/dt)

instead of the more correct 1/f weighting we approximate an average for each frequency band.

Now I think that should be possible with simple changes to what you have already done.

This should remove the need to "spread" the forcing terms and probably all the detrending as well. If there is a long term multi-century upward trend (and I think there is) that should be added as an explicit variable and not used as a pretext for removing variability from the time-series, much of which may be more properly attributable to the forcing variables we are trying to fit in the first place.

If I was capable of parsing R code I would have done this already. Unfortunately R is alien to me and I don't have the time to invest in that particularly steep learning curve.

I'm attacking this problem from a different angle using tools I already understand. Though I think adapting this method would be both simple to do and enlightening.

Perhaps forget "where it's going" for the moment . There's nothing about co2 in what I linked there. It's purely about the maths.

The point of what I've posted is to recognise what the correct _full_ response to the linear feedback model is. That is fundamental to what you are doing here, because you are doing a linear regression of the supposed radiative forcings on T(t). That implies an implicit assumption that the response to the forcing is in phase with the forcing. It is not.

I've shown that this is not the case: there are both in-phase and orthogonal (time-derivative) responses. The latter dominates at HF, the former at LF. So determining the cross-over frequency/periodicity is an essential first step.

If you can then show that the primary time constant of the system is such that you can ignore one or the other of these terms for the data being considered, then a linear regression on either T(t) or dT/dt may be proposed.

Your reply above shows that you have not taken the time to read and understand what I posted so I suggest that you take the time to digest it.

This is not your error, it's just one you have picked up from loads of published climate science including the F&R.

It seems to me that just about everyone is making the mistake of assuming that a "linear" model must have a linear response, and then rushing off to do the regression fit.

The key point is that it is a linear feedback model, not a linear model, and linear feedbacks have the response that Paul_K provided, which is far from linear.

ωτ=1 is the cross-over between the two regimes, where the two coefficients are equal. So the first requirement is to assess what tau is for the system, and whether the data being considered is sufficiently devoid of frequencies on one side of that divide that part of the response equation can be ignored.

If you see some fundamental flaw with that I'd be interested to know where I'm going wrong.

I think you need to be careful about this use of the term lagged. I was going to pick Steve up on this in his original post, but thought it might be seen as pedantic nit-picking. It is not, and I now regret not having corrected it.

Firstly, what you are doing is a convolution. This 'spreads' the duration of the various forcings, producing a later peak. But it does NOT lag the data.

Also we need to clearly state what kind of lag we are referring to when it is truly a lag. A fixed temporal offset, or time lag, is fundamentally different to a phase lag.

Now it may be that the convolution is providing the third term of the response equation (though I am just guessing that for the moment, if you can comment on the maths of that equivalence please do ).

It seems that you are still lacking the derivative of the forcing which is an inescapable part of what you should be regressing.

"Now it may be that the convolution is providing the third term of the response equation (though I am just guessing that for the moment, if you can comment on the maths of that equivalence please do )."

Greg, you have this the wrong way around. The exponential smoothing is exactly the solution of your linear equation, by Paul_K's integrating factor method or whatever. And yes, it is pedantic nit-picking to fuss about whether it is a lag. You have chosen to try to represent it via Fourier Transform and phases; you have to deal with the resulting problems. Your opinions are stronger than your technical skills.

There's absolutely no reason to include the derivative of the forcing in the regression.

Yes, that seems to check out with the Laplace transform of a negative feedback. So if you're applying the convolution theorem to find the response function, why don't you say so clearly instead of this crap about introducing a "lag"? Hardly pedantic, is it.

That looks dimensionally correct and since we are applying the same summation to all terms I don't see the objection to the third term.

This requires that the FT is defined, so it is not a fully generalised solution. But since all this is done with detrended inputs anyway, that does not add any further conditions.

They may be equivalent, but this is not obvious. By linearity, summing the solutions for all the individual Fourier components should give a valid solution. If they're different, I'd like to know why and find where the error is.

The forcing is the first summation, its derivative is the second, and the third is the transient response, which can be got by applying the same FT coefficients as applied to the sine and cos terms to omega itself.

The transient is dominated by HF and is damped by the exp decay. I don't know if there is a more intuitive way to express the FT(ω) sum. I agree with Telescope that it looked a bit like I was adding the FFT to the time series the way I originally wrote it, but in fact it is weighting the transient elements with the frequency response of the system, which is quite proper.

So you see that, with the proviso of a stationary time series for which we can define the FT, the response = forcing + tau * d/dt forcing + exp transient.

If the dominant tau is about 3.5 or so for climate, you can see why I wanted to regress d/dt(forcing).

Now unless there's a slip-up in the maths, the two solutions should be equivalent for inputs where the FT is definable. It's just that with this formulation we clearly see the presence of dF/dt, whereas in the convolution it is not apparent as such.

In fact, by 'lagging' the input with the convolution as per the Laplace solution, you are effectively introducing all the cos terms to achieve the phase lag; it's just not obvious that that is what is happening.

Hey, I'm not differentiating a forcing; I'm solving the ODE and observing the presence of the derivative.

Calling it "hopelessly muddled" without any reasoned objection is pretty weak. You may as well say "I don't like it".

I've laid out the maths pretty clearly. Can you see a mathematical error in breaking down F(t) by Fourier analysis, then using linearity to derive a solution to the equation from the individual solutions for each Fourier component, using the formal solution for sin(ωt)?

I'm open to the idea that I may have made an error but calling it "muddled" does not cut much ice.

You seem to be competent at maths, and it's not hard to follow. If there is an error, point to it.

Having identified tau not as just some arbitrary "lag" fitting parameter to "smooth" the input, but as the time constant of the climate system, leads to an important implication of your results that you have missed.

Your regression-fitted values of tau range between 0.25 and 0.5, whereas the tau of climate is reckoned to be between 3 and 4.5 years.

That would seem to have two interpretations.

1. Current estimates of climate and model tau are off by an order of magnitude, in which case you need to discuss the implications for CS, which is inherently tied to tau.

2. The fit produced by this method is spurious. If tau is off by an order of magnitude, what credibility can we give to the fit results in general, and specifically to the idea that it accounts for the recent 'plateau'?

In view of the fact that none of these inputs could account for the 60y cycle, and your de-trending does not attempt to remove it either, I know where I'd put my money.

Greg, I do not claim that this fitting is a good way of identifying the "tau of climate" (there isn't such a thing anyway - there are many time scales). It's a regression - it explains variance. Other values of tau might explain nearly as much. Exponentials are far from orthogonal.

I can't tell what you've done in your previous comment. You have to be careful with transforming the derivative. But the Fourier transform tapers slowly with high frequency - you're convolving with a cut-off exponential, which has 1/ω behaviour.

The process of regression does not 'explain' anything. That is what you have to attempt to do with the model you choose to fit. You are fitting what you say corresponds to a linear feedback response model with a single feedback. At the same time you say "there isn't such a thing anyway - there are many time scales".

It seems clear that you don't know what it shows or why you are fitting this particular model, but you still try to interpret its results to determine what the true underlying rate of change is for a certain period.

If the tau you are fitting is not the tau of the climate response, what is it, and why are you fitting it?

OK, I've run this with some test data. A synthetic time series that looks "something" like global temps.

Using that as the input "forcing" , I've calculated the response by both addition of Fourier cmpts and the Laplace convolution.

They are identical.

http://climategrog.wordpress.com/?attachment_id=402

I used adjacent months for the diff, so there's a 0.5-month offset; I left this slight offset so we can see both. Using [-1, 0, +1] for the diff would make it centred.

Here you can see that the in-phase component is dominated by the 60y cycle, the orthogonal by the 9y cycle, in line with what I said originally.
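That numerical equivalence is easy to reproduce independently. Below is a Python sketch (not the commenter's actual script; the 60y + 9y "forcing" and the smoothing constant k1 are illustrative) comparing SteveF's recurrence, run to its periodic steady state, against filtering each Fourier component with the recurrence's frequency response H(ω) = k1/(1 − k2·e^(−iω)):

```python
import numpy as np

# Toy periodic "forcing": a 60-step and a 9-step cycle (both divide 540).
n = 540
t = np.arange(n)
F = np.sin(2 * np.pi * t / 60) + 0.3 * np.sin(2 * np.pi * t / 9)

k1 = 0.1           # illustrative smoothing constant, not a fitted value
k2 = 1 - k1

# Route 1 (time domain / "Laplace"): the recurrence r[i] = k2*r[i-1] + k1*F[i],
# tiled so the start-up transient has decayed and only the periodic
# steady-state response remains.
r = 0.0
out = np.empty(5 * n)
for i, f in enumerate(np.tile(F, 5)):
    r = k2 * r + k1 * f
    out[i] = r
r_time = out[-n:]

# Route 2 (frequency domain / "Fourier components"): multiply each Fourier
# component by the recurrence's frequency response and transform back.
w = 2 * np.pi * np.fft.fftfreq(n)          # angular frequency per sample
H = k1 / (1 - k2 * np.exp(-1j * w))
r_freq = np.real(np.fft.ifft(np.fft.fft(F) * H))

# The two routes agree to floating-point precision.
```

The agreement holds because the frequency response H(ω) is exactly the transfer function of the exponential-smoothing recurrence; the only caveat, as noted above, is that the Fourier route assumes a stationary (periodic) input.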

If I have time later I may try to prove this formally by showing the kernels are equivalent.

This way of viewing it gives more insight and may (finally) help you see the importance of the derivative term. It is interesting to see the response as the weighted sum of the forcing and its derivative, both passed through a low-pass filter.

The exp term is just a spin-up, but it gives a result for the early period that is not possible with the Laplace convolution.

The input must be stationary for this to work accurately; any linear trends or periods longer than the data need to be removed. The Laplace method is fully generalised for all inputs.

I'm not sure I can see a reason to prefer this method for fitting but having shown they are equivalent it gives a lot more insight into the relationship between input and output.

Greg, the convolution is in the function res(), which is what nlm operates on. It uses the recurrence relation of SteveF's original post. The lines are:
k1=abs(b[2+M]); k2=1-k1; o=length(b)>10
for(i in n[-1]) r[i,]=k2*r[i-1,]+k1*s[i,]
s[] is the Mx3 matrix of the forcing series.
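To make the role of that recurrence concrete: applied to a unit impulse, it reproduces a discrete exponential-decay kernel, which is the "delay" convolution the fit uses. A Python sketch for illustration (the post's code is R; k1 here is an illustrative value, not a fitted one):

```python
import numpy as np

# The recurrence r[i] = k2*r[i-1] + k1*s[i] applied to a unit impulse
# yields the kernel k1 * k2**i - a discrete exponential decay with
# time constant -1/log(k2) time steps.
k1 = 0.11          # illustrative smoothing constant
k2 = 1 - k1
s = np.zeros(50)
s[0] = 1.0         # unit impulse "forcing"
r = np.zeros_like(s)
r[0] = k1 * s[0]
for i in range(1, len(s)):
    r[i] = k2 * r[i - 1] + k1 * s[i]
# r now equals k1 * k2**np.arange(50): the exponential smoothing kernel.
```

So fitting k1 (via nlm or otherwise) is equivalent to fitting the decay time of an exponential smoothing applied to each forcing series.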

Nick, the point is that the formula is wrong. If you let the regression 'fix' it, I suppose you could then scale back the fitted value by the sensitivity (K/W/m2).

Anyway, it's all rather academic, since the whole exercise is compromised until you take account of the 60-year cycle. Since you don't have a regression variable that can account for it, you need to remove it in the detrending stage - either by fitting a 60y cosine, or by using a cubic rather than a quadratic, which could at least follow the general form. Extending the current code to do a cubic should be a doddle.

If you pretend it is not there, you are trying to simulate it, incorrectly, with the regression variables that you do have. The result is necessarily spurious.

BTW, if you use a cubic, you will probably totally remove the downturn and end up with a pretty constant rise. That is the claimed object of the exercise.

I don't know whether this will bring your fitted tau more into line with acceptable values, but it may help.

I still think the exercise is beggared by the 9y cycle Curry & BEST published on recently, which is a strong feature in many SST records. This will almost certainly end up spuriously attributed to the two volcanoes in that short record. That is the real reason why Grant "Tamino" Foster restricted his analysis to post-79.

Have a look at my 9/22/60y model below. It really was just a lash-up to have some test data; I'm surprised how well it works with the 'lag'/filter provided by the linear feedback.