Comparing CMIP5 & observations

[Last updated: 21st February 2018]

This page is an ongoing effort to compare observations of global temperature with CMIP5 simulations assessed by the IPCC 5th Assessment Report. The first two figures below are updated versions of Figure 11.25a,b from IPCC AR5 which were originally produced in mid-2013.

The first panel shows the raw ‘spaghetti’ projections, with different observational datasets in black and the different emission scenarios (RCPs) shown in colours. The simulation data uses spatially complete coverage of surface air temperature.

The second panel shows the AR5 assessment for global temperatures in the 2016-2035 period. The HadCRUT4.6 observations are shown in black with their 5-95% uncertainty. Several other observational datasets are shown in blue. The light grey shading shows the CMIP5 5-95% range for historical (pre-2005) & all future forcing pathways (RCPs, post-2005); the grey lines show the min-max range. The dark grey shading shows the projections using a 2006-2012 reference period. The red hatching shows the IPCC AR5 indicative likely (>66%) range for the 2016-2035 period.
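For readers wanting to reproduce this style of envelope: the 5-95% range and min-max range across an ensemble can be computed per year with percentiles. A minimal sketch with a synthetic ensemble (the trend and noise values are made up, not CMIP5 data):

```python
import numpy as np

# Synthetic stand-in for a CMIP5 ensemble: 100 runs x 50 years of annual
# global-mean temperature anomalies (illustrative values only).
rng = np.random.default_rng(0)
ensemble = 0.02 * np.arange(50) + rng.normal(0.0, 0.15, size=(100, 50))

# Per-year 5-95% envelope (light shading) and min-max range (grey lines).
p5, p95 = np.percentile(ensemble, [5, 95], axis=0)
lo, hi = ensemble.min(axis=0), ensemble.max(axis=0)

assert (lo <= p5).all() and (p95 <= hi).all()
```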

The observations for 2016-17 fall near, or just above, the top of the ‘likely’ range, depending on the dataset. 2016 was warmed slightly by the El Niño event in the Pacific. The years 2015-2017 were all more than 1°C above an 1850-1900 (pseudo-pre-industrial) baseline.

One interesting question to consider is: given post-2012 temperature data and scientific studies which have appeared after AR5, would the overall assessment (black bar & red hatching) be changed? In AR5 the assessment was that temperatures would be 0.3-0.7K above 1986-2005 for the 2016-2035 period average (as shown by the black bar). My personal view is that the upper limit would remain the same, but there would be an argument for raising the lower boundary to 0.4K, mainly because of improved understanding of the effect of missing temperature data in the rapidly-warming Arctic.

There are several possible explanations for why the earlier observations are at the lower end of the CMIP5 range. First, there is internal climate variability, which can cause temperatures to temporarily rise faster or slower than expected. Second, the radiative forcings used after 2005 are from the RCPs, rather than as observed. Given that there have been some small volcanic eruptions and a dip in solar activity, this has likely caused some of the apparent discrepancy. Third, the real world may have a climate sensitivity towards the lower end of the CMIP5 range. Fourth, the exact position of the observations within the CMIP5 range depends slightly on the reference period chosen. Lastly, this is not an apples-with-apples comparison, because it compares air temperatures everywhere (simulations) with blended and spatially sparse observations of air temperature and sea temperature. A combination of some of these factors is likely responsible.
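The coverage part of the last point can be illustrated with a toy calculation. This is a minimal sketch with made-up numbers (the latitude profile and observation mask are illustrative, not any real dataset): area-averaging only where ‘observations’ exist, while excluding the rapidly warming high latitudes, lowers the apparent global-mean warming.

```python
import numpy as np

# Toy zonal-mean warming profile on latitude bands (illustrative values only):
# warming amplified toward the poles, as observed in the Arctic.
lat = np.linspace(-87.5, 87.5, 36)
weights = np.cos(np.deg2rad(lat))        # area weights for latitude bands
warming = 0.5 + 0.01 * np.abs(lat)       # degrees of warming per band

# Pretend there are no observations poleward of 70 degrees.
mask = np.abs(lat) < 70

full = np.average(warming, weights=weights)
masked = np.average(warming[mask], weights=weights[mask])
assert masked < full  # incomplete coverage lowers the apparent global mean
```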

In addition, the figure below updates Fig. 1.4 from IPCC AR5, which compares projections from previous IPCC Assessment Reports with subsequent observations. The HadCRUT4.4 observations from 2013-2015 are added as black squares. Note that previous reports made differing assumptions about future emissions. This figure has not yet been updated to include 2016-7 temperature data.

71 thoughts on “Comparing CMIP5 & observations”

Your update of Fig 11.25 should use the Cowtan and Way uncertainty envelope as this data series is now widely preferred to HadCrut4 for the evaluation of recent temperature trends, for obvious reasons. HadCrut4 should be shown as a dotted line without the uncertainty envelope, to emphasize its deprecation.

I don’t think the Cowtan and Way uncertainty envelope covers all the uncertainties that the HadCRUT4 envelope covers.

They have a reduced coverage uncertainty due to the kriging, but as far as I know (happy to be corrected, I know their data set is in a state of continual improvement), they don’t propagate all the uncertainties in the HadCRUT4 gridded data through their analysis so they’re likely to underestimate the total uncertainty.

Would downloading the global mean from KNMI also give the uncertainty envelopes? I did download KNMI and calculated mean and envelope for RCP4.5. But I had to download all model series to do that. Or am I missing something here?

Perhaps you could post the consolidated CMIP5 data as you provided to Schmidt et al. Or better yet just post the data as charted (i.e. CMIP5 model mean with uncertainty envelopes, assessed likely range).

Also, AR4 has a very good feature where data to reproduce each chart was made available. Is that available for AR5?

Ed,
Please stick to using HadCRUT4. Cowtan & Way are SkS activists and it is far from clear that their GMST timeseries is to be preferred to HadCRUT4 even over shortish timescales. And their co-authorship of a recent paper (Cawley et al. 2015) that multiplied by a factor it should have divided by, thereby wrongly strengthening their argument that TCR had been underestimated in another study, does not inspire confidence in the reliability and impartiality of their temperature dataset.

For a full spatial coverage comparison, I think using NCDC MLOST and UAH and RSS TLT data alongside HadCRUT4 would be preferable. Ideally one would compare the TLT data with model projections having a similar vertical weighting profile – do you know if any such data have been produced for CMIP5 models, at grid-cell or global resolution?

Cowtan & Way are SkS activists
So what? You’re associated with the Global Warming Policy Foundation. Do you think people should discount what you do because of that? I don’t, but it appears that you think they should.

And their co-authorship of a recent paper (Cawley et al. 2015) that multiplied by a factor it should have divided by,
Oh come on, that’s pathetic. It was a stupid mistake. For someone who works in this field, it should be patently obvious that Loehle’s calculation was complete and utter nonsense. That Cawley et al. made a silly mistake when discussing what would happen if Loehle had used all external forcings, rather than CO2 only, doesn’t change that Loehle’s calculation was garbage. That you would focus on Cawley et al.’s silly mistake, while ignoring Loehle’s calculation completely, does you no favours. Maybe people shouldn’t ignore your work because of your association with the GWPF, but should ignore it because of your very obvious and explicit biases. Maybe consider that before harping on about the impartiality of others.

Nic Lewis,
I could hardly be described as an activist. You’re welcome to go through all my postings at SkS and point out instances where I have pushed a particular policy agenda. I am an Inuk researcher from northern Canada who has devoted my life to studying the cryosphere, with a side interest in some climate-related items – certainly not an activist. I write for Skeptical Science on issues related to the Polar Regions, and my aim is to help better inform people on these areas that are often discussed in the popular media.

Secondly, and more importantly, you have yet to demonstrate any issues with Cowtan and Way (2014), and yet you repeatedly berate it. You have been asked multiple times to defend your views, but you have not put forward any credible technical reasoning. Now you have resorted to attacking the credibility of the authors because you’re not capable of attacking the work from a scientific perspective.

You’re welcome to your opinions, of course, but you will be asked to defend them when they’re not based on fact. Once again – I am very open to having a discussion with you on the technical merits of the CW2014 record. Will you avoid the topic, as you have each and every time I have asked for such a discussion?

“For a full spatial coverage comparison, I think using NCDC MLOST and UAH and RSS TLT data alongside HadCRUT4 would be preferable. Ideally one would compare the TLT data with model projections having a similar vertical weighting profile – do you know if any such data have been produced for CMIP5 models, at grid-cell or global resolution?”

Aside from having less coverage, there is also evidence that the GHCN automated homogenization algorithm is downweighting several Arctic stations because they’re warming rapidly in contrast with the boreal cooling over Eurasia. This has been verified by some of the people at GHCN whom we contacted on the issue.

As for the remote sensing datasets – I am unsure which record is preferable, but when you look at the disagreement between the RSS, UAH and STAR groups, more effort is clearly needed to reconcile these differences. Secondly, the satellite series will undoubtedly miss some of the Arctic warming, because it is most intense near the surface as a result of ice feedbacks. This is something that analyses such as Simmons and Poli (2014) have picked up, and this is also an area where the use of UAH by our record in the hybrid approach could potentially underestimate some warming.

All these issues aside – using CW2014 or BEST with your approach to climate sensitivity raises the numbers by ~10% and you have yet to provide a technical argument why these two records should be excluded.

Yes, indeed. This is not an apples-to-apples comparison with what was published in the AR5 Technical Summary. That one used four datasets: HadCRUT4, ECMWF, GISTEMP, and a NOAA dataset. Interestingly, it did not use RSS or UAH. The terrestrial datasets used gridded data to extrapolate temperatures into areas where data are sparse or non-existent, for example over the oceans. RSS and UAH cover these areas.

” That you would focus on Cawley et als. silly mistake, while ignoring Loehle’s calculation completely, does you no favours.”

Far from ignoring the shortcomings in Loehle’s method, I wrote in my original comment on Cawley et al 2015 at The Blackboard:

“Some of the points Cawley et al. make seem valid criticisms of the paper that it is in response to – Loehle (2014): A minimal model for estimating climate sensitivity (LS14; paywalled). I’m not very keen on the LS14 model for global temperature changes over the instrumental period, on Cawley et al.’s revised version thereof, or on their alternative “minimal” model. They are all cycle-based curve-fitting approaches, without what I would regard as a properly justified physical basis.”

The shortcomings in Loehle’s model have absolutely no relevance to my comment here, as you must surely realise.

[snip]

That all five authors of Cawley et al. could overlook such a basic and gross error is very worrying. A combination of confirmation bias and carelessness seems the most likely explanation to me. As I wrote, that does not inspire confidence in the Cowtan & Way temperature dataset, whatever its merits may be.

I don’t discount what people associated with SkS produce, but I do scrutinise it carefully. As this case shows, peer review cannot be relied upon to pick up even obvious, gross errors.

“I’m not very keen on the LS14 model for global temperature changes over the instrumental period, on Cawley et al.’s revised version thereof, or on their alternative “minimal” model. They are all cycle-based curve-fitting approaches, without what I would regard as a properly justified physical basis.”

Oh, and this is rather over-stated (I was going to say nonsense, but I’m trying to rein this in slightly 🙂 ). The model in Cawley et al. is essentially an energy balance model with a lag that attempts to take into account that the system doesn’t respond instantly to changes in forcing, and with a term that mimics internal variability using the ENSO index. It is a step up from a simple energy balance approach. You’re right about LS14, though. That is just curve fitting.

Hi Nic & ATTP,
Please try and keep the discussion scientific! I have edited and snipped some comments from both of you which strayed too far off topic. All papers should be judged on their merits rather than author lists.

I have retained HadCRUT4 as the primary global dataset as it uses no interpolation/extrapolation, and included the other major datasets for completeness. All sit inside the HadCRUT4 uncertainties (using this reference period).

New reader Ed and very much enjoying the science you’re addressing here. First comment – I share Nic Lewis’s view that Cowtan and Way dataset should not be used because C&W have recently been shown to be practitioners of poor quality science and so it would be prudent to be sceptical of all their science, at least for the time being.
C&W were authors (with others at SkS) of a paper that was researched, written, presumably checked, and then published, containing a substantive and integral error in workings/calculations that rendered their published conclusions wholly unsupportable. Nic Lewis noted the error at Lucia’s Blackboard. Bishop Hill picked it up which is how it came to my attention.
Robert Way’s twitter feed had 4 tweets about the published paper on 15 Nov 2014 that read:

“Last year, this rubbish (WB – he’s referring here to the Loehle paper) was published in an out-of-topic journal that contained the author on its advisory board 1/4
Beyond the 4-month timeline from receipt to published, clear statistical errors showed it had not been scrutinized enough during review 2/4
Reimplementing their analysis showed a number of methodological flaws and assumptions to the point where a response was necessary 3/4
Enter Cawley et al (2015) who show how fundamentally flawed Loehle (2013) was in reality #climate (4/4)”

Hubris.

Judith Curry has posted before about her commenters (denizens) giving Way a break because he is young, but she’s noted he has choices to make about how he wants to conduct his science career. Based on his election to be involved with SkS, his unnecessary sharpness in comments to Steve McIntyre at ClimateAudit, and his tweets, I am of the opinion Way is leaning warmy in the manner of other unreliable warmy scientists à la Mann, Schmidt, Trenberth et al., and as a consequence I should be skeptical of all of the science with which he is involved, and I should remain so until he and his co-authors publish a corrigendum acknowledging their error and correcting it.

I think it’s reasonable to take the following position:
1. the C&W dataset was created by scientists who have at least once publicly concluded x when their own workings show y, and the scientists did not even realise their workings showed y before they went to press (whatever the reason, they engaged in a low-quality science activity).
2. it is reasonable to consider that if they have done it once then they may have done it twice, i.e. made a substantive error with their dataset.
Ignoring their dataset at this time is defensible, and probably even prudent.
My two cents.

Hi WB – welcome to CLB!
I am sure we all agree that mistakes do happen – I have published an erratum in one of my own papers, and I believe a correction to the Cawley paper is happening.

I prefer to discuss scientific issues here – the data and methodology are available for CW14, so it could be independently checked if someone wants to. C&W have also been very open in pointing out that 2014 is second warmest in their dataset. And the results are well within the HadCRUT4 uncertainties.

I see no reason to assume there is a mistake in CW14 until shown otherwise – their involvement with SkS doesn’t matter to me. In the same way, I don’t assume that Nic has made an error because he is involved with GWPF.

Thanks Ed, I do like the science and I try my hardest but I am a commercial lawyer in the private sector (IT industry) so I view the practice of climate science through that legal professional prism.

I don’t actually agree with you that ‘mistakes do happen’, because to me that’s an incomplete statement. I think you mean ‘mistakes do happen, but the people who made those mistakes in piece of work B should not have an unfavourable inference drawn against them about pieces of work A, C, K, etc., so long as they issue a corrigendum, or perhaps have their work retracted’.

I think that’s what you mean. And I think that is the common approach in academia and the government public service, which is, after all, where climate science is practised. But here in the private sector ‘mistakes do happen’ is called ‘failure’, and we most definitely draw unfavourable inferences about the people who fail and about their works – all of their works.

You are only as good as your last case/deal/contract/transaction.

We put workers on performance management if they’re not reaching their KPIs, and we sack them if they still can’t reach their KPIs. We do that because at the end of the line we’re vulnerable to getting sued if we deliver an inferior, defective product or a lousy service. That threat of litigation tends to concentrate our minds on not delivering inferior or defective goods and services. Hence we rid ourselves of underperforming staff.

In climate science (i.e. academia) there never seems to be any adverse consequence to mistakes. Your attitude of presuming C&W work is good until it is proven bad is collegiate and generous. I can’t share it because it’s just not how private sector folks work – I presume everything C&W do, have done and ever will do is mistaken until it is proven right by a trusted source.

Who I trust is quite another matter, but I’ll start with Steve McIntyre and throw you into the pot alongside Judy Curry 😉

Hi Ed,
First comment here, but I’ve been reading your excellent blog for a while..
You have earlier made a good point of comparing apples to apples (as much as possible), eg Hadcrut4 vs masked CMIP5.
I have been wondering, is it fully right to compare observational global surface temperature indices with CMIP5, since the indices are composites made up of roughly 71% SST and 29% 2 m land air temperatures, but the CMIP5 outputs are 100% 2 m air temperatures?

I have played with data from KNMI Climate Explorer and made a CMIP5 index that should be an equivalent to the global temperature indices. I downloaded CMIP5 RCP8.5 means for tos (temperature of ocean surface) and land-masked tas (2 m air temperature) and simply made an index as 0.71 × tos + 0.29 × tas.
For comparison I chose Hadcrut4 kriging (Cowtan & Way) since it has the ambition of full global coverage.

I also applied a 60-year base period (1951-2010) for anomalies. The reason is that there might be 60-year cycles involved that affect the surface temperatures by altering the Niño/Niña distribution. For example 1910-1940 was a warm period, 1940-1970 cool, 1970-2000 warm, etc.
The use of a 1986-2005 base may cause bias, since it consists of 15 warm and 5 cold years, which risks pushing down the anomaly-based surface index. I avoid this by using a 60-year base, a full cycle with hopefully balanced warmth and cold.

The observations fit very well to the model index. I did not bother to make standard confidence intervals, instead I simply put a “fair” interval of +/- 0.2 C around the CMIP means. I have also included the average anomaly for 2015 so far.
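The blended index described above can be sketched in a few lines. This is a minimal illustration with synthetic series (the trends and noise levels are made up; the real calculation would use the KNMI Climate Explorer fields), showing the 0.71/0.29 ocean-land weighting and the 1951-2010 anomaly base period:

```python
import numpy as np

# Synthetic annual-mean series, 1861-2020 (illustrative values only, not the
# actual KNMI Climate Explorer data).
rng = np.random.default_rng(1)
years = np.arange(1861, 2021)
tos = 0.005 * (years - 1861) + rng.normal(0, 0.10, years.size)       # ocean surface temperature
tas_land = 0.008 * (years - 1861) + rng.normal(0, 0.15, years.size)  # land 2 m air temperature

# Area-weighted blend: ~71% ocean, ~29% land, as in the comment above.
index = 0.71 * tos + 0.29 * tas_land

# Anomalies relative to the 1951-2010 base period.
base = (years >= 1951) & (years <= 2010)
anom = index - index[base].mean()
assert abs(anom[base].mean()) < 1e-9  # base-period mean anomaly is ~zero by construction
```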

You make an excellent point, and a very nice graphical demonstration. How much difference does the use of land+ocean simulated data make in your data, compared to just the global 2m temperature?

Coincidentally, a paper is about to be submitted by a team of climate scientists on this exact topic. There are some subtleties with defining such a land+ocean global index to exactly compare with the observations because of how the models represent coastlines, and because of a changing sea ice distribution, but your approach is a good first step.

Watch this space for our estimates of how much difference we think it makes!

The CMIP5 tos (green) has a clearly lower trend than ocean mask 2 m tas (yellow). This makes sense in a heating world since water has a higher heat capacity than air. As a result the composite global index (brown) runs lower than the standard global 2 m tas (blue). By coincidence the global index is very similar to the ocean mask 2 m tas.

I don’t know in detail how the KNMI Climate Explorer masking handles sea ice and seasonal variation; that could introduce some error into this kind of crude estimate.
However, I am looking forward to a scientific paper and a good blog post on this interesting issue…

I agree that masking (& also blending) is preferable when comparing with HadCRUT4 (see the recent Cowtan et al paper).

The issue for any overall assessment (such as figure 11.25) is that a different processing of the simulations is required to match the characteristics of each of the N observational datasets, which would give N different ensembles to somehow visualise. AR5 took the decision to use ‘tas’ everywhere and show all the observations on the same figure. The recent Cowtan et al paper chose to focus on HadCRUT4 only. But, there is no perfect choice.

The data for ERAi are not yet complete for 2015 – it normally runs a couple of months behind – but it will be interesting to see what it says.

Do you have particular people in mind that routinely make mistakes and/or have motivated reasoning?

Regarding ‘motivated reasoning’, I am concerned by climate scientists who make public statements to the effect that ‘urgent actions are needed to reduce CO2 emissions’ and call other scientists who disagree with them ‘deniers’.

I don’t ignore the papers of such authors, but I find they need careful checking, auditing, and interpretation.

Thanks, I hadn’t seen your recent paper. My interest was in how the range of absolutes in the base runs impacted on model evaluations using out of sample comparisons, rather than the rather narrow issue of the use of anomalies and selecting reference periods.

Despite the comfort you take from the lack of correlation between forecast increase vs absolute temp, as you observe they clearly aren’t independent, with those above actual having (what looks like significantly) lower increases. Mauritsen et al doesn’t help a lot since the issues are inter model (remembering that it is these increases that give the backbone to the scenario limits used by IPCC).

Hi Simon,
The arguments in the paper about the simple physics and feedbacks also help globally (see the Appendix), but the regional issues and the linearity of the feedbacks are also important.
cheers,
Ed.

The simple model doesn’t really help with evaluating the output of different models against actual temps. At the margin if you are a modeller it may not matter what the absolute temp is for estimating temp increments. Put this aside. The issue is that we have a range of models that are producing a range of increments in temp that are then being used to assess the uncertainty in model space and from that used to assess uncertainty in future global temps for policy purposes. In evaluating the fitness-for-purpose of the models to use for policy you are particularly interested in the range (aka uncertainty) and what is driving it.

Now it turns out that the models that produce this range are running at different temperatures, and that there appear to be systemic differences, such that the hypothesis that each model is drawn from the same model space appears to break down. This suggests some poking around in the features of the various models, to see if it points to models that should have less weight put on their output.

The issue is also germane to how the CMIP5 models are performing against actuals in the short run, and whether there is again any systemic difference between those that model a hot world vs those a cold. Which brings me back to some of the earlier comments in this thread.

It seems to me that the absolute temperatures the models run at are an important variable to include in any evaluation of model performance, not just because of the intuitive reason, but also because there’s possibly some gold in them there hills.

It’s just that I haven’t seen any literature that deals with it. Are you aware of any?

Ed, could you expand on what you think these results might mean in the light of 2015 being an El Niño year?

El Niño is maybe the best understood source of inter-annual internal variability, and clearly 2015 is one of the stronger examples (along with 1998). In 1998 the El Niño pushed obs to the top end of the model ensemble. As you point out, this year we don’t even make it to the median point of the ensemble with this extra kick from Mother Nature. While obs sit more comfortably in the midst of the model ensemble, this year’s data point only seems to further emphasize just how ‘unreal’ the top end of the model ensemble is in this (apples-to-pears, all the other caveats) comparison.

Thanks for pointing to that post. Even if we have more to come from the 2015/2016 El Niño, the point I made still seems to hold. From your analysis we can expect 0.1-0.2°C more in 2016 from El Niño. This still keeps us at the median point rather than the top end of the ensemble. The reason I push this is that the climate discussion can often seem quite polarized (Nic Lewis/ATTP here is a good example), but largely the real difference exists at this top end. Sceptics have the top end unrealistic; the consensus has it plausible. The analysis here seems to give some support to the sceptics at the very least. Maybe more honestly it means both positions are reasonable.

Hi HR – if you read the AR5 text describing the assessment for these figures (end of chapter 11) you will see that the IPCC said that the very top end of the projections was less likely for the near-term. 2016 looks set to be above median and boosted by El Niño which is consistent with that assessment. I think the very low end is now also looking slightly less likely.
Ed.

Hi Ed,
I really like how you just load the new data, from the same sources, into the figure from the last AR5. That seems to me the most valid way to do the comparison, not jumping on the latest data fad like Cowtan and Way. I hope you continue to extend this comparison, just as you have done here, as more observations become available.
I’m more interested in what I call the Climate Climbdown Watch. Look, the reality is that no sooner does a new projection get made than the observations start bouncing along the bottom of the big envelope representing the spaghetti diagram of the individual realizations. In fact the figure above, from the 1998 El Niño backwards, has the observations bouncing along the TOP of the envelope in the hindcast, then bouncing along the bottom in the forecast period. It took the biggest El Niño ever by some measures, the 2016 El Niño, to just barely get to the mid-point of the envelope.
The red bars, as you point out, are where the IPCC expects the actual temperatures to occur through 2035. The black bar represents the likely region, and you’ll note the black bar penetrates through the bottom of the envelope. The real question is, when will the community admit that something like the top half of the envelope needs to be deleted from the projections? The observations simply do not support that we will ever get into the top half of the envelope. I guess I fail to see support for your assertion that you see evidence here that we can eliminate any of the low end of the CMIP projections.

I’d love to see the CMIP5 projections for RCP8.5 only, versus incoming data, because RCP8.5 is used in most impact papers, in estimates of the cost of carbon, etc. RCP8.5 is deviating a lot from actual emissions (which have been almost constant since 2013), and it has so much CO2 concentration beyond 2050 that the carbon sinks are overwhelmed.

I’ve been trying to force the scientific community to understand that this is a serious issue, because if the high concentrations and temperatures are used, the cost of carbon is probably exaggerated. Coupled with what may be a high estimated TCR range, the end result is that a problem we could handle with a bit of common sense is so hyped and exaggerated that the proposed cures will be worse than the disease.

His graphs in the video I linked are labelled as comparing TMT for both models and observations.

As I understand it, according to AGW theory, the troposphere should be warming faster than the surface anyway, so one should expect the satellite measurements to show more warming than the surface. This is not the case.

I have read the realclimate article.

I think the choice of baseline is largely arbitrary, but models and observations are calibrated to coincide at 1979. The main point is to compare the trend of anomalies and the difference between models and observations, so his approach looks reasonable to me.

It is not clear what smoothing approach he is using in the video, so not clear if that criticism still stands. The chart is labelled as 5-year averages.

I think it is a moot point whether you include the statistical spread in the charts. Whatever, even the adjusted RC charts show the satellite observations are largely outside or on the boundary of the 95% confidence bands.

This is a very different result to that shown above.

So, I think the discrepancy boils down to why is there such a large discrepancy between the surface temps and the satellite temps? If theory predicts that the troposphere should warm faster than the surface (hansen et al 1988), and the data shows otherwise, then theory has significant problems.

Yes, the upper atmosphere seems to be warming less than the average of the CMIP5 models. But, there are a lot of issues here which make interpreting this far more complicated than you might imagine:
1) Observations – there are still large uncertainties in the satellite-derived temperature observations. They undergo numerous corrections to go from the microwave emission data to estimated temperature, sometimes relying on models to do so. Also, the stratosphere is actually cooling more than the models suggest which may also be influencing these estimates of upper atmospheric warming.
2) The amplification of surface warming at altitude is mainly in the tropical regions. At higher latitudes the reverse is expected – the upper atmosphere is expected to warm much less than the surface. So, just comparing the global means does not test the amplification issue properly (but see point 1).
3) If there is less amplification than expected, then that might mean that the lapse rate feedback is less negative, and so the climate may be more sensitive to GHGs than anticipated (but see point 1).
4) The graphing issues do matter, especially the baselines (e.g. https://www.climate-lab-book.ac.uk/2015/connecting-projections-real-world/) – picking a single year for calibration is simply wrong.

1) Surely the range of uncertainty in the satellite data has to be lower than the surface temps, which are subject to all sorts of interpolation and UHI effects.

2) As I understand it Christy’s charts are for the Tropical Mid-Troposphere – i.e. focusing on the area identified by Hansen as most sensitive to AGW. So he is focusing on exactly the right area.

3) I don’t understand this point at all. AGW theory states that the TMT should be most sensitive to increasing CO2. How can less warming indicate higher sensitivity to higher CO2. It seems to be counter intuitive. Please explain further.

4) As I understand it, Christy is comparing anomalies, not absolute values. So, the choice of baseline just impacts the distance of the anomaly from the baseline and nothing else. So, largely arbitrary. I agree the baselines of the different datasets should be aligned. So, for instance if the baseline was for sake of argument 15 deg C, the anomalies for each data set would vary against that baseline. So if, the predicted temperature in say 1980 was 15.1 deg C, and that for 2035 was 16.3 deg C, the difference in anomaly would be 1.2deg C. If the baseline was 14.5 deg C, and the predicted temperatures in 1980 and 2035 were respectively 14.6 and 15.8 degrees, the difference in anomaly would still be 1.2 degrees. So, what is the problem in changing baseline?

However, as I see it, there is a problem with focusing solely on anomalies in the real world. The level of ice melt for instance, is dependent upon the actual temperature of the sea (in the case of sea ice) or air temperature in the case of land glaciers. Similarly, plant growth and the level of water evaporation is dependent upon the actual temperature (and humidity), not some anomaly to an arbitrary baseline.

There does seem to be significant variation in the measures of absolute temperatures over time. In my view this needs to be addressed.
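The arithmetic in point 4 above can be checked directly. This is a trivial sketch using the commenter’s own example numbers: shifting the baseline (and the temperatures with it) leaves the difference between two anomalies unchanged.

```python
# Check the worked example: anomaly differences are baseline-invariant.
def anomaly(temp, baseline):
    return temp - baseline

# Baseline 15.0 °C, predicted temperatures 15.1 °C (1980) and 16.3 °C (2035):
d1 = anomaly(16.3, 15.0) - anomaly(15.1, 15.0)
# Baseline 14.5 °C, temperatures shifted by the same 0.5 °C offset:
d2 = anomaly(15.8, 14.5) - anomaly(14.6, 14.5)

# Both give the same 1.2 °C difference in anomaly.
assert abs(d1 - 1.2) < 1e-9 and abs(d2 - 1.2) < 1e-9
```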

1) Definitely not – the range for the satellite estimates is far larger than for the surface and the number of corrections required is large! Carl Mears from RSS discusses that here: https://www.youtube.com/watch?v=8BnkI5vqr_0

2) OK – you also need to use tropical surface temperatures as well then. The explainers link above discusses that comparison.

I was just reading your comment above about how surface data is more accurate than satellite data, yet when I look at the +/- error margins for the temperature datasets in the IPCC AR5 for 1979–2012, the satellite datasets have a smaller error margin. Why is that?

Dear all
Greetings,
I’ve studied your explanations and learned many valuable tips, thanks.
I have a suggestion:
If you want to compare the CMIP5 models and select the best one for your case study according to the RCP data, I think it would be good to compare downscaled data from the CMIP5 models under the RCP scenarios (2006-2017) with the observed (station) data over the same period (2006-2017), using efficiency criteria. With this comparison you can see which model best matches what actually happened during 2006-2017, and then use that model’s future data.
I’ve developed software on my website that can carry out this process with different statistical downscaling methods. You can also compare any period you want with the historical run, the RCP data, etc. It includes different downscaling methods, such as QM. I invite you to visit it; it may be useful for you.
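For readers unfamiliar with the QM method mentioned above: here is a minimal empirical quantile-mapping sketch, assuming QM refers to quantile mapping (one common flavour of the technique; the data are synthetic and the 101-point transfer function is an arbitrary choice). It maps each model value to the observed value at the same quantile of the calibration-period distributions:

```python
import numpy as np

# Empirical quantile mapping: build a transfer function from the model's
# calibration-period quantiles to the observed quantiles, then apply it.
def quantile_map(model_cal, obs_cal, model_values):
    quantiles = np.linspace(0.0, 1.0, 101)       # arbitrary resolution
    mq = np.quantile(model_cal, quantiles)        # model quantiles
    oq = np.quantile(obs_cal, quantiles)          # observed quantiles
    return np.interp(model_values, mq, oq)        # piecewise-linear mapping

# Synthetic "station" and "model" calibration data (illustrative values only):
# the model is biased cold and too variable.
rng = np.random.default_rng(2)
obs_cal = rng.normal(15.0, 1.0, 1000)
model_cal = rng.normal(14.0, 1.5, 1000)

corrected = quantile_map(model_cal, obs_cal, model_cal)

# After correction, the calibration-period mean is close to the observed mean.
assert abs(corrected.mean() - obs_cal.mean()) < 0.2
```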