There’s been quite a lot in the media, and elsewhere, about problems with science. A common theme at the moment is the replication crisis, but you regularly see claims in the blogosphere that papers should be retracted because they supposedly have errors, and I’ve even seen some argue that we should find ways to penalise scientists who misrepresent our current understanding.

I’m a bit tired and can’t really face writing anything too lengthy (I may fail), so I thought I’d make some basic points and let others express their views in the comments. It’s clear that some of the criticisms have merit. We should promote good practice and should encourage people to publish research that is careful and thoroughly checked. What we value (lots of grant funding and papers in Nature, for example) does not always reflect actual quality. We should publish negative and null results and should aim to also publish replication studies. We could improve how we undertake peer review.

However, science (by which I really mean fundamental research) is about understanding things, typically things we don’t yet understand fully. People should take risks; that’s how we solve interesting problems. People will get things wrong; it’s part of the process. People will try methods that turn out not to be very good; again it’s part of the method. We should encourage people to challenge existing paradigms, even if what they do ends up being horribly wrong – the quality of the challenges can tell us something of how hard it is to overthrow well-established ideas. We should base our understanding – or lack thereof – on an assessment of all the available evidence, not simply on one, or two, studies.

In some sense, that seems to be where some of the problems lie. In normal science, we trust our overall understanding when there is a large collection of consistent evidence. In some cases, however, important decisions are based on one, or a few, studies. When these turn out to be wrong, the impact can be substantial. In my view, however, this is not normal science; we shouldn’t be basing our understanding on only a few studies. Sometimes it may be unavoidable, but we should still be careful about judging science overall on the basis of a few examples like this.

Maybe in cases like this (when we do need to make decisions with limited evidence) we should scrutinise the evidence more carefully. In normal science, however, I would still favour trusting the method, rather than encouraging detailed scrutiny of individual studies (nothing wrong if people want to do this, but I don’t see the overall value). That doesn’t mean that there aren’t things that we could do to improve the overall quality of research, just that the problems are not – in my opinion – nearly as great as might be indicated by some of the examples that are often given.

Anyway, that’s a summary of my general views. If anyone else would like to add anything, or make a different argument, feel free to do so through the comments.

109 Responses to Science is broken!

Something I couldn’t fit into the post was a mention of this article, which says

Sociology, economics, climate science and ecology are other areas likely to be vulnerable to the propagation of bad practice, according to Smaldino.

I’ve seen this kind of suggestion before, but I’ve not really seen why people include climate science in such a list. Do they really have evidence to suggest that climate science is susceptible, or is it simply that they encounter lots of complaints about it and assume that it is susceptible?

“In normal science, however, I would still favour trusting the method, rather than encouraging detailed scrutiny of individual studies (nothing wrong if people want to do this, but I don’t see the overall value). That doesn’t mean that there aren’t things that we could do to improve the overall quality of research, just that the problems are not – in my opinion – nearly as great as might be indicated by some of the examples that are often given. ”

This part sounds a bit unfortunate and I am not sure it expresses what you want to say. Scientists should definitely scrutinize individual studies. There needs to be pressure to produce good quality articles.

The power of science and our understanding of the world come from many different papers once the dust has settled. To understand this, you should not look at individual papers, especially ground-breaking papers that open a new field and were written when we did not yet understand things. The problem is that these are exactly the papers that Nature & Co. and the press are interested in, and that they thus unfortunately shape the public opinion of science.

“People should take risks; that’s how we solve interesting problems. People will get things wrong; it’s part of the process. People will try methods that turn out not to be very good; again it’s part of the method. We should encourage people to challenge existing paradigms, even if what they do ends up being horribly wrong – the quality of the challenges can tell us something of how hard it is to overthrow well-established ideas. We should base our understanding – or lack thereof – on an assessment of all the available evidence, not simply on one, or two, studies. ”

I like this paragraph a lot. While we should look at the quality of individual papers as scientists, we should not punish people merely for getting things wrong. Being wrong is part of doing something new and showing why something is wrong can help understand a problem better. What should be punished is being trivially wrong, doing bad science, especially if you make a habit out of that, like some of our mitigation sceptical friends.

This part sounds a bit unfortunate and I am not sure it expresses what you want to say. Scientists should definitely scrutinize individual studies. There needs to be pressure to produce good quality articles.

Quite possibly. What I meant was more that I don’t really see the point in “auditing” other studies. To me, other people should try to reproduce interesting studies to see if they get the same result (or a consistent result), normally by collecting more data, re-analysing existing data, or developing and running other models. Of course, there is nothing wrong with going through someone else’s work in detail, but not finding an error doesn’t mean the result is somehow right, and finding an error doesn’t mean that it is completely flawed.

The problem is that these are exactly the papers that Nature & Co. and the press are interested in, and that they thus unfortunately shape the public opinion of science.

Being wrong is part of doing something new and showing why something is wrong can help understand a problem better.

Absolutely. Getting something wrong can lead to a better understanding. It’s one reason that I think that retracting papers that turn out to be wrong is not what we should do (there might be exceptions). It’s important to have an understanding of the process that led to our better understanding and it’s also important that people don’t repeat the same error.

Retracting papers for being wrong is a terrible idea, and I sometimes worry we are going in that direction. Has anyone suggested retracting Newton’s Principia yet? Einstein showed large parts of it “wrong”.

A paper should at least be actively misleading before retraction is considered, and scientists are supposed to know their literature. (More often, a comment added by the author may be helpful, and is easy to do nowadays. We do have more multidisciplinary studies, and thus more people reading papers who do not master the full literature.)

I was relieved to hear that “retraction” is just a flag. You can still read the paper.

Fully agree that auditing is extremely unproductive for people interested in improving scientific understanding. Much more progress comes from studying the same question in a fully different way.

This distinction between “normal” science and, shall we call it, supra-normal science is, I think, unfortunate. Unfortunately, philosophers of science are stuck mostly in the 17th to 19th centuries, when science was as much a hobby as an occupation. The vast ocean of ignorance lay before these gentleman scientists (not to mention the occasional honorary gentlewoman!). No wonder that much was innovative with the benefit of hindsight. Now we have science as a career, and not everyone can stake a claim to some arid area of science where no one has been before. The purpose is surely to develop a deeper and richer understanding, and to develop more interdisciplinary approaches to subjects. Nothing normal here in seeing, for example, climatologists, geographers, glaciologists, marine biologists, and many other disciplines explore the impact of global warming on the Arctic. I am sure that von Humboldt and Darwin would be in awe of, and delighted with, the diligence and perseverance of these scientists. I call that extra-ordinary science and, without question, reviewed and replicated as much as any in any field of enquiry. OK, so they are not trying to overturn thermodynamics, Newtonian mechanics, quantum theory, or even Darwinian natural selection (they depend on all these established theories and methods), but they do create knowledge that otherwise would be hidden from us. That ain’t broken science (even if the publishing models need a good kick up the arse); that is science, pure and simple, for which we as a modern society should be very grateful.

Richard,
I agree that a lot of science/research today involves developing deeper understanding and building interdisciplinary approaches to solving problems/developing understanding. However, I would argue that the fundamental approach is still similar. We test hypotheses, try to reproduce interesting results from other studies, and try to tackle problems that lie on the edges of our understanding.

ATTP writes — “I’ve seen this kind of suggestion before, but I’ve not really seen why people include climate science in such a list. Do they really have evidence to suggest that climate science is susceptible, or is it simply that they encounter lots of complaints about it and assume that it is susceptible?”

Smaldino is implying that any science of complex, chaotic systems is prone to bias, as opposed to sciences where laboratory tests can simply be repeated by another lab for confirmation. Climate science fits this mold.

Smaldino and co-author McElreath are hardly controversial when it comes to the core points of their study, namely that academic hiring and promotion are based too much on quantity of publications over quality, and that there is a disincentive to publish negative results or replication studies in many scientific fields. I can’t say much about the statistical power of designed studies, since the topic doesn’t come up that often in the physical sciences, at least in the areas I’m familiar with.

But in the Guardian interview Smaldino strays too far from his area of expertise when he states that climate science is a field “likely to be vulnerable to the propagation of bad practice” because “the combination of studying very complex systems with a dearth of formal mathematical theory creates good conditions for low reproducibility”. This gives the unfortunate impression that Smaldino may be basing his understanding of this broad multidisciplinary field, at least in part on the hostile views of climate ‘skeptics’.

I’ve always thought Anthony Watts’ original weather station siting endeavor was a great idea. I’ve even made attempts to acknowledge to “skeptics” that that experiment was really clever (usually only to be attacked as a result). His hypothesis was clearly wrong, but through that effort we know a little more about the effects of station siting on the data. I think that’s a good thing.

Magma says: “that there is a disincentive to publish negative results or replication studies in many scientific fields.”

If you have a mostly empirical science studying a complex question, and theory does not tie the different findings together, negative results may not be that convincing, because they could easily reflect measurement error. To that list I would add medicine and nutrition.

In climate science, as a natural science, negative results are wonderful. If I can show that there is no global warming: Nobel Prize. If I can show that precipitation is not increasing: Nature paper. If I can show that fracturing is not an important process in the collapse of ice sheets, my colleagues will be infinitely grateful for the simplification of the problem. “Negative” results are results, important results.

Smaldino is implying that any science of complex, chaotic systems is prone to bias, as opposed to sciences where laboratory tests can simply be repeated by another lab for confirmation. Climate science fits this mold.

Climate isn’t chaotic… one might argue that it isn’t even particularly complex when considered in terms of essential energy balance (how much excess thermal energy accrues in the Earth system with a given enhancement of the greenhouse effect), even if this is made more difficult by uncertainty in the temporal nature of ocean heat uptake, the effects of atmospheric aerosols etc that give rise to uncertainty in, for example, estimates of climate sensitivity. The essential empirical measures in climate science are eminently reproducible (the atmospheric concentrations of CO2 and other greenhouse gases; atmospheric water vapour concentrations; sea level; Earth surface temperature; paleo temperatures from proxies; trapped atmospheric gases in ice cores etc. etc.)

So yes Smaldino may be “implying” stuff but that’s just voicing a bias. His paper (“The natural selection of bad science” with Richard McElreath) says not a word about climate science and is essentially an anecdotal pursuit of an evolutionary analogy, with a simple model which has no necessary validity and is full of parameters ( “pay-off for publishing novel result”, “pay-off for publishing positive replication” , “pay-off for publishing negative replication”, “pay-off for having novel result replicated”) whose subjective parameterisation seems rather likely to be prone to bias.

The only “data” (aka “evidence”) in the paper is an analysis of statistical power in a set of reviews of papers in the social and behavioural sciences between 1960 and 2011 (it seems not to have increased much). Does that have anything to say about the physical sciences (including climate science)? No.
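For readers who haven’t met the term, statistical power is the probability that a study detects an effect that really exists. A minimal simulation illustrates what those reviews were measuring; the effect size, sample sizes, and the rough critical value of 2.0 below are all invented for illustration, not taken from the paper.

```python
import random

random.seed(1)

def detects(n, effect):
    # One simulated two-group study with a true mean difference `effect`;
    # call it "significant" if |t| exceeds ~2.0 (a rough two-sided 5%
    # critical value, used instead of an exact t test for brevity).
    a = [random.gauss(effect, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((x - ma) ** 2 for x in a) / (n - 1)
    vb = sum((x - mb) ** 2 for x in b) / (n - 1)
    t = (ma - mb) / ((va / n + vb / n) ** 0.5)
    return abs(t) > 2.0

trials = 4000
powers = {}
for n in (10, 30, 100):
    # Estimated power = fraction of simulated studies that detect the effect
    powers[n] = sum(detects(n, 0.5) for _ in range(trials)) / trials
    print(f"n = {n:3d}  estimated power ~ {powers[n]:.2f}")
```

The conventional target is a power of about 0.8; with this (invented) medium-sized effect, only the largest groups get there, which is why chronically small samples are a warning sign.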

My feeling is that papers/articles that align with the science-bashing bandwagon (“reproducibility crisis”, “publication crisis”, “peer-review crisis”) are given rather a free ride. Smaldino and McElreath’s paper is sort of interesting, but it doesn’t provide any particular evidence in support of their notion (an evolutionary analogy for the selection of poor science) and shouldn’t be written about as if they’ve found something “true”. That’s just perpetrating bad science.

Smaldino: “My impression is that, to some extent, the combination of studying very complex systems with a dearth of formal mathematical theory creates good conditions for low reproducibility,”

Enh. The issue here is this: when you study a problem and you don’t have the important variables hammered out yet, you may have trouble with reproducibility.

This is a large chunk of the interesting science we do. ScientistA finds Result 1, ScientistB does similar work and finds Result 2. They compare methodology to figure out why their results differ. And then they chase that bunny trail down, nailing down the differences in their methodologies and nailing down the causal factors. Eventually they can say “well, Factor X is the key here. But now we know what it does, and we can use that.”

There are problems in scientific research, but the speculative examples suggested are, as others have identified, as much a part of the messy practice of science as errors are. Individual fraud and even team group-think get corrected. There is no indication that a more strictly regulated and controlled system would produce better results. Attempts in the past to police science have not resulted in discernible improvements. Throwing money at it seems to work, but the source and strings matter.
The dogmatic contrarians, dubious plagiarists and economeretricians who have forsaken pragmatism for formalism may be a necessary cost of the success of the whole.

But science is also misused to defend economic interests. Here is a classic case. Ironically it involves the exploitation of the C-H bond for energy with deleterious consequences for human lives.

“Retracting papers for being wrong is a terrible idea and I sometimes worry we are going in that direction.”

Well….sometimes there is a good reason to retract a paper for being wrong. For example, when it is quickly discovered that there is an obvious error that invalidates the whole result, and the result is rather important (a few papers on health claims are of relevance here). Another example is related to one of my own papers that I have just submitted, pointing out a lot of papers that make a fundamentally flawed claim, which is trivially shown to be wrong and which contradicts 50 years of prior knowledge. Retractions may be necessary to stop the further spread of this claim and/or risk repetition of this same wrong claim in a few years, when new scientists find the 50+ papers making the flawed claim, and not the 1 that shows it to be wrong.

This situation is quite different from Newton=>Einstein, where Newton’s work is just a natural part of the historical development of scientific knowledge, or the situation where scientists were wrong in ‘interesting’ ways.

Smaldino is implying that any science of complex, chaotic systems is prone to bias, as opposed to sciences where laboratory tests can simply be repeated by another lab for confirmation. Climate science fits this mold.

At a certain resolution (quantum level) a glass of water is a complex chaotic system, so this statement (a) can be applied to any scientific investigation and (b) is not in “opposition” to “sciences where laboratory tests can be simply repeated”.

Phil — Generally speaking, we can model what happens on the quantum level of a glass of water based on the entire mass of water. However, if a scientist were to claim perfect knowledge of such a chaotic system based on those averages, he would be wrong.

This really isn’t that difficult: you’re taking a general statement (e.g., “grass is green”) and making minute observations about that subject (“You are wrong. I found a blade of grass that is yellow.”). That is not the point Smaldino is making. Sometimes such a minute distinction is significant, and other times it is not.

For instance, in psychology, outcomes can be difficult and costly to measure, and the reactions to any treatment can be unique to each individual. Such a science is prone to bias and difficult to confirm. Just because one therapy works for one person does not mean that it will work for everyone.

Climate science has some repeatable laboratory experiments, but there is also a complex, chaotic response in nature which is not easily simulated in the laboratory. Such a science is prone to bias.

Just because a science is bias-prone does not mean that we ignore it or assume results are wrong. Rather, it means that the community must be more diligent at testing the science for bias. The criticism of climate science is that a few dominant gatekeepers have reinforced the biases rather than challenged them.

Rather than rote replication, I would instead support funding multiple approaches to the same question. That way scientists can still be creative in their approaches yet together would help answer the key open questions in their fields. Different approaches, in my opinion, will in general bring more understanding than just repeating the same approach.

Is climate science broken? Is climate too complex to understand without bias? Those seem like funny questions to ask as the 1C warming threshold falls by the wayside, pretty much as forecast 30+ years ago.

Not entirely off topic, there’s a new paper out by Lewandowsky, Cook and Lloyd that examines the incoherent and contradictory positions held by many climate ‘skeptics’ and contrarians. It goes without saying that simultaneously holding contradictory beliefs with respect to physical phenomena (e.g. the Earth is not warming and the warming is natural) is profoundly unscientific.

Something I don’t think deniers have twigged on to yet (or if any have, I missed it) is that Lewandowsky and Cook either publish in open-access journals or pay the access charges in others, allowing their papers to reach a wider readership than they otherwise would. I think this is a clever strategy.

I’ve been engaging with a guest poster at Curry’s who relies on a paper using a GCM to justify a hypothesis that solar UV drives climate, yet dismisses the “mythical overbearing warming capabilities of CO2”, and predicts global cooling – despite this being contradicted by the same GCM generating his hypothesis.

Indeed… and whilst all manner of natural causes can make the climate change quite happily – even quite trivial apparent causes, like very small solar variations – some magic prevents CO2 having an effect on the climate.

Just because a science is bias-prone does not mean that we ignore it or assume results are wrong. Rather, it means that the community must be more diligent at testing the science for bias. The criticism of climate science is that a few dominant gatekeepers have reinforced the biases rather than challenged them.

Care to provide any evidence that climate science “is prone to bias”? Or point out some of the biases that “a few dominant gatekeepers have reinforced”?

And who are these few “dominant gatekeepers”? That seems a laughable notion to me: that stuff which contradicts some of the established science, and which might otherwise be published, isn’t, because some “gatekeepers” are stopping it! A moment’s reflection would show that’s a pretty absurd idea.

Still, they’re your assertions, and you presumably have some evidence in support of them… care to show us some of it?

lorcanbonda,
Exactly; just as we can know the macroscopic properties of a glass of water despite having imperfect knowledge of the individual sub-atomic particles, we can know the large scale properties of the climate without having data for every cubic inch of the planet. When abstracting out to planetary levels we investigate a simpler system than we do when we look at a single location.

Such a science is prone to bias.

Maybe, but that doesn’t mean a bias exists.

Just because a science is bias-prone does not mean that we ignore it or assume results are wrong. Rather, it means that the community must be more diligent at testing the science for bias.

“More diligent” than what? By whose standards are we to judge the diligence of the science? Why do you feel that climate science is not aware of this?

The criticism of climate science is that a few dominant gatekeepers have reinforced the biases rather than challenged them.

It should be obvious that if there are indeed a “few dominant gatekeepers”, and they do indeed challenge the biases and find that there are none, or at least no systematic ones, then that will appear indistinguishable from reinforcing them.

While you’re collating evidence in support of your assertions I’ve been trying to think of examples of “bias” in climate science. Of course there is one example of a rather astonishing bias from the early practitioners of what was initially a rather obscure methodology – the estimation of tropospheric temperatures from Microwave Sounding Units. The early practitioners (Christy and Spencer) got this hopelessly wrong over a period of around 15 years due to a series of errors and biases (we know that some of this was due to “bias” and we could discuss why we know this).

However, that’s a rather instructive case for two reasons. First, since our understanding of climate science, and more specifically of the Earth surface temperature response to enhanced greenhouse forcing, is a result of multiple lines of evidence, it became apparent that the MSU interpretations must be wrong quite a while before they were explicitly shown to be wrong. Secondly, once some competent researchers addressed the MSU data more reliably, the errors/biases were uncovered.

Both of these are important in understanding something about the evidence base that underpins our understanding. Our understanding comes from a large number of different data sets, types of data and analyses. We can be confident that our understanding is secure to the extent that these different data sets and analyses (e.g. paleo analyses, paleoproxy analyses, empirical analyses, experimentation, application of theoretical understanding and computational approaches) are self-consistent. Secondly, since pretty much all types of study are undertaken by several, and sometimes many, different individuals and groups, it becomes less likely that one person’s or one group’s bias dominates a field. That’s the case with MSU-derived tropospheric temperatures: at least four different sets of people analyse these data, and so any obvious bias is likely to be uncovered.

I’m a little (OK, more than a little) concerned about Gelman’s first point, “Psychology’s discourse on validity, reliability, and latent constructs is much more sophisticated than the usual treatment of measurement in statistics, economics, biology, etc.” In my experience statistically-focused researchers across the sciences (physics, geosciences, biology, etc.) are the opposite of unsophisticated. And a lot of innumerate junk seems to get published in psychology.

I couldn’t help noticing Gelman singled out an Economist Who Cannot Be Named for special mention.

Magma,
I’m not sure I agree with that either, but I don’t have much sense of how sophisticated it is in psychology. One point might be that psychology relies on statistics more than in some other fields where you might have underlying principles that can allow you to sanity check your results.

It seems to me that some people fall into the stereotype of the iconoclast: naturally contrary and suspicious of received opinion, so the “science is broken” story appeals to them and they run with it. Many fall into the other stereotype of being relatively trusting and accepting the views of experts who have studied things more deeply than we have. So on one side we have those who think “science is broken” and on the other those who think “science is just fine”, but the reality is more nuanced than that, and the truth is probably nearer to “science isn’t badly broken (like democracy), and (like democracy) there isn’t anything obviously better to hand, so we may as well stick with it”. Unfortunately that isn’t a terribly interesting story, so the media are not going to want to run with it, and the general public are likely to polarise according to their stereotype in the absence of something regularising their views back towards the middle ground.

Sorry, I haven’t read all the comments, so this might already have been said. The problem is not scientists sometimes getting things wrong; it’s NHST (null hypothesis significance testing) on small data sets, via the garden of forking paths, being a recipe that guarantees bogus “high impact” results, time after time after time. Without any true scientific progress. It’s just a noise machine.
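The forking-paths point can be sketched with a small simulation. As a simplifying assumption, each “fork” (a subgroup choice, an alternative outcome, a different exclusion rule) is idealised here as an independent look at fresh noise; real forking re-analyses the same data, so this is only a rough illustration of the mechanism, not a model of any particular study.

```python
import random

random.seed(0)

def t_stat(a, b):
    # Welch-type t statistic for two independent samples.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / ((va / len(a) + vb / len(b)) ** 0.5)

def one_study(n=20, forks=5):
    # Pure noise: there is no real effect anywhere.  The analyst gets
    # `forks` chances to stumble on a "significant" comparison.
    for _ in range(forks):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        if abs(t_stat(a, b)) > 2.02:  # roughly p < 0.05 at this sample size
            return True
    return False

trials = 2000
fp = sum(one_study() for _ in range(trials)) / trials
print(f"false-positive rate with 5 forks: {fp:.2f}")
```

Each individual comparison keeps its nominal 5% error rate, but the chance that at least one fork “works” is roughly 1 − 0.95^5, about 23%, and the fork that worked is the one that gets written up.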

Most of us probably like to think of science as a bit of a random walk towards the truth: we take wrong steps and don’t always go quite where we want to, but over time we tend to head in roughly the right direction. That breaks down when the motivation towards gee-whizz overwhelms the motivation towards truth.

Magma: Scientists and science-wonks love to talk in science-speak and acronyms. This is a key reason why they have a difficult time communicating with the average person. I should not have to Google an acronym in order to determine its meaning.

The “garden of forking paths” is inevitable in science. I suppose a partial solution is not to base the forks in the path on NHSTs, but to save the NHSTs for when you think you have reached the garden gate. NHSTs are a useful sanity check, but the real test is whether your idea is taken up by the research community and helps progress to be made; assessing and rewarding that is more difficult, though.

Perhaps for each paper an academic submits to the REF (the UK’s Research Excellence Framework) they should be required to also submit a paper that refutes an existing paper or provides a worthwhile replication of it. That way post-publication review would be encouraged and publishing bad papers discouraged. ;o)

I recommend the work of Mayo and Spanos on severe testing as an improvement to the random walk produced by testing plausible effects with p values. I don’t agree with dikran’s last comment so much, bad ideas do gain consensus if they intuitively seem right. That’s why more severe tests are a good idea (which I think his second para endorses).

Do people realise that ordinary least squares trend analysis of temperature has never passed a severe test, in the sense that no other explanation for how the atmosphere warms has been ruled out? Yet many people argue that it has, and there is definitely a consensus that this is the case. It might be a methodological consensus, but it is now being accepted as a theoretical proposition without a detailed mechanistic explanation ever having been proposed. It is all based on statistical induction.

Roger,
As I understand it, attribution studies are still frequentist (well, those that attempt to determine the causes of the observed warming) and could therefore suffer from the prosecutor’s fallacy. A Bayesian approach might resolve this, but given that there are really only two suspects (anthropogenic and non-anthropogenic), I’m not sure that this is such a big issue. However, there is still an issue, I think, in that it is easier to do attribution when considering all anthropogenic influences together than to try to determine the influence of individual anthropogenic factors.
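To make the “two suspects” framing concrete, here is a minimal Bayesian sketch. Every number in it (the two likelihoods and the prior) is invented purely for illustration; none of them are real attribution values.

```python
# Toy Bayesian attribution with two "suspects" for an observed warming
# fingerprint.  All numbers below are invented for illustration only.
p_fp_given_anthro = 0.9    # assumed likelihood of the fingerprint under anthropogenic forcing
p_fp_given_natural = 0.05  # assumed likelihood under natural variability alone
prior_anthro = 0.5         # agnostic prior over the two suspects

# Bayes' theorem: P(anthro | fingerprint)
evidence = (p_fp_given_anthro * prior_anthro
            + p_fp_given_natural * (1 - prior_anthro))
posterior_anthro = p_fp_given_anthro * prior_anthro / evidence
print(f"P(anthropogenic | fingerprint) = {posterior_anthro:.3f}")
```

This also shows the distinction the prosecutor’s fallacy blurs: P(fingerprint | natural) = 0.05 is not the same quantity as P(natural | fingerprint); they happen to come out close here only because the prior was even, and with a different prior the two numbers diverge.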

ATTP,
attribution studies are not strictly frequentist, because they assume the statistical model that the frequencies should be counted by. So they are frequentist subject to the underlying assumptions of a statistical model of a secular trend (warning: economics does this all the time, so perhaps it’s not a great idea). There is a more basic question of which statistical model should be used to conduct attribution studies. The sceptical scientist asks whether the ordinary least squares [hereafter OLS -W] model is the best, or whether perhaps we should be using a model that explicitly represents climate as a complex system. Subjective Bayesianism is good for sensitivity studies, but in the long run it is also statistical induction.

The contrarians such as Keenan are correct that there is a lot of rubbish stats in climatology. They are wrong in claiming that the core theory can be shown wrong using stats (that can’t be done); what it means is that evolving risk is complex-system risk, and that the straightlineans are stuck with Newton and Gauss when they should be with Lorenz.

Angech upthread made the assumption that ENSO has to be independent of the warming signal. Who says? We now know that it is involved (and also decadal variability).

Roger,
I’m a bit confused, because I had assumed that OLS is really only used to describe the data (the trend, for example), rather than to make some kind of attribution statement. Surely something more complex is used for attribution studies (a fingerprint analysis is presumably more complex than simple OLS)?

The contrarians such as Keenan are correct that there is a lot of rubbish stats in climatology.

and ATTP – it isn’t an issue for attribution so much, but it is for characterising climate-related risk for future planning. How do you go if you get hit by an increase of 0.5C in one whack after not much change at all?

I shouldn’t pile on here but there are a number of recent papers that insist OLS is the signal and this is the narrative that is widely communicated in the broader media. The contrarians base their opposition on the fact that climate is not OLS. This whole debate is being argued on incorrect premises.

I shouldn’t pile on here but there are a number of recent papers that insist OLS is the signal and this is the narrative that is widely communicated in the broader media.

Yes, I agree that there is a perception that OLS is the signal, rather than simply a way in which to describe the data (i.e., what is the linear trend and the uncertainty in that trend – a descriptive statistic, rather than an inferential one).
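As a sketch of that distinction, a linear trend plus its standard error can be computed as a pure data summary, with no claim about the underlying physics. The data here are synthetic and the numbers invented for illustration:

```python
# Minimal sketch: OLS slope as a *descriptive* trend estimate plus its
# standard error -- summarising the series, not modelling the physics.
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1980, 2020)
# Synthetic "temperature anomalies": a made-up 0.018 C/yr trend + noise
temps = 0.018 * (years - years[0]) + rng.normal(0, 0.1, years.size)

# polyfit with cov=True also returns the coefficient covariance matrix
(slope, intercept), cov = np.polyfit(years, temps, 1, cov=True)
stderr = np.sqrt(cov[0, 0])
print(f"trend = {slope:.4f} +/- {stderr:.4f} C/yr")
```

Nothing in this calculation assumes the data-generating process is actually a straight line; the fit is just a summary of the series over the chosen window.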

The contrarians base their opposition on the fact that climate is not OLS. This whole debate is being argued on incorrect premises.

Certainly, a great deal of it is based on incorrect premises. I will say, though, that communicating science is difficult, and people often try to keep it reasonably simple (you can’t give everyone an undergraduate degree in a relevant science). This then leads to it being easy to attack, if someone wishes to do so. It’s hard to know how to resolve this; if you make things more complex, communication becomes more difficult, and if you keep it simple, you end up being criticised.

RNS: If scientists want to effectively communicate with non-scientists, they should define every single acronym the first time it is used. I presume that there are non-scientists reading this comment thread on occasion.

Me, I think the climate system is ~~not that~~ particularly challenging, since it is known that components in the system are inherently chaotic; there are feedbacks that could potentially switch sign, and there are central processes that affect the system in a ~~complicated~~ fairly simple, non-linear manner. These ~~complex~~ fairly simple, chaotic, non-linear dynamics are an inherent aspect of the climate system. Or, I have lived a long time, and when I twist the thermostat, it always, every single time, all things being equal or not, regardless of butterflies, gets warmer.

I am particularly sensitive to this issue because I encountered it every day during my 32-year professional career in the public sector.

Now I encounter it in the boiler-room where members of the all-volunteer SkS author team gather to discuss matters of mutual interest.

Since I immersed myself into the science of climate change some 15 years ago, I have learned what many of, but not all of, the acronyms in common usage mean. For better, or for worse, “OLS” is not one of them.

Willard, I agree that when one set of premises is evacuated another set of premises is occupied, but the science community should not make it easy.

My solution would be to say that warming constitutes a complex trend over 50+ years or so, and is decidedly nonlinear on scales of less than this. Simple ordinary trend analysis is a statistical tool and not a scientific model (Slingo, chief scientist of the UK Met Office, pretty much said this in her 2013 report to the UK parliament).

And JCH, the nonlinearity is fairly simple – it is regimes separated by step-like warming. The source is the ocean (warm pool), one trigger is the Pacific see-saw and ENSO when the warm pool is critical. This took a long while to figure out but several research groups are converging on a related set of mechanisms.

Sorry for not defining OLS first up – I have a broken collarbone, so am typing with one hand and capitals are tricky.

I tend to agree with John Hartz a bit here on acronyms. LMGTFY is not a reasonable answer to everything, and some acronyms are duplicated in completely different contexts. Just ask the WWE how much money it’s spent after being sued by the WWF. It’s hard to tell if an acronym is something familiar, or YAUA. IMHO, if I can’t recognize what an acronym is, and someone else is ROFLMAO because I can’t use Google and is telling me “YMMV”, then they won’t be my BFF. It’s not as if TPTB have forced us to be brief, and there’s a shortage of bits on the Internet. Like when my mother emailed me and asked:
“I keep seeing this jargon on the Internet. Can you tell me what IDK, ILY, and TTYL mean?”
I answered:
I don’t know
I love you
Talk to you later.
to which she answered “OK. Thanks anyway. Thought I’d ask. I’ll write to your brother and see if he knows.”

…and Then There’s Physics says: September 23, 2016 at 7:18 pm
” One point might be that psychology relies on statistics more than in some other fields where you might have underlying principles that can allow you to sanity check your results.”
Well put and well punned. Deliberate I assume.
Reminds me of a joke: “How do you tell the patients from the staff?”
A: The patients wear shoes.
Of course that was back in the ’70s.
Psychology has made big steps since then.

And JCH, the nonlinearity is fairly simple – it is regimes separated by step-like warming. The source is the ocean (warm pool), one trigger is the Pacific see-saw and ENSO when the warm pool is critical. This took a long while to figure out but several research groups are converging on a related set of mechanisms. ….

Though I see it more in terms of the Eastern Pacific, as that is the only place on the face of the earth that has effectively blunted AGW. The PDO index pretty much captures the trends in the Eastern Pacific. When the trends in the Eastern Pacific are positive, the earth’s surface warms rapidly… big steps up. Up until ~1985, when the trends in the Eastern Pacific were negative, the earth’s surface cooled/a pause in warming. Around 1985 this relationship broke… big lag in surface response… because ACO2 is now big and is mostly swamping natural variation. Natural variation slowed warming from ~1985 until ~2005, and the PDO, fully negative by then, was finally able to briefly paws it.

1. the PDO (as a proxy for the Eastern Pacific warming or cooling) was the cause of the changes of direction in the GMST from ~1905 to ~1940 (big increase,) ~1940 to ~1950 (brief global cooling,) ~1950 to ~1975 (gentle warming,) ~1975 to ~1985 (rapid warming.) And then, once ACO2 had reached a sufficient level, the PDO failed to change the direction of the GMST until it went true negative in the 21st century, and then all it did was cause the ever-so-brief paws in warming that all the fools are so excited about.

2. ECS is likely to be well into the north side of the range.

3. the future: no actual cooling; vanishing pauses; surging surges.

4. the AMO and the stadium wave and the 60-year cycle stuff is all complete rubbish.

Bob Loblaw: One of the nicer features of the Skeptical Science website is the pop-up glossary of terms and acronyms. Perhaps some day, Word Press will allow for that type of plug-in. In the meantime, we’ll have to muddle through the “acronym challenge” here on ATTP’s site as best we can.

Frankly, I read Gelman’s post as little more than an apologia on behalf of auditors.

I don’t quite read it like that. I think he is talking about situations where people could clearly have performed a better analysis, or drawn more cautious conclusions, or where you can clearly infer how an error influences the conclusions that are drawn. What we see in the climate blogosphere, by contrast, is people either finding errors without ever explaining how they would influence the conclusions, or claiming to find an error that may simply be an alternative way of carrying out the analysis, or claiming to find errors that really aren’t any such thing. I don’t have a problem with blog critiques of papers (it would be rather ironic if I did), but I do think they need to be open, the person being criticised should be able to respond without feeling as though they personally are being attacked, and they should really put the issues into context.

This paper presents new estimates of the hemispheric energy balance based on an assembly of radiative flux and ocean heat data. Further, it provides an overview of recent simulations with fully coupled climate models to investigate the role of its representation in causing tropical precipitation biases. The energy balance portrayed here features a small hemispheric imbalance with slightly more energy being absorbed by the Southern hemisphere. This yields a net transport of heat towards the NH composing of a northward cross-equatorial heat transport by the oceans and a southward heat flow in the atmosphere. The turbulent fluxes and hemispheric precipitation balance to about 3 Wm−2 with slightly larger total accumulation occurring in the NH. CloudSat data indicate more frequent precipitation in the SH implying more intense precipitation in the NH. Fully coupled climate model simulations show that reducing hemispheric energy balance biases does little to reduce existing biases in tropical precipitation. …

Global mean surface temperature change over the past 120 years resembles a rising staircase: the overall warming trend was interrupted by the mid-twentieth-century big hiatus and the warming slowdown since about 1998. The Interdecadal Pacific Oscillation has been implicated in modulations of global mean surface temperatures, but which part of the mode drives the variability in warming rates is unclear. Here we present a successful simulation of the global warming staircase since 1900 with a global ocean–atmosphere coupled model where tropical Pacific sea surface temperatures are forced to follow the observed evolution. Without prescribed tropical Pacific variability, the same model, on average, produces a continual warming trend that accelerates after the 1960s. We identify four events where the tropical Pacific decadal cooling markedly slowed down the warming trend. Matching the observed spatial and seasonal fingerprints we identify the tropical Pacific as a key pacemaker of the warming staircase, with radiative forcing driving the overall warming trend. Specifically, tropical Pacific variability amplifies the first warming epoch of the 1910s–1940s and determines the timing when the big hiatus starts and ends. Our method of removing internal variability from the observed record can be used for real-time monitoring of anthropogenic warming.

Our field has always encouraged—required, really—peer critiques. But the new media (e.g., blogs, Twitter, Facebook posts) are encouraging uncurated, unfiltered, trash-talk. In the most extreme examples, online vigilantes are attacking individuals, their research programs, and their careers… volunteering critiques [with] personal ferocity and relentless frequency.

At the same time she says that peer critiques are essentially required, that we should all “trust but verify,” that the adversarial process makes contributions, and that there should be transparency in methods and data. Yet, AG says that,

In short, Fiske doesn’t like when people use social media to publish negative comments on published research. She’s implicitly following what I’ve sometimes called the research incumbency rule: that, once an article is published in some approved venue, it should be taken as truth.

In other words, according to AG, we should ignore what Fiske *explicitly* says and come to the conclusion that (since she doesn’t like social media) she implicitly believes we should accept published articles as true. The logical connection escapes me. The rest of the article follows from this premise.

oneill,
Okay, fair enough, I hadn’t read her article in detail. I certainly don’t disagree with the paragraph of hers that you quote; I think some blog critiques can be rubbish, I just don’t have a problem with social media critiques in principle. I also agree that ultimately you do need something more formal; it’s fine to start with social media, but I think a formal response is still needed.

I find this actually quite a tricky issue. I do think there are issues with academia/research that we could improve. In particular, we publish too much and tend to use metrics that don’t necessarily indicate actual quality. On the other hand, the existence of examples where we can’t reproduce an earlier result doesn’t necessarily indicate some fundamental problem; it is a process and we learn as we go along.

Roger Jones wrote “I shouldn’t pile on here but there are a number of recent papers that insist OLS is the signal and this is the narrative that is widely communicated in the broader media.”

Can you give an example of a paper where this happens? I don’t think I would put it like that, I suspect it is more like “there are a number of recent papers that insist OLS (implicitly) is an estimate of a first order Taylor series approximation of the signal, but this is misinterpreted in the broader media as being the signal itself”. However much of this is left unsaid, as the reader is expected to be from a similar background and will understand what is meant. I usually try to follow Hanlon’s razor and look for the interpretation of what is written that makes most sense, rather than reading it completely literally (and then ask questions to clarify if necessary).

“Do people realise that ordinary least squares trend analysis of temperature has never passed a severe test, in that it has been shown no other explanation for how the atmosphere warms has been ruled out?”

Statistics can’t do this, at least unless you can include all explanations as alternate hypotheses. Just fitting an OLS trend to the data can’t do that as it is not a model of the data generating process, just a way of summarising some information about the data.

“attribution studies are not strictly frequentist because they assume the statistical model that the frequencies should be counted by.”

I’m not sure I understand the point being made here; frequentists use models all the time, e.g. “unbiased coin”, without it causing any departure from the frequentist framework.

“The contrarians such as Keenan are correct that there is a lot of rubbish stats in climatology.”

There is a lot of rubbish stats everywhere in science, especially NHSTs (null hypothesis significance tests) and interpretations of confidence intervals. I blame the frequentists for having a strange way of defining probability that means they are unable to give a straight answer to the questions we actually want to ask! ;o)

ATTP wrote “I’m a bit confused, because I had assumed the OLS is really only used to describe the data (trend, for example), rather than to make some kind of attribution statement.”

Regression (including OLS) can make statements of the form “X can be explained by Y” but not “X is explained by Y”, this is another common misunderstanding of statistics, but you are right, OLS is generally used to describe the trend, unless there is a good reason to believe that the data generating process actually is a straight line (e.g. allometry).

I’ve never seen severe testing in action. From what I know, Mayo 1997 builds on Laudan 1995. The first is suspicious to me – Popper was already going a bridge too far, so I see no reason to go even further than him. I have little choice (avatar oblige) but to dismiss the second as crap, even if I don’t have much time to discuss why. (In a nutshell, I suspect his notion of reasonableness is not extensional.)

To cut to the chase, I’d like to see an example of any kind of empirical testing that satisfies the severity criterion:

There is a very low probability that test procedure T would yield such a passing result, if hypothesis H is false.

I don’t mind frequentism much, more so for inferences offered by institutions, even if its conservativeness appears to only be a façade in the end. As long as we keep in mind this is all for exploration and understanding, as Gelman himself suggests, all is well and good.
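The quoted severity criterion can at least be estimated by simulation in toy cases. A sketch, assuming a deliberately simple made-up setup (H: the true mean exceeds zero; T passes when the sample mean clears a threshold):

```python
# Monte-Carlo sketch of the severity idea: estimate the probability
# that test T passes when hypothesis H is false.
import numpy as np

rng = np.random.default_rng(3)
n, threshold, trials = 25, 0.33, 100_000

# Worst case for "H false": the true mean is exactly 0.
samples = rng.normal(0.0, 1.0, (trials, n))
pass_rate = (samples.mean(axis=1) > threshold).mean()
print(f"P(T passes | H false) ~ {pass_rate:.3f}")  # small => severe test
```

Here a low pass rate under the false hypothesis is exactly the "very low probability that T would yield such a passing result, if H is false" condition; whether any real empirical test can be shown to satisfy it is, of course, the point in dispute.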

When I read an abstract (I could not find a free copy of either) I look at who is citing it, which is how I found the abstract that Roger, who has written extensively on the subject, had not yet seen. But mostly last night I tried to figure out when my newly acquired Chinese oxblood vases were made… I think just prior to 1910, but maybe in the years just prior to 1900.

In the in crowd, aerosols done it. Fine with me. I suspect they are a bit hazy on it. So does Trenberth.

What I’m suggesting is that monocausal arguments about the mid C20th hiatus based on ocean/atmosphere energy exchange in the Eastern Pacific are going to back you into a lukewarm corner one of these days.

Dikran: Doesn’t your ongoing discussion* with Roger Jones boil down to whether or not time, by itself, can cause changes in the climate system? We all know that the climate system changes over time, but is the reverse true?

JH, no it is a bit more subtle than that. In some applications it is reasonable to suggest that the physical process giving rise to the data can be described by a straight line. For instance allometry can be used to estimate the body mass of dinosaurs as a function of the circumference of the long bones in their legs. In this case, there is a power-law relationship between the two (broadly, weight goes up as the cube of linear measurements and the cross-sectional area of the bones increases to support their added weight, although it is a bit more complicated than that as bones have weight etc. and some dinosaurs had air sacs in their bones). If you plot the log of body mass and the log of long bone circumference, then you actually do get pretty much a straight line (with some noise). In this case, you could reasonably say that the OLS regression model is the “signal”, as the statistical model is the same as the physical model.
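A rough sketch of the log-log point (the coefficients and measurements below are invented, not real allometric values):

```python
# A power law y = a * x**b becomes a straight line in log-log space,
# so here the OLS fit coincides with the (log of the) physical model.
import numpy as np

rng = np.random.default_rng(1)
circumference = np.linspace(10, 60, 30)    # hypothetical bone sizes, cm
true_a, true_b = 0.08, 2.7                 # illustrative values only
# Multiplicative noise, as is typical for allometric scatter
mass = true_a * circumference**true_b * rng.lognormal(0, 0.05, 30)

# OLS on the logs recovers the exponent b as the slope
b_hat, log_a_hat = np.polyfit(np.log(circumference), np.log(mass), 1)
print(f"estimated exponent b = {b_hat:.2f}")  # should land near 2.7
```

Because the straight line in log space really is the physical relationship, interpreting the fitted slope as "the signal" is justified here in a way it isn't for a linear fit to global temperatures.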

In the case of the climate, there are multiple forcings and internal feedbacks and variability, some of which is approximately linear, but a lot of it isnt, so in this case the statistical model (a straight line fitted to the data) isn’t the same as the physical model (a general circulation model, or an energy balance model etc.). This means the statistical model can be used to summarise the data in a useful manner (i.e. tell you the trend), but it doesn’t really tell you anything about the physical processes giving rise to the data (but sometimes people assume that they do).

People use linear trend models because, even when the system is more complicated than that, if you look at a small enough area (or period in time) then the deviation of the “true model” from the linear approximation is small enough that the linear approximation is a useful summary (the Taylor series bit). In the case of climate, the period needs to be long enough that the major modes of internal variability (e.g. ENSO) cancel out, but not so long that we can’t treat the forcings as approximately constant. The World Meteorological Organization figure of 30 years is about right.
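A toy illustration of the window-length point, with an ENSO-like oscillation and noise added on top of a fixed trend (all numbers invented):

```python
# Short windows let quasi-periodic variability leak into the trend
# estimate; a ~30-year window largely averages it out.
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(1960, 2020)
signal = 0.015 * (t - t[0])                  # made-up forced trend, C/yr
enso = 0.12 * np.sin(2 * np.pi * t / 4.5)    # crude ENSO-like term
series = signal + enso + rng.normal(0, 0.08, t.size)

def trend(window):
    """OLS slope over the last `window` years of the series."""
    return np.polyfit(t[-window:], series[-window:], 1)[0]

for w in (10, 30):
    print(f"{w:2d}-yr trend: {trend(w):+.4f} C/yr (true 0.0150)")
```

Rerunning with different seeds makes the contrast clearer: the 10-year estimates scatter widely around the true value, while the 30-year estimates stay close to it.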

Dikran: Thanks for the detailed response. It appears that I now need to immerse myself into the wonderful world of statistics in order to better understand the higher order discussions that typically occur on this website and elsewhere. (Not tonight though — I’ll be watching the first Presidential Debate until such time as I throw a brick through our TV when Donald Trump makes an inane statement.)

I’m getting old. I did read the Wilcox paper last night. Then I read the recent stuff in the borehole at RC, which is oddly often more informative than the comments. Found Girma there! Xie is not without expertise on aerosols.

sorry, late to the party 😉 Attended the 1.5C conference last week (obviously, given that I’m working at ECI on that very stuff ;-)) and travelled over the weekend.

JCH, it’s exactly what BBD said. There is no mono-causality. The tropical Pacific is a mediator and a good proxy for deviations from expected trends in some periods, but not in others. These periods are, however, never longer than 10-15 years. If you look at Fig 1a in Kosaka and Xie 2016, there is absolutely no warming between 1940-1970 even after “correction” for the tropical Pacific. Clearly, the lack of warming back then is due to anthropogenic aerosols, in perfect agreement with Wilcox et al. 2013.

Further, if you discard the observations between 1941-1945 (in fact, ERSSTv4 is probably biased cold all the way between 1900-1970, except 1940-1945, for some reason: HadSST vs ERSST), the pre-1940 warming is much more gentle and there is certainly no cooling whatsoever between 1945-50. What we are left with is a few short periods where the tropical Pacific does indeed play a noticeable role, with the post-1998 (until 2014) period the most pronounced. However, it is noteworthy that this particular anomaly in the tropical Pacific may well have been forced by Asian aerosols to some degree (Takahashi and Watanabe 2016).