‘Tis the season, it would seem, for questioning the scientific method.

You might recall that back in October, I was a bit miffed by an article in The Atlantic entitled Lies, Damned Lies, and Medical Science and expressed my annoyance in one of my typical logorrheic posts. Then, a mere couple of weeks later, Steve Simon wrote a rather scathing criticism of the very concept of science-based medicine, which I ended up answering, again in my usual inimitable logorrheic fashion. Unfortunately, these things often come in threes. Well, maybe not always threes. It’s not as though this “rule” is anything like the count for the Holy Hand Grenade of Antioch, where “Four shalt thou not count, nor either count thou two, excepting that thou then proceed to three. Five is right out.” Except that five isn’t always right out when it comes to these sorts of criticisms of science and/or science-based medicine.

But enough of my pathetic attempt to channel Mark Crislip. The third count in articles expressing skepticism of the scientific method and science-based medicine comes, for purposes of my discussion, in the form of an article in The New Yorker by Jonah Lehrer entitled The Truth Wears Off: Is There Something Wrong With the Scientific Method? Unfortunately, the full article is available only to subscribers. Fortunately, a reader sent me a PDF of the article; otherwise, I wouldn’t have bothered to discuss it. Also, Lehrer himself has elaborated a bit on questions asked of him since the article’s publication and published fairly sizable excerpts from his article here and here. In any case, I’ll try to quote as much of the article as I think I can get away with without violating fair use, and those of you who don’t have a subscription to The New Yorker might just have to trust my characterization of the rest. It’s not an ideal situation, but it’s what I have to work with.

The decline effect

I’m going to go about this in a slightly different manner than one might normally expect. First, I’m going to quote a few sentences from near the end of the article right here at the beginning, because you’ll rapidly see why those of us here at SBM might find them provocative, perhaps even a gauntlet thrown down. Before I do that, I should define the topic of the article, namely something that has been dubbed “the decline effect.” Basically, this is a term for a phenomenon in which initial results from experiments or studies of a scientific question are highly impressive but, over time, become less so as the same investigators and other investigators try to replicate the results, usually as a means of building on them. In fact, Googling “the decline effect” brought up an entry from The Skeptic’s Dictionary, in which the decline effect is described thusly:

The decline effect is the notion that psychics lose their powers under continued investigation. This idea is based on the observation that subjects who do significantly better than chance in early trials tend to do worse in later trials.

In his article, Lehrer actually does cite paranormal research by Joseph Banks Rhine in the 1930s, whose testing of a self-proclaimed psychic demonstrated lots of “hits” early on, far more than were likely to be due to random chance. Initial results appeared to support the existence of extrasensory perception (ESP). However, as further testing progressed, the number of hits fell over time, hence the term “decline effect” being coined to describe it. Lehrer spends the bulk of his article describing examples of the decline effect, discussing potential explanations for this observation, and trying to argue that the effect can be generalized to nearly all of science. Longtime readers of SBM would probably not find that much particularly irksome or objectionable in his article; that is, until we get to the final paragraph:

Such anomalies demonstrate the slipperiness of empiricism. Although many scientific ideas generate conflicting results and suffer from falling effect sizes, they continue to get cited in the textbooks and drive standard medical practice. Why? Because these ideas seem true. Because they make sense. Because we can’t bear to let them go. And this is why the decline effect is so troubling. Not because it reveals the human fallibility of science, in which data are tweaked and beliefs shape perceptions. (Such shortcomings aren’t surprising, at least for scientists.) And not because it reveals that many of our most exciting theories are fleeting fads and will soon be rejected. (That idea has been around since Thomas Kuhn.) The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that’s often not the case. Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.

As you might imagine, this passage rather irritated me with what appears on the surface to border on a postmodernist rejection of the scientific method as “just another way of knowing” in which we as scientists have to “choose what to believe.” Moreover, it certainly seems that many of the examples provided by Lehrer are compelling and curious examples of effect sizes declining in a variety of scientific areas. What’s not quite as compelling is the way Lehrer, whether intentionally or inadvertently, gives the impression (to me, at least) of painting the decline effect as some sort of mysterious and unexplained phenomenon that isn’t adequately accounted for by the various explanations he describes in his article and that therefore casts serious doubt on the whole enterprise of science in general and SBM in particular, given that many of his examples come from medicine. In all fairness, Lehrer did later try to justify the way he concluded his article. To boil it all down, Lehrer equivocated that all he meant by the above passage was that science is “a lot messier” than experiments, clinical trials, and peer review and that “no single test can define the truth.” No kidding. (The snark in me might also say that science itself can’t actually define “The Truth.”) But if that’s all that Lehrer really meant, then why didn’t he just say so in the first place instead of sounding all postmodernist, as though science can’t ever make any conclusions that are any more valid than “other ways of knowing”?

So which examples does Lehrer choose to bolster his case that the decline effect is a serious and underrecognized problem in science? He uses quite a few, several from medical sciences (in particular psychiatry), starting the article out with the example of second-generation antipsychotics, such as Zyprexa, which appeared to be so much more effective than older antipsychotics in earlier studies but whose efficacy has recently been called into question, as more recent studies have shown lower levels of efficacy, levels that are no better than the older drugs. Of course, Lehrer seems never to have heard of the “dilution effect,” whereby new drugs, once approved, are tried in larger and broader ranges of conditions and patients, in particular, in patients with milder cases of the diseases for which the drugs were designed. Over time, this frequently results in the appearance of declining efficacy, when in reality all that is happening is that physicians and scientists are pushing the envelope, testing the drugs in patients who are less carefully selected than patients in the early trials. No real mystery here.
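This “dilution” dynamic is easy to see in a toy simulation (all numbers here are made up purely for illustration): give a hypothetical drug a large benefit in severe disease and a small one in mild disease, then compare a trial that enrolls mostly severe patients with one that enrolls a broader, milder mix.

```python
import random

random.seed(42)

def simulate_trial(severe_fraction, n=10_000):
    """Mean measured benefit in a trial mixing severe and mild patients.
    Hypothetical numbers: ~10 points of benefit in severe disease,
    ~2 points in mild disease, plus patient-to-patient noise."""
    total = 0.0
    for _ in range(n):
        severe = random.random() < severe_fraction
        true_benefit = 10.0 if severe else 2.0
        total += random.gauss(true_benefit, 3.0)
    return total / n

# Early trials enroll mostly severe, carefully selected patients;
# once approved, the drug is used in a much milder population.
early = simulate_trial(severe_fraction=0.9)
late = simulate_trial(severe_fraction=0.3)

print(f"early-trial effect: {early:.1f} points; broader-use effect: {late:.1f} points")
```

The drug’s true benefit in each kind of patient never changes; only the case mix does, yet the measured effect size “declines” by roughly half.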

Another example came from evolutionary biology, specifically observations on fluctuating symmetry. This passage is taken from a blog post quoting Lehrer’s article:

In 1991, the Danish zoologist Anders Møller, at Uppsala University, in Sweden, made a remarkable discovery about sex, barn swallows, and symmetry. It had long been known that the asymmetrical appearance of a creature was directly linked to the amount of mutation in its genome, so that more mutations led to more “fluctuating asymmetry.” (An easy way to measure asymmetry in humans is to compare the length of the fingers on each hand.) What Møller discovered is that female barn swallows were far more likely to mate with male birds that had long, symmetrical feathers. This suggested that the picky females were using symmetry as a proxy for the quality of male genes. Møller’s paper, which was published in Nature, set off a frenzy of research. Here was an easily measured, widely applicable indicator of genetic quality, and females could be shown to gravitate toward it. Aesthetics was really about genetics.

In the three years following, there were ten independent tests of the role of fluctuating asymmetry in sexual selection, and nine of them found a relationship between symmetry and male reproductive success. It didn’t matter if scientists were looking at the hairs on fruit flies or replicating the swallow studies—females seemed to prefer males with mirrored halves. Before long, the theory was applied to humans. Researchers found, for instance, that women preferred the smell of symmetrical men, but only during the fertile phase of the menstrual cycle. Other studies claimed that females had more orgasms when their partners were symmetrical, while a paper by anthropologists at Rutgers analyzed forty Jamaican dance routines and discovered that symmetrical men were consistently rated as better dancers.

Then the theory started to fall apart. In 1994, there were fourteen published tests of symmetry and sexual selection, and only eight found a correlation. In 1995, there were eight papers on the subject, and only four got a positive result. By 1998, when there were twelve additional investigations of fluctuating asymmetry, only a third of them confirmed the theory. Worse still, even the studies that yielded some positive result showed a steadily declining effect size. Between 1992 and 1997, the average effect size shrank by eighty per cent.

And it’s not just fluctuating asymmetry. In 2001, Michael Jennions, a biologist at the Australian National University, set out to analyze “temporal trends” across a wide range of subjects in ecology and evolutionary biology. He looked at hundreds of papers and forty-four meta-analyses (that is, statistical syntheses of related studies), and discovered a consistent decline effect over time, as many of the theories seemed to fade into irrelevance. In fact, even when numerous variables were controlled for — Jennions knew, for instance, that the same author might publish several critical papers, which could distort his analysis—there was still a significant decrease in the validity of the hypothesis, often within a year of publication. Jennions admits that his findings are troubling, but expresses a reluctance to talk about them publicly. “This is a very sensitive issue for scientists,” he says. “You know, we’re supposed to be dealing with hard facts, the stuff that’s supposed to stand the test of time. But when you see these trends you become a little more skeptical of things.”

Jennions’ article was entitled Relationships fade with time: a meta-analysis of temporal trends in publication in ecology and evolution. Reading the article, I was actually struck by how relatively small, at least compared to the impression that Lehrer gave in his article, the decline effect in evolutionary biology was found to be in Jennions’ study. Basically, Jennions examined 44 peer-reviewed meta-analyses and analyzed the relationship between effect size and year of publication; the relationship between effect size and sample size; and the relationship between standardized effect size and sample size. To boil it all down, Jennions et al. concluded, “On average, there was a small but significant decline in effect size with year of publication. For the original empirical studies there was also a significant decrease in effect size as sample size increased. However, the effect of year of publication remained even after we controlled for sampling effort.” They judged publication bias to be the “most parsimonious” explanation for this declining effect.

Personally, I’m not sure why Jennions was so reluctant to talk about such things publicly. You’d think from his responses in Lehrer’s interview that scientists would be coming for him with pitchforks, hot tar, and feathers if he dared to point out that effect sizes reported by investigators in his scientific discipline exhibit small declines over the years due to publication bias and the bandwagon effect. Perhaps it’s because he’s not in medicine; after all, we’ve been speaking of such things publicly for a long time. Indeed, we generally expect that most initially promising results, even in randomized trials, will not ultimately pan out. In any case, those of us in medicine who might not have been willing to talk about such phenomena became more than willing after John Ioannidis published his provocatively titled article Why Most Published Research Findings Are False around the time of his study Contradicted and Initially Stronger Effects in Highly Cited Clinical Research. Physicians and scientists are generally aware of the shortcomings of the biomedical literature. Most, but sadly not all of us, know that early findings that haven’t been replicated yet should be viewed with extreme skepticism and that we can become more confident in results the more they are replicated and built upon, particularly if multiple lines of evidence (basic science, clinical trials, epidemiology) all converge on the same answer. The public, on the other hand, tends not to understand this.

John Ioannidis, Jonah Lehrer, the decline effect, and me

Of course, the work of John Ioannidis, as discussed here before, most recently by myself, provides an excellent framework to understand why effect sizes appear to decline over time. Although Ioannidis has been criticized for exaggerating the extent of the problem and even using circular reasoning, for the most part I find his analysis compelling. In medicine, in particular, early reports tend to be smaller trials and experiments that, because of their size, tend to be more prone to false positive results. Such false positive results (or, perhaps, exaggerated results that appear more positive than they really are) generate enthusiasm, and more investigators pile on. There’s often a tendency to want to publish confirmatory papers early on (the “bandwagon effect”), which might further skew the literature too far towards the positive. Ultimately, larger, more rigorous studies are done, and these studies result in a “regression to the mean” of sorts, in which the newer studies fail to replicate the large effects seen in earlier results. This is nothing more than what we’ve been writing right here on SBM ever since its inception, namely that the normal course of clinical research is to start out with observations from smaller studies, which are inherently less reliable because they are small and thus more prone to false positives or exaggerated effect sizes.
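A minimal sketch of this dynamic, with hypothetical numbers: simulate many small early trials of an intervention with a modest but real effect, “publish” only the statistically significant ones, and compare their average to a single later, adequately powered trial. The selection step alone inflates the early literature; the big trial’s “decline” is just regression back to the truth.

```python
import random
from statistics import mean

random.seed(0)
TRUE_EFFECT = 0.2  # modest true standardized effect (hypothetical)

def trial_estimate(n):
    """Observed mean effect in a trial of n patients (unit-variance outcome)."""
    return mean(random.gauss(TRUE_EFFECT, 1.0) for _ in range(n))

# 200 small early trials of 25 patients each; "publish" only those whose
# estimate clears a one-sided p < 0.05 z-test (1.645 standard errors above 0).
small_n = 25
threshold = 1.645 / small_n ** 0.5
published = [est for est in (trial_estimate(small_n) for _ in range(200))
             if est > threshold]

# One later, adequately powered confirmatory trial.
big_trial = trial_estimate(10_000)

print(f"average published early effect: {mean(published):.2f}")
print(f"large confirmatory trial:       {big_trial:.2f}")
```

Nothing about the intervention changed between the early and late studies; the filter on what got “published” did all the work of creating an apparent decline.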

In his article, Lehrer blames in essence three things for the decline effect: publication bias, selective reporting, and the culture of science, which contributes to the proliferation of the first two problems. Publication bias has been discussed here on SBM on multiple occasions and in various contexts. Basically, it’s the phenomenon in which there is a marked bias towards the publication of “positive” data; in other words, negative studies tend not to be reported as often or tend to end up being published in lower tier, lower “impact” journals. To Lehrer, however, publication bias is not adequate to explain the decline effect because, according to him:

While publication bias almost certainly plays a role in the decline effect, it remains an incomplete explanation. For one thing, it fails to account for the prevalence of positive results among studies that never even get submitted to journals. It also fails to explain the experience of people like Schooler, who have been unable to replicate their initial data despite their best efforts.

This is what is known as being (probably) right for the wrong reasons. I would certainly agree that publication bias is probably an incomplete explanation for the decline effect, although I would be very curious about the prevalence of positive results among studies that never get submitted to journals; it’s pretty darned rare, in my experience, for positive results not to be submitted for publication unless there are serious flaws in the studies with positive results or some other mitigating circumstance takes hold, such as the death of the principal investigator, a conflict over the results between collaborating laboratories, or a loss of funding that prevents the completion of necessary controls or additional experiments. If Lehrer has evidence that my impression (that failure to publish positive results is rare) is wrong, he does not present it.

I would also argue that Lehrer is probably only partially right (and makes a huge assumption to boot) when he argues that publication bias fails to explain why individual investigators can’t replicate their own results. Such investigators, it needs to be remembered, initially published highly positive results. When they have trouble showing effect sizes as large and seemingly robust as their initial results, doubt creeps in. Were they wrong the first time? Will reviewers give them a hard time because their current results do not show the same effect sizes as their original results? They hold back. True, this is not the same thing as publication bias, but publication bias contributes to it. A journal’s peer reviewers are probably going to give an investigator a much harder time over a result showing a smaller effect size if there is previously published data showing a much larger effect size; better journals will be less likely to publish such a result, and investigators know it. Consequently, publication bias and selective reporting (the investigator holding back the newer, less compelling results, knowing how unlikely they are to be published in a top-tier journal) reinforce each other. Other investigators, not invested in the original investigator’s initial highly positive results, are less likely to hold back, and, indeed, there may even be an incentive to try to disprove a rival’s results.

Lehrer makes a good point when he points out that there is such a thing as selective reporting, wherein investigators tend to be less likely to report findings that do not fit into their current world view and might even go so far as to try to shoehorn findings into the paradigm they currently favor. He even goes so far as to give a good example of cultural effects on selective reporting, specifically the well-known tendency of studies of acupuncture from China to be far more likely to report positive results than studies of acupuncture done in “Western” nations. He points out that this discrepancy “suggests that scientists find ways to confirm their preferred hypothesis, disregarding what they don’t want to see.” Or, as Simon and Garfunkel once sang in The Boxer, “a man hears what he wants to hear and disregards the rest.” It is not surprising that scientists would share this quality with their fellow human beings, but it is devilishly difficult to identify and quantify such biases. That, of course, doesn’t stop proponents of pseudoscience from crying “bias!” whenever their results are rejected by mainstream science.

Lehrer also leaves out another well-described phenomenon, namely the effect of a research topic’s sheer popularity on the reliability of the findings published about it. As the authors of one analysis of this “popularity effect” put it:

In this context, a high popularity of research topics has been argued to have a detrimental effect on the reliability of published research findings [2]. Two distinctive mechanisms have been suggested: First, in highly competitive fields there might be stronger incentives to “manufacture” positive results by, for example, modifying data or statistical tests until formal statistical significance is obtained [2]. This leads to inflated error rates for individual findings: actual error probabilities are larger than those given in the publications. We refer to this mechanism as “inflated error effect”. The second effect results from multiple independent testing of the same hypotheses by competing research groups. The more often a hypothesis is tested, the more likely a positive result is obtained and published even if the hypothesis is false. Multiple independent testing increases the fraction of false hypotheses among those hypotheses that are supported by at least one positive result. Thereby it distorts the overall picture of evidence. We refer to this mechanism as “multiple testing effect”. Putting it simple, this effect means that in hot research fields one can expect to find some positive finding for almost any claim, while this is not the case in research fields with little competition [1], [2].

I discussed the implications of this paper in my usual nauseating level of detail here. Suffice it to say, the more scientists there are working on a problem, the more false positives there are likely to be, but, as the field matures, there is a regression to the mean. Also, don’t forget that initial exciting results are often published in the “highest” impact journals, publication in which can really make a scientist’s career take off. However, because these results are the most provocative and might even challenge the scientific consensus strongly, they also have a tendency to turn out later to be wrong. Leaving out this aspect is a major weakness in Lehrer’s analysis, particularly given that each of the examples he provided could easily have a major component of the “popularity effect” going on.
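The “multiple testing effect” quoted above is also easy to quantify: if k independent groups each test the same false hypothesis at a significance threshold of 0.05, the chance that at least one of them obtains a publishable “positive” result is 1 - (1 - 0.05)^k.

```python
# Probability that at least one of k independent tests of a false
# hypothesis reaches "significance" at level alpha purely by chance.
def prob_at_least_one_false_positive(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

for k in (1, 5, 14, 50):
    p = prob_at_least_one_false_positive(k)
    print(f"{k:3d} independent tests -> {p:.0%} chance of a spurious positive")
```

With fourteen groups chasing the same false hypothesis, the odds are already better than even that at least one of them will find something “significant” to publish, and in a genuinely hot field with fifty groups it is a near certainty.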

The bottom line: Is the scientific method unreliable?

As I read Lehrer’s article, I was troubled. No, I wasn’t troubled because the implications of his article were somehow shaking my view of the reliability of science and the scientific method. I certainly wasn’t troubled by his discussing known problems with how science is practiced by fallible human beings, how it almost always isn’t done completely according to the idealized version of the scientific method taught to us in high school. After all, I’ve discussed the problems of publication bias and deficiencies in the peer review system seemingly ad nauseam. Rather, I was troubled by the final paragraph, quoted above, in which Lehrer seems to be implying, if not outright arguing, that science is nothing more than competing narratives between which scientists must choose, each of them not particularly well supported by data. Jerry Coyne nails it when he comments:

But let’s not throw out the baby with the bathwater. In many fields, especially physics, chemistry, and molecular biology, workers regularly repeat the results of others, since progress in their own work demands it. The material basis of heredity, for example, is DNA, a double helix whose sequence of nucleotide bases codes (in a triplet code) for proteins. We’re beginning to learn the intricate ways that genes are regulated in organisms. The material basis of heredity and development is not something we “choose” to believe: it’s something that’s been forced on us by repeated findings of many scientists. This is true for physics and chemistry as well, despite Lehrer’s suggestion that “the law of gravity hasn’t always been perfect at predicting real-world phenomena.”

Lehrer, like Gould in his book The Mismeasure of Man, has done a service by pointing out that scientists are humans after all, and that their drive for reputation—and other nonscientific issues—can affect what they produce or perceive as “truth.” But it’s a mistake to imply that all scientific truth is simply a choice among explanations that aren’t very well supported. We must remember that scientific “truth” means “the best provisional explanation, but one so compelling that you’d have to be a fool not to accept it.” Truth, then, while always provisional, is not necessarily evanescent. To the degree that Lehrer implies otherwise, his article is deeply damaging to science.

Indeed. There is no such thing as scientific “truth,” actually. In fact, one thing I noticed right away in Lehrer’s article is that the examples he chose were, by and large, taken from either psychology, parapsychology, or ecology, rather than physics and chemistry. True, he did point out how most gene association studies with diseases thus far have not been confirmed and how different groups find different results, but the search for such associations is a field that is currently popular, not a mature one. According to the “popularity effect,” it is not surprising that there is currently a lot of “noise” out there in terms of scientific results. Over the next decade, it is very likely that many of these questions and disagreements will be sorted out scientifically.

Finally, Lehrer’s view also seems not entirely consistent in some ways. I’ll show you what I mean. On his blog, as I mentioned before, Lehrer answers reader questions and expands upon his ideas a bit. A reader asks Lehrer, “Does this mean I don’t have to believe in climate change?” Lehrer’s response is, basically, that “these are theories that have been verified in thousands of different ways by thousands of different scientists working in many different fields,” which is, of course, true, but almost irrelevant given Lehrer’s previous arguments. After all, even though I accept the scientific consensus regarding anthropogenic global warming, if publication bias and selective reporting can so distort science for so long in other fields, I have to ask how Lehrer justifies accepting the science of global warming. One way is that he quite correctly points out that the “truths” of science (I really hate using that word with respect to science) depend upon the strength of the “web” supporting them, namely the number of interconnections. We make that very argument here ourselves time and time again against pseudoscience such as, for example, homeopathy. However, if, as Lehrer seems to be arguing, scientists already put their results into the context of what is known before, isn’t he just basically arguing for doing what we are already doing, even though he has just criticized science for being biased by selective reporting rooted in scientists’ existing preconceptions?

Although Lehrer makes some good points, where he stumbles, from my perspective, is when he appears to conflate “truth” with science or, more properly, accept the idea that there are scientific “truths,” even going so far as to use the word in the title of his article. That is a profound misrepresentation of the nature of science, in which all “truths” are provisional and all “truths” are subject to revision based on evidence and experimentation. The decline effect, or, as Lehrer describes it in the title of his article, the “truth wearing off,” is nothing more than science doing what science does: correcting itself.

48 thoughts on “The “decline effect”: Is it a real decline or just science correcting itself?”

As I was starting to read this post, the tires squealed as my brain slammed on the brakes after reading:
“Steve Simon wrote a rather scathing criticism”

You really think his blog was scathing? I can’t pick out what he wrote that warranted that description as it was rather tepid to me.
I’m not telling you how to write, but using loaded words places the bottle of poison on the edge of the well while sitting next to the boy who cried wolf.

Any studies of Zyprexa would be in the realm of psychiatry, not psychology.

Another reason I would expect to see falling effect sizes is that when an effective treatment is discovered for one thing in one population, work is immediately undertaken to see if it’s effective for other things in other populations.

I would expect to see inclusion criteria expand over time, another reason to see a dropping effect size.

Both Ioannidis’ and Lehrer’s papers are important discussion points for both researchers and those who depend upon that research.

The challenge I have had with Ioannidis’ paper is dealing with people touting pseudoscience. In several discussion boards I am on, his paper has come up as a reason ‘mainstream’ science is wrong; ergo their brand of pseudoscience must be correct.

His conclusions have been made widely available across the web. I had come across Lehrer’s article and I can just imagine the impact it will have on the woo-meisters.

I find these nuanced discussions particularly difficult to explain to people who have a bias against science in the first place.

Science-Based Medicine is one of the first places I turn for assistance.

Draal, I thought the Simon article was scathing too. I attributed much of that to the “on the internet no one knows that you are a dog” effect, that because communication on the internet is slow and difficult that people increase the “contrast” to make sure their point gets across. To get heard on the internet you have to SHOUT and that shouting is easy to misinterpret. When it is misinterpreted people try to justify what they said by shouting even louder. That is how flame wars can get started and perpetuated. It takes someone quite secure in their position and even tempered (as Dr Gorski is), to not respond in kind.

I think the Lehrer article is similar, but I have not read the article, only snippets and reviews.

The problem isn’t with science, the problem is with the human foibles of scientists and the constraints that other humans impose on them to allow them to do science. The goal in science is to do science and to understand reality as it is. The goal of humans is to become more popular, to attain greater wealth, influence and kudos and to move up in the social hierarchy. So what is the goal of human scientists? To move up the social hierarchy by doing science. One can only move up in a social hierarchy when your peers move you up. In science that can only happen by doing science that is popular and exciting and that science is usually fad driven.

I do find it interesting that when non-scientists like Lehrer discuss the problem, the “solution” seems to be to put less reliability on “science”, less reliability on facts and logic and more on the “social aspects” of the doing of science, that maybe there are other ways of knowing that should be listened to. I see this approach as an artifact of why journalists write articles, to move up the social hierarchy of journalism and so attain wealth, influence and kudos. An easy way to do that is by pulling another social hierarchy down, in this case the hierarchy of scientists.

The funding of science is quite problematic. Scientists want to be popular and want to be funded, non-scientists want to fund only the most popular science. Scientists want to work on popular science. Facts and logic have nothing to do with popularity.

Doing science is a solitary endeavor. Scientists do their science by themselves in their own mind-space. They can and should interact and can learn much faster and discard wrong stuff much easier by interacting with other scientists, but those interactions are social interactions, and so the human social interaction paradigms come into play and it is very hard for a scientist at the top of the scientist hierarchy to listen to someone at the bottom, even when the scientist at the bottom is correct. It is also difficult for a scientist at the bottom to tell the scientist at the top that he/she is wrong. This was one of the reasons that Niels Bohr sought out Feynman for discussions even though Feynman was 33 years younger. Feynman didn’t let Bohr’s stature as a physicist prevent him from voicing disagreement.

Non-scientists want to have a short cut by which they can evaluate scientific findings without understanding the science. This is not possible, and any attempt to do so will fail. Unfortunately the reason non-scientists have for wanting to evaluate scientific findings (to move up in a social hierarchy) can be successful even while using pseudoscience, denialism or even magical thinking. AGW, evolution, germ theory are all extremely reliable scientific findings. There is no reliable data that calls them into question. But calling them into question does bring people wealth, influence and kudos. It should not be a surprise that people adopt positions that bring them wealth, influence and kudos even when those positions do not correspond with reality. The positions are not about reality, they are about social power.

“Whenever a commenter tells me he’s ‘not telling me how to write,’ it’s 99% certain that he either just did or is just about to.”

Just an observation, and I don’t see any reason to change my characterization.

I see. I wasn’t clear enough. Previously, a blogger said, “Simon has even written a very good deconstruction of postmodern attacks on evidence-based medicine (EBM).” The blogger followed up by saying that Simon’s follow-up rebuttal was “rather scathing criticism.” Whoa. How did it go from “very good” to “rather scathing”?
I asked for clarification because I’ve read the scathing post, and apparently I missed something. Since I don’t think the criticism was “rather scathing”, I considered whether a blogger was unfairly characterizing Simon’s post by crying wolf and/or poisoning the well. Surely I must be wrong, so I looked to a blogger for clarification.

A blogger surely must be relying on confirmation bias to arrive at a 99% probability regarding a commenter’s use of the words “not telling me how to write.” To avoid such confusion, then, a commenter should refrain from using that phrase in the future. A commenter will contemplate how to rephrase to get across the message of “please believe me when I say I’m not asking for the original text to change.”

Am I missing something? A “very good” previous post and a “scathing” post are not mutually exclusive. I’m sure that I’ve done “very good” previous posts followed by not-so-good posts, and I happen to think I’m a pretty good writer. In any case, remember, in my original post about Simon’s criticism, I was describing two different posts, and, in fact, I was contrasting the previous “very good” post with the main post.

Geez, quite frankly, I think you’re being incredibly nitpicky here about a peripheral topic, borderline off-topic even, which is why I will say: I stand by both of my characterizations, and that’s all I will say on that score.

Having read the full article, I must say I was utterly shocked by the stupidity of many of the claims. I am a scientist myself and believe I have a reasonable understanding of the concepts of p-values and probability. The New Yorker article was full of claims that are totally obvious if one understands what a p-value actually is. All the way through the article I was torn between wanting to hit the author with a statistics book and laughing out loud.

And contrary to Dr. Gorski’s suspicion, the author even speaks about ‘hard’ science, i.e. physics: Lehrer is in utter shock that once (!!) an experiment about gravity found a 2.5% discrepancy, from which he goes on to conclude that in the end it’s all about what we believe. Depending on the size of typical measurement errors, I would actually expect a fair fraction of experiments to show a >2.5% discrepancy, so I am almost surprised that only one experiment found one. But Lehrer seems to believe he will lift off the ground any time now, gravity having been “disproved.” Someone should hand the man a basic statistics book!
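The statistical intuition here can be sketched with a small Monte Carlo simulation. All of the numbers below, including the assumed 1.25% relative measurement error, are hypothetical, chosen only to illustrate the point:

```python
import random

random.seed(42)

TRUE_G = 6.674e-11          # the true value of the constant being measured
RELATIVE_ERROR = 0.0125     # assumed 1.25% relative measurement error (hypothetical)
N_EXPERIMENTS = 100_000

# Each simulated experiment reports the true value plus Gaussian measurement noise.
measurements = [random.gauss(TRUE_G, RELATIVE_ERROR * TRUE_G)
                for _ in range(N_EXPERIMENTS)]

# Fraction of experiments whose result deviates from the truth by more than 2.5%.
discrepant = sum(abs(m - TRUE_G) / TRUE_G > 0.025 for m in measurements)
fraction = discrepant / N_EXPERIMENTS
print(f"Fraction with >2.5% discrepancy: {fraction:.3f}")
```

Under this assumed error scale, a >2.5% discrepancy is just a two-sigma deviation, which turns up in roughly 4–5% of runs; a single such result says nothing about gravity being “disproved.”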

In general, I think the level of understanding of statistics in the general public, and in particular amongst journalists, is an absolute disaster, and this article is a shocking example of that lack of education. The frustrating thing is that readers will come to believe that the truth is whatever they want it to be: creationists, quacks, climate-change deniers, and many others will all feel much more comfortable in believing their pseudoscience. I have come to believe that the way to fight against pseudoscience is to teach people statistics.

Unless you throw it under the “culture of science,” I see another factor contributing to the “decline effect”: improved controls and methodology as the subject being studied becomes better understood and more rigorously studied.

For instance, the more one understands about paranormal claims such as how they are faked and how our various biases can affect perceptions of specific phenomena, the better one can design an experiment with improved methodology and tighter controls.

There are now better controls for testing acupuncture that at least tell us it doesn’t matter where you needle, or whether you needle at all versus just twisting and poking with toothpicks, in order to get the effect, thus (at a bare minimum) invalidating most of the traditional underlying “understanding” of acupuncture.

“Is the scientific method unreliable?” No, because it is that same method that is telling us about the decline. Science is self-correcting. It wouldn’t make a whole lot of sense to say that the scientific method doesn’t work and that we have scientific proof to support that position.

“In his article, Lehrer blames in essence three things for the decline effect: publication bias, selective reporting, and the culture of science, which contributes to the proliferation of the first two problems.”

Perhaps the clueless Lehrer failed to consider this possibility: later study protocols may sometimes be engineered to better eliminate confounding factors.

I’d just finished reading this article in my New Yorker and was wondering when I’d see someone here take it on.
My impression was that it’s actually not a bad article, and for a non-scientist reader makes a good case for just how hard science actually is.
But that very last sentence, “When the experiments are done, we still have to choose what to believe,” seems like it was tacked on by an editor.

From what I know, the gravity constant problem he is talking about is the following:

At some point, one group measured odd deviations from the gravitational constant. Other groups tried to replicate this and found anomalies, but could not confirm the initial finding. Perhaps surprisingly, measuring the gravitational constant is extremely difficult. I have seen one of those experiments, and they are very, very complicated. All kinds of things can have nasty effects: for example, snow lying on the roof of the building where the experiment is housed can make it seem like gravity is weakening, and if any electromagnetic forces ‘leak’ into the experiment, the measurement will turn out wrong. Given all this, I would say the answer to this ‘mystery’ is that physicists are having trouble perfecting the experiments and are underestimating their errors.

I am not completely sure about the other physics example (neutron decay time) he mentions, but without a doubt measuring such constants is extremely complex; usually people start out with ‘simple’ experiments with strong systematics and perfect them over time.

I am not saying that there might not be a problem with those values, but the examples he mentions simply show that experimental physics is very difficult and that building experiments is not a simple task.

I have often read Lehrer’s columns, especially when he was blogging on ScienceBlogs, and am usually impressed by how subtle and interesting they are. So I give him the benefit of the doubt here and think that he has simply not explained himself well enough.

OTOH, it’s in the bloody New Yorker! Oof. Guess they needed to supplement Malcolm Gladwell’s nonsense.

I don’t think there’s any way to avoid some positive bias in publication, particularly the early publications on a topic.

If I have a novel theory, and I do some experiments to test it, and the experiments disconfirm the theory, there is simply no way that I can get it published. Nobody wants to read evidence that a theory that never occurred to them is wrong. If I’m lucky, I may be able to slip the results into a paper on some related topic, perhaps as supplementary data. But I imagine that most scientists have data like this languishing in a drawer somewhere.

Of course, once a hypothesis is established, results that challenge it become publishable.
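This selection mechanism is enough, on its own, to produce a “decline effect.” Here is a minimal sketch with entirely hypothetical numbers: many labs estimate a modest true effect, but only estimates that reach p < 0.05 make it into the early literature.

```python
import random
from math import sqrt

random.seed(1)

TRUE_EFFECT = 0.2           # hypothetical true effect size
N_PER_STUDY = 30
SE = 1 / sqrt(N_PER_STUDY)  # standard error of each study's estimate (sigma = 1)
N_STUDIES = 2000

# Each study observes the true effect plus sampling noise.
estimates = [random.gauss(TRUE_EFFECT, SE) for _ in range(N_STUDIES)]

# Early literature: only studies significant at p < 0.05 (|z| > 1.96) are published.
published = [e for e in estimates if abs(e) / SE > 1.96]

mean_published = sum(published) / len(published)
mean_all = sum(estimates) / len(estimates)

print(f"true effect:           {TRUE_EFFECT:.2f}")
print(f"mean of all studies:   {mean_all:.2f}")
print(f"mean published effect: {mean_published:.2f}")
```

The mean published effect comes out well above the true effect, while the mean over all studies sits right at it. Later replications, which no longer have to pass the significance filter, will on average “decline” back toward the truth even though nothing about the phenomenon has changed.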

Moreover, “clean” negative results with tight confidence limits are not that common. More commonly, the “negative” experiment is actually inconclusive.

I had an interesting hypothesis, and did some experiments that turned out to be positive, with p < 0.05. But because the experiment was pivotal, I wanted a stronger result, so I repeated the study. The replication had a lot of noise, and did not show a significant effect. The pooled data supported the hypothesis with a p of 0.3. Now what can I do with that? It’s not negative, but it’s too weak to even call it a “trend.” Nobody will publish that; I wouldn’t want to read it myself.

I can’t just throw out the later data, because that would bias the results; aside from a large standard deviation, there isn’t anything wrong with it. Besides, I’ve learned to be wary of data that is too pretty. I’ve been led astray before by a single result that looked too clean to be wrong, but (after lengthy follow-up studies) turned out to be incorrect.

So the result goes into the drawer until I am able to find a better experimental approach to the same question. As it happens, I ultimately managed to confirm the hypothesis by another method. Too bad I hadn’t settled for p < 0.05 (although I still think the decision was correct in principle).
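This predicament can be reproduced with a simple worked example. The means, sample sizes, and known unit standard deviation below are hypothetical, chosen only to show how a significant first run plus a noisy replication can pool into a non-significant result:

```python
from math import sqrt
from statistics import NormalDist

def z_test_p(mean, n, sigma=1.0):
    """Two-sided p-value for H0: true mean = 0, known sigma (one-sample z-test)."""
    z = mean * sqrt(n) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical runs: a clean first experiment and a noisy replication.
p1 = z_test_p(mean=0.50, n=20)   # first run
p2 = z_test_p(mean=0.05, n=20)   # replication

# Pooling all 40 observations dilutes the effect estimate.
pooled_mean = (0.50 * 20 + 0.05 * 20) / 40
p_pooled = z_test_p(mean=pooled_mean, n=40)

print(f"run 1:  p = {p1:.3f}")
print(f"run 2:  p = {p2:.3f}")
print(f"pooled: p = {p_pooled:.3f}")
```

The first run clears p < 0.05, the replication comes nowhere near it, and the pooled data lands in the inconclusive middle: exactly the kind of result that ends up in a drawer rather than a journal.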

The “bandwagon” effect is similar to my first thought, but I came at it from the other end.

While there is a tendency for lots of researchers to jump in and repeat a new discovery, there is also a tendency for an established discovery to later be pushed to its reasonable limits. Once a basic principle like symmetry attraction is established in bird wings, researchers will later attempt to apply it in more aggressive ways, such as linking physical symmetry to scent attraction. Those are not the same hypothesis, and the second is clearly an attempt to expand the principle rather than merely confirm it.

Reporting and publishing bias is also not just a matter of positive or negative results; it is a matter of avoiding boring results. Why should anyone carry out, or publish, an exact replica of an experiment which has already been confirmed several times? At that point, negative results are actually more interesting. Hence… we publish and read poorly-constructed articles which claim that “science” doesn’t work.

And that is where this comes in:

[Published by David Gorski]
Physicians and scientists are generally aware of the shortcomings of the biomedical literature. Most, but sadly not all of us, know that early findings that haven’t been replicated yet should be viewed with extreme skepticism

Taking an average of all published papers on a science subject is like taking the average reliability of information about Saudi Arabia from the CIA Factbook and Alex Jones’ Infowars. An idiot would then conclude that “The Internet” is a bad research tool because, on average, it is only half right. A smart person doesn’t use Alex Jones as a source, and their reliability rate for information retrieved from “The Internet” is much higher.

quote from linked page “The blog raises important issues and comments intelligently on them. I disagree with the need to distinguish between SBM and EBM. Maybe we should distinguish between EBM and PIEBM (Poorly Implemented Evidence Based Medicine).”

Lehrer:” And just because an idea can be proved doesn’t mean it’s true. ”
What the hell does that mean?

I think he mixes up proof with evidence and evidence with truth.

I think he meant to say: Just because an hypothesis is initially supported by the evidence of a positive clinical trial, doesn’t mean that, in the long run, the body of accumulating evidence will continue to support that hypothesis.

I think he meant to say: Just because an hypothesis is initially supported by the evidence of a positive clinical trial, doesn’t mean that, in the long run, the body of accumulating evidence will continue to support that hypothesis.

If that’s what he meant to say, he said it very, very badly, which is, of course, why I pointed out that he should have just said what he meant.

“Rather, I was troubled by the final paragraph, quoted above, in which Lehrer seems to be implying, if not outright arguing, that science is nothing more than competing narratives between which scientists must choose, each of them not particularly well supported by data.”

As a layman, that would not be my reading of the paragraph in question. My reading is that we should not unquestioningly accept the current scientific answers as a picture of “the truth”; we should take them with a grain of salt. We should avoid the belief that our discoveries are concrete and be willing to continually adjust them as new evidence is found.

To me this statement does not denigrate the scientific approach. It encourages laymen to understand the strengths and weaknesses of the scientific approach, and encourages the scientific community to continually question its assumptions, thereby being more scientific.

Also, am I missing where Lehrer uses the words “other ways of knowing,” or are you rephrasing and possibly injecting a new meaning into his words?

One cannot comprehend or have any trust in science unless one starts to understand its weaknesses and how and why science sometimes finds that it was mistaken and offers corrections. Yet it often seems that attempts to explain this process to laymen are attacked as anti-science, postmodern, or too obvious. It sometimes appears that scientists either don’t believe the public is intelligent enough to understand the intricacies of science or are just extremely defensive about glitches in the scientific process.

Sometimes it feels like there’s a science cop standing on the corner of every interesting scientific “accident” waving laymen away saying “Carry on Folks. Nothing to see here. Move along.”

But on a more positive note. I did enjoy the explanation of the decline effect. Dr Gorski, I think you have a particular talent for making difficult concepts understandable to non-science folks.

I’m not really sure why we would have such different readings of this article (paragraph), maybe our different perspectives…

Maybe I’ll have to go out and buy the magazine to see the whole article, maybe something not quoted is leading to the difference.

“One cannot comprehend or have any trust in science unless one starts to understand its weaknesses…”

… and I stopped dead in my tracks. Science, to my mind, is nearly without weakness. Science has an unrivaled record of recognizing, studying, illuminating, and ultimately chipping away at the huge block of human ignorance. Science never claims absolute truth, only an approximation of truth within a defined set of parameters.

“…and how and why science sometimes finds that it was mistaken and offers corrections.”

This is, of course, the single greatest strength of science – it is an evolutionary enterprise that thrives on inquiry rather than dogma.

Yes, exactly what I am saying. Science only continues to overcome the weaknesses of its components by understanding and accounting for them, AND that’s why it’s so good. You can’t comprehend or realistically trust science until you understand that its strength is its emphasis on evolutionary inquiry rather than dogma. You can’t incorporate science into a decision until you understand that your information is “the most acceptable interpretation of what we know thus far” rather than “the absolute truth”.

You can’t incorporate science into a decision until you understand that your information is “The most acceptable interpretation of what we know thus far” rather than “the absolute truth”.

What really trips you up is when you can’t distinguish between established science and interesting hypotheses.

Gravity is established science. You can choose to disregard it on the grounds that Newton’s laws are not absolute truth, but you’ll have a really, really hard time of it, because the fact of gravity is as close to absolute truth as we’re ever going to get.

That schizophrenia might be preventable with prenatal vitamins is an interesting hypothesis with some scientific support. You should understand the limitations of that characterization, sure. But the existence of tentative interpretations does not negate the fact of very, very solid science that cannot be treated as tentative.

Alison – “But the existence of tentative interpretations does not negate the fact of very, very solid science that cannot be treated as tentative.”

Also yes. But not my point. My point was that I did not read Lehrer’s statement to mean “there are difficulties in science that negate science”; I read it more to mean “these are the difficulties that science needs to consider,” leaving it somewhat open-ended as to what should be, or is being, done in science to address them.

As a layman, I do not think “slippery empiricism” equals “no skiing allowed!” It only makes me wonder how the slopes are navigated.

Dr Gorski addresses this in his article: “Physicians and scientists are generally aware of the shortcomings of the biomedical literature. Most, but sadly not all of us, know that early findings that haven’t been replicated yet should be viewed with extreme skepticism and that we can become more confident in results the more they are replicated and built upon, particularly if multiple lines of evidence (basic science, clinical trials, epidemiology) all converge on the same answer. The public, on the other hand, tends not to understand this.”

So, upon reflection, I could see how Lehrer may be criticized for leaving his “slippery empiricism” question open (I rather enjoy the openness, as it allows room for the reader to think about the answer rather than being fed it, but that’s personal), but I don’t read a suggestion that science should be disregarded because of “slippery empiricism” or because it’s “difficult to prove anything”.

“Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.”

He is stating very clearly that science is meaningless and that belief must always be a question of personal choice. I can’t see how he could make it any clearer.

“Just because an idea is true doesn’t mean it can be proved.”
Examples, please? … Oh yes, Russell’s teapot. Of course. It’s definitely out there but we have defined it in such a way that we can never prove it. See? Science is so inadequate.

“And just because an idea can be proved doesn’t mean it’s true.”
It took me a while to think what this might mean, then I recalled my high-school classmate using this example of (inductive? deductive?) logic to illustrate this very fact:
— Cats have four legs.
— Fido has four legs.
— Therefore, Fido is a cat.
A solid, incontrovertible proof, except that we all know that the truth is that Fido is a severely wounded spider. Proof is meaningless. Gravity may be proven, but that doesn’t make it true.

“When the experiments are done, we still have to choose what to believe.”
Interestingly, after this whole article on tentativeness, he states that we have to believe. Agnosticism is not an option.

No. We are compelled to act, but we do not have to believe. Belief and action are different things. If there is proof, we believe. If there is no proof, merely tentative evidence, we may choose to act on that evidence without believing anything has been proven.

A pregnant woman presented with tentative evidence that vitamin supplementation might prevent schizophrenia in her child doesn’t have to choose whether to believe it or not. She can know perfectly well that the odds of this hypothesis being true are on the order of 5% and then choose to take prenatal vitamins not because she believes they will do anything but because she thinks it’s plausible that they might… and because based on the evidence, she believes (to 99.5% certainty) that they won’t hurt.
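That reasoning is just an expected-value calculation, and it can be made explicit. Every number below is a hypothetical stand-in, not a real clinical estimate:

```python
# Hypothetical decision: act on weak evidence for a cheap, low-risk intervention.
p_hypothesis_true = 0.05   # assumed chance the prevention hypothesis is real
benefit_if_true = 100.0    # assumed value of preventing the outcome (arbitrary units)
p_harm = 0.005             # assumed chance of a side effect
cost_of_harm = 10.0        # assumed cost of that side effect (arbitrary units)

# Expected value of acting: probability-weighted benefit minus probability-weighted cost.
expected_value = p_hypothesis_true * benefit_if_true - p_harm * cost_of_harm
print(f"Expected value of acting: {expected_value:.2f}")
```

Under these assumed numbers, acting has positive expected value even though the actor thinks the hypothesis is probably false; belief and action come apart exactly as described.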

She also believes with utter certainty that if she jumps off a building she will fall, and I can say with utter certainty that she is correct.

I can’t see how that final paragraph can be interpreted in any other way than as making the acceptance of scientific evidence a faith-based, preference-based, or otherwise arbitrary choice.

Just because there is life in a distant solar system does not mean you can prove it. There are things that exist for which we have not yet gathered enough evidence to prove them, and it may be that we never will.

“And just because an idea can be proved doesn’t mean it’s true.”
This is vague; he could be referring to poor evidence, or to things such as mathematical fallacies or instances where statisticians skew results by using different parameters. To be fair, my reading was more like “just because an idea appears to be proved doesn’t mean it’s true.”

“When the experiments are done, we still have to choose what to believe.” When an experiment is finished we still have humans interpreting the results, checking them against our concept of reality, previous scientific consensus (bias, etc) and our brain’s possibly limited ability to comprehend the results…a grain of salt. I don’t see that as a reason to discard that interpretation of results.

That’s a general outline of my reading.

Now, I often assume that other people think pretty much the way I do (it’s just easier for me), but upon occasion that turns out not to be the case. So if no one else’s reading was similar to mine and most other people who are interested in science read the paragraph more along the lines of you (Alison) and Dr. Gorski, I will accept that this is one of those moments where I am just weird.

But Lehrer did clarify later, and it doesn’t seem his intentions were to discredit science, right?

Lehrer:
“And just because an idea can be proved doesn’t mean it’s true”.

BillyJoe:
“I think he meant to say: Just because an hypothesis is initially supported by the evidence of a positive clinical trial, doesn’t mean that, in the long run, the body of accumulating evidence will continue to support that hypothesis”.

David Gorski
“If that’s what he meant to say, he said it very, very badly, which is, of course, why I pointed out that he should have just said what he meant”.

I agree.
That smiley was meant to convey that, but perhaps it conveyed it very, very badly.

I did enquire rather jokingly about this, but I also read it as merely a loose use of words under the impetus of the rhetoric of the moment. Lehrer may well understand that he is using a distorted meaning of “proven”, and an absolute meaning of “true” that is rarely reached in the real world, in the sense of “with no exceptions” rather than merely “on the evidence available at this time”.

This last is how medicine functions, as David and others have pointed out. Medicine is a practical field often demanding immediate action. What else can we do but go on the best evidence at the time, even if it has weaknesses?

In consequence it can be extremely confusing to be thinking in terms of “truth” and “proof” in some medical contexts, even though a lot of other medical knowledge is so solid that it would be ridiculous to expect it to ever change materially.

It’s better to think in terms of “adequacy of evidence”, understanding also that within medical decision-making there is also always an unspoken “for this particular practical purpose”.

This is why we are forced to use apparently easily bent statistical rules of thumb when assessing the effectiveness of treatments of subjective complaints. This is the main area of concern within medicine, yet it would be impossibly expensive and time-consuming to go much further than we do towards “proof” or “truth”. We expect that the sheer number of studies eventually performed and the intense scrutiny to which they are subjected will eventually bring us “close to truth”.

It is so annoying when “alternative” circles use this kind of material as a reason for believing what they want. If science, as practiced and interpreted, has certain shortcomings, the real question is “as compared to what?”

“So if no one else’s reading was similar to mine and most other people who are interested in science read the paragraph more along the lines of you (Alison) and Dr. Gorski, I will accept that this is one of those moments where I am just weird. :)”

As a patent lawyer, I can confirm the article’s minor point about how very hard it is to find publications with negative results!

By the way, the discussion of

” Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.”

was fertile and provocative. However, would anyone agree that the most valid reason for the paragraph to exist is that it sounds good? What writer could resist it, or replace it with a more reasoned analysis?

Yes, I agree. Those two sentences appear to have been used more because they sound really, really good from a language and rhetorical standpoint, rather than because they conveyed the idea that Lehrer was apparently trying to convey based on his followup blog post. In fact, even I had to stand in awe of how cool a turn of phrase those two sentences are. If he had just avoided the cool-sounding rhetoric and said what, according to his later post, he apparently really meant, I bet there would have been far fewer complaints.

Even without those sentences, though, Lehrer’s article struck me as too nihilistic, implying that it’s so difficult for science ever to prove anything that in the end scientists are left choosing what to believe from a number of competing results, each of which, subject as it is to the decline effect, is not very compelling.

I read both the New Yorker and Atlantic articles and found them alarming, since I hadn’t been aware of the decline effect previously.

As I am a dedicated in-my-head ranter against belief in anecdotal medicine (homeopathy, chiropractic, autism from vaccines, and so on), I was not happy to find well written material casting doubt on the scientific method as applied to clinical medicine – because what else do we have?

Although sociological and psychological studies have always been easy to pick holes in, a really large, double-blind clinical trial ought to mean something, right? So this discussion has been a great relief to me. The decline effect may be real, but it is also a phenomenon that can be tamed by analysis and brought to heel.

I’d be interested in how it applies to “hard science”. Most of the discussions in both articles were about clinical and psychological studies. Lehrer did very briefly mention particle physics, but I’d like to see more about the decline effect in, say, synthetic chemistry.

“If he had just avoided the cool-sounding rhetoric and said what, according to his later post, he apparently really meant, I bet there would have been far fewer complaints.”

I agree that there is definitely poetic license going on there. I enjoyed it, and I think that the vagueness, along with the complaints and follow-up discussion, actually makes for a more compelling presentation of the ideas involved than if he had just told us exactly what he meant from the start.

I cannot say whether that was intentional or not. I would suggest that it was, because the title of the article is a question, which often suggests to me that the writer expects the reader to attempt to answer it. Discussion ensues. I could be wrong; maybe, as a writer, he just could not resist the turn of phrase.

I agree with Billy Joe and Michele on the interpretation of the author’s last paragraph. I don’t think it detracts from science’s value at all, though it perhaps knocks the pedestal slightly askew for some. It merely supports the author’s apparent belief that individuals should not remove their critical thinking hats when reading about science. While much of science’s “truths” are provisional and subject to change and that is well known to scientists, it is still news to much of the general public.

I’d like to quote a humorous gem of a book “Where are the customers’ yachts?” by Fred Schwed Jr. to help illustrate my point. Here’s the author on the problem with financial statistics “One can’t say that figures lie. But figures, as used in financial arguments, seem to have the bad habit of expressing a small part of the truth forcibly, and neglecting the other part…” I would add “at a particular moment in time” to Mr. Schwed’s use of “forcibly”.

Study results as explicated by scientific and medical journal papers and as reported on in the media are expressing a small part of a larger “truth” or “theory” forcibly and for a given moment in time. One analogy is the Pasteurian vs. modern understanding of bacteria. There is a vast difference between Pasteur’s discovery of individual free-floating planktonic bacteria and what is being learned today about the group structure of bacteria (ala biofilms) via work by Bonnie Bassler and others. What Pasteur discovered was a small part of a larger concept that is still not fully understood over 100 years later.

The “proof” provided by academic and research papers carries a certain weight, for many nonscientists that weight implies a type of solidity and finality that aren’t actually there, hence their dismay upon learning how provisional scientific “truths” often are.

So it takes time to see and recognize the whole elephant. But a working knowledge of an elephant’s foot and how not to stand underneath it has to be useful. It’s a continual process, shuttling between empirical and theoretical. Or is it lots of rooms and one hallway? Or could it be that too many metaphors spoil the point?