Reproducibility of findings is a core foundation of science. If scientific results only hold true in some labs but not in others, then how can researchers feel confident about their discoveries? How can society put evidence-based policies into place if the evidence is unreliable?

Recognition of this “crisis” has prompted calls for reform. Researchers are feeling their way, experimenting with different practices meant to help distinguish solid science from irreproducible results. Some people are even starting to reevaluate how choices are made about what research actually gets tackled. Breaking innovative new ground is flashier than revisiting already published research. Does prioritizing novelty naturally lead to this point?

Incentivizing the wrong thing?

One solution to the reproducibility crisis could be simply to conduct lots of replication studies. For instance, the scientific journal eLife is participating in an initiative to validate and reproduce important recent findings in the field of cancer research. The first set of these “rerun” studies was recently released and yielded mixed results. The results of 2 out of 5 research studies were reproducible, one was not and two additional studies did not provide definitive answers.

But there’s at least one major obstacle to investing time and effort in this endeavor: the quest for novelty. The prestige of an academic journal depends at least partly on how often the research articles it publishes are cited. Thus, research journals often want to publish novel scientific findings which are more likely to be cited, not necessarily the results of newly rerun older research.

Genetics researcher Barak Cohen at Washington University in St. Louis recently published a commentary analyzing this growing push for novelty. He suggests that progress in science depends on a delicate balance between novelty and checking the work of other scientists. When rewards such as funding of grants or publication in prestigious journals emphasize novelty at the expense of testing previously published results, science risks developing cracks in its foundation.

One of his main concerns is that scientific papers now inflate their claims in order to emphasize their novelty and the relevance of biomedical research for clinical applications. By exchanging depth of research for breadth of claims, researchers may be at risk of compromising the robustness of the work. By claiming excessive novelty and impact, researchers may undermine its actual significance because they may fail to provide solid evidence for each claim.

Prestigious journals often now demand complete scientific stories, from basic molecular mechanisms to proving their relevance in various animal models. Unexplained results or unanswered questions are seen as weaknesses. Instead of publishing one exciting novel finding that is robust, and which could spawn a new direction of research conducted by other groups, researchers now spend years gathering a whole string of findings with broad claims about novelty and impact.

Balancing fresh findings and robustness

A challenge for editors and reviewers of scientific manuscripts is assessing the novelty and likely long-term impact of the work they’re assessing. The eventual importance of a new, unique scientific idea is sometimes difficult to recognize even by peers who are grounded in existing knowledge. Many basic research studies form the basis of future practical applications. One recent study found that of basic research articles that received at least one citation, 80 percent were eventually cited by a patent application. But it takes time for practical significance to come to light.

A collaborative team of economics researchers recently developed an unusual measure of scientific novelty by carefully studying the references of a paper. They ranked a scientific paper as more novel if it cited a diverse combination of journals. For example, a scientific article citing a botany journal, an economics journal and a physics journal would be considered very novel if no other article had cited this combination of varied references before.

This measure of novelty allowed them to identify papers which were more likely to be cited in the long run. But it took roughly four years for these novel papers to start showing their greater impact. One may disagree with this particular indicator of novelty, but the study makes an important point: It takes time to recognize the full impact of novel findings.

Realizing how difficult it is to assess novelty should give funding agencies, journal editors and scientists pause. Progress in science depends on new discoveries and following unexplored paths – but solid, reproducible research requires an equal emphasis on the robustness of the work. By restoring the balance between demands and rewards for novelty and robustness, science will achieve even greater progress.

Lingulodinium polyedrum is a unicellular marine organism which belongs to the dinoflagellate group of algae. Its genome is among the largest found in any species on this planet, estimated to contain around 165 billion DNA base pairs – roughly fifty times larger than the size of the human genome. Encased in magnificent polyhedral shells, these bioluminescent algae became important organisms to study biological rhythms. Each Lingulodinium polyedrum cell contains not one but at least two internal clocks which keep track of time by oscillating at a frequency of approximately 24 hours. Algae maintained in continuous light for weeks continue to emit a bluish-green glow at what they perceive as night-time and swim up to the water surface during day-time hours – despite the absence of any external time cues. When I began studying how nutrients affect the circadian rhythms of these algae as a student at the University of Munich, I marveled at the intricacy and beauty of these complex time-keeping mechanisms that had evolved over hundreds of millions of years.

I was prompted to revisit the role of Beauty in biology while reading a masterpiece of scientific writing, “Dreams of a Final Theory” by the Nobel laureate Steven Weinberg in which he describes how the search for Beauty has guided him and many fellow theoretical physicists to search for an ultimate theory of the fundamental forces of nature. Weinberg explains that it is quite difficult to precisely define what constitutes Beauty in physics but a physicist would nevertheless recognize it when she sees it.Over the course of a quarter of a century, I have worked in a variety of biological fields, from these initial experiments in marine algae to how stem cells help build human blood vessels and how mitochondria in a cell fragment and reconnect as cells divide. Each project required its own set of research methods and techniques, each project came with its own failures and successes. But with each project, my sense of awe for the beauty of nature has grown. Evolution has bestowed this planet with such an amazing diversity of life-forms and biological mechanisms, allowing organisms to cope with the unique challenges that they face in their respective habitats. But it is only recently that I have become aware of the fact that my sense of biological beauty was a post hoc phenomenon: Beauty was what I perceived after reviewing the experimental findings; I was not guided by a quest for beauty while designing experiments. In fact, I would have been worried that such an approach might bias the design and interpretation of experiments. Might a desire for seeing Beauty in cell biology lead one to consciously or subconsciously discard results that might seem too messy?

One such key characteristic of a beautiful scientific theory is the simplicity of the underlying concepts. According to Weinberg, Einstein’s theory of gravitation is described in fourteen equations whereas Newton’s theory can be expressed in three. Despite the appearance of greater complexity in Einstein’s theory, Weinberg finds it more beautiful than Newton’s theory because the Einsteinian approach rests on one elegant central principle – the equivalence of gravitation and inertia. Weinberg’s second characteristic for beautiful scientific theories is their inevitability. Every major aspect of the theory seems so perfect that it cannot be tweaked or improved on. Any attempt to significantly modify Einstein’s theory of general relativity would lead to undermining its fundamental concepts, just like any attempts to move around parts of Raphael’s Holy Family would weaken the whole painting.

Can similar principles be applied to biology? I realized that when I give examples of beauty in biology, I focus on the complexity and diversity of life, not its simplicity or inevitability. Perhaps this is due to the fact that Weinberg was describing the search of fundamental laws of physics, laws which would explain the basis of all matter and energy – our universe. As cell biologists, we work several orders of magnitude removed from these fundamental laws. Our building blocks are organic molecules such as proteins and sugars. We find little evidence of inevitability in the molecular pathways we study – cells have an extraordinary ability to adapt. Mutations in genes or derangement in molecular signaling can often be compensated by alternate cellular pathways.

This also points to a fundamental difference in our approaches to the world. Physicists searching for the fundamental laws of nature balance the development of fundamental theories whereas biology in its current form has primarily become an experimental discipline. The latest technological developments in DNA and RNA sequencing, genome editing, optogenetics and high resolution imaging are allowing us to amass unimaginable quantities of experimental data. In fact, the development of technologies often drives the design of experiments. The availability of a genetically engineered mouse model that allows us to track the fate of individual cells that express fluorescent proteins, for example, will give rise to numerous experiments to study cell fate in various disease models and organs. Much of the current biomedical research funding focuses on studying organisms that provide technical convenience such as genetically engineered mice or fulfill a societal goal such as curing human disease.

Uncovering fundamental concepts in biology requires comparative studies across biology and substantial investments in research involving a plethora of other species. In 1990, the National Institutes of Health (NIH – the primary government funding source for biomedical research in the United States) designated a handful of species as model organisms to study human disease, including mice, rats, zebrafish and fruit flies. A recent analysis of the species studied in scientific publications showed that in 1960, roughly half the papers studied what would subsequently be classified as model organisms whereas the other half of papers studied additional species. By 2010, over 80% of the scientific papers were now being published on model organisms and only 20% were devoted to other species, thus marking a significant dwindling of broader research goals in biology. More importantly, even among the model organisms, there has been a clear culling of research priorities with a disproportionately large growth in funding and publications for studies using mice. Thousands of scientific papers are published every month on the cell signaling pathways and molecular biology in mouse and human cells whereas only a minuscule fraction of research resources are devoted to studying signaling pathways in algae.

The question of whether or not biologists should be guided by conceptual Beauty leads us to the even more pressing question of whether we need to broaden biological research. If we want to mirror the dizzying success of fundamental physics during the past century and similarly advance fundamental biology, then we need substantially step-up investments in fundamental biological research that is not constrained by medical goals.

Murder your darlings. The British writer Sir Arthur Quiller Crouch shared this piece of writerly wisdom when he gave his inaugural lecture series at Cambridge, asking writers to consider deleting words, phrases or even paragraphs that are especially dear to them. The minute writers fall in love with what they write, they are bound to lose their objectivity and may not be able to judge how their choice of words will be perceived by the reader. But writers aren’t the only ones who can fall prey to the Pygmalion syndrome. Scientists often find themselves in a similar situation when they develop “pet” or “darling” hypotheses.

How do scientists decide when it is time to murder their darling hypotheses? The simple answer is that scientists ought to give up scientific hypotheses once the experimental data is unable to support them, no matter how “darling” they are. However, the problem with scientific hypotheses is that they aren’t just generated based on subjective whims. A scientific hypothesis is usually put forward after analyzing substantial amounts of experimental data. The better a hypothesis is at explaining the existing data, the more “darling” it becomes. Therefore, scientists are reluctant to discard a hypothesis because of just one piece of experimental data that contradicts it.

In addition to experimental data, a number of additional factors can also play a major role in determining whether scientists will either discard or uphold their darling scientific hypotheses. Some scientific careers are built on specific scientific hypotheses which set apart certain scientists from competing rival groups. Research grants, which are essential to the survival of a scientific laboratory by providing salary funds for the senior researchers as well as the junior trainees and research staff, are written in a hypothesis-focused manner, outlining experiments that will lead to the acceptance or rejection of selected scientific hypotheses. Well written research grants always consider the possibility that the core hypothesis may be rejected based on the future experimental data. But if the hypothesis has to be rejected then the scientist has to explain the discrepancies between the preferred hypothesis that is now falling in disrepute and all the preliminary data that had led her to formulate the initial hypothesis. Such discrepancies could endanger the renewal of the grant funding and the future of the laboratory. Last but not least, it is very difficult to publish a scholarly paper describing a rejected scientific hypothesis without providing an in-depth mechanistic explanation for why the hypothesis was wrong and proposing alternate hypotheses.

For example, it is quite reasonable for a cell biologist to formulate the hypothesis that protein A improves the survival of neurons by activating pathway X based on prior scientific studies which have shown that protein A is an activator of pathway X in neurons and other studies which prove that pathway X improves cell survival in skin cells. If the data supports the hypothesis, publishing this result is fairly straightforward because it conforms to the general expectations. However, if the data does not support this hypothesis then the scientist has to explain why. Is it because protein A did not activate pathway X in her experiments? Is it because in pathway X functions differently in neurons than in skin cells? Is it because neurons and skin cells have a different threshold for survival? Experimental results that do not conform to the predictions have the potential to uncover exciting new scientific mechanisms but chasing down these alternate explanations requires a lot of time and resources which are becoming increasingly scarce. Therefore, it shouldn’t come as a surprise that some scientists may consciously or subconsciously ignore selected pieces of experimental data which contradict their darling hypotheses.

Let us move from these hypothetical situations to the real world of laboratories. There is surprisingly little data on how and when scientists reject hypotheses, but John Fugelsang and Kevin Dunbar at Dartmouth conducted a rather unique study “Theory and data interactions of the scientific mind: Evidence from the molecular and the cognitive laboratory” in 2004 in which they researched researchers. They sat in at scientific laboratory meetings of three renowned molecular biology laboratories at carefully recorded how scientists presented their laboratory data and how they would handle results which contradicted their predictions based on their hypotheses and models.

In their final analysis, Fugelsang and Dunbar included 417 scientific results that were presented at the meetings of which roughly half (223 out of 417) were not consistent with the predictions. Only 12% of these inconsistencies lead to change of the scientific model (and thus a revision of hypotheses). In the vast majority of the cases, the laboratories decided to follow up the studies by repeating and modifying the experimental protocols, thinking that the fault did not lie with the hypotheses but instead with the manner how the experiment was conducted. In the follow up experiments, 84 of the inconsistent findings could be replicated and this in turn resulted in a gradual modification of the underlying models and hypotheses in the majority of the cases. However, even when the inconsistent results were replicated, only 61% of the models were revised which means that 39% of the cases did not lead to any significant changes.

The study did not provide much information on the long-term fate of the hypotheses and models and we obviously cannot generalize the results of three molecular biology laboratory meetings at one university to the whole scientific enterprise. Also, Fugelsang and Dunbar’s study did not have a large enough sample size to clearly identify the reasons why some scientists were willing to revise their models and others weren’t. Was it because of varying complexity of experiments and models? Was it because of the approach of the individuals who conducted the experiments or the laboratory heads? I wish there were more studies like this because it would help us understand the scientific process better and maybe improve the quality of scientific research if we learned how different scientists handle inconsistent results.

In my own experience, I have also struggled with results which defied my scientific hypotheses. In 2002, we found that stem cells in human fat tissue could help grow new blood vessels. Yes, you could obtain fat from a liposuction performed by a plastic surgeon and inject these fat-derived stem cells into animal models of low blood flow in the legs. Within a week or two, the injected cells helped restore the blood flow to near normal levels! The simplest hypothesis was that the stem cells converted into endothelial cells, the cell type which forms the lining of blood vessels. However, after several months of experiments, I found no consistent evidence of fat-derived stem cells transforming into endothelial cells. We ended up publishing a paper which proposed an alternative explanation that the stem cells were releasing growth factors that helped grow blood vessels. But this explanation was not as satisfying as I had hoped. It did not account for the fact that the stem cells had aligned themselves alongside blood vessel structures and behaved like blood vessel cells.

Even though I “murdered” my darling hypothesis of fat –derived stem cells converting into blood vessel endothelial cells at the time, I did not “bury” the hypothesis. It kept ruminating in the back of my mind until roughly one decade later when we were again studying how stem cells were improving blood vessel growth. The difference was that this time, I had access to a live-imaging confocal laser microscope which allowed us to take images of cells labeled with red and green fluorescent dyes over long periods of time. Below, you can see a video of human bone marrow mesenchymal stem cells (labeled green) and human endothelial cells (labeled red) observed with the microscope overnight. The short movie compresses images obtained throughout the night and shows that the stem cells indeed do not convert into endothelial cells. Instead, they form a scaffold and guide the endothelial cells (red) by allowing them to move alongside the green scaffold and thus construct their network. This work was published in 2013 in the Journal of Molecular and Cellular Cardiology, roughly a decade after I had been forced to give up on the initial hypothesis. Back in 2002, I had assumed that the stem cells were turning into blood vessel endothelial cells because they aligned themselves in blood vessel like structures. I had never considered the possibility that they were scaffold for the endothelial cells.

This and other similar experiences have lead me to reformulate the “murder your darlings” commandment to “murder your darling hypotheses but do not bury them”. Instead of repeatedly trying to defend scientific hypotheses that cannot be supported by emerging experimental data, it is better to give up on them. But this does not mean that we should forget and bury those initial hypotheses. With newer technologies, resources or collaborations, we may find ways to explain inconsistent results years later that were not previously available to us. This is why I regularly peruse my cemetery of dead hypotheses on my hard drive to see if there are ways of perhaps resurrecting them, not in their original form but in a modification that I am now able to test.