In 1847, decades before the germ theory of infection, physician Ignaz Semmelweis suggested that, by washing their hands before examinations, doctors could save the lives of many maternity ward patients. Semmelweis came to this conclusion by noting that doctors' wards produced much higher infection and death rates than midwives' wards (midwives were willing to wash their hands; doctors were not).

Semmelweis' advice was supported by published results showing that hand-washing reduced maternal mortality to less than 1%, compared to 10%-35% in general practice. But doctors were offended by Semmelweis' advice and rejected it.

The bottom line was that Semmelweis couldn't offer an explanation for his advice, a reason it might work. He could describe the benefit of hand-washing, but he couldn't explain it.

In 1887, physicists Albert Michelson and Edward Morley designed an experiment meant to detect the "luminiferous ether." In the physical theory of the day, the ether was a substance thought to fill space and provide a medium for the transmission of light. In the same way that water provides a medium for water waves on a pond, the ether was thought to provide a medium for light waves in space.

As it turned out, and to the shock of working physicists, the Michelson-Morley experiment didn't detect the ether. Because the ether was essential to the prevailing theory, and because of the extreme care taken by the experimenters, this negative result placed all of physics in limbo. Subsequent efforts to incorporate this result into theory — to "save the ether" — were unsuccessful, and it was only with the 1905 publication of Einstein's theory of Special Relativity that the crisis was resolved. (Special Relativity doesn't disprove the idea of an ether, it just doesn't need it.)

Einstein offered a new, testable theory to replace the old, and explained why the Michelson-Morley experiment didn't detect the ether.

Let's compare these examples. In the hand-washing example, Semmelweis summarized specific experimental results (descriptions) and offered a piece of general advice. But because he had skipped over the step of explaining the experimental results, doctors refused to go along. This is an example of a description awaiting an explanation, and of people being skeptical of a mere description. The fact that Semmelweis was absolutely right, and that his advice could have saved many lives, only dramatizes the role played by explanation.

In the ether example, a prevailing theory (an explanation) was contradicted by experiment (a description), and scientists realized they didn't have a working model for an essential part of physics. This theoretical crisis temporarily placed the physics of light outside the domain of science.

These examples show that, without explanation, description is not science, and without description, explanation is not science. (Or: without theory, evidence is not science, and without evidence, theory is not science.) Both are required — if we only offer theories and never test them, it's philosophy. If we only gather data and never shape and test a theory about the data, we're not scientists but stamp collectors.

If we collect evidence and shape a theory about the evidence, we take the perpetual risk that our theory might be falsified by new evidence. We can use our theory to predict the outcome of experiments not yet performed (examples below). Each experimental success increases confidence in our theory, but each experiment has the potential to falsify it. That's science.

Bob, Alice and Falsifiability

Readers may ask whether the above section goes too far — doesn't it define science too strictly? Can't we have sciences that rely only on description, or only on explanation? Well, no, we can't, because of the essential part played in science by falsifiability:

Let's say Bob watches a moving billiard ball collide head-on with a stationary one, notices that the first ball comes to a complete stop, and (barring friction) the second ball moves away with the exact velocity of the first — no more, no less. Bob now has a description.

Let's say that Joe, and Frank, and other observers, also witness billiard ball collisions — they all now have descriptions that may or may not agree.

Let's say Alice proposes a theory about energy — of motion, among other kinds of energy — in which it's always conserved, meaning energy can change form, or be transferred from one object to another, but is never created or destroyed. Alice now has an explanation.

By itself, Bob's observation cannot become science, because it only describes, other descriptions may differ, and more importantly, it's not possible to falsify a description. Bob's description needs an explanation.

By itself, Alice's theory cannot become science, because in isolation it is an untested, therefore unfalsifiable, explanation of reality. It must be tested, compared to reality — to descriptions — and without this requirement, Alice is free to say anything. Alice's explanation needs supporting descriptions.

Only by comparing Alice's explanation with Bob's — and Joe's, and Frank's — descriptions, do we have the chance to validate or falsify a proposed general statement about reality.

Only by comparing Alice's theory to reality (by comparing Alice's explanation to Bob's description), do we cross the threshold of science. And it doesn't matter whether the comparison succeeds or fails — it's still science.
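The comparison between Alice's explanation and Bob's description can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function names, the ball mass and the speed are hypothetical, and the prediction relies on conservation of momentum as well as Alice's conservation of energy, since energy conservation alone does not fix the outcome of a collision.

```python
def elastic_collision(m1, v1, m2, v2):
    """Predict final velocities for a head-on, frictionless collision.

    The formulas follow from conservation of momentum together with
    conservation of kinetic energy (Alice's theory plus one assumption).
    """
    total = m1 + m2
    u1 = ((m1 - m2) * v1 + 2 * m2 * v2) / total
    u2 = ((m2 - m1) * v2 + 2 * m1 * v1) / total
    return u1, u2


def kinetic_energy(m, v):
    """Energy of motion of a mass m moving at velocity v."""
    return 0.5 * m * v ** 2


# Bob's setup: two equal balls, the second one initially at rest.
# Mass (kg) and speed (m/s) are illustrative values, not measurements.
m, v = 0.17, 2.0
u1, u2 = elastic_collision(m, v, m, 0.0)

energy_before = kinetic_energy(m, v)
energy_after = kinetic_energy(m, u1) + kinetic_energy(m, u2)

print("predicted velocities:", u1, u2)
print("energy conserved:", abs(energy_before - energy_after) < 1e-12)
```

With equal masses the prediction is that the first ball stops and the second leaves with the first ball's velocity, which is what Bob reported. Had Bob, Joe and Frank consistently seen something else, the theory behind the prediction would be in trouble.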

Science isn't defined by the shaping of theories and the gathering of experimental data — it's defined by a willingness to test theories against reality and cast out theories that fail comparison with reality. This may seem self-evident, but much of what's called science in the modern world fails to meet this criterion, and the word "science" is frequently used to describe activities that are quite obviously not science.

Darwin's Clock

This is one of my favorite stories about the power of theory to unify otherwise seemingly unrelated scientific fields. While doing preliminary research for what would become the Theory of Evolution, Charles Darwin proposed a mechanism for species evolution called natural selection. But a problem arose — natural selection required a certain amount of time and, based on the theorized age of the earth, there wasn't enough time for natural selection to produce observed species and their level of complexity.

In Darwin's time, assumptions about the age of the earth hinged on assumptions about the mechanism responsible for the sun's energy. In the mid-19th century, the sun was thought to radiate by the converted energy of gravitational contraction. Unfortunately for Darwin's work, this mechanism only allowed the earth to have a lifetime of millions of years, not the billions required for an evolution from single-celled organisms to modern vertebrates. Being a scientist, Darwin took this problem very seriously — he even went so far as to consider the idea of inheritance of acquired traits, which, had it been true, would have greatly sped up natural selection. But as things stood, given the assumed lifetime of the sun and barring an ad hoc explanation like inheritance of acquired traits, Darwin's theory stood falsified by an older, better-established theory with more observational evidence.

It was only in the 1930s that Hans Bethe provided a detailed explanation showing that the sun's energy arises from nuclear fusion, not gravitational contraction. Nuclear fusion directly converts mass to energy, a process that can go on for billions of years, and the last serious barrier to acceptance of natural selection was removed. This paved the way for the so-called Modern Synthesis, the present understanding of biological evolution.
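A rough order-of-magnitude estimate shows why the two energy sources give such different clocks. The sketch below is not taken from the historical papers. It assumes modern values for the sun's mass, radius and luminosity, the textbook estimate that gravitational contraction can supply roughly G M^2 / R of energy, and the common assumptions that fusion converts about 0.7% of the fused mass to energy and that about 10% of the sun's mass is available to burn.

```python
# Assumed modern solar constants (not given in the text above).
G = 6.674e-11             # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8               # speed of light, m/s
M_SUN = 1.989e30          # solar mass, kg
R_SUN = 6.957e8           # solar radius, m
L_SUN = 3.828e26          # solar luminosity, W
SECONDS_PER_YEAR = 3.156e7


def contraction_lifetime_years():
    """Lifetime if the sun shines by gravitational contraction alone."""
    energy = G * M_SUN ** 2 / R_SUN          # available energy, joules
    return energy / L_SUN / SECONDS_PER_YEAR


def fusion_lifetime_years(burn_fraction=0.10, efficiency=0.007):
    """Lifetime if a fraction of the sun's mass is fused to helium.

    burn_fraction and efficiency are rough textbook assumptions.
    """
    energy = burn_fraction * efficiency * M_SUN * C ** 2
    return energy / L_SUN / SECONDS_PER_YEAR


print(f"contraction: {contraction_lifetime_years():.1e} years")
print(f"fusion:      {fusion_lifetime_years():.1e} years")
```

Under these assumptions contraction yields tens of millions of years and fusion yields on the order of ten billion, which is the gap that separated Darwin's requirement from the physics of his day.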

In this way a theory in nuclear physics provided critical support to a theory in biology. It's hard to imagine two scientific fields with less in common, and this story dramatizes the unifying effect of scientific theory.

The Winged Messenger

By the beginning of the 20th century, observational astronomy had become very precise — precise enough to detect small deviations from Newtonian orbital mechanics. One such deviation was a small precession in the perihelion (closest approach to the sun) of Mercury's elliptical orbit, amounting to 43 arc-seconds per century.

This deviation was small enough that it was initially attributed to systematic errors in observation, but newer, more careful observations eliminated that possibility. The deviation contradicted the prevailing Newtonian gravitational theory and remained unexplained until Einstein published his General Relativity theory in 1916. General Relativity explains the precession as resulting from the fact that masses change the shape of spacetime — or, as physicist John Wheeler put it, "Mass tells spacetime how to curve, and spacetime tells mass how to move."

A simpler, more accessible (and less accurate) explanation is to note that, under General Relativity, time passes more slowly near masses, therefore Mercury's velocity is slightly reduced when it's near the sun, and this changes Mercury's orbital path in a way that doesn't agree with Newtonian physics.
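The 43 arc-second figure can be checked against the standard first-order General Relativity result for perihelion advance, 6 pi G M / (c^2 a (1 - e^2)) radians per orbit. The sketch below assumes commonly quoted values for the sun's gravitational parameter and for Mercury's semi-major axis, eccentricity and orbital period; none of these numbers appear in the text above, and the function name is only illustrative.

```python
import math

# Assumed reference values (not given in the text above).
GM_SUN = 1.32712440018e20    # sun's gravitational parameter, m^3/s^2
C = 299_792_458.0            # speed of light, m/s
A_MERCURY = 5.7909e10        # Mercury's semi-major axis, m
E_MERCURY = 0.2056           # Mercury's orbital eccentricity
PERIOD_DAYS = 87.969         # Mercury's orbital period, days
DAYS_PER_CENTURY = 36_525.0  # Julian century


def gr_precession_arcsec_per_century(a, e, period_days):
    """First-order relativistic perihelion advance, in arc-seconds per century."""
    radians_per_orbit = 6 * math.pi * GM_SUN / (C ** 2 * a * (1 - e ** 2))
    orbits_per_century = DAYS_PER_CENTURY / period_days
    return math.degrees(radians_per_orbit * orbits_per_century) * 3600


advance = gr_precession_arcsec_per_century(A_MERCURY, E_MERCURY, PERIOD_DAYS)
print(f"predicted advance: {advance:.1f} arc-seconds per century")
```

With these inputs the prediction lands very close to the observed 43 arc-seconds, which is the kind of agreement between explanation and description this section is about.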

The point I make with this example is that Einstein's theory would never have been accepted without observational confirmation (i.e. until his explanation was vindicated by descriptions), and the Mercury-orbit example is just one of many. More important, Einstein's theory made predictions of things not yet observed, things that were later observed, a particularly compelling kind of scientific evidence.

Had Einstein's theory made predictions that were not borne out in observation, or had observations appeared that did not agree with Einstein's theory, this would have falsified the theory and it would have been abandoned. But there's no opposite outcome — scientific theories are never proven true. They are perpetually falsifiable by new evidence, and they never become "laws." This statement seems contradicted by how often one hears the expression "scientific law," but there are no scientific laws, only scientific theories with varying degrees of observational support.

Science vs. Pseudoscience

In my voluminous correspondence with science students, many from fields that are scientific in name only, I've heard many of the expressions people use when calling something "science" that isn't science. One example is "soft science." But in fact, there's no such thing as "soft science" any more than there's such a thing as "slightly dead."

Here are some of the sillier defenses offered by people claiming that a given field is really science:

"Some form of the word 'science' appears in its name."

"It's listed as a science in university curricula."

"The practitioners wear white lab coats."

"There's lots of cages with pigeons and rats."

"They have professional journals that sometimes reject articles that don't meet their standards."

"They get government grant money that's been earmarked for scientific research."

"You need a doctorate to practice it."

But science isn't defined circumstantially. It's defined in much more direct ways, and a field's standing among sciences is much easier to establish than an individual's. For a field to be counted among sciences:

1. The field must have theories that unambiguously define it.

2. The theories must be stated clearly enough to be conclusively falsifiable through evidence.

3. The field must have ongoing, disciplined research programs.

4. The research must address, and must have the potential to falsify, the field's theories.

5. If research falsifies one or more of the theory's claims, those claims must be abandoned.

6. If all the theory's claims are falsified, the field itself must be abandoned.

There are only a few named sciences that meet all these requirements. To dramatize the amount of illusion in science, in an earlier article I published a toy astrology study. The study, although rather silly, is perfectly legitimate science: it gathers evidence and analyzes it, and its result has practical value to astrologers. But it cannot be used to confer scientific status on astrology, because it fails criterion (4) in the list above: it doesn't address astrology's theories.

In science, as explained above, a field is defined by its theories, and those theories must be falsifiable by new evidence. A scientific field's practitioners are more than willing to support or falsify the field's theories through vigorous research, because (among other things) any significant finding, positive or negative, is a sure way to establish one's standing among peers.

A classic example is the Michelson-Morley experiment cited above. It failed to confirm what its authors had hoped to confirm, and by publishing this negative result the researchers temporarily undermined their own field. In the long run, this negative result was of critical significance to the progress of science, but that could not have been obvious at the time. The publication of this result was a clear act of scientific integrity.

The philosophical meaning of Michelson-Morley and many similar cases is that there are no failed scientific experiments. All legitimate scientific results, positive and negative, increase our understanding of reality.

In pseudoscience, a field may have theories, but the field's continued existence does not depend on evidence for or against its theories. Research programs are often only peripherally related to the defining theories, and published results often contradict other results in the same field without anyone noticing. The field's practitioners are often unwilling to publish outcomes that might jeopardize the field's reputation among the public or that might adversely affect professional advancement. Such a field's journal editors are sometimes unwilling to publish negative outcomes, as a result of which most published research reports positive results, regardless of how soft the original evidence is, or how many negative studies must be abandoned along the way.

There is ample confirmation for the above summary. A recent study of publications ranked by field shows a strong bias against publishing negative outcomes among the so-called "soft sciences." To quote from the linked study:

"The hypothesis of a Hierarchy of the Sciences with physical sciences at the top, social sciences at the bottom, and biological sciences in-between is nearly 200 years old ... This study analysed 2434 papers published in all disciplines and that declared to have tested a hypothesis. It was determined how many papers reported a 'positive' (full or partial) or 'negative' support for the tested hypothesis. If the hierarchy hypothesis is correct, then researchers in 'softer' sciences should have fewer constraints to their conscious and unconscious biases, and therefore report more positive outcomes. Results confirmed the predictions at all levels considered ... the odds of reporting a positive result were around 5 times higher among papers in the disciplines of Psychology and Psychiatry and Economics and Business compared to Space Science ...".

The above-cited study shows that, of the publication sample, space science published 29.8% negative outcomes, while psychiatry/psychology published just 8.5%. This graph summarizes the study's conclusions.
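The two percentages above are consistent with the "around 5 times" odds quoted from the study, as a quick calculation shows. This is only a sketch using the two figures given here; the study's own estimate comes from a fuller statistical analysis, so the numbers need not match exactly.

```python
def odds(probability):
    """Convert a probability into odds (chances for : chances against)."""
    return probability / (1 - probability)


# Share of negative outcomes reported above for each field.
negative_space_science = 0.298
negative_psychology = 0.085

positive_space_science = 1 - negative_space_science
positive_psychology = 1 - negative_psychology

odds_ratio = odds(positive_psychology) / odds(positive_space_science)
print(f"odds of a positive result, psychology vs. space science: {odds_ratio:.2f}")
```

The ratio comes out between 4 and 5, in line with the study's summary.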

It's not possible to overestimate the risk to science posed by a bias against publication of negative results. If a body of evidence has only marginal statistical significance, repeated publication of positive results and rejection of negative results can produce a completely false positive impression of an idea's scientific standing. In an in-depth analysis of this risk published in the New England Journal of Medicine, researchers compared a list of successful research proposals for antidepressant medications to later article publication, and discovered that 31% of the studies simply weren't published. The study found that virtually all the published articles (94%) reported positive results. The remainder of the studies, the negative outcomes, were either not published (2/3), or were rewritten in a way calculated to suggest a positive outcome, but at odds with the study's actual results (1/3).

The authors of the above study say, "Whether and how the studies were published were associated with the study outcome", a diplomatic way of saying that negative outcomes tended to be discarded or rewritten. To a naïve reader of scientific literature, this makes antidepressants seem effective in 94% of published studies. But an FDA analysis of the same studies, including those that were not published, shows positive outcomes in 51% of the studies, and negative outcomes in 49%. This means the real results lie near the chance level and, contrary to a very public impression, support the null hypothesis (in this case, the idea that antidepressants don't actually work).
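The way selective publication inflates an apparent success rate can be shown with simple arithmetic. The sketch below is a toy model, not a reconstruction of the study: it assumes 100 hypothetical studies split 51/49 as in the FDA analysis, assumes every positive study is published, and uses made-up fractions for how many negative studies go unpublished or are rewritten to look positive.

```python
def apparent_positive_rate(n_positive, n_negative, unpublished, spun):
    """Share of published papers that read as positive.

    unpublished and spun are fractions of the negative studies that are
    never published or are rewritten as positive. The rest are published
    honestly as negative. All positive studies are assumed to be published.
    """
    if unpublished < 0 or spun < 0 or unpublished + spun > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    reads_positive = n_positive + n_negative * spun
    reads_negative = n_negative * (1 - unpublished - spun)
    return reads_positive / (reads_positive + reads_negative)


# Hypothetical pool of 100 studies; the 0.6 / 0.3 split is an assumption.
true_rate = 51 / (51 + 49)
apparent = apparent_positive_rate(51, 49, unpublished=0.6, spun=0.3)

print(f"true positive rate:     {true_rate:.0%}")
print(f"apparent positive rate: {apparent:.0%}")
```

Even with nothing changed about the underlying results, the published record in this toy model looks more than 90% positive.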

The Structure of Science

The foregoing sections are meant to show that (a) evidence validates or falsifies a theory, and (b) a theory defines a scientific field. Any breakdown — a failure to properly test the theory, or a disconnect between evidence and theory, as two examples — invalidates the scientific standing of the field.

Fields without theories (or without testable theories) tend to have research programs that either ignore or contradict each other, a fact that would become obvious if only there were a coherent theory to inform the research. My point is that the presence of scientists, and of scientific research, is by itself no assurance that a field is scientific. Even the presence of a theory that defines the field is of no use if the field's research doesn't meaningfully address the theory.

For example, let's take Superstring Theory. If it were a separate branch of physics, it would have everything a scientific field requires except the ability to test its theoretical claims. With respect to Superstring Theory, it's not laziness that prevents meaningful tests; the issues are deeper than that. A quote from the linked article: "... it is exceedingly hard to make predictions from any superstring theory which can be falsified by experiment, and in fact no current superstring theory makes any falsifiable prediction." Superstring Theory may eventually bear scientific fruit, but at present it's philosophy with plenty of mathematics: it's all theory and no evidence.

By contrast, the cosmological issues of Dark Matter and Dark Energy represent fields that are all observational evidence. On that basis they aren't scientific either, but for the opposite reason of the prior example: they're all evidence and no theory.

This is not to suggest that these ideas don't deserve support. They do, because they're important physics frontiers of different kinds. It is only to say that they're not formally sciences at present. Each of these examples deserves support on the ground that it's expected eventually to produce the missing part (evidence or theory) and turn itself into a science.

On the other hand, there are fields that superficially resemble sciences, but that have no chance to ever fulfill the requirements of a real science as set out above.

Asperger's: A Pseudoscience Case Study

Because there's no shortage of pseudoscientific practices at large in the world, choosing an example is somewhat arbitrary. But even in a short list of pathetic efforts to dupe the public, the Asperger's story is particularly egregious.

First, clinical psychology is fringe even among pseudosciences, and a clinical psychologist can get away with virtually anything short of killing his/her clients (although that sometimes happens). As just two examples selected at random, there are prominent therapists offering Past Lives Therapy and Alien Abduction Recovery Therapy (and many willing clients). In a scientific field, these practices would have to prove their efficacy and correspondence with the field's theories and evidence, except that in clinical psychology (a) there's no requirement for practitioners to show efficacy, and (b) there's nothing resembling a testable, scientific theory to govern the field's practices.

In 1944 Hans Asperger asserted that a distinctive form of mild autism merited its own diagnosis. Characterized by significant difficulties in social interaction, restricted and repetitive patterns of behavior and interests, physical clumsiness and atypical use of language, what became known as Asperger Syndrome had certain traits that separated it from other forms of high-functioning autism (HFA) — for example, relatively normal linguistic and cognitive development.

Around the same time, someone frosted the cake by assembling a list of people thought to have had Asperger's Syndrome or another form of mild autism, including Thomas Jefferson, Albert Einstein, Bill Gates, and many others.

The result (of making Asperger's an official DSM-IV diagnostic category, and of revealing that a great number of famous, successful people were "sufferers") quickly turned the Asperger's diagnosis into ... wait for it ... a really cool thing to get for your quirky youngster. There was also a factor later bemoaned by Allen Frances, the psychiatrist who chaired the DSM-IV task force: those having the diagnosis could force school districts to provide special education services.

In a recent interview, Frances says, "And so kids who previously might have been considered on the boundary, eccentric, socially shy, but bright and doing well in school would mainstream [into] regular classes ... Now if they get the diagnosis of Asperger's disorder, [they] get into a special program where they may get $50,000 a year worth of educational services."

These factors produced an epidemic of Asperger's diagnoses, and Frances soon regretted having been its champion. Frances, who now strongly advocates abandonment of the Asperger's diagnosis, says the true Asperger's rate is "vanishingly rare," and it's likely that better than 90% of present Asperger's diagnoses are given to people with no significant mental difficulties (the remainder likely have some mild form of autism not meriting the Asperger's label).

To summarize, the problems with an Asperger's diagnosis are:

Its association with famous, successful, very individualistic people, a trait guaranteed to attract the attention of teenagers intent on setting themselves apart from their mundane peers.

Its status as a lever to force special treatment from schools, a trait guaranteed to attract the attention of parents intent on playing the system.

The fact that the diagnosis is easily confused with the normal behavior of bright youngsters, a trait guaranteed to attract the attention of parents suffering from Münchausen Syndrome by Proxy, people constantly looking for ways to stigmatize their offspring with imaginary illnesses.

The fact that no one knows what causes autism, how to unambiguously diagnose it, how to treat it, or how to cure it, which means there are no reliable diagnostic indicators beyond the opinion of mental health practitioners, people who rarely agree on anything.

What I don't understand is how this happened. The problems listed above should have been obvious at the outset, especially to people who pride themselves on understanding behavior, who see themselves as professionals in deciphering the human psyche. But this sage clan of mental health professionals forged ahead and opened Pandora's Box.

In a larger sense, the above list shows that Asperger's — and most other mental illness diagnoses — have a serious problem: there's nothing resembling science in the process. This is not to argue that people don't have mental difficulties, it's only to say that the mental health field is a playground for unethical and incompetent practices, conjectures, fads, and gross miscalculations like those leading up to the present status of Asperger's.

In a predictable coda to the Asperger's affair, it's being abandoned as a serious diagnostic category. For those unfamiliar with the history of human psychology, every proposed diagnosis is eventually either refuted and abandoned (as with Asperger's, homosexuality and many others), or recognized as a physical ailment with mental symptoms, as with schizophrenia, bipolar disorder, and autism (the real kind, i.e. Rain Man).

Conclusion? It may be there are no true "mental illnesses" (ailments of the mind alone, diagnosable and treatable using psychological methods); there may only be fanciful psychological diagnoses awaiting abandonment, and physical illnesses with mental symptoms awaiting reassignment to the field of neuroscience.

One is reduced to such conjectures because there are no defining scientific theories in psychology. The theory vacuum in human psychology is so severe that, when researchers recently wanted to identify discredited therapies, they decided to ask clinical psychologists for their opinions and let their responses decide the issue. A quotation from the linked article: "A second major flaw in this survey was the lack of safeguards. As it stands there is no evidence, except self-report, as to what the respondents actually knew about the techniques they were 'voting on'. Thus the results could reflect nothing more reliable than the respondents' personal opinions and prejudices. There is no way of telling how far this is, or is not, the case." In spite of this laughable data-gathering method, it seems the authors couldn't think of a more reliable way to evaluate current mental health practices.

To summarize this section, psychiatry and psychology, although different pursuits, both suffer from a lack of testable theory and, in some cases, a contempt for evidence-gathering and analysis. Unfortunately, these fields are often mistaken for sciences by vulnerable members of the public, by courts of law, by medical insurers, and by school districts. This is both deplorable and correctable.

Conclusion

Because of its significance to the modern world, defining science is more than a philosophical exercise. As just one example, for those who depend on science to validate medical treatments, science's proper definition can be a matter of life or death. And because of its potential for validation, any number of people would like to associate themselves with science, but without necessarily adopting the methods and discipline this would require.

A given field may have any number of scientists at work within it, and any number of slick professional journals, but might never take the dangerous step of shaping a theory that formally defines the field — a theory specific enough to be falsified by evidence.

Such fields resemble politicians in an election year, who know they must avoid taking a stand on any issue that could backfire in the future. For this reason, they make seemingly informative public pronouncements that only appear to convey meaning, and (in the U.S.) will avoid discussing certain topics entirely — Social Security, Israel, women's rights, race relations, and a few others. In the same way, pseudoscientific fields avoid taking positions that might be falsified by research.

But imagine what might happen if these strict and unspoken rules were to be briefly suspended with respect to psychology. Someone then might candidly say, "We want the status of science, but we can't produce testable, falsifiable theories about human behavior. So we conduct endless studies, none of which address a nonexistent central defining theory, many of which are thrown away rather than being published, and some that confirm a researcher's pet theory but can't be replicated by others. And no matter what is discovered and published, it has no effect on the behavior of clinical psychologists, who simply don't care about science except to claim that their practice is based on science."

Feedback

Reader responses to this article.

Description versus Explanation I

I just read your essay "Why Science Needs Theories" and agree with everything you've written save for one nagging issue I would like to resolve. In the example of Semmelweis and hand-washing, you state that his advice went unheeded because he only provided a descriptive analysis and did not include any explanation of his results. For clarification, do you mean that he did not provide the medical establishment at the time with the explanation of how he arrived at his empirical results, or do you mean that he did not provide a theoretical mechanism (e.g. germ theory) for why hand-washing would in fact explain the decrease in mortality?

The second. An empirical conclusion, based on observation but without an explanation, wasn't persuasive. Someone could argue that the evidence was circumstantial or biased. That's still true, by the way: many studies in psychology and other soft sciences seem quite persuasive until one reads all the studies that flatly contradict them, using equally persuasive descriptive evidence.

This distinction is important in my understanding of whether you view scientific theory as strictly necessitating a mechanism for its explanatory power or if explanation of the empirical process used in arriving at the falsifiable theory is sufficient.

Hold on. How can you have a falsifiable theory unless you offer a testable explanation? There is no description so persuasive that it can stand in for an effort to explain. So the idea of a testable explanation is the central issue.

Would Semmelweis' recommendation have been scientific if he explained his experimental process documenting the inductive process along with the conclusion derived from the collected statistical data, or would he have had to formulate a falsifiable mechanism (e.g. germ theory) explaining why such a result makes sense?

He would have had to offer an explanation, then use it to make a prediction about an experiment not yet conducted, then the experiment would have had to support his explanation. As was true for Louis Pasteur 25 years later, when the doctors finally began to listen.

Description versus Explanation II

Hold on. How can you have a falsifiable theory unless you offer a testable explanation? There is no description so persuasive that it can stand in for an effort to explain. So the idea of a testable explanation is the central issue.

I was hinting at the type of science routinely done under the methodological title of 'observational study'. Ironically I always denounce such studies to colleagues as unreliable and merely a way to come up with hypotheses, but nevertheless, does it not have a place in the scientist's toolkit, maybe we can call it pre-science?

For some years now I've been taking the position with psychologists that science must have testable theories and (thus) a basis for falsification, and I don't think this is particularly controversial if one thinks about it, because:

To count as science, claims must have the potential for falsification by way of evidence.

Falsifiability requires someone to offer an explanation.

Explanations are the everyday name for testable theories.

One cannot falsify a mere description. Penzias and Wilson really did have a noise in their microwave dish, that's hardly open to dispute. What won them a Nobel Prize was the explanation they offered for the noise, not the noise itself.

It seems the point of contention is not that such methods are unscientific, but that they are not persuasive enough standing on their own merit and must eventually be followed up with testable theory of mechanism in order to be considered scientific (as yet unfalsified) fact.

Yes. Scientific observations are like Olympic athletic training: many people train in ways indistinguishable from Olympic athletes, but few actually compete in the Olympics. Many observations and descriptions are given the name "scientific," but this is a polite convention, sort of like calling an athletic performance "world class": it all depends on what happens next.

So then my argument is not so much against your definition of science as it is a proposal to sometimes relax that strict definition for pragmatic reasons.

And my reply must be the same one I give psychologists: if everyone describes and no one explains, at some point it's necessary to distinguish those descriptions that eventually are followed by explanations, from those that never are. The former are science, the latter are onanism.

Would it not have been prudent to have listened to Semmelweis given his data and followed up with more rigorous explanations rather than ignoring him completely?

One might argue that Hans Asperger (Asperger Syndrome) deserved the same recognition, on the ground that his description seemed reasonable. Or Walter Freeman (prefrontal lobotomy). Or Wilhelm Reich (Orgone box). These descriptions predated any reasonable chance for an explanation, all were eventually abandoned, and all of them caused tremendous waste and confusion for lack of a testable explanation (and of any effort to conduct objective tests).

I chose Semmelweis as an example because he was right, in order not to be seen as preaching to the choir (i.e. through choosing only examples in which a description turned out to be wrong, as above). The point is that, absent a testable explanation, Semmelweis failed an implied responsibility to take more concrete steps, well-defined and time-tested, to earn the attention of doctors.

In this connection, remember the null hypothesis, the assumption that, without a tested theory (and in a manner of speaking), observations that support an idea can fairly be looked on as examples of confirmation bias, conscious or unconscious, or the placebo effect. The null hypothesis is a great labor-saving device — it forces people to do more than argue that descriptions constitute science.

References

Scientific Theory — A reasonable summary that unfortunately includes the meaningless term "scientific law."