Into the Sciences

Introduction

When I was an undergraduate studying physics, my classes presented cleaned-up theories. The messy details of experiments were not mentioned. How the mathematics we were using related to reality was rarely touched on. The scaffolding used to erect the theory had been carefully removed. The history of the field’s development was reduced to a few large leaps, and those leaps were credited to a few famous names.

So we left these classes facile in manipulating formulas, but mystified by how you would actually measure the energy of the single bound state of the helium molecule ion. We could calculate the Grüneisen ratio in thermodynamics, but had no idea why we would care about it. We learned that Einstein formulated special relativity, but not that Poincaré, Lorentz, and a previous generation of workers had derived the mathematics that Einstein used by exploring the consequences of Maxwell’s theory of electromagnetism.

This is understandable. A lecturer in physics or sociology is faced with passing on some degree of technical competence in limited time. The expectation is that she will try to save time by cleaning up the mess, and that she is not teaching a history course.

But the orderliness of the structure implies that the work is basically finished, and the student never gets to see how a subject develops. For those who will only use the material as a tool, this is plenty. A structural engineer need not care, professionally, what led to Newton’s mechanics. For a physicist, who is supposed to construct such material rather than use it, this hides the primary models that underlie what she is being trained to do.

Worse, the simplification of a field’s history from all the actions of many individuals to a few names and steps has a disturbing subtext: unless you’re one of the chosen, the blessed, the brilliant, any work you do is just filling time until the next great mind comes along. A few practitioners, convinced that they are among the great men, will even openly espouse this view, as David Fischer does in Historians’ Fallacies:[23]

[A] general interpretation is fashioned by an essayist not as a heuristic hypothesis but as an affirmative proposition. In the next twenty years or so, a legion of gradgrinds manufacture monographs which reify the essay, with a few inconsequential changes…This process continues until another essayist publishes another brilliant general interpretation, and another generation of gradgrinds are wound up like mechanical rabbits and set to running about in ever-smaller circles.

What a horrid, damaging narrative to inflict on the student or young practitioner. For most students that narrative is softened somewhat by working under the guidance of a skilled practitioner, but it never quite goes away.

Nor is it only a practitioner’s relationship with her own field that can be toxic. I remember making the acquaintance of a toxicologist while waiting for a ferry. When I explained that I was trained as a physicist, her comment was, “Oh, real science.” Similarly, physicists have collections of jokes about biologists and social scientists. From physics and chemistry to history and musicology, these are all “real sciences.” But if a practitioner’s understanding of what is scientific was acquired by osmosis from one or two practitioners, her view of science will necessarily be parochial, and the practices of other fields will seem alien.

That alienness keeps people within the boundaries of their fields. I was originally trained as a physicist, then forced by circumstance to retrain as a biologist, in complete isolation from physics. It took a year before I started to absorb the different ways of thinking, and during that year I regularly reacted with alarm when some process of thought apparently made no sense. I was in graduate school at an institution devoted solely to biology. Had a number of physicists there not already made the transition and talked me through some of the mental difficulties, I would likely have retreated.

That transition, earlier exposure to musicology, and later dabbling in history and the social sciences forced me to reformulate an idea of science for myself that described all of what I was seeing. This book aims to set down that result, which provides the young practitioner with a view of science that is both more accurate and more empowering than the narrative of great men. It is meant to describe all the sciences, from chemistry to musicology, not just certain fields chosen as ideals. It replaces the amorphous activity of “research” with concrete activities and modes of progress. And it provides a framework for a practitioner to understand why other fields than her own exist, why they don’t seem to operate the way hers does, and how to efficiently go about learning another field if she feels so inclined. The examples are unfortunately biased towards the physical and biological sciences, since that is where I have the most experience, but, despite that, I hope at the end you will know how to delve into the messy, unsanitized realities of a field and emerge with a useful understanding of how it works.

Lastly, a quick note on words: I eschew the word “scientist,” which carries too much baggage. Our society loads the word with images of lab coats, professorships, lonely geniuses, and men. The truth is very different. Some wear lab coats. Many don’t. Most aren’t professors. They almost all work in collaboration. And they are mostly women. Thus I refer to someone who does science as a “practitioner,” and I use the female pronoun to describe that person.

Idealized trials

Let us begin with the activities that occupy the vast majority of practitioners’ time: gathering data and analyzing it. It could be an astrophysicist measuring the spectrum of the light emitted by a celestial object; a microbiologist measuring how fast a population of bacteria in a liquid culture increases; a botanist producing illustrations of plants for a flower atlas; an agronomist measuring the effect of a fertilizing regime on a crop.

I will refer to all of these activities as “trials.” It is important to note that many of these are not experiments. They are observations of phenomena outside the control of the practitioner. In these cases, practitioners seek “naturally occurring experiments,” situations close to how the practitioner would have designed the experiment were it possible, and there are whole fields that have no experiments whatsoever. Astrophysics must work with the stars and galaxies that exist, and history has no way to get a second historical record.

Having only observations is not an insurmountable limitation. The strongest evidence linking smoking and lung cancer comes not from experiments but from long-term observation of identical twins in Scandinavia.[25, 42]

Nor should we artificially limit the notion of a scientific trial. A botanist producing illustrations for a flower atlas is performing a trial, though its outcome—a drawing—is very different from what is produced when a physicist measures the speed of light, and different again from the trial of an historian producing an account of the past.

But we see children pouring liquids back and forth or closely watching bugs. What makes a chemist testing for the presence of mercury a scientific trial and a child pouring liquids back and forth not? The difference is one of intent. The chemist has in mind some idealized trial which she is trying to approximate. The idealized trial is conceived so that, if it were perfectly achieved, it would impart knowledge about the world. For example, all of the following, if no details trip up the practitioner, will produce a solid piece of knowledge:

Weigh the amount of corn produced by two strains grown next to each other to find which one yields more.

Compare the number of objects of north African origin that are found in Gaul that date to before and after the Goths replaced government by Rome in that region to see how the volume of long distance trade changed.

Hook up a piece of material to a pair of electrodes and measure the resistance across the material while slowly cooling it until the resistance drops to zero to find the superconducting transition temperature of the material.

Select a number of specimens of dandelion and use them to make a drawing capturing the important features that identify a dandelion.

There is no universal logic of idealized trials. Those readers hoping for an onslaught of symbolic logic or hard rules about the form of idealized trials will be disappointed. Idealized trials aren’t derived from pre-existing logic. Rather, the logic is developed to describe existing trials. As new idealized trials are developed, the logic is expanded to account for them.

On the other hand, certain constraints were developed in the 17th century that we now consider essential for an idealized trial, the most important of which is the expectation that an independent practitioner can reproduce the results of the trial. Reproducibility is a negative criterion. It says, “You cannot use this idealized trial,” and “These trials do not agree. At least one of them cannot be accepted.”

Making a trial “reproducible” is harder and more subtle than it would first seem. What does it mean to reproduce results in a field like history where the historical record is fixed? If we present two historians with the same documents and subject, we cannot expect word for word identical accounts. If one of the historians is removed in time from the other, we would have an enactment of Borges’s story Pierre Menard, Author of the Quixote[10], where an author labors to make himself such a creature that, in the 20th century, he will produce an original work that is word for word the same as Don Quixote. Even in experimental sciences, measurements are messy. If one physicist measures the Young’s modulus of copper to be 117.3 GPa, and another measures it to be 117.41 GPa, do we say the result is not reproducible? It is actually quite reproducible, but we cannot expect lockstep.

We might try to argue our way out of this latter situation by saying that the physicists should report a value with a 95% confidence interval, but, in practice, this rarely works due to Hamming’s rule: “90% of the time the next independent measurement will fall outside the previous 90% confidence limits.” Why? As Hamming puts it,

Consider how you, in fact as opposed to theory, do an experiment. You assemble the equipment and turn it on, and of course the equipment does not function properly. So you spend some time, often weeks, getting it to run properly. Now you are ready to gather data, but first you fine tune the equipment. How? By adjusting it so you get consistent runs! In simple words, you adjust for low variance; what else can you do? But it is this low variance data you turn over to the statistician and is used to estimate the variability.[33]
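A toy simulation makes Hamming’s mechanism concrete. It is only a sketch under invented assumptions, not anything Hamming wrote: each lab is given a hidden systematic offset (standard deviation 1.0) that tuning for consistent runs cannot reveal, plus small run-to-run noise (standard deviation 0.1), and it reports a 90% interval computed from the run-to-run scatter alone. All names and numbers in the code are hypothetical.

```python
import random
import statistics


def lab_interval(true_value, systematic_sd, run_sd, n_runs, rng):
    """Return (mean, half_width) of one lab's reported 90% interval.

    The interval is built only from the scatter between runs, so the lab's
    hidden systematic offset never shows up in the reported uncertainty.
    """
    offset = rng.gauss(0.0, systematic_sd)  # fixed for this lab, unknown to it
    readings = [true_value + offset + rng.gauss(0.0, run_sd) for _ in range(n_runs)]
    mean = statistics.fmean(readings)
    standard_error = statistics.stdev(readings) / n_runs ** 0.5
    z90 = statistics.NormalDist().inv_cdf(0.95)  # two-sided 90% normal quantile
    return mean, z90 * standard_error


def fraction_outside(n_pairs=2000, seed=0):
    """Fraction of independent follow-up results outside the first lab's interval."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(n_pairs):
        # Assumed values: true value 100, systematic sd 1.0, run sd 0.1, 20 runs.
        mean_a, half_width_a = lab_interval(100.0, 1.0, 0.1, 20, rng)
        mean_b, _ = lab_interval(100.0, 1.0, 0.1, 20, rng)
        if abs(mean_b - mean_a) > half_width_a:
            misses += 1
    return misses / n_pairs


print(fraction_outside())
```

With these made-up numbers the great majority of follow-up measurements land outside the earlier interval, because the interval reflects only the variance the practitioner tuned away. The exact fraction depends entirely on the assumed ratio of systematic to run-to-run error.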

We can see this in practice in particle physics. Consider the measurement over time of the mass of the η particle:[4]

Each vertical bar represents a measurement of the mass of this particle. The lower end of the bar is a lower limit on the mass, the upper end an upper limit. The bars are arranged chronologically. But the bars do not all overlap vertically, meaning that the experiments disagree. In the early 1990’s there was a jump in both precision (shown by the shorter bars) and in the measured value, followed by more shifts and more increases in precision. This would seem to indicate that the trial is not reproducible. But when we examine the actual numbers we find that all of the trials give values between 547.0 MeV and 549.5 MeV (an MeV is a mega electron volt, a convenient and ubiquitous unit in particle physics), a variation of less than half a percent, which seems quite reproducible, despite clear evidence of Hamming’s rule. And worries about error bars and the like barely apply in anthropology or history.
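The “less than half a percent” figure can be checked directly from the two extremes quoted above. The short calculation below uses only those two values; the variable names are illustrative.

```python
# Extremes of the measured values quoted in the text, in MeV.
lowest_mev = 547.0
highest_mev = 549.5

# Spread between the extremes, relative to the lowest value.
spread_mev = highest_mev - lowest_mev
relative_spread = spread_mev / lowest_mev

print(f"spread: {spread_mev} MeV, relative spread: {relative_spread:.2%}")
```

Dividing by the highest value instead of the lowest changes the result only in the third significant figure, so the conclusion does not depend on that choice.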

Arguments about reproducibility have been going on since the rise of the modern sciences,[63] and despite its seeming difficulty, we have made progress. Looking back, seventeenth-century practitioners seem almost fussy to their modern counterparts in their obsessive noting of detail in hopes of reproducibility.[27]

I think the clearest example of why anyone cares about reproducibility comes from musicology. Performance practice in western music (the conventions of how to interpret a piece of written music) has traditionally wandered from generation to generation. Each generation had as its basis only the performances of the previous generation, from whom it had learned, so the practice could wander without limit. Over the course of several hundred years since the seventeenth-century composer Arcangelo Corelli wrote his music, the performance practice drifted to one of slow tempos, no dynamic variation, no improvisation or decoration, a practice that produced a lot of performances best described as insipid. We know today that this is not at all how Corelli played.

Nor does it take hundreds of years for performance practice to dramatically change. Sergei Rachmaninoff wrote in the late nineteenth century, only two or three generations of performers ago. Today his works are usually played with lots of rubato and schmaltz. Yet we know he did not play that way. He played his works at one point on a player piano, which precisely recorded his performance on rolls of punched paper. (The player piano is most familiar today as the piano in the corner of saloons in western films that’s playing by itself.) When we play those same rolls back in a player piano today, we hear renditions that are precise and rhythmic.

We have no such access to Corelli or to any musician before the mid nineteenth century, but we do have the instruments that were played and many treatises written at the time. Starting in the early 20th century, a new field arose called musicology that tried to infer a performance practice from these artifacts in a reproducible way (with many of the same problems as the field of history). In the process, the field reclaimed several golden ages of music for us which had otherwise been almost lost, such as the viol consort music of England in the 17th century.

A performer today can work with a performance practice for Corelli that has a clear justification as a starting point. Of course some areas remain unknown, such as the exact nature of the tuning that J.S. Bach wrote his Well-Tempered Clavier to show off. We have no way of knowing it, though there are many proposals, most of which are reasonable given the documentation available, and many of which sound utterly wild to modern ears. But despite these holes, we have a clear starting point for most western music since the twelfth century. A performer may depart from this practice, and often does, but she can start with a practice that she can defend as reasonable and work from there.

Let us turn now to the progress of idealized trials. New idealized trials open up new possibilities for what can be studied. For example, we cannot study nature versus nurture in humans the way we do in mice. In mice, we would perform controlled breeding and raise the offspring in different environments to see which aspects of their behavior are due to heredity and which are due to environment. In humans, this is unethical. Our society does not allow scientists to force people to breed, nor to decide the fates of children, and, even if it did, such a study would take decades to run as opposed to a few years in mice. The solution to this problem is to use naturally existing breeding experiments, notably twins. In a population of twins, all the differences measured in identical twin pairs should be due to environment, while in fraternal twin pairs they will be due to both environment and heredity. The difference between fraternal and identical twins wasn’t demonstrated until the 1920’s,[56] but as soon as it was, twin studies became a mainstay of epidemiology and human genetics.
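The logic of the twin comparison can be sketched with a toy simulation. Everything in it is an assumption made for illustration and is not drawn from any real study: a trait is the sum of a genetic part and an environmental part, identical twins share all of the genetic part, fraternal twins share half of its variance, and each twin’s environment is independent (real twins also share much of their environment, which this sketch ignores).

```python
import random


def mean_squared_difference(genetic_sharing, n_pairs, rng,
                            genetic_sd=1.0, environment_sd=1.0):
    """Average squared within-pair trait difference for simulated twin pairs.

    genetic_sharing is the assumed fraction of genetic variance a pair shares:
    1.0 for identical twins, 0.5 for fraternal twins in this toy model.
    """
    shared_sd = genetic_sd * genetic_sharing ** 0.5
    unshared_sd = genetic_sd * (1.0 - genetic_sharing) ** 0.5
    total = 0.0
    for _ in range(n_pairs):
        shared = rng.gauss(0.0, shared_sd)  # genetic part common to both twins
        traits = [
            shared + rng.gauss(0.0, unshared_sd) + rng.gauss(0.0, environment_sd)
            for _ in range(2)
        ]
        total += (traits[0] - traits[1]) ** 2
    return total / n_pairs


rng = random.Random(0)
identical = mean_squared_difference(1.0, 5000, rng)
fraternal = mean_squared_difference(0.5, 5000, rng)

print(f"identical pairs: {identical:.2f}")
print(f"fraternal pairs: {fraternal:.2f}")
print(f"excess attributed to heredity: {fraternal - identical:.2f}")
```

In this model the identical pairs differ only through environment, so the extra difference seen in fraternal pairs is what the comparison attributes to heredity. How much of that excess appears depends on the assumed sharing fraction and variances.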

Existing idealized trials also shift over time. Daston and Galison open their book Objectivity[14] with a particularly clear example of this shift, taken from the history of fluid mechanics. From 1875 to 1894, Arthur Worthington painstakingly studied what occurs when a falling droplet hits a surface by illuminating his darkened laboratory with a precisely timed, millisecond flash, then sketching the latent image impressed on his retina. From thousands of such sketches he assembled a description of the behavior of splashing droplets, describing various outcomes, all characterized by the utter symmetry of the splashes.

In 1894, he first succeeded in capturing the image on a photographic plate instead of his retina, and the illusion of symmetry vanished. Images that had seemed symmetric seen with his eye were revealed to be irregular on the photographic plate. Worthington could no longer believe what his eye told him about the symmetry of a form recorded by a latent image, and photographic plates replaced the eye.

Such large scale shifts do not replace everything that came before. In history, for example, objectivity, in the sense of removing the human from the production of information, is meaningless. An historian trying for objectivity, for the purely mechanical selection of her facts, is engaging in what Feynman called “cargo cult science” (the reality of cargo cults is actually a fascinating piece of anthropology, and their effectiveness in Feynman’s parable is more a commentary on our own culture than on the cults in question[52]):

In the South Seas there is a cargo cult of people. During the war they saw airplanes land with lots of good materials, and they want the same thing to happen now. So they’ve arranged to imitate things like runways, to put fires along the sides of the runways, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas—he’s the controller—and they wait for the airplanes to land. They’re doing everything right. The form is perfect. It looks exactly the way it looked before. But it doesn’t work. No airplanes land. So I call these things cargo cult science, because they follow all the apparent precepts and forms of scientific investigation, but they’re missing something essential, because the planes don’t land.[20]

Another shift in idealized trials occurred in microbiology and molecular biology starting in the late 1990’s. Microbiological work from the 1970’s to the 1990’s focused on properties of microbes as though they were the same everywhere. For example, when the amount of RNA transcribed from a locus of DNA changed due to a change in the microbe’s environment, microbiologists thought of it as a fixed change in the number of molecules, the same in all cells of the population, and starting from and arriving at fixed levels.

A series of papers published in the first decade of this century[19, 65] by a number of physicists—whose training told them that any system on the scale of chemical reactions in a cell would be noisy and variable—changed that. The strange part of this tale is that there was a literature from the 1920’s on just such variation, exquisitely documented and illustrated, and completely forgotten.[36]

To summarize, the basic activity of practitioners is running trials, each driven by the attempt to match some idealized trial. We have also seen our first two modes of progress: gathering data from trials, and creating and refining idealized trials. Next we look at how practitioners judge whether a trial has reached its ideal.