During the 1935 autumn term at University College London, Ronald A. Fisher offered a course on the history of biometry to a broad audience of biologists and statisticians. The course consisted of 14 lectures given on consecutive Thursday afternoons at 2:30 p.m.; the fourth, fifth and sixth lectures concerned Gregor Mendel. As Allan Franklin reminds us in the "Overview" essay with which Ending the Mendel-Fisher Controversy opens, Fisher's interest in Mendel dated from his student days at the University of Cambridge. As a 21-year-old undergraduate in 1911, Fisher, who was probably influenced by analyses performed by Frank Weldon in 1901, had commented on the exceptionally good fit of Mendel's data to his theory. Fisher's 1935 lectures allowed him to return to the topic with a new set of eyes, after two decades of pathbreaking research in statistics and genetics. The lectures summarized a painstakingly detailed critical analysis of Mendel's experiments, as those experiments were reported in an 1865 account. Although that account was published in 1866, it was ignored until 1900, when it was brought to wide public attention with great scientific fanfare.

Fisher subsequently wrote up this material for publication in the inaugural issue of the journal Annals of Science in 1936. That article is itself a masterpiece in what may be called the forensic history of science. Fisher was an extraordinarily creative and insightful statistician, an equally accomplished geneticist, and an experienced agricultural biologist; he was widely read in the history of science. He was as well qualified as a person could be to perform such an analysis, and he did not disappoint. Fisher dissected the 1866 article with a keen appreciation of Mendel's likely state of mind, called detailed attention to the methodological excellence of the experimentation and the care taken in carrying it out, praised Mendel in very high terms and discussed the different polemical purposes the work had served in early 20th-century biology.

But Fisher also called attention to a troubling aspect of the reported data: They not only fit the Mendelian theory well, they fit it too well, as judged by the use of the chi-square test Karl Pearson had introduced in 1900. The chance of getting such a good fit under standard genetic models was judged to be less than 0.0001. And this was not just in one simple experiment; Fisher found the phenomenon to be systematically true: The overly good fit was pervasive in data on thousands of plants in dozens of experiments conducted over eight years, which Fisher evaluated with many separate chi-square tests. To make matters even worse, Fisher argued that in a few cases Mendel had used the wrong theoretical counts, and the data also fit those erroneous counts too well! Fisher's inescapable conclusion was that someone had screened or otherwise slightly sophisticated the data, as if silently rejecting some disparate data, thereby reducing the variation with an eye on the known and expected theoretical values. He could not bring himself to attribute this to Mendel, so he offered the hypothesis that an assistant might have done it, without Mendel's knowledge.
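To make the "too well" judgment concrete, here is a minimal Python sketch of the lower-tail chi-square calculation behind it. The function names are illustrative and come from neither the book nor Fisher. The seed-shape counts (5,474 round, 1,850 wrinkled) are Mendel's familiar 3:1 example. The aggregate of about 41.6 on 84 degrees of freedom is the figure usually quoted from Fisher's 1936 summary; it is taken here as an assumption rather than rechecked against the paper.

```python
import math


def chi_square_stat(observed, ratio):
    """Pearson chi-square statistic for counts against a theoretical ratio."""
    total = sum(observed)
    weight = sum(ratio)
    expected = [total * r / weight for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x): the 'too good' tail.

    Evaluated with the power series for the regularized lower incomplete
    gamma function, which is adequate for the moderate values used here.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    n = 0
    while term > 1e-17 * total and n < 10_000:
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total


if __name__ == "__main__":
    # One experiment: Mendel's seed-shape counts against a 3:1 ratio.
    seed_shape = chi_square_stat([5474, 1850], [3, 1])
    print(round(seed_shape, 3), chi2_cdf(seed_shape, 1))

    # Aggregate over many experiments (assumed figures, see the note above).
    print(chi2_cdf(41.6056, 84))
```

A single close fit such as the seed-shape experiment is unremarkable: a fit at least that good turns up by chance roughly four times in ten. Only the aggregate, where the lower-tail probability drops below 0.0001, carries Fisher's argument.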

Fisher's statement on the data sophistication was buried late in the 1936 article, and his whole account was itself largely ignored until the arrival in 1965 of the Mendel Centennial. In that and the succeeding year two books appeared, each republishing Mendel's account in translation as well as Fisher's paper, together with a selection of materials commenting on either or both works. Over the past 40 years this material has given rise to an industry perhaps aptly describable as CSI: Mendel. Some of its output is sanctimonious tongue-clucking about fraud in science, but a greater portion is devoted to attempts to explain away all or part of Fisher's analysis as due to factors other than purposeful sophistication of the data. Ending the Mendel-Fisher Controversy is both a summary of the activity in that industry and a contribution to it.

The book consists of Franklin's 78-page overview, a 1909 translation of Mendel's account, Fisher's 1936 article, and four other articles selected by Franklin that were originally published in 1986, 1994, 1998 and 2001. These papers are all by distinguished authors: Franklin is a noted historian of science; A. W. F. Edwards, a statistician, was Fisher's last student; Daniel L. Hartl is a geneticist, and his coauthor Vítezslav Orel is a biographer of Mendel; Teddy Seidenfeld is a philosopher of science who writes on statistics; Daniel J. Fairbanks is a plant biologist, and his coauthor Bryce Rytting is a musicologist. Edwards, Hartl, Seidenfeld and Fairbanks have all written postscripts to their original articles, offering recent reflections or even recent data; together with an appendix on chi-square tests by Fairbanks, this new material fills 40 pages. As might be expected with such a plan (and so many papers written at different times for different audiences), there is significant repetition, with each author offering his own summary of the same case and extensive quotation from some of the same works included in the volume.

The texts are all well written and cogently argued. Every author has something interesting to say; each has a different point of view, and all add to the story. Anyone wishing an informed introduction to the issues involved will find it here, although a thorough, careful reading will take them to a level of Mendelian minutiae that only CSI: Mendel aficionados are likely to enjoy. Persevering readers will be introduced to an astoundingly varied set of clever speculations about how Mendel might have conducted his experimentation to arrive at the effect Fisher found, some of these tested by fresh experiments. Some readers might wish that Eva Sherwood's 1966 translation of Mendel had been reprinted rather than the 1909 version (which was criticized by P. S. Hewlett in a 1975 piece in Biometrics for not respecting the paragraph breaks of the German original and thereby giving wrong emphasis in places).

The book's provocative title may lead readers to believe that a smoking gun, or maybe even some incriminating DNA, has been found. I suspect that anyone harboring an expectation that this book will permit the case to be closed will be disappointed. What then do these distinguished authors find? The bottom line, put more simply than some of the authors would fully endorse, is that Fisher was right in 1936. He was right that Mendel performed all the experiments he described, and many others to boot. He was right that Mendel's standards of experimentation were very high. He was right that slight sophistication of some sort must have been performed, but we will probably never know the full story. These authors show no sympathy for Fisher's generous placement of the blame on an unnamed assistant, however.

The sophistication of data—be it trimming, selective reporting, alteration or omission of data, whatever words may be used—is not considered proper in today's experimental world if done consciously without a full and open explanation and discussion. But even today Mendel's conjectured practice, which may well have been entirely subconscious, would most likely not rise to the level of fraud, and neither Fisher nor any of the present authors believe that it did. Thousands of plant classifications needed to be performed, and biases toward known expected counts on the order of a few percent would have been sufficient to account for what Fisher found. As Sewall Wright wrote about this case in 1966, "Checking of counts that one does not like, but not of others, can lead to systematic bias toward agreement." Mendel knew some probability theory, but the statistical cautions we now regard as needed in scientific data analyses were not then in common currency. As a student in Fisher's course in 1935, William G. Cochran took notes that record Fisher as saying that Mendel was "not worried about variability of biological material."
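Wright's point can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a reconstruction of anything Mendel did. The experiment sizes, the one-standard-deviation "dislike" threshold and the 2 percent reclassification allowance are all hypothetical. Each simulated experiment classifies plants at a true 3:1 ratio. When the tally lands uncomfortably far from expectation it is "rechecked", and a small share of plants drifts toward the expected count.

```python
import math
import random


def simulate_mean_chi_square(n_experiments=2000, n_plants=500, p=0.75,
                             nudge_fraction=0.0, threshold_sd=1.0, seed=1):
    """Average 1-df chi-square over simulated 3:1 experiments.

    nudge_fraction is the hypothetical share of plants whose classification
    may drift toward expectation when a disliked tally is rechecked.
    """
    rng = random.Random(seed)
    expected = n_plants * p
    variance = n_plants * p * (1 - p)
    max_shift = nudge_fraction * n_plants
    total = 0.0
    for _ in range(n_experiments):
        count = sum(rng.random() < p for _ in range(n_plants))
        deviation = count - expected
        # Only tallies "one does not like" are rechecked; the others stand.
        if abs(deviation) > threshold_sd * math.sqrt(variance):
            # Move toward expectation by at most max_shift plants,
            # never past the expected value itself.
            shift = min(abs(deviation), max_shift)
            deviation -= math.copysign(shift, deviation)
        total += deviation ** 2 / variance
    return total / n_experiments


if __name__ == "__main__":
    print("no rechecking:       ", simulate_mean_chi_square())
    print("2% selective recheck:", simulate_mean_chi_square(nudge_fraction=0.02))
```

With no rechecking the average statistic sits near 1, as theory requires for one degree of freedom. With selective rechecking it falls below that, even though no individual tally has been moved by more than a few plants.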

What of the particular cases where Fisher found Mendel's counts biased toward a 2:1 ratio Mendel believed to be correct, when in fact on reasonable expectation another ratio should hold? Here the authors differ; they unite in finding difficulties with most other people's attempts to explain this instance of the phenomenon, but several of them also believe that their own speculations can win the day for Mendel. These arguments include criticizing Fisher's use of a one-sided test rather than a two-sided one in one instance (the one-sided test seems to me to be right here), speculation that Mendel had performed subsequent experiments to rule out the problem, and informed guesses that the number of seeds and number of plants may be different from the numbers Fisher used. Any of these latter arguments may be correct, but all are speculative and none seem compelling to this reviewer. The suggestion by Wright in 1966 that the classification of 3 percent of some 1,000 plants studied in this case might be off due to subconscious bias toward expected ratios passes William of Occam's test more easily, given the general agreement that something like this happened in Mendel's other experiments.
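The "another ratio" has a simple origin, sketched below in Python under assumptions drawn from the usual account of Fisher's argument rather than from this review. A dominant-looking plant is heterozygous with probability 2/3. It is recognized as such only if at least one of its tested offspring shows the recessive trait. Ten offspring per plant is assumed as the default. The function name and parameters are illustrative.

```python
def expected_hetero_fraction(n_progeny=10, p_dominant=0.75, prior_hetero=2 / 3):
    """Expected share of plants classified as heterozygous.

    A heterozygote is misread as homozygous when all n_progeny offspring
    happen to show the dominant trait, which occurs with probability
    p_dominant ** n_progeny.
    """
    missed = p_dominant ** n_progeny
    return prior_hetero * (1 - missed)


if __name__ == "__main__":
    for n in (10, 12, 20):
        frac = expected_hetero_fraction(n)
        print(n, round(frac, 4), round(frac / (1 - frac), 2))
```

With ten offspring the expected share is about 0.629, or roughly 1.7:1, not 2:1. With more offspring per plant the gap shrinks toward the nominal 2/3, which is why guesses about the number of seeds sown matter to the argument.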

In any case, all the authors agree with the basic finding of some data sophistication. Why then do they say, "It is time to end the controversy"? Essentially because they believe that no further resolution can be reasonably expected, and the existence of the controversy in any case is partly based on misreading Fisher as accusing Mendel of fraud. Continual harping on this problem, the book says, damages Mendel's reputation unfairly, with unwarranted staining of scientific practice in the process. Whether or not you agree with that, an actual end to that discussion is unlikely to be a consequence of this book. A generation of genetics students have had their interests whetted by these issues, and thanks to these lucid, insightful and balanced articles, another generation will be able to join the quest with even better understanding. And who knows? Maybe some one of these students will find a universally compelling explanation that even Fisher overlooked.

Stephen M. Stigler is Ernest DeWitt Burton Distinguished Service Professor and Chair of the Department of Statistics and the College at the University of Chicago. His books include Statistics on the Table: The History of Statistical Concepts and Methods (Harvard University Press, 1999) and The History of Statistics: The Measurement of Uncertainty before 1900 (The Belknap Press of the Harvard University Press, 1986).