Introduction

The burgeoning support for evidence-based medicine in the 21st century has made clear why physicians must depend on quantitative data generated by clinical trials for decisions about treatment. The value to them of data reported from trials depends in part on the adequacy of a trial’s design and on how carefully that design was carried out. In addition, if the conclusions reached in a trial are to be reliable, they must be justified by appropriate statistical assessment of the data reported. Today’s ubiquity of statistical analysis of data from clinical trials developed mainly during the second half of the 20th century. But the concept of needing numerical data to justify conclusions about treatments goes back at least three centuries.

An early major attempt to assess with quantitative data the validity of a medical treatment came in the first half of the 18th century when James Jurin (Jurin 1724) and other English physicians gathered data on differences in mortality from smallpox inoculation as a preventive treatment compared with mortality from naturally-acquired smallpox (Miller 1957; Rusnock 2002; Huth 2005). Jurin’s judgments came, however, simply from inspecting those mortality data and not from critical statistical analysis of the validity of conclusions that might be drawn from them; such statistical methods had not surfaced for applications in medicine. His use of such mortality data can be described as descriptive statistics, crudely put – “simply eyeballing the data”.

Jurin’s simple statistical assessment of quantitative data was only one of an increasing number of such assessments in 18th century Britain (Tröhler 2000). But these judgments relied, in essence, on comparing mortality or morbidity proportions in groups of treated and untreated persons. Even the concept of considering the value of data for judgments began to be entertained. For example, in 1785 Gilbert Blane noted (Blane 1785) that:

There is . . . a great difficulty attending all practical inquiries in medicine; for in order to ascertain truth, in a manner that is satisfactory to a mind habituated to chaste investigation, there must be a series of patient and attentive observations upon a great number of cases, and the different trials must be varied, weighed, and compared, in order to form a proper estimate of the real efficacy of different remedies and modes of treatment.

Blane’s “weighed” suggests the concept of critical statistical assessment, but there is nothing elsewhere in his book to suggest specifically how this should be carried out. There is a hint in Blane’s “compared” of the concept of looking for bias, but he does not explain his use of the word.

Relatively simple assessments of treatment data continued into the early 19th century. The most famous among these were the analyses of Pierre-Charles-Alexandre Louis (1787-1872), an eminent clinician in Paris. He used data comparing mortality of patients treated relatively early with blood-letting and others in whom treatment was delayed (Louis 1835; Morabia 2004) to judge the efficacy of the treatment. Louis’ reports stimulated advocacy of la méthode numérique (the numeric method) for formal judgments on the efficacy of treatments, rather than simply accepting physicians’ opinions. His reports provoked intense debates in Paris in the Académie des Sciences in 1835 and in the Académie de Médecine in 1837, one faction lauding the numerical method as a scientific advance and an opposing faction lauding the central importance of a physician’s judgment based on the experience of applying a treatment to a particular patient.

The origins of Jules Gavarret’s interest in medical statistics

The debates in the Académies did not include any data on the precision or reproducibility of the mortality rates reported by Louis. There had already been suggestions by the French mathematicians Pierre-Simon Laplace (1749-1827) (Gillispie 1978) and Siméon-Denis Poisson (1781-1840) (Costabel 1978) that le calcul des probabilités (probability calculation) could be applied to judgments on quantitative data in medicine. Such calculations had already been applied in various ways (to gambling decisions, insurance risks, demographic data, juridical questions, astronomical data) but not to judgments on medical treatment. Laplace (1825) noted in his Essai philosophique sur les probabilités (Philosophical essay on probabilities) the potential for the use of the probability calculus in medicine:

The probability calculus can make one appreciate the advantages and disadvantages of the methods used in the speculative sciences. Thus, to discover the best treatment to use in curing a disease, it is sufficient to test each treatment on the same number of patients, while keeping all [other] circumstances perfectly similar. The superiority of the most beneficial treatment will become more and more evident as this number is increased, and the calculus will yield the corresponding probability of its benefit and of the ratio by which it is greater than the others.

Jules Gavarret, a young physician who had not yet established the reputation he later attained in Paris medicine, attended the debates in the Académies and applied the probability calculation to Louis’ data to judge the validity of his conclusions on blood-letting. Gavarret’s work surfaced in 1840 in his Principes généraux de statistique médicale ou développement des règles qui doivent présider à son emploi (General principles of medical statistics, or development of rules that should govern their use) (Gavarret 1840). It must be noted here that the meaning elsewhere of ‘statistics’ in medicine of this period was narrower than that of today. One of today’s dictionaries of scientific terms ([Anonymous] 1984) defines ‘statistics’ thus:

A discipline dealing with methods of obtaining data, analyzing and summarizing it, and drawing inferences from data samples by the use of probability theory.

In Gavarret’s time, ‘statistics’ in medicine represented the first half of this definition and not the ‘inferences’ in the second half. In the case of a datum on a mortality rate, an inferential statistical analysis of the data could permit one to judge the probable range of mortality data that would be found with re-sampling of a patient population. As will be seen below, Gavarret’s book appears to be the first and pioneering work on how to apply inferential statistics to therapeutic data for critical judgments on the value of therapies. His book has never been translated into English for publication, and awareness of his concepts has been largely limited to historians of medicine and statistics. My translations are not highly literal but are cast, I hope, in today’s scientific idiom in English. Parts of the quotations within ‘square brackets’ ([]) are my judgments on proper explanatory enlargements of Gavarret’s own text.

Who was Gavarret and what propelled him to write his Principes?

Louis-Dominique-Jules Gavarret (Le Tourneur 1982; Matthews 1998) was born 28 January 1809 in Astaffort, Lot-et-Garonne, a town roughly halfway between Bordeaux and Toulouse. In 1829 he entered l’École Polytechnique in Paris and then in late 1831 went into military service as an artillery officer (“un sous-lieutenant d’artillerie”). At the beginning of 1833 he resigned his commission to begin medical studies with the already eminent Gabriel Andral (1797-1876) (Prevost 1936). Gavarret’s collaborative research with Andral centered on chemical studies of blood and respiration, and they helped to establish clinical chemistry and hematology as clinical sciences. Andral was among the supporters of Louis and his advocacy of the numerical method for judgments on treatments. But Andral apparently did not stimulate Gavarret’s statistical inquiry into Louis’ data. In his preface to Principes, Gavarret describes (p ix-xi) the stimulus:

At the time I began my medical studies, one heard everywhere of propositions mathematically proved, of laws mathematically established. From the professor and academician to the lowliest student, everyone spoke the same way. One had to say, truly, that thanks to the rigor of the new methods of investigation, nothing could stop the progress of medicine. However, on considering this more closely, one was not slow to see that if the great studies of pathological anatomy had made and were making clear every day the great service of research in establishing the site, and defining diagnostic criteria, of diseases, therapy was far from making such rapid progress. …. In the midst of so many tireless and meritorious students [of new therapies], several distinguished men perseveringly fought for adopting the use of statistics in medicine [Note that here Gavarret is using statistics solely in the sense of numerical data, what we can call descriptive statistics, and not in the sense of today’s inferential statistics.]. It was, they said, the sole means of collecting the current results of therapies. With no aim but looking for the truth, I gave myself up to a serious study of works published on this subject; I followed with the greatest care the arguments made in the medical journals, and despite all my efforts, it was impossible at first to understand what seemed to be attached to this discussion. For finally, in the way which the question was posed and seen, was it seen as debating for one side or the other? Only to know if one replaced by numeric reports the words often, rarely, in most cases, et cetera, et cetera? The numerical method, considered from this narrow point of view, could be taken to mean a simple reform of language, but it was impossible to see it as a question of scientific method and philosophic principle.

Then Gavarret describes the debate in the Académie des Sciences in October 1835 that stimulated his interest in applying probability calculation to therapeutic data.

From the beginning the field of discussion was widened; instead of considering what up to then was called the numeric method, there was discussion of the possibility of applying probability calculation to therapeutic research. The reporter, M. Double, opposed the possibility of this application. Navier, in an outstanding discourse, treated the principal points of the topic with the greatest lucidity and closed with conclusions favourable to this kind of calculation.

Claude-Louis-Marie-Henri Navier (1785-1836) (McKeon 1974) was a highly regarded mathematician and engineer who had been elected to the Académie des Sciences in 1824. He had been a member of the faculty of the École Polytechnique, attended by Gavarret before the start of his short military career and probably regarded by him as a speaker with real authority (p xi, xii).

From then on I had the conviction that the question of medical statistics was not a trivial one but right off a question exciting one’s further interest. Navier’s discourse made clear his grasp of the subject and the judgments one could draw from the use of the principles of probability calculation in therapeutic research.

On the following page, Gavarret makes clear his dependence on the mathematician Poisson for applying the probability calculation to medical data.

The sources from which I have drawn the principles I develop on this subject are the course of M. Poisson [in the École Polytechnique?] and his fine work on the Probabilité des Jugements.

Gavarret goes on in his Preface (p xv) to justify further his writing of Principes. Here is his central stimulus.

A final reason that got me to prepare this book is the complete lack of any work on the use of calculation in medicine. One has been able, it is true, to properly speak of “statistics”, of “probability calculation”, of “numeric method”, et cetera, but no one has had the idea of preparing a treatise on this material. And, nevertheless, is it not indispensable that physicians, strangers for the most part to the study of higher mathematics, have at their disposal, stripped of all algebraic formulas, the fundamental principles on which depend all investigations with medical statistics?

I thus tried to do nothing less than fill this important void up to now in [medical] science.

The Preface is followed by four Chapters. Following these are six Notes providing the calculation methods and results that have been summarized and discussed in the preceding two chapters.

Chapters I and II discuss general principles underlying and justifying Gavarret’s more specific discussion in Chapters III and IV, such as the inadequacy of logic for coming to conclusions about results in medicine, the influence of variables on determining “facts” in medicine, and other similar considerations.

Gavarret’s definition of the conditions for reliable statistical analysis

Gavarret’s Chapter III, Application of the law of large numbers to therapeutic research, opens (p 100) with an echo of Laplace’s “[testing] each treatment on the same number of patients, while keeping all [other] circumstances perfectly similar”.

Observations, to be legitimately added up, do not have to be identical, but only related to phenomena whose manifestation was due to the effect of any cause among a group of possible invariable causes during the whole duration of the trial. …. It is thus the group of all possible causes of death and of cure that affect the patients that one has to make invariable to be able to regard a medical statistic as containing homogeneous qualities.

He then specifies (p 110-2) the five principles that define “the sources of all possible causes of death and cure that affect a patient with a known disease and treated with a medication”.

1. The individual conditions. …. All the circumstances that relate to the age and the sex of the patient, to his temperament, and to his constitution, to the diseases he has already had, to the state of his health in which the present affliction presented. ….

2. The state of health preceding the development of the illness. …. The profession, the social position, the life style of the patients, the condition of ventilation, of nourishment which they regularly find themselves, the moral influences that might have affected them.

3. The hygienic conditions during treatment. …. The healthiness of the place in which the patient was cared for, the moral influences that could have affected him during the duration of the illness, and the exactness with which the orders of the physician were carried out.

4. The illness itself. …. All the causes that relate to the nature of the illness, to the extent and severity of the organic lesions, to its influence on all the body’s economy, to the time between the onset of the illness and the beginning of treatment, to the various complications that could develop during the course of observation.

5. The therapeutic method used. …. Not only such and such medication, but all the means that make up the treatment of the patient. The dose used could vary depending on the particulars of the cases.

A few pages further on he defines (p 116-7) the conditions needed during a trial for proper statistical judgments.

So that a statistic can be considered composed of similar facts, and that therefore its information can enable us to measure the value of a medication, the observer has to hold to the following conditions:

a. The patients have to be drawn exclusively in the same locality and from the same classes of the population.

b. The illness experienced has to have a precise diagnosis and perfect definition. It has to be nosologically well delineated and separate from the illnesses resembling it most in this group ….

c. The statistic within the makeup of the illness considered to be specific has to contain the precise indication of the number of cases within each of its varieties.

d. The medication tried has to be clearly formulated, as well as its main modifications for each of the varieties of the illness.

e. The medical statistician has to be competent.

Surely it is clear from these last two excerpts that Gavarret is setting very high standards for the design of trials and the quality of the data that will result from a trial. But it must be kept in mind that Gavarret did not necessarily have in mind prospectively-initiated trials but, perhaps, simply case collections for comparisons of a treatment and no treatment, or of two different treatments. They are standards that some investigators years later attempted to meet in case-control or cohort studies. The practical difficulties in thus trying to meet such standards eventually led to the development of design methods to attempt, instead, to carry out adequate randomized allocations of patients to one of two or more arms of a prospectively-initiated trial, so that influences on the outcome variable studied, other than the treatment itself, would be adequately distributed between, or among, trial arms. The fidelity of the randomization could then be judged by comparing potentially significant prognostic variables other than the treatment(s) under study in the patients in the different arms of the trial. These developments did not fully mature in medicine until the second half of the twentieth century. I have found no evidence in Principes suggesting that Gavarret anticipated the need for alternation or rotation to treatments being compared. There is, however, a suggestion that he anticipated the need for randomization in selection of patients for treatment in each arm of a treatment regimen. This can be seen below in the quotation from Principes’ page 156.

In his Chapter III, to demonstrate the weakness of “the numerical method” when the probable correctness of a datum from a numerical summary of outcomes is not known (p 141), Gavarret applies le calcul des probabilités (probability calculation) of Poisson to data from Louis on “cures” and “death” in 140 patients treated with blood-letting.

To finally finish with numerical reports considered as a measure of the influence exercised by a medication, look at what errors have been recently produced by the physicians recommending the use of statistics. We will be satisfied with one example of such and we will allow ourselves to select those of M. Louis, which represent the largest number of observations. This skilful observer, in his research on typhoid fever, has tried to classify the treatment of this disease in carrying out the most detailed analysis of 140 cases of this disease. The observed subjects are divided thus:

52 died

88 cured

140 total patients

Thus the mean mortality is, in these cases, equal to

0.37143

This is to say, in taking this datum as the average measure of the cure used, one has to take as shown that 37,143 persons among 100,000 patients died or approximately 37 of 100 patients [in today’s terms, 37%]. If, with the help of the principles of the law of numbers, we seek to determine the extent of possible error that may weaken such a conclusion, we find it equal to

0.11550

Thus all that we have learned from the work of M. Louis, in reality, is that under the influence of the curative means used in his 140 observations

the number of deaths must vary between 48 493 and 25 593 per 100 000 patients

Or approximately between 49 and 26 per 100 patients.

Here Gavarret tells us that we cannot take 37% [0.37143] to necessarily be the true mortality with this treatment. With a sample of 140 patients, the true mortality could be, with a probability of just over 99.9 to 1, as high as 49% or as low as 26%, depending on which 140 patients made up the sample. He goes on to demonstrate that, as the number of patients sampled goes up, the range of probably correct values for mortality narrows. He used the term “les limites d’oscillation” (limits of oscillation) when referring to this range of calculated values.
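
The arithmetic behind these figures can be sketched in a few lines of Python. The formula used here, 2·√(2mn/μ³) for m deaths and n cures among μ = m + n patients, is an assumption: it is not quoted in the passage above, but it reproduces the limit of 0.11550 that Gavarret reports for Louis’ 140 cases. The function names are hypothetical and serve only this illustration.

```python
import math


def limit_of_error(deaths: int, cured: int) -> float:
    """Half-width of the 'limits of oscillation' around an observed mortality.

    Assumed formula (not quoted in the text): 2 * sqrt(2 * m * n / mu**3),
    where m = deaths, n = cured and mu = m + n.
    """
    total = deaths + cured
    return 2.0 * math.sqrt(2.0 * deaths * cured / total**3)


def limits_of_oscillation(deaths: int, cured: int) -> tuple[float, float, float]:
    """Return (lower limit, observed mortality, upper limit)."""
    mortality = deaths / (deaths + cured)
    half_width = limit_of_error(deaths, cured)
    return mortality - half_width, mortality, mortality + half_width


# Louis' typhoid fever series as quoted by Gavarret: 52 died, 88 cured.
low, mortality, high = limits_of_oscillation(52, 88)
print(f"mortality {mortality:.5f}, limits {low:.5f} to {high:.5f}")

# Hypothetical series with the same mortality but ten times as many patients,
# to show how the range narrows as the number of observations grows.
print(f"limit of error with 1400 patients: {limit_of_error(520, 880):.5f}")
```

With these inputs the lower limit comes out at about 0.256 and the upper limit at about 0.487, in line with the rounded “between 49 and 26 per 100 patients”. Increasing the hypothetical sample tenfold shrinks the limit of error by a factor of √10.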

Judging differences in the effects of two different medications

In the closing ten pages of Chapter III, Gavarret shows (p 156-7) how calculating “the limit of possible errors” enables one to judge whether a difference between two average mortality rates in two groups of patients – each group having received different treatments – probably represents a true differential effect between the treatments.

Suppose that in an epidemic, 500 patients chosen at random have been assigned to a medication, and 500 others also chosen at random to a different treatment, one were to obtain the following results:

1st medication        2nd medication
100 died              150 died
400 cured             350 cured
500 patients          500 patients

. . . Under the influence of this first medication, there have to die 20,000 persons of 100,000 patients

Under the influence of this second medication there have to die 30,000 persons of 100,000 patients

The difference between the two mortalities is thus:

10,000 of 100,000 patients

In thus following the logic of M. Louis, one concludes from it that the first medication is preferable to the second.

To estimate the true value of this difference, in calculating the limit of possible errors in this case, we will find it equal to:

7,694 of 100,000 patients.

The difference between the mortalities found is greater in this a posteriori conclusion, so thus we must recognize that in reality the first medication is superior to the second.

The mathematical details of these calculations are in Principes’s Note D (p 287-8).
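
As a rough check on the worked example, the comparison can be sketched in Python. The formula assumed here, 2·√(2mn/μ³ + 2m′n′/μ′³), is not quoted in this commentary; it is offered only because it reproduces the limit of 7,694 per 100,000 that Gavarret reports. The function name is hypothetical.

```python
import math


def limit_of_difference(deaths_1: int, cured_1: int,
                        deaths_2: int, cured_2: int) -> float:
    """Limit of possible error for a difference between two mortality rates.

    Assumed formula (not quoted in the text):
    2 * sqrt(2*m*n/mu**3 + 2*m'*n'/mu'**3), one term per treatment group.
    """
    total_1 = deaths_1 + cured_1
    total_2 = deaths_2 + cured_2
    term_1 = 2.0 * deaths_1 * cured_1 / total_1**3
    term_2 = 2.0 * deaths_2 * cured_2 / total_2**3
    return 2.0 * math.sqrt(term_1 + term_2)


# Gavarret's hypothetical epidemic: 100 of 500 died on the first medication,
# 150 of 500 died on the second.
observed_difference = 150 / 500 - 100 / 500
limit = limit_of_difference(100, 400, 150, 350)

print(f"observed difference: {observed_difference:.5f}")
print(f"limit of possible error: {limit:.5f}")
# Gavarret's rule: accept the difference only if it exceeds the limit.
print("difference exceeds the limit:", observed_difference > limit)
```

Here the observed difference of 10,000 per 100,000 exceeds the computed limit, which is the basis for Gavarret’s conclusion that the first medication is superior.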

Gavarret’s summary of his views

The closing chapter of Principes takes up application of the probability calculation (determining the limits of possible errors) to medical demographic, non-therapeutic questions and need not be considered here. It is followed, in addition to the technical Notes mentioned above, by Gavarret’s General Conclusions (p 245-8), the first seven of which are relevant to his views on judging the effects of therapies.

If we now take a quick look at all of the considerations we have developed in the course of this work, we are led to put forth the following propositions as definitely demonstrated.

PROPOSITION I
The rules of logic are inadequate for judging the effect of a given medication in an equally given disease and for classifying the medications recommended for this same disease in the order according to their effects.

PROPOSITION II
The principles of the law of large numbers are strictly applicable to therapeutic research and are solely able to solve these two important problems.

PROPOSITION III
The mean mortality, as provided by statistics, is never the exact and precise representation of the effect of the treatment tried but approaches it as the number of observations is increased.

PROPOSITION IV
A therapeutic law ensuing from the comparison of a small number of observations may be so far from the truth that it merits no degree of confidence in any case whatsoever.

PROPOSITION V
A therapeutic law can never be absolute; its application can always oscillate between certain limits which are all the narrower the more the collected observations are multiplied and which one can determine with the aid of the numbers constituting the statistics that have provided the law.

PROPOSITION VI
To be able to decide in favour of one treatment over another, it is not enough that the method yields better results but that the difference found must also exceed a certain limit, the value of which is a function of the number of observations.

PROPOSITION VII
Any difference between the obtained results that is below this limit, while this limit decreases as the number of observations increases, must be disregarded and deemed void.

The remaining three propositions summarize Gavarret’s conclusions in Chapter IV.

The reception and fate of Gavarret’s Principes

Gavarret’s book got wide attention in Europe and some notice in the United States (Matthews 1995, p 39-61). The one American visibly impressed with Gavarret’s thinking was Elisha Bartlett (1804-1855) (Osler 1899). Bartlett had spent many months observing medicine in Paris after his graduation in 1826 and apparently read French without difficulty. In his 1844 An Essay on the Philosophy of Medical Science (Bartlett 1845; Huth 2006b) he gives seven pages to summarizing, with clear approval, Gavarret’s principles needed for the collection of satisfactory data on treatments and the need for large numbers of cases in trials.

… I shall enter into a somewhat detailed exposition of the subject before us, … the treatment of disease; for the materials of which I am almost entirely indebted to the admirable treatise of M. Gavarret, on Medical Statistics. [Bartlett, p 159].

European views of Gavarret’s Principes through the remainder of the 19th century differed widely, ranging from approval with advocacy of Gavarret’s views and procedures, to complaints that they did not contribute to “science” (Matthews 1995). Yet by the end of the nineteenth century, Gavarret’s application of the probability calculation for inferential statistical judgments on treatments seems to have sunk out of sight. It seems to have been unknown to (or unacknowledged by) the early twentieth century founders of the principles on which medical statistical concepts and methods now stand, for example, such men as William Sealy Gosset (“Student”) (1876-1937), Ronald Aylmer Fisher (1890-1962), Jerzy Neyman (1894-1981) and Austin Bradford Hill (1897-1991). In particular, the 1934 paper in which Neyman (1934) advanced the concept of confidence intervals and coined the term, does not mention Gavarret and his use of the calculation of limits of possible error.

The form of this solution [of the problem of finding “the distribution of certain characters in repeated samples”] consists in determining certain intervals, which I propose to call confidence intervals…, in which we may assume are contained the values of the estimated characters of the population.

It seems fair to say that the concept in Gavarret’s pioneering use of the probability calculation for estimation of the limits of possible error (limits of oscillation) for inferential judgments on data on treatment surfaced, more than a century after publication of his Principes, in the form of today’s closely related and widely applied confidence intervals. Gavarret’s statistical work has been well represented in late 20th-century histories of statistics (Lancaster 1994). Both David Lilienfeld (1978) and Alvan Feinstein (1998) specifically refer to his probability calculation as producing the equivalent of today’s confidence interval.

It is important to emphasize, however, that Gavarret’s book was not “pioneering” in the sense of offering an innovation in statistical concepts and methods. But it does appear to be “pioneering” in that, as far as I know, it was the first application of a method in inferential statistics to a medical question about the efficacy of a treatment. As I have already noted above, the probability calculation had been applied in other fields but not, apparently, to questions in clinical medicine.

The most prominent of early advocates of a wider use of confidence intervals in reporting medical research was Kenneth Rothman (1978), founding editor of the journal Epidemiology. The strongest advocates among English-language clinical journals were, in the United States, the Annals of Internal Medicine and, in Great Britain, the British Medical Journal with their publication in the 1980s and early 1990s of a number of articles of advocacy (Altman 2000). In general, the position of the advocates was that the confidence interval gave more information on the reported variables than the then much more widely used hypothesis testing and resulting p values.

The remainder of Gavarret’s life

After publication of Principes, Gavarret did not return to work on medical statistics. He continued work with Andral on blood chemistry and respiratory physiology. His later work covered many topics in biophysics and physiology, including acoustic and phonation phenomena, heat production, and vision, and his productivity is manifested in the 85 references to his publications in the Index-Catalogue of the Library of the Surgeon General’s Office ([National Library of Medicine] 2006). Gavarret developed a prominent place in Parisian medicine and medical education, serving for a term as president of l’Académie de Médecine. His scientific eminence was recognized in 1847 with the decoration of the Legion of Honor, and fully marked in 1886 with his appointment as a Commander of the Legion. He died in 1890, 4 years later, aged 81.

Conclusion

Why did the view of what was needed in critically-judged numerical data for conclusions on the value of treatments, a view exemplified by Gavarret’s Principes, take over a century to be applied for estimating the value or lack-of-value of a treatment? The answer is probably complex. Clearly, innovations in treatment, aside from surgery, were scarce through the nineteenth century until the years around the beginning of the twentieth century. Established treatments were apparently generally accepted as justified during a continuing reign of authoritarianism in medicine: what the professor says is right and need not be challenged by running trials to re-examine a treatment’s efficacy. Perhaps such challenges emerged only when economic pressures and the growth of complexity in treatment possibilities began to tell us that we had better be sure of the value of what we do in medicine. There may be other reasons yet to be teased out by historians.

This James Lind Library commentary has been republished in the Journal of the Royal Society of Medicine 2008;101:205-212.

Huth EJ (2006b). Transatlantic ideas on the philosophy of therapeutics in the middle of the 19th century. The James Lind Library.

Jurin J (1724). A letter to the learned Dr. Caleb Cotesworth, F. R. S. of the College of Physicians, London, and physician to St. Thomas’s Hospital; containing a comparison between the danger of the natural small pox, and that given by inoculation. Philosophical Transactions of the Royal Society of London (1722-1723), 32:213-227.

Miller G (1957). The adoption of inoculation for smallpox in England and France. Philadelphia: University of Pennsylvania Press.

Morabia A (2004). Pierre-Charles-Alexandre Louis and the evaluation of bloodletting. The James Lind Library (www.jameslindlibrary.org).

[National Library of Medicine] (2006). Index-Catalogue of the Library of the Surgeon General’s Office, U. S. Army. Accessed at http://indexcat.nlm.nih.gov/ on 18 July 2006.

Neyman J (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. J R Stat Soc 97:558-625.