Abstract

Systematic reviews are currently favored methods of evaluating research in order to reach conclusions regarding medical practice. Such reviews are necessitated by the fact that no research is perfect and experts are prone to bias. By combining many studies that fulfill specific criteria, one hopes that the strengths can be multiplied and thus reliable conclusions attained. Potential flaws in this process include the assumptions that underlie the research under examination. If the assumptions, or axioms, upon which the research studies are based are untenable either scientifically or logically, then the results must be highly suspect regardless of the otherwise high quality of the studies or the systematic reviews. We outline recent criticisms of animal-based research, namely that animal models are failing to predict human responses. It is this failure that is purportedly being corrected via systematic reviews. We then examine the assumption that animal models can predict human outcomes to perturbations such as disease or drugs, even under the best of circumstances. We examine the use of animal models in light of empirical evidence comparing human outcomes to those from animal models, complexity theory, and evolutionary biology. We conclude that even if legitimate criticisms of animal models were addressed, through standardization of protocols and systematic reviews, the animal model would still fail as a predictive modality for human response to drugs and disease. Therefore, systematic reviews and meta-analyses of animal-based research are poor tools for attempting to reach conclusions regarding human interventions.

Introduction

Review articles in the scientific literature can be classified as a general review article, a systematic review (SR), or meta-analysis (MA). The purpose of a review article is to provide readers with a summary of published research in a particular field. Reviews usually focus on areas of progress over the recent past, for example five years. A general review article attempts to summarize all the relevant, published literature and provide some analysis of the controversial areas of the field or topic. In addition, it may suggest some novel ways to advance the field further [1]. Such review articles provide a concise analysis of a large body of literature and hence are important for readers from a variety of fields. Articles in PubMed, for example, can be searched based on whether they are classified as review articles.

In contrast, SRs seek to be more rigorous and comprehensive in addition to providing an opinion about outcomes or practice. For example, in medical science, SRs are used in hopes of ascertaining whether treatment A is superior to treatment B. Why is such an analysis necessary? Unfortunately, few research protocols are perfect, so there may be controversy surrounding treatment options even after numerous studies. Therefore, combining studies and analyzing them may be useful. However, there is another reason, explained by Greenhalgh: “Experts, who have been steeped in a subject for years and know what the answer 'ought' to be, are less able to produce an objective review of the literature in their subject than non-experts. This would be of little consequence if experts' opinions could be relied on to be congruent with the results of independent systematic reviews, but they cannot” [2]. One of the premises upon which the practice of SRs is based is the inability of informed scientists to evaluate, without bias, the controversies in their own field. This is a reflection of human nature and is unlikely to change anytime in the near future [3]. An SR requires clearly stated objectives and rigorous criteria for which studies can and cannot be included; it should be reproducible, include all relevant studies, seek to detect bias, and attempt to make determinations [4, 5]. SRs are acknowledged as being an integral component of evidence-based medicine, where the goal is to analyze the evidence gained from the best scientific studies that qualify for consideration in order to make a determination regarding clinical intervention. The SR is thus “the conscientious, explicit, judicious use of current best evidence in making decisions about the care of individual patients” [6].

The term meta-analysis was coined in 1980 by Smith, Glass and Miller and involves a statistical analysis of the topic of an SR. An MA can be thought of as a quantitative SR [7]. Greenhalgh stated: “A meta-analysis is a mathematical synthesis of the results of two or more primary studies that addressed the same hypothesis in the same way” [2].

While the purpose of any scientific literature review is to summarize and evaluate relevant articles in a scholarly and rigorous manner, the review must also consider relevant research in other disciplines of science—consilience—as well as the scientific underpinnings of the topic under consideration. For example, any SR of research articles regarding acupuncture should take place in light of the fact that no mechanisms have been discovered that would allow scientists to expect success from using acupuncture in order to alleviate objective pathology [8, 9]. In contrast, the Germ Theory of Disease supports a SR of the efficacy of antibacterial use for preventing complications from, or shortening the course of, ear infections in children. An example of this concept comes from oncological surgeon David Gorski who criticized the National Center for Complementary and Alternative Medicine (NCCAM) for spending resources to study: “treatment modalities that are inherently unscientific, being as they are based on prescientific or demonstrably incorrect understandings of human physiology and disease” [10]. An example of knowledge from other fields of science affecting how a SR might be conducted can be found in homeopathy. Knowledge from chemistry and physics vis-à-vis how to apply Avogadro's number when calculating dilutions, should inform scientists seeking to evaluate homeopathy by conducting a SR [11].

Finally, the fact that conclusions drawn from SRs and MAs have been shown to be wrong should also be considered when evaluating a treatment or other practice being evaluated by a SR or MA. For example, a meta-analysis by the Cochrane Group reported that albumin increased deaths in critically ill patients [12]. However, a large randomized study in Australia later revealed no such effects [13]. In summary, SRs and MAs are a valuable tool in assessing what is currently known regarding a subject but, like any tool, can fail.

Systematic reviews and standardization of animal model protocols

Because nonhuman animal models (hereafter referred to as animal models or animals) have on multiple occasions been unsuccessful in predicting human response to drugs and disease (we will address this claim in depth), many have called for SRs in order to improve the models [14-24]. An example of this predicament would be the animal models used to determine which drugs to develop in an attempt to diminish neurological damage from ischemic events of the central nervous system (CNS) [17, 25-30]. By analyzing animal-based research with SRs, flaws in the methodology would also become apparent, thus leading to eventual standardization of such studies. This would ostensibly also lead to better predictive values for humans (see table 1 for calculating such values). Bracken supports this, stating:

One reason why animal experiments often do not translate into replications in human trials or into cancer chemoprevention is that many animal experiments are poorly designed, conducted and analyzed. Another possible contribution to failure to replicate the results of animal research in humans is that reviews and summaries of evidence from animal research are methodologically inadequate [18].

Table 1

Binary classification and formulas for calculating predictive values of modalities such as animal-based research.
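
The body of table 1 is not reproduced here. As a minimal sketch of the standard binary-classification formulas that the caption refers to, the following computes sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) from a 2x2 table. The class name, function names, and the counts are illustrative assumptions and do not come from the cited sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing a test (e.g. an animal model) with the human outcome."""
    true_pos: int   # model positive, human positive
    false_pos: int  # model positive, human negative
    false_neg: int  # model negative, human positive
    true_neg: int   # model negative, human negative


def _ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def predictive_values(c):
    """Return sensitivity, specificity, PPV and NPV for a 2x2 table."""
    return {
        "sensitivity": _ratio(c.true_pos, c.true_pos + c.false_neg),
        "specificity": _ratio(c.true_neg, c.true_neg + c.false_pos),
        "ppv": _ratio(c.true_pos, c.true_pos + c.false_pos),
        "npv": _ratio(c.true_neg, c.true_neg + c.false_neg),
    }


# Hypothetical counts, chosen only to exercise the formulas.
example = ConfusionCounts(true_pos=40, false_pos=10, false_neg=20, true_neg=30)
values = predictive_values(example)
print(values)
```

PPV and NPV depend on all four cells, so a report that lists only the successes of a modality does not supply enough information to compute either value.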

Further evidence that SRs are expected to transform the predictive value of animal-based research comes in the form of the 1st International Symposium and Workshop on Systematic Reviews in Laboratory Animal Science that was held at the Radboud University Nijmegen Medical Centre on February 9-10, 2012. The workshop celebrated “5 years of the 3R [the 3R here refers to Reduce, Refine, and Replace animals used in research] Research Centre (3RRC) and stimulating an international discussion and collaboration between animal and clinical researchers on Systematic Reviews (SRs) of animal studies” [31]. Malcolm Macleod, the keynote speaker, discussed, “The transforming potential of the systematic evaluation of laboratory research.” The brochure for the conference stated:

. . . the use of SRs for the optimisation of animal testing is still rare which can lead to waste in funding and harm to patients and research volunteers. The 3RRC encourages the use of SRs in animal studies as they improve scientific quality, lead to implementation of the 3Rs principles, improve translational research and help in determining the value of animal studies to human health [31].

There are several claims here for the value of SRs. While we do not dispute the value of SRs to improve the quality of research and perhaps increase acceptance of the 3Rs, we strongly contest the notion that SRs will allow scientists to develop animal models that are predictive modalities for human responses to drugs and disease. Claims such as those above by Bracken and the organizers of the Symposium (and more we will cite below) regarding the benefit of using animal models in translational research, however, directly assume predictive ability. We will discuss this further when describing table 2.

There are methodological problems in current animal-based research. Pound et al. [32] highlighted some of the potential flaws when using animal models, including:

Variations in drug dosing schedules and regimens that are of uncertain relevance to the human condition.

Variability in the way animals are selected for study, methods of randomization, choice of comparison therapy (none, placebo, vehicle), and reporting of loss to follow up.

Small experimental groups with inadequate power, simplistic statistical analysis that does not account for potential confounding, and failure to follow intention to treat principles.

Nuances in laboratory technique that may influence results may be neither recognized nor reported, e.g. methods for blinding investigators.

Selection of a variety of outcome measures, which may be disease surrogates or precursors and which are of uncertain relevance to the human clinical condition.

Length of follow up before determination of disease outcome varies and may not correspond to disease latency in humans [32].

Hooijmans et al. [14] have called for a gold standard for research involving animals that includes stating the specifics regarding housing, species, randomization, cage size and bedding among other parameters. Other checklists and suggestions aimed toward improving standardization have also been published [21, 33-38]. Note that even here, however, Hooijmans et al. link standardization to prediction of human response, stating: “In addition, an improved experimental design contributes to a better translation to the clinic and increases patient safety” [14]. Many reviews and opinions have echoed the above reasons for translational failure or predictive failure and have suggested ways to improve the likelihood of successfully predicting human responses to drugs and disease. The ARRIVE (Animals in Research: Reporting In Vivo Experiments) Guidelines for Reporting Animal Research [39] consist of a 20-item checklist containing:

the minimum information that all scientific publications reporting research using animals should include, such as the number and specific characteristics of animals used (including species, strain, sex, and genetic background); details of housing and husbandry; and the experimental, statistical, and analytical methods (including details of methods used to reduce bias such as randomization and blinding) [39].

Another example of such an effort is the CAMARADES group (the Collaborative Approach to Meta-Analysis and Review of Animal Data in Experimental Studies). For example, the CAMARADES group identified significant sources of bias in a sample of almost 5,000 animal studies. These shortcomings included a frequent lack of blinding, randomization, and sample size calculation, in addition to overstatement of treatment efficacy due to unpublished studies.

While some scientists are more modest in their claims for the value of SRs and the standardization of protocols, clearly there are high hopes for what SRs can accomplish regarding the predictive value of animal models. We will now examine, in more depth, the reasons for the above concerns regarding the predictive value of animal models.

Prediction in science

The use of animals in science and research can be categorized per table 2 [40]. While all uses of sentient animals are cause for ethical concern [41-43], the use of animal models to predict human response to drugs and disease appears to be the main focus of the scientific community when attempting to justify animal use to society [44-52] [[53] p3]. This is consistent with Giles, writing in Nature, who stated:

In the contentious world of animal research, one question surfaces time and again: how useful are animal experiments as a way to prepare for trials of medical treatments in humans? The issue is crucial, as public opinion is behind animal research only if it helps develop better drugs. Consequently, scientists defending animal experiments insist they are essential for safe clinical trials, whereas animal-rights activists vehemently maintain that they are useless [54].

Statements from advocates for animal-based research acknowledge the importance society places on animal models being able to predict human response to drugs and disease. For example, Cheng stated: “Animal tests are necessary for some research, such as testing drugs for toxicity. It would be, in my opinion, improper to release drugs for human use without animal testing” [55]. Heywood likewise stated: “Animal studies fall into two main categories: predictive evaluations of new compounds and their incorporation into schemes designed to help lessen or clarify a recognised hazard” [56]. Vassar agrees, stating: “Chronic dosing in mice and monkeys is necessary to show the efficacy and safety of the antibody before it's taken into humans” [51]. The Council for International Organizations of Medical Sciences implies prediction when they state: “clinical testing must be preceded by adequate laboratory or animal experimentation to demonstrate a reasonable probability of success without undue risk” [45]. Rudczynski wrote: “the basic research model used by Yale University and its peer institutions is scientifically valid and predictive of human disease” [57]. (Emphasis added.) Such statements could be easily multiplied. The animal-based research community clearly stresses the importance and validity of using animals to predict human response to drugs and disease.

Table 2

Nine categories of animal use in science and research.

1. Animals are used as predictive models of humans for research into such diseases as cancer and AIDS.

2. Animals are used as predictive models of humans for testing drugs or other chemicals.

3. Animals are used as “spare parts”, such as when a person receives an aortic valve from a pig.

4. Animals are used as bioreactors or factories, such as for the production of insulin or monoclonal antibodies, or to maintain the supply of a virus.

5. Animals and animal tissues are used to study basic physiological principles.

6. Animals are used in education to educate and train medical students and to teach basic principles of anatomy in high school biology classes.

7. Animals are used as a modality for ideas or as a heuristic device, which is a component of basic science research.

8. Animals are used in research designed to benefit other animals of the same species or breed.

9. Animals are used in research in order to gain knowledge for knowledge's sake.

The above claims are, however, in direct opposition to those advocating for SRs in order to improve the predictive ability of animal-based research. Before we survey the literature for empirical confirmation and present views of other scientists that strongly disagree with the above, we need to first define the term predict and refresh the reader's memory of how it is used in science.

Predict can be used in essentially two ways when discussing science. First, scientists develop hypotheses, which generate predictions that can then be tested. Several confirmations of the hypothesis, by predictions that are found to be true, strengthen the hypothesis, while one failed prediction may necessitate revising the hypothesis or even destroy it altogether. This is standard science based on the hypothetico-deductive method and we have no issues with using the term predict in this manner. Animal use involving categories 5, 7, and 9 in table 2 would employ this use of predict.

The second way predict is used is in discussing the predictive value of a modality or practice. Such is the case with categories 1 and 2 in table 2. An example outside of biomedical science would be when Italian geologists were asked whether a series of small quakes in the area meant that residents should evacuate their houses because a major earthquake was likely forthcoming. The geologists stated that a major earthquake was unlikely and this was consistent with current knowledge of earthquakes. Nevertheless, the Italian legal system convicted the scientists on charges that essentially said they were negligent in failing to warn the residents to evacuate [58]. This was a cause for concern in the scientific community as an analysis revealed that small quakes forecast a major quake only 2% of the time [59, 60]. Clearly, a practice or modality that correctly calculates the answer only 2% of the time does not qualify as predictive. Exactly what percentage is necessary to qualify will vary with the field of study. Finding a method that results in the correct answer 51% of the time in gambling, in blackjack for example, would be very productive and would probably meet the criteria for being a predictive practice. Using instruments to fly an aircraft, on the other hand, requires that the instruments correctly communicate the exact location of the aircraft 100% of the time. While medical science does not require predictive values of 100%, it does require very high values. Tests that correlate with reality even 70% of the time are not very useful.

Just as important as what the word predict means in terms of predictive value and how PPV and NPV are calculated, is what does not constitute predictive value. For example, a single example of correlation does not qualify a model as predictive or indicate a high PPV or NPV. A modality or practice must be evaluated based on its history of correlating with reality. Cherry-picking examples is not allowed. Moreover, one must be very precise when defining what is being evaluated for predictive value. If one wishes to evaluate animal models in general then all the wrong answers by all species must be included in the calculation as well as all the correct answers. If one is calculating PPV for a specific animal model, say using beagles in hepatotoxicity testing, then all correct and incorrect answers for beagles should be included but not outcomes from different species or even different breeds.
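
The bookkeeping rule in the paragraph above can be sketched as follows: every positive animal result counts, whether or not it matched the human outcome, and the scope of the calculation (all models pooled, or one specific model) is fixed before counting. The records below are invented purely for illustration and are not data from any study; the function name `ppv` and the record layout are likewise assumptions.

```python
# Hypothetical records: (model, animal_result_positive, human_result_positive).
RECORDS = [
    ("beagle", True, True),
    ("beagle", True, False),
    ("beagle", False, False),
    ("rat", True, False),
    ("rat", True, False),
    ("rat", True, True),
    ("rat", False, True),
]


def ppv(records, model=None):
    """PPV over every record, or over a single model when `model` is given.

    No record is dropped for being a wrong answer; excluding failures
    would amount to cherry-picking.
    """
    selected = [r for r in records if model is None or r[0] == model]
    true_pos = sum(1 for _, animal, human in selected if animal and human)
    false_pos = sum(1 for _, animal, human in selected if animal and not human)
    total_pos = true_pos + false_pos
    return true_pos / total_pos if total_pos else None


pooled = ppv(RECORDS)            # animal models in general
beagle = ppv(RECORDS, "beagle")  # one specific model
rat = ppv(RECORDS, "rat")
print(pooled, beagle, rat)
```

With these invented records the pooled value differs from either model-specific value, which is why the scope has to be stated before the calculation is made.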

With that background we can now evaluate the claims of animal models being, or not being, predictive for human response to drugs and disease.

Animals as predictive models

Empirical evidence

The assumption that animal models are predictive of human outcome is foundational for much of their use in biomedical research and for justifying animal-based research in general. Whether this assumption is true is a separate issue from that of methodology and study design although methodology may influence predictive value. The prevailing view within the animal model community among those calling for standardization and SRs, per above, is that animal models would perform better, meaning they would have a higher PPV and NPV for humans, if researchers adhered to strict criteria with respect to study design and methodology [61]. It is important to note that the potential validity of the animal model per se for predicting human response to drugs and disease is not questioned, at least in most of the literature that addresses SRs and standardization. We acknowledge that animals can successfully be used in categories 3-9 in table 2 and that SRs could positively impact on such use and that some calling for SRs and standardization advocate for such on this basis. However, it appears that the main emphasis among those calling for SRs and standardization is to improve predictive value. Therefore we consider it appropriate to explore whether a proper understanding of evolutionary biology and complexity science allows for the use of one species to predict responses to drugs and diseases for another, even under ideal circumstances [40, 41, 62-66]. SRs require the practice under study to be scientifically tenable. If the practice per se is not viable, then SRs will be of little value. We will now present the empirical evidence and later seek to place it within the context of complexity science and evolutionary biology.

Empirical evidence regarding the predictive value of animal models comes in the form of research amenable to quantification via table 1 and examples of multiple failures over many years in the same subject. Examples of the latter would include the search for a vaccine against HIV and neuroprotective drugs. Approximately 100 vaccines have been shown effective against an HIV-like virus in animal models; however, none have prevented HIV in humans [67, 68]. Even if an HIV vaccine came from animal-based research tomorrow, the animal model per se would not be predictive for humans, as the PPV would be somewhere in the 0.01 area. Likewise, up to one thousand drugs have been shown effective for neuroprotection in animal models but none have been effective for humans [23-25, 29, 38, 61, 69-71]. The predictive value is again minimal even if a successful drug is currently in development. The animal model has failed as a modality for predicting neuroprotection. Along the same lines, of twenty-two drugs tested on animals and shown to be therapeutic in spinal cord injury, none were effective in humans [72]. As we are attempting to prove that animal models are not predictive, such examples are important. Relatively few failures can disqualify a practice from being of predictive value, while proving the opposite requires a large number of successes.
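
The "0.01 area" figure follows from simple arithmetic: if one future candidate succeeded in humans after roughly 100 candidates that worked in animals had failed, the PPV would be 1/(1 + 100). The sketch below applies the same hypothetical to the approximate counts quoted above. The function name is an assumption, and extending the single-future-success scenario to the spinal cord figure is an illustration rather than a claim made in the cited sources.

```python
def ppv_after_one_success(prior_failures):
    """PPV if exactly one future candidate succeeds in humans after
    `prior_failures` candidates that were positive in animal models
    but failed in humans."""
    true_pos = 1
    false_pos = prior_failures
    return true_pos / (true_pos + false_pos)


# Approximate counts quoted in the text above.
hiv_vaccines = ppv_after_one_success(100)       # about 0.0099
neuroprotectants = ppv_after_one_success(1000)  # about 0.001
spinal_cord = ppv_after_one_success(22)         # about 0.043

print(round(hiv_vaccines, 4), round(neuroprotectants, 4), round(spinal_cord, 4))
```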

The success of the animal model in basic research can also be questioned based on the fact that, according to one report, only 0.004% of basic research papers in leading journals led to a new class of drugs [41, 73] and the fact that the success rate for target identification is similarly dismal [74-78]. For example, in part because the targets derived from animal models are not predictive for humans, the percentage of new drugs in development, after initial evaluation, that ultimately make it to market is somewhere in the area of 0.0002% [79, 80]. We acknowledge that the goals of basic research differ from the goals of applied research where predictive values are most often evaluated. However, because of funding challenges, research that would have historically been considered basic is now being promoted as applied and hence should be judged accordingly [41].

The empirical evidence from research outcomes quantifiable by the calculations in table 1 also supports our position that animal models cannot currently predict human response. Consider the following. In 1962, Litchfield [81] studied rats, dogs, and humans in order to evaluate responses to six drugs. The rat model demonstrated a PPV of 0.49 while the dog model demonstrated a PPV of 0.55. A PPV around 0.5 is not sufficient to qualify a modality as predictive in medical science. It is what one would expect from tossing a coin. Medical science demands values of 0.8 or higher if the modality is to be used for anything that will intersect with patient care. (Drug development is a clear example of a product or modality intersecting with patient care.) A similar study, reported in 1990, examined six drugs in animal models, the side effects of which were already known from human data. The study found that at least one species correctly demonstrated 22 side effects, but the models incorrectly identified 48 side effects that did not occur in humans, while missing 20 side effects that did occur in humans. This translates to a PPV of 0.31 [[82] p73]. Another study, also reported in 1990, examined drugs abandoned during clinical trials secondary to toxicity. In 16 out of 24 cases, the toxicity had no correlate in animal models [[83] p49-56]. A 1994 study revealed that only six of 114 drug toxicities had animal correlates [[84] p57-67]. While these data do not allow the calculations in table 1 to be made, the numbers obviously fall far short of qualifying as a predictive medical modality or test. Likewise, figure 1 illustrates the random nature of bioavailability correlation among species. These examples could be easily multiplied (for example, see [85] [[86] p67-74] [38, 87-93]). Moreover, in 1995, Lin compared pharmacologically important parameters in different species and pointed out that many examples of animal models predicting human response were in fact retrospective and hence not predictive at all [94].
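
As a check on the arithmetic, the PPV of 0.31 reported for the 1990 six-drug study can be recomputed from the counts given in the text. The sensitivity figure below is derived here from the same counts and is not stated in the source. True negatives were not reported, so specificity and NPV cannot be computed. Variable names are illustrative.

```python
# Counts quoted in the text for the 1990 six-drug study.
true_pos = 22   # side effects shown in at least one species and seen in humans
false_pos = 48  # side effects flagged by the models but absent in humans
false_neg = 20  # human side effects that the models missed

ppv = true_pos / (true_pos + false_pos)          # 22 / 70
sensitivity = true_pos / (true_pos + false_neg)  # 22 / 42, derived here

# Simple proportions for the other two studies; these are not PPVs.
no_animal_correlate_1990 = 16 / 24  # toxicities with no animal correlate
animal_correlate_1994 = 6 / 114     # toxicities that had an animal correlate

print(f"PPV = {ppv:.2f}, sensitivity = {sensitivity:.2f}")
print(f"1990 trial toxicities without animal correlate: {no_animal_correlate_1990:.0%}")
print(f"1994 toxicities with animal correlate: {animal_correlate_1994:.1%}")
```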

We acknowledge that the empirical evidence could be interpreted in two ways. First, the animal model per se is simply not predictive of human response to drugs and disease. (For more on the failure of animal models of human disease to correlate with humans, see [62, 64-66, 96-100].) Second, perhaps the proposed SRs and standardization will allow for correction of methodological problems that have resulted in animal models failing to be of predictive value. Perhaps the problem is confined to methodology. In light of this dichotomy, the following questions must be addressed: Is there an all-encompassing explanation for the failure of animal models to be of predictive value regardless of methodology? Is there a theory or law in science that explains the empirical evidence we presented? We propose that the fact that all animals are examples of evolved complex systems constitutes a scientific theory explaining why animal models fail to be predictive modalities for human response to drugs and disease. In addition, this theory requires us to question whether an animal model will ever be a predictive modality for humans at the level of organization where disease and drug response occurs, regardless of methodological improvements.

Figure 1

Comparison of oral bioavailability among three species. Data from reference [95].

Evolved complex systems

Science as a discipline can arguably be dated to Newton and Descartes, both of whom accepted a mechanistic, deterministic universe amenable to study by reductionism [101, 102]. Because the systems under examination at that time were simple systems that were no more than the sum of their parts, exhibited predictable behavior with few interactions and feedback loops, and hence could be intuitively understood, linear cause and effect relationships were the order of the day. Because of the nature of the universe, such systems are amenable to laws while complex systems are usually described using statistics. Hence biological complex systems are more likely to be described by theories than laws [103-105]. Moreover, outcomes are usually described as involving a causal chain as opposed to a linear cause and effect relationship [105].

Ecosystems, climate, financial markets, and the US power grids are examples of complex systems, while humans and animals are examples of evolved complex systems. Reductionism has been of value in the study of complex systems but because of the nature of complex systems, reductionism alone is inadequate to fully describe the system [106-108]. Van Regenmortel states:

The reductionist method of dissecting biological systems into their constituent parts has been effective in explaining the chemical basis of numerous living processes. However, many biologists now realize that this approach has reached its limit. Biological systems are extremely complex and have emergent properties that cannot be explained, or even predicted, by studying their individual parts. The reductionist approach—although successful in the early days of molecular biology— underestimates this complexity and therefore has an increasingly detrimental influence on many areas of biomedical research, including drug discovery and vaccine development [109].

Complex systems have very specific characteristics that influence the ability of one complex system to predict the response of another [102, 106, 107, 109-127].

Complex systems are more than the sum of their parts, thus reductionism will yield an incomplete analysis of a complex system. As animal modeling is based in large part on reductionism [65, 105, 109, 120, 125, 128-132], this portends problems.

Complex systems exhibit emergence, meaning that new properties of a complex system arise from the interactions of the parts. These new properties cannot be determined even in light of full knowledge of the component parts, thus compromising reductionism even further.

Complex systems are resistant to changes and exhibit redundancy in their components. This again complicates extrapolation between complex systems.

Complex systems exhibit self-organization.

Complex systems demonstrate responses to perturbations that are nonlinear.

Complex systems are very dependent upon initial conditions (for example, genetic make-up). For example, strains of mice have been noted to respond very differently to gene deletion [133, 134] and groups of humans, such as sexes [135-140] and ethnic groups [141-149], respond differently to drugs and disease. Monozygotic twins have also been discovered to respond differently to perturbations because of small differences in genetic make-up [150-154].

Complex systems are composed of many components, which can be grouped into modules that interact with each other.

Complex systems have hierarchical levels of organization (different levels can even respond oppositely to the same perturbation).

Such systems [like the human brain] are characterized by large numbers of highly heterogeneous components, be they genes, proteins, or cells. These components interact causally in myriad ways across a very large spectrum of space-time, from nanometers to meters and from microseconds to years. A complete understanding of these systems demands that a large fraction of these interactions be experimentally or computationally probed. This is very difficult. . . . fields as diverse as neuroscience and cancer biology have proven resistant to facile predictions about imminent practical applications. Improved technologies for observing and probing biological systems has only led to discoveries of further levels of complexity that need to be dealt with. This process has not yet run its course. We are far away from understanding cell biology, genomes, or brains, and turning this understanding into practical knowledge [159].

In summary, complex systems are very different from the simple systems described so well by Newtonian physics and which are routinely studied by reductionism. Complex systems are best described by partial differential equations and many of the values of the variables are unknown. Hence predicting intra-complex system response is difficult and predicting inter-complex system response is essentially impossible at higher levels of organization.
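
Two of the characteristics listed above, nonlinear response and dependence on initial conditions, can be illustrated with a deliberately simple toy system. The sketch below iterates the logistic map, a textbook nonlinear recurrence, from two starting values that differ by one part in ten million. This is an assumption-laden illustration of the general phenomenon only; it is not a model of any organism, drug, or disease, and the parameter values are arbitrary.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate x -> r * x * (1 - x) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two "systems" that are identical except for a tiny difference at the start.
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-7)

# Gap between the two trajectories at each step.
gaps = [abs(p - q) for p, q in zip(a, b)]
max_gap = max(gaps)

print(f"initial gap: {gaps[0]:.1e}, largest gap within 60 steps: {max_gap:.2f}")
```

Knowing one trajectory closely tells an observer very little about the other after a few dozen steps, even though both follow exactly the same rule.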

The fact that the complex systems under study have evolved is also significant (see figure 2). While all of the characteristics of a complex system influence inter-system extrapolation, we will illustrate the importance of evolution on just one characteristic—initial conditions. Changes in initial conditions can produce very different outcomes to the same perturbation. Evolution has used numerous mechanisms to match species to niche and all of these mechanisms affect initial conditions. Even among humans, very small differences in genetic makeup can result in dramatically different outcomes to perturbations such as drugs and disease. For example, copy number variants (CNVs) in monozygotic twins can influence outcomes [150]. CNVs have also been shown to influence viral load in HIV patients [160]. Single nucleotide polymorphisms (SNPs) among family members and/or other humans [161-163], pleiotropy [164], alternative splicing [165], the fact that different genes and molecules can accomplish the same purpose, and that the same gene can be used for different purposes [166] all influence response to drugs and disease. Changes in initial conditions such as the presence of different alleles, SNPs, CNVs and so forth negate the similarities between complex systems in terms of predicting response to perturbations that occur at higher levels of organization such as where drug and disease response occurs.
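The sensitivity to initial conditions described above can be illustrated with a toy numerical sketch. The example below is not drawn from any study cited here and does not model any biological system. It uses the logistic map, a standard textbook example of a nonlinear system, and the function names, the parameter r = 4.0, and the starting values are assumptions chosen purely for illustration. Two runs that begin one part in a billion apart stay close for a few steps and then diverge completely.

```python
def logistic_trajectory(x0, r=4.0, steps=100):
    """Iterate x -> r * x * (1 - x) from x0 and return every state visited."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def divergence(x0_a, x0_b, r=4.0, steps=100):
    """Return the absolute gap between two trajectories at each step."""
    a = logistic_trajectory(x0_a, r, steps)
    b = logistic_trajectory(x0_b, r, steps)
    return [abs(p - q) for p, q in zip(a, b)]


if __name__ == "__main__":
    # Hypothetical starting states that differ by one part in a billion.
    gaps = divergence(0.2, 0.2 + 1e-9)
    for step in (0, 5, 10, 20, 30, 40):
        print(f"step {step:3d}: gap = {gaps[step]:.3e}")
```

The sketch shows only that near-identical starting states need not yield near-identical outcomes in a nonlinear system; it makes no claim about the magnitude of such effects in living organisms.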

The reality is even more complicated, however, as gene regulation and expression account for the major changes in evolution [167, 168]. Theoretically, by varying the regulation and expression of the same genes, a new species could evolve with the same structural genes as its ancestor. Gene expression varies greatly in humans [169-172] and in animals [173-176]. Somel et al. studied gene expression in the brains of humans, chimpanzees, and macaques and discovered accelerated evolution of gene expression in the human prefrontal cortex [177], thus casting doubt on the ability to extrapolate inter-species research for that area. Puente et al. discovered at least twenty genes implicated in human cancers that differ significantly from chimpanzees [178]. In addition, chimpanzees are essentially immune to HIV, hepatitis B, and common malaria and they respond differently to other human pathogens [179-182]. According to Caldwell, “It has been obvious for some time that there is generally no evolutionary basis behind the particular-metabolizing ability of a particular species. Indeed, among rodents and primates, zoologically closely related species exhibit markedly different patterns of metabolism” [183]. Festing stated: “There is substantial genetic variation in the response of laboratory rats to xenobiotics, and this variation has important implications for toxicologic research and screening.” Festing goes on to describe a study that reported on “rat” articles published in the journal Toxicology and Applied Pharmacology from 1979 to 1999. In a majority of the articles, the authors did not specify which rat strain was being used [184]. The above has profound consequences for using animal models to predict human response to drugs and disease.

Figure 2

Evolution acts on complex systems.


It is important to note here that many of the scientists quoted above do not take the position that animal models will never be predictive modalities. While we do not want to speculate as to their reasons, we must point out that animals and humans are evolved complex systems that are differently complex, and it is this fact that leads us to our conclusion that animal models will fail as predictive modalities. The fact that models are differently complex, and the implications of that fact, are not addressed by most animal modelers quoted above and we suspect this may, in part, explain their position.

This brings us to the logical conclusion of our argument that animals are evolved complex systems. It is also perhaps our best reason against expecting animal models to ever be capable of predicting human response to drugs and disease: the concept of personalized medicine. Personalized medicine is perhaps best illustrated by Allen Roses, then-worldwide vice-president of genetics at GlaxoSmithKline (GSK), who stated: “The vast majority of drugs - more than 90% - only work in 30 or 50% of the people” [185]. Most drugs have an efficacy rate of 50% or lower. Physicians have long recognized intra-species variation in response to drugs and disease [186, 187]. It is now understood that the variations in response are caused by variations in the genome (see tables 3 and 4) including epigenetic changes. For example, because of differences in genes, like SNPs, some children are not protected by a vaccine [162, 163]. King states: “between 5 and 20 per cent of people vaccinated against hepatitis B, and between 2 and 10 per cent of those vaccinated against measles, will not be protected if they ever encounter these viruses” [163]. In the future, such children may be able to receive a personalized vaccine. Personalized medicine will result in medical practice resembling the outline in figure 3, whereas today medical practice is more often “one size fits all.” The fact that there is such variation among humans and that this variation causes so much concern [188-197] should cast doubts on the ability of another species to predict human response to drugs and disease [63, 65].

Also illustrative of the problems of extrapolation between complex systems, and in line with the basis for personalized medicine, is the fact that the sexes respond differently to drugs and diseases [135-140, 198], as do ethnic groups [141-149]. Moreover, monozygotic twins respond differently to drugs and disease [74, 199-204]. If monozygotic twins respond differently to perturbations such as drugs and disease, then expecting even genetically modified animals to be of predictive value seems naïve. Indeed genetically modified animals have failed to be of predictive value [74, 199-204]. (For more on personalized medicine see [63, 205, 206].)

Most diseases are heterogeneous and the use of molecular diagnostics can divide them into biological subgroups each with their targets and drugs [207].


Consensus on prediction

Our position, and apparently the position of scientists calling for standardization of animal protocols and SRs, that animal models do not currently qualify as predictive modalities for human response to drugs and disease is supported by experts in various fields of science. For example, Alan Oliff, then-executive director for cancer research at Merck Research Laboratories, stated: “The fundamental problem in drug discovery for cancer is that the model systems are not predictive at all” [210]. An editorial in Nature Reviews Drug Discovery states: “Clearly, one part of the problem [of drug research] is poorly predictive animal models . . .” [211]. Ellis and Fidler echo this, stating: “Preclinical models, unfortunately, seldom reflect the disease state within humans” [212]. Horrobin addressed the use of animal models, stating: “Does the use of animal models of disease take us any closer to understanding human disease? With rare exceptions, the answer to this question is likely to be negative” [98]. Fliri pointed out that: “Currently, no method exists for forecasting broad biological activity profiles of medicinal agents even within narrow boundaries of structurally similar molecules” [213]. Speaking of toxicity trials for new drugs in humans, an unnamed clinician was quoted in Science as stating: “If you were to look in [a big company's] files for testing small-molecule drugs you'd find hundreds of deaths” [214]. Francis Collins, director of NIH, has also spoken out on the poor predictive value of animal models [215, 216].

Neuzil et al. state: “Animal testing is not ideal either, as the predictive value of such tests is limited owing to metabolic differences between humans and animals, and many ethical issues are raised by the testing” [217]. Cook et al. state:

Over many years now there has been a poor correlation between preclinical therapeutic findings and the eventual efficacy of these [anti-cancer] compounds in clinical trials [218, 219]. . . . The development of antineoplastics is a large investment by the private and public sectors, however, the limited availability of predictive preclinical systems obscures our ability to select the therapeutics that might succeed or fail during clinical investigation. [220]

Seidle [221] reported on the conclusions of a conference of experts in toxicology from pharmaceutical companies, contract research companies and others. The consensus was that: “the information obtained from conventional acute toxicity studies is of little or no value in the pharmaceutical development process” [222]. This statement was “subsequently considered and endorsed by regulators and scientists from the EU, US and Japan at a workshop in November 2006 [222].” A survey at the conference [223] revealed that:

100% of respondents found data from acute toxicity studies of little or no use and only used the information in dose setting for other studies in exceptional circumstances.

100% of respondents agreed that they would not carry out acute toxicity testing if it were not a regulatory requirement.

100% of respondents agreed that acute toxicity studies were not used to identify target organs.

100% of respondents never use acute toxicity data to help set the starting dose in man.

81% of respondents thought the data obtained from acute toxicity studies was of no use to regulators or clinicians. [221]

Sharp and Langer summarized the current situation: “The next challenge for biomedical research will be to solve problems of highly complex and integrated biological systems within the human body. Predictive models of these systems in either normal or disease states are beyond the capability of current knowledge and technology” [224].

We note that the above scientists have not, to the best of our knowledge, agreed with us that animal models are incapable of being predictive modalities. We again attribute this to the fact that the discussion regarding evolved complex systems is relatively new. We also again note that SRs and standardization may contribute to the use of animals in categories 3-9 of table 2. We do not deny that animals can be successfully used for such endeavors in science and research and recognize the value of SRs in improving such uses. However, we have presented a case against expecting animal models to ever be predictive modalities for human response to drugs and disease regardless of improvement in methodology. Even if methodological issues were to prove the problem in some of the studies that reveal PPVs of ~0.5, the lack of studies revealing any animal model to be a predictive modality (for example in teratogenicity, carcinogenicity, hepatotoxicity, efficacy for a class of drugs, mechanisms of a class of diseases) is consistent with our theory.
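For readers less familiar with the metric, the following minimal sketch shows how a positive predictive value (PPV) and a negative predictive value (NPV) are computed from a two-by-two comparison of animal and human outcomes. The function name and the counts are hypothetical and are not taken from any study cited in this article; they are chosen only so that the PPV comes out at 0.5, the level discussed above.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from the four cells of a two-by-two table.

    tp: effect seen in the animal model and in humans
    fp: effect seen in the animal model but not in humans
    tn: effect absent in the animal model and in humans
    fn: effect absent in the animal model but seen in humans
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("PPV or NPV is undefined when a test outcome never occurs")
    ppv = tp / (tp + fp)  # share of positive animal results confirmed in humans
    npv = tn / (tn + fn)  # share of negative animal results confirmed in humans
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    ppv, npv = predictive_values(tp=50, fp=50, tn=30, fn=20)
    print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With these invented counts, a positive result in the animal model is confirmed in humans only half of the time, which is no more informative than a coin toss.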

Summary

Animal models have historically been unable to predict human response to drugs and disease, and animal-based research has historically displayed methodological problems that make SRs difficult. One proposed solution that would address both problems is standardization of protocols. This would permit SRs of animal models, which would in turn improve the models and possibly allow accurate predictions, via high PPVs and NPVs, of human response to drugs and disease. We have argued that even if the methodology for animal models could be standardized and subject to SRs, animal models would still fail to be predictive modalities for human response to drugs and disease because of considerations from complexity theory and evolutionary biology. Put succinctly, humans and animals are complex systems with different evolutionary trajectories.

We also reject the notion that a combination of the results of several studies in an SR or meta-analysis may produce information relevant for judging the safety and efficacy of drugs that is not directly visible in the individual animal studies (such as significant side effects or overall efficacy). The problem is that animal models are not predictive modalities, not that animal models fail to reveal side effects. Many side effects from drugs in development are already observed in animal models, but these observations have no predictive value for humans.

As we discussed, SRs are only useful if there is scientific validity to the assumptions or axioms underlying the research. There is no reason to conduct SRs of homeopathy, nor do complexity theory and evolutionary biology offer any reason to expect SRs of animal models to be productive. Regardless of how the problem is approached, animals and humans will always be differently complex. Personalized medicine puts this in perspective.

One reason SRs are necessary is that experts are unreliable for evaluating controversies in their own field. We would extend that concept to include the fact that human nature is also problematic when questioning assumptions is required. Tradition, the status quo, “We always do it that way,” and resistance to change, both individually and in the form of institutional inertia, all combine to challenge those who ask epistemological questions. Financial interests also complicate the situation. Add to all of this the fact that the axioms underlying such practices are not usually discussed among scientists (being in the realm of philosophy of science) and the result is that challenging the axioms upon which these practices are based becomes almost impossible. Nevertheless it is vital to do so in order for science in general, and medical science in particular, to advance.

23. Amarasingh S, Macleod MR, Whittle IR. What is the translational efficacy of chemotherapeutic drug research in neuro-oncology? A systematic review and meta-analysis of the efficacy of BCNU and CCNU in animal models of glioma. Journal of neuro-oncology. 2009;91:117-25. doi:10.1007/s11060-008-9697-z

24. Dirnagl U, Macleod MR. Stroke research at a road block: the streets from adversity should be paved with meta-analysis and good laboratory practice. British Journal of Pharmacology. 2009;157:1154-6.

108. Goodman AF, Bellato CM, Khidr L. The Uncertain Future for Central Dogma. Uncertainty serves as a bridge from determinism and reductionism to a new picture of biology. The Scientist. 2005:19

109. Van Regenmortel M. Reductionism and complexity in molecular biology. Scientists now have the tools to unravel biological complexity and overcome the limitations of reductionism. EMBO Rep. 2004;5:1016-20.