Introduction

Population-level studies commonly induce the ecological fallacy, that is, the belief that a relationship identified at the population level implies a cause-and-effect relationship at the level of the individual. A recent research report (DAMASCENE [discrepancies in autologous bone marrow stem cell trials and enhancement of ejection fraction]) that related clinical trial effect sizes in cardiovascular cell therapy trials to reporting discrepancies in the trials themselves has compounded difficulties in meta-analysis with ecological fallacy issues. This dangerous combination has produced misleading conclusions and speculation that the research community should set aside.

Is the Increasing Use of Smartphones Causing Stroke in Young Americans?

The data tell us that the incident stroke rate has increased in the 20- to 54-year age group from 26 to 48 cases per 100 000 over a 10-year period.1 During this time, the use of cell phones, particularly smartphones, has increased from near 0 to >50%.2 Although there have been concerns about the relationship between electromagnetic emissions of portable phones and brain cancer for years, this is the first time that a temporal correlation has been identified between the newer smartphones and a chronic disease. The implications have immense health and financial consequences.

The implications are also false. This is an example of a spurious relationship residing within a population-based study, which is known as an ecological fallacy.

The ecological fallacy is the belief that one can take population-level correlations and transfer them to cause-and-effect relationships within the individual.3 In this example, what may be driving stroke incidence in these younger populations is the rising rate of obesity and diabetes mellitus. On the other hand, smartphone purchases can be motivated by declining prices, the attractiveness of a sleek device, and the desire for social connectivity. The population-level smartphone–stroke relationship does not represent what is occurring at the level of the individual where motivations for smartphone purchases and the drivers of cerebrovascular disease are different. To think that the population-level relationship can be successfully transferred to the individual is the nature of the ecological fallacy (other examples of temporal ecological fallacies may be found at http://www.tylervigen.com).
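The mechanics of such a temporal coincidence can be sketched in a few lines of Python. The yearly values below are hypothetical: they are simulated from two straight-line trends that merely match the endpoints quoted above (26 to 48 strokes per 100 000, and near 0% to about 50% smartphone use), each with its own independent random noise. They are not the published data. Neither simulated series influences the other, yet the two correlate strongly simply because both change with calendar time.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)   # fixed seed so the sketch is reproducible
years = range(11)        # year 0 through year 10

# Hypothetical yearly series: a linear trend plus independent noise.
# The slopes are assumptions chosen only to match the quoted endpoints.
stroke_rate = [26 + 2.2 * t + rng.gauss(0, 1.5) for t in years]
smartphone_pct = [5.0 * t + rng.gauss(0, 3.0) for t in years]

r = pearson(stroke_rate, smartphone_pct)
print(f"correlation of the two trending series: {r:.2f}")
```

The high correlation here is produced entirely by the shared time trend, which is why a temporal correlation alone says nothing about what happens within any one person.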

In this artificial case, we can see through the diaphanous population-level relationship to get to the heart of the matter because we have a solid a priori sense of the reasons that motivate smartphone purchases and the causes of strokes. This understanding permits us to hold the conclusory statement that smartphone use causes stroke at bay as we think through the greater complexity of the problem. We buffer and protect ourselves from the incorrect conclusion while analyzing the complexities of the important underlying relationships.

However, in other situations, the findings may be too sensational to resist drawing conclusions or we may not possess the intimate knowledge required to think through the background logic of the relationship with all of its subtleties. This shortcoming is cured by article authors who warn us of alternative explanations for their ecological findings and by the peer reviewers and editors who insist that the readers be cautioned about the weak logic underlying the authors’ putative results. The absence of these checks and balances permits misperceptions to reverberate throughout the research community. Moreover, the lack of research discipline generates the sense that physician-scientists are out of control and that our conclusions are not the product of a systematic approach, but rather the result of a wild and random search for significance (Figure; published by King Features Publishing in 1997).

The Return of the Ecological Study

Once belittled as noncontributory and fallacious, ecological studies have rebounded in recent years.4 Part of the reason is the recognition that there are in fact societal trends at the population level that can inform causal relationships, for example, the attraction of armed urban gangs to a teenager or the impact of high unemployment rates on the job hunter’s psyche. However, another reason for the surge in ecological research is the ease with which the data are collected and analyzed. Studying clinical data at the individual level takes substantial time and effort as one writes a protocol, navigates complex Institutional Review Board rules, ensures HIPAA (Health Insurance Portability and Accountability Act) compliance, collects data, and performs sophisticated analyses, to name but a few of the tasks. This is in contrast to the smartphone–stroke relationship that required only 5 minutes of internet research. The ease with which these population-level analyses are generated can make it seem like we are wallowing in a swamp of loudly proclaimed but methodologically unsound investigations.

An interesting new adaptation of population-level research is to combine it with meta-analyses. The classic meta-analysis collects an effect size (eg, mean effect, odds ratio, or relative risk), reflecting the strength of association of a risk factor and a disease within each study, and then combines each of these study-specific measures into one composite assessment. The result is a summary of the within-study findings. The new adaptation attempts to build a relationship across studies. In this situation, measures of the risk factor and the event are obtained from each study (whether they are related to each other within that study or not), and then a relationship is built across studies. Thus, correlations are generated from studies within which the relationship may not exist—the heart of the ecological fallacy. This combined meta-ecological analysis is problematic because the weaknesses of meta-analyses (eg, publication bias, differing end point definitions/follow-up times across studies, and analysis complexities) and those of ecological designs (ie, inferring that population-level effects affect individuals) do not cancel but rather amplify each other. These combined meta-ecological designs have among the weakest of causal foundations.
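A small simulation may make the across-study versus within-study distinction concrete. The sketch below is illustrative only and rests on stated assumptions: 20 hypothetical studies of 500 participants each, a study-level factor that shifts both the average risk factor and the average outcome, and no relationship at all between the two measures among individuals within any study. All names and parameter values are invented for the illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_studies(n_studies=20, n_per_study=500, seed=0):
    """Return a list of (risk_factor_values, outcome_values), one pair per study."""
    rng = random.Random(seed)
    studies = []
    for _ in range(n_studies):
        # Study-level factor (assumed): moves both study averages together.
        level = rng.uniform(0, 10)
        x_center = level
        y_center = 2 * level + rng.gauss(0, 1)
        # Within a study, each individual's x and y are drawn independently,
        # so there is no individual-level relationship by construction.
        xs = [x_center + rng.gauss(0, 1) for _ in range(n_per_study)]
        ys = [y_center + rng.gauss(0, 1) for _ in range(n_per_study)]
        studies.append((xs, ys))
    return studies


studies = simulate_studies()

# Correlation among individuals, computed separately inside each study.
within_r = [pearson(xs, ys) for xs, ys in studies]

# Correlation of the study-level averages, computed across studies.
mean_x = [sum(xs) / len(xs) for xs, _ in studies]
mean_y = [sum(ys) / len(ys) for _, ys in studies]
across_r = pearson(mean_x, mean_y)

print(f"largest within-study |r|: {max(abs(r) for r in within_r):.2f}")
print(f"across-study r of study averages: {across_r:.2f}")
```

Under these assumptions, the within-study correlations stay close to zero while the across-study correlation is strong, so a relationship assembled from study averages cannot by itself be read as a relationship among individuals.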

An example of this more complex population-level evaluation is the finding by Mostofsky et al5 in 2012 that related caffeine ingestion and heart failure (HF). The standard approach would have been to compute the relative risk reflecting the strength of association between caffeine ingestion and the occurrence of HF in each study and then to combine them into 1 summary measure. However, these investigators went further. Taking the caffeine level from each study and the HF incidence from each study, they built a mathematical (regression model) relationship between HF and caffeine ingestion across studies, concluding that 4 daily servings of coffee offered protection against HF.
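The contrast between the two approaches can be sketched as follows. This is a minimal illustration, not a reconstruction of the published analysis: it assumes a fixed-effect inverse-variance pooling for the standard summary and a weighted least-squares line for the across-study regression. The function names and every study-level number are hypothetical.

```python
import math


def pooled_fixed_effect(log_rrs, variances):
    """Inverse-variance (fixed-effect) pooled log relative risk and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / total
    return pooled, math.sqrt(1.0 / total)


def meta_regression_slope(exposures, log_rrs, variances):
    """Weighted least-squares slope of log relative risk on a study-level exposure."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, exposures)) / total
    y_bar = sum(w * y for w, y in zip(weights, log_rrs)) / total
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, exposures, log_rrs))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, exposures))
    return sxy / sxx


# Hypothetical study-level summaries (not taken from any published study).
log_rrs = [math.log(0.95), math.log(0.90), math.log(1.05), math.log(0.85)]
variances = [0.010, 0.020, 0.015, 0.025]
servings = [1.0, 2.0, 3.0, 4.0]  # hypothetical average daily servings per study

# Standard approach: one composite of the within-study relative risks.
pooled, se = pooled_fixed_effect(log_rrs, variances)
print(f"pooled relative risk: {math.exp(pooled):.2f}")

# Across-study approach: regress each study's effect on its average exposure.
slope = meta_regression_slope(servings, log_rrs, variances)
print(f"across-study slope (log relative risk per serving): {slope:.3f}")
```

The pooled value summarizes associations measured inside each study, whereas the slope describes only how study-level averages vary together. The slope carries no information about whether a person who drinks more coffee within any given study has a lower risk.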

The public health and economic implications of these meta-ecological research results are profound. Yet, in a response by Palatini,6 it was pointed out that it was the CYP1A2 polymorphism, an important regulator of caffeine metabolism, that was the likely true driver of this relationship. The author’s letter concluded that studying the average risk of coffee drinking without taking into account population genetics was misleading.

Ecological relationships all share the same motif. Powered by a provocative description, the hint of a new causal tie with powerful implications titillates. Moreover, because the finding may ignite the tinder of our internal biases and subjectivities, we permit ourselves to be pulled along the easy path to accepting its conclusion. This is the gravity well of the ecological fallacy.

However, we can resist its pull by keeping in mind that (1) embedding a population-level relationship within a meta-analysis is a red flag because it injects the strong assumption (commonly fallacious) that there is a cause–effect relationship at the level of the individual and (2) the validity of ecological research is determined not by the results’ sensationalism but by the conclusion of a thoughtful and deliberate review of the findings in the field.

Ecological Investigation in Research Discrepancies: The DAMASCENE Study

A prime example of the ecological fallacy is the DAMASCENE study that appeared in the April 29, 2014, edition of the British Medical Journal.7 Beginning with a standard meta-analysis approach, the investigators identified 49 randomized trials that reported treatment effects measured by the change in left ventricular ejection fraction. However, they also added an ecological evaluation, reporting on a population-level relationship between left ventricular ejection fraction effect size and the number of reporting discrepancies or mistakes per trial. They observed that trials with larger than average effect sizes also had larger than average discrepancy reporting rates. The extension of the meta-analysis to include this ecologically derived left ventricular ejection fraction–discrepancy rate relationship makes DAMASCENE a meta-ecological analysis.

Publications of related analyses, focusing on research article retractions, have already appeared, and a review of that work provides a useful perspective on DAMASCENE. For example, Nath et al8 reported that the number of article retractions is related to the number of authors on the masthead, an equally unusual finding. Although one could conclude that journals should support leaner mastheads with fewer authors, more careful consideration suggests that studies with numerous authors represent more complex research efforts with many interwoven parts; these might be expected to be more prone to error and retraction.

Nath et al8 also pointed out that the greatest number of retractions occur in the journals Science, Proceedings of the National Academy of Sciences, and Nature. Are we to conclude that these premier journals are particularly susceptible to shoddy research representations? Nath et al8 fortunately drew the more helpful conclusion that detection bias was in operation. In this case, highly publicized articles receive greater scrutiny by more readers and thereby have more errors discovered. This phenomenon, the bandwagon effect, has been observed in new drug development. There, the early and sensational announcement of new and dangerous adverse effects excites healthcare providers, pharmacists, and patients to report more problems with the drug, leading to the appearance that the pharmacological agent in question generates more adverse effects than equivalent drugs.9 Is this not a possible explanation, already available in the literature, for the DAMASCENE effect seen in the controversial field of cardiovascular cell therapy? Its authors are mute.

Another well-known factor that could help our understanding of DAMASCENE is the Proteus effect, the observation that early results in a new field commonly overstate the true effect size, which is then moderated in subsequent research efforts.10 The Proteus effect affected early work in hypertension11 and hyperlipidemia treatment,12,13 ultimately having a negligible impact in these 2 fields that moved on to maturity. If, in DAMASCENE, the large effects with their discrepancies were seen in early work and were followed by research efforts with smaller effects, then the authors’ findings may simply be a manifestation of the Proteus phenomenon. DAMASCENE authors say nothing about this well-accepted possibility.

At the other end of the spectrum, there are phenomena, already developed in the clinical trial literature, that may be more likely explanations of small effect sizes than a minuscule number of reporting discrepancies. Pseudo-placebo dose and a greater than anticipated number of patients lost to follow-up are each well-established causes that reduce effect sizes in clinical trials. These explanations must certainly be considered in any attempt to explain small effect sizes, yet DAMASCENE remains mute on this.

Perhaps the principal explanation of the findings resides within the methodological difficulty embedded in DAMASCENE’s approach to developing the relationship between discrepancy rates and left ventricular ejection fraction effect size. Specifically, the discrepancy rate and effect size were extracted from each study (without any evaluation of whether discrepancies were related to effect size within the study) and then assembled across the studies. To think that the relationship across studies reflects the relationship within these studies without examining the within-study relationship is the heart of DAMASCENE’s ecological fallacy.

Fortunately, the DAMASCENE appendices do offer an opportunity to examine some of the intricacies of these reporting discrepancy determinations. The authors’ decision to include in their error count small mistakes (eg, that a cohort size once stated as 16 was in reality 15, or that 8 women and 2 men became 9 women and 1 man) is worthy of comment. Certainly, all researchers and research writers should work to minimize the occurrences of these flubs.

However, what are the implications of their occurrences? To think that reporting the rates of small errors is worthwhile is to assume that these errors undermine the research’s results. However, although perfectly positioned to examine this critical point, DAMASCENE does not provide the necessary data dissection to support this thesis. Would the DAMASCENE authors have the community think that the presence of minuscule discrepancies in a research effort provides prima facie invalidation of its conclusions?

In complicated research enterprises, it is not uncommon to have different investigators assigned to different tasks; those who describe the research may not themselves conduct it. Poor editing of a manuscript implies only that the editing requires improvement, not that the research result is false. Epidemiology teaches that flawed methodology (eg, that of DAMASCENE), not small errors or typos, invalidates study conclusions. An approach that gives substance to small mistakes of this magnitude is not a lever to overthrow the large body of evidence developed from properly conducted meta-analyses.14–16

The DAMASCENE weaknesses aside, we cannot dismiss all ecological or population-level modeling. Incorporating cultural connections and social influences can be illuminating. For example, it may be difficult to understand why adults, surrounded by all of the monitories about the dangers of smoking, continue to engage in this behavior without understanding the culture of stress and the influence of peer groups. Individual determinants of disease are not the sole determinants of disease17 and to think so is to embrace the individualistic fallacy.4 Because societal factors also play a role in the manifestation of individual risk factors, both population-level and individual-level effects must be considered to fully understand the results of ecological studies.

Solutions to the Ecological Dilemma

We must carefully choose our path through the imbroglio of ecological research. Epidemiologists and their biostatistical colleagues are working to build research efforts that do not just incorporate population-level inference or individual-level effects but include both, what is aptly titled a multilevel model.18 However, these models are complex and, for the time being, remain in the realm of the quantitative experts. It is helpful, however, to recognize that combining ecological relationship construction with meta-analyses leads to results that are perplexing at best and delusory at worst.

Additionally, the difference must be clear in our own minds between (1) a factor’s association with an effect (ie, they merely occur together) and (2) a causal relationship, in which the factor actually excites the production of the effect. Having this association–causation polarity in mind enables us to mark the location of the ecological finding on this grid, and its placement challenges our own thought processes about the relationship’s plausibility. It is an open question whether society, continually buffeted by heavy media tidal flows, can stop to think these relationships through. However, as scientists and professionals, we can think and are relied upon to do so.

Therefore, we must remember the phrase caveat emptor (buyer beware). It is essential that we be both cognizant and disciplined, distancing ourselves from ecological research incitements that seem to call for action before thoughtful consideration of explanations and reproducibility. We must not let the new, loudly proclaimed relationship impact our understanding and core point of view about an issue until we have taken the time to think about the underlying ties between the variables that drive the relationship. And like a car purchase, the sweeter the deal, the more wary the purchaser should be. Buffering ourselves from these provocative conclusions (for example, those of DAMASCENE) enables us to respect the complexity of inflammatory issues, be informed about the multidimensional web of causality, and avoid overreacting to a mere 1-dimensional (and misleading) representation of the finding. Building an ecologically derived relationship on top of a meta-analytic approach to span the complicated problem of research error is going a bridge too far. We need more clinical trial data, not beguiling and misleading attempts to understand and undermine the data already collected.

Disclosures

Dr Moyé reports receiving research grant funding and support for travel expenses to meetings for the Cardiovascular Cell Therapy Research Network.

Footnotes

The opinions expressed in this article are not necessarily those of the editors or of the American Heart Association.