Case-control studies are a relatively inexpensive and frequently used type of epidemiological study that can be carried out by small teams or individual researchers in single facilities, in a way that more structured trials often cannot. They have pointed the way to a number of important discoveries and advances, but their very success has led some to place excessive faith in them, to the point where their credibility has been significantly undermined. This is largely the result of misconceptions about the nature of such studies, misconceptions which extend to the medical community itself.

The great triumph of the case-control study was probably the establishment of a link between tobacco smoking and lung cancer, by Sir Richard Doll and others. Using this technique, Doll was able to show a statistically significant association between the two. Skeptics, largely backed by the tobacco industry, argued (correctly) for many years that this type of study cannot prove causation, but the eventual results of prospective cohort studies confirmed the causal link which the case-control studies suggested, and it is now accepted that tobacco smoking greatly raises the risk of lung cancer.

The 'gold standard' of study design is the double-blind prospective randomized controlled trial, followed by the cohort study. To study infrequent events with either design, however, a very large population must be tracked before enough cases are seen; if the event takes a long time to develop, that population must also be tracked for many years, during which many subjects drop out of the study. The expense involved is too great for this methodology to be used to investigate every suspected risk factor.

The case-control study provides a much cheaper and quicker study of risk factors; if the evidence found is convincing enough, then resources can be allocated to more robust and comprehensive studies.

The case-control study begins with the identification by researchers of an outcome or effect (e.g. lung cancer, heart disease, or even longevity), and a number of potential causative factors. If investigating lung cancer, for example, the factors selected might include smoking history and asbestos exposure.

A group of cases is selected who exhibit the outcome under investigation. Using medical records or interviews, researchers record the variables identified as risk factors, plus other non-risk variables which can then be used to select matching controls; typically these are demographic variables such as age, sex, race, income bracket and geographic area of residence.

A number of control subjects (or controls) are then chosen who do not exhibit the outcome or effect under investigation; there may be one or more per case subject. These controls should match the cases as closely as possible with respect to the non-risk variables, so that differences in those variables cannot account for any difference found between the groups. Sometimes more than one control group is used.
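The selection of matched controls can be sketched in code. The following is only an illustration under stated assumptions: the `Subject` record, the `match_controls` function, the choice of sex and age as matching variables and the five-year age tolerance are all hypothetical, and real studies use more careful matching schemes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    # Hypothetical record; the fields are chosen for illustration only.
    subject_id: str
    age: int
    sex: str
    has_outcome: bool


def match_controls(cases, candidates, per_case=1, max_age_gap=5):
    """Greedy matching: for each case, pick up to `per_case` unused
    candidates without the outcome, of the same sex and within
    `max_age_gap` years of age, closest age first."""
    used = set()
    matches = {}
    for case in cases:
        eligible = [
            c for c in candidates
            if not c.has_outcome
            and c.subject_id not in used
            and c.sex == case.sex
            and abs(c.age - case.age) <= max_age_gap
        ]
        eligible.sort(key=lambda c: abs(c.age - case.age))
        chosen = eligible[:per_case]
        used.update(c.subject_id for c in chosen)
        matches[case.subject_id] = [c.subject_id for c in chosen]
    return matches


# Invented example data.
cases = [Subject("case1", 60, "F", True), Subject("case2", 45, "M", True)]
candidates = [
    Subject("c1", 62, "F", False),
    Subject("c2", 59, "F", False),
    Subject("c3", 44, "M", False),
    Subject("c4", 70, "M", False),
    Subject("c5", 60, "F", True),   # has the outcome, so never a control
]
matches = match_controls(cases, candidates)
print(matches)  # {'case1': ['c2'], 'case2': ['c3']}
```

Greedy matching is used here only because it is short; it does not guarantee the best overall set of pairs.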

The case and control groups are then compared on the proposed causal factors and statistical analysis used to estimate the strength of association of each factor with the studied outcome. For example, when studying heart disease, if all the cases were found to be overweight but none of the controls, that might result in an estimate of a high degree of association of overweight with heart disease. Omission of a variable with a real effect will cause its effect to be erroneously assigned to another variable which is related to both; for instance, omission of smoking as a variable in a lung cancer study might cause a spurious association to be seen with low weight, since smokers tend to be of lower weight than nonsmokers.
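The effect of omitting a related variable can be illustrated with a small calculation. The counts below are invented for the purpose (they are not data from any study), and the Mantel-Haenszel estimator is used here simply as one standard way of adjusting an odds ratio for a stratifying variable. Within smokers and within nonsmokers, low weight has no association with lung cancer, yet pooling the two groups produces an apparent association.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table:
    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    return (a * d) / (b * c)


def mantel_haenszel(strata):
    """Mantel-Haenszel odds ratio pooled across strata of (a, b, c, d)."""
    strata = list(strata)
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Invented counts; "exposed" means low weight, the outcome is lung cancer.
strata = {
    "smokers": (60, 40, 30, 20),     # odds ratio within stratum = 1.0
    "nonsmokers": (5, 15, 25, 75),   # odds ratio within stratum = 1.0
}

# Ignoring smoking: add the tables together cell by cell.
pooled = tuple(sum(cells) for cells in zip(*strata.values()))
crude = odds_ratio(*pooled)
adjusted = mantel_haenszel(strata.values())

print(round(crude, 2))     # 2.04, a spurious association with low weight
print(round(adjusted, 2))  # 1.0, no association once smoking is accounted for
```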

The most common statistical analysis technique used is multiple regression and its analogues such as logistic regression. Multiple regression analysis assigns each hypothetical 'causal variable' an estimated independent strength of association with the effect being measured (i.e. what the correlation might be if all the other 'causal variables' were identical except for the one being calculated), and an estimated confidence interval (i.e. the region within which the actual value of correlation might be expected to lie). The result may be a positive association, where the variable increases the chances of seeing the effect in question; negative, if the variable decreases the frequency of the effect; or zero, if the variable has no association with the effect, positive or negative. Usually, the variables used to select controls are included in the regression, to check on whether they were correctly balanced between the case and control populations.
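As a minimal sketch of the regression approach, the following fits a logistic regression with a single yes/no risk factor using Newton-Raphson iteration, written out by hand so that it needs no external library. The function name `fit_logistic` and the counts are assumptions made for illustration; a real analysis would normally use an established statistics package and include several variables. With one binary factor, the exponential of the fitted coefficient equals the odds ratio of the corresponding 2x2 table.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson.
    Returns (b0, b1, standard error of b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Variance of b1 is the matching diagonal entry of the inverse
    # information matrix (evaluated at the converged estimates).
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


# Invented counts: 40 exposed and 60 unexposed cases,
# 20 exposed and 80 unexposed controls.
xs = [1] * 40 + [0] * 60 + [1] * 20 + [0] * 80
ys = [1] * 100 + [0] * 100

b0, b1, se_b1 = fit_logistic(xs, ys)
odds_ratio = math.exp(b1)                 # about 2.67 = (40*80)/(60*20)
ci_low = math.exp(b1 - 1.96 * se_b1)      # approximate 95% interval
ci_high = math.exp(b1 + 1.96 * se_b1)
print(round(odds_ratio, 2), round(ci_low, 2), round(ci_high, 2))
```

Because cases and controls are sampled separately, the fitted intercept `b0` reflects the sampling ratio rather than any baseline risk in the population; only the coefficient on the risk factor is of interest here.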

Superficially, a cross-sectional study of a very large population might be supposed to be inherently superior to a case-control study of a few hundred individuals or fewer; this is overly simplistic. For example, cross-sectional studies confirm that people who consume large amounts of alcohol also show high rates of many diseases; but alcohol consumption is also associated with improper nutrition and hygiene, high rates of smoking and abuse of illegal drugs, and many other behaviors risky to health. Cross-sectional studies cannot differentiate between these possible causes, but case-control studies can determine that gastrointestinal bleeding, say, is directly associated with high alcohol consumption, while memory deterioration is more closely associated with improper nutrition among alcoholics.

The advantage of case-control studies over cross-sectional studies, then, is the ability to determine the association between potential cause and effect on an individual basis. In the cross-sectional study individual variables are aggregated over the population as a whole, then an association is sought between the aggregated variables; in the case-control study, the association is determined for each individual case-control pair, then aggregated. This provides a more specific analysis of the possible associations, and potentially determines more accurately which possible causes are directly related to the effect being studied, and which are merely related by a common cause.
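For a study with one matched control per case, the pair-by-pair comparison can be sketched as follows. This is an illustration under assumptions: the function name and the pair counts are invented, and the estimator shown is the standard matched-pair odds ratio, which uses only the pairs in which the case and the control differ in exposure.

```python
from collections import Counter


def matched_pair_odds_ratio(pairs):
    """Each pair is (case_exposed, control_exposed).
    Pairs where both or neither were exposed carry no information
    about the association; the estimate is the ratio of the two
    kinds of discordant pair."""
    counts = Counter(pairs)
    case_only = counts[(True, False)]      # only the case was exposed
    control_only = counts[(False, True)]   # only the control was exposed
    if control_only == 0:
        raise ValueError("no pairs with only the control exposed")
    return case_only / control_only


# Invented data for 100 matched pairs.
pairs = (
    [(True, False)] * 30
    + [(False, True)] * 10
    + [(True, True)] * 15
    + [(False, False)] * 45
)
pair_or = matched_pair_odds_ratio(pairs)
print(pair_or)  # 3.0
```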

One benefit of cross-sectional studies is that they are considered to be "hypothesis generating", such that clues to exposure/disease relationships can often be seen in these studies, and then other studies, such as case-control or cohort studies, can be implemented to study this relationship.

One major disadvantage of case-control studies is that they give no indication of the absolute risk associated with the factor in question. For instance, a case-control study may tell you that a certain behavior increases the risk of death tenfold, which sounds alarming; but it would not tell you that the actual risk of death changes from one in ten million to one in one million, which is considerably less alarming. For that information, data from outside the case-control study must be consulted.
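The arithmetic behind this example is short enough to write out. The baseline risk is exactly the kind of figure that has to come from outside the case-control study; the variable names are illustrative, and the tenfold figure is treated here as a relative risk.

```python
from fractions import Fraction

# Baseline risk must come from an external source, not the study itself.
baseline_risk = Fraction(1, 10_000_000)
relative_risk = 10                       # the "tenfold" study result

exposed_risk = baseline_risk * relative_risk
absolute_increase = exposed_risk - baseline_risk
extra_deaths_per_million = float(absolute_increase * 1_000_000)

print(exposed_risk)              # 1/1000000
print(absolute_increase)         # 9/10000000
print(extra_deaths_per_million)  # 0.9
```

A tenfold relative increase therefore corresponds, in this example, to fewer than one additional death per million people exposed.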

Another problem is that of confounding. The nature of case-control studies is such that it is difficult, often impossible, to separate the chooser from the choice. For example, studies of road accident victims found that those wearing seat belts were 80% less likely to suffer serious injury or death in a collision, but data comparing rates for those collisions involving two front-seat occupants of a vehicle, one belted and one unbelted, show a measured efficacy only around half that. Many case-control studies have shown a link between bicycle helmet use and reductions in head injury, but long-term trends - including from countries which have substantially increased helmet use through compulsion - show no such benefit. Analysis of the studies shows substantial differences between the 'case' and 'control' populations, with much of the measured benefit being due to fundamental differences between those who choose to wear helmets voluntarily and those who do not.

More controversially, a significant number of case-control studies identified a link between combined hormone replacement therapy (HRT) and reductions in incidence of coronary heart disease (CHD) in women. Credible mechanisms were advanced as to why this might be, and a consensus arose that HRT was protective against CHD (e.g. Estrogen replacement therapy and coronary heart disease; a quantitative assessment of the epidemiological evidence Stampfer M, Colditz G. Int Jour Epid 2004;33:445-53). The evidence was sufficiently compelling that a full clinical trial was initiated. The trial indicated that the effect was both far smaller and in the opposite direction: combined HRT showed a small but significant increase in risk of CHD in the study population. Subsequent analysis has shown that the women opting for HRT were predominantly from higher socio-economic groups and therefore had, on average, better diet and exercise habits. The studies had falsely attributed the benefits of these confounding factors to the intervention itself (see The hormone replacement - coronary heart disease conundrum: is this the death of observational epidemiology? Lawlor DA, Smith GD & Ebrahim S, International Journal of Epidemiology, 2004;33:464-467). There have been similar controversies regarding links between vitamins and cancer; MMR and autism; antibiotics and asthma; cannabis and psychosis. All of these links were identified through small-scale case-control studies but fail to show any effect in whole-population time series or other investigations.

A comparison with the tobacco/cancer link is instructive. Here the case-control studies pointed the way, but further confirmation was available in the form of time series showing rates of lung cancer tracking levels of smoking in whole populations, and in the form of laboratory experiments on animals.

Recent research has shown that a substantial majority of highly cited case-control studies are subsequently contradicted or found to be substantially over-ambitious when more rigorous investigations are conducted.

As a result, the following guidelines have been proposed for assessing case-control evidence (Hormone replacement therapy and coronary heart disease Petitti D, International Journal of Epidemiology, 2004;33:461-463):

Do not turn a blind eye to contradiction. Do not ignore contradictory evidence but try to understand the reasons behind the contradictions.

Do not be seduced by mechanism. Even where a plausible mechanism exists, do not assume that we know everything about that mechanism and how it might interact with other factors.

Suspend belief. Of the researchers defending observational studies, Petitti says this: "belief caused them to be unstrenuous in considering confounding as an explanation for the studies". Do not be seduced by your desire to prove your case.

Maintain scepticism. Question whether the factor under investigation can really be that important; consider what other differences might characterise the case and control groups. Do not extrapolate results beyond the limits of reasonable certainty (e.g. with grandiose forecasts of "lives saved").