In epidemiologic research, it is essential to avoid bias, to control confounding and to undertake accurate replication. Bias, confounding and random variation/chance are the non-causal reasons for an association between an exposure and outcome. These major threats to internal validity of a study should always be considered as alternative explanations in the interpretation. Bias is a mistake of the researcher; confounding is not a mistake and when obvious, it can be controlled; replication is up to the researcher.

Bias is defined as 'any systematic error in an epidemiologic study that results in an incorrect estimate of the association between exposure and risk of disease.' Two major classes of bias are:

1. Selection bias

2. Observation/information (misclassification) bias

Selection bias is an important problem in case-control and retrospective cohort studies, while it is less likely to occur in a prospective cohort study. Selection bias cannot be completely excluded in a case-control study because nonparticipation may differ between cases and control subjects. The major types of observation bias include recall bias, interviewer bias, follow-up bias and misclassification bias.

Confounding is sometimes referred to as the third major class of bias. It is a function of the complex interrelationships between various exposures and disease. Confounding can be controlled in the design (randomization, restriction and matching) and in the analysis (stratification, multivariable analysis and matching). The best that can be done about unknown confounders is to use a randomized design.

Bias and confounding are not affected by sample size, but the effect of chance (random variation) diminishes as sample size gets larger. A small P value and a narrow confidence interval around the odds ratio/relative risk are reassuring signs against a chance effect, but the same cannot be said for bias and confounding.
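The contrast between chance and bias can be sketched numerically. The example below is only an illustration under stated assumptions: the 2x2 counts are hypothetical, the odds ratio of 2.0 is assumed to be distorted by some bias, and the confidence interval uses the log-based (Woolf) approximation. Multiplying every cell by the same factor narrows the interval but leaves the (biased) point estimate exactly where it was.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a log-based (Woolf) 95% confidence interval.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Hypothetical table; assume its OR of 2.0 is partly the product of bias.
base_table = (40, 60, 25, 75)

results = []
for scale in (1, 10, 100):
    a, b, c, d = (scale * cell for cell in base_table)
    results.append(odds_ratio_ci(a, b, c, d))
    or_, lower, upper = results[-1]
    print(f"n x{scale:>3}: OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The interval tightens around 2.0 as the study grows, so a larger sample only gives a more precise estimate of the same distorted value.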

Types of bias

Chance findings are caused by random variation but bias is caused by systematic variation. Potential sources of bias should be eliminated or minimized through rigorous design considerations and meticulous conduct of a study. Each analytic study design has particular types of bias to which it is most vulnerable. In case-control studies, selection bias (knowledge of exposure status influences the identification of diseased and non-diseased study subjects) and recall bias (knowledge of disease status influences the determination of exposure status) are most important. In cohort studies, bias due to loss to follow-up (attrition) would be the greatest danger (and selection bias in retrospective studies). The potential for misclassification is present in all types of epidemiologic studies. Despite all preventive efforts, bias should always be considered among alternative explanations of a finding. It has to be remembered that bias:

- may mask an association or cause a spurious one

- may cause over or underestimation of the effect size

Increasing the sample size will not eliminate any bias. A study that suffers from bias lacks internal validity.

BIAS & CONFOUNDING

Most simplistically, there are three types of bias: (1) selection bias, (2) information / misclassification bias, (3) confounding bias. This basic classification derives from the studies by Miettinen in the 1970s (see for example Miettinen & Cook, 1981). The list presented below may help to consider potential sources of bias in planning, executing, and interpreting a study. The classic paper by Sackett (1979) classified biases according to the stages of research at which they arise. Whatever its source, a bias can also be described by its direction:

1. Positive bias: The observed measure of effect (e.g., odds ratio) is larger than the true measure of effect (applies to both protective and risk associations)

2. Negative bias: The observed measure of effect is smaller than the true measure of effect

3. Toward the null: The observed measure of effect is closer to 1.0 than the true measure of effect (results from non-differential errors of observation/classification and makes an existing association weaker)

4. Away from the null: The observed measure of effect is farther from 1.0 than the true measure of effect (results from differential errors of observation/classification and creates a spurious association when it does not exist)

(Note that many biases can be grouped in different classes which may be confusing. They may also have different names in different contexts.)

Ascertainment bias: Systematic error arising from the kind of individuals or patients that the individual observer is seeing. Also systematic error arising from the diagnostic process.

Prevalence-incidence bias: Selective survival may be important in some conditions. For these diseases (such as cancer, HIV infection) the use of prevalent instead of incident cases usually distorts the measure of effect. The frequency of glutathione S-transferase class mu (GSTM) deletion is for example different in incident (newly and sequentially diagnosed) cases and prevalent (all patients at a point in time) cases (Kelsey, 1997).

Berkson (admission rate) bias: This is a special example of selection bias. When cases/controls are recruited from among hospital patients, the characteristics of these groups will be influenced by hospital admission rates. This occurs when the combination of exposure and disease under study increases the risk of hospital admission, thus leading to a higher exposure rate among the hospital cases than the hospital controls (Berkson J. Limitations of the application of fourfold table analysis to hospital data. Biometrics Bulletin 1946;2:47-53). Examples include oral contraceptive use and suspicion of deep vein thrombosis leading to a higher referral rate to hospitals; use of nonsteroidal anti-inflammatory drugs and peptic ulcer (lower co-occurrence in hospital controls with peptic ulcer) or orthopedic disorders (higher co-occurrence in hospital controls from an orthopedic ward). For details, see Feinstein, 1986; Flanders, 1989; Schwartzbaum, 2003; Hernan, 2004.
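How admission rates alone can manufacture an association may be easier to see with expected probabilities. The sketch below is a hypothetical illustration, not Berkson's original calculation: the prevalences and admission probabilities are invented, exposure (E), the disease under study (D) and the control condition (C) are assumed independent in the population, and each condition is assumed to trigger admission independently. The direction of the distortion depends entirely on the assumed admission probabilities.

```python
from itertools import product

# Hypothetical, mutually independent population prevalences
P_E, P_D, P_C = 0.20, 0.05, 0.10
# Hypothetical probability that each condition, when present, leads to admission
ADMIT = {"E": 0.20, "D": 0.50, "C": 0.30}

def admission_prob(e, d, c):
    """Probability of admission when each present condition acts independently."""
    stays_out = 1.0
    for present, key in ((e, "E"), (d, "D"), (c, "C")):
        if present:
            stays_out *= 1 - ADMIT[key]
    return 1 - stays_out

def exposure_odds_ratio(hospital_only):
    """Exposure OR comparing cases (D) with controls (C without D)."""
    weight = {("case", 1): 0.0, ("case", 0): 0.0,
              ("control", 1): 0.0, ("control", 0): 0.0}
    for e, d, c in product((0, 1), repeat=3):
        p = ((P_E if e else 1 - P_E)
             * (P_D if d else 1 - P_D)
             * (P_C if c else 1 - P_C))
        if hospital_only:
            p *= admission_prob(e, d, c)
        if d:
            group = "case"
        elif c:
            group = "control"
        else:
            continue  # neither a case nor a potential control
        weight[(group, e)] += p
    return (weight[("case", 1)] * weight[("control", 0)]) / (
        weight[("case", 0)] * weight[("control", 1)])

population_or = exposure_odds_ratio(hospital_only=False)
hospital_or = exposure_odds_ratio(hospital_only=True)
print(f"population OR = {population_or:.2f}, hospital OR = {hospital_or:.2f}")
```

With these particular assumptions the hospital-based odds ratio falls below 1 even though exposure and disease are unrelated in the population; other admission probabilities would push it above 1.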

Healthy worker effect (HWE): The overall mortality experience of an employed population is typically better than that of the general population (in Western countries at least). Use of blood donors as controls is a kind of HWE. Blood donors are self-selected on the basis of better lifestyles.

Design bias: The difference between a true unknown value and that actually observed, occurring as a result of faulty design of a study.

Detection bias: The risk factor investigated itself may lead to increased diagnostic investigations and increase the probability that the disease is identified in that subset of persons. An example is women with benign breast diseases who undergo detailed follow-up programs which would detect cancer at early stages (Silber & Horwitz, 1986).

Information bias: A flaw in measuring outcome or exposure that results in differential accuracy of information between compared groups. Many different biases (recall, reporting, measurement, withdrawal etc) are collectively grouped in this class.

Misclassification bias: This is systematic distortion of estimates resulting from inaccuracy in the measurement or classification of study variables. The probability of misclassification may be the same in all study groups (nondifferential misclassification) or may vary between groups (differential misclassification). Nondifferential misclassification generally dilutes the exposure effect (bias toward the null) (Copeland, 1977). It is worse when the proportions of subjects misclassified differ between the study groups (differential misclassification). Such a differential between cases and controls may mask an association or cause a spurious one. Differential misclassification is rare when exposures are recorded before the outcome is known (as in a cohort design). It usually results from deeper investigation or surveillance of cases. Typical sources of misclassification/information bias are:

- variation among observers and among instruments

- variation in the underlying characteristic

- misunderstanding of questions by study subjects (interview or questionnaire)

- incomplete or inaccurate record data
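The different consequences of nondifferential and differential misclassification described above can be worked through with expected counts. The sketch below uses a hypothetical case-control table and invented sensitivity and specificity values for exposure measurement; the differential scenario assumes that cases report true exposure more completely than controls, as in recall bias.

```python
def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)

def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed (exposed, unexposed) counts under imperfect measurement."""
    observed_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    observed_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return observed_exposed, observed_unexposed

# Hypothetical true table: 60/40 among cases, 40/60 among controls
true_cases, true_controls = (60, 40), (40, 60)
true_or = odds_ratio(*true_cases, *true_controls)

# Nondifferential: the same error rates in cases and controls
nondiff_or = odds_ratio(
    *misclassify(*true_cases, sensitivity=0.80, specificity=0.90),
    *misclassify(*true_controls, sensitivity=0.80, specificity=0.90),
)

# Differential: cases recall exposure better than controls (assumed values)
diff_or = odds_ratio(
    *misclassify(*true_cases, sensitivity=0.95, specificity=0.90),
    *misclassify(*true_controls, sensitivity=0.70, specificity=0.90),
)

print(f"true OR            = {true_or:.2f}")
print(f"nondifferential OR = {nondiff_or:.2f}")
print(f"differential OR    = {diff_or:.2f}")
```

In this example equal error rates pull the odds ratio toward 1.0, while better recall among cases pushes it away from 1.0; with other differential patterns the distortion could go in either direction.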

Recall bias: Recall bias is caused by differences in accuracy of recalling past events by cases and controls. There is a tendency for diseased people (or their relatives) to recall past exposures more efficiently than healthy people (selective recall). For example, because women with breast cancer are more likely to remember a positive family history than control subjects, retrospective study designs are likely to overestimate the effect size of family history as a risk factor. This bias is avoided by prospective studies, and indeed the risk estimates from prospective cohorts are smaller than those for other types of study. Another example is the mothers of leukemic children who would remember even trivial exposures. This situation in the case group results in differential accuracy.

Reporting bias: Selective suppression or revealing of information such as past history of sexually transmitted disease. There is no point in doing an HIV-positivity prevalence study on people volunteering to be tested (selective suppression would result in no HIV-positives).

Family information bias: Within a family, the flow of information about exposures and diseases is stimulated by a family member who develops the disease. A person who develops a disease is more likely than his or her unaffected siblings to know that a parent has a history of the disease.

Non-respondent bias: Non-respondents to a survey often differ from respondents. Volunteers also differ from non-volunteers, late respondents from early respondents, and study dropouts from those who complete the study. Also called response bias (systematic error due to difference in characteristics between those who choose to participate in a study and those who do not).

Sampling bias: Unless the sampling method ensures that all members of the 'universe' or reference population have the same probability of inclusion in the sample, bias is possible.

Selection bias due to missing data: When there are a large number of variables, the regression procedure excludes an entire observation if it is missing a value for any of the variables (listwise deletion). This may result in exclusion of a considerable percentage of observations and induce selection bias. In genetic association studies, missing data may be distributed differentially between cases and controls and may generate spurious associations (Clayton, 2005).
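A small sketch of listwise deletion shows how quickly complete records can disappear and how unevenly the loss may fall. The records and variable names below are hypothetical; missing values are marked with `None`, and the filter mimics what a regression routine does when it drops any observation with a missing value.

```python
VARIABLES = ("age", "smoker", "genotype")

# Hypothetical study records; None marks a missing value
records = [
    {"case": 1, "age": 54, "smoker": 1, "genotype": "AA"},
    {"case": 1, "age": None, "smoker": 1, "genotype": "AG"},
    {"case": 1, "age": 61, "smoker": None, "genotype": "GG"},
    {"case": 1, "age": 58, "smoker": 0, "genotype": None},
    {"case": 0, "age": 52, "smoker": 0, "genotype": "AG"},
    {"case": 0, "age": 60, "smoker": 1, "genotype": "AA"},
    {"case": 0, "age": 57, "smoker": 0, "genotype": None},
    {"case": 0, "age": 63, "smoker": 0, "genotype": "GG"},
]

def complete_cases(rows, variables):
    """Keep only rows with no missing value in any listed variable."""
    return [row for row in rows
            if all(row[v] is not None for v in variables)]

def count_by_status(rows):
    """Number of rows per case status (1 = case, 0 = control)."""
    counts = {1: 0, 0: 0}
    for row in rows:
        counts[row["case"]] += 1
    return counts

before = count_by_status(records)
after = count_by_status(complete_cases(records, VARIABLES))
print("before deletion:", before)
print("after deletion: ", after)
```

No single variable is missing in more than two records, yet half of the observations are dropped, and the loss is concentrated among the cases.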

Regression to the mean: This is an example of how random variability can lead to systematic error (Davis, 1976). An example would be a follow-up study of people with highly elevated cholesterol levels. During follow-up, part of the reduction in cholesterol levels would be due to regression to the mean rather than drug or lifestyle modification effects. If the initial very high level was partly due to a large positive random component (and of course, some levels were truly very high), next time some of those high values will be found closer to the normal range. This is an information bias mainly concerning cohort studies.
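Regression to the mean can be demonstrated with a short simulation. The sketch below is an illustration under assumed values: cholesterol is modelled as a stable true level plus independent measurement noise at each visit, the means and standard deviations are invented, and no treatment of any kind is applied between the two measurements.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is repeatable

# Hypothetical cholesterol model (mmol/L): true level plus random noise
TRUE_MEAN, TRUE_SD, NOISE_SD = 5.5, 0.8, 0.6
THRESHOLD = 7.0  # "highly elevated" at baseline

true_levels = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(20000)]
baseline = [t + random.gauss(0, NOISE_SD) for t in true_levels]
followup = [t + random.gauss(0, NOISE_SD) for t in true_levels]

# Select people on the basis of a high baseline measurement only
selected = [i for i, value in enumerate(baseline) if value >= THRESHOLD]

mean_baseline = statistics.mean(baseline[i] for i in selected)
mean_followup = statistics.mean(followup[i] for i in selected)

print(f"selected: {len(selected)} people")
print(f"mean at baseline:  {mean_baseline:.2f}")
print(f"mean at follow-up: {mean_followup:.2f} (no intervention)")
```

The selected group's mean falls at follow-up although nothing was done to it, which is why an untreated comparison group is needed to separate a treatment effect from regression to the mean.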

End-aversion bias (end-of-scale or central tendency bias): In questionnaire-based surveys, respondents usually avoid ends of scales in their answers. They tend to try to be conservative and wish to be in the middle.

Overmatching bias: When cases and controls are matched by a non-confounding variable that is associated with the exposure but not with the disease, this is called overmatching. Overmatching can lead to underestimation of an association. For a numerical example, see slides 41-49 in the Case-Control Studies presentation by Chen. See also Bland & Altman, 1994 and Sorensen & Gillman, 1995. Matching should only be considered for confounding variables, but such known confounding can also be controlled at the analysis phase in an unmatched design.

Competing death bias: As each person will only die once, if there are mutually exclusive causes of death, they compete with each other in the same subject (Chiang, 1991). For example, in parenteral drug users, liver failure and AIDS are competing causes of death and may influence any research on either subject (Delgado-Rodriguez & Llorca, 2004). Likewise, the apolipoprotein E genotype is associated with cardiovascular disease mortality and Alzheimer's disease (AD); AD and death are competing risks involving apolipoprotein E genotype frequency changes with old age (Corder, 1995).

Publication bias: Editors and authors tend to publish articles containing positive findings as opposed to negative result papers. This results in a belief that there is a consistent association while this may not be the case. Plots of relative risks by study may be used to check publication bias in meta-analyses. If publication bias is operating, one would expect that, of published studies, the larger ones report the smaller effects, as small positive trials are more likely to be published than small negative ones. This can be examined using the funnel plot, in which the effect size is plotted against sample size (Sterne & Egger, 2001). In the absence of publication bias, the plot resembles a symmetrical inverted funnel, with the results of the smaller studies being more widely scattered than those of the larger studies. One consequence of publication bias is that the first report of a given association may suffer from an inflated effect size (Ioannidis, 2001). Genetic association studies and drug trials are more likely to be published if they report positive findings, and in a meta-analysis this has to be considered in the interpretation. See Publication Bias in Cochrane Collaboration.
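The expectation that small published studies report larger effects than large ones can be illustrated by simulation. The sketch below rests on deliberately crude assumptions: the true log odds ratio is zero, study size is represented only by an invented standard error, every large study is published, and a small study is published only when its estimate is significantly positive.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

TRUE_LOG_OR = 0.0                 # assume no real association
SE_SMALL, SE_LARGE = 0.5, 0.1     # hypothetical standard errors

def simulate_estimates(standard_error, n_studies):
    """Draw study estimates of the log odds ratio around the true value."""
    return [random.gauss(TRUE_LOG_OR, standard_error)
            for _ in range(n_studies)]

small_studies = simulate_estimates(SE_SMALL, 2000)
large_studies = simulate_estimates(SE_LARGE, 200)

# Assumed selection rule: small studies appear in print only if z > 1.96
published_small = [est for est in small_studies if est / SE_SMALL > 1.96]
published_large = large_studies   # assume all large studies are published

mean_small = statistics.mean(published_small)
mean_large = statistics.mean(published_large)

print(f"published small studies: {len(published_small)} of {len(small_studies)}")
print(f"mean log OR, small published: {mean_small:.2f}")
print(f"mean log OR, large published: {mean_large:.2f}")
```

In a funnel plot of these published results the lower left corner, where small null or negative studies belong, would be empty.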

Significance chasing bias: This is deliberately used by people who set out to prove a point rather than examining a hypothesis. Unjustified number of multiple comparisons and subgroup analyses in the absence of a prior hypothesis are telling signs of significance chasing bias. The ultimately “found” significant P value is never replicated by the same people (nor by anyone else!).

Confounding

Bias involves error in the measurement of a variable; confounding involves error in the interpretation of what may be an accurate measurement. Confounding in epidemiology is the mixing of the effect of the exposure under study on the disease (outcome) with that of a third factor that is associated with the exposure and is an independent risk factor for the disease (even among individuals not exposed to the factor under study). The consequence of confounding is that the estimated association is not the same as the true effect. In contrast to other forms of bias, in confounding bias the actual data collected may be correct but the subsequent effect attributed to the exposure of interest is actually caused by something else.

A classic example of confounding is the initial association between alcohol consumption and lung cancer (confounded by smoking, which is associated with alcohol use and is an independent risk factor for lung cancer). Likewise, an association between gambling and cancer is confounded by at least smoking and alcohol. Confounding can cause overestimation or underestimation of the true association and may even change the direction of the observed effect. An example is the confounding by age of an inverse association between level of exercise and heart attacks (younger people exercising more vigorously), causing overestimation. The same association can also be confounded by sex (men exercising more vigorously), causing underestimation of the association.

For a variable to confound an association, it must be associated with both the exposure and the outcome, and its relation to the outcome should be independent of its association with the exposure (i.e., not through its association with the exposure). A confounding factor should not be an intermediate link in the causal chain between the exposure and the disease under study. Age and sex are associated with virtually all diseases and are related to the presence or level of many exposures. Even though they act as surrogates for etiologic factors most of the time, age and sex should always be considered as potential confounders of an association.

Confounders can be positive or negative. Positive confounders cause overestimation of an association (which may be an inverse association), and negative confounders cause underestimation of an association. It is not easy to recognize confounders. A practical way to achieve this is to analyze the data with and without controlling for the potential confounders. If the estimate of the association differs remarkably when controlled for the variable, it is a confounder and should be controlled for (by stratification or multivariable analysis). To be able to do this, investigators should make every effort to obtain data on all available risk factors for the disease under study.
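Comparing the estimate with and without control for a suspected confounder can be sketched with a stratified analysis. The counts below are hypothetical and were chosen so that alcohol has no association with lung cancer within either smoking stratum; the adjusted estimate uses the Mantel-Haenszel summary odds ratio, which is one common choice rather than the only one.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across 2x2 strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts: exposure = alcohol, outcome = lung cancer
strata = {
    "smokers": (80, 20, 40, 10),
    "non-smokers": (10, 40, 30, 120),
}

# Crude table: collapse over smoking status
crude_table = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(strata.values())
percent_change = 100 * abs(crude_or - adjusted_or) / adjusted_or

for name, table in strata.items():
    print(f"{name:>12}: OR = {odds_ratio(*table):.2f}")
print(f"crude OR    = {crude_or:.2f}")
print(f"adjusted OR = {adjusted_or:.2f} (change {percent_change:.0f}%)")
```

Here the crude odds ratio is well above 1 while both stratum-specific estimates and the adjusted estimate equal 1, so the crude association is entirely due to the uneven distribution of smoking between drinkers and non-drinkers.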

A factor can confound an association only if it differs between the study groups. Therefore, in a case-control study, for age and sex to be confounders, their representation should sufficiently differ between cases and controls. This is the basis of methods to control confounding in the design:

- Randomization (ensures that potential confounding factors, known or unknown, are evenly distributed among the study groups),

- Restriction (restricts admission to the study to a certain category of a confounder),

- Matching (equal representation of subjects with certain confounders among study groups) can overcome a great deal of confounding.

The protection against confounders obtained by randomization will only be maintained if all participants remain in the group to which they were allocated and no systematic loss to follow-up occurs. To preserve this protection, it is best to maximize follow-up and carry out an intention-to-treat analysis (Hollis & Campbell, 1999).

In practice, restriction ensures that comparisons are performed only between observations that have the same value of the confounder (for example, only white men), and matching ensures comparisons between groups that have the same distribution of the confounder (frequency matching or one-to-one matching). In addition to the extra effort, time, money and loss of potential study subjects, the more important disadvantage of restriction and matching is the inability to evaluate the effect of the variable used for restriction or matching. One other problem with matching is that the effective sample size is reduced because the analyses are based only on discordant pairs.

Despite the use of these methods, it may still be necessary to control for residual confounding at the analysis stage of a study. For example, if a study of heart attacks restricted entry to 40-65 year-old subjects, there may still be an age effect (residual confounding) within this range. Methods used to control for confounding at the analysis stage include stratified analysis (unable to control simultaneously for even a moderate number of potential confounders) and multivariable analysis (able to control for a number of confounding factors simultaneously, as long as there are at least ten subjects for every variable investigated in a logistic regression setting). If matching was done, the most common analysis methods would be the McNemar test or conditional logistic regression.
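The reliance of a matched analysis on discordant pairs can be shown with the McNemar test for one-to-one matched case-control data. The pair counts below are hypothetical, the statistic is computed without a continuity correction, and the P value comes from the chi-square distribution with one degree of freedom.

```python
import math

def mcnemar(case_only_exposed, control_only_exposed):
    """Matched-pair odds ratio and McNemar test from discordant pairs.

    case_only_exposed:    pairs in which only the case was exposed
    control_only_exposed: pairs in which only the control was exposed
    """
    b, c = case_only_exposed, control_only_exposed
    matched_or = b / c
    chi_square = (b - c) ** 2 / (b + c)  # no continuity correction
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return matched_or, chi_square, p_value

# Hypothetical 100 matched pairs
both_exposed, neither_exposed = 40, 20   # concordant pairs: not used
only_case, only_control = 30, 10         # discordant pairs: carry the information

matched_or, chi_square, p_value = mcnemar(only_case, only_control)
print(f"pairs used: {only_case + only_control} of "
      f"{both_exposed + neither_exposed + only_case + only_control}")
print(f"matched OR = {matched_or:.1f}, chi-square = {chi_square:.1f}, P = {p_value:.4f}")
```

Only 40 of the 100 pairs contribute to the estimate, which is the loss of effective sample size mentioned above.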

In summary, for a variable to be a confounder it has to meet the following conditions:

1. Relationship with the exposure

2. Relationship with the outcome even in the absence of the exposure

3. Not on the causal pathway

4. Uneven distribution in comparison groups

5. Similar odds ratio (OR) or relative risks (RR) in stratified groups and this (adjusted) OR/RR is at least 15% different from the crude OR/RR (see Confounding Lecture by S Dorjee at ACVCS)

Thus, smoking confounds an association of alcohol drinking with lung cancer, but alcohol drinking does not confound the association of smoking with lung cancer. Likewise, maternal age is a confounder for the birth order association in Down syndrome but the opposite is not true (see examples in Bias and Confounding Lecture).

Effect modification is not bias or confounding, and when found it actually provides information about the nature of an association. Effect modification does not violate the internal validity of the study and has nothing to do with sample size. It is explored by adding an interaction term to the statistical model and, if the term is statistically significant, requires stratified analysis for different levels of the interacting factor (like sex or age group). (See examples in Bias and Confounding Lecture and MHC & Leukemia Associations in Humans).
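For a binary exposure and a binary modifier, the interaction test can be sketched directly from the stratum-specific tables. The counts below are hypothetical; the ratio of the two stratum-specific odds ratios corresponds to the exponentiated interaction coefficient of a saturated logistic model, and the Wald z statistic uses the usual sum of reciprocal cell counts as the variance of the log ratio.

```python
import math

def log_or_and_variance(a, b, c, d):
    """Log odds ratio and its variance for a 2x2 table.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance

# Hypothetical tables for each level of the suspected modifier (sex)
men = (60, 40, 30, 70)
women = (40, 60, 38, 62)

log_or_men, var_men = log_or_and_variance(*men)
log_or_women, var_women = log_or_and_variance(*women)

# Interaction on the multiplicative scale: ratio of stratum-specific ORs
ratio_of_ors = math.exp(log_or_men - log_or_women)
z = (log_or_men - log_or_women) / math.sqrt(var_men + var_women)
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value

print(f"OR in men   = {math.exp(log_or_men):.2f}")
print(f"OR in women = {math.exp(log_or_women):.2f}")
print(f"ratio of ORs = {ratio_of_ors:.2f}, z = {z:.2f}, P = {p_value:.4f}")
```

When the interaction is significant, as in this example, the stratum-specific odds ratios are reported separately instead of a single pooled estimate.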