
The recent completion of the first draft of the human genome sequence (1,2) and advances in technologies for genomic analysis are generating tremendous opportunities for epidemiologic studies to evaluate the role of genetic variants in the etiology of human disease (3). The basis of this evaluation will be identification of the allelic variants of human genes, description of the frequency of these variants in different populations, identification of diseases influenced by these variants and assessment of the magnitude of the associated risk. The process of identifying DNA variation that may be associated with disease is now underway through the cataloguing and mapping of single nucleotide polymorphisms (SNPs) throughout the genome. The analysis of genotype data on SNPs may aid in the identification of DNA alterations that result in or contribute to disease states.

While the spectrum of the relationship between genes and disease is very broad, ranging from single gene disorders to multifactorial conditions, many common methodological issues apply throughout this spectrum (4). These issues relate to the planning and analysis of original studies, the critical appraisal of individual studies and to the integration of evidence from diverse studies.

Many papers deal with critical appraisal. A number of national organizations have specified criteria for assessing evidence on which policies and guidelines are based; for example, the Preventive Services Task Force Guide to Community Preventive Services (5), the Scottish Intercollegiate Guidelines Network (6), and the Australian National Health and Medical Research Council (7). Issues that are particularly important in the appraisal of studies of genotype prevalence and gene-disease associations include the analytical validity of genotyping, selection of subjects, confounding (especially as a result of population stratification), gene-environment and gene-gene interactions, statistical power and multiple statistical comparisons. Issues regarding the integration of evidence include identification of studies, the adequacy of reporting methods and results from individual studies, publication bias, quality scoring schemes and the appropriateness of quantitative synthesis of the evidence.

This paper presents recommendations regarding considerations that should be addressed when conducting, analyzing, and reporting studies of genotype prevalence and gene-disease associations, both for individual investigations and for systematic reviews. The recommendations resulted from an expert panel workshop convened by CDC-NIH in January 2001. The methods used to develop the recommendations are described in the accompanying editorial (Khoury, 2001).

The checklists presented in Table 1 are intended to guide investigators in preparation of proposals and manuscripts, and to guide those who need to appraise proposals, manuscripts and published papers. Thus, the checklists comprise considerations that need to be addressed. They should not be regarded as exhaustive lists of points that have to be presented in all journal articles. We recognize that it may not always be feasible to address all of the considerations, for example in studies of rare conditions in clinical settings. However, it would be useful if a record were kept of the coverage of these points for each study in order to enable information to be sought when evidence from different studies is being synthesized. We suggest establishing a web-based methods register to record such information.

The checklist relating to studies of genotype prevalence is presented as Table 1, and that relating to studies of gene-disease association as Table 1. As many methodological issues relate to both types of studies, we discuss these in parallel. A number of issues about which there is ongoing methodological debate are discussed in the text but are not included in the checklists because resolution has not been achieved.

Reporting and Appraisal of Single Studies

Analytical validity of genotyping

Issues in the appraisal and reporting of the analytical validity of genotyping include the definition of the genotype, the genotyping method used, the type of samples and timing of obtaining these, and the quality control measures.

Genotyping by PCR methods

PCR methods are widely used in genotyping. They are routinely used to analyze DNA extracted from leukocytes separated from whole blood, but a variety of DNA sources can be used (8-14). Theoretically, genotyping results using genomic DNA from different tissues should be identical because the DNA should contain identical sequence in all tissues. However, participation rates may be higher for studies based on mouthwash or hair samples than those based on blood samples (11), and DNA is often more difficult to obtain and purify from certain tissues than from whole blood.

Sometimes the source of DNA is peripheral blood for controls, while it is a different tissue (e.g. tumor specimen) for cases. Although this should not in theory affect the genotypic assay when germline mutations are being assessed, the technique may be easier to perform in DNA extracted from blood and therefore the results of the genotype may be more accurate in controls than in cases. In studies on cancer in particular, the ability to extract DNA from tumor tissue depends on several factors, including the amount of necrosis and the quantity of tissue, and therefore, can result in the inclusion of selected cases. Because the timing of sample collection and the storage period before DNA extraction and/or analysis may differ between cases and controls, we recommend that the number of specimens collected, the success rate and timing of DNA extraction, and success rate and timing of genotyping, should be reported by study group.

Numerous PCR-based methods are available for SNP genotyping, including restriction fragment length polymorphism (RFLP) analysis (15), oligonucleotide ligation assays (16), the TaqMan assay (17), single-base-pair extension assays, and others. We recommend that the description of the genotyping assay should include primer sequences, thermocycle profile, number of cycles and reference.

Accuracy of PCR methodology is generally quite high, although different types of laboratory errors can occur. Poor or non-specific amplification of PCR products, or incomplete enzymatic function of restriction enzymes leading to incomplete digestion in RFLP assays, can occur with measurable frequency. These errors can be controlled for by optimization of assays prior to genotyping and the use of internal controls and repetitive experiments. A data set with less than 95% reproducibility between replicates indicates a potential problem and should not be considered accurate. Rothman et al. (18) noted that misclassification of genotype can bias measures of association between genotype and disease, especially when the prevalence of the genotype is either very high or very low.
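The 95% reproducibility threshold mentioned above can be checked mechanically. The following is a minimal sketch, assuming replicate genotype calls are available as pairs of strings; the function name and the sample data are hypothetical and serve only as illustration.

```python
def replicate_concordance(pairs):
    """Return the fraction of replicate pairs whose genotype calls agree."""
    if not pairs:
        raise ValueError("no replicate pairs supplied")
    agree = sum(1 for first, second in pairs if first == second)
    return agree / len(pairs)


# Hypothetical duplicate genotype calls (illustration only)
pairs = [("AA", "AA"), ("AG", "AG"), ("GG", "AG"), ("AG", "AG")]
rate = replicate_concordance(pairs)
# Flag the data set when reproducibility falls below the 95% threshold
needs_review = rate < 0.95
```

In this invented example three of four pairs agree, so the data set would be flagged for review.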

Some genotypic tests require visual inspection and interpretation of electrophoresis gels, and therefore, observer variability may be important (19). Studies should implement blind scoring and data entry followed by electronic comparison of the blind entries. Discrepancies should be flagged automatically and adjudicated by a third (experienced) person. A recent appraisal of 40 studies in which molecular genetic techniques were used demonstrates the need for universal standards for quality control. Molecular genetic analyses were repeated in a pertinent sample of specimens or the test was confirmed with another procedure in only 15 (37.5%) studies and assays were conducted blind to pertinent characteristics of subjects or hypotheses in only 13 (32.5%) studies (19).
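The electronic comparison of blind entries described above could be sketched as follows. This assumes each scorer's results are stored as a mapping from sample identifier to genotype call; the function name and data layout are assumptions made for illustration, not a prescribed format.

```python
def find_discrepancies(entry_a, entry_b):
    """Return sorted sample IDs whose two blind entries differ.

    A sample present in only one entry is also reported, since the
    missing call cannot be confirmed.
    """
    sample_ids = set(entry_a) | set(entry_b)
    return sorted(s for s in sample_ids if entry_a.get(s) != entry_b.get(s))


# Hypothetical blind entries from two independent scorers
scorer_1 = {"S1": "AA", "S2": "AG", "S3": "GG"}
scorer_2 = {"S1": "AA", "S2": "GG", "S3": "GG", "S4": "AA"}
to_adjudicate = find_discrepancies(scorer_1, scorer_2)
```

The returned identifiers are the ones that would be passed to the third, experienced adjudicator.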

We recommend that authors specify what measures were used to assure quality control of the genotyping assay used, and provide information on the degree of reproducibility between quality control replicates. Quality control measures should include (1) internal validation for analytic validity; (2) blinding of laboratory personnel to pertinent characteristics of the subject or hypotheses; (3) procedures for establishing duplicates, and quality control numbers from blind duplicates; (4) test failure rate, by study group; (5) inspection of whether genotype frequencies are in Hardy-Weinberg equilibrium and are in line with other reports for the same population (this criterion should not be binding); and (6) blind or automated data entry and third party adjudication. With the large number of new high-throughput methods being developed, quality control procedures will become even more important.
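As an illustration of item (5), the sketch below checks genotype counts at a biallelic locus against Hardy-Weinberg expectations using a one-degree-of-freedom chi-square goodness-of-fit test. This is one common way to perform the check, chosen here as an assumption; the text does not prescribe a particular test, and exact tests may be preferable for small counts.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions at a biallelic locus.

    Returns (chi2, p_value). Counts are for genotypes AA, AB and BB.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("locus is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control-group counts showing a deficit of heterozygotes
chi2, p_value = hwe_chi_square(30, 40, 30)
```

A small p-value suggests departure from equilibrium, which may point to genotyping error or selection, although, as noted above, this criterion should not be binding.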

Definition and grouping of genotypes

Enormous sequence variability within some genes presents challenges to genotyping. For example, more than 240 different constitutional neurofibromatosis type 1 mutations have been documented, the majority of which lead to a truncated protein product (20). A protein truncation test is commercially available, but its sensitivity, specificity and predictive value have not been established. A “significant” proportion of cases identified by the protein truncation test are not confirmed by sequencing, suggesting a problem of false positives for the protein-based assay. False negative results may also be a problem as the protein truncation test had a sensitivity of around 70% in small series of clinically diagnosed neurofibromatosis type 1 (20).

An important problem with many association studies is that only a single polymorphism is tested for association with a disease phenotype. A key issue will be to distinguish true functional variants from markers that are associated with a disease because they are in linkage disequilibrium with the true functional variant, but have no functional significance of their own. It has been suggested that it would be useful to type several polymorphisms throughout a candidate gene and then construct haplotypes, which could then be tested for association with the phenotype of interest. The increasing availability of mapped SNP markers (21-25) offers the opportunity for such an approach. Methodological issues relating to this approach are still under development. In particular, in studies based on unrelated individuals, haplotypes can only be estimated probabilistically based on allele frequencies, and inference may be affected by the quality and availability of the data on haplotype frequencies in the relevant population. With an increasing number of SNP loci, the number of possible haplotypes can become very large, in turn raising the issues of multiple comparisons and sparse data for many haplotypes (26,27). A potential limitation of the approach of constructing haplotypes is that the effect of a true functional variant might be diluted when haplotypes rather than loci are the units of analysis.
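To make the growth in haplotype numbers concrete, the sketch below counts the possible haplotypes for a set of biallelic SNPs and the number of haplotype pairs compatible with an unphased genotype. Both function names are hypothetical, and the calculation assumes every locus has exactly two alleles.

```python
def possible_haplotypes(n_snps):
    """Number of distinct haplotypes across n biallelic SNP loci."""
    return 2 ** n_snps


def phase_configurations(n_heterozygous):
    """Number of haplotype pairs consistent with an unphased genotype
    that is heterozygous at the given number of loci."""
    if n_heterozygous <= 1:
        return 1  # phase is unambiguous with zero or one heterozygous locus
    return 2 ** (n_heterozygous - 1)


haplotypes_for_10_snps = possible_haplotypes(10)
```

With ten SNPs there are already 1,024 possible haplotypes, which illustrates why sparse data and multiple comparisons become a concern, and why haplotypes in unrelated individuals must be estimated probabilistically.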

The quality of functional studies of gene variants has implications for grouping genotypes on the basis of putative functional effects. This is potentially important in testing hypotheses about gene characterization. For multi-allelic systems, it may be relevant to group genotypes according to functional effects, as has been done for the NAT2 polymorphisms (28).

Genotyping on the basis of phenotype

In some studies, genotype has been inferred on the basis of a phenotypic test. A potential advantage of this approach is that the phenotypic assay reflects enzyme activity level, and therefore may provide a direct measure of the functional significance of the underlying genetic polymorphism. The genotype is one determinant of long-term enzyme activity level. Nevertheless, an enzyme activity assay provides a measure only at a single point in time, and potentially may be distorted by systematic influences (e.g. effects of disease stress on metabolism, inducing factors) as well as random measurement error. Contrasting results between studies based on phenotypic and genotype assays have been observed, for example, for the acetylator polymorphism and colorectal cancer (28) and the glutathione S-transferase μ polymorphism and lung cancer (29). However, these differences may have been due to reasons other than the genotyping method, e.g. selection or participation bias of cases and/or controls. In addition, a given gene may metabolize multiple environmental substrates, and thus phenotypic assays, using one chemical to induce the gene, may not truly reflect the metabolizing activity of that gene. It is also possible that other DNA variants may alter enzyme function or activity.

Selection of subjects

Many epidemiological texts emphasize the importance of minimizing the potential for selection bias (30-33). Evaluation of potential selection bias requires consideration of study design and fieldwork. In regard to genotype prevalence, many early studies were based on convenience samples and not infrequently, little information was given on how the samples were selected (28, 34, 35, 36).

To date, most studies of gene-disease associations have used the case-control design. In a number of studies the selection of cases has not been well described (35). In a review of type 1 diabetes and HLA-DQ polymorphisms, it was noted that many studies were based on convenience samples of cases in which there may have been over-representation of multiplex families as well as the inclusion of type 2 diabetics who use insulin in their treatment regime (37). In several studies of cancer, prevalent cases have been included to varying extents (29). In these studies, bias would occur if the genotype affected survival or if genotypes were assayed by a phenotypic test that was influenced by disease progression and/or treatment.

A recurrent problem in case-control studies of gene-disease associations with unrelated controls has been that the controls were not selected from the same source population as the cases (28, 35, 36, 37). The potential problem of selecting controls who do not represent the population from which cases arise is illustrated by the divergence in odds ratios for the association between colorectal cancer and the GSTT1 null genotype (38), when the different control groups were analyzed (36). In general, controls should be selected from those who will become cases if they develop the disease.

While many genetic epidemiology association studies use the case-control design, cohort studies have a number of advantages, including the capacity to examine incubation periods and multiple disease outcomes. Cohort studies are often more expensive and lengthier than case-control studies, but the use of archived samples that are suitable for genotypic analysis potentially can minimize these disadvantages (39-41).

In case-cohort studies, controls are a random sample of the cohort, and the effect of the key time variable, age, is controlled for only in the analysis. In more traditional nested case-control designs, controls are selected to match the cases on a temporal factor, such as age, and the main comparisons are within the time-matched sets (42). A major advantage of the case-cohort design for studies where expensive biological markers are collected is that the same control group can be used for several different disease outcomes.

Population stratification

There has been concern about the possible effects of population stratification on the results of population-based case-control studies (43-46). Population stratification includes differences between groups in ethnic origin, and can also arise because of differences between groups of similar ethnic origin but between which there has been limited admixture, such as in isolated populations. For example, a population might be comprised of the descendants of waves of immigrants from the same source, but differ generally because of founder effects. The differences may then be apparent because insufficient time has elapsed for mixture between the groups. In an exploration of the possible degree of bias from population stratification in US studies of cancer among non-Hispanic Americans of European descent, it was concluded that this bias is unlikely to be substantial when epidemiological principles of study design, conduct and analysis are rigorously applied (47). Variations in the frequency of certain genotypes in African-Americans appear to be much wider than those observed in subjects of European origin and therefore the possibility of stratification may be higher (48).

Concern about the possible effects of population stratification has stimulated development of family-based case-control designs, which essentially eliminate potential confounding from this source (49,50). The most commonly used examples of such designs involve the use of siblings or parents as controls. Sibling controls are derived from the same gene pool as cases. However, selection bias could arise from the fact that a sibling may not be available for every case: bias would arise if determinants of availability, e.g. sibship size, were associated with genotype. In addition, because of overmatching on genotype, there is a loss of statistical power compared with the use of unrelated controls (51). Case-parental control studies have been advocated for the identification of modest gene-disease associations (52). However, the need to obtain samples from parents is a practical problem limiting the applicability of the design for diseases of late onset. Three main analytical approaches have been proposed. The case-pseudosib method is based on the comparison of the genotype of the case to the other three genotypes the parents could have transmitted to the case, given their own genotypes (53-55). The second approach involves the use of log-linear modeling. This extends the case-pseudosib method by: (1) enabling triads in which data from either parent are missing to be included (56,57); (2) allowing for maternally-mediated (prenatal) effects of the maternal genotype (58); and (3) allowing for (and testing for) parental imprinting effects (59). The third is the transmission disequilibrium test (TDT), in which the unit of analysis is the heterozygous parent (60,61). A standard binomial test is carried out, based on comparing the number of heterozygous parents who transmitted the given allele to the affected offspring with the number of heterozygous parents who did not.
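The binomial comparison underlying the TDT can be sketched as follows. The code takes the number of heterozygous parents who transmitted the allele of interest and the number who did not, and returns an exact two-sided binomial p-value under the null hypothesis of equal transmission, together with the familiar large-sample chi-square statistic. The function name is hypothetical, and the two-sided p-value is computed by doubling the larger tail, which is one of several conventions.

```python
from math import comb


def tdt(transmitted, not_transmitted):
    """Transmission disequilibrium test on counts of heterozygous parents.

    Returns (chi2, exact_p): the chi-square statistic (b - c)^2 / (b + c)
    and an exact two-sided binomial p-value with success probability 0.5.
    """
    n = transmitted + not_transmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) parents")
    k = max(transmitted, not_transmitted)
    # Probability of observing a split at least this extreme in one direction
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    exact_p = min(1.0, 2.0 * tail)
    chi2 = (transmitted - not_transmitted) ** 2 / n
    return chi2, exact_p


# Hypothetical counts: 8 heterozygous parents transmitted the allele, 2 did not
chi2, exact_p = tdt(8, 2)
```

With only ten informative parents in this invented example, the exact p-value is about 0.11, which illustrates how the number of heterozygous parents, rather than the number of families, drives the power of the test.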

Another approach proposed to minimize the potential problem of population stratification when using unrelated controls is to measure and adjust for genetic markers of ethnicity that are not linked to the disease under investigation (62-65). This would be expected to control for ethnic variation in disease risk attributable to genetic factors. However, residual confounding from other sources of ethnic variation in disease risk would be a potential issue. It is unlikely that a single measure will capture the important sources of ethnic variation (66).

Statistical issues

Where available or inexpensive to obtain, we recommend that data on genotype frequencies be presented rather than on allele frequencies alone, both for studies of genotype prevalence and for studies of gene-disease associations. We make this recommendation because it is genotype that determines risk and because allele frequencies can be calculated from genotype data (e.g. to determine Hardy-Weinberg equilibrium), whereas if only allele frequencies are presented, genotype frequencies cannot be calculated. Clearly, this recommendation would not be appropriate for studies based on DNA pooling, which may be a valuable approach in estimating allele frequency distributions in many loci in multiple populations (67), as well as initial investigation of disease loci or to follow-up and confirm regions identified in linkage studies (68).
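The asymmetry between genotype and allele data can be shown with a short sketch. Allele frequencies follow directly from genotype counts, but two quite different genotype distributions can yield the same allele frequencies, so the reverse calculation is not possible without further assumptions. The function name and counts below are hypothetical.

```python
def allele_frequencies(n_aa, n_ab, n_bb):
    """Allele frequencies (p for A, q for B) from genotype counts."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    p = (2 * n_aa + n_ab) / total_alleles
    return p, 1.0 - p


# Two hypothetical samples with different genotype distributions
sample_1 = allele_frequencies(25, 50, 25)
sample_2 = allele_frequencies(40, 20, 40)
same_allele_frequencies = sample_1 == sample_2
```

Both samples give an allele frequency of 0.5, yet the proportion of heterozygotes differs more than twofold, information that would be lost if only allele frequencies were reported.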

Calculation of risk difference (i.e. the risk of the disease in those with the genotype under investigation minus the risk of the disease in those without the genotype), in the context of gene-disease associations may be useful in that it measures the potential impact of the association in public health terms (33). However, the magnitude of the risk difference is less likely to be generalizable to other populations than the relative risk (30), because it depends on the baseline risk in those without the genotype, which is likely to vary between populations.
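A brief numerical sketch illustrates why the risk difference depends on baseline risk. The relative risk is held constant at 2 in two hypothetical populations that differ only in the risk among those without the genotype; all figures are invented for illustration.

```python
def risk_measures(risk_with_genotype, risk_without_genotype):
    """Return (risk_difference, relative_risk) for two absolute risks."""
    risk_difference = risk_with_genotype - risk_without_genotype
    relative_risk = risk_with_genotype / risk_without_genotype
    return risk_difference, relative_risk


# Hypothetical populations with the same relative risk but different baselines
rd_low, rr_low = risk_measures(0.02, 0.01)    # baseline risk of 1%
rd_high, rr_high = risk_measures(0.10, 0.05)  # baseline risk of 5%
```

The relative risk is 2 in both populations, whereas the risk difference is 1 per 100 in the first and 5 per 100 in the second.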

A small study size is a limitation of many studies testing a priori hypotheses about gene-disease associations (e.g. 36,69). A possible solution is pooled analysis (see below). One research strategy proposed for the future is large-scale testing by genome-wide association mapping (26, 52, 70, 71). It is important to note that this strategy is hypothesis-generating rather than hypothesis-based and thus may require additional safeguards against type 1 error. For example, Risch and Merikangas (52) suggested specifying a higher significance level. However, increasing the significance level will increase the number of subjects required to have adequate statistical power, although this may not make studies unfeasible (52). An alternative approach is to emphasize replication of findings and to obtain data on biological plausibility, for example, from in vitro studies. We recommend that all tests performed should be reported, not just the “significant” ones. This would require reviewers and editors to give importance (and journal space) to negative results.
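One simple way to make the significance criterion more stringent when many tests are performed is the Bonferroni correction, sketched below. It is used here purely as an illustrative assumption and is not necessarily the approach taken in the papers cited above; the function names and the number of tests are hypothetical.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise
    type 1 error rate at or below family_alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


def expected_false_positives(per_test_alpha, n_tests):
    """Expected number of false positives if every null hypothesis is true."""
    return per_test_alpha * n_tests


# Hypothetical scan of 100,000 markers
uncorrected = expected_false_positives(0.05, 100_000)
threshold = bonferroni_alpha(0.05, 100_000)
```

Testing 100,000 markers at the conventional 0.05 level would be expected to yield about 5,000 false positives under the global null hypothesis, which is why a much smaller per-test threshold, and correspondingly larger sample sizes, are needed.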

Associations between several genes and a disease can be tested according to a priori hypotheses based, for example, on the biological mechanism of these genes in determining the disease. It is recognized that it is becoming usual practice in human genome epidemiology studies to initiate a study to test hypotheses that are current at that time and also to establish a resource to test additional hypotheses proposed later, on the basis of knowledge external to the resource. These are all a priori hypotheses. We reiterate the need to distinguish hypothesis-testing and hypothesis-generation.

Integration of Evidence

There are established principles for the integration of observational evidence in relation to causal inference (72,73). Since these principles were documented, there has been considerable work on the identification of the base of admissible evidence to which the principles should be applied.

Identification of studies

A comprehensive search is one of the key differences between a systematic review and a traditional review (74). We recommend that details of the strategy used to identify relevant papers should be specified as described by Stroup et al. (75). There have been several instances of sequential or multiple publications of analyses of the same or overlapping datasets. An aid to identifying this problem is to organize evidence tables first by geographical area and then by study period within a specified area. If it is clear that the reports relate to the same or overlapping datasets, then we recommend including data only from the largest or most recent publication. It is possible that, under these circumstances, details of the methodology are described in greater detail in an earlier publication. If so, we recommend including the reference to the earlier publication with the reference to the publication from which the data were abstracted in the evidence tables.

Publication bias

Potentially, publication bias is a serious problem for the integration of evidence.

One method of minimizing the potential impact of publication bias is to identify and include “gray literature,” which includes abstracts, technical reports, and non-English journals (76) that may not be identified by electronic searches. We recommend caution in using various types of “gray literature” because the material may not be peer-reviewed, may be subject to modification and revision, and information on study methods may be insufficient to assess study quality. We suggest that consideration should be given to including “gray literature” if study quality can be assessed adequately. The international links of HuGENet™ (http://www.cdc.gov/genomics/hugenet) are being developed to address publication bias by providing links to non-English journals.

A labor-intensive method to minimize publication bias is to establish a research register for studies of gene-disease association similar to the Cochrane collaboration, which maintains a register of controlled trials (77; http://www2.cochrane.org/resources/brochure.htm) and the International Agency for Research on Cancer’s web-based Directory of On-going Studies in Cancer Prevention (78). Such an initiative would be challenging to administer, as data for each additional allele genotyped would need to be added to the database. In other fields, quantitative and qualitative methods of detecting publication bias have been used, such as the fail-safe technique where the number of new studies averaging a null result needed in order to bring the overall effect to non-significance is calculated (79, 80). After this is calculated, a judgement can be made as to whether it is realistic to assume that so many unpublished studies exist in the field of investigation. If the assumption were realistic, then there would be doubt about the validity of conclusions based on potential evidence. Other quantitative and qualitative methods have been reviewed by Sutton et al. (81), and Thornton and Lee (82). In general, all the methods have limitations. Therefore, it seems appropriate to take into account the fact that the evidence base possibly may be skewed towards positive results in drawing conclusions about causal relationships.
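The fail-safe calculation described above can be sketched with one widely used formulation, usually attributed to Rosenthal, in which study results are combined as standard normal deviates. Whether this is the exact variant used in the cited references is an assumption; the function name and the Z scores are hypothetical.

```python
import math


def fail_safe_n(z_scores, z_crit=1.645):
    """Smallest number of additional null-result studies (mean Z of 0)
    that would bring the combined Z down to z_crit or below.

    The combined Z for k studies is sum(Z) / sqrt(k); adding N null
    studies gives sum(Z) / sqrt(k + N).
    """
    k = len(z_scores)
    total = sum(z_scores)
    if total <= 0:
        return 0  # combined result is not in the positive direction
    needed = (total / z_crit) ** 2 - k
    return max(0, math.ceil(needed))


# Hypothetical Z scores from four published studies
n_unpublished = fail_safe_n([2.0, 2.0, 2.0, 2.0])
```

In this invented example, 20 unpublished null studies would be needed, and the reader must then judge whether that many unreported studies could plausibly exist.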

Quality scoring

There are a number of publications concerning the rating of the quality of analytical observational studies. In relation to case-control studies, these include those of Feinstein (83), Horwitz and Feinstein (84), Breslow and Day (30), Kopec and Esdaile (85), Crombie (86), Liddle et al. (87), and Elwood (32). The articles by Feinstein and colleagues are part of a series of articles documenting the deficiencies of epidemiologic research; they have been challenged on the grounds of technical errors, failure to distinguish important from unimportant biases, and ignoring the need to weigh the totality of the evidence about a relationship (88,89). The checklists presented by Crombie (86) may over-emphasize the potential problems of case-control studies as compared to cohort studies (e.g. they do not explicitly include confounding in the checklist for cohort studies). The assessment of study quality proposed by the New South Wales Group (87) and Dixon et al. (90) does not make it easy to assess differences between methods applied in the case and control groups, or between different exposure (prognostic) groups.

A number of authors have proposed quantitative quality scoring systems for critical appraisal (e.g. 90). Other schemes have been developed for the purposes of meta-analyses in which an attempt has been made to assess the importance of study quality in accounting for heterogeneity of results between studies (91-93). This type of assessment has also been considered for pooled analysis (94,95). Certain features of the assessment schemes are specific to the disease and/or the exposure under consideration, and each aspect of the study is given equal weight. Thus, summation of points might result in worse quality scores for a study with several minor flaws than for a study with one major flaw. While empirical studies on a large number of primary investigations might suggest an overall relationship between a specific aspect of study design and the reported results, this relationship is ecological and may not be true for a specific investigation. Therefore, it is very difficult to isolate specific non-causal factors, which might affect the interpretation of a single investigation. Jüni et al. (96) observed that the use of scores to identify clinical trials of high quality is problematic, and recommended that relevant methodological aspects should be assessed individually and their influence on the magnitude of the effect of the intervention explored. We recommend similar caution in consideration of studies of gene-disease associations. As in clinical trials, it may be more appropriate to consider multi-dimensional domains than a single grade in the integration of evidence from observational studies.

Currently, there is little or no empirical evaluation of the quality scoring of association studies. However, we recognize that many users of data on genotype prevalence and gene-disease associations need a robust means of grading evidence. We recommend following the approach by the Scottish Intercollegiate Guidelines Network (6). In this approach, studies of gene-disease association in which all or most of the criteria specified in Table 1 are satisfied would be graded as “++”. Criteria that have not been fulfilled would not affect the grade if it were thought that the conclusions of the study would be very unlikely to be affected by their omission. Studies in which some of the criteria have been fulfilled, and those that were not fulfilled would be thought to be unlikely to alter the conclusions, would be graded as “+”. Studies in which few or no criteria were fulfilled, and the conclusions of the study would be thought likely or very likely to be altered by multiple omissions in required criteria for an acceptable study would be graded as “-”. For studies of genotype prevalence, similar considerations would be applied to the criteria listed in Table 1.

Qualitative synthesis

Hierarchy of evidence

In many schemes of qualitative synthesis of evidence, there is a hierarchy whereby certain study designs are considered inherently superior to others. In general, analytical epidemiological designs are stronger than ecological designs and studies of case series or reports. While it has been argued that cohort studies may be less subject to bias than case-control studies, there are important issues about quality of follow-up and case-ascertainment. Therefore, it seems more rigorous to weight the evidence from specific studies of these types on the basis of a full critical appraisal rather than solely on the basis of general design. In addition, case reports may lead to novel hypotheses and be of value in considering biological plausibility (see below).

Causal inference

There are well-established criteria for causal inference (72,73). In relation to the consistency of gene-disease associations, it is important to consider that differences between studies in distributions of subjects by age and gender will be sources of heterogeneity. For example, hormonal alterations can affect ligand binding, enzyme activity, gene expression and the metabolic pathways influenced by gene expression. Hunter et al. (accompanying paper) note that some inconsistency between the results of gene-disease association studies may be secondary to variation among studies in the prevalence of interacting environmental factors that have not been assessed. It would be appropriate to test a priori hypotheses about differences in gene-disease associations and genotype frequencies between studies that may arise from these sources. We recommend that information on the age distribution of subjects be presented and that consideration be given to presenting data on gene-disease associations by gender.

In considering strength of association, many of the genetic variants so far identified as influencing susceptibility to common diseases are associated with a low relative and absolute risk (97). Therefore, exclusion of non-causal explanations for associations is crucial.

Biological plausibility is a particularly important issue. It is linked with consideration of: (1) whether a known function of the gene product can be linked to the observed phenotype; (2) whether the gene is expressed in the tissue of interest; and (3) temporal relationships, including the time window of gene-expression in relation to age-specific gene-disease relationships. Thus, the gene should be in the disease pathway and/or involved in the mechanism that is responsible for the development of the disease. If not, then the effect of the gene may be indirect. It may also be relevant to consider maternally mediated effects of the maternal genotype and parental imprinting. Case reports may provide clues that could not be obtained from epidemiological designs. For example, evidence from a heteropaternal twin pair provided a lead to genetic differences in the metabolism of phenytoin that accounted for a lack of concordance for teratogenic effect (98).

With regard to temporality, it is possible that the disease could influence the result of a phenotypic assay used to infer the genotype under investigation. This should not be a problem when the genotype is determined directly by PCR methods. Methods to analyze longitudinal phenotypes, such as changes in blood pressure over time, are being developed. If data were available on the time window of gene expression, it would be relevant to consider this in relation to the age-specificity of gene-disease relationships.

Experimental support for a gene-disease association is most likely to be derived from studies of gene expression in knockout or other experimental animals, from in vitro data on gene function, or from experimental interventions based on clinical protocols aimed at normalizing the levels of a product regulated by the gene (e.g., with gene therapy in cystic fibrosis).

Quantitative synthesis

There are two types of quantitative synthesis of evidence: (1) meta-analysis of the results of studies; and (2) pooled analysis of data on individual subjects obtained in several studies. There has been debate about the validity of meta-analysis of observational studies (75,99). On the one hand, meta-analysis may indicate a “spurious precision” and it has been suggested that either meta-analysis of observational studies should be abandoned altogether (100) or that the focus of attention should be the consideration of possible sources of heterogeneity between studies (101). On the other hand, meta-analysis can help clarify whether or not an association exists and provide an indication of the quantitative relationship between the dependent and independent variables (102). The indication of the quantitative relationship, although potentially biased, may be of value in consideration of public health effects of interventions based on knowledge of the genetic factor and/or its interactions.
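As an illustration of the first type of synthesis, the sketch below pools study-level log odds ratios by inverse-variance weighting and reports Cochran's Q and the I² statistic as simple indicators of heterogeneity between studies. It is a minimal fixed-effect example under stated assumptions, not a method prescribed by the workshop: the function name `pool_log_odds_ratios` and the three study estimates are hypothetical and chosen only for illustration.

```python
import math


def pool_log_odds_ratios(log_ors, std_errors):
    """Fixed-effect inverse-variance pooling of study-level log odds ratios.

    Returns (pooled log OR, its standard error, Cochran's Q, I^2).
    """
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I^2: share of total variation attributed to heterogeneity (floored at 0).
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared


# Hypothetical odds ratios and standard errors of the log odds ratio.
example_log_ors = [math.log(1.2), math.log(1.5), math.log(0.9)]
example_ses = [0.20, 0.25, 0.30]

pooled, se, q, i2 = pool_log_odds_ratios(example_log_ors, example_ses)
lower = math.exp(pooled - 1.96 * se)
upper = math.exp(pooled + 1.96 * se)
print(f"Pooled OR {math.exp(pooled):.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"Cochran's Q = {q:.2f}, I^2 = {100 * i2:.0f}%")
```

A narrow confidence interval from such a calculation reflects only sampling error; it says nothing about bias or confounding in the contributing studies, which is the sense in which the precision may be spurious. The heterogeneity statistics are a starting point for examining differences between studies, not a substitute for doing so.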

Pooled analysis requires data on individual subjects. This approach offers many advantages over the meta-analysis of the results of studies, including standardization of definitions of cases and variables, better control of confounding and consistent determination of subgroup effects (94). Nevertheless, pooling approaches require much greater resources (103). We recommend that, when a high degree of accuracy of the measures of effect is required, this type of quantitative synthesis be done whenever possible in preference to meta-analysis of the results of studies. However, stratification by original study may still be important, to allow for and elucidate causes of heterogeneity among the data sets being pooled.
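The value of stratifying a pooled analysis by original study can be sketched with a Mantel-Haenszel summary odds ratio. The example below is a minimal illustration under stated assumptions: the function names and the two 2x2 tables are hypothetical, each table being (carrier cases, carrier controls, non-carrier cases, non-carrier controls) for one study. The two invented studies differ markedly in genotype frequency and in the ratio of cases to controls, so collapsing them into a single table produces an association that is absent within each study.

```python
def crude_odds_ratio(tables):
    """Odds ratio after collapsing all studies into a single 2x2 table."""
    a = sum(t[0] for t in tables)  # carrier cases
    b = sum(t[1] for t in tables)  # carrier controls
    c = sum(t[2] for t in tables)  # non-carrier cases
    d = sum(t[3] for t in tables)  # non-carrier controls
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(tables):
    """Mantel-Haenszel summary odds ratio, stratified by original study."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical data: within each study the odds ratio is 1.
study_tables = [
    (90, 10, 9, 1),   # study 1: variant common, mostly cases
    (1, 9, 10, 90),   # study 2: variant rare, mostly controls
]

print(f"Crude pooled OR:      {crude_odds_ratio(study_tables):.2f}")
print(f"Study-stratified OR:  {mantel_haenszel_odds_ratio(study_tables):.2f}")
```

With individual-level data the same idea is usually implemented by including study as a stratification variable or covariate in a regression model; the Mantel-Haenszel form is used here only because it makes the arithmetic visible.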

Conclusion

Analytical epidemiological methods remain a critical issue in studies of genotype prevalence and gene-disease associations. While recognizing the innovative aspects of early work, it is noteworthy that many studies have had limited value because one or more of the following issues was inadequately addressed: analytical validity of genotyping, possible selection bias, confounding, possible gene-environment and gene-gene interactions, and statistical power. We recognize the particular importance of interdisciplinary collaboration in this rapidly expanding field and recommend that both individual studies and systematic reviews involve epidemiological input.

Acknowledgements

This paper represents consensus from the Centers for Disease Control and Prevention-National Institutes of Health Human Genome Epidemiology Workshop, January 2001.

Gale KB, Ford AM, Repp R, et al. Backtracking leukemia to birth: identification of clonotypic gene fusion sequences in neonatal blood spots. Proceedings of the National Academy of Sciences of the United States of America 1997;94:13950-13954.

Wacholder S, Rothman N, Caporaso N. Population stratification in epidemiologic studies of common genetic variants and cancer: quantification of bias. Journal of the National Cancer Institute 2000;92:1151-1158.

Garte S. The role of ethnicity in cancer susceptibility gene polymorphisms: the example of CYP1A1. Carcinogenesis 1998;19:1329-1332.

Weinberg CR, Wilcox AJ, Lie RT. A log-linear approach to case-parent-triad data: assessing effects of disease genes that act either directly or through maternal effects and that may be subject to parental imprinting. American Journal of Human Genetics 1998;62:969-978.

Weinberg CR. Methods for detection of parent-of-origin effects in genetic studies of case-parent triads. American Journal of Human Genetics 1999;65:229-235.

Doll R. The use of meta-analysis in epidemiology: diet and cancers of the breast and colon. Nutrition Reviews 1994;52:233-237.

Steinberg KK, Smith SJ, Stroup DF, et al. Comparison of effect estimates from a meta-analysis of summary data from published studies and from a meta-analysis using individual patient data for ovarian cancer studies. American Journal of Epidemiology 1998;145:917-925.