Abstract

BACKGROUND & AIMS

Heritable factors contribute to the development of colorectal cancer. Identifying the genetic loci associated with colorectal tumor formation could elucidate the mechanisms of pathogenesis.

METHODS

We conducted a genome-wide association study that included 14 studies, 12,696 cases of colorectal tumors (11,870 cancer, 826 adenoma), and 15,113 controls of European descent. The 10 most statistically significant, previously unreported findings were followed up in 6 studies; these included 3056 colorectal tumor cases (2098 cancer, 958 adenoma) and 6658 controls of European and Asian descent.

CONCLUSIONS

In a large genome-wide association study, we associated polymorphisms close to nucleic acid binding protein 1 (which encodes a DNA-binding protein involved in DNA repair) with colorectal tumor risk. We also provided evidence for an association between colorectal tumor risk and polymorphisms in laminin gamma 1 (this is the second gene in the laminin family to be associated with colorectal cancers), cyclin D2 (which encodes for cyclin D2), and T-box 3 (which encodes a T-box transcription factor and is a target of Wnt signaling to β-catenin). The roles of these genes and their products in cancer pathogenesis warrant further investigation.

Keywords: Colon Cancer, Genetics, Risk Factors, SNP

Colorectal cancer has a sizable heritable component; a large twin study estimated that 35% of colorectal cancer risk may be explained by heritable factors.1 Over the past several years, genome-wide association studies (GWAS), which focus on common single-nucleotide polymorphisms (SNPs), have successfully discovered low-penetrance loci for colorectal cancer.2–12 These analyses have highlighted genes within the known transforming growth factor-β and Wnt signaling pathways (eg, bone morphogenetic protein 2 & 4, SMAD7), as well as regions and genes not previously strongly implicated in colorectal cancer (eg, zinc finger protein 90, laminin alpha 5, disco-interacting protein 2), thereby pointing to pathways not previously understood to be involved in colorectal carcinogenesis.2–12

To identify additional common genetic risk factors for colorectal tumors, we conducted a genome-wide scan across 14 independent studies including nearly 28,000 subjects and follow-up evaluation of nearly 10,000 independent subjects. We included both colorectal cancer cases and colorectal adenoma cases. Colorectal adenoma is a well-defined colorectal cancer precursor13 and the majority of colorectal cancers develop through the adenoma-cancer sequence.14 It has been estimated that the 10-year cumulative rate for advanced adenoma to transition to colorectal cancer is between 10% and 45%, depending on age and sex.13,15,16 Accordingly, the 2 phenotypes have overlapping etiology.17 Inclusion of adenoma cases can increase sample size, and hence statistical power, to identify genetic risk factors related to early events in the adenoma-carcinoma process, during which risk factor intervention strategies may offer the greatest potential benefit for cancer prevention.

Materials and Methods

Study Participants

Each study is described in detail in the Supplementary Materials and Methods section and the number of cases and controls as well as age and sex distributions are listed in Supplementary Table 1. In brief, colorectal cancer cases were defined as colorectal adenocarcinoma and confirmed by medical records, pathologic reports, or death certificate. Colorectal adenoma cases were confirmed by medical records, histopathology, or pathologic reports. Controls for adenoma cases had a negative colonoscopy (except for the Nurses’ Health Study and the Health Professionals Follow-up Study controls matched to cases with distal adenoma, which either had a negative sigmoidoscopy or colonoscopy examination). All participants provided written informed consent and studies were approved by their respective institution’s Institutional Review Boards.

Genotyping

GWAS in the Genetics and Epidemiology of Colorectal Cancer Consortium and the Colon Cancer Family Registry

We conducted a meta-analysis of GWAS from 13 studies within the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) (10,729 cases and 13,328 controls) and additional GWAS within the Colon Cancer Family Registry (CCFR) (1967 cases and 1785 controls). Details on genotyping, quality assurance/quality control, and imputation can be found in the Supplementary Materials and Methods section. Average sample and SNP call rates and concordance rates for blinded duplicates are listed in Supplementary Table 2. In brief, all analyses were restricted to European ancestry. Genotyped SNPs were excluded based on call rate (<98%), lack of Hardy–Weinberg equilibrium in controls (HWE, P < 1 × 10−4), and low minor allele frequency (MAF). Because imputation of genotypes is established as standard practice in the analysis of genotype array data, we imputed the autosomal SNPs of all studies to the Utah residents with Northern and Western European ancestry from the Centre d’etude du polymorphisme humain (CEPH) collection (CEU) population in HapMap II (available at: http://hapmap.ncbi.nlm.nih.gov/). Imputed SNPs were restricted based on MAF (≥1%) and imputation accuracy (R2 > 0.3). After imputation and quality control (QC), a total of 2,708,280 SNPs were used in the meta-analysis of GECCO studies and CCFR. In our detailed result table (Supplementary Table 3), we list for each SNP the number of studies with directly genotyped or imputed data and the mean imputation R2. These data show, as expected, that imputed SNPs tend to show very similar results as SNPs that were directly genotyped if the correlation is high between SNPs.

Follow-up studies

We selected the 10 most statistically significant regions (excluding known GWAS loci) based on the P value from the GECCO and CCFR meta-analysis for further follow-up evaluation in colorectal cancer studies in the Asian colorectal cancer consortium and a US-based colorectal adenoma study. Details on genotyping, quality assurance/quality control, and imputation can be found in the Supplementary Materials and Methods section. After quality control exclusions, 2098 colorectal cancer cases and 5749 controls, and 958 colorectal adenoma cases and 909 controls remained in the analysis.

Statistical Analysis

GWAS in GECCO and CCFR

Statistical analyses of the GECCO and CCFR samples were conducted centrally at the coordinating center on individual-level data to ensure a consistent analytic approach. For each study, we estimated the association between SNPs and risk for colorectal cancer by calculating β values, odds ratios (ORs), standard errors, 95% confidence intervals, and P values using logistic regression models with log-additive genetic effects. Each directly genotyped SNP was coded as 0, 1, or 2 copies of the risk allele. For imputed SNPs, we used the expected number of copies of the risk allele (the dosage), which has been shown to provide unbiased estimates in the association test for imputed SNPs.18 We adjusted for age, sex (when appropriate), center (when appropriate), smoking status (Physicians’ Health Study only), batch effects (The french Association STudy Evaluating RISK for sporadic colorectal cancer), and the first 3 principal components from EIGENSTRAT (available at: http://genepath.med.harvard.edu/~reich/EIGENSTRAT.htm) to account for population substructure. Because CCFR set 2 is a family-based study, we used a conditional logistic regression stratified by family identification while adjusting for age and sex. When analyzing genotyped SNPs on the X chromosome we need to account for different genotype variances between males and females. Therefore, we used the 1 degree of freedom modified Cochran–Armitage test19 to test for associations. This method has been shown to have robust and powerful performance across a wide range of scenarios.20 We used logistic regression to model SNP × SNP interaction effects for a log-additive model, in which the interaction term is the product of the 2 SNPs.
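The coding of genotyped and imputed SNPs described above can be illustrated with a minimal sketch. It assumes that imputation yields, for each subject, the posterior probabilities of carrying 1 and 2 copies of the risk allele; the function names are hypothetical and are not taken from the software used in the study.

```python
def expected_dosage(p_one_copy: float, p_two_copies: float) -> float:
    """Expected number of risk-allele copies (the dosage) for an imputed SNP.

    Assumes the two arguments are posterior genotype probabilities; the
    probability of carrying 0 copies is the remainder and adds nothing.
    """
    if p_one_copy < 0.0 or p_two_copies < 0.0:
        raise ValueError("probabilities must be non-negative")
    if p_one_copy + p_two_copies > 1.0 + 1e-9:
        raise ValueError("probabilities must sum to at most 1")
    return 1.0 * p_one_copy + 2.0 * p_two_copies


def interaction_term(dosage_a: float, dosage_b: float) -> float:
    """SNP x SNP interaction covariate: the product of the 2 SNP codings."""
    return dosage_a * dosage_b


# A directly genotyped heterozygote is the special case with certainty.
print(expected_dosage(1.0, 0.0))
# An uncertain imputed genotype gives a fractional dosage.
print(expected_dosage(0.5, 0.25))
```

A directly genotyped SNP coded 0, 1, or 2 is the limiting case in which one genotype has probability 1, so both kinds of SNP can enter the same log-additive logistic model.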

Quantile-quantile plots were assessed to determine whether the distribution of the P values in each study was consistent with the null distribution (except for the extreme tail). We also calculated the genomic inflation factor (λ) to measure the overdispersion of the test statistics from the association tests by dividing the median of the squared Z statistics by 0.455, the median of a chi-squared distribution with 1 degree of freedom. The inflation factor λ was between 0.999 and 1.044 for individual studies based on all SNPs including both directly genotyped and imputed, indicating there is little evidence of residual population substructure, cryptic relatedness, or differential genotyping between cases and controls. This result was consistent with the visual inspection of the study-specific quantile-quantile plots.
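The calculation of λ described above can be sketched as follows. This is an illustration only, not the code used in the study; the function name and the example Z statistics are hypothetical, and the constant 0.455 is the value quoted in the text.

```python
from statistics import median

# Median of a chi-squared distribution with 1 degree of freedom,
# rounded as quoted in the text.
CHI2_1DF_MEDIAN = 0.455


def genomic_inflation(z_statistics):
    """Genomic inflation factor: median squared Z statistic divided by 0.455."""
    squared = [z * z for z in z_statistics]
    return median(squared) / CHI2_1DF_MEDIAN


# Hypothetical Z statistics from 3 association tests.
lam = genomic_inflation([0.5, -0.7, 1.2])
print(round(lam, 3))
```

Values of λ close to 1, such as the range of 0.999 to 1.044 reported here, indicate that the bulk of the test statistics follows the null distribution.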

We conducted inverse-variance weighted, fixed-effects meta-analysis to combine β estimates and standard errors across individual studies. In this approach, we weighted the β estimate of each study by its inverse variance and calculated a combined estimate by summing the weighted β estimates and dividing by the summed weights. For imputed SNPs, it has been shown that the inverse variance is approximately proportional to the imputation quality.18 Thus, the inverse variance weighting scheme automatically incorporates imputation quality in the meta-analysis for imputed SNPs. We calculated the heterogeneity P values based on Cochran’s Q statistic21 and investigated sources for heterogeneity if the P value was less than .05 for the 10 most significant SNPs. For the most significant SNPs highlighted in this article, we also examined recessive and unrestricted genetic models and compared models by calculating the Akaike information criterion. We used PLINK (available at: pngu.mgh.harvard.edu/~purcell/plink/)22 and R (available at: http://www.r-project.org/)23 to conduct the statistical analysis and summarized results graphically using LocusZOOM (available at: http://csg.sph.umich.edu/locuszoom/).24
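The weighting scheme and Cochran’s Q statistic described above can be written out in a short sketch. It is a generic illustration under the stated fixed-effects assumptions, not the PLINK or R code used in the study; the function name and the example estimates are hypothetical.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance weighted, fixed-effects combination of study estimates."""
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need one standard error per beta estimate")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / (se * se) for se in standard_errors]
    total_weight = sum(weights)
    # Combined estimate: sum of weighted betas divided by the summed weights.
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the combined estimate.
    # Under homogeneity, Q is referred to a chi-squared distribution with
    # (number of studies - 1) degrees of freedom.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return {"beta": beta, "se": se, "z": z, "p": p_value,
            "q": q, "q_df": len(betas) - 1}


# Hypothetical log odds ratios and standard errors from 2 studies.
result = fixed_effects_meta([0.10, 0.20], [0.10, 0.10])
print(result)
```

Because a poorly imputed SNP has a larger standard error, its weight in the sum is smaller, which is how imputation quality enters the combined estimate without a separate adjustment.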

Follow-up studies

The 10 most significant SNPs from the GWAS meta-analysis described earlier were analyzed in the follow-up studies (P values from GWAS meta-analysis 2.5 × 10−7 to 6.5 × 10−6). For the Asian colorectal cancer follow-up study, genotyped SNPs and dosage data of imputed SNPs were analyzed using the program mach2dat (available at: http://www.sph.umich.edu/csg/abecasis/MACH/download/).25 The association between SNP and colorectal cancer risk was assessed using logistic regression with log-additive genetic effects after adjusting for age and sex. Meta-analyses were performed using the inverse-variance method based on a fixed-effects model, and calculations were implemented in the METAL package (available at: http://www.sph.umich.edu/csg/abecasis/metal/).26 Because the MAF in the Asian follow-up population was very low for the locus on chromosome 14q23.1 (MAF, 0 – 0.006 in Han Chinese individuals from Beijing, China), we excluded this SNP from the follow-up evaluation in the Asian studies. Given potential differences in the linkage disequilibrium structure between European and Asian descent subjects, we also included all SNPs correlated with these 10 selected SNPs (r2 > 0.5 in CEU).

For the adenoma follow-up study (all European descent), the association between each genetic marker and risk for colorectal adenoma was estimated by calculating ORs and 95% confidence intervals, using a log-additive genetic model. SNPTESTv2.2.0 (available at: https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html) with the “-method score” option27 was used for logistic regression with the frequentist test, and the model was adjusted for age and sex.
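For readers less familiar with the log-additive model, the step from a fitted regression coefficient to a per-allele OR and its 95% confidence interval can be sketched as below. The sketch assumes a Wald-type interval on the log odds scale; the function name and the standard error in the example are hypothetical and are not results from this study.

```python
import math


def odds_ratio_with_ci(beta, se, z_crit=1.959964):
    """Per-allele OR and Wald confidence limits from a log-odds coefficient."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    return odds_ratio, lower, upper


# Hypothetical example: a coefficient equal to log(1.16) with an assumed
# standard error of 0.03.
print(odds_ratio_with_ci(math.log(1.16), 0.03))
```

Under the log-additive model, each additional copy of the risk allele multiplies the odds by the same OR.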

For a combined analysis of GWAS and follow-up results, we conducted inverse-variance weighted fixed-effects meta-analysis to combine ORs from log-additive models across individual studies and measured heterogeneity using Cochran’s Q statistic, as discussed earlier.

Criterion for genome-wide significance

Based on an increasing number of articles28–33 providing a detailed discussion on the appropriate genome-wide significance threshold, which all arrive at similar values in the range of 5 × 10−7 to 5 × 10−8 for European populations, we decided to use a P value of 5 × 10−8 as the genome-wide significance threshold. In addition, we reported SNPs with P values less than 5 × 10−7 but greater than 5 × 10−8 as potentially novel SNPs that merited additional follow-up evaluation.
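The 2 reporting tiers can be expressed as a simple rule, sketched below. The function name and the category labels are hypothetical, and the treatment of a P value exactly equal to a threshold is an assumption because the text does not specify it.

```python
GENOME_WIDE_THRESHOLD = 5e-8   # genome-wide significance
FOLLOW_UP_THRESHOLD = 5e-7     # upper bound for potentially novel SNPs


def classify_p_value(p):
    """Assign a combined-analysis P value to a reporting tier."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P value must lie between 0 and 1")
    if p < GENOME_WIDE_THRESHOLD:
        return "genome-wide significant"
    # Assumption: a P value exactly equal to 5e-8 falls in this tier.
    if p < FOLLOW_UP_THRESHOLD:
        return "potentially novel"
    return "not reported"


# Combined-analysis P values quoted in the Results section.
for p in (3.7e-8, 9.5e-8, 5.9e-8, 3.7e-7):
    print(p, classify_p_value(p))
```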

Heritability estimates

We estimated the additive heritability of colorectal cancer explained by all genotyped SNPs using the method by Yang et al34 and implemented in the Genome-wide Complex Trait Analysis tool.35 We set the prevalence of colorectal cancer to 0.004, based on Surveillance, Epidemiology and End Results incidence and National Center for Health Statistics mortality statistics.36 We used all genotyped SNPs of Darmkrebs: Chancen der Verhütung durch Screening set II and Diet, Activity, and Lifestyle Study set I given the sizable sample set, different genotyping platforms, and inclusion of both sexes (Supplementary Table 1). We also estimated the heritability of previously and newly identified variants by using the method of So et al.37 Furthermore, we used the method described by Park et al38 to estimate the total number of loci expected to be identified for colorectal cancer based on the observed effect sizes and power for identifying the loci known to date (Table 1 and Supplementary Table 4).

Table 1. Risk Estimates for Newly Identified SNPs Associated With Colorectal Cancer at a P Value Less Than 5 × 10−7

Functional annotation of findings

We conducted a functional annotation for all tagging SNPs (and correlated SNPs) highlighted in this article. As detailed in the Supplementary Materials and Methods section, we queried multiple bioinformatic databases based on the University of California, Santa Cruz genome browser.

Results

Summary results of the GWAS meta-analysis of GECCO and CCFR are shown in the Manhattan plot (Supplementary Figure 1). Several of the previously identified GWAS SNPs were highly significantly associated with colorectal cancer, and overall we found a nominal significant association (P < .05) in the same direction for 16 of 18 previously identified GWAS loci (Supplementary Table 4). After excluding previously identified regions, we followed up the 10 most significant regions from the GWAS meta-analysis (P = 2.5 × 10−7 to 6.5 × 10−6; Supplementary Table 3). In 4 regions the follow-up studies showed evidence of replication with the association in the same directions as the GWAS and an overall improved significance level (Table 1). Of these 4 regions, 1 region reached the conventional genome-wide significance level at a P value less than 5.0 × 10−8 in the combined analysis (GWAS + follow-up evaluation). This region was on chromosome 2q32.3 (rs11903757: OR, 1.16 per risk allele; P = 3.7 × 10−8; Table 1 and Supplementary Figure 2). The SNP showed no evidence for heterogeneity (P = .27) across all studies. The SNP was correlated strongly (r2 > 0.9) with several SNPs in the same region, which showed similar results (Supplementary Figure 3 and Supplementary Table 3).

The other 3 regions had P values less than 5.0 × 10−7 (and P > 5.0 × 10−8) in the combined analysis (GWAS + follow-up evaluation). Reporting by chromosomal position, the first of these 3 regions was on chromosome 1q25.3. In this region, the association with rs10911251 had the lowest P value (OR, 1.09 per risk allele; P = 9.5 × 10−8; Table 1 and Supplementary Figure 2), showing no evidence of heterogeneity (P = .69) across studies. This was correlated strongly with a large number of SNPs in the same region showing similar allele frequencies, risk estimates, and P values spanning across the entire laminin gamma 1 (LAMC1) gene (Supplementary Figure 3 and Supplementary Table 3).

The second region with P values less than 5.0 × 10−7 and greater than 5.0 × 10−8 was on chromosome 12p13, within the cyclin D2 (CCND2) gene. The most statistically significant SNP was rs3217810 (OR, 1.20 per risk allele; P = 5.9 × 10−8; Table 1 and Supplementary Figure 2). Furthermore, only 17.1 kb away resides a second SNP, rs3217901, which was not strongly correlated with rs3217810 (r2 = 0.052 – 0.063) and showed a slightly lower significance level (OR, 1.10 per risk allele; P = 4.9 × 10−7). Although the risk allele frequency of rs3217810 in our European descent studies was on average 0.16, this SNP is very uncommon in Asian populations (0.03 in Japanese in Tokyo, Japan, and 0.01 in Han Chinese individuals from Beijing, China) and, hence, the follow-up evaluation of rs3217810 did not include the Asian cases and controls. Neither SNP showed heterogeneity across studies (P for heterogeneity = .51 and .91). When we included both SNPs simultaneously in the logistic regression analysis the significance of both SNPs was reduced (Supplementary Table 5).

The third region with P values less than 5.0 × 10−7 was in the T-box 3 (TBX3) gene on chromosome 12q24.21. The most statistically significant SNP in this region was rs59336 (OR, 1.09 per risk allele; P = 3.7 × 10−7; Table 1 and Supplementary Figure 2). Again, we observed no evidence for heterogeneity across studies (P = .39).

We investigated if the 4 regions listed earlier might be more significant (lower P value) under a different genetic model than the log-additive model. None of the variants was more significant when we modeled the unrestricted, dominant, or recessive mode of inheritance (Supplementary Table 6).

When we stratified results by colorectal adenoma and cancer we observed stronger associations for adenoma compared with cancer for rs11903757 at 2q32.3, similar associations for rs3217810 and rs3217901 at 12p13/CCND2 and for rs59336 at 12q24.21/TBX3, and a weaker association for rs10911251 at 1q25.3/LAMC1 (Supplementary Table 7). For previously identified loci, in particular, associations for rs16892766 at 8q23.3/EIF3H and rs4939827 at 18q21/SMAD7 tended to be stronger for adenoma, whereas associations for other loci tended to be similar or weaker compared with cancer (Supplementary Table 4).

We observed no evidence for interaction between the SNPs in the newly identified regions or with SNPs in previously identified regions. The smallest P value for interaction was .017 for rs59336/TBX3 and rs11632715/15q13 and was not significant after accounting for multiple comparisons.

Using the method of Yang et al,34 we estimated that the additive heritability of colorectal cancer explained by all genotyped SNPs would be 14.2% (standard error, 8.2%). The newly identified loci (Table 1) and previously identified loci (Supplementary Table 4) explained about 11% of the additive heritability and cumulatively these newly and previously identified loci explain 1.6% of the variation of colorectal cancer. Based on the study by Park et al38 we estimated that the total number of loci expected to be identified for colorectal cancer would be between 239 and 500 if the type I error rate was between 5 × 10−7 and 5 × 10−8.
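The relation between the figures quoted above can be checked with simple arithmetic. The sketch below assumes that the 1.6% of variation explained by the identified loci and the 14.2% additive heritability are expressed on the same scale, which the text implies but does not state; the function name is hypothetical.

```python
def share_of_heritability(variance_explained_by_loci, additive_heritability):
    """Fraction of the additive heritability accounted for by identified loci."""
    if additive_heritability <= 0.0:
        raise ValueError("heritability must be positive")
    return variance_explained_by_loci / additive_heritability


# 1.6% of variation explained, against 14.2% additive heritability.
share = share_of_heritability(0.016, 0.142)
print(f"{share:.1%}")  # prints 11.3%
```

The result agrees with the statement that the identified loci explain about 11% of the additive heritability.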

Discussion

In this large genome-wide scan meta-analysis and follow-up evaluation of a total of close to 38,000 subjects, we identified an intergenic region on chromosome 2q32.3 close to nucleic acid binding protein 1 (NABP1) that was associated with colorectal tumor risk with P values less than 5.0 × 10−8, the conventional genome-wide significance level. Furthermore, we identified 3 regions with P values less than 5.0 × 10−7: one on chromosome 1q25.3 in LAMC1, a second on chromosome 12p13 in CCND2, and a third on chromosome 12q24.21 in TBX3.

Our study provides strong support for an intergenic locus on chromosome 2q32.3. The most significant SNPs in this region are in closest proximity to the NABP1 gene (44 kb centromeric) and the gene serum deprivation response (112 kb telomeric), which encodes for the serum-deprivation response phosphatidylserine-binding protein. The SNPs are downstream of NABP1, which also is known as human single-strand DNA binding protein 2 or oligonucleotide/oligosaccharide binding fold-containing protein 2A. This protein binds single-stranded DNA via the oligonucleotide/oligosaccharide binding fold domain.39 Single-stranded DNA binding proteins are important for diverse DNA processes, such as DNA replication, recombination, transcription, and repair.40–42 Cells depleted of NABP1 show hypersensitivity to DNA-damaging reagents; NABP1 participates in repair of DNA double-strand breaks and ataxia telangiectasia mutated–dependent signaling pathways,43 similar to the role of its homolog, NABP2 (which is also known as human single-strand DNA binding protein 1).39 Although our functional annotation did not provide further insights on the function of the SNPs, the biologic data described earlier support the importance of NABP1 with respect to genomic stability, which could explain a link to the development of cancer.44

In addition to the genome-wide significant region we observed 3 regions that were slightly less significant with P values less than 5 × 10−7 but greater than 5 × 10−8. As has been shown previously,45 a large fraction of SNPs with borderline genome-wide–significant associations replicated when results from additional studies were added, suggesting that further follow-up evaluation of these regions is warranted. The first of these 3 regions was on chromosome 1q25.3 and included correlated SNPs showing associations that spanned across the LAMC1 gene. Interestingly, previous genome-wide scans of colorectal cancer identified a different laminin gene on chromosome 20q13.33, laminin alpha 5, as associated with colorectal cancer,9,11 supporting the importance of this gene family for the development of colorectal cancer. Laminins are extracellular matrix glycoproteins that constitute a major component of the basement membrane in most tissues46 and in the colon are part of the intestinal epithelial barrier. Laminins are involved in a wide variety of biological functions, such as regulation of cell adhesion, differentiation, migration, signaling, and metastasis.47–50 Loss of cell-surface laminin anchoring has been found in many cancer cells, particularly those with aggressive subtypes.51

LAMC1 is a large gene spanning 122 kb and containing 28 coding exons. rs10911251 is correlated strongly (r2 > 0.8) with several other SNPs across the gene (Supplementary Figure 3 and Supplementary Table 3). Upon functional annotation, we identified a potential functional candidate (rs10911205) that is correlated strongly with the most significant tagSNP (r2 = 0.73) and located 72 kb upstream within the first intron of LAMC1. As shown in the University of California, Santa Cruz Genome Browser view (Supplementary Figure 4), rs10911205 is located within a highly evolutionarily conserved region and, given its close proximity to the promoter, it is possible that this region influences gene transcription. In addition, the patterns of histone modifications and DNase signals indicating accessibility for transcription factors suggest that this variant may affect cell-type–specific enhancer activity. In summary, given the statistical evidence, support from functional annotation, and evidence from a previous GWAS that identified another laminin gene to be associated with colorectal cancer, we believe there is strong support for the importance of LAMC1 in the development of colorectal cancer. It is of note that the biologic role of this gene family has not yet been studied substantially in relation to colorectal cancer, supporting the novelty of this finding.

A second region with P values less than 5 × 10−7 was on chromosome 12p13.32, with 2 independent SNPs both located in the intron of CCND2, which belongs to the highly conserved cyclin family, specifically encoding for the protein cyclin D2. Through regulation of CDK4 and CDK6, cyclin D2 affects the cell-cycle transition of the G1/S phase.52,53 Furthermore, cyclin D2 interacts with tumor-suppressor protein retinoblastoma. Recent studies have identified CCND2 as a microRNA target gene in different colorectal cancer cell lines.54,55 Interestingly, genetic variants in CCND1 also have been related to colorectal cancer56,57 and a previous GWAS identified a SNP in CCND1 to be associated with breast cancer.58

The third region with P values less than 5 × 10−7 we identified was within the TBX3 gene, which encodes the T-box transcription factor. TBX3 is overexpressed in several cancers, including pancreas, liver, breast cancer, and melanoma,59 playing multiple roles in normal development and cancer.60 In liver cancer, TBX3 was identified as a downstream target of the Wnt/β-catenin pathway, mediating β-catenin activities on cell proliferation and survival.61 The Wnt/β-catenin pathway plays a key role in colorectal cancer development.62 TBX5, another member of the T-Box gene family, has been suggested as an epigenetically inactivated tumor-suppressor gene in colon cancer63 and provides an additional mechanism by which this gene family may influence colorectal cancer development.

Our study adds further support for all, except 3, previously identified GWAS loci for colorectal cancer. The 3 SNPs (on chromosomes 1q41, 3q26.2, and 6p21) that did not replicate are among the more recently identified GWAS loci9,12 and have smaller effect sizes (OR for risk allele, ≤1.1) compared with the earlier GWAS findings. As a result, larger sample sizes may be needed to fully replicate these SNPs. Furthermore, it is possible that the effect varies by environmental exposures, which may differ among the study populations. Overall, effect sizes from our study for previous GWAS loci tend to be weaker than in the initial reports, which may be explained by the fact that previous results were subject to the “winner’s curse.”64

The large sample size of our GWAS and follow-up studies and availability of individual-level GWAS data are important advantages of our study. However, the study also had limitations. To increase the sample size, we included Asian descent subjects, who may have different linkage disequilibrium patterns, and the SNPs analyzed may be tagging different underlying causal variants. To address this potential limitation we included all SNPs correlated with the most significant SNPs, which likely will identify any variant that genuinely is associated with colorectal cancer risk across different ancestral groups, as shown for other GWAS loci.65–68 Given that genotyping platforms only capture a subset of the genome, we used imputation to HapMap II to obtain a better coverage of the common variation across the genome and to generate a common set of SNPs from the different platforms. Because imputed SNPs tend to result in less significant findings depending on their imputation accuracy,69 we expect that our results provide relatively conservative significance levels.70 Similar to previous GWAS,2,4,6–10,12 we included colorectal adenoma as the major precursor of colorectal cancer to improve our statistical power and to identify genetic variants that act early in the adenoma-cancer sequence, where adenomas and cancer have a shared etiology. Although the inclusion of adenoma also may add heterogeneity because adenomas will not show an association for genetic variants that act later in the carcinogenic process (ie, on progression from adenoma to cancer) or for variants that act through adenoma-independent pathways, stratified analysis may provide insights into the mediating roles of genes within the normal to adenoma to cancer pathway. We show that for some of the newly and previously identified loci, associations are stronger for adenomas compared with cancer; however, we observed similar or weaker associations for other loci. These results may suggest that some genes are important in early stages of cancer development while others may be more important for the progression from adenoma to cancer. However, given the relatively small number of adenoma cases (only 6.5% of the GWAS and 31% of the follow-up cases were adenoma cases), it is important that our findings are replicated in studies with larger numbers of adenoma cases.

In summary, in this large study, we identified one novel susceptibility locus associated with the risk of colorectal tumor on chromosome 2q32.3 close to NABP1, and 3 potential loci with borderline genome-wide significant results within LAMC1, CCND2, and TBX3. These findings are supported by biologic plausibility, functional annotation, and previous GWAS findings within the same gene family, emphasizing the potential relevance of these genes in the etiology of colorectal cancer.

Acknowledgments

The authors wish to thank the following:

Asian Consortium: The authors wish to thank the study participants and research staff for their contributions and commitment to this project, Regina Courtney for DNA preparation, and Jing He for data processing and analyses.

The french Association STudy Evaluating RISK for sporadic colorectal cancer: The authors are very grateful to Dr Bruno Buecher without whom this project would not have existed. The authors also thank all those who agreed to participate in this study, including the patients and the healthy control persons, as well as all the physicians, technicians, and students.

The Genetics and Epidemiology of Colorectal Cancer Consortium study was supported by the National Cancer Institute, National Institutes of Health, and the US Department of Health and Human Services (U01 CA137088; R01 CA059045). The Asian Consortium was supported by a Grant-in-aid for Cancer Research, the Grant for the Third Term Comprehensive Control Research for Cancer and Grants-in-Aid for Scientific Research from the Japanese Ministry of Education, Culture, Sports, Science and Technology (17015018 and 221S0001). The french Association STudy Evaluating RISK for sporadic colorectal cancer was supported by a Hospital Clinical Research Program (PHRC) and by the Regional Council of Pays de la Loire, the Groupement des Entreprises Françaises dans la Lutte Contre le Cancer, the Association Anne de Bretagne Génétique, and the Ligue Régionale Contre le Cancer. The Assessment of Risk in Colorectal Tumours in Canada study was supported by the National Institutes of Health through funding allocated to the Ontario Registry for Studies of Familial Colorectal Cancer (U01 CA074783; see the Colon Cancer Family Registry support section below); and by a GL2 grant from the Ontario Research Fund, the Canadian Institutes of Health Research, by a Cancer Risk Evaluation Program grant from the Canadian Cancer Society Research Institute, and by Senior Investigator Awards (T.J.H. and B.W.Z.) from the Ontario Institute for Cancer Research, through generous support from the Ontario Ministry of Economic Development and Innovation. The Hawaii Colorectal Cancer Studies 2 and 3 studies were supported by the National Institutes of Health (R01 CA60987). The Colon Cancer Family Registry was supported by the National Institutes of Health (RFA CA-95-011) and through cooperative agreements with members of the Colon Cancer Family Registry and P.I.s. This genome-wide scan was supported by the National Cancer Institute, National Institutes of Health (U01 CA122839). 
The content of this manuscript does not necessarily reflect the views or policies of the National Cancer Institute or any of the collaborating centers in the cancer family registries, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government or the cancer family registries. The following colon cancer family registries centers contributed data to this article and were supported by National Institutes of Health: the Australasian Colorectal Cancer Family Registry (U01 CA097735), Seattle Colorectal Cancer Family Registry (U01 CA074794), and the Ontario Registry for Studies of Familial Colorectal Cancer (U01 CA074783). The Darmkrebs: Chancen der Verhütung durch Screening study was supported by the German Research Council (Deutsche Forschungsgemeinschaft, BR 1704/6-1, BR 1704/6-3, BR 1704/6-4, and CH 117/1-1), and the German Federal Ministry of Education and Research (01KH0404 and 01ER0814). The Diet, Activity, and Lifestyle Study was supported by the National Institutes of Health (R01 CA48998 to M.L.S.). Guangzhou-1 was supported by the National Key Scientific and Technological Project (2011ZX09307-001-04) and the National Basic Research Program (2011CB504303) of the People’s Republic of China. The Health Professionals Follow-up Study was supported by the National Institutes of Health (P01 CA 055075, UM1 CA167552, R01 137178, and P50 CA 127003), the Nurses’ Health Study was supported by the National Institutes of Health (R01 137178, P01 CA 087969, and P50 CA 127003), and the Physicians’ Health Study was supported by the National Institutes of Health (CA42182). The Korean Cancer Prevention Study-II study was supported by the National R&D Program for cancer control (1220180), and the Seoul R&D Program (10526, Republic of Korea). The Multiethnic Cohort study was supported by the National Institutes of Health (R37 CA54281, P01 CA033619, and R01 CA63464). 
The Prostate, Lung, Colorectal Cancer, and Ovarian Cancer Screening Trial was supported by the Intramural Research Program of the Division of Cancer Epidemiology and Genetics, and by contracts from the Division of Cancer Prevention, National Cancer Institute, National Institutes of Health, Department of Health and Human Services. Control samples were genotyped as part of the Cancer Genetic Markers of Susceptibility prostate cancer scan, supported by the Intramural Research Program of the National Cancer Institute. The data sets used in this analysis were accessed with appropriate approval through the dbGaP online resource (http://www.cgems.cancer.gov/data_acess.html) through dbGaP accession number 000207v.1p1. Control samples also were genotyped as part of the GWAS of Lung Cancer and Smoking (Yeager, M et al. Nat Genet 2008;124:161–170). Support for this work was provided through the National Institutes of Health, Genes, Environment and Health Initiative (Z01 CP 010200). The human subjects participating in the genome-wide association study were derived from the Prostate, Lung, Colon, and Ovarian Screening Trial and the study was supported by intramural resources of the National Cancer Institute. Assistance with genotype cleaning, as well as with general study coordination, was provided by the Gene Environment Association Studies, Geneva Coordinating Center (U01 HG004446). Assistance with data cleaning was provided by the National Center for Biotechnology Information. Funding support for genotyping, which was performed at the Johns Hopkins University Center for Inherited Disease Research, was provided by the National Institutes of Health, Genes, Environment and Health Initiative (U01 HG 004438). The data sets used for the analyses described in this article were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000093 v2.p2.
The Postmenopausal Hormone Study was supported by the National Institutes of Health (R01 CA076366 to P.A.N.). The Shanghai-1 and Shanghai-2 studies were supported by the National Institutes of Health (R37CA070867, R01CA082729, R01CA124558, R01CA148667, and R01CA122364), as well as an Ingram Professorship and Research Reward funds from the Vanderbilt University School of Medicine. The Tennessee Colorectal Polyp Study was supported by the National Institutes of Health (P50CA95103 and R01CA121060) and was conducted by the Survey and Biospecimen Shared Resource, which was supported in part by the Vanderbilt-Ingram Cancer Center (P30 CA 68485). The VITamins And Lifestyle study was supported by the National Institutes of Health (K05 CA154337). The Women’s Health Initiative program was funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, US Department of Health and Human Services, through contracts HHSN268201100046C, HHSN268201100001C, HHSN268201100002C, HHSN268201100003C, HHSN268201100004C, and HHSN271201100004C.

Abbreviations used in this paper

CFR

Colon Cancer Family Registry

CCND2

cyclin D2

CEU

Utah residents with Northern and Western European ancestry from the CEPH collection