Abstract

OBJECTIVE

Variation in the transcription factor 7-like 2 (TCF7L2) locus is associated with type 2 diabetes across multiple ethnicities. The aim of this study was to elucidate which variant in TCF7L2 confers diabetes susceptibility in African Americans.

RESEARCH DESIGN AND METHODS

Through the evaluation of tagging single nucleotide polymorphisms (SNPs), type 2 diabetes susceptibility was localized to a 4.3-kb interval corresponding to the YRI (African) linkage disequilibrium (LD) block that contains rs7903146. To better define the relationship between type 2 diabetes risk and genetic variation, we resequenced this 4.3-kb region in 96 African American DNAs. Thirty-three novel and 13 known SNPs were identified: 20 with minor allele frequencies (MAF) >0.05 and 12 with MAF >0.10. These polymorphisms and the previously identified DG10S478 microsatellite were evaluated in African American type 2 diabetic cases (n = 1,033) and controls (n = 1,106).

CONCLUSIONS

In African Americans, these observations suggest that rs7903146 is the trait-defining polymorphism associated with type 2 diabetes risk. Collectively, these results support ethnic differences in type 2 diabetes associations.

Diabetes is estimated to affect nearly 24 million people in the United States, a disease burden with a major economic impact. Prevalence differs markedly across ethnic groups, with some of the highest rates observed in African Americans (11.8%) (1). This increased risk is likely multifactorial, resulting from a combination of shared cultural, environmental, and genetic factors. Although recent genome-wide association studies of type 2 diabetes in European-derived populations have revealed novel, reproducible susceptibility loci (2–11), few of these loci have been replicated in African Americans (12–14).

An exception is the association of the transcription factor 7-like 2 (TCF7L2) gene with type 2 diabetes in African (15) and African-derived populations, i.e., African Americans (12,16). TCF7L2 is a transcription factor involved in the Wnt signaling pathway (17). Although initial reports implicated TCF7L2 in regulation of the glucagon gene in the L cells of the gut (18), more recent reports suggest a role in insulin secretion (19), potentially through epigenetic mechanisms (20). The initial report of association between TCF7L2 and type 2 diabetes, in an Icelandic cohort, identified a 64-kb block of strong linkage disequilibrium (LD) spanning intron 3 to intron 4 of the gene in this European population (21). Refinement of this signal in expanded populations revealed the strongest evidence of association at the single nucleotide polymorphism (SNP) rs7903146, with a relative risk of 1.45–1.49 (15). Although these studies suggest that rs7903146 is most likely the causative variant, the large (64-kb) LD block in European-derived populations and the many variants within it have made it difficult to conclude from genetic studies alone that variation at this SNP confers susceptibility to type 2 diabetes.

We previously reported association of TCF7L2 variants with type 2 diabetes in a large African American case-control cohort (16). Of the SNPs evaluated, association was observed with rs7903146 and rs7901695 (admixture-adjusted additive P = 3.77 × 10⁻⁶ and 0.0030, respectively) in a collection of 577 type 2 diabetic case subjects enriched for nephropathy and 596 controls. Given this evidence of association in our African American cohort (12,16), we sought to refine the genomic interval of TCF7L2 associated with type 2 diabetes in African Americans and, through a comprehensive analysis of variation in TCF7L2, to define the genetic basis of type 2 diabetes susceptibility.

RESEARCH DESIGN AND METHODS

Study subjects.

This study was conducted under Institutional Review Board approval from Wake Forest University School of Medicine. Identification, clinical characteristics, and recruitment of African American patients and controls have been previously described in detail (22). Briefly, 1,033 unrelated African American patients with type 2 diabetes were recruited from dialysis facilities. Type 2 diabetes was diagnosed in African Americans who reported developing diabetes after the age of 25 years and who had not been treated with insulin alone since diagnosis. In addition, cases had to meet at least one of the following three inclusion criteria: 1) type 2 diabetes diagnosed at least 5 years before initiating renal replacement therapy, 2) background or greater diabetic retinopathy, and/or 3) ≥100 mg/dL proteinuria on urinalysis in the absence of other causes of nephropathy. An additional 1,106 unrelated African Americans without a current diagnosis of diabetes or renal disease were recruited from the community and internal medicine clinics as controls. All type 2 diabetic cases and nondiabetic controls were born in North Carolina, South Carolina, Georgia, Tennessee, or Virginia. DNA extraction was performed using the PureGene system (Gentra Systems, Minneapolis, MN).

Sequencing.

The DNA screening panel was composed of 96 African American subjects: 48 type 2 diabetic cases and 48 controls. PCR primers were designed to independently amplify a 4.3-kb region of TCF7L2. Primer sequences are available on request. DNA sequencing reactions were performed using BigDye Terminator v.1.1 Cycle Sequencing Kits and analyzed on the Applied Biosystems 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA). Sequencing reactions were performed on both DNA strands. Sequence alignment and polymorphism identification were performed using Sequencher 4.2 (Gene Codes, Ann Arbor, MI). All polymorphisms were validated through observation on both strands. A search of the region sequenced was performed at dbSNP (www.ncbi.nlm.nih.gov) to record all previously identified polymorphisms reported in the region.

Genotyping.

Genotyping of DG10S478 was performed by fragment length analysis on an ABI Prism DNA Analyzer 3700 with previously published primers (21) in a manner similar to that previously described (23). Fragment length was determined using ABI Prism GeneMapper software v3.0. Fifty-six duplicate samples were run for quality control purposes. SNP genotyping was performed on the iPlex MassARRAY genotyping platform (Sequenom, San Diego, CA). Blind duplicates and blanks were included for quality control and error checking. For all SNPs, the genotyping success rate was greater than 95%.

Four common SNPs identified by direct sequence analysis had minor allele frequencies (MAF) ≥0.05 but failed assay design on the Sequenom platform. Using the resequencing genotype data from the 96 African American samples as the reference, we imputed these SNPs in the remaining 2,043 samples with high quality (rsq-hat ≥0.5) using the software MACH (www.sph.umich.edu/csg/abecasis) (24). The most likely imputed genotypes were then used in subsequent association tests.
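As a rough illustration of this post-imputation step, the Python sketch below converts per-sample genotype posterior probabilities into most likely genotype calls and applies an rsq-hat style filter at 0.5. It assumes that rsq-hat can be approximated as the observed variance of imputed allele dosages divided by the variance expected under Hardy-Weinberg equilibrium, 2p(1 − p); MACH's exact computation may differ, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Posterior probabilities of carrying 0, 1, or 2 copies of the allele.
Posterior = Tuple[float, float, float]


def dosage(post: Posterior) -> float:
    """Expected allele count: P(1 copy) + 2 * P(2 copies)."""
    return post[1] + 2.0 * post[2]


def most_likely_genotype(post: Posterior) -> int:
    """Genotype (0, 1, or 2) with the highest posterior probability."""
    return max(range(3), key=lambda g: post[g])


def rsq_hat(posteriors: Sequence[Posterior]) -> float:
    """Assumed approximation: observed dosage variance / HWE variance 2p(1 - p)."""
    doses = [dosage(p) for p in posteriors]
    n = len(doses)
    mean = sum(doses) / n
    p = mean / 2.0  # allele frequency estimated from the dosages
    expected_var = 2.0 * p * (1.0 - p)
    if expected_var == 0.0:
        return 0.0  # monomorphic: quality undefined, treated as failing
    observed_var = sum((d - mean) ** 2 for d in doses) / n
    return observed_var / expected_var


def call_if_good(posteriors: Sequence[Posterior], threshold: float = 0.5) -> List[int]:
    """Most likely genotypes, or an empty list if rsq-hat is below threshold."""
    if rsq_hat(posteriors) < threshold:
        return []
    return [most_likely_genotype(p) for p in posteriors]
```

Confident posteriors spread across all three genotypes give an rsq-hat near 1, whereas uniformly uncertain posteriors collapse every dosage to the same value and are filtered out.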

Analysis.

SNPs were tested for departure from Hardy-Weinberg equilibrium (HWE) using an exact test of HWE proportions, first in cases and controls combined and then in cases and controls separately (25). SNPs departing from HWE were noted but retained in the genotypic analysis. Haplotype block structure was established using Haploview 4.1 (26), with blocks defined by the method of Gabriel et al. (27).
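An exact HWE test conditions on the observed allele counts and sums the probabilities of all heterozygote counts that are no more likely than the one observed. A minimal Python sketch, following the recurrence-based formulation of Wigginton, Cutler, and Abecasis (whether reference 25 used exactly this implementation is an assumption), is:

```python
def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Two-sided exact HWE P value from genotype counts."""
    n = n_het + n_hom1 + n_hom2
    if n == 0:
        raise ValueError("no genotypes")
    n_rare = 2 * min(n_hom1, n_hom2) + n_het  # copies of the rarer allele
    probs = [0.0] * (n_rare + 1)

    # Start near the most likely heterozygote count; it must share parity with n_rare.
    mid = n_rare * (2 * n - n_rare) // (2 * n)
    if (n_rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0  # unnormalized

    # Recurrence toward fewer heterozygotes.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - het - hom_r
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1

    # Recurrence toward more heterozygotes.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - het - hom_r
    while het <= n_rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2) * (het + 1))
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1

    total = sum(probs)
    cutoff = probs[n_het] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(p for p in probs if p <= cutoff) / total)
```

Running the function on cases only, controls only, and the combined sample reproduces the three-way testing scheme described above.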

Unadjusted measures of LD and association were assessed using the software SNPGWA (http://www.phs.wfubmc.edu) (28). SNPGWA computes the LD statistics D′ and r² for each pair of tandem SNPs. SNPGWA also performs multiple tests of association, including the overall two-degree-of-freedom (genotypic) test, the dominant, recessive, and additive (Cochran-Armitage trend test) models, and the corresponding lack-of-fit test for the additive model. Odds ratios, 95% confidence intervals, and P values were computed for each model of association. Population attributable risk (PAR) was calculated as $\mathrm{PAR} = (X - 1)/X$, where, under a log-additive model, $X = (1 - f)^2 + 2f(1 - f)\gamma + f^2\gamma^2$, $\gamma$ is the estimated odds ratio (OR), and $f$ is the risk allele frequency. DG10S478 was converted to a biallelic marker for analysis.
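The additive-model test, the pairwise LD statistics, and the PAR formula can each be sketched in a few lines of Python. This is an illustration, not SNPGWA's code: the trend test uses the standard Cochran-Armitage chi-square with genotype scores 0, 1, 2; D′ and r² are computed from a two-locus haplotype frequency that is assumed to be already estimated; and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def trend_test(cases: Sequence[int], controls: Sequence[int],
               scores: Sequence[float] = (0.0, 1.0, 2.0)) -> Tuple[float, float]:
    """Cochran-Armitage trend test (1 df). Counts are ordered by number of
    risk alleles (0, 1, 2). Returns (chi-square, two-sided P)."""
    n = [r + s for r, s in zip(cases, controls)]
    n_case, n_ctrl = sum(cases), sum(controls)
    total = n_case + n_ctrl
    sum_tr = sum(t * r for t, r in zip(scores, cases))
    sum_tn = sum(t * k for t, k in zip(scores, n))
    sum_t2n = sum(t * t * k for t, k in zip(scores, n))
    chi2 = (total * (total * sum_tr - n_case * sum_tn) ** 2
            / (n_case * n_ctrl * (total * sum_t2n - sum_tn ** 2)))
    # Upper tail of chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """|D'| and r^2 from haplotype AB frequency and allele frequencies
    (both SNPs assumed polymorphic)."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


def population_attributable_risk(f: float, gamma: float) -> float:
    """PAR = (X - 1) / X, X = (1 - f)^2 + 2f(1 - f)gamma + f^2 gamma^2."""
    x = (1 - f) ** 2 + 2 * f * (1 - f) * gamma + f ** 2 * gamma ** 2
    return (x - 1) / x
```

Under the log-additive model X equals ((1 − f) + fγ)², so PAR is zero whenever γ = 1 or f = 0, as expected.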

Ancestry estimates were determined from 70 biallelic admixture informative markers (AIMs) as previously described (16,29). Briefly, AIMs were selected to maximize European and African allele frequency differences and sample all non-acrocentric arms of the autosomal genome. Reference population allele frequencies were derived by genotyping 44 African (Yoruba from Ibadan, Nigeria) and 39 European Americans. Individual ancestral proportions were generated for each subject using FRAPPE (30), an expectation-maximization algorithm, under a two-population model. The individual ancestral estimates were used as covariates in the association analyses.
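The two-population model can be illustrated with a simplified, supervised expectation-maximization (EM) sketch that estimates one individual's African ancestry proportion while holding the reference allele frequencies fixed. FRAPPE itself also updates allele frequencies and processes all individuals jointly, so this Python code, with its hypothetical names and assumed 0/1/2 allele coding, only approximates the idea.

```python
from typing import Sequence


def estimate_african_ancestry(
    genotypes: Sequence[int],
    freq_afr: Sequence[float],
    freq_eur: Sequence[float],
    max_iter: int = 1000,
    tol: float = 1e-10,
    eps: float = 1e-6,
) -> float:
    """EM estimate of one subject's African ancestry proportion.

    genotypes[j]: copies (0, 1, 2) of a reference allele at AIM j.
    freq_afr[j], freq_eur[j]: that allele's frequency in the African and
    European reference samples, held fixed (a simplifying assumption).
    """
    q = 0.5  # starting value
    for _ in range(max_iter):
        expected_afr_copies = 0.0
        for g, pa, pe in zip(genotypes, freq_afr, freq_eur):
            # Keep frequencies away from 0 and 1 to avoid division by zero.
            pa = min(max(pa, eps), 1.0 - eps)
            pe = min(max(pe, eps), 1.0 - eps)
            # E-step: posterior that an allele copy is African-derived.
            post_ref = q * pa / (q * pa + (1.0 - q) * pe)
            post_alt = q * (1.0 - pa) / (q * (1.0 - pa) + (1.0 - q) * (1.0 - pe))
            expected_afr_copies += g * post_ref + (2 - g) * post_alt
        # M-step: expected African-derived copies over all allele copies.
        new_q = expected_afr_copies / (2.0 * len(genotypes))
        if abs(new_q - q) < tol:
            return new_q
        q = new_q
    return q
```

Markers with large African-European frequency differences, as selected here, make the posterior weights nearly 0 or 1 and so drive the estimate most strongly.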

Conditional haplotype analysis.

To test whether SNPs were independently associated with type 2 diabetes, we performed an omnibus test of haplotype association using PLINK (31). We then repeated the omnibus test while conditioning on one SNP at a time. A nonsignificant conditional test suggests that the conditioned SNP is sufficient to explain the haplotype association, i.e., that the haplotype carries a single association signal rather than multiple signals.
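The logic of the conditional test can be illustrated with a simplified Python sketch that assumes phase-known haplotype counts and uses likelihood-ratio (G) statistics. The omnibus statistic compares haplotype distributions between cases and controls; the conditional statistic sums G statistics computed within groups of haplotypes sharing the same allele at the conditioned SNP, which tests whether any haplotype effect remains beyond that SNP. PLINK instead fits haplotype models to EM-estimated phase probabilities, so this is an analogue under stated assumptions rather than PLINK's implementation; all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Chi-square upper tail via the series for the regularized lower
    incomplete gamma function (adequate for moderate x)."""
    if df <= 0 or x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-16:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def g_statistic(case_counts: Sequence[float],
                control_counts: Sequence[float]) -> Tuple[float, int]:
    """G statistic and df for a 2 x k table of haplotype counts."""
    cells = [(a, b) for a, b in zip(case_counts, control_counts) if a + b > 0]
    n_case = sum(a for a, _ in cells)
    n_ctrl = sum(b for _, b in cells)
    n = n_case + n_ctrl
    g = 0.0
    for a, b in cells:
        pooled = (a + b) / n
        for observed, group_total in ((a, n_case), (b, n_ctrl)):
            if observed > 0:
                g += 2.0 * observed * math.log(observed / (pooled * group_total))
    return max(g, 0.0), max(len(cells) - 1, 0)


def omnibus_test(case_counts, control_counts) -> Tuple[float, int, float]:
    g, df = g_statistic(case_counts, control_counts)
    return g, df, chi2_sf(g, df)


def conditional_test(haplotypes: Sequence[str], case_counts, control_counts,
                     snp_index: int) -> Tuple[float, int, float]:
    """Residual association after conditioning on the SNP at snp_index."""
    groups: Dict[str, Tuple[List[float], List[float]]] = {}
    for hap, a, b in zip(haplotypes, case_counts, control_counts):
        cases, controls = groups.setdefault(hap[snp_index], ([], []))
        cases.append(a)
        controls.append(b)
    g_total, df_total = 0.0, 0
    for cases, controls in groups.values():
        g, df = g_statistic(cases, controls)
        g_total += g
        df_total += df
    return g_total, df_total, chi2_sf(g_total, df_total)
```

With toy counts in which haplotypes differ between cases and controls only through the allele at one SNP, conditioning on that SNP drives the conditional statistic to zero, whereas conditioning on another SNP leaves residual association.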

RESULTS

Population characteristics.

Characteristics of the African American case-control populations are shown in Table 1. Controls were significantly younger than type 2 diabetic cases (P < 0.0001) but significantly older than the cases' mean age at type 2 diabetes diagnosis (P < 0.0001). BMI did not differ significantly between cases and controls (P = 0.49). Proportions of women were similar in cases and controls (61% and 58%, respectively). FRAPPE (30) analysis of AIM genotypes estimated an overall mean proportion of African ancestry of 0.79 ± 0.12, which differed significantly (P < 0.0001) between type 2 diabetic cases and controls (0.80 ± 0.12 and 0.78 ± 0.12, respectively). Therefore, all results are presented with adjustment for admixture.
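Admixture adjustment of the additive model amounts to including each subject's ancestry estimate as a covariate in a logistic regression. The dependency-free Python sketch below fits such a model by Newton-Raphson and reports a Wald test for the genotype term. It is an illustration under assumed 0/1/2 risk-allele coding with hypothetical names, not the software used in this study.

```python
import math
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _score_and_information(beta, rows, y):
    """Log-likelihood gradient and Fisher information for logistic regression."""
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, yi in zip(rows, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
        w = p * (1.0 - p)
        for j in range(k):
            score[j] += (yi - p) * x[j]
            for c in range(k):
                info[j][c] += w * x[j] * x[c]
    return score, info


def admixture_adjusted_additive_test(
    y: Sequence[int], genotype: Sequence[int], ancestry: Sequence[float],
    max_iter: int = 50, tol: float = 1e-10,
) -> Tuple[float, float]:
    """Return (per-allele OR, two-sided Wald P) for the genotype term."""
    rows = [[1.0, float(g), float(q)] for g, q in zip(genotype, ancestry)]
    beta = [0.0, 0.0, 0.0]  # intercept, genotype, ancestry
    for _ in range(max_iter):
        score, info = _score_and_information(beta, rows, y)
        step = _solve(info, score)  # Newton-Raphson update
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = _score_and_information(beta, rows, y)
    # Variance of the genotype coefficient: element [1][1] of the inverse information.
    var_g = _solve(info, [0.0, 1.0, 0.0])[1]
    z = beta[1] / math.sqrt(var_g)
    return math.exp(beta[1]), math.erfc(abs(z) / math.sqrt(2.0))
```

On a toy dataset in which the odds of disease triple with each risk allele and ancestry is balanced across genotypes, the fitted per-allele OR is 3.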

Refinement of the type 2 diabetes associated genomic interval.

A preliminary analysis assessed association with type 2 diabetes for 59 SNPs across the entire 216-kb TCF7L2 gene ±10 kb, using Affymetrix 6.0 array data from 965 type 2 diabetic ESRD cases and 1,029 controls (Fig. 1). With the use of the Tagger program of Haploview (26), 43 of the 59 SNPs from the Affymetrix array were available in the HapMap YRI dataset and captured common variation at 121 SNPs (MAF >0.05; aggressive tagging algorithm) with a mean r² = 0.73. Notably, the SNP of greatest interest, rs7903146, is not typed on the Affymetrix 6.0 array. This SNP lies in a poorly tagged genomic interval (max r² = 0.45), which resulted in only nominal evidence of association in the Affymetrix 6.0 analysis. To circumvent the limited coverage of genetic diversity in this region, we used imputation. Using data from the HapMap phase II hybrid panel (1:1, YRI:CEU), 108 SNPs were imputed across the TCF7L2 gene ±10 kb (Supplementary Fig. 1); no imputed variant reached a level of significance comparable to that of rs7903146. In a separate analysis, 497 SNPs were imputed from the 1000 Genomes YRI Pilot 1 dataset (Supplementary Fig. 2). This analysis identified two variants proximal to our region (rs33998771 and chr10:114740378) with P values (3.58 × 10⁻⁵ and 3.29 × 10⁻⁵, respectively) similar to that of rs7903146. To test whether these SNPs (rs33998771, chr10:114740378, and rs7903146) were independently associated with type 2 diabetes, we performed an omnibus test of haplotype association, conditioning on one SNP at a time. Four common haplotypes (ACT, ATT, TTT, and TTC) accounted for 99.9% of all haplotypes. These haplotype frequencies differed significantly between type 2 diabetic cases and controls (0.042, 0.019, 0.268, and 0.672, respectively, for cases; 0.020, 0.016, 0.229, and 0.735, respectively, for controls; omnibus test P = 5.4 × 10⁻⁶), with haplotypes TTC and ACT strongly associated with protection from and risk for type 2 diabetes, respectively (P < 0.0001). The haplotype association was markedly attenuated after conditioning on rs7903146 (P = 0.023), whereas strong association remained after conditioning on rs33998771 or chr10:114740378 (P = 0.0006 and 0.002, respectively), suggesting that rs7903146 alone explains most of the haplotype association.

Fig. 1. Regional association plot for TCF7L2 ±10 kb (C10:114689999–114926060). All SNPs genotyped on the Affy 6.0 array are plotted with their −log₁₀P values of association with type 2 diabetes versus the genomic position (National Center...

These results led us to focus on a single region spanning ~16.5 kb. Fourteen additional SNPs within this interval were genotyped in an expanded set of type 2 diabetic cases and controls and analyzed for association, as summarized in Table 2. The core region of association lay between SNPs rs4132115 and rs7903146 (admixture-adjusted additive P values ranging from 0.012 to 2.38 × 10⁻⁶). This region encompasses a 4.3-kb LD block in the YRI population (HapMap Phase II YRI data), bounded by SNP rs7901695 and the previously associated rs7903146 (16).

In addition, the previously associated (21) microsatellite marker DG10S478, which lies 38 kb distal to and outside of this LD block, was typed, and the results of the analysis are presented in Table 3. Allele sizes and frequencies were consistent with prior genotyping in samples from African populations (15). Evidence of association was observed with the protective alleles −4 and 16 (P = 0.043 and 5.02 × 10⁻⁵, respectively) and the risk allele 8 (P = 0.022).

The core region of association (C10:114744078–114748339) ±2 kb was analyzed by direct sequence analysis in 96 samples (48 type 2 diabetic cases and 48 controls). A total of 46 SNPs were identified, of which 72% (33/46) were novel and 17% (8/46) had MAF >5%. When genotyped in the expanded cohort, five of the novel SNPs were monomorphic, and 10 could not be typed on the Sequenom platform because of the repetitive nature of the region. These 10 SNPs were genotyped by direct sequence analysis in a subset of 96 type 2 diabetic cases and 96 controls. Four of them were common, and their sequencing data were combined with existing data to serve as the known (reference) haplotypes for imputation in the remaining cohort. In addition, rs61875120 was common but yielded poor-quality imputation (rsq = 0.31) and was therefore genotyped by direct sequence analysis. The remaining five SNPs had low MAF (<0.05) and were not genotyped. Table 4 summarizes sequence variants and association results. Of the 36 SNPs typed or imputed, 15 were nominally associated with type 2 diabetes (admixture-adjusted additive P values ranging from 0.050 to 6.32 × 10⁻⁶). The strongest associations were observed at rs34872471, rs35198068 (imputed), and rs7903146, which were correlated (r² > 0.74; Fig. 2) and associated with disease susceptibility (OR = 1.30–1.37) under an additive model. Three common haplotypes (CCT, CCC, and TTC) accounted for 99.6% of all haplotypes. These haplotype frequencies differed significantly between type 2 diabetic cases and controls (0.342, 0.056, and 0.602, respectively, for cases; 0.278, 0.058, and 0.664, respectively, for controls; omnibus test P = 3.7 × 10⁻⁵). The haplotype association was lost after conditioning on rs7903146 (P = 0.85), whereas modest association remained after conditioning on rs34872471 or rs35198068 (P = 0.05), suggesting that rs7903146 alone is sufficient to explain the overall haplotype association.

DISCUSSION

This study illustrates the power of genetic analyses in African-derived populations to facilitate identification of trait-defining variants. TCF7L2 is one of the strongest type 2 diabetes susceptibility genes identified to date, with associations across multiple ethnically diverse populations (12,15,16,21,32). Our results are consistent with the initial report of association of SNP rs7903146 with type 2 diabetes in an African American case-control population. By taking advantage of the reduced LD in the African American population, we were able to narrow the critical interval for association. This 4.3-kb region, flanked by SNPs rs4132115 and rs7903146, was resequenced in an effort to infer which sequence variant(s) are causally associated with type 2 diabetes.

Of the 46 SNPs identified by resequencing, 15 were nominally associated with type 2 diabetes, with the most significant associations observed at rs34872471, rs35198068 (imputed), and rs7903146, which were highly correlated (r² > 0.74; Fig. 2) and associated with disease susceptibility (OR = 1.30–1.37). Conditional omnibus haplotype analysis suggested that rs7903146 was sufficient to explain the haplotype association, implying that the associations at rs34872471 and rs35198068 reflect their correlation with the true signal at rs7903146.

Although this study has excluded additional common variants (MAF >0.01) as contributors to type 2 diabetes susceptibility within the fine-mapped interval (C10:114744078–114748339 ±2 kb) of the TCF7L2 locus, four variants (IVS3 +42245, IVS3 +42428, IVS3 +43487, and IVS4 −43007) were found to have MAF <0.01. These variants, which lie in highly repetitive regions, were not evaluated. To date, these variants have not been identified by other ongoing studies, e.g., the 1000 Genomes Project, suggesting that they are private mutations. Given their low MAF, these variants are unlikely to explain the association observed at the TCF7L2 locus, but we cannot rule out the possibility that these and other unidentified rare variants contribute to disease susceptibility. If they did, the effect sizes of such rare variants would have to be in a range unprecedented for noncoding variants.

As a result of fine-mapping the TCF7L2 locus to determine the region most likely to harbor susceptibility variants, the microsatellite marker DG10S478 was excluded as the causal variant. DG10S478 is located 41 kb proximal to the critical interval defined in the African American population and is in weak LD with rs7903146 (D′ = 0.35, r² = 0.07). Only a single common allele of DG10S478 is nominally associated with type 2 diabetes, and the strongest association, which is protective, is seen with low-frequency alleles. These data suggest that the contribution of DG10S478 to disease is nominal.

This study represents the first comprehensive evaluation of variation within the TCF7L2 gene in a large African American population. Taking advantage of the LD structure in our African-derived sample of African Americans, we were able to narrow the genomic interval of association to ~4.3 kb and to exclude a contribution of the previously identified microsatellite marker to type 2 diabetes susceptibility. Our analysis identified three SNPs, rs34872471, rs35198068 (imputed), and rs7903146, that were strongly associated with type 2 diabetes; all had P values two orders of magnitude smaller than those of the other SNPs. Conditional omnibus haplotype analysis suggested that rs7903146 was sufficient to explain the haplotype association. SNP rs7903146 remains the most significantly associated variant within the TCF7L2 gene, with a calculated PAR of 17.4%.

This investigation has used genetic approaches to focus on rs7903146, but alternative explanations can be proposed. For example, rs7903146 could be in LD with an unknown common variant. We cannot exclude this possibility with total confidence, but the assessment of markers in TCF7L2 by direct genotyping, by imputation, and by conditional analysis incorporating the 1000 Genomes data suggests that such a common variant is unlikely to exist. Evaluation of long-range LD on chromosome 10 showed little evidence for a remote variant (data not shown). Another possibility is that a rare variant of large effect in LD with rs7903146 is the actual functional variant. This also seems unlikely: although it is theoretically possible (33), we have recently shown empirically that a rare functional variant with a large effect can readily be distinguished from a common variant in LD (34).

Thus, fine-mapping at the TCF7L2 locus using an African ancestry population has statistically implicated rs7903146 as the causal variant. It is noteworthy that Gaulton et al. (20) have implicated rs7903146 as a functional variant by mapping sequence variants to open chromatin sites. They found that rs7903146 is located in islet-selective open chromatin, and human islet samples heterozygous for rs7903146 showed allelic imbalance in islet enhancer activity. Thus genetic and functional studies make a consistent case for a functional role for rs7903146.

ACKNOWLEDGMENTS

This work was supported by National Institutes of Health grants K99-DK081350 (to N.D.P.), R01-DK066358 (to D.W.B.), R01-DK053591 (to D.W.B.), R01-HL56266 (to B.I.F.), and R01-DK070941 (to B.I.F.), and in part by the General Clinical Research Center of the Wake Forest University School of Medicine Grant M01-RR07122.

No potential conflicts of interest relevant to this article were reported.

N.D.P. researched data and wrote the article. J.M.H., S.S.A., A.A., and C.R. researched the data. C.D.L. provided analytical support, contributed to discussion, and reviewed and edited the article. B.I.F. contributed to discussion and reviewed and edited the article. M.C.Y.N. provided analytical support, contributed to discussion, and reviewed and edited the article. D.W.B. contributed to discussion and reviewed and edited the article.

The authors thank the patients, their relatives, and the staff of the Southeastern Kidney Council/ESRD Network 6 for their participation.