Abstract

Epigenetic changes may contribute substantially to risks of diseases of aging. Previous studies reported seven methylation variable positions (MVPs) robustly associated with incident type 2 diabetes mellitus (T2DM). However, their causal roles in T2DM are unclear. In an incident T2DM case-cohort study nested within the population-based European Prospective Investigation into Cancer and Nutrition (EPIC)-Norfolk cohort, we used whole blood DNA collected at baseline, up to 11 years before T2DM onset, to investigate the role of methylation in the etiology of T2DM. We identified 15 novel MVPs with robust associations with incident T2DM and robustly confirmed three MVPs identified previously (near to TXNIP, ABCG1, and SREBF1). All 18 MVPs showed directionally consistent associations with incident and prevalent T2DM in independent studies. Further conditional analyses suggested that the identified epigenetic signals appear related to T2DM via glucose and obesity-related pathways acting before the collection of baseline samples. We integrated genome-wide genetic data to identify methylation-associated quantitative trait loci robustly associated with 16 of the 18 MVPs and found one MVP, cg00574958 at CPT1A, with a possible direct causal role in T2DM. None of the implicated genes were previously highlighted by genetic association studies, suggesting that DNA methylation studies may reveal novel biological mechanisms involved in tissue responses to glycemia.

Introduction

Type 2 diabetes mellitus (T2DM) is a major and increasing public health problem. Genome-wide studies have identified more than 240 genetic variants (1) that are robustly associated with T2DM. However, these only explain a minor portion of T2DM susceptibility variance (2,3). Environmental factors, including diet and physical activity, and also early life factors during fetal and early postnatal development are reported to contribute to the etiology of T2DM. Epigenetic variation can occur as a result of genetic and/or environmental factors (4). DNA methylation (DNAm) at cytosine-guanine dinucleotides (CpG sites) is the most commonly studied epigenetic mechanism to date, due to its role in expression regulation and available assays to quantify DNAm intensity at multiple sites across the epigenome that are applicable to large-scale studies. Unlike genotypic variation, DNAm intensity patterns are liable to change over time, with age or following disease or other exposure, and therefore disease-associated changes may be either causal or consequential (5).

Previous epigenome-wide association studies (EWAS) have identified seven methylation variable positions (MVPs) that are significantly associated (P < 1.0 × 10−7) with incident T2DM (6,7). However, the causal role of those markers in T2DM is unclear. Here, we aimed to elucidate DNAm determinants of T2DM by performing an EWAS for incident T2DM in the European Prospective Investigation into Cancer and Nutrition (EPIC)-Norfolk study (8). By further integrating genome-wide genetic array data, we aimed to identify methylation quantitative trait loci (methQTLs) for any T2DM-associated MVPs in order to assess the likely causal role of DNAm markers in T2DM through Mendelian randomization analyses (9).

Research Design and Methods

Cohort Descriptions

The discovery phase EWAS was undertaken in an incident T2DM case-cohort study nested within the EPIC-Norfolk study (8), a prospective cohort study that recruited 25,639 individuals aged between 40 and 79 years at baseline in 1993–1997. The cohort was representative of the general population of England and Wales for age, sex, anthropometric measures, blood pressure, and serum lipids but differed in that 99.7% of the cohort were of European descent. We defined a random subcohort of the whole EPIC-Norfolk study population excluding known prevalent case subjects of diabetes at baseline using the same definitions as used in the InterAct Project (10) who had available genotype data. Incident T2DM cases were ascertained from multiple sources: two follow-up health and lifestyle questionnaires providing self-reported information on doctor-diagnosed diabetes or medications, medications brought to the second clinical exam, and medical record linkage. Record linkage to external sources included the listing of any EPIC-Norfolk participant in the general practice diabetes register, local hospital diabetes register, hospital admissions data with screening for diabetes-related admissions, and Office of National Statistics mortality data with coding for diabetes. Participants who self-reported a history of diabetes that could not be confirmed against any other sources were not considered confirmed case subjects. Follow-up was censored at date of diagnosis of T2DM, 31 July 2006, or date of death—whichever came first. By definition in a case-cohort design, there are case subjects within and outside the random subcohort, but for the purposes of this analysis, we considered them in the incident case set only, with noncase subjects forming the comparison group. BMI and HbA1c levels were measured for each participant at baseline (Table 1). All participants in the EPIC-Norfolk study gave signed informed consent, and the study was approved by the local research ethics committee.

Baseline characteristics of participants in the EPIC-Norfolk, LOLIPOP, and FHS study samples

Confirmation of top signals from the discovery EWAS was sought in two further studies. The London Life Sciences Prospective Population (LOLIPOP) study is a prospective population study of Indian Asian (N = 17,606) and European (N = 7,766) individuals, recruited at age 35–75 years from the lists of 58 family doctors in west London, U.K., between 1 May 2002 and 12 September 2008. Indian Asians had all four grandparents born on the Indian subcontinent (India, Pakistan, Sri Lanka, or Bangladesh). The LOLIPOP study is approved by the National Research Ethics Service (07/H0712/150), and all participants gave written informed consent at enrolment. The LOLIPOP nested case-control study of incident T2DM has previously been described (6). Briefly, at follow-up, on 31 December 2013, individuals with T2DM were identified by primary care electronic health records and structured queries. Participants with incident T2DM were defined as those who did not have T2DM at baseline but who developed the disease during follow-up. Control subjects were identified from a random subset of 7,640 participants with a clinical assessment of fasting blood glucose concentration and HbA1c and questionnaire assessment between 11 January 2010 and 31 December 2013.

The Framingham Heart Study (FHS) is a community-based longitudinal study of participants living in and near Framingham, MA, at the start of the study in 1948 (11). The Offspring cohort comprised the children and spouses of the original FHS participants, as previously described (12). Briefly, enrollment for the Offspring cohort began in 1971 (N = 5,124), and in-person evaluations occurred approximately every 4–8 years thereafter. The current analysis was limited to participants from the Offspring cohort who survived until the eighth examination cycle (2005–2008) and consented to genetics research. DNAm data of peripheral blood samples collected at the eighth examination cycle were available in 2,741 participants. Prevalent T2DM was defined as having fasting glucose ≥7 mmol/L or as reporting taking T2DM medication at any examination cycle up to the eighth examination. All participants provided written informed consent at the time of each examination visit. The study protocol was approved by the institutional review board at Boston University Medical Center (Boston, MA).

Methylation Array Profiling

In all studies, DNAm intensity was measured using the Illumina Infinium Human Methylation 450K BeadChip array (12-sample array for FHS and 96-sample array for EPIC-Norfolk and LOLIPOP). Bisulfite conversion of DNA was performed using the EZ DNAm kit (Zymo Research, Orange, CA).

For 1,378 EPIC-Norfolk participants, DNAm was measured in DNA extracted from whole blood samples collected at baseline. Converted DNA was assayed by PCR and gel electrophoresis. Each 96-well DNA sample plate contained two duplicate samples. The average correlation between the duplicate samples was 98%.

In LOLIPOP, DNAm was measured among the first 1,074 Indian Asian participants with incident T2DM and 1,590 matched Indian Asian control subjects. Control subjects were matched to case subjects by age (5-year groups) and sex. DNAm was quantified in the baseline DNA samples collected at study enrollment. Samples were analyzed in random order, with masking to case-control status.

EWAS Quality Control and Normalization

In EPIC-Norfolk, epigenome-wide DNAm data were analyzed in R (version 3.2.2). Initial quality control was performed as recommended by the array manufacturer; methylation intensity values were corrected using the Illumina background correction algorithm as implemented in minfi (13), methylation intensities with a detection P value ≥ 0.01 were set to “missing,” and methylation intensity β values were calculated for each methylation marker per sample. For duplicate samples, the sample with the lower CpG detection percentage was excluded.

Sample call rates were calculated as the proportion of missing data in each sample, by autosomal, X and Y chromosomes. For the autosomal data, 77 samples with a call rate ≤ 0.99 were excluded. All samples passed the call rate threshold on the X chromosome. For the Y chromosome, seven male samples that did not pass the call rate and two further female samples were excluded. Distributions of methylation intensities were also inspected by autosomal and sex chromosomes and separately in females and males, leading to the exclusion of two additional samples that had an unusual distribution of methylation intensities. After those quality control procedures, data on 1,290 samples remained. All further downstream analyses were restricted to autosomal methylation markers.

Marker call rates were calculated as the proportion of missing data at each CpG site. 8,775 CpGs with a call rate ≤0.95 were excluded. The R package ENmix (14) was used to identify CpG sites with multimodal distributions of methylation intensity, which typically arise from technical artifacts; 3,295 such CpG sites were excluded. A further 18,874 CpG sites with probes previously identified as mapping to more than one genomic location were also excluded (15).

To ensure reliability of the data, we repeated filtering on sample and marker call rates until all samples and all markers passed their respective call rate thresholds. After exclusion of prevalent T2DM case subjects at baseline, the final data set comprised 1,264 samples (563 incident T2D case subjects, including 22 case subjects from the subcohort, and 701 noncase subjects) with methylation intensities at 442,920 autosomal CpG sites. Quantile normalization of methylation intensity β values was applied separately to the different subgroups of markers based on color channel, probe type, and methylated/unmethylated subtypes as proposed by Lehne et al. (16)

In LOLIPOP, DNAm data were analyzed in R (version 2.15) using minfi (13) and other R scripts. Marker intensities were normalized by quantile normalization as previously described (6).

In FHS, DNAm data were normalized using the Dasen methodology implemented in the wateRmelon package (17). Sample exclusion criteria included poor single nucleotide polymorphism (SNP) matching of control positions, missing rate >1%, outliers from multidimensional scaling, and sex mismatch. Probes were excluded if missing rate was >20%. Data from laboratory batches were pooled, leaving up to 2,635 samples and 443,304 CpG probes for analysis. Additional information on DNAm, normalization, and quality control is available the previously published work by Aslibekyan et al. (18). Differences in DNAm data generation, quality control, and statistical models are summarized in Supplementary Table 1.

EWAS Statistical Analyses

In EPIC-Norfolk, to identify MVPs associated with incident T2DM, we performed a logistic regression model for each methylation marker with incident T2DM status, with adjustment for age, sex, estimated cell counts, and sample plate using the EWAC pipeline. A conservative multiple test–corrected P value threshold was applied (P < 1 × 10−7). Different methylation profiles have been observed between the different cell types in whole blood (19), and blood-based profile of DNAm was shown to predict the underlying distribution of cell types (20). To correct for cell composition variability (21), we estimated first the proportions of different cell types (CD4+ and CD8+ T-cell subtypes, natural killer cells, monocytes, granulocytes, and B cells) from DNAm data using the algorithm described by Houseman et al. (22) as implemented in the R package minfi (13). These cell count estimates were then used as covariates in the epigenome-wide regression models for incident T2DM.

We used STRING (23) to perform gene set enrichment on the significant genes associated with the 18 significant MVPs identified in the EWAS. We also performed a modified version of the MAGENTA (24) pipeline to identify the pathways associated with genes at the loci of the significant MVPs. Since MAGENTA uses SNP data to identify loci, we assigned to each CpG a “nearest SNP” based on HapMap3 data and using build 36 positions for both the CpG site and the SNPs (average distance to the nearest SNP = 4,175 base pairs [bp] [interquartile range 1,375–4,859]; 1,707 of 466,039 CpGs were not assigned a SNP). In effect, rather than using a SNP P value to rank genes to assess enrichment, we used the P value from the methylation site to run MAGENTA.

For LOLIPOP, an epigenome-wide association of DNAm was performed in Indian Asians with incident T2DM who were identified from the 8-year follow-up of the study. Differential white blood cell (lymphocyte, monocyte, and granulocyte) count was available for all participants, and epigenome-wide methylation scores were used to impute a further four lymphocyte subsets (CD4, CD8, natural killer, and B cells). Principal components analysis was performed to quantify latent structure in the data, including batch effects. Associations between incident T2DM and the 18 significant MVPs identified in EPIC-Norfolk were tested using logistic regression including intensity values from Infinium HumanMethylation450 BeadChip assay control probes, bisulfite conversion batch, measured white cells, and imputed white cell subsets and the first five principal components as covariates. Association results were corrected for the genomic control inflation factor. For testing of the predictive ability of the 18 markers for incident T2DM, univariate logistic regressions were run for each of the 18 markers to obtain individual effect sizes (β values) for incident T2DM. A weighted methylation risk score was subsequently calculated from these β values, and receiver operating curve analyses were performed to provide estimates for area under the curve (AUC).

In FHS, association between each identified MVP (associated with incident T2DM in EPIC-Norfolk) was tested for association with prevalent diabetes and glycemic traits (fasting glucose, fasting insulin, and HbA1c). The analysis of glycemic traits included only individuals without diabetes. Fasting insulin was natural log transformed. Random effects statistical models were used to analyze the data to account for sibling correlation and included adjustments for age, sex, white blood cell counts, technical covariates, batch effects, and BMI, with DNAm as the dependent variable.

We also examined each T2DM-associated MVP for additional cross-sectional association with type 1 diabetes mellitus (T1DM) in an earlier EWAS of 52 monozygous twin pairs discordant for T1DM, in cell-sorted peripheral blood mononuclear cells (monocytes, B cells, or T cells) (25). As T2DM and T1DM have largely differing aetiologies, MVPs that are consistently associated with both outcomes may indicate metabolic effects of diabetes on DNAm.

Other Tissues

The relevance of changes in DNAm intensity in whole blood to other tissues was tested by analysis of genome-wide DNAm data, generated using the Illumina Infinium Human Methylation 450K BeadChip array, from human liver, adipose tissue, and skeletal muscle, as previously published (26). Human liver DNAm data were from participants of the Kuopio Obesity Surgery Study (KOBS); 35 with T2DM and 60 without (27). Data on adipose tissue (14 pairs), skeletal muscle (17 pairs), and blood (19 pairs) were from monozygotic twins discordant for T2DM (26,28,29). Adipose tissue and skeletal muscle from the same individual were available for most of these twin pairs (16 pairs in blood/muscle and 14 pairs in blood/fat); concordance in DNAm intensity across these tissues was tested for each highlighted MVP by Spearman correlation tests. We further tested cross-tissue correlations in DNAm at T2DM-associated MVPs between blood and other tissues of relevance to T2DM etiology, liver, and pancreas in publicly available Infinium HumanMethylation450 BeadChip array data from six cadavers sampled within 12 h postmortem (mean [SD] age 65.5 [7.2] years) (30).

Mendelian Randomization Analyses

We performed bidirectional Mendelian randomization analyses to test whether any T2DM-associated MVPs had a causal effect on T2DM or are a consequence of metabolic differences that had originated before the baseline measurement in this study. To predict the causal effect of each of T2DM-associated MVPs on T2DM, methQTLs associated with each MVP (FDR <0.05) in whole blood in 3,841 adults of European descent were identified using the BIOS QTL browser (31). To run Mendelian randomization analyses, we converted the Z score for each methQTL to β and SE using the following formulas (32):where N is the sample size and MAF is the minor allele frequency. We then tested these methQTLs in Mendelian randomization analyses (9) for T2DM. Genetic associations with T2DM were estimated in 69,677 case and 551,081 control subjects from the UK Biobank study (33), the EPIC-InterAct study (10) and the DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) consortium (2). A summary statistics method (inverse variance weighted) that combines all the SNPs for each MVP as a genetic instrument was used to predict the effect of that MVP on T2DM (34). To ensure that the instruments are independent, we performed clumping. The Egger regression for Mendelian randomization was also used to assess the sensitivity of the results to violations of Mendelian randomization assumptions. Mendelian randomization analyses were run using the R package TwoSampleMR (35).

For the reverse direction causal assessment, we tested SNPs with previously reported associations with T2DM (2) or related metabolic phenotypes (BMI [36], fasting glucose [37], 2-h glucose [38], fasting insulin [39], fasting insulin adjusted for BMI [37], insulin resistance [40], insulin secretion [41], and waist-hip-ratio adjusted for BMI [42]) to test whether these traits have causal effects on methylation intensity at any T2DM-associated MVP. We used summary statistics methods (inverse variance weighted and Egger tests) that combine all the SNPs for each trait as a genetic instrument to predict the effect of that trait on each T2DM-associated MVP (34) in the cohort control samples of EPIC-Norfolk (N = 613), in which genotype data were generated using the Affymetrix Axiom UK Biobank chip. All genotypes passed standard quality control criteria as specified by the Affymetrix best practice pipeline, and SNPs with MAF <5% in this sample were excluded.

Results

MVPs Associated With Incident T2DM

In the EPIC-Norfolk study, we identified 18 MVPs that are associated with incident T2DM at P < 1 × 10−7, including 15 novel signals (Table 2). None of these was reported to have a SNP on the target CpG (15). The two strongest associations were the previously reported signals at TXNIP (cg19693031; P = 2.7 × 10−21) and ABCG1 (cg06500161; P = 6.4 × 10−14) (6,7). We confirmed a third previously reported signal at SREBF1 (cg11024682; P = 6.0 × 10−10) and provide supportive evidence for an additional signal at PROC (cg09152259; P = 4.2 × 10−4) that had previously not been considered to be true due to lack of replication in Europeans (Supplementary Table 2).

We sought confirmation of the top 18 MVPs in data on 1,074 incident T2DM cases and 1,590 control samples from the LOLIPOP study and in cross-sectional data from FHS (403 with prevalent T2DM and 2,204 control subjects) (Table 3). All 18 MVPs showed directionally consistent associations with incident T2DM (14 at P < 0.05) and prevalent T2DM (16 at P < 0.05).

Novel MVPs associated with incident T2DM include cg14476101 (P = 2.8 × 10−10), located in the gene body of PHGDH, which encodes phosphoglycerate dehydrogenase, an enzyme involved in the synthesis of l-serine and other amino acids, and cg00574958 (P = 5.2 × 10−9) in the 5′ UTR (untranslated region) of CPT1A, which encodes an enzyme that initiates mitochondrial oxidation of long-chain fatty acids (Supplementary Table 11). Four of the 18 MVPs were located within solute carrier family genes (SLC1A5, SLC43A1, SLC9A1, and SLC9A3R1), which encode plasma membrane proteins that regulate cell transport of amino acids and other metabolites.

To systematically explore the biological pathways implicated by T2DM-associated methylation signals, we first tested the 18 MVPs for gene set enrichment using STRING and identified significant enrichment for three pathways: “positive regulation of cholesterol biosynthetic process” (indicated by MVPs at ABCG1, SREBF1, and POR), “carnitine metabolic process” (indicated by CPT1A and POR), and “AMPK signaling” (indicated by PFKFB3, CPT1A, and SREBF1). We then tested the full EWAS data set in a modified MAGENTA pipeline and identified significant enrichment for T2DM-associated methylation signals in 10 pathways (Supplementary Table 4), including “insulin receptor signaling,” “IGF-1 signalling,” “erythropoietin signaling,” “JAK signaling,” and “integrin signaling.”

Relevance of Whole Blood MVPs to Other Tissues

To explore the possible relevance of changes in DNAm intensity in whole blood to other tissues, relevant to T2DM pathogenesis, we examined these 18 MVPs in liver, adipose tissue, and skeletal muscle from individuals with and without T2DM. Nominal associations (P < 0.05) were found with only our 2 strongest whole blood MVP signals: cg06500161 at ABCG1 in adipose tissue (as previously published [26]) and cg19693031 at TXNIP in skeletal muscle (Table 4). Furthermore, at 12 of the 18 MVPs there was evidence for a positive correlation in DNAm intensity between whole blood and liver, pancreas, adipose tissue, or muscle (Supplementary Table 7).

Causal Effects on T2DM

To investigate the potential causal effects of the 18 T2DM-associated MVPs, we used the BIOS QTL browser (31) to identify methQTLs (genetic sequence variants) that are robustly associated (at P < 5 × 10−8) with DNAm intensity at any of the 18 MVPs. We found 54 methQTLs (33 cis, 21 trans), each associated with one of 16 MVPs (Supplementary Table 8). We then used these methQTLs as instrumental variables in Mendelian randomization analyses, based on aggregated publicly available GWAS data in 69,677 T2DM case and 551,081 control subjects (DIAGRAM [2], UK Biobank [33], and EPIC-InterAct [10]). Only one of the 16 T2DM-associated MVPs with an identified methQTL showed nominal evidence for a direct causal association with T2DM, cg00574958 at CPT1A (P = 0.01); however, for other MVPs the genetic-predicted effects overlapped with the observed effects in the LOLIPOP study (Fig. 1 and Supplementary Table 9).

We performed reverse direction causal analyses to identify causal effects of BMI and glycemic traits on methylation intensity at the 18 MVPs. Among participants without T2DM in EPIC-Norfolk (N = 613), none of the genetic instruments for the tested glycemic or metabolic traits (T2DM, BMI, fasting glucose, 2-h glucose, fasting insulin, fasting insulin adjusted for BMI, insulin resistance, insulin secretion, and waist-to-hip ratio adjusted for BMI) showed a consistent association with any of the 18 T2DM-associated MVPs (Supplementary Table 10).

Prediction of T2DM

In the LOLIPOP study sample, which was independent of the discovery EWAS, the top 18 T2DM-associated MVPs in aggregate showed no predictive ability for incident T2DM (AUC = 0.53). Furthermore, the addition of these 18 MVPs did not improve on a prediction model based on other baseline phenotypes (BMI, HbA1c, age, sex: AUC = 0.761; BMI, HbA1c, age, sex, plus 18 MVPs: AUC = 0.762).

Discussion

In this prospective study, we substantially increased the number of MVPs in whole blood that are robustly associated with incident T2DM. Associations for 17 of the 18 MVPs were confirmed with either incident or prevalent T2DM in two independent studies, which indicates the consistency of T2DM-associated whole blood DNAm intensity changes across different settings and ethnicities. Genetic causal modeling identified evidence to support a causal effect of DNAm on T2DM at one of these MVPs, cg00574958 at CPT1A.

The prospective designs of the EPIC-Norfolk and LOLIPOP studies aimed to identify MVPs that precede the development of T2DM. However, the identified T2DM-associated DNAm intensity changes were largely attenuated by adjustment for differences in BMI and glycemia that had developed prior to the baseline measurement in the prospective studies. Our Mendelian randomization analyses failed to find evidence for direct causal effects for the majority of T2DM-associated MVPs, as indicated by no detectable genetic-predicted effect of DNAm intensity on T2DM and a wide discordance between the observed and genetic-predicted effects. Conversely, overlap between EWAS signals for T2DM and T1DM was consistent with effects of glycemia on DNAm intensity for at least 7 of the 18 T2DM-associated MVPs.

Whether or not they show directly causal associations, these novel and consistent T2DM-associated MVPs are highly informative with regard to implicated genes and biological pathways. Notably, none of the genes implicated by this EWAS were previously identified by genetic variant association studies. This stark difference may suggest that T2DM-associated DNAm intensity changes may reveal novel biological mechanisms involved in tissue responses to glycemia rather than in the pathogenesis of insulin resistance or insulin secretion, which are implicated by those genetic studies. The topmost signal, cg19693031, which lies on TXNIP, is also the most significant observation in other T2DM EWAS studies (6,7). Phosphoglycerate dehydrogenase (PHGDH) catalyzes the first and rate-limiting step in glucose-derived serine synthesis and may indicate consequent purine and deoxythymidine nucleotide synthesis in response to hyperglycemia and potential tissue proliferative responses (43). Functional variation in carnitine palmitoyltransferase 1 (CPT1A) regulates the composition of circulating polyunsaturated n-3 fatty acids and docosahexaoenic acid (44) and is reported to activate lipolysis and mitochondrial activity in brown fat (45,46) and to maintain pancreatic islet secretion of the principal hyperglycemic hormone, glucagon (47). Solute carrier family members are sodium-dependent membrane transporters that regulate intracellular cell pH, cell volume, and other cellular events such as adhesion, migration, and proliferation and also contribute to systemic homeostasis of fluid volume, acid-base balance, and electrolytes. Specifically, SLC9A3R1 (NHERF1) binds to PTEN to activate the PI3 kinase signaling cascade involved in cell survival, growth, proliferation (48) and is a key component of insulin and IGF-1 signaling pathways that we found enriched for T2DM EWAS associations. These highlighted pathways could potentially contribute to the pathogenesis of micro- and macrovascular complications of hyperglycemia. PFBK3, a regulator of glycolysis and insulin signaling in mice, was recently highlighted by a SNP association with late-onset autoimmune diabetes, and we here provide independent evidence to support its role in human glucose regulation (49).

We recognize a number of limitations of our study. Both of the prospective study samples displayed large differences in baseline glycemia and BMI between incident T2DM case and noncase subjects. This nested prospective study design aimed to identify interactions between genetic factors and baseline lifestyle factors measured prior to the development of clinically diagnosed T2DM (10). Since it is impossible to develop T2DM except by passing through a phase of nondiabetic hyperglycemia, it is inevitable that people who go on to develop incident diabetes in a cohort study will have raised glucose levels at baseline if follow-up is of short or medium duration. Future studies that have samples stored many years prior to disease onset would be required to identify when in the development of diabetes the T2DM-MVP associations become apparent. Secondly, our assessments of other, nonblood, tissues were limited in the range of tissues and numbers of samples available. Despite concordant changes in DNAm intensity between whole blood and various tissues relevant to T2DM pathogenesis at 12 of the 18 T2DM-associated MVPs, nominal differences in DNAm were found only for our strongest two MVPs, which suggests that larger study samples are needed. We recognize that whole blood is not a tissue of interest to the pathogenesis of T2DM; however, current, and most likely future, large-scale EWAS are confined to such samples, and functional insights will depend on follow-up of whole blood signals in other tissues (50,51). The same issue of appropriate tissue of interest limits our genetic modeling approach, which identified genetic markers of DNAm intensity in peripheral blood. Furthermore, the sample size for this approach (N = 3,841 in BIOS QTL [31] and N = 613 in the EPIC-Norfolk cohort control group) is relatively small compared with data on QTLs for gene expression in peripheral blood (N = 8,086 in the study by Westra et al. [52]). Hence, we found only nominal evidence for a causal effect of DNAm at only 1 of the 18 T2DM-associated MVPs, at CPT1A, and for several MVPs the genetic-predicted effects were overlapping with the observed effects. Similarly, a recent large EWAS for BMI found a causal role of methylation at only one MVP (cg26663590 at NFATC2IP) (53). There are various possible conceptualizations of the functional interplay between SNP, MVP, and T2DM, which provide alternative explanations other than SNP-to-DNAm-to-T2D (54), but they do not limit the statistical detection of apparent causal signals. Future, larger reference data on QTLs for DNAm intensity in whole blood are being generated (Genetics of DNA Methylation Consortium [GoDMC]), which will allow more powerful tests for causality, although their relevance to DNAm in tissues of interest remains an important question. Finally, the determinants of the identified T2DM-associated MVPs remain unknown. Again, larger reference panels of GWAS and DNAm array data, as well as new methods to integrate findings across multiple methQTLs for each MVP, will inform future causal analyses. Future studies are needed to identify the potential lifestyle and developmental determinants of these T2DM-associated MVPs.

In conclusion, we identified several robust and consistent DNAm markers for incident T2DM. These appear to be related to T2DM via glucose and obesity-related pathways that had their effects before the collection of baseline samples in these cohort studies, which commenced in midlife. These associations indicate several plausible biological mechanisms involved in tissue responses and comorbidities of hyperglycemia.

Article Information

Acknowledgments. The authors are grateful for all of the participants and staff of the EPIC-Norfolk, LOLIPOP, and FHS cohorts. The authors thank Dr. Stephen Burgess (University of Cambridge) for advice on methQTLs and Dr. Jan Bert van Klinken (Leiden University) for advice on the BIOS QTL data as well as Stephen Sharp and Dr. Jian’an Luan (University of Cambridge) for advice on statistical analyses and Ylva Wessman, Per-Anders Jansson, and Emma Nilsson (Lund University Diabetes Center) for help with the study on twin pairs discordant for T2DM.

Funding. EPIC-Norfolk is supported by program grants from the Medical Research Council (MRC) [G9502233, G9502233, and G9502233] and Cancer Research UK [C864/A8257]. The generation and management of the Illumina Infinium Human Methylation 450K BeadChip array data in this cohort are supported through the MRC Cambridge initiative in metabolomic science [MR/L00002/1]. The genome-wide genotyping data in EPIC-Norfolk was funded by MRC award MC_PC_13048. This work is also supported by MRC program grants MC_UU_12015/1, MC_UU_12015/2, and MC_UU_12015/5. The LOLIPOP study is supported by the National Institute for Health Research (NIHR) Comprehensive Biomedical Research Centre Imperial College Healthcare National Health Service (NHS) Trust, the British Heart Foundation (SP/04/002), the MRC (G0601966 and G0700931), the Wellcome Trust (084723/Z/08/Z, 090532, and 098381), the NIHR (RP-PG-0407-10371), the NIHR Official Development Assistance (award 16/136/68), and the European Union Seventh Framework Programme (FP7) (EpiMigrant, 279143) and Horizon 2020 Framework Programme (iHealth-T2D, 643774). We acknowledge support of the MRC-PHE Centre for Environment and Health and the NIHR Health Protection Research Unit in Health Impact of Environmental Hazards. The work was carried out in part at the NIHR/Wellcome Trust Imperial Clinical Research Facility. J.C.C. is supported by the Singapore Ministry of Health’s National Medical Research Council under its Singapore Translational Research Investigator (STaR) Award (NMRC/STaR/0028/2017). The FHS is supported by grants N01-HC-25195 and HHSN268201500001I. The laboratory work for this investigation was funded by the Division of Intramural Research, National Heart, Lung, and Blood Institute, and by the National Institutes of Health (NIH) Director’s Challenge Award (principal investigator: D.L.). The analytical component of this project was funded by the Division of Intramural Research, National Heart, Lung, and Blood Institute, and the Center for Information Technology, NIH. J.B.M. is supported by National Institute of Diabetes and Digestive and Kidney Diseases grants U01 DK078616 and K24 DK080140. Data on T1DM-discordant twin pairs arose from studies funded by the EU FP7 project BLUEPRINT (282510). The Cardiovascular Epidemiology Unit at the University of Cambridge is supported by the U.K. MRC (MR/L003120/1), British Heart Foundation (RG/13/13/30194), and NIHR (Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust). Data from human tissues are from studies supported by grants from the Novo Nordisk foundation; Swedish Research Council, Region Skåne (ALF); Euoropean Foundation for the Study of Diabetes; EXODIAB; Swedish Foundation for Strategic Research (IRC15-0067); Swedish Diabetes Foundation; and Albert Påhlsson Foundation.

The views expressed are those of the authors and do not necessarily represent the views of the NHS, the NIHR, the Department of Health and Social Care, the National Heart, Lung, and Blood Institute, the NIH, or the U.S. Department of Health and Human Services. The funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the manuscript.

Duality of Interest. A.Y.C. is currently employed by Merck Research Laboratories. No other potential conflicts of interest relevant to this article were reported.

Author Contributions. A.C., F.R.D., J.R.B.P., M.L., A.Y.C., B.L., D.S.P., I.D.S., and C. Liu performed data analyses. A.C., F.R.D., and K.K.O. drafted the manuscript. A.C. constructed the figure. A.C., F.R.D., J.R.B.P., N.J.W., and K.K.O. had full access to all of the data in the study. A.C. and K.K.O. had final responsibility for the decision to submit for publication. L.A.L., N.D.K., R.A.S., K.-T.K., N.G.F., C.La., M.M.M., D.L., S.B., R.D.L., J.D., J.B.M., J.S.K., M.-F.H., J.C.C., N.J.W., and K.K.O. contributed to the data collection. K.-T.K., N.G.F., J.D., J.B.M., J.S.K., C.Lin., M.-F.H., J.C.C., N.J.W., and K.K.O. contributed to the study design. All authors contributed to data interpretation and revisions of the manuscript. A.C. and K.K.O. are the guarantors of this work and, as such, had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

. Design and cohort description of the InterAct Project: an examination of the interaction of genetic and lifestyle factors on the incidence of type 2 diabetes in the EPIC Study. Diabetologia2011;54:2272–2282