Abstract

Mammalian somatic cells can be directly reprogrammed into induced pluripotent stem cells (iPSCs) by introducing defined sets of transcription factors. Somatic cell reprogramming involves epigenomic reconfiguration, conferring iPSCs with characteristics similar to embryonic stem cells (ESCs). Human ES cells contain 5-hydroxymethylcytosine (5hmC), which is generated through the oxidation of 5-methylcytosine by the TET enzyme family. Here we show that 5hmC levels increase significantly during reprogramming to human iPSCs, mainly due to TET1 activation, and that this hydroxymethylation change is critical for optimal epigenetic reprogramming but does not compromise primed pluripotency. Compared with hES cells, we find that iPS cells tend to form large-scale (100 kb-1.3 Mb) aberrant reprogramming hotspots in subtelomeric regions, most of which display incomplete hydroxymethylation on CG sites. Strikingly, these 5hmC aberrant hotspots largely coincide (~80%) with aberrant iPS-ES non-CG methylation regions. Our results suggest that TET1-mediated 5hmC modification could contribute to the epigenetic variation of iPSCs and to iPSC-hESC differences.

Pluripotency is defined as a stem cell state with the potential to differentiate into any of the three germ layers. Somatic cells can be reprogrammed to a pluripotent state by defined factors such as OCT4, SOX2, KLF4, c-MYC, NANOG and LIN28 (refs 1-3). These iPSCs are extremely similar to ESCs. During the reprogramming process, the global epigenetic landscape in somatic cells has to be reset to reach a pluripotent state via DNA methylation/demethylation and chromatin remodelling processes.

Besides 5-methylcytosine (5mC), which is known to display dynamic changes during early embryonic and germ cell development as well as during reprogramming, the mammalian genome also contains 5hmC, which is generated by oxidation of 5mC by the TET family of enzymes4, 5. The TET proteins function in ESC regulation, myelopoiesis and zygote development6-10. 5hmC is widespread in many tissues and cell types at different levels11, 12. In particular, 5hmC is abundant in the central nervous system and in ESCs. Several reports have explored the genome-wide distribution of 5hmC in mES cells and hES cells, and suggest that it is enriched in gene bodies and enhancers13, 14.

Multiple studies suggest there are subtle yet substantial genetic and epigenetic differences between iPS cells and hES cells16, 17. The current consensus is that iPS cells and ES cells are two overlapping classes of heterogeneous cells, with iPS cells being more variable than hES cells18. Although iPS cells and hES cells are functionally equivalent in general, the subtle genetic and epigenetic differences could lead to functional consequences among individual lines. Previous studies of the base-resolution methylomes of iPSCs and ESCs identified differentially methylated regions (DMRs) between iPSCs and ESCs, consisting of CG-DMRs and non-CG-DMRs16, 17. However, the traditional bisulfite sequencing technique used in those studies could not distinguish 5mC from 5hmC (ref. 19), so the extent to which these DMRs reflect hydroxymethylation differences remains unknown.

Here we show that 5hmC levels increase significantly during reprogramming to human iPSCs, mainly due to TET1 activation, and that this hydroxymethylation change is critical for optimal epigenetic reprogramming. We found that extensive genome-wide 5hmC modification occurs during reprogramming. Importantly, we identified specific aberrant reprogramming hotspots in iPS cells, which cluster on a large scale (100 kb-1.3 Mb) at subtelomeric regions bearing incomplete CG hydroxymethylation. These hotspots largely overlap with aberrant non-CG methylation hotspots, suggesting that hydroxymethylation contributes to the epigenetic difference between iPS cells and hES cells.

RESULTS

TET1-mediated hydroxymethylation plays a critical role during reprogramming to pluripotency in human cells

DNA methylation is a major barrier to iPS cell reprogramming. Several lines of evidence suggest that 5hmC is involved in the process of DNA demethylation20, 21. We found a significant increase of 5hmC level in human iPS cells compared to their original fibroblasts, with the amount in iPSCs being similar to hES cells (Fig. 1a).

TET1 is associated with increased hydroxymethylation during human iPSC reprogramming

TET family proteins (TET1, TET2 and TET3) can convert 5mC to 5hmC (ref. 6). We found a statistically significant increase in TET1 and TET3 expression, with a more dramatic increase in TET1, and a slight decrease in TET2 expression (Fig. 1b). RNA-seq reveals that TET1 is expressed at a level comparable to NANOG in pluripotent cells, but the expression of TET2 and TET3 is significantly lower (Fig. 1c). Depletion of TET1, but not TET2 or TET3, by siRNA significantly decreased total 5hmC levels in human iPS cells (Fig. 1d and Supplementary Fig. S1a,b). Therefore, we conclude that TET1 is the main TET protein regulating hydroxymethylation during human iPS cell reprogramming.

Reprogramming confers a 5hmC epigenome in a pattern with a bias towards telomere proximal regions in autosomes

Based on a negative binomial model for testing differential expression of sequencing data22, we found 267,664 regions in the genome showing differential 5-hydroxymethylation between iPS cells and fibroblasts (false discovery rate (FDR): 0.01), which we denote as differential 5-hydroxymethylated regions (DhMRs). Among them, 231,866 are hyperDhMRs (5hmC level is higher in iPS cells), and 35,798 are hypoDhMRs (5hmC level is lower in iPS cells) (Fig. 2b). The gain of 5hmC at hyperDhMRs is larger than the loss of 5hmC observed at hypoDhMRs (Fig. 2c). The hyperDhMRs are distributed across all autosomes, but are largely missing from the sex chromosomes (Fig. 2d). In particular, the top 20,000 hyperDhMRs (ranked by adjusted p-values) have a higher probability (p<0.0001) of being located in telomere-proximal regions (Fig. 2e), as shown for chromosome 1 and chromosome X (Fig. 2f).
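
The thresholding and classification step described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes that a per-region p-value and log2 fold change (iPS cells versus fibroblasts) have already been obtained from a negative binomial test, and it assumes Benjamini-Hochberg adjustment for FDR control, which the text does not specify. All function names and the toy input are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify_dhmrs(regions, fdr=0.01):
    """Split significant regions into hyperDhMRs and hypoDhMRs.

    regions: list of (name, log2_fold_change, pvalue) tuples, where a
    positive fold change means more 5hmC in iPS cells (assumed convention).
    """
    adjusted = benjamini_hochberg([p for _, _, p in regions])
    hyper, hypo = [], []
    for (name, lfc, _), q in zip(regions, adjusted):
        if q >= fdr:
            continue  # not significant at the chosen FDR
        if lfc > 0:
            hyper.append(name)
        else:
            hypo.append(name)
    return hyper, hypo


# Toy input (hypothetical values, for illustration only).
toy_regions = [
    ("r1", 2.0, 0.0001),
    ("r2", -1.5, 0.0002),
    ("r3", 0.3, 0.5),
    ("r4", 1.0, 0.04),
]
hyper_regions, hypo_regions = classify_dhmrs(toy_regions)
```

In this toy input, only the two regions with the smallest p-values pass the 0.01 threshold after adjustment, one as a hyperDhMR and one as a hypoDhMR.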

The analysis described above suggests a global hydroxymethylation change during reprogramming. 5hmC has been suggested to be linked with gene expression in ES cells and neurons13, 14, 23-26. To assess the correlation between 5hmC modifications and gene expression changes during reprogramming, we stratified genes into 9 categories based on gene expression changes between iPS cells and fibroblasts (category 1: high expression in iPS cells, low expression in fibroblasts; category 2: medium expression in iPS cells, low expression in fibroblasts, etc.). We then quantified the amount of 5hmC around the transcription start site (TSS). As a result, those 9 categories can be clustered into 3 distinct patterns (Fig. 3a). Of note, most genes expressed during reprogramming show a bimodal distribution with a depletion of 5hmC at the TSS, whereas genes that remain silenced after reprogramming show a peak at the TSS. Among the 3 clusters, cluster 1 has the lowest 5hmC levels at the TSS; cluster 3 has the highest levels of 5hmC at the TSS, but the lowest 5hmC levels in gene bodies (Fig. 3b).
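
The 9-category stratification can be pictured as a 3 x 3 grid of expression levels (low, medium, high) in iPS cells and in fibroblasts. The sketch below is only an illustration under stated assumptions: the expression cutoffs, the function names and the toy genes are hypothetical, and categories are labelled by their pair of levels rather than by the category numbers used in the figures, because the text does not give the full numbering.

```python
from collections import defaultdict
from itertools import product

LEVELS = ("low", "medium", "high")

# All 3 x 3 = 9 possible (iPS level, fibroblast level) categories.
ALL_CATEGORIES = list(product(LEVELS, LEVELS))


def expression_level(value, low_cutoff=1.0, high_cutoff=10.0):
    """Bin an expression value; the cutoffs are hypothetical placeholders."""
    if value < low_cutoff:
        return "low"
    if value < high_cutoff:
        return "medium"
    return "high"


def categorize(ips_expr, fib_expr):
    """Return the (iPS level, fibroblast level) category of one gene."""
    return (expression_level(ips_expr), expression_level(fib_expr))


def group_genes(expression):
    """Group genes by category.

    expression: dict mapping gene name -> (iPS expression, fibroblast expression).
    """
    groups = defaultdict(list)
    for gene, (ips_expr, fib_expr) in expression.items():
        groups[categorize(ips_expr, fib_expr)].append(gene)
    return dict(groups)


# Toy input with made-up gene names and values.
toy_expression = {
    "geneA": (50.0, 0.2),   # high in iPS cells, low in fibroblasts
    "geneB": (5.0, 0.2),    # medium in iPS cells, low in fibroblasts
    "geneC": (0.1, 0.3),    # low in both
}
toy_groups = group_genes(toy_expression)
```

Average 5hmC profiles around the TSS could then be computed per category and compared, which is the step that yields the three patterns described in the text.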

5hmC is associated with gene activity and pluripotency regulatory networks in stem cells

We then examined the correlation between the absolute amount of transcripts and 5hmC enrichment. We noticed that hyperDhMRs tend to form a bimodal distribution associated with gene activity in iPS cells, with the lowest level in TSS regions being similar to the level in fibroblasts (Fig. 3c and Supplementary Fig. S2). TES regions also show a bimodal distribution, with a more dramatic depletion in a narrower region centred on the TES (Supplementary Fig. S2). However, compared with hypoDhMRs, hyperDhMRs are more enriched in TSS, exons and TES (Supplementary Fig. S3a). We observed a significant negative correlation between the 5hmC level of regions surrounding the TSS (±200 bp) and gene expression levels in iPS cells (Supplementary Fig. S3b).

We also observe a bidirectional correlation between 5hmC level and DNA methylation during the reprogramming process. 80% of the partially methylated domains (PMDs), which display lower levels of CG methylation in somatic cells than in stem cells27, have increased 5hmC levels, with the rest showing no change in 5hmC level (Fig. 3d). Interestingly, we also found that around 60% of stem cell hypoDMRs (lower CG methylation in stem cells) show increased 5hmC modification (Fig. 3b). Collectively, our results suggest that increased hydroxymethylation occurs not only at loci with increased methylation but also at loci with decreased methylation during reprogramming.

Based on the bimodal distribution of 5hmC at TSS and TES, we then determined whether this distribution is associated with core pluripotency regulatory networks. We found that pluripotency master regulators, such as OCT3/4 and NANOG, bear this typical modification in iPSCs but not in fibroblasts (Fig. 3e). We further tested the relationship between 5hmC and the binding sites of key pluripotency factors27. We found a more than 8-fold higher than expected overlap between 5hmC-enriched regions and OCT4 and KLF4 binding sites, with a weak association with NANOG and SOX2 binding sites (Fig. 3f). Our results suggest that OCT4 and KLF4 regulatory networks may require 5hmC to regulate pluripotency during reprogramming. Furthermore, gene ontology analysis shows that the genes acquiring the most 5hmC are involved in stem cell differentiation and patterning processes (Fig. 3g), suggesting that 5hmC in stem cells is highly correlated with pluripotency.

Sequence preferences of 5hmC modification during reprogramming

We compared the CG, CH (CA, CT, CC) and CHG preferences of hyperDhMRs and hypoDhMRs. HyperDhMRs tend to be located in regions with higher C and G content, as well as in CHG- and CH-enriched regions, whereas hypoDhMRs have the same level as the genome background (Fig. 3h). Previous observations suggest that 5hmC modification is related to CpG density24, 28. We find that in iPSCs, CpG islands in the low CpG content group tend to have more 5hmC modifications (Supplementary Fig. S3c), which is consistent with the observation that DNA methylation occurs more frequently in CpG islands with low CpG content29. Furthermore, 5hmC modifications acquired during reprogramming tend to occur within unique sequence in which methylation is evolutionarily less conserved30 (Supplementary Fig. S3d-f).

Reprogramming of somatic cells to a pluripotent state requires complete reversion of the somatic epigenome into the pluripotent epigenome, which is an ES-like state. iPSCs retain some type of somatic memory from their previous identity31-33. We further determined the genome-wide 5hmC modification differences between iPS and ES cells, aiming to understand whether 5hmC modifications underlie the differences between hES cells and iPS cells. To reduce biases due to tissue of origin, we used 9 iPS cell lines derived from different origins: 6 are from fibroblasts as mentioned earlier, 2 are derived from peripheral blood cells, and 1 is derived from stem cells from human exfoliated deciduous teeth (SHED).

In general, global DNA hydroxymethylation patterns are very similar between iPS and ES cells (Fig. 4a). A comprehensive analysis of 372,423 5hmC-enriched regions across 4 hES cell lines and 9 iPS cell lines led to the identification of 113 iPS-ES-DhMRs that were differentially hydroxymethylated in at least one iPS cell or ES cell line (FDR<0.01), as shown for the SIGLEC6 and SIGLEC12 locus in Fig. 5a. Surprisingly, these regions are not randomly located across the genome; instead, they tend to cluster at telomere-proximal regions, in particular on chromosomes 3, 7, 8, 12 and 20 (Fig. 4b).

In contrast to the symmetric pattern of DMRs between iPS and ES cells (ref. 17), 105 of the 113 iPS-ES DhMRs are hypo-hydroxymethylated, with 5hmC levels similar to those of their respective progenitor blood cells or fibroblasts (Fig. 4c,d). Within these DhMRs, the 5hmC patterns of iPS cells are more variable than those of hES cells (Fig. 4d). Unsupervised hierarchical clustering using the top 1,000 most variable 5hmC-modified regions among all samples could not distinguish hESCs from hiPSCs, suggesting that the variability among iPSCs is not due to different levels of pluripotency, and that the 5hmC deviation of iPSCs is not a key determinant distinguishing hESCs from iPSCs (Fig. 4e).

Copy number variation (CNV) has been reported to contribute to the variation among iPSCs34,35. Since DhMRs cluster at subtelomeric regions and show depletion of hydroxymethylation, we further examined whether the DhMRs were simply due to genetic variation, such as CNV, instead of genuine aberrant 5hmC epigenetic modification. To this end, we used high-density array comparative genomic hybridization (aCGH) to examine 3 iPSC lines and 2 human ESC lines. Array CGH yielded an average of 70 CNVs on autosomes, none of which overlaps with the iPS-ES-DhMRs we identified (Supplementary Fig. S4). Therefore, the iPS-ES-DhMRs are attributable to aberrant epigenetic modification rather than to CNV.

Concordance of large-scale 5hmC hotspots and iPS-ES non-CG DMRs

Our results suggest that iPS-ES-DhMRs tend to cluster at telomere-proximal regions, forming aberrant reprogramming hotspots. To better define these large-scale regions, we developed a statistical method to identify potential large-scale aberrant reprogramming hotspots. An aberrant reprogramming hotspot is defined as a genomic region satisfying the following conditions: (1) the variability of 5hmC levels among iPS cells is large, (2) the average 5hmC difference between iPSCs and ESCs is statistically significant, and (3) the region is longer than 100 kb. Twenty large-scale regions were identified. Among them, 19 are hypoDhMRs, all of which have the same epigenetic status as their parent cells, pointing to a “somatic memory” during reprogramming, and 1 is a hyperDhMR (Table 1).
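
The three conditions above can be read as a simple filter on candidate regions. The following sketch illustrates that reading only; it is not the statistical method developed in the study, whose details are given in the Methods. It assumes one summary 5hmC value per cell line per region, uses the sample variance as the variability measure, and uses an exact permutation test on the difference in means as a stand-in for the unspecified significance test. The thresholds, function names and toy values are hypothetical.

```python
from itertools import combinations
from statistics import mean, variance


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation p-value for the difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_total = len(group_a), len(pooled)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    hits = 0
    n_splits = 0
    # Enumerate every way to relabel n_a of the pooled values as group A.
    for idx in combinations(range(n_total), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / (n_total - n_a))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
        n_splits += 1
    return hits / n_splits


def is_hotspot(start, end, ipsc_levels, esc_levels,
               min_length=100_000, min_variance=0.05, alpha=0.05):
    """Apply the three hotspot conditions to one candidate region."""
    variable_in_ipscs = variance(ipsc_levels) > min_variance          # condition 1
    significant = exact_permutation_pvalue(ipsc_levels, esc_levels) < alpha  # condition 2
    long_enough = (end - start) > min_length                          # condition 3
    return variable_in_ipscs and significant and long_enough


# Toy 5hmC levels for 9 iPS cell lines and 4 hES cell lines (made-up values).
toy_ipsc = [0.1, 0.9, 0.2, 0.8, 0.15, 0.3, 0.05, 0.25, 0.2]
toy_esc = [2.0, 2.1, 1.9, 2.05]
toy_call = is_hotspot(132_010_002, 133_270_002, toy_ipsc, toy_esc)
```

With 9 iPS cell lines and 4 hES cell lines there are only 715 possible relabelings, so exact enumeration is cheap; for larger panels a sampled permutation test or a parametric test would be the more practical choice.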

We then compared DhMRs with the DMRs identified previously using whole-genome single-base bisulfite sequencing, which cannot distinguish 5mC from 5hmC (ref. 17). Of the total 113 DhMRs, only 5 overlap with the 1,175 CG-DMRs (Fig. 5b). Surprisingly, 84.2% of the 19 hypo large-scale hotspots overlap with the 24 mega-scale hypo-non-CG-DMRs, whereas the expected percentage is 1.6% based on permutation (Fig. 5c). Fig. 5d shows one of these regions (chr10: 132010002-133270002), in which 5-mCH is depleted in iPS cell lines but not in hESC lines; similarly, of the 9 iPS cell lines, only iPS-S1 and iPS-S2, which are derived from blood, bear levels of 5hmC similar to their hESC counterparts. Of note, the variances among iPS cells are significantly larger than those among ES cells (Fig. 6a and Supplementary Fig. S5a, b). In none of the iPS cell lines were all 19 hypo large-scale DhMRs restored to the same level as in the 4 human ES cell lines (Fig. 6b). This indicates that these large-scale regions tend to form aberrant reprogramming hotspots that are resistant to reprogramming. We did not observe a statistically significant (p=0.54) correlation between the passage number of iPSCs and the number of aberrant hotspots (Supplementary Fig. S5c), implying that passage number may not be a key determinant of the number of hotspots in each iPSC line.
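
The comparison between the observed overlap and the permutation-based expectation can be sketched as below. This is an illustration under simplifying assumptions rather than the procedure used in the study: intervals are half-open and lie on a single linear coordinate system, and each permutation places every query region at a uniformly random position while keeping its length. Function names, the toy intervals and the default parameters are hypothetical.

```python
import random


def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def overlap_fraction(queries, targets):
    """Fraction of query intervals that overlap at least one target."""
    hits = sum(1 for q in queries if any(overlaps(q, t) for t in targets))
    return hits / len(queries)


def expected_overlap_fraction(queries, targets, genome_length,
                              n_permutations=1000, seed=0):
    """Mean overlap fraction after randomly relocating the query intervals."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    total = 0.0
    for _ in range(n_permutations):
        shuffled = []
        for start, end in queries:
            length = end - start
            new_start = rng.randrange(0, genome_length - length + 1)
            shuffled.append((new_start, new_start + length))
        total += overlap_fraction(shuffled, targets)
    return total / n_permutations


# Toy intervals on a 1 Mb coordinate system (made-up values).
toy_targets = [(1_000, 2_000)]
toy_queries = [(1_500, 2_500), (500_000, 501_000), (900, 1_100)]
observed = overlap_fraction(toy_queries, toy_targets)
expected = expected_overlap_fraction(toy_queries, toy_targets,
                                     genome_length=1_000_000,
                                     n_permutations=200)
```

A realistic version would shuffle within chromosomes and exclude unmappable sequence; the point here is only that an observed fraction far above the permuted mean, such as 84.2% against 1.6%, indicates non-random co-localization.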

The aberrant 5hmC reprogramming hotspots we identified may also explain the variability in transcription levels among iPSCs. Notably, some of the genes in these hotspots, such as TCERG1L and FAM19A (Table 1), have been reported to be expressed at a significantly lower level in many but not all iPSCs as compared to ES cells36, 37.

The extremely high concordance observed between hypo large-scale DhMRs and non-CG-DMRs is surprising, and might indicate that, within the previously identified aberrant 5mCH hotspot regions, a significant portion of the CH signal consists of 5hmC; alternatively, these regions could contain both non-CG (mC) and CG (hmC) aberrant modifications. The majority of 5hmC in ESCs is found at CG sites38. In addition, 5hmC quantification by Tet-assisted bisulfite sequencing (TAB-Seq) and by the chemical capture approach is well correlated both genome-wide and within the 20 large-scale hotspot regions (Supplementary Fig. S6a,b). Therefore, it is very likely that the aberrant 5hmC is caused by CG modification.

To test this possibility experimentally, we applied TAB-Seq, which can detect hydroxymethylation status at base resolution, to 2 hESC lines and 4 iPS cell lines. We performed base-resolution analysis of 5hmC in 3 randomly chosen large-scale regions (on chr10, chr18 and chr22), and amplified 5hmC-enriched regions by PCR (Fig. 7a and Supplementary Tables S6 and S7). We then subjected them to deep sequencing. Deep sequencing of PCR amplicons after traditional bisulfite conversion confirmed that there is epigenetic variation at non-CG sites but not at CG sites (Fig. 7b,d). Consistent with the results obtained by the capture method, we saw similar 5hmC variations in iPS cells (Fig. 7c and Supplementary Fig. S6c,d). Importantly, this incomplete hydroxymethylation is caused by CG modification, not by CH modification (Fig. 7c and Supplementary Fig. S6c,d). For example, in the chr10 hotspot, iPS-B22 and B23 show incomplete 5hmC in CG dinucleotides, but not in CH dinucleotides (Fig. 7e). Therefore, our results suggest the coexistence of aberrant non-CG methylation and aberrant CG hydroxymethylation in subtelomeric hotspots (Fig. 7f). The concordance of aberrant CG hydroxymethylation with the aberrant CH large-scale regions suggests there might be crosstalk between the epigenetic pathway that regulates hydroxymethylation and the pathway that regulates CH methylation; this crosstalk may behave more stochastically in those subtelomeric regions.

DISCUSSION

Our study suggests that the significant increase of 5hmC during reprogramming is mainly due to the activation of TET1 in human iPS cells, which is in contrast to previous observations that both Tet1 and Tet2 are upregulated in mouse iPS cells. Mouse ESCs differ from human ESCs in many aspects, such as X-chromosome inactivation status in female lines39. From a cell signaling perspective, human pluripotency (primed pluripotency) depends mainly on FGF and Activin-Nodal signaling pathways, whereas mouse pluripotency (naïve/ground state pluripotency) is maintained by LIF-STAT pathways. The difference between the human and mouse TET family proteins involved in reprogramming may be caused by FGF signaling selecting a subpopulation of hiPSCs. Several studies generating naïve human iPSCs under LIF signaling have been reported40, 41. It is therefore possible that TET1 and TET2 have distinct roles in regulating pluripotency, with TET2 being involved in naïve pluripotency and TET1 functioning in primed pluripotency. On the other hand, it is possible that TET1-mediated 5hmC modification is unique to humans regardless of pluripotent stage. Since TET1/2 are dispensable for maintaining stem cell pluripotency, and their loss is compatible with embryonic and postnatal development42, it is likely that TET2 expression has not been under positive selection for stem cell functions during evolution, and was thus eventually silenced in human pluripotent stages.

Reprogramming induces a remarkable epigenomic reconfiguration throughout the somatic cell genome. Recently, it was shown that TET1 and TET2, in synergy with NANOG, enhance the efficiency of mouse iPS cell reprogramming43. Here we show that the TET1-mediated hydroxymethylation change is critical for optimal human iPS cell reprogramming. We further show that TET1-mediated 5hmC modification affects only reprogramming efficiency, and does not alter the essential pluripotency of human stem cells. The pathways involved in TET1 regulation remain largely unknown. It would be interesting to know whether known epigenetic factors such as DOT1L and Kdm2b (refs 44, 45), which are negative and positive modulators of reprogramming, are linked to TET1-regulated hydroxymethylation.

Human iPS cells hold great promise for regenerative medicine and for establishing models of specific diseases. iPS and ES cells are known to share key features of pluripotency, including the expression of pluripotency markers, teratoma formation, cell morphology, the ability to differentiate into germ layers, and tetraploid complementation46. Two models depict the equivalence, or lack thereof, between iPSCs and ESCs. One model posits that there may be small but consistent differences between ESCs and iPSCs, as suggested before36, 47; the other model states that iPSCs and ESCs should be treated as two partially overlapping groups that share unique features. In this second model, single iPS cell lines cannot be distinguished from ES cell lines, though iPSCs show more epigenetic variance. Mounting evidence supports the latter model16, 17, 32. Therefore, each iPSC line may represent a unique epigenetic status with variable differentiation potential. The cause and degree of variation remain to be determined. Our study integrates the 5hmC epigenomic mark into the investigation of ES-iPS equivalence. We find that 5hmC occurs extensively in iPS cells at levels similar to ES cells, and there are no consistent 5hmC markers that can distinguish iPSCs from hESCs; however, we identified 20 regions in iPSCs that tend to form large-scale (100 kb-1.3 Mb) aberrant reprogramming hotspots, supporting the current consensus that iPSCs are more epigenetically variable than ESCs. Remarkably, these regions with 5hmC variations tend to cluster in telomere-proximal regions. The close proximity of the hotspots to telomeres indicates that there may be a distinct cellular process that impedes reprogramming in these regions.

Almost none of the DhMRs overlap with CG-DMRs, suggesting that the CG-DMRs identified previously are primarily caused by DNA methylation. DNA methylation in non-CG contexts (mCHG and mCHH, where H = A, C or T) is abundant in pluripotent stem cells, comprising almost 25% of all cytosines at which DNA methylation is identified. Strikingly, ~80% of large-scale iPS-ES DhMR regions coincide with previously reported aberrant non-CG DNA methylation hotspots17. Reciprocally, ~50% of non-CG DMRs overlap with the DhMRs we identified. It was reported that non-CG DMRs also occur in the peri-centromeric zones. Notably, these peri-centromeric regions contain low levels of 5hmC (stem cells have similar levels of 5hmC as fibroblasts), suggesting that cells do not need to establish 5hmC in these regions during reprogramming (Supplementary Fig. S7). Thus, the concordance occurs mainly at telomere-proximal regions. By applying TAB-Seq, we show that incomplete hydroxymethylation occurs predominantly at CG sites, but not at CH sites, suggesting the coexistence of aberrant non-CG methylation and aberrant CG hydroxymethylation in these regions. During reprogramming, both CH methylation and hydroxymethylation need to be established de novo from the somatic epigenome. It is known that non-CG cytosine methylation is exclusively catalysed by Dnmt3a and Dnmt3b (ref. 48). The concordance suggests there might be crosstalk between the epigenetic pathways that regulate the activities of TET and DNMT3, which may behave more stochastically in those subtelomeric regions.

In summary, our results indicate that TET1-mediated 5hmC modification contributes to both the human iPS cell reprogramming process and differences between iPSCs and hESCs. In particular, we identified 20 large-scale aberrant hotspots, suggesting iPSCs are more epigenetically variable than ESCs in terms of 5hmC modification. Our data suggest that, when studying aberrant epigenetic reprogramming events, as well as their functional consequences, at the DNA level, 5hmC modification merits particular consideration, in addition to 5mC.

METHODS

Methods and any associated references are available in the online version of this paper.

ACKNOWLEDGEMENTS

We thank Joshua Suhl, Michael Santoro, Steve Bray and Cheryl Strauss for critical reading of the manuscript. We thank Xinping Huang from the Viral Vector Core of the Emory Neuroscience NINDS Core Facilities for preparing the retrovirus/lentivirus used in this study. We are grateful to Julie Mowrey, Viren Patel, Craig Street and Sandeep Namburi for support with Illumina HiSeq 2000/MiSeq sequencing. This study was supported in part by the National Institutes of Health (NS079625 and HD073162 to P.J.; MH089606 and HD24064 to S.T.W.), the Emory Genetics Discovery Fund, and the Autism Speaks grant (#7660 to X.L.).

Footnotes

AUTHOR CONTRIBUTIONS

T.W., S.T.W. and P.J. designed the study and interpreted the results. T.W. and H.W. analyzed the data. T.W. performed the majority of experiments; Y.L., L.L. and X.L. performed 5hmC capture and parts of library preparation. M.Y., C.X.S., H.G. and C.H. assisted with the TAB-Seq experiment and the 5hmC capture experiment. A.D. and K.E.S. contributed to the Illumina sequencing; I.G. and K.R. contributed array CGH experiments. I.C., S.C., J.H., M.K., Y.Y. and Q.C. provided some of the hESC and hiPSC lines. T.W., S.T.W. and P.J. wrote the paper with assistance from H.W.

35. Laurent LC, et al. Dynamic changes in the copy number of pluripotency and cell proliferation genes in human ESCs and iPSCs during reprogramming and time in culture. Cell Stem Cell. 2011;8:106–118.

40. Wang W, et al. Rapid and efficient reprogramming of somatic cells to induced pluripotent stem cells by retinoic acid receptor gamma and liver receptor homolog 1. Proceedings of the National Academy of Sciences of the United States of America. 2011;108:18283–18288.