Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, David Geffen School of Medicine, Los Angeles, California, United States of America.

2

Korean Bioinformation Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, South Korea.

3

Department of Biostatistics, University of California Los Angeles School of Public Health, University of California Los Angeles, Los Angeles, California, United States of America.

Abstract

Long interspersed element-1 (LINE-1 or L1) retrotransposition induces insertional mutations that can result in disease. It was recently shown that the copy number of L1 and other retroelements is stable in induced pluripotent stem cells (iPSCs). However, using an engineered reporter construct over-expressing L1, another study suggested that reprogramming activates L1 mobility in iPSCs. Given the potential of human iPSCs in therapeutic applications, it is important to clarify whether these cells harbor somatic insertions resulting from endogenous L1 retrotransposition. Here, we examined L1 expression during and after reprogramming, as well as potential somatic insertions driven by the most active human endogenous L1 subfamily (L1Hs). Our results indicate that L1 over-expression is initiated during the reprogramming process and is subsequently sustained in isolated clones. To detect potential somatic insertions in iPSCs caused by L1Hs retrotransposition, we used a novel sequencing strategy. In contrast to the conventional sequencing direction, we sequenced from the 3' end of L1Hs into the genomic DNA, thus enabling direct detection of the polyA tail signature of retrotransposition for verification of true insertions. Deep-coverage sequencing allowed us to detect seven potential somatic insertions with low read counts in two iPSC clones. Negative PCR amplification in parental cells, the presence of a polyA tail and absence from seven L1 germline insertion databases strongly suggested true somatic insertions in iPSCs. Furthermore, these insertions could not be detected in iPSCs by PCR, likely due to their low abundance. We conclude that L1Hs retrotransposes at low levels in iPSCs and that these cells therefore warrant careful analysis for genotoxic effects.

L1 expression was evaluated by quantitative real-time RT-PCR on total RNA extracted from iPSC clones derived from the (A) NHDF1, (B) HFF and (C) IMR90 cell lines. To evaluate the basal level of L1 expression, total RNA extracts from the respective parental cells were subjected to real-time RT-PCR. Real-time RT-PCR results were normalized with respect to GAPDH content. The fold increase of L1 expression was then calculated with respect to the result obtained from the parental cells. Results are shown as average ± standard deviation. RNA extracts from the H1 human embryonic stem cell line were used as a positive control. Asterisks denote a statistically significant increase in L1 expression compared to the reference parental cells, as assessed by the Wilcoxon rank sum test (p<0.05).
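The normalization and fold-increase calculation described in this legend can be sketched as follows. The legend does not state the quantification method, so the comparative Ct approach (2^-ΔΔCt, which assumes roughly 100% amplification efficiency) is an assumption made here for illustration. The function names and any Ct values passed to them are hypothetical.

```python
from statistics import mean, stdev

def relative_expression(ct_l1, ct_gapdh):
    """GAPDH-normalized L1 level for one replicate.

    Assumption: comparative Ct quantification with ~100% PCR efficiency,
    so the normalized level is 2^-(Ct_L1 - Ct_GAPDH).
    """
    return 2.0 ** -(ct_l1 - ct_gapdh)

def fold_increase(sample_cts, parental_cts):
    """Fold increase of L1 expression relative to the parental cells.

    Both arguments are lists of (Ct_L1, Ct_GAPDH) pairs, one per replicate
    (at least two sample replicates are needed for the standard deviation).
    Returns (average, standard deviation) of the per-replicate fold values.
    """
    parental_level = mean(relative_expression(l1, g) for l1, g in parental_cts)
    folds = [relative_expression(l1, g) / parental_level for l1, g in sample_cts]
    return mean(folds), stdev(folds)
```

For example, `fold_increase([(23.0, 18.0), (22.0, 18.0)], [(25.0, 18.0), (25.0, 18.0)])` uses made-up Ct values in which the sample replicates are 4-fold and 8-fold above the parental level.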

L1 up-regulation is observed during the reprogramming process and is independent of the transducing vector.

HFF were transduced with either the FRh11 lentiviral vector or the pMX murine γ-retroviral vector encoding OCT4, C-MYC, SOX2 and KLF4. Three days post-transduction, the cells were seeded onto a feeder layer of iMEFs and cultured under hESC conditions. Total cells were collected at 8, 14, 21 and 28 days post-seeding and iMEFs were removed by positive selection. Total RNA extracts were obtained from the remaining human cells and subjected to quantitative real-time RT-PCR to assess L1 expression. Total RNA extracts obtained from H1-hESC and iMEFs were used as positive and negative controls, respectively. Quantitative real-time RT-PCR results were normalized with respect to GAPDH content. The fold increase of L1 expression was then calculated with respect to the results for HFF. Results are shown as average ± standard deviation. Asterisks denote a statistically significant increase in L1 expression compared to the reference parental cells, as assessed by the Wilcoxon rank sum test (p<0.05).
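As an illustration of the Wilcoxon rank sum test named in this legend, the sketch below computes an exact one-sided p-value by enumerating every possible assignment of the pooled ranks. This is only practical for small replicate numbers. Whether the authors used an exact or approximate test, and a one- or two-sided alternative, is not stated, so both choices are assumptions. The function names are hypothetical.

```python
from itertools import combinations

def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def rank_sum_p_greater(sample, reference):
    """Exact one-sided p-value that `sample` tends to exceed `reference`.

    Assumption: one-sided alternative (increase), exact permutation
    distribution of the rank sum of `sample` within the pooled data.
    """
    ranks = midranks(list(sample) + list(reference))
    n = len(sample)
    observed = sum(ranks[:n])
    sums = [sum(c) for c in combinations(ranks, n)]
    return sum(s >= observed - 1e-12 for s in sums) / len(sums)
```

Calling `rank_sum_p_greater(sample_folds, parental_folds)` with two short lists of normalized expression values returns the fraction of rank assignments at least as extreme as the observed one.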

Schematic of the PCR strategy for template preparation for 454 sequencing of L1Hs family members (adapted from Ewing et al.).

L1Hs libraries were prepared as previously described, except that the 454 primers A and B were used instead of Illumina adapters and that high-throughput sequencing was performed with primer A instead of primer B, thus allowing detection of the polyA (pA) sequence followed by the sequence of the new locus of insertion. The sequences were then mapped to the genome to detect reference as well as non-reference L1Hs insertions. Reference L1Hs insertion sequences match the reference genome from their 3′UTR sequence to the end of their flanking sequence at one location only, whereas non-reference insertion sequences have their 3′UTR sequence and flanking sequence match the genome at two distinct locations.
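The mapping logic described in this legend can be sketched as follows, as a simplified illustration and not the authors' actual pipeline. A read is split at its polyA run into the L1Hs 3′ end and the genomic flank; the two parts are then compared by mapping position. The `Hit` type, the function names and the thresholds `min_a` and `max_gap` are all hypothetical, alignment itself is assumed to be done by an external mapper, and strand is ignored.

```python
import re
from typing import NamedTuple, Optional, Tuple

class Hit(NamedTuple):
    """One alignment of a read segment to the reference genome (hypothetical)."""
    chrom: str
    start: int
    end: int

def split_at_polya(read: str, min_a: int = 10) -> Optional[Tuple[str, str, str]]:
    """Split a read into (L1 3' end, polyA run, flank) at the first run of
    at least `min_a` consecutive A's; return None if no such run exists."""
    match = re.search("A{%d,}" % min_a, read)
    if match is None:
        return None
    return read[:match.start()], match.group(), read[match.end():]

def classify(l1_hit: Hit, flank_hit: Hit, max_gap: int = 100) -> str:
    """Label an insertion from where its two parts map.

    'reference': the 3'UTR and the flank map next to each other at one locus.
    'non-reference': they map to two distinct locations.
    """
    same_chrom = l1_hit.chrom == flank_hit.chrom
    # Distance between the two intervals (negative if they overlap).
    gap = max(l1_hit.start, flank_hit.start) - min(l1_hit.end, flank_hit.end)
    if same_chrom and gap <= max_gap:
        return "reference"
    return "non-reference"
```

In this sketch a non-reference call is only a candidate. The legends here describe additional verification by PCR.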

The general PCR strategy used to verify non-reference germline and somatic L1Hs insertions is shown. (a) DNA fragments are amplified with the primer AC5931 located in L1Hs and the reverse primer Z located near the new locus of insertion. To confirm our results, some of these fragments were subjected to nested PCR using the internal primers G6015 and NR. Primers PF were used to verify amplification of empty sites. (b) Typical results for L1Hs insertions confirmed in HFF with the AC5931 and Z primers are shown. The arrow indicates the 500 bp band of the 100 bp ladder (M). Unnecessary lanes were removed.