Department of Respiratory Medicine, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan.Division for Health Service Promotion, The University of Tokyo, Bunkyo-ku, Tokyo, Japan.Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies, Yokohama, Kanagawa, Japan.

Department of Respiratory Medicine, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan.Department of Clinical Laboratory, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan.

Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies, Yokohama, Kanagawa, Japan.Telethon Kids Institute, the University of Western Australia, Perth, Western Australia, Australia.

Department of Biochemistry, Nihon University School of Dentistry, Chiyoda-ku, Tokyo, Japan.Division of Functional Morphology Dental Research Center Nihon University School of Dentistry, Chiyoda-ku, Tokyo, Japan.

Department of Respiratory Medicine, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan.Division for Health Service Promotion, The University of Tokyo, Bunkyo-ku, Tokyo, Japan.

Visual Overview

Abstract

Lung cancer is the leading cause of cancer-related deaths worldwide. The majority of cancer driver mutations have been identified; however, relevant epigenetic regulation involved in tumorigenesis has only been fragmentarily analyzed. Epigenetically regulated genes have a great theranostic potential, especially in tumors with no apparent driver mutations. Here, epigenetically regulated genes were identified in lung cancer by an integrative analysis of promoter-level expression profiles from Cap Analysis of Gene Expression (CAGE) of 16 non–small cell lung cancer (NSCLC) cell lines and 16 normal lung primary cell specimens with DNA methylation data of 69 NSCLC cell lines and 6 normal lung epithelial cells. A core set of 49 coding genes and 10 long noncoding RNAs (lncRNA), which are upregulated in NSCLC cell lines due to promoter hypomethylation, was uncovered. Twenty-two epigenetically regulated genes were validated (upregulated genes with hypomethylated promoters) in the adenocarcinoma and squamous cell cancer subtypes of lung cancer using The Cancer Genome Atlas data. Furthermore, it was demonstrated that multiple copies of the REP522 DNA repeat family are prominently upregulated due to hypomethylation in NSCLC cell lines, which leads to cancer-specific expression of lncRNAs, such as RP1-90G24.10, AL022344.4, and PCAT7. Finally, Myeloma Overexpressed (MYEOV) was identified as the most promising candidate. Functional studies demonstrated that MYEOV promotes cell proliferation, survival, and invasion. Moreover, high MYEOV expression levels were associated with poor prognosis.

Implications: This report identifies a robust list of 22 candidate driver genes that are epigenetically regulated in lung cancer; such genes may complement the known mutational drivers.

Introduction

Lung cancer is the leading cause of cancer-related deaths worldwide, with non–small cell lung cancer (NSCLC) representing the majority of cases (1). In recent years, the genomic landscape of NSCLC has been extensively investigated (2–4). However, for the majority of NSCLC patients, treatment options are still limited, and the overall prognosis remains poor. Thus, new robust biomarkers and therapeutic targets are required to guide and aim the treatment. Beyond the mutation profiles, epigenetic changes and deregulation of gene expression are decisive for cancer pathophysiology (5). Vogelstein and colleagues introduced a term “epigenetic driver” for genes that are altered by epigenetic mechanisms and confer a selective growth advantage by differential gene expression (6).

Although tens of mutational drivers are known, epigenetic drivers represent a largely unknown territory with a great potential for finding theranostic targets (6, 7). Optimal transcriptional biomarkers should be stably upregulated. As DNA methylation is involved in stable and long-term regulation, we hypothesize that differentially expressed genes that are caused by aberrant DNA methylation are optimal candidates for biomarkers. Differential expression driven by other factors, for example, aberrant signaling or broken regulatory networks may be variable over time.

Here, we apply a novel approach to deeply integrate transcriptomics and epigenetics data to find robust biomarkers that are regulated at epigenetic level (epi-markers/drivers), including novel long noncoding RNAs (lncRNA) and transcribed repetitive elements. We integrate the data from Functional ANnoTation Of Mammalian genome 5 (FANTOM5) and The Cancer Genome Atlas (TCGA) consortia, together with other available datasets, to find the optimal set of epigenetically regulated genes in lung cancer. Such genes can then be used as biomarkers, and we expect a subset of them to be epi-drivers.

As DNA methylation can have a different function depending on where on the gene it occurs (8), the optimal data integration has to be performed at promoter resolution. To do that, we utilized the unique Cap Analysis of Gene Expression (CAGE) dataset from the FANTOM5 consortium. CAGE is a 5′ sequence tag technology that globally determines transcription start sites (TSS) in the genome and their expression levels (9, 10). FANTOM5 dataset contains CAGE profiles of 16 NSCLC cell lines and 16 normal lung epithelial cells that allow differential expression analysis at the promoter/TSS level. This work is part of the FANTOM5 project. Data downloads, genomic tools, and copublished manuscripts are summarized online at http://fantom.gsc.riken.jp/5/.

Materials and Methods

FANTOM5 CAGE data

We obtained the CAGE expression data of 16 normal lung epithelial cells and 16 NSCLC cell lines from the FANTOM5 project (Supplementary Table S1). We used the expression table containing CAGE tag counts under 184,827 CAGE peaks, which represents genome-wide, promoter-level expression profiles (9). We selected 65,131 active promoters (tags per million > 1 in at least two samples) and annotated the CAGE peaks using the GENCODE v19, miTranscriptome, and PLAR annotations (11–13).

To identify up- and downregulated transcripts in NSCLC cell lines, we performed differential expression analysis using edgeR (14). The P values were adjusted for multiple testing by Benjamini–Hochberg method. The thresholds of absolute fold change > 2 and FDR < 0.05 were used. We also evaluated the expression status in binary fashion: ON (expressed, count > 0), OFF (not detected, count = 0), as described before (15). In short, we calculated the frequency of expression (OFF/ON analysis), and features expressed four times more frequently in NSCLC were determined as turned ON.

TCGA RNA-seq data

We obtained the gene expression profiles of 515 lung adenocarcinoma (LUAD) and 59 normal tissue controls, and 501 lung squamous cell carcinoma (LUSC) and 49 normal lung tissue controls from The Cancer Genome Atlas (TCGA) Data Portal (data status as of June 1, 2016). The level3 RNASeqV2 expression profiles of 20,502 genes were downloaded. The expression levels of recently annotated lncRNAs were downloaded from the Genomic Data Commons Data Portal (HTSeq pipeline, GDC.h38 GENCODE v22, https://gdc-portal.nci.nih.gov). The data were log2 transformed and used for the differential expression analysis by limma package (16). The thresholds of absolute fold change > 2 and FDR < 0.01 were used. We also performed OFF/ON analysis as described above.

DNA methylation array data

We obtained TCGA and GSE36216 (Supplementary Table S2) datasets generated by Illumina Infinium 450K methylation array (HumanMethylation450 BeadChip) platform (17). Probes that map to multiple genomic locations and nonspecific probes were removed from analyses according to the previous report (18). We processed the data using the minfi package to generate the β values for DNA methylation status, and to analyze the differences in methylation levels between cancer and normal groups (19). FDR < 0.05 was considered significant.

Integration of DNA methylation data and CAGE/RNA-seq

Each CAGE promoter was linked to DNA methylation sites within −1500 to +500 bp from the center of CAGE peak. Genomic positions of the probes were taken from the annotation file of the Infinium 450K methylation array. Gene symbols from the annotation file were used for integration of RNA sequencing (RNA-seq) and DNA methylation data. RNA-seq (E-MTAB-2706) and Infinium 450K methylation array (GSE36216) data for the same panel of NSCLC cell lines were used to validate the correlations between gene expression and methylation (20). Spearman ρ < −0.3 was used to show inverse correlations between expression and methylation levels in NSCLC cell lines or TCGA tumor samples.

Quantitative RT-PCR

Detailed procedures were described previously (33). GAPDH expression levels were used for the normalization. The PCR primers are as follows. GAPDH forward: GGTGAAGGTCGGAGTCAACGGA, reverse: GAGGGATCTCGCTCCTGGAAGA. MYEOV forward: GAGCAGGCCATTAGCTCG, reverse: CATCCATTCGCGCCCACATA.

siRNA experiment in A549 cells

siRNA against human MYEOV (Stealth RNAi, siMYEOV-1: HSS120166, siMYEOV-2: HSS120167) and the negative control (Stealth RNAi, siNC: #12935-300) were purchased from Life Technologies. A549 and H441 cells were transfected with 20 nmol/L siRNA using Lipofectamine RNAiMAX (Life Technologies). Cell proliferation, apoptosis, and invasion assays were performed as described previously (28, 33). Total RNA was extracted from A549 cells 24 hours after siRNA transfection using the RNeasy Mini Kit. Microarray analysis was carried out using Affymetrix GeneChip Human Genome U133 Plus 2.0. Significance Analysis of Microarrays (SAM) was used for statistical analyses (34). The dataset was deposited in the GEO database (GSE65968).

The microarray CEL files were submitted to the ISMARA (The Integrated System for Motif Activity Response Analysis) online tool and selected MYC as the top transcription factor with highest activity significance (35). To find genes regulated by MYC, we obtained cMYC ChIP-seq peaks in A549 cells from ENCODE (36) and selected two nearest genes within 1 kb from the peak using GREAT 3.0.0 tool (37)

5-aza-dC and TSA treatment

SAECs were treated with 3 μg/mL of DNA demethylating agent, 5-aza-2′-deoxycytidine (5-aza-dC) on day 1, 3, and 1 μg/mL of histone deacetylase (HDAC) inhibitor, trichostatin A (TSA) on day 4. Total RNA was extracted on day 5, and microarray analysis was carried out using Agilent SurePrint G3 Human Gene Expression 8 × 60K v3. The dataset was deposited in the GEO database (GSE82214).

Miscellaneous statistical analyses

For cell culture experiments, differences were examined by ANOVA with Tukey post hoc test with JMP, version 11 (SAS Institute Inc.). Fisher exact test was used to assess the frequencies of promoters, genes, and repeat families. P < 0.05 was considered significant. All data are expressed as means ± SEM.

Results

To find the optimal set of epigenetically regulated genes in lung cancer, we performed an integrative analysis of datasets from the FANTOM5 and TCGA consortia. In the first step, we aimed to (i) identify differentially expressed (DE) promoters in NSCLC (FANTOM5 CAGE data); (ii) identify differentially methylated (DM) sites in NSCLC (69 NSCLC cell lines compared with 6 normal lung epithelial cells); (iii) link the differentially methylated sites to differentially expressed genes at promoter resolution, and (iv) validate the epigenetically regulated genes (DM-DE genes) in large lung cancer datasets from TCGA (LUAD and LUSC).

Upregulated promoters in NSCLC cell lines

Using the CAGE data of the FANTOM5 project, we compared the transcriptome of 16 NSCLC cell lines and 16 normal lung epithelial cells (Supplementary Table S1). We used the tag counts within 184,827 CAGE peaks as quantitative measures of promoter-level expression. Multidimensional scaling plot demonstrated the distinctive patterns of CAGE profiles of cancer and normal cells (Fig. 1A).

Linking differential expression to the DNA methylation changes

To integrate the promoter-level differential expression with aberrant DNA methylation results, we directly linked the CAGE-defined promoters to DNA methylation probes located within −1500 to +500 bp from the promoters. In some cases, multiple probes were linked to a single promoter. In addition, we used the matched gene expression and methylation profiles in 58 NSCLC cell lines (RNA-seq, E-MTAB-2706; Infinium 450K methylation array, GSE36216) to select methylation probes whose methylation inversely correlated with gene expression (Spearman ρ < −0.3, P < 0.05).

We overlapped the upregulated promoters from FANTOM5 CAGE analysis with the hypomethylated sites and found 379 methylation probe–promoter pairs where the promoter was upregulated in FANTOM5 cancer cell lines, and the methylation site was hypomethylated in cancer versus normal cell comparison (Fig. 1C; Supplementary Table S4). Among them, we further selected 159 methylation probe–promoter pairs that showed inverse correlation (Spearman ρ < −0.3, P < 0.05) between DNA methylation of the probe and expression of the gene in 58 NSCLC cell lines.

At the gene level, this corresponded to 49 protein-coding genes and 10 noncoding RNAs that were upregulated in cancer and had at least one associated hypomethylated probe in NSCLC (Supplementary Table S4). Expression and methylation of representative genes in 58 NSCLC cell lines are shown in Fig. 1D.

Validation in TCGA dataset

To confirm that our candidates for epigenetically regulated biomarkers are relevant to clinical tumors, we performed a rigorous validation using RNA-seq and DNA methylation data of 426 LUAD, and 359 LUSC samples from TCGA consortium. As above, we calculated the correlation between DNA methylation of the probes and expression of the associated genes (Spearman ρ < −0.3, P < 0.05). We also selected genes that were upregulated or turned ON in tumors compared with normal lung tissues (Supplementary Table S4). We found a robust signature of 22 epigenetically regulated biomarkers (epi-markers; 15 coding genes and 7 noncoding RNAs) that met the criteria in LUAD, LUSC, or both (Supplementary Table S5; Table 2). Expression and methylation of representative genes in TCGA tumor (LUAD and LUSC) or normal lung tissue samples are shown in Supplementary Fig. S1A and S1B.

Interestingly, 5 of the 7 lncRNA epi-markers were initiated from promoters overlapping repeat elements: 2 LTR elements and 3 REP522 elements (Table 2). Five of 15 protein-coding epi-markers were known cancer testis antigens (CTA; ref. 38). We also note that 13 of 22 epi-markers showed binary (OFF/ON) expression pattern: not expressed in normal (OFF) and expressed (ON) in cancer (Table 2), which is important in clinical applications for both biomarkers and therapeutic targets.

Frequency of epi-marker upregulation

To investigate how often the epi-markers are upregulated in clinical NSCLC tumors, we calculated the fraction of tumors that express a given epi-marker above a normal (baseline) threshold, which we defined as the maximum expression value observed in normal lung control samples from TCGA (alternative thresholds are presented in Supplementary Table S6). The frequencies of upregulated epi-markers were 44% for lncRNAs and 40% for protein-coding genes. The frequencies were on average higher in LUSC than in LUAD (47% vs. 37%). In addition, lncRNA epi-markers showed minimal or no expression in normal cells, while protein-coding genes demonstrated low to intermediate expression in normal samples (Table 2; Supplementary Figs. S2 and S3). A few epi-markers, like CTAs PRAME, CEP55, and lncRNA PCAT7, were upregulated in remarkably high percentage (> 85%) of LUSC tumors (Table 2).

Epi-marker survival analysis

We then performed survival analysis to test whether the expression of epi-markers is associated with prognosis. Expression levels of 6 of 22 epi-markers (MYEOV, LINC00857, ARL14, CT83, IGF2BP3, and CEP55) were associated with poor prognosis in LUAD by multivariate Cox regression analysis. In LUSC, only one gene, MYEOV, was associated with poor prognosis (Table 2; Supplementary Table S7). Correlation analysis of 22 epi-markers showed that four epi-markers with prognostic impact were grouped into the same cluster of correlated genes in LUAD (MYEOV, LINC00857, ARL14, and CT83; Supplementary Fig. S4A and S4B).

The overlapping genes included TOP2A, which is upregulated across multiple cancer types (15, 40), and six kinases including AURKA and AURKB that are implicated in various cancer types (41). Among 22 epi-markers, IGF2BP3 and CEP55 (associated with poor survival in LUAD as described before) showed highest correlations with the cell-cycle genes (Supplementary Fig. S4C). We note however that those correlations at transcript levels may not directly reflect causal relationships. No clear enrichments were observed in LUSC (Supplementary Fig. S4D).

DNA demethylation treatment

In search of genes whose expression is epigenetically silenced in normal lung epithelial cells and can be upregulated by DNA demethylation, we treated SAECs with a demethylating agent (5-aza-dC) and an HDAC inhibitor (TSA) and performed gene expression profiling (Supplementary Table S8). We observed 51 genes (45 coding genes and 6 noncoding RNAs) that were both upregulated by DNA demethylation and upregulated in NSCLC cell lines compared with normal epithelial cells (Supplementary Table S9). We found that nine of them overlapped our candidate epi-markers (seven coding genes and two noncoding RNAs; Table 2).

To further illuminate the epigenetic regulation, we assessed histone3 modifications (data from ref. 30) within REP522 elements with or without the CAGE-defined promoters. In PC-14 LUAD cells, enrichment of H3K4me3, H3K27ac, and RNA polymerase II was noted in REP522 elements with promoters, but not in those without promoters, confirming transcriptional activation of promoter regions. No enrichments were observed in REP522 elements of normal cells (SAECs), indicating inactivity of REP522 elements in normal lung epithelial cells (Fig. 2B).

Upregulation of lncRNAs from REP522 elements in NSCLC

The upregulated promoters that overlap REP522 elements were associated with lncRNAs, such as AL022344.4, PCAT7, and RP1-90G24.10. Interestingly, their expression levels were turned ON in NSCLC and highly cancer specific as confirmed in different RNA-seq datasets of NSCLC tissues (TCGA LUAD, TCGA LUSC, and GSE81089; Fig. 2C; Supplementary Fig. S1B and S1C).

We utilized BS-seq and RNA-seq data in 26 LUAD cell lines (30) and confirmed that DNA methylation and gene expression were negatively correlated for several REP522-associated lncRNAs, such as AL022344.4 (Fig. 2D). The expression of these lncRNAs was also associated with epigenetic signature typical of active promoters (enrichment of H3K4me3 and H3K27ac), as well as RNA polymerase II binding (Fig. 2E; Supplementary Fig. S5). Interestingly, several REP522-associated lncRNAs were upregulated in the same cell lines, such as PC-14 and VMRC-LCD cells, suggesting a possible genome-wide mechanism involved in transcriptional activation of REP522 repeat elements.

Epigenetic regulation of MYEOV in NSCLC

Among the upregulated/hypomethylated promoters in NSCLC cell lines, three promoters for MYEOV gene exhibited highly cancer-specific expression (Supplementary Table S5; Supplementary Fig. S6). In addition, MYEOV was the only gene that was prognostically relevant in both histologic subsets of lung cancer. As the function of MYEOV in NSCLC is largely unknown, we selected MYEOV for further studies.

First, we validated MYEOV expression in cultured NSCLC cell lines and lung epithelial cells by quantitative RT-PCR, which confirmed the cancer-specific expression of MYEOV (Fig. 3A). To investigate the DNA methylation status in NSCLC cell lines in more detail and resolution, we utilized BS-seq and RNA-seq data in 26 LUAD cell lines (30). In all 9 NSCLC cell lines with high MYEOV expression, we observed DNA hypomethylation in the 5′UTR region (Fig. 3B). To consolidate the epigenetic regulation of MYEOV, we compared histone3 modifications in SAECs and A549 LUAD cells (30). In A549 cells, the MYEOV gene locus was enriched with active marks (H3K4me3, H3K9/14ac, and H3K27ac) in contrast with SAECs (Fig. 3C). Next, we tested whether demethylation could restore MYEOV transcription in SAECs. Quantitative RT-PCR revealed that 5-aza-dC and TSA treatment led to MYEOV mRNA expression (Fig. 3D), which validated the result of microarray analysis (Supplementary Table S8). These overall findings were consistent with the notion that MYEOV transcription is epigenetically regulated through DNA methylation and parallel chromatin modifications.

MYEOV regulates proliferation, survival, and invasion of LUAD cells

To explore the functional role of MYEOV, we generated MYEOV loss-of-function knockdown models by transiently transfecting A549 and H441 LUAD cell lines with negative control or two different MYEOV siRNAs. In these cell lines, MYEOV knockdown inhibited cell proliferation (Fig. 4A) and increased the cell fraction of early apoptosis following serum deprivation (Fig. 4B). MYEOV silencing also showed attenuated invasion of A549 cells (Fig. 4C).

MYEOV knockdown in A549 and H441 LUAD cells. A, Cell numbers of siRNA-transfected A549 and H441 cells were counted on days 2, 4, and 6 after seeding. *, P < 0.05. B, Top, A549 and H441 cells transfected with siNC or siMYEOV were serum deprived for 48 hours to induce apoptosis. FITC-conjugated anti-Annexin V antibody and propidium iodide (PI) were detected by flow cytometry; bottom, the fraction of early apoptosis cells (Annexin V positive, PI negative). *, P < 0.05. C, Matrigel invasion assay in A549 cells. Data are representative of three independent experiments. *, P < 0.05. D, Heatmap showing the expression of significantly downregulated genes after knockdown of MYEOV with two different siRNAs (siMYEOV-1 and siMYEOV-2) in A549 cells. Genes with the cMYC-binding site within 1 kb from the promoter are marked with red. E, Inferred activity profile of MXI1_MYC_MYCN motif calculated by ISMARA (Integrated System for Motif Activity Response Analysis) tool (35). MXI1/MYC/MYCN was selected as the top motif by activity significance. The motif is shown in the top right corner of the figure.

Furthermore, we performed microarray analyses in A549 cells following MYEOV knockdown (Supplementary Table S12) and identified 38 commonly downregulated genes by two different siRNAs, which included tumorigenic genes such as TERT (Fig. 4D). To identify key transcription factors responsible for observed expression changes, we used ISMARA online computational tool (35). ISMARA models gene expression by integrating predicted transcription factor–binding sites, and it identifies transcription factors most likely to drive the expression changes observed in given experiment. ISMARA predicted MYC as the transcription factor that was most affected by MYEOV knockdown, and MYC activity decreased consistently in knockdown samples with both siRNAs (Fig. 4E). The biological interpretation is that genes with predicted MYC-binding sites in promoter region were downregulated after MYEOV knockdown.

To confirm the prediction, we used experimental binding sites of MYC in A549 cells (ChIP-seq data from ENCODE; ref. 36) and found that 17 of 38 downregulated genes (44.7%) have an MYC-binding site within 1 kb from the promoter and thus are potentially regulated by MYC. We marked those genes on Fig. 4D.

MYEOV is associated with poor prognosis in NSCLC

The univariate and multivariate Cox regression models in TCGA LUAD and LUSC datasets revealed that MYEOV is associated with poor prognosis as described above (Supplementary Table S7). We performed an extensive meta-analysis using six microarray datasets of LUAD that further substantiated the prognostic impact of MYEOV [Supplementary Fig. S7A, HR = 1.20; confidence interval (CI), 1.08–1.33; P = 0.0007]. The meta-analysis in five datasets of LUSC also confirmed MYEOV as an indicator of poor prognosis (Supplementary Fig. S7B, HR = 1.18; CI, 1.03–1.35; P = 0.0017).

We investigated the association between expression of epi-markers and mutation status of EGFR and KRAS in TCGA LUAD dataset (Supplementary Table S13). We found that the KRAS mutation–positive subgroup had higher MYEOV expression levels than wild-type group (P = 0.03), which indicates a possible link between MYEOV overexpression and KRAS mutation–associated oncogenic mechanisms.

Discussion

Our analyses identified 22 robust epigenetically regulated markers (epi-markers) in lung cancer. By directly integrating the CAGE and DNA methylation data, we were able to determine precisely the methylation sites and promoters of the candidate epi-markers, whose promoter hypomethylation leads to upregulation of the associated transcript in cancer. Our unbiased, genome-wide analyses also allowed for the discovery of epigenetic activation of specific promoters located within repeat elements and identification of seven lncRNA epi-markers. Six of 15 protein-coding epi-markers (MYEOV, AIM2, ARL14, KAZALD1, MKRN3, and PRAME) are among 578 genes recently reported to show a negative correlation between expression and promoter methylation levels among NSCLC cell lines (42). Only one of our candidates (AIM2) was among 57 upregulated and hypomethylated genes in LUADs reported by Selamat and colleagues (43). In addition, we observed no overlap between our candidates and genes reported by Mullapudi and colleagues, where 24 NSCLC tumors were compared with nontumor tissues, using a methylation-sensitive restriction enzyme–based HELP-microarray assay (44). The relatively small overlap indicates that despite previous reports, there is no consensus on epigenetically regulated genes in NSCLC. We propose that our integrative analyses based on both NSCLC cell line and large-scale tumor datasets provide a shorter and more robust list of candidates for further studies.

The frequencies of upregulation of candidate epi-markers are quite high in lung cancer tissues, and they average around 40% and are significantly higher that mutation frequencies of most common mutational drivers (KRAS 32%, EGFR 11.3%, and BRAF 7% in the TCGA LUAD cohort; ref. 2). The ontological link of our epi-markers with cell-cycle regulation and the significant associations with survival indicate a role in the tumorigenesis of lung cancer. However, more research is needed to elucidate the mechanisms of activation and functions of proposed epi-markers.

Five of 15 protein-coding epi-marker candidates are known CTAs, and three of them (MAGEA1, MAGEC2, and PRAME) were upregulated by demethylating treatment. This finding is consistent with previous reports showing that CTAs are epigenetically activated in multiple cancers (45) and that the expression of CTAs can be activated by a demethylating agent (46, 47). Upregulation of CTAs as immunogenic targets by a demethylating agent could be therapeutically attractive, in particular in combination with immunotherapy in NSCLC (48, 49). However, we note that three epi-markers activated by a demethylating agent (MYEOV, ARL14, and LINC00857) were also associated with poor prognosis, which can have implications when considering demethylating agents for therapeutic use in NSCLC, as oncogenic/carcinogenic genes can be activated by such treatment.

In our previous pan-cancer study, we showed for the first time the transcriptional activation of multiple copies of the REP522 interspersed repeat in cancer (15). In this study, we observed that the transcriptional activation is accompanied by parallel epigenetic changes: DNA hypomethylation and histone modifications characteristic to promoter activation. Interestingly, the activation of REP522 bidirectional promoters appears to be coordinated and occurs at multiple REP522 elements in the genome in the same cell line. With TCGA data, we were able to confirm the DNA hypomethylation and transcriptional activation (upregulation of associated transcript) in clinical tumors (both LUSC and LUAD datasets). Our previous research indicates that REP522 activation is a pan-cancer phenomenon; thus, the epigenetic activation is likely to occur in other cancers. However, further research is needed to confirm this assumption.

We hypothesized that some of our candidate epi-markers could confer a selective growth advantage to cancer and thus act as epigenetic drivers (epi-drivers), and we selected one of our epi-markers, MYEOV, for further studies. MYEOV was initially reported as an overexpressed gene in multiple myeloma (50) and upregulated across multiple cancer types (15). In this study, we demonstrated for the first time that MYEOV is highly expressed in NSCLC cell lines and is induced by DNA demethylation in normal lung epithelial cells. Functional studies showed tumorigenic roles of MYEOV, and survival analyses revealed that MYEOV is an indicator of poor prognosis. Furthermore, MYEOV knockdown experiment followed by gene expression profiling suggested that MYEOV effects are linked to the MYC-regulatory pathway. However, further studies are needed to validate this finding and to elucidate the mechanism.

Although it is known that cancer cells take advantage of epigenetic changes by silencing tumor suppressor genes via DNA hypermethylation, our results indicate that specific (as opposed to global) hypomethylation can play a role in cancer by allowing for activation of cancer-associated genes. We believe that our selected epi-markers have a great theranostic potential in NSCLC and are a good starting point for further studies.

Disclosure of Potential Conflicts of Interest

P. Carninci has ownership interest (including patents) in RIKEN on transcriptome technologies and is a consultant/advisory board member for Dnaform, Inc. No potential conflicts of interest were disclosedby the other authors.

Grant Support

This work was supported by JSPS KAKENHI grant #16K18437 (to M. Horie), #26461185 (to A. Saito), grant #16K11843 (to Y. Yamaguchi), grant #15K15768 (to M. Ohshima), grant #16H02653 (to T. Nagase), a grants-in-aid of the Public Trust Fund for Clinical Cancer Research (to M. Horie), and the Health and Labour Sciences Research Grant for Research on Allergic Disease and Immunology (to T. Nagase) from the Ministry of Health, Labour and Welfare, Japan. B. Kaczkowski was supported by Foreign Postdoctoral Researcher (FPR) program from RIKEN, Japan. A.R.R. Forrest was supported by a Senior Cancer Research Fellowship from the Cancer Research Trust and funds raised by the MACA Ride to Conquer Cancer. FANTOM5 was made possible by a Research Grant for RIKEN Omics Science Center and a grant of the Innovative Cell Biology by Innovative Technology (Cell Innovation Program; to Y. Hayashizaki) from the MEXT, Japan. This study was also supported by research grants for RIKEN Preventive Medicine and Diagnosis Innovation Program and RIKEN Centre for Life Science Technologies, Division of Genomic Technologies from the MEXT, Japan.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Acknowledgments

The authors are grateful to Kousuke Watanabe, Takahiro Kishikawa, Hirokazu Urushiyama, Masahito Yoshihara, and Dijana Djureinovic for technical support and useful discussions. We would like to thank all members of the FANTOM5 consortium for contributing to generation of samples and analysis of the data set and also thank GeNAS for data production.

Footnotes

Note: Supplementary data for this article are available at Molecular Cancer Research Online (http://mcr.aacrjournals.org/).