Abstract

Background

Long intergenic non-coding RNAs (lncRNAs) represent an emerging and under-studied
class of transcripts that play a significant role in human cancers. Due to the tissue-
and cancer-specific expression patterns observed for many lncRNAs, it is believed that
they could serve as ideal diagnostic biomarkers. However, until each tumor type is
examined more closely, many of these lncRNAs will remain elusive.

Results

Here we characterize the lncRNA landscape in lung cancer using publicly available
transcriptome sequencing data from a cohort of 567 adenocarcinoma and squamous cell
carcinoma tumors. Through this compendium we identify over 3,000 unannotated intergenic
transcripts representing novel lncRNAs. Through comparison of both adenocarcinoma
and squamous cell carcinomas with matched controls we discover 111 differentially
expressed lncRNAs, which we term lung cancer-associated lncRNAs (LCALs). A pan-cancer
analysis of 324 additional tumor and adjacent normal pairs enables us to identify a
subset of lncRNAs that display enriched expression specific to lung cancer as well
as a subset that appear to be broadly deregulated across human cancers. Integration
of exome sequencing data reveals that expression levels of many LCALs have significant
associations with the mutational status of key oncogenes in lung cancer. Functional
validation, using both knockdown and overexpression, shows that the most differentially
expressed lncRNA, LCAL1, plays a role in cellular proliferation.

Conclusions

Our systematic characterization of publicly available transcriptome data provides
the foundation for future efforts to understand the role of LCALs, develop novel biomarkers,
and improve knowledge of lung tumor biology.