This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Protein-coding genes, guiding differentiation of ES cells into neural cells, have extensively been studied in the past. However, for the class of ncRNAs only the involvement of some specific microRNAs (miRNAs) has been described. Thus, to characterize the entire small non-coding RNA (ncRNA) transcriptome, involved in the differentiation of mouse ES cells into neural cells, we have generated three specialized ribonucleo-protein particle (RNP)-derived cDNA libraries, i.e. from pluripotent ES cells, neural progenitors and differentiated neural cells, respectively. By high-throughput sequencing and transcriptional profiling we identified several novel miRNAs to be involved in ES cell differentiation, as well as seven small nucleolar RNAs. In addition, expression of 7SL, 7SK and vault-2 RNAs was significantly up-regulated during ES cell differentiation. About half of ncRNA sequences from the three cDNA libraries mapped to intergenic or intragenic regions, designated as interRNAs and intraRNAs, respectively. Thereby, novel ncRNA candidates exhibited a predominant size of 18–30 nt, thus resembling miRNA species, but, with few exceptions, lacking canonical miRNA features. Additionally, these novel intraRNAs and interRNAs were not only found to be differentially expressed in stem-cell derivatives, but also in primary cultures of hippocampal neurons and astrocytes, strengthening their potential function in neural ES cell differentiation.

INTRODUCTION

In recent years, the number of proposed non-coding RNA (ncRNA) transcripts has dramatically been rising. For example, in the human genome there is an estimated number of up to 450 000 ncRNA transcripts predicted (1). In agreement, a recent study designated as ENCODE project (2), which focused on 1% of the human genome in high resolution, revealed that up to 90% of the human genome might be transcribed, with only 1.5% of RNA transcripts encoding for proteins. Thus, it has been proposed that the remaining 88.5% of RNA transcripts might serve as a source for regulatory ncRNAs (3). However, it is currently still unclear which of the 450 000 predicted ncRNA candidates, encoded on the human genome, are functional and which ones represent spurious transcription products or degradation intermediates (4). Therefore, it is important to clearly identify the functional portion of the ncRNA transcriptome in model organisms. Several features might be employed to filter out and preselect functional, regulatory ncRNAs from a background of spurious transcription/degradation intermediates such as (i) analysis of differential expression of ncRNAs during cell differentiation and development, (ii) ncRNA expression in disease or (iii) ncRNA expression during development. In addition, since most functional ncRNAs are known to bind to proteins forming ribonucleo-protein particles (RNPs), isolation by RNPs might increase the likelihood for identifying functional ncRNAs (5).

For embryonic stem (ES) cell maintenance and pluripotency, non-coding RNAs have recently emerged as important regulators of gene expression (6–8). Until now, specific microRNAs, a class of small regulatory ncRNAs, sized 21–24 nt (9,10), have been investigated in neural development during ES cell differentiation. In particular, expression of the ES cell specific miR-290 cluster, harboring miRNAs-290, −291, −293, −294 and −295, respectively, has been shown to be significantly down-regulated upon differentiation (11), while regulating de novo methylation in ES cells by repressing the transcriptional repressor Rbl2 (12,13). Inhibition of miR-145 in human ES cells has been shown to reduce their capacity for differentiation (14), while repression of the maturation of the pre-let7 miRNA precursor by the Lin28 protein is a mechanism blocking commitment to neural fate (15). In addition, microRNA-array analysis in ES cells versus differentiated cells reveals specific microRNA expression signatures (16,17), thus implying transcriptome changes during differentiation related to microRNA function. Notably, lack of expression of pre-microRNA-processing proteins Dicer (18,19) and DGCR8 (20) was shown to result in severe differentiation defects.

In order to identify the complete set of small ncRNAs involved in neural differentiation of mouse ES cells in vitro, we have generated specialized, RNP-derived cDNA libraries, as previously described (21,22), for three differentiation stages. Transcriptional profiling by high-throughput sequencing revealed the presence of numerous differentially expressed known and novel ncRNAs, which predominantly locate to intergenic and intronic regions of the mouse genome. The majority of the novel ncRNAs exhibited a sequence length bias of 18–30 nt. For selected, newly identified ncRNA transcripts differential expression in primary hippocampal neurons and astrocytes was confirmed by real-time PCR.

MATERIALS AND METHODS

RNP library generation

RNP libraries were generated as previously described (21,22). Briefly, cells were lysed and cell extracts were size-fractionated on 10–30% glycerol gradients. Subsequently, glycerol gradients were fractionated and RNA was phenol–chloroform extracted. The extracted RNA was 3′-C-tailed using poly(A) polymerase from yeast (Epicenter, Madison, USA). A 19-mer 5′-adaptor (–GTCAGCAATCCCTAACCAG, bold and underlined are ribonucleotides) was ligated by T4 RNA ligase (Fermentas, St. Leon-Rot, Germany) to the C-tailed RNA. The RNAs were reverse transcribed by using an anchor primer (5′-AGGAGCCATCGTATGTCGGGGGGGGH) and amplified by PCR at 53°C annealing, for 25 cycles using the following primers: 5′-libPCR GTCAGCAATCCCTAACGAG, 3′-libPCR AGGAGCCATCGTATGTCG. The cDNA was PAGE purified and size selected from 20 to 400 bp. cDNA was cloned into the pGEM-T vector (Promega, Mannheim, Germany) for diagnostic Sanger sequencing before high-throughput sequencing. For Solexa sequencing, additional bar-coded forward primers were added to cDNAs by PCR:

5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTATACGTCAGCAATCCCTAACGAG-3′ for the ES library,

5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCATCGTCAGCAATCCCTAACGAG-3′ for the NP library,

5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTGACAGTCAGCAATCCCTAACGAG-3′ for the N/G library.

The following reverse primer was employed for all libraries: 5′-CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTAGGAGCCATCGTATGTCG-3′. The eluted cDNAs were analyzed by high-throughput sequencing employing the Solexa (Illumina) platform. Sequencing was performed with the Genome Analyzer GAII at FASTERIS SA, Plan-les-Ouates (Switzerland). Reads were generated as single reads with a maximum length of 76 bp.

ES cells, at Day 0 of neural differentiation (D0), were separated from feeder cells, dissociated with Accutase (Sigma), and cultured as cell suspension (2 × 10⁵ cells/ml) in ESCM without LIF in non-adhesive bacterial-grade dishes. Cells that became floating cellular aggregates were cultured in the same medium for 2 days. The adherent colony culture was initiated by plating cellular aggregates on plastic or glass surfaces coated with polyornithine (Sigma, 15 μg/ml for plastic and 50 μg/ml for glass, 1 h RT) and laminin (20 μg/ml ON 4°C), and further cultured in NIM. Attached aggregates flattened over 1–2 days and columnar primitive neuroepithelial (NE) cells developed and formed neural rosettes at approximately D6. The neural tube-like rosettes formed during neural induction were used as a selection criterion; neural colonies were mechanically separated, collected and dissociated with Accutase, and cultured as cell suspension or in adherent culture on polyornithine/laminin-coated plates in NPM (2 × 10⁵ cells/ml) for 4 days, for the proliferation of the neural progenitor cells. For neural differentiation/specification, neural progenitors were dissociated with Accutase and plated onto polyornithine/laminin-coated plates or coverslips, followed by culturing in NDM.

Colonies and cellular morphologies were monitored by phase contrast microscopy. ES or neural colonies were mechanically detached or split under a stereomicroscope. Three independent experiments were performed. Samples for RNP library generation, RNA extraction or immunocytochemistry were taken at the following time-points: D0 (stage designated as ES), D10 (stage designated as NP) and D20 (stage designated as N/G).

Initial mapping and analysis of the cDNA libraries

Prior to analysis, the pool of sequencing reads was divided into three libraries according to sequence barcodes: ATAC for the ES library, CATC for the NP library and GACA for the N/G library. The individual libraries were analyzed with the APART pipeline (automated pipeline for annotation of RNA transcripts) (28). All APART modules were used. For removal of adaptor sequences, we employed a 20-nt window of the adaptor sequences with three allowed mismatches and included removal of the C-tail, located between the insert and the 3′-adaptor. For downstream analysis, only reads with a minimal length of 18 nt in which both adaptors had been detected were used. For sequence reads containing only a partial 3′-adaptor sequence, at least six terminal bases were required to match the C-tail or the beginning of the 3′-adaptor sequence. Next, sequences were aligned to the mouse genome (mm9), allowing for a single mismatch. Assembly and annotation were performed with the APART pipeline using default settings.
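The barcode assignment and adaptor removal described above can be illustrated with a minimal Python sketch. It is not the APART implementation: it uses exact matching instead of the three-mismatch window, it does not handle partial 3′-adaptors, and it assumes that each read starts with the 4-nt barcode followed by the 5′-libPCR sequence and that the 3′-adaptor appears in the read as the reverse complement of the 3′-libPCR primer. The function name `assign_and_trim` is hypothetical.

```python
# Barcodes and primer sequences are taken from the Methods text.
BARCODES = {"ATAC": "ES", "CATC": "NP", "GACA": "N/G"}
FIVE_PRIME = "GTCAGCAATCCCTAACGAG"
# Assumption: the 3'-adaptor is read as the reverse complement of the
# 3'-libPCR primer (AGGAGCCATCGTATGTCG).
THREE_PRIME = "CGACATACGATGGCTCCT"
MIN_LEN = 18  # minimal insert length used for downstream analysis


def assign_and_trim(read):
    """Return (library, insert) for a read, or None if the read is rejected.

    Simplifications (assumptions): exact adaptor matches only, and the whole
    trailing run of C is removed as the C-tail, which would also remove
    genuine 3'-terminal C residues of the insert.
    """
    library = BARCODES.get(read[:4])
    if library is None:
        return None
    rest = read[4:]
    if not rest.startswith(FIVE_PRIME):
        return None
    rest = rest[len(FIVE_PRIME):]
    pos = rest.find(THREE_PRIME)
    if pos == -1:
        return None  # both adaptors must be detected
    insert = rest[:pos].rstrip("C")
    if len(insert) < MIN_LEN:
        return None
    return library, insert


example = "ATAC" + FIVE_PRIME + "TGAGGTAGTAGGTTGTATGGTT" + "CCCCCCCC" + THREE_PRIME
print(assign_and_trim(example))
```

A production pipeline would additionally tolerate mismatches and partial 3′-adaptors, as stated in the text.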

Transcriptional profiling of three cDNA libraries from ES cells

For differential expression analysis, the non-clustered list of contigs generated by APART was employed. Overlapping contigs between libraries were identified with the BEDTools package (29) using a 50% overlap threshold. Contigs appearing in at least two libraries were selected for subsequent differential expression analysis. Next, the contig redundancy caused by allowing for multiple read matches was removed by using in-house Perl scripts employing APART clustering data. Read numbers were normalized with the package edgeR (30) from Bioconductor (31). To test for differences in expression, edgeR and several in-house scripts were used.
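To make the overlap criterion concrete, the following Python sketch shows one way a 50% overlap threshold and the "at least two libraries" rule could be applied to contig coordinates. It does not reproduce BEDTools or the in-house scripts: the overlap is computed as the fraction of the query contig that is covered, strand is ignored, and the function names are hypothetical.

```python
def overlap_fraction(a, b):
    """Fraction of half-open interval a = (start, end) covered by interval b."""
    a_start, a_end = a
    b_start, b_end = b
    shared = min(a_end, b_end) - max(a_start, b_start)
    return max(shared, 0) / (a_end - a_start)


def libraries_sharing(contig, library_contigs, min_fraction=0.5):
    """Names of libraries holding a contig that overlaps `contig` sufficiently.

    contig: (chrom, start, end); library_contigs: {name: [(chrom, start, end), ...]}
    """
    chrom, start, end = contig
    hits = set()
    for name, contigs in library_contigs.items():
        for c_chrom, c_start, c_end in contigs:
            if c_chrom != chrom:
                continue
            if overlap_fraction((start, end), (c_start, c_end)) >= min_fraction:
                hits.add(name)
                break
    return hits


# Toy coordinates (made up for illustration only).
libraries = {
    "ES": [("chr1", 100, 122)],
    "NP": [("chr1", 110, 130)],
    "N/G": [("chr1", 120, 140)],
}
shared = libraries_sharing(("chr1", 100, 122), libraries)
print(shared, len(shared) >= 2)  # keep the contig if found in at least two libraries
```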

miRNA prediction

For identification of novel miRNAs, present in our dataset, the software miRDeep2 (32) was employed, in particular scripts mapper.pl and miRDeep2.pl (33). Mapping was performed by using the default parameter set and the FASTQ files from each library, except that the adaptor sequences were already removed prior to analysis. The miRDeep2 script was then applied to find potential miRNAs in the data set based on the suggested parameters. Results were compared to data assembled in the differential expression analysis by using the package rtracklayer (34) of the Bioconductor project.

Real-time PCR

Total RNA was isolated from mouse ES cells, primary hippocampal neurons and primary astrocytes with TRI Reagent (Sigma-Aldrich, Vienna, Austria) according to the manufacturer’s protocol. A total of 500 ng of total RNA was poly(A)-tailed and reverse transcribed to cDNA using the microRNA first strand synthesis kit (Agilent Technologies, Böblingen, Germany), following the manufacturer’s protocol.

The cDNA was used as template for the real-time PCR on ES cells, primary hippocampal neurons and astrocytes. The universal reverse primer provided with the kit was used together with the following forward primers in 5′–3′ orientation:

Primers were ordered from Sigma-Aldrich. Real-time PCR was performed using Power SYBR® Green PCR Master Mix (Applied Biosystems, Darmstadt, Germany). Reactions were performed at 60°C annealing for 1 min and for 40 cycles. Normalization was performed with U6.
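As an illustration of normalization against U6, the sketch below computes a relative expression value from threshold cycle (Ct) values. The text states only that U6 was used for normalization; the 2^−ΔΔCt formula, the assumption of 100% amplification efficiency and the function name `relative_expression` are assumptions made here for illustration.

```python
def relative_expression(ct_target, ct_u6, ct_target_ref, ct_u6_ref):
    """Fold change of a target RNA in a sample relative to a reference sample.

    Each Ct is first normalized to U6 measured in the same sample (delta Ct);
    the difference between sample and reference (delta delta Ct) is then
    converted to a fold change, assuming a doubling of product per cycle.
    """
    delta_sample = ct_target - ct_u6
    delta_reference = ct_target_ref - ct_u6_ref
    return 2.0 ** -(delta_sample - delta_reference)


# Made-up Ct values: the target crosses the threshold 3 cycles earlier in the
# sample than in the reference, with identical U6 values.
print(relative_expression(22.0, 18.0, 25.0, 18.0))  # 8.0
```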

Real-time PCR analysis of marker genes revealed high levels of Oct4 and Nanog in ES cells, while their levels were significantly down-regulated at the NP stage and not detected at the N/G stage. The neural marker genes Nestin and Sox1 were significantly up-regulated at the NP and N/G stages, while neuronal (Tau), astrocytic (Gfap) and oligodendrocytic (Osp) gene expression was highly up-regulated in the last stage (as compared to the ES stage; see Supplementary Figure S1B).

From each of the three stages, specialized RNP libraries encoding small ncRNAs were generated (see ‘Materials and Methods’ section). Subsequently, cDNA libraries were analyzed by high-throughput sequencing employing the Solexa platform and ∼26 million sequence reads for the three libraries were obtained [sequences have been deposited in the Sequence Read Archive (NCBI) with the accession number: SRP008250].

In order to determine differentially expressed RNA transcripts within the ES, NP and N/G stages, we bioinformatically analyzed sequence reads from the respective cDNA libraries by APART (automated pipeline for annotation of RNA transcripts), a bioinformatical algorithm recently developed in our lab (28). Thereby, differential expression analysis was based on barcoding of cDNAs from the three libraries prior to cloning and sequencing. Subsequently, two bioinformatical analyses were performed by employing APART: (i) we first annotated all sequence reads to known or unknown ncRNA species, e.g. miRNAs, snoRNAs, tRNAs, intraRNAs (i.e. intragenic RNAs) or interRNAs (intergenic RNAs; Figure 1A); (ii) we next grouped all identical sequences into ‘contigs’ and determined the distribution of these unique RNA transcripts in our libraries (Figure 1B). By this approach, 706 contigs with expression changes >2-fold between any of the three cDNA libraries and a mean expression value of at least 28 were identified (Supplementary File S1). Only those sequences were included which appeared at least in two of the three cDNA libraries in order to assess expression differences.
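The selection criteria named above (expression change >2-fold between any two libraries, mean expression of at least 28, presence in at least two libraries) can be written as a small filter. This Python sketch is not the edgeR-based analysis; it operates on already normalized counts, and the function names are hypothetical. A fold change is reported as infinite when a contig is undetectable in one of the two libraries being compared.

```python
import math
from itertools import combinations


def max_fold_change(counts):
    """Largest pairwise ratio among normalized counts of one contig.

    Returns math.inf when one library has zero counts and another does not.
    """
    best = 1.0
    for a, b in combinations(counts, 2):
        high, low = max(a, b), min(a, b)
        if high == 0:
            continue  # undetectable in both libraries: no information
        best = max(best, math.inf if low == 0 else high / low)
    return best


def passes_filter(counts, min_fold=2.0, min_mean=28.0):
    """Apply the three criteria to counts ordered as (ES, NP, N/G)."""
    detected = sum(1 for c in counts if c > 0)
    mean = sum(counts) / len(counts)
    return detected >= 2 and mean >= min_mean and max_fold_change(counts) > min_fold


# Made-up normalized counts for illustration.
print(passes_filter([100, 30, 10]))   # True
print(passes_filter([30, 30, 30]))    # False: no expression change
print(passes_filter([200, 0, 0]))     # False: present in one library only
```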

Distribution of ncRNA sequences between the three cDNA libraries. (A) Distribution based on sequence reads. (B) Distribution of contigs. The most abundant known ncRNAs are shown in addition to reads mapping in intergenic and intronic regions (i.e. intergenic,...

In this analysis, miRNAs represented the majority of differentially expressed known ncRNA sequences: while in the N/G library, miRNAs accounted for only 4% of all sequences, in ES or NP libraries 25–29% of sequences corresponded to miRNAs (Figure 1A). This is due to significantly fewer miRNA species being present at the N/G stage, as demonstrated by the number of unique contigs (1% versus 8–13%; Figure 1B), which hints at a more prominent participation of miRNAs in regulation of gene expression during the early stages of ES cell differentiation. Interestingly, a 9-fold increase in the abundance of sequences from the class of miscellaneous ncRNAs is observed in total sequence reads at the N/G stage, mainly due to an increase of 7SK and 7SL ncRNA expression (Figure 1A; see below).

The majority of novel ncRNA transcripts in all three cDNA libraries mapped to either intergenic or intronic regions of the mouse genome, which were designated as interRNAs (i.e. intergenic RNAs) or intraRNAs (i.e. intragenic RNAs such as intronically encoded RNAs), respectively (Figure 1B). Sequence reads representing intraRNAs or interRNAs showed a rather similar distribution in all cell stages (60% in ES and NP stages or 75% in the N/G stage, respectively). We have previously shown that intergenic regions between and intragenic regions within protein-coding genes harbour the majority of functional known ncRNAs (21), in particular snoRNAs and miRNAs, suggesting the presence of novel representatives of these ncRNA classes also in our cDNA libraries (see below).

From intraRNA species, about one-third of sequences mapped to introns in antisense orientation while the remaining sequences derived from the sense orientation of introns (Figure 1B). In addition, a small number of sequences were annotated to exons of mRNAs, in antisense orientation (Figure 1B). Previously, by analysis of a mouse brain cDNA library encoding small ncRNAs we have presented evidence that ncRNAs mapping in antisense orientation to introns or exon/intron boundaries might be involved in regulation of alternative splicing of respective genes (21).

To validate the transcriptional profiling analysis, we investigated expression of known miRNA species, previously described to be involved in neural differentiation. Indeed, we verified expression of stem cell specific microRNAs in the ES library, in particular abundant expression of the miR-290 (11) and the miR-17-92 clusters (35), as well as miRNAs miR-199, miR-106, miR-299 and miR-214 (Figure 2 and Supplementary File S2), as previously reported (11,36). In addition, we observed that expression of these miRNA species was significantly down-regulated upon ES cell differentiation in the NP and N/G populations of cells (see above and Figure 2), in agreement with earlier studies (35,36).

The 100 most differentially expressed miRNAs. Expression values were normalized against the mean expression value (257.6). Fold changes are depicted in a log2 scale. Inf: infinite fold-change because expression was not detectable in the respective library....

We were also able to verify expression of miRNAs miR-21, miR-15b, miR-669, miR-329, miR-335, miR-16, miR-411 and miR-541 in mouse ES cells (11,35) and, in addition, we observed that their expression was significantly down-regulated in the later stages of ES cell differentiation (i.e. NP, N/G stages; Figure 2). Notably, we find, as reported, the stem-cell-specific miR-302 (16,37–41) exclusively expressed in ES cells while completely absent at the NP and N/G stage (Supplementary File S2).

Conversely, expression of neural-specific microRNAs miR-9 and miR-124, as well as the miRNA let-7 family, which have been reported to modulate stem cell derived neurogenesis (42), was found to be significantly up-regulated at the NP stage in our analysis (Figure 2). For human ES cells, several microRNAs have been reported to be up-regulated upon spontaneous differentiation, such as mir-181a, mir-181b-2, mir-26a, mir-23b, mir-137 (39), consistent with their mouse homologues from this study.

Interestingly, we also identified microRNAs exhibiting expression patterns differing from previous studies. Thereby, expression of microRNAs miR-29a (16,36), miR-135a (39), miR-141 (40,43), miR-340 (41), miR-200c (16,37,43), miR-328 (36) and miR-30e (40) has been reported to be up-regulated in ES cells. In contrast, we find an increase in their expression at the NP stage (Figure 2). As for miR-135a, miR-141, miR-340, and miR-200c this is probably due to the fact that previous studies were carried out in human ES cells (16,39–41), which derive from a different developmental stage than mouse ES cells. In addition, expression analyses carried out in mouse ES cells (36) were only extended to embryoid bodies (EBs) and not to neural progenitor cells, which might explain the observed up-regulation of expression of miR-29a and miR-328 in NPs with respect to ES cells in our study.

Finally, we have identified expression of specific miRNAs, not previously being reported in ES cells, such as miR-715. Also, we observed microRNAs whose expression has been reported in mouse ES cells (35), without further analysis on their differential expression, to be significantly up-regulated in NPs and NGs (Figure 2). From these, miR-200a, miR-200b, miR-218-2, miR-153, miR-103-1, miR-26b, miR-30d, miR-324, miR-429 and miR-382, were found to be the highest expressed at the NP stage, while miR-805 and miR-674 were the highest expressed at the N/G stage (Figure 2 and Supplementary File S3). We also observed expression of miR-598 being up-regulated at the NP stage while miR-877, miR-682, miR-678 and miR-217 are expressed at the N/G stage, only. Taken together, our data demonstrate that the regulation of gene expression during neural differentiation by a large number of miRNAs might be even more complex than previously anticipated.

Differentially expressed known ncRNAs from the three stages of ES cell neural differentiation

In addition to miRNAs, also several C/D box snoRNAs (small nucleolar RNAs) were found among the most prominent differentially expressed known ncRNA species (Figure 3), designated as SNORD12, 29, 31, 35, 74, 101, 104 and 115 (44). The majority of snoRNAs has been reported to guide covalent modifications of ribosomal RNAs (rRNAs) or small nuclear RNAs (snRNAs), respectively (45,46). Thereby, most snoRNAs, with few exceptions, are encoded within intronic sequences of protein-coding genes (45,47,48). Up till now, two classes of snoRNAs have been described, i.e. C/D box snoRNAs and H/ACA box snoRNAs. Unlike canonical snoRNAs, so-called ‘orphan’ snoRNAs lack any complementarity to rRNA or snRNA targets and might target other RNA molecules such as mRNAs (45,46,49).

We show that snoRNAs SNORD12, 29, 31, 74, 101 and 104 are abundantly expressed in ES cells, while their expression significantly decreases upon ES cell differentiation by ∼10- to 30-fold (Figure 3). In contrast, expression of SNORD35 and 115 is low in ES cells while their expression is significantly up-regulated in NP cells or N/G cells (Figure 3). Interestingly, SNORD31, 101 and 115 have no validated rRNA targets (49,50) and might be involved in regulation of other RNA species such as mRNAs (see below). In addition, unlike canonical snoRNAs, SNORD12, 29, 31, 74, 104 and 115 are reported to be encoded within intronic sequences of non-protein coding transcripts or hypothetical open reading frames, which might not code for proteins (44,49,51,52); up till now, the role of these non-protein-coding snoRNA host gene transcripts has been elusive; this implies the possibility that, in addition to intronically encoded snoRNAs, also their respective host gene transcripts might be involved in the regulation of neural differentiation.

Interestingly, earlier reports have implicated SNORD115 (also designated as HBII-52) to be involved in brain development and disease (49,53). Thereby, SNORD115 has been proposed to target the brain-specific serotonin receptor 2C mRNA (49), thus regulating alternative splicing and/or editing of the serotonin receptor 2C pre-mRNA (54,55). We observed an ∼100-fold increase in SNORD115 expression between mouse ES cells and differentiating cells, which was confirmed by northern blotting (Supplementary Figure S3). At the N/G stage, expression of SNORD115 is reduced by ∼50-fold in comparison to the NP stage. However, the N/G library mainly consists of astrocytes and oligodendrocytes with fewer neurons (see above); since SNORD115 has previously been shown to be expressed exclusively in neurons, this might explain the lower abundance at the N/G stage.

Previously, it has been demonstrated that SNORD115 maps to the Prader–Willi Syndrome (PWS) locus on chromosome 15 (49). PWS is a neurodevelopmental disease with patients showing severe obesity and varying degrees of mental retardation (56). Notably, two snoRNAs from that locus have directly been implicated in the etiology of the disease, namely SNORD115 and SNORD116 (also designated as HBII-85), respectively (57–60). It is thus interesting to note that expression of SNORD115 is significantly up-regulated at a very early stage of development, i.e. already during neural differentiation.

In addition, expression of three other known ncRNAs, i.e. 7SL, 7SK and vault-2 RNA, respectively, is up-regulated at the N/G stage compared to NP and ES cells (Figure 3). Thereby, 7SL RNA is an abundant ncRNA species, which serves as an integral part of the signal recognition particle (SRP) (61,62). The SRP has been shown to promote the insertion of proteins into the cellular membrane or to regulate protein secretion (63). These two processes might especially be required at the N/G stage where neural cells interconnect to form neural networks guided by receptors on the surface of cells, such as serotonin receptor 2C (see above); concomitantly, an increase in protein secretion might be required by secretion of neurotransmitter peptides.

Previously, 7SK RNA has been reported to negatively regulate RNA Polymerase II transcription by inactivating the positive transcription elongation factor b (P-TEFb) as well as affecting the function of the chromatin regulator HMGA1 (64,65). By this mechanism, an important transcriptional regulatory role of 7SK RNA in HMGA1-dependent cell differentiation regulation has been described (64). Hence, 7SK RNA might also be involved in the regulation of the transition from mouse ES cells into neural/glial population of cells.

Lastly, among differentially expressed known ncRNAs we have identified expression of vault-2 ncRNA (66), an ncRNA component of the vault RNP, to be upregulated at NP and N/G stages, respectively (Figure 3). Recently, by a subtractive hybridization approach we have observed expression of vault RNAs 1–3 to be highly up-regulated upon EBV (Epstein–Barr virus) infection of human B cells (67,68).

Size distribution of novel ncRNAs from the three stages of ES cell neural differentiation

Analysis of the length distribution of novel and known ncRNA candidates from the three cDNA libraries showed a bias towards 18–30 nt sized ncRNA species in all three cDNA libraries (Figure 4). Thereby, in the NP library the 18–30 nt peak is ∼2-fold larger compared to ES and N/G libraries. A second, smaller peak of ncRNA sequences appears at RNAs sized 45–47 nt which increases at the N/G stage; a third potential peak, sized 71–78 nt, is also observed (Figure 4).

Length distribution and abundance of contigs from the three RNP libraries. NcRNAs were clustered in unique contigs; the consensus sequence was employed to determine the absolute abundance versus the length of respective ncRNAs (contigs exhibiting a size...

The fraction of 18–30 nt sized ncRNA candidates is comprised of miRNA species in addition to inter- and intraRNAs, i.e. ncRNAs located in intergenic or intronic regions (Figure 4). Although we tried to sub-classify the latter RNA species, including motif finding algorithms provided by the MEME suite (69) and searching for structural conservation employing the Vienna RNA package (70) and RNAz (71), as of now, we could not identify any common motifs. In addition, a correlation of features such as RNA size did not result in satisfying findings. However, due to their size and high abundance, similar to the abundance of miRNAs, some intra- and interRNAs might represent as yet undiscovered, novel microRNAs. Therefore, we employed the miRDeep algorithm (32,33) to search for novel microRNAs in our dataset. Thereby, we investigated folding of the pre-miRNAs by mfold, as well as the conservation of seed sequences and the abundance of the miR* sequences in the cDNA libraries. In some cases, newly identified microRNA candidates were predicted to derive from snoRNA sequences. Since it has previously been reported that some snoRNAs can indeed serve as precursor RNAs for miRNAs (72–74), we included these candidates in our analysis. Based on the above criteria, we characterized 17 novel microRNA candidates (Supplementary File S4). Thereby, the highest scoring miRNA candidate was mmu-interRNA26, whose expression was also validated by a screen of a mouse EST (expressed sequence tag) database; mmu-interRNA26 is abundantly expressed in the N/G library, unlike most other miRNAs (Figure 3), as well as in primary hippocampal neurons and astrocytes (see below).

The remaining inter- and intraRNAs in the size range of 18–30 nt scored very low as potential miRNAs suggesting that these might represent a distinct class of entirely novel neural ncRNA candidates. Interestingly, we note that sequences of numerous inter- and intraRNAs are included in reported ESTs (Supplementary File S1), thus validating their expression. Several inter-and intraRNAs were found also to cover binding sites for transcription factor p300 and the transcriptional repressor CTCF, which are reported to regulate gene expression in ES cells and in the cortex and cerebellum of the mouse brain. We further identified a number of inter-and intraRNAs overlapping regions of chromatin acetylation, which has been reported in total brain and cerebellum (Supplementary File S1). Taken together, these results indicate that most inter- and intraRNAs are actively transcribed in the brain, while at least some of these might be involved in regulating brain function and development on a transcriptional level.

Processing analysis of known and novel ncRNA candidates by APART

In Eukarya, many functional known ncRNA species are processed from larger RNA transcripts (75). In particular, rRNA, miRNA-, snoRNA- or tRNA-precursor transcripts are processed by distinct processing pathways, which include site-specific exo- and endonucleases. By employing APART we have identified a large number of small stable RNA species, potentially derived from larger precursor transcripts (for selected examples see Supplementary Figure S2). Detection of RNA processing by APART is achieved by scanning of a contig-coverage plot to search for significant changes (28). APART considers a position as a putative processing site when the coverage shift between 2 nt is larger than one-third of the maximum coverage of the contig (Supplementary Figure S2).
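The processing-site rule stated above can be sketched in a few lines of Python. This is an illustration of the stated criterion only, not the APART code: it assumes that "between 2 nt" refers to two adjacent positions of the per-nucleotide coverage vector of a contig, and the function name is hypothetical.

```python
def putative_processing_sites(coverage):
    """Positions where read coverage shifts sharply between adjacent nucleotides.

    coverage: per-nucleotide read coverage along one contig (list of numbers).
    A position i (0-based, the second nucleotide of the pair) is reported when
    |coverage[i] - coverage[i - 1]| exceeds one-third of the maximum coverage.
    """
    if not coverage:
        return []
    threshold = max(coverage) / 3.0
    return [
        i + 1
        for i in range(len(coverage) - 1)
        if abs(coverage[i + 1] - coverage[i]) > threshold
    ]


# Made-up coverage: a stable, highly covered fragment (positions 3-6)
# embedded in a weakly covered precursor.
print(putative_processing_sites([2, 2, 3, 30, 31, 30, 29, 4, 3]))  # [3, 7]
```

In this toy example the two reported positions mark the start of the abundant fragment and the first nucleotide after its end.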

For proof of principle, we used the APART algorithm to investigate processing of miRNAs, which are reported to be processed by Drosha/Pasha and Dicer from larger primary and precursor miRNA transcripts (76,77). Indeed, we were able to identify this processing event for miRNAs mmu-let-7c-1 as well as for mmu-mir-21 (Supplementary Figure S2). In addition, in particular for the class of 18–30 nt sized ncRNA candidates, we were able to identify potential precursor transcripts in many cases (Figure 5), which might strengthen the case for functionality of these novel ncRNA candidates.

Length distribution and abundance of processed contigs from the three RNP libraries. NcRNAs were clustered in unique contigs; the consensus sequence was employed to determine the absolute abundance of the processed ncRNAs, as assessed by APART, versus...

Due to the heterogeneity of the N/G stage, comprised of neurons and glial cells (astrocytes and oligodendrocytes) (Supplementary Figure S1), we additionally analyzed in which type of neural cells selected inter- and intraRNAs are expressed. To that aim, we analyzed primary cell cultures of neuronal and glial cells for the presence of selected novel ncRNA candidates by real-time PCR. Based on the library profiling by protein markers (see above), we employed astrocytes (AS) as well as primary hippocampal neurons (HC) (see Supplementary Methods section) to investigate ncRNA expression.

Indeed, we observed neural-specific as well as ES cell-specific expression of selected candidates (Figure 6). For example, interRNA32 and interRNA26 were highly expressed in HC and AS but showed very low expression in ES cells; thereby, up-regulation of expression was observed between 70- and 200-fold (Figure 6). Other ncRNAs, such as interRNA30, interRNA35 and interRNA36, which exhibited a reduced expression level during differentiation (i.e. in the NP and N/G stages), also show reduced expression in HC and AS cell cultures comprised mainly of a single cell type (Figure 6). Interestingly, interRNA143 and intraRNA140 exhibited a highly AS-specific expression when compared to HC and ES cells, thereby being potentially involved in AS differentiation. Conversely, interRNA105, interRNA140 and interRNA68 are higher expressed in HC than in AS cells, with interRNA140 being ∼3-fold higher expressed in HC compared to AS (Figure 6).

Expression analysis of selected inter- and intraRNAs in primary hippocampal neurons and astrocytes by real-time PCR. Expression values were normalized against U6.
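As an illustration of how fold changes such as those above can be derived from real-time PCR data normalized against a reference RNA (here U6), the following minimal sketch uses the commonly applied 2^-ΔΔCt calculation. That this exact calculation was used in the present study is an assumption, and the function names and Ct values are hypothetical.

```python
# Sketch of relative quantification by the 2^-ddCt method (assumed here,
# not stated in the text). All Ct values below are hypothetical.

def relative_expression(ct_target, ct_reference):
    """Expression of a target RNA relative to the reference RNA (2^-dCt)."""
    return 2.0 ** -(ct_target - ct_reference)


def fold_change(ct_target_sample, ct_ref_sample,
                ct_target_control, ct_ref_control):
    """Fold change of a target in a sample versus a control (2^-ddCt)."""
    dct_sample = ct_target_sample - ct_ref_sample      # normalize sample to reference
    dct_control = ct_target_control - ct_ref_control   # normalize control to reference
    return 2.0 ** -(dct_sample - dct_control)


# Hypothetical example: a candidate measured in hippocampal neurons (sample)
# and ES cells (control), each normalized against U6.
fc = fold_change(ct_target_sample=22.0, ct_ref_sample=18.0,
                 ct_target_control=29.0, ct_ref_control=18.0)
print(f"fold change HC vs ES: {fc:.0f}")
```

With these hypothetical values, the candidate reaches its detection threshold seven cycles earlier in the sample than in the control after normalization, which corresponds to a 128-fold higher expression.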

Abundance versus differential expression of ncRNA candidates

To further increase the likelihood of identifying novel functional ncRNA candidates, we assessed their relative abundance in the libraries and compared it to their differential expression within the three differentiation stages of ES cells. The differentially expressed candidates were assigned scores, calculated from their mean expression value versus their absolute mean fold change in all three libraries. We reasoned that there should be a correlation between the abundance of an ncRNA, its differential expression and the likelihood that it represents a functional ncRNA species. This does not rule out that lowly expressed ncRNAs with an even distribution between the three ES cell stages might also be functional. However, differential expression is one of the key indications of a regulatory function of gene products. Moreover, for future functional analyses, such as the identification of protein binding partners, a robust, abundant and differential expression would be desirable.
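The ranking principle described above can be sketched as follows. The exact scoring formula is given in the Supplementary Material and is not reproduced here; the score below (log2 of the mean expression multiplied by the mean absolute log2 fold change over all pairwise stage comparisons) is therefore only an assumed stand-in, and the pseudocount, candidate names and read counts are hypothetical.

```python
import math

PSEUDOCOUNT = 1.0  # assumption: avoids log2(0) for candidates absent from a library


def mean_abs_log2_fold_change(expr):
    """Mean absolute log2 fold change over all pairwise stage comparisons."""
    stages = list(expr)
    pairs = [(a, b) for i, a in enumerate(stages) for b in stages[i + 1:]]
    changes = [
        abs(math.log2((expr[b] + PSEUDOCOUNT) / (expr[a] + PSEUDOCOUNT)))
        for a, b in pairs
    ]
    return sum(changes) / len(changes)


def score(expr):
    """Hypothetical score that rewards abundance and differential expression."""
    mean_expr = sum(expr.values()) / len(expr)
    return math.log2(mean_expr + PSEUDOCOUNT) * mean_abs_log2_fold_change(expr)


# Hypothetical normalized read counts in the ES, NP and N/G libraries.
candidates = {
    "candidate_A": {"ES": 10, "NP": 200, "N/G": 1500},   # abundant and differential
    "candidate_B": {"ES": 500, "NP": 520, "N/G": 480},   # abundant, evenly expressed
    "candidate_C": {"ES": 1, "NP": 2, "N/G": 8},         # differential, low abundance
}

ranked = sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
for name in ranked:
    print(name, round(score(candidates[name]), 2))
```

In this sketch, a candidate that is both abundant and differentially expressed ranks first, whereas an abundant but evenly expressed candidate ranks last, in line with the reasoning that abundance alone is not taken as an indication of regulatory function.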

The results of the abundance/differential expression plot show that the ncRNA candidates with the highest scores (i.e. highest abundance and highest differential expression) are predominantly miRNAs and interRNAs sized 18–30 nt (Figure 7, upper right corner). Other known ncRNAs, such as 7SL RNA, 7SK RNA, vault-2 RNA and snoRNAs, in particular SNORD115 (see above), are also among the top-ranked candidates. The potentially novel miRNA candidate mmu-interRNA26 is indicated in Figure 7 (designated as 21 079, corresponding to the ID number indicated in Supplementary File S4). Differential expression of selected ncRNA candidates from this analysis was also verified by real-time PCR (Supplementary Figure S4). Further functional analysis of selected ncRNA candidates will thus focus on the highest-ranked novel and known ncRNAs from our screen.

Plot of the mean fold-change versus the mean expression of ncRNAs expressed in all three libraries. The ncRNAs are assigned scores (see Supplementary Material section) based on their mean expression value (y-axis) versus their absolute mean fold change...

CONCLUSION

In this study, we defined the small ncRNA transcriptome involved in the regulation of mouse ES cell differentiation into neural cells by a deep-sequencing approach. In general, high-throughput deep-sequencing approaches have the potential to identify a large number of ncRNA candidates; however, these studies usually lack functional evidence for novel RNA transcripts. Functional analysis of potential ncRNA candidates, in turn, is highly time consuming, with currently no high-throughput methods available. Hence, it is important to preselect or enrich for functional ncRNA candidates in deep-sequencing screens for further analysis. In order to increase the likelihood of identifying such functional ncRNA species, we applied three novel selection filters: (i) cDNA libraries were generated from RNPs rather than from protein-devoid RNAs, since most functional eukaryal RNAs are known to form RNA–protein complexes; (ii) we analyzed differential expression of novel ncRNA candidates by generating three RNP libraries from three stages of ES cell differentiation, taking into consideration that many functional ncRNAs are regulated during development; (iii) since most functional ncRNAs have been shown to be processed from larger precursor RNAs, we also analyzed potential RNA processing products.

By these analyses, we identified several known, differentially expressed ncRNAs from two abundant RNA classes, i.e. miRNAs and snoRNAs. In addition, expression of three other known ncRNAs, i.e. 7SL, 7SK and vault-2 RNA, was shown to be significantly up-regulated during neural differentiation.

While regulation of gene expression by 7SK RNA is exerted on the transcriptional level, miRNAs and some selected snoRNAs have been reported to regulate gene expression on the post-transcriptional level; finally, 7SL RNA was shown to act on a post-translational level. Hence, regulation of gene expression during neural differentiation might be exerted by these ncRNAs on transcriptional, post-transcriptional and post-translational levels.

The majority of differentially expressed ncRNAs were represented by novel ncRNA candidates and mapped to intergenic and intronic regions, designated as interRNAs and intraRNAs, respectively. These RNA species predominantly exhibited sizes between 18 and 30 nt. Of these, 17 interRNAs and intraRNAs might represent novel members of the class of miRNAs, as assessed by the miRDeep algorithm. However, the majority of interRNAs and intraRNAs could not be assigned as miRNAs and thus might represent novel candidates for regulatory ncRNAs in neural differentiation. For future analyses, it will be interesting to study the effects of known and novel ncRNAs on ES cell differentiation by overexpression or inactivation of selected ncRNA candidates.

ACCESSION NUMBERS

Sequence Read Archive (NCBI) accession number: SRP008250.

SUPPLEMENTARY DATA

FUNDING

The Austrian Science Funds FWF (SFB F4411); the Austrian Ministry of Science and Research grant Gen-AU (D 110420-011-015); the PhD program W 1206-B18 (funding of K.S.) to A.H.; the Austrian Ministry of Science and Research grant Gen-AU (D110420-012-012) (funding of M.Z.). Funding for open access charge: Austrian Ministry of Science and Research grant (SFB F4411).

52. Smith CM, Steitz JA. Classification of gas5 as a multi-small-nucleolar-RNA (snoRNA) host gene and a member of the 5′-terminal oligopyrimidine gene family reveals common features of snoRNA host genes. Mol. Cell. Biol. 1998;18:6897–6909.