This is an open-access article distributed under the terms of the Creative Commons Attribution-Non Commercial License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. You may not use this work for commercial purposes.

Abstract

Splicing is the removal of intron sequences from pre-mRNA by the spliceosome. Researchers working in multiple model organisms – notably yeast, insects and mammalian cells – have shown that pre-mRNA can be spliced during the process of transcription (i.e. co-transcriptionally), as well as after transcription termination (i.e. post-transcriptionally). Co-transcriptional splicing does not assume that transcription and splicing machineries are mechanistically coupled, yet it raises this possibility. Early studies were based on a limited number of genes, which were often chosen because of their experimental accessibility. Since 2010, eight studies have used global datasets as counting tools, in order to quantify co-transcriptional intron removal. The consensus view, based on four organisms, is that the majority of splicing events take place co-transcriptionally in most cells and tissues. Here, we discuss the nature of the various global datasets and how bioinformatic analyses were conducted. Considering the broad differences in experimental approach and analysis, the level of agreement on the prevalence of co-transcriptional splicing is remarkable.

Introduction

Transcription and pre-mRNA splicing are carried out by two distinct macromolecular machines, RNA polymerase II (Pol II) and the spliceosome, which are capable of functioning independently of one another. Polyadenylation of transcripts occurs upon co-transcriptional cleavage of the nascent RNA chain and, thereby, indicates that transcription is complete. Although most mature polyA+ transcripts are spliced, early studies yielded mixed results as to whether splicing is completed before polyadenylation [1-3]. Subsequently, a number of key studies in several model systems supported the notion that splicing at least begins co-transcriptionally, i.e. on nascent RNA tethered to chromatin by elongating Pol II [4-12]. Co-transcriptional splicing is not surprising: the nucleoplasm is filled with spliceosomal components that should be able to associate with nascent RNA, much as ribosomes are capable of translating RNA co-transcriptionally in bacteria. Moreover, in vivo rates for the splicing reaction are fast: on the order of 30 seconds to 3 minutes from the time the intron has been fully transcribed, depending on the species and the method used [5,12-15]. Splicing therefore takes significantly less time than gene transcription [16]. Nevertheless, some dramatic examples of post-transcriptional splicing – for example, in anucleate platelets upon activation and in developing fern gametes – remind us of the potential importance of not splicing co-transcriptionally in certain biological contexts [17,18].

The regulatory potential of co-transcriptional splicing is what the field has found so exciting. We started to consider the possibilities that Pol II and the spliceosome could interact physically, that transcription might influence splicing and vice versa, and that co-transcriptional RNA processing could yield a more “fit” mRNP [16,19]. We asked ourselves: how frequent is co-transcriptional splicing among genes and introns? Early studies were based on specific genes, which were handpicked because of special properties such as transcriptional inducibility, gene length, and accessibility to light and/or electron microscopy [20]. Since 2010, eight global studies, investigating multiple tissues and cell types in four organisms, have been published. The consensus view is that co-transcriptional splicing is widespread (see Table 1). These important studies are the subject of this review.

Table 1. Global studies on co-transcriptional splicing in four different organisms, listed in chronological order of publication

Global studies of co-transcriptional splicing

The first global study on co-transcriptional splicing was undertaken in budding yeast, where it became clear that most introns are spliced co-transcriptionally; 50% of introns are >74% spliced before transcription termination [21]. Table 1 shows that co-transcriptional splicing frequencies are similarly high in fly and human cell lines and tissues [22-26]. Despite general agreement, there are important experimental and analytical differences among the studies. Carrillo Oesterreich et al. [21], Khodor et al. [23,27] and Tilgner et al. [24] all based their analyses on RNA isolated from biochemically purified chromatin, whereas Ameur et al. [22] and Windhager et al. [25] used total RNA sequencing and 4-thio-uridine (4sU)-labeling, respectively, to quantify co-transcriptional splicing. Even active spliceosomes and their nuclear location can be monitored, using protein biochemistry and immunofluorescence; this approach shows that the majority of active spliceosomes are associated with chromatin [26]. That said, not all intron removal is co-transcriptional. In particular, terminal introns are the least efficiently removed co-transcriptionally, and 20% of activated spliceosomes in the cell are not chromatin-associated [24,26].

Table 1 and Figure 1 illustrate the key points of each global study and how co-transcriptional splicing frequencies were quantified. Due to different sample preparations and RNA pools analyzed, one can rarely apply the same method of analysis to a given dataset. For example, Ameur et al. performed total RNA sequencing; because exon reads could come from nascent or mature RNA, analysis was restricted to intron reads [22]. Gene architecture differences (e.g. short and few introns in yeast versus long and many introns in humans) also play a role. Three modes of analysis have been employed to calculate a splicing score, using either junction reads – reads representing spliced (exon-exon) and unspliced (exon-intron and intron-exon) junctions – or intron and/or exon coverage (see Figure 1).

Differences among the methods could well influence the numerical results obtained and/or the interpretations, sometimes making it hard to compare the studies. For example, intron length negatively correlates with co-transcriptional splicing frequency in Drosophila, mouse and human cells [23,24,27]. However, Ameur et al., focusing their analysis in human brain on highly expressed genes with long introns, conclude the opposite [22]. Experimental validation of RNA sequencing and array data by RT-qPCR strengthens and extends results from these approaches [21,22]. Though Ameur et al. could not calculate co-transcriptional splicing frequencies for short introns genome-wide, their RT-PCR results suggest that high co-transcriptional splicing observed for long introns can also be inferred for shorter ones [22]. Remarkably, numerous studies agree that constitutive splicing is more co-transcriptional than alternative splicing [22-24,27].
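To make the distinction between junction-based and coverage-based scoring concrete, the following minimal sketch computes a splicing score for a single intron in two ways. The formulas, function names and read counts are illustrative assumptions, not the exact definitions used in any of the cited studies; each study defines its own score, as summarized in Table 1 and Figure 1.

```python
def junction_splicing_score(spliced, unspliced_5ss, unspliced_3ss):
    """Fraction spliced for one intron, estimated from junction reads.

    spliced        -- reads spanning the exon-exon junction
    unspliced_5ss  -- reads spanning the exon-intron boundary (5' splice site)
    unspliced_3ss  -- reads spanning the intron-exon boundary (3' splice site)

    Assumption: the two unspliced boundaries are averaged so that one
    unspliced transcript is not counted twice.
    """
    unspliced = (unspliced_5ss + unspliced_3ss) / 2
    total = spliced + unspliced
    if total == 0:
        return None  # no informative reads for this intron
    return spliced / total


def coverage_splicing_score(intron_coverage, exon_coverage):
    """Fraction spliced for one intron, estimated from read coverage.

    intron_coverage -- mean per-base coverage over the intron
    exon_coverage   -- mean per-base coverage over the flanking exons

    Assumption: intron signal relative to exon signal approximates the
    unspliced fraction; the result is clipped to the range 0..1.
    """
    if exon_coverage <= 0:
        return None  # gene not detectably expressed
    score = 1.0 - intron_coverage / exon_coverage
    return min(1.0, max(0.0, score))


# Hypothetical counts for one intron, for illustration only.
junction_score = junction_splicing_score(spliced=30, unspliced_5ss=10, unspliced_3ss=10)
coverage_score = coverage_splicing_score(intron_coverage=2.0, exon_coverage=10.0)
```

With these made-up inputs the two scores differ (0.75 versus about 0.8) even though they describe the same intron, which is one reason numerical values from different studies are hard to compare directly.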

Last year, one study conducted in induced mouse macrophages reported that full-length, polyA-cleaved RNAs accumulate on chromatin in a partially spliced state [28]. The inference that splicing is completed post-transcriptionally in this cell type has been rather hastily interpreted as evidence against co-transcriptional splicing in general [29,30]. However, no overall numerical values for co-transcriptional splicing are provided [28]. Direct comparison with the other global studies is difficult, because this analysis calculates splicing values on a per-gene basis, using coverage over the whole gene (see Table 1 and Figure 1); since splicing values vary from intron to intron (see above), calculations for individual splicing events are more informative. It is possible that the gene-based frequency of splicing yields an underestimate of intron removal; for example, summation of the coverage data from Tilgner et al. (see Table S2 in [24]) also yields a lower co-transcriptional splicing frequency than the use of splice junction reads (Table 1). Introns also contain a number of stable RNAs, such as snoRNAs, which contribute reads that do not represent unspliced transcripts [31]. Gene-based calculations may additionally be influenced by coverage biases that can reflect differences in nucleotide content (e.g. fraction GC), directionality of sequencing (e.g. from the 3’ end) or RT-priming, all of which influence the preparation of RNAseq libraries [32]. The average co-transcriptional splicing frequencies obtained for each gene will likely be influenced by gene length and the total number of introns within the gene; terminal exons are long and generally full of reads, and terminal introns tend to be the least efficiently spliced co-transcriptionally [6,11,24]. Nevertheless, it is clear from this and previous studies (referenced within) that processing may be delayed in these cells, such that a higher proportion of splicing is post-transcriptional.
It would be fascinating to know which introns are being retained and, indeed, whether all introns within the same transcript are retained. The relatively low co-transcriptional splicing frequencies from both mouse studies contrast sharply with the high co-transcriptional splicing frequencies from yeast, fly, and human (Table 1). Perhaps the easiest means of addressing this would be to analyze directly comparable human and mouse cell types.
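The difference between a gene-based calculation and per-intron values can be illustrated with a small sketch. The gene, its intron lengths and its coverage values are hypothetical, and the pooled-coverage formula is an assumption chosen for illustration; it is not the calculation used in [28] or in any other study discussed here.

```python
def intron_score(intron_coverage, exon_coverage):
    """Coverage-based splicing score for one intron (illustrative formula)."""
    return 1.0 - intron_coverage / exon_coverage


def per_intron_scores(introns, exon_coverage):
    """Score each intron separately; introns is a list of (length, mean coverage)."""
    return [intron_score(cov, exon_coverage) for _length, cov in introns]


def gene_level_score(introns, exon_coverage):
    """Pool coverage over all intronic bases of the gene, then score once.

    Long introns contribute more bases and therefore dominate the result.
    """
    total_length = sum(length for length, _cov in introns)
    pooled_coverage = sum(length * cov for length, cov in introns) / total_length
    return intron_score(pooled_coverage, exon_coverage)


# Hypothetical gene: two short, well-spliced introns and one long,
# poorly spliced terminal intron. Values are made up for illustration.
example_introns = [(500, 1.0), (500, 1.0), (4000, 6.0)]
example_exon_coverage = 10.0

scores = per_intron_scores(example_introns, example_exon_coverage)
mean_per_intron = sum(scores) / len(scores)
gene_score = gene_level_score(example_introns, example_exon_coverage)
```

In this toy case two of the three introns score 0.9, yet the single gene-level value is 0.5 because the long terminal intron supplies most of the intronic bases. A per-gene summary can thus hide the fact that most splicing events in that gene are largely co-transcriptional.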

It is difficult to resolve differences among studies when validation of co-transcriptional splicing frequencies by an independent method, such as RT-PCR, is omitted. Unfortunately, most current studies do not include validation. Validation acknowledges that something unexpected can arise in either the experiment or the analysis, such as differences in biochemical purification, library biases or genome annotation [32]. For example, chromatin preparations can be contaminated with mRNA, which is highly abundant and could lead to an overestimate of the degree of co-transcriptional splicing. A co-transcriptional process is one that occurs before polyA cleavage, so one would ideally like to incorporate this property into the validation. Due to fluctuations in read densities, the degree of polyA cleavage in the RNA sample can be difficult to ascertain from RNAseq. A widely used assay employs reverse transcription to specifically copy only uncleaved transcripts, by utilizing a reverse primer placed downstream of the polyA cleavage site; subsequent PCR can query the spliced or unspliced status of the nascent RNA [11,12,21,33]. This method can be difficult in mammals, where polyA cleavage sites are hard to predict. Nevertheless, an independent, small-scale study focusing on 22 human genes was able to validate the high frequencies of co-transcriptional splicing seen in the global data, even among terminal and alternative introns [12].
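The effect of mRNA contamination on a measured splicing frequency can be sketched with a simple mixing model. This is an illustrative assumption rather than a published correction: it supposes that a known fraction of the reads at a junction comes from fully spliced mature mRNA and that the remainder comes from nascent RNA. All names and numbers are hypothetical.

```python
def observed_spliced_fraction(nascent_fraction, contamination):
    """Spliced fraction measured in a contaminated chromatin preparation.

    nascent_fraction -- true spliced fraction of the nascent RNA (0..1)
    contamination    -- fraction of reads derived from mature mRNA (0..1),
                        assumed here to be fully spliced
    """
    return (1.0 - contamination) * nascent_fraction + contamination


def corrected_spliced_fraction(observed_fraction, contamination):
    """Invert the mixing model to estimate the nascent spliced fraction."""
    if contamination >= 1.0:
        raise ValueError("contamination must be below 1 to allow correction")
    estimate = (observed_fraction - contamination) / (1.0 - contamination)
    return min(1.0, max(0.0, estimate))


# Hypothetical example: 60% of nascent transcripts are spliced, and a
# quarter of the reads come from contaminating mature mRNA.
observed = observed_spliced_fraction(nascent_fraction=0.6, contamination=0.25)
recovered = corrected_spliced_fraction(observed, contamination=0.25)
```

Under these assumptions a true value of 0.6 would be measured as roughly 0.7. In practice the contamination level is rarely known, which is why validation on uncleaved transcripts, as in the RT-PCR assay described above, is valuable.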

Summary and future directions

Taken together, this array of high-quality global studies enables us to reach a consensus on co-transcriptional splicing: it is widespread, albeit not 100%. Future challenges include determining the relative importance of co- and post-transcriptional splicing for the fate of the RNA, on the one hand, and for the transcriptional activity of the gene, on the other. For example, histone modifications, which would have a bearing on co-transcriptional but not post-transcriptional splicing, can directly or indirectly recruit splicing factors and modify alternative splicing [34,35]. Moreover, transcription elongation rates are influenced by nucleosome positioning and histone modifications, which in turn influence alternative splice site choice [16,35,36]. Co-transcriptional splicing may also have long-lasting effects on the RNA's lifetime, by ensuring proper assembly of export-competent mRNPs [19]. These examples show that co-transcriptional splicing is important for mRNA biogenesis. Co-transcriptional splicing has also emerged as an important regulator of transcriptional activity. It has long been known that the presence of promoter-proximal introns can stimulate gene expression [37-39]. Recent work shows that splicing feeds back to transcription through a distance-dependent enhancer-like activity of the first 5’ splice site [33].

Thus, genes and gene expression machinery have evolved coordinately to take advantage of crosstalk between transcription and splicing. If specific biological situations – such as the activation of transcriptional programs in macrophages or the repression of splicing in platelets – circumvent co-transcriptional processes, then perhaps there are additional regulatory reasons. In this sense, it is important to recognize that no study claims 100% of introns are 100% co-transcriptionally removed. Advances in high-throughput sequencing that enable sequencing of longer DNA molecules (in the kilobase range) will provide clarity and facilitate analysis, as well as providing insight into the order of intron removal and co-transcriptional dynamics of alternative splicing. Those introns, such as alternative introns, that are spliced post-transcriptionally may be subject to different regulatory mechanisms [19,40]. A challenge for the future will be to more fully explore the significance of post-transcriptional splicing for gene expression.

Acknowledgments

We thank members of our laboratory, Fernando Carrillo Oesterreich, Karen Adelman, Jean Beggs, and Thoru Pederson for helpful discussions and comments on the manuscript.

Abbreviations

mRNP: messenger ribonucleoprotein

Pol II: RNA polymerase II

polyA: polyadenylation

RNAseq: high-throughput sequencing of cDNA libraries (Illumina)

RT-PCR: reverse transcription polymerase chain reaction

RT-qPCR: reverse transcription quantitative polymerase chain reaction

Notes

†The contribution of the first two authors is equal, and their names are listed alphabetically.

8. Wetterberg I, Baurén G, Wieslander L. The intranuclear site of excision of each intron in Balbiani ring 3 pre-mRNA is influenced by the time remaining to transcription termination and different excision efficiencies for the various introns. RNA. 1996;2:641–51.