To assess the cDNA populations enriched following HAC, DSN, or Ribo-Zero treatment, mapped reads were binned according to where they aligned on the E. coli K-12 or hg19 genome. For E. coli, mapped reads were binned into the following categories: rRNA, genes, intergenic, and tRNA (Figure 2A); for human RNA, the categories were rRNA, miRNA, exon, intron, lincRNA, intergenic, mitochondrial RNA, and tRNA (Figure 2B). In the untreated E. coli cDNA controls, 86% of reads mapped to rRNA transcripts, compared with 14.4% of reads in the HAC ss-cDNA fraction and 22.3% in the Ribo-Zero RNA-seq libraries. Effective removal of rRNA sequences in the HAC normalized and Ribo-Zero cDNA libraries corresponded to a 9.7-fold and an 8.7-fold increase, respectively, in the total number of reads mapping to non-rRNA E. coli K-12 genes. In the human PBMC cDNA libraries, the HAC fractions contained a mean of 28% rRNA sequences among mapped reads, the DSN libraries contained 39%, and the untreated controls contained 84%. Removal of rRNA sequences by HAC normalization resulted in a 4.5-fold increase in the overall mapped non-rRNA reads, compared with 3.8-fold for DSN normalized cDNA libraries. Like the rRNA fraction, the tRNA gene fraction was decreased by HAC and DSN normalization, whereas all other RNA categories were increased (Figure 2B, Supplementary Table 3). Further analysis of highly abundant small RNAs (<200 bp) showed that both HAC and DSN normalization biased against these RNA populations (miRNAs and snoRNAs), so only RNAs >200 bp were used in downstream analyses (Supplementary Figure 3). Thus, HAC, DSN, and Ribo-Zero treatment each yielded RNA-seq libraries with a greater proportion of SGS reads mapping to non-rRNA species, as well as deeper coverage over a broader array of E. coli and human PBMC RNA transcripts.
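
The read binning and the rRNA comparison described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the toy annotation, the read positions, and the function names (`categorize`, `category_fractions`, `non_rrna_fold_change`) are hypothetical, and the fold change is computed here on the non-rRNA fraction of mapped reads, which is an assumption about normalization rather than the calculation behind the reported values.

```python
from collections import Counter

# Hypothetical annotation for a single contig: (start, end, category),
# with half-open coordinates. A real analysis would load a genome annotation.
ANNOTATION = [
    (0, 100, "rRNA"),
    (150, 300, "gene"),
    (320, 340, "tRNA"),
]


def categorize(position, annotation=ANNOTATION):
    """Return the category of the first interval containing the read
    position; positions outside every interval count as intergenic."""
    for start, end, category in annotation:
        if start <= position < end:
            return category
    return "intergenic"


def category_fractions(positions, annotation=ANNOTATION):
    """Bin mapped read positions by category and return each category's
    share of all mapped reads."""
    counts = Counter(categorize(p, annotation) for p in positions)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def non_rrna_fold_change(treated, control, rrna_label="rRNA"):
    """Ratio of the non-rRNA read fraction in a treated library to that
    in the untreated control (assumed normalization, see lead-in)."""
    treated_non_rrna = 1.0 - treated.get(rrna_label, 0.0)
    control_non_rrna = 1.0 - control.get(rrna_label, 0.0)
    if control_non_rrna == 0:
        raise ValueError("control library contains only rRNA reads")
    return treated_non_rrna / control_non_rrna
```

In practice, the positions would come from an aligner's output and the intervals from the annotation of the reference genome; an interval tree or a sorted-interval search would replace the linear scan for genome-scale data.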

When depleting highly abundant rRNA sequences from a total RNA population prior to SGS, it is critical to maintain the overall relative abundances of intracellular RNAs by minimizing biases toward underrepresented non-rRNA populations. In this study we utilized the E. coli K-12 genome to obtain deep coverage across the entire transcriptome, to assess any potential biases introduced by our HAC normalization method, and then compared these results with the commercially available Ribo-Zero kit. First, we analyzed total E. coli K-12 transcriptome coverage profiles generated by the HAC and Ribo-Zero RNA-seq libraries (Figure 3). In both cases, E. coli K-12 transcriptome profiles were comparable across technical replicate series as well as to the untreated RNA-seq control libraries. Second, we compared the enrichment of gene-coding RNA transcripts in HAC and Ribo-Zero treated RNA-seq libraries as a function of intracellular RNA transcript abundance and size (Figure 4, Supplementary Figure 4). Enrichment of E. coli K-12 gene-coding transcripts normalized against the untreated controls showed an overall 6-fold enrichment across the entire E. coli K-12 transcriptome regardless of transcript abundance or size. In human PBMC total RNA-seq libraries, both DSN and microcolumn HAC normalization increased the number of hits on exon sequences by 3–4 fold, but showed more variability across transcript abundance and size ranges (Figure 4, Supplementary Figure 3). The observed variability in the PBMC RNA-seq libraries is most likely due to the limited depth of transcriptome coverage across the hg19 genome. The evenness of 5′ and 3′ end coverage of gene-coding RNA transcripts was assessed for both E. coli K-12 and human PBMC RNA-seq libraries (Supplementary Figure 5), and no difference in gene coverage at the terminal ends between the untreated and treated RNA-seq libraries was found. The increased number of SGS hits to gene-coding RNA transcripts in both the E. coli K-12 and human PBMC HAC normalized RNA-seq libraries resulted in broader and deeper coverage of those sequences without an appreciable bias on the overall transcriptional profile.
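
One way to check whether enrichment depends on transcript abundance, as discussed above, is sketched below in Python. This is an illustrative outline under stated assumptions, not the analysis performed in this study: the per-gene read counts, the library-size normalization, the order-of-magnitude abundance bins, and the function names (`per_gene_enrichment`, `median_enrichment_by_abundance`) are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def per_gene_enrichment(treated, control, treated_total, control_total):
    """Fold enrichment per gene: the gene's share of mapped reads in the
    treated library divided by its share in the untreated control.
    Genes with no reads in the control are skipped (ratio undefined)."""
    enrichment = {}
    for gene, control_count in control.items():
        if control_count == 0:
            continue
        treated_count = treated.get(gene, 0)
        enrichment[gene] = (treated_count * control_total) / (
            control_count * treated_total
        )
    return enrichment


def median_enrichment_by_abundance(enrichment, control):
    """Group genes by order of magnitude of their control read count
    (assumed binning) and return the median enrichment per bin."""
    bins = defaultdict(list)
    for gene, fold in enrichment.items():
        abundance_bin = math.floor(math.log10(control[gene]))
        bins[abundance_bin].append(fold)
    return {b: median(folds) for b, folds in sorted(bins.items())}
```

A roughly constant median across bins would be consistent with enrichment that is independent of transcript abundance; the same grouping could be applied to transcript length in place of control read count.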

Hydroxyapatite chromatography is a well-established method for separating different nucleic acid species (RNA, ssDNA, dsDNA) from complex sample types. However, the methodology has not been widely applied in SGS RNA-seq library preparation workflows, possibly as a result of perceived disadvantages (34) including labor intensiveness, unacceptably high starting material requirements, and poor reproducibility (17). The microcolumn HAC normalization method described here eliminates such issues and provides a viable alternative to commercially available rRNA depletion kits for RNA-seq applications. This microcolumn HAC system reproducibly separated ss-cDNA from ds-cDNA fractions in simple (E. coli K-12) and complex (human PBMC) nucleic acid populations and effectively reduced the number of highly abundant intracellular rRNA sequences found in the ssDNA fractionated RNA-seq libraries to levels comparable to the commercially available Ribo-Zero and DSN normalization kits. HAC-based cDNA normalization methods feature several advantages over current rRNA depletion protocols, including greater flexibility in total RNA sample input amount, assay conditions (temperature, reagents), and sample types (prokaryotic and eukaryotic), at a fraction of the cost per sample (<< $1 per sample in reagent costs). HAC-based cDNA normalization can also be easily integrated into any RNA-seq library preparation protocol that generates ds-cDNA as a product prior to SGS. Moreover, HAC normalization preserves the rRNA-enriched ds-cDNA fraction for further analysis, if desired, which is especially useful when sequencing highly complex environmental samples where comprehensive 16S rRNA profiling is often required for characterizing diverse microbial communities (35).

Acknowledgments

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. The authors are grateful to Ron Renzi for his support of the technology components utilized in this work.