All cDNA libraries were diluted in NimbleGen 2× hybridization buffer with 20% formamide (Roche, Basel, Switzerland) to a final concentration of 10 ng/µL. Aliquots (10 µL in 200 µL PCR tubes) were denatured at 98°C for 3 min and then incubated at 64°C for 4 h to allow the highly abundant DNA fragments present in each library to reanneal. Following reannealing, the DNA libraries were diluted 1:5 in pre-warmed 10 mM sodium phosphate supplemented with 20% formamide (Invitrogen, Carlsbad, CA, USA). A 50 µL volume of this hybridized DNA was injected into a 2.5 µL HAC microcolumn at ~50°C and subsequently washed with 10 mM sodium phosphate containing 20% formamide. The ss-cDNA fraction was eluted with 100 mM sodium phosphate and the remaining ds-cDNA fraction was eluted with 320 mM sodium phosphate. A 10 µL aliquot of the ss-cDNA was purified using 16 µL of AMPure XP beads following the manufacturer's instructions and eluted in 25 µL of nuclease-free water.

All multiplexed E. coli and PBMC cDNA libraries (29) were combined at 20 nM DNA prior to size selection, re-concentrated using Zymo IC columns and eluted in 10 µL of nuclease-free water. Size selection (250 ± 15 bp) was carried out using the Caliper system according to the manufacturer's instructions. Final multiplexed DNA libraries were visualized with the Agilent Bioanalyzer high sensitivity chip and quantified using the Kapa Biosystems qPCR assay (Woburn, MA, USA). SGS was performed on a HiSeq 2000 (Vincent J. Coates Genomics Sequencing Laboratory, Berkeley, CA, USA). Libraries were diluted to 10 pM and sequenced (100 bp, single-end) in two lanes. Raw sequence files were demultiplexed using CASAVA, and the fastq sequence files generated for each library were processed with a custom Perl script to remove any sequence with an overall quality or length below acceptable thresholds (see Supplementary materials for the quality filter script).
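The filtering step described above can be illustrated with a minimal Python sketch. This is not the custom Perl script used in the study (that script is in the Supplementary materials); the cutoff values, the Phred+33 quality encoding, and all function names below are assumptions chosen only for illustration.

```python
# Illustrative read filter: keep reads that meet a minimum length and a
# minimum mean base quality. All thresholds below are hypothetical.
MIN_LENGTH = 50           # assumed minimum read length (bases)
MIN_MEAN_QUALITY = 20.0   # assumed minimum mean Phred score
PHRED_OFFSET = 33         # assumes Phred+33 quality encoding


def read_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it, "").rstrip("\n")
        next(it, "")  # '+' separator line
        qual = next(it, "").rstrip("\n")
        yield header, seq, qual


def mean_quality(qual, offset=PHRED_OFFSET):
    """Mean Phred score of a quality string (0.0 for an empty string)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def passes_filter(seq, qual, min_len=MIN_LENGTH, min_q=MIN_MEAN_QUALITY):
    """True if the read is long enough and its mean quality is high enough."""
    return len(seq) >= min_len and mean_quality(qual) >= min_q
```

Because `read_fastq` accepts any iterable of lines, an open file handle can be passed to it directly, and the records for which `passes_filter` returns true can be written to a new fastq file.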

Bioinformatic analysis of PBMC RNA transcriptome

In-depth descriptions of the bioinformatics analysis used for this study are provided as supplemental methods. Briefly, end-trimming and quality filtering of Illumina sequence data were done with a custom Perl script. Alignments to the E. coli (K-12, MG1655) or human (GRCh37/hg19) genomes were done with Bowtie2 (30) and the resulting BAM files were queried with BEDTools (31) to provide coverage estimates by genomic compartment. Transcript abundances (as fragments per kilobase of exon per million fragments mapped, or FPKM) were estimated by TopHat (32) and Cufflinks (8).
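For readers unfamiliar with the unit, the FPKM definition can be written out as a short Python sketch. It shows only the arithmetic behind the unit; Cufflinks estimates abundances with a statistical model that assigns fragments to transcripts, which this sketch does not attempt to reproduce. The function name and the example numbers are hypothetical.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped.

    fragments              -- fragments assigned to the transcript
    exon_length_bp         -- summed exon length of the transcript, in bp
    total_mapped_fragments -- all mapped fragments in the library
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Dividing by (length / 1e3) and by (library size / 1e6) gives the 1e9 factor.
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2,000 bp transcript
# in a library of 10 million mapped fragments.
example_value = fpkm(500, 2000, 10_000_000)
```

Normalizing by both transcript length and library size is what allows abundances to be compared between transcripts of different lengths and between libraries sequenced to different depths.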

Results and discussion

Hydroxyapatite chromatography conditions

In developing the microcolumn HAC approach, we evaluated a wide range of assay conditions, including elution buffer concentration, the presence of salt or formamide in the hybridization buffer, and assay temperature. Both ss-cDNA and ds-cDNA elution conditions were empirically determined; the optimal phosphate concentration in the ss-cDNA elution buffer was 100 mM, comparable to the conditions used by Patanjali et al. (33). Separation efficiency showed no discernible dependence on fragment size when fractions were examined with the Bioanalyzer; no contamination by ds-cDNA fragments as small as 25 bp was evident in the ss-cDNA fractions assayed (Supplementary Figure 1).

Since the HAC approach does not require maintenance of enzyme activity, it is tolerant of a wide range of hybridization conditions and buffer additives. High salt concentrations, common in hybridization buffers, did not affect DNA binding to or elution from the HAC columns up to 250 mM, the highest concentration tested (data not shown). Likewise, addition of formamide to the hybridization buffer had no observable impact on HAC performance up to 20% formamide, the highest concentration tested. Microcolumn HAC could also be carried out at a wide range of temperatures, depending on the needs of the hybridization. This assay was carried out at ~50°C with 20% formamide added to the hybridization buffer to minimize secondary structures in the ss-cDNA (i.e., hairpin loops, self-complementation) (Supplementary Figure 2).

The 2.5 µL cartridge was chosen for all of the HAC normalization experiments using prokaryotic E. coli K-12 and eukaryotic human PBMC cDNA libraries. This was the smallest column tested, and it generated both the smallest elution volume (~10–12 µL) and the least contamination between samples (ΔCt 13.0) when the column was reused for replicate samples. Columns were reused for convenience in these experiments; however, the cartridges are easy to pack and can be used as single-use devices (Supplementary Table 1). The 2.5 µL column still had more than enough capacity for the present experiments, with the capacity to bind ~500 µg of DNA according to the manufacturer's specifications. Columns smaller than this are not recommended; the resulting very small elution volumes (<<5 µL) would be difficult to manipulate.
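To give a sense of scale for the ΔCt value, the following Python sketch converts a ΔCt into a fold difference. It assumes the ΔCt comes from a qPCR comparison between a sample and the carry-over into the next run, and that amplification is perfectly efficient (one doubling per cycle); neither detail is stated in the text, and the function names are hypothetical.

```python
def fold_difference_from_delta_ct(delta_ct, efficiency=1.0):
    """Fold difference in template implied by a difference in Ct values.

    efficiency=1.0 is the assumed ideal case of one doubling per cycle.
    """
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    return (1.0 + efficiency) ** delta_ct


def carryover_fraction(delta_ct, efficiency=1.0):
    """Carry-over expressed as a fraction of the original sample."""
    return 1.0 / fold_difference_from_delta_ct(delta_ct, efficiency)


# Under these assumptions a delta Ct of 13.0 is a 2**13 = 8192-fold
# difference, i.e. carry-over of roughly 0.012% of the previous sample.
fold = fold_difference_from_delta_ct(13.0)
fraction = carryover_fraction(13.0)
```

If the true amplification efficiency were below 100%, the same ΔCt would correspond to a smaller fold difference, so the figure above is best read as an upper estimate.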

Overview of sequence data

All sequence data were generated using two Illumina HiSeq lanes that produced 11–15 million raw 100 bp sequence reads per E. coli K-12 RNA-seq library and 9–16 million raw reads per human PBMC library (29). Raw sequence reads were subjected to a quality filter (qfilter) to remove low quality reads and Illumina adaptor sequences (supplementary methods). The qfilter removed a mean of 32% of raw reads from the untreated E. coli K-12 RNA-seq libraries and 52% of raw reads from the Ribo-Zero treated libraries. In contrast, only 18% of the total raw reads from the HAC RNA-seq libraries were removed due to low quality or primer contamination. All sequence reads from the E. coli libraries that passed the qfilter were aligned to the E. coli K-12 genome. In the untreated libraries, 98% of filtered reads aligned to the E. coli K-12 genome, compared with 94% in the HAC and 86% in the Ribo-Zero RNA-seq libraries. Both the HAC normalized and Ribo-Zero treated E. coli libraries produced 3–5-fold more sequences that did not map to the E. coli K-12 reference genome (Supplementary Table 2). In the human PBMC libraries, the qfilter rejected a mean of 44% of raw reads in the untreated controls and only 26% of raw reads in both the HAC and DSN normalized RNA-seq libraries. Of the quality filtered human reads, 97% mapped to the hg19 genome in the untreated libraries and 94% mapped in the HAC and DSN normalized RNA-seq libraries. In our analysis of data quality and read mappability to a reference genome, HAC normalized E. coli and human PBMC RNA-seq libraries generated high quality SGS data that increased the overall number of mapped reads to the E. coli K-12 or human hg19 reference genomes (Supplementary Table 2).

Transcriptome mapping to E. coli K-12 and human hg19 genomes

To assess the cDNA populations enriched following HAC, DSN, or Ribo-Zero treatment, mapped reads were binned according to where they aligned in the E. coli K-12 or hg19 genome. For E. coli, mapped reads were binned into the following categories: rRNA, genes, intergenic, and tRNA (Figure 2A); for human RNA, the categories were rRNA, miRNA, exon, intron, lincRNA, intergenic, mitochondrial RNA, and tRNA (Figure 2B). In the untreated E. coli cDNA controls, 86% of E. coli reads mapped to rRNA transcripts, compared with 14.4% of reads in the HAC ss-cDNA fraction and 22.3% in the Ribo-Zero RNA-seq libraries. Effective removal of rRNA sequences in the HAC normalized and Ribo-Zero cDNA libraries corresponded to a 9.7-fold and an 8.7-fold increase, respectively, in the total number of reads mapping to non-rRNA E. coli K-12 genes. In the human PBMC cDNA libraries, a mean of 28% of mapped reads in the HAC fractions were rRNA sequences, compared with 39% in the DSN libraries and 84% in the untreated controls. Removal of rRNA sequences by HAC normalization resulted in a 4.5-fold increase in the overall mapped non-rRNA reads, compared with 3.8-fold for DSN normalized cDNA libraries. Like the rRNA fraction, the tRNA gene fraction was also decreased by HAC and DSN normalization, but all other RNA categories were increased (Figure 2B, Supplementary Table 3). Further analysis of highly abundant small RNAs (<200 bp) showed that both HAC and DSN normalization biased against these RNA populations (miRNAs and snoRNAs), so only RNAs >200 bp were used in downstream analyses (Supplementary Figure 3). Thus, HAC, DSN, and Ribo-Zero treatment of RNA-seq libraries led to a greater proportion of SGS reads mapping to non-rRNA species in the cDNA libraries, as well as more in-depth coverage over a broader array of E. coli and human PBMC RNA transcripts.
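The binning summary described above reduces to two simple calculations, sketched here in Python: the fraction of mapped reads in each category, and the fold change in non-rRNA read counts between a treated library and its untreated control. The read counts in the example are hypothetical placeholders, not data from this study, and the function names are illustrative.

```python
def category_fractions(counts):
    """Fraction of mapped reads in each category (e.g. rRNA, genes, tRNA)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return {category: n / total for category, n in counts.items()}


def non_rrna_fold_change(treated, control, rrna_key="rRNA"):
    """Ratio of non-rRNA read counts, treated library over control."""
    treated_non_rrna = sum(n for k, n in treated.items() if k != rrna_key)
    control_non_rrna = sum(n for k, n in control.items() if k != rrna_key)
    if control_non_rrna == 0:
        raise ValueError("control has no non-rRNA reads")
    return treated_non_rrna / control_non_rrna


# Hypothetical read counts per category, for illustration only.
control_counts = {"rRNA": 860, "genes": 100, "intergenic": 30, "tRNA": 10}
treated_counts = {"rRNA": 150, "genes": 700, "intergenic": 130, "tRNA": 20}

control_fractions = category_fractions(control_counts)
fold_change = non_rrna_fold_change(treated_counts, control_counts)
```

Note that a fold change computed from read counts depends on how many reads each library retained after filtering and mapping, so it cannot be derived from the category percentages alone.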