Bottom Line:
TagDust2 extracts more reads of higher quality than other approaches. Taken together, TagDust2 is a feature-rich, flexible and adaptive solution for going from raw to mappable NGS reads in a single step. The ability to recognize and record the contents of raw reads will help to automate and demystify the initial, and often poorly documented, steps in NGS data analysis pipelines.

Background: Arguably the most basic step in the analysis of next-generation sequencing (NGS) data is the extraction of mappable reads from the raw reads produced by sequencing instruments. The presence of barcodes, adaptors and other artifacts, all of which are themselves subject to sequencing errors, makes this step non-trivial.

Results: Here I present TagDust2, a generic approach that uses a library of hidden Markov models (HMMs) to accurately extract reads from a wide array of possible read architectures. TagDust2 extracts more reads of higher quality than other approaches. Processing of multiplexed single-end and paired-end libraries, as well as libraries containing unique molecular identifiers, is fully supported. Two additional post-processing steps are included to exclude known contaminants and to filter out low-complexity sequences. Finally, TagDust2 can automatically detect the library type of sequenced data from a predefined selection.
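To make the idea of HMM-based read extraction more concrete, here is a toy sketch in Python. It is not TagDust2's model: the summary does not describe its state layout, error handling or scoring. Everything here is an assumption chosen for illustration. The error model allows substitutions only, with a fixed `error_rate`. The adapter is modelled as a chain of match states. Each barcode is a parallel branch of match states with equal prior probability, and a single self-looping state with uniform emissions absorbs the biological read. The function names `build_architecture`, `viterbi` and `extract` are hypothetical. Viterbi decoding then yields the barcode assignment and the trimmed read in a single pass.

```python
import math

BASES = "ACGT"  # sketch assumes uppercase A/C/G/T only (no N)


def match_emission(expected, error_rate):
    """Log emissions for a state expecting one base (assumed substitution-only model)."""
    return {
        b: math.log(1 - error_rate) if b == expected else math.log(error_rate / 3)
        for b in BASES
    }


UNIFORM = {b: math.log(0.25) for b in BASES}  # emissions of the read (insert) state


def build_architecture(adapter, barcodes, error_rate=0.01):
    """Toy linear HMM: adapter -> one parallel branch per barcode -> self-looping read state.

    Returns (states, trans, start, end); each state is (kind, tag, emissions) and
    trans maps a state index to a list of (next state index, log probability).
    """
    states, trans = [], {}

    def add(kind, tag, emissions):
        states.append((kind, tag, emissions))
        return len(states) - 1

    prev = None  # chain of match states for the fixed adapter sequence
    for base in adapter:
        s = add("adapter", None, match_emission(base, error_rate))
        if prev is not None:
            trans[prev] = [(s, 0.0)]
        prev = s

    starts, ends = [], []  # one chain per barcode; the branch taken is the assignment
    for bc in barcodes:
        p = None
        for base in bc:
            s = add("barcode", bc, match_emission(base, error_rate))
            if p is None:
                starts.append(s)
            else:
                trans[p] = [(s, 0.0)]
            p = s
        ends.append(p)

    read = add("read", None, UNIFORM)
    trans[prev] = [(s, math.log(1 / len(barcodes))) for s in starts]  # equal barcode priors
    for e in ends:
        trans[e] = [(read, 0.0)]
    trans[read] = [(read, 0.0)]
    return states, trans, 0, read


def viterbi(seq, states, trans, start, end):
    """Most probable state path starting in `start` and ending in `end` (log space)."""
    neg = float("-inf")
    score = [neg] * len(states)
    score[start] = states[start][2][seq[0]]
    back = []
    for ch in seq[1:]:
        new, ptr = [neg] * len(states), [-1] * len(states)
        for s, sc in enumerate(score):
            if sc == neg:
                continue
            for t, lp in trans.get(s, []):
                cand = sc + lp + states[t][2][ch]
                if cand > new[t]:
                    new[t], ptr[t] = cand, s
        back.append(ptr)
        score = new
    if score[end] == neg:
        return None, neg  # sequence too short to reach the read state
    path = [end]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1], score[end]


def extract(seq, adapter, barcodes, error_rate=0.01):
    """Return (barcode, read) decoded from `seq`, or None if it cannot fit the architecture."""
    states, trans, start, end = build_architecture(adapter, barcodes, error_rate)
    path, _ = viterbi(seq, states, trans, start, end)
    if path is None:
        return None
    barcode = next(states[s][1] for s in path if states[s][0] == "barcode")
    read = "".join(ch for ch, s in zip(seq, path) if states[s][0] == "read")
    return barcode, read
```

For example, take the adapter AGGGAGGACGATGCGG (the 5′ adapter used in the simulations described in this section) and the hypothetical barcodes ACGT, TTAG and GACA. Decoding the adapter followed by TTAA and CCCCATTTGG returns ("TTAG", "CCCCATTTGG"): the single barcode error is absorbed by the emission probabilities instead of causing the read to be discarded. Insertion/deletion states or a competing "random sequence" model would be natural extensions, but they are not shown here.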

Conclusion: Taken together, TagDust2 is a feature-rich, flexible and adaptive solution for going from raw to mappable NGS reads in a single step. The ability to recognize and record the contents of raw reads will help to automate and demystify the initial, and often poorly documented, steps in NGS data analysis pipelines. TagDust2 is freely available at http://tagdust.sourceforge.net.

Fig4: Demultiplexing of libraries with 5′ and 3′ linkers and 4 nt barcodes, assuming different sequencer error rates. From left to right: simulations using 8, 24 or 48 different barcodes. The top panels show recall and the bottom panels precision.

Mentions:
In a more complicated case, we add both 5′ and 3′ adapters (AGGGAGGACGATGCGG and GTGTCAGTCACTTCCAGCGG) to the simulated case from before (Figures 4 and 5). TagDust2 performs favorably in these cases. The additional long adapter sequences make it easy to differentiate between real and random sequences, and hence recall is high.
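The simulation described above can be approximated with a short Python script, which may help readers experiment with the setup. It does not reproduce the published figures and it does not use TagDust2. It relies on several assumptions that are not stated in the text. Errors are substitutions only. Inserts are 25 nt. The random reads have the same length as the real ones. The 4 nt barcodes are simply the first N 4-mers in lexicographic order. Recall is taken as correctly assigned reads divided by simulated real reads, and precision as correctly assigned reads divided by all assigned reads. The demultiplexer `assign` is a hypothetical fixed-position baseline. It allows up to 3 mismatches per adapter, requires an exact and unambiguous barcode match, and uses the two adapter sequences listed in the text.

```python
import itertools
import random

ADAPTER_5 = "AGGGAGGACGATGCGG"      # 5' adapter from the text
ADAPTER_3 = "GTGTCAGTCACTTCCAGCGG"  # 3' adapter from the text
BASES = "ACGT"


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def mutate(seq, error_rate, rng):
    """Introduce substitution errors only (assumption: no insertions/deletions)."""
    return "".join(
        rng.choice([b for b in BASES if b != base]) if rng.random() < error_rate else base
        for base in seq
    )


def simulate(barcodes, n_real, n_random, error_rate, insert_len=25, seed=1):
    """Return (sequence, true barcode) pairs; random reads carry truth None."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_real):
        bc = rng.choice(barcodes)
        insert = "".join(rng.choice(BASES) for _ in range(insert_len))
        reads.append((mutate(ADAPTER_5 + bc + insert + ADAPTER_3, error_rate, rng), bc))
    length = len(ADAPTER_5) + len(barcodes[0]) + insert_len + len(ADAPTER_3)
    for _ in range(n_random):
        reads.append(("".join(rng.choice(BASES) for _ in range(length)), None))
    return reads


def assign(read, barcodes, max_adapter_mm=3, max_bc_mm=0):
    """Hypothetical fixed-position demultiplexer (a naive baseline, not TagDust2)."""
    if hamming(read[:len(ADAPTER_5)], ADAPTER_5) > max_adapter_mm:
        return None
    if hamming(read[-len(ADAPTER_3):], ADAPTER_3) > max_adapter_mm:
        return None
    start = len(ADAPTER_5)
    observed = read[start:start + len(barcodes[0])]
    ranked = sorted((hamming(observed, bc), bc) for bc in barcodes)
    best_d, best = ranked[0]
    if best_d > max_bc_mm or (len(ranked) > 1 and ranked[1][0] == best_d):
        return None  # too many mismatches, or ambiguous between barcodes
    return best


def recall_precision(reads, barcodes):
    """Recall = correct / real reads; precision = correct / assigned (assumed definitions)."""
    n_real = sum(truth is not None for _, truth in reads)
    correct = assigned = 0
    for seq, truth in reads:
        call = assign(seq, barcodes)
        if call is not None:
            assigned += 1
            correct += call == truth
    recall = correct / n_real if n_real else 0.0
    precision = correct / assigned if assigned else 0.0
    return recall, precision


if __name__ == "__main__":
    barcodes = ["".join(p) for p in itertools.product(BASES, repeat=4)][:8]
    for error_rate in (0.0, 0.01, 0.05):
        reads = simulate(barcodes, n_real=1000, n_random=1000, error_rate=error_rate)
        print(error_rate, recall_precision(reads, barcodes))
```

Requiring both long adapters to match is what lets this baseline reject the random reads almost entirely, which mirrors the observation above. With exact barcode matching, however, recall drops quickly as the error rate rises, because any error in the 4 nt barcode causes the read to be discarded or misassigned.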
