Bioinformatics=(ACGAAG->AK)+(#!/bin/sh)+(P(A|B)=P(B|A)*P(A)/P(B))

Next-generation Sequencing Quality Control Tools

FASTX-Toolkit: The FASTX-Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files, including:

FASTQ-to-FASTA converter

Convert FASTQ files to FASTA files.
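
As a rough illustration of what this conversion involves (a minimal sketch, not the FASTX-Toolkit's own code), the following assumes plain four-line FASTQ records and simply drops the quality lines. The function name `fastq_to_fasta` is hypothetical.

```python
def fastq_to_fasta(fastq_text):
    """Convert four-line FASTQ records to FASTA text; quality lines are dropped."""
    # Assumption: every record is exactly four lines (header, sequence, '+', quality).
    lines = [line for line in fastq_text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of four non-empty lines")
    fasta = []
    for i in range(0, len(lines), 4):
        header, seq = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        fasta.append(">" + header[1:])  # '@id' becomes '>id'
        fasta.append(seq)
    return "\n".join(fasta) + "\n"


example = "@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!!!\n"
print(fastq_to_fasta(example), end="")
```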

FASTQ Information

Chart Quality Statistics and Nucleotide Distribution

FASTQ/A Collapser

Collapsing identical sequences in a FASTQ/A file into a single sequence (while maintaining read counts).
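
The idea of collapsing can be sketched in a few lines. This is only an illustration of the concept, not the toolkit's implementation; the `rank-count` identifier format and the function name `collapse` are assumptions made for the example.

```python
from collections import Counter


def collapse(sequences):
    """Collapse identical sequences, keeping how many times each was seen."""
    counts = Counter(sequences)
    records = []
    # most_common() orders sequences from most to least frequent.
    for rank, (seq, count) in enumerate(counts.most_common(), start=1):
        records.append((f"{rank}-{count}", seq))  # hypothetical 'rank-count' id
    return records


reads = ["ACGT", "GGCC", "ACGT", "ACGT", "GGCC", "TTAA"]
for identifier, seq in collapse(reads):
    print(f">{identifier}\n{seq}")
```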

FASTQ/A Trimmer

Shortening reads in a FASTQ or FASTA file (removing barcodes or noise).

FASTQ/A Renamer

Renames the sequence identifiers in a FASTQ/A file.

FASTQ/A Clipper

Removing sequencing adapters / linkers

FASTQ/A Reverse-Complement

Producing the Reverse-complement of each sequence in a FASTQ/FASTA file.
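
For readers unfamiliar with the operation, here is a minimal sketch of reverse-complementing a read. It assumes only the bases A, C, G, T and N; for FASTQ input the quality string has to be reversed as well so that each score stays with its base. The function name is hypothetical.

```python
# Translation table mapping each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq, qual=None):
    """Return the reverse complement of seq, plus the reversed quality if given."""
    rc = seq.translate(COMPLEMENT)[::-1]
    if qual is None:
        return rc
    return rc, qual[::-1]


print(reverse_complement("AACGT"))
print(reverse_complement("AACGT", "IIH#!"))
```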

FASTQ/A Barcode splitter

Splitting a FASTQ/FASTA file containing multiple samples.
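
Conceptually, barcode splitting assigns each read to a sample according to the barcode it carries. The sketch below assumes the barcode sits at the start of the read and must match exactly; real tools typically also tolerate mismatches, which is not shown here. The sample names and the `unmatched` bin are illustrative.

```python
def split_by_barcode(reads, barcodes):
    """Group reads by the barcode found at their start (exact match only)."""
    bins = {name: [] for name in barcodes}
    bins["unmatched"] = []  # reads that carry none of the known barcodes
    for read in reads:
        for name, barcode in barcodes.items():
            if read.startswith(barcode):
                bins[name].append(read)
                break
        else:
            bins["unmatched"].append(read)
    return bins


barcodes = {"sampleA": "ACGT", "sampleB": "TTAG"}
reads = ["ACGTGGGG", "TTAGCCCC", "GGGGGGGG", "ACGTAAAA"]
for name, members in split_by_barcode(reads, barcodes).items():
    print(name, members)
```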

FASTA Formatter

Changes the width of sequence lines in a FASTA file.

FASTA Nucleotide Changer

Converts FASTA sequences from/to RNA/DNA.

FASTQ Quality Filter

Filters sequences based on quality
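
One common way to filter on quality is to keep a read only if a minimum percentage of its bases reach a minimum Phred score. The sketch below illustrates that rule under the assumption of Phred+33 encoded quality strings; the parameter names `min_quality` and `min_percent` and their defaults are hypothetical, not the tool's own options.

```python
def phred_scores(qual, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in qual]


def passes_filter(qual, min_quality=20, min_percent=80, offset=33):
    """True if at least min_percent of bases have quality >= min_quality."""
    scores = phred_scores(qual, offset)
    if not scores:
        return False  # an empty read never passes
    good = sum(1 for score in scores if score >= min_quality)
    return 100.0 * good / len(scores) >= min_percent


for qual in ["IIIII", "IIII#", "II###"]:
    print(qual, passes_filter(qual))
```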

FASTQ Quality Trimmer

Trims (cuts) sequences based on quality
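
A simple form of quality trimming removes low-quality bases from the 3' end of a read until a base at or above the threshold is reached. The following is a sketch of that idea only, assuming Phred+33 encoding; the function name and the default threshold are illustrative.

```python
def trim_3prime(seq, qual, min_quality=20, offset=33):
    """Cut trailing bases whose Phred score is below min_quality."""
    end = len(seq)
    # Walk back from the 3' end while the last kept base is below the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_quality:
        end -= 1
    return seq[:end], qual[:end]


print(trim_3prime("ACGTAC", "IIII#!"))
print(trim_3prime("ACGT", "I#II"))  # internal low-quality bases are left alone
```

In practice a read that becomes too short after trimming would usually be discarded; that step is left out here.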

FASTQ Masker

Masks nucleotides with ‘N’ (or other character) based on quality
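
Masking keeps the read length intact but hides unreliable bases. A minimal sketch, again assuming Phred+33 quality strings and using hypothetical parameter names:

```python
def mask_low_quality(seq, qual, min_quality=10, mask_char="N", offset=33):
    """Replace bases whose Phred score is below min_quality with mask_char."""
    return "".join(
        base if ord(q) - offset >= min_quality else mask_char
        for base, q in zip(seq, qual)
    )


print(mask_low_quality("ACGT", "I#I!"))
print(mask_low_quality("ACGT", "I#I!", mask_char="."))
```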

Galaxy NGS QC and manipulation tools (citation): Galaxy provides a tool suite that functions on all of the commonly known FASTQ format variants and provides a pipeline for manipulating next generation sequencing data taken from a sequencing machine all the way through the quality filtering steps.

NGSQC (citation): The NGSQC pipeline provides a set of novel quality control measures for quickly detecting a wide variety of quality issues in deep sequencing data derived from two-dimensional surfaces, regardless of the assay technology used. It also enables researchers to determine whether the sequencing data behind their most interesting biological discoveries are affected by sequencing quality issues. NGSQC can help to ensure that biological conclusions, in particular those based on relatively rare sequences, are not caused by low-quality sequencing.

NGS QC Toolkit (citation): A toolkit for the quality control (QC) of next-generation sequencing (NGS) data. The toolkit comprises user-friendly standalone tools for quality control of sequence data generated on the Illumina and Roche 454 platforms, with detailed results in the form of tables and graphs, and for filtering high-quality sequence data. It also includes a few other tools that are helpful in NGS data quality control and analysis.

PRINSEQ (citation): PRINSEQ can be used to filter, reformat, or trim your genomic and metagenomic sequence data. It generates summary statistics of your sequences in graphical and tabular format. It is easily configurable and provides a user-friendly interface.