System: FGASQ

Introduction

FGASQ is an efficient approximate substring query tool with support for local optimal matching. It is implemented in GNU C++.

Features

The released source supports the following features:
- build a q-gram index for the text
- approximate substring search (local optimal similar substrings)
  - using the q-gram index
  - without using the q-gram index

Download

The source code of FGASQ is available here for download. The software has been tested in a Linux environment (Ubuntu).

The following executable file will be generated:

subsearch    Perform approximate substring search

We also provide example data here: example data

The example data includes a 10M file, text10M.txt, as the text sequence and a file, query5-100.txt, containing 5 query sequences.

Requirements

The code can be run on Linux using the g++ compiler.

Step-by-Step Instruction

Run "./make" to compile the code and generate the executable file.

FGASQ parameters are listed by typing "./subsearch".

Syntax: subsearch <text file> <query file>
  -O <output file>
  -q <gram length> (default: 11)
  -H <threshold of edit distance> (default: 5)
  -G <use gram index or not>: 0 - do not use; 1 - use gram index (default)

For example, run "./subsearch text10M.txt query5-100.txt -O result.txt" to perform approximate substring search on the example data. The q-gram index is built automatically if the gram index option is enabled, and the final results are written to the file result.txt.

Note:
1. Every line in the query file is treated as a query sequence.
2. Our algorithm uses the lower bound l+1-(k+1)*q, where l is the length of the query, k is the edit distance (ED) threshold and q is the gram length. The value of q must be chosen so that this lower bound is above 0; otherwise, the q-gram index cannot speed up the query process. For example, the queries in our example data have length 100, so if the threshold is fixed at 10, the condition 101-11*q > 0 requires q to be at most 9.

Datasets

We used human genomes in our experiments.