igrep

Submitted jobs

Submit a new job

A genome to search. A total of 26 assembled genomes have been collected from ftp://ftp.ncbi.nih.gov/genomes. Their sizes range from 0.19 Gnt to 3.50 Gnt, for a total of 44 Gnt.

A set of queries. A query consists of a pattern over the alphabet A, C, G, T, N, followed by an edit distance. N is a wildcard that matches any of A, C, G, or T in the genome. The pattern length must be between 1 and 64. The edit distance must be between 0 and 9 and must not exceed the pattern length. Substitutions, insertions, and deletions each cost one unit of edit distance. Up to 10,000 queries are processed per job.
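The query rules above can be checked before submitting a job. The following is a minimal Python sketch, not part of igrep itself; the function names validate_query and validate_job are hypothetical, and the sketch assumes that patterns are given in uppercase.

```python
# Hypothetical pre-submission check of the query rules described above.
# Not part of igrep; names and error messages are illustrative only.

ALPHABET = set("ACGTN")   # N is the wildcard
MAX_PATTERN_LEN = 64
MAX_EDIT_DISTANCE = 9
MAX_QUERIES_PER_JOB = 10_000


def validate_query(pattern, k):
    """Return a list of rule violations for one query (empty if valid)."""
    errors = []
    if not 1 <= len(pattern) <= MAX_PATTERN_LEN:
        errors.append("pattern length must be between 1 and 64")
    bad = sorted(set(pattern) - ALPHABET)
    if bad:
        errors.append("pattern contains invalid characters: " + "".join(bad))
    if not 0 <= k <= MAX_EDIT_DISTANCE:
        errors.append("edit distance must be between 0 and 9")
    if k > len(pattern):
        errors.append("edit distance must not exceed the pattern length")
    return errors


def validate_job(queries):
    """Check every (pattern, k) pair and the per-job query limit."""
    errors = []
    if len(queries) > MAX_QUERIES_PER_JOB:
        errors.append("a job may contain at most 10,000 queries")
    for index, (pattern, k) in enumerate(queries):
        for message in validate_query(pattern, k):
            errors.append(f"query {index}: {message}")
    return errors


if __name__ == "__main__":
    print(validate_job([("ACGTN", 2), ("ACG", 5)]))
```

Returning a list of messages rather than raising on the first problem lets a caller report every invalid query in a job at once.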

igrep produces two output files:

log.csv: summary of queries and results.

pos.csv: ending positions of matches. For each query, up to 1,000 matches will be returned.
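To illustrate what an ending position of a match means, here is a minimal Python sketch of approximate matching by dynamic programming (Sellers' algorithm). It is only a reference illustration under stated assumptions, not igrep's actual implementation: the function name match_ends is hypothetical, positions are reported 0-based (the indexing used in pos.csv is not specified here), and an N in the pattern is assumed to match only A, C, G, or T in the text.

```python
# Illustrative sketch only; not igrep's implementation.
# Reports every text position where some substring ending there is
# within edit distance k of the pattern (unit-cost substitution,
# insertion, and deletion).

def match_ends(pattern, text, k, limit=1000):
    """Return up to `limit` 0-based ending positions of approximate matches."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any
    # substring of the text ending at the current position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # col[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            p = pattern[i - 1]
            # Assumption: wildcard N matches A, C, G, or T in the text.
            hit = p == c or (p == "N" and c in "ACGT")
            best = min(
                prev_diag + (0 if hit else 1),  # match or substitution
                col[i] + 1,                     # extra character in the text
                col[i - 1] + 1,                 # pattern character skipped
            )
            prev_diag = col[i]
            col[i] = best
        if col[m] <= k:
            ends.append(j)
            if len(ends) >= limit:  # mirrors the 1,000-match cap per query
                break
    return ends


if __name__ == "__main__":
    print(match_ends("ACGT", "TTACGTTT", 0))  # exact match only
    print(match_ends("ACGT", "TTACGTTT", 1))  # one edit allowed
```

With a nonzero edit distance, one occurrence can produce several adjacent ending positions, because a substring that is one character shorter or longer may still be within the allowed distance.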