Example 1 Eco1 family. DOUTfinder analysis identifies an acetyltransferase domain within a provided group of orthologs. The true nature of this subsignificant domain similarity has been experimentally tested (PMID: 11864574).

Example 2 DOUTfinder analysis using the single human SEF/IL17 protein as input shows that the TIR domain appears multiple times as a subsignificant hit among the related proteins. The D-score identifies these hits as a likely true similarity. This similarity has been previously reported (PMID: 12765832).

Example 3 DUF analysis. Analysis of domains of unknown function (DUFs) derived from the Pfam 18 database suggests that around 80 or more of these DUFs show probable similarities to already known domains. Several of these predictions could later be confirmed using the CLANS assignment within Pfam 19.

About:

Homologous Sequence Set input:
Sequences provided as a homologous sequence set will be made non-redundant, masked and afterwards analysed via rps-blast. Subsignificant rps-blast hits will be further evaluated using DOUTfinder. The use of selected conserved regions is advisable. Input is limited to 75 protein sequences in FASTA format.
Depending on the length of your query, this analysis will take several minutes. It is faster than the single sequence option, which additionally requires a PSI-BLAST search.
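
As an illustration of the input limit described above, the following minimal Python sketch parses FASTA text and rejects sets with more than 75 sequences. It is not the server's own code; the function names read_fasta and check_input are hypothetical, and the server may apply further checks that are not modelled here.

```python
MAX_SEQUENCES = 75  # input limit stated in the text above


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_input(text):
    """Return the parsed records, or raise ValueError if the set is unusable."""
    records = read_fasta(text)
    if not records:
        raise ValueError("no sequences found")
    if len(records) > MAX_SEQUENCES:
        raise ValueError(
            f"{len(records)} sequences given, limit is {MAX_SEQUENCES}"
        )
    return records
```

Running such a check locally before submitting can save a round trip when a sequence set is too large or is not valid FASTA.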

Single Sequence input:
If a single sequence is provided, an initial PSI-BLAST search will be performed to collect a homologous sequence set. This initial analysis step will take several minutes. All subsequent steps parallel the analysis of a homologous sequence set as described above.

Simple text file upload:
You can upload your sequences in a simple text format. To generate a simple text file, you can use any text-processing software and save the file as plain text.

Rps-Blast flag: SEG Filter
You can determine whether rps-blast will be run with a SEG low complexity filter switched on (T) or off (F). If filtering is turned off, false positive hits can increase in compositionally biased regions.

Input flag: Coil Filter
You can determine whether a coiled-coil filter should be applied to your input (T) or not (F). If filtering is turned off, false positive hits can increase in compositionally biased regions.

Input flag: Transmembrane Filter
You can determine whether HMMTOP-based filtering of predicted transmembrane regions should be applied to your input (T) or not (F). If filtering is turned off, false positive hits can increase in compositionally biased regions.

Protein set flag: maximum-sequence identity
CD-HIT is used to obtain a non-redundant protein set with a user-defined identity cut-off. The set is made non-redundant in order to reduce noise due to highly similar sequences.
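
For readers who want to reproduce the redundancy reduction locally, the sketch below assembles a cd-hit command line in Python. The options -i, -o, -c and -n are standard cd-hit options, and the word sizes follow the ranges recommended in the CD-HIT user guide. How the server itself calls CD-HIT is not documented here, so the function name cdhit_command and the file names are assumptions for illustration only.

```python
def cdhit_command(in_fasta, out_fasta, identity):
    """Build a cd-hit argument list for a given identity cut-off (0.4 to 1.0)."""
    if not 0.4 <= identity <= 1.0:
        raise ValueError("cd-hit identity cut-off must lie between 0.4 and 1.0")
    # Word size (-n) ranges as recommended in the CD-HIT user guide.
    if identity >= 0.7:
        word_size = 5
    elif identity >= 0.6:
        word_size = 4
    elif identity >= 0.5:
        word_size = 3
    else:
        word_size = 2
    return [
        "cd-hit",
        "-i", in_fasta,       # input FASTA file
        "-o", out_fasta,      # output file with representative sequences
        "-c", str(identity),  # sequence identity cut-off
        "-n", str(word_size),
    ]
```

The returned list can be passed to subprocess.run on a machine where cd-hit is installed.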

DOUT-analysis flag: Expect
Subsignificant domain hits (E-value > 0.01) are considered as potential domain outliers only if their E-value is below this user-defined threshold. False positive results are rare with the default setting of 0.01. Higher E-value thresholds give more false positives, while lower thresholds increase reliability.

DOUT-analysis flag: Coverage
Subsignificant domain hits (E-value > 0.01) are considered as potential domain outliers only if the coverage of the domain is above this user-defined threshold.
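
The way the Expect and Coverage flags jointly select candidate hits can be sketched as follows. This is a simplified illustration and not DOUTfinder's implementation. The DomainHit class, the definition of coverage as the aligned fraction of the domain length, and the threshold values used in the usage note are all assumptions.

```python
from dataclasses import dataclass

SIGNIFICANCE_CUTOFF = 0.01  # hits with a larger E-value count as subsignificant


@dataclass
class DomainHit:
    domain: str
    evalue: float
    aligned_length: int  # domain positions covered by the alignment
    domain_length: int   # full length of the domain model

    @property
    def coverage(self):
        # Assumed definition: fraction of the domain covered by the hit.
        return self.aligned_length / self.domain_length


def candidate_outliers(hits, expect, min_coverage):
    """Keep subsignificant hits that pass the Expect and Coverage thresholds."""
    return [
        h for h in hits
        if SIGNIFICANCE_CUTOFF < h.evalue < expect  # subsignificant, below Expect
        and h.coverage >= min_coverage              # boundary included (assumption)
    ]
```

With illustrative thresholds expect=10.0 and min_coverage=0.5, a hit with E-value 0.5 covering 120 of 140 domain positions is kept, while a significant hit (E-value 0.001), a hit with E-value 50 and a hit covering only 20% of its domain are all dropped.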

Single seq flag: Blastpgp rounds
If a single sequence is given as input, PSI-BLAST is used to obtain a sequence set. You can change the maximum number of PSI-BLAST passes (iterations) to run.

Single seq flag: Inclusion threshold
If a single sequence is given as input, PSI-BLAST is used to obtain a sequence set. You can change the E-value threshold for inclusion in this initial search.

Single seq flag: Database choice
If a single sequence is given as input, PSI-BLAST is used to obtain a sequence set. You can run PSI-BLAST against two versions of the NCBI non-redundant database (mar06), both of which have been processed using cd-hit. nr80d is an 80% non-redundant derivative of NCBI nr, and nr90d is a 90% non-redundant derivative of NCBI nr. Both are supplemented by Pfam 19 seed files and by the CDD Smart and Pfam domain fasta files.
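
The three single-sequence flags map naturally onto options of the legacy NCBI blastpgp program: -j (maximum number of passes), -h (E-value threshold for inclusion in the multipass model) and -d (database), with -i for the query file. The sketch below only assembles such a command line. The server's actual invocation is not documented here, and the function name, the default values and the use of nr80d/nr90d as on-disk database names are assumptions.

```python
ALLOWED_DATABASES = ("nr80d", "nr90d")  # the two choices named in the text


def blastpgp_command(query_file, database="nr90d", rounds=3, inclusion=0.001):
    """Build a legacy blastpgp argument list (default values are illustrative)."""
    if database not in ALLOWED_DATABASES:
        raise ValueError(f"database must be one of {ALLOWED_DATABASES}")
    if rounds < 1:
        raise ValueError("at least one PSI-BLAST pass is required")
    return [
        "blastpgp",
        "-i", query_file,      # query sequence in FASTA format
        "-d", database,        # database to search
        "-j", str(rounds),     # maximum number of passes
        "-h", str(inclusion),  # E-value threshold for inclusion
    ]
```

The returned list can be passed to subprocess.run on a system where legacy BLAST and a suitably formatted database are available.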