TFBSsearch.

TFBSsearch Instructions.

First screen: Masking your sequences?

TFBSsearch searches for transcription factor binding sites in areas
of complete identity between two or more aligned sequences, or in single
sequences. Highly conserved features, such as coding exons or repeats,
should be masked to reduce the number of false positive hits.
A GFF file should be available for each sequence, containing
features that need to be masked.
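
As an illustration of what masking means, the following is a minimal
Python sketch, not TFBSsearch's own code. It assumes the GFF file is
tab-delimited with 1-based, inclusive start and end coordinates in
columns 4 and 5, and that masked bases are replaced with 'N'. The
function name mask_features and the choice of 'N' are assumptions
made for the example.

```python
def mask_features(sequence, gff_text, mask_char="N"):
    """Replace every base covered by a GFF feature with mask_char.

    Assumes tab-delimited GFF lines with 1-based, inclusive
    start/end coordinates in columns 4 and 5 (hypothetical helper).
    """
    bases = list(sequence)
    for line in gff_text.splitlines():
        # Skip blank lines and GFF comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        start, end = int(fields[3]), int(fields[4])
        # Clamp to the sequence and convert to 0-based indices.
        for i in range(max(start, 1) - 1, min(end, len(bases))):
            bases[i] = mask_char
    return "".join(bases)


gff = "seq1\tannotation\texon\t3\t6\t.\t+\t.\n"
print(mask_features("ACGTACGTAC", gff))  # ACNNNNGTAC
```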

If you wish features such as exons to be masked, select the 'Mask
features' box and select the number of sequences present in your
multiple sequence alignment.

Second screen: Set options for TFBSsearch.

Input options:

Select strands to search.
Specifies which strand to search. Default is to search both strands.

Search using IUPAC strings.
A search may be defined as a comma delimited list of IUPAC patterns.
This is useful for a quick one-off search.

NOTE there should be no spaces in the list.
Only the following IUPAC characters are allowed:

A Adenine
C Cytosine
G Guanine
T Thymine
M AMino (A or C)
R PuRine (A or G)
W Weak (A or T)
S Strong (C or G)
Y PYrimidine (C or T)
K Keto (G or T)
V Not T (A or C or G)
H Not G (A or C or T)
D Not C (A or G or T)
B Not A (C or G or T)
N ANy (A or C or G or T)
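
The table above maps directly onto regular-expression character
classes. The sketch below shows one way a comma delimited list of
IUPAC strings could be expanded and searched in a single sequence. It
is an illustration only: the names iupac_to_regex and find_sites are
hypothetical, only the forward strand is searched, and TFBSsearch
itself may work differently.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "R": "[AG]", "W": "[AT]", "S": "[CG]",
    "Y": "[CT]", "K": "[GT]", "V": "[ACG]", "H": "[ACT]",
    "D": "[AGT]", "B": "[CGT]", "N": "[ACGT]",
}


def iupac_to_regex(iupac):
    """Expand an IUPAC string into a regular expression."""
    return "".join(IUPAC[ch] for ch in iupac.upper())


def find_sites(sequence, iupac_list):
    """Return (motif, 1-based start, matched bases) for each hit.

    iupac_list is a comma delimited list with no spaces.
    """
    hits = []
    for motif in iupac_list.split(","):
        # A lookahead is used so that overlapping hits are reported.
        pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
        for m in pattern.finditer(sequence.upper()):
            hits.append((motif, m.start() + 1, m.group(1)))
    return hits


print(find_sites("TTCAGCTGGATAA", "CANNTG,GATA"))
# [('CANNTG', 3, 'CAGCTG'), ('GATA', 9, 'GATA')]
```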

Alternatively, searches may be defined by a file containing a
carriage return separated list of IUPAC strings. This option is useful
if you periodically search for the same patterns.

TRANSFAC accession numbers may also be entered as a file containing
a carriage return separated list.

A list of TRANSFAC accession numbers and their names
can be viewed via this [LINK].

Output options:

Select format of output.
The 'unaligned' output reports the location of conserved sites
in unaligned coordinates (i.e. gaps '-' are ignored). As
the unaligned numbering will be different in each
species, a reference sequence is used (see below).

Unaligned numbering is used by default. NOTE that SynPlot
converts the unaligned numbering from a GFF file to plot
features, so unaligned numbering should be selected if the
GFF files are going to be used with SynPlot.

The 'aligned' output reports the feature in the global alignment position,
i.e. gaps '-' are respected.
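
The difference between the two numbering schemes can be shown with a
short Python sketch. It assumes 1-based positions and '-' as the gap
character; the function name aligned_to_unaligned is hypothetical and
is not part of TFBSsearch.

```python
def aligned_to_unaligned(aligned_seq, aligned_pos):
    """Convert a 1-based alignment column into the 1-based position
    in the ungapped sequence. Returns None if the column is a gap
    in this sequence."""
    if aligned_seq[aligned_pos - 1] == "-":
        return None
    # Subtract the gaps seen up to and including this column.
    return aligned_pos - aligned_seq[:aligned_pos].count("-")


reference = "AC--GTA"
print(aligned_to_unaligned(reference, 5))  # 3 (the G is the third base)
print(aligned_to_unaligned(reference, 3))  # None (column 3 is a gap)
```

Because each sequence in the alignment carries its own gaps, the same
alignment column generally maps to a different unaligned position in
each sequence, which is why a reference sequence has to be chosen.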

Selecting a reference sequence.
This option is used to specify which sequence will be
used as the reference if unaligned numbering is in
use. It must correspond to the name of the sequence as
it appears in the FASTA sequence file.

Defaults to the first sequence in the multiple FASTA file.

Advanced output options:

Select a name for GFF 'source' column.
Corresponds to the source field of the GFF file.
Defaults to 'TFBSsearch'.

Select a name for GFF 'feature' column.
Corresponds to the feature field of the GFF file.
Defaults to 'CNS' (i.e. conserved non-coding sequence).
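
To show where the 'source' and 'feature' names end up, here is a
minimal Python sketch that builds one tab-delimited GFF line. The
column order follows the standard GFF layout (seqname, source,
feature, start, end, score, strand, frame, attributes). The function
name gff_line, the '.' placeholders for score and frame, and the
empty attributes column are assumptions; the exact lines written by
TFBSsearch may differ.

```python
def gff_line(seqname, start, end, strand,
             source="TFBSsearch", feature="CNS", attributes=""):
    """Build one tab-delimited GFF line (hypothetical helper)."""
    fields = [
        seqname,      # column 1: sequence name
        source,       # column 2: 'source' option
        feature,      # column 3: 'feature' option
        str(start),   # column 4: start (1-based)
        str(end),     # column 5: end (inclusive)
        ".",          # column 6: score (not set in this sketch)
        strand,       # column 7: '+' or '-'
        ".",          # column 8: frame (not applicable)
        attributes,   # column 9: free-text attributes
    ]
    return "\t".join(fields)


print(gff_line("human", 120, 125, "+"))
```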

Select to ensure the same motif is found in all sequences of the
alignment.
If this option is used, ambiguous IUPAC codes (e.g. N = [ACGT] or
S = [CG]) specified in an IUPAC string or pattern will have to match
the same base in every sequence of the alignment.

Select conserved range (deviation from exact alignment).
If this option is specified, TFBSsearch will allow the
sites to occur up to x bases apart in the two sequences
(i.e. they will not have to be exactly aligned).

Unless you are searching for a long and not very
degenerate motif (i.e. one that will not occur often
by chance), it does not make much sense to set x more
than a few bases (or even use this option at all).
However, if set at 1 or 2, it will allow small
misalignments to be ignored.
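
The effect of the range x can be sketched in Python as follows. The
sketch assumes that the start positions of the hits in each sequence
are already known and are expressed in alignment coordinates. The
function name conserved_hits and the pairwise comparison are
assumptions for illustration, not a description of TFBSsearch's
internals.

```python
def conserved_hits(hits_a, hits_b, max_offset=0):
    """Pair up hit start positions from two aligned sequences that
    lie no more than max_offset columns apart."""
    pairs = []
    for a in hits_a:
        for b in hits_b:
            if abs(a - b) <= max_offset:
                pairs.append((a, b))
    return pairs


human_hits = [10, 50]
mouse_hits = [11, 80]
print(conserved_hits(human_hits, mouse_hits, 0))  # []
print(conserved_hits(human_hits, mouse_hits, 2))  # [(10, 11)]
```

With a large x and a short or degenerate motif, many unrelated hits
would fall within range of each other, which is why small values are
recommended above.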

Select sequences to exclude from input file, OR leave blank to select all.
The list (which must not include spaces) can contain
one or more of the sequence names as they appear in
the alignment file. These sequences will be ignored when
looking for conserved motifs. For example, if you have
a multiple aligned file of four species:

> human .....
> mouse .....
> dog .....
> rat .....

and use the option with 'mouse,dog' then TFBSsearch will
only look for motifs that are conserved between human
and rat. Note, however, that the gaps generated by the
original 4-way alignment will be preserved and that
this will likely give you a different output to a
TFBSsearch search of a straight 2-way human-rat alignment.
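
The following Python sketch illustrates why the gaps are preserved.
It reads a small multiple FASTA alignment, drops the excluded
sequences, and leaves the remaining rows untouched. The function
names parse_fasta and exclude_sequences and the toy alignment are
made up for the example.

```python
def parse_fasta(text):
    """Read multiple FASTA text into a dict of name -> sequence.
    The name is the first word after '>' on the header line."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            records[name] = ""
        elif name is not None and line:
            records[name] += line
    return records


def exclude_sequences(records, exclude_list):
    """Drop the sequences named in a comma delimited list (no
    spaces). An empty list keeps every sequence."""
    excluded = set(exclude_list.split(",")) if exclude_list else set()
    return {n: s for n, s in records.items() if n not in excluded}


alignment = """> human
ACG-TA
> mouse
ACGGTA
> dog
AC--TA
> rat
ACG-TA
"""
kept = exclude_sequences(parse_fasta(alignment), "mouse,dog")
print(kept)  # {'human': 'ACG-TA', 'rat': 'ACG-TA'}
```

The gap in column 4 of human and rat exists only because of the extra
base in mouse. It stays in place after mouse is excluded, whereas a
fresh 2-way human-rat alignment would not contain it.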

Search using an IUPAC pattern.
This allows a search for a pattern made up of several
IUPAC strings. The format of 'pattern' is a
little complex. The * character is used as a delimiter
between alternate comma-separated (no spaces) lists of
IUPAC strings and ranges. For example:

GGAA,GATA*8-12*GGAA,GATA*8-12,18-22*CANNTG

This will search for an ETS or GATA site, then 8-12
bases, followed by a second ETS or GATA site, then
either 8-12 or 18-22 bases, then an EBOX site.
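
One way to read the pattern syntax is as a regular expression, as in
the Python sketch below. It treats each *-delimited element as a set
of alternatives, reads items of the form 'min-max' as spacers of any
bases, and expands everything else as an IUPAC string. The function
name pattern_to_regex is hypothetical, the sketch handles a single
ungapped sequence on the forward strand only, and TFBSsearch may
interpret patterns differently in detail.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "R": "[AG]", "W": "[AT]", "S": "[CG]",
    "Y": "[CT]", "K": "[GT]", "V": "[ACG]", "H": "[ACT]",
    "D": "[AGT]", "B": "[CGT]", "N": "[ACGT]",
}


def pattern_to_regex(pattern):
    """Translate a '*'-delimited pattern into a regular expression."""
    parts = []
    for element in pattern.split("*"):
        alternatives = []
        for item in element.split(","):
            spacer = re.fullmatch(r"(\d+)-(\d+)", item)
            if spacer:
                # A range such as 8-12 means 8 to 12 bases of anything.
                alternatives.append(".{%s,%s}" % spacer.groups())
            else:
                alternatives.append("".join(IUPAC[ch] for ch in item.upper()))
        parts.append("(?:" + "|".join(alternatives) + ")")
    return "".join(parts)


regex = pattern_to_regex("GGAA,GATA*8-12*GGAA,GATA*8-12,18-22*CANNTG")
# GATA, 10 bases, GGAA, 20 bases, then an EBOX (CAGCTG).
seq = "TT" + "GATA" + "C" * 10 + "GGAA" + "T" * 20 + "CAGCTG" + "AA"
match = re.search(regex, seq)
print(match.start() + 1, match.end())  # 3 46
```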

Search using an NWM pattern.
This is similar to using an IUPAC pattern (above) in syntax,
except that instead of the IUPAC strings, NWM accession numbers
are used.

Select a threshold for an NWM search.
The threshold to use for an NWM search. Remember to
supply the percent (%) sign e.g. 90%.
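
How the percentage is applied is not described here. A common
convention for weight matrix searches, which may or may not be the
one TFBSsearch uses, is to rescale a site's score between the lowest
and highest scores the matrix can give and to compare that relative
score with the threshold. The Python sketch below illustrates this
convention with a made-up three-column matrix. The function names
parse_threshold and passes_threshold are hypothetical.

```python
def parse_threshold(text):
    """Turn a string such as '90%' into the fraction 0.9."""
    text = text.strip()
    if not text.endswith("%"):
        raise ValueError("threshold must end with '%', e.g. '90%'")
    return float(text[:-1]) / 100.0


def passes_threshold(matrix, site, threshold):
    """matrix is a list of per-position dicts mapping base -> score.
    Assumption: the threshold is relative to the matrix's score range."""
    score = sum(column[base] for column, base in zip(matrix, site))
    lowest = sum(min(column.values()) for column in matrix)
    highest = sum(max(column.values()) for column in matrix)
    if highest == lowest:
        return True  # a flat matrix scores every site the same
    relative = (score - lowest) / (highest - lowest)
    return relative >= parse_threshold(threshold)


# Toy matrix, invented for this example only.
toy_matrix = [
    {"A": 2, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 2, "G": 0, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 1},
]
print(passes_threshold(toy_matrix, "ACA", "90%"))  # True  (5 of 5)
print(passes_threshold(toy_matrix, "ACG", "90%"))  # False (4 of 5)
```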