Options

Positive and negative sample

Two Sample Logo calculates the statistical significance of differences in
position-specific relative symbol frequencies between two sets of aligned
sequences. For example, sequences that are known to share a sequence
motif may be locally aligned, including positions upstream or downstream
of the motif. All aligned sequences in both samples must be of the same
length, so shorter sequences should be padded with dash characters ("-").
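
As an illustration of the padding requirement, here is a minimal sketch that brings a set of sequences to a common length. The function name pad_sequences is hypothetical, and padding on the right is an assumption: in practice the dashes belong on whichever side keeps the motif aligned.

```python
def pad_sequences(seqs, pad_char="-"):
    """Right-pad every sequence with pad_char up to the longest length.

    Assumption: padding on the right is appropriate for this alignment.
    """
    width = max((len(s) for s in seqs), default=0)
    return [s.ljust(width, pad_char) for s in seqs]


padded = pad_sequences(["ACDEF", "ACD", "AC"])
```

After this step every sequence in the list has the same length, as both samples require.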

Sequences that contain a motif and at the same time have a certain
functional property (say, protein modification sites or transcription
factor binding regions) constitute a positive sample. Sequences
that contain the motif and at the same time do not have the functional
property constitute the negative sample. The distinction between
the samples does not necessarily have to be based on the presence and
absence of a functional property: as long as there is a clear way of
interpreting the data, any pair of sets of aligned sequences can be
used as positive and negative.

Sequences can be entered as flat files, or in FASTA
or ClustalW formats.

Sequence type

Either amino acid or nucleotide. If the amino acid option is selected, all
symbols other than the standard 20 amino acid single-letter codes will
be replaced with dashes and will not be part of the statistics. Likewise,
if the nucleotide option is selected, all symbols other than A, C, G,
T, and U will be replaced with dashes.
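
The replacement rule can be sketched as follows. This is only an illustration: the names ALPHABETS and mask_symbols are hypothetical, and the sketch assumes upper-case input, since the handling of lower-case letters is not described here.

```python
# Allowed symbols per sequence type (assumption: upper-case input only).
ALPHABETS = {
    "amino acid": set("ACDEFGHIKLMNPQRSTVWY"),
    "nucleotide": set("ACGTU"),
}


def mask_symbols(seq, seq_type="amino acid"):
    """Replace every symbol outside the chosen alphabet with a dash."""
    allowed = ALPHABETS[seq_type]
    return "".join(c if c in allowed else "-" for c in seq)


masked_protein = mask_symbols("ACXB-K")
masked_dna = mask_symbols("ACGUN", seq_type="nucleotide")
```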

Statistical tests

Two Sample Logo supports two types of statistical tests:

two sample t-test

binomial test

Both tests estimate the p-value of the null hypothesis that the positive
and negative samples were generated by the same distribution, but
they rely on different assumptions.

Two sample t-test

A frequently used statistical procedure that tests whether two samples
were generated by the same Gaussian distribution. The t-test assumes
that all observations are independent and that the standard deviations
of both samples are identical; under these assumptions it tests the
equality of the means (Hogg and Craig, 1994).
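
For illustration, the equal-variance (pooled) t statistic can be sketched with the standard library alone. How Two Sample Logo encodes symbols for this test is not specified here; the sketch assumes each sample is a list of 0/1 indicators for one symbol at one position, and the function name pooled_t_statistic is hypothetical. The p-value would then be read from a Student's t distribution with the returned degrees of freedom.

```python
import math


def pooled_t_statistic(x, y):
    """Return (t, degrees of freedom) for the equal-variance two sample t-test."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    # Sums of squared deviations from each sample mean.
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("both samples are constant; t is undefined")
    return (m1 - m2) / se, df


# Assumed 0/1 encoding: 1 where the symbol occurs, 0 elsewhere.
t, df = pooled_t_statistic([1, 1, 1, 0], [0, 0, 1, 0])
```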

Binomial test

Consider two 0-1 samples S_1 and S_2 of sizes n_1 and n_2, respectively,
in which symbol 1 occurred k_1 times in S_1 and k_2 times in S_2. Let
the test statistic be the absolute difference of the symbol's relative
frequencies, i.e. θ = |k_1/n_1 - k_2/n_2|.
The binomial test calculates the probability that a difference ≥ θ between
two samples of sizes n_1 and n_2, randomly drawn from the underlying null
distribution, could occur by chance alone. Since, according to the null
model, both samples are independent and identically distributed, an unbiased
estimate of the probability of success p of the underlying binomial
distribution is the relative frequency of the symbol when S_1 and S_2
are concatenated, i.e. p = (k_1 + k_2)/(n_1 + n_2). The achieved
significance level P of the null hypothesis is then the probability that
a difference ≥ θ will be observed between the estimated success
probabilities in two samples of sizes n_1 and n_2 randomly drawn from the
underlying distribution. Summing over all possible counts i and j of
symbol 1 in the two samples, it is calculated as:

$$P = \sum_{i=0}^{n_1} \sum_{j=0}^{n_2} \binom{n_1}{i} \binom{n_2}{j} \, p^{i+j} (1-p)^{n_1+n_2-i-j} \, I\left( \left| \frac{i}{n_1} - \frac{j}{n_2} \right| \ge \theta \right)$$

where I(.) equals 1 if its argument is true and 0 otherwise.
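
The calculation described above can be sketched by enumerating every pair of possible counts in the two samples. This is a minimal illustration rather than the tool's actual implementation: the function name binomial_two_sample_p and the small tolerance eps are assumptions, and the direct enumeration is only practical for small samples.

```python
from math import comb


def binomial_two_sample_p(k1, n1, k2, n2):
    """Probability of a frequency difference >= theta under the pooled null."""
    theta = abs(k1 / n1 - k2 / n2)
    p = (k1 + k2) / (n1 + n2)  # pooled estimate of the success probability
    # Binomial probabilities of seeing i (or j) successes in each sample.
    pmf1 = [comb(n1, i) * p**i * (1 - p) ** (n1 - i) for i in range(n1 + 1)]
    pmf2 = [comb(n2, j) * p**j * (1 - p) ** (n2 - j) for j in range(n2 + 1)]
    eps = 1e-12  # assumed tolerance so rounding does not drop exact ties
    total = 0.0
    for i, a in enumerate(pmf1):
        for j, b in enumerate(pmf2):
            if abs(i / n1 - j / n2) >= theta - eps:
                total += a * b
    return total


p_equal = binomial_two_sample_p(3, 10, 3, 10)
p_single = binomial_two_sample_p(1, 1, 0, 1)
```

When the two relative frequencies are equal, θ is 0 and every outcome qualifies, so the result is 1; larger observed differences give smaller values.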

P-value

The p-value is defined as the lowest significance level at which the null
hypothesis can be rejected. In the case of two sample logos, the null
hypothesis assumes that each symbol at each position in both samples is
generated according to the same probability distribution. Under the
null hypothesis, the p-value is calculated as the probability that a test
statistic as extreme as, or more extreme than, the one observed in the
original samples can occur by chance alone. Here, the test statistic is
the absolute value of the difference in relative frequencies between the
positive and negative samples. Since in most cases this probability cannot
be calculated exactly, the p-value is only approximated.

Show conserved residues

Because conserved residues are neither enriched nor depleted in the positive
sample relative to the negative sample (the difference of their relative
frequencies is zero), they are not displayed in the logo by default.
Checking this option forces the software to show conserved residues.

Fixed height symbols

When this option is checked, all enriched and depleted symbols have the
same height. When it is not checked, the height of each symbol is
proportional to the difference between the relative frequencies of the
corresponding residue at a given position in the positive and negative
samples.
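
To make the quantity behind the symbol heights concrete, here is a small sketch that computes, for every position, the difference in relative frequency of each symbol between the two samples. The function name frequency_differences is hypothetical, and two choices are assumptions: dashes are left out of the result, and frequencies are taken relative to the total number of sequences in each sample.

```python
from collections import Counter


def frequency_differences(positive, negative):
    """Per position, map each symbol to freq(positive) - freq(negative)."""
    length = len(positive[0])
    diffs = []
    for col in range(length):
        pos_counts = Counter(s[col] for s in positive)
        neg_counts = Counter(s[col] for s in negative)
        # Assumption: padding dashes are not drawn, so they are skipped.
        symbols = (set(pos_counts) | set(neg_counts)) - {"-"}
        diffs.append({
            sym: pos_counts[sym] / len(positive) - neg_counts[sym] / len(negative)
            for sym in symbols
        })
    return diffs


diffs = frequency_differences(["AC", "AT"], ["AC", "GC"])
```

Positive values correspond to symbols enriched in the positive sample and negative values to depleted ones.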

Bonferroni correction

A correction applied to the p-value when multiple hypotheses, whether
dependent or independent, are tested at once, so that the chance of at
least one false rejection stays bounded. See (Weisstein) for details.
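
In its standard form, the correction multiplies each p-value by the number of tests and caps the result at 1. The sketch below shows only that general rule. The function name bonferroni is hypothetical, and how Two Sample Logo counts the number of tests is not stated here, so the count is simply taken as the length of the input list.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)  # assumption: one test per supplied p-value
    return [min(1.0, p * m) for p in p_values]


adjusted = bonferroni([0.01, 0.04, 0.5])
```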

Advanced options

Title

Sets the title of the two sample logo.

Logo range

Limits the analysis to the specified columns in the samples of aligned
sequences.

First position index

Index assigned to the first symbol in the logo. For example, if the sample
is a 25-residue window centered on an active site, the first position
index should be set to -12: the active site will then have index 0, and
the last symbol will be indexed as +12. The default value is 1.
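
The indexing in this example can be checked with a short sketch. The function name position_labels is hypothetical, and consecutive numbering that includes 0 is assumed from the example above.

```python
def position_labels(length, first_index=1):
    """Return the index shown under each logo column, left to right."""
    return list(range(first_index, first_index + length))


labels = position_labels(25, first_index=-12)
```

With a 25-column window and a first index of -12, the middle column is labelled 0 and the last column 12.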