When the probabilistic consensus algorithm is used, it is possible to give
the expected number of errors in a particular consensus sequence. This is
obtained by summing the error rates of the individual consensus bases.
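
As a minimal sketch of this summation, assuming each consensus base carries a confidence value whose error rate is 10^(-confidence / 10.0) (the formula given in the next paragraph). The function name and the sample values are illustrative only and are not part of gap4:

```python
def expected_errors(confidences):
    """Sum the per-base error rates of a consensus sequence.

    Each confidence value c is assumed to imply an error
    rate of 10 ** (-c / 10.0).
    """
    return sum(10 ** (-c / 10.0) for c in confidences)


# Hypothetical confidence values for a six-base consensus.
example = [10, 20, 20, 30, 40, 40]
print(round(expected_errors(example), 4))
```

A base with confidence 10 contributes 0.1 expected errors on its own, so a few low-confidence bases dominate the total.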

Each confidence value has a known error rate determined by the formula
10^(-confidence / 10.0). We also know how often each confidence value
occurs in the consensus sequence and hence know the expected number of errors
for each confidence value. Working on the assumption that we are likely to
check and fix the consensus bases with the lowest confidence values first,
this allows us to give the cumulative number of errors that we would fix by
checking every consensus base with a confidence value at or below a
particular threshold.
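
The calculation described above can be sketched as follows. This is an illustration under stated assumptions, not gap4's own code: the function name, the column order and the sample data are all hypothetical.

```python
from collections import Counter


def confidence_table(confidences):
    """Build rows of (confidence, frequency, expected errors,
    cumulative frequency, cumulative errors), lowest confidence first."""
    counts = Counter(confidences)
    rows = []
    cum_freq = 0
    cum_err = 0.0
    for conf in sorted(counts):
        freq = counts[conf]
        # Expected errors for this confidence value:
        # frequency times the error rate 10^(-confidence / 10.0).
        err = freq * 10 ** (-conf / 10.0)
        cum_freq += freq
        cum_err += err
        rows.append((conf, freq, err, cum_freq, cum_err))
    return rows


# Hypothetical consensus confidence values.
sample = [5, 5, 10, 10, 10, 20, 30, 30]
for row in confidence_table(sample):
    print("%4d %6d %8.3f %6d %8.3f" % row)
```

Sorting by ascending confidence mirrors the assumption that the least reliable bases are checked first, so each cumulative entry covers every base at or below that confidence value.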

The List Confidence option, in the View menu, provides this ability.
The dialogue simply allows selection of one or more contigs. Pressing OK then
produces a table similar to the following:

The above table tells us that we have 164068 bases in our consensus sequences
with an expected 169 errors (giving us an average error rate of one in 971).
Next it lists each confidence value along with the frequency of this value and
the expected number of errors. For any particular confidence value the
cumulative columns tell us how many bases in the sequence have the same or a
lower confidence and how many errors are expected in those bases. From this
we can work out the expected error rate that would remain if all of these
bases were checked and all of their errors fixed.

In the above table we see that there are 790 bases with confidence values of
10 or less. We expect there to be 157 errors in those 790 bases. As we expect
there to be about 169 errors in total, this implies that manually checking
those 790 bases would leave only about 12 undetected errors. Given that the
sequence length is 164068 bases this means an average error rate of 1 in 14069.
Note that this error rate could be achieved by checking only 0.48% of the total
number of consensus bases. In this particular example, editing the same
sequence with a 100% consensus cutoff using either of the frequency-based
consensus methods would require checking 25165 bases (15.34%), although the
overall error rate would be better.
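
The arithmetic in this example can be reproduced with a short sketch. The function name is hypothetical, and the inputs are the rounded figures quoted above. With those rounded figures the remaining error rate comes out at about 1 in 13672 rather than 1 in 14069; the quoted value corresponds to roughly 11.66 remaining errors, so it was presumably computed from unrounded error counts.

```python
def residual_error_rate(total_bases, total_errors, fixed_errors):
    """Return (remaining errors, bases per remaining error) after the
    expected errors in the checked bases have all been fixed."""
    remaining = total_errors - fixed_errors
    return remaining, total_bases / remaining


total_bases = 164068
total_errors = 169      # expected errors before any checking
checked_bases = 790     # bases with confidence 10 or less
checked_errors = 157    # expected errors among those bases

# Error rate before checking: about one in 971.
print(round(total_bases / total_errors))

# Percentage of consensus bases that need checking: about 0.48.
print(round(100.0 * checked_bases / total_bases, 2))

# Remaining errors and bases per error, using the rounded inputs.
remaining, bases_per_error = residual_error_rate(
    total_bases, total_errors, checked_errors)
print(remaining, round(bases_per_error))

# Percentage checked with the 100% consensus cutoff: about 15.34.
print(round(100.0 * 25165 / total_bases, 2))
```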

This page is maintained by
James Bonfield.
Last generated on 2 February 1999.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/gap4_122.html