Phred Quality Values and ABI 3700 Data

We are checking the accuracy of the quality values that phred assigns
to ABI 3700 chromatograms when it uses quality value lookup tables
calibrated for ABI 373/377 chromatograms. We used phred version 990722
for the tests described here.

Our goal is to determine the accuracy of the phred quality values
on 3700 data using the current lookup tables in phred and whether
we must modify phred for these data.

We obtained chromatograms and consensus sequences for finished human BAC
projects from the Genome Sequencing Center at Washington University in
Saint Louis, selecting projects with at least 10% ABI 3700 chromatograms.
These projects contain 3700 chromatograms generated using dye primer and
dye terminator chemistries with the POP5 matrix and dye terminator
chemistry with the POP6 matrix. Essentially standard run conditions were
used to generate these data, except for the dye terminator POP6 samples,
which were run at 37 °C. Table 1 summarizes the quantities of aligned
reads and bases that we used for this work.

Table 1. Aligned Reads and Bases

chemistry    matrix   number of projects   number of reads   number of bases
primer       POP5                      9              5767           3477864
terminator   POP5                     18             10177           6354554
terminator   POP6                     12              8274           4354631

Methods

We performed the following procedures to process the data for these tests.

1. Run phred on the appropriate chromatograms to call bases
and assign quality values.

2. Run cross_match to mask vector sequence in the reads of each project.

3. Run cross_match to compare the reads to the finished sequence of
each project.

4. Identify high-quality discrepancies, those with quality values of at
least 40, and remove or alter them where the apparent reason for the
discrepancy warrants it. For example, the discrepancy may be due to a
spurious alignment of a contaminant read or misidentification of the
location of a deletion in a mononucleotide run of bases.

5. Bin the aligned and discrepant bases by their phred quality
values (predicted quality values) and calculate the 'observed' quality value
of each bin for each chemistry-matrix combination.

6. Plot the observed quality values against the phred quality values.
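
The binning step can be illustrated with a minimal sketch. It assumes
that the observed quality value of a bin is -10 * log10(discrepant bases /
aligned bases), which follows the standard phred definition of a quality
value; the exact formula and any small-sample correction used for these
tests are not stated here. The function name and the input layout (one
(quality, is_discrepant) pair per aligned base) are hypothetical.

```python
import math
from collections import defaultdict


def observed_quality_by_bin(bases):
    """Return {phred_quality: observed_quality} for aligned bases.

    `bases` is an iterable of (phred_quality, is_discrepant) pairs, one per
    aligned base. This layout is an assumption made for illustration.
    """
    totals = defaultdict(int)   # aligned bases per phred quality value
    errors = defaultdict(int)   # discrepant bases per phred quality value
    for quality, is_discrepant in bases:
        totals[quality] += 1
        if is_discrepant:
            errors[quality] += 1

    observed = {}
    for quality in sorted(totals):
        if errors[quality] == 0:
            # No discrepancies in this bin: the observed value is undefined.
            observed[quality] = None
        else:
            error_rate = errors[quality] / totals[quality]
            observed[quality] = -10.0 * math.log10(error_rate)
    return observed
```

Plotting the returned values against their keys gives the kind of accuracy
plot described in the last step; points near the diagonal indicate bins in
which the predicted and observed quality values agree.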

Results

Dye Primer POP5
The dye primer POP5 quality value accuracy plot shows good quality value
accuracy up to about quality value 25. For larger
phred quality values, the observed quality values are progressively
lower, meaning that phred underestimates the error rates. We examined
discrepancies with assigned quality values of 40 and higher and found a
greater tendency to form compressions in comparison to slab gel runs. Many
of the additional compressions have stem/loop motifs that are not a problem
with slab gels. We consider the number of aligned bases used in this test
to be marginal and hope to obtain additional data in the near future to
improve our confidence in the result.

Dye Terminator POP5
The dye terminator POP5 quality value accuracy plot shows consistently
good agreement between the phred and observed
quality values up to quality value 30. For higher phred quality values,
the observed quality values vary around the phred values without an
apparent trend, suggesting that the variations are due to statistical
fluctuations resulting from the relatively small number of aligned bases
used for the test.
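
To see why small bins produce this kind of scatter, the sketch below
estimates how many discrepancies a bin is expected to contain and the
approximate one-sigma range of the observed quality value, assuming the
discrepancy count follows Poisson counting statistics. The function name
and the bin sizes in the usage note are hypothetical and are not taken
from these data sets.

```python
import math


def observed_quality_range(n_bases, phred_quality):
    """Return (expected_errors, q_low, q_high) for one quality bin.

    Assumes the phred error probability 10 ** (-Q / 10) is correct and that
    the error count has a Poisson spread of sqrt(expected_errors).
    """
    expected = n_bases * 10.0 ** (-phred_quality / 10.0)
    sigma = math.sqrt(expected)

    # More errors than expected gives a lower observed quality value.
    q_low = -10.0 * math.log10((expected + sigma) / n_bases)

    # Fewer errors gives a higher value; with no errors it is unbounded.
    fewer = expected - sigma
    q_high = -10.0 * math.log10(fewer / n_bases) if fewer > 0 else math.inf
    return expected, q_low, q_high
```

For a hypothetical bin of 100,000 bases at quality value 40, only about 10
discrepancies are expected, so the observed value can plausibly fall a
point or two to either side of 40. For a bin of 1,000 bases the expected
count is about 0.1 and the observed value is essentially unconstrained.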

Dye Terminator POP6
The dye terminator POP6 quality value accuracy plot shows consistently
good agreement between the phred and observed
quality values up to and slightly above quality value 30. For higher
phred quality values, the observed quality values again vary
around the phred values without a clear trend, suggesting that the
variations are due to the relatively small number of aligned bases.
We need more data to confirm this.

Conclusions

Based on these limited data sets, it appears that the current phred
version (990722) assigns quality values with good accuracy up to quality value
25 for all tested dye chemistry/matrix run combinations. For dye primer
chemistry run in the POP5 matrix, the phred quality values above 25
show a trend of progressively overestimating the quality. This trend appears
to be due to a greater tendency of the strands to form compressions during the
electrophoresis in comparison to slab gel runs, suggesting that we will
need to modify phred to recognize a greater range of stem/loop
motifs, and possibly create a quality value lookup table specifically for
these data.
For dye terminator chemistry run on the POP5 and POP6 matrices,
the phred quality values maintain good accuracy up to about quality
value 30. Between phred quality values 30 and 40, the observed
quality values exhibit modest, apparently random, variation around the
phred quality values; the variation increases above quality value
40. This indicates that the phred quality values are generally
valid for these dye terminator data but we need additional data to improve
our confidence in the tests.