Oliver Serang, Michael J. MacCoss and William Stafford Noble

Journal of Proteome Research. 9(10):5346-5357, 2010.

Abstract

The problem of identifying proteins from a shotgun proteomics
experiment has not been definitively solved. Identifying the proteins
in a sample requires ranking them, ideally with interpretable
scores. In particular, “degenerate” peptides, which map to multiple
proteins, have made such a ranking difficult to compute. The problem
of computing posterior probabilities for the proteins, which can be
interpreted as confidence in a protein’s presence, has been especially
daunting. Previous approaches have ignored the peptide
degeneracy problem completely, addressed it by computing a heuristic
set of proteins or heuristic posterior probabilities, or estimated the
posterior probabilities with sampling methods. We present a
probabilistic model for protein identification in tandem mass
spectrometry that recognizes peptide degeneracy. We then introduce
graph-transforming algorithms that facilitate efficient computation of
protein probabilities, even for large data sets. We evaluate our
identification procedure on five different well-characterized data
sets and demonstrate our ability to efficiently compute high-quality
protein posteriors.
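
To make the notion of peptide degeneracy concrete, the following minimal
sketch builds a toy peptide-to-protein mapping, lists the degenerate
peptides, and groups proteins that are linked through shared peptides.
The peptide and protein names and both helper functions are hypothetical
and chosen only for illustration. The grouping step is a generic
connected-components computation, not the graph-transforming algorithms
introduced in the paper; it only shows why degenerate peptides prevent
proteins from being scored independently.

```python
from collections import defaultdict

# Hypothetical toy data: each peptide maps to the proteins that contain it.
peptide_to_proteins = {
    "PEPTIDEA": {"P1"},
    "PEPTIDEB": {"P1", "P2"},   # degenerate: shared by P1 and P2
    "PEPTIDEC": {"P2", "P3"},   # degenerate: shared by P2 and P3
    "PEPTIDED": {"P4"},
}


def degenerate_peptides(mapping):
    """Return the peptides that map to more than one protein."""
    return sorted(pep for pep, prots in mapping.items() if len(prots) > 1)


def connected_protein_groups(mapping):
    """Group proteins linked, directly or transitively, by shared peptides."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for prots in mapping.values():
        ordered = sorted(prots)
        if not ordered:
            continue
        find(ordered[0])  # register proteins that share no peptides
        for other in ordered[1:]:
            union(ordered[0], other)

    groups = defaultdict(set)
    for prot in parent:
        groups[find(prot)].add(prot)
    return sorted(sorted(group) for group in groups.values())


print(degenerate_peptides(peptide_to_proteins))
print(connected_protein_groups(peptide_to_proteins))
```

In this toy example, P1, P2, and P3 fall into one group because the two
degenerate peptides chain them together, so their posteriors would have
to be computed jointly, whereas P4 can be treated on its own.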