Research Talk: Touring Protein Space with Matt

Abstract

The central dogma of molecular biology holds that genetic sequence
determines protein (amino acid) sequence, and protein sequence in
turn determines three-dimensional structure and ultimately biological
function. Thus, differences in protein structure lead to differences
in function. However, while differences in sequence ultimately lead
to differences in structure, they do not do so at a uniform rate:
structure is generally more conserved than sequence, so distantly
related proteins can share a similar fold despite little detectable
sequence similarity. For this reason, it is useful to understand the
evolutionary relationships among proteins in terms of their
structures, not their sequences alone.

Since 1995, biologists have organized known proteins into a
hierarchy according to the three-dimensional structure they form.
Structural Classification of Proteins (SCOP) [Murzin et al. 1995] is
one such manually curated hierarchy. However, manual curation has
made SCOP a serious bottleneck: it cannot keep up with the flood of
newly solved protein structures.
Automated approaches to organizing protein space by structure, such
as Families of Structurally Similar Proteins (FSSP) [Holm et al.
1992], have been proposed. However, such attempts have not yet
satisfied biologists.

We believe that advances in structural alignment tools, in
particular Multiple Alignment with Translations and Twists (Matt)
[Menke et al. 2008], allow us to organize protein space
automatically in a manner consistent with SCOP, and thus with the
needs of biologists. We present Touring Protein Space with Matt, an
automated clustering of the space of solved protein structures that
uses the Matt alignment tool and a distance metric trained on a
subset of SCOP. Our hierarchy of protein structures agrees well with
SCOP at the family level (corresponding to a high degree of
similarity among structures) and the superfamily level (a lesser
degree of similarity). We hope that this automated hierarchy will
relieve the SCOP curation bottleneck while providing a useful tool
for understanding evolutionarily related protein structures, as well
as a benchmark set for improving the quality of sequence alignment
tools.