Motif-based protein ranking by network propagation

R. Kuang, J. Weston, W. S. Noble and C. Leslie

Bioinformatics. 21(19):3711-3718, 2005.

Abstract

Sequence similarity often suggests evolutionary relationships
between protein sequences that can be important for inferring
similarity of structure or function. The most widely used homology
detection tools, such as BLAST and PSI-BLAST, are pairwise sequence
comparison algorithms that use sequence-sequence or profile-sequence
alignments to return a ranked list of sequences similar to a query.
However, these methods often fail to detect less conserved,
remotely related targets.

In this paper, we propose a new general graph-based propagation
algorithm called MotifProp to detect more subtle similarity
relationships than pairwise comparison methods. MotifProp is based on
a protein-motif network, in which edges connect proteins and
the k-mer-based motif features that they contain. We show that our
new motif-based propagation algorithm can improve ranking over a base
algorithm, such as PSI-BLAST, that is used to initialize the
ranking. Despite the complex structure of the protein-motif network,
MotifProp is an easily interpretable approach. The activation scores
that MotifProp assigns to motif nodes can be used to retrieve the
motifs most important to the ranking, which provides a natural motif
selection method for discovering conserved structural components in
remote homologies. We can also map the activation scores of motif
nodes onto the query sequence to extract motif-rich regions, and by
comparing these regions with PDB annotations, we find that these
propagation-induced motif-rich regions contain meaningful structural
and functional information.
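
The propagation idea described above can be illustrated with a minimal
sketch. It is not the published MotifProp implementation: exact k-mers
stand in for the k-mer-based motif features, the averaging update rule
and the damping parameter alpha are assumptions made for illustration,
and the toy sequences and initial scores are invented (in practice the
initial scores would come from a base algorithm such as PSI-BLAST). All
function names are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_network(proteins, k):
    """Build the bipartite network: protein -> k-mers and k-mer -> proteins."""
    protein_to_motifs = {name: kmers(seq, k) for name, seq in proteins.items()}
    motif_to_proteins = defaultdict(set)
    for name, motifs in protein_to_motifs.items():
        for motif in motifs:
            motif_to_proteins[motif].add(name)
    return protein_to_motifs, dict(motif_to_proteins)


def propagate(protein_to_motifs, motif_to_proteins, initial, alpha=0.5, iterations=20):
    """Alternate activation between protein nodes and motif nodes.

    Assumed update rule (illustrative only): a motif takes the mean
    activation of the proteins containing it; a protein takes its initial
    score plus alpha times the mean activation of its motifs.
    """
    protein_act = dict(initial)
    motif_act = {}
    for _ in range(iterations):
        motif_act = {
            m: sum(protein_act[p] for p in ps) / len(ps)
            for m, ps in motif_to_proteins.items()
        }
        protein_act = {
            p: initial[p] + alpha * (sum(motif_act[m] for m in ms) / len(ms) if ms else 0.0)
            for p, ms in protein_to_motifs.items()
        }
    return protein_act, motif_act


def position_scores(seq, motif_act, k):
    """Map motif activations onto a sequence: each position accumulates
    the activation of every k-mer occurrence that covers it."""
    scores = [0.0] * len(seq)
    for i in range(len(seq) - k + 1):
        activation = motif_act.get(seq[i:i + k], 0.0)
        for j in range(i, i + k):
            scores[j] += activation
    return scores


# Invented toy data: the base ranking scores "remote" and "unrelated" as 0.
K = 3
proteins = {
    "query": "MKVLAAGIVG",
    "close": "MKVLAAGLTP",
    "remote": "QQAAGIVGSS",
    "unrelated": "WWWWPPPPHH",
}
initial = {"query": 1.0, "close": 0.6, "remote": 0.0, "unrelated": 0.0}

protein_to_motifs, motif_to_proteins = build_network(proteins, K)
protein_act, motif_act = propagate(protein_to_motifs, motif_to_proteins, initial)

ranking = sorted(protein_act, key=protein_act.get, reverse=True)
top_motifs = sorted(motif_act, key=motif_act.get, reverse=True)[:5]
query_profile = position_scores(proteins["query"], motif_act, K)

print(ranking)
print(top_motifs)
```

In this toy setting, the protein named "remote" starts with the same
initial score as "unrelated" but ends up ranked above it, because it
shares k-mers with the query and activation reaches it through those
shared motif nodes. The per-position profile returned by
position_scores is one simple way to look for motif-rich regions of the
query under these assumptions.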