Protein ranking by semi-supervised network propagation

Jason Weston, Rui Kuang, Christina Leslie, William Stafford Noble

BMC Bioinformatics. 7(Suppl 1):S10, 2006.

Abstract

Background: Biologists regularly search DNA or protein
databases for sequences that share an evolutionary or functional
relationship with a given query sequence. Traditional search methods,
such as BLAST and PSI-BLAST, focus on detecting statistically
significant pairwise sequence alignments and often miss more subtle
sequence similarity. Recent work in the machine learning community has
shown that exploiting the global structure of the network defined by
these pairwise similarities can help detect more remote relationships
than a purely local measure.

Methods: We review RankProp, a ranking algorithm that exploits
the global network structure of similarity relationships among
proteins in a database by performing a diffusion operation on a
protein similarity network with weighted edges. The original RankProp
algorithm is unsupervised. Here, we describe a semi-supervised
version of the algorithm that uses labeled examples. Three possible
ways of incorporating label information are considered: (i) as a
validation set for model selection, (ii) to learn a new network by
choosing which transfer function to use for a given query, and (iii)
to estimate edge weights, which measure the probability of inferring
structural similarity.
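
As a rough illustration of what a diffusion operation on a weighted
similarity network can look like, the sketch below spreads activation
from a query protein across a toy graph. It is a generic propagation
scheme written for this example, not necessarily the published RankProp
update rule: the function name propagate, the damping parameter alpha,
the normalization of outgoing weights, and the four-node graph are all
assumptions made for illustration.

```python
def propagate(weights, query, alpha=0.9, iterations=100):
    """Diffuse activation from `query` over a weighted directed graph.

    weights[j][i] is a hypothetical similarity weight on the edge j -> i.
    Returns a dict mapping each node to its activation score.
    """
    # Collect every node that appears as a source or a target.
    nodes = set(weights)
    for targets in weights.values():
        nodes.update(targets)

    # Assumption: normalise each node's outgoing weights to sum to 1,
    # so that repeated propagation with alpha < 1 stays bounded.
    norm = {}
    for src in nodes:
        out = weights.get(src, {})
        total = sum(out.values())
        norm[src] = {dst: w / total for dst, w in out.items()} if total > 0 else {}

    scores = {n: 0.0 for n in nodes}
    scores[query] = 1.0

    for _ in range(iterations):
        updated = {}
        for i in nodes:
            if i == query:
                updated[i] = 1.0  # the query keeps a fixed activation
                continue
            # Direct evidence: the edge from the query to node i.
            direct = norm[query].get(i, 0.0)
            # Indirect evidence: activation arriving via other nodes.
            indirect = sum(norm[j].get(i, 0.0) * scores[j]
                           for j in nodes if j != query)
            updated[i] = direct + alpha * indirect
        scores = updated
    return scores


# Toy network: q is strongly similar to a and weakly similar to c;
# b has no direct edge from q and is reachable only through a.
toy_weights = {
    "q": {"a": 0.9, "c": 0.1},
    "a": {"b": 0.8},
}
scores = propagate(toy_weights, "q")
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy network, b ends up ranked above c even though only c has a
direct edge from the query, which is the kind of remote relationship
that a purely local, pairwise measure would not surface.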

Results: Benchmarked on a human-curated database of protein
structures, the original RankProp algorithm provides significant
improvement over local network search algorithms such as PSI-BLAST.
Furthermore, we show here that labeled data can be used to learn a
network without any need for estimating parameters of the transfer
function, and that diffusion on this learned network produces better
results than the original RankProp algorithm with a fixed network.

Conclusion: To gain maximal information from a
network, labeled and unlabeled data should be used to extract both
local and global structure.