Kernel methods for predicting protein-protein interactions

Asa Ben-Hur and William Stafford Noble

Abstract

Despite advances in high-throughput methods for discovering
protein-protein interactions, the interaction networks of even
well-studied model organisms are sketchy at best, highlighting the
continued need for computational methods to help direct
experimentalists in the search for novel interactions.
We present a kernel method for predicting protein-protein interactions
using a combination of data sources, including protein sequences, Gene
Ontology annotations, local properties of the network, and homologous
interactions in other species. Whereas protein kernels proposed in
the literature provide a similarity between single proteins,
prediction of interactions requires a kernel between pairs of
proteins. We propose a pairwise kernel that converts a kernel
between single proteins into a kernel between pairs of proteins, and
we illustrate the kernel's effectiveness in conjunction with a support
vector machine classifier. Furthermore, we obtain improved
performance by combining several sequence kernels based on k-mer
frequency, motif content, and domain content, and by further
augmenting the pairwise sequence kernel with features derived from
other sources of data.
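
As a rough illustration of the pairwise idea, the sketch below builds a
kernel on protein pairs from a kernel on single proteins. The abstract
does not state the functional form, so the symmetrized product
K((a, b), (c, d)) = K(a, c) K(b, d) + K(a, d) K(b, c) used here is an
assumption, chosen because it is a valid kernel and does not depend on
the order of the proteins within a pair. The toy k-mer kernel and the
function names are likewise hypothetical stand-ins, not the authors'
implementation.

```python
from collections import Counter


def spectrum_kernel(a: str, b: str, k: int = 2) -> float:
    """Toy k-mer kernel: dot product of the k-mer count vectors of a and b."""
    ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    # Counter returns 0 for k-mers absent from b, so iterating over ca suffices.
    return float(sum(ca[m] * cb[m] for m in ca))


def pairwise_kernel(pair1, pair2, base=spectrum_kernel) -> float:
    """Assumed pairwise form: K(a, c) K(b, d) + K(a, d) K(b, c).

    Both ways of matching the two proteins of one pair to the two
    proteins of the other are summed, so (a, b) and (b, a) are
    treated as the same pair.
    """
    a, b = pair1
    c, d = pair2
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)


def combined_kernel(pair1, pair2, bases) -> float:
    """Sum of pairwise kernels over several single-protein kernels.

    A sum of kernels is itself a kernel, which is one simple way to
    combine several sequence-based similarities.
    """
    return sum(pairwise_kernel(pair1, pair2, base=b) for b in bases)
```

The resulting values can be assembled into a Gram matrix over all
training pairs and passed to any support vector machine implementation
that accepts a precomputed kernel.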

We apply our method to predict physical interactions in yeast using
data from the BIND database. At a false positive rate of 1%, the
classifier retrieves close to 80% of a set of trusted interactions.
We thus demonstrate the ability of our method to make accurate
predictions despite the sizeable fraction of false positives that are
known to exist in interaction databases.
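
For readers unfamiliar with this kind of figure, the sketch below shows
one way to compute the fraction of positives retrieved at a fixed false
positive rate from classifier scores. The function name, the 0/1 label
encoding, and the tie handling are assumptions made for illustration;
this is not the evaluation code behind the numbers above.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.01):
    """Largest true positive rate reachable while keeping FPR <= max_fpr.

    scores: higher means more likely to interact.
    labels: 1 for a positive pair, 0 for a negative pair.
    Assumes at least one positive and one negative example.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    # Walk down the ranking from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        # Examples with equal scores cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / n_neg > max_fpr:
            break  # lowering the threshold further exceeds the FPR budget
        best = tp / n_pos
        i = j
    return best
```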