Struct2Net

Abstract --

Struct2Net is a web server for predicting interactions between arbitrary protein pairs using a structure-based approach.

Prediction of protein-protein interactions (PPIs) is a central area of interest, and successful prediction would provide leads for experiments and drug design. According to the manufacturer, however, experimental coverage of the PPI interactome remains inadequate.

The manufacturer believes that Struct2Net is one of the first community-wide resources to provide structure-based PPI predictions that go beyond homology modeling.

Most web resources for predicting PPIs currently rely on functional genomic data (e.g., GO annotation, gene expression, cellular localization). The manufacturer’s structure-based approach is independent of such data and requires only the sequences of the proteins being queried.

For the most commonly studied organisms (fly, human and yeast), predictions have been pre-computed and can be retrieved almost instantaneously.

For proteins from other species, users have the option of getting a quick-but-approximate result (using orthology over pre-computed results) or having a full-blown computation performed.

Struct2Net structure-based computational predictions --

Struct2Net is a server for structure-based computational predictions of protein-protein interactions (PPIs). Each prediction is also assigned a “score”, indicating the confidence in that prediction.

The predictions may be used for proteins that are not well covered by experimental PPI datasets, or to shortlist the potential interactions to be validated experimentally.

Alternatively, Struct2Net can be used as a genome-scale data source that is independent of most functional genomic data [e.g. co-expression, co-localization, Gene Ontology (GO) description] and can be integrated with such data.

How Struct2Net works --

Given two (2) protein sequences, the structure-based interaction prediction technique threads these two (2) sequences onto all the protein complexes in the Protein Data Bank (PDB) and then chooses the best potential match.

Based on this match, the method generates alignment scores, z-scores, and an interfacial energy for the sequence pair.

Logistic regression is then used to evaluate whether a set of scores corresponds to an interaction or not. The algorithm is also extended to find all potential partners given a single protein sequence.
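As an illustration of the logistic-regression step described above, the following minimal Python sketch combines three threading-derived numbers into a single score and ranks candidate partners for one query protein. The feature names, the coefficient values, the sign convention for the interfacial energy, and the helper names `interaction_score` and `rank_partners` are all assumptions made for this example; they are not the fitted model or the code used by Struct2Net.

```python
import math

# Hypothetical coefficients for illustration only; these are NOT the
# fitted values used by Struct2Net.
WEIGHTS = {
    "alignment_score": 0.02,
    "z_score": 0.5,
    "interfacial_energy": -0.1,  # assumption: more negative energy is more favorable
}
BIAS = -3.0


def interaction_score(features):
    """Map a dict of threading-derived features to a score in (0, 1)."""
    linear = BIAS + sum(w * features[name] for name, w in WEIGHTS.items())
    # Logistic (sigmoid) function turns the linear combination into a confidence.
    return 1.0 / (1.0 + math.exp(-linear))


def rank_partners(candidates):
    """Score every candidate partner and return (name, score) pairs, best first."""
    scored = [(name, interaction_score(f)) for name, f in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up feature values for two hypothetical candidate partners.
candidates = {
    "partner_A": {"alignment_score": 120.0, "z_score": 4.0, "interfacial_energy": -15.0},
    "partner_B": {"alignment_score": 40.0, "z_score": 1.0, "interfacial_energy": -2.0},
}
ranking = rank_partners(candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Ranking all candidates by score is one simple way to picture the single-sequence mode, in which every potential partner of a query protein is evaluated.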

Note: Further details about this method are described in the following paper:

There has been significant interest within the systems biology community in computationally predicted PPIs, partly because experimental PPI data remain relatively noisy and limited in coverage.

The manufacturers believe that the structure-based PPI predictions provided by Struct2Net are not available anywhere else.

The value of this method, the manufacturers believe, lies in the effort they have put into identifying high-confidence positive and negative examples of PPIs as inputs to machine learning algorithms and the extensive computational effort involved in making each prediction.

The method is independent of other computational approaches to predicting PPIs. You can use the predictions made here for proteins that are not well covered by experimental PPI datasets.

Alternatively, you can combine these predictions with your own predictions (using, say, gene co-expression) to achieve better sensitivity and specificity (as stated above…).

About the confidence score threshold used in Struct2Net --

The minimum score threshold for inclusion of a PPI in the manufacturer’s database corresponded to roughly 80% specificity on the test set.

The manufacturers believe that a lower specificity than this would not be useful.

You can choose an even higher threshold, to further increase specificity (i.e., reduce false positives). The manufacturers suggest two (2) ways to choose such a threshold.

One is to aim for a certain degree of sensitivity and specificity. The ‘Excel spreadsheet file’ (accessible on the manufacturer’s web-site, via the ‘About page’) describes sensitivity and specificity as a function of the chosen threshold for the test set.

For human PPIs, the manufacturers suggest choosing a higher threshold than for yeast or fly. The reason is that human proteins are over-represented in the PDB; because the manufacturer’s approach uses templates based on these structures, its scores for human PPIs are typically somewhat better.
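The first approach can be sketched in a few lines of Python. The example below computes sensitivity and specificity at a given threshold from a labeled test set and then finds the lowest threshold that reaches a target specificity. The scores and labels are made-up toy values, and the function names are hypothetical; the real figures are those in the manufacturer’s spreadsheet.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when pairs scoring >= threshold are called interactions."""
    tp = fn = tn = fp = 0
    for score, is_interaction in zip(scores, labels):
        predicted = score >= threshold
        if is_interaction and predicted:
            tp += 1
        elif is_interaction:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def lowest_threshold_for_specificity(scores, labels, target):
    """Return the smallest observed score that achieves the target specificity, or None."""
    for candidate in sorted(set(scores)):
        _, specificity = sensitivity_specificity(scores, labels, candidate)
        if specificity >= target:
            return candidate
    return None


# Toy test set (assumption): 1 marks a known interaction, 0 a known non-interaction.
test_scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
test_labels = [0, 0, 1, 0, 1, 1]

chosen = lowest_threshold_for_specificity(test_scores, test_labels, target=0.8)
print(chosen, sensitivity_specificity(test_scores, test_labels, chosen))
```

Raising the threshold trades sensitivity for specificity, which is why a stricter cut-off for human PPIs reduces false positives at the cost of missing some true interactions.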

The second way to choose a threshold would be to start with the total number of PPIs you expect to see for the species.

You can use the number of experimentally known PPIs to calibrate your assumption (roughly, 49,000 for yeast, 23,000 for fly and 27,000 for human).

Then, using the ‘tables file’ (accessible on the manufacturer’s web-site, via the ‘About page’), you can choose a threshold that’s likely to give you the expected number of PPIs.
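The second approach amounts to picking the score of the k-th highest-ranked prediction, where k is the number of PPIs you expect. A minimal Python sketch follows, assuming you have the predicted scores for a species in a list; the function name and the toy scores are hypothetical, and the manufacturer’s tables file remains the authoritative source for the actual counts at each threshold.

```python
def threshold_for_expected_count(scores, expected_count):
    """Return the score cut-off that keeps roughly `expected_count` top predictions."""
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores, reverse=True)
    # If fewer predictions exist than expected, keep them all.
    index = min(expected_count, len(ranked)) - 1
    return ranked[index]


# Toy predicted scores (assumption), all above the database's minimum threshold.
predicted_scores = [0.95, 0.81, 0.90, 0.82, 0.85]

cutoff = threshold_for_expected_count(predicted_scores, expected_count=3)
kept = [s for s in predicted_scores if s >= cutoff]
print(cutoff, len(kept))
```

Tied scores at the cut-off can make the number of retained predictions slightly larger than the requested count.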

Evaluation of the Prediction Algorithm for Struct2Net --

Given a pair of proteins, the first stage of the manufacturer’s algorithm uses a structure-based threading approach to compute a set of numbers that express the quality of the putative complex structure formed by the two (2) proteins’ interaction.

The second stage of the algorithm uses logistic regression to integrate these various measures and compute a single score (between 0 and 1).

A score of 0 indicates minimal confidence in the possibility of an interaction between the two (2) proteins while a score of 1 indicates maximum confidence.

The manufacturer’s description of the training of the logistic regression predictor and its evaluation is also located on the manufacturer’s web-site, on the ‘About page’.

Note: Further info about Struct2Net is described in the following paper: