This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Torque is a tool for cross-species querying of protein–protein interaction networks. It aims to answer the following question: given a set of proteins constituting a known complex or a pathway in one species, can a similar complex or pathway be found in the protein network of another species? To this end, Torque seeks a matching set of proteins that are sequence similar to the query proteins and span a connected region of the target network, while allowing for both insertions and deletions. Unlike existing approaches, Torque does not require knowledge of the interconnections among the query proteins. It can handle large queries of up to 25 proteins. The Torque web server is freely available for use at http://www.cs.tau.ac.il/~bnet/torque.html.

INTRODUCTION

In a network querying problem, one is given a small network, corresponding to a known pathway or a complex of interest. The goal is to identify similar instances in a large network, where similarity is measured in terms of sequence or interaction patterns. The resulting matches constitute possible protein complexes in the queried species.

Previous approaches to the query problem required precise information on the interaction pattern of the query and were usually limited to small queries (2–4 proteins). PathBLAST—a server for querying linear pathways within a protein–protein interaction (PPI) network (1)—was subsequently extended to allow searching for more general structures (2). A general framework for subnetwork querying was developed (3), but it is applicable only to very small queries due to its complexity. Two other methods are NetMatch, a Cytoscape (4) plugin implementing the work of Ferro et al. (5) that utilizes fast heuristics for subgraph isomorphism to identify approximate matches of queries within a collection of networks, and NetGrep (6), a system for searching networks for patterns corresponding to small sets of proteins with specified attributes and topology.

The Torque server implements a novel method for querying protein networks that does not require information on the interconnections (topology) among the query proteins (7) (Figure 1). This makes Torque applicable in broader scenarios, such as querying complexes or pathways whose topologies are not completely known, or even when querying from species for which PPI information is not available. Lacroix et al. (8) also studied queries with no topology information, but since their method is enumerative it was applied to very small queries (2–4 proteins). In contrast, Torque can currently support queries of up to 25 proteins. It was tested extensively on hundreds of queries of known complexes from a variety of species (7). It was shown to yield far more matches than the QNet topology-based approach (2), while providing results that are highly functionally coherent.

Figure 1. An example of a Torque query. (A) The query proteins; (B) the queried network. Colored vertices in the network match nodes of the same color in the query. Non-colored vertices do not match any query elements. The network induced by the vertices labeled...

IMPLEMENTATION AND FEATURES

The Torque web server implements the algorithms in (7) for querying protein sets across species. It combines three approaches: a dynamic programming method utilizing color coding, integer linear programming and a fast heuristic based on shortest paths. Torque automatically selects the best method to apply at each stage and outputs the highest scoring match. Scores are based on the underlying network structure, on interaction confidence values and on sequence similarities between matching proteins. The matching process is flexible, allowing a few insertions and deletions if needed. The server currently supports queries of size 4–25 (Figure 1).
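The matching problem itself can be illustrated with a small exhaustive search. The sketch below is not one of the three methods Torque combines (see (7) for those); it simply enumerates, for a toy input, every assignment of one sequence-similar target protein to each query protein and keeps the assignments whose target proteins induce a connected subgraph. Insertions, deletions and scoring are omitted, and all names (`is_connected`, `brute_force_matches`, the example IDs) are hypothetical.

```python
from itertools import product


def is_connected(nodes, adjacency):
    """Return True if `nodes` induce a connected subgraph of the network."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbour in adjacency.get(current, ()):
            if neighbour in nodes and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == nodes


def brute_force_matches(query, homologs, adjacency):
    """Enumerate topology-free matches without insertions or deletions.

    query     -- list of query protein IDs
    homologs  -- dict: query protein -> set of sequence-similar target proteins
    adjacency -- dict: target protein -> set of interacting target proteins
    The search is exponential in the query size, so it only suits toy inputs.
    """
    matches = []
    for choice in product(*(sorted(homologs[q]) for q in query)):
        # Each target protein may match at most one query protein.
        if len(set(choice)) == len(query) and is_connected(choice, adjacency):
            matches.append(dict(zip(query, choice)))
    return matches


# Toy data (hypothetical IDs): t5 is similar to qB but isolated in the network.
EXAMPLE_QUERY = ["qA", "qB", "qC"]
EXAMPLE_HOMOLOGS = {"qA": {"t1"}, "qB": {"t2", "t5"}, "qC": {"t3"}}
EXAMPLE_ADJACENCY = {
    "t1": {"t2"},
    "t2": {"t1", "t3"},
    "t3": {"t2", "t4"},
    "t4": {"t3"},
    "t5": set(),
}
EXAMPLE_MATCHES = brute_force_matches(
    EXAMPLE_QUERY, EXAMPLE_HOMOLOGS, EXAMPLE_ADJACENCY
)
```

Note that no interactions among qA, qB and qC are supplied: only sequence similarity and connectivity in the target network are used, which is what makes the query topology-free.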

Input

The input for Torque consists of:

1. a query set of proteins in species A;

2. their protein sequences;

3. a PPI network for species B;

4. the sequences of the network proteins.

All inputs are in simple text format:

The query set can be entered directly as a comma-delimited or whitespace-delimited list.

Protein sequences are given in the standard FASTA format.

The PPI network is given as a text file, where each row represents an interaction and contains the IDs of the interacting pair and a confidence value for it in the range [0, 1].
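As an illustration of the text formats above, the following sketch reads a query list and a PPI network file. It is not the server's own parser: the function names are hypothetical, and the column separator of the network file is assumed to be whitespace, since the description only states that each row holds two IDs and a confidence value.

```python
import re


def parse_query(text):
    """Split a comma- or whitespace-delimited list of protein IDs."""
    return [token for token in re.split(r"[,\s]+", text.strip()) if token]


def parse_network(lines):
    """Parse rows of the form '<id1> <id2> <confidence>'.

    Assumption: fields are separated by whitespace. Blank rows are skipped,
    and confidence values outside [0, 1] are rejected.
    """
    edges = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"row {number}: expected 3 fields, got {len(fields)}")
        protein_a, protein_b, confidence_text = fields
        confidence = float(confidence_text)
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"row {number}: confidence {confidence} not in [0, 1]")
        edges.append((protein_a, protein_b, confidence))
    return edges


# Hypothetical input; the IDs are placeholders, not taken from the server.
EXAMPLE_IDS = parse_query("P1, P2 P3,P4")
EXAMPLE_EDGES = parse_network(["P1 P2 0.85", "", "P2 P3 1"])
```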

It is possible to use a single FASTA file (input 2) for many queries, if it contains the sequences for all proteins in all queries. When the query field is left blank, Torque will use all the proteins in input 2 as the query. If input 1 contains Uniprot protein IDs (www.uniprot.org), their sequences need not be entered in input 2; instead, Torque automatically retrieves them from the Uniprot database. For several target species, the user need not provide inputs 3 and 4. Currently, the server supports this option for the three target species Saccharomyces cerevisiae, Drosophila melanogaster and Homo sapiens. The user can indicate one of these target species instead of providing inputs 3 and 4. Details on how these networks were constructed can be found in (7).

The user can control two parameters of the algorithm, setting a trade-off between speed and sensitivity. The first is the threshold for sequence similarity: Torque applies BLAST to find putative matches between query and target proteins, and the user can set the BLAST E-value threshold. With a lower threshold, fewer homologs are identified for the query proteins, making the algorithm faster but less sensitive. The second parameter is a threshold for the confidence values of PPI network edges provided as part of input 3. Edges whose confidence value is lower than the threshold are discarded; hence, this parameter determines the sparsity of the target network and affects the number of possible matches and the running time.
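The effect of the two thresholds can be sketched as two simple filters. This is only an illustration under stated assumptions: the tuple layouts and function names are hypothetical, and whether a BLAST hit exactly at the E-value threshold is kept is an assumption (the text specifies only that edges below the confidence threshold are discarded).

```python
def filter_hits(hits, max_evalue):
    """Keep (query, target, evalue) hits whose E-value does not exceed the threshold.

    Assumption: a hit exactly at the threshold is kept.
    """
    return [(q, t, e) for q, t, e in hits if e <= max_evalue]


def filter_edges(edges, min_confidence):
    """Discard (protein_a, protein_b, confidence) edges below the threshold."""
    return [(a, b, c) for a, b, c in edges if c >= min_confidence]


# Hypothetical BLAST hits and network edges.
EXAMPLE_HITS = [("q1", "a", 1e-30), ("q1", "b", 1e-4), ("q2", "c", 2.0)]
EXAMPLE_NETWORK = [("a", "b", 0.9), ("b", "c", 0.4), ("c", "d", 0.65)]

STRICT_HITS = filter_hits(EXAMPLE_HITS, 1e-5)      # fewer homologs, faster
LENIENT_HITS = filter_hits(EXAMPLE_HITS, 1e-3)     # more homologs, more sensitive
SPARSE_NETWORK = filter_edges(EXAMPLE_NETWORK, 0.5)
```

Lowering the E-value threshold shrinks the set of candidate target proteins, while raising the confidence threshold removes edges; both reduce the search space at the cost of possibly missing a match.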

Processing

The running time of Torque is typically a few minutes, but may be up to an hour, depending on the size and other properties of the query (for more details, see (7)). If several queries are submitted to the server at the same time, they are queued and executed sequentially. Rather than waiting for the results online, the user can access them later in two ways:

When a Torque job is started, it is assigned a nine-digit job ID. This ID can be used to access the results later from the main Torque page.

Before submitting a query, the user may enter an email address. Once Torque has finished processing the query, the results will be sent to the email address provided. This process is done automatically and the email address is then discarded.

Output

The web server generates a web page (Figure 2) with the image of the top-scoring match for the query in the target network, as well as an auxiliary file in .sif format that can be viewed using the Cytoscape software (4). The content of the .sif file includes, for each edge, a numerical value representing the confidence in the interaction it represents, as provided in the input. This value determines the thickness of the edge in the Cytoscape visualization. The image shows the subgraph induced by the top-scoring match in the PPI network. Each vertex is labeled with its protein name in the PPI network and its matching query protein, if such exists. Insertion vertices are shown in gray, and proteins from the query for which there was no match in the solution (deletions) are listed separately.

Figure 2. An example of the output of a Torque run. The query consists of 13 proteins. The match has 12 proteins with two insertions and three deletions.
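The counts reported for this run can be checked with a short bookkeeping sketch. It assumes that each matched target protein corresponds to a single query protein, and it represents a match as a hypothetical mapping from target proteins to their query protein, with `None` marking an insertion; the protein names are placeholders rather than the actual proteins in the figure.

```python
def summarize_match(query, match):
    """Summarize a match given as {target protein: query protein or None}.

    Insertions are target proteins with no query counterpart (None);
    deletions are query proteins that no target protein was matched to.
    """
    insertions = sorted(t for t, q in match.items() if q is None)
    matched_queries = {q for q in match.values() if q is not None}
    deletions = sorted(set(query) - matched_queries)
    return {
        "match_size": len(match),
        "insertions": insertions,
        "deletions": deletions,
    }


# 13 query proteins, 10 of them matched, plus two inserted target proteins.
EXAMPLE_QUERY_13 = [f"q{i}" for i in range(1, 14)]
EXAMPLE_MATCH = {f"t{i}": f"q{i}" for i in range(1, 11)}
EXAMPLE_MATCH["x1"] = None
EXAMPLE_MATCH["x2"] = None
EXAMPLE_SUMMARY = summarize_match(EXAMPLE_QUERY_13, EXAMPLE_MATCH)
```

With 13 query proteins and three deletions, 10 query proteins are matched; adding the two insertions gives the 12 proteins in the match.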

A sample run

The following example uses as query the mouse DNA synthesome complex, downloaded from the CORUM website (http://mips.gsf.de/genre/proj/corum/index.html). This 13-member complex was queried in the yeast network (5430 proteins, 39 936 interactions). The result of the Torque run is shown in Figure 2. Examining the subnetwork identified by Torque, we find that it is functionally coherent (nucleotidyltransferase activity, P < 1.4E−10) and significantly intersects the yeast alpha DNA polymerase:primase complex, supporting its biological plausibility. This example can be run from the Torque main page by checking ‘Use example data’.

SUMMARY

The Torque web server allows users to run topology-free queries on predefined or user-provided target networks. The result is a subnetwork of the target network most similar to the query, and is presented both graphically and as a downloadable text file.