This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Summary: Protein–protein interaction detection methods are applied on a daily basis by molecular biologists worldwide. After generating a set of potential interactions, biologists face the problem of highlighting the ones that are novel and collecting evidence with respect to literature and annotation. This task can be as tedious as searching for every predicted interaction in several interaction data repositories, or manually screening the scientific literature. To facilitate the task of evidence mining and novelty assessment of protein–protein interactions, we have developed a Cytoscape plugin that automatically mines publication references, database references, interaction detection method descriptions and pathway annotation for a user-supplied network of interactions. The basis for the annotation is ConsensusPathDB—a meta-database that integrates numerous protein–protein, signaling, metabolic and gene regulatory interaction repositories for currently three species: Homo sapiens, Saccharomyces cerevisiae and Mus musculus.

Availability: The ConsensusPathDB plugin requires Cytoscape version 2.7.0 or later and runs on all major operating systems (Windows, Mac OS, Unix/Linux) with Sun Java 1.5 or later installed. It can be installed from within Cytoscape through the Plugin manager (category ‘Network and Attribute I/O’) and is also freely available for download on the ConsensusPathDB web site (http://cpdb.molgen.mpg.de).

1 INTRODUCTION

Due to the high explanatory power of protein–protein interactions for biological processes in health and disease (Ideker and Sharan, 2008), dedicated interaction detection methods like yeast-two-hybrid (Y2H) screening (Fields, 2005) and co-purification (Aebersold and Mann, 2003) are applied on a daily basis by molecular biologists worldwide and contribute to the completion of the map of protein–protein interactions for human and other species. An immediate task after generating a network of predicted interactions is to identify the ones that have not been published previously and to collect evidence for every single interaction from literature and annotation. This information is useful for estimating the performance of the interaction screen and for assessing its contribution to the protein–protein interaction map of the species in question. To accomplish this task, biologists typically search their new data against each protein–protein interaction repository separately, for example IntAct (Huntley et al., 2007) or MINT (Chatr-aryamontri et al., 2007). Even more tedious is manually mining the scientific literature to collect publication references and detection methods for the interactions in the newly generated list.

Cytoscape (Shannon et al., 2003) is a widely used, freely available software tool for visualization, manipulation and analysis of biomolecular interaction networks. To aid the process of interaction evidence mining, we have developed a plugin for Cytoscape that searches all interactions from the network of interest in the interaction space stored in ConsensusPathDB. ConsensusPathDB (Kamburov et al., 2009) is an interaction meta-database that integrates functional interaction repositories forming a heterogeneous interaction network which comprises protein–protein interactions, as well as signaling, metabolic and gene regulatory interactions. Currently, the database integrates 18 open-access repositories on human interactions and eight repositories for both yeast and mouse interactions and contains around 150 000 human, 195 000 yeast and 13 000 mouse distinct interactions (many of which are of non-binary nature, i.e. contain more than two interaction partners). In this article, we describe the functionality of the ConsensusPathDB plugin for Cytoscape and demonstrate its usage and performance.

2 DESCRIPTION

After installing the plugin, the user starts by loading the network of interest (denoted query network) represented by binary interactions in Cytoscape and launching the ConsensusPathDB plugin through Cytoscape's ‘Plugins’ menu (Fig. 1A). After setting a few parameters which we describe below, the user starts the evidence mining process. The plugin then communicates with the repository of ConsensusPathDB through a web service. Once the plugin sends the query network to the server, a search is executed on the server-side for all (or, optionally, just the selected) proteins and interactions from the query network in ConsensusPathDB through SQL queries. Proteins from the query network are matched to the data repository on the basis of accession numbers such as UniProt (The UniProt Consortium, 2010) or Ensembl (Flicek et al., 2010). Interactions from the query network are matched to the repository based on their participants.
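The server-side lookup described above can be illustrated with a minimal sketch. The table layout (`participant`), the function names and the toy accession strings are hypothetical and chosen only for illustration; they do not reflect the actual ConsensusPathDB schema or web service. The sketch assumes one row per interaction participant and a query interaction with two distinct participants.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical participant table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE participant (interaction_id INTEGER, accession TEXT)"
    )
    # Toy data: interaction 1 is binary, interaction 2 has three participants.
    conn.executemany(
        "INSERT INTO participant VALUES (?, ?)",
        [
            (1, "ACC_A"), (1, "ACC_B"),
            (2, "ACC_A"), (2, "ACC_C"), (2, "ACC_D"),
        ],
    )
    return conn


def find_interactions(conn, acc_a, acc_b):
    """Return ids of stored interactions that contain both accessions.

    A self-join on interaction_id keeps only interactions in which one
    participant row carries acc_a and another row carries acc_b.
    """
    rows = conn.execute(
        "SELECT DISTINCT a.interaction_id "
        "FROM participant AS a "
        "JOIN participant AS b ON a.interaction_id = b.interaction_id "
        "WHERE a.accession = ? AND b.accession = ? "
        "ORDER BY a.interaction_id",
        (acc_a, acc_b),
    ).fetchall()
    return [row[0] for row in rows]
```

A query pair for which `find_interactions` returns an empty list would correspond to a potentially novel interaction in this simplified picture.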

Fig. 1. (A) The splash screen of the plugin showing the different parameters; (B) the ConsensusPathDB visual style where reproduced interactions are weighted by evidence and novel interactions are highlighted in green; (C) newly imported attributes of a selected...

The performance of the interaction matching depends critically on how well proteins in the query network are annotated with accession numbers. If accession numbers are not available as node attributes, the user is prompted to specify whether the node labels represent accession numbers of a certain type. The matching is further controlled by two parameters, ‘protein annotation matching’ (strict/fuzzy) and ‘interaction cardinality matching’ (strict/allow containment). Under strict protein annotation matching, a protein from the query network and a protein from the database are considered identical only if all their identifiers of a given type match. Under fuzzy matching, the identifiers of the query protein may form a subset of the identifiers of the database counterpart, or vice versa. Fuzzy matching is useful, for example, when individual proteins in one network are compared with protein families in the other. The ‘interaction cardinality matching’ parameter specifies whether the binary interactions from the query network should be matched only with binary interactions from the database (strict matching) or whether they may also be matched to complex interactions, i.e. interactions of more than two proteins that contain both participants of the binary interaction (allow containment). More details about protein and interaction mapping can be found in the Supplementary Material to this paper.
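The two matching parameters can be expressed as a short sketch. This is an illustration of the rules as stated in the text, not the plugin's implementation: the function names are hypothetical, and each protein is represented simply as a set of identifiers of a single type.

```python
def proteins_match(query_ids, db_ids, fuzzy=False):
    """Compare two proteins given as collections of identifiers of one type.

    Strict: the identifier sets must be equal.
    Fuzzy:  one identifier set may be a subset of the other.
    """
    q, d = set(query_ids), set(db_ids)
    if not q or not d:
        return False  # nothing to compare without identifiers
    if fuzzy:
        return q <= d or d <= q
    return q == d


def interaction_matches(query_pair, db_participants,
                        fuzzy=False, allow_containment=False):
    """Match a binary query interaction against one database interaction.

    query_pair:        two identifier sets (the query participants)
    db_participants:   list of identifier sets, one per database participant
    allow_containment: if False, only binary database interactions qualify;
                       if True, larger interactions containing both query
                       participants qualify as well.
    """
    first, second = query_pair
    if not allow_containment and len(db_participants) != 2:
        return False
    # Both query proteins must map to two different database participants.
    for i, p in enumerate(db_participants):
        if not proteins_match(first, p, fuzzy):
            continue
        for j, r in enumerate(db_participants):
            if j != i and proteins_match(second, r, fuzzy):
                return True
    return False
```

For instance, a query protein carrying one identifier and a database entry carrying that identifier plus a second one (as for a protein family) match only under fuzzy matching, and a query pair contained in a three-protein complex matches only when containment is allowed.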

After matching proteins and interactions, the web service server sends annotation attributes for the matched query interactions to the client plugin. These attributes comprise publication references (PubMed identifiers), interaction detection methods, database references (such as IntAct and MINT) and pathway annotations (i.e. pathways that contain both participants of a protein–protein interaction). The plugin creates a custom visual style in Cytoscape in which the thickness of interaction edges can optionally reflect the number of publications, the number of interaction databases containing the interaction, the number of distinct detection methods, or the number of pathways containing the interaction (Fig. 1B). Interactions that are not found in the repository, and thus represent potentially novel interactions, are highlighted in green. In the results tab of Cytoscape, an interaction mapping summary is displayed together with a legend. The interaction attributes that have been retrieved from ConsensusPathDB can be viewed for selected interactions under the ‘Interaction details’ tab of the results panel (Fig. 1C). Where applicable, this information is provided as web links to the primary data and can be viewed in a web browser.
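The pathway annotation and the evidence-weighted visual style can be sketched as follows. Only the green highlighting of unmatched interactions and the choice of evidence types come from the description above; the function names, the `weight_by` parameter, the width range, the cap and the gray colour for matched edges are assumptions made for illustration.

```python
def shared_pathways(pathways_a, pathways_b):
    """Pathways that contain both participants of an interaction."""
    return set(pathways_a) & set(pathways_b)


def edge_style(evidence, weight_by="publications",
               min_width=1.0, max_width=10.0, cap=10):
    """Derive a simple edge style from the retrieved evidence.

    evidence:  None if the interaction was not found in the repository,
               otherwise a dict mapping an evidence type (for example
               "publications", "databases", "methods", "pathways")
               to a collection of entries.
    weight_by: evidence type whose entry count sets the edge width
               (hypothetical parameter name).
    """
    if evidence is None:
        # Not found in the repository: potentially novel, shown in green.
        return {"novel": True, "color": "green", "width": min_width}
    count = len(set(evidence.get(weight_by, ())))
    # Linear scaling, saturating at `cap` entries (assumed scheme).
    fraction = min(count, cap) / cap
    width = min_width + fraction * (max_width - min_width)
    return {"novel": False, "color": "gray", "width": width}
```

With this scheme, an interaction supported by three publications is drawn thicker than one supported by a single publication, while an interaction without any match keeps the minimum width and is coloured green.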

Figure 1D shows the performance of the plugin implementation with respect to the mining of interaction annotation for different network sizes. The results show that, even for large networks, evidence mining completes within minutes (for example, ~2 min for a network with 20 000 nodes). It should be noted, however, that the speed of the client's Internet connection influences the overall speed of interaction matching.