Abstract

BACKGROUND:

High-throughput methods for obtaining global measurements of transcript and protein levels in biological samples have provided a large amount of data for identification of 'target' genes and proteins of interest. These targets may be mediators of functional processes involved in disease and therefore represent key points of control for viruses and bacterial pathogens. Genes and proteins that are the most highly differentially regulated are generally considered to be the most important. We present topological analysis of co-abundance networks as an alternative to differential regulation for confident identification of target proteins from two related global proteomics studies of hepatitis C virus (HCV) infection.

RESULTS:

We analyzed global proteomics data sets from a cell culture study of HCV infection and from a clinical study of liver biopsies from HCV-positive patients. Using lists of proteins known to be interaction partners of pathogen proteins, we show that the most differentially regulated proteins in both data sets are indeed enriched in pathogen interactors. We then used these data sets to generate co-abundance networks that link proteins based on similar abundance patterns in time or across patients. Analysis of these co-abundance networks using a variety of network topology measures revealed that both degree and betweenness could be used to identify pathogen interactors with better accuracy than differential regulation alone, though betweenness provides the best discrimination. We found that, although overall differential regulation was not correlated between the cell culture and liver biopsy data, network topology was conserved to an extent. Finally, we identified a set of proteins with high betweenness in both networks, including a protein that we have recently shown to be essential for HCV replication in cell culture.
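As an illustration of how a co-abundance network links proteins with similar abundance patterns, the following minimal sketch connects two proteins when the correlation of their abundance profiles meets a cutoff. The use of Pearson correlation, the 0.9 threshold, the function names `pearson` and `co_abundance_network`, and the toy profiles are all assumptions made for illustration; they are not taken from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-abundance signal
    return sxy / sqrt(sxx * syy)


def co_abundance_network(profiles, threshold=0.9):
    """Return undirected edges between proteins whose profiles correlate.

    `profiles` maps a protein name to its abundance values across time
    points or patients. The threshold is a hypothetical choice.
    """
    edges = set()
    for p, q in combinations(sorted(profiles), 2):
        if pearson(profiles[p], profiles[q]) >= threshold:
            edges.add((p, q))
    return edges


# Toy data: P1 and P2 rise together; P3 follows a different pattern.
profiles = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.1, 3.9, 6.2, 8.0],
    "P3": [4.0, 1.0, 3.5, 0.5],
}
edges = co_abundance_network(profiles)
print(edges)
```

In this sketch only positively correlated pairs are linked, which is one reading of "similar abundance patterns"; other similarity measures or cutoffs would yield different networks.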

CONCLUSIONS:

The results presented show that the network topology of protein co-abundance networks can be used to identify proteins important for viral replication. These proteins represent targets for further experimental investigation that will provide biological insight and potentially could be exploited for novel therapeutic approaches to combat HCV infection.

Highly differentially regulated proteins are preferentially targeted by pathogens and HCV. A. Pathogen interactors are enriched in differentially expressed proteins from cell culture experiments. The percentage of known pathogen interaction targets in general is shown for the top 20% of proteins ranked by differential regulation overall at each time point post-infection (red bars) versus background (blue bars). Asterisks indicate statistically significant enrichment (p-value less than 0.05 by Fisher's exact test). B. Pathogen interactors are enriched in proteins differentially expressed in patients with severe fibrosis. The percentage of pathogen targets in the top 20% of differentially regulated proteins is shown (red bars) versus the percentage in the other proteins (blue bars). None of these comparisons was significant by Fisher's exact test.
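The enrichment test described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes proteins are ranked by the absolute value of a differential regulation score and that the test is one-sided (enrichment only), neither of which is stated in the caption. The names `top_fraction` and `enrichment_p` and the toy scores are hypothetical.

```python
from math import comb


def top_fraction(scores, fraction=0.2):
    """Proteins in the top `fraction` when ranked by |score| (assumed ranking)."""
    n = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=lambda p: abs(scores[p]), reverse=True)
    return set(ranked[:n])


def enrichment_p(hits_in_top, top_size, total_hits, total_size):
    """One-sided Fisher's exact test p-value for enrichment.

    Probability of seeing at least `hits_in_top` interactors in a group of
    `top_size` proteins drawn from `total_size` proteins, of which
    `total_hits` are interactors (hypergeometric upper tail).
    """
    upper = min(top_size, total_hits)
    tail = sum(
        comb(total_hits, k) * comb(total_size - total_hits, top_size - k)
        for k in range(hits_in_top, upper + 1)
    )
    return tail / comb(total_size, top_size)


# Toy example: 10 proteins, 4 known pathogen interactors.
scores = {"A": 3.0, "B": -2.5, "C": 0.4, "D": 0.3, "E": -0.2,
          "F": 0.1, "G": 0.6, "H": -0.5, "I": 0.7, "J": 0.05}
interactors = {"A", "B", "C", "D"}

top = top_fraction(scores)
p = enrichment_p(len(top & interactors), len(top), len(interactors), len(scores))
print(sorted(top), p)
```

With so few proteins the toy p-value stays above 0.05 even though both top-ranked proteins are interactors, which shows how group size limits the power of the test.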

Network analysis of global proteomic data from HCV infection of Huh-7.5 cells allows identification of targets. We constructed association networks from proteomics alone (A) or with added protein-protein interactions (B) as described in the text, varying parameters as indicated in (Additional file : Table S2). The top 20% of proteins ranked by the topological degree (blue diamonds), betweenness (red squares), clustering coefficient (green diamonds), and closeness (purple x's) were evaluated for their enrichment in proteins that are known interaction partners of pathogen proteins. Fold enrichment (Y axis) is calculated as the percentage of pathogen interactors in the group divided by the percentage not in the group. In panel B the circled points indicate the values for the PPI network alone. Statistical significance is indicated in (Additional file : Table S2).
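The fold enrichment defined in this caption (percentage of pathogen interactors in the group divided by the percentage among proteins not in the group) can be written as a short function. The sketch below is illustrative only; the function name `fold_enrichment`, its handling of empty groups, and the toy protein sets are assumptions.

```python
def fold_enrichment(group, universe, interactors):
    """Fraction of interactors inside `group` divided by the fraction outside.

    `universe` is the set of all proteins in the network, `group` is the
    subset under evaluation (for example, the top 20% by betweenness), and
    `interactors` is the set of known pathogen interaction partners.
    """
    universe = set(universe)
    group = set(group) & universe
    rest = universe - group
    if not group or not rest:
        raise ValueError("group and its complement must both be non-empty")
    rate_in = len(group & interactors) / len(group)
    rate_out = len(rest & interactors) / len(rest)
    if rate_out == 0:
        return float("inf")  # no interactors outside the group
    return rate_in / rate_out


# Toy example: 1 of 2 top-ranked proteins is an interactor (50%),
# versus 2 of the remaining 8 proteins (25%).
universe = {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J"}
group = {"A", "B"}
interactors = {"A", "C", "D"}
fe = fold_enrichment(group, universe, interactors)
print(fe)
```

A value above 1 indicates that the group contains proportionally more interactors than the rest of the network.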

A. Comparison of topological measures in identification of important proteins. Networks were inferred as described in the text, varying parameters (see Additional file : Table S2), and topological measures calculated (X axis). The enrichment of known pathogen interactors in the top 20% of proteins ranked by each measure was calculated and is shown as the percentage enrichment in the group divided by the enrichment in background. The boxes represent the 25th and 75th percentiles, the bold bar represents the mean, and the dashed bars are at 1.5 times the interquartile range. The enrichment given by differential abundance is shown as a comparison. This figure shows that degree and betweenness perform much better than other, more sophisticated, measures of importance for these networks. B. Comparison of the enrichment of topological sub-types. For the cell culture co-abundance network integrated with PPIs we assessed statistical enrichment in pathogen targets for the following sub-types of topology: NH-NH, non-hub non-bottleneck nodes; HB, hubs; BN, bottlenecks; H-NB, hub non-bottlenecks; B-NH, bottleneck non-hubs; BH, bottleneck hubs. These sub-types were determined as the overlap of the top 20% of nodes ranked by degree (hubs) or betweenness (bottlenecks). The analysis shows that betweenness is the primary driver of importance in these networks, similar to observations in directed regulatory networks.
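The assignment of nodes to topological sub-types from the overlap of hubs and bottlenecks can be sketched as below. This is a minimal illustration under stated assumptions: the function names `top_nodes` and `classify`, the descriptive labels, the tie handling (input order), and the toy degree and betweenness values are hypothetical, and the toy example uses a 40% cutoff only so that a five-node network yields every sub-type.

```python
def top_nodes(scores, fraction=0.2):
    """Nodes in the top `fraction` when ranked by score, highest first."""
    n = max(1, round(len(scores) * fraction))
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


def classify(degree, betweenness, fraction=0.2):
    """Label each node by whether it is a hub, a bottleneck, both, or neither."""
    hubs = top_nodes(degree, fraction)
    bottlenecks = top_nodes(betweenness, fraction)
    labels = {}
    for node in degree:
        is_hub, is_bottleneck = node in hubs, node in bottlenecks
        if is_hub and is_bottleneck:
            labels[node] = "bottleneck hub"
        elif is_hub:
            labels[node] = "hub non-bottleneck"
        elif is_bottleneck:
            labels[node] = "bottleneck non-hub"
        else:
            labels[node] = "non-hub non-bottleneck"
    return labels


# Toy values for a five-node network (not from the study).
degree = {"a": 4, "b": 3, "c": 2, "d": 1, "e": 1}
betweenness = {"a": 0.0, "b": 6.0, "c": 4.0, "d": 0.0, "e": 0.5}
labels = classify(degree, betweenness, fraction=0.4)
print(labels)
```

Each resulting group can then be tested for enrichment in pathogen targets separately, which is what allows the contribution of degree and betweenness to be compared.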

Network analysis of global proteomic data from liver samples of HCV-positive patients. We constructed association networks from proteomics as described in the text, varying parameters as indicated in (Additional file : Table S3). The top 20% of proteins ranked by the topological degree (blue diamonds), betweenness (red squares), clustering coefficient (green diamonds), and closeness (purple circles) were evaluated for their enrichment in proteins that are pathogen interactors. Fold enrichment (Y axis) is calculated as the percentage of pathogen interactors in the group divided by the percentage not in the group. Statistical significance is indicated in (Additional file : Table S3).

Bottlenecks from cell culture-derived networks are enriched in proteins differentially regulated in HCV-positive patient samples. Bottlenecks were identified from a network derived from HCV infection of Huh-7.5 cells. The distribution of differential abundance values in proteomics from patient samples is shown for the bottlenecks (red) versus non-bottlenecks (grey); the mean is indicated by the dark line, the boxes are at the 25th and 75th percentiles, and the dashed bars are at 1.5 times the interquartile range. P-values indicated are from a two-sided t test.

Topological enrichment of best networks. Bottlenecks (A) or hubs (B) were identified in the best networks inferred from the Huh-7.5 (blue bars) or fibrosis patient (red bars) proteomics data. Bottlenecks or hubs were examined for enrichment in general pathogen interactors (pathogen targets), specific HCV interactors (HCV targets), and proteins with cognate genes that exhibit positive evolutionary selection (selected). Fold-enrichment is calculated as the percentage of annotated proteins in the bottleneck or hub group divided by the percentage of annotated proteins in non-bottlenecks or non-hubs as appropriate. Statistical significance was evaluated by Fisher's exact test; p-values for significant enrichment are indicated.

Overlap of predicted bottlenecks in networks from cell culture and patient data. Bottlenecks (top 10% or 20% as indicated) were identified for proteins in the networks derived from the HCV-infected Huh-7.5 cell proteomic data and from liver biopsy samples from HCV-positive patients. Proteins (points) are plotted based on their betweenness centrality (number of shortest paths passing through them) in the cell culture network (X axis) versus in the clinical network (Y axis). Boxes indicate a 10% or 20% threshold for bottleneck identification in both networks. Protein names for high-confidence shared bottlenecks are indicated.
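To make the comparison in this caption concrete, the sketch below computes betweenness centrality for two small undirected, unweighted networks and reports the proteins that fall in the top fraction of both. It uses Brandes' algorithm as a stand-in, since the software used in the study is not named here; the function names `betweenness` and `shared_bottlenecks` and both toy networks are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Betweenness centrality for an undirected, unweighted graph.

    `adj` maps each node to the set of its neighbours. The score of a node
    is the number of shortest paths between other node pairs that pass
    through it, with ties split fractionally (Brandes' algorithm).
    """
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                        # nodes in order of BFS discovery
        preds = {v: [] for v in adj}      # predecessors on shortest paths
        n_paths = {v: 0 for v in adj}     # shortest-path counts from source
        dist = {v: -1 for v in adj}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to the source.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += n_paths[v] / n_paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}


def shared_bottlenecks(adj_a, adj_b, fraction=0.2):
    """Nodes in the top `fraction` by betweenness in both networks."""
    def top(adj):
        scores = betweenness(adj)
        n = max(1, round(len(scores) * fraction))
        return set(sorted(scores, key=scores.get, reverse=True)[:n])
    return top(adj_a) & top(adj_b)


# Toy networks over the same five proteins (not from the study).
cell = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
        "d": {"c", "e"}, "e": {"d"}}
clinic = {"a": {"c"}, "b": {"c"}, "c": {"a", "b", "d"},
          "d": {"c", "e"}, "e": {"d"}}

cell_bc = betweenness(cell)
shared = shared_bottlenecks(cell, clinic)
print(cell_bc, shared)
```

Plotting the two betweenness values for each protein against one another, as in the figure, shows the same information as the intersection computed here.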