Bottom Line:
Using a list of known disease-related genes from our earlier study as a training set ("seeds"), and the rest of the known genes as a test list, we perform large-scale cross validation to rank the candidate genes and also evaluate and compare the performance of our approach. Even though network-based methods are generally not as effective as integrated functional annotation-based methods for disease candidate gene prioritization, in a one-to-one comparison, PPIN-based candidate gene prioritization performs better than all other gene features or annotations. Additionally, we demonstrate that methods used for studying both social and Web networks can be successfully used for disease candidate gene prioritization.

Background: Although most of the current disease candidate gene identification and prioritization methods depend on functional annotations, the coverage of the gene functional annotations is a limiting factor. In the current study, we describe a candidate gene prioritization method that is entirely based on protein-protein interaction network (PPIN) analyses.

Results: For the first time, extended versions of the PageRank and HITS algorithms, and the K-Step Markov method, are applied to prioritize disease candidate genes in a training-test schema. Using a list of known disease-related genes from our earlier study as a training set ("seeds"), and the rest of the known genes as a test list, we perform large-scale cross validation to rank the candidate genes and also evaluate and compare the performance of our approach. Under appropriate settings - for example, a back probability of 0.3 for PageRank with Priors and HITS with Priors, and step size 6 for the K-Step Markov method - the three methods achieved comparable AUC values, suggesting similar performance.
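To make the role of the back probability concrete, the following is a minimal sketch of a PageRank-with-Priors style random walk on a toy undirected network. It is not the authors' implementation: the function name, the toy graph, the uniform prior over seeds, and the handling of isolated nodes are all assumptions made for illustration.

```python
# Illustrative sketch only. Names are hypothetical, and the toy graph
# stands in for a real protein-protein interaction network.

def pagerank_with_priors(adj, seeds, back_prob=0.3, max_iter=200, tol=1e-12):
    """Random walk that jumps back to the seed set with probability back_prob."""
    nodes = list(adj)
    # Assumption: the prior is uniform over seed genes and zero elsewhere.
    prior = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    score = dict(prior)
    for _ in range(max_iter):
        # Each step, a fraction back_prob of the total mass returns to the seeds.
        nxt = {v: back_prob * prior[v] for v in nodes}
        for u in nodes:
            walk_mass = (1.0 - back_prob) * score[u]
            if adj[u]:
                # The remaining mass is split evenly among the neighbours of u.
                share = walk_mass / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                # Assumption: an isolated node sends its mass back to the seeds.
                for v in nodes:
                    nxt[v] += walk_mass * prior[v]
        delta = sum(abs(nxt[v] - score[v]) for v in nodes)
        score = nxt
        if delta < tol:
            break
    return score


# Toy network: adjacency lists of an undirected graph (hypothetical genes).
toy_ppin = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
seeds = {"A"}
scores = pagerank_with_priors(toy_ppin, seeds, back_prob=0.3)

# Candidates are the non-seed genes, ranked by descending score.
ranked = sorted((g for g in toy_ppin if g not in seeds),
                key=scores.get, reverse=True)
```

In this sketch a larger back probability keeps more of the score mass near the seeds, while a smaller one lets it spread further across the network.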

Conclusion: Even though network-based methods are generally not as effective as integrated functional annotation-based methods for disease candidate gene prioritization, in a one-to-one comparison, PPIN-based candidate gene prioritization performs better than all other gene features or annotations. Additionally, we demonstrate that methods used for studying both social and Web networks can be successfully used for disease candidate gene prioritization.

Figure 3: Plots of AUC with different parameter values. The left panel shows the AUC values of PageRank with Priors with back probability varied from 0.01 to 0.9. The right panel shows the AUC values of the K-Step Markov method with random walk length varied from 1 to 6. The vertical bars indicate the standard deviations.
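The random walk length varied in the right panel can be illustrated with a small sketch of a K-Step Markov style score: the accumulated probability of visiting each node within K steps of a walk started from the seeds. This is an assumption-laden illustration rather than the authors' code. The function name, the toy graph, and the uniform start distribution over seeds are hypothetical.

```python
# Illustrative sketch only. Names and the toy graph are hypothetical.

def k_step_markov(adj, seeds, k=4):
    """Sum the walk distributions after 1, 2, ..., k steps from the seeds."""
    nodes = list(adj)
    # Assumption: the walk starts uniformly at random on the seed genes.
    current = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    total = dict.fromkeys(nodes, 0.0)
    for _ in range(k):
        nxt = dict.fromkeys(nodes, 0.0)
        for u in nodes:
            if current[u] == 0.0 or not adj[u]:
                continue
            # Move to each neighbour of u with equal probability.
            share = current[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        current = nxt
        for v in nodes:
            total[v] += current[v]
    return total


toy_ppin = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

# Sweep the walk length over the values 1, 2, 4 and 6.
sweep = {k: k_step_markov(toy_ppin, {"A"}, k=k) for k in (1, 2, 4, 6)}
```

With K = 1 only the direct neighbours of the seed receive a score. Longer walks reach more distant nodes.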

Mentions:
In our results, HITS with Priors (HITSP) performed similarly to PageRank with Priors (PRankP) across back probability values. Therefore, only PRankP was tested at the extreme back probability values of 0.01 and 0.05. The 13 test conditions (PRankP with back probability 0.01, 0.05, 0.1, 0.3, 0.5, 0.7, 0.9; K-Step Markov (KSMarkov) with K = 1, 2, 4, 6; and HITSP with back probability 0.3 and 0.5), along with the AUC values from each validation run, are listed in Table 1. Each method was run 5 times at each parameter setting. The performance values of each method at each parameter value are summarized in Table 2. The plots of AUC against parameter values are shown in Figure 3. The best-performing setting of each method, namely PRankP and HITSP with back probability 0.3 and KSMarkov with K = 4, was selected for Analysis of Variance (ANOVA). The p-value of 0.5585 suggests that there is no significant difference among the best performances of the three methods.
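The evaluation described here rests on two calculations: an AUC per validation run, and a one-way ANOVA across the repeated runs of each method. A minimal sketch of both follows. The function names are hypothetical, and the numbers are toy values chosen for easy checking, not values from Table 1 or Table 2. The sketch computes only the F statistic. A p-value additionally needs the F distribution, which a statistics library such as SciPy (`scipy.stats.f_oneway`) can supply.

```python
# Illustrative sketch only. Function names are hypothetical and all
# numbers below are toy values, not results from the study.

def auc(pos_scores, neg_scores):
    """Probability that a held-out disease gene outscores a non-disease gene.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Toy scores: two held-out disease genes versus two other genes.
demo_auc = auc([0.9, 0.7], [0.8, 0.1])

# Toy groups standing in for the repeated runs of three methods.
demo_f = anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

A small F statistic (and a correspondingly large p-value) indicates that the variation between methods is not large compared with the run-to-run variation within each method.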

