Bottom Line:
We find that the new candidates are cited in T1D-related publications significantly more often than expected at random (p<1e-7), even after excluding co-citations with the known disease genes; they are significantly over-represented (p<1e-10) in the top 30 GO terms shared by known disease genes. Furthermore, sequence analysis reveals that they contain significantly (p<0.0004) more protein domains that are known to be relevant to T1D. CONCLUSION: Our study demonstrates the potential of PPI information in prioritizing positional candidate genes for T1D.

ABSTRACT

BACKGROUND: Proteins directly interacting with each other tend to have similar functions and be involved in the same cellular processes. Mutations in the genes that code for them often lead to the same family of disease phenotypes. Efforts have been made to prioritize positional candidate genes for complex diseases using protein-protein interaction (PPI) information. However, such an approach is often considered too general to be practically useful for specific diseases.

RESULTS: In this study we investigate the efficacy of this approach in type 1 diabetes (T1D). We compiled 266 known disease genes, and 983 positional candidate genes from the 18 established linkage loci of T1D, from T1Dbase (http://t1dbase.org). We found that the PPI network of known T1D genes has topological features distinct from others, with a significantly higher number of interactions among themselves even after adjusting for their high network degrees (p<1e-5). We then define those positional candidates that are first-degree PPI neighbours of the 266 known disease genes to be new candidate disease genes. This leads to a list of 68 genes for further study. Cross-validation using the known disease genes as a benchmark reveals that the enrichment is ~17.1-fold over random selection, and ~4-fold better than using the linkage information alone. We find that the new candidates are cited in T1D-related publications significantly more often than expected at random (p<1e-7), even after excluding co-citations with the known disease genes; they are significantly over-represented (p<1e-10) in the top 30 GO terms shared by known disease genes. Furthermore, sequence analysis reveals that they contain significantly (p<0.0004) more protein domains that are known to be relevant to T1D. These findings provide indirect validation of the newly predicted candidates.

CONCLUSION: Our study demonstrates the potential of PPI information in prioritizing positional candidate genes for T1D.
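
The candidate-selection step described in the abstract (keeping the positional candidates that are first-degree PPI neighbours of known disease genes) can be illustrated with a minimal Python sketch. The function name, the edge-list representation, and the toy gene identifiers are hypothetical and not taken from the study; the sketch also assumes that genes already on the known list are not counted as new candidates, which the abstract does not state explicitly.

```python
def first_degree_candidates(edges, known_genes, positional_candidates):
    """Return positional candidates that directly interact with a known disease gene.

    edges: iterable of (gene_a, gene_b) pairs, treated as undirected interactions.
    """
    known = set(known_genes)
    # Assumption: a gene that is already a known disease gene is not a "new" candidate.
    pool = set(positional_candidates) - known
    selected = set()
    for a, b in edges:
        if a in known and b in pool:
            selected.add(b)
        if b in known and a in pool:
            selected.add(a)
    return selected


# Toy data: placeholder identifiers, not real T1D genes or interactions.
toy_edges = [("K1", "C1"), ("C2", "K2"), ("C3", "C4"), ("K1", "K2"), ("X", "K1")]
toy_known = ["K1", "K2"]
toy_positional = ["C1", "C2", "C3", "C4"]

new_candidates = first_degree_candidates(toy_edges, toy_known, toy_positional)
print(sorted(new_candidates))  # ['C1', 'C2']
```

In this toy network C3 and C4 interact only with each other, and X is a neighbour of K1 but lies outside the linkage loci, so neither is selected.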

Mentions:
To avoid any potential bias toward well-studied genes (whose interactions with other genes are better characterized) (Oti and Brunner, 2007; Ideker and Sharan, 2008), we initially examined the PPI networks using information both from the HPRD annotation and from the two high-throughput (HT) data sets (Rual et al., 2005; Stelzl et al., 2005). Figure 1 presents the results. We found that the T1D genes interact with each other significantly more often than randomly selected gene sets. Of all 20152 known human genes (according to NCBI's Gene database), 9222 are annotated in HPRD, and 4157 in HT. Of the 266 known T1D genes, 222 are annotated in HPRD, and 75 in HT. There are a total of 34398 edges among the 9222 proteins in HPRD, and 9277 edges among the 4157 proteins in HT (in network terms, each node represents one protein molecule, and an edge between two nodes means that the two molecules interact with each other). The corresponding numbers of edges among the T1D genes are 169 in HPRD and 25 in HT. In contrast, bootstrapping yields only 21.1±4.2 and 3.7±2.3 interactions for random gene sets of the same sizes. These correspond to 8.0- and 6.8-fold enrichment, respectively. The results from HPRD and HT are comparable, and we do not observe any noticeable bias in the HPRD dataset. In the rest of this study, we used HPRD only, as it contains more comprehensive PPI information.
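
The comparison against random gene sets can be illustrated with a short Python sketch. It is not the authors' code: the function names, the toy network, and the choice to draw equal-sized gene sets without replacement are assumptions made for illustration, and the sketch does not include the adjustment for network degree mentioned in the abstract.

```python
import random
import statistics


def count_internal_edges(edges, gene_set):
    """Count edges whose two endpoints both lie in gene_set."""
    members = set(gene_set)
    return sum(1 for a, b in edges if a in members and b in members)


def resampling_enrichment(edges, all_genes, disease_genes, n_iter=1000, seed=0):
    """Compare the observed internal edge count with equal-sized random gene sets.

    Assumption: random sets are drawn without replacement from all annotated genes.
    """
    rng = random.Random(seed)
    universe = sorted(set(all_genes))  # random.sample needs a sequence
    k = len(set(disease_genes))
    observed = count_internal_edges(edges, disease_genes)
    null_counts = [
        count_internal_edges(edges, rng.sample(universe, k)) for _ in range(n_iter)
    ]
    mean = statistics.mean(null_counts)
    sd = statistics.stdev(null_counts)
    fold = observed / mean if mean > 0 else float("inf")
    return observed, mean, sd, fold


# Toy network: 50 placeholder genes; the first five form a fully connected clique,
# and the remaining genes are linked in a chain that closes back onto g0.
genes = [f"g{i}" for i in range(50)]
disease = genes[:5]
clique = [(genes[i], genes[j]) for i in range(5) for j in range(i + 1, 5)]
chain = [(genes[i], genes[(i + 1) % 50]) for i in range(5, 50)]
toy_edges = clique + chain

observed, mean, sd, fold = resampling_enrichment(toy_edges, genes, disease)
print(f"observed={observed}, random={mean:.2f}±{sd:.2f}, fold={fold:.1f}")
```

Fold enrichment is simply the observed count divided by the mean of the random counts; with the figures reported above, 169 / 21.1 ≈ 8.0 for HPRD and 25 / 3.7 ≈ 6.8 for HT.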
