Abstract

Background

Multigenic diseases are often associated with protein complexes or interactions involved
in the same pathway. We estimated the extent to which this holds in a consolidated
protein interaction data set. The study highlights issues of data integration and data
representation.

Results

We constructed 497 multigenic disease groups from OMIM and tested for overlaps with
interaction and pathway data. A total of 159 disease groups had significant overlaps
with protein interaction data consolidated by iRefIndex. A further 68 disease overlaps
were found only in the KEGG pathway database. No single database contained all significant
overlaps, which underscores the importance of data integration. We also found that disease
groups overlapped with all three interaction data types (n-ary data, spoke-represented
complexes and binary data), which shows that each of these data types needs to be
considered separately.
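
As an illustration of what testing a disease group for a significant overlap can look like, the following is a minimal sketch in Python. It assumes a one-sided hypergeometric test, which is a common choice for gene-set overlaps; the abstract does not state which test was used, and the function name, parameters and example sets here are hypothetical rather than taken from the accompanying R script.

```python
from math import comb


def overlap_p_value(universe_size, disease_genes, interaction_genes):
    """Probability of an overlap at least as large as the observed one
    under a hypergeometric null model (assumed test, not the paper's).

    universe_size: number of genes that could have been drawn
    disease_genes: set of genes in one multigenic disease group
    interaction_genes: set of genes in one complex or pathway
    """
    group_size = len(disease_genes)
    set_size = len(interaction_genes)
    if max(group_size, set_size) > universe_size:
        raise ValueError("gene sets cannot be larger than the universe")

    observed = len(disease_genes & interaction_genes)
    largest_possible = min(group_size, set_size)

    # Sum the upper tail P(X >= observed) using exact integer arithmetic.
    favourable = sum(
        comb(group_size, k) * comb(universe_size - group_size, set_size - k)
        for k in range(observed, largest_possible + 1)
    )
    return favourable / comb(universe_size, set_size)


# Hypothetical identifiers; a real analysis would use OMIM-derived groups
# and complexes or pathways from iRefIndex or KEGG.
disease_group = {"g1", "g2", "g3"}
complex_members = {"g1", "g2", "g3"}
p = overlap_p_value(10, disease_group, complex_members)
```

In a survey over many disease groups and many complexes, p-values of this kind would also need a multiple-testing correction before any overlap is called significant.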

Conclusions

Almost half of our multigenic disease groups could potentially be explained by protein
complexes and pathways. However, no single database or data type covered all
disease groups, which suggests that none has been systematically curated for
complex and pathway data potentially related to these diseases. This survey provides
a basis for further curation efforts to confirm and search for overlaps between diseases
and interaction data. The accompanying R script can be used to reproduce the work
and track progress in this area as databases change. Disease group overlaps can be
further explored using the iRefscape plugin for Cytoscape.