Data Integration Architectures and Methodology for the Life Sciences

Definition

Given a set of data sources, data integration is the process of creating an integrated resource that combines their data, so as to support queries and analyses that no individual source could support alone. Biological data sources are highly heterogeneous in their data models, query interfaces, query processing capabilities, data types, and the nomenclature adopted for actual data values. Coupled with the variety, complexity and volume of biological data available, this heterogeneity makes the integration of biological data sources challenging, and a number of methodologies, architectures and systems have been developed to support it.
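
To make the definition concrete, the following minimal Python sketch integrates two small, made-up sources that differ in format, field names and nomenclature, and then answers a query that neither source could answer alone. Everything in it is an assumption introduced for illustration: the records, the accession values, the synonym table and the function names are hypothetical, and the simple materialised merge shown here is only one of several possible architectures, not a method prescribed by this entry.

```python
# Hypothetical source A: a list of records with sequence lengths,
# using full organism names.
SOURCE_A = [
    {"accession": "P1", "organism": "Homo sapiens", "length": 393},
    {"accession": "P2", "organism": "Mus musculus", "length": 120},
]

# Hypothetical source B: tab-separated text with functional annotations,
# different field names, abbreviated organism names and inconsistent case.
SOURCE_B = (
    "acc\tspecies\tfunction\n"
    "p1\tH. sapiens\tDNA binding\n"
    "P3\tM. musculus\ttransport\n"
)

# Assumed synonym table used to reconcile the two nomenclatures.
ORGANISM_SYNONYMS = {
    "H. sapiens": "Homo sapiens",
    "M. musculus": "Mus musculus",
}


def normalise_a(records):
    """Map source A onto a common schema keyed by upper-case accession."""
    return {
        r["accession"].upper(): {"organism": r["organism"], "length": r["length"]}
        for r in records
    }


def normalise_b(text):
    """Parse source B and map it onto the same common schema."""
    lines = text.strip().split("\n")
    header = lines[0].split("\t")
    result = {}
    for line in lines[1:]:
        row = dict(zip(header, line.split("\t")))
        result[row["acc"].upper()] = {
            "organism": ORGANISM_SYNONYMS.get(row["species"], row["species"]),
            "function": row["function"],
        }
    return result


def integrate(*sources):
    """Merge normalised sources into one resource, record by record."""
    merged = {}
    for source in sources:
        for key, fields in source.items():
            merged.setdefault(key, {}).update(fields)
    return merged


def find_by_function_and_length(integrated, function, min_length):
    """Query needing 'function' (from B) and 'length' (from A) together."""
    return sorted(
        key
        for key, rec in integrated.items()
        if rec.get("function") == function and rec.get("length", 0) >= min_length
    )


integrated = integrate(normalise_a(SOURCE_A), normalise_b(SOURCE_B))
hits = find_by_function_and_length(integrated, "DNA binding", 300)
```

The query combines an attribute held only in the first source (length) with one held only in the second (function), which is the kind of question the integrated resource exists to answer. Records present in only one source are kept with whatever attributes are available.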

Historical Background

If an application requires data from different data sources to be integrated in order to support users' queries and analyses, one possible solution is for the required data transformation and aggregation functionality to be encoded into the application's...

Maibaum M, et al. Cluster based integration of heterogeneous biological databases using the AutoMed toolkit. In: Proceedings of the 2nd International Workshop on Data Integration in the Life Sciences; 2005. p. 191–207.