Introduction

Investigating the molecular-genetic causes and effects of diseases and their therapies increasingly requires combining data from clinical trials with large volumes of experimental data generated by various chip technologies, together with their annotations. We present our approach to integrating such data for two large collaborative cancer research studies in Germany: the Molecular Mechanism of Malignant Lymphoma (MMML) and the German Glioma Network. Our platform interconnects a commercial study management system (eRN) with a data warehouse-based gene expression analysis system (GeWare) [Ref. 1]. We use a generic approach to import anonymized pathological and patient-related annotations into the warehouse. The platform also integrates different forms of experimental data and public molecular-genetic annotation data, and thus supports a wide range of collaborative analyses of both clinical and non-clinical parameters.

Methods

We have developed a comprehensive data integration and analysis platform at the University of Leipzig that interconnects two existing data management systems. The first, the study management system eRN, allows users at participating institutions to remotely enter all data typically handled in traditional clinical trials, e.g. patient-related personal, clinical, and pathological data. To support high data quality, the system implements rule-based input and consistency checks that flag inconsistent or missing data for users to correct. The second, the GeWare system, handles chip-based gene expression and array-CGH data and provides a range of reports and analysis methods. Chip data is far more voluminous than the patient-related data and cannot be stored within eRN. GeWare provides web interfaces for uploading new experimental data and for specifying further annotations such as laboratory parameters. To enable combined analysis of patient-related and chip-based data, GeWare also imports a subset of the patient-related data from eRN in a generic manner, using so-called annotation templates. The patient-related data is identified by a patient identifier, whereas the chip-based data uses a chip identifier from which the patient identifier cannot be derived. We therefore provide a mapping table that associates each chip identifier with the corresponding patient identifier, so that clinical, pathological and experimental data can be combined correctly and analysed across all data types. In addition, GeWare integrates publicly available gene/clone annotation data to extend the range of possible analyses. This data integration is performed using a query mediator approach [Ref. 2].
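The role of the mapping table can be illustrated with a minimal sketch. It is not taken from GeWare or eRN: the function name, the identifiers, the annotation fields and all values below are invented for illustration, and the real platform stores these data in a warehouse rather than in in-memory dictionaries.

```python
from typing import Any, Dict, List


def combine_by_mapping(
    chip_to_patient: Dict[str, str],
    clinical: Dict[str, Dict[str, Any]],
    chip_values: Dict[str, List[float]],
) -> List[Dict[str, Any]]:
    """Attach patient-level annotations to each chip via the mapping table.

    Chips that have no mapping entry, or whose patient has no imported
    annotations, are skipped rather than guessed.
    """
    combined = []
    for chip_id in sorted(chip_values):
        patient_id = chip_to_patient.get(chip_id)
        if patient_id is None or patient_id not in clinical:
            continue
        row = {"chip_id": chip_id, "patient_id": patient_id}
        row.update(clinical[patient_id])          # clinical/pathological annotations
        row["expression"] = chip_values[chip_id]  # experimental chip data
        combined.append(row)
    return combined


# Hypothetical example data (all identifiers and values are made up).
chip_to_patient = {"C001": "P01", "C002": "P02", "C003": "P01"}
clinical = {
    "P01": {"annotation_group": "group_a"},
    "P02": {"annotation_group": "group_b"},
}
chip_values = {
    "C001": [1.0, 2.0],
    "C002": [0.5, 1.5],
    "C003": [2.0, 3.0],
    "C004": [9.0, 9.0],  # no mapping entry, so it is left out
}

rows = combine_by_mapping(chip_to_patient, clinical, chip_values)
for row in rows:
    print(row["chip_id"], row["patient_id"], row["annotation_group"])
```

Because the chip identifier carries no information about the patient, the explicit mapping is the only link between the two data sets; in this sketch one patient (P01) contributes two chips, and the unmapped chip C004 is excluded from the combined result.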

Results

We established a warehouse-based platform that combines clinical and experimental chip data for large-scale collaborative cancer research studies. It is based on two dedicated subsystems, one for managing clinical trials and one for gene expression analysis. Selected clinical annotations were imported by daily transfer from the study system and combined with data from centrally performed molecular-biological high-throughput experiments. Annotations are managed generically so that different studies and changing analysis needs can be supported easily. Grouping functions are available for genes, probes and samples, and the resulting groups can be reused in later analyses. Interactive analyses for data visualization (e.g. heatmaps as displayed in Figure 1 [Fig. 1]) give a quick overview for hypothesis generation, and statistical reports highlight significant values in the large-scale array data. Furthermore, selected data can be extracted for specific analyses outside the platform.

Discussion

The analysis platform described here has proved to be a valuable tool for storing, accessing and analysing high-dimensional gene expression and array-CGH data together with clinical, histopathological and other experimental data. The web-based interface allows experimenters to run interactive analyses, and the results are stored for use in subsequent analyses. The platform runs successfully within the cancer project MMML and will be extended to meet the aims of the German Glioma Network.