Research Projects

To solve future grand challenges, data, computational power and analytics expertise need to be brought together at unprecedented scale. Recent advances in machine learning have further increased the need for data. As a result, data-centric digital systems commonly exhibit a strong tendency towards centralized structures. While data centralization can greatly facilitate analysis, it also comes with several intrinsic disadvantages and threats, not only from a technical but, more importantly, also from a legal, political and ethical perspective. Because these issues are rooted in sophisticated security or trust requirements, overcoming them is cumbersome and time-consuming. As a consequence, many research projects are substantially hindered, fail or are simply not pursued. In this interdisciplinary project we aim to facilitate the implementation of decentralized, cooperative data analytics architectures within and beyond Helmholtz by addressing the most relevant issues in such scenarios. Trustworthy Federated Data Analytics (TFDA) will facilitate bringing the algorithms to the data in a trustworthy and regulation-compliant way, instead of taking a data-centric approach. TFDA will address the technical, methodological and legal aspects of ensuring the trustworthiness of analyses and transparency regarding their inputs and outputs without violating privacy constraints. To demonstrate applicability and to ensure the adaptability of the methodological concepts, we will validate our developments for use in medical research with the use case “Federated radiation therapy study” before disseminating the results.

Project Coordinators:

Prof. Dr. Mario Fritz

CISPA Helmholtz Center for Information Security

Dr. Ralf Floca

German Cancer Research Center (DKFZ)

Funding by the Helmholtz Association through the Initiative and Networking Fund

Genetic data is highly privacy-sensitive information and is therefore protected under stringent legal regulations, which makes sharing it burdensome. However, leveraging genetic information holds great potential for the diagnosis and treatment of diseases and is essential for personalized medicine to become a reality. While privacy-preserving mechanisms have been introduced, they either impose significant overhead or fail to fully protect the privacy of sensitive patient data. This reduces the ability to share data with the research community, which hinders scientific discovery as well as the reproducibility of results. Hence, we propose a different approach using synthetic data sets that share the properties of patient data sets while respecting privacy. We achieve this by leveraging the latest advances in generative modeling to synthesize virtual cohorts. Such synthetic data can be analyzed with established tool chains, can be accessed repeatedly without affecting the privacy budget, and can even be shared openly with the research community. While generative modeling of high-dimensional data such as genetic data has so far been prohibitively difficult, the latest developments in deep generative models have produced a series of success stories across a wide range of domains. The project will provide tools for generative modeling of genetic data as well as insights into the long-term perspective of this technology for addressing open domain problems. The approaches will be validated against existing analyses that are not privacy-preserving. We will closely collaborate with the scientific community and propose guidelines on how to deploy and experiment with approaches that are practical in the overall process of scientific discovery. This project will be the first to allow the generation of synthetic high-dimensional genomic information to boost privacy-compliant data sharing in the medical community.

Project Coordinators:

Dr. Matthias Becker

German Research Center for Neurodegenerative Diseases (DZNE)

Prof. Dr. Mario Fritz

CISPA Helmholtz Center for Information Security

Funding by the Helmholtz Association through the Initiative and Networking Fund