e-Science

Scientific applications increasingly face very large volumes of raw and derived data, together with complex, resource-intensive, long-running workflows that process or analyze these data.

Work at the Databases and Information Systems group in the area of e-Science includes:

Dynamic and Freshness-aware Management and Archiving of Large Volumes of Data: Scientific data is usually big data and thus needs novel approaches for managing it in a cost-effective way (e.g., regarding distribution, partitioning, and/or replication) and for providing access to archived data in a way that takes the access patterns to this data into account.

Crowdsourcing Data Capturing and Data Analysis: Novel, so-called Citizens' Observatories make use of the wisdom of the crowd for capturing data that is subsequently used in e-Science applications, and/or for analyzing the data. An important aspect is the integration of the contributions of several heterogeneous crowd workers into a coherent data set.

Predictable Execution of Long-running Services and Workflows: In order to provide dedicated execution guarantees for long-running e-Science workflows, the necessary resources have to be allocated and reserved in advance. In our work, we have developed an approach that provides advance resource reservation for complex, service-oriented scientific workflows and optimizes resource consumption based on user-defined criteria (e.g., cost or time). It uses genetic algorithms to find optimal or near-optimal allocations in a distributed system consisting of competing service providers, each of which offers only limited resources.
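
To make the idea of allocating workflow tasks to capacity-limited providers with a genetic algorithm more concrete, the following is a minimal sketch and not the group's actual implementation. The provider table, the weights cost_weight and time_weight, the penalty value, and all function names are hypothetical and chosen purely for illustration. Each candidate allocation assigns every task to one provider; allocations that exceed a provider's capacity are penalized rather than forbidden.

```python
import random

# Hypothetical providers: per-task cost, per-task time, and a capacity limit.
PROVIDERS = [
    {"cost": 4.0, "time": 1.0, "capacity": 2},
    {"cost": 2.0, "time": 3.0, "capacity": 3},
    {"cost": 1.0, "time": 5.0, "capacity": 2},
]
NUM_TASKS = 6  # assumed number of workflow tasks to place


def load(allocation):
    """Count how many tasks each provider receives."""
    counts = [0] * len(PROVIDERS)
    for provider in allocation:
        counts[provider] += 1
    return counts


def is_feasible(allocation):
    """True if no provider is asked for more than its capacity."""
    return all(n <= PROVIDERS[i]["capacity"] for i, n in enumerate(load(allocation)))


def fitness(allocation, cost_weight=1.0, time_weight=0.0):
    """Lower is better: weighted cost and time plus a capacity penalty."""
    cost = sum(PROVIDERS[p]["cost"] for p in allocation)
    time = sum(PROVIDERS[p]["time"] for p in allocation)
    overflow = sum(
        max(0, n - PROVIDERS[i]["capacity"]) for i, n in enumerate(load(allocation))
    )
    return cost_weight * cost + time_weight * time + 100.0 * overflow


def evolve(generations=200, pop_size=40, mutation_rate=0.1, seed=0, **weights):
    """Run a simple elitist genetic algorithm and return the best allocation."""
    rng = random.Random(seed)
    population = [
        [rng.randrange(len(PROVIDERS)) for _ in range(NUM_TASKS)]
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Truncation selection: keep the better half unchanged.
        population.sort(key=lambda a: fitness(a, **weights))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            cut = rng.randrange(1, NUM_TASKS)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: occasionally reassign a task to a random provider.
            child = [
                rng.randrange(len(PROVIDERS)) if rng.random() < mutation_rate else gene
                for gene in child
            ]
            children.append(child)
        population = survivors + children
    return min(population, key=lambda a: fitness(a, **weights))


if __name__ == "__main__":
    best = evolve(cost_weight=1.0, time_weight=0.0)
    print(best, load(best), fitness(best))
```

Changing the weights, for example evolve(cost_weight=0.0, time_weight=1.0), shifts the search toward faster but more expensive providers, which mirrors the user-defined optimization criteria mentioned above. A realistic system would additionally have to model reservation time windows and negotiation with the providers, which this sketch leaves out.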