Comparison of Reproduction Platforms for Continuous Science

The idea of Continuous Integration (CI), i.e. the execution of test and build workflows on every commit, has an analogy in science:
The reproduction of scientific insights becomes "continuous" in the sense that
updates to data lead to the recalculation of statistical hypotheses like
an update to source code leads to the rebuilding of software assets in the context of CI.
This idea can even be taken a step further if the scientific data themselves are
created entirely in silico, e.g. if the data are generated by a simulation.
We can then speak of replication (for a differentiation between reproduction/reproducibility and replication see [1]).
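The analogy can be illustrated with a minimal sketch. It assumes that a dataset can be fingerprinted by hashing (playing the role of a commit ID) and that the "statistical hypothesis" is reduced to a simple difference of means. The names `fingerprint`, `ContinuousAnalysis` and `mean_difference` are hypothetical and do not belong to any particular platform.

```python
import hashlib
import json
import statistics


def fingerprint(data):
    """Return a stable hash of the dataset, analogous to a commit ID in CI."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ContinuousAnalysis:
    """Re-run an analysis only when the underlying data have changed."""

    def __init__(self, analysis):
        self.analysis = analysis      # function: data -> result
        self.last_fingerprint = None  # fingerprint of the last analysed data
        self.result = None
        self.runs = 0                 # how often the analysis was executed

    def update(self, data):
        current = fingerprint(data)
        if current != self.last_fingerprint:
            # The data changed: recalculate, like a rebuild after a commit.
            self.result = self.analysis(data)
            self.last_fingerprint = current
            self.runs += 1
        return self.result


def mean_difference(data):
    """Toy stand-in for a statistical analysis (assumption for illustration)."""
    return statistics.mean(data["treatment"]) - statistics.mean(data["control"])
```

Calling `update` twice with unchanged data executes the analysis once; only a change to the data triggers a recalculation. A real platform would additionally record the environment and the code version, which this sketch leaves out.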

Various platforms support such continuous science workflows.

These Reproduction Platforms for Continuous Science (RPCS) differ, among other things, in the set of features they offer, their performance, their license, their installation procedure, and the protocols and standards they support.

The goal of this thesis is to survey all relevant platforms, develop a comparison scheme (such as a capability model), and evaluate the platforms against that scheme.
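One possible shape for such a comparison scheme is sketched below. It is only an assumption about how a capability model could be represented: the criteria are taken from the differences listed above, while the 0 to 3 scoring scale, the weights, and the names `Assessment` and `rank` are hypothetical. Developing the actual scheme is part of the thesis.

```python
from dataclasses import dataclass, field

# Criteria taken from the differences named in the text.
CRITERIA = ("features", "performance", "license", "installation", "protocols")


@dataclass
class Assessment:
    """Scores of one platform; each criterion is rated 0 (absent) to 3 (full)."""
    platform: str
    scores: dict = field(default_factory=dict)

    def total(self, weights):
        # Criteria that were not assessed count as 0.
        return sum(weights[c] * self.scores.get(c, 0) for c in CRITERIA)


def rank(assessments, weights):
    """Return the assessments ordered from highest to lowest weighted total."""
    return sorted(assessments, key=lambda a: a.total(weights), reverse=True)
```

With equal weights, `rank` simply orders the platforms by the sum of their scores. Changing the weights makes explicit which criteria matter most for a given use case.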

This master's thesis can also be worked on by a group of motivated bachelor students.

Requirements

Know-how in, or motivation to learn:

Continuous Integration tools (Jenkins, Bamboo, GitLab CI)

Linux system administration (setting up and operating services on Linux machines)

Interest in issues of reproducibility of scientific findings (a nice read to start with is [4])

Good written English and good communication skills, since you will probably contact the developers of several platforms to ask for code, documentation or support

Tasks

Survey of all relevant RPCS (if necessary, define a catalogue of criteria for including or excluding RPCS)

Test installation of all chosen RPCS

Development of a comparison scheme between those platforms

Collection of the data necessary for the comparison

Presentation of the collected data

Compute Resources

Installing the RPCS might require computing resources exceeding the capacity of a normal desktop computer. In that case, cloud computing resources provided by the LRZ can be used.