Abstract

The reproducibility of scientific studies is an important issue facing modern biology. A large
number of studies published today cannot be reproduced, and the situation has been
described as a reproducibility crisis. It has been shown that the inclusion of computational
analysis within a study adds a further level of complexity to reproducing the findings of that
study. Even the reproduction of only the computational component of a study is fraught with
difficulty. When provided with the source data, a list of the tools used and a protocol, it can
still be difficult to produce the same results. One reason for this is that differences between
tools, versions, configurations, dependencies, operating systems and hardware all
contribute to variation in the results. The work presented here addresses the problem
of reproducibility through the design and implementation of a novel reproducible analysis
system, Cumulus. The Cumulus system combines technologies such as virtualisation and
high-throughput workflow systems to automate the process of fully recording an analysis
environment. Recording an analysis environment allows it to be shared and reliably
reproduced by other researchers. Automating this process enables reproduction of
bioinformatic analysis by high-throughput analysis systems.
The thesis then shows how the Cumulus system was applied to reproduce and
amend a published RNA-seq analysis and to create a novel proteomic analysis pipeline. This
proteomic pipeline was then used in the analysis of a pilot study to identify binding partners
of the Nanog protein that are dependent on a part of the protein previously shown to be required for
the maintenance of pluripotency. This analysis resulted in the identification of a novel Nanog
interactome. In addition, a further set of tools is presented, including the Stembio
Visualisation Framework, a framework which enables the construction of interactive
visualisations using the Cumulus system. The initial application of this framework has been
accepted as part of a publication in the Journal of Experimental Medicine.