The Turing Way: A handbook for reproducible data science

Rachael Ainsworth - The University of Manchester and The Turing Way - Alan Turing Institute

Abstract: The Turing Way is a handbook to support students, their supervisors, funders and journal editors in ensuring that reproducible data science is "too easy not to do" (https://the-turing-way.netlify.com). It includes training material on topics such as version control and analysis testing, and will build upon Alan Turing Institute case studies and workshops. The project also demonstrates open and transparent project management and communication with future users, as it is openly developed at our GitHub repository: https://github.com/alan-turing-institute/the-turing-way. All resources associated with the workshops we have delivered, as well as guidance on how to organise a Book Dash (a one-day book sprint), are also openly available.

Reproducible research is necessary to ensure that scientific work can be trusted. Funders and publishers are beginning to require that publications include access to the underlying data and the analysis code. The goal is to ensure that all results can be independently verified and built upon in future work, but in practice this is often easier said than done. Sharing these research outputs means understanding data management, library sciences, software development, and continuous integration techniques: skills that are not widely taught to, or expected of, academic researchers and data scientists.

During this session, we will lead a collaborative review of the handbook so far and show Open Science Fair participants how they can contribute their knowledge to improve it further, and how they can open up their own projects to a wider community of contributors. This demo relates to the overall theme of the conference, as The Turing Way provides the tools to improve research habits in a self-contained handbook. It will also ensure that PhD students, postdocs, PIs and funding teams know which parts of the "responsibility of reproducibility" they can influence, and what they can do to nudge research and data science towards being more efficient, effective and understandable.