Research Software Group

The LUX-ZEPLIN project is building the largest and most sensitive dark matter detector of its type ever constructed. The detector will be built a mile underground in the Sanford Underground Research Facility (SURF) in Lead, South Dakota, and is due to go live in 2020.

Potential detector materials are currently being screened prior to their use in the experiment, and the results are collated and analysed using a 43-sheet Microsoft Excel spreadsheet. The spreadsheet has worked well to date, allowing researchers to share and view data, but moving to a more versatile and robust database solution will be very useful once the experiment begins, says Dr Alex Lindote, LZ Background Simulations project lead, who is based at the Laboratory of Instrumentation and Experimental Particle Physics (LIP)-Coimbra, Portugal.

Lindote set up the spreadsheet in late 2015, bringing in data from a Google spreadsheet that had been set up by researchers to share their data.

“It was getting hard to track who was making changes and what was happening, so I was asked to start taking care of it. I decided to move it to an Excel file that I could control more easily,” Lindote says.


As part of my open call consultancy for LUX-ZEPLIN (LZ), I was asked to review web frameworks for Python, in particular those that could be used with MongoDB, the database management system used by LZ. In this blog post, I survey four frameworks for implementing web applications: Django, TurboGears, Flask and Pyramid.

These four web frameworks were selected from the many available because they each meet LZ’s requirements: they can be deployed under the popular Apache web server, they support authentication and authorisation, and they support, either directly or via third-party libraries, the use of MongoDB for holding application-specific data. Additionally, all four frameworks are popular, with large user communities, and each has a permissive open source licence. These latter selection criteria follow our guide on Choosing the right open source software for your project, which summarises factors to consider when choosing open source software for use on projects.

As part of my open call consultancy for LUX-ZEPLIN (LZ), I was asked about the feasibility of developing a web service that accepted Python code from users and executed their code server-side within a Linux environment. In this blog post I give a brief overview of a number of approaches that could be taken to implement such a service, focusing on those that protect the web service, and its underlying server, from code that is, whether by accident or design, malicious.

First things first, developing a web service that accepts Python code from users and runs it server-side is not, in itself, technically challenging. Any developer could knock up a proof-of-concept quite rapidly. The challenges are how to ensure that the web service is able to successfully run a user’s code, and how to protect the web service from the user’s code.

The first challenge, how to ensure that the server is able to successfully run a user’s code, can be restated as how to ensure that users only submit code that can successfully run on the server. At its simplest, this can be handled by publishing information about the environment within which the server will run the user’s code (e.g. operating system version, Python interpreter and…
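One common ingredient of such approaches is to run the submitted code in a separate, isolated interpreter process with a timeout. The sketch below uses only the Python standard library; note that, on its own, it is not a complete sandbox, and for genuinely untrusted code it would need to be combined with OS-level isolation such as containers or restricted users.

```python
import os
import subprocess
import sys
import tempfile

def run_user_code(code, timeout_s=5):
    """Run untrusted Python code in a separate interpreter process.

    '-I' puts the child interpreter into isolated mode (no environment
    variables, no user site-packages) and the timeout guards against
    code that never terminates. Returns (returncode, stdout, stderr).
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, "-I", path],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return proc.returncode, proc.stdout, proc.stderr
    except subprocess.TimeoutExpired:
        return -1, "", "timed out"
    finally:
        os.unlink(path)
```

For example, `run_user_code("print(2 + 2)")` returns the child’s output, while `run_user_code("while True: pass", timeout_s=1)` is killed after one second rather than tying up the server.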

In Using Excel for data storage and analysis in LUX-ZEPLIN, I summarised how Excel is both used and managed within the LUX-ZEPLIN (LZ) project and recommendations for improvements. In this second of two blog posts, I describe how LZ could migrate their data within Excel to MongoDB with supporting software, in Python, for computation and presentation. I also describe a proof-of-concept which extracts data from Excel, populates MongoDB with this data, and computes the radiogenic backgrounds expected from a subset of the possible sources of contamination.

As a reminder, the BG table is an Excel spreadsheet, with 43 sheets, used by LZ to calculate radiogenic backgrounds, and the WS Backgrounds Table is a sheet within the BG table which summarises the radiogenic backgrounds expected during the lifetime of the experiment from each source of contamination.

Migrating from Excel to MongoDB and Python

Excel combines data, computation and presentation. For example, a cell with a formula in Excel is a combination of data and computation, in effect a tiny program. The migration plan was based around migrating from the BG table into a solution…
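That separation of data from computation can be sketched in Python: the cell’s data becomes a document of the kind MongoDB stores, and the cell’s formula becomes an ordinary, testable function. The field names below are illustrative assumptions, not the BG table’s real schema.

```python
# In Excel, a cell such as "=B2*C2" mixes data and a tiny program.
# After migration, the data lives in a document (as stored in MongoDB)
# and the formula becomes a plain Python function.

def expected_background(doc):
    """Port of a hypothetical spreadsheet formula: mass times activity."""
    return doc["mass_kg"] * doc["activity_mbq_per_kg"]

# The same dict could be inserted into MongoDB with collection.insert_one(sample).
sample = {"material": "PTFE", "mass_kg": 100.0, "activity_mbq_per_kg": 0.5}
print(expected_background(sample))  # → 50.0
```

Unlike the original formula cell, this function can be unit tested, reused across documents, and called from other analysis software in the experiment.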

The LUX-ZEPLIN (LZ) project is building one of the largest and most sensitive dark matter detectors ever constructed. I’ve been providing consultancy, as part of an Institute open call project, on how LZ can migrate their data storage and analysis software from Microsoft Excel to a database management system-centred solution. In the first of two blog posts, I summarise how Excel is both used and managed within LZ and recommendations for improvements.

As described in my blog post at the outset of the consultancy, Shining a light on dark matter, LZ partners at University College London and the University of Coimbra maintain LZ's backgrounds control software. At the heart of the backgrounds control software is a Microsoft Excel spreadsheet (termed the “BG table”). While fit for purpose in the experiment’s early design and procurement stage, Excel is now reaching its limits in terms of sustainability, its ability to interface with other software in the experiment (for example, analysis software that interprets dark matter data), and the interface with…

As part of my open call consultancy for LUX-ZEPLIN (LZ), I looked at how LZ could migrate their data and computation from Excel to MongoDB and Python. There are many resources with valuable advice on cleaning data in Excel into a form suitable for analysis using Python, R or other data analysis packages. Unfortunately, how to handle formulae and cross-references is little discussed.

Based on my experiences, I have written a guide, “Tips for porting formulae from Excel into code”, in which I provide some (hopefully) helpful hints on how to identify and highlight formulae and cross-references, which can help when porting these to Python or R, and on how to restructure tables so that raw data is contiguous and thus easily read by data analysis packages or exported into a database or files. Feedback, suggestions and additional advice are more than welcome.
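As a small illustration of identifying formulae, the sketch below flags every cell whose value begins with “=”, which is how Excel stores formulae. To stay self-contained it works on a plain grid of values standing in for a sheet; with the third-party openpyxl package, the same check would apply to `cell.value` when a workbook is loaded without evaluating formulae (`data_only=False`).

```python
# A sketch of flagging formula cells before porting them to Python or R.
# The grid below is a stand-in for a spreadsheet's cell values.

def find_formulae(grid):
    """Yield (row, col, formula) for every cell whose value is a formula.

    Excel stores formulae as strings beginning with '='; row and column
    numbers are 1-based, matching spreadsheet conventions.
    """
    for r, row in enumerate(grid, start=1):
        for c, value in enumerate(row, start=1):
            if isinstance(value, str) and value.startswith("="):
                yield (r, c, value)

sheet = [
    ["material", "mass", "activity", "background"],
    ["PTFE", 100.0, 0.5, "=B2*C2"],
]
print(list(find_formulae(sheet)))  # → [(2, 4, '=B2*C2')]
```

Listing formulae this way gives an inventory of the “tiny programs” hidden in a sheet, each of which needs a home in the ported code.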

James is a Research Software Engineer at the Software Sustainability Institute. He received an MChem in Chemistry for Drug Discovery from the University of Bath, before joining the Institute for Complex Systems Simulation PhD programme in 2013. He joined the SSI in September of 2017.

During his PhD studies James worked on a number of software projects, including the Monte-Carlo simulation package ProtoMS. ProtoMS was successful in an SSI Open Call and received assistance in developing a test suite to ensure correctness of the core Fortran component. His role in this project was in further development of the test suite, both code and infrastructure, and in ensuring the reproducibility of simulations across a range of platforms and compilers.

This article is part of our series: Breaking Software Barriers, in which we investigate how our Research Software Group has helped projects improve their research software. If you would like help with your software, get in touch.

Adrian Hill, the project’s primary contact, talked to us about the usefulness of the Institute’s collaboration with the Met Office and EPCC to promote the uptake and development of MONC. Adrian especially highlighted the invaluable help he received from Mike Jackson, Research Software Engineer, in setting up the basis for what has progressed into successful software with unexpected benefits and long-term value, used by researchers as well as PhD and master’s students.

Collaborative efforts

In collaboration with EPCC (Edinburgh Parallel Computing Centre) and the Met Office, the Institute provided help to rewrite the Large Eddy simulation model (LEM) as its successor, the Met Office NERC Cloud model (MONC). MONC is a complete re-engineering of LEM, which preserves LEM's underlying science. MONC has been developed to provide a flexible community model that can exploit modern supercomputers…

The Research Software Group and the Software Sustainability Institute have organised a Data Carpentry workshop, which will take place on 1st & 2nd August 2017 at the University of Southampton.

The course will cover data organisation in spreadsheets and OpenRefine, SQL for data management, and an introduction to R for data analysis. By the end of the workshop, learners will be able to more effectively analyse and manage their data to aid reproducibility and to increase their chances of furthering their research.

For further information and registration, please visit the event page.

Data Carpentry is an international movement to teach researchers better software skills. For more information about Data Carpentry, visit their website.