The Large Hadron Collider is currently the biggest machine in the world. It also produces Big Data in the truest sense, running up to 2 million computing tasks per day and accumulating 25 petabytes of data in 2016. But there’s more. CERN, the European Organisation for Nuclear Research, is the birthplace of the World Wide Web and home to myriad projects and experiments investigating and explaining fractions of the universe.

One of CERN’s current projects is CLOUD - Cosmics Leaving Outdoor Droplets - an experiment investigating the possible link between cosmic rays and cloud formation, thereby providing more information about our atmosphere and climate. While it has nothing to do with cloud infrastructure per se, it nevertheless relies on distributed infrastructure to process and manage its data output.

The vast amount of research data obviously needs an equally vast amount of computing infrastructure and collaboration. A hybrid cloud platform that supports high-performance computing and Big Data research therefore seems like the logical next step, and that’s where the Helix Nebula Science Cloud (HNSciCloud) comes in - a European initiative aiming to connect scientific institutions with reliable hybrid cloud solutions. Commercial cloud services have so far barely been used in publicly funded research infrastructure. There are several reasons for that, among them cost, security, privacy, and the fact that it is still fairly new territory. Research needs stable partners, and HNSciCloud is the initiative tasked with finding them.

The Pre-Commercial Procurement (PCP) call, part of the Horizon 2020 EU funding programme, employed the following criteria in its selection process:

computing and storage have to support and connect large datasets, i.e. petabytes of data

container support

network connectivity and identity management

support services

reporting and usage monitoring

service payment models have to be appropriate for scientific application workloads

On April 3, 2017, three consortia were awarded contracts for the next phase of the hybrid cloud platform project and are now working on providing the prototype for various scientific institutions. The main one is CERN, with high-energy physics analysis, proton collision simulation, and other physics-related experiments. Other institutions that need to process and manage Big Data workloads include DESY, a leading accelerator centre exploring the microcosm, and EMBL, which conducts genomic analysis and researches other areas of molecular biology, among many more.

This year, Exoscale, as a member of one of the three chosen consortia, will gradually deploy 1,750 of the 3,500 computing instances planned for the prototype phase. The goal for next year’s pilot phase is to ultimately spawn 10,000 instances backed by 1 petabyte of storage, computing raw scientific data quickly and then pushing the results to a unified data store. In the current prototype phase, around 100 terabytes of raw scientific data and computing results are processed and stored.
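To make the compute-then-push pattern concrete, here is a minimal Python sketch of how a worker instance might hand its results to a shared, S3-compatible object store (Exoscale’s object storage exposes an S3-compatible API). This is purely illustrative and not part of HNSciCloud’s actual tooling: the bucket name, key scheme, and helper functions are hypothetical.

```python
# Hypothetical sketch: one worker pushes its computed results into a
# unified S3-compatible object store. Bucket name and key layout are
# invented for illustration only.
import io


def result_object_key(experiment: str, run_id: int) -> str:
    """Build a predictable object key so results from thousands of
    instances land in one shared, well-ordered namespace."""
    return f"results/{experiment}/run-{run_id:06d}.dat"


def push_results(s3_client, bucket: str, experiment: str, run_id: int,
                 payload: bytes) -> str:
    """Upload one run's result payload; returns the object key written."""
    key = result_object_key(experiment, run_id)
    # upload_fileobj streams the bytes to the object store.
    s3_client.upload_fileobj(io.BytesIO(payload), bucket, key)
    return key


# Usage would look roughly like this (requires credentials and a real
# S3-compatible endpoint, so it is not executed here):
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="<your S3-compatible endpoint>")
#   push_results(s3, "science-results", "cloud-experiment", 42, b"...")
```

The point of the fixed key scheme is that any of the scientific partners can later locate and fetch a given run’s output from the unified store without coordinating with the instance that produced it.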

Exoscale, a scalable and secure cloud hosting provider, is part of the winning consortium managed by the RHEA Group. The cloud is orchestrated by Nuvla from SixSq, which uses T-Systems and Exoscale as the cloud services behind its deployment platform for HNSciCloud. Nuvla itself runs on Exoscale as well.

All scientific members of HNSciCloud have access to the data in order to test their applications and potentially collaborate on experiments - a setup that leads to even better progress for science, especially for long-term experiments and projects. Integrating scientific research into a business environment with well-designed cloud infrastructure means more processing power as well as more consistent and standardised workflows for researchers spread across several teams. This initiative is a big step towards research and innovation on a large and connected scale.