ExArch: Climate analytics on distributed exascale data archives

Summary

The demands that climate science places on data management are growing rapidly as climate models improve in the
precision with which they depict spatial structures and in the completeness with which they describe a vast range
of physical processes.

For the Coupled Model Intercomparison Project Phase 5 (CMIP5), a distributed archive is being constructed to provide
access to what is expected to be more than 10 petabytes of global climate change projections. The data will be
held at 30 or more computing centres and data archives around the world, but to users it will appear as a single
archive described by one catalogue. In addition, the usability of the data will be enhanced by a three-step
validation process and the publication of Digital Object Identifiers (DOIs) for all the data.

For many users the spatial resolution provided by the global climate models (around 150 km) is inadequate: the
CORDEX project will provide data downscaled to around 10 km. Evaluation of climate impacts often revolves
around extremes and complex impact factors, requiring high volumes of data to be stored. At the same time,
uncertainty about the optimal configuration of the models means that each scenario must be explored
with multiple models.

This project will explore the challenges of developing a software management infrastructure which will scale to
the multi-exabyte archives of climate data which are likely to be crucial to major policy decisions by the end of
the decade. Support for automated processing of the archived data and metadata will be essential. In the short
term, strategies will be evaluated by applying them to the CORDEX project data.