Climate modeling experiments on modern-day supercomputers produce a large volume of data. These complex experiments and historical simulations are designed, planned and executed by climate scientists in collaboration with computational scientists, data scientists and other professionals.

The suite of Coupled Model Intercomparison Project Phase 6 (CMIP6) experiments generates several hundred terabytes of data across thousands of files. The scientists involved are often spread across geographically distributed centers and research institutions, which makes it a challenge to distribute the data for research collaborations.

The Energy Exascale Earth System Modeling Project is a large multi-laboratory and multi-institution effort sponsored by the Earth System Modeling program in the US Department of Energy (DOE) Office of Biological and Environmental Research. This collaborative project involves over 100 scientists, software developers and management personnel from eight US DOE national laboratories, several academic institutions and private industry partners. The main objective of the project is to develop and use an ultra-high-resolution, state-of-the-art Earth system model, called the Energy Exascale Earth System Model (E3SM), to address the mission-specific climate change and energy research priorities of the United States and to optimize the use of DOE’s next-generation computational facilities.

E3SM is designed and developed to exploit modern computer hardware architectures and the associated investments in software development. The 10-year roadmap for E3SM spans three generations of leadership-class computer architectures and four phases of model development, each with corresponding major simulation experiments. The roadmap is supported by the implementation of the E3SM Infrastructure, which will maintain a disciplined software engineering structure and turnkey workflows for computational experiment design, execution, analysis of output and distribution of results. The first series of production simulation experiments is expected to generate nearly 2 PB of simulation output while consuming over 1 billion core-hours of computing resources across the three major DOE Computing Facilities.

Many of the labor-intensive analysis and data management activities have been streamlined and integrated via an automated end-to-end workflow process. In this talk, Valentine Anantharaj will describe how we (a) plan and schedule the modeling across various supercomputing facilities; (b) gather and manage the data; (c) analyze and publish the results; and (d) preserve the provenance of our activities. He will also discuss some of the lessons learned, especially the challenges involved in orchestrating our simulations and workflow across major computing and data facilities.