Background – NEX is a collaborative platform that brings together a state-of-the-art computing facility with large volumes (hundreds of terabytes) of NASA satellite and climate data, as well as a number of modeling and data analysis tools and services. To facilitate broader community engagement, NEX has deployed a cloud component, OpenNEX, which provides access to a number of NASA datasets together with tools and services, hands-on tutorials, and documentation.

Technical Objective – The goal of this project is to develop an integrated, collaborative, petabyte-scale Earth science data analysis and visualization environment. We will deploy this environment within the NASA Earth Exchange (NEX) to enhance existing science data analysis capabilities in both high-performance computing (HPC) and cloud environments. The system will significantly improve the scientific community's ability to transform Earth science observational data from NASA missions, model outputs, and other sources into science information and knowledge. We propose to develop a web-based system that seamlessly interfaces with both HPC and cloud environments, providing tools that enable science teams to develop and deploy large-scale data analysis and visualization pipelines and to share results with the community. The HPC component will interface with NEX, a collaboration platform for the Earth science community that provides a mechanism for scientific collaboration and for knowledge and data sharing, together with direct access to over 1 PB of Earth science data and a 200,000-core processing system.

Architecture – NASA already maintains a wide array of data on Amazon's S3 service (https://aws.amazon.com/nasa/nex/). This makes the AWS EC2 cloud infrastructure a natural choice for deploying custom clusters for batch and interactive data processing. Our approach provides a custom web-based interface that leverages a number of technologies in the [Resonant](http://resonant.kitware.com/) stack. These include Girder for data management, Romanesco for asynchronous workflow execution, and, most importantly, Minerva and GeoJS for visualization. This stack is deployed to an 'always-on' cloud server that provides user authentication and persistent, session-based access to the geospatial analytic and visualization tools.
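To make the workflow-execution layer concrete, the sketch below shows the shape of a Romanesco-style task: a plain JSON-serializable dict with `mode`, `script`, `inputs`, and `outputs` fields, which Romanesco would execute asynchronously on behalf of Girder. The analysis itself (a trivial mean over a list of values) and the `run_locally` helper are purely illustrative stand-ins, not part of the Romanesco API; this is a minimal sketch of the task format, not a definitive implementation.

```python
# A minimal Romanesco-style task specification as a plain dict.
# The field names ("mode", "script", "inputs", "outputs") follow the
# Romanesco task format; the analysis (a mean over a list of values)
# is a hypothetical placeholder for a real geospatial computation.
mean_task = {
    "mode": "python",
    "script": "mean_value = sum(values) / len(values)\n",
    "inputs": [
        {"name": "values", "type": "python", "format": "object"},
    ],
    "outputs": [
        {"name": "mean_value", "type": "python", "format": "object"},
    ],
}


def run_locally(task, bindings):
    """Execute a 'python'-mode task spec synchronously.

    This is an illustrative stand-in for Romanesco's executor, which
    would normally run the task asynchronously on a worker node.
    """
    scope = dict(bindings)               # bind declared inputs by name
    exec(task["script"], scope)          # run the task's script body
    # Collect only the declared outputs from the script's namespace.
    return {out["name"]: scope[out["name"]] for out in task["outputs"]}


result = run_locally(mean_task, {"values": [0.2, 0.4, 0.6]})
print(result["mean_value"])  # mean of the three input values
```

In the deployed system, tasks like this would be submitted through Girder rather than run in-process, with inputs resolved from data stored in Girder or on S3.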